Best Practice to Use HttpClient in Multithreaded Environment

Method A is recommended by the HttpClient developer community.

Please refer to http://www.mail-archive.com/httpclient-users@hc.apache.org/msg02455.html for more details.

Best way to use Apache HttpClient in a multithreaded environment

  1. Using an HttpClient instance as a singleton per distinct HTTP service is correct and is in line with the Apache HttpClient best practices. It should not be static, though.

    http://hc.apache.org/httpcomponents-client-5.1.x/migration-guide/preparation.html

  2. One should close HttpClient when releasing the HTTP service. In your case ApacheHttpClient should implement Closeable and close the internal instance of CloseableHttpClient in its #close method.

  3. You probably should not, but it really depends on how exactly your application deals with request execution.
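Point 2 above can be sketched roughly as follows. This is only an illustration, not the asker's actual class: the wrapper name ApacheHttpClient is taken from the answer, while the constructor, the isClosed() helper and the idempotent close are assumptions. The internal client is typed as plain java.io.Closeable so that the sketch compiles without the library on the classpath; CloseableHttpClient implements Closeable, so a real client could be passed in unchanged.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper: one instance per distinct HTTP service, held as a
// normal (non-static) field or bean and closed when the service is released.
final class ApacheHttpClient implements Closeable {
    // Stand-in for the internal CloseableHttpClient (assumption: typed as
    // Closeable here only to keep the sketch free of library imports).
    private final Closeable internalClient;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    ApacheHttpClient(Closeable internalClient) {
        this.internalClient = internalClient;
    }

    boolean isClosed() {
        return closed.get();
    }

    // Delegates to the internal client exactly once, even if several
    // threads call close() concurrently or close() is called twice.
    @Override
    public void close() throws IOException {
        if (closed.compareAndSet(false, true)) {
            internalClient.close();
        }
    }
}
```

Because the wrapper is Closeable, it can be managed with try-with-resources or by a container that calls close() on shutdown.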

Apache HttpClient and HttpConnection in a multithreaded application

I'm currently using a single PoolingHttpClientConnectionManager, and HttpClientBuilder.setConnectionManager(connectionManager).build() for every request.

Building a new HttpClient for each request is a huge waste. You should use one HttpClient per configuration (each client can have a different connection manager, max concurrent requests, etc.) or one for each independent module of your application (in order not to create dependencies between otherwise independent modules).

Also, do not forget that .build() returns a CloseableHttpClient, which means that you should call httpClient.close() when you are done using it; otherwise you may leak resources.


Update in response to a comment from @Nati:

what will be "wasted"? is HttpClient a heavy object?

Here you can see the source code for the creation of an HTTP client. As you can see, it's a lot of code and it is pointless to execute it on each request. Doing so unnecessarily consumes CPU and creates a lot of garbage, which reduces the performance of the whole application. The fewer allocations you do, the better! In other words, there are no benefits to creating a new client for each request, only downsides.

does it make any sense to keep it as a bean for the entire lifespan of the application?

IMHO it does, unless it's used very (very) rarely.

relation between the HttpConnection and HttpClient

Each HTTP client can execute multiple HTTP requests. Each request is executed in the context of the client (its configuration, e.g. proxy, concurrency, keep-alive, etc.). Each response to a request has to be closed (reset(), close(), I don't remember the exact name) in order to free the connection so that it can be reused for another request.
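To make this concrete, here is a minimal sketch, assuming HttpClient 4.5.x is on the classpath. The class name ResponseHandling, its method names and the pool size parameter are made up for illustration. The client is built once and then shared; each response is closed with try-with-resources so that its connection goes back to the pool.

```java
import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;

// Hypothetical helper, not part of HttpClient itself.
final class ResponseHandling {

    // Build the client once per configuration and keep it for reuse;
    // do not call this for every request.
    static CloseableHttpClient newSharedClient(int maxConnections) {
        PoolingHttpClientConnectionManager pool = new PoolingHttpClientConnectionManager();
        pool.setMaxTotal(maxConnections);
        pool.setDefaultMaxPerRoute(maxConnections);
        return HttpClients.custom().setConnectionManager(pool).build();
    }

    // Can be called from many threads with the same shared client.
    static String fetch(CloseableHttpClient sharedClient, String url) throws IOException {
        // Closing the response releases the connection back to the pool.
        try (CloseableHttpResponse response = sharedClient.execute(new HttpGet(url))) {
            HttpEntity entity = response.getEntity();
            // Reading the entity fully lets the connection be kept alive for reuse.
            return entity == null ? "" : EntityUtils.toString(entity);
        }
    }
}
```

The shared client itself is closed only once, when the application or module that owns it shuts down.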

Apache HTTP Client: build simulator using multithreaded environment

You can toggle re-use of connections within the pool using the builder's setConnectionReuseStrategy. The DefaultConnectionReuseStrategy will re-use connections whenever possible, but the NoConnectionReuseStrategy will close all connections returned to the pool.

I used these connection re-use strategies in reverse: in production no re-use was set (to ensure proper load-balancing, since every new connection is directed to a healthy server), but during testing I had to switch back to the default re-use strategy because the test was creating so many connections that the test machine quickly ran out of ports (after a local port is used, the OS keeps it in the TIME_WAIT state for a while, which is part of the TCP protocol). The good thing is that the test code only differs from the production code in this one setting of the connection re-use strategy.
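A rough sketch of such a single switch, assuming HttpClient 4.5.x and HttpCore 4.4.x on the classpath. The class ReuseToggle, its methods and the boolean flag are hypothetical names; only the builder method and the two strategy classes come from the library.

```java
import org.apache.http.ConnectionReuseStrategy;
import org.apache.http.impl.DefaultConnectionReuseStrategy;
import org.apache.http.impl.NoConnectionReuseStrategy;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

// Hypothetical factory: production and test code differ only in the flag.
final class ReuseToggle {

    // true  -> keep connections alive and re-use them whenever possible
    // false -> close every connection when it is returned to the pool
    static ConnectionReuseStrategy strategyFor(boolean reuseConnections) {
        return reuseConnections
                ? DefaultConnectionReuseStrategy.INSTANCE
                : NoConnectionReuseStrategy.INSTANCE;
    }

    static CloseableHttpClient buildClient(boolean reuseConnections, int maxConnections) {
        PoolingHttpClientConnectionManager pool = new PoolingHttpClientConnectionManager();
        pool.setMaxTotal(maxConnections);
        pool.setDefaultMaxPerRoute(maxConnections);
        return HttpClients.custom()
                .setConnectionManager(pool)
                .setConnectionReuseStrategy(strategyFor(reuseConnections))
                .build();
    }
}
```

In the setup described above, production would call buildClient(false, ...) and the load test buildClient(true, ...).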

Note that the combination of a connection pool and no re-use of connections still has its purpose: the pool will prevent more than its maximum allowed number of open connections. E.g. if the application decides it wants to open 100 connections at the same time, and the pool has a maximum size of 30, the pool will make the requests for the other 70 connections wait until connections are returned. This is a good way to make clients behave nicely and prevent them from overloading the server.
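The throttling effect can be modelled with JDK classes only. The sketch below is not HttpClient code: a java.util.concurrent.Semaphore stands in for the pool, and the class BoundedPoolDemo, the 5 ms sleep and the 30 second timeout are assumptions chosen for the demonstration. It starts the given number of simulated requests at once and records how many held a "connection" at the same time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// JDK-only simulation of a bounded pool; a Semaphore plays the role of the pool.
final class BoundedPoolDemo {

    // Returns {completed requests, peak number of simultaneously leased connections}.
    static int[] run(int requests, int poolSize) {
        Semaphore permits = new Semaphore(poolSize);
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        ExecutorService threads = Executors.newFixedThreadPool(requests);

        for (int i = 0; i < requests; i++) {
            threads.execute(() -> {
                try {
                    permits.acquire(); // waits while every connection is leased
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = inUse.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(5); // pretend to execute a request
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inUse.decrementAndGet();
                    permits.release(); // "return the connection to the pool"
                }
            });
        }

        threads.shutdown();
        try {
            threads.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {completed.get(), peak.get()};
    }
}
```

Calling BoundedPoolDemo.run(100, 30) mirrors the numbers in the example above: all 100 requests eventually complete, but the recorded peak never exceeds 30.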

Apache HttpClient Connection Management

As per the Apache Commons HTTP Client documentation, option 2 is the most sensible one.

First, it says:

The process of establishing a connection from one host to another is
quite complex and involves multiple packet exchanges between two
endpoints, which can be quite time consuming. The overhead of
connection handshaking can be significant, especially for small HTTP
messages. One can achieve a much higher data throughput if open
connections can be re-used to execute multiple requests.

HTTP/1.1 states that HTTP connections can be re-used for multiple
requests per default. HTTP/1.0 compliant endpoints can also use a
mechanism to explicitly communicate their preference to keep
connection alive and use it for multiple requests. HTTP agents can
also keep idle connections alive for a certain period of time in case a
connection to the same target host is needed for subsequent requests.
The ability to keep connections alive is usually referred to as
connection persistence. HttpClient fully supports connection
persistence.

So, after that paragraph, we can conclude that yes, it is a very bad idea to instantiate HTTP connections every time we want to make an HTTP request, and what you call option 1 in your question is not the best way to go.

And later under "Pooling connection manager" it says:

PoolingHttpClientConnectionManager is a more complex implementation
that manages a pool of client connections and is able to service
connection requests from multiple execution threads. Connections are
pooled on a per route basis. A request for a route for which the
manager already has a persistent connection available in the pool will
be serviced by leasing a connection from the pool rather than creating
a brand new connection.

So, after reading this paragraph we can conclude that yes, it makes sense to have a single connection pool shared by all threads of the application. So, ideally, you instantiate it once and share it everywhere you need to obtain an HTTP connection.

Finally, regarding option 3, the documentation says:

BasicHttpClientConnectionManager is a simple connection manager that
maintains only one connection at a time. Even though this class is
thread-safe it ought to be used by one execution thread only.
BasicHttpClientConnectionManager will make an effort to reuse the
connection for subsequent requests with the same route. It will,
however, close the existing connection and re-open it for the given
route, if the route of the persistent connection does not match that
of the connection request. If the connection has already been
allocated, then java.lang.IllegalStateException is thrown.

So, option 3 makes sense, but this definitely does not sound better than option 2 in terms of reusing expensive resources.

How to use basic authentication in a multi-threaded HttpClient environment?

I ran into the same issue recently.
There's this logged issue: https://issues.apache.org/jira/browse/HTTPCLIENT-1168
You have two choices it seems:

  1. Create an HttpContext for each thread or request
  2. Use SyncBasicHttpContext which is a thread safe implementation

