Efficient OkHttp configuration for a multithreaded environment

What is the best configuration for setting up an OkHttp3 client correctly in a multithreaded environment? I have two main questions:
Connection pool - how do we define the number of available connections in the pool? Can it be scaled at runtime? The number of concurrent users will be very high, and I need to make sure users aren't waiting a long time for a connection to become available from the pool.
I read that OkHttp might end up doing multiple retries in case of failures or timeouts. Is it possible to enable this only for GETs and not POSTs while using just one OkHttp client?
Also, is there anything else I should be considering?
Here is my starting code for the client.
private static final int timeout = 15000;

private static final OkHttpClient okClient = new OkHttpClient()
        .newBuilder()
        .connectTimeout(timeout, TimeUnit.MILLISECONDS)
        .readTimeout(timeout, TimeUnit.MILLISECONDS)
        .writeTimeout(timeout, TimeUnit.MILLISECONDS)
        .retryOnConnectionFailure(false)
        .addInterceptor(new HttpLoggingInterceptor().setLevel(HttpLoggingInterceptor.Level.BASIC))
        .build();

You can configure a connection pool and then pass it into the client builder.
https://square.github.io/okhttp/3.x/okhttp/okhttp3/ConnectionPool.html
See Connection Pool - OkHttp for an example.
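For example, here is a minimal Java sketch of wiring a pool and a dispatcher into the builder (the sizes are illustrative assumptions, not recommendations). Note that ConnectionPool only caps the idle connections kept around for reuse; OkHttp still opens new connections on demand, concurrency for asynchronous calls is governed by the Dispatcher, and synchronous execute() calls are bounded only by your own threads.

import java.util.concurrent.TimeUnit;
import okhttp3.ConnectionPool;
import okhttp3.Dispatcher;
import okhttp3.OkHttpClient;

public final class HttpClientFactory {

    public static OkHttpClient create() {
        // Idle connections kept for reuse; new connections are still created on demand.
        ConnectionPool pool = new ConnectionPool(50, 5, TimeUnit.MINUTES);

        // Limits for asynchronous enqueue() calls (assumed values, tune for your load).
        Dispatcher dispatcher = new Dispatcher();
        dispatcher.setMaxRequests(200);
        dispatcher.setMaxRequestsPerHost(50);

        return new OkHttpClient.Builder()
                .connectionPool(pool)
                .dispatcher(dispatcher)
                .build();
    }
}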
For the second question, you can disable OkHttp's automatic retries and handle retries in your application code instead; retryOnConnectionFailure(false), as you show above, does exactly that.
To apply different retry behaviour to GETs and POSTs, derive a customised client from the base one, like the following (clients created via newBuilder() share the same connection pool and dispatcher, so this still behaves like a single set of connections):
val postClient = client.newBuilder().retryOnConnectionFailure(false).build()
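If you would rather keep literally one client instance, another option (not from the answer above) is an application interceptor that retries only idempotent GET requests. A rough Java sketch, where the retry count of 3 and retrying only on IOException are assumptions:

import java.io.IOException;
import okhttp3.Interceptor;
import okhttp3.Request;
import okhttp3.Response;

// Application interceptors are permitted to call chain.proceed() more than once,
// so a simple retry loop for idempotent requests can live here.
class GetRetryInterceptor implements Interceptor {
    private static final int MAX_ATTEMPTS = 3; // assumption

    @Override
    public Response intercept(Chain chain) throws IOException {
        Request request = chain.request();
        int attempts = "GET".equals(request.method()) ? MAX_ATTEMPTS : 1;
        IOException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return chain.proceed(request);
            } catch (IOException e) {
                last = e; // non-GET requests never loop, so they are not retried
            }
        }
        throw last;
    }
}

You would register it with okClient.newBuilder().addInterceptor(new GetRetryInterceptor()).build() while keeping retryOnConnectionFailure(false).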

Related

How to stop outbound HTTP connections from timing out

Background:
I'm currently hosting an ASP.NET application in Azure with the following specs:
ASP.NET Core 2.2
Using Flurl for HTTP requests
Kestrel Webserver
Docker (Linux - mcr.microsoft.com/dotnet/core/aspnet:2.2 runtime)
Azure App Service on P2V2 tier app service plan
I have a couple of background jobs that run on the service and make a lot of outbound HTTP calls to a third-party service.
Issue:
Under a small load (approximately 1 call per 10 seconds), all requests complete in under a second with no issue. The issue I'm having is that under a heavy load, when the service can make up to 3-4 calls in a 10-second span, some of the requests will randomly time out and throw an exception. When I was using RestSharp the exception read "The operation has timed out". Now that I'm using Flurl, the exception reads "The call timed out".
Here's the kicker - if I run the same job from my laptop running Windows 10 / Visual Studio 2017, this problem does NOT occur. This leads me to believe I'm hitting some limit or running out of some resource in my hosted environment. It's unclear whether that is connection/socket or thread related.
Things I've tried:
Ensure all code paths to the request are using async/await to prevent lockouts
Ensure Kestrel Defaults allow unlimited connections (it does by default)
Ensure Dockers default connection limits are sufficient (2000 by default, more than enough)
Configuring ServicePointManager settings for connection limits
Here is the code in my startup.cs that I'm currently using to try and prevent this issue:
public class Startup
{
    public Startup(IHostingEnvironment hostingEnvironment)
    {
        ...

        // ServicePointManager setup
        ServicePointManager.UseNagleAlgorithm = false;
        ServicePointManager.Expect100Continue = false;
        ServicePointManager.DefaultConnectionLimit = int.MaxValue;
        ServicePointManager.EnableDnsRoundRobin = true;
        ServicePointManager.ReusePort = true;

        // Set service point timeouts
        var sp = ServicePointManager.FindServicePoint(new Uri("https://placeholder.thirdparty.com"));
        sp.ConnectionLeaseTimeout = 15 * 1000; // 15 seconds
        FlurlHttp.ConfigureClient("https://placeholder.thirdparty.com", cli => cli.Settings.ConnectionLeaseTimeout = new TimeSpan(0, 0, 15));
    }
}
Has anyone else run into a similar issue to this? I'm open to any suggestions on how to best debug this situation, or possible methods to correct the issue. I'm at a complete loss after researching this for several days.
Thank you in advance.
I had similar issues. Take a look at Asp.net Core HttpClient has many TIME_WAIT or CLOSE_WAIT connections. Debugging via netstat helped identify the problem for me. As one possible solution, I suggest you use IHttpClientFactory. You can get more info from https://learn.microsoft.com/en-us/aspnet/core/fundamentals/http-requests?view=aspnetcore-2.2 It should be fairly easy to use, as described in Flurl client lifetime in ASP.Net Core 2.1 and IHttpClientFactory.

Limiting the number of concurrent jobs on Azure Functions queue

I have a Function app in Azure that is triggered when an item is put on a queue. It looks something like this (greatly simplified):
public static async Task Run(string myQueueItem, TraceWriter log)
{
    using (var client = new HttpClient())
    {
        client.BaseAddress = new Uri(Config.APIUri);
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

        StringContent httpContent = new StringContent(myQueueItem, Encoding.UTF8, "application/json");

        HttpResponseMessage response = await client.PostAsync("/api/devices/data", httpContent);
        response.EnsureSuccessStatusCode();

        string json = await response.Content.ReadAsStringAsync();
        ApiResponse apiResponse = JsonConvert.DeserializeObject<ApiResponse>(json);

        log.Info($"Activity data successfully sent to platform in {apiResponse.elapsed}ms. Tracking number: {apiResponse.tracking}");
    }
}
This all works great and runs pretty well. Every time an item is put on the queue, we send the data to some API on our side and log the response. Cool.
The problem happens when there's a big spike in "the thing that generates queue messages" and a lot of items are put on the queue at once. This tends to happen around 1,000 - 1,500 items in a minute. The error log will have something like this:
2017-02-14T01:45:31.692 mscorlib: Exception while executing function: Functions.SendToLimeade. f-SendToLimeade__-1078179529: An error occurred while sending the request. System: Unable to connect to the remote server. System: Only one usage of each socket address (protocol/network address/port) is normally permitted 123.123.123.123:443.
At first, I thought this was an issue with the Azure Function app running out of local sockets, as illustrated here. However, then I noticed the IP address. The IP address 123.123.123.123 (of course changed for this example) is our IP address, the one that the HttpClient is posting to. So, now I'm wondering if it is our servers running out of sockets to handle these requests.
Either way, we have a scaling issue going on here. I'm trying to figure out the best way to solve it.
Some ideas:
If it's a local socket limitation, the article above has an example of increasing the local port range using Req.ServicePoint.BindIPEndPointDelegate. This seems promising, but what do you do when you truly need to scale? I don't want this problem coming back in 2 years.
If it's a remote limitation, it looks like I can control how many messages the Functions runtime will process at once. There's an interesting article here that says you can set serviceBus.maxConcurrentCalls to 1 and only a single message will be processed at once. Maybe I could set this to a relatively low number. Now, at some point our queue will be filling up faster than we can process them, but at that point the answer is adding more servers on our end.
Multiple Azure Functions apps? What happens if I have more than one Azure Functions app and they all trigger on the same queue? Is Azure smart enough to divvy up the work among the Function apps and I could have an army of machines processing my queue, which could be scaled up or down as needed?
I've also come across keep-alives. It seems to me if I could somehow keep my socket open as queue messages were flooding in, it could perhaps help greatly. Is this possible, and any tips on how I'd go about doing this?
Any insight on a recommended (scalable!) design for this sort of system would be greatly appreciated!
I think the code error is because of: using (var client = new HttpClient())
Quoted from Improper instantiation antipattern:
this technique is not scalable. A new HttpClient object is created for each user request. Under heavy load, the web server may exhaust the number of available sockets.
I think I've figured out a solution for this. I've been running these changes for the past 6 hours, and I've had zero socket errors. Before, I would get these errors in large batches every 30 minutes or so.
First, I added a new class to manage the HttpClient.
public static class Connection
{
    public static HttpClient Client { get; private set; }

    static Connection()
    {
        Client = new HttpClient();
        Client.BaseAddress = new Uri(Config.APIUri);
        Client.DefaultRequestHeaders.Add("Connection", "Keep-Alive");
        Client.DefaultRequestHeaders.Add("Keep-Alive", "timeout=600");
        Client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    }
}
Now, we have a static instance of HttpClient that we use for every call to the function. From my research, keeping HttpClient instances around for as long as possible is highly recommended: everything is thread safe, and HttpClient will queue up requests and optimize requests to the same host. Notice I also set the Keep-Alive headers (I think this is the default, but I figured I'd be explicit).
In my function, I just grab the static HttpClient instance like:
var client = Connection.Client;
StringContent httpContent = new StringContent(myQueueItem, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("/api/devices/data", httpContent);
response.EnsureSuccessStatusCode();
I haven't really done any in-depth analysis of what's happening at the socket level (I'll have to ask our IT guys if they're able to see this traffic on the load balancer), but I'm hoping it just keeps a single socket open to our server and makes a bunch of HTTP calls as the queue items are processed. Anyway, whatever it's doing seems to be working. Maybe someone has some thoughts on how to improve.
If you use a Consumption plan instead of running Functions on a dedicated web app, #3 more or less occurs out of the box. Functions will detect that you have a large queue of messages and will add instances until the queue length stabilizes.
maxConcurrentCalls only applies per instance, allowing you to limit per-instance concurrency. Basically, your processing rate is maxConcurrentCalls * instanceCount.
The only way to control global throughput would be to use Functions on dedicated web apps of the size you choose. Each app will poll the queue and grab work as necessary.
The best scaling solution would improve the load balancing on 123.123.123.123 so that it can handle any number of requests from Functions scaling up/down to meet queue pressure.
Keep-alive, as far as I know, is useful for persistent connections, but function executions aren't viewed as a persistent connection. In the future we are trying to add 'bring your own binding' to Functions, which would allow you to implement connection pooling if you liked.
I know the question was answered long ago, but in the meantime Microsoft has documented the anti-pattern that you were using:
Improper Instantiation antipattern

Scala: Apache HttpClient in a multi-threaded environment

I am writing a singleton class (an object in Scala) which uses Apache HttpClient (4.5.2) to post some file content and return the status to the caller.
object HttpUtils {

  protected val retryHandler = new HttpRequestRetryHandler() {
    def retryRequest(exception: IOException, executionCount: Int, context: HttpContext): Boolean = {
      // retry logic
      true
    }
  }

  private val connectionManager = new PoolingHttpClientConnectionManager()

  // Reusing same client for each request that might be coming from different threads.
  // Is it correct ????
  val httpClient = HttpClients.custom()
    .setConnectionManager(connectionManager)
    .setRetryHandler(retryHandler)
    .build()

  def restApiCall(url: String, rDD: RDD[SomeMessage]): Boolean = {
    // Creating new context for each request
    val httpContext: HttpClientContext = HttpClientContext.create

    val post = new HttpPost(url)
    // convert RDD to text file using rDD.collect
    // add this file as MultipartEntity to post

    var response = None: Option[CloseableHttpResponse] // Is it correct way of using it ?
    try {
      response = Some(httpClient.execute(post, httpContext))
      val responseCode = response.get.getStatusLine.getStatusCode
      EntityUtils.consume(response.get.getEntity) // Is it require ???
      if (responseCode == 200) true
      else false
    } finally {
      if (response.isDefined) response.get.close()
      post.releaseConnection() // Is it require ???
    }
  }

  def onShutDown = {
    connectionManager.close()
    httpClient.close()
  }
}
Multiple threads (more specifically, from a Spark streaming context) are calling the restApiCall method. I am relatively new to Scala and Apache HttpClient. I have to make frequent connections to only a few fixed servers (i.e. 5-6 fixed URLs with different request parameters).
I went through multiple online resources but am still not confident about it.
Is this the best way to use HttpClient in a multi-threaded environment?
Is it possible to keep connections alive and reuse them for various requests? Would that be beneficial in this case?
Am I using/releasing all resources efficiently? If not, please suggest improvements.
Is it fine to use this from Scala, or is there a better library?
Thanks in advance.
It seems the official docs have answers to all your questions:
2.3.3. Pooling connection manager

PoolingHttpClientConnectionManager is a more complex implementation that manages a pool of client connections and is able to service connection requests from multiple execution threads. Connections are pooled on a per route basis. A request for a route for which the manager already has a persistent connection available in the pool will be serviced by leasing a connection from the pool rather than creating a brand new connection.

PoolingHttpClientConnectionManager maintains a maximum limit of connections on a per route basis and in total. Per default this implementation will create no more than 2 concurrent connections per given route and no more than 20 connections in total. For many real-world applications these limits may prove too constraining, especially if they use HTTP as a transport protocol for their services.

2.4. Multithreaded request execution

When equipped with a pooling connection manager such as PoolingClientConnectionManager, HttpClient can be used to execute multiple requests simultaneously using multiple threads of execution. The PoolingClientConnectionManager will allocate connections based on its configuration. If all connections for a given route have already been leased, a request for a connection will block until a connection is released back to the pool. One can ensure the connection manager does not block indefinitely in the connection request operation by setting 'http.conn-manager.timeout' to a positive value. If the connection request cannot be serviced within the given time period a ConnectionPoolTimeoutException will be thrown.
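In code, raising those limits and bounding how long a thread waits for a pooled connection might look like the following Java sketch (the Scala equivalent is essentially identical; the numbers are assumptions to tune for your 5-6 target hosts):

import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public final class PooledClientFactory {

    public static CloseableHttpClient create() {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(60);           // total connections across all routes (assumption)
        cm.setDefaultMaxPerRoute(10); // connections per target host (assumption)

        // Fail fast instead of blocking indefinitely when the pool is exhausted.
        RequestConfig requestConfig = RequestConfig.custom()
                .setConnectionRequestTimeout(5000) // max wait for a pooled connection, in ms
                .setConnectTimeout(5000)
                .setSocketTimeout(15000)
                .build();

        return HttpClients.custom()
                .setConnectionManager(cm)
                .setDefaultRequestConfig(requestConfig)
                .build();
    }
}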

Spring Cache with Redis - How to gracefully handle or even skip Caching in case of Connection Failure to Redis

I've enabled caching in my Spring app and I use Redis to serve the purpose.
However, whenever a connection failure occurs, the app stops working, whereas I think it would be better to skip caching and continue with the normal execution flow.
So, does anyone have any idea how to do this gracefully in Spring?
Here is the exception I got.
Caused by: org.springframework.data.redis.RedisConnectionFailureException: Cannot get Jedis connection; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
As of Spring Framework 4.1, there is a CacheErrorHandler that you can implement to handle such exceptions. Refer to the javadoc for more details.
You can register it by having your @Configuration class extend CachingConfigurerSupport (see errorHandler()).
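A minimal sketch of such a handler that logs and swallows cache failures so execution falls through to the underlying method (the class and logger names are illustrative):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cache.Cache;
import org.springframework.cache.annotation.CachingConfigurerSupport;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.interceptor.CacheErrorHandler;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableCaching
public class CacheConfig extends CachingConfigurerSupport {

    private static final Logger log = LoggerFactory.getLogger(CacheConfig.class);

    @Override
    public CacheErrorHandler errorHandler() {
        return new CacheErrorHandler() {
            @Override
            public void handleCacheGetError(RuntimeException e, Cache cache, Object key) {
                // A failed read is treated like a cache miss, so the real method still runs.
                log.warn("Cache get failed for key {}: {}", key, e.getMessage());
            }

            @Override
            public void handleCachePutError(RuntimeException e, Cache cache, Object key, Object value) {
                log.warn("Cache put failed for key {}: {}", key, e.getMessage());
            }

            @Override
            public void handleCacheEvictError(RuntimeException e, Cache cache, Object key) {
                log.warn("Cache evict failed for key {}: {}", key, e.getMessage());
            }

            @Override
            public void handleCacheClearError(RuntimeException e, Cache cache) {
                log.warn("Cache clear failed: {}", e.getMessage());
            }
        };
    }
}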
You can use CacheErrorHandler as suggested by Stephane Nicoll, but you should make sure to set transactionAware to false on the RedisCacheManager in your Redis cache config (this makes the caching part execute, and fail, immediately so the error is caught by the CacheErrorHandler, rather than being deferred to the end of the transaction, which would bypass the CacheErrorHandler). The bean that sets transactionAware to false looks like this:
@Bean
public RedisCacheManager redisCacheManager(LettuceConnectionFactory lettuceConnectionFactory) {
    JdkSerializationRedisSerializer redisSerializer = new JdkSerializationRedisSerializer(getClass().getClassLoader());

    RedisCacheConfiguration redisCacheConfiguration = RedisCacheConfiguration.defaultCacheConfig()
            .entryTtl(Duration.ofHours(redisDataTTL))
            .serializeValuesWith(RedisSerializationContext.SerializationPair.fromSerializer(redisSerializer));

    redisCacheConfiguration.usePrefix();

    RedisCacheManager redisCacheManager = RedisCacheManager.RedisCacheManagerBuilder.fromConnectionFactory(lettuceConnectionFactory)
            .cacheDefaults(redisCacheConfiguration)
            .build();

    redisCacheManager.setTransactionAware(false);
    return redisCacheManager;
}
Similar to what Stephane has mentioned, I have done this by consuming the error in a try/catch block and adding a fallback mechanism: if Redis is not up, or the data is not present, I fetch the data from the DB. (Later, if I find the data and Redis is up, I add the same data to Redis to maintain consistency.)
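A rough Java sketch of that fallback pattern against the Spring Cache abstraction directly (Product, productRepository, cacheManager and log are hypothetical injected collaborators, not from the answer above):

public Product getProduct(String id) {
    try {
        Cache cache = cacheManager.getCache("products");
        Product cached = (cache != null) ? cache.get(id, Product.class) : null; // may throw if Redis is down
        if (cached != null) {
            return cached;
        }
    } catch (RuntimeException e) {
        log.warn("Redis unavailable, falling back to DB", e);
    }

    Product fromDb = productRepository.findById(id);

    try {
        cacheManager.getCache("products").put(id, fromDb); // best-effort write-back
    } catch (RuntimeException ignored) {
        // Redis still down; consistency is restored by a later successful write.
    }
    return fromDb;
}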

.NET 4.5 Increase WCF Client Calls Async?

I have a .NET 4.5 WCF client app that uses the async/await pattern to make volumes of calls. My development machine is dual-proc with 8 GB of RAM (production will be 5 CPUs with 8 GB of RAM at Amazon AWS). The remote WCF service called by my code uses out and ref parameters on a web method that I need. My code instantiates a proxy client each time, writes any results to a public ConcurrentDictionary, and then returns null.
I ran Perfmon, watching the thread count on the system, and it goes between 28-30. It takes hours for my client to complete the volumes of calls that are made. Yes, hours. The remote service is backed by a big company, they have many servers to receive my WCF calls, so the more calls I can throw at them, the better.
I think that things are actually still happening synchronously, even though the method that makes the WCF call is decorated with "async", because the proxy method cannot be awaited. Is that true?
My code looks like this:
async private void CallMe()
{
    Console.WriteLine( DateTime.Now );
    var workTasks = this.AnotherConcurrentDict.Select( oneB => GetData( etcetcetc ) ).Cast<Task>().ToList();
    await Task.WhenAll( workTasks );
}

private async Task<WorkingBits> GetData(etcetcetc)
{
    var commClient = new RemoteClient();
    var cpResponse = new GetPackage();
    var responseInfo = commClient.GetData( name, password, ref cpResponse.aproperty, filterid, out cpResponse.Identifiers );
    foreach (var onething in cpResponse.Identifiers)
    {
        // add to the ConcurrentDictionary
    }
    return null; // I already wrote to the ConcurrentDictionary so no need to return anything
}
responseInfo is not awaitable because the WCF call has ref and out parameters.
I was thinking that the way to speed this up is not to put async/await in this method, but instead to create a wrapper method where I can make things async/await; however, I am not sure that is the smartest/safest way to go about it.
What is a smart way to get more outbound calls to the service (expand IO completion thread pool, trick calls into running in the background so Task.WhenAll can complete quicker)?
Thanks for all ideas/samples/pointers. I am hitting a bottleneck somewhere.
1) Make sure you're really calling it asynchronously, rather than just blocking on the calls. Code samples would help here.
2) You may need to do this:
ServicePointManager.DefaultConnectionLimit = 100;
By default it only allows 2 simultaneous connections to the same server.
3) Make sure you dispose the proxy object after the call is complete so you're not tying up resources.
If you're doing things asynchronously the threadpool size shouldn't be a bottleneck. To get a better idea of what kind of problem you're having, you can use Interlocked.Increment and Interlocked.Decrement to track the number of pending calls and see if it's being limited somewhere.
You could also substitute your real call with a call to a very simple method that you know will not have any bottlenecks, to see if the problem is in the client or server.
