Spring Boot app with CompletableFuture: how to manage a large number of requests? - multithreading

I understand that a Spring Boot app has server.tomcat.max-threads = 200 by default.
Let's say I have a REST controller, and in the service I use a CompletableFuture to call a third-party API asynchronously.
Now assume I get 100 requests to my API at the same time (which start 100 more threads for the third-party API calls).
The question is: do the CompletableFuture threads count against server.tomcat.max-threads, or do they come from a separate pool, ForkJoinPool.commonPool()?
What if I get a 101st request? Will it be blocked until the others complete?
I just want to understand how my application will behave under a large number of requests.
Can I control this? Any advice will help me design my application and avoid flaws in advance.
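One way to keep the outbound calls under control is to give CompletableFuture its own bounded executor instead of relying on the default pool. The sketch below is only an illustration: the class name BoundedAsyncCalls, the pool size of 4, the thread-name prefix and the callThirdParty stand-in (which just sleeps) are all assumptions, not part of the question. When no executor is passed, supplyAsync normally runs on ForkJoinPool.commonPool(), which is separate from Tomcat's request threads; passing a fixed pool caps how many third-party calls run at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedAsyncCalls {

    // Hypothetical limit; tune it to what the third-party API tolerates.
    static final int MAX_CONCURRENT_CALLS = 4;

    // Builds a fixed-size pool with recognisable thread names.
    static ExecutorService newApiPool(int size) {
        AtomicInteger counter = new AtomicInteger();
        return Executors.newFixedThreadPool(size, runnable -> {
            Thread t = new Thread(runnable, "third-party-api-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        });
    }

    // Stand-in for the real HTTP call (assumption: it blocks for a while).
    static String callThirdParty(int requestId) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response-" + requestId + " on " + Thread.currentThread().getName();
    }

    public static void main(String[] args) {
        ExecutorService pool = newApiPool(MAX_CONCURRENT_CALLS);
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            final int id = i;
            // The second argument makes the task run on our pool, not the common pool.
            futures.add(CompletableFuture.supplyAsync(() -> callThirdParty(id), pool));
        }
        futures.forEach(f -> System.out.println(f.join()));
        pool.shutdown();
    }
}
```

Tasks beyond the pool size wait in the executor's queue rather than starting new threads. Note that if the controller calls join() or get() on the future, the Tomcat request thread stays occupied while it waits, so the two limits interact.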

Related

air traffic controller for threads when calling a REST API

DISCLAIMER: If this post is off-topic to this site, please recommend a site where this post would be appropriate.
On Ubuntu 18.04, in bash, I am writing a network-based, threaded application that requires multiple servers. It receives files through the network and processes them, ultimately making an API call that finishes the processing and logs the results to a database for later retrieval and reporting.
So far I have written the application using non-threaded programming models and concepts. That means the files are processed one at a time in real-time. This works great if there is no sudden burst of files and/or a backlog of files to process. The main bottleneck has been the way I sequentially send files to the API one after another, waiting until the entire operation has taken place for one file and the API returns the results. The API has a rate limit of 8 calls per second. But since each call takes from 0.75 to 1 second, my program waits until the operation is done and only processes about 1 file per second through the API. In short, I did not have to worry about scheduling API calls because I could barely do one call per second.
Since the capacity is there to process 8 files per second, and I need more speed, I have been converting my single-threaded, sequential application into a parallel, scalable, multi-threaded application. This new version can spawn enough threads to send 8 files per second to the REST API and much more. So now I have the opposite problem. I am sending too many requests per second to the REST API and am in danger of triggering penalties, etc. Ultimately, when my traffic is higher, I will upgrade my subscription to the API and get more calls per second, but this current dilemma has got me thinking about how to schedule the API calls with different threads.
The purpose of this post is to discuss an idea about how to schedule these REST API calls across various threads. Specifically, I want to discuss how to coordinate timing and usage of the API while maintaining efficiency and yet not overloading the API. In short, I want to coordinate a group of threads so that the API is properly used. Not too fast and not too slow.
Independent of my application, this idea could be useful in a number of generically similar scenarios.
My idea is to create an "air traffic controller" ("ATC") so that the threads of the application have a centralized timing authority to check when they are ready to submit files to the REST API. The ATC would know how many time slots/calls per time period (in this case, calls per second) the API can schedule. The ATC would be listening for the threads to request a time slot ("launch code") which would give them a time slot in the future to perform their API call. The ATC would decide based on the schedule of other launch codes that it has already handed out.
In my case, from the start of the upload of the file to the API, it could take 0.75 to 1 second to complete the processing and receive a response from the API. This does not affect the count of new API calls that can be performed. It is just a consideration of how long the threads will be waiting once they call the API. It may not be relevant to this overall discussion.
Each thread would obviously have to do some error handling. If the API timed out or threw an error, then the thread would have to handle it and, if appropriate, get back in line with the ATC and ask for a new launch code. Maybe it should report the error to the ATC for centralized logging?
In situations where the file processing needs burst above 8 files per second, there would be a scheduling backlog where the threads should wait their turn as assigned by the ATC.
Here are some other considerations:
Function
The ATC would be a lightweight daemon that does the following:
- listens on some TCP port
- receives a request
  (security token (?), thread id, priority)
- authenticates the request (?)
- examines schedule
- reserves the next available time slot
- returns the launch code
  (security token (?), current time, launch timing offset to current time, URL and auth token for the API)
- expunges expired launch codes
The ATC would need the following:
- to know what port it is supposed to run on
- to know how many slots per time period it was set to schedule
  (e.g. 8 per second)
- to have a super fast read/write access to the schedule (associative array?)
- to know the URL and corresponding auth token for the thread to use
- maybe to know multiple URLs and auth tokens for load balancing
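The core of the scheduling step (examine the schedule, reserve the next free slot, return an offset) can be sketched in-process before adding any networking. The Java sketch below is a simplification under stated assumptions: the class name LaunchScheduler and its method are hypothetical, slots are spaced evenly (125 ms apart for 8 calls per second), and there is no TCP listener, authentication, priority or multi-URL load balancing. A single atomic counter holds the next free slot, so many threads can share the schedule without an associative array.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class LaunchScheduler {
    private final long slotNanos;           // spacing between consecutive slots
    private final AtomicLong nextFreeSlot;  // earliest slot time not yet handed out

    LaunchScheduler(int callsPerSecond, long startNanos) {
        this.slotNanos = 1_000_000_000L / callsPerSecond;
        this.nextFreeSlot = new AtomicLong(startNanos);
    }

    // Reserves the next slot and returns how long the caller should wait
    // (in nanoseconds, relative to nowNanos) before calling the API.
    long reserveDelayNanos(long nowNanos) {
        while (true) {
            long free = nextFreeSlot.get();
            long slot = Math.max(free, nowNanos); // never schedule in the past
            if (nextFreeSlot.compareAndSet(free, slot + slotNanos)) {
                return slot - nowNanos;
            }
            // Another thread took the slot first; retry with the new value.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LaunchScheduler atc = new LaunchScheduler(8, System.nanoTime());
        for (int i = 0; i < 3; i++) {
            long delay = atc.reserveDelayNanos(System.nanoTime());
            TimeUnit.NANOSECONDS.sleep(delay); // wait for the assigned slot
            System.out.println("call " + i + " launched");
        }
    }
}
```

Because only the next free slot is stored, there is nothing to expunge; a networked ATC serving several servers would additionally have to account for clock differences and network latency between itself and the callers.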
Here are more things to consider:
Security
How could we keep the ATC secure while ensuring high performance?
Network-level security (e.g. firewalls allowing only the IP addresses of the file-processing servers?)
Auth tokens or logins and passwords?
Performance
What would the requirements be for this ATC server? Would this be taxing to a CPU and memory?
Timing
How often would an NTP call be needed? By the ATC server? By the servers which call the API?
Scalability
Being able to provide different URLs and auth tokens would allow the ATC to load balance with different API providers.
Threading of the ATC itself
Would the ATC need to spawn threads to be able to handle each new request?
How does a web server handle requests?
How would the various threads share a common schedule?
In a non-threaded environment, the ATC would possibly keep an associative array in memory to keep performance as high as possible. How would the various threads of the ATC have access to the same schedule?
So here is my question. Does this exist? If not, what are some best practices in trying to build the above?
It seems like a beanstalkd kind of network service except it only provides permission/scheduling and is extremely dependent on timing.

Play Framework Scala thread affinity

We have our HTTP layer served by Play Framework in Scala. One of our APIs is something of the form:
POST /customer/:id
Requests are sent by our UI team which calls these APIs through a React Framework.
The issue is that, sometimes, the requests are issued in batches, successively one after the other for the same customer ID. When this happens, different threads process these requests and so our persistent layer (MySQL) reaches an inconsistent state due to the difference in the timestamp of the handling of these requests.
Is it possible to configure some sort of thread affinity in Play Scala? What I mean by that is, can I configure Play to ensure that requests of a particular customer ID are handled by the same thread throughout the life-cycle of the application?
Batching means putting several API calls into a single HTTP request.
A batch request is a set of commands in one HTTP request, like here: https://developers.facebook.com/docs/graph-api/making-multiple-requests/
You describe it as
The issue is that, sometimes, the requests are issued in batches, successively one after the other for the same customer ID. When this happens, different threads process these requests and so our persistent layer (MySQL) reaches an inconsistent state due to the difference in the timestamp of the handling of these requests.
This is a set of concurrent requests. Play Framework usually works as a stateless server, and I assume you also organize your application as stateless. Nothing binds one request to another, so you can't control the order. Well, you can, if you create a special protocol, like "opening batch request", request #1, #2, ..., "closing batch request". You need to check that all of the requests were correct. You also need to run some stateful threads and some queues. Akka can help with this, but I am pretty sure you won't do it.
This issue does not depend on Play Framework. You will reproduce it on any server. For example, the general case: Is it possible to receive out-of-order responses with HTTP?
You can go either way:
1. "Batch" the command in one request
You need to change the client so it jams "batch" requests into one. You also need to change the server so it processes all the commands from the batch one after another.
Example of the requests: https://developers.facebook.com/docs/graph-api/making-multiple-requests/
2. "Pipeline" requests
You need to change the client so it sends the next request only after receiving the response to the previous one.
Example: Is it possible to receive out-of-order responses with HTTP?
"The solution to this is to pipeline Ajax requests, transmitting them serially. ... . The next request sent only after the previous one has returned successfully."
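If neither client-side option is possible, the "stateful threads and some queues" idea mentioned earlier can be approximated on the server by chaining work per customer ID. The sketch below is in Java rather than Scala and is not a Play API: the class PerKeySerializer and its submit method are hypothetical names. It does not pin a customer to one thread; it only guarantees that tasks submitted for the same key run one after another in submission order. It works within a single server instance only, and for brevity it never removes finished keys from the map.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

final class PerKeySerializer {
    // Last scheduled task per key; new work is chained behind it.
    private final ConcurrentHashMap<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();
    private final Executor executor;

    PerKeySerializer(Executor executor) {
        this.executor = executor;
    }

    // Runs task after every earlier task submitted for the same key has finished.
    <T> CompletableFuture<T> submit(String key, Supplier<T> task) {
        CompletableFuture<T> result = new CompletableFuture<>();
        // compute() is atomic per key, so the chain order equals the submission order.
        tails.compute(key, (k, tail) -> {
            CompletableFuture<Void> prev =
                    (tail != null) ? tail : CompletableFuture.<Void>completedFuture(null);
            CompletableFuture<Void> next = prev.handleAsync((ignoredValue, ignoredError) -> {
                try {
                    result.complete(task.get());
                } catch (Throwable t) {
                    result.completeExceptionally(t); // a failure does not block later tasks
                }
                return null;
            }, executor);
            return next;
        });
        return result;
    }
}
```

A handler for POST /customer/:id could then call something like serializer.submit(customerId, () -> updateCustomer(...)), where updateCustomer is a placeholder for the existing persistence code. Requests for different customers still run in parallel on the executor.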

Thread management in asp.net core / kestrel

I'm troubleshooting performance / scalability issues with an asp.net app we've migrated to asp.net core 2.0. Our app is hosted on azure as an app service, and is falling over far too easily with any moderate traffic.
One thing that's puzzling me is how multiple concurrent requests are handled. From what I've read here, Kestrel uses multiple event loops to handle your requests. But the actual user code is handled on the .net thread pool (that's from here).
So, as an experiment - I've created a new asp.net core 2.0 MVC app, and added a rather nasty action method:
[AllowAnonymous]
public ActionResult Wait1()
{
    System.Threading.Tasks.Task.Delay(1000).Wait();
    return new StatusCodeResult((int)HttpStatusCode.OK);
}
Now, when I push this to azure, I'd expect that if I send say 100 requests at the same time, then I should be OK, because 100 requests sounds like minor load, right? And the waiting will happen on the thread pool threads, right?
So - I do just this and get some rather poor results - sample highlighted in red:
Hmm, not what I expected, about 50 seconds per request... If however I change the frequency so the requests are spaced a second apart, then the response time is fine - back to just over 1000ms as you'd expect. It seems if I go over 30 requests at the same time, it starts to suffer, which seems somewhat low to me.
So - I realise that my nasty action method blocks, but I'd have expected it to block on a thread pool thread, and therefore be able to cope with more than my 30.
Is this expected behaviour - and if so is it just a case of making sure that no IO-bound work is done without using async code?
Is this expected behaviour - and if so is it just a case of making sure that no IO-bound work is done without using async code?
Based on my experience, this is the expected behaviour. We can find the explanation in this blog.
Now suppose you are running your ASP.Net application on IIS and your web server has a total of four CPUs. Assume that at any given point in time, there are 100 requests to be processed. By default the runtime would create four threads, which would be available to service the first four requests. Because no additional threads will be added until 500 milliseconds have elapsed, the other 96 requests will have to wait in the queue. After 500 milliseconds have passed, a new thread is created.
As you can see, it will take 100*500ms intervals to catch up with the workload.
This is a good reason for using asynchronous programming. With async programming, threads aren’t blocked while requests are being handled, so the four threads would be freed up almost immediately.
I recommend using async code to improve the performance.
public async Task<ActionResult> Wait1()
{
    await Task.Delay(TimeSpan.FromSeconds(15));
    return new StatusCodeResult((int)HttpStatusCode.OK);
}
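The difference between blocking a pool thread and releasing it during the wait can be reproduced outside ASP.NET. The following sketch is in Java, not C#, and every name and number in it is an assumption chosen for illustration: a fixed pool of 4 workers stands in for the four initial threads, 20 tasks stand in for the requests, and the delay is shortened to 50 ms. It does not model the thread pool's gradual thread injection. runBlocking holds a worker for the whole delay, while runAsync uses CompletableFuture.delayedExecutor (Java 9+) so that no worker is occupied while the timer runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class BlockingVsAsync {
    static final int WORKERS = 4;     // stand-in for the initial pool threads
    static final int REQUESTS = 20;   // simultaneous "requests"
    static final long DELAY_MS = 50;  // simulated wait per request

    static void sleepQuietly() {
        try {
            Thread.sleep(DELAY_MS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Each request blocks a worker for DELAY_MS, similar to Task.Delay(...).Wait().
    static long runBlocking() {
        ExecutorService workers = Executors.newFixedThreadPool(WORKERS);
        long start = System.nanoTime();
        List<CompletableFuture<Void>> done = new ArrayList<>();
        for (int i = 0; i < REQUESTS; i++) {
            done.add(CompletableFuture.runAsync(BlockingVsAsync::sleepQuietly, workers));
        }
        CompletableFuture.allOf(done.toArray(new CompletableFuture[0])).join();
        workers.shutdown();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Each request waits on a timer; a worker is used only after the delay has elapsed.
    static long runAsync() {
        ExecutorService workers = Executors.newFixedThreadPool(WORKERS);
        Executor afterDelay = CompletableFuture.delayedExecutor(DELAY_MS, TimeUnit.MILLISECONDS, workers);
        long start = System.nanoTime();
        List<CompletableFuture<Void>> done = new ArrayList<>();
        for (int i = 0; i < REQUESTS; i++) {
            done.add(CompletableFuture.runAsync(() -> { }, afterDelay));
        }
        CompletableFuture.allOf(done.toArray(new CompletableFuture[0])).join();
        workers.shutdown();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("blocking: " + runBlocking() + " ms");
        System.out.println("async:    " + runAsync() + " ms");
    }
}
```

With these numbers the blocking variant needs at least five rounds of 50 ms because only four tasks can sleep at a time, whereas the async variant typically finishes in roughly one delay period.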
I also found another SO thread that you could refer to.

Thread per request in play framework

I am a J2EE developer and I am new to Play Framework. I did thorough research but was not able to find any clear documentation on this.
The question is how Play handles a request. Does it create a thread for every request, just like J2EE containers?
If it is not thread per request, then what happens if we deploy the Play application in Tomcat as a WAR file?
First, the Play 2 framework does not support Tomcat.
With Play and Netty, you don't assign one thread per request.
By default you have one thread per core in Play, but let's assume that you have only one thread for all requests.
In this architecture, one thread is shared by all requests. The thread handles the first request, and when it is idle (for example while waiting on a call to a database or a URL) it begins to handle the second request. So the thread does not have to return the response for the first request before starting the second one.
One might think that the system will be too slow with this architecture, but it is not, since performance depends on the CPU.
Play 2.3.x uses Netty under the hood to handle HTTP requests. You can learn more about Netty here
You will also find information in the Play documentation: https://www.playframework.com/documentation/2.3.x/ThreadPools

How does Spring handle multiple post requests?

In my application, I have a multiple file upload AJAX client. I noticed (using a stub file processing class) that Spring usually opens 6 threads at once, and the rest of the file upload requests are blocked until any of those 6 threads finishes its job. It is then assigned a new request, as in a thread pool.
I haven't done anything specific to reach this behavior. Is this something that Spring does by default behind the scenes?
While uploading, I haven't had any problems browsing the other parts of the application, with pretty much no significant overhead in performance.
I noticed however that one of my "behind the scenes" calls to the server (I poll for new notifications every 20 secs) gets blocked as well. On the server side, my app calls a Redis-based key-value store which should always return even if there are no new notifications. The requests to it start getting normally processed only after the uploads get finished. Any explanation for this kind of blocking?
Edit: I think it has to do with a maximum number of concurrent requests per session
I believe this type of threading belongs to the servlet container, not to Spring.
