Thread management in ASP.NET Core / Kestrel - multithreading

I'm troubleshooting performance and scalability issues with an ASP.NET app we've migrated to ASP.NET Core 2.0. Our app is hosted on Azure as an App Service, and it falls over far too easily under even moderate traffic.
One thing that's puzzling me is how multiple concurrent requests are handled. From what I've read here, Kestrel uses multiple event loops to handle your requests, but the actual user code runs on the .NET thread pool (that's from here).
So, as an experiment, I've created a new ASP.NET Core 2.0 MVC app and added a rather nasty action method:
[AllowAnonymous]
public ActionResult Wait1()
{
    // Deliberately block a thread pool thread for one second.
    System.Threading.Tasks.Task.Delay(1000).Wait();
    return new StatusCodeResult((int)HttpStatusCode.OK);
}
Now, when I push this to Azure, I'd expect that if I send, say, 100 requests at the same time, then I should be OK, because 100 requests sounds like minor load, right? And the waiting will happen on thread pool threads, right?
So I do just this and get some rather poor results (sample highlighted in red):
Hmm, not what I expected: about 50 seconds per request. If however I change the frequency so the requests are spaced a second apart, then the response times are fine - back to just over 1000 ms, as you'd expect. It seems that once I go over about 30 simultaneous requests, things start to suffer, which seems somewhat low to me.
So I realise that my nasty action method blocks, but I'd have expected it to block on a thread pool thread, and therefore to be able to cope with more than 30 concurrent requests.
Is this expected behaviour - and if so is it just a case of making sure that no IO-bound work is done without using async code?

Is this expected behaviour - and if so is it just a case of making sure that no IO-bound work is done without using async code?
Based on my experience, this does seem to be expected behaviour. We can get the answer from this blog:
Now suppose you are running your ASP.Net application on IIS and your web server has a total of four CPUs. Assume that at any given point in time, there are 100 requests to be processed. By default the runtime would create four threads, which would be available to service the first four requests. Because no additional threads will be added until 500 milliseconds have elapsed, the other 96 requests will have to wait in the queue. After 500 milliseconds have passed, a new thread is created.
As you can see, it takes roughly 96 × 500 ms of thread-injection delay to catch up with the workload, which lines up with the ~50-second response times you're seeing.
This is a good reason for using asynchronous programming. With async programming, threads aren’t blocked while requests are being handled, so the four threads would be freed up almost immediately.
I recommend using async code to improve performance:
public async Task<ActionResult> Wait1()
{
    // await returns the thread to the pool while the delay runs,
    // instead of blocking it the way .Wait() does.
    await Task.Delay(TimeSpan.FromSeconds(15));
    return new StatusCodeResult((int)HttpStatusCode.OK);
}
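If you can't convert every blocking call straight away, raising the thread pool minimums can buy a little headroom while you migrate. This is only a stopgap sketch, not a fix: the thread counts are placeholders you would have to tune, and the Program/Startup shape is just the standard ASP.NET Core 2.0 template.
using System.Threading;
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Hosting;

public class Program
{
    public static void Main(string[] args)
    {
        // Under starvation the pool only injects roughly one extra thread every 500 ms;
        // raising the minimums lets it spin threads up immediately. Stopgap only -
        // the real fix is async I/O. 200/200 are placeholder values.
        ThreadPool.SetMinThreads(workerThreads: 200, completionPortThreads: 200);

        BuildWebHost(args).Run();
    }

    public static IWebHost BuildWebHost(string[] args) =>
        WebHost.CreateDefaultBuilder(args)
            .UseStartup<Startup>()
            .Build();
}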
I also found another SO thread you could refer to.

Related

Ktor, Netty and increasing the number of threads per endpoint

Using Ktor and Kotlin 1.5 to implement a REST service backed by Netty. A couple of things about this service:
"Work" takes non-trivial amount of time to complete.
A unique client endpoint sends multiple requests in parallel to this service.
There are only a handful of unique client endpoints.
The service is not scaling as expected. We ran a load test with parallel requests coming from a single client and we noticed that we only have two threads on the server actually processing the requests. It's not a resource starvation problem - there is plenty of network, memory, CPU, etc. and it doesn't matter how many requests we fire up in parallel - it's always two threads keeping busy, while the others are sitting idle.
Is there a parameter we can configure to increase the number of threads available to process requests for specific endpoints?
Netty uses what is called a non-blocking I/O model (http://tutorials.jenkov.com/java-concurrency/single-threaded-concurrency.html).
In this model a single thread can handle a lot of work concurrently, as long as you follow best practices (i.e. you never block the event-loop thread).
You might need to check the following configuration options for Netty https://ktor.io/docs/engines.html#configure-engine
connectionGroupSize = x
workerGroupSize = y
callGroupSize = z
The default values are usually rather low, and tweaking them could be useful for the time-consuming 'work'. The exact values will vary depending on the available resources.

Execute something which takes 5 seconds (like email send) but return with response immediately?

Context
In an ASP.NET Core application I would like to execute an operation which takes, say, 5 seconds (like sending an email). I do know async/await and its purpose in ASP.NET Core; however, I do not want to wait for the end of the operation, instead I would like to return to the client immediately.
Issue
So it is kind of Fire and Forget, either homebrew or Hangfire's BackgroundJob.Enqueue<IEmailSender>(x => x.Send("hangfire#example.com"));
Suppose I have some more complex method with an injected ILogger and other dependencies, and I would like to fire and forget that method. The method contains error handling and logging (note: not necessarily with Hangfire; the issue is agnostic to how the background worker is implemented). My problem is that the method will run completely out of context; probably nothing will work inside it: no HttpContext (I mean HttpContextAccessor will give null, etc.), so no User, no Session, etc.
Question
How to correctly solve, say, this particular email sending problem? No one wants to wait 5 seconds for the response, and at the same time no one wants to drop an email, without even logging it, if the send operation returns an error...
How to correctly solve, say, this particular email sending problem?
This is a specific instance of the "run a background job from my web app" problem.
there is no universal solution
There is - or at least, a universal pattern; it's just that many developers try to avoid it because it's not easy.
I describe it pretty fully in my blog post series on the basic distributed architecture. I think one important thing to acknowledge is that since your background work (sending an email) is done outside of an HTTP request, it really should be done outside of your web app process. Once you accept that, the rest of the solution falls into place:
You need a durable storage queue for the work. Hangfire uses your database; I tend to prefer cloud queues like Azure Storage Queues.
This means you'll need to copy over all the data the job will need, since it has to be serialized into that queue. The same restriction applies to Hangfire; it's just less obvious because Hangfire runs in the same web application process.
You need a background process to execute your work queue. I tend to prefer Azure Functions, but another common approach is to run an ASP.NET Core Worker Service as a Win32 service or Linux daemon. Hangfire has its own ad-hoc in-process thread. Running an ASP.NET Core hosted service in-process would also work, though that has some of the same drawbacks as Hangfire since it also runs in the web application process.
Finally, your work queue processor application has its own service injection, and you can code it to create a dependency scope per work queue item if desired.
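To make the shape of that concrete, here is a rough sketch of the pattern with an Azure Storage Queue and a separate Worker Service. The EmailRequest DTO, the IEmailSender abstraction and the controller route are made up for illustration, and the Azure.Storage.Queues package is just one option for the durable-queue piece; swap in whatever your app actually uses.
using System;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage.Queues;            // QueueClient (one possible durable queue)
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

// All the data the job needs, copied into a serializable DTO (hypothetical shape).
public class EmailRequest
{
    public string To { get; set; }
    public string Subject { get; set; }
    public string Body { get; set; }
}

public interface IEmailSender               // your own abstraction
{
    Task SendAsync(string to, string subject, string body);
}

// Web app side: serialize the work item, enqueue it, return immediately.
[ApiController]
public class EmailController : ControllerBase
{
    private readonly QueueClient _queue;
    public EmailController(QueueClient queue) => _queue = queue;

    [HttpPost("email")]
    public async Task<IActionResult> Send(EmailRequest request)
    {
        await _queue.SendMessageAsync(JsonSerializer.Serialize(request));
        return Accepted();                  // don't wait for the email to be sent
    }
}

// Worker process side (e.g. an ASP.NET Core Worker Service): drain the queue.
public class EmailWorker : BackgroundService
{
    private readonly QueueClient _queue;
    private readonly IEmailSender _emailSender;
    private readonly ILogger<EmailWorker> _logger;

    public EmailWorker(QueueClient queue, IEmailSender emailSender, ILogger<EmailWorker> logger)
    {
        _queue = queue;
        _emailSender = emailSender;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            var messages = await _queue.ReceiveMessagesAsync(maxMessages: 10, cancellationToken: stoppingToken);
            foreach (var message in messages.Value)
            {
                var request = JsonSerializer.Deserialize<EmailRequest>(message.MessageText);
                try
                {
                    await _emailSender.SendAsync(request.To, request.Subject, request.Body);
                    await _queue.DeleteMessageAsync(message.MessageId, message.PopReceipt, stoppingToken);
                }
                catch (Exception ex)
                {
                    // Logging still works here - it just isn't tied to an HTTP request.
                    _logger.LogError(ex, "Failed to send email to {To}", request.To);
                }
            }
            await Task.Delay(TimeSpan.FromSeconds(1), stoppingToken);   // naive polling delay
        }
    }
}
This is deliberately simplified: there is no poison-message handling, no visibility-timeout management, and both processes still need the QueueClient registered with dependency injection.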
IMO, this is a normal threshold that's reached as your web application "grows up". It's more complex than a simple web app: now you have a web app, a durable queue, and a background processor. So your deployment becomes more complex, you need to think about things like versioning your worker queue schema so you can upgrade without downtime (something Hangfire can't handle well), etc. And some devs really balk at this because it's more complex when "all" they want to do is send an email without waiting for it, but the fact is that this is the necessary step upwards when a baby web app becomes distributed.

How can NodeJS scale an enterprise application?

Suppose I have an enterprise Java application that basically does the following:
gather user input, query the backend databases (maybe multiple), run some algorithm (say, an in-memory calculation over the queried data sets to produce some statistics, etc.), then return the data in some HTML pages.
My question is: if the bottleneck of the application is the DB query, how can NodeJS help me in this scenario, since I still need to do all that post-DB work before I render the page? What would the application architecture look like?
Of course Node can't speed up your storage layer, and it can't make a single request that incurs a lot of backend processing return to the end user any faster. But what it can do is avoid tying up a thread in the application server's thread pool. The single thread can continue on its loop while that work is going on and accept another request.
That other request might be a cheaper one that returns as soon as its work is done. That can also happen in an application server with a thread pool model... unless all the threads in the pool are tied up blocking on I/O requests, along with the overhead of each thread. In that case the cheaper request gets queued waiting for a thread out of the pool, because they are all blocked. Node's single thread would just loop around and serve the cheap request.
This works because Node mandates that all I/O is async, and the only work that blocks the loop is your code. Hence the saying "everything in Node runs in parallel except your code". While it's possible to write async code in other application servers and achieve similar results, many offer non-async thread pool models where the coding is easier but sometimes less scalable.
For example, this Hanselman post illustrates how ASP.NET is capable of handling requests asynchronously, but it's not the model most developers have traditionally used.
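For comparison, here's a minimal sketch of what that async style looks like in an ASP.NET Core controller. The repository interface and row type are hypothetical, just to make the example hang together; the point is that the request holds no thread while the query is in flight.
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

public class StatsController : Controller
{
    private readonly IReportRepository _repository;   // hypothetical data-access abstraction
    public StatsController(IReportRepository repository) => _repository = repository;

    public async Task<IActionResult> Index()
    {
        // Async I/O: while the database query is in flight, this request
        // holds no thread - analogous to Node's loop continuing during I/O.
        var rows = await _repository.QueryRowsAsync();

        // The in-memory calculation still needs a thread, but only for as
        // long as the CPU work actually takes.
        var average = rows.Average(r => r.Value);
        return View(average);
    }
}

public interface IReportRepository
{
    Task<ReportRow[]> QueryRowsAsync();
}

public class ReportRow
{
    public double Value { get; set; }
}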

How does Spring handle multiple post requests?

In my application, I have a multiple file upload AJAX client. I noticed (using a stub file processing class) that Spring usually opens 6 threads at once, and the rest of the file upload requests are blocked until any of those 6 threads finishes its job. It is then assigned a new request, as in a thread pool.
I haven't done anything specific to reach this behavior. Is this something that Spring does by default behind the scenes?
While uploading, I haven't had any problems browsing the other parts of the application, with pretty much no significant overhead in performance.
I noticed however that one of my "behind the scenes" calls to the server (I poll for new notifications every 20 secs) gets blocked as well. On the server side, my app calls a Redis-based key-value store which should always return even if there are no new notifications. The requests to it start getting normally processed only after the uploads get finished. Any explanation for this kind of blocking?
Edit: I think it has to do with a maximum number of concurrent requests per session.
I believe this threading behaviour comes from the Servlet container, not from Spring.

Is it acceptable to use ThreadPool.GetAvailableThreads to throttle the amount of work a service performs?

I have a service which polls a queue very quickly to check for more 'work' that needs to be done. There is always more work in the queue than a single worker can handle. I want to make sure a single worker doesn't grab too much work when the service is already at max capacity.
Let's say my worker grabs 10 messages from the queue every N ms and uses the Task Parallel Library to process each message in parallel on different threads. The work itself is very IO-heavy: many SQL Server queries and even Azure Table storage calls (HTTP requests) are made for a single unit of work.
Is using ThreadPool.GetAvailableThreads() the proper way to throttle how much work the service is allowed to grab?
I see that I have access to available WorkerThreads and CompletionPortThreads. For an IO-heavy process, is it more appropriate to look at how many CompletionPortThreads are available? I believe 1000 is the number made available per process regardless of CPU count.
Update - Might be important to know that the queue I'm working with is an Azure Queue. So, each request to check for messages is made as an async http request which returns with the next 10 messages. (and costs money)
I don't think using IO completion ports is a good way to work out how much to grab.
I assume that the ideal situation is where you run out of work just as the next set arrives, so you've never got more backlog than you can reasonably handle.
Why not keep track of how long it takes to process a job and how long it takes to fetch jobs, and adjust the amount of work fetched each time based on that, with suitable minimum/maximum values to stop things going crazy if you have a few really cheap or really expensive jobs?
You'll also want to work out a reasonable optimum degree of parallelization - it's not clear to me whether it's really IO-heavy, or whether it's just "asynchronous request heavy", i.e. you spend a lot of time just waiting for the responses to complicated queries which in themselves are cheap for the resources of your service.
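As a rough sketch of that adaptive idea, assuming you measure average job time and fetch time yourself (all names and numbers below are illustrative, not a recommendation):
using System;

public class AdaptiveBatchSizer
{
    private const int MinBatch = 1;    // clamps stop a few unusually cheap or
    private const int MaxBatch = 32;   // expensive jobs from causing wild swings

    // Fetch roughly enough work to keep all workers busy for the duration of
    // one fetch round-trip, so you run out just as the next batch arrives.
    public int NextBatchSize(TimeSpan averageJobTime, TimeSpan averageFetchTime, int degreeOfParallelism)
    {
        double jobsPerFetchWindow =
            averageFetchTime.TotalMilliseconds / Math.Max(1.0, averageJobTime.TotalMilliseconds)
            * degreeOfParallelism;

        int size = (int)Math.Ceiling(jobsPerFetchWindow);
        return Math.Min(MaxBatch, Math.Max(MinBatch, size));
    }
}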
I've been working on virtually the same problem in the same environment. I ended up giving each WorkerRole an internal work queue, implemented as a BlockingCollection<>. A single thread monitors that queue; when the number of items gets low, it requests more items from the Azure queue. It always requests the maximum number of items, 32, to cut down on costs. It also backs off automatically when the queue is empty.
Then I have a set of worker threads that I started myself. They sit in a loop, pulling items off the internal work queue. The number of worker threads is my main way to optimize the load, so I've got that set up as an option in the .cscfg file. I'm currently running 35 threads/worker, but that number will depend on your situation.
I tried using TPL to manage the work, but I found it more difficult to manage the load. Sometimes TPL would under-parallelize and the machine would be bored, other times it would over-parallelize and the Azure queue message visibility would expire while the item was still being worked.
This may not be the optimal solution, but it seems to be working OK for me.
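A rough sketch of that shape (the names, numbers and Azure-queue stand-ins are illustrative, not the actual code):
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public class WorkerRoleQueue
{
    // Internal buffer between the Azure queue and the worker threads.
    private readonly BlockingCollection<string> _work = new BlockingCollection<string>(boundedCapacity: 64);
    private const int WorkerThreadCount = 35;       // e.g. read from the .cscfg file

    public void Start()
    {
        // Single filler thread: tops the buffer up from the Azure queue in
        // batches of 32 (the maximum per request, to keep costs down).
        new Thread(FillLoop) { IsBackground = true }.Start();

        for (int i = 0; i < WorkerThreadCount; i++)
            new Thread(WorkLoop) { IsBackground = true }.Start();
    }

    private void FillLoop()
    {
        while (true)
        {
            if (_work.Count < 10)                   // running low, so fetch more
            {
                foreach (var message in FetchFromAzureQueue(maxMessages: 32))
                    _work.Add(message);
            }
            Thread.Sleep(TimeSpan.FromSeconds(1));  // crude back-off when idle
        }
    }

    private void WorkLoop()
    {
        // GetConsumingEnumerable blocks until an item is available.
        foreach (var message in _work.GetConsumingEnumerable())
            Process(message);
    }

    // Stand-ins for the real Azure queue call and the real work.
    private IEnumerable<string> FetchFromAzureQueue(int maxMessages) => Array.Empty<string>();
    private void Process(string message) { }
}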
I decided to keep an internal counter of how many messages are currently being processed. I used Interlocked.Increment/Decrement to manage the counter in a thread-safe manner.
I would have used the Semaphore class, since each message is tied to its own thread, but I wasn't able to due to the async nature of the queue poller and the code which spawned the threads.
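A minimal sketch of that counter approach (the capacity limit and method names are illustrative):
using System;
using System.Threading;

public class InFlightThrottle
{
    private const int MaxInFlight = 30;   // illustrative capacity limit
    private int _inFlight;                // messages currently being processed

    // The queue poller asks this before deciding how many messages to grab.
    public int SlotsAvailable() => Math.Max(0, MaxInFlight - Volatile.Read(ref _inFlight));

    public void OnMessageStarted() => Interlocked.Increment(ref _inFlight);
    public void OnMessageCompleted() => Interlocked.Decrement(ref _inFlight);
}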

Resources