I have a web server that runs on a fleet of 50 hosts. One request to the server can result in 10 subsequent network calls, all of which can be done in parallel.
My idea is to create an executor service thread pool with 10 threads, so that each host can make the network calls in parallel.
However, there seems to be a problem with this. What if I get 1000 requests at once? And suppose a single host is tasked with 20 requests at the same time? Does this mean that the host will only have 10 threads available, and thus all 20 requests will compete with each other for the 10 threads? This seems WORSE than without thread pooling, in which case each request lives on its own thread and there's effectively 20 threads running at once.
Thus, it appears as if executor service is very dangerous in this situation, and has the potential to actually make my application slower under spiky load. Am I understanding the situation correctly? If so, what is a way to solve it? Should I have each request CREATE the 10 threads manually, rather than attempting to share from a pool and introduce that entanglement between different requests?
You seem to be conflating thread pooling with easier thread creation. Its primary aim is to reduce the thread requirements of an application, because threads get reused. If the first request ends up starting 10 threads, then when the second request comes in, some of them may already be free for reuse, so the second request might create only 5 additional threads, and the third request might not create any new threads at all. On that basis, your service might need a pool of only, say, 15 threads at a time. The advantage of the thread pool is that those 15 threads get created shortly after the requests start coming in and are reused until the pool shuts down, so your application, its runtime, and the underlying OS don't waste time repeatedly creating and destroying threads, allocating stacks for them, and so on.
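A minimal sketch of this reuse, using a shared fixed-size pool from `java.util.concurrent` (the pool size of 15 and the per-request fan-out of 10 are just the numbers from the discussion above, not a recommendation):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: every "request" fans out 10 parallel "network calls" onto one
// shared pool. The second request reuses threads the first request created.
public class SharedPoolDemo {
    // Daemon threads so the demo JVM can exit without an explicit shutdown.
    static final ExecutorService POOL = Executors.newFixedThreadPool(15, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // Simulates one request: 10 parallel calls, returns the threads it used.
    static Set<String> handleRequest() {
        Set<String> threadsUsed = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(10);
        for (int i = 0; i < 10; i++) {
            POOL.submit(() -> {
                threadsUsed.add(Thread.currentThread().getName());
                try { Thread.sleep(50); } catch (InterruptedException ignored) {}
                done.countDown();
            });
        }
        try { done.await(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        return threadsUsed;
    }

    public static void main(String[] args) {
        Set<String> first = handleRequest();
        Set<String> second = handleRequest();   // reuses threads from the first
        System.out.println("first request used  " + first.size() + " threads");
        System.out.println("second request used " + second.size() + " threads");
    }
}
```

No thread is created or destroyed between the two requests; the pool hands the same worker threads to whichever tasks are waiting.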
Using Ktor and Kotlin 1.5 to implement a REST service backed by Netty. A couple of things about this service:
"Work" takes non-trivial amount of time to complete.
A unique client endpoint sends multiple requests in parallel to this service.
There are only a handful of unique client endpoints.
The service is not scaling as expected. We ran a load test with parallel requests coming from a single client and we noticed that we only have two threads on the server actually processing the requests. It's not a resource starvation problem - there is plenty of network, memory, CPU, etc. and it doesn't matter how many requests we fire up in parallel - it's always two threads keeping busy, while the others are sitting idle.
Is there a parameter we can configure to increase the number of threads available to process requests for specific endpoints?
Netty uses what is called a non-blocking IO model (http://tutorials.jenkov.com/java-concurrency/single-threaded-concurrency.html).
In this model you have only a single event-loop thread, yet it can handle many operations concurrently, as long as you follow best practices (i.e. never block the event loop).
You might need to check the following configuration options for Netty https://ktor.io/docs/engines.html#configure-engine
connectionGroupSize = x
workerGroupSize = y
callGroupSize = z
The defaults are set rather low, and raising them can be useful for time-consuming 'work'. The exact values will vary depending on the available resources.
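The "never block the event loop" rule can be sketched with plain `java.util.concurrent` (this is not Netty's actual API, just the shape of it): a single-threaded "event loop" accepts events and immediately hands slow work to a separate worker pool, which is roughly what tuning `workerGroupSize`/`callGroupSize` controls.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the non-blocking principle, not Netty itself: one event-loop
// thread that must never block, plus a worker pool for time-consuming work.
public class EventLoopSketch {
    static final ExecutorService EVENT_LOOP = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r); t.setDaemon(true); return t;
    });
    static final ExecutorService WORKERS = Executors.newFixedThreadPool(8, r -> {
        Thread t = new Thread(r); t.setDaemon(true); return t;   // ~callGroupSize
    });

    // Dispatches an event: the loop only does bookkeeping and offloads the
    // blocking "work"; this demo then waits for the result for simplicity.
    static String process(String payload) {
        try {
            Future<Future<String>> dispatched = EVENT_LOOP.submit(() ->
                WORKERS.submit(() -> {               // offload, don't block the loop
                    Thread.sleep(100);               // the time-consuming "work"
                    return "done:" + payload;
                }));
            return dispatched.get().get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(process("req-1"));
    }
}
```

If a handler blocks on the event-loop thread instead, every other connection served by that thread stalls, which matches the "only two threads busy" symptom when most requests queue behind a blocked loop.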
I'm dealing with a legacy synchronous server that has operations running for up to a minute, and it exposes 3 ports to work around this problem: a "light-requests" port, a "heavy-but-important" requests port, and a "heavy" port.
They all expose the same service, but since they run on separate ports, they end up with dedicated thread pools.
Now this approach is running into a problem with load balancing, as Envoy can't handle a single service exposing the same proto on 3 different ports.
I'm trying to come up with a single threadpool configuration that would work (probably an extremely overprovisioned one), but I can't find any documentation on what the threadpool settings actually do.
NUM_CQS: Number of completion queues.
MIN_POLLERS: Minimum number of polling threads.
MAX_POLLERS: Maximum number of polling threads.
CQ_TIMEOUT_MSEC: Completion queue timeout in milliseconds.
Is there some reason why you need the requests split into three different thread pools? By default, there is no limit to the number of request handler threads. The sync server will spawn a new thread for each request, so the number of threads will be determined by the number of concurrent requests -- the only real limit is what your server machine can handle. (If you actually want to bound the number of threads, I think you can do so via ResourceQuota::SetMaxThreads(), although that's a global limit, not one per class of requests.)
Note that the request handler threads are independent from the number of polling threads set via MIN_POLLERS and MAX_POLLERS, so those settings probably aren't relevant here.
UPDATE: Actually, I just learned that my description above, while correct in a practical sense, got some of the internal details wrong. There is actually just one thread pool for both polling and request handlers. When a request comes in, an existing polling thread basically becomes a request handler thread, and when the request handler completes, that thread is available to become a polling thread again. The MIN_POLLERS and MAX_POLLERS options allow tuning the number of threads that are used for polling: when a polling thread becomes a request handler thread, if there are not enough polling threads remaining, a new one will be spawned, and when a request handler finishes, if there are too many polling threads, the thread will terminate. But none of this affects the number of threads used for request handlers -- that is still unbounded by default.
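The default behavior described above (thread-per-request, unbounded unless a global quota is set) can be sketched in plain Java. This is not gRPC's internals, just the model: spawn a handler thread per request, with an optional global cap analogous in spirit to ResourceQuota::SetMaxThreads() (the cap of 100 is hypothetical).

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java sketch of thread-per-request with a global thread quota.
// Not gRPC code: the quota here stands in for ResourceQuota::SetMaxThreads().
public class ThreadPerRequest {
    static final int MAX_THREADS = 100;                 // hypothetical global cap
    static final Semaphore quota = new Semaphore(MAX_THREADS);
    static final AtomicInteger handled = new AtomicInteger();

    // Spawns one handler thread per request, but only if the quota permits.
    static boolean handle(Runnable work) {
        if (!quota.tryAcquire()) return false;          // over quota: reject
        Thread t = new Thread(() -> {
            try { work.run(); } finally { quota.release(); }
            handled.incrementAndGet();
        });
        t.start();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 10; i++) handle(() -> {});
        Thread.sleep(200);                              // let handlers finish
        System.out.println("handled " + handled.get() + " requests");
    }
}
```

Note the cap is global, matching the caveat above that it cannot be applied per class of requests.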
I am using Thread.current to store a current user id so that I can see who did various updates to our database. However, after some usage in production, it is returning other user ids than those who could be updating this data. Locally and on lesser-used QA instances, the user ids saved are appropriate.
We are using Rails 5.1, ruby 2.5.1 with Puma. RAILS_MAX_THREADS=1, but we do have a RAILS_POOL_SIZE=5. Any ideas what might cause this issue or how to fix it? Specifically, does a single Thread.current variable last longer than a single user request?
Why would Thread.current be limited to a request?
The same thread(s) are used for multiple requests.
Threads aren't killed at the end of a request; they just pick up the next request from the queue (or wait for a request to arrive in the queue).
It would be different if you used the Timeout middleware, since timeouts actually use a thread to count the passage of time (and stop processing)... but creating new threads per request introduces performance costs.
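The same pitfall expressed in Java terms (an analogy, not the Ruby/Puma code itself): a thread-local value set while serving one request is still visible to the next request served by the same pooled thread, unless it is explicitly cleared.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Analogy for the Thread.current issue: a ThreadLocal on a pooled thread
// outlives a single "request". Pool size 1 mirrors RAILS_MAX_THREADS=1.
public class ThreadLocalLeak {
    static final ThreadLocal<String> currentUser = new ThreadLocal<>();
    static final ExecutorService pool = Executors.newFixedThreadPool(1, r -> {
        Thread t = new Thread(r); t.setDaemon(true); return t;
    });

    // Simulates one request; returns whatever user id was already present.
    static String handleRequest(String userId) {
        try {
            return pool.submit(() -> {
                String leftover = currentUser.get();   // from a *previous* request
                if (userId != null) currentUser.set(userId);
                // currentUser.remove() at end-of-request would prevent the leak
                return leftover;
            }).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        handleRequest("alice");                    // first request sets a user id
        String seen = handleRequest(null);         // second request, no user set
        System.out.println("second request saw: " + seen);   // the leaked id
    }
}
```

The fix in either language is the same idea: clear the per-request value at the end of every request (or use a framework facility scoped to the request rather than the thread).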
Sidenote
Depending on your database usage (blocking IO), RAILS_MAX_THREADS might need to be significantly higher. The more database calls you make and the more data you move, the more time threads spend blocked on database IO (essentially sleeping).
By limiting the thread pool to a single thread, you are limiting request concurrency in a significant way. The CPU could be handling other requests while waiting for the database to return the data.
For a user base of 100,000 and 4 users per game session, should we create new threads for each request, such as create_session, move_player, use_attack, etc.?
I wanted to know the optimal way to handle a large number of connections: if we create a large number of threads, context switching will eat up most of the CPU cycles, but if no new threads are created, each request has to wait for the previous one to complete.
I would avoid thread-per-connection if your goal is scalability. It would be better to have a queue of events and a thread pool.
A game company would probably use a non-connection-based internet protocol like UDP. All requests can theoretically come in on the same socket, so you only need 1 thread to handle that. That thread can assign work to other threads.
You can have a larger threadpool where any thread can be assigned any job. Or you could further organize the work into specific jobs, each with a threadpool to process a queue of tasks. But I wouldn't launch a new thread for each request.
How you design your threadpools and task distribution system depends on the libraries for whatever language you're using and the application requirements.
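The suggested shape (one receiver thread draining an event queue and assigning work to a pool) can be sketched in plain Java. The event names and pool size are taken from the question and are illustrative only; the receiver stands in for the single thread reading the UDP socket.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch: one receiver thread ("the socket reader") drains an event queue
// and dispatches each event to a shared worker pool. No thread-per-request.
public class GameEventLoop {
    // Runs the given events through the queue + pool; returns how many were handled.
    static int run(String... evts) {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        BlockingQueue<String> events = new LinkedBlockingQueue<>();
        ConcurrentLinkedQueue<String> results = new ConcurrentLinkedQueue<>();
        try {
            Thread receiver = new Thread(() -> {
                try {
                    while (true) {
                        String ev = events.take();           // "read the socket"
                        if (ev.equals("STOP")) break;
                        workers.submit(() -> results.add("handled:" + ev));
                    }
                } catch (InterruptedException ignored) {}
            });
            receiver.start();
            for (String ev : evts) events.put(ev);
            events.put("STOP");                              // sentinel to finish
            receiver.join();
            workers.shutdown();
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException ex) {
            throw new RuntimeException(ex);
        }
        return results.size();
    }

    public static void main(String[] args) {
        System.out.println(run("create_session", "move_player", "use_attack") + " events handled");
    }
}
```

The pool size, not the request rate, bounds how many threads exist, so context-switching cost stays fixed no matter how many players connect.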
I have a service which polls a queue very quickly to check for more 'work' which needs to be done. There is always more work in the queue than a single worker can handle. I want to make sure a single worker doesn't grab too much work when the service is already at max capacity.
Let say my worker grabs 10 messages from the queue every N(ms) and uses the Parallel Library to process each message in parallel on different threads. The work itself is very IO heavy. Many SQL Server queries and even Azure Table storage (http requests) are made for a single unit of work.
Is using ThreadPool.GetAvailableThreads() the proper way to throttle how much work the service is allowed to grab?
I see that I have access to the available WorkerThreads and CompletionPortThreads. For an IO-heavy process, is it more appropriate to look at how many CompletionPortThreads are available? I believe 1,000 are made available per process, regardless of CPU count.
Update - Might be important to know that the queue I'm working with is an Azure Queue. So, each request to check for messages is made as an async http request which returns with the next 10 messages. (and costs money)
I don't think using IO completion ports is a good way to work out how much to grab.
I assume that the ideal situation is where you run out of work just as the next set arrives, so you've never got more backlog than you can reasonably handle.
Why not keep track of how long it takes to process a job and how long it takes to fetch jobs, and adjust the amount of work fetched each time based on that, with suitable minimum/maximum values to stop things going crazy if you have a few really cheap or really expensive jobs?
You'll also want to work out a reasonable optimum degree of parallelization - it's not clear to me whether it's really IO-heavy, or whether it's just "asynchronous request heavy", i.e. you spend a lot of time just waiting for the responses to complicated queries which in themselves are cheap for the resources of your service.
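The adjustment loop described above can be reduced to a small sizing function. This is a Java sketch of the idea (the original context is .NET); the constants and names are hypothetical, and the 32-message cap is the Azure queue per-request maximum mentioned later in the thread.

```java
// Sketch of adaptive fetch sizing: size the next batch so that processing it
// takes roughly one fetch interval, clamped to sane minimum/maximum values.
public class AdaptiveFetch {
    static final int MIN_BATCH = 1;
    static final int MAX_BATCH = 32;   // Azure queues return at most 32 per request

    // avgJobMillis: observed mean processing time per job over recent batches.
    // targetMillis: how long a batch should last before the next fetch.
    // parallelism: how many jobs are processed concurrently.
    static int nextBatchSize(double avgJobMillis, double targetMillis, int parallelism) {
        int ideal = (int) Math.round(targetMillis * parallelism / avgJobMillis);
        return Math.max(MIN_BATCH, Math.min(MAX_BATCH, ideal));
    }
}
```

For example, if jobs average 100 ms each, 4 run in parallel, and you fetch every 500 ms, the function asks for 20 messages; if jobs suddenly get cheap, the cap keeps the request at 32.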
I've been working on virtually the same problem in the same environment. I ended up giving each WorkerRole an internal work queue, implemented as a BlockingCollection<>. There's a single thread that monitors that queue - when the number of items gets low it requests more items from the Azure queue. It always requests the maximum number of items, 32, to cut down costs. It also has automatic backoff in the event that the queue is empty.
Then I have a set of worker threads that I started myself. They sit in a loop, pulling items off the internal work queue. The number of worker threads is my main way to optimize the load, so I've got that set up as an option in the .cscfg file. I'm currently running 35 threads/worker, but that number will depend on your situation.
I tried using TPL to manage the work, but I found it more difficult to manage the load. Sometimes TPL would under-parallelize and the machine would be bored, other times it would over-parallelize and the Azure queue message visibility would expire while the item was still being worked.
This may not be the optimal solution, but it seems to be working OK for me.
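A Java analog of that internal-queue design (the original is C# with BlockingCollection<>): a bounded refill check that tops the internal queue up in batches of at most 32 when it runs low. The watermark and the fetched list are hypothetical stand-ins for the monitor thread's timer and the Azure queue response.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the internal work queue: a monitor thread calls topUpIfLow() on a
// timer; worker threads take() from `internal` in a loop (not shown).
public class InternalWorkQueue {
    static final int LOW_WATERMARK = 8;    // refill threshold (hypothetical)
    static final int FETCH_SIZE = 32;      // max messages per Azure request
    static final BlockingQueue<String> internal = new LinkedBlockingQueue<>();

    // Adds up to FETCH_SIZE items when the queue is low; returns how many
    // were added (0 means no fetch was needed this tick).
    static int topUpIfLow(List<String> fetched) {
        if (internal.size() >= LOW_WATERMARK) return 0;
        int added = 0;
        for (String item : fetched) {
            if (added == FETCH_SIZE) break;
            internal.offer(item);
            added++;
        }
        return added;
    }
}
```

Decoupling fetching from processing this way means the worker-thread count (the 35 above) can be tuned independently of how often the remote queue is polled.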
I decided to keep an internal counter of how many messages are currently being processed. I used Interlocked.Increment/Decrement to manage the counter in a thread-safe manner.
I would have used the Semaphore class, since each message is tied to its own thread, but I wasn't able to due to the async nature of the queue poller and the code that spawned the threads.
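The counter approach translates directly to Java's AtomicInteger (the equivalent of Interlocked.Increment/Decrement). This is a sketch of the idea, not the original C# code; MAX_IN_FLIGHT is a hypothetical tuning knob.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: an atomic in-flight counter gates how many messages the poller may
// grab, without requiring each message to stay pinned to one thread.
public class InFlightThrottle {
    static final int MAX_IN_FLIGHT = 35;             // hypothetical capacity
    static final AtomicInteger inFlight = new AtomicInteger();

    // Poller asks how many messages it may fetch right now (0..batchMax).
    static int permits(int batchMax) {
        return Math.max(0, Math.min(batchMax, MAX_IN_FLIGHT - inFlight.get()));
    }

    // Call when a message starts and finishes processing, respectively.
    static void begin() { inFlight.incrementAndGet(); }
    static void end()   { inFlight.decrementAndGet(); }
}
```

Because the counter is just a number rather than a permit held by a thread, it works even when a message's processing hops between threads, which is exactly the async-poller situation that ruled out a Semaphore above.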