Need suggestion on best EventLoopGroup configuration for particular case - multithreading

I was going through multiple docs and examples about the correct configuration of EventLoopGroup but couldn't decide what's best for my use case.
I am using Netty 4.1.86.Final on Linux.
Use case:
We have a web server that accepts 50-100k connections (both http :80 and https :443) and handles 20-30k requests per second. It's server-to-server communication, where multiple servers maintain connections for a short duration (keep-alive few minutes to an hour) and then recycle (close and open a new connection) the connection. Every request typically takes 50-100ms to process. The processing involves parsing JSON payload and running some complex business logic and some async network call to internal servers as well (cache, db etc.).
What I understood so far is that there are a few options that I can try:
1. Use the same EventLoopGroup for boss and worker, and use a separate EventExecutorGroup in the pipeline for my business logic, like: pipeline.addLast(handlerExecutorGroup, "handler", new MyHttpEndServer());
2. Use different EventLoopGroups for boss and worker (.group(ioThreadExecutors, workerThreadExecutors)) and use a very high number of threads for the worker EventLoopGroup (cpu * 8).
3. Use 3 different EventLoopGroups for boss, worker and handler (cpu * 8 threads).
4. Use the same EventLoopGroup for boss and worker, set SO_REUSEPORT, and bind multiple times to the same port, so that instead of 1 I/O thread per port there are several I/O threads per port accepting connections, along with a separate handlerExecutorGroup in the pipeline for my business logic, same as option 1.
    .option(UnixChannelOption.SO_REUSEPORT, true)
    for (int i = 0; i < channelThreadCount; i++) {
        ChannelFuture f = httpServer.bind(port).sync();
    }
Which of these seems the better option for my use case? Or is there some other, better approach that I should try?
Note: I did the load test with all 4 options, but didn't notice much difference.
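A back-of-the-envelope check may help when comparing the options above. By Little's law, the average number of requests in flight is the arrival rate multiplied by the time each request spends in the server. The sketch below only does that arithmetic for the figures quoted in the question; the class and method names are made up for illustration, and it says nothing about how many of those in-flight requests actually occupy a thread (that depends on how much of the 50-100 ms is asynchronous waiting).

```java
class InFlightEstimate {
    // Little's law: average requests in flight = arrival rate * time in system.
    public static double inFlight(double requestsPerSecond, double latencyMillis) {
        return requestsPerSecond * latencyMillis / 1000.0;
    }

    public static void main(String[] args) {
        double low = inFlight(20_000, 50);    // lower bound from the question
        double high = inFlight(30_000, 100);  // upper bound from the question
        System.out.println("requests in flight: " + low + " to " + high);
    }
}
```

With the numbers from the question this gives roughly 1,000 to 3,000 requests in flight at any moment. If the business logic blocked a thread for its whole duration, a handler pool of cpu * 8 threads would be far smaller than that; if most of the time is spent waiting on async calls, far fewer threads are needed.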


Ktor, Netty and increasing the number of threads per endpoint

Using Ktor and Kotlin 1.5 to implement a REST service backed by Netty. A couple of things about this service:
"Work" takes non-trivial amount of time to complete.
A unique client endpoint sends multiple requests in parallel to this service.
There are only a handful of unique client endpoints.
The service is not scaling as expected. We ran a load test with parallel requests coming from a single client and we noticed that we only have two threads on the server actually processing the requests. It's not a resource starvation problem - there is plenty of network, memory, CPU, etc. and it doesn't matter how many requests we fire up in parallel - it's always two threads keeping busy, while the others are sitting idle.
Is there a parameter we can configure to increase the number of threads available to process requests for specific endpoints?
Netty uses what is called a non-blocking I/O model.
In this model a single thread can handle a lot of tasks concurrently, as long as you follow best practices (not blocking the event loop thread).
You might need to check the following configuration options for Netty
connectionGroupSize = x
workerGroupSize = y
callGroupSize = z
Default values usually are set rather low and tweaking them could be useful for the time-consuming 'work'. The exact values might vary depending on the available resources.

Azure Service Bus Send throughput .Net SDK

I am currently implementing a library to send the messages faster to the Service bus queue. What is observed is that, if I used the same ServiceBusClient and use the same sender to send the messages in Parallel.For, the throughput is not so high and my network upload speed is not fully utilized. The moment I make individual clients and use them to send, the throughput increases drastically and even utilizes my upload bandwidth very well.
Is my understanding correct, or should a single client and sender be enough? Also, I am averse to creating multiple clients, as establishing each client connection uses a lot of resources. Are there any articles that shed some light on this?
There is a throughput test tool, and its code also creates multiple clients.
protected override Task OnStartAsync()
for (int i = 0; i < this.Settings.SenderCount; i++)
return Task.WhenAll(senders);
async Task SendTask()
var client = new ServiceBusClient(this.Settings.ConnectionString);
ServiceBusSender sender = client.CreateSender(this.Settings.SendPath);
var payload = new byte[this.Settings.MessageSizeInBytes];
var semaphore = new DynamicSemaphoreSlim(this.Settings.MaxInflightSends.Value);
var done = new SemaphoreSlim(1);
long totalSends = 0;
Is there a library to manage the connections in a pool?
From the patterns in your code, I'm assuming that you're using the Azure.Messaging.ServiceBus package. If that isn't the case, please ignore the remainder of this post.
ServiceBusClient represents a single AMQP connection to the service. Any senders, receivers, and processors spawned from this client will share that connection. This gives your application the ability to control the number of connections used and pool them in the manner that works best in your context.
It is recommended to reuse clients, senders, receivers, and processors for the lifetime of your application; though the connection is shared, each time a new child type is spawned, it must establish a new AMQP link and perform the authorization handshake - which is non-trivial overhead.
These types are self-managing with respect to resources. For idle periods, connections and links will be closed to avoid waste, and they'll automatically be recreated for the first operation that requires them.
With respect to using multiple clients, senders, receivers, and processors - it is a valid approach and can yield better performance in some scenarios. The one caveat that I'll mention is that using more clients than the number of CPU cores in your host environment comes with an increased risk of causing contention in the thread pool. The Service Bus library is highly asynchronous, and its performance relies on continuations for async calls being scheduled in a timely manner.
Unfortunately, performance tuning is very difficult to generalize due to how much it varies for different application and hosting contexts. To find the right number of senders to maximize throughput for your application, we recommend that you spend time testing different values and observing the performance characteristics in your specific system.
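Since the library leaves pooling to the application, one simple way to manage a fixed set of long-lived clients or senders is a round-robin pool. The sketch below is language-neutral in spirit and written in plain Java; it is not part of the Azure SDK, and `RoundRobinPool` and its methods are hypothetical names. The pool creates its members once and hands them out in rotation, so nothing is re-created per message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: holds a fixed number of reusable objects (for example
// one sender per client) and hands them out in rotation.
class RoundRobinPool<T> {
    private final List<T> items = new ArrayList<>();
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinPool(int size, Supplier<T> factory) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        for (int i = 0; i < size; i++) {
            items.add(factory.get()); // created once, reused for the pool's lifetime
        }
    }

    // Thread-safe: floorMod keeps the index valid even if the counter overflows.
    public T next() {
        return items.get(Math.floorMod(counter.getAndIncrement(), items.size()));
    }

    public int size() {
        return items.size();
    }
}
```

Following the caveat above, a reasonable starting size would be at most the number of CPU cores (in Java, `Runtime.getRuntime().availableProcessors()`), adjusted after measuring.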
For the new SDK, the same principle of connection management applies, i.e., re-creating connections is expensive.
You can connect client objects directly to the bus or, by creating a ServiceBusConnection, share a single connection between clients.
If the scenario is to send as many messages as possible to a single queue, you can increase throughput by spinning up multiple ServiceBusConnection and client objects on separate threads.
Is there a library to manage the connections in a pool?
There’s no connection pooling happening under the hood and new connections are relatively expensive to create. With the previous SDK the advice was to re-use factories and clients where possible.
Refer to this article for more information.

nodejs cluster distributing connection

In nodejs api doc, it says
The cluster module supports two methods of distributing incoming connections.
The first one (and the default one on all platforms except Windows),
is the round-robin approach, where the master process listens on a
port, accepts new connections and distributes them across the workers
in a round-robin fashion, with some built-in smarts to avoid
overloading a worker process.
The second approach is where the master process creates the listen
socket and sends it to interested workers. The workers then accept
incoming connections directly.
The second approach should, in theory, give the best performance. In
practice however, distribution tends to be very unbalanced due to
operating system scheduler vagaries. Loads have been observed where
over 70% of all connections ended up in just two processes, out of a
total of eight.
I know PM2 uses the first one, but why doesn't it use the second? Just because of the unbalanced distribution? Thanks.
The second approach may add CPU load, because every child process tries to 'grab' connections from the socket the master sent.

Netty multi threading per connection

I am new to Netty. I would like to develop a server that receives requests from possibly only a few clients (say, a maximum of 2). But each client will be sending many requests to the server continuously. The server has to process these requests and respond to the client. So I assume that even if I configure multiple worker threads, it may not be useful, as there are only 2 active connections. A worker thread will still block until it has processed the request and responded to the client. So please let me know how to handle this type of problem.
If I use a ThreadPoolExecutor in the worker thread to process both clients' requests in a multi-threaded manner, will it be efficient? Or if it can be achieved through the Netty framework, please let me know how to do this.
Thanks in advance...
If I understand correctly: your clients (2) will send many messages, each of them requiring an answer as quickly as possible from the server.
There are 2 options:
1. Producing the answer is quick (quick enough not to be an issue for the rate you want to reach, meaning 1 thread is able to answer as fast as you need for 1 client): then you can stay with the standard Netty threads (1 worker thread for 1 client at a time) set up in the server bootstrap. This is the shortest path.
2. Producing the answer is not quick enough (the rate will be terrible, for instance because there is a "long" process, such as a blocking call, database access, file writing, ...): then you can add a thread pool (a group) in the Netty pipeline for your ChannelHandler doing such blocking/long processing.
Here is an extract of the API documentation taken from ChannelPipeline:
// Tell the pipeline to run MyBusinessLogicHandler's event handler methods
// in a different thread than an I/O thread so that the I/O thread is not blocked by
// a time-consuming task.
// If your business logic is fully asynchronous or finished very quickly, you don't
// need to specify a group.
pipeline.addLast(group, "handler", new MyBusinessLogicHandler());
Just add a ChannelHandler with a special EventExecutorGroup to the ChannelPipeline, for example UnorderedThreadPoolEventExecutor.
Something like this:
UnorderedThreadPoolEventExecutor executorGroup = ...;
pipeline.addLast(executorGroup, new MyChannelHandler());
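The idea behind the second option can be shown without Netty at all: the thread that receives requests only dispatches them, and a separate pool runs the slow work, so two connections can still keep more than two threads busy. The following is a JDK-only sketch, not Netty code; `OffloadSketch`, `process` and `dispatchAll` are hypothetical names, and the 50 ms sleep stands in for the blocking business logic. In Netty, the EventExecutorGroup passed to `addLast` plays the role of `workers` here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OffloadSketch {
    // Stand-in for slow business logic (blocking call, database access, ...).
    static String process(String request) throws InterruptedException {
        Thread.sleep(50);
        return request + " handled on " + Thread.currentThread().getName();
    }

    // Plays the role of the I/O thread: it only hands requests to the worker
    // pool and never runs process() itself.
    public static List<String> dispatchAll(List<String> requests, ExecutorService workers)
            throws Exception {
        List<Future<String>> pending = new ArrayList<>();
        for (String request : requests) {
            Callable<String> task = () -> process(request);
            pending.add(workers.submit(task));
        }
        List<String> replies = new ArrayList<>();
        for (Future<String> reply : pending) {
            replies.add(reply.get()); // collected in request order
        }
        return replies;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        try {
            List<String> requests = new ArrayList<>();
            for (int i = 0; i < 8; i++) {
                requests.add("client" + (i % 2) + "-req" + i); // only 2 clients
            }
            dispatchAll(requests, workers).forEach(System.out::println);
        } finally {
            workers.shutdown();
        }
    }
}
```

Running it should show requests from the same client being handled on different pool threads. Note that this gives up per-connection ordering of the work, which is also the trade-off an unordered executor makes.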

How a thread service two data sockets (not control sockets) equally?

Suppose that we have a single-threaded application, and it needs to service two clients by writing 1 GB of data to each of two separate TCP sockets (one socket per client). In this situation, how can the thread work on the two tasks equally and continually?
I think this problem exists in server applications like Apache. Take the Apache Web Server as an example: Apache sets a max thread limit for itself, say MAX_THREADS, and if there are (MAX_THREADS + 1) outstanding requests and sockets, at least one thread must handle two sockets equally. How would Apache handle this situation?
Usually, when we want to handle several sockets in a single-threaded application, one of the following system calls is used:
select
poll
epoll
More on these calls can be found in the man pages.
The general idea is to keep the single thread from blocking while it waits for a resource, and to periodically check whether data is available to send or receive.
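As an illustration of that idea, here is a minimal sketch in Java, whose `java.nio.channels.Selector` wraps the platform's readiness mechanism (such as epoll on Linux). One thread writes a buffer to each of two non-blocking sockets, giving each socket a turn whenever it can accept more bytes. The class and method names are hypothetical, the payload is scaled down from 1 GB to a few megabytes, and the two reader threads exist only to play the clients in the demo.

```java
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

class TwoSocketWriter {
    // One thread drains the buffer attached to every registered channel.
    public static void writeAll(Selector selector) throws Exception {
        int open = selector.keys().size();
        while (open > 0) {
            selector.select(); // sleeps until some socket can take more bytes
            Iterator<SelectionKey> ready = selector.selectedKeys().iterator();
            while (ready.hasNext()) {
                SelectionKey key = ready.next();
                ready.remove();
                if (!key.isValid() || !key.isWritable()) continue;
                SocketChannel channel = (SocketChannel) key.channel();
                ByteBuffer pending = (ByteBuffer) key.attachment();
                channel.write(pending); // non-blocking: writes what fits, then returns
                if (!pending.hasRemaining()) {
                    channel.close();    // done with this client
                    open--;
                }
            }
        }
    }

    // Demo: sends bytesPerClient zero bytes to each of two local clients and
    // returns how many bytes each client received.
    public static long[] demo(int bytesPerClient) throws Exception {
        long[] received = new long[2];
        Thread[] readers = new Thread[2];
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // any free port
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
            for (int i = 0; i < 2; i++) {
                final int id = i;
                readers[i] = new Thread(() -> {
                    try (Socket socket = new Socket("127.0.0.1", port);
                         InputStream in = socket.getInputStream()) {
                        byte[] buf = new byte[8192];
                        for (int n; (n = in.read(buf)) != -1; ) received[id] += n;
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                });
                readers[i].start();
                SocketChannel channel = server.accept(); // blocking accept, for brevity
                channel.configureBlocking(false);
                channel.register(selector, SelectionKey.OP_WRITE,
                        ByteBuffer.allocate(bytesPerClient));
            }
            writeAll(selector);
        }
        for (Thread reader : readers) reader.join();
        return received;
    }

    public static void main(String[] args) throws Exception {
        long[] received = demo(4 * 1024 * 1024);
        System.out.println("client 0: " + received[0] + " bytes, client 1: " + received[1] + " bytes");
    }
}
```

Neither socket can starve the other here, because a write returns as soon as the socket's send buffer is full and the loop then moves on to whichever socket is ready next.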
