gRPC Server - maximum number of executor threads reached - multithreading

I have implemented a gRPC server that has only one RPC method. It takes the input object contained in the request and writes it synchronously to an Apache Kafka topic using the Kafka clients producer API. I have set a fixed thread pool of 50 threads as the executor.
Suppose the Kafka brokers are unavailable because of a temporary fault, and the gRPC server receives so many requests that all 50 threads become busy, blocked on the synchronous write retries to the Kafka topic.
What happens if more requests arrive while all 50 threads are busy?
Does gRPC queue them safely? Is there a risk of losing some requests?
Do you know where this concept might be described in the official documentation?
Thank you very much.
P.S.: Kafka is just an example I used to explain the question; you can think of any other service that requires a synchronous write.

I'm assuming that you are using C++. I'll also preface this answer by saying that the C++ synchronous API for gRPC is not very performant. The async API is what is generally recommended for performant code.
Yes, those bytes would get queued up in the transport layer. To enforce limits on how many bytes would get queued up, please configure ResourceQuota [1]
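If the server is in fact written in Java (the fixed pool of 50 threads in the question suggests a user-supplied java.util.concurrent executor), the executor-level side of this can be sketched with the plain JDK. Executors.newFixedThreadPool backs its pool with an unbounded LinkedBlockingQueue, so work submitted while every thread is busy waits in memory rather than being rejected. The sketch below, which is not gRPC code and uses the hypothetical names PoolSaturationDemo and submitAll, shows the alternative: a bounded queue that rejects work once both the threads and the queue are full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it only illustrates JDK executor behaviour, not gRPC internals.
class PoolSaturationDemo {

    /**
     * Submits `tasks` jobs that all block (standing in for a synchronous write
     * to an unreachable broker) and returns {accepted, rejected}.
     */
    static int[] submitAll(int threads, int queueCapacity, int tasks) {
        CountDownLatch brokerBack = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),   // bounded, unlike newFixedThreadPool
                new ThreadPoolExecutor.AbortPolicy());     // throw when threads and queue are full
        int accepted = 0;
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> awaitQuietly(brokerBack));
                accepted++;
            } catch (RejectedExecutionException e) {
                rejected++;                                // the caller decides how to fail the request
            }
        }
        brokerBack.countDown();                            // "broker is back": blocked tasks finish
        pool.shutdown();                                   // queued tasks still run before exit
        return new int[] {accepted, rejected};
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] r = submitAll(50, 100, 200);
        System.out.println("accepted=" + r[0] + ", rejected=" + r[1]);
    }
}
```

With 50 threads and a queue of 100, the first 150 submissions are accepted and the rest are rejected immediately. That makes the overload visible to callers instead of letting it grow in memory.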


gRPC cpp synchronous vs asynchronous server performance

I understand the differences between the sync and async servers; however, given the following two cases, which one would be more performant, async or sync?
Case 1:
Sync: the Write call blocks until the message is ready to be sent on the wire from the internal completion queue.
Async: the Write call returns immediately and we need to wait on the completion queue.
In the sync server, what if we add a queue that is populated on every Write call, with another thread draining it and calling stream.write? Would the performance then be the same?
Case 2:
Sync: gRPC internally creates a thread pool with as many threads as there are CPUs.
Async: threading is up to the implementation. So if we create a separate thread and completion queue for each client, would the performance be the same for sync and async?
It's difficult to compare performance theoretically. As a rule of thumb, if your choices are between letting gRPC handle concurrent calls internally in a way it's designed to handle, vs managing gRPC call concurrency yourself with the sync API, chances are gRPC internals will be better tuned for performance than you can manage yourself. There may be exceptions to that advice, depending on many variables ... for example, if the server is doing something very fast and inexpensive and the messages are small, the sync API might be fine.
In the end, benchmarks are your friends.
gRPC performance advice:
The official gRPC benchmarks (under development): the tests that underlie those benchmarks may be informative for your design choices.
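For anyone who wants to benchmark the "queue plus draining thread" variant described in the question, the pattern itself could look roughly like the sketch below. It is written in Java purely as an illustration of the pattern, not of the gRPC C++ API. The class name QueuedWriter and the blockingWrite callback (standing in for the blocking stream write) are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical wrapper: callers enqueue and return at once, one thread performs the blocking writes.
class QueuedWriter implements AutoCloseable {
    // Sentinel compared by identity, so a real message with the same text is never mistaken for it.
    private static final String STOP = new String("stop");

    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final Thread drainer;

    QueuedWriter(Consumer<String> blockingWrite) {
        drainer = new Thread(() -> {
            try {
                // Drain in FIFO order until the sentinel arrives.
                for (String msg = pending.take(); msg != STOP; msg = pending.take()) {
                    blockingWrite.accept(msg);   // stands in for the blocking stream write
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "stream-writer");
        drainer.start();
    }

    /** Returns immediately; the message is written later by the drainer thread. */
    void write(String msg) {
        pending.add(msg);
    }

    /** Flushes everything queued so far, then stops the drainer thread. */
    @Override
    public void close() {
        pending.add(STOP);
        try {
            drainer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The queue here is unbounded, so a slow peer turns into memory growth. Whether this beats the async API is exactly the kind of question that only a benchmark answers.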

Improving Amazon SQS Performance

Everything I can find about performance of Amazon Simple Queue Service (SQS), including their own documentation, suggests that getting high throughput requires multiple threads. And I've verified this myself using the JS API with Node 12. If I create multiple threads, I get about the same throughput on each thread, so the total throughput increase is pretty much linear. But I'm running this on a nice machine with lots of cores. When I run in Lambda on a single core, multiple threads don't improve the performance, and generally this is what I would expect of multi-threaded apps.
But here's what I don't understand - there should be very little going on here in the way of CPU, most of the time is spent waiting on web requests. The AWS SQS API appears to be asynchronous in that all of the methods use callbacks for the responses, and I'm using Promises to "asyncify" all of the API calls, with multiple tasks running concurrently. Normally doing this with any kind of async IO is handled great by Node, and improves throughput hugely, I do it all the time with database APIs, multiple streams, etc. But SQS definitely isn't behaving that way, it's behaving as though its IO is actually synchronous and blocking threads on the network calls, which would be outrageous for any modern API.
Has anyone had success getting high SQS message throughput in a single Node thread? The max I'm seeing is about 50 to 100 messages/sec for FIFO queues (send, receive, and delete, all of which are calling the batch methods with the max batch size of 10). And this is running in lambda, i.e. on their own network, which is only slightly faster than running it on my laptop over the Internet, another surprising find. Amazon's documentation says FIFO queues should support up to 3000 messages per second when batching, which would be just fine for me. Does it really take multiple threads on multiple cores or virtual CPUs to achieve this? That would be ridiculous, I just can't believe that much CPU would be used, it should be mostly IO time, which should be asynchronous.
As I continued to test, I found that the linear improvement with the number of threads only happened when each thread was processing a different queue. If the threads are all processing the same queue, there is no improvement by adding threads. So it behaves as though each queue is throttled by Amazon. But the throughput to which it seems to be throttling is way below what I found documented as the max throughput. Really confused and disappointed right now!
Michael's comments to the original question were right on. I was sending all messages to the same message group. I had previously been working with AMQP message queues, in which messages will be ordered in the queue in the order they're sent, and they'll be distributed to subscribers in that order. But when multiple listeners are consuming the AMQP queue, because of varying network latencies, there is no guarantee that they'll be received in that order chronologically.
So that's actually a really cool feature of SQS, the guarantee that messages will be chronologically received in the order they were sent within the same message group. In my case, I don't care about the receipt order. So now I'm setting a unique message group ID on each message, and scaling up performance by increasing the number of async message receive loops, still just in one thread, and the throughput is amazing!
So the bottom line: If exact receipt order of messages isn't important for your FIFO queue, set the message group ID to a unique value on each message, and scale out with more receiver tasks to get the best throughput performance. If you do need guaranteed message ordering, it looks like around 50 messages per second is about the best you'll do.
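As a rough illustration of that bottom line, the sketch below groups message bodies into batches of at most 10 and gives every entry its own message group ID. It deliberately avoids any AWS SDK types: FifoBatcher and Entry are hypothetical names, and Entry only mirrors the batch entry fields (id, body, message group ID) that would be passed to the real send call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical helper; it prepares batch entries but does not talk to SQS.
class FifoBatcher {
    static final int MAX_BATCH_SIZE = 10; // max batch size mentioned in the question

    /** Plain holder mirroring the fields of a batch entry; not an SDK class. */
    static final class Entry {
        final String id;              // must be unique within one batch
        final String body;
        final String messageGroupId;  // unique per message => no ordering constraint between messages

        Entry(String id, String body, String messageGroupId) {
            this.id = id;
            this.body = body;
            this.messageGroupId = messageGroupId;
        }
    }

    /** Splits bodies into batches of up to MAX_BATCH_SIZE, each entry in its own message group. */
    static List<List<Entry>> toBatches(List<String> bodies) {
        List<List<Entry>> batches = new ArrayList<>();
        for (int start = 0; start < bodies.size(); start += MAX_BATCH_SIZE) {
            int end = Math.min(start + MAX_BATCH_SIZE, bodies.size());
            List<Entry> batch = new ArrayList<>();
            for (int i = start; i < end; i++) {
                batch.add(new Entry("msg-" + (i - start), bodies.get(i),
                        UUID.randomUUID().toString()));
            }
            batches.add(batch);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> bodies = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            bodies.add("payload-" + i);
        }
        System.out.println(toBatches(bodies).size() + " batches"); // 25 bodies -> 3 batches
    }
}
```

A real FIFO send also needs a deduplication ID on each message unless content-based deduplication is enabled on the queue. Giving every message its own group ID gives up ordering entirely, which is only appropriate when, as in this answer, receipt order does not matter.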

Queue vs Non Blocking I/O

So, we're designing a new microservice architecture. One of the biggest challenges is internal communication. For communication in which a response is required, we're using REST APIs. But for services that just want to relay information, this API processing is unnecessary overhead.
One way is to use a queue. Service1 will push the information into a queue, and service2 can consume it from there. Therefore service1 doesn't have to wait (unlike with an API call). (If there is any error in processing the information, service2 can inform service1 via a callback URL or in some other way; this is not a concern at this point [1])
Now with a queue, there are two options: one is RabbitMQ and the other is AWS SQS. With RabbitMQ I have to worry about server setup and everything else (which can be done, but I want to avoid it). After a POC, SQS seems like a good option, but SQS internally uses REST APIs to communicate with the AWS servers, so there will be overhead at both points (service1 when pushing, service2 when consuming). So now I'm thinking: why not do it in NodeJS, where service1 hits service2 with the information? Service2 will respond immediately, acknowledging that it has received the information; if there is any error, then [1].
Now, the pros and cons I could summarise are:
RabbitMQ
- Pro: Easy to implement
- Pro: If the receiver is unavailable, the sender won't have to worry about retrying
- Con: Server setup cost + maintenance (+ tuning)
SQS
- Pro: Easiest to implement
- Con: Constant polling for messages
- Con: Overhead at push/receive
Non-blocking APIs
- Pro: No third medium required for communication
- Con: Service1 has to manage the retry mechanism
- Pro: Less overhead relative to SQS
- Con: Information will be in memory until processed
So to sum up, my question is: is it a good idea to go with non-blocking APIs? Or which approach will be better in terms of making the system scalable?
Edit -
Can a PubSub provider like PubNub or Pusher be used instead of a queue?
SQS uses XML over HTTP, RabbitMQ uses AMQP; all protocols have overhead. Serializing/deserializing has a cost. Both Amazon SQS and AMQP are very efficient. I would exclude these "overheads" from your calculations, and instead focus on your other requirements.
One of the big advantages of using a queue is the handling of surge activity. If you get 100K hits, and need to send 100K messages, and you try to implement this as inter-service calls (non-blocking or otherwise), you will hit real limits on the scalability of your system (from a port count if nothing else). If you instead put 100K messages on a queue, those messages can be processed basically at the remote server's "leisure".
Additionally, as you have mentioned above, queues have a persistence that is much more difficult to implement on your own. If your data is not critical, this is not a big concern, but if this data is of higher importance, you really want something that pushes to a persistent store (like SQS, or Rabbit persistent queues)...
I am late here, but of late I have started working with non-blocking I/O (NIO) and see a great benefit in it, especially when you are calling external services that cannot be given access to a message queue. Using a fixed connection pool ensures that the 100K problem is handled with non-blocking I/O without creating too many connections.
When calling internal services a message queue is preferred, but if you do not have that option, you can leverage NIO with a retry mechanism and connection pooling to get the same scalability that message queues would give. This assumes that the receivers are able to handle the load of the NIO calls.
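A minimal sketch of that combination, assuming the non-blocking client exposes each call as a CompletableFuture: a semaphore plays the role of the fixed connection pool by capping the number of in-flight calls, and failed calls are retried a bounded number of times. BoundedAsyncCaller, call and availablePermits are hypothetical names, and the request supplier stands in for whatever async HTTP client is actually used.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper; the Supplier stands in for a real non-blocking client call.
class BoundedAsyncCaller {
    private final Semaphore inFlight;

    BoundedAsyncCaller(int maxInFlight) {
        this.inFlight = new Semaphore(maxInFlight);
    }

    /** Runs `request` with at most `maxAttempts` tries while holding one in-flight permit. */
    <T> CompletableFuture<T> call(Supplier<CompletableFuture<T>> request, int maxAttempts) {
        // Backpressure: the submitting thread waits here while maxInFlight calls are pending.
        inFlight.acquireUninterruptibly();
        return attempt(request, maxAttempts)
                .whenComplete((value, error) -> inFlight.release());
    }

    // Assumes request.get() reports failures through the returned future, not by throwing.
    private <T> CompletableFuture<T> attempt(Supplier<CompletableFuture<T>> request,
                                             int attemptsLeft) {
        return request.get()
                .<CompletableFuture<T>>handle((value, error) -> {
                    if (error == null) {
                        return CompletableFuture.completedFuture(value);
                    }
                    if (attemptsLeft <= 1) {
                        CompletableFuture<T> failed = new CompletableFuture<>();
                        failed.completeExceptionally(error);
                        return failed;
                    }
                    // Retries immediately; real code would usually add a delay or backoff.
                    return attempt(request, attemptsLeft - 1);
                })
                .thenCompose(next -> next);
    }

    int availablePermits() {
        return inFlight.availablePermits();
    }
}
```

Unlike a queue, nothing here survives a process restart: calls that are waiting or being retried live only in memory, which is the trade-off noted in the question.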

Netty multi threading per connection

I am new to Netty. I would like to develop a server that receives requests from possibly only a few clients (say, a maximum of 2). But each client will be sending many requests to the server continuously. The server has to process these requests and respond to the client. So I assume that even if I configure multiple worker threads, it may not be useful, as there are only 2 active connections. A worker thread will block until it has processed the request and responded to the client. Please let me know how to handle this type of problem.
If I use a ThreadPoolExecutor in the worker thread to process both clients' requests in a multi-threaded manner, will it be efficient? Or, if it can be achieved through the Netty framework, please let me know how to do this.
Thanks in advance...
If I understand correctly: your 2 clients will send many messages, each of them requiring an answer from the server as quickly as possible.
There are 2 options:
The answer processing is short (short enough not to be an issue for the rate you want to reach, meaning 1 thread is able to answer as fast as you need for 1 client): then you can stay with the standard Netty threads (1 worker thread for 1 client at a time) set up in the server bootstrap. This is the shortest path.
The answer processing is not short enough (the rate will be terrible, for instance because there is a long-running step such as a blocking call, database access, file writing, ...): then you can add a thread pool (a group) in the Netty pipeline for your ChannelHandler that does the blocking/long processing.
Here is an extract of the API documentation taken from ChannelPipeline:
// Tell the pipeline to run MyBusinessLogicHandler's event handler methods
// in a different thread than an I/O thread so that the I/O thread is not blocked by
// a time-consuming task.
// If your business logic is fully asynchronous or finished very quickly, you don't
// need to specify a group.
pipeline.addLast(group, "handler", new MyBusinessLogicHandler());
Just add a ChannelHandler with a special EventExecutorGroup to the ChannelPipeline, for example UnorderedThreadPoolEventExecutor (src).
Something like this:
UnorderedThreadPoolEventExecutor executorGroup = ...;
pipeline.addLast(executorGroup, new MyChannelHandler());

Mule: Thread count under load with doThreading="false"

We have a Mule app with an HTTP inbound endpoint, and I'm trying to figure out how to control the thread count under load. As an experiment I have added the following configuration:
<core:default-threading-profile doThreading="false" maxThreadsActive="500" poolExhaustedAction="RUN"/>
Under load I'm seeing the thread count peak at over 1000 threads. I'm not sure why this is the case given the maxThreadsActive setting and doThreading="false". From reading about poolExhaustedAction="RUN", I would expect the listener thread to block while processing inbound requests rather than spawn new ones, and finally to reject the connection if its backlog queue is full. I never see rejected client connections.
Does Mule maintain a separate thread pool for each inbound endpoint in the app (sorry if this is in the documentation)? Even if so, I don't think it helps explain what I'm seeing.
Any help appreciated. We are running a number of mule apps in one container and I'd like to control the total number of threads.
Thanks, Alfie.
Clearly the doThreading attribute on default-threading-profile is not enough to control Mule threading as a whole, nor to put a global cap on the threading behaviour of specific transports. I reckon you're getting 500 threads for the HTTP message receiver pool and 500 for the VM message dispatcher pool.
I strongly suggest reading about tuning Mule:
My gut feel is that you need to
configure threading on each transport (VM, HTTP), strictly specifying the pool size for receivers and dispatchers,
select flow processing strategies that prevent Mule from spawning new threads (i.e. use synchronous to hog the receiver threads),
select exchange patterns that also prevent Mule from spawning new threads (i.e. use request-response to piggyback the current execution thread).
