I have the following code where I start getting an error during long-running tests on the same Service Bus Client.
ServiceBusMessageBatch batch = this._serviceBusSender.CreateMessageBatchAsync().GetAwaiter().GetResult();
The error is:
Azure.Messaging.ServiceBus.ServiceBusException: 'The operation did not complete within the allocated time 00:01:00 for object request42. (ServiceTimeout)'
Why is this statement throwing this error? Is the creation of a batch object such a heavy operation that it can even time out? If this is the case, should I switch to the overload that takes a List of ServiceBusMessage instead of this batch mode?
My understanding is that this way of batch creation can protect me from creating a batch that the queue may not allow. I am finding it difficult to understand why it times out after 1 minute.
In order for a batch to be able to enforce limits on the size, it has to establish an AMQP link to the entity that you'll be sending to and read the maximum allowable message size from the service. This results in a network operation that, in this case, timed out. This overhead is performed only in the case that there is not an existing AMQP link already established - typically on the first call that requires a network operation.
What jumps out at me from your code is the use of GetAwaiter().GetResult() to perform sync-over-async. This is really not a good idea and is very likely to cause contention in the thread pool that prevents continuations from being scheduled in a timely manner. Because network operations in Service Bus are asynchronous - including establishing the AMQP link - delays in scheduling continuations would certainly increase the chance of timeouts.
I'd strongly advise refactoring your sync-over-async code paths and shifting to an asynchronous approach. In those scenarios where it's not possible to go full async, limiting sync-over-async to the outermost layer of your code would be the next best thing.
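For illustration, here is a minimal sketch of that async shape, assuming the Azure.Messaging.ServiceBus package and a ServiceBusSender like the one in the question; the class and method names are invented for the example.

using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

public class BatchPublisher
{
    private readonly ServiceBusSender _serviceBusSender;

    public BatchPublisher(ServiceBusSender sender) => _serviceBusSender = sender;

    public async Task PublishAsync(IEnumerable<ServiceBusMessage> messages)
    {
        // Awaiting lets the first-call AMQP link handshake complete without
        // blocking a thread pool thread, which is the sync-over-async risk.
        using ServiceBusMessageBatch batch =
            await _serviceBusSender.CreateMessageBatchAsync();

        foreach (ServiceBusMessage message in messages)
        {
            // TryAddMessage enforces the service's size limit locally.
            if (!batch.TryAddMessage(message))
            {
                break; // batch is full; send and start a new one (omitted)
            }
        }

        await _serviceBusSender.SendMessagesAsync(batch);
    }
}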
Related
Everything I can find about performance of Amazon Simple Queue Service (SQS), including their own documentation, suggests that getting high throughput requires multiple threads. And I've verified this myself using the JS API with Node 12. If I create multiple threads, I get about the same throughput on each thread, so the total throughput increase is pretty much linear. But I'm running this on a nice machine with lots of cores. When I run in Lambda on a single core, multiple threads don't improve the performance, and generally this is what I would expect of multi-threaded apps.
But here's what I don't understand - there should be very little going on here in the way of CPU, most of the time is spent waiting on web requests. The AWS SQS API appears to be asynchronous in that all of the methods use callbacks for the responses, and I'm using Promises to "asyncify" all of the API calls, with multiple tasks running concurrently. Normally doing this with any kind of async IO is handled great by Node, and improves throughput hugely, I do it all the time with database APIs, multiple streams, etc. But SQS definitely isn't behaving that way, it's behaving as though its IO is actually synchronous and blocking threads on the network calls, which would be outrageous for any modern API.
Has anyone had success getting high SQS message throughput in a single Node thread? The max I'm seeing is about 50 to 100 messages/sec for FIFO queues (send, receive, and delete, all of which are calling the batch methods with the max batch size of 10). And this is running in Lambda, i.e. on their own network, which is only slightly faster than running it on my laptop over the Internet, another surprising find. Amazon's documentation says FIFO queues should support up to 3000 messages per second when batching, which would be just fine for me. Does it really take multiple threads on multiple cores or virtual CPUs to achieve this? That would be ridiculous, I just can't believe that much CPU would be used, it should be mostly IO time, which should be asynchronous.
Edit:
As I continued to test, I found that the linear improvement with the number of threads only happened when each thread was processing a different queue. If the threads are all processing the same queue, there is no improvement by adding threads. So it behaves as though each queue is throttled by Amazon. But the throughput to which it seems to be throttling is way below what I found documented as the max throughput. Really confused and disappointed right now!
Michael's comments to the original question were right on. I was sending all messages to the same message group. I had previously been working with AMQP message queues, in which messages will be ordered in the queue in the order they're sent, and they'll be distributed to subscribers in that order. But when multiple listeners are consuming the AMQP queue, because of varying network latencies, there is no guarantee that they'll be received in that order chronologically.
So that's actually a really cool feature of SQS, the guarantee that messages will be chronologically received in the order they were sent within the same message group. In my case, I don't care about the receipt order. So now I'm setting a unique message group ID on each message, and scaling up performance by increasing the number of async message receive loops, still just in one thread, and the throughput is amazing!
So the bottom line: If exact receipt order of messages isn't important for your FIFO queue, set the message group ID to a unique value on each message, and scale out with more receiver tasks to get the best throughput performance. If you do need guaranteed message ordering, it looks like around 50 messages per second is about the best you'll do.
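The question is using Node, but to make the idea concrete, here is a rough sketch of per-message group IDs using the AWS SDK for .NET; the queue URL is a placeholder, and using a GUID per message for deduplication is an assumption.

using System;
using System.Collections.Generic;
using Amazon.SQS;
using Amazon.SQS.Model;

var sqs = new AmazonSQSClient();
var entries = new List<SendMessageBatchRequestEntry>();

for (int i = 0; i < 10; i++) // 10 is the maximum batch size
{
    entries.Add(new SendMessageBatchRequestEntry
    {
        Id = i.ToString(),                          // unique within the batch
        MessageBody = $"payload-{i}",
        MessageGroupId = Guid.NewGuid().ToString(), // unique group = no cross-message ordering
        MessageDeduplicationId = Guid.NewGuid().ToString()
    });
}

await sqs.SendMessageBatchAsync(new SendMessageBatchRequest
{
    QueueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/example.fifo", // placeholder
    Entries = entries
});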
We are using Azure Service Bus via NServiceBus and I am facing a problem with deciding the correct architecture for dealing with long-running tasks that arise as a result of messages.
As is good practice, we don't want to block the message handler from returning by making it wait for long running processes (downloading a large file from a remote server), and actually doing so will cause the lock on the message to be lost with Azure SB. The plan is to respond by spawning a separate task and allow the message handler to return immediately.
However this means that the handler is now immediately available for the next message which will cause another task to be spawned and so on until the message queue is empty. What I'd like is some way to stop taking messages while we are processing (a limited number of) earlier messages. Is there an accepted pattern for this with NServiceBus and Azure Service Bus?
The following is what I'd kind of do if I was programming directly against the Azure SB
while (true)
{
    var message = bus.Next();
    message.Complete();
    // Do long running stuff here
}
The verbs Next and Complete are probably wrong, but what happens under Azure is that Next takes a temporary lock on the message so that other consumers can no longer see it. Then you can decide if you really want to process the message and, if so, call Complete. That removes the message from the queue entirely; failing to do so will cause the message to reappear on the queue after a period of time, as Azure assumes you crashed. As dirty as this code looks, it would achieve my goals (so why not do it?), since my consumer is only going to consume the next message when I'm available again (after the long-running task). Other consumers (other instances) can jump in if necessary.
The problem is that NServiceBus adds a level of abstraction so that now handling a message is via a method on a handler class.
void Handle(NewFileMessage message)
{
    // Do work here
}
The problem is that Azure does not get the call to message.Complete() until after your work is done and the Handle method exits. This is why you need to keep the work short. However, once you exit, you also signal that you are ready to handle another message. This is my Catch-22.
Downloading on a background thread is a good idea. You don't want to increase the lock duration, because that's a symptom, not the problem. Your download can easily take longer than the maximum lock duration (5 minutes), and then you're back to square one.
What you can do is have an orchestrating saga for the download. The saga can monitor the download process, and when the download is completed, the background process signals completion to the saga. If the download never finishes, you can have a timeout (or multiple timeouts) to detect that and trigger a compensating action or a retry, whatever works for your business case.
Documentation on Sagas should get you going: http://docs.particular.net/nservicebus/sagas/
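To make the shape concrete, here is a hypothetical sketch of such a saga using the NServiceBus 6-style API; every message type, property, and timeout value below is invented for illustration.

using System;
using System.Threading.Tasks;
using NServiceBus;

public class NewFileMessage : ICommand { public string FileId { get; set; } }
public class StartDownload : ICommand { public string FileId { get; set; } }
public class DownloadCompleted : IMessage { public string FileId { get; set; } }
public class DownloadTimedOut { }

public class DownloadSagaData : ContainSagaData
{
    public string FileId { get; set; }
}

public class DownloadSaga : Saga<DownloadSagaData>,
    IAmStartedByMessages<NewFileMessage>,
    IHandleMessages<DownloadCompleted>,
    IHandleTimeouts<DownloadTimedOut>
{
    protected override void ConfigureHowToFindSaga(SagaPropertyMapper<DownloadSagaData> mapper)
    {
        mapper.ConfigureMapping<NewFileMessage>(m => m.FileId).ToSaga(s => s.FileId);
        mapper.ConfigureMapping<DownloadCompleted>(m => m.FileId).ToSaga(s => s.FileId);
    }

    public async Task Handle(NewFileMessage message, IMessageHandlerContext context)
    {
        // Hand the download to a background process and return immediately,
        // so the incoming message completes and its lock is released.
        await context.Send(new StartDownload { FileId = message.FileId });
        await RequestTimeout<DownloadTimedOut>(context, TimeSpan.FromMinutes(30));
    }

    public Task Handle(DownloadCompleted message, IMessageHandlerContext context)
    {
        MarkAsComplete(); // download finished; the saga's job is done
        return Task.CompletedTask;
    }

    public Task Timeout(DownloadTimedOut state, IMessageHandlerContext context)
    {
        // Download never reported back: retry or run a compensating action here.
        return Task.CompletedTask;
    }
}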
In Azure Service Bus you can increase the lock duration of a message (the default is 30 seconds) in case the handling will take a long time.
But even though you can increase the lock duration, needing to do so is generally an indication that your handler is taking on too much work, which could be divided over different handlers.
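As a sketch of the knobs involved (using the newer Azure.Messaging.ServiceBus client directly, outside NServiceBus; the queue name, durations, and connection string are placeholders): the lock duration is a queue-level setting capped at 5 minutes by the service, while the processor can renew locks beyond that in the background.

using System;
using Azure.Messaging.ServiceBus;
using Azure.Messaging.ServiceBus.Administration;

string connectionString = "<connection-string>"; // placeholder

// Raise the queue's lock duration (the service caps this at 5 minutes).
var admin = new ServiceBusAdministrationClient(connectionString);
QueueProperties queue = await admin.GetQueueAsync("downloads");
queue.LockDuration = TimeSpan.FromMinutes(5);
await admin.UpdateQueueAsync(queue);

// For handling that runs longer than any lock duration, let the processor
// renew the lock automatically up to a configured ceiling.
var client = new ServiceBusClient(connectionString);
ServiceBusProcessor processor = client.CreateProcessor("downloads",
    new ServiceBusProcessorOptions
    {
        MaxAutoLockRenewalDuration = TimeSpan.FromMinutes(30)
    });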
If it is critical that the file is downloaded, I would keep the download operation in the handler. That way if the download fails the message can be handled again and the download can be retried. If however you want to free up the handler instantly to handle more messages, I would suggest that you scale out the workers that perform the download task so that the system can cope with the demand.
How can the disruptor be used effectively on processes where there are variable-duration "business logic" tasks? Has it been done before?
Can it be done with a second ring buffer processing the response stage? If so, how do I go about that?
I understand the Disruptor and see some specific parts of my call chain where I could apply the concept. To be specific, the application is a middleware-type application which performs the following steps:
read inbound message, unmarshall to request
identify customer details for request, determine workflow steps to process
call backend system to execute steps
collate responses, log response, marshall, and return to consumer
The issue is that some instances of the backend steps can take a "long" time, which potentially forces the response stage for short-running tasks to wait for longer-running tasks. Assume the backend system call can be made either async or sync, so the idea would be that the backend system call is simply a consumer that triggers an async request to the backend.
Backend system response time can range from 5 ms for some requests, through 50 ms for 90% of requests, up to 5,000 ms for 1% of requests (think large disk I/O).
I can see the Disruptor as potentially highly efficient but can't get my head around this hurdle to keep average latency down.
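As a generic sketch of the two-stage idea in the question (this is not the LMAX Disruptor API, and all names are invented): the backend-call stage only starts work, and completed responses flow through a second queue, so a 5,000 ms outlier never blocks the 50 ms majority.

using System.Threading.Channels;
using System.Threading.Tasks;

record Request(int Id);
record Response(int Id);

class TwoStagePipeline
{
    private readonly Channel<Response> _responses = Channel.CreateUnbounded<Response>();

    // Stage 1 consumer: trigger the backend call asynchronously and return.
    public void OnRequest(Request request)
    {
        _ = CallBackendAsync(request)
            .ContinueWith(t => _responses.Writer.TryWrite(t.Result)); // error handling omitted
    }

    // Stage 2 consumer: collate responses in completion order.
    public async Task DrainAsync()
    {
        await foreach (Response response in _responses.Reader.ReadAllAsync())
        {
            // collate, log, marshal, return to caller (omitted)
        }
    }

    private static async Task<Response> CallBackendAsync(Request request)
    {
        await Task.Delay(50); // stand-in for the 5 ms..5,000 ms backend call
        return new Response(request.Id);
    }
}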
I have a question about architecture/performance. I'm talking about a SIP server that processes multiple client requests concurrently. I suppose that each request is handled in a dedicated thread. At the end of processing, that thread logs request-specific info to a file. I want to optimize this last part of the processing. I mean, I want to know what alternatives you would propose instead of logging this info to a file. Why? Because writing to a file after processing uses resources that I would rather use to process other arriving requests.
First, what do you think about the question? And, if you think it's a "true" question (I mean that an alternative may actually improve performance), what do you propose?
I thought about logging the data to a queue and using another process ON ANOTHER MACHINE that would read from the queue and write to a file.
Thanks for your suggestions.
If it is NOT a requirement that the log is written before the request returns - i.e. the logging is not part of the atomic response - then you have the option of returning the response and just initiating the logging action.
Putting the logging data in a queue in memory seems reasonable. You can read that queue and write to disk either on the same machine or another. I would start with a thread in your app as this is easiest to implement and since the disk I/O is going to be the limiting factor, it shouldn't impact your server much.
If the log is required to be written BEFORE the response is returned, you still have the option of using a reliable queue like MSMQ.
I suspect that the network overhead involved in moving the logging to another machine is probably going to create more problems than it solves. I would go with @Nicholas's solution: queue off the logs to one thread on the same machine. The queue allows slack so that occasional disk latency is mitigated, and the logging thread can make its own optimizations, e.g. waiting until it has a cluster-size worth of logs before writing. Other things, like opening a new log file every day or whenever the log file reaches a size limit, are also much easier without affecting the performance of the main server.
Even if you log on another machine, you should still queue off the logging to mitigate network latency.
If the log objects on the queue contain, say, a 'request' enumeration (e.g. ElogWrite, ElogNewFile, ElogPath, ElogShutdown), you could try both: you could queue up a request for the log thread to close its current log file and open a path to a file on a networked machine at runtime; the queue buffer would absorb the delay of doing this.
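A rough sketch of the queued logger both answers describe, with the request enumeration driving a single dedicated log thread; the enum members mirror the hypothetical ElogWrite/ElogNewFile/ElogPath/ElogShutdown requests, and all other names are invented.

using System.Collections.Concurrent;
using System.IO;
using System.Threading;

enum LogRequest { Write, NewFile, Path, Shutdown }

record LogItem(LogRequest Request, string Payload);

class QueuedLogger
{
    private readonly BlockingCollection<LogItem> _queue = new();
    private readonly Thread _worker;
    private StreamWriter _writer;

    public QueuedLogger(string path)
    {
        _writer = new StreamWriter(path, append: true);
        _worker = new Thread(Drain) { IsBackground = true };
        _worker.Start();
    }

    // Called from request threads: a cheap, non-blocking enqueue.
    public void Log(string line) => _queue.Add(new LogItem(LogRequest.Write, line));

    public void Shutdown()
    {
        _queue.Add(new LogItem(LogRequest.Shutdown, ""));
        _worker.Join();
    }

    private void Drain()
    {
        // Only this thread ever touches the file, so all disk I/O is off
        // the request-processing path.
        foreach (LogItem item in _queue.GetConsumingEnumerable())
        {
            switch (item.Request)
            {
                case LogRequest.Write:
                    _writer.WriteLine(item.Payload);
                    break;
                case LogRequest.Path:    // retarget, e.g. to a networked machine
                case LogRequest.NewFile: // daily or size-based rollover
                    _writer.Flush();
                    _writer.Dispose();
                    _writer = new StreamWriter(item.Payload, append: true);
                    break;
                case LogRequest.Shutdown:
                    _writer.Flush();
                    _writer.Dispose();
                    return;
            }
        }
    }
}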
I have a service which polls a queue very quickly to check for more 'work' which needs to be done. There is always more work in the queue than a single worker can handle. I want to make sure a single worker doesn't grab too much work when the service is already at max capacity.
Let's say my worker grabs 10 messages from the queue every N ms and uses the Parallel Library to process each message in parallel on different threads. The work itself is very IO-heavy. Many SQL Server queries and even Azure Table storage (HTTP requests) are made for a single unit of work.
Is using ThreadPool.GetAvailableThreads() the proper way to throttle how much work the service is allowed to grab?
I see that I have access to available WorkerThreads and CompletionPortThreads. For an IO-heavy process, is it more appropriate to look at how many CompletionPortThreads are available? I believe 1000 is the number made available per process regardless of CPU count.
Update - Might be important to know that the queue I'm working with is an Azure Queue. So, each request to check for messages is made as an async http request which returns with the next 10 messages. (and costs money)
I don't think using IO completion ports is a good way to work out how much to grab.
I assume that the ideal situation is where you run out of work just as the next set arrives, so you've never got more backlog than you can reasonably handle.
Why not keep track of how long it takes to process a job and how long it takes to fetch jobs, and adjust the amount of work fetched each time based on that, with suitable minimum/maximum values to stop things going crazy if you have a few really cheap or really expensive jobs?
You'll also want to work out a reasonable optimum degree of parallelization - it's not clear to me whether it's really IO-heavy, or whether it's just "asynchronous request heavy", i.e. you spend a lot of time just waiting for the responses to complicated queries which in themselves are cheap for the resources of your service.
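A rough sketch of that feedback loop, with invented names and seed values; the idea is just to derive the next fetch size from the observed per-job cost and the fetch round-trip time, clamped so a few really cheap or really expensive jobs can't make it run away.

using System;

class FetchSizer
{
    private const int MinFetch = 1, MaxFetch = 32;
    private double _avgMsPerJob = 100; // seed guess; self-corrects over time

    public int NextFetchSize(TimeSpan fetchLatency)
    {
        // Aim to hold just enough work to cover the round-trip to the queue.
        int size = (int)Math.Ceiling(fetchLatency.TotalMilliseconds / _avgMsPerJob) + 1;
        return Math.Clamp(size, MinFetch, MaxFetch);
    }

    public void RecordBatch(int jobs, TimeSpan processingTime)
    {
        double observed = processingTime.TotalMilliseconds / jobs;
        // An exponential moving average smooths out outlier batches.
        _avgMsPerJob = 0.8 * _avgMsPerJob + 0.2 * observed;
    }
}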
I've been working virtually the same problem in the same environment. I ended up giving each WorkerRole an internal work queue, implemented as a BlockingCollection<>. There's a single thread that monitors that queue - when the number of items gets low it requests more items from the Azure queue. It always requests the maximum number of items, 32, to cut down costs. It also has automatic backoff in the event that the queue is empty.
Then I have a set of worker threads that I started myself. They sit in a loop, pulling items off the internal work queue. The number of worker threads is my main way to optimize the load, so I've got that set up as an option in the .cscfg file. I'm currently running 35 threads/worker, but that number will depend on your situation.
I tried using TPL to manage the work, but I found it more difficult to manage the load. Sometimes TPL would under-parallelize and the machine would be bored, other times it would over-parallelize and the Azure queue message visibility would expire while the item was still being worked.
This may not be the optimal solution, but it seems to be working OK for me.
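A condensed sketch of that arrangement, with invented names and thresholds; GetMessagesFromAzure stands in for the real Azure queue client call.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

class BufferedWorker
{
    private readonly BlockingCollection<string> _buffer = new(boundedCapacity: 64);

    public void Start(int workerCount)
    {
        new Thread(FeederLoop) { IsBackground = true }.Start();
        for (int i = 0; i < workerCount; i++) // e.g. 35 per role, tuned in .cscfg
            new Thread(WorkerLoop) { IsBackground = true }.Start();
    }

    // Single monitor thread: refill from the Azure queue when the buffer runs low.
    private void FeederLoop()
    {
        while (true)
        {
            if (_buffer.Count < 16) // low-water mark
            {
                var messages = GetMessagesFromAzure(32); // max batch = fewest paid requests
                if (messages.Count == 0)
                    Thread.Sleep(1000); // back off when the queue is empty
                foreach (var m in messages)
                    _buffer.Add(m);
            }
            else
            {
                Thread.Sleep(100);
            }
        }
    }

    private void WorkerLoop()
    {
        foreach (var message in _buffer.GetConsumingEnumerable())
            Process(message);
    }

    private static IReadOnlyList<string> GetMessagesFromAzure(int max) =>
        Array.Empty<string>(); // placeholder for the real dequeue call
    private static void Process(string message) { }
}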
I decided to keep an internal counter of how many messages are currently being processed. I used Interlocked.Increment/Decrement to manage the counter in a thread-safe manner.
I would have used the Semaphore class, since each message is tied to its own thread, but wasn't able to due to the async nature of the queue poller and the code which spawned the threads.
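A minimal sketch of that counter-based throttle, assuming the poller asks for the number of free slots before each fetch; the class and method names are invented.

using System.Threading;

class InFlightThrottle
{
    private int _inFlight;
    private readonly int _max;

    public InFlightThrottle(int max) => _max = max;

    // Poller: how many messages may we safely grab this round?
    // Volatile.Read avoids handing back a stale count.
    public int AvailableSlots() => _max - Volatile.Read(ref _inFlight);

    public void OnMessageStarted() => Interlocked.Increment(ref _inFlight);
    public void OnMessageCompleted() => Interlocked.Decrement(ref _inFlight);
}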