How to maximize WebJob CPU Usage - azure

I have an Azure Storage queue with over 100,000 items on it. The average processing time is about 1 minute per item (as reported in the WebJob dashboard).
I have set the max batch size for my WebJob to 32, like this:
JobHostConfiguration config = new JobHostConfiguration();
config.Queues.BatchSize = 32; // maximum number of queue messages fetched per batch
var host = new JobHost(config);
// The following call ensures that the WebJob runs continuously
host.RunAndBlock();
If I set it any higher than 32 the WebJob won't start and keeps flipping between Pending Restart and Starting, so I assume 32 is the max batch size.
However, my App Service Plan is running at a cool 4% CPU utilization. I have enabled auto-scale based on CPU usage.
What I want to do is figure out how to make the WebJob do more tasks in parallel so it can use more of that CPU when it needs it, and hopefully trigger the auto-scale and process more. What levers can I pull to make my WebJob take better advantage of my App Service Plan instances?

Note that the BatchSize maximum of 32 is a limit imposed by Azure Queues that the WebJobs SDK doesn't control. A single queue listener can only pull a maximum of 32 messages at a time because that’s all queues allow. That's why your job is not starting properly when you set it greater than 32 - if you check your error logs you should see an error to that effect.
However, there is a second config knob relating to parallel throughput that you can also configure: config.Queues.NewBatchThreshold. This value defaults to half the BatchSize when not explicitly set. It is the threshold that governs when a new batch will be fetched: if you increase it (say, to 100), more queue messages will be processed in parallel, because whenever the number of messages being processed dips below 100, a new batch is fetched.
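For example, a minimal sketch of that configuration (the threshold of 100 is illustrative):
JobHostConfiguration config = new JobHostConfiguration();
config.Queues.BatchSize = 32; // the Azure Queues maximum
config.Queues.NewBatchThreshold = 100; // fetch a new batch once in-flight messages dip below 100
var host = new JobHost(config);
host.RunAndBlock();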
You can also further increase throughput by scaling out your job to multiple instances. I recommend trying the NewBatchThreshold setting first and see where that gets you.

This comment in the code explains the situation:
// Azure Queues currently limits the number of messages retrieved to 32. We enforce this constraint here because
// the runtime error message the user would receive from the SDK otherwise is not as helpful.
private const int MaxBatchSize = 32;
More information about this can be found at https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-queues/:
There are two ways you can customize message retrieval from a queue. First, you can get a batch of messages (up to 32). [etc...]
So that's where this limit is coming from. However, I'm thinking that the WebJobs SDK could theoretically process multiple queue batches at the same time, so it doesn't have to be bound to this Storage Queue limitation. That's something that you should bring up on https://github.com/Azure/azure-webjobs-sdk/issues for further discussion to see what can be done. But as it stands, that is indeed the limitation.
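To see where the cap bites, here's a hedged sketch of the underlying storage call with the classic WindowsAzure.Storage client (the queueClient variable and queue name are assumptions):
// GetMessages caps messageCount at 32, which is exactly where the SDK's MaxBatchSize comes from
CloudQueue queue = queueClient.GetQueueReference("myqueue");
foreach (CloudQueueMessage msg in queue.GetMessages(32, TimeSpan.FromMinutes(5)))
{
    // process the message, then delete it
    queue.DeleteMessage(msg);
}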

Related

Azure Service Bus Queue Performance

I am using an Azure Service Bus queue for one of my requirements. The requirement is simple: an Azure Function acts as an API and creates multiple jobs in the queue. The Function is scalable, with new instances created on demand, so jobs are created in the queue in parallel, probably one job every 500 ms. The jobs are processed by a Windows service: the sender is the Azure Function and the receiver is the Windows service, a single-instance queue listener that listens to this queue and executes jobs in parallel. So there may be many senders, but the receiver is one instance, and the number of jobs running in parallel must be limited (to 4, since each takes significant time and CPU). Right now I am using an Azure Service Bus queue with the following configuration. My doubt is which configuration produces the best performance for this particular requirement.
The deletion of the job in the queue will not be an issue for me. So, can I use Delete instead of Peek-Lock?
Also, right now, the items received by the listener are not in order. I want to maintain the order in which they were created. My requirement is maximum performance. The job done by the Windows service is a CPU-intensive task; that's why I have limited concurrency to 4, since the system has 4 cores.
Max delivery count: 4, Message lock duration: 5 min, MaxConcurrentCalls: 4 (in the listener). I am new to Service Bus, so I need a suggestion for this.
One more doubt: suppose the listener gets 4 jobs in parallel and starts executing them. One job completes its execution. Will the listener pick the next item immediately, or wait for all 4 jobs to be completed (MaxConcurrentCalls: 4)?
The deletion of the job in the queue will not be an issue for me. So, can I use Delete instead of Peek-Lock?
Receiving messages in PeekLock receive mode will be less performant than ReceiveAndDelete: with ReceiveAndDelete you save the round-trips to the broker needed to complete messages.
Max delivery count: 4, Message lock duration: 5 min, MaxConcurrentCalls: 4 (in the listener). I am new to Service Bus, so I need a suggestion for this.
MaxDeliveryCount is how many times a message can be attempted before it's dead-lettered. It doesn't need to equal the number of cores; that it does here is probably just a coincidence.
MessageLockDuration will only matter if you use PeekLock receive mode. For ReceiveAndDelete it won't matter.
As for concurrency, even though your work is CPU-bound, I'd benchmark whether higher concurrency is possible.
An additional parameter on the message receiver to look into would be PrefetchCount. It can improve the overall performance by making fewer roundtrips to the broker.
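Putting those knobs together, a hedged sketch using the modern Azure.Messaging.ServiceBus client (queue name, connection string, and values are illustrative):
var client = new ServiceBusClient(connectionString);
var options = new ServiceBusProcessorOptions
{
    ReceiveMode = ServiceBusReceiveMode.ReceiveAndDelete, // skips the lock/complete round-trips
    MaxConcurrentCalls = 4, // at most 4 handlers running in parallel
    PrefetchCount = 20 // buffer messages locally to reduce broker round-trips
};
ServiceBusProcessor processor = client.CreateProcessor("jobs", options);
processor.ProcessMessageAsync += async args =>
{
    await RunJobAsync(args.Message); // RunJobAsync is your hypothetical CPU-bound handler
};
processor.ProcessErrorAsync += args => Task.CompletedTask; // log the error in real code
await processor.StartProcessingAsync();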
One more doubt: suppose the listener gets 4 jobs in parallel and starts executing them. One job completes its execution. Will the listener pick the next item immediately, or wait for all 4 jobs to be completed (MaxConcurrentCalls: 4)?
The listener will immediately start processing the 5th message, since your concurrency is set to 4 and one message has completed processing.
Also, right now, the items received by the listener are not in order. I want to maintain the order in which they were created.
To process messages in the order they were sent in you will need to send and receive messages using sessions.
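A hedged sketch of the send side (the queue must have sessions enabled; the SessionId value is illustrative):
var sender = client.CreateSender("jobs");
var message = new ServiceBusMessage(jobPayload)
{
    SessionId = "video-jobs" // messages sharing a SessionId are delivered in FIFO order
};
await sender.SendMessageAsync(message);
On the receive side you would use a ServiceBusSessionProcessor (or AcceptNextSessionAsync) rather than a plain processor.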
My requirement is maximum performance. The job done by the Windows service is a CPU-intensive task; that's why I have limited concurrency to 4, since the system has 4 cores.
There are multiple things to take into consideration. The location of your Windows service will impact latency and message throughput; scaling out could help, etc.

How do you scale an Azure Function app (background job) based on the number of items pending in a database?

So suppose you have an application that lets users request a job. For example (hypothetical): a user uploads a video. An entry is made in an RDBMS with the URL of the video on blob storage, and the status is set to "Pending".
There is a recurring timer-triggered Function app that executes every 10 seconds or so, gets 10 pending jobs from the RDBMS, and performs some compression etc.
The problem here is that as long as the number of requests stays at 10-30 videos per 10 seconds we should be fine. But if the number of requests suddenly increases, say to 200 requests per 10 seconds, there will be a lot of jobs pending and the user will have to wait 10 times longer than usual to see the status change. How do you scale out the Function app automatically in such a scenario? Does it have to be manual?
There's an easier way to get fan-out and parallel processing through multiple concurrently running Azure Functions.
Add an Azure Service Bus Queue to your solution.
For each video that needs to be processed, enqueue a Service Bus message with the appropriate data you'll need to retrieve and process the video (like the BlobId).
Have your Azure Function triggered by a ServiceBusTrigger.
Azure will spin up additional instances of your Azure Function as the queue depth increases. It'll also scale in idle instances after there's no more data to process.
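A hedged sketch of what the triggered function could look like (queue name, connection setting name, and the VideoJob type are all illustrative):
[FunctionName("ProcessVideo")]
public static void Run(
    [ServiceBusTrigger("video-jobs", Connection = "ServiceBusConnection")] string queueItem,
    ILogger log)
{
    // queueItem carries the data enqueued in step 2, e.g. the BlobId
    var job = JsonConvert.DeserializeObject<VideoJob>(queueItem); // VideoJob is a hypothetical DTO
    // fetch the blob, compress the video, update the status row
    log.LogInformation("Processed video {BlobId}", job.BlobId);
}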

Bursts of Redis errors

We've recently created a new Standard 1 GB Azure Redis cache specifically for distributed locking, separated from our main Redis cache. This was done to improve stability on our main Redis cache, a very long-term issue that this action seems to have significantly helped with.
On our new cache, we observe bursts of ~100 errors within the same few seconds every 1 - 3 days. The errors are either:
No connection is available to service this operation (StackExchange.Redis error)
Or:
Could not acquire distributed lock: Conflicted (RedLock.net error)
As they are errors from different packages, I suspect the Redis cache itself is the problem here. None of the stats during this time look out of the ordinary and the workload should fit comfortably in the Standard 1GB size.
I'm guessing this could be caused by the advertised low network performance. Is that likely the cause?
Your theory sounds plausible.
Checking for insufficient network bandwidth
Here is a handy table showing the maximum observed bandwidth for various pricing tiers. Take a look at the observed maximum bandwidth for your SKU, then head over to your Redis blade in the Azure Portal and choose Metrics. Set the aggregation to Max, and look at the sum of cache read and cache write. This is your total bandwidth consumed. Overlay the sum of these two against the time period when you're experiencing the errors, and see if the problem is network throughput. If that's the case, scale up.
Checking server load
Also on the Metrics tab, take a look at server load. This is the percentage that Redis is busy and is unable to process requests. If you hit 100%, Redis cannot respond to new requests and you will experience timeout issues. If that's the case, scale up.
Reusing ConnectionMultiplexer
You can also run out of connections to a Redis server if you're spinning up a new instance of StackExchange.Redis.ConnectionMultiplexer per request. The service limits for the number of connections available based on your SKU are here on the pricing page. You can see if you're exceeding the maximum allowed connections for your SKU on the Metrics tab, select max aggregation, and choose Connected Clients as your metric.
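The usual fix is one shared multiplexer for the lifetime of the app, e.g. this hedged sketch (redisConnectionString is assumed):
// Create a single ConnectionMultiplexer per application and reuse it everywhere
private static readonly Lazy<ConnectionMultiplexer> LazyConnection =
    new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(redisConnectionString));

public static ConnectionMultiplexer Connection => LazyConnection.Value;
// Usage: Connection.GetDatabase().StringGet("some-key");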
Thread Exhaustion
This doesn't sound like your error, but I'll include it for completeness in this Rogue's Gallery of Redis issues, and it comes into play with Azure Web Apps. By default, the thread pool will start with 4 threads that can be immediately allocated to work. When you need more than four threads, they're doled out at a rate of one thread per 500ms. So if you dump a ton of requests on a Web App in a short period of time, you can end up queuing work and eventually having requests dropped before they even get to Redis. To test whether this is a problem, go to Metrics for your Web App, choose Threads, and set the aggregation to max. If you see a huge spike in a short period of time that corresponds with your trouble, you've found a culprit. Resolutions include making proper use of async/await. And when that gets you no further, use ThreadPool.SetMinThreads to raise the minimum thread count, preferably to a value close to or above the max thread usage that you see in your bursts.
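If thread exhaustion does turn out to be the culprit, a hedged sketch of the SetMinThreads fix, called once at application startup (the value 200 is illustrative; tune it to the max thread usage you observe):
ThreadPool.GetMinThreads(out int workerMin, out int ioMin);
// Raise the floor so bursts don't wait on the ~500ms-per-thread injection rate
ThreadPool.SetMinThreads(Math.Max(workerMin, 200), Math.Max(ioMin, 200));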
Rob has some great suggestions, but I did want to add information on troubleshooting traffic bursts and poor ThreadPool settings. Please see: Troubleshoot Azure Cache for Redis client-side issues
Bursts of traffic combined with poor ThreadPool settings can result in delays in processing data already sent by the Redis Server but not yet consumed on the client side.
Monitor how your ThreadPool statistics change over time using an example ThreadPoolLogger. You can use TimeoutException messages from StackExchange.Redis like below to further investigate:
System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0,
IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)
Notice that in the IOCP section and the WORKER section you have a Busy value that is greater than the Min value. This difference means your ThreadPool settings need adjusting.
You can also see in: 64221. This value indicates that 64,221 bytes have been received at the client's kernel socket layer but haven't been read by the application. This difference typically means that your application (for example, StackExchange.Redis) isn't reading data from the network as quickly as the server is sending it.
You can configure your ThreadPool Settings to make sure that your thread pool scales up quickly under burst scenarios.
I hope you find this additional information helpful.

Azure Functions EventHub trigger scale job function instances

I have an Azure Function that has an EventHub trigger, with a Consumption plan. In my test I sent 3,000 events to the Event Hub in a few batches. Since the time for those 3,000 events was almost 10 times longer than the time for 300 events, I suspected that this Azure Function didn't scale to multiple VMs/instances.
To verify that hypothesis, I used a Guid static variable, which I initialized once and logged in every run of the function. All 3000 runs logged the same Guid.
That happens even if I specify following configuration in host.json:
"eventHub": {
"maxBatchSize": 1,
"prefetchCount": 10
}
The logic was that this would limit parallel processing within a single instance, causing multiple instances to be started, but again only one Guid was logged.
As a note, this is not the only function in the App Service. Could that be the issue? What condition needs to be satisfied for the Function to be started on multiple VMs?
Edit:
I have 32 partitions and 20 throughput units. The first issue was that I was using SendBatchAsync, which doesn't partition events. Even SendAsync didn't bring any scale, as if it wasn't partitioning. So I created partitioned Event Hub senders and did round-robin partitioning when sending events in the client application.
That increased the number of events processed by the Azure Function, but still didn't create more than 1 VM.
Furthermore, the number of events processed per second was much higher in the beginning (~200 at any moment), and after 2,000 events, near the end, it dropped to ~5. This has nothing to do with the load of the system, as the same behavior was observed with 9,000 events, where the slowdown happened after ~5k events.
This Azure function lasts 50-250 ms, depending on the load.
It also sends an event to another Azure Function through an Azure Storage Queue trigger. What is interesting is that this queue-triggered function doesn't scale to more than 1 VM either, and it has ~1k messages in its queue at the beginning, before the slowdown of the Event Hub-triggered function. The queue settings in host.json are:
"queues": {
"maxPollingInterval": 2000,
"visibilityTimeout": "00:00:10",
"batchSize": 32,
"maxDequeueCount": 5,
"newBatchThreshold": 1
}
Thanks.
It depends on a few factors:
the number of partitions your event hub has and whether the events you are writing are being distributed across your partitions. Azure Functions uses Event Processor Host to process your workload and the maximum scale you can get in this mode is one VM per partition.
the per-event workload you're executing. For example if your function does nothing but log, those 3000 events could be processed in less than 5 seconds on a single VM. This would not warrant scaling your application onto multiple instances.
However, if you're writing a batch of events across several partitions which takes several minutes in total to process, and you don't see your throughput accelerating as your function scales out, that could indicate that something is not working right and would warrant further investigation.
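To make sure events actually spread across partitions, here's a hedged sketch using the Microsoft.Azure.EventHubs client (connection string and key scheme are illustrative):
var client = EventHubClient.CreateFromConnectionString(eventHubConnectionString);
for (int i = 0; i < 3000; i++)
{
    var data = new EventData(Encoding.UTF8.GetBytes($"event {i}"));
    // A varied partition key hashes events across all partitions, which is what
    // lets Event Processor Host scale out to one VM per partition
    await client.SendAsync(data, partitionKey: i.ToString());
}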

Azure service bus queue only receiving 450 messages at a time

Does Azure have some kind of limitation in terms of batch items received? The following code is only retrieving 450 messages despite being asked for more:
QueueConnector.MyQueueClient.ReceiveBatch(1000, new TimeSpan(0, 0, 10));
I've tried increasing the timeout, but it doesn't have any impact: 450, every time. This appears to be the recommended way of batch receiving in the Azure SDK docs.
Note: there are tens of thousands of items in the queue.
The count passed to ReceiveBatch is an upper bound, and that is also mentioned right in the docs, so this is expected behavior. Service Bus will release a batch based on message availability or batch size: batches are capped at 256 KB for both send and receive. For SendBatch, that's also stated in the docs.
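If you need more than one batch's worth, the usual pattern is to loop, e.g. this hedged sketch reusing the question's QueueConnector:
var received = new List<BrokeredMessage>();
while (received.Count < 1000)
{
    // Each call returns at most ~256 KB of messages, however many that turns out to be
    var batch = QueueConnector.MyQueueClient
        .ReceiveBatch(1000 - received.Count, TimeSpan.FromSeconds(10))
        .ToList();
    if (batch.Count == 0) break; // queue drained for now
    received.AddRange(batch);
}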
