Azure Functions - Service Bus Scaling

I have an Azure Function listening to a Service Bus queue trigger, using the dynamic Consumption plan. Based on this documentation of the host.json config...
https://github.com/Azure/azure-webjobs-sdk-script/wiki/host.json
... you can set the following values
"serviceBus": {
// The maximum number of concurrent calls to the callback the message
// pump should initiate. The default is 16.
"maxConcurrentCalls": 16,
// The default PrefetchCount that will be used by the underlying MessageReceiver.
"prefetchCount": 100
},
Is there any documentation on setting the above for use with Functions, particularly on a Consumption plan?
The service bus performance best practices documentation suggests:
https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-performance-improvements
When using the default lock expiration of 60 seconds, a good value for SubscriptionClient.PrefetchCount is 20 times the maximum processing rates of all receivers of the factory. For example, a factory creates 3 receivers, and each receiver can process up to 10 messages per second. The prefetch count should not exceed 20*3*10 = 600. By default, QueueClient.PrefetchCount is set to 0, which means that no additional messages are fetched from the service.
Can somebody please shed some light on how these are/should be used within functions?
Thanks!

Looking at the Azure Service Bus (ASB) code for Azure WebJobs (the base for Functions), it looks like a single receiver is created. Hence the settings you see take that single receiver into consideration.
The ASB performance documentation describes a scenario where you create your own message pumps and control the number of factories and receivers.
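For a rough illustration of how that maps onto host.json, assume the single receiver handles about 5 messages per second (a made-up rate for the example) under the default 60-second lock; the 20x guideline would then cap prefetch at roughly 20 * 1 * 5 = 100:
"serviceBus": {
    // Illustrative only: one receiver at ~5 messages/second under a 60-second lock,
    // so prefetch should not exceed roughly 20 * 1 * 5 = 100.
    "maxConcurrentCalls": 16,
    "prefetchCount": 100
}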

Related

Controlling queue polling times

I have a piece of code that pushes a message to a Service Bus queue every time a new article is added on my web app. This then gets picked up by a ServiceBusTrigger with a SendGrid output in my Functions app, which sends me an email that a new article has been added by someone.
This doesn't happen often at all, and the only reason I decided to make it behave this way is to get my feet wet with some of the awesome Azure services.
My question is: since I don't really care about receiving these notification emails in real time... how can I reduce the frequency with which the trigger checks the queue?
In my Functions app's host.json I've already lowered maxConcurrentCalls to 1 (the default is 16).
"serviceBus": {
"maxConcurrentCalls": 1,
"prefetchCount": 100,
"autoRenewTimeout": "00:05:00"
}
Is there a way to also set it so that my trigger only checks the queue every 30 minutes or something like that?
No. Message retrieval is managed by the scale controller, which you don't have much influence on apart from the host.json parameters you have already seen.
To implement your scenario, you would need to switch to a Timer trigger running every 30 minutes and retrieve the messages from Service Bus manually, arguably losing many of the benefits of Azure Functions.
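For illustration, a minimal sketch of that Timer trigger approach, assuming a C# class library function and the classic WindowsAzure.ServiceBus SDK; the queue name "new-article-queue" and the "ServiceBusConnection" app setting are placeholder names:
using System;
using System.Linq;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;
using Microsoft.ServiceBus.Messaging;

public static class ArticleDigest
{
    // Fires every 30 minutes and drains whatever is sitting in the queue.
    [FunctionName("ArticleDigest")]
    public static void Run([TimerTrigger("0 */30 * * * *")] TimerInfo timer, TraceWriter log)
    {
        var client = QueueClient.CreateFromConnectionString(
            Environment.GetEnvironmentVariable("ServiceBusConnection"), "new-article-queue");

        while (true)
        {
            // A single ReceiveBatch call may return fewer messages than requested,
            // so keep looping until the queue is empty for this run.
            var batch = (client.ReceiveBatch(32, TimeSpan.FromSeconds(5))
                         ?? Enumerable.Empty<BrokeredMessage>()).ToList();
            if (batch.Count == 0)
            {
                break;
            }

            foreach (var message in batch)
            {
                log.Info($"New article: {message.GetBody<string>()}");
                message.Complete();
            }
        }

        client.Close();
    }
}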
Update: You can now integrate your Service Bus namespace with Azure Event Grid and then use an Event Grid triggered Function. Unfortunately, as of today it only works for Premium Service Bus namespaces, so you'd most probably have to wait until they expand the feature to lower tiers.

Azure Function Queue Trigger falling way behind

I'm working on a demo for Azure Functions using queue triggers. I created a recursive Sudoku solver to show how to take depth-first search and convert it to queued recursion. The code is on GitHub.
I was expecting it to scale out and process an insane number of messages per second, but it is barely processing 30/s. The queue is filling up and the utilization seems minimal.
How can I get better performance from this? I tried increasing the batch size in the host.json, but it didn't seem to help. I have over 200k messages in the queue and it's growing.
Update 1
I tried setting the host.json file as
{
    "queues": {
        "visibilityTimeout": "00:00:10",
        "batchSize": 32,
        "maxDequeueCount": 5,
        "newBatchThreshold": 100
    }
}
but requests per second remained the same.
I deployed the same function to another instance, but tied it to an S4 App Service plan. This is able to process about 64 requests per second, but that still seems slow.
I can process the messages serially on my local machine far faster than this.
Update 2
I scaled the S4 plan to 10 instances and each instance is handling about 60-70 requests per second. But that's insanely expensive and still can't process as fast as I can with a single core locally. The queue used with the service plan functions has 500k messages piled up.
Azure Functions does not listen for an item to be added to a queue; it polls the queue using a polling algorithm, which you can override with the maxPollingInterval property.
Adding "maxPollingInterval": "00:00:01" to the options you have already mentioned above should solve your problem.
maxPollingInterval Azure documentation
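For illustration, merging that into the queues section from Update 1 would look roughly like this (whether maxPollingInterval takes a TimeSpan string or a value in milliseconds depends on the Functions runtime version, so check the linked documentation for your version):
{
    "queues": {
        // Longest interval the runtime waits between polls of an empty queue.
        "maxPollingInterval": "00:00:01",
        "visibilityTimeout": "00:00:10",
        "batchSize": 32,
        "maxDequeueCount": 5,
        "newBatchThreshold": 100
    }
}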

Azure Functions EventHub trigger scale job function instances

I have an Azure Function with an EventHub trigger, on the Consumption plan. In my test I send 3000 events to the event hub in a few batches. Since the time for those 3000 events was almost 10 times longer than for 300 events, I suspected that this Azure Function didn't scale out to multiple VMs/instances.
To verify that hypothesis, I used a static Guid variable, which I initialized once and logged in every run of the function. All 3000 runs logged the same Guid.
That happens even if I specify the following configuration in host.json:
"eventHub": {
"maxBatchSize": 1,
"prefetchCount": 10
}
The logic was that this would limit parallel processing within a single instance and force additional instances to be started, but again only one Guid is logged.
As a note, this is not the only function in the App Service. Could that be the issue? What condition needs to be satisfied for the Function to be started on multiple VMs?
Edit:
I have 32 partitions and 20 throughput units. The first issue was that I was using SendBatchAsync, which doesn't partition events. Even SendAsync didn't bring any scale; it behaved as if it wasn't partitioning. So I created partitioned event hub senders and did round-robin partitioning when sending events from the client application.
That increased the number of events processed by the Azure Function, but still didn't create more than 1 VM.
Furthermore, the number of events processed per second was much higher in the beginning (~200 at any given moment), and after 2000 events, near the end, it dropped to ~5. This has nothing to do with the load on the system, as the same behavior was observed with 9000 events, where the slowdown happened after ~5k events.
This Azure Function takes 50-250 ms per event, depending on the load.
It also sends an event to another Azure Function through an Azure Storage Queue trigger. What is interesting is that the queue-triggered function doesn't scale to more than 1 VM either, and it has ~1k messages in its queue at the beginning, before the event hub triggered function slows down. Queue settings in host.json are:
"queues": {
    "maxPollingInterval": 2000,
    "visibilityTimeout": "00:00:10",
    "batchSize": 32,
    "maxDequeueCount": 5,
    "newBatchThreshold": 1
}
Thanks.
It depends on a few factors:
the number of partitions your event hub has and whether the events you are writing are being distributed across your partitions. Azure Functions uses Event Processor Host to process your workload and the maximum scale you can get in this mode is one VM per partition.
the per-event workload you're executing. For example, if your function does nothing but log, those 3000 events could be processed in less than 5 seconds on a single VM. This would not warrant scaling your application onto multiple instances.
However if you're writing a batch of events across several partitions which takes several minutes in total to process and you don't see your throughput accelerating as your function scales up then that could indicate that something is not working right and would warrant further investigation.
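On the sending side, the round-robin approach over partitioned senders described in the question's edit (or an equivalent partition key scheme) is what lets Event Processor Host fan out toward one VM per partition. A rough sketch using the older Microsoft.ServiceBus.Messaging EventHubClient; the hub name "myhub" and the "EventHubConnection" setting are placeholders:
using System;
using System.Text;
using Microsoft.ServiceBus.Messaging;

class SendEvents
{
    static void Main()
    {
        var client = EventHubClient.CreateFromConnectionString(
            Environment.GetEnvironmentVariable("EventHubConnection"), "myhub");

        const int partitionCount = 32; // matches the 32 partitions described above

        for (var i = 0; i < 3000; i++)
        {
            var data = new EventData(Encoding.UTF8.GetBytes("event " + i));

            // Option 1: set a partition key and let the service hash it across partitions.
            data.PartitionKey = (i % partitionCount).ToString();
            client.Send(data);

            // Option 2 (what the question's edit describes): create one sender per
            // partition and round-robin over them:
            //   var sender = client.CreatePartitionedSender((i % partitionCount).ToString());
            //   sender.Send(data);
        }

        client.Close();
    }
}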

How to maximize WebJob CPU Usage

I have an Azure Storage queue that has over 100,000 queue items on it. The average processing time is about 1 minute to complete each item (as reported in the WebJob dashboard).
I have set the max batch size for my WebJob to 32 like this:
JobHostConfiguration config = new JobHostConfiguration();
config.Queues.BatchSize = 32;
var host = new JobHost(config);
// The following code ensures that the WebJob will be running continuously
host.RunAndBlock();
If I set it any higher than 32, the WebJob won't start and keeps flipping between Pending Restart and Starting, so I assume 32 is the max batch size.
However, my App Service plan is running at a cool 4% CPU utilization. I have enabled auto-scale based on CPU usage.
What I want to do is figure out how to make the WebJob do more tasks in parallel, so it starts using more of that CPU if it needs it and hopefully causes it to auto-scale and process more. What levers can I pull to make my WebJob take better advantage of my App Service plan instances?
Note that the BatchSize maximum of 32 is a limit imposed by Azure Queues that the WebJobs SDK doesn't control. A single queue listener can only pull a maximum of 32 messages at a time because that’s all queues allow. That's why your job is not starting properly when you set it greater than 32 - if you check your error logs you should see an error to that effect.
However, there is a second config knob that relates to parallel throughput that you can also configure. See config.Queues.NewBatchThreshold. This value defaults to half the BatchSize when not explicitly set. Basically, this setting is the threshold that governs when a new batch will be fetched. So if you increase this value (say setting it to 100), more queue messages will be processed in parallel. If set to 100, when the number of messages being processed dips below 100, a new batch will be fetched.
You can also further increase throughput by scaling out your job to multiple instances. I recommend trying the NewBatchThreshold setting first and see where that gets you.
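Put concretely, and using 100 only because it is the example value from above, the configuration would look something like this:
using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        JobHostConfiguration config = new JobHostConfiguration();

        // Maximum messages fetched per poll; capped at 32 by Azure Queues.
        config.Queues.BatchSize = 32;

        // Fetch another batch as soon as the number of messages still being
        // processed drops below this threshold, so up to
        // NewBatchThreshold + BatchSize messages can run in parallel.
        config.Queues.NewBatchThreshold = 100;

        var host = new JobHost(config);
        host.RunAndBlock();
    }
}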
This comment in the code explains the situation:
// Azure Queues currently limits the number of messages retrieved to 32. We enforce this constraint here because
// the runtime error message the user would receive from the SDK otherwise is not as helpful.
private const int MaxBatchSize = 32;
More information about this can be found on https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-queues/:
There are two ways you can customize message retrieval from a queue. First, you can get a batch of messages (up to 32). [etc...]
So that's where this limit is coming from. However, I'm thinking that the WebJobs SDK could theoretically process multiple queue batches at the same time, so it doesn't have to be bound to this Storage Queue limitation. That's something that you should bring up on https://github.com/Azure/azure-webjobs-sdk/issues for further discussion to see what can be done. But as it stands, that is indeed the limitation.

Azure service bus queue only receiving 450 messages at a time

Does Azure have some kind of limitation in terms of batch items received? The following code is only retrieving 450 messages despite being asked for more:
QueueConnector.MyQueueClient.ReceiveBatch(1000, new TimeSpan(0, 0, 10));
I've tried increased wait times, but it doesn't have any impact: 450, every time. This appears to be the recommended way of batch receiving in the Azure SDK docs.
Note: there are tens of thousands of items in the queue.
The count passed to ReceiveBatch is an upper bound, and that is also mentioned right in the docs, so this is expected behavior. Service Bus releases a batch based on message availability or batch size. Batches are capped at 256 KB for send and receive. For SendBatch, that is also stated in the docs.
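If you want roughly 1000 messages per pass, a sketch is to keep calling ReceiveBatch until you have enough or the queue goes quiet; QueueConnector.MyQueueClient here is the question's own helper:
using System;
using System.Collections.Generic;
using Microsoft.ServiceBus.Messaging;

// Each ReceiveBatch call is capped by message availability and the 256 KB
// batch size, so loop until we have the desired count or nothing comes back.
var messages = new List<BrokeredMessage>();
while (messages.Count < 1000)
{
    var batch = QueueConnector.MyQueueClient.ReceiveBatch(
        1000 - messages.Count, TimeSpan.FromSeconds(10));

    var before = messages.Count;
    if (batch != null)
    {
        messages.AddRange(batch);
    }
    if (messages.Count == before)
    {
        break; // queue went quiet within the wait time
    }
}
Whether you complete each message immediately or after processing is up to your error-handling strategy.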
