Azure Functions EventHub trigger: scaling to multiple function instances

I have an Azure Function with an EventHub trigger, running on the Consumption plan. In my test I send 3000 events to the event hub in a few batches. Since the time to process those 3000 events was almost 10 times longer than the time for 300 events, I suspected that this Azure Function didn't scale out to multiple VMs/instances.
To verify that hypothesis, I used a static Guid variable, which I initialized once and logged on every run of the function. All 3000 runs logged the same Guid.
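For reference, the probe looked roughly like this (a minimal sketch; the hub name and connection setting are illustrative):

using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class InstanceProbe
{
    // A static field is initialized once per host process, so seeing a
    // single value across all runs suggests one instance did all the work.
    private static readonly Guid InstanceId = Guid.NewGuid();

    [FunctionName("InstanceProbe")]
    public static void Run(
        [EventHubTrigger("myhub", Connection = "EventHubConnection")] string message, // illustrative names
        ILogger log)
    {
        log.LogInformation($"Instance {InstanceId} handled an event");
    }
}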
The same single Guid shows up even if I specify the following configuration in host.json:
"eventHub": {
"maxBatchSize": 1,
"prefetchCount": 10
}
The logic was that this would limit parallel processing within a single instance, so additional instances would have to be started to compensate, but again only one Guid was logged.
As a note, this is not the only function in the Function App. Could that be the issue? What condition needs to be satisfied for a Function to be started on multiple VMs?
Edit:
I have 32 partitions and 20 throughput units. The first issue was that I was using SendBatchAsync, which doesn't partition events. Even SendAsync didn't bring any scale, as if it wasn't partitioning either. So I created partitioned event hub senders and did round-robin partitioning when sending events from the client application.
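A minimal sketch of that round-robin sending, assuming the Microsoft.Azure.EventHubs client (the older Microsoft.ServiceBus.Messaging client has an equivalent CreatePartitionedSender); names are illustrative:

using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;

public class RoundRobinSender
{
    private readonly PartitionSender[] _senders;
    private int _next; // not thread-safe; fine for a single-threaded test client

    public RoundRobinSender(EventHubClient client, int partitionCount)
    {
        _senders = new PartitionSender[partitionCount];
        for (var i = 0; i < partitionCount; i++)
        {
            // One sender pinned to each partition id ("0".."31" here).
            _senders[i] = client.CreatePartitionSender(i.ToString());
        }
    }

    public Task SendAsync(string payload)
    {
        // Rotate through the partitions so events spread evenly.
        var sender = _senders[_next++ % _senders.Length];
        return sender.SendAsync(new EventData(Encoding.UTF8.GetBytes(payload)));
    }
}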
That round-robin partitioning increased the number of events processed by the Azure Function, but still didn't create more than one VM.
Furthermore, the number of events processed per second was much higher in the beginning (~200 at any given moment), and after 2000 events, near the end, it dropped to ~5. This has nothing to do with system load, as the same behavior was observed with 9000 events, where the slowdown happened after ~5k events.
This Azure Function takes 50-250 ms per execution, depending on the load.
It also sends an event to another Azure Function through an Azure Storage Queue trigger. What is interesting is that this queue-triggered function doesn't scale to more than one VM either, and it has ~1k messages sitting in its queue at the beginning, before the event-hub-triggered function slows down. Queue settings in host.json are:
"queues": {
    "maxPollingInterval": 2000,
    "visibilityTimeout": "00:00:10",
    "batchSize": 32,
    "maxDequeueCount": 5,
    "newBatchThreshold": 1
}
Thanks.

It depends on a few factors:
the number of partitions your event hub has, and whether the events you are writing are distributed across those partitions. Azure Functions uses Event Processor Host to process your workload, and the maximum scale you can get in this mode is one VM per partition.
the per-event workload you're executing. For example, if your function does nothing but log, those 3000 events could be processed in less than 5 seconds on a single VM. This would not warrant scaling your application onto multiple instances.
However, if you're writing a batch of events across several partitions which takes several minutes in total to process, and you don't see your throughput accelerating as your function scales out, that could indicate that something is not working right and would warrant further investigation.
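As an aside, explicit partition senders are not the only way to spread load; here is a sketch of the partition-key approach, where Event Hubs hashes the key to a partition (assuming the Microsoft.Azure.EventHubs client; connection string and key are illustrative):

using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;

class PartitionKeySend
{
    static async Task Main()
    {
        var client = EventHubClient.CreateFromConnectionString("<event hub connection string>");
        // Events sharing a key land on the same partition; distinct keys
        // (e.g. one per device) spread the load across partitions.
        await client.SendAsync(
            new EventData(Encoding.UTF8.GetBytes("payload")),
            partitionKey: "device-42");
        await client.CloseAsync();
    }
}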

Related

Azure Service Bus Queue Performance

I am using an Azure Service Bus queue for one of my requirements. The requirement is simple: an Azure Function acts as an API and creates multiple jobs in the queue. The function is scalable and creates new instances on demand. The jobs it creates are processed by a Windows service, so the sender is the Azure Function and the receiver is the Windows service. Since the function is scalable, multiple function instances will execute in parallel, so jobs will be created in the queue in parallel, probably one job every 500 ms. The Windows service is a single instance: a queue listener that listens to this queue and executes jobs in parallel. So there may be many senders, but the receiver is one instance, and the number of jobs running in parallel must be limited (to 4, since each takes a lot of time and CPU). Right now I am using an Azure Service Bus queue with the following configuration. My doubt is which configuration produces the best performance for this particular requirement.
The deletion of the job in the queue will not be an issue for me. So, can I use ReceiveAndDelete instead of PeekLock?
Also, right now the items received by the listener are not in order. I want to maintain the order in which they were created. My requirement is maximum performance. The job done by the Windows service is a CPU-intensive task, which is why I have limited it to 4, since the system has 4 cores.
Max delivery count: 4, message lock duration: 5 min, MaxConcurrentCalls: 4 (in the listener). I am new to Service Bus, I need a suggestion for this.
One more doubt: let's say the listener got 4 jobs in parallel and started executing them, and one job completed. Will the listener pick up the next item immediately, or wait for all 4 jobs to complete (MaxConcurrentCalls: 4)?
The deletion of the job in the queue will not be an issue for me. So, can I use ReceiveAndDelete instead of PeekLock?
Receiving messages in PeekLock mode will be less performant than ReceiveAndDelete; with ReceiveAndDelete you save the roundtrips to the broker needed to complete messages.
Max delivery count: 4, message lock duration: 5 min, MaxConcurrentCalls: 4 (in the listener). I am new to Service Bus, I need a suggestion for this.
MaxDeliveryCount is how many times a message can be attempted before it's dead-lettered. Yours happens to equal the number of cores, but it doesn't need to; that's likely just a coincidence.
MessageLockDuration only matters if you use the PeekLock receive mode; for ReceiveAndDelete it is irrelevant.
As for concurrency, even though your work is CPU-bound, I'd benchmark whether higher concurrency is possible.
An additional parameter on the message receiver to look into is PrefetchCount. It can improve overall performance by making fewer roundtrips to the broker.
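A minimal receiver sketch tying these settings together, assuming the Microsoft.Azure.ServiceBus client (queue name, connection string, and handler body are illustrative):

using System;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

class Listener
{
    static void Main()
    {
        // ReceiveAndDelete skips the completion roundtrip to the broker.
        var client = new QueueClient("<connection string>", "jobs", ReceiveMode.ReceiveAndDelete)
        {
            PrefetchCount = 16 // fetch ahead to cut broker roundtrips
        };

        client.RegisterMessageHandler(
            async (message, token) =>
            {
                // CPU-bound job; offloaded so the receive pump stays free.
                await Task.Run(() => DoCpuWork(message.Body));
            },
            new MessageHandlerOptions(args =>
            {
                Console.WriteLine(args.Exception);
                return Task.CompletedTask;
            })
            {
                MaxConcurrentCalls = 4, // at most 4 handlers in flight
                AutoComplete = false    // nothing to complete in ReceiveAndDelete mode
            });

        Console.ReadLine();
    }

    static void DoCpuWork(byte[] body) { /* CPU-intensive processing */ }
}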
One more doubt: let's say the listener got 4 jobs in parallel and started executing them, and one job completed. Will the listener pick up the next item immediately, or wait for all 4 jobs to complete (MaxConcurrentCalls: 4)?
The listener will immediately start processing the 5th message, since your concurrency is set to 4 and one message's processing has completed, freeing a slot.
Also, right now the items received by the listener are not in order. I want to maintain the order in which they were created.
To process messages in the order they were sent, you will need to send and receive messages using sessions.
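A sketch of session-based ordering with the same client library (the queue must be created with sessions enabled; queue name and session id are illustrative). Note that sessions are used with PeekLock, so this trades some of the ReceiveAndDelete savings for ordering:

using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

class OrderedJobs
{
    static async Task Main()
    {
        var client = new QueueClient("<connection string>", "jobs");

        // Sender: the same SessionId keeps related messages in order.
        await client.SendAsync(new Message(Encoding.UTF8.GetBytes("job-1"))
        {
            SessionId = "all-jobs"
        });

        // Receiver: messages within a session arrive in send order.
        client.RegisterSessionHandler(
            async (session, message, token) =>
            {
                // process the job, then complete it
                await session.CompleteAsync(message.SystemProperties.LockToken);
            },
            new SessionHandlerOptions(args => Task.CompletedTask)
            {
                MaxConcurrentSessions = 4,
                AutoComplete = false
            });

        Console.ReadLine();
    }
}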
My requirement is maximum performance. The job done by the Windows service is a CPU-intensive task, which is why I have limited it to 4, since the system has 4 cores.
There are multiple things to take into consideration. The location of your Windows service will impact latency and message throughput; scaling out could help; etc.

How do you scale an Azure Function App (background job) based on the number of items pending in a database?

Suppose you have an application that lets users request a job. For example (hypothetical): a user uploads a video, an entry is made in an RDBMS with the URL of the video on blob storage, and the status is set to "Pending".
There is a recurring timer-triggered Function App that executes every 10 seconds or so, gets 10 pending jobs from the RDBMS, and performs some compression etc.
The problem is that as long as the number of requests stays at 10-30 videos per 10 seconds we are fine. But if the number of requests suddenly increases, say to 200 requests per 10 seconds, there will be a lot of pending jobs and users will have to wait 10 times longer than usual to see the status change. How do you scale out the Function App automatically in such a scenario? Does it have to be manual?
There's an easier way to get fan-out and parallel processing through multiple concurrently running Azure Functions.
Add an Azure Service Bus Queue to your solution.
For each video that needs to be processed, enqueue a service bus message with the appropriate data you'll need to retrieve and process the video (like the BlobId).
Have your Azure Function triggered by a ServiceBusTrigger.
Azure will spin up additional instances of your Azure Function as the queue depth increases. It'll also scale in idle instances once there's no more data to process.
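A minimal sketch of such a queue-triggered worker, assuming the C# ServiceBusTrigger binding (queue name, connection setting, and message shape are illustrative):

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class VideoProcessor
{
    // One message is enqueued per video (carrying e.g. the BlobId);
    // Azure scales instances of this function with the queue depth.
    [FunctionName("ProcessVideo")]
    public static void Run(
        [ServiceBusTrigger("video-jobs", Connection = "ServiceBusConnection")] string blobId,
        ILogger log)
    {
        log.LogInformation($"Processing video blob {blobId}");
        // fetch the blob and run the compression here
    }
}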

Azure Functions Eventhub Input Trigger scaling

We have an Event Hub with a retention of 1 day, containing millions of messages. To consume these, we have an Azure Function which reads from this event hub via the Event Hub binding. The function basically reads the raw bytes, deserializes them into JSON, does some transformations, and outputs the result to another event hub.
It takes an EventData[] as input, to allow us to receive a batch of EventData at once. We have configured it to receive 1024 messages per batch.
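That batch size would be configured in host.json, roughly like this (a sketch, assuming the v2 eventHubs extension settings; the prefetch value is illustrative):

{
    "version": "2.0",
    "extensions": {
        "eventHubs": {
            "eventProcessorOptions": {
                "maxBatchSize": 1024,
                "prefetchCount": 2048
            }
        }
    }
}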
When we start the function and it needs to reprocess the last 24 hours, it only uses 1 node of the 5 available in the App Service plan, as can be seen in the metrics.
According to the docs, scaling should behave like this:
When your function is first enabled, there is only one instance of the function. Let's call this function instance Function_0. Function_0 has a single EventProcessorHost instance that holds a lease on all ten partitions. This instance is reading events from partitions 0-9. From this point forward, one of the following happens:
New function instances are not needed: Function_0 is able to process all 1000 events before the Functions scaling logic kicks in. In this case, all 1000 messages are processed by Function_0.
An additional function instance is added: The Functions scaling logic determines that Function_0 has more messages than it can process. In this case, a new function app instance (Function_1) is created, along with a new EventProcessorHost instance. Event Hubs detects that a new host instance is trying to read messages and load balances the partitions across its host instances. For example, partitions 0-4 may be assigned to Function_0 and partitions 5-9 to Function_1.
N more function instances are added: The Functions scaling logic determines that both Function_0 and Function_1 have more messages than they can process. New function app instances Function_2...Function_N are created, where N is greater than the number of event hub partitions. In our example, Event Hubs again load balances the partitions, in this case across the instances Function_0...Function_9.
I believe we're hitting option #1, even though we have 24 hours of data on the event hub and only 1 node processing it. At this rate it takes many hours to process, while 4 nodes sit idle.
How does Azure Function know when to scale in this scenario, and can we influence this behavior?

Azure Function Queue Trigger falling way behind

I'm working on a demo for Azure Functions using queue triggers. I created a recursive Sudoku solver to show how to take depth-first search and convert it to queued recursion. The code is on GitHub.
I was expecting it to scale out and process an insane number of messages per second, but it is barely processing 30/s. The queue is filling up and utilization seems minimal.
How can I get better performance from this? I tried increasing the batch size in host.json, but that didn't seem to help. I have over 200k messages in the queue and it's growing.
Update 1
I tried setting the host.json file as
{
    "queues": {
        "visibilityTimeout": "00:00:10",
        "batchSize": 32,
        "maxDequeueCount": 5,
        "newBatchThreshold": 100
    }
}
but requests per second remained the same.
I deployed the same function to another instance, but tied it to an S4 service plan. This one is able to process about 64 requests per second, but that still seems slow.
I can process the messages serially on my local machine way faster than this.
Update 2
I scaled the S4 plan to 10 instances and each instance handles about 60-70 requests per second. But that's insanely expensive for something that still can't process as fast as a single core on my local machine. The queue used by the service-plan functions has 500k messages piled up.
Azure Functions do not listen for an item to be added to a queue; they actually poll the queue using a polling algorithm, which you can override with the maxPollingInterval property.
Adding "maxPollingInterval": "00:00:01" to the options you have already mentioned above should solve your problem.
maxPollingInterval azure documentation

Azure Functions Event Hub trigger bindings

I just have a couple of questions regarding the usage of Azure Functions with an Event Hub in an IoT scenario.
An Event Hub has partitions, and typically messages from a specific device go to the same partition. How are the instances of an Azure Function distributed across Event Hub partitions? Is it based on performance? If one instance of the Azure Function manages to process events from all partitions, then it is enough; otherwise might one end up with one instance of the Azure Function per Event Hub partition?
What about the read offset? Does this binding somehow record where it stopped reading the event stream? I thought functions were meant to be stateless, and here we have some state.
Thanks
Each instance of an Event Hub-triggered Function is backed by only 1 EventProcessorHost (EPH) instance. Event Hubs ensures that only 1 EPH can hold a lease on a given partition.
Answer to Question 1:
Let's elaborate on this with a contrived example. Suppose we begin with the following setup and assumptions for an EventHub:
10 partitions.
1000 events distributed evenly across all partitions => 100 messages in each partition.
When your Function is first enabled, there is only 1 instance of the Function. Let's call this Function instance Function_0. Function_0 will have 1 EPH that manages to get a lease on all 10 partitions. Let this EPH be called EPH_0; it will start reading events from partitions 0-9. From this point forward, one of the following will happen:
Only 1 Function instance is needed - Function_0 is able to process all 1000 events before Azure Functions' scaling logic kicks in.
Hence, all 1000 messages are processed by Function_0.
Add 1 more Function instance - Azure Functions' scaling logic determines that Function_0 seems sluggish, so a new instance Function_1 is created, resulting in EPH_1. Event Hubs detects that a new EPH instance is trying to read messages and starts load balancing the partitions across the EPH instances, e.g., partitions 0-4 are assigned to EPH_0 and partitions 5-9 to EPH_1. If all Function executions succeed without errors, both EPH_0 and EPH_1 checkpoint successfully, and all 1000 messages are processed. Once check-pointing succeeds, all 1000 messages should never be retrieved again.
Add N more Function instances - Azure Functions' scaling logic determines that both Function_0 and Function_1 are still sluggish and repeats workflow 2 for Function_2...Function_N, where N > 9. Event Hubs load balances the partitions across the Function_0...Function_9 instances.
Unique to Azure Functions' current scaling logic is the fact that N can be greater than the number of partitions. This is done to ensure that there are always EPH instances readily available to quickly get a lock on partition(s). As a customer, you are only charged for the resources used when your Function instance executes; you are not charged for this over-provisioning.
Answer to Question 2:
EPH uses a check-pointing mechanism to mark the last known successfully read message. An EventHub-triggered Function can be set up to process 1 message or a batch of messages at a time. The option you choose needs to consider the following:
1. Speed of message processing - Processing messages in batches instead of a single message at a time is one of the factors that speeds up your Azure Function workflow's ability to keep up with the incoming messages in your Event Hub.
2. Tolerance for duplicates - If check-pointing fails due to errors in your Function code/(updated Aug 24th, 2017) a timeout/a partition lease being lost, then the next EPH that gets a lease on that partition will start retrieving messages from the last known checkpoint. Event Hubs guarantees at-least-once delivery but not at-most-once delivery, and Azure Functions will not attempt to change that behavior. If not having duplicate messages is a priority, then you will need to mitigate it in your workflow. As such, when check-pointing fails, there are more duplicate messages to manage if your Function is processing messages at the batch level.
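For illustration, a minimal batch-level EventHub-triggered function, assuming the C# EventHubTrigger binding (hub name and connection setting are illustrative). The checkpoint advances only after the invocation succeeds, so a failure can cause the whole batch to be redelivered:

using System.Text;
using Microsoft.Azure.EventHubs;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class BatchConsumer
{
    [FunctionName("BatchConsumer")]
    public static void Run(
        [EventHubTrigger("myhub", Connection = "EventHubConnection")] EventData[] events,
        ILogger log)
    {
        foreach (var e in events)
        {
            // EventData.Body is an ArraySegment<byte> in this client.
            var payload = Encoding.UTF8.GetString(e.Body.Array, e.Body.Offset, e.Body.Count);
            log.LogInformation($"Processing: {payload}");
        }
    }
}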
Function Apps are based on the WebJobs SDK, which uses EventProcessorHost to consume events from Event Hubs. So you can look up information about EventProcessorHost and it will be applicable to your Function App.
In particular, you can find the implementation of IEventProcessor here.
To your questions:
Not sure what you mean by "one instance". One listener is created per partition, but they can all be hosted inside a single App Plan instance if the load is low. At a high level you should not care much: in the Consumption plan you pay per execution time, no matter how many servers/processes/threads are running. Of course, you should care whether the auto-scaling works well enough under high load, but that needs to be tested anyway.
Functions are stateless in the sense that you can't save anything in memory between two function executions. You are totally fine saving state in external storage. The Function App will use PartitionContext.CheckpointAsync() to checkpoint the current offset; Azure Storage is used internally. Again, you can read more about how that works in the Event Hubs and EventProcessorHost docs, e.g. here.
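To make the checkpointing concrete, here is a bare-bones IEventProcessor along those lines (a sketch, assuming the Microsoft.Azure.EventHubs.Processor types):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;
using Microsoft.Azure.EventHubs.Processor;

public class SimpleProcessor : IEventProcessor
{
    public Task OpenAsync(PartitionContext context) => Task.CompletedTask;

    public Task CloseAsync(PartitionContext context, CloseReason reason) => Task.CompletedTask;

    public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
        foreach (var e in messages)
        {
            // process each event here
        }
        // Persist the offset of the last processed event (to Azure Storage);
        // a new lease holder resumes from this checkpoint.
        await context.CheckpointAsync();
    }

    public Task ProcessErrorAsync(PartitionContext context, Exception error) => Task.CompletedTask;
}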
