Disabling Azure Function multiple simultaneous triggers - node.js

I have an Azure Function App with 7 Functions written in Node.js, deployed through Bitbucket CI, and the following host.json file at the root of the project:
{
  "id": "...",
  "queues": {
    // retrieve only 1 queue message at a time
    "batchSize": 1,
    "maxDequeueCount": 1,
    "newBatchThreshold": 0
  }
}
As far as I know, this should ensure that only 1 queue-triggered function runs at a time.
However, if I put the following statement in my function:
context.log(`Started processing at ${new Date()}`);
I still see cases where 2 queue messages have the exact same start time.
Any idea what I am missing?
Is this batchSize property used per function or per queue?

If you look in the host.json documentation, you can see that batchSize is per (job) function.
Here's where Azure Functions pulls the batch, in the queue listener.
There is a separate queue listener created per QueueTrigger binding, so batchSize is per-function. In this case, as you have 7 functions triggering from the same queue, you could have as many as 7 queue messages processing simultaneously on this instance in separate function executions.
I verified by writing a webjobs console app that has two functions triggered from the same queue. I could see that there were two separate QueueListener objects (via GetHashCode), one for each function, listening on the same queue simultaneously.
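For illustration, here is a minimal C# sketch (hypothetical names; the question's functions are Node.js, but the mechanics are the same) of two functions bound to the same queue. Each binding gets its own listener, so each can independently pull up to batchSize messages:

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class SharedQueueFunctions
{
    // Two QueueTrigger bindings on the SAME queue: the runtime creates a
    // separate QueueListener for each, so with batchSize = 1 you can still
    // see 2 messages processing simultaneously (one per function).
    [FunctionName("FunctionA")]
    public static void RunA([QueueTrigger("shared-queue")] string message, ILogger log)
    {
        log.LogInformation($"FunctionA started processing at {System.DateTime.UtcNow}");
    }

    [FunctionName("FunctionB")]
    public static void RunB([QueueTrigger("shared-queue")] string message, ILogger log)
    {
        log.LogInformation($"FunctionB started processing at {System.DateTime.UtcNow}");
    }
}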
Why do you want to disable simultaneous execution?

Related

Waiting for an Azure Function durable orchestration to complete

I'm currently working on a project where I'm using a storage queue to pick up items for processing. The storage-queue-triggered function picks up the item from the queue and starts a durable orchestration. According to the documentation (https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-queue), the storage queue trigger picks up 16 messages (by default) in parallel for processing. Since starting the orchestration is a simple and quick process, if I have a lot of messages in the queue I will end up with a lot of orchestrations running at the same time. I would like to start the orchestration and wait for it to complete before the next batch of messages is picked up, in order to avoid overloading my systems. The solution I came up with, which seems to work, is:
public class QueueTrigger
{
    [FunctionName(nameof(QueueTrigger))]
    public async Task Run(
        [QueueTrigger("queue-processing-test", Connection = "AzureWebJobsStorage")] Activity activity,
        [DurableClient] IDurableOrchestrationClient starter,
        ILogger log)
    {
        log.LogInformation($"C# Queue trigger function processed: {activity.ActivityId}");
        string instanceId = await starter.StartNewAsync<Activity>(nameof(ActivityProcessingOrchestrator), activity);
        log.LogInformation($"Started orchestration with ID = '{instanceId}'.");

        // Busy-wait until the orchestration is no longer pending or running.
        var status = await starter.GetStatusAsync(instanceId);
        do
        {
            status = await starter.GetStatusAsync(instanceId);
        } while (status.RuntimeStatus == OrchestrationRuntimeStatus.Running
              || status.RuntimeStatus == OrchestrationRuntimeStatus.Pending);
    }
}
which basically picks up the message, starts the orchestration, and then waits in a do/while loop while the status is Pending or Running.
Am I missing something here or is there any better way of doing this (I could not find much online).
Thanks in advance for your comments or suggestions!
This might not work: you could either hit timeouts, causing duplicate orchestration runs, or just force your function app to scale out, defeating the purpose of your code altogether.
Instead, you could rely on the concurrency throttles that Durable Functions comes with. While the queue trigger would queue up orchestration runs, only the defined maximum would run at any time on a single instance of the function app.
This would still cause your function app to scale out, so you would have to consider that as well when setting this limit, and you could also set the WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT app setting to control how many instances your function app can scale out to.
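For reference, a minimal host.json sketch of those Durable Functions throttles (the values are illustrative, not recommendations):

{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "maxConcurrentOrchestratorFunctions": 5,
      "maxConcurrentActivityFunctions": 5
    }
  }
}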
It could be that the function app's built-in scaling throttling does not reduce load on downstream services, because it is per app and will just cause the app to scale more. What is needed then is a distributed max instance count that all app instances adhere to. I have built this functionality into my Durable Functions orchestration app with a scaleGroupId and its max instance count. It has an API call to save this info, and the scaleGroupId is a string that can be set to anything that describes the resource you want to protect from overloading. Here is my app that can do this:
Microflow

How to limit concurrent Azure Function executions

I've seen this problem expressed a lot but I've yet to find a working solution.
In short, I periodically have a large batch of processing operations to be done. Each operation is handled by an Azure Function. Each operation makes calls to a database. If I have too many concurrent functions running at the same time, this overloads the database and I get timeout errors. So, I want to be able to limit the number of concurrent Azure Function calls that are run at a single time.
I've switched the function to be queue-triggered and tweaked the batchSize / newBatchThreshold / maxDequeueCount host.json settings in many ways based on what I've seen online. I've also set the WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT application setting to 1 in my function application settings to prevent more than one VM from being spawned.
Yet still, every time I fill that queue, multiple functions will spawn indiscriminately and my database will fall over.
How can I throttle the number of concurrent operations?
The problem ended up being a difference in the format of host.json between V1 and V2 Functions. Below is the correct configuration (using Microsoft.Azure.WebJobs.Extensions.Storage version 3.0.1 or later). The following host.json configures a single function app to process queue messages sequentially.
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 1,
      "newBatchThreshold": 0
    }
  }
}
Setting the app setting WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT = 1 restricts the function app from dynamically scaling out beyond one instance.
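For contrast, in V1 the same settings sat at the root of host.json with no extensions wrapper (as in the first question above), which is why a V1-style file does not take effect on V2:

{
  "queues": {
    "batchSize": 1,
    "newBatchThreshold": 0
  }
}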

Azure Queue trigger - execute one message at a time not working

I have an Azure queue trigger associated with a queue, and I want to ensure that the trigger reads and executes only one message at a time, so that when a message finishes executing (successfully or not) it processes the next message.
What is happening is that the trigger executes one message, yet begins to execute another message before the first finishes. My host.json:
"queues": {
"maxPollingInterval": 20000,
"visibilityTimeout": "00:01:00",
"batchSize": 1,
"maxDequeueCount": 5,
"newBatchThreshold": 1
}
I am following the instructions from this MS link:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-queue#trigger---configuration
If you want to avoid parallel execution for messages received on one queue, you can set batchSize to 1
So it would be expected to run only one message at a time (I'm using the Consumption plan).
This is critical because I need to ensure that only one message is processed at a time.
Is there any setting that I could change?
Or is queue trigger not a good option to address this requirement?
Storage Queues do NOT guarantee ordering, so if you need sequential processing because order of delivery matters, you should consider Azure Service Bus and set the following in the function's host.json:
maxConcurrentCalls = 1
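For reference, a minimal V2 host.json sketch for that setting (section layout per the Service Bus extension; the value of 1 is the point, everything else is illustrative):

{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "messageHandlerOptions": {
        "maxConcurrentCalls": 1
      }
    }
  }
}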
Even if you do the trick of capping the maximum number of instances that the function app can scale out to, as follows, ordering is still not guaranteed with Azure Storage Queues.
WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT = 1
The Microsoft documentation is not perfect and is continuously updated.
If you want to minimize parallel execution for queue-triggered functions in a function app, you can set the batch size to 1. But this setting eliminates concurrency only so long as your function app runs on a single virtual machine (VM).
If you have multiple virtual machines, with function instances on each VM, one message will be processed per function instance running on each VM.
This Microsoft document explains concurrency on triggers.
For those coming across this question looking to debug locally and finding that multiple queue items make this difficult: you can add the following to your local.settings.json to override the default behaviour on your machine only:
{
  "IsEncrypted": false,
  "Values": {
    "AzureFunctionsJobHost__extensions__queues__batchSize": "1"
  }
}
Documentation

Do Azure Functions triggered by storage queue take single message or all messages?

If I create an Azure Function that is triggered by storage queue messages... will the system launch multiple parallel functions, each reading a message from the queue, or will a single function get called that reads in all available messages?
In short, are queued messages handled individually or in batches?
API-wise, your function will be called once for each individual message in the queue.
But the Azure Functions runtime will retrieve and process messages in batches, calling several instances of your function in parallel.
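To make the first point concrete, a minimal C# sketch (queue name is illustrative; the binding parameter is a single message, not an array):

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class QueueMessageFunction
{
    // Invoked once per queue message; the runtime may run up to
    // batchSize of these invocations in parallel per instance.
    [FunctionName("ProcessQueueItem")]
    public static void Run([QueueTrigger("myqueue-items")] string queueItem, ILogger log)
    {
        log.LogInformation($"Processing message: {queueItem}");
    }
}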
First, as Mikhail said, the Azure Functions runtime retrieves and processes queue messages in batches. The default batchSize is 16 and the maximum batchSize is 32.
Besides, we can configure 'queue' triggers and specify/modify batchSize in the host.json file.
Configuration settings for 'queue' triggers
"queues": {
"maxPollingInterval": 2000,
"visibilityTimeout" : "00:00:10",
"batchSize": 16,
"maxDequeueCount": 5,
"newBatchThreshold": 8
}
It doesn't handle all the messages in one go, but it does support message batching for triggers that allow it. In order to enable batching, you make the function's input an array of the type rather than the type itself (e.g. EventData[] rather than EventData); then batching applies. You can set the batch size up to 32, as #Fei mentioned.
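For instance, a minimal sketch of the array-binding idea using the Event Hubs trigger this answer references (hub name and connection setting are hypothetical):

using Microsoft.Azure.EventHubs;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class EventBatchFunction
{
    // Binding to EventData[] (rather than EventData) delivers a whole
    // batch of events in a single invocation.
    [FunctionName("ProcessEventBatch")]
    public static void Run(
        [EventHubTrigger("myhub", Connection = "EventHubConnection")] EventData[] events,
        ILogger log)
    {
        log.LogInformation($"Received a batch of {events.Length} events");
    }
}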
Check out the following link, which talks about it briefly:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-best-practices

Weird behaviour with Task Parallel Library Framework and Azure Instances

I need some help solving a problem involving the Task Parallel Library with Azure instances. Below is the code for my Worker Role.
Whenever I upload multiple files, a request is inserted into the queue, and the worker process continuously queries the queue and gets the message. Once a message is retrieved, I do some long-running processing. I used the task scheduler so that multiple requests are served by multiple task instances on multiple role instances.
Now the question is: if one instance takes a message from the queue, assigns the message to a task, and processes it, I see another instance also retrieve the same message from the queue and process it. Because of that, my tasks are executed multiple times.
Please help me with this problem. My requirement is that each message is handled by a single task on a single Azure instance, not by multiple tasks.
public override void Run()
{
    // Step 1: get the message from the queue
    // Step 2: process it on a background task
    var task = Task<string>.Factory.StartNew(() =>
    {
        try
        {
            // Delete the message from the queue, then process it
            PopulateBlobtoTable(uri, localStoragePath);
        }
        catch (Exception ex)
        {
            Trace.WriteLine(ex.Message);
            throw;
        }
        finally
        {
        }

        return "Finished!";
    });

    try
    {
        task.Wait();
    }
    catch (AggregateException ae)
    {
        foreach (var exception in ae.InnerExceptions)
        {
            Trace.WriteLine(exception.Message);
        }
    }
}
I'm assuming you are using Windows Azure Storage queues, which have a default invisibility timeout of 90 seconds, when using the storage client APIs. If your message is not completely processed and explicitly deleted within that time period, it will reappear on the queue.
While you can increase this invisibility timeout to up to seven days when you add the message to the queue, you should be using operations that are idempotent, meaning it doesn't matter if the message is processed multiple times. It's your job to ensure idempotence, perhaps by recording a unique id (in table storage, SQL database, etc.) associated with each message and ignoring the message if you see it a second time and you find it's already been marked complete.
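A minimal sketch of that dedup idea (the in-memory store below is a stand-in for table storage or a SQL database; all names are hypothetical):

using System;
using System.Collections.Concurrent;
using System.Diagnostics;

class IdempotentProcessor
{
    // Ids of messages already processed; in production this would live in
    // durable storage (table storage, SQL database, etc.), not in memory.
    private readonly ConcurrentDictionary<string, bool> _completed =
        new ConcurrentDictionary<string, bool>();

    public void Process(string messageId, Action work)
    {
        // If the queue redelivers a message we already handled, skip it.
        if (_completed.ContainsKey(messageId))
        {
            Trace.WriteLine($"Skipping duplicate message {messageId}");
            return;
        }

        work();

        // Mark complete only after the work succeeds.
        _completed[messageId] = true;
    }
}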
You might also look at Windows Azure Queues and Windows Azure Service Bus Queues - Compared and Contrasted. You'll note Service Bus queues have some additional constructs you can use to guarantee at-most-once (and at-least-once) delivery.
Now the question is: if one instance takes a message from the queue, assigns the message to a task, and processes it, I see another instance also retrieve the same message from the queue and process it. Because of that, my tasks are executed multiple times.
Are you getting the messages via "GET" semantics? If that's the case, what visibility timeout have you set for your messages? When you "GET" a message, it becomes invisible to other callers (read: "instances" in your case) for a period of time, which you can specify using the visibility timeout. Check out the documentation here: http://msdn.microsoft.com/en-us/library/windowsazure/ee758454.aspx
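For example, a sketch with the classic storage client of that era (connection string and queue name are illustrative):

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

class QueueReader
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("work-items");

        // GET a message with a 5-minute visibility timeout: other callers
        // (other instances) will not see it until the timeout expires.
        CloudQueueMessage message = queue.GetMessage(TimeSpan.FromMinutes(5));
        if (message != null)
        {
            // ... process the message ...

            // Delete before the timeout expires, or it reappears on the queue.
            queue.DeleteMessage(message);
        }
    }
}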
