Azure (Durable) Functions - Managing parallelism

I'm posting this question to see if I'm understanding parallelism in Azure Functions correctly, and particularly Durable Functions.
The ability to set a maximum degree of parallelism was recently added to Azure Functions and can be configured via the az CLI:
https://github.com/Azure/azure-functions-host/issues/1207
az resource update --resource-type Microsoft.Web/sites -g <resource_group> -n <function_app_name>/config/web --set properties.functionAppScaleLimit=<scale_limit>
I've applied this to my Function App, but what I'm unsure of is how this plays with the MaxConcurrentOrchestratorFunctions and MaxConcurrentActivityFunctions settings for Durable Functions.
Would the below lead to a global max of 250 concurrent activity functions?
functionAppScaleLimit: 5
MaxConcurrentOrchestratorFunctions: 5
MaxConcurrentActivityFunctions: 10

Referring to the link you shared: functionAppScaleLimit specifies the maximum number of instances for your function app. As for the Durable Functions settings: MaxConcurrentOrchestratorFunctions sets the maximum number of orchestrator functions that can be processed concurrently on a single host instance, and MaxConcurrentActivityFunctions sets the maximum number of activity functions that can be processed concurrently on a single host instance.
To help you understand how this works, here is what MaxConcurrentOrchestratorFunctions actually does:
MaxConcurrentOrchestratorFunctions controls how many orchestrator functions can be loaded into memory at any given time. If you set concurrency to 1 and then start 10 orchestrator functions, only one will be loaded in memory at a time. Remember that if an orchestrator function calls an activity function, the orchestrator function will unload from memory while it waits for a response. During this time, another orchestrator function may start. The effect is that you will have as many as 10 orchestrator functions running in an interleaved way, but only 1 should actually be executing code at a time.
The motivation for this feature is to limit the CPU and memory used by orchestrator code. It's not going to be useful for implementing any kind of singleton pattern. If you want to limit the number of active orchestrations, you will need to implement that limit yourself.
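For reference, a minimal host.json sketch (Durable Functions 2.x schema) showing where these settings live, using the values from the question:

{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "maxConcurrentOrchestratorFunctions": 5,
      "maxConcurrentActivityFunctions": 10
    }
  }
}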

Your global max of concurrent activity functions would be 50, not 250: 5 app instances as specified by functionAppScaleLimit, times 10 activity functions per instance as specified by MaxConcurrentActivityFunctions. MaxConcurrentOrchestratorFunctions caps orchestrators separately and does not multiply into that figure. The relationship between the number of orchestrator function executions and activity function executions depends entirely on your specific implementation. You could have 1-1,000 orchestrations that spawn 1-1,000 activities. Regardless, the settings you propose ensure there are never more than 5 orchestrators and 10 activities running concurrently on a single function instance.

Related

Could Durable Functions reduce my execution time?

I can execute a process "x" in parallel using Azure Durable Functions fan-out/fan-in.
If I divide my single process "x" into multiple processes using this concept, can I reduce the execution time of the function?
In general, Azure Functions Premium allows for higher timeout values. So if you don't want to deal with the issue, just upgrade ;-)
Azure Durable Functions might or might not reduce your total runtime.
BUT every activity call is a new function execution with its own timeout.
Whether you fan out or call activities in series, you will avoid timeout issues as long as no single activity exceeds the function timeout period.
If, however, you have an activity that will run for an extended period, you will need Premium Functions anyway. But your solution with "batch" processing looks quite promising for avoiding this.
Making use of the fan-out/fan-in approach, you run tasks in parallel instead of sequentially, so the total duration of the execution will be roughly the duration of your longest single task. It's the best approach to use when the tasks do not require information from each other to process.
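As a rough illustration, a minimal fan-out/fan-in orchestrator in C# (Durable Functions 2.x) could look like the sketch below; the "ProcessItem" activity name and the input shape are assumptions, not part of your setup:

using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class FanOutFanIn
{
    [FunctionName("FanOutFanIn")]
    public static async Task<long> Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var items = context.GetInput<string[]>();

        // Fan out: schedule one activity per item; they run in parallel.
        var tasks = items
            .Select(item => context.CallActivityAsync<long>("ProcessItem", item))
            .ToList();

        // Fan in: the total duration is roughly that of the slowest activity.
        var results = await Task.WhenAll(tasks);
        return results.Sum();
    }
}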
If you don't want to use Durable Functions, you could make use of the Task-based Asynchronous Pattern (TAP) to build tasks, call the relevant methods, and wait for all tasks to finish.
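A sketch of that plain-TAP approach; the endpoint URL is a placeholder for your real API:

using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class ParallelCalls
{
    // Reuse one HttpClient instance to avoid socket exhaustion.
    private static readonly HttpClient Client = new HttpClient();

    public static Task<string[]> RunAllAsync(string[] ids)
    {
        // Start every request without awaiting, then wait for all of them.
        var tasks = ids.Select(id => Client.GetStringAsync($"https://example.com/api/{id}"));
        return Task.WhenAll(tasks);
    }
}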

What is the alternative to global variables in Azure Function Apps?

Let's say I want to have a TimerTrigger function app that executes every 10 seconds and prints an increasing count (1...2...3...).
How can I achieve this WITHOUT using an environment variable?
You're already using an Azure Storage account for your function. Create a table within that storage account, and increment the counter there. This has the added benefit of persisting across function restarts.
Since you're using a TimerTrigger, it's implicit that there will only ever be one instance of the function running. If this were not the case, you could end up in a race condition with two or more instances interleaving to incorrectly increment your counter.
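Given that single-instance assumption, a minimal sketch with the Azure.Data.Tables SDK could look like this; the table name "Counters" and the entity keys are illustrative, not prescribed:

using Azure.Data.Tables;

public static class CounterStore
{
    public static int Increment(string connectionString)
    {
        var table = new TableClient(connectionString, "Counters");
        table.CreateIfNotExists();

        // Read the current value (or start from 0), increment, write back.
        // No ETag check here because the timer trigger runs single-instance;
        // with concurrent writers you would use optimistic concurrency instead.
        var existing = table.GetEntityIfExists<TableEntity>("counter", "global");
        var entity = existing.HasValue
            ? existing.Value
            : new TableEntity("counter", "global") { ["Count"] = 0 };

        var next = entity.GetInt32("Count").GetValueOrDefault() + 1;
        entity["Count"] = next;
        table.UpsertEntity(entity);
        return next;
    }
}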
I suggest you look into Durable Functions. This is an extension for Azure Functions that allows state in your (orchestrator) functions.
In your case, you can have a single HTTP triggered starter function that starts a long running orchestrator function. The HTTP function passes the initial count value to the orchestrator function. You can use the Timer functionality of Durable Functions to have the orchestrator wait for the specified amount of time before continuing/restarting. After the timer expires, the count value is incremented and you can restart the orchestrator function with this new count value by calling the ContinueAsNew method.
This periodic-work example is almost what you need, I think. You still need to read the initial count as the input, and increment it before the ContinueAsNew method is called.
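A sketch of that pattern in C# (Durable Functions 2.x), with the count passed as input and a durable timer between iterations:

using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;

public static class CounterOrchestrator
{
    [FunctionName("CounterOrchestrator")]
    public static async Task Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context,
        ILogger log)
    {
        var count = context.GetInput<int>();
        if (!context.IsReplaying) log.LogInformation($"Count: {count}");

        // Use the orchestration clock, not DateTime.UtcNow, to stay deterministic.
        var next = context.CurrentUtcDateTime.AddSeconds(10);
        await context.CreateTimer(next, CancellationToken.None);

        // Restart the orchestration with the incremented count.
        context.ContinueAsNew(count + 1);
    }
}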
If you need more details about Durable Functions, I have quite a few videos that explain the concepts.

Azure Function batchSize

I am wondering about the parallelism principle in Azure Functions. Say I have a batchSize of 32 and a newBatchThreshold of 16. If the queue grows too large, the scale controller spins up a new function instance to handle the pressure. I understand this bit. What I don't understand is: does a single instance work on the batch? That is, do I only have one function invocation running per batch, or does the runtime scale out and run multiple threads with the function?
Could I risk having two instances running, each with 32 messages, and 32 threads per instance running 32 function invocations at once?
Imagine I have a function calling a web API. That would mean the API gets 64 calls at once, which I don't want.
What I want is 2 instances working on 32 messages each, making 1 call per message per function.
I hope you guys understand.
Yes, that is indeed how scaling works. The same is explained in a bit more detail in the docs as well.
According to that, your function (one instance) could run up to 48 messages at a time (32 from a new batch + 16 from the existing batch) and could potentially scale to multiple instances depending on the queue length.
To achieve the scenario you've mentioned, you would have to:
Set the batchSize to 1 to avoid parallel processing per instance
Set the WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT app setting to 2 to limit scale out to a max of 2 instances
Note that neither instance will load all 32 messages at once, but they will work through the queue nonetheless.
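Concretely, that could look like this (Functions 2.x host.json schema; the app setting is set on the Function App, e.g. in the portal or via az cli):

// host.json - process one queue message at a time per instance
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 1
    }
  }
}
// App setting limiting scale-out to two instances:
// WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT = 2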

Azure Functions host.json settings per function or global?

Do the settings in host.json apply to each function individually, or apply to all of the functions as a whole?
For example, I've two functions in the same Project that both get messages from Azure ServiceBus Queues.
If I set maxConcurrentCalls to 10 in host.json, does that mean that as a whole only 10 concurrent calls to ServiceBus will be made, or that it is 10 per function, so there will be 20 concurrent calls?
Thanks in advance.
The host.json file is shared by all functions of a Function App. That is to say, the maxConcurrentCalls value will apply to all functions of the app, as will any other setting.
The effect of maxConcurrentCalls will be independent for each function. In your example, each function will have up to 10 messages processed concurrently. If you set it to 1, there will be 1 thread working per function.
Note that maxConcurrentCalls applies per instance. If you have multiple instances running, the max concurrency increases proportionally.
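For example, in the v2 runtime schema the setting lives under the serviceBus extension (in v1 it sits in a top-level serviceBus section):

{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "messageHandlerOptions": {
        "maxConcurrentCalls": 10
      }
    }
  }
}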

Azure Function max execution time

I would like to have a function called on a timer (every X minutes) but I want to ensure that only one instance of this function is running at a time. The work that is happening in the function shouldn't take long, but if for some reason it takes longer than the scheduled timer (X minutes) I don't want another instance to start and the processes to step on each other.
The simplest way that I can think of would be to set a maximum execution time on the function to also be X minutes. I would want to know how to accomplish this in both the App Service and Consumption plans, even if they are different approaches. I also want to be able to set this on an individual function level.
This type of feature is normally built into a FaaS environment, but I am having the hardest time google-binging it. Is this possible in function.json? Or are there other ways to make sure this runs only once?
(PS. I know I could do this in my own code by wrapping the work in a thread with a timeout. But I was hoping for something more idiomatic.)
Timer functions already have this behavior - they take out a blob lease from the AzureWebJobsStorage storage account to ensure that only one instance is executing the timer function. Also, the timer will not execute while a previous scheduled execution is in flight.
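For illustration, a plain timer-triggered C# function gets this singleton behavior with no extra configuration; the schedule below is just an example:

using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Timers;
using Microsoft.Extensions.Logging;

public static class ScheduledWork
{
    [FunctionName("ScheduledWork")]
    public static void Run([TimerTrigger("0 */5 * * * *")] TimerInfo timer, ILogger log)
    {
        // The runtime holds a singleton blob lease, so overlapping executions
        // of this function are prevented automatically. IsPastDue flags a
        // schedule that was missed (e.g., because the previous run was long).
        log.LogInformation($"Past due: {timer.IsPastDue}");
    }
}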
Another roll-your-own possibility is to handle this with storage queues and visibility timeouts - when the queue-triggered function has finished processing, push a new queue message with a visibility timeout matching the desired schedule.
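A rough sketch of that approach with the Azure.Storage.Queues SDK; the queue name and delay are placeholders:

using System;
using Azure.Storage.Queues;

public static class Rescheduler
{
    public static void ScheduleNextRun(string connectionString)
    {
        var queue = new QueueClient(connectionString, "work-items");
        queue.CreateIfNotExists();

        // The message stays invisible for 10 minutes, so the queue-triggered
        // function will not see it (and run again) until then.
        queue.SendMessage("next-run", visibilityTimeout: TimeSpan.FromMinutes(10));
    }
}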
I want to mention that the functionTimeout host.json property will add a timeout to all of your functions, but has the side effect that your function will fail with a timeout error and that function instance will restart, so I wouldn't rely on it in this case.
You can specify 'functionTimeout' property in host.json
https://github.com/Azure/azure-webjobs-sdk-script/wiki/host.json
// Value indicating the timeout duration for all functions.
// In Dynamic SKUs, the valid range is from 1 second to 10 minutes and the default value is 5 minutes.
// In Paid SKUs there is no limit and the default value is null (indicating no timeout).
"functionTimeout": "00:05:00"
There is a new Azure Functions plan called Premium (in public preview as of May 2019) that allows for unlimited execution duration:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale
It will probably end up being the go-to plan for most enterprise scenarios.
