Could Durable Functions reduce my execution time? - azure

I can execute a process "x" in parallel using the Azure Durable Functions fan-out/fan-in pattern.
If I divide my single process "x" into multiple processes using this concept, can I reduce the execution time of the function?

In general, the Azure Functions Premium plan allows for higher timeout values. So, if you don't want to deal with the issue, just upgrade ;-)
Azure Durable Functions might or might not reduce your total runtime.
BUT every "Activity Call" is a new function execution with its own timeout.
Whether you fan out or call the activities serially, you will avoid timeout issues as long as no single activity exceeds the function timeout period.
If, however, you have an activity that runs for an extended period, you will need the Premium plan anyway. But your "batch" processing solution looks quite promising for avoiding this.

Making use of the fan-out/fan-in approach, you will run tasks in parallel instead of sequentially, so the duration of the execution will be the duration of your longest single task. It's the best approach to use if the tasks do not require information from each other to process.
If you don't want to be on Durable Functions, you could use Task Asynchronous Programming (TAP) to build the tasks, call the relevant methods, and wait for all tasks to finish, as sketched below.
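To make the pattern concrete, here is a minimal C# sketch of a fan-out/fan-in orchestrator, assuming the in-process Durable Functions model; the activity name ProcessChunk and the chunking of process "x" are placeholders for your own work. Note that the fan-in uses Task.WhenAll, which is the same TAP primitive you would reach for in a plain, non-Durable solution:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class FanOutFanIn
{
    [FunctionName("FanOutFanInOrchestrator")]
    public static async Task<int> RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // The input is process "x" pre-divided into chunks by the starter function.
        var chunks = context.GetInput<int[][]>();

        // Fan out: schedule one activity per chunk; they run in parallel.
        var tasks = chunks
            .Select(chunk => context.CallActivityAsync<int>("ProcessChunk", chunk))
            .ToArray();

        // Fan in: wait for all activities; total duration tracks the slowest one.
        var results = await Task.WhenAll(tasks);
        return results.Sum();
    }

    // Placeholder activity: replace the body with your real per-chunk work.
    [FunctionName("ProcessChunk")]
    public static int ProcessChunk([ActivityTrigger] int[] chunk) => chunk.Sum();
}
```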

Related

What is the alternative to global variables in Azure Function Apps?

Let's say I want to have a TimerTrigger function app that executes every 10 seconds and prints an increasing count (1...2...3...).
How can I achieve this WITHOUT using an environment variable?
You're already using an Azure Storage account for your function. Create a table within that storage account, and increment the counter there. This has the added benefit of persisting across function restarts.
Since you're using a TimerTrigger, it's implicit that there will only ever be one instance of the function running. If this were not the case, you could end up in a race condition with two or more instances interleaving to incorrectly increment your counter.
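A minimal sketch of that approach in C#, using the Azure.Data.Tables SDK; the table name "Counters" and the partition/row keys are arbitrary placeholders:

```csharp
using System;
using System.Threading.Tasks;
using Azure;
using Azure.Data.Tables;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class PersistentCounter
{
    [FunctionName("PrintCount")]
    public static async Task Run(
        [TimerTrigger("*/10 * * * * *")] TimerInfo timer, ILogger log)
    {
        // Reuse the storage account the function app already depends on.
        var table = new TableClient(
            Environment.GetEnvironmentVariable("AzureWebJobsStorage"), "Counters");
        await table.CreateIfNotExistsAsync();

        // Read the current count; start from 0 if the entity does not exist yet.
        TableEntity entity;
        try
        {
            entity = (await table.GetEntityAsync<TableEntity>("counter", "count")).Value;
        }
        catch (RequestFailedException ex) when (ex.Status == 404)
        {
            entity = new TableEntity("counter", "count") { ["Value"] = 0 };
        }

        var count = (entity.GetInt32("Value") ?? 0) + 1;
        entity["Value"] = count;
        await table.UpsertEntityAsync(entity);

        log.LogInformation("{Count}", count);
    }
}
```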
I suggest you look into Durable Functions. This is an extension for Azure Functions that allows state in your (orchestrator) functions.
In your case, you can have a single HTTP triggered starter function that starts a long running orchestrator function. The HTTP function passes the initial count value to the orchestrator function. You can use the Timer functionality of Durable Functions to have the orchestrator wait for the specified amount of time before continuing/restarting. After the timer expires, the count value is incremented and you can restart the orchestrator function with this new count value by calling the ContinueAsNew method.
This periodic work example is almost what you need I think. You still need to add the initial count to be read as the input, and increment it before the ContinueAsNew method is called.
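Along those lines, here is a minimal C# sketch of such an eternal orchestration (in-process model; the HTTP starter function that supplies the initial count is omitted, and the 10-second interval mirrors the question):

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;

public static class CounterOrchestration
{
    [FunctionName("CounterOrchestrator")]
    public static async Task Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context, ILogger log)
    {
        log = context.CreateReplaySafeLogger(log);
        var count = context.GetInput<int>();
        log.LogInformation("Count: {Count}", count);

        // Durable timer: the orchestrator is unloaded while it waits.
        var deadline = context.CurrentUtcDateTime.AddSeconds(10);
        await context.CreateTimer(deadline, CancellationToken.None);

        // Restart the orchestration with the incremented count.
        context.ContinueAsNew(count + 1);
    }
}
```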
If you need more details about Durable Functions, I have quite a few videos that explain the concepts.

Azure (Durable) Functions - Managing parallelism

I'm posting this question to see if I'm understanding parallelism in Azure Functions correctly, particularly with Durable Functions.
The ability to set max degree of parallelism was recently added to Azure Functions using az cli:
https://github.com/Azure/azure-functions-host/issues/1207
az resource update --resource-type Microsoft.Web/sites -g <resource_group> -n <function_app_name>/config/web --set properties.functionAppScaleLimit=<scale_limit>
I've applied this to my Function App, but what I'm unsure of is how this plays with the MaxConcurrentOrchestratorFunctions and MaxConcurrentActivityFunctions settings for Durable Functions.
Would the below lead to a global max of 250 concurrent activity functions?
functionAppScaleLimit: 5
MaxConcurrentOrchestratorFunctions: 5
MaxConcurrentActivityFunctions: 10
Referring to the link you shared on limiting scaling: functionAppScaleLimit lets you specify the maximum number of instances for your function app. As for the other two settings, MaxConcurrentOrchestratorFunctions sets the maximum number of orchestrator functions that can be processed concurrently on a single host instance, and MaxConcurrentActivityFunctions sets the maximum number of activity functions that can be processed concurrently on a single host instance. Refer to this.
Now I will explain what MaxConcurrentOrchestratorFunctions does, which should help you understand how it works:
MaxConcurrentOrchestratorFunctions controls how many orchestrator functions can be loaded into memory at any given time. If you set concurrency to 1 and then start 10 orchestrator functions, only one will be loaded in memory at a time. Remember that if an orchestrator function calls an activity function, the orchestrator function will unload from memory while it waits for a response. During this time, another orchestrator function may start. The effect is that you will have as many as 10 orchestrator functions running in an interleaved way, but only 1 should actually be executing code at a time.
The motivation for this feature is to limit the CPU and memory used by orchestrator code. It's not going to be useful for implementing any kind of singleton pattern. If you want to limit the number of active orchestrations, you will need to implement that limit yourself.
Your global max of activity functions would be 50. This is based on 5 app instances as specified by functionAppScaleLimit and 10 activity functions as specified by MaxConcurrentActivityFunctions. The relationship between the number of orchestrator function executions and activity function executions depends entirely on your specific implementation. You could have 1-1,000 orchestration(s) that spawn 1-1,000 activities. Regardless, the settings you propose will ensure there are never more than 5 orchestrations and 10 activities running concurrently on a single function instance.
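For reference, a host.json sketch of the per-instance Durable Task limits proposed in the question (functionAppScaleLimit itself is a site setting applied via the az command above, not a host.json value):

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "maxConcurrentOrchestratorFunctions": 5,
      "maxConcurrentActivityFunctions": 10
    }
  }
}
```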

What is a better way of making a long delay inside a series of tasks?

I'm trying to build a workflow system which will process a series of tasks and delays. A delay can be changed or removed from a running workflow.
What is a better way of making a long delay inside a series of tasks (like 3-4 months)? Right now two ways are poking around my head:
Pre-calculating and saving the delay time, then setting up a scheduler that checks the delays repeatedly at a specific interval (1 minute, maybe). This makes a lot of database queries, but the delay can be changed instantly.
Scheduling a job for each delay. This can avoid a lot of database queries, but the problem is maintaining and changing the delay in these long-running jobs. Also, these jobs need to survive a server crash or restart.
Right now I'm not sure how to do it in a better way and am still studying the problem. If anyone has similar experience, please share.
You can store the tasks in the database, like:
{
  _id: String,
  status: Enum,
  executionTime: Timestamp
}
When you declare a new task, push a new entry into the DB.
At server start, or when a new task is declared, create a setTimeout that will wake up your Node.js process when necessary.
Optimization
To avoid having X setTimeouts (where X is the number of tasks to execute), keep only one setTimeout, with a wait time equal to the delay of the closest task.
For example, say you have three tasks: one must run in 1 hour, one in 2 hours, and one in 3 hours. Use a setTimeout of 1 hour. When it gets triggered, execute task 1 and then look at the remaining tasks to re-arm the timer, as sketched below.
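A minimal sketch of that single-timer idea, written here in C# to stay consistent with the rest of this thread (the original answer assumes Node's setTimeout); the ScheduledTask type and the in-memory list stand in for the database table, and the sketch ignores thread safety:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Placeholder task record; in practice each of these rows lives in the database.
public record ScheduledTask(string Id, DateTimeOffset ExecutionTime);

public class SingleTimerScheduler
{
    private readonly List<ScheduledTask> _tasks = new();
    private CancellationTokenSource? _wakeup;

    // Called at server start and whenever a task is added or its delay changes.
    public void Add(ScheduledTask task)
    {
        _tasks.Add(task);
        _wakeup?.Cancel(); // re-arm the single timer for the new closest task
    }

    public async Task RunAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            // Only one timer: wait until the closest task is due.
            var next = _tasks.OrderBy(t => t.ExecutionTime).FirstOrDefault();
            var delay = next is null
                ? Timeout.InfiniteTimeSpan
                : next.ExecutionTime - DateTimeOffset.UtcNow;

            _wakeup = CancellationTokenSource.CreateLinkedTokenSource(ct);
            try
            {
                if (delay > TimeSpan.Zero || delay == Timeout.InfiniteTimeSpan)
                    await Task.Delay(delay, _wakeup.Token);
            }
            catch (TaskCanceledException)
            {
                continue; // a task was added or changed: recompute the closest one
            }

            if (next is not null)
            {
                Console.WriteLine($"Executing task {next.Id}");
                _tasks.Remove(next);
            }
        }
    }
}
```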

Azure Function with Java: configure batchSize and newBatchThreshold efficiently

I'm considering using such a solution with a Function triggered by a Queue in Java. I'm trying to understand how to configure batchSize and newBatchThreshold efficiently. Below is what I managed to find out about it. Please correct me as soon as you find a mistake in my reasoning:
the Function is executed in a 1-CPU-core environment;
the Function polls messages from the Queue in batches of 16 by default and executes them in parallel (right from the documentation);
so I conclude that:
if the messages involve CPU-intensive tasks, they are effectively executed sequentially;
so I conclude that:
since processing of all messages starts at the same time (when the batch arrives), the later messages in the batch take longer and longer to complete (confirmed experimentally);
all of this longer and longer processing is billable (even though the function body's own execution takes a tenth of that time);
so I conclude that:
one should set both batchSize and newBatchThreshold to 1 for CPU-intensive tasks, and vary them only for non-CPU-intensive (that is, IO-intensive) tasks.
Does this make sense?
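For reference, both settings live in host.json under the queues extension; a sketch of the serial, CPU-bound configuration described above (with newBatchThreshold at 0, a new batch is only fetched once the in-flight message finishes):

```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 1,
      "newBatchThreshold": 0
    }
  }
}
```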

Multiple Task Lists Polling Using Flow

I'm trying to figure out how, using the SWF Flow framework, I can have my activity worker poll multiple task lists. The use case is having two different priorities for activity tasks that need to be completed.
Bonus points if someone uses glisten and can point out a way to achieve that.
Thanks!
It is not possible for a single ActivityWorker to poll multiple task lists. The reason for this design is that each poll request can take up to a minute due to long polling. If several such polls fed into a single-threaded activity implementation, it would not be clear how to deal with the conflicts that arise when tasks are received on multiple task lists.
Until SWF natively supports priority task lists, the solution is to instantiate one ActivityWorker per task list (priority) and deal with conflicts yourself.
