I have some difficulty understanding how parallelization (and scaling) of Azure Durable Functions orchestrations works under the hood. I am referring to this official document, which states:
Because the orchestrator and entity function instances are stateful singletons, it's important that each orchestration or entity is only processed by one worker at a time.
What exactly does "orchestration function and entity function instances are stateful singletons" mean when it comes to running multiple orchestration functions in parallel?
Let's say I have a client function which listens to an HTTP trigger and then starts a new orchestration function instance. If I trigger this client function twice, will there be two instances of the orchestration function running with two separate instance IDs in parallel, or will they run in sequence? Will each instance have its own control queue?
Or, taking this example, does CallSubOrchestratorAsync always execute on the same orchestration instance? If so, what's the benefit, since it won't really be running multiple instances in parallel? Or does "parallelization" here just refer to restarting the instance and replaying the history table with different input values?
The statement "Because the orchestrator and entity function instances are stateful singleton" is not really valid as you cannot run multiple instances of orchestrator functions if it's a singleton stateful. Ideally, the statement should just say "Because the orchestrator and entity function instances are stateful".
The concurrency throttles link clearly confirms that orchestrator functions run concurrently and that this concurrency can be limited in host.json. Concurrency and "stateful singleton" are mutually exclusive, so in my view the statement needs to be corrected.
Also, it is not that each orchestration instance has its own control queue; rather, each orchestration or entity instance is assigned to a single control queue, and it is ensured that only one worker processes that queue, so there is no duplication.
To sum up: multiple orchestrations can run in parallel, but within a single orchestration only activity functions run in parallel; the orchestration workflow itself executes serially. Hope it helps.
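To make that last point concrete, here is a minimal fan-out/fan-in sketch using the v1-style DurableOrchestrationContext API (the GetItems and ProcessItem activity names are hypothetical). The activities run in parallel on the workers, while the orchestrator body itself executes serially:

using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

[FunctionName("FanOutOrchestrator")]
public static async Task<int[]> RunOrchestrator(
    [OrchestrationTrigger] DurableOrchestrationContext context)
{
    // Hypothetical activity that returns the items to process.
    string[] items = await context.CallActivityAsync<string[]>("GetItems", null);

    // Fan out: schedule one activity per item. The activities execute in
    // parallel, but this orchestrator code runs serially, one replay at a time.
    var tasks = items.Select(item => context.CallActivityAsync<int>("ProcessItem", item));

    // Fan in: wait for all activities to complete.
    return await Task.WhenAll(tasks);
}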
Related
I am new to Azure Durable Functions. According to the documents, orchestrator functions are reliable, but I am wondering whether the starter function is reliable too. Suppose I have an HTTP-triggered orchestrator function. I guess that when the Durable Functions framework (or something running in the backend) detects that an HTTP request matches an orchestrator function, it starts the starter function, which triggers the orchestrator function. I am wondering where the starter function runs. A VM? Can that VM fail? I cannot find much documentation on MSDN.
Orchestrator functions reliably maintain their execution state by using the event sourcing design pattern. Instead of directly storing the current state of an orchestration, the Durable Task Framework uses an append-only store to record the full series of actions the function orchestration takes. An append-only store has many benefits compared to "dumping" the full runtime state. Benefits include increased performance, scalability, and responsiveness. You also get eventual consistency for transactional data and full audit trails and history. The audit trails support reliable compensating actions.
Durable Functions uses event sourcing transparently. Behind the scenes, the await (C#) or yield (JavaScript) operator in an orchestrator function yields control of the orchestrator thread back to the Durable Task Framework dispatcher. The dispatcher then commits any new actions that the orchestrator function scheduled (such as calling one or more child functions or scheduling a durable timer) to storage. The transparent commit action appends to the execution history of the orchestration instance. The history is stored in a storage table. The commit action then adds messages to a queue to schedule the actual work. At this point, the orchestrator function can be unloaded from memory.
When an orchestration function is given more work to do (for example, a response message is received or a durable timer expires), the orchestrator wakes up and re-executes the entire function from the start to rebuild the local state. During the replay, if the code tries to call a function (or do any other async work), the Durable Task Framework consults the execution history of the current orchestration. If it finds that the activity function has already executed and yielded a result, it replays that function's result and the orchestrator code continues to run. Replay continues until the function code is finished or until it has scheduled new async work.
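For example, in an orchestrator shaped like the classic "hello sequence" below (SayHello is an activity function), every await is a point where the dispatcher appends to the history table and may unload the function; on replay, completed activities are answered from history instead of being re-executed:

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

[FunctionName("HelloSequence")]
public static async Task<List<string>> Run(
    [OrchestrationTrigger] DurableOrchestrationContext context)
{
    var outputs = new List<string>();

    // Each await is a checkpoint: the result of a completed activity is
    // replayed from the history table rather than executed again.
    outputs.Add(await context.CallActivityAsync<string>("SayHello", "Tokyo"));
    outputs.Add(await context.CallActivityAsync<string>("SayHello", "Seattle"));
    outputs.Add(await context.CallActivityAsync<string>("SayHello", "London"));

    return outputs;
}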
Official doc
I want to process millions of records on demand, which takes approximately 2-3 hours. I want to go serverless, which is why I tried Durable Functions (for the first time). To check how long a durable function can run, I created 3 functions:
HTTP function to kick off the orchestrator function
Orchestrator function
Activity function
My durable function has been running and emitting logs to Application Insights for the last 5 days, and based on my code it would take 15 more days to complete.
How do I stop the orchestrator function manually?
I can see thousands of entries in the Application Insights requests table for a single execution. Is there any way to check how many durable functions are running in the backend, and how much time a single execution takes?
I can see some information about the orchestrator function in the "DurableFunctionHubInstance" table, but MS recommends not relying on that table.
Since Durable Functions does a lot of checkpointing and replays the orchestration, normal logging might not always be very insightful.
Getting the status
There are several ways to query for the status of orchestrations. One of them is through the Azure Functions Core tools as George Chen mentioned.
Another way to query the status is by using the HTTP API of Durable Functions directly:
GET <rooturl>/runtime/webhooks/durableTask/instances?
taskHub={taskHub}
&connection={connectionName}
&code={systemKey}
&createdTimeFrom={timestamp}
&createdTimeTo={timestamp}
&runtimeStatus={runtimeStatus1,runtimeStatus2,...}
&showInput=[true|false]
&top={integer}
More info in the docs.
The HTTP API also has methods to purge orchestrations. Either a single one by ID or multiple by datetime/status.
DELETE <rooturl>/runtime/webhooks/durabletask/instances/{instanceId}
?taskHub={taskHub}
&connection={connection}
&code={systemKey}
Finally you can also manage your instances using the DurableOrchestrationClient API in C#. Here's a sample on GitHub: HttpGetStatusForMany.cs
I have written & vlogged about using the DurableOrchestrationClient API in case you want to know more about how to use this in C#.
Custom status
Small addition: it's possible to add a custom status object to the orchestration so you can add enriched information about the progress of the orchestration.
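For example, inside the orchestrator function (the shape of the status object is up to you):

// Shows up as CustomStatus in status query results.
context.SetCustomStatus(new { processed = 1000, total = 1000000 });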
Getting the duration
When you query the status of an orchestration instance you get back a DurableOrchestrationStatus object. This contains two properties:
CreatedTime
LastUpdatedTime
I'm guessing you can subtract those and get a reasonable indication of the time it has taken.
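A rough sketch, assuming client is a DurableOrchestrationClient and the instance has reached a terminal state:

var status = await client.GetStatusAsync(instanceId);
TimeSpan duration = status.LastUpdatedTime - status.CreatedTime; // rough wall-clock duration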
You could manage the Durable Functions orchestration instances with Azure Functions Core Tools.
Terminate instances:
func durable terminate --id 0ab8c55a66644d68a3a8b220b12d209c --reason "It was time to be done."
Query instances with filters: you can add the runtime-status parameter to filter for running instances.
func durable get-instances --created-after 2018-03-10T13:57:31Z --created-before 2018-03-10T23:59Z --top 15
As for the time each function took, that doesn't appear to be supported directly; the closest option is the get-history command.
I am wondering if a Semaphore (lock) would work in Azure Functions.
I do not want two separate webjobs running at the same time. The webjobs live on the same app service plan.
Is this something I can guarantee with a semaphore? (as this enables cross process locking?)
First question: you're talking about Functions and WebJobs. Which one is it?
If your App Service Plan does any scaling, the semaphore will not work, since two instances might be started on two different machines. The good news: for WebJobs, there's a simple solution.
[Singleton]
public static async Task ProcessImage([BlobTrigger("images")] Stream image)
{
    // Process the image
}
In this example, only a single instance of the ProcessImage function will run at any given time. When the function is triggered by a new image being added to the images container, the runtime will first attempt to acquire the lock (a blob lease). Once acquired, the lock is held (and the blob lease is renewed) for the duration of the function execution, ensuring no other instances will run. If another function instance is triggered while this function is running, it will wait for the lock, periodically polling for it.
You can find more information here: Azure WebJobs SDK - Singleton
Edit:
If you're using Azure Functions: functions running on a TimerTrigger seem to run as singletons.
The timer trigger supports multi-instance scale-out. A single instance of a particular timer function is run across all instances.
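A minimal sketch of such a timer-triggered function (the function name is illustrative; the six-field schedule fires at the top of every hour):

using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

[FunctionName("HourlyJob")]
public static void Run([TimerTrigger("0 0 * * * *")] TimerInfo timer, ILogger log)
{
    // A single instance of this function runs per schedule occurrence,
    // even when the app is scaled out to multiple instances.
    log.LogInformation($"Hourly job executed at {DateTime.UtcNow:O}");
}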
I have an Azure worker role whose job is to periodically run some code against a SQL Azure database. Here's my current code:
const int oneHour = 3600000; // milliseconds (60 * 60 * 1000)
while (true)
{
    var numConversions = SaveSeedsToSQL.ConvertRemainingPotentialQueryURLsToSeeds();
    SaveLogEntryToSQL.Save(new LogEntry { Count = numConversions });
    Thread.Sleep(oneHour);
}
Is Thread.Sleep(3600000) the best way of programming such regular but infrequent events, or is there some kind of wake-up-and-run-again mechanism for Azure worker roles that I should be utilizing?
This code works of course, but there are some problems:
You can fail somewhere and this schedule gets all thrown off. That is important if you must actually do it at a specific time.
There is no concurrency control here. If you want something only done once, you need a mechanism such that a single instance will perform the work and the other instances won't.
There are a few solutions to this problem:
Run the Windows Scheduler on the role (built in). That solves problem 1, but not 2.
Run Quartz.NET and schedule things. That solves #1 and depending on how you do it, also #2.
Use future scheduled queue messages in either Service Bus or Windows Azure queues. That solves both.
The first two options work with caveats, so I think the last option deserves more attention. You can simply create a message that your role(s) will understand and post it to the queue. Once the time comes, it becomes visible, and your normally polling roles will see it and can work on it. The benefit here is that it is time-accurate, and only a single instance operates on it since it is a queue message. When the work is completed, the instance can schedule the next message and post it to the queue. We use this technique all the time. You only have to be careful that if for some reason your role fails before scheduling the next one, the whole system kinda fails. You should have some sanity checks and safeguards there.
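A sketch of a future-scheduled message using the classic WindowsAzure.Storage SDK (queue name, message text, and connectionString are illustrative): the message stays invisible for an hour, then exactly one polling instance dequeues it and does the work.

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

var account = CloudStorageAccount.Parse(connectionString); // your storage connection string
var queue = account.CreateCloudQueueClient().GetQueueReference("scheduled-work");
queue.CreateIfNotExists();

// The message becomes visible (and thus dequeueable) only after one hour.
queue.AddMessage(
    new CloudQueueMessage("run-hourly-job"),
    timeToLive: null,
    initialVisibilityDelay: TimeSpan.FromHours(1),
    options: null,
    operationContext: null);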
I have two instances of a worker role.
I want to run a sub-task (on a Thread Pool thread) only on one of the Worker Role instances.
My initial idea was to do something like this:
ThreadPool.QueueUserWorkItem((o) =>
{
    // Run the sub-task only on the "first" instance of this role.
    if (RoleEnvironment.CurrentRoleInstance.Id ==
        RoleEnvironment.Roles[RoleEnvironment.CurrentRoleInstance.Role.Name].Instances.First().Id)
    {
        emailWorker.Start();
    }
});
However, the above code relies on the Role.Instances collection always returning the instances in the same order. Is this the case, or can the items be returned in any order?
Is there another approved way of running a task on one role instance only?
Joe, the solution you are looking for typically relies on:
either acquiring a lease (similar to a lock, but with an expiration) on a specific blob, using Blob Storage as a synchronization point between your role instances,
or queuing/dequeuing a message from Queue Storage, which is usually the suggested pattern for delaying long-running operations such as sending an email.
Either way, you need to go through Azure Storage to make it work. I suggest having a look at Lokad.Cloud, as we designed this open-source framework precisely to handle these sorts of situations.
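A sketch of the blob-lease approach with the classic WindowsAzure.Storage SDK (container is an assumed CloudBlobContainer, the blob name is illustrative, and the lock blob must already exist):

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

CloudBlockBlob lockBlob = container.GetBlockBlobReference("singleton-lock");
try
{
    // Lease time must be 15-60 seconds (or infinite); renew it while working.
    string leaseId = lockBlob.AcquireLease(TimeSpan.FromSeconds(60), null);
    emailWorker.Start(); // only the instance holding the lease gets here
}
catch (StorageException)
{
    // Another instance holds the lease; skip the work on this instance.
}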
If they need to be doing different things, then it sounds to me like you don't have 2 instances of a single worker role. In reality you have 2 different worker roles.
Especially when looking at the scalability of your application, processes need to be able to run on more than one instance. What happens when that task that you only want to run on one role gets large enough that it needs to scale to 2 or more role instances?
One of the benefits of developing for Azure is that you get scalability automatically if you design your app properly. It makes you work extra to get something that's not scalable, which is what you're trying to do.
What triggers this task to be started? If you use a message on Queue Storage (as suggested by Joannes) then only one worker role will pick up the message and process it and it doesn't matter which instance of your worker role does that.
So for now, if you've got one worker role doing the sub-task and another doing everything else, just add 2 worker roles to your Azure solution. However, even if you do that, the worker role that processes the sub-task should be written so that it still runs properly if you ever scale it beyond a single instance. In that case, you might as well stick with a single worker role and process messages off the queue to start your sub-task.