How to check running status and stop a Durable Function - Azure

I want to process millions of records on demand, which takes approximately 2-3 hours. I wanted to go serverless, which is why I tried Durable Functions for the first time. To check how long a durable function can run, I created three functions:
An HTTP function to kick off the orchestrator function
An orchestrator function
An activity function
My durable function has been running and emitting logs to Application Insights for the last 5 days, and based on my code it will take 15 more days to complete.
How do I stop the orchestrator function manually?
I can see thousands of entries in the Application Insights requests table for a single execution. Is there any way to check how many durable functions are running in the backend, and how much time a single execution takes?
I can see some information about the orchestrator function in the "DurableFunctionHubInstance" table, but Microsoft recommends not relying on that table.

Since Durable Functions does a lot of checkpointing and replays the orchestration, normal logging might not always be very insightful.
Getting the status
There are several ways to query for the status of orchestrations. One of them is through the Azure Functions Core Tools, as George Chen mentioned.
Another way to query the status is by using the HTTP API of Durable Functions directly:
GET <rooturl>/runtime/webhooks/durableTask/instances?
taskHub={taskHub}
&connection={connectionName}
&code={systemKey}
&createdTimeFrom={timestamp}
&createdTimeTo={timestamp}
&runtimeStatus={runtimeStatus1,runtimeStatus2,...}
&showInput=[true|false]
&top={integer}
More info in the docs.
The HTTP API also has methods to purge orchestrations. Either a single one by ID or multiple by datetime/status.
DELETE <rooturl>/runtime/webhooks/durabletask/instances/{instanceId}
?taskHub={taskHub}
&connection={connection}
&code={systemKey}
Finally, you can also manage your instances using the DurableOrchestrationClient API in C#. Here's a sample on GitHub: HttpGetStatusForMany.cs
I have written & vlogged about using the DurableOrchestrationClient API in case you want to know more about how to use this in C#.
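For example, you can expose an HTTP-triggered function that looks up an instance and terminates it if it is still running. Below is a minimal sketch assuming the Durable Functions 1.x types (DurableOrchestrationClient and the [OrchestrationClient] attribute; in 2.x these become IDurableOrchestrationClient and [DurableClient]); the function name and route are made up:

using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class TerminateInstance
{
    [FunctionName("TerminateInstance")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post", Route = "orchestrations/{instanceId}/terminate")] HttpRequest req,
        [OrchestrationClient] DurableOrchestrationClient client,
        string instanceId)
    {
        // Look the instance up first so we can report what state it is in.
        DurableOrchestrationStatus status = await client.GetStatusAsync(instanceId);
        if (status == null)
        {
            return new NotFoundObjectResult($"No orchestration found with ID '{instanceId}'.");
        }

        if (status.RuntimeStatus == OrchestrationRuntimeStatus.Running ||
            status.RuntimeStatus == OrchestrationRuntimeStatus.Pending)
        {
            // Terminate stops the orchestrator; activity executions already in flight still finish.
            await client.TerminateAsync(instanceId, "Stopped manually");
            return new OkObjectResult($"Terminated '{instanceId}'.");
        }

        return new OkObjectResult($"Instance '{instanceId}' is already {status.RuntimeStatus}.");
    }
}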
Custom status
Small addition: it's possible to add a custom status object to the orchestration so you can add enriched information about the progress of the orchestration.
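For example, the orchestrator can call SetCustomStatus after each batch so that progress shows up on the status object when you query the instance. A minimal sketch, with a made-up orchestrator name, activity name, and batch input:

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static class ProcessRecordsOrchestrator
{
    [FunctionName("ProcessRecordsOrchestrator")]
    public static async Task Run(
        [OrchestrationTrigger] DurableOrchestrationContext context)
    {
        var batches = context.GetInput<List<string>>();

        for (int i = 0; i < batches.Count; i++)
        {
            await context.CallActivityAsync("ProcessBatch", batches[i]);

            // Whatever is passed here comes back as CustomStatus when the instance is queried,
            // so progress can be inspected without waiting for the orchestration to finish.
            context.SetCustomStatus(new { processed = i + 1, total = batches.Count });
        }
    }
}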
Getting the duration
When you query the status of an orchestration instance you get back a DurableOrchestrationStatus object. This contains two properties:
CreatedTime
LastUpdatedTime
I'm guessing you can subtract those and get a reasonable indication of the time it has taken.
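As a rough sketch of both points (listing running instances and estimating their duration), again assuming the 1.x API names and the filter overload of GetStatusAsync:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class ListRunningInstances
{
    [FunctionName("ListRunningInstances")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "get")] HttpRequest req,
        [OrchestrationClient] DurableOrchestrationClient client)
    {
        // Only instances created in the last 7 days that are still running.
        IList<DurableOrchestrationStatus> instances = await client.GetStatusAsync(
            DateTime.UtcNow.AddDays(-7),
            DateTime.UtcNow,
            new[] { OrchestrationRuntimeStatus.Running });

        var report = instances.Select(s => new
        {
            s.InstanceId,
            RuntimeStatus = s.RuntimeStatus.ToString(),
            // CreatedTime vs. LastUpdatedTime gives a rough wall-clock duration.
            ElapsedMinutes = (s.LastUpdatedTime - s.CreatedTime).TotalMinutes
        }).ToList();

        return new OkObjectResult(report);
    }
}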

You could manage the Durable Functions orchestration instances with Azure Functions Core Tools.
Terminate instances:
func durable terminate --id 0ab8c55a66644d68a3a8b220b12d209c --reason "It was time to be done."
Query instances with filters: you can add the runtime-status parameter to filter for running instances (an example follows below).
func durable get-instances --created-after 2018-03-10T13:57:31Z --created-before 2018-03-10T23:59Z --top 15
As for how long the functions took, that doesn't appear to be supported directly; the closest option is the get-history command.
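For example, the runtime-status filter mentioned above can be combined with the other filters (hedged: check func durable get-instances --help for the exact parameter name and accepted status values in your Core Tools version):

func durable get-instances --created-after 2018-03-10T13:57:31Z --created-before 2018-03-10T23:59Z --runtime-status running --top 15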

Related

Azure durable/entity functions startup delay

I have a scenario where an Azure Function (HTTP trigger) calls an orchestrator function, which calls multiple entities/activities.
The azure function then waits for the orchestrator to finish all its subtasks.
The following image shows a simplified sequence diagram.
The problem:
Calls to the orchestrator or entity function often (>20% of the time) take 15-20 seconds to start executing (marked red in the diagram).
What I tried so far:
Switched the service plan from Consumption (serverless) to Premium with a minimum of 5 instances (more than the number of activities/entities being called).
=> No cold start of new instances should occur
Called the HTTP trigger many times in succession (but one after the other)
=> Some calls work, some don't; it seems random
Gave it a few minutes between calls to the HTTP trigger
When I run it locally with Azurite, it works flawlessly every time
Why do the orchestrator or entity functions often take so much time to start executing when hosted in Azure?
Update:
Changed maxQueuePollingInterval down to 3 seconds. No change in behavior.
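For reference, that polling interval is set in host.json; a minimal sketch assuming the Durable Functions 2.x schema (in 1.x the setting sits directly under durableTask):

{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "storageProvider": {
        "maxQueuePollingInterval": "00:00:03"
      }
    }
  }
}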

Azure durable function replay behavior and time limit for normal http-trigger azure function

Because an HTTP-triggered Azure Function has a strict time limit of 230 seconds, I created an HTTP-triggered durable Azure Function. I find that when I trigger the durable function multiple times while the last run is not yet complete, the current run continues the last run until it is finished. This confuses me, because I only want each run to do the work of the current run, not replay the last unfinished run. So my questions are:
Is it by design that Durable Functions makes sure each run completes (succeeds or fails)?
Can a durable function focus only on the current run, just like a normal HTTP-triggered Azure Function?
If 2) is not possible, is there any way to mitigate the time limit issue for a normal HTTP-triggered Azure Function?
Thanks a lot!
The function runs until it gets a result, that is, a successfully completed or failed status.
According to the Microsoft documentation:
Azure Functions times out after 230 seconds regardless of the functionTimeout setting you've configured in the settings.
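If the intent is for every HTTP call to start its own independent orchestration and return immediately (which also sidesteps the 230-second limit, because the starter returns a 202 with status-query URLs instead of waiting), the usual pattern is an HTTP starter function. A minimal sketch, assuming the 1.x API names and a made-up orchestrator name:

using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class StartProcessing
{
    [FunctionName("StartProcessing")]
    public static async Task<HttpResponseMessage> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req,
        [OrchestrationClient] DurableOrchestrationClient starter)
    {
        // Every call starts a brand-new instance with its own ID;
        // it does not resume or replay any previous instance.
        string instanceId = await starter.StartNewAsync(
            "ProcessingOrchestrator", await req.Content.ReadAsStringAsync());

        // Returns 202 Accepted right away, with URLs the caller can poll for status,
        // so the HTTP request itself never comes near the 230-second limit.
        return starter.CreateCheckStatusResponse(req, instanceId);
    }
}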

Azure Durable Functions as Message Queue

I have a serverless function that receives orders, about ~30 per day. This function depends on a third-party API to perform some additional lookups and checks. However, this external endpoint isn't 100% reliable, and I need to be able to store order requests if the other API isn't available for a couple of hours (or more).
My initial thought was to split the function into two: the first part would receive orders, do some initial checks such as validating the order, then post the request into a message queue or pub/sub system. On the other side, there's a consumer that reads orders and tries to perform the API requests; if the API isn't available, the orders get posted back into the queue.
However, someone suggested simply using an Azure Durable Function for the requests and storing the current backlog in the function state, using the aggregator pattern (especially since the API will be working fine 99.99...% of the time). This would make the architecture a lot simpler.
What are the advantages/disadvantages of using one over the other, am I missing any important considerations?
I would appreciate any insight or other suggestions you have. Let me know if additional information is needed.
You could solve this problem with the Durable Task Framework, Azure Storage queues, or Service Bus queues, but at your transaction volume, I think that's overcomplicating the solution.
If you're dealing with ~30 orders per day, consider one of the simpler solutions:
Use Polly, a well-supported resilience and fault-tolerance framework (a sketch follows below).
Write request information to your database, and have an Azure Function timer trigger read it occasionally and finish processing orders that aren't marked as complete.
The Durable Task Framework is great when you get into serious volume, but it has a non-trivial learning curve.
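To make the Polly option concrete, here is a minimal sketch of a wait-and-retry policy around the third-party lookup; the endpoint URL, exception type, and retry counts are placeholders to adapt:

using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

public static class OrderLookup
{
    private static readonly HttpClient Http = new HttpClient();

    public static async Task<string> LookupWithRetryAsync(string orderId)
    {
        // Retry up to 5 times with exponential backoff (2 s, 4 s, 8 s, 16 s, 32 s)
        // whenever the external API call fails.
        var policy = Policy
            .Handle<HttpRequestException>()
            .WaitAndRetryAsync(5, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

        return await policy.ExecuteAsync(async () =>
        {
            var response = await Http.GetAsync($"https://example-lookup-api/orders/{orderId}");
            response.EnsureSuccessStatusCode(); // throws HttpRequestException on a non-success status
            return await response.Content.ReadAsStringAsync();
        });
    }
}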

Timer Trigger Function running a long process in an Azure Function App on the Consumption plan

I need to develop a process (e.g., an Azure Function App) that loads a file from FTP once a week, then performs ETL and pushes updates to another service, which takes a long time (about 100 minutes).
My question is whether a timer-triggered Azure Function App on the Consumption plan will work in this scenario, given that the maximum running time of an Azure Function is 10 minutes.
Update
My theory for using a timer-triggered function on the Consumption plan is this: the timer wakes up every 4 minutes during a certain period (e.g., 5am-10am on Mondays only), and within the function a status flag tells whether processing is already in progress. If it is, the process continues its ongoing job; otherwise, the function exits.
Is this doable, or is there a flaw?
I'm not sure what your exact scenario is, but I would consider one of the following options:
Option 1
Use durable functions. (Here is a C# example)
It will allow you to start your process, and while you wait for different tasks to complete, your function won't actually be running.
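As an illustration of Option 1, the weekly job could be split into activity functions chained by an orchestrator; only the activities consume execution time, and each activity still has to fit within the plan's timeout, so a 100-minute transform would itself need to be broken into smaller batches. A minimal sketch with made-up function names:

using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static class WeeklyEtlOrchestrator
{
    [FunctionName("WeeklyEtlOrchestrator")]
    public static async Task Run(
        [OrchestrationTrigger] DurableOrchestrationContext context)
    {
        // Each step is a separate activity function; the orchestrator itself
        // is unloaded while it waits and is replayed cheaply between steps.
        string localFile = await context.CallActivityAsync<string>("DownloadFromFtp", null);
        string transformed = await context.CallActivityAsync<string>("TransformFile", localFile);
        await context.CallActivityAsync("UploadToTargetService", transformed);
    }
}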
Option 2
If durable functions don't suit your needs, you can try a combination of a timer-triggered function and an Azure Container Instance (ACI) running your logic.
In a nutshell, your flow should look something like this:
Timer function is triggered
Call an API to create the ACI
End of timer function
The service in the ACI starts its job
After the service is done, it calls an API to remove its own ACI
But in any case, durable functions usually do the trick.
Let me know if something is unclear.
Good luck. :)
With the Consumption plan, the Azure Function can run for a maximum of 10 minutes; you still need to configure this in host.json.
You can go for an App Service plan, which has no time limit. Again, you need to configure the functionTimeout property in host.json.
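For reference, functionTimeout is a top-level host.json setting; for example, to allow the full 10 minutes on the Consumption plan (a sketch, adjust to your plan's limits):

{
  "version": "2.0",
  "functionTimeout": "00:10:00"
}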
For more, see the following tutorial:
https://sps-cloud-architect.blogspot.com/2019/12/azure-data-load-etl-process-using-azure.html

Can I use a retry policy in an azure function?

I'm using Event Hubs to temporarily store data, which will first be saved to Azure Table Storage and then indexed into Elasticsearch.
I was thinking of doing the storage-saving calls in an Azure Function, and the same for the Elasticsearch indexing using NEST.
It is important that the data is processed, so I was thinking of using Polly as a retry policy in case the Elasticsearch server is failing. However, won't a retry policy potentially make the Azure Function expensive?
Are Azure Functions even the right way to go?
Yes, you can use Polly for retries inside your Azure Functions. Some further considerations:
Yes, you will pay for the retry time. But given that your Elasticsearch is "mostly up", the extra price for occasional retries should not be too high.
If you want to retry saving to Table Storage too, you will have to write the calls decorated with Polly yourself instead of using the otherwise-preferred output binding (see the sketch after this list).
Make sure to check whether the order of writes is important to you and whether you should retry Table Storage writes to completion before you start writing to Elasticsearch, or vice versa. Otherwise, you can do them in parallel with async calls and then Task.WaitAll.
The maximum execution time of a Function is 5 minutes by default; you can configure it up to a maximum of 10 minutes. If you need to handle outages longer than that, you probably need a plan B, e.g. start copying the events that keep failing for longer than 4 (or 9) minutes to a dedicated queue and retry from there, or disable the Function for such periods of downtime.
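As a sketch of that second point, here is one way a Table Storage write could be decorated with Polly instead of using the output binding; the table name, entity shape, and the choice of the Azure.Data.Tables SDK are assumptions for illustration:

using System;
using System.Threading.Tasks;
using Azure;
using Azure.Data.Tables;
using Polly;

public static class EventWriter
{
    private static readonly TableClient Table =
        new TableClient(Environment.GetEnvironmentVariable("AzureWebJobsStorage"), "events");

    public static async Task SaveAsync(string partitionKey, string rowKey, string payload)
    {
        // Three linear retries, 5 seconds apart, on transient storage failures.
        var policy = Policy
            .Handle<RequestFailedException>()
            .WaitAndRetryAsync(3, _ => TimeSpan.FromSeconds(5));

        var entity = new TableEntity(partitionKey, rowKey) { ["Payload"] = payload };

        await policy.ExecuteAsync(() => Table.UpsertEntityAsync(entity));
    }
}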
Yes, it is. You could use a library, or better, just write a simple linear backoff strategy (e.g., try 5 times with a 5-second sleep in between) and do something like
context.log.error({
message: `Transient failure. This is Retry number ${retryCount}.`,
errorCode: errorCodeFromCallingElasticSearch,
errorDetails: moreContextMaybeSomeStack
});
every time you hit the retry logic, so it goes to Application Insights (make sure you integrate with Application Insights, otherwise you have no operational visibility at all).
You can then query how often a call really misses and get an idea of how things are going at the 95th percentile.
Occasionally running 10 seconds over the normal 1-second execution time of your function is going to cost extra, but probably nowhere near a full dedicated App Service plan. If it comes close, just switch to that; it means your function is mostly on rather than off, which is still a perfectly good case for running a function.
Application Insights can also trigger alerts if some metric goes haywire; for example, if your retry count goes up to 11 for 24 hours, you probably want to know about that deviation. You'll need to send the retry count as a custom metric to trigger an alert off it:
context.log.metric("CallElasticSearchRetryCount", retryCount);
