DurableOrchestrationClient.GetStatusAsync() always null for Durable Azure Function

I have a queue-triggered Azure Function with a DurableOrchestrationClient. I am able to start a new execution of my orchestration function, which triggers multiple ActivityTrigger functions and waits for them all to process. Everything works great.
My issue is that I am unable to check on the status of my orchestration function ("TestFunction"). GetStatusAsync always returns null. I need to know when the orchestration function is actually complete so I can process the return object (bool).
public static async Task Run(
    [QueueTrigger("photostodownload", Connection = "QueueStorage")] PhotoInfo photoInfo,
    [OrchestrationClient] DurableOrchestrationClient starter,
    TraceWriter log)
{
    // Start the orchestration, wait a fixed two seconds, then ask for its status.
    var x = await starter.StartNewAsync("TestFunction", photoInfo);
    Thread.Sleep(2 * 1000);
    var y = await starter.GetStatusAsync(x); // y is always null here
}

StartNewAsync enqueues a message into the control queue; it doesn't mean that the orchestration starts immediately.
GetStatusAsync returns null if the instance either doesn't exist or has not yet started running. So the orchestration probably just hasn't started yet during those 2 seconds of sleep that you have.
Rather than using a fixed wait, you should either periodically poll the status of your orchestration, or have the orchestration raise something like a Done event as the last step of the workflow.
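For the polling option, a minimal sketch against the Durable Functions 1.x API used in the question (the two-second poll interval and the direct use of status.Output are illustrative, not prescriptive):

public static async Task Run(
    [QueueTrigger("photostodownload", Connection = "QueueStorage")] PhotoInfo photoInfo,
    [OrchestrationClient] DurableOrchestrationClient starter,
    TraceWriter log)
{
    var instanceId = await starter.StartNewAsync("TestFunction", photoInfo);

    // Poll until the orchestration reaches a terminal state.
    DurableOrchestrationStatus status;
    do
    {
        await Task.Delay(TimeSpan.FromSeconds(2));
        status = await starter.GetStatusAsync(instanceId);
    } while (status == null
          || status.RuntimeStatus == OrchestrationRuntimeStatus.Running
          || status.RuntimeStatus == OrchestrationRuntimeStatus.Pending
          || status.RuntimeStatus == OrchestrationRuntimeStatus.ContinuedAsNew);

    if (status.RuntimeStatus == OrchestrationRuntimeStatus.Completed)
    {
        // The orchestrator's return value (the bool mentioned in the question) is in Output.
        bool result = status.Output.ToObject<bool>();
        log.Info($"TestFunction completed with result {result}");
    }
}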

Are you using Functions 1.0 or 2.0? A similar issue has been reported for the Functions 2.0 runtime on GitHub:
https://github.com/Azure/azure-functions-durable-extension/issues/126
Also, when you say everything works great, do you mean the ActivityTrigger functions complete execution?
Are you running the functions locally, or are they deployed to Azure?

Related

Waiting for an Azure Function durable orchestration to complete

I'm currently working on a project where I use a storage queue to pick up items for processing. The queue-triggered function picks the item up from the queue and starts a durable orchestration. Normally, according to the documentation (https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-queue), the queue trigger picks up 16 messages (by default) in parallel for processing, but since starting the orchestration is a simple and quick operation, if I have a lot of messages in the queue I will end up with a lot of orchestrations running at the same time. I would like to start the orchestration and wait for it to complete before the next batch of messages is picked up for processing, in order to avoid overloading my systems. The solution I came up with, and which seems to work, is:
public class QueueTrigger
{
    [FunctionName(nameof(QueueTrigger))]
    public async Task Run(
        [QueueTrigger("queue-processing-test", Connection = "AzureWebJobsStorage")] Activity activity,
        [DurableClient] IDurableOrchestrationClient starter,
        ILogger log)
    {
        log.LogInformation($"C# Queue trigger function processed: {activity.ActivityId}");
        string instanceId = await starter.StartNewAsync<Activity>(nameof(ActivityProcessingOrchestrator), activity);
        log.LogInformation($"Started orchestration with ID = '{instanceId}'.");

        // Spin until the orchestration is no longer pending or running.
        var status = await starter.GetStatusAsync(instanceId);
        do
        {
            status = await starter.GetStatusAsync(instanceId);
        } while (status.RuntimeStatus == OrchestrationRuntimeStatus.Running
              || status.RuntimeStatus == OrchestrationRuntimeStatus.Pending);
    }
}
which basically picks up the message, starts the orchestration, and then in a do/while loop waits while the status is Pending or Running.
Am I missing something here, or is there a better way of doing this? (I could not find much online.)
Thanks in advance for your comments or suggestions!
This might not work, since you could either hit timeouts (causing duplicate orchestration runs) or simply force your function app to scale out, defeating the purpose of your code altogether.
Instead, you could rely on the concurrency throttles that Durable Functions come with. The queue trigger would still queue up orchestration runs, but only the configured maximum would run at any time on a single instance of the function app.
This would still cause your function app to scale out, so you have to consider that as well when setting this limit, and you could also set the WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT app setting to control how many instances your function app can scale out to.
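For reference, a minimal host.json sketch of those throttles for the Durable Functions 2.x extension (the numbers are placeholders to tune for your workload):

{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "maxConcurrentOrchestratorFunctions": 5,
      "maxConcurrentActivityFunctions": 10
    }
  }
}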
It could be that the function app's built-in scaling throttling does not reduce load on downstream services, because it is per app and will just cause the app to scale out more. What is needed then is a distributed maximum instance count that all app instances adhere to. I have built this functionality into my Durable Function orchestration app with a scaleGroupId and its max instance count. It has an API call to save this info, and the scaleGroupId is a string that can be set to anything that describes the resource you want to protect from overloading. Here is my app that can do this:
Microflow

Automatic retries with the CosmosDB output binding

I'm using an Azure Function that sends an array of around 200 documents to Cosmos DB via the output binding. That function gets triggered about 1000 times at the same time by queue messages.
In some cases I get the "Request rate is large" error and the function execution fails. The documentation says that when this error occurs I can retry the execution after some milliseconds, but I suspect the Azure Functions runtime is doing that for me. I couldn't find any documentation explicitly saying that when the output binding throws that exception it will retry automatically (like with the .NET LINQ library).
Can someone point me to whether this is the case?
The output binding uses SDK 1.13.2, which already has the retry mechanism in place.
Assuming you are using Azure Functions v1: if you are using an IAsyncCollector, the Function will do an UpsertDocumentAsync for each AddAsync; if you are using a single document output, then the UpsertDocumentAsync happens once.
In any case, the SDK retries by default 9 times on a throttled result; after that, the exception is bubbled up and your Function will error. The message should go back to the queue for retrying as per the QueueTrigger design, and after a couple of iterations it goes to the poison queue.
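As an aside, if you construct the DocumentClient yourself rather than taking it from the binding, the throttling retry behaviour can be widened through ConnectionPolicy; a rough sketch (the endpoint, key, and limits are placeholders):

using System;
using Microsoft.Azure.Documents.Client;

var connectionPolicy = new ConnectionPolicy
{
    RetryOptions = new RetryOptions
    {
        // Defaults are 9 attempts / 30 seconds; raise them if you want the SDK
        // to keep retrying instead of bubbling the throttling exception up to you.
        MaxRetryAttemptsOnThrottledRequests = 20,
        MaxRetryWaitTimeInSeconds = 120
    }
};

var client = new DocumentClient(
    new Uri("https://<your-account>.documents.azure.com:443/"),
    "<your-auth-key>",
    connectionPolicy);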
If you want more granular control of the flow, you could obtain the DocumentClient and do the UpsertDocumentAsync yourself with a try/catch; if it fails more than 9 times, you can opt to send the message to another queue or retry another set of times. Something like:
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

[FunctionName("CosmosDBSample")]
public static async Task Run(
    [QueueTrigger("my-queue")] MyPOCOClass myMessage,
    [DocumentDB("test", "test", ConnectionStringSetting = "CosmosDB")] DocumentClient client,
    TraceWriter log)
{
    try
    {
        // Let the SDK's built-in throttling retries run; if they are exhausted,
        // the DocumentClientException surfaces here.
        await client.UpsertDocumentAsync(
            UriFactory.CreateDocumentCollectionUri("test", "test"), myMessage);
    }
    catch (DocumentClientException ex)
    {
        // retry / queue somewhere else?
        log.Warning($"DocumentClientException {ex.Message} in document {myMessage.Id}.");
    }
}

How to get the runtime status of a queue-triggered Azure Function?

My Azure Function calculates the results of certain request jobs (ca. 5 s to 5 min each), where each job has a unique jobId based on the hash of the request message. Execution leads to deterministic results, so it is functionally a "pure function". Therefore we cache the results of already-evaluated jobs in blob storage, keyed by the jobId. All great so far.
Now, if a request for a jobId comes in, three scenarios are possible:
Result is in the cache already => then it is served from the cache.
Result is not in the cache and no function is running the evaluation => new invocation
Result is not in the cache, but some function is already working on it => wait for result
We do some custom table-storage-based progress-tracking magic to tell whether a function is already working on a given jobId or not.
It works, somehow, up to the point of the 5-retries -> poison-queue scenario. There we are quite hopeless.
I feel like we are hacking around an already reliably implemented feature of the Azure Functions internals, because exactly the same information can be seen on the monitor page in the Azure portal, and used to be visible on the Kudu WebJobs monitor page.
How do I reliably find out in C# whether a given message (jobId) is currently being processed by some function, and when it is not?
Azure Durable Functions provide a mechanism to track the progress of execution of smaller tasks:
https://learn.microsoft.com/en-us/azure/azure-functions/durable-functions-overview
According to "Pattern #3: Async HTTP APIs", the orchestrator client can provide information about the function's status in a form like this:
{"runtimeStatus":"Running","lastUpdatedTime":"2017-03-16T21:20:47Z", ...}
This solves my problem of finding out whether a given message is being processed.
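As a hedged sketch of how this could map onto the jobId scenario with the Durable Functions 1.x API (the queue name, the "ProcessJob" orchestrator, and the JobRequest type are illustrative): using the jobId as the orchestration instance ID lets GetStatusAsync answer directly whether a given job is currently being processed.

[FunctionName("ProcessJob_QueueStart")]
public static async Task QueueStart(
    [QueueTrigger("jobs")] JobRequest job,
    [OrchestrationClient] DurableOrchestrationClient starter,
    TraceWriter log)
{
    // Look up the orchestration whose instance ID equals this jobId.
    var existing = await starter.GetStatusAsync(job.JobId);

    if (existing == null
        || existing.RuntimeStatus == OrchestrationRuntimeStatus.Completed
        || existing.RuntimeStatus == OrchestrationRuntimeStatus.Failed
        || existing.RuntimeStatus == OrchestrationRuntimeStatus.Terminated)
    {
        // Nothing is working on this job: start (or restart) it under the jobId.
        await starter.StartNewAsync("ProcessJob", job.JobId, job);
        log.Info($"Started orchestration for job {job.JobId}");
    }
    else
    {
        // An orchestration with this jobId is already Pending or Running.
        log.Info($"Job {job.JobId} is already being processed ({existing.RuntimeStatus})");
    }
}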
How to reliably find out in C# if a given message (jobId) is currently being processed by some function and when it is not?
If you'd like to detect which message is being processed and get the message ID in a queue-triggered Azure Function, you can try the following code:
#r "Microsoft.WindowsAzure.Storage"
using System;
using Microsoft.WindowsAzure.Storage.Queue;
public static void Run(CloudQueueMessage myQueueItem, TraceWriter log)
{
log.Info($"messageid: {myQueueItem.Id}, messagebody: {myQueueItem.AsString}");
}

Why does my continuous Azure WebJob run the function twice?

I have created my first Azure WebJob that runs continuously.
I'm not sure this is a code issue, but for the sake of completeness here is my code:
static void Main()
{
    var host = new JobHost();
    host.CallAsync(typeof(Functions).GetMethod("ProcessMethod"));
    host.RunAndBlock();
}
And for the function:
[NoAutomaticTrigger]
public static async Task ProcessMethod(TextWriter log)
{
    log.WriteLine(DateTime.UtcNow.ToShortTimeString() + ": Started");
    while (true)
    {
        // Fire-and-forget so each run starts on a fixed 60-second interval.
        Task.Run(() => RunAllAsync(log));
        await Task.Delay(TimeSpan.FromSeconds(60));
    }
    log.WriteLine(DateTime.UtcNow.ToShortTimeString() + ": Shutting down..");
}
Note that the async task fires off a task of its own. This was to ensure the runs start at accurately the same interval. The job itself downloads a URL, parses it, and inserts some data into a DB, but that shouldn't be relevant to the multiple-instance issue I am experiencing.
My problem is that once this has been running for roughly 5 minutes, a second ProcessMethod is called, which leaves me with two sessions simultaneously doing the same thing. The second method says it was "started from Dashboard", even though I am 100% confident I did not click anything to start it myself.
Has anyone experienced anything like this?
Change the instance count to 1 on the Scale tab of the Web App in the Azure portal. By default it is set to 2 instances, which is what causes it to run twice.
I can't explain why it's getting called twice, but I think you'd be better served with a triggered job on a CRON schedule (https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/#CreateScheduledCRON) instead of a continuous WebJob.
Also, it doesn't seem like you are using the WebJobs SDK, so you can skip it completely. Your WebJob can be as simple as a Main that directly does the work: no JobHost, no async, and generally easier to get right.
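If you take the scheduled route, one common way (assuming a triggered WebJob deployed to an App Service with Always On enabled) is a settings.job file next to the executable; for example, to run once a minute like the loop above:

{
  "schedule": "0 * * * * *"
}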

Azure Triggered Webjob - Detecting when webjob stops

I am developing a triggered WebJob that uses a TimerTrigger.
Before the WebJob stops, I need to dispose of some objects, but I don't know how to hook into the "WebJob stop" event.
With a NoAutomaticTrigger function, I know I can use the WebJobsShutdownWatcher class to handle the WebJob stopping, but with a triggered job I need some help...
I had a look at Extensible Triggers and Binders with Azure WebJobs SDK 1.1.0-alpha1.
Is it a good idea to create a custom trigger (StopTrigger) that uses the WebJobsShutdownWatcher class to fire an action before the WebJob stops?
OK, the answer was in the question:
Yes, I can use the WebJobsShutdownWatcher class, because its cancellation token exposes a Register function whose callback is invoked when the token is canceled, in other words when the WebJob is stopping.
static void Main()
{
    var cancellationToken = new WebJobsShutdownWatcher().Token;
    cancellationToken.Register(() =>
    {
        Console.Out.WriteLine("Do whatever you want before the webjob is stopped...");
    });

    var host = new JobHost();
    // The following code ensures that the WebJob will be running continuously
    host.RunAndBlock();
}
EDIT (based on Matthew's comment):
If you use triggered functions, you can add a CancellationToken parameter to your function signatures. The runtime will automatically cancel that token when the host is shutting down, allowing your function to receive the notification.
public static void QueueFunction(
    [QueueTrigger("QueueName")] string message,
    TextWriter log,
    CancellationToken cancellationToken)
{
    ...
    if (cancellationToken.IsCancellationRequested) return;
    ...
}
I was recently trying to figure out how to do this without the WebJobs SDK (which contains the WebJobsShutdownWatcher); this is what I found out.
What the underlying runtime does (and what the WebJobsShutdownWatcher referenced above checks) is create a local file at the location specified by the environment variable %WEBJOBS_SHUTDOWN_FILE%. If this file exists, it is essentially the runtime's signal to the WebJob that it must shut down within a configurable wait period (default of 5 seconds for continuous jobs, 30 for triggered jobs), otherwise the runtime will kill the job.
The net effect is: if you are not using the Azure WebJobs SDK, which contains the WebJobsShutdownWatcher as described above, you can still achieve a graceful shutdown of your Azure WebJob by monitoring for the shutdown file on an interval shorter than the configured wait period.
Additional details, including how to configure the wait period, are described here: https://github.com/projectkudu/kudu/wiki/WebJobs#graceful-shutdown
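A rough sketch of that file-watch approach without the SDK (the polling interval, messages, and DoOneUnitOfWork placeholder are illustrative):

using System;
using System.IO;
using System.Threading;

static void Main()
{
    // The variable holds the path where the runtime will create the shutdown file;
    // it is only set when the job runs under the App Service WebJobs host.
    var shutdownFile = Environment.GetEnvironmentVariable("WEBJOBS_SHUTDOWN_FILE");

    var stopping = false;
    while (!stopping)
    {
        DoOneUnitOfWork(); // placeholder for the job's real work

        // Check well inside the configured wait period (default 30 s for triggered jobs).
        if (!string.IsNullOrEmpty(shutdownFile) && File.Exists(shutdownFile))
        {
            stopping = true;
            Console.WriteLine("Shutdown file detected, cleaning up...");
            // dispose objects / flush state here
        }
        else
        {
            Thread.Sleep(TimeSpan.FromSeconds(1));
        }
    }
}

static void DoOneUnitOfWork() { /* ... */ }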
