Azure Functions notification on failure - azure

I have timer-triggered Azure functions running in production, but now I want to be notified if the function fails.
In my case, access to various connected services can cause crashes, and there are many to troubleshoot. The crash is the type of error I need notification for.
When the function does fail, the log entry indicates failure, so I wonder if there is a hook in the system that would allow me to cause the system to generate a notification.
I know that blob and queue bindings, for instance, support the creation of poison queue entries, but timer trigger binding doesn't say anything about any trigger outputs of that nature.
I see that functions can pass their $return status as input to other functions, but that operation is not explained in depth in the docs. Also, in that case, I need to write another function to process the error status, and I was looking for something built-in.
I Have inquired with #AzureSupport on this, but their answer had nothing to do with Azure Functions, instead referring me to DLL notification hooks, then recommending I file on uservoice.
I'm sure there must be people here who have implemented some sort of error status notification. I prefer a solution that doesn't require code.

The recommended way to monitor and alert on failures is to use AppInsights which integrates fully with Azure Functions now
https://blogs.msdn.microsoft.com/appserviceteam/2017/04/06/azure-functions-application-insights/
Since all the logs are available in AppInsights it's easy to monitor for failures and setup alerts based on your own criteria.
However, if you only care about alerting and not things like monitoring etc, you could use Azure Monitor instead: https://learn.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-get-started

When the function does fail, the log entry indicates failure, so I wonder if there is a hook in the system that would allow me to cause the system to generate a notification.
...
I prefer a solution that doesn't require code.
This is a zero-code solution:
I poked #AzureFunctions once before on this topic, and a suggested response was to use Application Insights. It can handle the alerts upon failure and also can use webhooks.
See the Azure Functions App-Insights documentation on how to link your function app to App Insights. Then set up any alerts you want.

Unfortunately this hook doesn't exist.
Can you switch from a timer trigger to a queue trigger?
You can get retries (if you want them), and after the specified number of attempts the message is sent to a poison queue.
To schedule executions you can add queue messages with a visibility timeout to match your schedule.

In order to get alerts on failure you have two options:
A timer trigger than scans the execution logs (via SFTP) for failures.
Wrap the whole function in a try/catch block and in the catch block write a few lines to send you an email with the error details.
Hope this helps.

No code:
Go to your azure cloud account
From the menu select Monitor
Then select Add New Rule
Then Select your condition, action and add the alert details.

Related

Is there a way to programmatically restart an azure function

I have an Azure function running on a timer every few minutes that after a varied amount of time of running will begin to fail every time it runs because of an external API and hitting the restart button manually in the azure portal fixes the problem and the job works again.
Is there a way to either get an azure function to restart itself or have something externally restart an azure function via a web hook or API request or running on a timer
I have tried using Azures API Management service which can be used to restart other kinds of app services in azure but it turns out there is no functionality in the API to request a restart of an azure function, Also looked into power shell and it seems to be the same problem you can restart different app services but not azure functions
i have tried working with the API
https://learn.microsoft.com/en-us/rest/api/azure/
Example API request where you can list functions within an azure function
GET https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Web/sites/{name}/functions?api-version=2016-08-01
but there is no functionality to restart an azure function from what i have researched
Basically i want to Restart the Azure function as if i was to hit this button
Azure functions manual stop/start and restart buttons in azure portal
because there is a case where the job gets into a bad state every time it runs because of an external API i have no control over and hitting restart manually gets the job going again
Another way to restart your function is by using the "watchDirectories" setting in the host.json file. If your host.json looks like this:
{
"version": "2.0",
"watchDirectories": [ "Toggle" ]
}
You could toggle a restart by using following statement in a function:
System.IO.File.WriteAllText("D:/home/site/wwwroot/Toggle/restart.conf", DateTime.Now.ToString());
Looking at the logs, the function reloads as it has detected the file change in the directory:
Watched directory change of type 'Changed' detected for 'D:\home\site\wwwroot\Toggle\restart.conf'
Host configuration has changed. Signaling restart
Azure functions by their nature are called upon an event. That may be a timer, a trigger or invocation like a HTTP event. They cannot be restarted per se, i.e. if you a function throws and exception, you cannot find the specific instance and re-run it using the out of the box functionality.
However, you can engineer your way to a more reliable solution:
Replay the event that invoked the function (i.e. kick it off again)
For non-sensitive data, log the payload of the function and create a another function that can be called on demand to re-run it. I.e. you create a proxy to "re-invoke" the function.
Harden your code by implementing a retry policy. See Polly.
Add a service bus in to your architecture. Have a simple function to write the call payload to a message bus payload. Have another function to pick up the payload and process it more extensively where there may be unreliable integrations etc). That way if the call fails you can abandon and dead letter failures for later reprocessing.
Consider using Durable Function Extensions and leveraging the durable patterns, these can help make your functions code more robust and manage state.
Why don't you try below ARM API. Since Azure function also fall under App service category, sometimes this may be helpful,
https://learn.microsoft.com/en-us/rest/api/appservice/webapps/restart

Azure Function Event Hub Trigger reliability

I'm a bit confused regarding the EventHubTrigger for Azure functions.
I've got an IoT Hub, and am using its eventhub-compatible endpoint to trigger an Azure function that is going to process and store the received data.
However, if my function fails (= throws an exception), that message (or messages) being processed during that function call will get lost. I actually would expect the Azure function runtime to process the messages at a later time again. Specifically, I would expect this behavior because the EventHubTrigger is keeping checkpoints in the Function Apps storage account in order to keep track of where in the event stream it has to continue.
The documention of the EventHubTrigger even states that
If all function executions succeed without errors, checkpoints are added to the associated storage account
But still, even when I deliberately throw exceptions in my function, the checkpoints will get updated and the messages will not get received again.
Is my understanding of the EventHubTriggers documentation wrong, or is the EventHubTriggers implementation (or its documentation) wrong?
This piece of documentation seems confusing indeed. I guess they mean the errors of Function App host itself, not of your code. An exception inside function execution doesn't stop the processing and checkpointing progress.
The fact is that Event Hubs are not designed for individual message retries. The processor works in batches, and it can either mark the whole batch as processed (i.e. create a checkpoint after it), or retry the whole batch (e.g. if the process crashed).
See this forum question and answer.
If you still need to re-process failed events from Event Hub (and errors don't happen too often), you could implement such mechanism yourself. E.g.
Add an output Queue binding to your Azure Function.
Add try-catch around processing code.
If exception is thrown, add the problematic event to the Queue.
Have another Function with Queue trigger to process those events.
Note that the downside of this is that you will loose ordering guarantee provided by Event Hubs (since Queue message will be processed later than its neighbors).
Quick fix. As retry policy would not work if down system is down for few hours. You can call Process.GetCurrentProcess().Kill(); in exception handling. This would stop the checkpoint moving forward. I have tested this with consumption based function app. You will not see anything in logs but i added email to notify that something went wrong and to avoid data loss i have killed the function instance.
Hope this helps.
Would put an blog over it and other part of workflow where I stop function in case of continuous failure on down system using logic app.

Azure iothub can't triggering multiple Azure functions

I have azure-iot-hub
Spectraqual-free.azure-devices.net
triggering a function-app
https://spectralqual.azurewebsites.net -consumption plan-
as event hub triggers its functions, and have bindings with azure-table-storage
https://spectralqualstorage.table.core.windows.net/
The problem is that the functions not work smoothly, something goes wrong, from the functions log I can see that some of my functions triggered but didn't receive eventhubmessage and accordingly not updating the bind table storage,
others not triggered at all after it was triggered successfully before,
I developed those functions first with Visual studio code, after debugging and make sure it's free of error, I pushed them to the cloud using github sync, functions that was working -triggered- before on visual studio code, I can't make it triggered again on VScode, and I'm sure that iot-hub is receiving my messages, so the messages are delivered but no trigger happen, any help please!
If multiple Functions are bound to the same Event Hub, and you want each Function to process all events of that Hub, you have to put each Function to a dedicated Consumer Group.
Otherwise, Functions will compete for events, and the one which locks a given partition at a given time will be the only Function to get events from that partition.
Read more about Consumer Groups in docs. Use consumerGroup property of the binding to set its value for each Function.

Can i execute an on-demand web job from a scheduled webjob?

I need to execute a long running webjob on certain schedules or on-demand with some parameters that need to be passed. I had it in a way where the scheduled webjob would put a message on the queue with the parameters and a queue message triggered job would take over - OR - some user interaction would put the same message on the queue with the parameters and the triggered job would take over. However for some reason the triggered function never-finishes - and right now i cannot see any exceptions being displayed in the dashboard outputs (see Time limit on Azure Webjobs triggered by Queue)
I m looking into whether I can execute my triggered webjob as an On-demand webjob and pass the parameters to it? Is there anyway to call an on-demand web job from a scheduled web job and pass it some command line parameters?
Thanks for your help!
QueueTriggered WebJob functions work very well when configured properly. Please see my answer on the other question which points to documentation resources on how to set your WebJobs SDK Continuous host up properly.
Queue messaging is the correct pattern for you to be using for this scenario. It allows you to pass arbitrary data along to your job, and will also allow you to scale out to multiple instances as needed when your load increases.
You can use the WebJobs Dashboard to invoke your job function directly (see "Run Function" button below) - you can specify the queue message input directly in the Dashboard as a string. This allows you to invoke the function directly as needed with whatever inputs you want, in addition to allowing the function to continue to respond to qeueue messages actually added to the queue.

Requeue or delete messages in Azure Storage Queues via WebJobs

I was hoping if someone can clarify a few things regarding Azure Storage Queues and their interaction with WebJobs:
To perform recurring background tasks (i.e. add to queue once, then repeat at set intervals), is there a way to update the same message delivered in the QueueTrigger function so that its lease (visibility) can be extended as a way to requeue and avoid expiry?
With the above-mentioned pattern for recurring background jobs, I'm also trying to figure out a way to delete/expire a job 'on demand'. Since this doesn't seem possible outside the context of WebJobs, I was thinking of maybe storing the messageId and popReceipt for the message(s) to be deleted in Table storage as persistent cache, and then upon delivery of message in the QueueTrigger function do a Table lookup to perform a DeleteMessage, so that the message is not repeated any more.
Any suggestions or tips are appreciated. Cheers :)
Azure Storage Queues are used to store messages that may be consumed by your Azure Webjob, WorkerRole, etc. The Azure Webjobs SDK provides an easy way to interact with Azure Storage (that includes Queues, Table Storage, Blobs, and Service Bus). That being said, you can also have an Azure Webjob that does not use the Webjobs SDK and does not interact with Azure Storage. In fact, I do run a Webjob that interacts with a SQL Azure database.
I'll briefly explain how the Webjobs SDK interact with Azure Queues. Once a message arrives to a queue (or is made 'visible', more on this later) the function in the Webjob is triggered (assuming you're running in continuous mode). If that function returns with no error, the message is deleted. If something goes wrong, the message goes back to the queue to be processed again. You can handle the failed message accordingly. Here is an example on how to do this.
The SDK will call a function up to 5 times to process a queue message. If the fifth try fails, the message is moved to a poison queue. The maximum number of retries is configurable.
Regarding visibility, when you add a message to the queue, there is a visibility timeout property. By default is zero. Therefore, if you want to process a message in the future you can do it (up to 7 days in the future) by setting this property to a desired value.
Optional. If specified, the request must be made using an x-ms-version of 2011-08-18 or newer. If not specified, the default value is 0. Specifies the new visibility timeout value, in seconds, relative to server time. The new value must be larger than or equal to 0, and cannot be larger than 7 days. The visibility timeout of a message cannot be set to a value later than the expiry time. visibilitytimeout should be set to a value smaller than the time-to-live value.
Now the suggestions for your app.
I would just add a message to the queue for every task that you want to accomplish. The message will obviously have the pertinent information for processing. If you need to schedule several tasks, you can run a Scheduled Webjob (on a schedule of your choice) that adds messages to the queue. Then your continuous Webjob will pick up that message and process it.
Add a GUID to each message that goes to the queue. Store that GUID in some other domain of your application (a database). So when you dequeue the message for processing, the first thing you do is check against your database if the message needs to be processed. If you need to cancel the execution of a message, instead of deleting it from the queue, just update the GUID in your database.
There's more info here.
Hope this helps,
As for the first part of the question, you can use the Update Message operation to extend the visibility timeout of a message.
The Update Message operation can be used to continually extend the
invisibility of a queue message. This functionality can be useful if
you want a worker role to “lease” a queue message. For example, if a
worker role calls Get Messages and recognizes that it needs more time
to process a message, it can continually extend the message’s
invisibility until it is processed. If the worker role were to fail
during processing, eventually the message would become visible again
and another worker role could process it.
You can check the REST API documentation here: https://msdn.microsoft.com/en-us/library/azure/hh452234.aspx
For the second part of your question, there are really multiple ways and your method of storing the id/popReceipt as a lookup is a possible option, you can actually have a Web Job dedicated to receive messages on a different queue (e.g plz-delete-msg) and you send a message containing the "messageId" and this Web Job can use Get Message operation then Delete it. (you can make the job generic by passing the queue name!)
https://msdn.microsoft.com/en-us/library/azure/dd179474.aspx
https://msdn.microsoft.com/en-us/library/azure/dd179347.aspx

Resources