Azure Function - Report a failure

I developed an Azure Function and deployed it in my subscription. Initially I had a hard time setting up the correct connection string, and that caused the function to fail functionally. But Azure reported the execution as successful because the exception generated by the code was handled.
How do you report a failure to the Azure Function status?
The status should be "Failed" instead of passing.
The problem is that the operations team would not learn about the failure unless they review each log!

A function execution is marked as failed if there was an uncaught exception. If there was an exception, but you handled it inside the function, it's still a success for the runtime.
To mark executions as failures, don't swallow exceptions: let them propagate, or rethrow them after logging.
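As a rough illustration of the difference, here is a minimal Python sketch. It does not use the Azure Functions SDK: `invoke` is a hypothetical stand-in for the host that only applies the rule described above (an execution fails when an exception escapes), and `store_report` with its connection-string check is an assumption made up for the example.

```python
import logging

logger = logging.getLogger("report_failure_sketch")


def store_report(connection_string: str) -> None:
    """Hypothetical unit of work that fails on a bad connection string."""
    if not connection_string.startswith("Endpoint="):
        raise ValueError("invalid connection string")


def run_swallowing(connection_string: str) -> None:
    """Catches and hides the error, so the caller sees a normal return."""
    try:
        store_report(connection_string)
    except ValueError:
        logger.exception("store_report failed")


def run_rethrowing(connection_string: str) -> None:
    """Logs the error, then re-raises so the caller sees the failure."""
    try:
        store_report(connection_string)
    except ValueError:
        logger.exception("store_report failed")
        raise


def invoke(function, *args) -> str:
    """Stand-in for the host: an execution fails only if an exception escapes."""
    try:
        function(*args)
    except Exception:
        return "Failed"
    return "Succeeded"
```

With a bad connection string, `invoke(run_swallowing, "bad")` still reports "Succeeded", while `invoke(run_rethrowing, "bad")` reports "Failed". Both variants write the same log entry; only the second lets the status reflect it.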

Related

Application Insights - Faulted Error code from Dependencies in Az blob storage

We are using Azure Blob Storage, and it has reached its maximum threshold a few times. Because of this we are getting a DNS error code in dependencies, but the dependency collector reports the result as Faulted.
How can we avoid this Faulted error code?
Please check the marked error code and share your thoughts.
• The ‘Faulted’ result code that you see in the Application Insights top response codes comes from the ‘DependencyCollection’ module tracking an Exception event along with a ‘DependencyTelemetry’ item when a client-side error such as a DNS failure occurs. Since you are also getting DNS error codes in dependencies when Azure Blob Storage reaches its maximum threshold, this is the common result code for that scenario, regardless of whether the Azure resource is blob storage or APIM.
• This error is therefore an exception that is sent to the user's ‘ikey’ along with the dependency telemetry. If this exception is not tracked, the only information ‘DependencyCollector’ has is that the call failed, and ‘resultCode’ is reported as "Faulted". As a result, you should modify the result code to be more useful before removing the actual exception.
For more detail on this ‘Faulted’ result code, see the SO thread and the GitHub discussion linked below, including their comments. They discuss the probable causes: a timeout on the ‘GET’ request resulting in thread starvation or poor application performance, or a lot of context switching resulting in a high thread pool count.
Azure AppInsights - Http Result code Faulted
https://github.com/microsoft/ApplicationInsights-dotnet/issues/1362#issuecomment-511488536

Azure alert on Azure Functions "Failed" metric is triggering with no apparent failures

I want an Azure Alert to trigger when a certain function app fails. I set it up as a GTE 1 threshold on the [function name] Failed metric thinking that would yield the expected result. However, when it runs daily I am getting notifications that the alert fired but I cannot find anything in the Application Insights to indicate the failure and it appears to be running successfully and completing.
Here is the triggered alert summary:
Here is the invocation monitoring from the portal showing that same function over the past few days with no failures:
And here is an application insights search over that time period showing no exceptions and all successful dependency actions:
The question is: what could be causing an Azure Function Failed metric to register non-zero values without any telemetry in Application Insights?
Update - here is the alert configuration
And the specific condition settings-
Failures blade for wider time range:
There are some dependency failures on a blob 404 but I think that is from a different function that explicitly checks for the existence of blobs at paths to know which files to download from an external source. Also the timestamps don't fall in the sample period.
No exceptions:
Per comment on the question by #ivan-yang I have switched the alerting to use a custom log search instead of the built-in Azure Function metric. At this point that metric seems to be pretty opaque as to what is triggering it and it was triggering every day when I ran the Azure Function with no apparent underlying failure. I plan to avoid this metric now.
My log based alert is using the following query for now to get what I was looking for (an exception happened or a function failed):
requests
| where success == false
| union (exceptions)
| order by timestamp desc
Thanks to #ivan-yang and #krishnendughosh-msft for the help

Azure Function Event Hub Trigger reliability

I'm a bit confused regarding the EventHubTrigger for Azure functions.
I've got an IoT Hub, and am using its eventhub-compatible endpoint to trigger an Azure function that is going to process and store the received data.
However, if my function fails (= throws an exception), the message (or messages) being processed during that function call will get lost. I would actually expect the Azure Functions runtime to process the messages again at a later time. Specifically, I would expect this behavior because the EventHubTrigger keeps checkpoints in the Function App's storage account in order to keep track of where in the event stream it has to continue.
The documentation of the EventHubTrigger even states:
If all function executions succeed without errors, checkpoints are added to the associated storage account
But still, even when I deliberately throw exceptions in my function, the checkpoints will get updated and the messages will not get received again.
Is my understanding of the EventHubTrigger's documentation wrong, or is the EventHubTrigger's implementation (or its documentation) wrong?
This piece of documentation does seem confusing. I guess they mean errors of the Function App host itself, not of your code. An exception inside a function execution doesn't stop processing or checkpointing.
The fact is that Event Hubs are not designed for individual message retries. The processor works in batches, and it can either mark the whole batch as processed (i.e. create a checkpoint after it), or retry the whole batch (e.g. if the process crashed).
See this forum question and answer.
If you still need to re-process failed events from Event Hub (and errors don't happen too often), you could implement such a mechanism yourself. E.g.:
• Add an output Queue binding to your Azure Function.
• Add a try-catch around the processing code.
• If an exception is thrown, add the problematic event to the Queue.
• Have another Function with a Queue trigger to process those events.
Note that the downside is that you will lose the ordering guarantee provided by Event Hubs (since the Queue message will be processed later than its neighbors).
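The steps above can be sketched roughly as follows. This is plain Python with no Azure bindings: `queue.Queue` stands in for the output Queue binding, and `handle_event` is a hypothetical handler invented for the example.

```python
from queue import Queue
from typing import Callable, Iterable, List


def process_batch(
    events: Iterable[str],
    handle: Callable[[str], None],
    retry_queue: "Queue[str]",
) -> int:
    """Process every event and park the ones that fail on retry_queue.

    Returns the number of events handled successfully.
    """
    succeeded = 0
    for event in events:
        try:
            handle(event)
            succeeded += 1
        except Exception:
            # One bad event must not fail the whole batch; queue it for later.
            retry_queue.put(event)
    return succeeded


def handle_event(event: str) -> None:
    """Hypothetical handler that rejects events marked as bad."""
    if event.startswith("bad"):
        raise ValueError(f"cannot process {event}")


def drain(retry_queue: "Queue[str]") -> List[str]:
    """Collect everything currently parked on the queue, in arrival order."""
    items = []
    while not retry_queue.empty():
        items.append(retry_queue.get())
    return items
```

A second function reading from the queue would then retry the parked events. As noted above, those events are handled later than their neighbors, so the original ordering is gone.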
Quick fix. A retry policy would not work if the downstream system is down for a few hours. You can call Process.GetCurrentProcess().Kill(); in the exception handler. This stops the checkpoint from moving forward. I have tested this with a consumption-based function app. You will not see anything in the logs, but I added an email to notify that something went wrong, and to avoid data loss I killed the function instance.
Hope this helps.
I will put up a blog post about it and about the other part of the workflow, where I use a Logic App to stop the function in case of continuous failure on a down system.

Automatic reboot whenever there's an uncaught exception in a continuous WebJob

I'm currently creating a continuous WebJob that polls an API and then forwards messages to an Azure Service Bus. I've managed to get this to work just fine, but I have one problem: what if my app crashes for whatever reason? What if there's an uncaught exception, or something goes wrong, and the app stops running? How do I get it to run again?
I created a test app which sends a message every second to the Service Bus, then on the 11th message crashes due to an intentionally placed NullReferenceException. I did this in order to investigate the behaviour when the app crashes.
What happens is that the app runs just fine for the first 10 seconds (as expected). Messages are being sent, and everything looks good. Then after the 10th second, when the exception occurs, nothing happens. No log in Azure saying there was an exception, no reboot - nothing. It just stands there as "running", but messages are no longer being sent.
How do I deal with this? It's essential that the application is able to reboot if it fails. Are there any standard ways to do this? Best practices?
Any help would be appreciated :)
It is always better to handle most failure scenarios in the system ourselves rather than let the hosting environment react to the failures.
My suggestion would be to check for exceptions in the code, for example with try-catch blocks in your executable script that catch the different kinds of failure scenarios, and instead of throwing the exceptions, log them yourself or perform a retry if required.
For example, when you get junk data to process and it fails, you can retry the operation, e.g. 3 times, and then finally push a log entry to a dead-letter account so that such junk inputs can be taken care of manually. Don't let the flow stop by throwing the exception; instead handle it yourself by logging a message that flags the need for manual intervention.
In a GUI or web application, if there is an exception, the flow is re-initiated by a user click and the system responds. But since this is a background processor, it is ideal to avoid all such control-flow blockers.
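A minimal sketch of the retry-then-dead-letter idea, in plain Python. The `dead_letters` list stands in for whatever dead-letter store you use, and `parse_number` is a hypothetical operation that fails on junk input; neither comes from the WebJobs SDK.

```python
from typing import Callable, List


def process_with_retry(
    item: str,
    operation: Callable[[str], None],
    dead_letters: List[str],
    max_attempts: int = 3,
) -> bool:
    """Try operation up to max_attempts times, then dead-letter the item.

    Returns True on success and False if the item was dead-lettered.
    The exception is never propagated, so the polling loop keeps running.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            operation(item)
            return True
        except Exception as error:
            print(f"attempt {attempt} failed for {item!r}: {error}")
    # All attempts failed: record the item for manual intervention.
    dead_letters.append(item)
    return False


def parse_number(item: str) -> None:
    """Hypothetical operation: fails with ValueError on junk input."""
    int(item)
```

Calling `process_with_retry("junk", parse_number, dead_letters)` logs three failed attempts and leaves "junk" in `dead_letters`, while the loop that called it carries on with the next item.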
Hope this would help.

Azure Worker Role writing unexpected error to Trace log storage

We have a worker role running in the cloud which polls an Azure CloudQueue periodically retrieving messages that a web role has put on there for us. Currently the worker role and web role are housed in the same Cloud Service application and currently we are only running one instance.
As we are testing we have our logging switched on and so the contents of the messages and other useful information appear in our cloud storage which we view using Cerebrata Azure Diagnostics Manager. (Great product btw)
DiagnosticMonitorConfiguration diagConfig = DiagnosticMonitor.GetDefaultInitialConfiguration();
diagConfig.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
It all appears to work remarkably well actually, however occasionally we see a Verbose message in the trace log which simply has "Fail" as the message. The code it appears to be generated from is wrapped in a try-catch, so it is odd that we aren't seeing the message through those means.
It would appear that something is happening that is out of our code's control, perhaps the worker role is being restarted, or the cloud op system is detecting a major error that only it can deal with by restarting our worker role. It recovers and carries on so it is somewhat of a mystery to us what might be happening.
What we haven't ascertained yet is whether we are losing a message.
Any help would be gratefully appreciated.
Cheers
Kindo Malay
Without the stack trace it's hard to say too much, but with the logging set to verbose it's quite likely that you're seeing some internal logging from one of the dlls you're using.
For example, if you run an Azure Table query that causes certain kinds of errors, the error will be logged 3 times because the storage client library is catching the error, tracing it out and then retrying.
If the error is not being caught by your try catch block, then it's likely nothing you need to worry about.
If deliverability of queue messages is important to you, you should ensure that you make use of the visibility timeout overload of CloudQueue.GetMessage and only delete the message when you've finished processing it. You may end up processing some messages twice, but at least you will process all of them.
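To show why this gives at-least-once processing, here is a toy in-memory model in Python. It is not the storage client library: `InMemoryQueue` and its methods are invented for the example, and time is passed in explicitly as `now` so the behaviour is easy to follow.

```python
import itertools
from typing import Dict, Optional, Tuple


class InMemoryQueue:
    """Toy queue with visibility timeouts; time is passed in explicitly."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        # message id -> (body, time at which the message becomes visible)
        self._messages: Dict[int, Tuple[str, float]] = {}

    def add_message(self, body: str) -> int:
        message_id = next(self._ids)
        self._messages[message_id] = (body, 0.0)
        return message_id

    def get_message(
        self, now: float, visibility_timeout: float
    ) -> Optional[Tuple[int, str]]:
        """Return the first visible message and hide it for visibility_timeout."""
        for message_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                self._messages[message_id] = (body, now + visibility_timeout)
                return message_id, body
        return None

    def delete_message(self, message_id: int) -> None:
        """Remove a message permanently; call this only after processing."""
        self._messages.pop(message_id, None)
```

If a worker takes a message at time 0 with a 30 second timeout and then crashes without deleting it, nobody sees the message until time 30, after which the next `get_message` returns it again. A worker that finishes and calls `delete_message` removes it for good. That is where both the possible double processing and the guarantee that nothing is lost come from.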
If your role instance is getting restarted after running for a while, it's often because your process exited due to an unhandled exception.
