Azure alerts not firing on 4xx rule

I have created an App Service on Azure that runs on Tomcat. I'm using metrics for monitoring and set up an alert rule that should send me an email when a 4xx error occurs. Nothing happens, even though I've generated more errors than the rule needs to fire the alert.
Thanks, Dominik

According to your screenshot and the Metric Definitions reference, it seems normal that no mail was sent: your alert rule means the count of HTTP 4xx events is greater than or equal to 5 over the last 5 minutes, not the average count per minute. You can try adjusting the threshold value or shortening the period so the condition is clearly satisfied, and then check whether the mail is sent. Meanwhile, if you doubt whether the alert trigger works at all, you can retrieve the logs via the Azure CLI or PowerShell.
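If the App Service's diagnostic settings route metrics to a Log Analytics workspace, you can also verify directly whether the condition was ever met. A minimal KQL sketch, assuming the standard AzureMetrics table and the built-in Http4xx metric name:

AzureMetrics
| where MetricName == "Http4xx"                           // App Service 4xx metric
| summarize Errors = sum(Total) by bin(TimeGenerated, 5m) // same 5-minute window as the rule
| where Errors >= 5                                       // the rule's threshold
| order by TimeGenerated desc

If this returns no rows, the rule's condition was never satisfied and the missing email is expected.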

Related

Azure alert on Azure Functions "Failed" metric is triggering with no apparent failures

I want an Azure Alert to trigger when a certain function app fails. I set it up as a GTE 1 threshold on the [function name] Failed metric, thinking that would yield the expected result. However, when it runs daily, I get notifications that the alert fired, but I cannot find anything in Application Insights to indicate the failure, and the function appears to be running successfully and completing.
Here is the triggered alert summary:
Here is the invocation monitoring from the portal showing that same function over the past few days with no failures:
And here is an application insights search over that time period showing no exceptions and all successful dependency actions:
The question is: what could be causing an Azure Function Failed metric to register non-zero values without any telemetry in Application Insights?
Update: here is the alert configuration
And the specific condition settings:
Failures blade for wider time range:
There are some dependency failures (a blob 404), but I think those come from a different function that explicitly checks for the existence of blobs at certain paths to know which files to download from an external source. Also, the timestamps don't fall within the sample period.
No exceptions:
Per a comment on the question by @ivan-yang, I have switched the alerting to use a custom log search instead of the built-in Azure Function metric. At this point that metric seems pretty opaque as to what triggers it, and it was firing every day when I ran the Azure Function with no apparent underlying failure. I plan to avoid this metric now.
My log based alert is using the following query for now to get what I was looking for (an exception happened or a function failed):
requests
| where success == false    // invocations recorded as failed
| union (exceptions)        // include logged exceptions as well
| order by timestamp desc
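If multiple apps report to the same Application Insights resource, a scoped variant may be useful; this is a sketch where 'my-function-app' is a hypothetical stand-in for your app's cloud_RoleName:

requests
| where cloud_RoleName =~ 'my-function-app'   // hypothetical app name, replace with yours
| where success == false
| union (exceptions | where cloud_RoleName =~ 'my-function-app')
| order by timestamp desc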
Thanks to @ivan-yang and @krishnendughosh-msft for the help

Azure Monitor alert is sending false failure email notifications of failure count of 1 for Functions app

I have a Functions app where I've configured signal logic to send me an alert whenever one or more failures have occurred in my application. I have been getting emails every day saying my Azure Monitor alert was triggered, followed by an email later saying that the failure was resolved. I know that my app didn't fail because I checked in Application Insights. For instance, I did not have a failure today, but did have failures the prior 2 days:
However, I did receive a failure email today. If I go to configure the signal logic, where I set a static threshold of failure count greater than or equal to 1, it shows this:
Why is it showing a failure for today, when I know that isn't true from the Application Insights logs? Also, if I change the signal logic to look at total failures instead of count of failures, it looks correct:
I've decided to use the total failures metric instead, but it seems that the count functionality is broken.
Edit:
Additional screenshot:
I suggest using Custom log search as the signal if you have already connected your function app to Application Insights (I use this kind of signal and don't see behavior like yours).
The steps are as follows:
Step 1: For the signal, select Custom log search. The screenshot is below:
Step 2: When the Azure function times out, it throws an error of type Microsoft.Azure.WebJobs.Host.FunctionTimeoutException, so you can use the query below to check whether it timed out:
exceptions
| where type == "Microsoft.Azure.WebJobs.Host.FunctionTimeoutException"
Put the above query in the "Search query" field, and configure the other settings as needed. The screenshot is below:
Then configure the other settings, like the action group. Please let me know if you still have this issue.
One thing should be noted: some kinds of triggers support retry logic, like the blob trigger. So if it retries, you can also receive the alert email. But you can disable the retry logic as per this doc.
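If you also want the log-based alert to mirror the original "failure count greater than or equal to 1" condition rather than only timeouts, a minimal sketch (assuming the app is connected to Application Insights) could be:

requests
| where success == false                              // invocations Application Insights recorded as failed
| summarize FailedCount = count() by operation_Name   // one row per failing function

With the alert logic set to fire when the number of results is greater than or equal to 1, this only alerts on failures that actually produced telemetry.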

How to fetch the IIS Start log for a corresponding IIS Stop log in Azure Log Analytics outside of the alert's monitoring time period

I'm working on configuring an Azure Log Analytics alert (using KQL) to capture the IIS Stop and Start events (from the Event table) in my OMS workspace. If the alert query finds that there's no corresponding IIS Start event log generated from a PaaS role for a particular IIS Stop event log, the user should be notified by an alert so that they can bring IIS back up.
Problem: let's say I set up my alert with a time period and frequency of 15 minutes. If the alert triggers at 10:30 AM, it will scan the IIS logs from 10:15:01 AM to 10:29:59 AM. Now suppose an IIS Stop event gets logged around 10:28 AM; the respective IIS Start log (if any) will be logged a couple of minutes later, around 10:31 or 10:32 AM, and hence it will fall outside the alert's monitoring time period. This creates a false positive: IIS was started back up, but my alert didn't capture the Start event log. It might thus lead to some unnecessary IIS Start/Reset operations on my PaaS roles.
Attaching a representative quick sketch to explain it figuratively.
Please let me know if there's any possible approach to achieve this. Any suggestions are welcome. Thanks in advance!
The current implementation is as follows.
Here we can see a false alert generated at 10:30.
You can use the approach below, where we select the last 10 minutes of data (overlapped) every 5 minutes.
For the case below, you can generate the alert.
See if this helps you.
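To make the overlapped-window approach concrete, here is a minimal KQL sketch. The Event-table filters below are assumptions (Service Control Manager messages for the W3SVC service); adjust them to whatever your IIS Stop/Start logs actually contain:

Event
| where TimeGenerated > ago(10m)                       // 10-minute window, alert runs every 5 minutes
| where RenderedDescription has "World Wide Web Publishing Service"
| where RenderedDescription has "stopped"
| join kind=leftanti (
    Event
    | where TimeGenerated > ago(10m)
    | where RenderedDescription has "World Wide Web Publishing Service"
    | where RenderedDescription has "running"
) on Computer                                          // keep stops with no matching start on the same machine

Because each run re-reads the previous 5 minutes, a Stop at 10:28 and its Start at 10:31 land in the same evaluation, which removes the false positive described above.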

Monitoring Timer Triggered Azure Functions For Inactivity

We have a couple of functions in a function app. Two of them are triggered by a timer, do some processing and write to queues to trigger other functions.
They normally worked very well until recently, when the timer trigger just stopped triggering. We fixed this by restarting the application. The problem is that we were completely unaware of the trigger stopping, as there were no failures and the function app is not constantly 'looked at' by our people.
I'd like to configure automatic monitoring and alerting for this special case. I configured Application Insights for the function app and tried to write an alert that watches the count metric of the timer-triggered functions. If the metric falls below the set threshold (below 1 in the last 5 minutes), the alert should be triggered.
I tested this by simply stopping the function app. My reasoning was that a function app that does not run should fulfill this condition and trigger an alert within a reasonable time frame. Unfortunately, this was not the case. Apparently a non-existent count is not measured, and the alert will never be triggered.
Has anyone else experienced a similar problem and found a way to work around it?
I've added an Application Insights alert:
Type: Custom log search
Search query:
requests
| where cloud_RoleName =~ '<FUNCTION_APP_NAME_HERE>' and name == '<FUNCTION_NAME_HERE>'
Alert logic: Number of results less than 1
Evaluated based on: Over last N hours, Run every M hours
The alert fires if there are no invocations over the last N hours.
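As a concrete instantiation, for a timer that fires every 30 minutes you might evaluate over the last hour and run every 30 minutes. A sketch with hypothetical names filled in:

requests
| where timestamp > ago(1h)                  // assumed window; the alert's own period may already bound this
| where cloud_RoleName =~ 'my-function-app'  // hypothetical function app name
| where name == 'MyTimerFunction'            // hypothetical timer function name

With "Number of results less than 1", this fires when the timer produced no invocations in the window, which also covers the stopped-app case that the metric alert missed.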

Get messages from a queue only retrieves a single message

I've created an Azure Service Bus and a new Logic App using a manual trigger. I then added a "Get messages from a queue (peek-lock)" action to the app and set the maximum message count to "20".
I then manually created 5 new messages in my queue and triggered my new Logic App. When I look at the execution of my app, I see that only ONE message was retrieved (and I checked that 4 messages are still in my queue).
It seems like the count of "20" is not being honored. I also checked the settings of my Service Bus queue, and the "Maximum Delivery Count" is set to "10". That should at least give me batches of 10 (instead of 20).
What am I missing?
This is not simple to answer without more details, but I hope this helps.
If you are using a WebJob, make sure the associated AzureWebJobsStorage account is created in Classical mode, not Remote mode; the latter can make your WebJob crash in less than 20 seconds, without reading all the queue messages.
Does your Logic App involve a ServiceBusTrigger? If so, it may be that the first call to your method marked with the trigger fails with an exception, and the other messages are never read.
Let me know if I misunderstood some details.
