Application Insights Alert Trigger History - azure

We use Application Insights to send our metrics, and we have alerts defined on those metrics via custom queries.
The alerts are working fine. I'd like to pull data about when the alerts triggered and use it for analysis.
Explanation:
I have alerts A, B, C, D, E, ...
Over a given period, A triggered 5 times, B 3 times, D 10 times, ...
For that period, I want to see which failure happened most frequently so that appropriate action can be taken.
Where can I find this information? I'm not looking for the Monitor tab, as it gives only a very basic view.
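To make the desired analysis concrete, here is a minimal Python sketch of the tally itself. It does not answer where the trigger history lives; it assumes the fired-alert records have already been exported somehow, and the record shape (a dict with a "rule" key) is purely hypothetical.

```python
from collections import Counter

# Hypothetical export of fired-alert records, one entry per firing.
# The counts mirror the example in the question: A x5, B x3, D x10.
fired_alerts = [
    {"rule": name}
    for name, times in (("A", 5), ("B", 3), ("D", 10))
    for _ in range(times)
]


def count_by_rule(records):
    """Return (rule, count) pairs, most frequently fired first."""
    return Counter(record["rule"] for record in records).most_common()


for rule, count in count_by_rule(fired_alerts):
    print(f"{rule}: fired {count} times")
```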

Related

Azure alert on Azure Functions "Failed" metric is triggering with no apparent failures

I want an Azure Alert to trigger when a certain function app fails. I set it up as a GTE 1 threshold on the [function name] Failed metric, thinking that would yield the expected result. However, when the function runs daily I get notifications that the alert fired, but I cannot find anything in Application Insights to indicate a failure, and the function appears to be running successfully and completing.
Here is the triggered alert summary:
Here is the invocation monitoring from the portal showing that same function over the past few days with no failures:
And here is an application insights search over that time period showing no exceptions and all successful dependency actions:
The question is: what could cause an Azure Function Failed metric to register non-zero values without any telemetry in Application Insights?
Update - here is the alert configuration
And the specific condition settings-
Failures blade for wider time range:
There are some dependency failures on a blob 404 but I think that is from a different function that explicitly checks for the existence of blobs at paths to know which files to download from an external source. Also the timestamps don't fall in the sample period.
No exceptions:
Per comment on the question by #ivan-yang I have switched the alerting to use a custom log search instead of the built-in Azure Function metric. At this point that metric seems to be pretty opaque as to what is triggering it and it was triggering every day when I ran the Azure Function with no apparent underlying failure. I plan to avoid this metric now.
My log based alert is using the following query for now to get what I was looking for (an exception happened or a function failed):
requests
| where success == false
| union (exceptions)
| order by timestamp desc
Thanks to #ivan-yang and #krishnendughosh-msft for the help

Azure Monitor alert is sending false failure email notifications of failure count of 1 for Functions app

I have a Functions app where I've configured signal logic to send me an alert whenever a failure count greater than or equal to one occurs in my application. I have been getting emails every day saying my Azure Monitor alert was triggered, followed later by an email saying that the failure was resolved. I know that my app didn't fail because I checked in Application Insights. For instance, I did not have a failure today, but did have failures on the prior 2 days:
However, I did receive a failure email today. If I go to configure the signal logic where I set a static threshold of failure count greater than or equal to 1 it shows this:
Why is it showing a failure for today, when I know that isn't true from the Application Insights logs? Also, if I change the signal logic to look at total failures instead of count of failures, it looks correct:
I've decided to use the total failures metric instead, but it seems that the count functionality is broken.
Edit:
Additional screenshot:
I suggest using Custom log search as the signal if you have already connected your function app to Application Insights (this is the kind of signal I use, and I don't see the behavior you describe).
The steps are as follows:
Step 1: For the signal, select Custom log search. The screenshot is below:
Step 2: When the Azure Function times out, it throws an error of type Microsoft.Azure.WebJobs.Host.FunctionTimeoutException, so you can use the query below to check whether it timed out:
exceptions
| where type == "Microsoft.Azure.WebJobs.Host.FunctionTimeoutException"
Put the above query in the "Search query" field, and configure other settings as per your need. The screenshot is as below:
Then configure other settings such as the action group. Please let me know if you still have this issue.
One thing to note: some kinds of triggers support retry logic, such as the blob trigger. So if the function retries, you may also receive the alert email. You can disable the retry logic as per this doc.

How to fetch IIS Start log for a corresponding IIS Stop log in Azure Log Analytics outside of Alert's monitoring time period

I'm working on configuring an Azure Log Analytics alert (using KQL) to capture the IIS Stop and Start events (from the Events table) in my OMS Workspace. If the alert query finds no corresponding IIS Start event from a PaaS Role for a particular IIS Stop event, the user should be notified by an alert so that they can bring IIS back up.
Problem: Let’s say I set up my alert to run with a Time Period and Frequency of 15 minutes. If the alert triggers at 10:30 AM, it will scan the IIS logs from 10:15:01 AM to 10:29:59 AM. Now, suppose an IIS Stop event is logged at around 10:28 AM. The respective IIS Start event (if any) will be logged a couple of minutes later, at around 10:31 AM or 10:32 AM, and hence falls outside the alert’s monitoring time period. This creates a false positive (IIS started again, but my alert didn’t capture the Start event), which might lead to unnecessary IIS Start/Reset operations on my PaaS roles.
Attaching a representative quick sketch to explain it figuratively.
Please let me know if there's any possible approach to achieve this. Any suggestions are welcome. Thanks in advance!
Current implementation as follows.
Here we can see False Alert generated at 10:30.
You can use the approach below, where every 5 minutes we select the last 10 minutes of data (overlapping windows).
In the case below, the alert can be generated:
See if this helps.
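The overlapping-window idea can be sketched in Python as follows. This is only an illustration under assumptions: the event tuples and role name are hypothetical, and the rule of skipping Stop events from the most recent 5 minutes (so that their Start has time to arrive before they are judged) is an added assumption, not something stated in the answer.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)    # how far back each evaluation looks
FREQUENCY = timedelta(minutes=5)  # how often the alert is evaluated


def unmatched_stops(events, now):
    """Return (timestamp, role) for Stop events with no later Start.

    events: list of (timestamp, role, kind) tuples, kind is "Stop" or "Start".
    Only Stops in the older half of the window are judged; newer ones are
    left for the next evaluation (assumption made for this sketch).
    """
    window_start = now - WINDOW
    cutoff = now - FREQUENCY
    in_window = [e for e in events if window_start <= e[0] < now]
    result = []
    for ts, role, kind in in_window:
        if kind != "Stop" or ts >= cutoff:
            continue
        restarted = any(
            k == "Start" and r == role and t > ts for t, r, k in in_window
        )
        if not restarted:
            result.append((ts, role))
    return result


# The scenario from the question: Stop at 10:28, Start at 10:31.
day = datetime(2020, 1, 1)
events = [
    (day.replace(hour=10, minute=28), "web0", "Stop"),
    (day.replace(hour=10, minute=31), "web0", "Start"),
]
print(unmatched_stops(events, day.replace(hour=10, minute=30)))
print(unmatched_stops(events, day.replace(hour=10, minute=35)))
```

With these settings each Stop is judged in exactly one evaluation, the one in which it falls into the older half of the window, so a missing Start raises a single alert rather than one per run.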

AppInsights - Monitor for Hung Processes

We are looking at implementing AppInsights for our non-web application. One of the things that we want to monitor for is processes that may be "hung" for more than N number of seconds or minutes. I have been unable to find something built in that does this. The closest thing I have seen or thought of would be to log 2 custom events for the start and end of a process, and then have an alert for a custom log that queries events with no matching "end" event after N minutes.
Is there another way to monitor for hung processes using AppInsights that I am not seeing? Thanks for any help.
If you choose to use Application Insights, here is a suggestion for your reference (if you have a better solution, you can ignore this):
As per this post, you can leverage the heartbeat feature. Details below:
If this application runs for more than several seconds, you can leverage the heartbeat
feature - it sends a metric every N minutes/seconds (configurable), and the absence of such
a metric will indicate that the application is no longer actively running. However, if
the Application Insights thread survives, then the heartbeat will still be reported.
You can rely on the presence/absence of telemetry from this app in general, as well as
a couple of custom events as you outlined above - Azure Monitor allows setting an alert on
an analytics query, so you'll be able to craft a query that returns nothing in case of
application issues and set an alert on a 0 count returned by such a query.
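The start/end custom-event idea from the question can be sketched in Python like this. It is a minimal illustration, not an Application Insights API: the event names "ProcessStart" and "ProcessEnd" and the tuple layout are hypothetical, and in practice the same matching would be expressed in the alert's log query.

```python
from datetime import datetime, timedelta


def hung_processes(events, now, max_age):
    """Return ids of processes that started more than max_age ago and never ended.

    events: iterable of (timestamp, process_id, name) tuples, where name is
    "ProcessStart" or "ProcessEnd" (hypothetical custom event names).
    """
    started = {}
    ended = set()
    for ts, process_id, name in events:
        if name == "ProcessStart":
            started[process_id] = ts
        elif name == "ProcessEnd":
            ended.add(process_id)
    return sorted(
        pid
        for pid, ts in started.items()
        if pid not in ended and now - ts > max_age
    )


t0 = datetime(2020, 1, 1, 9, 0)
events = [
    (t0, "job-1", "ProcessStart"),
    (t0 + timedelta(minutes=1), "job-1", "ProcessEnd"),
    (t0, "job-2", "ProcessStart"),  # no matching end event
]
print(hung_processes(events, now=t0 + timedelta(minutes=10),
                     max_age=timedelta(minutes=5)))
```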

How to trigger an alert notification of a long-running process in Azure Data Factory V2 using either Azure Monitor or ADF itself?

I've been trying to find the best way to trigger an alert when an ADF task (e.g. a CopyActivity or Stored Procedure Task) has been running for more than N hours. I wanted to use Azure Monitor, as it is one of the recommended notification services in Azure, but I have not been able to find a "Running" criterion. So I had to work with the available criteria (Succeeded and Failed) and check them every N hours. This is still not perfect, as I don't know when the process started and we may run the process manually multiple times a day. Is there any way you would recommend doing this? For example, an event-based notification that listens to some time variable and triggers an email notification as soon as it exceeds the threshold?
is there any way you would recommend doing this? like a event-based
notification that listens to some time variable and as soon as it is
greater than the threshold triggers an email notification?
Based on your requirements, I suggest using the Azure Data Factory SDKs to monitor your pipelines and activities.
You could create a timer-triggered Azure Function that runs every N hours. In that function:
List all running activities in the data factory account.
Then loop over them and check the DurationInMs property of the ActivityRun class to see whether any activity has been running for more than N hours and is still in the In-Progress status.
Finally, send the email, kill the activity, or do whatever else you want.
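The check inside such a function could look roughly like the Python sketch below. It deliberately does not call the Data Factory SDK: the activity runs are plain dicts with hypothetical keys (activity_name, status, duration_in_ms) standing in for whatever the SDK returns, and the "InProgress" status string is an assumption to verify against your own data.

```python
MS_PER_HOUR = 60 * 60 * 1000


def long_running(activity_runs, max_hours):
    """Return names of activities still in progress after more than max_hours.

    activity_runs: list of dicts with hypothetical keys "activity_name",
    "status" and "duration_in_ms" (mirroring the DurationInMs property).
    """
    limit_ms = max_hours * MS_PER_HOUR
    return [
        run["activity_name"]
        for run in activity_runs
        if run["status"] == "InProgress"
        and run["duration_in_ms"] is not None
        and run["duration_in_ms"] > limit_ms
    ]


runs = [
    {"activity_name": "CopySales", "status": "InProgress",
     "duration_in_ms": 5 * MS_PER_HOUR},
    {"activity_name": "LoadDim", "status": "Succeeded",
     "duration_in_ms": 6 * MS_PER_HOUR},
    {"activity_name": "CopyLogs", "status": "InProgress",
     "duration_in_ms": 1 * MS_PER_HOUR},
]
for name in long_running(runs, max_hours=4):
    print(f"Activity {name} has been running for more than 4 hours")
```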
I would suggest a simple solution:
A Kusto query that lists all pipeline runs where the status is "Queued" and joins them on CorrelationId with the runs we are not interested in, typically "Succeeded" and "Failed". The leftanti join flavor does the job by "Returning all the records from the left side that don't have matches from the right." (as specified in the MS documentation).
The next step is to set your desired timeout value; it is 30m in the example code below.
Finally, you can configure an alert rule based on this query and get your email notification, or whatever you need.
ADFPipelineRun
| where Status == "Queued"
| join kind=leftanti (
    ADFPipelineRun
    | where Status in ("Failed", "Succeeded")
  ) on CorrelationId
| where Start < ago(30m)
I tested this only briefly, so something may be missing. One idea is to add other statuses, such as "Cancelled", to the ones removed from the result.
