I have a stateful metric alert which triggers a logic app when the maximum count of a dead-letter queue/topic is greater than 5
(this is a standard metric alert, not a custom one)
The alert is configured with:
Evaluation frequency 5min
Aggregation timeframe 5min
Alert overview: (screenshot omitted)
Problem
Azure Monitor executes the action group and triggers the Logic App even when the condition is NOT met. Navigating to the Logic App and inspecting the HTTP trigger's payload, I've even seen values of metricValue: 0. Why would that be the case?
Related
How do you create an alert for when an Azure Function on the Consumption plan has timed out after 10 minutes? There used to be a signal in Application Insights called "Failures" that indicated this, but it looks like Microsoft removed it. My old rule emailed me whenever the total count of Failures was > 0, but I don't see anything comparable anymore.
Open your Application Insights resource, click Alerts, and then click Create alert rule.
There you can select Custom log search as the condition signal.
Then use a query like the following:
requests
| where duration > 1000 // duration is in milliseconds; use 600000 (10 minutes) to catch the timeout above
Then you can create the alert from there. This is one way of getting alerts based on request duration.
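If you'd rather create that log alert rule programmatically than through the portal, a minimal sketch against the scheduledQueryRules management REST API might look like the following. This is an assumption-laden illustration: the api-version, region, resource IDs, and rule name are placeholders, and the azure-identity package is assumed for the management token.

import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION = "<subscription-id>"    # placeholder
RESOURCE_GROUP = "<resource-group>"   # placeholder
RULE_NAME = "function-timeout-alert"  # hypothetical rule name

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.Insights/scheduledQueryRules/{RULE_NAME}"
    "?api-version=2021-08-01"
)

body = {
    "location": "eastus",  # must match your resource's region; placeholder
    "properties": {
        "enabled": True,
        "severity": 2,
        "evaluationFrequency": "PT5M",  # how often the query runs
        "windowSize": "PT5M",           # lookback window for each evaluation
        "scopes": ["<application-insights-resource-id>"],  # placeholder
        "criteria": {
            "allOf": [{
                # duration is in milliseconds; 600000 ms = 10 minutes
                "query": "requests | where duration > 600000",
                "timeAggregation": "Count",
                "operator": "GreaterThan",
                "threshold": 0,
            }]
        },
        "actions": {"actionGroups": ["<action-group-resource-id>"]},  # placeholder
    },
}

resp = requests.put(url, headers={"Authorization": f"Bearer {token}"}, json=body)
resp.raise_for_status()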
Is there a way to create an alert policy in GCP Monitoring that triggers an incident if there are more than 50 tasks in a Cloud Tasks queue with a retries value greater than 0?
To answer your question, it's not possible to combine these metrics the way you want. Each condition is evaluated separately, because metric-based alerting policies track metric data collected by Cloud Monitoring independently per metric. Cloud Tasks is also currently not a monitored resource for custom metrics.
Cloud Tasks currently has the following metrics relevant to your use case:
Queue depth
Task attempt count
You can set multiple conditions per policy, and each condition can relate to a Cloud Tasks metric. What will happen is that an incident will be created if there are more than 50 tasks in a queue and/or if one of your tasks has a retry count higher than 0; see the sketch below.
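A minimal sketch of such a two-condition policy with the google-cloud-monitoring Python client follows; the metric and resource type names are assumptions to verify in Metrics Explorer, and "my-project" is a placeholder.

from google.cloud import monitoring_v3

client = monitoring_v3.AlertPolicyServiceClient()

def threshold_condition(display_name, metric_type, threshold):
    # One threshold condition over a Cloud Tasks queue metric.
    return monitoring_v3.AlertPolicy.Condition(
        display_name=display_name,
        condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
            filter=(
                f'metric.type = "{metric_type}" '
                'AND resource.type = "cloud_tasks_queue"'  # assumed resource type
            ),
            comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
            threshold_value=threshold,
            duration={"seconds": 300},  # condition must hold for 5 minutes
        ),
    )

policy = monitoring_v3.AlertPolicy(
    display_name="Cloud Tasks queue health",
    # OR means an incident opens when either condition is breached.
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        # Assumed metric names; check them in Metrics Explorer first.
        threshold_condition("Queue depth > 50",
                            "cloudtasks.googleapis.com/queue/depth", 50),
        threshold_condition("Task attempts > 0",
                            "cloudtasks.googleapis.com/queue/task_attempt_count", 0),
    ],
)

created = client.create_alert_policy(name="projects/my-project", alert_policy=policy)
print(created.name)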
For more information, feel free to check Introduction to Alerting.
I have implemented an Azure alert that should fire when an Application Insights metric is greater than zero. The metric is the number of items in an Azure poison queue, and it is calculated by an Azure Function described in this article: https://www.scaling-to-the-sky.com/2018/03/08/poison-queue-monitoring-with-azure-functions/.
The alert has only fired once despite the condition being met on several occasions, and I don't know why. I have noticed that for the one alert that did fire, the monitor condition never changed from "Fired" to "Resolved". Maybe that is the reason why no new alerts are fired? If that is the case, then how do I change the state of the alert's monitor condition?
A metric alert that's in a "Fired" state would not trigger again until it's resolved. This is done to reduce noise. Resolution happens automatically after 3 healthy evaluations of your condition (evaluations where the condition isn't breached), and there's no way to manually change the monitor condition to "Resolved".
Can you please confirm if you are sending a metric value on every evaluation of the poison queue, even if the value is 0?
Metric alerts are stateful by default, so other alerts aren't fired if there's already a fired alert on a specific time series.
To make a specific metric alert rule stateless, and get alerted on every evaluation in which the alert condition is met, use one of these options:
If you create the alert rule programmatically, for example, via Azure Resource Manager, PowerShell, REST, or the Azure CLI, set the autoMitigate property to False (see the sketch below).
If you create the alert rule via the Azure portal, clear the Automatically resolve alerts option under the Alert rule details section.
Check out the details here:
Make metric alerts occur every time my condition is met
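As an illustration of the programmatic route, here is a minimal Python sketch that flips autoMitigate off on an existing rule via the management REST API (api-version 2018-03-01 for metric alerts; the azure-identity package is assumed for the token, and the names are placeholders):

import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION = "<subscription-id>"      # placeholder
RESOURCE_GROUP = "<resource-group>"     # placeholder
RULE_NAME = "<metric-alert-rule-name>"  # placeholder

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.Insights/metricAlerts/{RULE_NAME}"
    "?api-version=2018-03-01"
)

# autoMitigate=False makes the rule stateless, so it fires on every breached evaluation.
resp = requests.patch(
    url,
    headers={"Authorization": f"Bearer {token}"},
    json={"properties": {"autoMitigate": False}},
)
resp.raise_for_status()
print(resp.json()["properties"]["autoMitigate"])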
I would like to set up the following alerts for domain topics, firing when:
Delivery Failed Events (at domain) exceed x in y amount of time
Delivery Failed Events (at domain topic 1) exceed x in y amount of time
Delivery Failed Events (at domain topic 2) exceed x in y amount of time
The reason I want domain topic granularity is that topic 1's customer may be fine while topic 2's customer is having issues. Say the customer for topic 2 is currently down and in an extended outage (one that may last more than a day). I want to be able to disable the alert for topic 2 only, and enable it again once that customer is up and running. Meanwhile, I want all the other topic-level alerts to stay enabled.
I did not see a way to configure the above in the portal. Is it possible to configure it any other way at this time? If so, can you please provide direction on how to achieve it?
The AEG provides durable delivery of each event message at least once to each subscriber, based on its subscription. More details can be found in the docs.
In the case where the AEG cannot successfully deliver a message after retrying, the dead-lettering feature (configured for each subscriber) can be used: the dead-lettered message is stored in a storage container, and storage eventing can then drive the notification and/or analysis process. A configuration sketch follows.
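As a sketch of that configuration, dead-lettering can be enabled on an existing event subscription through the management REST API. The api-version, resource scope, and container name below are placeholders/assumptions, with azure-identity assumed for the token:

import requests
from azure.identity import DefaultAzureCredential

# Placeholder scope: the domain topic the event subscription hangs off.
SCOPE = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.EventGrid/domains/<domain>/topics/<topic>"
)
SUBSCRIPTION_NAME = "<event-subscription-name>"  # placeholder

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com{SCOPE}"
    f"/providers/Microsoft.EventGrid/eventSubscriptions/{SUBSCRIPTION_NAME}"
    "?api-version=2020-06-01"
)

# Dead-lettered events land as blobs in the given storage container.
resp = requests.patch(
    url,
    headers={"Authorization": f"Bearer {token}"},
    json={
        "deadLetterDestination": {
            "endpointType": "StorageBlob",
            "properties": {
                "resourceId": "<storage-account-resource-id>",  # placeholder
                "blobContainerName": "deadletters",             # assumed container
            },
        }
    },
)
resp.raise_for_status()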
On the publisher side, the publisher receives a standard HTTP response from the event domain endpoint immediately after posting; see more details in the docs.
The current version of the AEG is not integrated with Diagnostic settings (as, for instance, Event Hubs is), which would allow pushing metrics and/or logs to a stream pipeline for analysis.
However, as a workaround, the Azure Monitor REST API can help you.
Using the Metrics - List operation for the event domain, we can obtain metrics for topics such as Publish Succeeded, Publish Failed, and Unmatched.
The following is an example of the REST GET:
https://management.azure.com/subscriptions/{myId}/resourceGroups/{myRG}/providers/Microsoft.EventGrid/domains/{myDomain}/providers/Microsoft.Insights/metrics?api-version=2018-01-01&interval=PT1M&aggregation=none&metricnames=PublishSuccessCount,PublishFailCount,PublishSuccessLatencyInMs,DroppedEventCount
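A minimal polling sketch in Python around that GET (azure-identity assumed for the management token; the IDs are placeholders) could feed such a pipeline:

import time
import requests
from azure.identity import DefaultAzureCredential

URL = (
    "https://management.azure.com/subscriptions/<subscription-id>"
    "/resourceGroups/<resource-group>"
    "/providers/Microsoft.EventGrid/domains/<domain>"
    "/providers/Microsoft.Insights/metrics"
)
PARAMS = {
    "api-version": "2018-01-01",
    "interval": "PT1M",
    "aggregation": "none",
    "metricnames": "PublishSuccessCount,PublishFailCount,"
                   "PublishSuccessLatencyInMs,DroppedEventCount",
}

credential = DefaultAzureCredential()

def poll_once():
    token = credential.get_token("https://management.azure.com/.default").token
    resp = requests.get(URL, headers={"Authorization": f"Bearer {token}"}, params=PARAMS)
    resp.raise_for_status()
    for metric in resp.json().get("value", []):
        name = metric["name"]["value"]
        for series in metric.get("timeseries", []):
            latest = series["data"][-1] if series.get("data") else None
            print(name, latest)  # forward to your stream pipeline instead

while True:
    poll_once()
    time.sleep(60)  # matches the PT1M metric interval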
Based on this polling technique, you can push the event domain metric values to a stream pipeline for analysis, monitoring, alerting, etc. using an Azure Stream Analytics job. Your management requirements (for instance, publisher_topic1 is disabled, etc.) can be referenced as input to the stream job.
Note that the event domain metrics don't give topic granularity, and there is also no activity event log at that level. I recommend using the AEG feedback page.
Is there a way to monitor the number of times a runbook has been called and then report on it (send an email or text)? When I try to create an alert rule, I only see an option for the activity log, not metrics. The runbook is getting called from Event Grid via webhook.
You can use automation runbooks with three alert types:
Classic metric alerts - Sends a notification when any platform-level metric meets a specific condition. For example, when the value for CPU % on a VM is greater than 90 for the past 5 minutes.
Activity log alerts - Sends a notification when any new activity log event matches specified conditions, for example, when a delete operation occurs on a resource.
Near real-time metric alerts - Sends a notification faster than metric alerts when one or more platform-level metrics meet specified conditions. For example, when the value for CPU % on a VM is greater than 90, and the value for Network In is greater than 500 MB for the past 5 minutes.
When an alert calls a runbook, the actual call is an HTTP POST request to the webhook. The body of the POST request contains a JSON-formatted object that has useful properties related to the alert.
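For illustration, here is a minimal Python sketch of pulling the interesting fields out of that JSON body, assuming the action group is configured to use the common alert schema (the field paths below come from that schema):

import json

def parse_alert(body: str) -> dict:
    # Extract key fields from an Azure Monitor webhook payload (common alert schema).
    payload = json.loads(body)
    essentials = payload["data"]["essentials"]
    # For metric alerts, the evaluated values live under alertContext.condition.
    conditions = payload["data"]["alertContext"]["condition"]["allOf"]
    return {
        "alertRule": essentials["alertRule"],
        "monitorCondition": essentials["monitorCondition"],  # "Fired" or "Resolved"
        "metricValues": [c.get("metricValue") for c in conditions],
    }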
This Microsoft documentation link might help with metric alerts for runbooks:
https://learn.microsoft.com/en-us/azure/automation/automation-create-alert-triggered-runbook
You can send your Azure Automation runbook status data to Log Analytics. From there, you can alert on the different states. This documentation should help you with this process: https://azure.microsoft.com/en-us/updates/send-your-runbook-job-status-and-job-streams-from-automation-to-log-analytics-oms/
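Once the job data is flowing, you can query and alert on it. As a sketch, using the azure-monitor-query Python package (the workspace ID is a placeholder, and the AzureDiagnostics column names are assumptions to verify in your own workspace):

from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# Assumed schema for Automation job logs in AzureDiagnostics; verify the
# ResourceProvider/Category/column names in your workspace first.
QUERY = """
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.AUTOMATION" and Category == "JobLogs"
| where ResultType == "Failed"
| summarize failedJobs = count() by RunbookName_s, bin(TimeGenerated, 1h)
"""

result = client.query_workspace("<workspace-id>", QUERY, timespan=timedelta(days=1))
for table in result.tables:
    for row in table.rows:
        print(row)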
There are multiple answers to this question although none of them are a perfect solution. What I ended up doing was putting a logic app in front of the runbook that then calls the runbook. This allows me to alert on the metrics of my logic app.
We have just added support for monitoring runbooks and having alerts on them. Please go to the Alerts experience in Azure Monitor, where you should be able to choose an Automation account and then a runbook as a dimension, and then perform actions based on the number of jobs of the runbook, etc.