Azure Monitor metrics based on sum of existing metric values - azure

I have a network resource that exposes only bytes in and bytes out as metrics. I want to derive another metric by adding the two (bytesin + bytesout). Please suggest how I can add the in and out values and create an Azure Monitor alert rule based on this new metric.

You cannot create a new metric by combining two existing ones. However, you can create an alert rule on a custom log query. That means we can write a query like this:
AzureMetrics
| where MetricName == "BitsInPerSecondTraffic" or MetricName == "BitsoutPerSecondTraffic"
| where ResourceType == "EXPRESSROUTECIRCUITS"
| summarize AggregatedValue = sum(Total) by bin(TimeGenerated, 15m)
Use that query to create a log alert rule in the Azure portal.
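To see what the combined value looks like before wiring it into an alert, here is a minimal sketch that runs against an inline sample instead of the real AzureMetrics table. The sample rows, the `SampleMetrics` name, and the `threshold` value are hypothetical; the metric names are copied from the query above as-is. It keeps the in and out sums visible next to the combined value, which can help when checking the numbers against the portal charts.

```kusto
// Hypothetical sample rows standing in for the AzureMetrics table.
let SampleMetrics = datatable(TimeGenerated: datetime, ResourceType: string, MetricName: string, Total: real)
[
    datetime(2024-01-01 00:01:00), "EXPRESSROUTECIRCUITS", "BitsInPerSecondTraffic", 100.0,
    datetime(2024-01-01 00:02:00), "EXPRESSROUTECIRCUITS", "BitsoutPerSecondTraffic", 50.0,
    datetime(2024-01-01 00:16:00), "EXPRESSROUTECIRCUITS", "BitsInPerSecondTraffic", 10.0,
    datetime(2024-01-01 00:17:00), "EXPRESSROUTECIRCUITS", "BitsoutPerSecondTraffic", 5.0
];
// Hypothetical alert threshold for the combined value.
let threshold = 100.0;
SampleMetrics
| where ResourceType == "EXPRESSROUTECIRCUITS"
| where MetricName in ("BitsInPerSecondTraffic", "BitsoutPerSecondTraffic")
// Sum each direction separately within every 15-minute bin.
| summarize
    InTotal = sumif(Total, MetricName == "BitsInPerSecondTraffic"),
    OutTotal = sumif(Total, MetricName == "BitsoutPerSecondTraffic")
    by bin(TimeGenerated, 15m)
// The derived "in + out" value that the alert would evaluate.
| extend AggregatedValue = InTotal + OutTotal
| where AggregatedValue > threshold
```

With this sample, only the 00:00 bin (100 + 50 = 150) exceeds the threshold. Replacing `SampleMetrics` with `AzureMetrics` and dropping the `datatable` gives a query shaped like the one in the answer.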

Related

Alerts with Azure Monitor Agent Metrics

I am using the Azure Monitor Agent (AMA) to monitor a virtual machine.
I need to raise an alert if the free disk space is less than 10%.
For this purpose I'm using the guest metric "disk/free_percent", with mean as the aggregation type.
Are the values on the y-axis of the graph the percentage of free disk space? I ask because the df command on the virtual machine reports quite different values from those shown on the dashboard.
I have to raise an alert if free disk space is below 10%. What query do I need to write using "disk/free_percent" to accomplish that?
I've tried using the operator "less than", the unit "number" and a threshold value of 10.
Disk space is generally reported in MB or GB.
Instead of monitoring on a percentage basis, create an alert that checks whether the free disk space is less than 10 GB.
As discussed here in Microsoft Q&A, I tried this in my environment with a few modifications and got the expected output for disk space.
Query:
let setgbvalue = 10;
Perf
| where ObjectName == "LogicalDisk" and CounterName == "Free Megabytes"
| where InstanceName !contains "C:"
| where InstanceName !contains "_Total"
| extend FreeSpaceGB = CounterValue/1024
| summarize FreeSpace = max(FreeSpaceGB) by InstanceName
| where FreeSpace < setgbvalue
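If the requirement is strictly a percentage, note that dividing CounterValue by 1024 only converts megabytes to gigabytes. A percentage is free space divided by total disk size, multiplied by 100, so use a counter that already reports a percentage (for example "% Free Space", if it is collected) instead of "Free Megabytes".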
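For the percentage case, here is a minimal sketch that assumes a "% Free Space" counter is being collected for the LogicalDisk object. Whether that counter is present in your workspace depends on the data collection configuration, so treat the counter name as an assumption to verify. The sample rows and the `SamplePerf` name are hypothetical stand-ins for the Perf table.

```kusto
// Hypothetical sample rows standing in for the Perf table.
let SamplePerf = datatable(TimeGenerated: datetime, Computer: string, ObjectName: string, CounterName: string, InstanceName: string, CounterValue: real)
[
    datetime(2024-01-01 00:00:00), "vm1", "LogicalDisk", "% Free Space", "D:", 8.5,
    datetime(2024-01-01 00:05:00), "vm1", "LogicalDisk", "% Free Space", "D:", 7.9,
    datetime(2024-01-01 00:00:00), "vm1", "LogicalDisk", "% Free Space", "E:", 42.0,
    datetime(2024-01-01 00:00:00), "vm1", "LogicalDisk", "% Free Space", "_Total", 30.0
];
// Alert when less than 10% of the disk is free.
let thresholdPercent = 10;
SamplePerf
| where ObjectName == "LogicalDisk" and CounterName == "% Free Space"
| where InstanceName != "_Total"
// Take the lowest reading per disk in the evaluated window.
| summarize FreePercent = min(CounterValue) by Computer, InstanceName
| where FreePercent < thresholdPercent
```

With this sample, only drive D: on vm1 is returned. Using `min` makes the check conservative: a single low reading in the window is enough to match.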

KQL timechart visualization to show the total number of specific resource over time

I am trying to visualize the total number of a specific resource type over time in Azure Resource Graph.
For example, the total number of Application Insights resources was 10 in 2018, 20 in 2019, and so on.
This is the query, but it has two problems:
1- It does not aggregate the total number of the resource
2- It does not accept render timechart
resources
| where type == "microsoft.insights/components"
| extend CreationDate = todatetime(properties.CreationDate)
| summarize count() by bin(CreationDate, 365d)
You need to add the resource to the by clause. For example, if you have a column called "Resource", use this:
resources
| where type == "microsoft.insights/components"
| extend CreationDate = todatetime(properties.CreationDate)
| summarize count() by bin(CreationDate, 365d), Resource
Regarding the render timechart not working, did you check that you have data for more than 1 year? If so, can you provide more details of the app you are trying it in? Is that Application Insights?
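If the goal is a running total (10 in 2018, 20 in 2019, and so on) rather than a per-year count, one option is a cumulative sum over the yearly counts. The sketch below is an illustration on an inline sample, not a confirmed fix for Resource Graph: Resource Graph supports only a subset of KQL, so `row_cumsum` and `render` may need to run in Log Analytics or Azure Data Explorer instead. The sample rows and the `SampleResources` name are hypothetical, and `startofyear` is swapped in for `bin(CreationDate, 365d)` so that the buckets align with calendar years.

```kusto
// Hypothetical sample standing in for the filtered resources table.
let SampleResources = datatable(name: string, CreationDate: datetime)
[
    "ai-1", datetime(2018-03-01),
    "ai-2", datetime(2018-09-15),
    "ai-3", datetime(2019-02-10),
    "ai-4", datetime(2020-06-30)
];
SampleResources
// Count how many resources were created in each calendar year.
| summarize Created = count() by Year = startofyear(CreationDate)
// row_cumsum needs ordered (serialized) input; order by provides it.
| order by Year asc
// Running total of resources that exist up to and including each year.
| extend RunningTotal = row_cumsum(Created)
| project Year, RunningTotal
| render timechart
```

For this sample the running totals are 2, 3 and 4 for 2018, 2019 and 2020.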

AKS Container Insights: How to list not ready pods?

I'm using Azure Container Insights for an AKS cluster and want to filter some logs using Log Analytics and Kusto Query Language. I do it to provide a convenient dashboard and alerts.
What I'm trying to achieve is to list only the pods that are not ready. Listing the ones that are not Running is not enough. This is easy to filter with kubectl, e.g. by following this post: How to get list of pods which are "ready"?
However, this data is not available when querying Log Analytics with Kusto, because containerStatuses seems to be only a string.
It should be somehow possible because Container Insights allow this filtering in Metrics section. However it's not fully satisfying because with metrics my filtering capabilities are much smaller.
You can list pods for the last hour as shown below.
let endDateTime = now();
let startDateTime = ago(1h);
KubePodInventory
| where TimeGenerated < endDateTime
| where TimeGenerated >= startDateTime
| where PodStatus != "Running"
| distinct Computer, PodUid, TimeGenerated, PodStatus
efdestegul's answer lists only pods that are not "Running", whereas I was looking for pods that are not ready. Still, it led me to the query I actually needed, so thank you for that. Maybe this will help others.
let timeGrain=1m;
KubePodInventory
// | where Namespace in ('my-namespace-1', 'my-namespace-2')
| summarize countif(ContainerStatus == 'waiting') by bin(TimeGenerated,timeGrain)
| order by countif_ desc
| render timechart
With this query I can render a chart that shows the not-ready pods over time. It is very useful in practice, because it surfaces the pods that stayed not ready for longer than expected and needed to be restarted. You can always filter the results to the namespaces you need.
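If a list of the affected pods is needed instead of a chart, a minimal sketch along the same lines is to take the latest record per container and keep the containers that are not running. This approximates readiness by container state only: a running container that fails its readiness probe would not be caught. The sample rows and the `SamplePods` name are hypothetical, and the Name, Namespace and ContainerName columns and the "running" status value are assumptions about KubePodInventory that should be checked against your workspace.

```kusto
// Hypothetical sample rows standing in for KubePodInventory.
let SamplePods = datatable(TimeGenerated: datetime, Namespace: string, Name: string, ContainerName: string, PodStatus: string, ContainerStatus: string)
[
    datetime(2024-01-01 10:00:00), "default", "web-1", "app", "Running", "waiting",
    datetime(2024-01-01 10:01:00), "default", "web-1", "app", "Running", "running",
    datetime(2024-01-01 10:00:00), "default", "api-1", "app", "Running", "running",
    datetime(2024-01-01 10:01:00), "default", "api-1", "app", "Running", "waiting",
    datetime(2024-01-01 10:01:00), "default", "api-1", "sidecar", "Running", "running"
];
SamplePods
// Keep only the most recent record for every container of every pod.
| summarize arg_max(TimeGenerated, ContainerStatus) by Namespace, Name, ContainerName
// A container that is not running cannot be ready.
| where ContainerStatus != "running"
// Collapse back to one row per pod.
| summarize NotReadyContainers = count() by Namespace, Name
```

With this sample, only pod api-1 is returned, with one container not running, even though its PodStatus is "Running".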

Azure Monitor avoid false positives on VPN disconnect

We use Azure Monitor to detect when our Virtual Network Gateway S2S VPN connections disconnect (we have a few connections in each environment). We would like to reconfigure it so that we only get an alert if a connection has been down for more than one minute, to avoid alerts when the tunnel is reset.
Today we use the Log Analytics query below, which produces false alerts. Do you have any suggestions for how we can achieve this?
AzureDiagnostics
| where Category == "TunnelDiagnosticLog"
| order by TimeGenerated
Here is an example of what we don't want to trigger an alert. Note that just excluding the GlobalStandby change events won't do it, since it's not guaranteed that the tunnel connects again.
Using Log Analytics, I came up with this query, which checks the next line in the log to see whether it is Connected and compares the time span between the two.
AzureDiagnostics
| where Category == "TunnelDiagnosticLog"
| where TimeGenerated < ago(120s) and TimeGenerated > ago(600m)
// Sort first so that next() reads the chronologically following record.
| sort by TimeGenerated asc
| extend NextStatus = next(OperationName),
         Downtime = next(TimeGenerated) - TimeGenerated
// Result is 0 for a reconnect within one minute (or a connect event), 1 otherwise.
| extend Result = iif(
    (OperationName == "TunnelDisconnected"
        and NextStatus == "TunnelConnected"
        and Downtime < 1m)
    or OperationName == "TunnelConnected", 0, 1)
| project TimeGenerated,
    OperationName,
    NextStatus,
    Result,
    Downtime,
    Resource,
    ResourceGroup,
    _ResourceId
| where OperationName == "TunnelDisconnected" and Result == 1
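Because there are several connections per environment, the "next line" in the log may belong to a different gateway. A minimal sketch of one way to handle that is to sort by resource and then by time, and only treat the next record as a reconnect when it belongs to the same resource. The sample rows and the `SampleLog` name are hypothetical; the column names are the ones used in the query above.

```kusto
// Hypothetical sample rows standing in for the tunnel diagnostic log.
let SampleLog = datatable(TimeGenerated: datetime, Resource: string, OperationName: string)
[
    datetime(2024-01-01 10:00:00), "GW-A", "TunnelDisconnected",
    datetime(2024-01-01 10:00:20), "GW-A", "TunnelConnected",
    datetime(2024-01-01 10:05:00), "GW-B", "TunnelDisconnected",
    datetime(2024-01-01 10:09:00), "GW-B", "TunnelConnected",
    datetime(2024-01-01 10:30:00), "GW-A", "TunnelDisconnected"
];
SampleLog
// Group each gateway's records together, oldest first.
| sort by Resource asc, TimeGenerated asc
| extend NextResource = next(Resource),
         NextStatus = next(OperationName),
         NextTime = next(TimeGenerated)
// The following record only counts if it is a reconnect of the same gateway.
| extend ReconnectedSameTunnel = NextResource == Resource and NextStatus == "TunnelConnected"
| extend Downtime = iif(ReconnectedSameTunnel, NextTime - TimeGenerated, timespan(null))
// A short reset is a reconnect of the same gateway within one minute.
| extend ShortReset = iif(ReconnectedSameTunnel, Downtime < 1m, false)
| where OperationName == "TunnelDisconnected" and not(ShortReset)
| project TimeGenerated, Resource, Downtime
```

With this sample, the 20-second reset on GW-A is ignored, while the 4-minute outage on GW-B and the still-open disconnect on GW-A at 10:30 are returned.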
You can try creating a metric measurement log alert. In the log query, set AggregatedValue to the count of disconnections, aggregated by the column with values GatewayTenantWorker... (and any other column as needed) and binned per minute. Configure the alert with a threshold of 0 (any disconnection) and trigger it on consecutive breaches greater than 1 (for more than 1 minute), or greater than 2 (for more than 2 minutes) to reduce false alerts even further.
This should fire an alert when any of the VPN connections has disconnections for more than 1 (or 2) minute(s).
Assumptions about the data:
Tunnel resets are resolved within a minute.
During an actual long disconnection, there is a log record of the current status (Disconnected) every minute. The solution above works only in this case.
If these assumptions do not hold, more information is needed about the log data pattern during a long disconnection.

How to monitor consecutive exceptions in Azure? (Kusto)

I want to monitor consecutive exceptions.
For example if I get 'X' amount of '500' exceptions in a row, I want it to trigger an action group.
How to write this in Kusto?
I know how to monitor amount of exceptions over a 1min period but I'm a bit stuck on how to monitor consecutive exceptions.
You are looking to set up a custom log alert in Application Insights.
Here is a step-by-step guide on how to set it up.
You can use the following query with the summarize operator:
exceptions
| where timestamp >= datetime('2019-01-01')
| summarize min(timestamp) by operation_Id
Please use a query like the one below:
exceptions
| summarize count() by xxx
For more details about summarize operator, refer to this article.
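The summarize queries above count exceptions but do not check that they occur in a row. As a minimal sketch of one way to detect consecutive failures, the query below numbers each run of identical result codes with `prev()` and `row_cumsum()`, then keeps runs of 500s that are long enough. It assumes a requests-like table with a string resultCode column; the sample rows, the `SampleRequests` name and the `minConsecutive` value (the 'X' from the question) are hypothetical.

```kusto
// Hypothetical sample rows standing in for a requests-like table.
let SampleRequests = datatable(timestamp: datetime, resultCode: string)
[
    datetime(2024-01-01 10:00:00), "200",
    datetime(2024-01-01 10:00:05), "500",
    datetime(2024-01-01 10:00:10), "500",
    datetime(2024-01-01 10:00:15), "500",
    datetime(2024-01-01 10:00:20), "200",
    datetime(2024-01-01 10:00:25), "500"
];
// The 'X' from the question: how many failures in a row should trigger.
let minConsecutive = 3;
SampleRequests
| sort by timestamp asc
// Mark every row whose result code differs from the previous row.
| extend NewRun = iif(resultCode != prev(resultCode), 1, 0)
// The cumulative sum gives every run of identical codes its own id.
| extend RunId = row_cumsum(NewRun)
| summarize RunLength = count(), RunStart = min(timestamp), RunEnd = max(timestamp) by RunId, resultCode
| where resultCode == "500" and RunLength >= minConsecutive
```

With this sample, one run is returned: three 500s from 10:00:05 to 10:00:15. The single 500 at 10:00:25 is not, because it follows a 200. A log alert that fires when this query returns more than zero rows could then trigger the action group.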
