How to monitor consecutive exceptions in Azure? (Kusto)

I want to monitor consecutive exceptions.
For example if I get 'X' amount of '500' exceptions in a row, I want it to trigger an action group.
How to write this in Kusto?
I know how to monitor the number of exceptions over a 1 min period, but I'm a bit stuck on how to monitor consecutive exceptions.

You are looking to set up a custom log alert on AppInsights.
Here is the step-by-step guide on how to set it up.
You can use the following query with the summarize operator:
exceptions
| where timestamp >= datetime('2019-01-01')
| summarize min(timestamp) by operation_Id

Please use a query like the one below:
exceptions
| summarize count() by xxx
For more details about summarize operator, refer to this article.
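Neither answer above addresses the "in a row" part of the question. As an added example (not from the thread), the query below flags runs of N or more consecutive HTTP 500 responses in the Application Insights `requests` table, treating a "500 exception" as a request whose `resultCode` is "500"; the table, column names and the `cloud_RoleName` partitioning are assumptions about a standard App Insights schema.

```kql
// Added example (not from the thread): find runs of N or more consecutive HTTP 500 responses.
// Assumes the Application Insights `requests` table, where `resultCode` is a string column.
let RunLength = 5;          // N: report a run once this many 500s occur back to back
let LookBack  = 15m;        // window the scheduled alert rule re-evaluates
requests
| where timestamp > ago(LookBack)
// order by serializes the rows, so the window functions below see them in time order per role
| order by cloud_RoleName asc, timestamp asc
| extend Is500 = iff(resultCode == "500", 1, 0)
// RunId stays constant through a run of 500s and increments on every non-500 response,
// so each run of consecutive 500s gets its own id; the cumsum restarts when the role changes
| extend RunId = row_cumsum(1 - Is500, cloud_RoleName != prev(cloud_RoleName))
| where Is500 == 1
| summarize Consecutive500s = count(),
            RunStart = min(timestamp),
            RunEnd = max(timestamp),
            SampleOperation = take_any(operation_Name)
    by cloud_RoleName, RunId
| where Consecutive500s >= RunLength
| project cloud_RoleName, RunStart, RunEnd, Consecutive500s, SampleOperation
| order by RunEnd desc
```

Use it as the query of a log alert rule whose condition is "number of results greater than 0"; each returned row is one run that reached the threshold, so the action group fires only when the failures were actually consecutive.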

Unable to reproduce data from Azure Metrics Chart using Logs

I am trying to create a dashboard of my services in Azure. I added an Azure Metrics Chart for each service and later wanted to add under it specific details of the operations included in the service.
But when I try to get it from logs, I get much higher number of requests made. KQL:
requests
| where cloud_RoleName startswith "notificationengine"
| summarize Count = count() by operation_Name
| order by Count
And result:
The problem is that with some metrics charts I get values with minimal difference or exactly the same, while with some, like the one I showed, I get completely different values. I tried to modify the KQL or search for what might be wrong but never got anywhere.
My guess is that those are 2 different values but in that case why both are labeled as "requests" and if so what are actual differences?
I have taken an Azure Function App with 2 HTTP Trigger Functions with identical names starting with “HttpTrigger” and run both functions a couple of times.
Test Case 1:
In the Logs Workspace, the request count for the two functions that start with the word “HttpTrigger”:
But I have pinned the chart of only 1 function's request count to the Azure Dashboard:
Probably, I believe you have written the query for the requests of all the services/applications that start with “notificationengine” but pinned only some apps'/services' logs chart to the dashboard.
Test Case 2:

How to tell if my missing logs are due to adaptive sampling in Application Insights?

We use App Insights for collecting service telemetry and our services are ASP.NET Core running on Service Fabric clusters. We are observing plenty of missing logs from one of our services and suspect it is due to adaptive sampling.
I have a few questions regarding adaptive sampling for better troubleshooting -
Is there a definitive way to know whether missing logs may be due to adaptive sampling? We are running experiments to turn off adaptive sampling or configure "MaxTelemetryItemsPerSecond" to increase this number, but we have no way of verifying that these updates have an impact.
And generally, what are the symptoms of missing logs due to adaptive sampling? Alternatively, how does one determine that adaptive sampling is working for my service?
Any help is appreciated!
One way to check is to open Logs experience and examine "itemCount" property. If there is no sampling (either not configured or adaptive sampling didn't cross a threshold) then all values should be 1.
One way to think about its value is that this particular event (with itemCount = X) was randomly picked from X events. The greater the itemCount value, the more heavily sampled the data.
You can use this query:
union requests, dependencies, traces, exceptions
| where timestamp > ago(24h)
| project itemCount
| summarize count() by itemCount
| order by itemCount
Here is an example of a quite heavily sampled application:
You can see it in the Logs pane in your Application Insights instance by running this query:
union requests,dependencies,pageViews,browserTimings,exceptions,traces
| where timestamp > ago(1d)
| summarize RetainedPercentage = 100/avg(itemCount) by bin(timestamp, 1h), itemType
Whenever you see RetainedPercentage less than 100 you are missing log data. More information here.

AKS Container Insights: How to list not ready pods?

I'm using Azure Container Insights for an AKS cluster and want to filter some logs using Log Analytics and Kusto Query Language. I do it to provide a convenient dashboard and alerts.
What I'm trying to achieve is list only not ready pods. Listing the ones not Running is not enough. This can be easily filtered using kubectl e.g. following this post How to get list of pods which are "ready"?
However, this data is not available when querying in Log Analytics with Kusto, as the containerStatuses seems to be only a string.
It should be somehow possible because Container Insights allow this filtering in Metrics section. However it's not fully satisfying because with metrics my filtering capabilities are much smaller.
You can do it for pods as below for the last 1h.
let endDateTime = now();
let startDateTime = ago(1h);
KubePodInventory
| where TimeGenerated < endDateTime
| where TimeGenerated >= startDateTime
| where PodStatus != "Running"
| distinct Computer, PodUid, TimeGenerated, PodStatus
efdestegul's answer was only listing not "Running" pods and I was looking for not ready ones. However, this answer led me to the query I actually needed, and thank you for that. Maybe this will help others.
let timeGrain=1m;
KubePodInventory
// | where Namespace in ('my-namespace-1', 'my-namespace-2')
| summarize countif(ContainerStatus == 'waiting') by bin(TimeGenerated,timeGrain)
| order by countif_ desc
| render timechart
With this query I'm able to render a chart that displays all not ready pods over time, and in a very useful way: only the pods that were not ready for longer than expected and needed to be restarted. You can always filter your results for any namespaces you need.
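The accepted query above counts waiting containers per minute, which is good for a chart; when the dashboard or alert instead needs a list of pods that are currently not ready, the latest record per container has to be kept. The following is an added example (not from the thread) using the Container Insights `KubePodInventory` table; the column names `Name`, `Namespace`, `PodUid`, `ContainerName`, `PodStatus`, `ContainerStatus`, `ContainerStatusReason` and `ContainerRestartCount` are assumed from the standard Container Insights schema.

```kql
// Added example (not from the thread): list pods whose containers are currently not ready,
// with the kubelet reason, based on the most recent inventory record per container.
// Assumes the Container Insights KubePodInventory schema named in the lead-in.
KubePodInventory
| where TimeGenerated > ago(1h)
// | where Namespace in ('my-namespace-1', 'my-namespace-2')   // optional namespace filter
// keep only the newest record per container, so pods that have since recovered drop out
| summarize arg_max(TimeGenerated, *) by PodUid, ContainerName
// "running" is the only ready state; "waiting" (e.g. image pull / crash loop) and
// "terminated" (including completed Jobs) are both reported here
| where ContainerStatus != "running"
| project LastSeen = TimeGenerated,
          Namespace,
          Pod = Name,
          Container = ContainerName,
          PodStatus,
          ContainerStatus,
          ContainerStatusReason,
          ContainerRestartCount,
          Computer
| order by ContainerRestartCount desc, LastSeen desc
```

For an alert, append a `| summarize NotReady = count()` and trigger on the result exceeding the number of not-ready containers you consider normal for the cluster.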

Access dashboard's time range and granularity from KQL

I've added a chart using KQL and logs from Azure Log Analytics to a dashboard. I'm using make-series which works great but the catch is the following:
The logs I'm getting might not extend to the whole time range dictated by the dashboard. So basically I need access to the starttime/endtime (and time granularity) to make make-series cover the whole timerange.
e.g.
logs
| make-series
P90 = percentile(Elapsed, 90) default = 0,
Average = avg(Elapsed) default = 0
// ??? need start/end time to use in from/to
on TimeGenerated step 1m
Currently, it's not supported. There are some feedback items about this feature: Support for time granularity selected in Azure Portal Dashboard, and Retrieve the portal time span and use it inside the kusto query.
Some people provided workarounds in the first feedback item; you can give them a try.
I posted on another question on this subject - you can do a bit of a hack in your KQL to get this working: https://stackoverflow.com/a/73064218/5785878

Counting operations in Azure Monitor Log Analytics query

I want to query operations like Add-MailboxPermission with FullAccess and deleted emails/calendar events to find compromised accounts (in 30m intervals).
1. How should I modify my code to show operations which fulfil both assumptions at the same time (if I change "or" to "and" then it will check both assumptions in one log)?
2. How can I modify "count" to reduce the number of logs to only those which show a minimum of 10 in the result? Maybe there should be another function?
OfficeActivity
| where parse_json(Parameters)[2].Value like "FullAccess" or Operation contains "Delete"
| summarize Events=count() by bin(TimeGenerated, 30m), Operation, UserId
Welcome to Stack Overflow!
Yes, the logical and operator returns true only if both conditions are true. Check this doc for the query language reference.
Yes again, there is the top operator that's used to return the first N records sorted by the specified columns, used as follows:
OfficeActivity
| where parse_json(Parameters)[2].Value like "FullAccess" and Operation contains "Delete"
| summarize Events=count() by bin(TimeGenerated, 30m), Operation, UserId
| top 10 by Events asc
Additional tip:
There are limit and take operators as well that return a result set of up to the specified number of rows, but with the caveat that there is no guarantee as to which records are returned unless the source data is sorted.
Hope this helps!
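Two points from the question are worth showing in a query. A single `OfficeActivity` record is either an `Add-MailboxPermission` or a delete, so `and` on one row never matches; the two behaviours have to be aggregated per user and window first. And "at least 10" is a `where` on the aggregated column, not `top 10`, which returns ten rows regardless of their counts. The following is an added example (not from the thread); it assumes the Microsoft 365 `OfficeActivity` table as exposed in Log Analytics / Microsoft Sentinel, with `Parameters` stored as a JSON string.

```kql
// Added example (not from the thread): users who both granted FullAccess mailbox permissions
// and deleted items within the same 30-minute window, keeping only windows with >= MinEvents events.
// Assumes the Microsoft 365 OfficeActivity table (Log Analytics / Sentinel schema).
let MinEvents = 10;
OfficeActivity
| where TimeGenerated > ago(7d)
// classify each record; `has` matches the whole term FullAccess inside the JSON Parameters string,
// `contains` is used for Delete because operation names such as HardDelete are a single term
| extend IsFullAccessGrant = Operation == "Add-MailboxPermission" and Parameters has "FullAccess",
         IsDelete = Operation contains "Delete"
| where IsFullAccessGrant or IsDelete
| summarize Events = count(),
            FullAccessGrants = countif(IsFullAccessGrant),
            Deletions = countif(IsDelete),
            Operations = make_set(Operation)
    by bin(TimeGenerated, 30m), UserId
// both behaviours by the same account in the same window (question 1)
| where FullAccessGrants > 0 and Deletions > 0
// keep only windows with at least MinEvents events (question 2)
| where Events >= MinEvents
| order by Events desc
```

Each row is one user and 30-minute window to triage; dropping the `FullAccessGrants > 0 and Deletions > 0` line turns it back into a plain "noisy window" report for either behaviour on its own.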
