I am trying to debug an Azure alert that is not firing. The alert should run every 30 minutes and find any devices whose last heartbeat is more than 30 minutes but less than one hour old. In addition, an alert should fire only once per device until that device becomes healthy again.
The Kusto query is:
let missedHeartbeatsFrom30MinsAgo = traces
| where message == "Heartbeat"
| summarize arg_max(timestamp, *) by tostring(customDimensions.id)
| project Id = customDimensions_id, LastHeartbeat = timestamp
| where LastHeartbeat < ago(30m);
let missedHeartbeatsFrom1HourAgo = traces
| where message == "Heartbeat"
| summarize arg_max(timestamp, *) by tostring(customDimensions.id)
| project Id = customDimensions_id, LastHeartbeat = timestamp
| where LastHeartbeat <= ago(1h);
let unhealthyIds = missedHeartbeatsFrom30MinsAgo
| join kind=leftanti missedHeartbeatsFrom1HourAgo on Id;
let deviceDetails = customEvents
| where name == "Heartbeat"
| distinct tostring(customDimensions.deviceId), tostring(customDimensions.fullName)
| project Id = customDimensions_deviceId, FullName = customDimensions_fullName;
unhealthyIds
| join kind=leftouter deviceDetails on Id
| project Id, FullName, LastHeartbeat
| order by FullName asc
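The two let statements and the leftanti join above keep only devices whose last heartbeat is older than 30 minutes but newer than one hour. A minimal Python sketch of that windowing logic, assuming the latest heartbeat per device has already been collected into a dictionary (the function name unhealthy_ids and the sample data are hypothetical):

```python
from datetime import datetime, timedelta


def unhealthy_ids(last_heartbeats: dict[str, datetime], now: datetime) -> list[str]:
    """Mirror the query: devices missed 30 minutes ago, leftanti devices missed 1 hour ago."""
    # LastHeartbeat < ago(30m)
    missed_30m = {i for i, ts in last_heartbeats.items() if ts < now - timedelta(minutes=30)}
    # LastHeartbeat <= ago(1h)
    missed_1h = {i for i, ts in last_heartbeats.items() if ts <= now - timedelta(hours=1)}
    # leftanti join: keep left-side ids that have no match on the right side
    return sorted(missed_30m - missed_1h)


now = datetime(2024, 1, 1, 12, 0)
sample = {
    "a": now - timedelta(minutes=5),   # healthy
    "b": now - timedelta(minutes=45),  # inside the (1h, 30m) window
    "c": now - timedelta(hours=2),     # already reported by an earlier run
}
print(unhealthy_ids(sample, now))
```

Because the resulting window is open at both ends, a device whose last heartbeat lands exactly on a 30-minute evaluation boundary is reported by neither of two consecutive runs. Assuming evaluations run exactly 30 minutes apart and all heartbeats have been ingested, every other device is reported by exactly one run, which is what produces the "fire once per device" behavior.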
The rules for this alert are:
When I unplug a device, wait about 30 minutes, and run the query manually in App Insights, the device appears in the result set. However, no alert is generated: nothing shows up on the Alerts history page, and no one in the action group is notified. Any help would be greatly appreciated!
Your KQL query takes a long time to execute and consumes a lot of resources.
Optimize the query to reduce resource usage and return results faster.
Make sure the status of your alert processing rule is Enabled.
Then make sure the alert condition is "number of query results greater than or equal to 1". The rule compares the result count with this threshold and fires the alert when the condition is met.
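For a rule whose measure is the number of result rows, the check described above reduces to comparing the row count with the threshold. A hypothetical Python sketch (should_fire and its parameters are illustrative only, not an Azure API):

```python
import operator

# Comparison operators an alert condition might use (illustrative mapping).
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def should_fire(rows: list, op: str = ">=", threshold: int = 1) -> bool:
    """Return True when the number of query result rows meets the condition."""
    return OPERATORS[op](len(rows), threshold)


print(should_fire([]))             # empty result: no alert
print(should_fire([{"Id": "b"}]))  # one unhealthy device: alert
```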
If the alert still does not fire, delete the alert rule, run your query in the query editor, and create a new alert rule from there.
Related
I am using Azure Log Analytics workspaces and am trying to write a simple query to get the exception message when an Azure Function fails.
This is the query I am using:
union AppTraces
| union AppExceptions
| union AppRequests
| where AppRoleName has "-NEU"
| where TimeGenerated > ago(1d)
//| where Success == "false"
| order by TimeGenerated asc
| project
Success,
TimeGenerated,
AppRoleName,
message = iff(Message != '', Message, iff(InnermostMessage != '', InnermostMessage, Properties.['prop__{OriginalFormat}'])),
logLevel = Properties.['LogLevel']
| where logLevel != "Information"
The problem is that the Success property is always empty, whereas I expect it to be either true or false. I use the Success property in other queries and it works fine, for example:
AppRequests
| project TimeGenerated, OperationName, Success, ResultCode, DurationMs, AppRoleName
| where AppRoleName has "NEU"
| where OperationName != "MinimumAppVersionHead" and OperationName != "QueueManagerHead"
| where Success != "true"
| order by TimeGenerated desc
| take 20
In the query above, the Success where clause works as expected.
Why is it not working in the first query?
Please check the workaround below; it may help. We tried a simple query to check whether the success property works, and the query below returns results in Logs for both true and false.
Since your second query works, the Log Analytics workspace has been added to your function app successfully.
The problem seems to be the quotation marks in //| where Success == "false". Remove the comment marker (//) and the quotes around false, and use the format of the sample below in your query:
requests
| where success == false
| summarize failedCount=sum(itemCount), impactedUsers=dcount(user_Id) by operation_Name
| order by failedCount desc
We tried the Success filter with quotes, as in your first query, and got no results. After removing the quotes, it worked for us.
[Screenshot: output of the given query for function app failures]
For more information, refer to the following links:
MS DOC| View and query your Function app logs
BLOG| Alerts on Azure Function failures
I have this query that works in Azure Logs when I set the scope to the specific Application Insights resource I want to use:
let usg_events = dynamic(["*"]);
let mainTable = union pageViews, customEvents, requests
| where timestamp > ago(1d)
| where isempty(operation_SyntheticSource)
| extend name =replace("\n", "", name)
| where '*' in (usg_events) or name in (usg_events)
;
let queryTable = mainTable;
let cohortedTable = queryTable
| extend dimension =tostring(client_CountryOrRegion)
| extend dimension = iif(isempty(dimension), "<undefined>", dimension)
| summarize hll = hll(user_Id) by tostring(dimension)
| extend Users = dcount_hll(hll)
| order by Users desc
| serialize rank = row_number()
| extend dimension = iff(rank > 5, 'Other', dimension)
| summarize merged = hll_merge(hll) by tostring(dimension)
| project ["Country or region"] = dimension, Counts = dcount_hll(merged);
cohortedTable
but running the same query in Grafana gives this error:
"'union' operator: Failed to resolve table expression named 'pageViews'"
This is the same error I get in Azure Logs if I don't set the scope to the specific Application Insights resource. So my question is: how do I make Grafana target this specific scope? The query just gets the countries of the users who log in.
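The cohortedTable part of the query ranks countries by distinct users, keeps the top five, and folds the rest into 'Other'. A Python sketch of the same bucketing, with one deliberate swap: it counts distinct users exactly with sets instead of the approximate hll()/hll_merge()/dcount_hll() functions, so results can differ slightly from the query on large data. The function name and the input format (pairs of user id and country) are assumptions:

```python
from collections import defaultdict


def country_cohorts(events: list[tuple[str, str]], top_n: int = 5) -> dict[str, int]:
    """events: (user_id, country) pairs; returns distinct users per country bucket."""
    users_by_country: dict[str, set[str]] = defaultdict(set)
    for user_id, country in events:
        # Empty country values become "<undefined>", as in the query.
        users_by_country[country or "<undefined>"].add(user_id)

    # Rank countries by distinct users, largest first (order by Users desc).
    ranked = sorted(users_by_country.items(), key=lambda kv: len(kv[1]), reverse=True)

    merged: dict[str, set[str]] = defaultdict(set)
    for rank, (country, users) in enumerate(ranked, start=1):
        bucket = country if rank <= top_n else "Other"
        merged[bucket] |= users  # set union plays the role of hll_merge()
    return {country: len(users) for country, users in merged.items()}


events = [("u1", "A"), ("u2", "A"), ("u3", "A"), ("u4", "B"),
          ("u5", "B"), ("u6", "C"), ("u7", ""), ("u1", "C")]
print(country_cohorts(events, top_n=2))
```

Merging the user sets (rather than summing per-country counts) matters: a user seen in two countries that both fall into 'Other' is counted once, which is why the query merges HLL states before calling dcount_hll().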
As far as I know, there is currently no option to set the scope in Grafana.
The scope setting is available only in the Azure Log Analytics workspace.
If you need this feature, please raise a request in the Grafana Community, where such issues are officially addressed.
I have the following query:
let start=datetime("2019-06-22T01:44:00.000");
let end=datetime("2019-06-22T07:44:00.000");
let timeGrain=5m;
let dataset1= requests
| where timestamp > start and timestamp < end ;
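dataset1
// Column names (German): Gesamt = total, Durchschnittsdauer = average duration in seconds, Instanz = instance count, Funktionsname = function name, Fehler = errors, Erfolgreich = successful
| summarize Gesamt=sum(itemCount) , Durchschnittsdauer=round(avg(duration /1000),2), Instanz=dcount(cloud_RoleInstance) by Funktionsname=name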
| join kind= inner
(
exceptions
| where timestamp > start and timestamp < end
| summarize Fehler=count() by Funktionsname=operation_Name
) on Funktionsname
| project Funktionsname ,Gesamt , Erfolgreich=Gesamt - Fehler, Fehler, Durchschnittsdauer
If I test it in the Application Insights query editor, I get data. But after I pin it to a shared dashboard and change the time (local and UTC), the dashboard shows no results. How can I solve this problem?
I figured it out.
I needed to change start and end to:
let start=datetime("2019-06-24 13:44:00.000Z");
let end=datetime("2019-06-24 19:44:00.000Z");
I'm trying to create a custom metric alert based on some metrics in my Application Insights logs. Below is the query I'm using:
let start = customEvents
| where customDimensions.configName == "configName"
| where name == "name"
| extend timestamp, correlationId = tostring(customDimensions.correlationId), configName = tostring(customDimensions.configName);
let ending = customEvents
| where customDimensions.configName == "configName"
| where name == "anotherName"
| where customDimensions.taskName == "taskName"
| extend timestamp, correlationId = tostring(customDimensions.correlationId), configName = tostring(customDimensions.configName), name= name, nameTimeStamp= timestamp ;
let timeDiffs = start
| join (ending) on correlationId
| extend timeDiff = nameTimeStamp- timestamp
| project timeDiff, timestamp, nameTimeStamp, name, name1, correlationId; // name1 is the right-side name column, renamed by the join
timeDiffs
| summarize AggregatedValue=avg(timeDiff) by bin(timestamp, 1m)
When I run this query on the Analytics page, I get results. However, when I try to create a custom metric alert, I get the error: "Search Query should contain 'AggregatedValue' and 'bin(timestamp, [roundTo])' for Metric alert type".
The only suggestion I found was to add AggregatedValue, which I already have, so I'm not sure why the custom metric alert page gives me this error.
I found what was wrong with my query. AggregatedValue needs to be numeric, but AggregatedValue=avg(timeDiff) produces a timespan value. The value was in seconds, so this was hard to notice. Converting it to an int solves the problem.
I updated the last part as follows:
timeDiffs
| summarize AggregatedValue=toint(avg(timeDiff)/time(1ms)) by bin(timestamp, 5m)
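The fix works because dividing a timespan by time(1ms) yields a plain number of milliseconds. A Python sketch of the whole calculation (pair start and end events by correlationId, average the duration per 5-minute bin of the start time, convert to integer milliseconds), assuming each event has already been reduced to one timestamp per correlationId; the function names are hypothetical:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bin_start(ts: datetime, size: timedelta) -> datetime:
    """Round a timestamp down to a multiple of size, like bin(timestamp, 5m)."""
    epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)
    return ts - (ts - epoch) % size


def avg_duration_ms(starts: dict[str, datetime], ends: dict[str, datetime],
                    size: timedelta = timedelta(minutes=5)) -> dict[datetime, int]:
    """starts/ends map correlationId -> timestamp; returns avg duration in ms per bin."""
    diffs_by_bin: dict[datetime, list[timedelta]] = defaultdict(list)
    for corr_id, start_ts in starts.items():
        if corr_id in ends:  # inner join on correlationId
            diffs_by_bin[bin_start(start_ts, size)].append(ends[corr_id] - start_ts)
    # avg(timeDiff) / time(1ms): timedelta / timedelta gives a float, then convert to int
    return {
        b: int((sum(d, timedelta()) / len(d)) / timedelta(milliseconds=1))
        for b, d in diffs_by_bin.items()
    }
```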
This brings another challenge with the Aggregate On setting when creating the alert, because AggregatedValue is not one of the grouping columns that come after the by clause.
When retrieving search results with the Azure Log Analytics Search REST API, I receive only the first 5000 results (as stated in the specification at the top of the document), but I know there are many more (according to the "total" attribute in the response metadata).
Is there a way to paginate so to get the entire result set?
One hacky approach would be to repeatedly split the desired time range until the "total" for each sub-range is less than 5000, and to repeat this across the entire desired time range, but this is guesswork that costs many redundant requests.
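The time-splitting workaround described above amounts to recursive bisection: query a range, and if its total reaches the limit, split it in half and recurse. A Python sketch, assuming a hypothetical count_results(start, end) callback that returns the "total" for a range (it stands in for a REST call and is not part of any Azure SDK):

```python
from datetime import datetime, timedelta
from typing import Callable


def split_range(start: datetime, end: datetime,
                count_results: Callable[[datetime, datetime], int],
                limit: int = 5000,
                min_span: timedelta = timedelta(seconds=1)) -> list[tuple[datetime, datetime]]:
    """Return contiguous sub-ranges [start, end) whose totals are below limit."""
    if count_results(start, end) < limit or end - start <= min_span:
        # Small enough (or too short to split further): fetch it in one request.
        return [(start, end)]
    mid = start + (end - start) / 2
    return (split_range(start, mid, count_results, limit, min_span)
            + split_range(mid, end, count_results, limit, min_span))
```

Each split costs an extra count request, which is the redundancy the question mentions; a range shorter than min_span is returned even if it still exceeds the limit, so very dense data can still be truncated.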
While there doesn't appear to be a way to paginate with the REST API itself, you can do the pagination in the query. The two key operators here are top and skip.
Suppose you want page n with page size x (pages numbered from 1). Then append the following to your query:
query | skip (n-1) * x | top x
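In other words, page n covers rows (n-1)*x through n*x - 1 (counting from 0) of the result. A small Python sketch of the same arithmetic on an in-memory list (the function name page is hypothetical); like the query, it only gives consistent pages if the rows come back in a deterministic order:

```python
def page(rows: list, n: int, x: int) -> list:
    """Return page n (1-based) of size x: skip (n-1)*x rows, then take x rows."""
    if n < 1 or x < 1:
        raise ValueError("page number and page size must be positive")
    offset = (n - 1) * x
    return rows[offset:offset + x]


print(page(list(range(10)), 3, 4))  # the last, partial page
```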
For a full reference list, see https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-search-reference
Yes, the skip operator is no longer available, but there is still a way to paginate: count the total number of entries, then use some simple arithmetic and two opposite sort orders.
The query expects these values: ContainerName, Namespace, SearchText, Page, PageSize.
I'm using it in a workbook, where these values are set by parameter fields.
let containers = KubePodInventory
| where ContainerName matches regex '^.*{ContainerName}$' and Namespace == '{Namespace}'
| distinct ContainerID
| project ContainerID;
let TotalCount = toscalar(ContainerLog
| where ContainerID in (containers)
| where LogEntry contains '{SearchText}'
| summarize CountOfLogs = count()
| project CountOfLogs);
ContainerLog
| where ContainerID in (containers)
| where LogEntry contains '{SearchText}'
| extend Log=replace(@'(\x1b\[[0-9]*m|\x1b\[0 [0-9]*m)','', LogEntry)
| project TimeGenerated, Log
| sort by TimeGenerated asc
| take {PageSize}*{Page}
| top iff({PageSize}*{Page} > TotalCount, TotalCount - ({PageSize}*({Page} - 1)) , {PageSize}) by TimeGenerated desc;
// The '| extend' line is only needed if the log entries contain ANSI escape sequences (the annoying special characters); without it, project LogEntry instead of Log
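The query above emulates skip with two opposite sorts: sort ascending, take the first PageSize*Page rows, then keep the last rows of that prefix with a descending top. The iff shrinks the final page so that it doesn't repeat rows from the previous page. A Python sketch of that arithmetic, assuming rows are already sorted by TimeGenerated ascending (the function name is hypothetical):

```python
def page_without_skip(sorted_rows: list, page: int, page_size: int) -> list:
    """Emulate: take PageSize*Page | top k by TimeGenerated desc."""
    total_count = len(sorted_rows)
    prefix = sorted_rows[:page_size * page]  # take {PageSize}*{Page}
    if page_size * page > total_count:
        # Last, partial page: only the rows not shown on earlier pages.
        k = total_count - page_size * (page - 1)
    else:
        k = page_size
    k = max(k, 0)  # this sketch returns an empty list for pages past the end
    # top k ... desc: the k newest rows of the prefix, newest first
    return sorted(prefix, reverse=True)[:k]


print(page_without_skip(list(range(10)), 3, 4))
```

Note that each page comes back newest first, because the final top sorts descending.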