Azure Log Analytics - Alerts Advice

I have a question about Azure Log Analytics alerts: I don't quite understand how the time frame works when setting up an alert based on an aggregated value.
I have the code below:
Event
| where Source == "EventLog" and EventID == 6008
| project TimeGenerated, Computer
| summarize AggregatedValue = count(TimeGenerated) by Computer, bin_at(TimeGenerated, 24h, datetime(now()))
For time window: 24/03/2019, 09:46:29 - 25/03/2019, 09:46:29
In the above, the alert configuration interface insists on adding the bin_at(TimeGenerated, 24h, datetime(now())) clause, so I add the function, passing the arguments for a 24-hour time period. If the query is already doing this, then what is the point of the time window?
Basically, the result I am looking for is capturing this event over a 24-hour period and alerting when the event count is over 2. I don't understand why a time window is also necessary on top of this, because I just want to run the query every five minutes and alert if it detects more than two instances of this event.
Can anyone help with this?
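For reference, here is a sketch of the logic described above (one reading of the question, not the portal's suggested form): count events per computer over a trailing 24-hour window, evaluate every five minutes, and alert when AggregatedValue is greater than 2. The catch is that the portal's 'Metric measurement' mode expects a time bin in the query, which appears to be why the interface insists on bin_at.
Event
| where Source == "EventLog" and EventID == 6008
| where TimeGenerated > ago(24h) // trailing 24-hour window
| summarize AggregatedValue = count() by Computer // events per computer in that window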

AFAIK, you can use a query like the one shown below to accomplish your requirement of capturing the required event over a 24-hour period.
Event
| where Source == "EventLog" and EventID == 6008
| where TimeGenerated > ago(24h)
| summarize AggregatedValue = any(EventID) by Computer, bin(TimeGenerated, 1s)
The '1s' in this sample query is the bin size with which we aggregate the output fetched from the Log Analytics workspace. For more information, refer to https://learn.microsoft.com/en-us/azure/kusto/query/summarizeoperator
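If it helps, here is a quick way to see how binning rounds timestamps (the sample value is assumed, purely for illustration):
print binned = bin(datetime(2019-03-25 09:46:29), 24h)
// Returns 2019-03-25 00:00:00 - every record generated in that UTC day lands in the same bin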
And to create an alert, go to the Azure portal -> YOURLOGANALYTICSWORKSPACE -> Monitoring tile -> Alerts -> Manage alert rules -> New alert rule -> Add condition -> Custom log search -> paste any of the above queries under the 'Search query' section -> type '2' under the 'Threshold value' parameter of the 'Alert logic' section -> click 'Done' -> under the 'Action Groups' section, select an existing action group or create a new one as explained in the article linked below -> update 'Alert Details' -> click on 'Create alert rule'.
https://learn.microsoft.com/en-us/azure/azure-monitor/platform/action-groups
Hope this helps!! Cheers!! :)

To answer your question in the comments: yes, the alert interface insists on adding the bin function, which is why I provided the relevant query with a '1s' bin and tried to explain it in my previous answer.
If you put '1s' in the bin function, the output fetched from Log Analytics is aggregated over one-second spans, i.e., one row per second in which the event occurred. The output would look something like the table below, where aaaaaaa stands for a VM name and x for a particular time.
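(An illustrative sketch reusing those placeholders; 6008 is simply what any(EventID) returns, since the query filters on that ID.)
Computer    TimeGenerated    AggregatedValue
aaaaaaa     x                6008
aaaaaaa     x + 1s           6008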
If you put '24h' instead of '1s' in the bin function, the output is aggregated over 24-hour spans, so you get at most one row per VM per day, with the same placeholders.
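(Illustrative again; note the single row, because one bin covers the whole day.)
Computer    TimeGenerated    AggregatedValue
aaaaaaa     x                6008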
So in this case we should not use '24h' in the bin function together with the 'any' aggregation: we would see only one row per 24-hour span, and that doesn't tell you how many times the event occurred. If you want '24h' in the bin function, use the 'count' aggregation instead of 'any'. The query would then look something like this:
Event
| where Source == "EventLog" and EventID == 6008
| where TimeGenerated > ago(24h)
| summarize AggregatedValue = count() by Computer, bin(TimeGenerated, 24h)
The output of this query would look something like the table below, where aaaaaaa stands for a VM name, x for a particular time, and y and z for the per-day event counts.
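(Illustrative, with the text's placeholders.)
Computer    TimeGenerated    AggregatedValue
aaaaaaa     x                y
aaaaaaa     x + 24h          z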
One other note: all the above queries and outputs are in the context of setting up an alert based on an aggregated value, i.e., choosing 'Metric measurement' under the 'Alert logic based on' section. In other words, an AggregatedValue column is expected in the alert query when you choose 'Metric measurement'. But when you say 'you get a count of the events', then if I am not wrong you may be choosing 'Number of results' under 'Alert logic based on', which does not require any aggregation column in the query.
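For example (a minimal sketch, not part of the original answer), a 'Number of results' alert can use the raw query with no summarize at all; the alert then fires based on how many rows come back, e.g. with a threshold of 'Greater than 2':
Event
| where Source == "EventLog" and EventID == 6008 // each matching row counts toward the threshold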
Hope this clarifies!! Cheers!!

Related

How to properly create once-a-day Azure Log Alert for pod errors?

I created an Azure Alert using a query (KQL - Kusto Query Language) reading from the Log. That is, it's a Log Alert.
After a few minutes, the alert was triggered (as I expected based on my condition).
My condition checks if there are Pods of a specific name in Failed state:
KubePodInventory
| where TimeGenerated between (now(-24h) .. now())
| where ClusterName == 'mycluster'
| where Namespace == 'mynamespace'
| where Name startswith "myjobname"
| where PodStatus == 'Failed'
| where ContainerStatusReason == 'Completed' //or 'Error' or doesn't matter? (there are duplicated entries, one with Completed and one with Error)
| order by TimeGenerated desc
These errors stay in the log, and I only want to catch them (alert about them) once per day; that is, I check whether there is at least one entry in the log (the threshold), then fire the alert.
Is the log query evaluated every time there is a new entry in the log, or is it evaluated at a set frequency? I could not find a frequency for checking alerts specified anywhere in the Azure portal, so maybe it evaluates the alert condition(s) every time something new appears in the log?
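One hedged sketch of the counting side (an assumption, not a confirmed answer; it presumes the duplicate Completed/Error rows share the same pod Name, and that the alert's frequency and period are both set to 24 hours so it evaluates once per day):
KubePodInventory
| where TimeGenerated > ago(24h)
| where ClusterName == 'mycluster'
| where Namespace == 'mynamespace'
| where Name startswith "myjobname"
| where PodStatus == 'Failed'
| summarize FailedPods = dcount(Name) // collapses the duplicate rows; alert when FailedPods > 0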

Azure Log Analytics - Comparing Two message traces to raise an alert if one updates

I have a Function App that pulls the Azure public key using a PowerShell script and outputs it into Log Analytics. I am trying to get notified if the public key updates. At the moment, I am comparing the top two results to look for changes.
Does anyone know a query that could compare two message outputs to see if the output value changes? This will be used to create an alert, and to drive more automation, if the public key does change.
Yes, this is possible, for example by using the next() function:
traces
| where message has "OUTPUT"
| top 2 by timestamp desc // take the latest two outputs; top also serializes the rows, which next() requires
| extend different = next(message) != message // compare each row's message with the following row's
| take 1 // keep only the most recent row
| where different == true // it survives only if the value changed
Using next() you can look up the value of a column in the next row, which we can use to compare with the current value of message. Note that next() only works on a serialized (ordered) row set; the top operator provides that ordering here.

Creating an alert for long running pipelines

I currently have an alert set up for Data Factory that sends an email alert if a pipeline runs longer than 120 minutes, following this tutorial: https://www.techtalkcorner.com/long-running-azure-data-factory-pipelines/. So when a pipeline does in fact run longer than the expected time, I do receive an alert; however, I am also getting additional, unexpected alerts.
My query looks like:
ADFPipelineRun
| where Status =="InProgress" // Pipeline is in progress
| where RunId !in (( ADFPipelineRun | where Status in ("Succeeded","Failed","Cancelled") | project RunId ) ) // Subquery, pipeline hasn't finished
| where datetime_diff('minute', now(), Start) > 120 // It has been running for more than 120 minutes
I received an alert email on September 28th saying, as expected, that a pipeline was running longer than 120 minutes, but when I try to find that pipeline in the Azure Data Factory pipeline runs, nothing shows up. The alert email has a button that says "View the alert in Azure monitor", and from there I can press "View Query Results" above the shown query. There I can re-enter the query above, filter the date to show all pipelines running longer than 120 minutes since September 27th, and it returns 3 pipelines.
Something I noticed about these pipelines is the End time column. I'm thinking that at some point the UTC time is not properly configured, and maybe that is why the alert is triggered? Is there something I am doing wrong, or a better way to do this that avoids a bunch of false alarms?
To create preemptive warnings for long-running jobs:
Create the activity.
Click on a blank space.
Follow the path: Settings > Elapsed time metric
Refer to Operationalize Data Pipelines - Azure Data Factory.
I'm not sure if you're seeing false alerts. What you've shown here looks like the correct behavior.
You need to keep in mind:
Duration threshold should be offset by the time it takes for the logs to appear in Azure Monitor.
The email alert takes you to the query that triggered the event. Your query only shows "InProgress" statuses, so the End property is not set/updated. You'll need to extend your query to look at one of the other statuses to see the actual duration.
Run another query with the RunId of the suspect runs to inspect the durations.
ADFPipelineRun
| where RunId == 'bf461c8b-0b1e-43c4-9cdf-7d9f7ccc6f06'
| distinct TimeGenerated, OperationName, RunId, Start, End, Status

Unable to set Azure Dashboard card to use "Set in Query"

I have a query to check average response times over:
Last 24 hours
24-192 hours
The difference between them as a percentage
let requests0to24HoursAgo = requests
| where timestamp > ago(24h)
| summarize last0to24HoursAverageRequestDuration=avg(duration), id=1;
let requests24to192HoursAgo = requests
| where timestamp > ago(192h)
| where timestamp < ago(24h)
| summarize last24to192HoursAverageRequestDuration=avg(duration), id=1;
let diff = requests0to24HoursAgo
| join requests24to192HoursAgo on id
| extend Diff = (last0to24HoursAverageRequestDuration - last24to192HoursAverageRequestDuration) / last24to192HoursAverageRequestDuration * 100
| project
["Average response (last 0-24 hours)"]=last0to24HoursAverageRequestDuration,
["Average response (last 24-192 hours)"]=last24to192HoursAverageRequestDuration,
Diff;
diff
This works perfectly in the Logs section in Azure, but as soon as I pin the query to a dashboard, it's unable to run with the date range "Set in query" and returns NaN for 2 of the values.
When I click "Open Editing Pane", set it to "Set in Query" and run it, it works. When I then click "Apply", it is still broken on the dashboard.
As per the documentation, in Log Analytics a default time range of 24 hours is applied to all queries.
We have tested this in our local environment and tried overriding the time range parameter using the dashboard tile settings, which didn't help; what you are describing looks like a feature request.
I would suggest you submit it to the feedback forum and raise the same issue on Microsoft Q&A.
I stumbled across the same issue when the timespan I set within the query (| where timestamp > ago(7d)) was ignored in the dashboard.
I've tested the approach with the tile settings, as VenkateshDodda-MT mentioned:
Open the tile settings in the dashboard (-> Configure tile settings)
Tick Override the dashboard time settings at the tile level.
Select a timespan greater than 24h
Although there is no Set in query option like in the query editor, it would be enough to set a timespan of 30 days in your case.
I've also successfully tested it with your query.

Filter data from CustomEvent

I have data in Azure Application Insights saved as custom events.
Now I need to create a dashboard page on my website that will pull data from Application Insights and show graphs of that data.
My question is how I can filter data from customEvents based on the data saved there, e.g., based on custom event names or custom data.
Can you point me to any resource showing how $filter, $search, and $query work?
I have looked at https://dev.applicationinsights.io/quickstart but it doesn't look like enough.
I tried to add a filter like
startswith(customEvent/name, 'BotMessageReceived')
in https://dev.applicationinsights.io/apiexplorer/events
but it is not working. It says "Something went wrong while running the query".
I have customEvents whose names start with BotMessageReceived.
Thanks
Dalvir
Update:
There is no 'like' operator; if you want to use timestamp as a filter, you should use one of the three methods below:
customEvents
| where timestamp >= datetime('2018-11-23T00:00:00.000') and timestamp <= datetime('2018-11-23T23:59:00.000')
customEvents
| where tostring(timestamp) contains "2018-12-11"
customEvents
| where timestamp between (datetime('2018-11-23T00:00:00.000') .. datetime('2018-11-23T23:59:00.000'))
Please use this:
customEvents
| where name startswith "BotMessageReceived"
And if you use the API you mentioned above, you can use:
https://api.applicationinsights.io/v1/apps/Your_application_id/query?query=customEvents | where name startswith "BotMessageReceived"
It works on my side.
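One practical note (an assumption based on general REST usage, not from the original answer): when calling that endpoint outside the API explorer, the query string must be URL encoded and the request needs your API key in the x-api-key header. A sketch with YOUR_APP_ID and YOUR_API_KEY as placeholders:
curl -G "https://api.applicationinsights.io/v1/apps/YOUR_APP_ID/query" \
  -H "x-api-key: YOUR_API_KEY" \
  --data-urlencode 'query=customEvents | where name startswith "BotMessageReceived"'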
