We have a requirement to get status of windows service when it is started and stopped do that I have returned one query, but I am facing issue when joining 2 tables to get output.
I have tried using inner and left outer joins but still getting duplicates
Event
| where EventLog == "System" and EventID == 7036 and Source == "Service Control Manager"
| parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
| where Windows_Service_State == "running" and Windows_Service_Name == "Microsoft Monitoring Agent Azure VM Extension Heartbeat Service"
| extend startedtime = TimeGenerated
| join (
Event
| where EventLog == "System" and EventID == 7036 and Source == "Service Control Manager"
| parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
| where Windows_Service_State == "stopped" and Windows_Service_Name == "Microsoft Monitoring Agent Azure VM Extension Heartbeat Service"
| extend stoppedtime = TimeGenerated
) on Computer
| extend downtime = startedtime - stoppedtime
| project Computer, Windows_Service_Name,stoppedtime , startedtime ,downtime
| top 10 by Windows_Service_Name desc
we want to get no of times that service started and stopped if the service restarted multiple times in a day we are getting duplicate timings in starttime when joining please have a look on link (https://ibb.co/JzqxjC0)
I am not sure I fully understand what is going on, since I don't have access to the data. But. I can see you are using the default join flavor.
The default is inner unique:
The inner-join function is like the standard inner-join from the SQL world. An output record is produced whenever a record on the left side has the same join key as the record on the right side.
Which means a new line in the result is created on every match between the left and the right side. Therefore. let's assume you have a computer that was restarted twice, so it has 2 lines of stopped, and 2 lines of running. That will produce 4 rows in Kusto answer.
Looking at your picture, it makes sense to me because you have lines with negative downtime. I guess that is not possible.
What I would do, is look for an identifier that is unique on every Computer run. Then you can join on that, and stay safe not to generate data that you don't want.
Related
I created an Azure Alert using a Query (KQL - Kusto Query Language) reading from the Log. That is, it's an Log Alert.
After a few minutes, the alert was triggered (as I expected based on my condition).
My condition checks if there are Pods of a specific name in Failed state:
KubePodInventory
| where TimeGenerated between (now(-24h) .. now())
| where ClusterName == 'mycluster'
| where Namespace == 'mynamespace'
| where Name startswith "myjobname"
| where PodStatus == 'Failed'
| where ContainerStatusReason == 'Completed' //or 'Error' or doesn't matter? (there are duplicated entries, one with Completed and one with Error)
| order by TimeGenerated desc
These errors stay in the log, and I only want to catch (alert about them) once per day (that is, I check if there is at least one entry in the log (threshold), then fire the alert).
Is the log query evaluated every time there is a new entry in the log, or is it evaluated in a set frequency?I could not find in Azure Portal a frequency specified to check Alerts, so maybe it evaluates the Alert(s) condition(s) every time there is something new in the Log?
I have a chart in azure monitor (app insight to be exact) essentially I have 5 servers 2 of which are used by client B and 3 by client C. what I want to all server displayed but a drop down option so client B or c can be chosen
At the moment I have two charts which show the servers stood alone and another combined
With the following query
Perf
|extend iif(Computer =="X","B",iif(Computer =="Y","B","C")) | where CounterName == "% Processor Time" | where ObjectName == "Processor" | where Computer contains "SQL" | summarize avg(CounterValue) by bin(TimeGenerated, 5min), iif(Computer =="X","B",iif(Computer
=="Y","B","C")) // bin is used to set the time grain to 15 minutes | render timechart
A solution that may work for you
As suggested by Peter Bons and after testing in our local environment.
In your application insights create a workbook with a new parameter
Pick "dropdown" as the parameter type
Pick "query" as "get data from" option
Set your data source from where you are getting your data.
The below query is for getting the list of subscriptions you can use your own query for getting the list of servers
ResourceContainers | where type =~ "microsoft.resources/subscriptions"
// add any other filters you want here
| project id, name, group=tenantId
For further information you can go through the Microsoft Document.
I would like to have query that would return something like for single vm. So query should be showing results of single vm and what kinda log type / solutions it has used and how much.
I don't know if this is even possible to do anything similar maybe? Tips?
With this query I'm able to list total usage for all vm's reporting to laws but I would like to have more details about a single vm
find where TimeGenerated > ago(30d) project _BilledSize, _IsBillable, Computer
| where _IsBillable == true
| extend computerName = tolower(tostring(split(Computer, '.')[0]))
| summarize BillableDataBytes = sum(_BilledSize) by computerName
| sort by BillableDataBytes nulls last
Mostly you would be able to accomplish it by querying standard columns or properties _BilledSize, Type, _IsBillable and Computer.
Below is the sample query for your reference:
union withsource=tt *
| where TimeGenerated between (ago(7d) .. now())
| where _IsBillable == true
| where isnotempty(Computer)
| where Computer == "MM-VM-RHEL-7"
| summarize BillableDataBytes = sum(_BilledSize) by Computer, _IsBillable, Type
| render piechart
Below is the screenshot for illustration:
Related references:
Log data usage - Understanding ingested data volume
Standard columns in logs
I have past few hours trying to build query that shows/list computers which have generated over 10 events of specific id's.
E.g query list all computers that has occurred event 2222 but I would want to have query that would list computers only if it has reported e.g over 10 events of same id in past 48 hours. If none computer has reported over 10 events of specific id, then result is just empty.
Event | where Source contains "service" | where EventID == 2222
Another example which display results computers that has occurred event 2222 and how many times, but still missing that displaying results if count is bigger than 10.
Event
| where Source contains "service" | where EventID == 2222
| summarize count() by Computer
Logic behind this would be to create some kinda alert. A single event is not alone important but if it occurs more then it start to matter. Any tips?
Good job for this. But we usually give it a meaningful name instead of the auto-generated name like count_.
You can add a meaningful name in the summarize sentence like summarize TotalCount=count() by Computer.
The full query like below:
Event
| where Source contains "service" | where EventID == 2222
| summarize TotalCount=count() by Computer
| where TotalCount > 10
The result:
Got it finally working by this:
Event
| where Source contains "service" | where EventID == 2222
| summarize count() by Computer
| where count_ > 10
I have a question about azure log analytics alerts, in that I don't quite understand how the time frame works within the context of setting up an alert based on an aggregated value.
I have the code below:
Event | where Source == "EventLog" and EventID == 6008 | project TimeGenerated, Computer | summarize AggregatedValue = count(TimeGenerated) by Computer, bin_at(TimeGenerated,24h, datetime(now()))
For time window : 24/03/2019, 09:46:29 - 25/03/2019, 09:46:29
In the above the alert configuration interface insights on adding the bin_at(TimeGenerated,24h, datetime(now())) so I add the function, passing the arguments for a 24h time period. If you are already adding this then what is the point of the time frame.
Basically the result I am looking for is capturing this event over a 24 hour period and alerting when the event count is over 2. I don't understand why a time window is also necessary on top of this because I just want to run the code every five minutes and alert if it detects more than two instances of this event.
Can anyone help with this?
AFAIK you may use the query something like shown below to accomplish your requirement of capturing the required event over a time period of 24 hour.
Event
| where Source == "EventLog" and EventID == 6008
| where TimeGenerated > ago(24h)
| summarize AggregatedValue= any(EventID) by Computer, bin(TimeGenerated, 1s)
The '1s' in this sample query is the time frame with which we are aggregating and getting the output from Log Analytics workspace repository. For more information, refer https://learn.microsoft.com/en-us/azure/kusto/query/summarizeoperator
And to create an alert, you may have to go to Azure portal -> YOURLOGANALYTICSWORKSPACE -> Monitoring tile -> Alerts -> Manager alert rules -> New alert rule -> Add condition -> Custom log search -> Paste any of the above queries under 'Search query' section -> Type '2' under 'Threshold value' parameter of 'Alert logic' section -> Click 'Done' -> Under 'Action Groups' section, select existing action group or create a new one as explained in the below mentioned article -> Update 'Alert Details' -> Click on 'Create alert rule'.
https://learn.microsoft.com/en-us/azure/azure-monitor/platform/action-groups
Hope this helps!! Cheers!! :)
To answer your question in the comments part, yes the alert insists on adding the bin function and that's the reason I have provided relevant query along with bin function by having '1s' and tried to explain about it in my previous answer.
If you put '1s' in bin function then you would fetch output from Log Analytics by aggregating value of any EventID in the timespan of 1second. So output would look something like shown below where aaaaaaa is considered as a VM name, x is considered as a particular time.
If you put '24h' instead of '1s' in bin function then you would fetch output from Log Analytics by aggregating value of any EventID in the timespan of 24hours. So output would look something like shown below where aaaaaaa is considered as a VM name, x is considered as a particular time.
So in this case, we should not be using '24h' in bin function along with 'any' aggregation because if we use it then we would see only one occurrence of output in 24hours of timespan and that doesn't help you to find out event occurrence count using the above provided query having 'any' for aggregation. Instead you may use 'count' aggregation instead of 'any' if you want to have '24h' in bin function. Then this query would look something like shown below.
Event
| where Source == "EventLog" and EventID == 6008
| where TimeGenerated > ago(24h)
| summarize AggregatedValue= count(EventID) by Computer, bin(TimeGenerated, 24h)
The output of this query would look something like shown below where aaaaaaa is considered as a VM name, x is considered as a particular time, y and z are considered as some numbers.
One other note is, all the above mentioned queries and outputs are in the context of setting up an alert based on an aggregated value i.e., setting up an alert when opting 'metric measurement' under alert logic based on section. In other words, aggregatedvalue column is expected in alert query when you opt 'metric measurement' under alert logic based on section. But when you say 'you get a count of the events' that means If i am not wrong, may be you are opting 'number of results' under alert logic based on section, which would not required any aggregation column in the query.
Hope this clarifies!! Cheers!!