I've added a chart using KQL and logs from Azure Log Analytics to a dashboard. I'm using make-series which works great but the catch is the following:
The logs I'm getting might not extend to the whole time range dictated by the dashboard. So basically I need access to the starttime/endtime (and time granularity) to make make-series cover the whole timerange.
e.g.
logs
| make-series
P90 = percentile(Elapsed, 90) default = 0,
Average = avg(Elapsed) default = 0
// ??? need start/end time to use in from/to
on TimeGenerated step 1m
Currently, it's not supported. There are some feedbacks about this feature: Support for time granularity selected in Azure Portal Dashboard, and Retrieve the portal time span and use it inside the kusto query.
And some people provided workarounds in the first feedback, you can give it a try.
I posted on another question on this subject - you can do a bit of a hack in your KQL to get this working: https://stackoverflow.com/a/73064218/5785878
Related
I am trying to create dashboard of my services in Azure. I added Azure Metrics Chart of each service and later wanted to add under it specific details to operations included in service.
But when I try to get it from logs, I get much higher number of requests made. KQL:
requests
| where cloud_RoleName startswith "notificationengine"
| summarize Count = count() by operation_Name
| order by Count
And result:
Problem is with some metrics chart I get values with minimal difference or exactly same while with some like one I shown I get completely different values. I tried to modify KQL or search what might be wrong but never got anywhere.
My guess is that those are 2 different values but in that case why both are labeled as "requests" and if so what are actual differences?
I have taken an Azure Function App with 2 Http Trigger Functions with identical names starts with “HttpTrigger” and run both the functions for couple of times.
Test Case 1:
In the Logs Workspace, Requests count got for the two functions that starts with the word “HttpTrigger”:
But I have pinned the chart of only 1 Function Requests Count to the Azure Dashboard:
Probably, I believe you have written the query of requests of all the services/applications that starts with “notificationengine” but pinned only some apps/services logs-chart to the dashboard.
Test Case 2:
I am trying to find a way to capture the dag stats - i.e run time (start time, end time), status, dag id, task id, etc for various dags and their task in a separate table
found the default logs which goes to elasticsearch/kibana, but not a simple way to pull the required logs from there back to the s3 table.
building a separate process to load those logs in s3 will have replicated data and also there will be too much data to scan and filter as tons of other system-related logs are generated as well.
adding a function to each dag - would have to modify each dag
what other possibilities are to get it don't efficiently, of any other airflow inbuilt feature can be used
You can try using Ad Hoc Query available in Apache airflow.
This option is available at Data Profiling -> Ad Hoc Query and select airflow_db
If you wish to get DAG statistics such as start_time,end_time etc you can simply query in the below format
select start_date,end_date from dag_run where dag_id = 'your_dag_name'
The above query returns start_time and end_time details of the DAG for all the DAG runs. If you wish to get details for a particular run then you can add another filter condition like below
select start_date,end_date from dag_run where dag_id = 'your_dag_name' and execution_date = '2021-01-01 09:12:59.0000' ##this is a sample time
You can get this execution_date from tree or graph views. Also you can get other stats like id,dag_id,execution_date,state,run_id,conf as well.
You can also refer to https://airflow.apache.org/docs/apache-airflow/1.10.1/profiling.html#:~:text=Part%20of%20being%20productive%20with,application%20letting%20you%20visualize%20data. link for more details.
You did not mention do you need this information real time or in batches.
Since you do not want to use ES logs either, you can try airflow metrics, if it suits your need.
However pulling this information from database is not efficient, in any case but it still is an option if you are not looking for real time data collection.
This line is in my Azure Application Insights Kusto query:
pageViews
| where timestamp between(datetime("2020-03-06T00:00:00.000Z")..datetime("2020-06-06T00:00:00.000Z"))
Each time I run it, I manually replace the datetime values with current date and the current date minus ~90 days. Is there a way to write the query in a way that no matter what day I run it, it uses that day minus 90 days by default?
The reason for 90 is I believe Azure Application Insights allows a maximum of the most recent 90 days to exported. In other queries I might choose to use minus 30 days or minus 7 days, if it's possible.
If this is easily spotted in Microsoft documentation and I have missed it in my exploration, I apologize.
Thank you for any insight anyone may have.
IIUC, you're interested in running something like this:
pageViews
| where timestamp between(startofday(ago(90d)) .. startofday(now()))
(depending on your requirement, you can omit the startofday()s, or use endofday(), or perform any other datetime-manipulation/arithmetics)
It should be easy to use ago operator. The query is as below:
pageViews
| where timestamp >ago(90d) //d means days here.
And for this The reason for 90 is I believe Azure Application Insights allows a maximum of the most recent 90 days to exported. You can take a look at Continuous Export feature, it's different from export via query. And you can choose the better one between them as per your requirement.
I am trying to configure the dashboard consists few business critical functionality which we needs to focus for performance monitoring based on the SLAs.
Example a landing page url retrieves a records needs to be faster and accepted SLA is
Green < 1sec
Amber 1 sec - 2 secs
Red > 2 secs
We were able to configure the same in SPLUNK based on flat file logs. However we could not able to configure similar thing in Azure.
As of now I could not able to create a dashboard for our requirement. Any type of graphical representation is Ok for us. Based on this monitoring we might need to react and improve the performance over the period of time when it goes slow.
You can use the below Kusto query in application insights:
requests
| where timestamp > ago(2h) //set the time range
| where url == "http://localhost:54917/" //set the url here
| summarize avg_time =avg(duration)
| extend my_result = case(
avg_time<=1000,"good", //1000 milliseconds
avg_time<=2000,"normal",//2000 milliseconds
"bad"
)
Note:
1.the unit of avg_time is milliseconds
2.when avg_time <=1000 milliseconds, then in dashboard, it shows "good"; when <=2000 milliseconds, it shows "normal"; when >2000 milliseconds, it shows "bad".
The query result(change it to Chart):
Then in dashboard:
An approximated solution which can serve your purpose
use request time vs time char along with reference lines which can be your SLA thresholds
So you can figure out at this moment the response time is below or above the threshold
// Response time trend
// Chart request duration over the last 12 hours
requests
| where timestamp > ago(12h)
| summarize avgRequestDuration=avg(duration) by bin(timestamp, 10m) // use a time grain of 10 minutes
| render timechart
| extend Green = 200
| extend Amber = 400
| extend red = 800
it would look something like below
I think it is much more useful than your previous UI, which has kind of a meter like feel that gives you health indication at that moment, but with continuous time plot you get better picture of the trend
If you run the same query in Azure Workbooks, you could use the "thresholds" renderer in grids or tiles to format cells with if/then/else like that for color for each range.
would get you:
you can then pin that grid/tiles/graph to an azure dashboard. (if the query uses a workbooks time range parameter, it will inherit the dashboard's time range and auto update as well.
I tried to pull the Azure resource usage data for billing metrics. I followed the steps as mentioned in the blog to get Usage data of resources.
https://msdn.microsoft.com/en-us/library/azure/mt219001.aspx
Even If I set "start and endtime" parameter in the URL, its not take effect. It returns entire output [ from resource created/added time ].
For example :
https://management.azure.com/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/providers/Microsoft.Commerce/UsageAggregates?api-version=2015-06-01-preview&reportedStartTime=2017-03-03T00%3a00%3a00%2b00%3a00&reportedEndTime=2017-03-04T00%3a00%3a00%2b00%3a00&aggregationGranularity=Hourly&showDetails=true"
As per the above URL, it should return the data between "2017-03-03 to 2017-03-04". But It shows the data from 2nd March [ 2017-03-02]. don't know why this return entire output and time filter section is not working.
Note : Endtime parameter value takes effect, mean it shows the output upto what mentioned in the endtime. But it doesn't consider the start time.
Anyone have a suggestion on this.
So there are a few things to consider:
There is usage date/time and then there is reported date/time.
Former tells you the date/time when the resources were used while the
latter tells you the date/time when this information was received by
the billing sub-system. There will be some delay in when the
resources used versus when they are reported. From this link:
Set {dateTimeOffset-value} for reportedStartTime and reportedEndTime
to valid dateTime values. Please note that this dateTimeOffset value
represents the timestamp at which the resource usage was recorded
within the Azure billing system. As Azure is a distributed system,
spanning across 19 datacenters around the world, there is bound to be
a delay between the resource usage time (when the resource was
actually consumed) and the resource usage reported time (when the
usage event reached the billing system) and callers need a predictable
way to get all usage events for a subscription for a given time
period.
The query only lets you search for reported date/time and there is no provision for usage date/time. However the data returned back to you contains usage date/time and not the reported date/time.
Long story short, because of the delay in propagating the usage information to the billing sub-system, the behavior you're seeing is correct. In my experience, it takes about 24 hours for all the usage information to show up in the billing sub-system.
The way we handle this scenario in our application is we fetch the data for a longer duration and then pick up only the data we're interested in seeing. So for example, if I need to see the data for 1st of March then we query the data for reported date/time from 1st March to say 4th March (i.e. today's date) and then discard any data where usage date is not 1st of March.
If we don't find any data (which is quite possible and is happening in your case as well), we simply tell the users that usage information is not yet available.