Azure Response Time Monitoring per URL with a range

I am trying to configure a dashboard for a few business-critical functions that we need to monitor for performance against our SLAs.
For example, a landing-page URL that retrieves records needs to be fast, and the accepted SLA is:
Green < 1 sec
Amber 1 sec - 2 secs
Red > 2 secs
We were able to configure this in Splunk based on flat-file logs, but we have not been able to configure anything similar in Azure.
So far I have not been able to create a dashboard for this requirement. Any type of graphical representation is fine for us. Based on this monitoring we may need to react and improve performance over time when it slows down.

You can use the Kusto query below in Application Insights:
requests
| where timestamp > ago(2h) // set the time range
| where url == "http://localhost:54917/" // set the URL here
| summarize avg_time = avg(duration)
| extend my_result = case(
    avg_time <= 1000, "good",   // 1000 milliseconds
    avg_time <= 2000, "normal", // 2000 milliseconds
    "bad"
  )
Note:
1. The unit of avg_time is milliseconds.
2. When avg_time is <= 1000 ms, the dashboard shows "good"; when it is <= 2000 ms, it shows "normal"; when it is > 2000 ms, it shows "bad".
Change the query result view to Chart; the chart can then be pinned to the dashboard.
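If you also want to see how the status trends over time rather than as a single aggregate value, a variant of the same query binned by time works as well (a minimal sketch; the 10-minute bin size is an assumption you can adjust):
requests
| where timestamp > ago(2h)
| where url == "http://localhost:54917/" // set the URL here
| summarize avg_time = avg(duration) by bin(timestamp, 10m) // assumed 10-minute bins
| extend my_result = case(
    avg_time <= 1000, "good",
    avg_time <= 2000, "normal",
    "bad"
  )
You can render avg_time as a time chart, or keep the table and colour it in the dashboard.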

An approximate solution that can serve your purpose:
Use a response-time-vs-time chart along with reference lines for your SLA thresholds.
That way you can see at any moment whether the response time is below or above the threshold.
// Response time trend
// Chart request duration over the last 12 hours
requests
| where timestamp > ago(12h)
| summarize avgRequestDuration = avg(duration) by bin(timestamp, 10m) // use a time grain of 10 minutes
| extend Green = 200 // SLA reference lines, in milliseconds
| extend Amber = 400
| extend Red = 800
| render timechart // render must come last, after the reference-line columns are added
It would look something like the chart below.
I think it is much more useful than your previous UI, which has a meter-like feel that only gives a health indication for the current moment; with a continuous time plot you get a better picture of the trend.

If you run the same query in Azure Workbooks, you can use the "Thresholds" renderer in grids or tiles to apply if/then/else-style colouring for each range, which would get you the same kind of result.
You can then pin that grid, tile set, or graph to an Azure dashboard. (If the query uses a Workbooks time range parameter, it will inherit the dashboard's time range and update automatically as well.)
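As a rough sketch of what that per-URL query could look like (the columns come from the standard Application Insights requests table; the status labels and thresholds are assumptions to feed the Thresholds renderer):
requests
| where timestamp > ago(2h)
| summarize avg_time = avg(duration) by url
| extend status = case(
    avg_time <= 1000, "Green",
    avg_time <= 2000, "Amber",
    "Red"
  )
In the workbook's grid settings you would then point the Thresholds renderer at the status (or avg_time) column to colour each row.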

Related

Unable to reproduce data from Azure Metrics Chart using Logs

I am trying to create a dashboard of my services in Azure. I added an Azure Metrics Chart for each service and later wanted to add, under it, specific details about the operations included in the service.
But when I try to get the same data from Logs, I get a much higher number of requests. KQL:
requests
| where cloud_RoleName startswith "notificationengine"
| summarize Count = count() by operation_Name
| order by Count
The problem is that for some metrics charts I get values with minimal differences, or exactly the same values, while for others, like the one shown, I get completely different values. I tried modifying the KQL and searching for what might be wrong, but never got anywhere.
My guess is that these are two different values, but in that case why are both labelled as "requests", and what are the actual differences?
I created an Azure Function App with two HTTP-trigger functions whose names both start with "HttpTrigger" and ran both functions a couple of times.
Test Case 1:
In the Logs workspace, this is the request count for the two functions whose names start with "HttpTrigger":
But I pinned the chart of only one function's request count to the Azure dashboard:
I believe you have probably written the query against the requests of all services/applications whose names start with "notificationengine", but pinned only some of those apps'/services' log charts to the dashboard.
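A quick way to check is to scope the logs query to the single service whose metric chart you pinned and compare the counts (a minimal sketch; the cloud_RoleName value below is a hypothetical placeholder for the pinned service's exact role name):
requests
| where cloud_RoleName == "notificationengine-frontend" // hypothetical exact role name of the pinned service
| summarize Count = count() by operation_Name
| order by Count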

Access dashboard's time range and granularity from KQL

I've added a chart to a dashboard using KQL and logs from Azure Log Analytics. I'm using make-series, which works great, but the catch is the following:
The logs I'm getting might not extend over the whole time range dictated by the dashboard. So basically I need access to the start time/end time (and time granularity) so that make-series covers the whole time range.
e.g.
logs
| make-series
P90 = percentile(Elapsed, 90) default = 0,
Average = avg(Elapsed) default = 0
// ??? need start/end time to use in from/to
on TimeGenerated step 1m
Currently, this is not supported. There are some feedback items about this feature: Support for time granularity selected in Azure Portal Dashboard, and Retrieve the portal time span and use it inside the kusto query.
Some people have provided workarounds in the first feedback item; you can give them a try.
I posted an answer to another question on this subject - you can do a bit of a hack in your KQL to get this working: https://stackoverflow.com/a/73064218/5785878
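Until the dashboard's time range is exposed to the query, one fallback is to hard-code the window with explicit from/to bounds so the series always spans it (a minimal sketch; the 12-hour lookback and 1-minute step are assumptions, not values read from the dashboard):
let lookback = 12h; // assumed fixed window standing in for the dashboard's time range
logs
| make-series
    P90 = percentile(Elapsed, 90) default = 0,
    Average = avg(Elapsed) default = 0
    on TimeGenerated from ago(lookback) to now() step 1m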

Can we find out if there is an increase to number of requests to a given page?

If I have a web app in Azure, with Application Insights configured, is there a way we can tell if there was an increase in the number of requests to a given page?
I know we can get the "Delta" of performance in a given time slice, compared to the previous period, but it doesn't seem like we can do this for requests?
For example, I'd like to answer questions like: "what pages in the last hour had the highest % increase in requests, compared to the previous period"?
Does anyone know how to do this, or can it be done via the AppInsights query language?
Thanks!
I'm not sure whether it can be done using the Portal; I don't think so. But I came up with the following Kusto query:
requests
| where timestamp > ago(2h) and timestamp < ago(1h)
| summarize previousPeriod = todouble(count()) by url
| join (
    requests
    | where timestamp > ago(1h)
    | summarize lastHour = todouble(count()) by url
  ) on url
| project url, previousPeriod, lastHour, change = ((lastHour - previousPeriod) / previousPeriod) * 100
| order by change desc
This shows the increase/decrease in the amount of traffic per URL; you can change count() to, for example, avg(duration) to get the increase/decrease of the average duration.
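A duration-based variant of the same query might look like this (a sketch; only the aggregation and column names change):
requests
| where timestamp > ago(2h) and timestamp < ago(1h)
| summarize previousAvg = avg(duration) by url
| join (
    requests
    | where timestamp > ago(1h)
    | summarize lastHourAvg = avg(duration) by url
  ) on url
| project url, previousAvg, lastHourAvg, change = ((lastHourAvg - previousAvg) / previousAvg) * 100
| order by change desc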

Ongoing time frame in Azure Application Insights

This line is in my Azure Application Insights Kusto query:
pageViews
| where timestamp between(datetime("2020-03-06T00:00:00.000Z")..datetime("2020-06-06T00:00:00.000Z"))
Each time I run it, I manually replace the datetime values with the current date and the current date minus ~90 days. Is there a way to write the query so that, no matter what day I run it, it uses that day minus 90 days by default?
The reason for 90 is that I believe Azure Application Insights allows a maximum of the most recent 90 days to be exported. In other queries I might choose to use minus 30 days or minus 7 days, if that's possible.
If this is easily spotted in Microsoft documentation and I have missed it in my exploration, I apologize.
Thank you for any insight anyone may have.
IIUC, you're interested in running something like this:
pageViews
| where timestamp between(startofday(ago(90d)) .. startofday(now()))
(depending on your requirement, you can omit the startofday()s, or use endofday(), or perform any other datetime-manipulation/arithmetics)
It should be easy to use the ago operator. The query is as below:
pageViews
| where timestamp > ago(90d) // d means days here
As for "The reason for 90 is that I believe Azure Application Insights allows a maximum of the most recent 90 days to be exported": take a look at the Continuous Export feature, which is different from exporting via query. You can choose whichever of the two better fits your requirement.

Azure Function monitor alert where execution count < 1 never triggered

I have an Azure Function App with Azure Functions that I individually want to monitor with the following rule: If an Azure Function didn't execute for N amount of minutes, send out an email/notification.
I am wondering if this is possible with Application Insights Alerts, which do provide signal logic for the count on an individual Azure Function basis. But this count is never 0; in the graphs it appears that a count of 0 is not recorded as a number at all. It displays as --, as you can see in the graph for my test function below:
testfunction chart (don't have enough reputation to post images)
The peak on the chart is seen as a 3, but if I use the condition "Whenever the testfunction Count is Less than 1" then the alert is never triggered.
Changing the aggregation granularity doesn't really do much, since the signal logic doesn't ever seem to record a count of 0, or any count smaller than 1.
There are lots of (slightly) more inconvenient ways to do this type of monitoring, but it seemed very possible with the nice built-in Azure Application Insights Alerts and I'd like to use that if at all possible.
Am I trying to misuse Application Insights Alerts or is there something obvious that I'm not getting? I would think it should be possible to have monitoring rules based on a lack of executions.
You might have to do this with log/query alerts instead. If you're using metric-based alerts, some of those metrics don't send 0s as data, so if nothing happened during a time range there are no 0s to alert on, since nothing is submitting 0, 0, 0, 0.
Instead, you'd create alerts based on queries: https://learn.microsoft.com/en-us/azure/azure-monitor/platform/alerts-unified-log
The doc has this exact scenario listed:
In some cases, you may want to create an alert in the absence of an event.
For example, a process may log regular events to indicate that it's working properly. If it doesn't log one of these events within a particular time period, then an alert should be created. In this case, you would set the threshold to less than 1. [emphasis added, this is your scenario, correct]?
Example of Number of Records type log alert
Consider a scenario where you want to know when your web-based App gives a response to users with code 500 (that is) Internal Server Error. You would create an alert rule with the following details:
Query: requests | where resultCode == "500"
Time period: 30 minutes
Alert frequency: five minutes
Threshold value: Greater than 0
In that example the query would end up being something like requests | where timestamp > ago(30m) | where resultCode == "500" because of the time period set. (The query itself can then filter that time range/result set down however you want.)
So for yours, you'd probably just use requests with no where condition at all, whatever time period and frequency you want, and "less than 1" as the threshold.
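If the Function App hosts several functions and you only want to alert on one of them, the alert query can be scoped by operation name (a minimal sketch; "testfunction" is the function name from the question, and the 30-minute window is an assumption that should match the alert rule's time period):
requests
| where timestamp > ago(30m) // should match the alert rule's time period
| where operation_Name == "testfunction" // the individual function to monitor
With "number of results less than 1" as the condition, the alert fires when that function has not executed during the window.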
You could write much more complicated queries as well, to filter out test data, etc.
One thing to watch out for: I believe log alerts fire every time the frequency elapses. So if you had a requests < 1 alert set up to run every 5 minutes, and your function had no calls for 2 hours, the alert would fire every 5 minutes, sending you 24 emails or whatever. Maybe you want that :)
