Azure KQL - Create Total Average Line on top of timechart

I have some data that I'm logging from our internal label print system. I want to see the duration of each request in a timechart, but I would also like a total-average line on top of it. Is there any way this can be achieved in KQL?
Notice the red line in the example image; this is what I want.
I'm currently using the following KQL:
requests
| where url has "api/FileHandover"
| project Duration = duration, Timestamp = timestamp
| summarize Average = avg(Duration) by Timestamp
| render timechart

It's recommended not to group by the raw timestamp, but by a bin of it, e.g. bin(timestamp, 1h):
// Data sample generation. Not part of the solution
let requests = range i from 1 to 1000 step 1 | extend url= "api/FileHandover", duration = 1d * rand() / 1m, timestamp = ago(rand() * 7d);
// Solution starts here
let raw_data = requests | where url has "api/FileHandover";
let total_avg = toscalar(raw_data | summarize avg(duration));
raw_data
| summarize Average = avg(duration) by bin(timestamp, 1h)
| extend total_avg
| render timechart
or, equivalently, using the as operator:
// Data sample generation. Not part of the solution
let requests = range i from 1 to 1000 step 1 | extend url= "api/FileHandover", duration = 1d * rand() / 1m, timestamp = ago(rand() * 7d);
// Solution starts here
requests
| where url has "api/FileHandover"
| as raw_data;
raw_data
| summarize avg(duration)
| as total_avg;
raw_data
| summarize Average = avg(duration) by bin(timestamp, 1h)
| extend total_avg = toscalar(total_avg)
| render timechart
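Since raw_data is referenced twice in both variants, you can also wrap it in materialize() so the filtered rows are computed only once; a minimal sketch of that variation, using the same sample data:
// Data sample generation. Not part of the solution
let requests = range i from 1 to 1000 step 1 | extend url= "api/FileHandover", duration = 1d * rand() / 1m, timestamp = ago(rand() * 7d);
// Solution starts here; materialize() caches the filtered rows for reuse
let raw_data = materialize(requests | where url has "api/FileHandover");
let total_avg = toscalar(raw_data | summarize avg(duration));
raw_data
| summarize Average = avg(duration) by bin(timestamp, 1h)
| extend total_avg
| render timechart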

Related

Azure Kusto syntax

I need to run a very simple query
requests
| where cloud_RoleName == "blabla"
| summarize Count=count() by url
| order by Count desc
The only thing I need is to get the data just from the past 5 minutes.
If I try this:
requests | where timestamp < ago(5m)
| where cloud_RoleName == "blabla"
| summarize Count=count() by url
| order by Count desc
or this:
requests
| where cloud_RoleName == "blabla" and timestamp < ago(5m)
| summarize Count=count() by url
| order by Count desc
both of them return data older than 5 minutes.
I've read the docs and I see no other way of writing this query.
Can anyone assist?
Make sure to check whether the timestamp is greater than the result of ago().
ago(5m) returns the timestamp from 5 minutes ago, so if you want the data from within the last 5 minutes, you want the rows with a timestamp greater than that.
So the query should be:
requests
| where timestamp > ago(5m)
| where cloud_RoleName == "blabla"
| summarize Count=count() by url
| order by Count desc
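As a quick illustration of why the comparison flips (a standalone snippet, not part of the solution):
print now_ts = now(), five_min_ago = ago(5m)
// ago(5m) equals now() - 5m, so rows from "within the last 5 minutes"
// satisfy timestamp > ago(5m); timestamp < ago(5m) selects everything older.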

KQL time graph number of vulnerabilities

I have a query that fetches the number of unique vulnerabilities found in our images in our Azure Container Registry:
securityresources
| where type == 'microsoft.security/assessments/subassessments'
| where id matches regex '(.+?)/providers/Microsoft.Security/assessments/dbd0cb49-b563-45e7-9724-889e799fa648/'
| parse id with registryResourceId '/providers/Microsoft.Security/assessments/' *
| parse registryResourceId with * "/providers/Microsoft.ContainerRegistry/registries/" registryName
| extend imageDigest = tostring(properties.additionalData.imageDigest), repository = tostring(properties.additionalData.repositoryName)
| project
registryName,
repository,
imageDigest,
severity = properties.status.severity,
vulnId = properties.id,
displayName = properties.displayName,
description = properties.description,
remediation = properties.remediation,
category = properties.category,
impact = properties.impact,
timeGenerated = properties.timeGenerated
| distinct tostring(vulnId)
| summarize count()
I would like to have a graph that shows the number of vulnerabilities over a period of time so we can see (visually) whether the number of vulnerabilities is going down (or up), but I have no clue how to do this. Hopefully someone can help me achieve this.
Instead of distinct tostring(vulnId) | summarize count(), try either of the following:
summarize dcount(vulnId) by bin(timeGenerated, 1h)
make-series dcount(vulnId) on timeGenerated step 1h
and then add a | render timechart at the end.
E.g.:
securityresources
| where type == 'microsoft.security/assessments/subassessments'
| where id matches regex '(.+?)/providers/Microsoft.Security/assessments/dbd0cb49-b563-45e7-9724-889e799fa648/'
| extend vulnId = tostring(properties.id), timeGenerated = todatetime(properties.timeGenerated)
| summarize dcount(vulnId) by bin(timeGenerated, 1h)
| render timechart
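And the make-series variant, which also fills hours that have no data with a zero count (a sketch under the same assumptions about the properties payload):
securityresources
| where type == 'microsoft.security/assessments/subassessments'
| where id matches regex '(.+?)/providers/Microsoft.Security/assessments/dbd0cb49-b563-45e7-9724-889e799fa648/'
| extend vulnId = tostring(properties.id), timeGenerated = todatetime(properties.timeGenerated)
| make-series dcount(vulnId) on timeGenerated step 1h
| render timechart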

Spark Dataframe complex ordering

I have a event log dataset, like this:
| patient | timestamp | event_st | extra_info |
| 1 | 1/1/2018 2:30 | urg_admission | x |
| 1 | 1/1/2018 3:00 | urg_discharge | x |
| 1 | 1/1/2018 | hosp_admission | y |
| 1 | 1/10/2018 | hosp_discharge | y |
I want to order all rows by patient and timestamp but, unfortunately, depending on the type of event event_st, the timestamp may have minute or day granularity.
So the solution I would use in C++ would be to define a custom < operator, using event_st as a discriminator when the time granularities differ. For example, with the data shown, events with the hosp_ prefix should always be ordered after events with the urg_ prefix when they fall on the same day.
Is there any equivalent approach using the DataFrame API or other Spark APIs?
Thank you very much.
One option is to first normalize all the timestamps to some standard form, like ddMMYY or epoch. The simplest way is to use a UDF.
For example, if you convert all the timestamps to epoch, your code would look like:
def convertTimestamp(timeStamp: String, event_st: String): Long = {
  if (event_st == "urg_admission") {
    ... // Add conversion logic for the minute-granularity format
  }
  if (event_st == "hosp_admission") {
    ... // Add conversion logic for the day-granularity format
  }
  ...
}
val df = spark.read.json("/path/to/log/dataset") // I am assuming JSON format
spark.udf.register("convertTimestamp", convertTimestamp _)
df.createOrReplaceTempView("logdataset")
val df_normalized = spark.sql("select logdataset.*, convertTimestamp(timestamp, event_st) as normalized_timestamp from logdataset")
After this, you can use the normalized dataset for subsequent operations.

Azure log analytics timechart with multiple dimensions

In the new Azure Log Analytics query platform you can query performance counters and summarize them to create a nice graph.
Following the multiple dimensions documentation example, it says:
Multiple expressions in the by clause create multiple rows, one for each combination of values.
I want to query their sample database for network Bytes Sent and Bytes Received per computer. Starting with this query, it should be something like:
Perf
| where TimeGenerated > ago(1d)
| where (CounterName == "Bytes Received/sec" or CounterName == "Bytes Sent/sec")
| summarize avg(CounterValue) by bin(TimeGenerated, 1h), Computer, CounterName
| extend Threshold = 20
| render timechart
The problem is that Sent and Received bytes get grouped in the graph at the computer level.
How can multiple dimensions be represented, as stated in the documentation, so that I have Computer X Bytes Sent and Computer X Bytes Received instead of them grouped together, which doesn't make any sense?
Not to mention that in the previous version this worked as expected.
I thought that if multiple dimensions are not really accepted, a string concatenation would do the trick. A bit hackish in my opinion, but it did:
Perf
| where (CounterName == "Bytes Received/sec" or CounterName == "Bytes Sent/sec") and InstanceName matches regex "^Microsoft Hyper-V Network Adapter.*$"
| summarize avg(CounterValue) by strcat(Computer, " ", CounterName), bin(TimeGenerated, 10s)
| render timechart
Another option is this:
let RuntimeID = CosmosThroughput_CL
| where MetricName_s == "ProvisionedThroughput" and TimeGenerated between (ago(2h) .. ago(1h))
| order by TimeGenerated desc
| top 1 by TimeGenerated
| distinct RuntimeID_g;
CosmosThroughput_CL
| where MetricName_s == "ProvisionedThroughput" and RuntimeID_g in (RuntimeID)
| project Resource = toupper(Resource), Value = Throughput_d, Container = Container_s, Database = Database_s, MetricName = "Provisioned"
| union
(
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "PartitionKeyRUConsumption"
| where TimeGenerated between (ago(1d) .. ago(1d-1h))
| summarize Value = sum(todouble(requestCharge_s)) by Resource, databaseName_s, collectionName_s
| project Resource, Container = collectionName_s, Database = databaseName_s, Value, MetricName = "HourlyUsage"
)
| union
(
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "PartitionKeyRUConsumption"
| where TimeGenerated between (ago(1d) .. ago(1d-1h))
| summarize Value = sum(todouble(requestCharge_s)/3600) by Resource, databaseName_s, collectionName_s
| project Resource, Container = collectionName_s, Database = databaseName_s, Value, MetricName = "RUs"
)
| project Resource, Database, Container, Value, MetricName
The important part is to project the same column names in every branch; Value holds the different values from each table. The second union lets me project another value from the same table.
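A minimal, self-contained sketch of that pattern (the datatable inputs are hypothetical, standing in for the real tables):
// Both branches project identical column names, so union aligns the rows
// and MetricName distinguishes the series afterwards.
let provisioned = datatable(Resource:string, Value:double)["cosmos1", 400.0];
let usage = datatable(Resource:string, Value:double)["cosmos1", 123.4];
provisioned
| extend MetricName = "Provisioned"
| union (usage | extend MetricName = "HourlyUsage")
| project Resource, Value, MetricName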

Application Insights: Analytics - how to extract string at specific position

I'd like to do the following:
Extract the "query" strings where param=1, as shown in "2." below.
Get pageViews in Analytics as a table, as shown in "3." below.
1. Actual urls included in pageView
https://example.com/dir01/?query=apple&param=1
https://example.com/dir01/?query=apple&param=1
https://example.com/dir01/?query=lemon+juice&param=1
https://example.com/dir01/?query=lemon+juice&param=0
https://example.com/dir01/?query=tasteful+grape+wine&param=1
2. Value expected to extract
apple
lemon+juice
tasteful+grape+wine
3. Expected output in AI Analytics
Query Parameters | Count
apple | 2
lemon+juice | 1
tasteful+grape+wine | 1
What I tried:
https://learn.microsoft.com/en-us/azure/application-insights/app-insights-analytics-reference#parseurl
https://aka.ms/AIAnalyticsDemo
I think extract or parseurl(url) should be useful. I tried the latter, parseurl(url), but I don't know how to extract "Query Parameters" as one column.
pageViews
| where timestamp > ago(1d)
| extend parsed_url=parseurl(url)
| summarize count() by tostring(parsed_url)
| render barchart
url
http://aiconnect2.cloudapp.net/FabrikamProd/
parsed_url
{"Scheme":"http","Host":"aiconnect2.cloudapp.net","Port":"","Path":"/FabrikamProd/","Username":"","Password":"","Query Parameters":{},"Fragment":""}
Yes, parseurl is the way to do it. It results in a dynamic value, which you can use like JSON.
To get the "query" value of the query parameters:
pageViews
| where timestamp > ago(1d)
| extend parsed_url=parseurl(url)
| extend query = tostring(parsed_url["Query Parameters"]["query"])
and to summarize by the param value:
pageViews
| where timestamp > ago(1d)
| extend parsed_url=parseurl(url)
| extend query = tostring(parsed_url["Query Parameters"]["query"])
| extend param = toint(parsed_url["Query Parameters"]["param"])
| summarize sum(param) by query
You can see how it works on your example values in the demo portal:
let vals = datatable(url:string)[
    "https://example.com/dir01/?query=apple&param=1",
    "https://example.com/dir01/?query=apple&param=1",
    "https://example.com/dir01/?query=lemon+juice&param=1",
    "https://example.com/dir01/?query=lemon+juice&param=0",
    "https://example.com/dir01/?query=tasteful+grape+wine&param=1"
];
vals
| extend parsed = parseurl(url)
| extend query = tostring(parsed["Query Parameters"]["query"])
| extend param = toint(parsed["Query Parameters"]["param"])
| summarize sum(param) by query
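Note that sum(param) only matches the expected counts because param is always 0 or 1 here; an explicit filter makes the intent clearer and yields the same table:
vals
| extend parsed = parseurl(url)
| extend query = tostring(parsed["Query Parameters"]["query"])
| extend param = toint(parsed["Query Parameters"]["param"])
| where param == 1
| summarize Count = count() by query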
Hope this helps,
Asaf
