KQL query showing preceding logs from a specific log - azure

I'm working on a query where I need to find the log whose message contains "Compromised", and then return the 5 preceding "deny" logs. I'm new to KQL and just don't know the operator, so I'd appreciate the help!
Current query:
| sort by TimeGenerated
| where SourceIP == "555.555.555.555"
| where TimeGenerated between (datetime(2021-10-20 16:25:41.750) .. datetime(2021-10-20 16:35:41.750))
| where AdditionalExtensions has "Compromised" or DeviceAction == "deny"
Ideally in my head it would be something like:
Needed query:
| sort by TimeGenerated
| where SourceIP == "555.555.555.555"
| where AdditionalExtensions has "Compromised"
| // show the preceding 5 logs that have DeviceAction == "deny"
Thank you!

You can use the prev() function.

Here's how you do it:
let N = 5; // number of records to return before/after each record for which Cond is true
YourTable
| where SourceIP == "555.555.555.555"
| where AdditionalExtensions has "Compromised" or DeviceAction == "deny"
| extend Cond = AdditionalExtensions has "Compromised" // the predicate that "identifies" the relevant records
| sort by TimeGenerated asc
| extend rn = row_number(0, Cond)
| extend nxt = next(rn, N), prv = prev(rn, N)
| where nxt < N or (rn <= N and isnotnull(prv)) or Cond
| project-away rn, nxt, prv, Cond
Note that the sorting is done after the extend, not before - this is more optimal (it's always best to push the sort as far down in the query as possible).
(Courtesy of @RoyO)
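To see why this works: row_number(0, Cond) restarts the index at 0 on every record where Cond is true, so rn counts rows since the last match, while next(rn, N) < N detects that a restart (i.e. a match) occurs within the next N rows. Here's a minimal, self-contained sketch on synthetic data (the datatable below is illustrative, not the asker's table), keeping only the "preceding N" half of the filter:
let N = 2;
datatable (TimeGenerated:datetime, DeviceAction:string)
[
    datetime(2021-10-20 16:25:00), "deny",
    datetime(2021-10-20 16:26:00), "deny",
    datetime(2021-10-20 16:27:00), "deny",
    datetime(2021-10-20 16:28:00), "Compromised",
    datetime(2021-10-20 16:29:00), "deny"
]
| extend Cond = DeviceAction == "Compromised"
| sort by TimeGenerated asc
| extend rn = row_number(0, Cond)
| extend nxt = next(rn, N)
| where nxt < N or Cond // the N records preceding each match, plus the match itself
This returns the 16:26 and 16:27 "deny" rows plus the 16:28 "Compromised" row.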

Related

Exclude Temporary Storage (D:) from KQL QUERY

I have a KQL query for disk logs from Azure Log Analytics. Please let me know how to exclude a particular drive, like D: or any temporary storage, from this query.
InsightsMetrics
| where Name == "FreeSpaceMB"
| extend Tags = parse_json(Tags)
| extend mountId = tostring(Tags["vm.azm.ms/mountId"])
,diskSizeMB = toreal(Tags["vm.azm.ms/diskSizeMB"])
| project-rename FreeSpaceMB = Val
| summarize arg_max(TimeGenerated, diskSizeMB, FreeSpaceMB) by Computer, mountId
,FreeSpacePercentage = round(FreeSpaceMB / diskSizeMB * 100, 1)
| extend diskSizeGB = round(diskSizeMB / 1024, 1)
,FreeSpaceGB = round(FreeSpaceMB / 1024, 1)
| project TimeGenerated, Computer, mountId, diskSizeGB, FreeSpaceGB, FreeSpacePercentage
| order by Computer asc, mountId asc
You just need to add a where statement:
| where mountId != "D:"
So your query becomes:
InsightsMetrics
| where Name == "FreeSpaceMB"
| extend Tags = parse_json(Tags)
| extend mountId = tostring(Tags["vm.azm.ms/mountId"])
,diskSizeMB = toreal(Tags["vm.azm.ms/diskSizeMB"])
| where mountId != "D:"
| project-rename FreeSpaceMB = Val
| summarize arg_max(TimeGenerated, diskSizeMB, FreeSpaceMB) by Computer, mountId
,FreeSpacePercentage = round(FreeSpaceMB / diskSizeMB * 100, 1)
| extend diskSizeGB = round(diskSizeMB / 1024, 1)
,FreeSpaceGB = round(FreeSpaceMB / 1024, 1)
| project TimeGenerated, Computer, mountId, diskSizeGB, FreeSpaceGB, FreeSpacePercentage
| order by Computer asc, mountId asc
And if you want to exclude multiple drives from the query, you can use the !in operator; it will look like this:
InsightsMetrics
| where Name == "FreeSpaceMB"
| extend Tags = parse_json(Tags)
| extend mountId = tostring(Tags["vm.azm.ms/mountId"])
,diskSizeMB = toreal(Tags["vm.azm.ms/diskSizeMB"])
| where mountId !in ("D:", "E:")
| project-rename FreeSpaceMB = Val
| summarize arg_max(TimeGenerated, diskSizeMB, FreeSpaceMB) by Computer, mountId
,FreeSpacePercentage = round(FreeSpaceMB / diskSizeMB * 100, 1)
| extend diskSizeGB = round(diskSizeMB / 1024, 1)
,FreeSpaceGB = round(FreeSpaceMB / 1024, 1)
| project TimeGenerated, Computer, mountId, diskSizeGB, FreeSpaceGB, FreeSpacePercentage
| order by Computer asc, mountId asc
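If drive letters might appear with inconsistent casing in your data (an assumption; InsightsMetrics usually reports mountId consistently), use the case-insensitive variant of the operator, !in~:
| where mountId !in~ ("d:", "e:")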

Kusto number of overlapping intervals in a time range

I'm trying to write a Kusto query that counts how many intervals overlap within a certain date range. This is what my table looks like:
userID | interval1 | interval2
24 | 21.1.2012 10:40 | 21.1.2012 11:00
25 | 21.1.2012 9:55 | 21.1.2012 10:50
I would like to consider the time range [min(interval1), max(interval2)] with a 1s step, and for each instant in this range I would like to know how many intervals from the table above overlap it. For example, at 21.1.2012 10:00 there is only one interval, but at 10:45 there are two overlapping intervals.
Thank you
Every interval1 marks the start of an additional user session (+1).
Every interval2 marks the end of one (-1).
The accumulated sum is the number of active sessions.
Solution 1 (Rendering level)
render timechart with (accumulate=True)
let t = (datatable (userID:int,interval1:datetime,interval2:datetime)
[
24 ,datetime(2012-01-21 10:40) ,datetime(2012-01-21 11:00)
,25 ,datetime(2012-01-21 09:55) ,datetime(2012-01-21 10:50)
]);
let from_dttm = datetime(2012-01-21 09:30);
let to_dttm = datetime(2012-01-21 11:30);
let sessions_starts = (t | project delta = 1, dttm = interval1);
let sessions_ends = (t | project delta = -1, dttm = interval2);
union sessions_starts, sessions_ends
| make-series delta = sum(delta) on dttm from from_dttm to to_dttm step 1s
| render timechart with (accumulate=True)
Solution 2 (Data level)
mv-apply + row_cumsum
let t = (datatable (userID:int,interval1:datetime,interval2:datetime)
[
24 ,datetime(2012-01-21 10:40) ,datetime(2012-01-21 11:00)
,25 ,datetime(2012-01-21 09:55) ,datetime(2012-01-21 10:50)
]);
let from_dttm = datetime(2012-01-21 09:30);
let to_dttm = datetime(2012-01-21 11:30);
let sessions_starts = (t | project delta = 1, dttm = interval1);
let sessions_ends = (t | project delta = -1, dttm = interval2);
union sessions_starts, sessions_ends
| make-series delta = sum(delta) on dttm from from_dttm to to_dttm step 1s
| mv-apply delta to typeof(long), dttm to typeof(datetime) on (project active_users = row_cumsum(delta), dttm)
| render timechart with (xcolumn=dttm, ycolumns=active_users)
Take a look at this sample from the Kusto docs:
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/samples?pivots=azuredataexplorer#chart-concurrent-sessions-over-time
X // the docs sample's placeholder table; substitute your own
| mv-expand samples = range(bin(interval1, 1m), interval2, 1m)
| summarize count_userID = count() by bin(todatetime(samples), 1m)
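Note that this sample bins at 1-minute resolution. For the 1s step described in the question, changing both steps to 1s is a straightforward substitution, though it greatly increases the number of expanded rows. Using the datatable t defined above:
t
| mv-expand samples = range(bin(interval1, 1s), interval2, 1s)
| summarize active_users = count() by bin(todatetime(samples), 1s)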

How to output multiple variables using Azure Kusto?

I'm fairly new to the Azure Kusto query language. I'm trying to output 2 variables. This has to be something very simple, I just don't know how. I have tried the datatable, make-series, print, etc. functions to no avail. Here's my current code:
let allrequests = requests | project itemCount, resultCode, success, timestamp | where timestamp > now(-1h) and timestamp < now(-5m);
let requestcount = allrequests | summarize sum(itemCount);
let errorcount = allrequests | where toint(resultCode) >= 400 and toint(resultCode) <= 499 | summarize sum(itemCount);
requestcount; errorcount
Using union is one way, but if you want them on a single row use the print statement (docs):
let requestcount = requests
| summarize sum(itemCount);
let errorcount = exceptions
| summarize count();
print requests = toscalar(requestcount), exceptions = toscalar(errorcount)
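Since print produces a single row, you can also derive combined metrics from the two scalars directly. For example (an illustrative extension, assuming the request count is nonzero):
print requests = toscalar(requestcount), exceptions = toscalar(errorcount)
| extend error_rate_pct = round(100.0 * exceptions / requests, 2)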
I figured it out. You can combine results using the union operator:
let allrequests = requests | project itemCount, resultCode, success, timestamp | where timestamp > now(-1h) and timestamp < now(-5m);
let requestcount = allrequests | summarize sum(itemCount);
let errorcount = allrequests | where toint(resultCode) >= 400 and toint(resultCode) <= 499 | summarize sum(itemCount);
errorcount | union requestcount

How to use series_divide() in Kusto?

I am not able to correctly divide time-series data by another time series.
I get data from my TestTable, which results in the following view:
TagId, sdata
8862, [0,0,0,0,2,2,2,3,4]
6304, [0,0,0,0,2,2,2,3,2]
I want to divide the sdata series for TagId 8862 by the series from 6304.
I expect the following result:
[NaN,NaN,NaN,NaN,1,1,1,1,2]
When I try the code below, I only get two empty ddata rows in my S2 results.
TestTable
| where TagId in (8862,6304)
| make-series sdata = avg(todouble(Value)) default=0 on TimeStamp in range (datetime(2019-06-27), datetime(2019-06-29), 1m) by TagId
| as S1;
S1 | project ddata = series_divide(sdata[0].['sdata'], sdata[1].['sdata'])
| as S2
What am I doing wrong?
Both arguments to series_divide() can't come from two separate rows in the dataset.
Here's an example of how you could achieve this (based on the limited sample shown in your question, which may not fully represent your real use case):
let T =
datatable(tag_id:long, sdata:dynamic)
[
8862, dynamic([0,0,0,0,2,2,2,3,4]),
6304, dynamic([0,0,0,0,2,2,2,3,2]),
]
;
let get_value_from_T = (_tag_id:long)
{
toscalar(
T
| where tag_id == _tag_id
| take 1
| project sdata
)
};
print sdata_1 = get_value_from_T(8862), sdata_2 = get_value_from_T(6304)
| extend result = series_divide(sdata_1, sdata_2)
which returns:
| sdata_1             | sdata_2             | result                                        |
|---------------------|---------------------|-----------------------------------------------|
| [0,0,0,0,2,2,2,3,4] | [0,0,0,0,2,2,2,3,2] | ["NaN","NaN","NaN","NaN",1.0,1.0,1.0,1.0,2.0] |
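An alternative that avoids the helper function is to pivot the two rows into one with take_anyif() and divide in a single pass (a sketch built on the make-series query from the question, untested against the real data):
TestTable
| where TagId in (8862, 6304)
| make-series sdata = avg(todouble(Value)) default=0 on TimeStamp in range (datetime(2019-06-27), datetime(2019-06-29), 1m) by TagId
| summarize s1 = take_anyif(sdata, TagId == 8862), s2 = take_anyif(sdata, TagId == 6304)
| extend ddata = series_divide(s1, s2)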

Spark out of memory with a large number of window functions (lag, lead)

I need to calculate additional features from a dataset using multiple leads and lags. The high number of lead and lag columns causes an out-of-memory error.
Data frame:
|----------+----------------+---------+---------+-----+---------|
| DeviceID | Timestamp | Sensor1 | Sensor2 | ... | Sensor9 |
|----------+----------------+---------+---------+-----+---------|
| | | | | | |
| Long | Unix timestamp | Double | Double | | Double |
| | | | | | |
|----------+----------------+---------+---------+-----+---------|
Window definition:
// Each window contains about 600 rows
val w = Window.partitionBy("DeviceID").orderBy("Timestamp")
Compute extra features:
var res = df
val sensors = (1 to 9).map(i => s"Sensor$i")
for (i <- 1 to 5) {
  for (s <- sensors) {
    // withColumn needs an output name; the "<sensor>_lag_<i>" names are illustrative
    res = res
      .withColumn(s"${s}_lag_$i", lag(s, i).over(w))
      .withColumn(s"${s}_lead_$i", lead(s, i).over(w))
  }
  // Compute features from all the lags and leads
  [...]
}
System info:
RAM: 16G
JVM heap: 11G
The code gives correct results with small datasets, but gives an out-of-memory error with 10GB of input data.
I think the culprit is the high number of window functions because the DAG shows a very long sequence of
Window -> WholeStageCodeGen -> Window -> WholeStageCodeGen ...
Is there any way to calculate the same features more efficiently?
For example, is it possible to get lag(Sensor1, 1), lag(Sensor2, 1), ..., lag(Sensor9, 1) without calling lag(..., 1) nine times?
If the answer to the previous question is no, then how can I avoid the out-of-memory error? I have already tried increasing the number of partitions.
You could try something like:
res = res.select(col("*"), lag("Sensor1", 1).over(w), lag("Sensor1", 2).over(w), ...)
That is, write everything in a single select instead of many withColumn calls.
Then there will be only one Window in the plan. Maybe it helps with the performance.
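For completeness, here's a sketch of building all the lag/lead columns in one select (the column names and the res2 variable are illustrative, not from the original post):
import org.apache.spark.sql.functions.{col, lag, lead}

// 9 sensors x 5 offsets x {lag, lead} = 90 columns built up front,
// so the plan contains a single Window node instead of 90 chained ones
val sensors = (1 to 9).map(i => s"Sensor$i")
val windowCols = for {
  i <- 1 to 5
  s <- sensors
  c <- Seq(lag(col(s), i).over(w).as(s"${s}_lag_$i"),
           lead(col(s), i).over(w).as(s"${s}_lead_$i"))
} yield c

val res2 = df.select(col("*") +: windowCols: _*)
Whether this actually avoids the out-of-memory error depends on the data, but since every expression uses the same window spec, collapsing to one Window node avoids the repeated sort stages visible in the DAG.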
