Azure Log Analytics - Search REST API - How to Paginate through results

When grabbing search results using the Azure Log Analytics Search REST API, I'm able to receive only the first 5000 results (per the specs, as stated at the top of the document), but I know there are many more (from the "total" attribute in the metadata of the response).
Is there a way to paginate so I can get the entire result set?
One hacky way would be to iteratively break down the desired time range until the "total" is less than 5000 for each sub-range, and repeat this across the entire desired time range - but this is guesswork that would cost many redundant requests.

While there doesn't appear to be a way to paginate using the REST API itself, you can use your query to perform the pagination. The two key operators here are TOP and SKIP:
Suppose you want page n with page size x (starting at page 1); then append to your query:
query | skip (n-1) * x | top x
For a full reference list, see https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-search-reference
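As a concrete illustration (and since the follow-up answer below notes that skip has since been removed from the language), here is a hedged sketch of the same windowing done with row_number() instead of skip, assuming a hypothetical table MyLogs, page 3 and a page size of 100:
MyLogs
| order by TimeGenerated desc          // pagination needs a stable ordering
| extend rn = row_number()             // 1-based row index over the sorted rows
| where rn > 200 and rn <= 300         // (page - 1) * pageSize < rn <= page * pageSize, here page 3 of size 100
| project-away rn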

Yes, the skip operator is not available anymore, but if you want to create pagination there is still an option. You need to count the total number of entries, use some simple math, and apply two opposite sorts.
Prerequisites for this query are the values ContainerName, Namespace, SearchText, Page, and PageSize.
I'm using it in a Workbook where these values are set by fields.
let containers = KubePodInventory
| where ContainerName matches regex '^.*{ContainerName}$' and Namespace == '{Namespace}'
| distinct ContainerID
| project ContainerID;
let TotalCount = toscalar(ContainerLog
| where ContainerID in (containers)
| where LogEntry contains '{SearchText}'
| summarize CountOfLogs = count()
| project CountOfLogs);
ContainerLog
| where ContainerID in (containers)
| where LogEntry contains '{SearchText}'
| extend Log=replace(@'(\x1b\[[0-9]*m|\x1b\[0 [0-9]*m)','', LogEntry)
| project TimeGenerated, Log
| sort by TimeGenerated asc
| take {PageSize}*{Page}
| top iff({PageSize}*{Page} > TotalCount, TotalCount - ({PageSize}*({Page} - 1)) , {PageSize}) by TimeGenerated desc;
// The '| extend' is not needed if the logs do not contain the annoying special characters (ANSI escape sequences)
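Stripped of the container-specific filtering, the pagination trick itself reduces to the following minimal sketch (assuming a hypothetical table MyLogs, page 3 and a page size of 50; the TotalCount handling for the final, partial page is omitted here):
MyLogs
| sort by TimeGenerated asc            // oldest first
| take 150                             // Page (3) * PageSize (50): everything up to the end of page 3
| top 50 by TimeGenerated desc         // keep only the newest 50 of those, i.e. page 3 itself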

Related

Grafana Azure Log Analytics transfer query from logs

I have this query that works in Azure Logs when I set the scope to the specific Application Insights resource I want to use:
let usg_events = dynamic(["*"]);
let mainTable = union pageViews, customEvents, requests
| where timestamp > ago(1d)
| where isempty(operation_SyntheticSource)
| extend name =replace("\n", "", name)
| where '*' in (usg_events) or name in (usg_events)
;
let queryTable = mainTable;
let cohortedTable = queryTable
| extend dimension =tostring(client_CountryOrRegion)
| extend dimension = iif(isempty(dimension), "<undefined>", dimension)
| summarize hll = hll(user_Id) by tostring(dimension)
| extend Users = dcount_hll(hll)
| order by Users desc
| serialize rank = row_number()
| extend dimension = iff(rank > 5, 'Other', dimension)
| summarize merged = hll_merge(hll) by tostring(dimension)
| project ["Country or region"] = dimension, Counts = dcount_hll(merged);
cohortedTable
but trying to use the same query in Grafana just gives an error:
"'union' operator: Failed to resolve table expression named 'pageViews'"
This is the same error I get in Azure Logs if I don't set the scope to the specific Application Insights resource. So my question is: how do I make Grafana target this specific scope inside the logs? The query just gets the countries of the users that log in.
As far as I know, there is currently no option/feature to set the Scope in Grafana.
The Scope is available only in the Azure Log Analytics workspace.
If you want this feature, please raise a ticket with the Grafana Community, where all such issues are officially addressed.
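As a query-level workaround (an assumption on my part, not part of the answer above, and whether Grafana's Azure Monitor data source accepts it is untested), KQL supports cross-resource scoping with the app() expression, so the union could reference the Application Insights resource explicitly. For a hypothetical resource named 'my-app-insights':
union app('my-app-insights').pageViews, app('my-app-insights').customEvents, app('my-app-insights').requests
| where timestamp > ago(1d)
| summarize Users = dcount(user_Id) by client_CountryOrRegion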

How to summarize time window based on a status in Kusto

I have recently started working with Kusto. I am stuck with a use case where I need to confirm that the approach I am taking is right.
I have data in the following format
In the above example, if the status is 1 and the time frame is equal to 15 seconds, then I need to count it as 1 occurrence.
So in this case there are 2 occurrences of the status.
My approach was:
if the current and next rows' status is equal to 1, then take the time difference, do a row_cumsum, and break it if next(STATUS) != 0.
Even though the approach is giving me the correct output, I am assuming the performance can slow down once the size increases.
I am looking for an alternative approach, if any. I am also adding the complete scenario to reproduce this with sample data.
.create-or-alter function with (folder = "Tests", skipvalidation = "true") InsertFakeTrue() {
range LoopTime from ago(365d) to now() step 6s
| project TIME=LoopTime,STATUS=toint(1)
}
.create-or-alter function with (folder = "Tests", skipvalidation = "true") InsertFakeFalse() {
range LoopTime from ago(365d) to now() step 29s
| project TIME=LoopTime,STATUS=toint(0)
}
.set-or-append FAKEDATA <| InsertFakeTrue();
.set-or-append FAKEDATA <| InsertFakeFalse();
FAKEDATA
| order by TIME asc
| serialize
| extend cstatus=STATUS
| extend nstatus=next(STATUS)
| extend WindowRowSum=row_cumsum(iff(nstatus ==1 and cstatus ==1, datetime_diff('second',next(TIME),TIME),0),cstatus !=1)
| extend windowCount=iff(nstatus !=1 or isnull(next(TIME)), iff(WindowRowSum ==15, 1,iff(WindowRowSum >15,(WindowRowSum/15)+((WindowRowSum%15)/15),0)),0 )
| summarize IDLE_COUNT=sum(windowCount)
The approach in the question is the way to achieve such calculations in Kusto, and given that the logic requires sorting, it is also efficient (as long as the sorted data can reside on a single machine).
Regarding the union operator - it runs in parallel by default; you can control the concurrency and spread using hints, see: union operator
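For illustration, a minimal sketch of those hints (TableA and TableB are hypothetical table names; the hint values are only examples):
union hint.concurrency=4 hint.spread=2 TableA, TableB
| summarize TotalRows = count()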

Search Query should contain 'AggregatedValue' and 'bin(timestamp, [roundTo])' for Metric alert type

I'm trying to create a custom metric alert based on some metrics in my Application Insights logs. Below is the query I'm using:
let start = customEvents
| where customDimensions.configName == "configName"
| where name == "name"
| extend timestamp, correlationId = tostring(customDimensions.correlationId), configName = tostring(customDimensions.configName);
let ending = customEvents
| where customDimensions.configName == "configName"
| where name == "anotherName"
| where customDimensions.taskName == "taskName"
| extend timestamp, correlationId = tostring(customDimensions.correlationId), configName = tostring(customDimensions.configName), name= name, nameTimeStamp= timestamp ;
let timeDiffs = start
| join (ending) on correlationId
| extend timeDiff = nameTimeStamp - timestamp
| project timeDiff, timestamp, nameTimeStamp, name, anotherName, correlationId;
timeDiffs
| summarize AggregatedValue=avg(timeDiff) by bin(timestamp, 1m)
When I run this query in the Analytics page, I get results; however, when I try to create a custom metric alert, I get the error Search Query should contain 'AggregatedValue' and 'bin(timestamp, [roundTo])' for Metric alert type.
The only response I found suggested adding AggregatedValue, which I already have, so I'm not sure why the custom metric alert page is giving me this error.
I found what was wrong with my query. Essentially, the aggregated value needs to be numeric, but AggregatedValue=avg(timeDiff) produces a timespan value; it was displayed in seconds, so it was a bit hard to notice. Converting it to an int solves the problem.
I have just updated the last bit as follows:
timeDiffs
| summarize AggregatedValue=toint(avg(timeDiff)/time(1ms)) by bin(timestamp, 5m)
This brings another challenge with Aggregate On while creating the alert, as AggregatedValue is not part of the grouping that comes after the by statement.
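For reference, the conversion works because dividing a timespan by another timespan yields a plain number, which toint() then truncates - a quick, self-contained sketch:
print exampleTimespan = time(2s),
      asMilliseconds = toint(time(2s) / time(1ms))   // 2000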

Azure Application Insights - values within objects

I'm trying to get my head around writing queries in Azure Application Insights, which is capturing interactions with a bot built using the Azure Bot Framework.
I have a table with headings such as timestamp, name, customDimensions, and within customDimensions are objects such as
{
  "conversationData": "{}",
  "privateConversationData": "{\"nameForm\":{\"NAME\":\"foo\",\"ICCID\":\"12121212121212121212\"}}",
  "userData": "{}",
  "conversationId": "878fhiee1k33j5ci",
  "userId": "default-user",
  "metrics": "92.25833"
}
I can write queries easily to select items by name for example
customEvents
| where name contains "Activity"
but how do I select based on keys within objects such as those within privateConversationData above?
For example "privateConversationData": "{\"nameForm\":{\"NAME\":\"foo\",\"ICCID\":\"12121212121212121212\"}}", refers to one dialog called nameForm, how would I write a query to show the number of times the nameForm was used? Or a query that included the other kinds of dialog (e.g. not just nameForm, but fooForm, barForm) and a count of the times they were used?
Many thanks for any help!
The 'customDimensions' property is a dynamic type and therefore can be treated as a JSON document.
For example - to get the number of times nameForm was used in the last day:
customEvents
| extend conversationData = customDimensions["privateConversationData"]
| where timestamp > ago(1d) and isnotempty(conversationData) and conversationData contains "{\\\"nameForm\\\""
| count
Getting the different dialogs count will be trickier, but possible by parsing the customDimensions JSON document using the parse operator:
customEvents
| where timestamp > ago(1d)
| parse customDimensions with * "privateConversationData\": \"{\\\"" dialogKind "\\\":{\\\"NAME\\\"" *
| where isnotempty(dialogKind) and isnotnull(dialogKind)
| summarize count() by dialogKind
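An alternative sketch (my own assumption about the payload shape, not part of the original answer) is to parse the nested JSON with todynamic() and read the dialog name from the object's keys, which avoids matching escaped quotes as strings:
customEvents
| where timestamp > ago(1d)
| extend privateConversation = todynamic(tostring(customDimensions["privateConversationData"]))
| extend dialogKind = tostring(bag_keys(privateConversation)[0])   // e.g. "nameForm", "fooForm"
| where isnotempty(dialogKind)
| summarize count() by dialogKind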
You can read the Analytics Reference to learn more about the language.

ArangoDB AQL Filtering Using Edges and Vertices with Unknown Positions in Graph Traversal Path

I have a generic graph structure where I need to find non-leaf nodes in the graph based on their connections to other nodes in the graph. The position of the node I want to return is not defined, and it is possible there are multiple paths to the node I want to return. I want to run a single query to return a bunch of items I am displaying in a sorted list to a client. I do not want to have to run multiple asynchronous queries and sort on the client side.
This list is filtered based on the edges that connect the vertices together, or if the node is connected to another node. The filter conditions are updated on the client side, which results in the query being re-constructed and the database re-queried. The position of the nodes in the graph that need to be returned is not guaranteed to be the same for all results, they may be leaf nodes, or anywhere in the path. The vertices I want to return can be identified via attributes on the edges leading to them, or away from them. Each edge also has a date attribute on it that is used for sorting and a type attribute that is used for filtering.
Imagine a graph 'myGraph' such as I attempted to illustrate below.
-------
| v:1 |\
------- \
| \ \ -------
| | \| v:4 |\
| \ ------- \
| | / ^ \ -------
| \/ | \| v:7 |
| /| return -------
| / \
| / |
------- \
| v:2 |\ |
------- \ \
| \ -------
| \| v:5 |\
| ------- \
| \ -------
| \| v:8 |\
| ------- \
| ^ \ -------
| | \| v:10|
------- return -------
| v:3 |\
------- \
\ -------
\| v:6 |\
------- \
\ -------
\| v:9 |
-------
^
|
return
The above diagram illustrates what I want to return given one set of filtering conditions, but the returned results can vary if I change the filtering conditions. The nodes I want to return are easily identified based on the attributes on the edges leading to them or away from them.
I have a query that looks something like the following, but I am having trouble finding a way to index the nodes in the path that have edges leading to or away from them that meet specific filtering criteria.
FOR item in vertexCollection1
FILTER .... // FILTER the vertices
FOR v, e, p IN 1..4 OUTBOUND item._id GRAPH 'myGraph'
// ?? Not sure how to efficiently return from here
// ?? FILTER p.vertices[??].v == 7 OR p.vertices[??].v == 10
// ?? FILTER p.edges[??].type == "type1" OR p.edges[??].type == "type2"... etc based on user selections
// ?? LET date = p.edges[vertexPosition - 1].date
// ?? LET data = p.vertices[??]
// SORT DATE_TIMESTAMP(date) DESC
// RETURN {date: date, data: data}
I am currently using a [**] operation to get the specific node based on the collection it resides in, using something like the following:
LET data = p.vertices[** FILTER CONTAINS(CURRENT._id, "collectionName") OR ...]
but this is awkward and requires the vertices to be placed in specific collections to facilitate query construction. It also does not solve the problem of how to index the associated edges connecting to the node I want to return.
I apologize if this question is answered elsewhere, and if it is a pointer to the answer is appreciated. I am not sure on the correct terminology to concisely describe the problem and search accordingly.
Thanks!
I was able to get the behavior I needed using a query structured similar to the following:
LET events = (
FOR v, e, p IN 1..3 OUTBOUND 'collection/document_id' GRAPH 'myGraph' OPTIONS {"uniqueEdges": "global"}
FILTER .... // Filter the vertices
LET children = (
FOR v1, e1, p1 IN 1..1 OUTBOUND v._id GRAPH 'myGraph'
FILTER e1.type == "myEventType" OR ... // Filter immediate neighbors I care about
SORT(e1.date) // I have date timestamps on everything
RETURN { child: v1._id, ... /* other child attributes as needed */ }
)
// FILTER .... conditions on children if necessary in context of v
RETURN DISTINCT {data: v, children: children, ... /* other attributes as needed */ }
)
FOR event IN events
SORT(event.date) // I need chronological sorting and have date attribute on every node
RETURN event
The DISTINCT modifier on the RETURN clause appeared to remove duplicates that resulted from multiple paths to the same node, and I was able to add the custom filters I needed based on the attributes of the various child nodes and the parent node.
I am not sure if this is the best or proper approach, but it works for my use case. If there are corrections or optimizations to be made please let me know.
Thanks!
--- Update on Performance
I am currently testing in a graph with approximately 700,000 documents and 2,000,000 edges. The filter conditions are added to the query dynamically based on user selections in a web app, and the performance of the query depends greatly on the filter conditions added. If there are no filters, or very broad filter conditions, the query can take over a second to execute (on our test hardware). If the filter conditions are very restrictive, the query can execute in milliseconds. However, the default, and most common, use case is the slower version of the query.

I am only working with a small selection of data; we expect the number of documents and edges to grow into the tens of millions, so performance as we scale up is very much a concern. I have currently segmented the database into multiple graphs to try to reduce the scope and volume of nodes/edges any individual query can scan, but I have not yet identified other optimizations that would allow the query to scale as the dataset scales. We are currently working on improving our data-import infrastructure to scale the dataset, but have not yet completed that effort, so I don't yet have any numbers on performance for a database more representative of our expected configuration.
