Azure log ingestion TimeGenerated problem

I have an Azure Log Analytics workspace and inside it I created a custom table to ingest some of my logs.
I used these two guides for it (mainly the first one):
https://learn.microsoft.com/en-us/azure/azure-monitor/logs/tutorial-logs-ingestion-portal
https://learn.microsoft.com/en-us/azure/azure-monitor/logs/tutorial-logs-ingestion-api
In my logs I have a field:
"Time": "2023-02-07 11:15:23.926060"
Using DCR, I create a field TimeGenerated like this:
source
| extend TimeGenerated = todatetime(Time)
| project-away Time
Everything works fine, I manage to ingest my data and query it with KQL.
The problem is that I can't ingest data with an older timestamp. If the timestamp is the current time or close to it, everything works fine; but if the timestamp is, say, from two days ago, it gets overwritten with the current time.
Example of the log I send:
{
  "Time": "2023-02-05 11:15:23.926060",
  "Source": "VM03",
  "Status": 1
}
The log I receive:
{
  "TimeGenerated": "2023-02-07 19:35:23.926060",
  "Source": "VM03",
  "Status": 1
}
Can you tell me why this is happening, why I can't ingest logs from several days ago, and how to fix it? The guides I used don't mention anything of the sort, regrettably.

I've hit this limit once before, a long time ago. I asked a question and got a response from someone working on Application Insights: only data not older than 48 hours is ingested.
Nowadays, AFAIK, the same applies to Log Analytics. I'm not sure the exact 48-hour limit still stands, but it's fair to assume some limit is still enforced and there is no way around it.
Back then I took my loss and worked with recent data only.
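If you need the old events at all, one workaround is to preserve the source timestamp in its own column in the DCR transformation, so the real event time stays queryable even when TimeGenerated is stamped with the ingestion time. A minimal sketch, assuming you add an extra datetime column (here called OriginalTime, an illustrative name) to the custom table's schema:
source
| extend OriginalTime = todatetime(Time)   // keeps the real event time
| extend TimeGenerated = todatetime(Time)  // may be rewritten to ingestion time for old events
| project-away Time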

Related

How can I run a search job periodically in Azure Log Analytics?

I'm trying to visualize the browser statistics of our app hosted in Azure.
For that I'm using the nginx logs and running an Azure Log Analytics query like this:
ContainerLog
| where LogEntrySource == "stdout" and LogEntry has "nginx"
| extend logEntry=parse_json(LogEntry)
| extend userAgent=parse_user_agent(logEntry.nginx.http_user_agent, "browser")
| extend browser=parse_json(userAgent)
| summarize count=count() by tostring(browser.Browser.Family)
| sort by ['count']
| render piechart with (legend=hidden)
Then I get this diagram, which is exactly what I want.
But the query is very, very slow. If I set the time range to more than just the last few hours, it takes several minutes or doesn't work at all.
My solution is to use a search job like this:
ContainerLog
| where LogEntrySource == "stdout" and LogEntry has "nginx"
| extend d=parse_json(LogEntry)
| extend user_agent=parse_user_agent(d.nginx.http_user_agent, "browser")
| extend browser=parse_json(user_agent)
It creates a new table BrowserStats_SRCH on which I can do this search query:
BrowserStats_SRCH
| summarize count=count() by tostring(browser.Browser.Family)
| sort by ['count']
| render piechart with (legend=hidden)
This is much faster now and only takes some seconds.
But my problem is: how can I keep this up to date? Preferably this search job would run once a day automatically and refresh the BrowserStats_SRCH table, so that new queries on that table always run on the most recent logs. Is this possible? Right now I can't even trigger the search job manually again, because then I get the error "A destination table with this name already exists".
In the end I would like to have a deeplink to the pie chart with the browser stats without the need to do any further click. Any help would be appreciated.
But my problem is, how can I keep this up-to-date? Preferably this search job would run once a day automatically and refresh the BrowserStats_SRCH table so that new queries on that table always run on the most recent logs. Is this possible?
You can use the API to create a search job, then use a timer-triggered Azure Function or Logic App to call that API on a schedule:
PUT https://management.azure.com/subscriptions/00000000-0000-0000-0000-00000000000/resourcegroups/testRG/providers/Microsoft.OperationalInsights/workspaces/testWS/tables/Syslog_suspected_SRCH?api-version=2021-12-01-preview
with a request body containing the query
{
  "properties": {
    "searchResults": {
      "query": "Syslog | where * has 'suspected.exe'",
      "limit": 1000,
      "startSearchTime": "2020-01-01T00:00:00Z",
      "endSearchTime": "2020-01-31T00:00:00Z"
    }
  }
}
Or you can use the Azure CLI:
az monitor log-analytics workspace table search-job create --subscription ContosoSID --resource-group ContosoRG --workspace-name ContosoWorkspace --name HeartbeatByIp_SRCH --search-query 'Heartbeat | where ComputerIP has "00.000.00.000"' --limit 1500 --start-search-time "2022-01-01T00:00:00.000Z" --end-search-time "2022-01-08T00:00:00.000Z" --no-wait
Right now I can't even trigger the search job manually again, because then I get the error "A destination table with this name already exists".
Before you start the job as described above, remove the old result table using an API call:
DELETE https://management.azure.com/subscriptions/{subscriptionId}/resourcegroups/{resourceGroupName}/providers/Microsoft.OperationalInsights/workspaces/{workspaceName}/tables/{tableName}?api-version=2021-12-01-preview
Optionally, you can check the status of the job with this API before you delete it, to make sure it is not InProgress or Deleting.
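For completeness, here is a minimal sketch of the timer-triggered approach in Python (an Azure Functions timer trigger using a managed identity with rights on the workspace; the subscription, resource group, and workspace names are placeholders, and the query is taken from the question above):
# Minimal sketch: refresh the search-job table on a schedule.
import time
from datetime import datetime, timedelta, timezone

import requests
import azure.functions as func
from azure.identity import DefaultAzureCredential

TABLE_URL = (
    "https://management.azure.com/subscriptions/<subscription-id>"
    "/resourcegroups/<resource-group>/providers/Microsoft.OperationalInsights"
    "/workspaces/<workspace>/tables/BrowserStats_SRCH"
    "?api-version=2021-12-01-preview"
)

def main(mytimer: func.TimerRequest) -> None:
    token = DefaultAzureCredential().get_token(
        "https://management.azure.com/.default").token
    headers = {"Authorization": f"Bearer {token}"}

    # Remove yesterday's result table first (a 404 on the first run is fine).
    requests.delete(TABLE_URL, headers=headers)
    time.sleep(60)  # crude wait; ideally poll the table's provisioningState

    # Recreate the search job over the last day of logs.
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=1)
    body = {
        "properties": {
            "searchResults": {
                "query": "ContainerLog | where LogEntrySource == 'stdout' and LogEntry has 'nginx'",
                "limit": 100000,
                "startSearchTime": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
                "endSearchTime": end.strftime("%Y-%m-%dT%H:%M:%SZ"),
            }
        }
    }
    requests.put(TABLE_URL, headers=headers, json=body)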

Azure Logs does not have all data

I have an Azure Function, and I can see all of its calls in the invocation list. But when I go to "Logs" and try the following query:
traces
| project
timestamp,
message,
operation_Name,
operation_Id,
cloud_RoleName
| where cloud_RoleName =~ 'FunctionDeviceManager' and operation_Name =~ 'FunctionAlertServiceCallback'
| order by timestamp desc
| take 2000
I see the following result:
As we can see, many calls (for example, those with IDs 95ecc6d554d78fa34534813efb82abba and 29b613056e582666c132de6ff73b2c2e, among many others; in fact most of them) are not displayed in the result.
What is wrong?
The invocation log is not based on data in the traces collection; instead, it is based on request data. You can easily see this by choosing "Run query in Application Insights", which runs this query:
requests
| project
timestamp,
id,
operation_Name,
success,
resultCode,
duration,
operation_Id,
cloud_RoleName,
invocationId=customDimensions['InvocationId']
| where timestamp > ago(30d)
| where cloud_RoleName =~ 'xxx' and operation_Name =~ 'yyy'
| order by timestamp desc
| take 20
So that explains the difference in the result.
Now, regarding why the traces collection doesn't always contain data related to the request: by default, all types of telemetry are subject to sampling unless excluded in the host.json file; see the docs.
For example, when you create a new HTTP-triggered function using Visual Studio 2022, the following host.json is added:
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  }
}
As you can see, request telemetry is excluded from the types of telemetry being sampled. This can cause the issue you are experiencing: the requests collection is not sampled, but the traces collection is, hence some data is missing from the results of your query.
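If you want the traces to line up with the requests, one option is to exclude Trace from sampling as well; excludedTypes takes a semicolon-separated list. A sketch (be aware this increases telemetry volume and cost):
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request;Trace"
      }
    }
  }
}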
Most likely this is the effect of sampling. Unless you have tweaked your Function App config in host.json, some executions are skipped in the logs. As per the MS documentation:
Application Insights has a sampling feature that can protect you from producing too much telemetry data on completed executions at times of peak load. When the rate of incoming executions exceeds a specified threshold, Application Insights starts to randomly ignore some of the incoming executions. The default setting for maximum number of executions per second is 20 (five in version 1.x). You can configure sampling in host.json. Here's an example:
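The example in the documentation looks roughly like this (reconstructed; check the linked page for the current version — maxTelemetryItemsPerSecond is the 20-per-second default quoted above):
{
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "maxTelemetryItemsPerSecond": 20
      }
    }
  }
}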
See also: https://learn.microsoft.com/en-us/azure/azure-monitor/app/sampling

Transcribing Splunk's "transaction" Command into Azure Log Analytics / Azure Data Analytics / Kusto

We're using AKS and have our container logs writing to Log Analytics. We have an application that emits several print statements in the container log per request, and we'd like to group all of those events/log lines into aggregate events, one event per incoming request, so it's easier for us to find lines of interest. So, for example, if the request started with the line "GET /my/app" and then later the application printed something about an access check, we want to be able to search through all the log lines for that request with something like | where LogEntry contains "GET /my/app" and LogEntry contains "access_check".
I'm used to queries with Splunk. Over there, this type of inquiry would be a cinch to handle with the transaction command.
But with Log Analytics, it seems like multiple commands are needed to pull this off. It seems I need to use extend with row_window_session to give all the related log lines a common timestamp, then summarize with make_list to group the lines of log output together into a JSON blob, and finally parse_json and strcat_array to assemble the lines into a newline-separated string.
Something like this:
ContainerLog
| sort by TimeGenerated asc
| extend RequestStarted= row_window_session(TimeGenerated, 30s, 2s, ContainerID != prev(ContainerID))
| summarize logLines = make_list(LogEntry) by RequestStarted
| extend parsedLogLines = strcat_array(parse_json(logLines), "\n")
| where parsedLogLines contains "GET /my/app" and parsedLogLines contains "access_check"
| project Timestamp=RequestStarted, LogEntry=parsedLogLines
Is there a better/faster/more straightforward way to be able to group multiple lines for the same request together into one event and then perform a search across the contents of that event?
After reading your question: no, there is no equally easy way to do that in Azure Log Analytics. If the logs are in this format, you need to do some extra work along the lines of your query to meet your requirement.
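That said, your query can be trimmed slightly: make_list already returns a dynamic array, so the parse_json step is redundant. A sketch based on the query in the question:
ContainerLog
| sort by TimeGenerated asc
| extend RequestStarted = row_window_session(TimeGenerated, 30s, 2s, ContainerID != prev(ContainerID))
| summarize logLines = make_list(LogEntry) by RequestStarted
| extend parsedLogLines = strcat_array(logLines, "\n")
| where parsedLogLines contains "GET /my/app" and parsedLogLines contains "access_check"
| project Timestamp = RequestStarted, LogEntry = parsedLogLines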

PowerBI MicrosoftAzureConsumptionInsights - changing query to get more than 2 months

I am using Power BI to analyze cost data from Azure. I am making a direct connection and pulling in data by opening Power BI | Get Data | Online Services | Microsoft Azure Consumption Insights (Beta). This works; however, I am only able to see two months of data, and ideally I'd like to see six. After a lot of searching, the general consensus from other users seems to be to use the Advanced Editor to tweak the query by adding "optionalParameters" and specifying the number of months. I came across a few other sites where users were experiencing the same issue, but the suggestions didn't work. I am hoping someone here can point me in the right direction.
I'm going to post the query string and below that list out the URLs containing suggestions I've already tried.
let
    enrollmentNumber = "xxxxxxx",
    optionalParameters = [ numberOfMonth = 6, dataType = "DetailCharges" ],
    Source = MicrosoftAzureConsumptionInsights.Tables(enrollmentNumber, optionalParameters),
    usagedetails = Source{[Key="usagedetails"]}[Data],
    #"Parsed JSON" = Table.TransformColumns(usagedetails, {{"Tags", Json.Document}}),
    #"Expanded Tags" = Table.ExpandRecordColumn(#"Parsed JSON", "Tags", {"environment", "application", "costCenter", "owner"}, {"Tags.environment", "Tags.application", "Tags.costCenter", "Tags.owner"})
in
    #"Expanded Tags"
https://community.powerbi.com/t5/Desktop/Azure-consumption-insights-get-more-than-two-month-usage-details/td-p/541413
https://community.powerbi.com/t5/Desktop/Power-BI-desktop-and-getting-multiple-months-in-one-row-from-the/td-p/50585
https://community.powerbi.com/t5/Desktop/Extend-the-Azure-consupmtion-data/td-p/444508
https://learn.microsoft.com/en-us/power-bi/desktop-connect-azure-consumption-insights
I opened a support case and now have a solution to this problem.
First: the version of Power BI had to be upgraded to the latest version. The version I upgraded to is 2.75.5649.961 64-bit (November 2019).
Second: Microsoft Azure Consumption Insights is being phased out and replaced by Azure Cost Management; getting more than two months of data won't work with Consumption Insights.
I was able to increase the number of months by:
Get Data | Azure | Azure Cost Management
Fill in the required information, specifying the number of months.
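After connecting, the Advanced Editor shows a query along these lines (a sketch from memory: AzureCostManagement.Tables takes the scope, its value, and the number of months; the exact arguments may differ by connector version, so check what the connector generates for you):
let
    Source = AzureCostManagement.Tables("Enrollment Number", "xxxxxxx", 6, []),
    usagedetails = Source{[Key="usagedetails"]}[Data]
in
    usagedetails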

Get oldest records from Application Insights REST API using ODATA

We want to get the oldest available events in Application Insights, but we always get the latest events even if we order by timestamp; only the returned result set is ordered.
https://api.applicationinsights.io/v1/apps/[id]/events/customEvents?$count=true&$filter=timestamp gt 2000-01-01T00:00:00&$select=timestamp, user/id, customEvent/name&$orderBy=timestamp asc&$top=100
Please use this kind of query instead; it works on my side:
https://api.applicationinsights.io/v1/apps/your_app_id/query?query=customEvents | where timestamp >= datetime('2018-12-11T00:00:00.000')| project timestamp ,name| order by timestamp asc| take 20
And the test result with Postman confirms it.
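If you need the same call outside Postman, here is a small sketch with Python's requests (the x-api-key header carries an API key created under "API Access" on the Application Insights resource; the app id and key below are placeholders):
import requests

APP_ID = "your_app_id"
API_KEY = "your_api_key"

# KQL from the answer above, sent through the query endpoint.
query = (
    "customEvents "
    "| where timestamp >= datetime('2018-12-11T00:00:00.000') "
    "| project timestamp, name "
    "| order by timestamp asc "
    "| take 20"
)

resp = requests.get(
    f"https://api.applicationinsights.io/v1/apps/{APP_ID}/query",
    headers={"x-api-key": API_KEY},
    params={"query": query},
)
print(resp.json())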
