On starting my Azure Stream Analytics (ASA) job I get several false positives (FP) and I want to know what causes this.
I am trying to implement asset tracking in ASA as discussed in another question. My specific use case is that I want to trigger events when an asset has not sent a signal in the last 70 minutes. This works fine while the ASA job is running, but it triggers false positives when the job starts.
For example, when starting the ASA job at 2017-11-07T09:30:00Z, the job gives an entry with MostRecentSignalInWindow: 1510042968 (= 2017-11-07T08:22:48Z) for name 'A', while I am sure that there is another event for name 'A' at '2017-11-07T08:52:49Z' and one at '2017-11-07T09:22:49Z' in the event hub.
Some events arrive late, and the event ordering policy is configured as:
Late: 5 seconds
Out-of-order: 5 seconds
Action: adjust
I use the below query:
WITH
Missing AS (
SELECT
PreviousSignal.name,
PreviousSignal.time
FROM
[signal-eventhub] PreviousSignal
TIMESTAMP BY
time
LEFT OUTER JOIN
[signal-eventhub] CurrentSignal
TIMESTAMP BY
time
ON
PreviousSignal.name = CurrentSignal.name
AND
DATEDIFF(second, PreviousSignal, CurrentSignal) BETWEEN 1 AND 4200
WHERE CurrentSignal.name IS NULL
),
EventsInWindow AS (
SELECT
name,
max(DATEDIFF(second, '1970-01-01 00:00:00Z', time)) MostRecentSignalInWindow
FROM
Missing
GROUP BY
name,
TumblingWindow(minute, 1)
)
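As posted, the statement ends with the CTEs; for completeness, a minimal final SELECT (the output alias below is hypothetical, not part of the original job) would look something like:
SELECT
name,
MostRecentSignalInWindow
INTO
[missing-signal-output] -- hypothetical output alias
FROM
EventsInWindow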
For anyone reading this, this was a confirmed bug in Azure Stream Analytics and has now been resolved.
Related
I currently have an alert set up for Data Factory that sends an email alert if a pipeline runs longer than 120 minutes, following this tutorial: https://www.techtalkcorner.com/long-running-azure-data-factory-pipelines/. So when a pipeline does in fact run longer than the expected time, I do receive an alert; however, I am also getting additional, unexpected alerts.
My query looks like:
ADFPipelineRun
| where Status =="InProgress" // Pipeline is in progress
| where RunId !in (( ADFPipelineRun | where Status in ("Succeeded","Failed","Cancelled") | project RunId ) ) // Subquery, pipeline hasn't finished
| where datetime_diff('minute', now(), Start) > 120 // It has been running for more than 120 minutes
I received an alert email on September 28th saying, as expected, that a pipeline was running longer than 120 minutes, but when I tried to find the pipeline in the Azure Data Factory pipeline runs, nothing showed up. In the alert email there is a button that says "View the alert in Azure monitor", and from there I can press "View Query Results" above the shown query. Here I can re-enter the query above, filter the date to show all pipelines running longer than 120 minutes since September 27th, and it returns 3 pipelines.
Something I noticed about these pipelines is the End time column.
I'm thinking that at some point the UTC time is not properly configured, and maybe that is why the alert is triggered? Is there something I am doing wrong, or a better way to do this to avoid a bunch of false alarms?
To create preemptive warnings for long-running jobs:
Create the activity.
Click on the blank canvas space.
Follow the path: Settings > Elapsed time metric.
Refer to Operationalize Data Pipelines - Azure Data Factory.
I'm not sure if you're seeing false alerts. What you've shown here looks like the correct behavior.
You need to keep in mind:
The duration threshold should be offset by the time it takes for the logs to appear in Azure Monitor.
The email alert takes you to the query that triggered the event. Your query is only showing "InProgress" statuses, so the End property is not set/updated. You'll need to extend your query to look at one of the other statuses to see the actual duration (see the sketch after the example below).
Run another query with the RunId of the suspect runs to inspect the durations. For example:
ADFPipelineRun
| where RunId == 'bf461c8b-0b1e-43c4-9cdf-7d9f7ccc6f06'
| distinct TimeGenerated, OperationName, RunId, Start, End, Status
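To see the actual durations across runs rather than one RunId at a time, a hedged sketch (assuming the standard ADFPipelineRun columns, including PipelineName) could filter on the terminal statuses, where End is populated:
ADFPipelineRun
| where Status in ("Succeeded", "Failed", "Cancelled") // terminal rows carry the final End time
| extend DurationMinutes = datetime_diff('minute', End, Start)
| where DurationMinutes > 120
| project RunId, PipelineName, Start, End, DurationMinutes
Runs whose DurationMinutes exceeds the threshold here are the ones worth cross-checking against the alerts you received.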
I have a question about Azure Log Analytics alerts, in that I don't quite understand how the time frame works within the context of setting up an alert based on an aggregated value.
I have the code below:
Event
| where Source == "EventLog" and EventID == 6008
| project TimeGenerated, Computer
| summarize AggregatedValue = count(TimeGenerated) by Computer, bin_at(TimeGenerated, 24h, datetime(now()))
For the time window: 24/03/2019, 09:46:29 - 25/03/2019, 09:46:29
In the above, the alert configuration interface insists on adding the bin_at(TimeGenerated, 24h, datetime(now())), so I add the function, passing the arguments for a 24-hour time period. If I am already adding this, then what is the point of the time frame?
Basically the result I am looking for is capturing this event over a 24 hour period and alerting when the event count is over 2. I don't understand why a time window is also necessary on top of this because I just want to run the code every five minutes and alert if it detects more than two instances of this event.
Can anyone help with this?
AFAIK you may use a query something like the one shown below to accomplish your requirement of capturing the required event over a 24-hour period.
Event
| where Source == "EventLog" and EventID == 6008
| where TimeGenerated > ago(24h)
| summarize AggregatedValue= any(EventID) by Computer, bin(TimeGenerated, 1s)
The '1s' in this sample query is the time frame with which we are aggregating and getting the output from the Log Analytics workspace repository. For more information, refer to https://learn.microsoft.com/en-us/azure/kusto/query/summarizeoperator
And to create an alert, you may have to go to the Azure portal -> YOURLOGANALYTICSWORKSPACE -> Monitoring tile -> Alerts -> Manage alert rules -> New alert rule -> Add condition -> Custom log search -> paste any of the above queries under the 'Search query' section -> type '2' under the 'Threshold value' parameter of the 'Alert logic' section -> click 'Done' -> under the 'Action Groups' section, select an existing action group or create a new one as explained in the article below -> update 'Alert Details' -> click 'Create alert rule'.
https://learn.microsoft.com/en-us/azure/azure-monitor/platform/action-groups
Hope this helps!! Cheers!! :)
To answer your question in the comments: yes, the alert insists on adding the bin function, and that's why I provided the query with the bin function set to '1s' and tried to explain it in my previous answer.
If you put '1s' in the bin function, the output from Log Analytics aggregates any EventID over one-second bins, so you get one row per computer for each second in which the event occurred (for example, a row for computer aaaaaaa at time x with AggregatedValue 6008).
If you put '24h' instead of '1s' in the bin function, the output aggregates any EventID over 24-hour bins, so you get at most one row per computer per 24 hours (again, a row for computer aaaaaaa at time x).
So in this case we should not use '24h' in the bin function together with the 'any' aggregation, because that shows only one occurrence of output in a 24-hour timespan and doesn't help you find the event occurrence count. Instead, you may use the 'count' aggregation instead of 'any' if you want to have '24h' in the bin function. That query would look something like this:
Event
| where Source == "EventLog" and EventID == 6008
| where TimeGenerated > ago(24h)
| summarize AggregatedValue= count(EventID) by Computer, bin(TimeGenerated, 24h)
The output of this query would have one row per computer per 24-hour bin, with AggregatedValue holding the event count (for example, rows for computer aaaaaaa at time x with counts y and z).
One other note: all the above queries and outputs are in the context of setting up an alert based on an aggregated value, i.e., choosing 'Metric measurement' under the 'Alert logic based on' section. In other words, an AggregatedValue column is expected in the alert query when you choose 'Metric measurement'. But when you say 'you get a count of the events', then, if I am not wrong, maybe you are choosing 'Number of results' under 'Alert logic based on', which does not require any aggregation column in the query.
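If 'Number of results' is what you are using, a hedged sketch of the search query (no aggregation column needed, per the note above) would simply be the filter itself:
Event
| where Source == "EventLog" and EventID == 6008
| where TimeGenerated > ago(24h)
with the alert rule configured to fire when the number of results is greater than 2, evaluated every 5 minutes as you describe.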
Hope this clarifies!! Cheers!!
Part of my project uses Esper in Java for complex event processing. I'm planning to replace Esper with Azure Stream Analytics.
Use Case: FTOD (First Ticket of the Day) & FTOP (First Ticket of Project)
I'm continuously getting ticket data from Event Hub and want to generate two types of alerts (FTOD & FTOP). I think a TumblingWindow is the best fit for this scenario.
But I'm not able to pick the first record in the window. Any suggestion on how to pick the first record in a 24-hour window?
Below is Esper query for FTOD
String statementQuery = "context context_" + plantIdStr
+ " select distinct * from TicketInfoComplete as ticket where plantId = '"
+ entry.getKey() + "' and ruleType='FTOD' output first every 24 hours";
Below is my incoming message data
[{"DeviceSerialNumber":"190203XXX001TEST","MessageTimestamp":"2019-02-11T13:46:08.0000000Z","PlantId":"141","ProjectId":"Mobitest","ProjectName":"Mobitest","TicketNumber":"84855","TicketDateTimeinUTC":"2019-02-11T13:46:08.0000000Z","AdditionalInfo":{"value123":"value2"},"Timeout":60000,"Traffic":1,"Make":"Z99","TruckMake":"Z99","PlantName":"RMZ","Status":"Valid","PlantMakeSerialNumber":"Z99|190203XXX001TEST","ErrorMessageJsonString":"[]","Timezone":"India Standard Time"}]
Based on your description, I think you should look at the LAST operator with the GROUP BY condition. LAST allows one to look up the most recent event in an event stream within defined constraints.
In Stream Analytics, the scope of LAST (that is, how far back in history from the current event it needs to look) is always limited to a finite time interval, using the LIMIT DURATION clause. LAST can optionally be limited to only consider events that match the current event on a certain property or condition using the PARTITION BY and WHEN clauses. LAST is not affected by predicates in WHERE clause, join conditions in JOIN clause, or grouping expressions in GROUP BY clause of the current query.
Please see the example in the above documentation:
SELECT
LAST(TicketNumber) OVER (LIMIT DURATION(hour, 24))
FROM input
To summarize, the IsFirst method needs to be considered when you want to get the first item.
These are the exact queries I used with the IsFirst method for the FTOD & FTOP alerts:
SELECT
DeviceSerialNumber,MessageTimestamp,PlantId,TruckId,ProjectId,ProjectName,
CustomerId,CustomerName,TicketNumber,TicketDateTimeinUTC,TruckSerialNumber,
TruckMake,PlantName,PlantMakeSerialNumber,Timezone,'FTOD' as alertType
INTO
[alertOutput]
FROM
[ticketInput]
where ISFIRST(mi, 2)=1
SELECT
DeviceSerialNumber,MessageTimestamp,PlantId,TruckId,ProjectId,ProjectName,
CustomerId,CustomerName,TicketNumber,TicketDateTimeinUTC,TruckSerialNumber,
TruckMake,PlantName,PlantMakeSerialNumber,Timezone,'FTOP' as alertType
INTO
[ftopOutput]
FROM
[ticketInput]
where ISFIRST(mi, 2) OVER (PARTITION BY PlantId) = 1
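For the 24-hour window the original question asks about, a hedged variant (column list shortened here for brevity; drop the PARTITION BY if you want a single global first ticket per day) would simply widen the ISFIRST window:
SELECT
DeviceSerialNumber, PlantId, ProjectId, TicketNumber, TicketDateTimeinUTC, 'FTOD' as alertType
INTO
[alertOutput]
FROM
[ticketInput]
where ISFIRST(hour, 24) OVER (PARTITION BY PlantId) = 1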
I've written a Streaming Analytics query to emit 2 date-time values: one from my stream and the other the 'ingest' date-time into Azure IOT / StreamingAnalytics. My stream's value is in UTC, but I'm finding that the 'ingest' date-time is offset from 1/1/1970, rather than Utc.Now.
This is my Streaming Analytics query:
SELECT
deviceId
,System.Timestamp as IngestTimeUTC
,date as GenerateTimeUTC
INTO
[YourOutputAlias]
FROM
MyDevice
Sample output:
DEVICEID ... INGESTTIME ... GENERATEDTIMEUTC
"myFirstDevice" ... "1970-01-01T12:01:01.0010000Z"..."2016-11-18T15:25:54.5660000Z"
How can I normalize the ingest time to UTC for 'today'?
It looks like my above query does work as desired. I neglected to mention that I had been observing the output via the 'Test' option within the Azure Streaming Analytics portal. When I saved everything and actually ran the job ... I get the IngestTimeUTC data normalized in the proper way -- to UTC for 'today' as desired.
So ... the 'test' mechanism does have this inherent behavior with regard to System.Timestamp.
I'm trying to use ETW for logging with several custom EventSource classes in Azure SDK 2.6.
When testing locally with the compute/storage emulator, three of my custom WADMyEventXYZ tables show up; however, the final expected table "WADMyDataSets" never seems to be created. How should I determine what is causing this problem? I see no errors from the compute emulator when the debugger is attached and stepping through the code in the debugger shows that WriteEntry on the EventSource is definitely called. The other tables show up in SchemasTable in the developer storage account, but there is no entry there for WADMyDataSets.
I exported WADDiagnosticInfrastructureLogsTable to CSV, examined it in Excel, and I see the following messages that reference "MyDataSets":
Validating table MyDataSets; DiskMB:451; RequiredQuota:451 RetentionSeconds:7776000 Pri:2 MinQuotaMB:0 RunningTotal:3757
Table does not exist
table C:\Users\Caleb\AppData\Local\dftmp\Resources\b316f531-f673-4db3-ac1c-e4649e289871\WAD0104\Tables\MyDataSets does not exist, CreationDisposition = 4
Table MyDataSets does not exist, will create a new one
Delaying the creation of table MyDataSets until the schema is known
Later on:
Converted EventSource provider name "MyDataSets" to {74a2b9c9-0bd8-547f-6cad-453da47055be}
Matched task with query id MyDataSetsQuery and regex ^MyDataSets$ to source table MyDataSets
Registering query MyDataSetsQuery_MyDataSets_XTableWadAccount:
Adding standard PkRk (MA) fields to 'MyDataSetsQuery_MyDataSets'
Successfully compiled the query 'MyDataSetsQuery_MyDataSets'
Added task MyDataSetsQuery_MyDataSets_WADMyDataSets_PT1M_XTableWadAccount from MyDataSets - Partitions:-1 Pri:normal TSPolicy:start StoreType:Central Repeat:2147483647 Timeout:3600s Deadline:300s DelayRange:0.00
Later on:
No checkpoint found for task MyDataSetsQuery_MyDataSets_WADMyDataSets_PT1M_XTableWadAccount after time 2015-05-13T00:44:21.000Z; retry time out is 3600 seconds
First scheduled task for MyDataSetsQuery_MyDataSets_WADMyDataSets_PT1M_XTableWadAccount is at 2015-05-13T01:44:00.000Z (plus a delay of 20s)
Later on:
Increasing query delay of task MyDataSetsQuery_MyDataSets_WADMyDataSets_PT1M_XTableWadAccount from 20 to 40 seconds to introduce randomness to the upload schedule
Later on:
Starting scheduled task MyDataSetsQuery_MyDataSets_WADMyDataSets_PT1M_XTableWadAccount from 2015-05-13T01:43:00.000Z to 2015-05-13T01:44:00.000Z; query delay 40 seconds
Table C:\Users\Caleb\AppData\Local\dftmp\Resources\b316f531-f673-4db3-ac1c-e4649e289871\WAD0104\Tables\MyDataSets does not exist
Ending scheduled task MyDataSetsQuery_MyDataSets_WADMyDataSets_PT1M_XTableWadAccount from 2015-05-13T01:43:00.000Z to 2015-05-13T01:44:00.000Z in 1ms
Update
The EventSource in question had one event on it:
[Event(1)]
public void DataSetLoaded(string traceActivityId, string userId, string reportCode, long timeToLoadMs)
Removing the fourth parameter "timeToLoadMs" resulted in the WAD event table showing up as expected. I tried changing the last parameter to a string, and the table again failed to show up. Is there a documented limit on the number of parameters for an event method? I'm pretty sure I've seen samples that have four parameters.
I upgraded my web project to .NET 4.5.1 and now the WAD table shows up as expected (I had been running on just .NET 4.5 before this).
It would seem that there might be a bug with having 4 parameters on an EventSource event when using .NET 4.5.0.
As a side note, with 4.5.1, I now have the System.Diagnostics.Tracing.EventSource.SetCurrentThreadActivityId method which will let me get rid of manually including the CorrelationManager.ActivityId in my event output.
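For reference, a minimal C# sketch of that call (the EventSource name, class names, and payload fields here are hypothetical, not from the original project):
using System;
using System.Diagnostics.Tracing;

[EventSource(Name = "MyCompany-Sample")] // hypothetical EventSource name
sealed class SampleEventSource : EventSource
{
    public static readonly SampleEventSource Log = new SampleEventSource();

    [Event(1)]
    public void DataSetLoaded(string userId, string reportCode, long timeToLoadMs)
    {
        WriteEvent(1, userId, reportCode, timeToLoadMs);
    }
}

class Program
{
    static void Main()
    {
        // Available from .NET 4.5.1: subsequent EventSource writes on this thread carry this activity id,
        // so the correlation id no longer has to be passed as an explicit payload field.
        EventSource.SetCurrentThreadActivityId(Guid.NewGuid());
        SampleEventSource.Log.DataSetLoaded("user-42", "RPT01", 123);
    }
}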
The video at https://channel9.msdn.com/Series/ConnectOn-Demand/240, released today, says there is full support for Azure table logging for ETW EventSources.