Azure Stream Analytics input corrupts strings containing timezone info

I am using Azure Event Hub to collect time-based events and have connected Azure Stream Analytics (ASA) to it.
This results in the timezone info being lost at the ASA level.
What I ascertained is the following:
I have sent data in JSON format containing a string with a timestamp compatible with ISO 8601. e.g.:
"event_timestamp": "2016-09-02T19:51:38.657+02:00"
I checked by means of ServiceBus Explorer (thanks to the guys who wrote this tool) that this string arrived exactly as-is in Event Hub.
In Stream Analytics I added the Event Hub as an input. When I use the SAMPLE DATA option in the Azure portal, the result contains: "event_timestamp":"2016-09-02T17:51:38.6570000"
Why is Stream Analytics removing the timezone info?
According to ISO 8601, a timestamp without timezone information is interpreted as local time. Does that mean the timezone where the Azure resource is running? How can I use geo-replication in that case?
Does this mean that after consuming the data and presenting it in a dashboard, all times are relative to the server where the Stream Analytics job runs?
Do I need to add the timezone information separately in the JSON payload and reconstruct it afterwards?
My conclusion is that ASA actually removes/destroys information in my data stream.
Imagine this ASA query: SELECT * INTO [myoutput] FROM [myinput]
This would change the content (*) of my data: every string that appears to be a datetime with timezone info will be converted.
In my opinion this is very undesirable behaviour.
I am very interested in the opinions of others in this forum.

Everything in Azure runs in the UTC timezone, unless a service supports setting a timezone and it is explicitly configured (not many services do).
If you look closely at your quoted samples, you will notice that the timestamp is converted to UTC in ASA; that's why the timezone info is missing:
Sent to event hub: "event_timestamp": "2016-09-02T19:51:38.657+02:00"
Received in ASA: "event_timestamp":"2016-09-02T17:51:38.6570000"
Note that your event is sent at 19:51:38.657 +02:00 and ASA reads 17:51:38.6570000, which is exactly the same instant expressed in UTC.
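A quick way to see the same normalization in a query (a sketch only; the literal value is just for illustration and [myinput] stands for any input):
SELECT
    CAST('2016-09-02T19:51:38.657+02:00' AS datetime) AS normalizedUtc   -- yields 2016-09-02T17:51:38.657 in UTC
FROM [myinput]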
UPDATE
I am no expert on the ISO standard, but here are some excerpts from the ASA documentation:
Azure Stream Analytics data types:
datetime: Defines a date that is combined with a time of day with fractional seconds, based on a 24-hour clock and relative to UTC (time zone offset 0).
Conversions:
datetime: a string is converted to datetime following the ISO 8601 standard.
It is documented that datetime values are in UTC, hence there is no need to specify it explicitly. Whether this conforms to the ISO standard I cannot tell, first because Wikipedia is not an ISO document, and second because I am not an ISO expert.
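If the original offset matters downstream, one possible workaround (a sketch only; the field names event_timestamp and tz_offset_minutes are assumptions about the payload, not anything ASA provides) is to send the offset as a separate numeric field and re-attach it in the query:
SELECT
    event_timestamp,                                              -- parsed by ASA as datetime, normalized to UTC
    tz_offset_minutes,                                            -- the original offset, carried as a plain number
    DATEADD(minute, tz_offset_minutes, event_timestamp) AS event_local_time
INTO [myoutput]
FROM [myinput]
The UTC value stays intact for windowing, while event_local_time preserves the original wall-clock reading.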

Related

Get difference between two dates in seconds using Time Series Expression Syntax in Azure Time Series Insights Explorer

I have an Event Hub that sends data to Time Series Insights, with the following message format:
{
"deviceId" : "Device1",
"time" : "2022-03-30T21:27:29Z"
}
I want to calculate the difference in seconds between the Event Hub EnqueuedTimeUtc property and time property.
I created a Time Series Insights environment with an Event Source without specifying the Timestamp property name; that way, the Time Series Insights Timestamp ($ts) property will be the EnqueuedTimeUtc property of the event.
Now with those two properties, using TSX (Time Series Expression Language), I want to do something like this:
$event.$ts - $event.time.DateTime
The problem I'm facing is that this operation returns a DateTime, but the Time Series Expression language has no function to convert a DateTime to seconds or to a Unix timestamp (Time Series Expression docs).
Is there a way of achieving this using Time Series Insights and TSX (Time Series Expression)?
Thanks!
TSI is a deprecated service in Azure and offers few features (built-in functions) for exploring data. Therefore, I suggest you use Azure Data Explorer to work with the Event Hub data.
Azure Data Explorer provides a built-in datetime_diff() function which calculates the difference between two datetime values in any of the supported periods, using simple Kusto Query Language.
datetime_diff(): Calculates calendarian difference between two datetime values.
Syntax:
datetime_diff(period,datetime_1,datetime_2)
Example:
second = datetime_diff('second',datetime(2017-10-30 23:00:10.100),datetime(2017-10-30 23:00:00.900))
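Applied to the scenario above, a minimal KQL sketch could look like this (the table name DeviceEvents and the column enqueuedTimeUtc, mapped from the EnqueuedTimeUtc system property during ingestion, are assumptions, not part of the original answer):
// difference in seconds between the enqueued time and the event's own "time" field
DeviceEvents
| extend delaySeconds = datetime_diff('second', enqueuedTimeUtc, todatetime(['time']))
| project deviceId, ['time'], enqueuedTimeUtc, delaySeconds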

Data being overwritten when outputting data from Stream Analytics to Power BI

Lately I've been playing around with Stream Analytics queries with Power BI as the output sink. I made a simple query which retrieves the total count of HTTP response codes of our website requests over time and groups them by date and response code.
The input data is retrieved from a storage account holding blob storage. This is my query:
SELECT
DATETIMEFROMPARTS(DATEPART(year,R.context.data.eventTime), DATEPART(month,R.context.data.eventTime),DATEPART(day,R.context.data.eventTime),0,0,0,0) as datum,
request.ArrayValue.responseCode,
count(request.ArrayValue.responseCode)
INTO
[requests-httpresponsecode]
FROM
[cvweu-internet-pr-sa-requests] R TIMESTAMP BY R.context.data.eventTime
OUTER APPLY GetArrayElements(R.request) as request
GROUP BY DATETIMEFROMPARTS(DATEPART(year,R.context.data.eventTime), DATEPART(month,R.context.data.eventTime),DATEPART(day,R.context.data.eventTime),0,0,0,0), request.ArrayValue.responseCode, System.TimeStamp
Since continuous export became active on 3 September 2018, I chose a job start time of 3 September 2018. Since I am interested in the statistics up until today, I did not include a date interval, so I am expecting to see data from 3 September 2018 until now (20 December 2018). The job is running fine without errors and I chose Power BI as the output sink. Immediately I saw the chart being populated starting from 3 September, grouped by day and counting. So far, so good. A few days later I noticed the output dataset didn't start from 3 September anymore but from 2 December until now. Apparently data is being overwritten.
The following link says:
https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-power-bi-dashboard
"defaultRetentionPolicy: BasicFIFO: Data is FIFO, with a maximum of 200,000 rows."
But my output table does not have anywhere near 200,000 rows:
datum,count,responsecode
2018-12-02 00:00:00,332348,527387
2018-12-03 00:00:00,3178250,3282791
2018-12-04 00:00:00,3170981,4236046
2018-12-05 00:00:00,2943513,3911390
2018-12-06 00:00:00,2966448,3914963
2018-12-07 00:00:00,2825741,3999027
2018-12-08 00:00:00,1621555,3353481
2018-12-09 00:00:00,2278784,3706966
2018-12-10 00:00:00,3160370,3911582
2018-12-11 00:00:00,3806272,3681742
2018-12-12 00:00:00,4402169,3751960
2018-12-13 00:00:00,2924212,3733805
2018-12-14 00:00:00,2815931,3618851
2018-12-15 00:00:00,1954330,3240276
2018-12-16 00:00:00,2327456,3375378
2018-12-17 00:00:00,3321780,3794147
2018-12-18 00:00:00,3229474,4335080
2018-12-19 00:00:00,3329212,4269236
2018-12-20 00:00:00,651642,1195501
EDIT: I created the STREAM input source according to
https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-quick-create-portal. I could create a REFERENCE input as well, but this invalidates my query since APPLY and GROUP BY are not supported, and I also think a STREAM input is what I want according to https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-add-inputs.
What am I missing? Is it my query?
It looks like you are streaming to a Streaming dataset. Streaming datasets don't store the data in a database but keep only the last hour of data. If you want to keep the data pushed to them, you must enable the Historic data analysis option when you create the dataset.
This will create a PushStreaming dataset (a.k.a. Hybrid) with a basicFIFO retention policy (i.e. about 200k-210k records kept).
You're correct that Azure Stream Analytics should be creating a "PushStreaming" or "Hybrid" dataset. Can you confirm that your dataset is correctly configured as "Hybrid" (you can check this attribute even after creation as shown here)?
If it is the correct type, can you please clarify the following:
Does the schema of your data change? If, for example, you send the datum {a: 1, b: 2} and then {c: 3, d: 4}, Azure Stream Analytics will attempt to change the schema of your table, which can invalidate older data.
How are you confirming the number of rows in the dataset?
Looks like my query was the problem. I had to use TUMBLINGWINDOW(day,1) instead of System.TimeStamp.
TUMBLINGWINDOW and System.TimeStamp produce exactly the same chart output on the frontend but seem to be processed differently in the backend. This was not reflected in the frontend in any way, which was confusing. I suspect that, because of how the query is processed in the backend when not using TUMBLINGWINDOW, you hit the 200k rows-per-dataset limit sooner than expected. The query below is the one producing the expected result.
SELECT
request.ArrayValue.responseCode,
count(request.ArrayValue.responseCode),
DATETIMEFROMPARTS(DATEPART(year,R.context.data.eventTime), DATEPART(month,R.context.data.eventTime),DATEPART(day,R.context.data.eventTime),0,0,0,0) as date
INTO
[requests-httpstatuscode]
FROM
[cvweu-internet-pr-sa-requests] R TIMESTAMP BY R.context.data.eventTime
OUTER APPLY GetArrayElements(R.request) as request
GROUP BY DATETIMEFROMPARTS(DATEPART(year,R.context.data.eventTime), DATEPART(month,R.context.data.eventTime),DATEPART(day,R.context.data.eventTime),0,0,0,0),
TUMBLINGWINDOW(day,1),
request.ArrayValue.responseCode
As we speak, my Stream Analytics job is running smoothly and producing the expected output from 3 September until now, without data being overwritten.

Azure Stream Analytics TIMESTAMP BY format

When running stream analytics, I get an error message:
"Dropping events due to improper timestamps. Stream Analytics only
supports ISO8601 format for DateTime values"
I have tried the following formats:
2017-09-19T13:17:29.0111070Z
2017-09-19T13:17:29.123456
2017-09-19 13:17:29.123456
2017-09-19T13:17:29.123
2017-09-19 13:17:29.123
However, when I use the Test button on the query in Stream Analytics, the output comes out fine. Also, when I comment out the TIMESTAMP BY clause, the query works, but System.Timestamp in the SELECT statement does not return the correct time.
Is this a formatting issue or something else?
Firstly, as Vignesh Chandramohan mentioned, you can try using CAST to convert the expression to datetime and check whether it returns a data conversion error indicating that an input value cannot be cast to type 'datetime'.
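As a quick sanity check, a minimal test query along these lines can surface the failing values (the input, output, and column names here are assumptions):
SELECT
    [timestamp] AS rawValue,
    CAST([timestamp] AS datetime) AS parsedValue   -- triggers a data conversion error for any value that is not valid ISO 8601
INTO [output]
FROM [input]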
Secondly, many factors can cause a no-output issue, for example: a WHERE clause in the query filters out the events, preventing output from being generated; the timestamp of the events is before the job start time, so the events are dropped; etc.
For detailed steps to debug Azure Stream Analytics jobs, please check Diagnose and solve problems in the Azure portal or this article: Troubleshooting guide for Azure Stream Analytics.

How does time scan work in Azure Stream Analytics tumbling window and hopping window?

I have an Event Hub in Azure that receives messages containing a timestamp value and other parameters.
The timestamp is aligned in Stream Analytics using the clause
TIMESTAMP BY [TimeStamp]
This is the Stream Analytics query (the input is the Event Hub, the output in this case is a blob)
SELECT
DateAdd(minute, -1, System.Timestamp) as FromTimestamp, System.Timestamp as ToTimestamp,
[MachineType], [MachineNumber], [Part], [PartNumber], [ValueKind], AVG(Value) AS AverageValue
INTO
[blob-avg]
FROM
[input]
TIMESTAMP BY [TimeStamp]
WHERE [ValueKind]='RPM' OR [ValueKind]='CUR' OR [ValueKind]='POW'
GROUP BY [MachineType], [MachineNumber], [Part], [PartNumber], [ValueKind], SlidingWindow(minute, 1)
I think the timestamp of the message will be taken as the timestamp to compare, but how is it scanned?
At UTC time? Say the message carries a timestamp of 12:00 (GMT+2) and UTC now is 10:00.
Does the tumbling window consider the data to have arrived 2 hours ago instead of now, at timestamp 10:00 (GMT+2)? (It actually seems to me that something like that happens.)
And what happens if a message arrives with a delay greater than 2 hours? Say a message arrives with one day of delay; will the tumbling window be recalculated?
The [TimeStamp] column will be converted to datetime; if the value carries a GMT/UTC offset, it is taken into account, and it is safe to assume that everything is converted to UTC when time-related calculations are made.
Azure Stream Analytics continuously reads data from the source, and the late arrival policy together with the window decides how late events are handled.
Please have a look at
https://msdn.microsoft.com/en-us/library/azure/mt674682.aspx
and
https://blogs.msdn.microsoft.com/streamanalytics/2015/05/17/out-of-order-events/
for more details about out of order policies.
For your specific example, if the message arrives with a delay greater than 2 hours and your late arrival policy is to drop events, the events will be dropped. If it is set to adjust, the timestamp will be adjusted to the current processing time.

Convert Unix time to date in Azure Stream Analytics

I'm using Stream Analytics to process some RFID data in real time. The events from the RFID reader are sent to Event Hub as an input. Right now I'm facing a problem: the time in the events is in Unix time format, which looks like "TimeStamp":1460471242.22402. Strangely, when I test the query (not starting the job, but using the sample data from the input), the Unix time is changed to "2016-04-12T14:48:00.0000000Z", but when I start the SA job, it fails and says that the column 'timestamp' doesn't conform to the ISO 8601 standard. Is there any way to convert Unix time to a standard date format in SA without changing the raw input data?
My query is simply:
SELECT
EPCValue, Antenna, System.TimeStamp AS Time
INTO
dataoutput
FROM
datainput timestamp by TimeStamp
Please take a look at the sample on this page; it describes how to convert Unix time to the SQL datetime format:
https://msdn.microsoft.com/en-us/library/mt573293.aspx
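Applied to the query above, the pattern from that sample would look roughly like this (a sketch; it assumes TimeStamp holds seconds since the Unix epoch, as the sample value suggests, so the unit conversion may need adjusting):
SELECT
    EPCValue, Antenna, System.TimeStamp AS Time
INTO
    dataoutput
FROM
    datainput TIMESTAMP BY DATEADD(millisecond, TimeStamp * 1000, '1970-01-01T00:00:00Z')
The DATEADD expression turns the epoch value into a datetime that TIMESTAMP BY can use, so the raw input data does not have to change.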
