I've written a Stream Analytics query to emit two date-time values: one from my stream and the other the 'ingest' date-time into Azure IoT / Stream Analytics. My stream's value is in UTC, but I'm finding that the 'ingest' date-time is an offset from 1/1/1970 rather than the current UTC time.
This is my Stream Analytics query:
SELECT
deviceId
,System.Timestamp as IngestTimeUTC
,date as GenerateTimeUTC
INTO
[YourOutputAlias]
FROM
MyDevice
Sample output:
DEVICEID ... INGESTTIME ... GENERATEDTIMEUTC
"myFirstDevice" ... "1970-01-01T12:01:01.0010000Z"..."2016-11-18T15:25:54.5660000Z"
How can I normalize the ingest time to UTC for 'today'?
It looks like my query above does work as desired. I neglected to mention that I had been observing the output via the 'Test' option within the Azure Stream Analytics portal. When I saved everything and actually ran the job, I got the IngestTimeUTC data normalized in the proper way -- to UTC for 'today', as desired.
So ... the 'test' mechanism does have this inherent behavior with regard to System.Timestamp.
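As a side note, if the goal is for System.Timestamp to follow the event time carried in the payload rather than the arrival time, a TIMESTAMP BY clause can be added. A minimal sketch, reusing the aliases from the query above:
SELECT
deviceId
,System.Timestamp as IngestTimeUTC
,date as GenerateTimeUTC
INTO
[YourOutputAlias]
FROM
MyDevice TIMESTAMP BY date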
Related
I'm trying to send a Timestamp field which is ISO 8601 with an offset ("2023-02-01T11:11:12.2220000+03:00").
Azure doesn't really work with offsets; I first encountered that when sending data to Event Hub.
I was hoping to resolve this by splitting the timestamp field into 2 fields:
timestamp: 2023-02-01T11:11:12.2220000
offset: +03:00
and combining them in the SA query, roughly as in the sketch below.
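(A minimal sketch of that combine step, assuming input fields literally named timestamp and offset, and placeholder aliases for the input and output:)
SELECT
CONCAT(CAST([timestamp] AS nvarchar(max)), [offset]) AS TimestampWithOffset
INTO
[SqlOutput]
FROM
[EventHubInput]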
This seemed to have worked in the Query editor, where the test output is shown as a correct timestamp+offset.
However, when the data is sent to the output (in this case SQL, with field type datetimeoffset), the value looks like this:
2023-02-01T08:11:12.2220000+00:00
I suspect this is because the timestamp field type in SA is datetime (as seen in the query editor test results window);
even if I cast to nvarchar, the field type is still datetime.
Is there a way to force SA to use specific types for fields (in this case, treat the field as a string and not a datetime)?
Or, in general, how can I pass a value like "2023-02-01T11:11:12.2220000+03:00" through SA without altering it? Bonus points if it can be done in Event Hub as well.
I am currently trying to connect 2 different devices to the IoT Hub, and I need to separate the data from each device. In order to do so, I tried configuring my Stream Analytics query like this:
SELECT
deviceId, temperature, humidity, CAST(iothub.EnqueuedTime AS datetime) AS event_date
INTO
NodeMCUOutput
FROM
iothubevents
WHERE
deviceId = "NodeMCU1"
However, for some reason, no output is shown if the WHERE statement is in the query (output appears without it, but then the data is not filtered). I need the WHERE statement in order to filter the data the way I want it. Am I missing something? Are there any solutions to this? Thanks a lot. Cheers!
The device ID and other properties that are not in the message itself are included as metadata on the message. You can read that metadata using the GetMetadataPropertyValue() function. This should work for you:
SELECT
GetMetadataPropertyValue(iothubevents, 'IoTHub.ConnectionDeviceId') as deviceId,
temperature,
humidity,
CAST(GetMetadataPropertyValue(iothubevents, 'IoTHub.EnqueuedTime') AS datetime) AS event_date
INTO
NodeMCUOutput
FROM
iothubevents
WHERE
GetMetadataPropertyValue(iothubevents, 'IoTHub.ConnectionDeviceId') = 'NodeMCU1'
I noticed you use double quotes in the WHERE clause.
You need single quotes to get a match on strings. In this case it will be
WHERE deviceId = 'NodeMCU1'
If the deviceId is the one from the IoT Hub metadata, Matthijs' answer will help you retrieve it.
I am trying to write an SAQL query on data coming from Event Hub in JSON format.
The input to the Azure Stream Analytics job is shown below.
{"ver":"2019-12-28 18:41:45.4184730","Data":"Data01","d":{"IDNUM":"XXXXX01","Time1":"2017-12-20T00:00:00.0000000Z","abc":"610000","efg":"0000","XYZ":"00000","ver":"2017-12-20T18:41:45.4184730Z"}}
{"ver":"2019-12-28 18:41:45.4184730","Data":"Data01","d":{"IDNUM":"XXXXX02","Time1":"2017-12-20T00:00:00.0000000Z","abc":"750000","efg":"0000","XYZ":"90000","ver":"2017-12-20T18:41:45.4184730Z"}}
{"ver":"2017-01-01 06:28:52.5041237","Data":"Data02","d":{"IDNUM":"XXXXX03","acc":-10.7000,"PQR":35.420639038085938,"XYZ":139.95817565917969,"ver":"2017-01-01T06:28:52.5041237Z"}}
{"ver":"2017-01-01 06:28:52.5041237","Data":"Data02","d":{"IDNUM":"XXXXX04","acc":-8.5999,"PQR":35.924240112304688,"XYZ":139.6097412109375,"ver":"2017-01-01T06:28:52.5041237Z"}}
In the first two rows the attribute Time1 is present, whereas in the last two rows the Time1 attribute itself is not present.
I have to store the data into Cosmos DB based on the Time1 attribute in the input data.
The path in the JSON data is input.d.Time1.
I have to store data that has Time1 in one Cosmos DB container, and data that does not have Time1 in another container.
I tried with the below SAQL.
SELECT [input].ver,
[input].Data,
d.*
INTO [cosmosDB01]
FROM [input] PARTITION BY PartitionId
WHERE [input].Data is not null
AND [input].d.Time1 is not null
SELECT [input].ver,
[input].Data,
d.*
INTO [cosmosDB02]
FROM [input] PARTITION BY PartitionId
WHERE [input].Data is not null
AND [input].d.Time1 is null
Is there any other way, like an IS EXISTS keyword, in the Stream Analytics query language?
To my knowledge, there is no is_exists or is_defined built-in keyword in ASA SQL so far. You have to follow the approach you mentioned in the question to handle the multiple-outputs scenario.
(Similar case: Azure Stream Analytics How to handle multiple output table?)
Of course, you could submit feedback to the ASA team to push this feature forward.
I'm setting up my log server. I'm forwarding logs using Fluentd to Kafka and then storing them in Cassandra for later use. For this I'm using the kafka-cassandra sink connector. I have to store data chronologically, for which I need to add a timestamp to my messages in Cassandra. How can this be done?
The DataMountaineer connector uses KCQL, which I think doesn't support inserting a timestamp into a log.
My connector configuration is as follows:
name=cassandra-sink
connector.class=com.datamountaineer.streamreactor.connect.cassandra.sink.CassandraSinkConnector
tasks.max=1
topics=test_AF1
connect.cassandra.kcql=INSERT INTO test_event1 SELECT now() as id, message as msg FROM test_AF1 TIMESTAMP=sys_time()
connect.cassandra.port=9042
connect.cassandra.contact.points=localhost
connect.cassandra.key.space=demo
Kafka Connect's Single Message Transform can do this. Here's an example:
{
"connector.class": "com.datamountaineer.streamreactor.connect.cassandra.sink.CassandraSinkConnector",
"topics": "test_AF1",
…
"transforms": "addTS",
"transforms.addTS.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.addTS.timestamp.field": "op_ts"
}
This adds a field to the message payload called op_ts with the timestamp of the Kafka message.
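Since your existing connector config is in properties format, the same transform would look roughly like this (a sketch; op_ts is just an example field name):
transforms=addTS
transforms.addTS.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.addTS.timestamp.field=op_ts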
I don't know how this interacts with KCQL; you might want to check out the other two Cassandra sinks that I'm aware of:
https://www.confluent.io/hub/confluentinc/kafka-connect-cassandra
https://www.confluent.io/hub/datastax/kafka-connect-dse
On starting my Azure Stream Analytics (ASA) job I get several false positives (FP) and I want to know what causes this.
I am trying to implement asset tracking in ASA as discussed in another question. My specific use case is that I want to trigger events when an asset has not sent a signal in the last 70 minutes. This works fine while the ASA job is running, but it triggers false positives on starting the job.
For example, when starting the ASA job at 2017-11-07T09:30:00Z, the ASA job gives an entry with MostRecentSignalInWindow: 1510042968 (= 2017-11-07T08:22:48Z) for name 'A', while I am sure that there is another event for name 'A' with time '2017-11-07T08:52:49Z' and one at '2017-11-07T09:22:49Z' in the event hub.
Some events arrive late due to the event ordering policy:
Late: 5 seconds
Out-of-order: 5 seconds
Action: adjust
I use the below query:
WITH
Missing AS (
SELECT
PreviousSignal.name,
PreviousSignal.time
FROM
[signal-eventhub] PreviousSignal
TIMESTAMP BY
time
LEFT OUTER JOIN
[signal-eventhub] CurrentSignal
TIMESTAMP BY
time
ON
PreviousSignal.name = CurrentSignal.name
AND
DATEDIFF(second, PreviousSignal, CurrentSignal) BETWEEN 1 AND 4200
WHERE CurrentSignal.name IS NULL
),
EventsInWindow AS (
SELECT
name,
max(DATEDIFF(second, '1970-01-01 00:00:00Z', time)) MostRecentSignalInWindow
FROM
Missing
GROUP BY
name,
TumblingWindow(minute, 1)
)
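(The query above ends with the CTEs; the job's final SELECT isn't shown. A minimal one, assuming a placeholder output alias, would be:)
SELECT
name,
MostRecentSignalInWindow
INTO
[missing-signal-output]
FROM
EventsInWindow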
For anyone reading this, this was a confirmed bug in Azure Stream Analytics and has now been resolved.