Azure Stream Analytics and Event Order

I am trying to set up an ASA job. My use case: I receive telemetry data from vehicles in the field. They transmit various attributes of the vehicle, e.g. whether the engine is currently on or off.
This data goes to IoT Hub and then ASA consumes it.
My problem is that the inputs arrive out of sequence; for example, message #9 can arrive before message #8.
Here is my ASA query:
WITH AllSpeeder AS (
    SELECT
        telemetry.deviceId AS vehicleid,
        telemetry.enginestatus AS currentenginestatus,
        telemetry.datetime AS currenttime,
        LAST(telemetry.containerdt) OVER (PARTITION BY telemetry.vehicleid LIMIT DURATION(day, 7)
            WHEN (telemetry.enginestatus = 'On2Off')) AS engineontime
    FROM
        theiot
        TIMESTAMP BY CAST(telemetry.containerdt AS datetime)
    WHERE
        telemetry.enginestatus = 'Off' OR telemetry.enginestatus = 'On2Off'
)
SELECT * INTO theblob FROM AllSpeeder
But the TIMESTAMP BY clause in the above query did not fix the ordering.
I also experimented with the Event Ordering settings in ASA, but events #8 and #9 are still not getting reordered.
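One variant I have not ruled out is the substream form, TIMESTAMP BY ... OVER, which as far as I understand applies out-of-order handling per partition key rather than across the whole stream. A sketch of what that might look like with my payload (untested):
WITH AllSpeeder AS (
    SELECT
        telemetry.deviceId AS vehicleid,
        telemetry.enginestatus AS currentenginestatus
    FROM
        theiot
        -- OVER scopes timestamping and reordering to each device's substream
        TIMESTAMP BY CAST(telemetry.containerdt AS datetime) OVER telemetry.deviceId
    WHERE
        telemetry.enginestatus = 'Off' OR telemetry.enginestatus = 'On2Off'
)
SELECT * INTO theblob FROM AllSpeeder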

Related

Azure Stream Analytics: how to write a window function when data comes from different devices and all devices have different fields/properties

I have telemetry coming into IoT Hub from 16 devices, and the devices are of different types. I want to apply a window function to get the average of the last 10 minutes of data. Since each device has very different parameters, I am unable to define a single query. The device parameters are:
Voltage L1
Voltage L2
Voltage L3
Current L1
Current L2
Current L3
Power
Consumer Energy
Main_Tank_Oil_Level
Calibration_Level
Temperature
Actual Pressure
Actual_Flow
Main_Tank_Oil_Level
Calibration_Level
Temperature
Actual Pressure
Oil_Flow
and each device has multiple parameters.
How do we apply a window function in such a scenario?
From the comments, I understand you actually have multiple logical pipelines inside a single physical stream. You have multiple device types, each requiring a different Azure Stream Analytics (ASA) query, but they all share the same input Event/IoT Hub.
There are 3 approaches here.
1 - Multiple ASA jobs
You can create multiple ASA jobs, all reading from the same source Event/IoT Hub. Each one will have a different query addressing a specific device type. The first step of each query will be to filter the stream down to the device type it's scoped to (WHERE deviceType = 1, or WHERE VoltageL1 IS NOT NULL).
The jobs will have different output schemas. So you can either target a single output across them, which will need to be a "schema-less" service like Event Hub, Cosmos DB, or a storage account; or you can output to different destinations if you're targeting a strongly typed storage component (e.g. different tables in a SQL database).
The key here is to define and use a different consumer group for each ASA job on the hub. A sketch of one such per-type job is below.
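For illustration, a minimal sketch of what one of these per-type jobs could contain (MyInput and Type1Output are placeholder input/output aliases, mirroring the parameter names above):
SELECT
    DeviceId,
    System.Timestamp() AS WindowEndtime,
    AVG(VoltageL1) AS AvgVoltageL1,
    AVG(VoltageL2) AS AvgVoltageL2
INTO Type1Output
FROM MyInput
WHERE DeviceType = 1
GROUP BY DeviceId, TUMBLINGWINDOW(minute, 10)
Each such job would then be configured against its own consumer group on the hub.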
2 - Single job, independent queries
To reduce cost, and if the volume of data allows it, you can put all your queries into a single ASA job. It will look something like this:
-- This step concentrates reads on the input source to reduce consumer group exhaustion
WITH SingleSource AS (
    SELECT * FROM MyInput
)
SELECT
    DeviceId,
    DeviceType,
    System.Timestamp() AS WindowEndtime,
    AVG(VoltageL1) AS AvgVoltageL1,
    AVG(VoltageL2) AS AvgVoltageL2
INTO MyOutput
FROM SingleSource
WHERE DeviceType = 1 -- (or something like VoltageL1 IS NOT NULL)
GROUP BY DeviceId, DeviceType, TUMBLINGWINDOW(minute, 10)
SELECT
    DeviceId,
    DeviceType,
    System.Timestamp() AS WindowEndtime,
    AVG(Main_Tank_Oil_Level) AS AvgMain_Tank_Oil_Level,
    AVG(Calibration_Level_Temperature) AS AvgCalibration_Level_Temperature
INTO MyOutput
FROM SingleSource
WHERE DeviceType = 2 -- (or something like Main_Tank_Oil_Level IS NOT NULL)
GROUP BY DeviceId, DeviceType, TUMBLINGWINDOW(minute, 10)
...
That's if you want to output everything into the same destination (MyOutput here). As with multiple jobs, you can always create multiple outputs and vary the INTO clause.
3 - Single job, conforming query
If you also want to process the data so everything fits on a single schema, you can add a final step to do so:
-- This step concentrates reads on the input source to reduce consumer group exhaustion
WITH SingleSource AS (
    SELECT * FROM MyInput
),
Type1 AS (
    SELECT
        DeviceId,
        DeviceType,
        System.Timestamp() AS WindowEndtime,
        AVG(VoltageL1) AS AvgVoltageL1,
        AVG(VoltageL2) AS AvgVoltageL2
    FROM SingleSource
    WHERE DeviceType = 1 -- (or something like VoltageL1 IS NOT NULL)
    GROUP BY DeviceId, DeviceType, TUMBLINGWINDOW(minute, 10)
),
Type2 AS (
    SELECT
        DeviceId,
        DeviceType,
        System.Timestamp() AS WindowEndtime,
        AVG(Main_Tank_Oil_Level) AS AvgMain_Tank_Oil_Level,
        AVG(Calibration_Level_Temperature) AS AvgCalibration_Level_Temperature
    FROM SingleSource
    WHERE DeviceType = 2 -- (or something like Main_Tank_Oil_Level IS NOT NULL)
    GROUP BY DeviceId, DeviceType, TUMBLINGWINDOW(minute, 10)
),
...
AllRecords AS (
    SELECT DeviceId, DeviceType, WindowEndtime,
           AvgVoltageL1, AvgVoltageL2,
           NULL AS AvgMain_Tank_Oil_Level, NULL AS AvgCalibration_Level_Temperature
    FROM Type1
    UNION ALL
    SELECT DeviceId, DeviceType, WindowEndtime,
           NULL AS AvgVoltageL1, NULL AS AvgVoltageL2,
           AvgMain_Tank_Oil_Level, AvgCalibration_Level_Temperature
    FROM Type2
    ...
)
SELECT * INTO MyOutput FROM AllRecords
That's the best option if the device types share some common metrics and you need to put everything into a single SQL table.
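For reference, the receiving table in SQL could look something like this (column types are assumptions, since they aren't specified above):
CREATE TABLE dbo.DeviceAverages (
    DeviceId nvarchar(64) NOT NULL,
    DeviceType int NOT NULL,
    WindowEndtime datetime2 NOT NULL,
    AvgVoltageL1 float NULL,
    AvgVoltageL2 float NULL,
    AvgMain_Tank_Oil_Level float NULL,
    AvgCalibration_Level_Temperature float NULL
);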

Azure Stream Analytics - How to save data for more than one IoT device in Azure SQL db

I am able to save data for one running IoT device in Azure SQL DB, but I don't understand how to save data coming from more than one IoT device using Stream Analytics, when the data can be different.
For example -
Device1 may produce temperature, humidity
and Device2 may produce torque, pressure
I have used the Stream Analytics preview feature in Azure SQL DB. I didn't want to create a schema for the table, because I'm not sure what kind of data will be coming from the different IoT devices.
The most basic way to proceed here is the following:
WITH
TempInput AS     (SELECT DeviceId, 'Temperature' AS sensorName, Temperature AS sensorValue FROM Input WHERE DeviceType = 'A'),
HumidityInput AS (SELECT DeviceId, 'Humidity' AS sensorName, Humidity AS sensorValue FROM Input WHERE DeviceType = 'A'),
TorqueInput AS   (SELECT DeviceId, 'Torque' AS sensorName, Torque AS sensorValue FROM Input WHERE DeviceType = 'B'),
PressureInput AS (SELECT DeviceId, 'Pressure' AS sensorName, Pressure AS sensorValue FROM Input WHERE DeviceType = 'B'),
UnionOutput AS (
    SELECT DeviceId, sensorName, sensorValue FROM TempInput
    UNION
    SELECT DeviceId, sensorName, sensorValue FROM HumidityInput
    UNION
    SELECT DeviceId, sensorName, sensorValue FROM TorqueInput
    UNION
    SELECT DeviceId, sensorName, sensorValue FROM PressureInput
)
SELECT *
INTO Output
FROM UnionOutput
So this scenario is supported, but the query pattern above can't really be generalized until we know more about the actual schemas involved. We have smarter ways to parse and pivot data that may be applicable in your specific case; it's important to have more details both in terms of input (CSV or JSON, whether and how fields are nested, arrays) and output. We have a good doc on general patterns, and also one on parsing JSON data, that can help.
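As one example of such a pattern: if the readings arrive as a nested JSON record, GetRecordProperties can pivot its properties into rows without hard-coding each sensor name. A sketch, assuming a hypothetical nested record called Readings on the input:
SELECT
    Input.DeviceId,
    props.PropertyName AS sensorName,
    props.PropertyValue AS sensorValue
FROM Input
-- GetRecordProperties expands a record into (PropertyName, PropertyValue) pairs
CROSS APPLY GetRecordProperties(Input.Readings) AS props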

Azure Stream Analytics Reference Input Join

I am trying to use the following query to join a stream input (deviceinput) and a reference input (refinputpdjson).
The inputs for my Stream Analytics job are:
Input 1: Stream Input from IoT Hub
Input 2: Reference Data from Azure Blob Storage
SELECT
din.EventProcessedUtcTime,
din.deviceid as streamdeviceid,
din.heartrate as streamheartrate,
refin.deviceid as refdeviceid,
refin.patientid as refpatientid
FROM
deviceinput din
TIMESTAMP BY EventProcessedUtcTime
LEFT OUTER JOIN
refinputpdjson refin
ON din.deviceid = refin.deviceid
but it is failing for the following reason:
The join predicate is not time bounded. JOIN operation between data streams requires specifying max time distances between matching events. Please add DATEDIFF to the JOIN condition. Example: SELECT input1.a, input2.b FROM input1 JOIN input2 ON DATEDIFF(minute, input1, input2) BETWEEN 0 AND 10
As the error shows:
JOIN operation between data streams requires specifying max time
distances between matching events.
So you can try something like this:
SELECT
    din.EventProcessedUtcTime, din.deviceid AS streamdeviceid, din.heartrate AS streamheartrate,
    refin.deviceid AS refdeviceid, refin.patientid AS refpatientid
FROM deviceinput din TIMESTAMP BY EventProcessedUtcTime
LEFT OUTER JOIN refinputpdjson refin TIMESTAMP BY EventEndUtcTime
    ON din.deviceid = refin.deviceid
    AND DATEDIFF(minute, din, refin) BETWEEN 0 AND 15
For more detail, you can refer to this documentation.
It was an Azure service bug. It has been resolved by the MS team.
It's working fine now.
Thank you,
Kamlesh Khollam

JOIN in Azure Stream Analytics

I have a requirement to validate the values of one column against master data in Stream Analytics.
I have written queries to fetch some data from a blob location, and one of the column values should be validated against master data available in another blob location.
Below is the SAQL I tried. signals1 is the master data in blob and signals2 is the processed data to be validated:
WITH MASTER AS (
    SELECT [signals1].VAL AS VAL
    FROM [signals1]
)
SELECT
    ID,
    VAL,
    SIG
INTO [output]
FROM signals2
I have to validate the VAL from signals2 against the VAL in signals1.
If the VAL in signals2 is present in signals1, then we should write the event to output.
If the VAL in signals2 is not present in signals1, then that document should be ignored (not written to output).
I tried with JOIN and WHERE clauses, but it is not working as expected.
Any leads on how to achieve this using JOIN or WHERE?
In case your Signal1 data is the reference input, and Signal2 is the streaming input, you can use something like the following query:
WITH signals AS (
    SELECT * FROM Signal2 I JOIN Signal1 R ON I.Val = R.Val
)
SELECT * INTO output FROM signals
I tested this query locally, assuming that your reference data (Signal1) is in the format:
[
    {
        "Val": "123",
        "Data": "temp"
    },
    {
        "Val": "321",
        "Data": "humidity"
    }
]
And, for example, your Signal2 (the streaming input) is:
{
    "Val": "123",
    "SIG": "k8s23kk",
    "ID": "1234589"
}
Have a look at this query and the data samples to see if they can guide you towards the solution.
Side note: you cannot use this join when Signal1 is streaming data. The way these joins work, you have to use time windowing; without it, the join is not possible. A sketch of that time-bounded variant follows.
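For completeness, a stream-to-stream version of that join would need a time bound in its condition, along these lines (a sketch, assuming both inputs carry a timestamp field named ts; the 5-minute window is arbitrary):
SELECT I.ID, I.Val, I.SIG
INTO output
FROM Signal2 I TIMESTAMP BY ts
JOIN Signal1 R TIMESTAMP BY ts
    ON I.Val = R.Val
    AND DATEDIFF(minute, I, R) BETWEEN 0 AND 5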

Syntax issue in Stream Analytics Query running in Azure: Invalid column name: 'payload'

I am having a syntax issue with my Stream Analytics query. Here is the query, where I am trying to get the following fields from the events:
Vehicle Id
Diff of previous and current fuel level (for each vehicle)
Diff of current and previous odometer value (for each vehicle)
NON-WORKING QUERY
SELECT input.vehicleId,
FUEL_DIFF = LAG(input.Payload.FuelLevel) OVER (PARTITION BY vehicleId LIMIT DURATION(minute, 1)) - input.Payload.FuelLevel,
ODO_DIFF = input.Payload.OdometerValue - LAG(input.Payload.OdometerValue) OVER (PARTITION BY input.vehicleId LIMIT DURATION(minute, 1))
from input
Following is one sample input event; the above query/job runs on a series of such events:
{
    "IoTDeviceId": "DeviceId_1",
    "MessageId": "03494607-3aaa-4a82-8e2e-149f1261ebbb",
    "Payload": {
        "TimeStamp": "2017-01-23T11:16:02.2019077-08:00",
        "FuelLevel": 19.9,
        "OdometerValue": 10002
    },
    "Priority": 1,
    "Time": "2017-01-23T11:16:02.2019077-08:00",
    "VehicleId": "MyCar_1"
}
The following syntax error is thrown when the Stream Analytics job is run:
Invalid column name: 'payload'. Column with such name does not exist.
Ironically, the following query works just fine:
WORKING QUERY
SELECT input.vehicleId,
FUEL_DIFF = LAG(input.Payload.FuelLevel) OVER (PARTITION BY vehicleId LIMIT DURATION(second, 1)) - input.Payload.FuelLevel
from input
The only difference between the WORKING QUERY and the NON-WORKING QUERY is the number of LAG constructs used: the NON-WORKING QUERY has two LAG constructs, while the WORKING QUERY has just one.
I have referred to the Stream Analytics Query Language reference, but it only has basic examples, and I have tried looking into multiple blogs. In addition, I have tried using the GetRecordPropertyValue() function, but no luck. Kindly suggest.
Thank you in advance!
This looks like a syntax bug indeed. Thank you for reporting it; we will fix it in an upcoming update.
Please consider using the following query as a workaround. The first step flattens the nested Payload fields, so the LAG expressions can then reference plain column names:
WITH Step1 AS
(
    SELECT vehicleId, Payload.FuelLevel, Payload.OdometerValue
    FROM input
)
SELECT vehicleId,
    FUEL_DIFF = LAG(FuelLevel) OVER (PARTITION BY vehicleId LIMIT DURATION(minute, 1)) - FuelLevel,
    ODO_DIFF = OdometerValue - LAG(OdometerValue) OVER (PARTITION BY vehicleId LIMIT DURATION(minute, 1))
FROM Step1
