I have an Azure Stream Analytics job that receives raw events, transforms them, and then writes them to several different outputs. I get the following error:
Maximum Event Hub receivers exceeded. Only 5 receivers per partition are allowed.
Please use dedicated consumer group(s) for this input. If there are multiple queries using same input, share your input using WITH clause.
This is weird, because I use a common table expression (the WITH clause) at the beginning to get all the data, and after that I don't access the Event Hub anymore. Here is the query:
WITH
ODSMeasurements AS (
SELECT
collectionTimestamp,
analogValues,
digitalValues,
type,
translationTable
FROM EventhubODSMeasurements
),
-- Combine analog and digital measurements
CombineAnalogAndDigital AS (
SELECT
CAST(CONCAT(SUBSTRING(ODS.collectionTimestamp, 1, 10), ' ', SUBSTRING(ODS.collectionTimestamp, 12, 12)) AS datetime) AS "TimeStamp",
ROUND(AV.PropertyValue.value / (CAST(TT.ConversionFactor AS float)), 5) AS "ValueNumber",
NULL AS "ValueBit",
CAST(TT.MeasurementTypeId AS bigint) AS "MeasurementTypeId",
TT.MeasurementTypeName AS "MeasurementName",
TT.PartName AS "PartName",
CAST(TT.ElementId AS bigint) AS "ElementId",
TT.ElementName AS "ElementName",
TT.ObjectName AS "ObjectName",
TT.LocationName AS "LocationName",
CAST(TT.TranslationTableId AS bigint) AS "TranslationTableId",
ODS.Type AS "Status"
FROM ODSMeasurements ODS
CROSS APPLY GetRecordProperties(analogValues) AS AV
INNER JOIN SQLTranslationTable TT
ON
TT.Tag = AV.PropertyName AND
CAST(TT.Version as bigint) = ODS.translationTable.version AND
TT.Name = ODS.translationTable.name
UNION
SELECT
CAST(CONCAT(SUBSTRING(ODS.collectionTimestamp, 1, 10), ' ', SUBSTRING(ODS.collectionTimestamp, 12, 12)) AS datetime) AS "TimeStamp",
CAST(-9999.00000 AS float) AS "ValueNumber",
CAST(DV.PropertyValue.value AS nvarchar(max)) AS "ValueBit",
CAST(TT.MeasurementTypeId AS bigint) AS "MeasurementTypeId",
TT.MeasurementTypeName AS "MeasurementName",
TT.PartName AS "PartName",
CAST(TT.ElementId AS bigint) AS "ElementId",
TT.ElementName AS "ElementName",
TT.ObjectName AS "ObjectName",
TT.LocationName AS "LocationName",
CAST(TT.TranslationTableId AS bigint) AS "TranslationTableId",
ODS.Type AS "Status"
FROM ODSMeasurements ODS
CROSS APPLY GetRecordProperties(digitalValues) AS DV
INNER JOIN SQLTranslationTable TT
ON
TT.Tag = DV.PropertyName AND
CAST(TT.Version as bigint) = ODS.translationTable.version AND
TT.Name = ODS.translationTable.name
)
-- Output data
SELECT *
INTO DatalakeHarmonizedMeasurements
FROM CombineAnalogAndDigital
PARTITION BY TranslationTableId
SELECT *
INTO FunctionsHarmonizedMeasurements
FROM CombineAnalogAndDigital
SELECT Timestamp, ValueNumber, CAST(ValueBit AS bit) AS ValueBit, ElementId, MeasurementTypeId, CAST(TranslationTableId AS bigint) AS TranslationTableId
INTO SQLRealtimeMeasurements
FROM CombineAnalogAndDigital
SELECT *
INTO EventHubHarmonizedMeasurements
FROM CombineAnalogAndDigital
PARTITION BY TranslationTableId
And this is the Event Hub input that I use:
{
"Name": "EventhubODSMeasurements",
"Type": "Data Stream",
"DataSourceType": "Event Hub",
"EventHubProperties": {
"ServiceBusNamespace": "xxx",
"EventHubName": "xxx",
"SharedAccessPolicyName": "xxx",
"SharedAccessPolicyKey": null,
"ConsumerGroupName": "streamanalytics",
"AuthenticationMode": "ConnectionString"
},
"DataSourceCredentialDomain": "xxx",
"Serialization": {
"Type": "Json",
"Encoding": "UTF8"
},
"PartitionKey": null,
"CompressionType": "None",
"ScriptType": "Input"
}
I use a separate consumer group for this as well. As far as I can see, I'm doing everything right. Does anyone know what's going on?
Edit: I enabled the diagnostic logs, and they say this:
Exceeded the maximum number of allowed receivers per partition in a
consumer group which is 5. List of connected receivers - nil, nil,
nil, nil, nil.
Turns out the issue was PEBKAC: there was another job that accidentally pointed to the same input Event Hub.
I'm working off the example from the jOOQ blog post: https://blog.jooq.org/jooq-3-15s-new-multiset-operator-will-change-how-you-think-about-sql/. I've got the same setup: a foreign key relationship, and I want to load the Parent table and also get all rows that reference each of the Parent rows:
CREATE TABLE parent (
id BIGINT NOT NULL,
CONSTRAINT pk_parent PRIMARY KEY (id)
);
CREATE TABLE item (
id BIGINT NOT NULL,
parent_id BIGINT NOT NULL,
type VARCHAR(255) NOT NULL,
CONSTRAINT pk_item PRIMARY KEY (id),
FOREIGN KEY (parent_id) REFERENCES parent (id)
);
This is what I think the jOOQ query should look like:
@Test
public void test() {
dslContext.insertInto(PARENT, PARENT.ID).values(123L).execute();
dslContext.insertInto(PARENT, PARENT.ID).values(456L).execute();
dslContext.insertInto(ITEM, ITEM.ID, ITEM.PARENT_ID, ITEM.TYPE).values(1L, 123L, "t1").execute();
dslContext.insertInto(ITEM, ITEM.ID, ITEM.PARENT_ID, ITEM.TYPE).values(2L, 456L, "t2").execute();
var result = dslContext.select(
PARENT.ID,
DSL.multiset(
DSL.select(
ITEM.ID,
ITEM.PARENT_ID,
ITEM.TYPE)
.from(ITEM)
.join(PARENT).onKey()))
.from(PARENT)
.fetch();
System.out.println(result);
}
The result is that each Item shows up for each Parent:
Executing query : select "PUBLIC"."PARENT"."ID", (select coalesce(json_arrayagg(json_array("v0", "v1", "v2" null on null)), json_array(null on null)) from (select "PUBLIC"."ITEM"."ID" "v0", "PUBLIC"."ITEM"."PARENT_ID" "v1", "PUBLIC"."ITEM"."TYPE" "v2" from "PUBLIC"."ITEM" join "PUBLIC"."PARENT" on "PUBLIC"."ITEM"."PARENT_ID" = "PUBLIC"."PARENT"."ID") "t") from "PUBLIC"."PARENT"
Fetched result : +----+----------------------------+
: | ID|multiset |
: +----+----------------------------+
: | 123|[(1, 123, t1), (2, 456, t2)]|
: | 456|[(1, 123, t1), (2, 456, t2)]|
: +----+----------------------------+
Fetched row(s) : 2
I also tried doing an explicit check for Parent.id == Item.parent_id, but it didn't generate valid SQL:
var result =
dslContext
.select(
PARENT.ID,
DSL.multiset(
DSL.select(ITEM.ID, ITEM.PARENT_ID, ITEM.TYPE)
.from(ITEM)
.where(ITEM.PARENT_ID.eq(PARENT.ID))))
.from(PARENT)
.fetch();
Error:
jOOQ; bad SQL grammar [select "PUBLIC"."PARENT"."ID", (select coalesce(json_arrayagg(json_array("v0", "v1", "v2" null on null)), json_array(null on null)) from (select "PUBLIC"."ITEM"."ID" "v0", "PUBLIC"."ITEM"."PARENT_ID" "v1", "PUBLIC"."ITEM"."TYPE" "v2" from "PUBLIC"."ITEM" where "PUBLIC"."ITEM"."PARENT_ID" = "PUBLIC"."PARENT"."ID") "t") from "PUBLIC"."PARENT"]
org.springframework.jdbc.BadSqlGrammarException: jOOQ; bad SQL grammar [select "PUBLIC"."PARENT"."ID", (select coalesce(json_arrayagg(json_array("v0", "v1", "v2" null on null)), json_array(null on null)) from (select "PUBLIC"."ITEM"."ID" "v0", "PUBLIC"."ITEM"."PARENT_ID" "v1", "PUBLIC"."ITEM"."TYPE" "v2" from "PUBLIC"."ITEM" where "PUBLIC"."ITEM"."PARENT_ID" = "PUBLIC"."PARENT"."ID") "t") from "PUBLIC"."PARENT"]
at org.jooq_3.17.6.H2.debug(Unknown Source)
What am I doing wrong here?
Correlated derived table support
There are numerous SQL dialects that can emulate MULTISET in principle, but not if you correlate them like you did. According to #12045, these dialects do not support correlated derived tables (see the illustration after the list):
Db2
H2
MariaDB (see https://jira.mariadb.org/browse/MDEV-28196)
MySQL 5.7 (8 can handle it)
Oracle 11g (12c can handle it)
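To make the term concrete, here is a minimal illustration of the pattern (independent of jOOQ, aliases are my own); it has the same shape as the generated SQL above, where the derived table references a column of the outer query:
-- A correlated derived table: the derived table t refers to the outer table parent,
-- which plain (non-LATERAL) derived tables are not allowed to do in these dialects
select p.id,
       (select count(*)
        from (select i.id
              from item i
              where i.parent_id = p.id) t  -- p.id comes from the outer query
       ) as item_count
from parent p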
#12045 was fixed in jOOQ 3.18, producing a slightly less robust and more limited MULTISET emulation that only works in the absence of:
DISTINCT
UNION (and other set operations)
OFFSET .. FETCH
GROUP BY and HAVING
WINDOW and QUALIFY
But that probably doesn't affect 95% of all MULTISET usages.
Workarounds
You could use MULTISET_AGG, which doesn't suffer from this limitation (but is generally less powerful)
You could stop using H2, in case you're using that only as a test database (jOOQ recommends you integration test directly against your target database. This is a prime example of why this is generally better)
You could upgrade to 3.18.0-SNAPSHOT for the time being (built off GitHub, or available from here, if you're licensed: https://www.jooq.org/download/versions)
I'm curious to find out the best way to generate relationship identities through ADF.
Right now, I'm consuming JSON data that does not have any identity information. This data is then transformed into multiple database sink tables with relationships (1..n, etc.). Due to FK constraints on some of the destination sink tables, these relationships need to be "built up" one at a time.
This approach seems a bit kludgy, so I'm looking to see if there are other options that I'm not aware of.
Note that I need to include surrogate key generation for each insert. If I don't, then based on the output database schema, I'll get a 'cannot insert PK null' error.
Also note that I turn IDENTITY_INSERT ON/OFF for each sink.
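For clarity, the per-sink toggle I mean is just the standard T-SQL switch wrapped around each load; a minimal sketch (the table name is only a placeholder):
-- Pre-copy script for one sink table (dbo.Parent is a placeholder name)
SET IDENTITY_INSERT dbo.Parent ON;
-- ... the copy activity / data flow then inserts rows with explicit identity values ...
-- Post-copy script for the same sink
SET IDENTITY_INSERT dbo.Parent OFF;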
I would tend to take more of an ELT approach and use the native JSON abilities in Azure SQL DB, i.e. OPENJSON. You could land the JSON in a table in Azure SQL DB using ADF (e.g. a Stored Proc activity) and then call another stored proc to process the JSON, something like this:
-- Setup
DROP TABLE IF EXISTS #tmp
DROP TABLE IF EXISTS import.City;
DROP TABLE IF EXISTS import.Region;
DROP TABLE IF EXISTS import.Country;
GO
DROP SCHEMA IF EXISTS import
GO
CREATE SCHEMA import
CREATE TABLE Country ( CountryKey INT IDENTITY PRIMARY KEY, CountryName VARCHAR(50) NOT NULL UNIQUE )
CREATE TABLE Region ( RegionKey INT IDENTITY PRIMARY KEY, CountryKey INT NOT NULL FOREIGN KEY REFERENCES import.Country, RegionName VARCHAR(50) NOT NULL UNIQUE )
CREATE TABLE City ( CityKey INT IDENTITY(100,1) PRIMARY KEY, RegionKey INT NOT NULL FOREIGN KEY REFERENCES import.Region, CityName VARCHAR(50) NOT NULL UNIQUE )
GO
DECLARE @json NVARCHAR(MAX) = '{
"Cities": [
{
"Country": "England",
"Region": "Greater London",
"City": "London"
},
{
"Country": "England",
"Region": "West Midlands",
"City": "Birmingham"
},
{
"Country": "England",
"Region": "Greater Manchester",
"City": "Manchester"
},
{
"Country": "Scotland",
"Region": "Lothian",
"City": "Edinburgh"
}
]
}'
SELECT *
INTO #tmp
FROM OPENJSON( @json, '$.Cities' )
WITH
(
Country VARCHAR(50),
Region VARCHAR(50),
City VARCHAR(50)
)
GO
-- Add the Country first (has no foreign keys)
INSERT INTO import.Country ( CountryName )
SELECT DISTINCT Country
FROM #tmp s
WHERE NOT EXISTS ( SELECT * FROM import.Country t WHERE s.Country = t.CountryName )
-- Add the Region next including Country FK
INSERT INTO import.Region ( CountryKey, RegionName )
SELECT t.CountryKey, s.Region
FROM #tmp s
INNER JOIN import.Country t ON s.Country = t.CountryName
-- Now add the City with FKs
INSERT INTO import.City ( RegionKey, CityName )
SELECT r.RegionKey, s.City
FROM #tmp s
INNER JOIN import.Country c ON s.Country = c.CountryName
INNER JOIN import.Region r ON s.Region = r.RegionName
AND c.CountryKey = r.CountryKey
SELECT * FROM import.City;
SELECT * FROM import.Region;
SELECT * FROM import.Country;
This is a simple test script designed to show the idea and should run end-to-end but it is not production code.
I've got the following input stream data coming into Stream Analytics:
[
{
"timestamp": 1559529369274,
"values": [
{
"id": "SimCh01.Device01.Ramp1",
"v": 39,
"q": 1,
"t": 1559529359833
},
{
"id": "SimCh01.Device01.Ramp2",
"v": 183.5,
"q": 1,
"t": 1559529359833
}
],
"EventProcessedUtcTime": "2019-06-03T02:37:29.5824231Z",
"PartitionId": 3,
"EventEnqueuedUtcTime": "2019-06-03T02:37:29.4390000Z",
"IoTHub": {
"MessageId": null,
"CorrelationId": null,
"ConnectionDeviceId": "ew-IoT-01-KepServer",
"ConnectionDeviceGenerationId": "636948080712635859",
"EnqueuedTime": "2019-06-03T02:37:29.4260000Z",
"StreamId": null
}
}
]
I'm trying to extract the "values" array and use the "t" field within each array element for TIMESTAMP BY.
I was able to read the input and route it to the output with a simple SAQL statement within Stream Analytics. However, I'm only interested in the "values" array above.
This is my first attempt. It does not like my 'TIMESTAMP BY' statement when I try to restart the Stream Analytics job:
SELECT
KepValues.ArrayValue.id,
KepValues.ArrayValue.v,
KepValues.ArrayValue.q,
KepValues.ArrayValue.t
INTO
[PowerBI-DS]
FROM
[IoTHub-Input] as event
CROSS APPLY GetArrayElements(event.[values]) as KepValues
TIMESTAMP BY KepValues.ArrayValue.t
==============================================================================
This is my second attempt. It still does not like my 'TIMESTAMP BY' statement:
With [PowerBI-Modified-DS] As (
SELECT
KepValues.ArrayValue.id as ID,
KepValues.ArrayValue.v as V,
KepValues.ArrayValue.q as Q,
KepValues.ArrayValue.t as T
FROM
[IoTHub-Input] as event
CROSS APPLY GetArrayElements(event.[values]) as KepValues
)
SELECT
ID, V, Q, T
INTO
[PowerBI-DS]
FROM
[PowerBI-Modified-DS] TIMESTAMP BY T
After extraction, this is what I expect: a table with columns "id", "v", "q", "t", where each row holds a single array element, e.g.:
"SimCh01.Device01.Ramp1", 39, 1, 1559529359833
"SimCh01.Device01.Ramp2", 183.5, 1, 1559529359833
Added
Since then, I have modified the query as below to convert the Unix time t into a DateTime:
With [PowerBI-Modified-DS] As (
SELECT
arrayElement.ArrayValue.id as ID,
arrayElement.ArrayValue.v as V,
arrayElement.ArrayValue.q as Q,
arrayElement.ArrayValue.t as TT
FROM
[IoTHub-Input] as iothubAlias
CROSS APPLY GetArrayElements(iothubAlias.data) as arrayElement
)
SELECT
ID, V, Q, DATEADD(millisecond, TT, '1970-01-01T00:00:00Z') as T
INTO
[SAJ-01-PowerBI]
FROM
[PowerBI-Modified-DS]
I managed to add DATEADD() to convert the Unix time into a DateTime and call it T. Now how can I add 'TIMESTAMP BY'? I did try to add it after [PowerBI-Modified-DS], but the editor complains that the query is invalid. What else can I do, or is this the best I can do? I understand I need to set 'TIMESTAMP BY' so Power BI understands this is streaming data.
The TIMESTAMP BY clause in Stream Analytics requires a value to be of type DATETIME. String values conforming to ISO 8601 formats are supported. In your example, the value of 't' does not conform to this standard.
To use the TIMESTAMP BY clause in your case, you will have to pre-process the data before sending it to Stream Analytics, or change the source to create the event (specifically the field 't') using this format.
Stream Analytics assigns the timestamp before the query is executed, so the TIMESTAMP BY expression can only refer to fields in the input payload. You have two options:
You can have two ASA jobs: the first does the CROSS APPLY, and the second does the TIMESTAMP BY (see the sketch after this list).
You can implement a custom deserializer in C# (sign up for preview access). This way you can have one job that uses your implementation to read incoming events. Your deserializer would convert the Unix time to a DateTime, and that field can then be used in your TIMESTAMP BY clause.
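To illustrate option 1, here is a rough sketch of how the two jobs could be split; the intermediate Event Hub and output names are assumptions, not from your setup.
Job 1 flattens the array and forwards the raw fields:
SELECT
    arrayElement.ArrayValue.id AS ID,
    arrayElement.ArrayValue.v AS V,
    arrayElement.ArrayValue.q AS Q,
    arrayElement.ArrayValue.t AS T
INTO [EventHub-Flattened]
FROM [IoTHub-Input] AS iothubAlias
CROSS APPLY GetArrayElements(iothubAlias.[values]) AS arrayElement
Job 2 reads the flattened events; because T is now a top-level field of that job's input payload, it can be used in TIMESTAMP BY:
SELECT
    ID, V, Q,
    DATEADD(millisecond, T, '1970-01-01T00:00:00Z') AS EventTime
INTO [PowerBI-DS]
FROM [EventHub-Flattened] TIMESTAMP BY DATEADD(millisecond, T, '1970-01-01T00:00:00Z')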
I need some help with a data model for saving smart meter data; I'm pretty new to working with Cassandra.
The data that has to be stored:
This is an example of one smart meter gateway:
{"logical_name": "smgw_123",
"ldevs":
[{"logical_name": "sm_1", "objects": [{"capture_time": 390600, "unit": 30, "scaler": -3, "status": "000", "value": 152.361925}]},
{"logical_name": "sm_2", "objects": [{"capture_time": 390601, "unit": 33, "scaler": -3, "status": "000", "value": 0.3208547253907171}]},
{"logical_name": "sm_3", "objects": [{"capture_time": 390602, "unit": 36, "scaler": -3, "status": "000", "value": 162.636025}]}]
}
So this is one smart meter gateway with the logical_name "smgw_123".
The ldevs array contains three smart meters with their values.
So the smart meter gateway has a relation to the three smart meters, and the smart meters in turn have their own data.
Questions
I don't know how I can store this data, which has relations, in a NoSQL database (in my case Cassandra).
Do I have to use two tables? Something like smartmetergateway (logical name, smart meter 1, smart meter 2, smart meter 3), and another smart meter table (logical name, capture time, unit, scaler, status, value)?
Another problem is that smart meter gateways can have different numbers of smart meters.
I hope I have described my problem understandably.
thx
In Cassandra data modelling, the first thing you should do is to determine your queries. You will model partition keys and clustering columns of your tables according to your queries.
In your example, I assume you will query your smart meter gateways based on their logical names. I mean, your queries will look like this:
select <some_columns>
from smart_meter_gateway
where smg_logical_name = <a_smg_logical_name>;
I also assume that each smart meter gateway's logical name is unique and that each smart meter in the ldevs array has a unique logical name.
If this is the case, you should create a table with a partition key column of smg_logical_name and a clustering column of sm_logical_name. By doing this, you will create a table where each smart meter gateway partition contains one row per smart meter:
create table smart_meter_gateway
(
smg_logical_name text,
sm_logical_name text,
capture_time int,
unit int,
scaler int,
status text,
value decimal,
primary key ((smg_logical_name), sm_logical_name)
);
And you can insert into this table using the following statements:
insert into smart_meter_gateway (smg_logical_name, sm_logical_name, capture_time, unit, scaler, status, value)
values ('smgw_123', 'sm_1', 390600, 30, -3, '000', 152.361925);
insert into smart_meter_gateway (smg_logical_name, sm_logical_name, capture_time, unit, scaler, status, value)
values ('smgw_123', 'sm_2', 390601, 33, -3, '000', 0.3208547253907171);
insert into smart_meter_gateway (smg_logical_name, sm_logical_name, capture_time, unit, scaler, status, value)
values ('smgw_123', 'sm_3', 390602, 36, -3, '000', 162.636025);
And when you query the smart_meter_gateway table by smg_logical_name, you will get 3 rows in the result set:
select * from smart_meter_gateway where smg_logical_name = 'smgw_123';
The result of this query is:
smg_logical_name | sm_logical_name | capture_time | scaler | status | unit | value
smgw_123         | sm_1            | 390600       | -3     | 000    | 30   | 152.361925
smgw_123         | sm_2            | 390601       | -3     | 000    | 33   | 0.3208547253907171
smgw_123         | sm_3            | 390602       | -3     | 000    | 36   | 162.636025
You can also add sm_logical_name as a filter to your query:
select *
from smart_meter_gateway
where smg_logical_name = 'smgw_123' and sm_logical_name = 'sm_1';
This time you will get only 1 row in the result set:
smg_logical_name | sm_logical_name | capture_time | scaler | status | unit | value
smgw_123         | sm_1            | 390600       | -3     | 000    | 30   | 152.361925
Note that there are other ways you can model your data. For example, you can use collection columns for the ldevs array, and that approach has its own advantages and disadvantages. As I said at the beginning, it depends on your query needs.
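For illustration, a rough sketch of one collection-based variant (the type and table names below are my own, not a recommendation): keep one row per gateway and smart meter, but store the objects array as a list of a user-defined type.
-- A user-defined type holding one measurement object
create type meter_object (
capture_time int,
unit int,
scaler int,
status text,
value decimal
);
-- Same partition and clustering keys as above, but the objects array kept as a collection
create table smart_meter_gateway_nested (
smg_logical_name text,
sm_logical_name text,
objects list<frozen<meter_object>>,
primary key ((smg_logical_name), sm_logical_name)
);
The trade-off is that a collection is read and written as a whole and you cannot filter on individual objects server-side, which is why the flat design above is usually the better fit for per-measurement queries.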
I am streaming event messages which contain a POSIX/epoch time field. I am trying to calculate in which time frame I received a series of messages from a device.
Let's assume the following (simplified) input:
[
{ "deviceid":"device01", "epochtime":1500975613660 },
{ "deviceid":"device01", "epochtime":1500975640194 },
{ "deviceid":"device01", "epochtime":1500975649627 },
{ "deviceid":"device01", "epochtime":1500994473225 },
{ "deviceid":"device01", "epochtime":1500994486725 }
]
The result of my calculation should be a message like {deviceid, start, end} for each device ID. I assume that a new time frame starts if the time interval between two events is longer than one hour. In my example, this would result in two messages:
[
{ "deviceid":"device01", "start":1500975613660, "end":1500975649627 },
{ "deviceid":"device01", "start":1500994473225, "end":1500994486725 }
]
I can convert the epoch time according to example 2 in the documentation https://msdn.microsoft.com/en-us/library/azure/mt573293.aspx. However, I cannot use the converted timestamp with the LAG function in a subquery. All values for previousTime are null in the output.
WITH step1 AS (
SELECT
[deviceid] AS deviceId,
System.Timestamp AS ts,
LAG([ts]) OVER (LIMIT DURATION(hour, 24)) as previousTime
FROM
input TIMESTAMP BY DATEADD(millisecond, epochtime, '1970-01-01T00:00:00Z')
)
I am not sure how I can perform my calculation and what the best way to do it is. I need to figure out the beginning and end of an event series.
Any help is very much appreciated.
I slightly modified your query below in order to get the expected result:
WITH STEP1 AS (
SELECT
[deviceid] AS deviceId,
System.Timestamp AS ts,
LAG(DATEADD(millisecond, epochtime, '1970-01-01T00:00:00Z') ) OVER (LIMIT DURATION(hour, 24)) as previousTime
FROM
input TIMESTAMP BY DATEADD(millisecond, epochtime, '1970-01-01T00:00:00Z')
)
SELECT * from STEP1
The problem is "ts" was defined in the current step, but when using LAG you are looking at the original message coming from the FROM statement, and it doesn't contain the "ts" variable.
Let me know if you have any questions.
Thanks,
JS - Azure Stream Analytics team