When I try to use the LAST function (https://msdn.microsoft.com/en-us/library/azure/mt421186.aspx), I get the following error:
Compiling query failed.
SELECT
deviceId
,System.TimeStamp as timestamp
,avg(events.externaltemp) as externaltemp
,LAST(System.Timestamp) OVER (PARTITION BY deviceId LIMIT DURATION(second, 1) when [externaltemp] is not null ) as Latest
INTO
[powerBI]
FROM
[EventHub] as events timestamp by [timestamp]
GROUP BY deviceId, TumblingWindow(second,1)
My LAST function looks very similar to the one in the MSDN sample, so I'm not sure why there is a problem.
You are using [externaltemp] in your query, but it is not included in the GROUP BY clause. That is the reason. Also, the LAST function does not allow aggregates inside it, so the following would not work either:
LAST(System.Timestamp) OVER (PARTITION BY deviceId LIMIT DURATION(second, 1) when avg([externaltemp]) is not null ) as Latest
It can be achieved by splitting the query into two steps, like this
with DeviceAggregates
as
(
SELECT
System.TimeStamp as [Timestamp],
deviceId,
avg(events.externaltemp) as [externaltemp]
FROM
[EventHub] as events timestamp by [timestamp]
GROUP BY
deviceId,
TumblingWindow(second,1)
),
DeviceAggregatesWithLast as
(
select
*,
last([Timestamp]) over (partition by deviceId limit duration(second,1) when [externaltemp] is not null) [LastTimeThereWasANonNullTemperature]
from
DeviceAggregates
)
select *
INTO
[powerBI]
from
DeviceAggregatesWithLast
I have the following Stream Analytics input:
{ "ID":"DEV-001-Test",
"TMSMUTC":"2021-10-14T14:00:00.000",
"MSGTYP":"TELEMETRY",
"THING":[
{
"TMSDUTC":"2021-10-14T13:00:00.000",
"DATA":[
{
"TAGID":"TAGB",
"VALUE":30
},
{
"TAGID":"TAGX",
"VALUE":[30.34,245.65,30.34,245.65,245.65,30.34]
}
]
}
]
}
in which the array of values for "TAGX" represents a reading recorded from a sensor every 10 minutes for one hour, starting from the timestamp "TMSDUTC":"2021-10-14T13:00:00.000".
I was wondering how I could make a query that would give me a similar output:
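Something along these lines, with one row per reading and the TAGX values spread out every 10 minutes (the column names are only illustrative):

ID            TMSDUTC                  MSGTYP     TAGID  VALUE
DEV-001-Test  2021-10-14T13:00:00.000  TELEMETRY  TAGB   30
DEV-001-Test  2021-10-14T13:00:00.000  TELEMETRY  TAGX   30.34
DEV-001-Test  2021-10-14T13:10:00.000  TELEMETRY  TAGX   245.65
DEV-001-Test  2021-10-14T13:20:00.000  TELEMETRY  TAGX   30.34
DEV-001-Test  2021-10-14T13:30:00.000  TELEMETRY  TAGX   245.65
DEV-001-Test  2021-10-14T13:40:00.000  TELEMETRY  TAGX   245.65
DEV-001-Test  2021-10-14T13:50:00.000  TELEMETRY  TAGX   30.34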
My main doubts are how to create the sequence of 10-minute timestamps from the original timestamp and how to cross apply the values to it.
That's a good one! Note that I highly recommend you use VSCode and the ASA extension when working on these queries. The developer experience is much nicer than in the portal thanks to local testing, and you can also unit test your query via the npm package.
I made the following assumptions:
THING is an array containing a single record. Let me know if that's not the case.
[edited] TMSDUTC needs to be incremented by 10 minutes according to the position of each item in the array, when applicable (TAGX).
With that, here is the query. It's split into multiple code blocks to explain the flow, but I also pasted it whole in the last code block.
First we bring all the required fields to the first level. It makes things easier to read, but that's not the only reason: GetArrayElements needs an array to CROSS APPLY, while GetArrayElement (singular) doesn't return the type at compile time. Using an intermediary query step solves that.
WITH things AS (
SELECT
ID,
GetArrayElement(THING,0).TMSDUTC AS TMSDUTC,
MSGTYP AS MessageType,
GetArrayElement(THING,0).DATA AS DATA
FROM [input]
),
Then we expand DATA:
dataAll AS (
SELECT
T.ID,
T.TMSDUTC,
T.MessageType,
D.ArrayValue.TAGID AS Tag,
D.ArrayValue.Value AS [Value]
FROM things T
CROSS APPLY GetArrayElements(T.DATA) AS D
),
Then we create a subset for records that have a VALUE of type array (TAGX in your example). Here I avoid hard-coding per tag by detecting the type at runtime. These records will need another round of array processing in the following step.
dataArrays AS (
SELECT
A.ID,
A.TMSDUTC,
A.MessageType,
A.Tag,
A.[Value]
FROM dataAll A
WHERE GetType(A.[Value]) = 'array'
),
Now we can focus on expanding VALUE for those records. Note that we could not do that in a single pass (filter on arrays above and CROSS APPLY below), as GetArrayElements checks types before filtering is done.
[edited] To increment TMSDUTC, we use DATEADD on the index of each item in its array (ArrayIndex/ArrayValue are both returned from the array expansion, see doc below).
dataArraysExpanded AS (
SELECT
A.ID,
DATEADD(minute,10*V.ArrayIndex,A.TMSDUTC) AS TMSDUTC,
A.MessageType,
A.Tag,
V.ArrayValue AS [Value]
FROM dataArrays A
CROSS APPLY GetArrayElements(A.[Value]) AS V
),
We union everything back together:
newSchema AS (
SELECT ID, TMSDUTC, MessageType, Tag, [Value] FROM dataAll WHERE GetType([Value]) != 'array'
UNION
SELECT ID, TMSDUTC, MessageType, Tag, [Value] FROM dataArraysExpanded
)
And finally insert everything into the destination:
SELECT
*
INTO myOutput
FROM newSchema
[edited] Please note that the only order guaranteed on a result set is the one defined by the timestamp. If multiple records occur on the same timestamp, no order is guaranteed by default. Here, at the end of the query, all of the newly created events are still timestamped on the timestamp of the original event. If you now need to apply time logic on the newly generated TMSDUTC, you will need to output these records to Event Hub, and load them in another job using TIMESTAMP BY TMSDUTC. Currently the timestamp can only be changed directly at the very first step of a query.
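As a rough illustration of that last point: assuming the records above were written to an Event Hub that a second job reads through an input called [reprocessedInput], and writes to an output called [myOtherOutput] (both made-up aliases), the first step of that downstream job could look like:

SELECT
    ID,
    TMSDUTC,
    MessageType,
    Tag,
    [Value]
INTO [myOtherOutput]
FROM [reprocessedInput] TIMESTAMP BY TMSDUTC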
What is used here:
GetArrayElement (singular): doc
WITH aka Common Table Expression (CTE): doc
CROSS APPLY + GetArrayElements: doc and doc, plus a very good ref
GetType: doc
The entire thing for easier copy/pasting:
WITH things AS (
SELECT
ID,
GetArrayElement(THING,0).TMSDUTC AS TMSDUTC,
MSGTYP AS MessageType,
GetArrayElement(THING,0).DATA AS DATA
FROM [input]
),
dataAll AS (
SELECT
T.ID,
T.TMSDUTC,
T.MessageType,
D.ArrayValue.TAGID AS Tag,
D.ArrayValue.Value AS [Value]
FROM things T
CROSS APPLY GetArrayElements(T.DATA) AS D
),
dataArrays AS (
SELECT
A.ID,
A.TMSDUTC,
A.MessageType,
A.Tag,
A.[Value]
FROM dataAll A
WHERE GetType(A.[Value]) = 'array'
),
dataArraysExpanded AS (
SELECT
A.ID,
DATEADD(minute,10*V.ArrayIndex,A.TMSDUTC) AS TMSDUTC,
A.MessageType,
A.Tag,
V.ArrayValue AS [Value]
FROM dataArrays A
CROSS APPLY GetArrayElements(A.[Value]) AS V
),
newSchema AS (
SELECT ID, TMSDUTC, MessageType, Tag, [Value] FROM dataAll WHERE GetType([Value]) != 'array'
UNION
SELECT ID, TMSDUTC, MessageType, Tag, [Value] FROM dataArraysExpanded
)
SELECT
*
INTO myOutput
FROM newSchema
I am running a query in Stream Analytics with a UNION.
I would like to get distinct values in the query results. Since UNION allows duplicates in Azure Stream Analytics, I am getting results with duplicate values. I have also tried using the DISTINCT keyword, but it is not working either.
Below is the Query I tried.
WITH
ABCINNERQUERY AS (
SELECT
event.ID as ID,
ABCArrayElement.ArrayValue.E as TIME,
ABCArrayElement.ArrayValue.V as ABC
FROM
[YourInputAlias] as event
CROSS APPLY GetArrayElements(event.ABC) AS ABCArrayElement
),
XYZINNERQUERY AS (
SELECT
event.ID as ID,
XYZArrayElement.ArrayValue.E as TIME,
XYZArrayElement.ArrayValue.V as XYZ
FROM
[YourInputAlias] as event
CROSS APPLY GetArrayElements(event.XYZ) AS XYZArrayElement
),
KEYS AS
(
SELECT DISTINCT
ABCINNERQUERY.ID AS ID,
ABCINNERQUERY.TIME as TIME
FROM ABCINNERQUERY
UNION
SELECT DISTINCT
XYZINNERQUERY.ID AS ID,
XYZINNERQUERY.TIME as TIME
FROM XYZINNERQUERY
)
SELECT
KEYS.ID as ID,
KEYS.TIME as TIME
INTO [YourOutputAlias]
FROM KEYS
In the above query, ID is unique, and ABC/XYZ are arrays of values, each entry holding a time (E) and the value of ABC/XYZ (V).
The input JSON file is as below.
[
  {"ID":"006XXXXX",
   "ABC":
     [{"E":1557302231320,"V":54.799999237060547}],
   "XYZ":
     [{"E":1557302191899,"V":31.0},{"E":1557302231320,"V":55}]
  },
  {"ID":"007XXXXX",
   "ABC":
     [{"E":1557302195483,"V":805.375},{"E":1557302219803,"V":0}],
   "XYZ":
     [{"E":1557302219803,"V":-179.0},{"E":1557302195483,"V":88}]
  }
]
Expected result without duplicates.
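One direction that might help, sketched here without having been tested against this input: since a streaming job can only compare events inside a time window, the duplicates coming out of the UNION can be collapsed by grouping the keys over a window instead of relying on DISTINCT. Keeping the KEYS step as it is, the final SELECT could become something like this (the 1-minute TumblingWindow is an arbitrary choice):

SELECT
    KEYS.ID as ID,
    KEYS.TIME as TIME
INTO [YourOutputAlias]
FROM KEYS
GROUP BY KEYS.ID, KEYS.TIME, TumblingWindow(minute, 1)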
Here's the code I used to create the table:
CREATE TABLE test.packages (
packageuuid timeuuid,
ruserid text,
suserid text,
timestamp int,
PRIMARY KEY (ruserid, suserid, packageuuid, timestamp)
);
and then I create a materialized view:
CREATE MATERIALIZED VIEW test.packages_by_userid
AS SELECT * FROM test.packages
WHERE ruserid IS NOT NULL
AND suserid IS NOT NULL
AND TIMESTAMP IS NOT NULL
AND packageuuid IS NOT NULL
PRIMARY KEY (ruserid, suserid, timestamp, packageuuid)
WITH CLUSTERING ORDER BY (packageuuid DESC);
I want to be able to search for packages sent between two IDs
so I would need something like this:
SELECT * FROM test.packages_by_userid WHERE (ruserid = '1' AND suserid = '2' AND suserid = '1' AND ruserid = '2') AND timestamp > 1496601553;
How would I accomplish something like this with CQL?
I've searched a bit but I can't figure it out.
I'm willing to change the structure of the table if it will make something like this possible.
If it's doable without a materialized view that would also be good.
Use an IN clause:
SELECT * FROM test.packages_by_userid WHERE ruserid IN ( '1', '2') AND suserid IN ( '1','2') AND timestamp > 1496601553;
Note: keep the IN clause small. A large IN clause across partitions can cause GC pauses and heap pressure that lead to overall slower performance.
In practical terms this means you're waiting on a single coordinator node to give you a response; it's keeping all those queries and their responses in the heap, and if one of those queries fails, or the coordinator fails, you have to retry the whole thing.
If the multi-partition IN clause gets large, try using a separate query for each partition (ruserid) with executeAsync:
SELECT * FROM test.packages_by_userid WHERE ruserid = '1' AND suserid IN ( '1','2') AND timestamp > 1496601553;
SELECT * FROM test.packages_by_userid WHERE ruserid = '2' AND suserid IN ( '1','2') AND timestamp > 1496601553;
Learn more: https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/
Since you always search for both sender and receiver, I'd model this with the following table layout:
CREATE TABLE test.packages (
ruserid text,
suserid text,
timestamp int,
packageuuid timeuuid,
PRIMARY KEY ((ruserid, suserid), timestamp)
);
In this way, for each pair of sender/receiver you need to run two queries, one for each partition:
SELECT * FROM packages WHERE ruserid='1' AND suserid='2' AND timestamp > 1496601553;
SELECT * FROM packages WHERE ruserid='2' AND suserid='1' AND timestamp > 1496601553;
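For completeness, writing a package under this layout would look like the sketch below; the literal values are made up, and now() is just used to generate the timeuuid:

INSERT INTO test.packages (ruserid, suserid, timestamp, packageuuid)
VALUES ('1', '2', 1496601553, now());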
This is IMHO the best solution because, remember, in Cassandra you start from your queries and build your table models on that, never the reverse.
Using Postgres 9.4
I have a posts table which relates to a users table. I'm querying for two users and 3 of their most recent posts.
SELECT
"users"."id" AS "id",
"posts"."id" AS "posts__id",
"posts"."created_at" AS "posts__created_at"
FROM (
SELECT * FROM accounts
WHERE TRUE
ORDER BY "id" ASC
LIMIT 2
) AS "users"
LEFT JOIN LATERAL (
SELECT * FROM posts
WHERE "users".id = posts.author_id
ORDER BY "created_at" DESC, "id" DESC
LIMIT 3
) AS "posts" ON "users".id = "posts".author_id
On Mac, the order is as expected.
"2016-04-17 18:49:15.942"
"2016-04-15 03:29:31.212"
"2016-04-13 15:07:15.119"
I get descending order on created_at, which is a timestamptz. However, when run on my Travis build, which is Ubuntu, the ordering is stable, but neither ascending nor descending...
"2016-04-15 03:29:31.212"
"2016-04-13 15:07:15.119"
"2016-04-17 18:49:15.942"
I made sure to create the databases with the same LC_COLLATE = en_US.UTF-8, with no luck. Why on earth isn't the ordering working on Travis?
To solve this, just add an ORDER BY clause after your existing statements above, i.e.:
SELECT
"users"."id" AS "id",
"posts"."id" AS "posts__id",
"posts"."created_at" AS "posts__created_at"
FROM (
SELECT * FROM accounts
WHERE TRUE
ORDER BY "id" ASC
LIMIT 2
) AS "users"
LEFT JOIN LATERAL (
SELECT * FROM posts
WHERE "users".id = posts.author_id
ORDER BY "created_at" DESC, "id" DESC
LIMIT 3
) AS "posts" ON "users".id = "posts".author_id
order by posts.created_at desc
The order of output in Postgres (and many other DBMSs) cannot be guaranteed without an ORDER BY clause.
While you do indeed have ORDER BY clauses, they are within the sub-queries; you need an ORDER BY on the outer query.
You may need to order the outer query too, because the join between the two inner queries, even when each of them is ordered, is not guaranteed to preserve that order.
SELECT
"users"."id" AS "id",
"posts"."id" AS "posts__id",
"posts"."created_at" AS "posts__created_at"
FROM (
SELECT * FROM accounts
WHERE TRUE
ORDER BY "id" ASC
LIMIT 2
) AS "users"
LEFT JOIN LATERAL (
SELECT * FROM posts
WHERE "users".id = posts.author_id
ORDER BY "created_at" DESC, "id" DESC
LIMIT 3
) AS "posts" ON "users".id = "posts".author_id
order by "posts"."created_at" DESC
This is because the actual sort order depends on both the order of id in the first table and the order of created_at and id in the second one prior to joining them, so the order of the first table can produce unexpected results in the joined output.
To fix the sort order, you should sort the final result set by the relevant columns as well.
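For example, if you also want each user's posts grouped together in the output, the final ORDER BY of the query above could be extended along these lines (just a sketch, not tested against your schema):

order by "users"."id" ASC, "posts"."created_at" DESC, "posts"."id" DESC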
I work in the finance department of a company and want to extract data from our server (SQL) to put together reports for the Board.
So I have several Excel files where I use an Excel query to retrieve data and make presentations.
I have been upgrading my queries and hit an obstacle with this one.
It was working fine with this bit of code:
SELECT
INTERNAL_REFERENCE as ref,
CMP_CODE AS CMP_CODE,
COUNTERPARTY_CODE AS BANK_CODE,
TRANSACTION_CODE AS TRANS_CODE,
CAST(CONVERT(varchar(10), AMO_END_DATE, 110) AS datetime) AS DATE,
SUM([AMORTIZATION]) AS AMOUNT
FROM
[SAGE_MTC_FRP].[dbo].[LOAN_SCHEDULE] inner join [SAGE_MTC_FRP].[dbo].[LOANS]
on [SAGE_MTC_FRP].[dbo].[LOAN_SCHEDULE].LOAN_ID=[SAGE_MTC_FRP].[dbo].[LOANS].LOAN_ID
WHERE
(AMO_END_DATE>=?) AND (BOOK_DATE<?) AND
(TRANSACTION_CODE<>'CPCA' AND TRANSACTION_CODE<>'CPCF' AND TRANSACTION_CODE<>'RENT') AND
IS_DELETED=0 AND VERSION_NUMBER=1 AND CMP_CODE='MTG'
GROUP BY
INTERNAL_REFERENCE, CMP_CODE, COUNTERPARTY_CODE, TRANSACTION_CODE, AMO_END_DATE
But when I put all the code together with the UNION ALL, it blows up with these two errors:
invalid number of parameters
invalid descriptor index
Code:
SELECT * FROM
(
SELECT
INTERNAL_REFERENCE as ref,
CMP_CODE AS CMP_CODE,
COUNTERPARTY_CODE AS BANK_CODE,
TRANSACTION_CODE AS TRANS_CODE,
CAST(CONVERT(varchar(10), AMO_END_DATE, 110) AS datetime) AS DATE,
SUM([AMORTIZATION]) AS AMOUNT
FROM
[SAGE_MTC_FRP].[dbo].[LOAN_SCHEDULE] inner join [SAGE_MTC_FRP].[dbo].[LOANS]
on [SAGE_MTC_FRP].[dbo].[LOAN_SCHEDULE].LOAN_ID=[SAGE_MTC_FRP].[dbo].[LOANS].LOAN_ID
WHERE
(AMO_END_DATE>=?) AND (BOOK_DATE<?) AND
(TRANSACTION_CODE<>'CPCA' AND TRANSACTION_CODE<>'CPCF' AND TRANSACTION_CODE<>'RENT') AND
IS_DELETED=0 AND VERSION_NUMBER=1 AND CMP_CODE='MTG'
GROUP BY
INTERNAL_REFERENCE, CMP_CODE, COUNTERPARTY_CODE, TRANSACTION_CODE, AMO_END_DATE
UNION ALL
SELECT
CL_CODE as ref
,LEFT([ACC_CODE] , 3) AS CMP_CODE
,[COUNTERPARTY_CODE] AS BANK_CODE
,RIGHT([CL_DESCRIPTION] , 3) AS TRANS_CODE
,CAST(CONVERT(varchar(10), [END_DATE], 110) AS datetime) AS DATE
,[CL_AMOUNT] AS AMOUNT
FROM
[SAGE_MTC_FRP].[dbo].[CREDIT_LINES]
WHERE
(END_DATE>?) AND
RIGHT([CL_DESCRIPTION] , 3)='PPC'
) AS DATA
ORDER BY REF
If both queries run by themselves, you could use something like:
DECLARE @result TABLE(
[ref] NVARCHAR(50),
[CMP_CODE] NVARCHAR(50),
[BANK_CODE] NVARCHAR(50),
[TRANS_CODE] NVARCHAR(50),
[Date] DATE,
[AMOUNT] DECIMAL(10,2)
)
INSERT INTO @result
( ref ,
CMP_CODE ,
BANK_CODE ,
TRANS_CODE ,
Date ,
AMOUNT
)
SELECT
INTERNAL_REFERENCE as ref,
CMP_CODE AS CMP_CODE,
COUNTERPARTY_CODE AS BANK_CODE,
TRANSACTION_CODE AS TRANS_CODE,
CAST(CONVERT(varchar(10), AMO_END_DATE, 110) AS datetime) AS DATE,
SUM([AMORTIZATION]) AS AMOUNT
FROM
[SAGE_MTC_FRP].[dbo].[LOAN_SCHEDULE] inner join [SAGE_MTC_FRP].[dbo].[LOANS]
on [SAGE_MTC_FRP].[dbo].[LOAN_SCHEDULE].LOAN_ID=[SAGE_MTC_FRP].[dbo].[LOANS].LOAN_ID
WHERE
(AMO_END_DATE>=?) AND (BOOK_DATE<?) AND
(TRANSACTION_CODE<>'CPCA' AND TRANSACTION_CODE<>'CPCF' AND TRANSACTION_CODE<>'RENT') AND
IS_DELETED=0 AND VERSION_NUMBER=1 AND CMP_CODE='MTG'
GROUP BY
INTERNAL_REFERENCE, CMP_CODE, COUNTERPARTY_CODE, TRANSACTION_CODE, AMO_END_DATE
INSERT INTO @result
( ref ,
CMP_CODE ,
BANK_CODE ,
TRANS_CODE ,
Date ,
AMOUNT
)
SELECT
CL_CODE as ref
,LEFT([ACC_CODE] , 3) AS CMP_CODE
,[COUNTERPARTY_CODE] AS BANK_CODE
,RIGHT([CL_DESCRIPTION] , 3) AS TRANS_CODE
,CAST(CONVERT(varchar(10), [END_DATE], 110) AS datetime) AS DATE
,[CL_AMOUNT] AS AMOUNT
FROM
[SAGE_MTC_FRP].[dbo].[CREDIT_LINES]
WHERE
(END_DATE>?) AND
RIGHT([CL_DESCRIPTION] , 3)='PPC'
SELECT * FROM @result ORDER BY ref
I realize this is ancient, but I ran across the problem today.
To test, I removed the parameter and found that it was a security issue. I granted the user EXECUTE permission on the SQL procedure, and it worked without the parameter. When I re-introduced the parameter, it continued working.
I'm far from a security expert, so I'm sure there's a better way to grant access, but this got me past the problem for now. Feel free to suggest a better solution.
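For reference, the kind of grant I mean looks like the line below; the procedure and user names here are placeholders, not the actual ones from my environment:

GRANT EXECUTE ON OBJECT::dbo.MyReportProcedure TO [ReportingUser];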