Azure Stream Analytics query language: get value by key from an array of key-value pairs

I am trying to extract a specific value from an array property in the Stream Analytics query language.
My data looks as follows:
"context": {
"custom": {
"dimensions": [{
"MacAddress": "ma"
},
{
"IpAddress": "ipaddr"
}]
}
}
I am trying to obtain a result that has "MacAddress", "IpAddress" as column titles and "ma", "ipaddr" as rows.
I am currently achieving this with the following query:
SELECT
GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 0), 'MacAddress') AS MacAddress,
GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 1), 'IpAddress') AS IpAddress
FROM
[ffapi-track-events] as MySource
I am trying to use CROSS APPLY, but so far no luck. Below is the CROSS APPLY query:
SELECT
flat.ArrayValue.MacAddress as MacAddress,
flat.ArrayValue.IpAddress as IpAddress
FROM
[ffapi-track-events] as MySource
CROSS APPLY GetArrayElements(MySource.context.custom.dimensions) as flat
This one produces two rows instead of one:
MacAddress, IpAddress
ma ,
, ipaddr
so when written this way, I am missing precisely the flattening step.
I would like to avoid hardcoding the indices 0 and 1, as it's not guaranteed that "MacAddress" won't switch places with "IpAddress". So I need something like a find-element-in-array-by-condition function, or some means of joining with the dimensions array.
Is there such thing?
Thank you.
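Lacking a built-in find-element-by-condition function, one workaround is to flatten with CROSS APPLY and then collapse the rows back with aggregation. This is only a sketch: GetRecordPropertyValue() returns NULL when the key is absent, so MAX() picks up each value regardless of its array position; the per-event grouping key (the EventId field below) is a hypothetical name you would replace with a real unique field from your payload.

```sql
-- Sketch: flatten the dimensions array, then collapse back to one row per event.
-- GetRecordPropertyValue returns NULL when the key is missing, and MAX ignores
-- NULLs, so each column takes its value from whichever element holds that key.
SELECT
    MAX(GetRecordPropertyValue(flat.ArrayValue, 'MacAddress')) AS MacAddress,
    MAX(GetRecordPropertyValue(flat.ArrayValue, 'IpAddress')) AS IpAddress
FROM
    [ffapi-track-events] AS MySource
CROSS APPLY GetArrayElements(MySource.context.custom.dimensions) AS flat
GROUP BY
    MySource.EventId,      -- hypothetical per-event key; substitute your own
    System.Timestamp()     -- snapshot window: groups rows from the same event
```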

Related

Query to get all Cosmos DB documents referenced by another

Assume I have the following Cosmos DB container with the possible doc type partitions:
{
"id": <string>,
"partitionKey": <string>, // Always "item"
"name": <string>
}
{
"id": <string>,
"partitionKey": <string>, // Always "group"
"items": <array[string]> // Always an array of ids for items in the "item" partition
}
I have the id of a "group" document, but I do not have the document itself. What I would like to do is perform a query which gives me all "item" documents referenced by the "group" document.
I know I can perform two queries: 1) Retrieve the "group" document, 2) Perform a query with IN clause on the "item" partition.
As I don't care about the "group" document other than getting the list of ids, is it possible to construct a single query to get me all the "item" documents I want with just the "group" document id?
You'll need to perform two queries, as there are no joins between separate documents. Even though there is support for subqueries, only correlated subqueries are currently supported (meaning, the inner subquery is referencing values from the outer query). Non-correlated subqueries are what you'd need.
Note that, even though you don't care about the rest of the group document, you don't need to retrieve it in its entirety: you can project just the items property, which can then be used in your second query with array_contains(). Something like:
SELECT VALUE g.items
FROM g
WHERE g.id="1"
AND g.partitionKey="group"
SELECT VALUE i.name
FROM i
WHERE array_contains(<items-from-prior-query>,i.id)
AND i.partitionKey="item"
This documentation page clarifies the two subquery types and the fact that only correlated subqueries are supported.
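To make the distinction concrete, the single query you are asking for would have the following shape, and it is exactly the non-correlated form that is not supported (illustrative only, using the ids from the two-query example above):

```sql
-- NOT currently supported: the inner subquery never references the outer `i`,
-- which makes it non-correlated.
SELECT VALUE i.name
FROM i
WHERE i.partitionKey = "item"
AND ARRAY_CONTAINS(
    (SELECT VALUE g.items FROM g WHERE g.id = "1" AND g.partitionKey = "group"),
    i.id)
```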

How to extract and flatten a JSON array as well as specify an Array Value for 'TIMESTAMP BY' in Stream Analytics Query

I got the following input stream data to Stream Analytics.
[
{
"timestamp": 1559529369274,
"values": [
{
"id": "SimCh01.Device01.Ramp1",
"v": 39,
"q": 1,
"t": 1559529359833
},
{
"id": "SimCh01.Device01.Ramp2",
"v": 183.5,
"q": 1,
"t": 1559529359833
}
],
"EventProcessedUtcTime": "2019-06-03T02:37:29.5824231Z",
"PartitionId": 3,
"EventEnqueuedUtcTime": "2019-06-03T02:37:29.4390000Z",
"IoTHub": {
"MessageId": null,
"CorrelationId": null,
"ConnectionDeviceId": "ew-IoT-01-KepServer",
"ConnectionDeviceGenerationId": "636948080712635859",
"EnqueuedTime": "2019-06-03T02:37:29.4260000Z",
"StreamId": null
}
}
]
I am trying to extract the "values" array and use the "t" field within each array element for TIMESTAMP BY.
I was able to write a simple SAQL statement in Stream Analytics to read the input and route it to the output. However, I am only interested in the "values" array above.
This is my first attempt. The job rejects my TIMESTAMP BY clause when I try to restart the Stream Analytics job:
SELECT
KepValues.ArrayValue.id,
KepValues.ArrayValue.v,
KepValues.ArrayValue.q,
KepValues.ArrayValue.t
INTO
[PowerBI-DS]
FROM
[IoTHub-Input] as event
CROSS APPLY GetArrayElements(event.[values]) as KepValues
TIMESTAMP BY KepValues.ArrayValue.t
==============================================================================
This is my second attempt. The job still rejects my TIMESTAMP BY clause:
With [PowerBI-Modified-DS] As (
SELECT
KepValues.ArrayValue.id as ID,
KepValues.ArrayValue.v as V,
KepValues.ArrayValue.q as Q,
KepValues.ArrayValue.t as T
FROM
[IoTHub-Input] as event
CROSS APPLY GetArrayElements(event.[values]) as KepValues
)
SELECT
ID, V, Q, T
INTO
[PowerBI-DS]
FROM
[PowerBI-Modified-DS] TIMESTAMP BY T
After extraction, this is what I expect: a table with columns "id", "v", "q", "t", where each row holds a single array element, e.g.:
"SimCh01.Device01.Ramp1", 39, 1, 1559529359833
"SimCh01.Device01.Ramp2", 183.5, 1, 1559529359833
Update
Since then, I have modified the query as below, converting the Unix time t into a DATETIME:
With [PowerBI-Modified-DS] As (
SELECT
arrayElement.ArrayValue.id as ID,
arrayElement.ArrayValue.v as V,
arrayElement.ArrayValue.q as Q,
arrayElement.ArrayValue.t as TT
FROM
[IoTHub-Input] as iothubAlias
CROSS APPLY GetArrayElements(iothubAlias.[values]) as arrayElement
)
SELECT
ID, V, Q, DATEADD(millisecond, TT, '1970-01-01T00:00:00Z') as T
INTO
[SAJ-01-PowerBI]
FROM
[PowerBI-Modified-DS]
I managed to add DATEADD() to convert the Unix time into a DATETIME, aliased as T. Now how can I add TIMESTAMP BY? I did try adding it after [PowerBI-Modified-DS], but the editor complains that the query is invalid. What else can I do, or is this the best I can do? I understand I need to set TIMESTAMP BY so Power BI understands this is streaming data.
The TIMESTAMP BY clause in Stream Analytics requires a value to be of type DATETIME. String values conforming to ISO 8601 formats are supported. In your example, the value of 't' does not conform to this standard.
To use TIMESTAMP BY clause in your case, you will have to pre-process the data before sending it to Stream Analytics or change the source to create the event (specifically the field 't') using this format.
Stream Analytics assigns the timestamp before the query is executed, so the TIMESTAMP BY expression can only refer to fields in the input payload. You have two options:
You can have two ASA jobs: the first does the CROSS APPLY, and the second does the TIMESTAMP BY.
You can implement a custom deserializer in C# (sign up for preview access). That way you can have one job that uses your implementation to read incoming events: the deserializer converts the Unix time to a DateTime, and that field can then be used in your TIMESTAMP BY clause.
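Under the two-job option, the split might look like this (a sketch only; [Flattened-Events] is an assumed name for the intermediate output and input connecting the two jobs):

```sql
-- Job 1 (sketch): flatten the array and convert the Unix epoch milliseconds,
-- writing to an intermediate sink, e.g. an Event Hub named [Flattened-Events].
SELECT
    arrayElement.ArrayValue.id AS ID,
    arrayElement.ArrayValue.v  AS V,
    arrayElement.ArrayValue.q  AS Q,
    DATEADD(millisecond, arrayElement.ArrayValue.t, '1970-01-01T00:00:00Z') AS T
INTO [Flattened-Events]
FROM [IoTHub-Input] AS iothubAlias
CROSS APPLY GetArrayElements(iothubAlias.[values]) AS arrayElement

-- Job 2 (sketch): T is now a plain field of the input payload,
-- so TIMESTAMP BY is valid here.
SELECT ID, V, Q, T
INTO [PowerBI-DS]
FROM [Flattened-Events] TIMESTAMP BY T
```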

Cosmos: DISTINCT results with JOIN and ORDER BY

I'm trying to write a query that uses a JOIN to perform a geo-spatial match against locations in an array. I got it working, but added DISTINCT in order to de-duplicate (Query A):
SELECT DISTINCT VALUE
u
FROM
u
JOIN loc IN u.locations
WHERE
ST_WITHIN(
{'type':'Point','coordinates':[loc.longitude,loc.latitude]},
{'type':'Polygon','coordinates':[[[-108,-43],[-108,-40],[-110,-40],[-110,-43],[-108,-43]]]})
However, I then found that combining DISTINCT with continuation tokens isn't supported unless you also add ORDER BY:
System.ArgumentException: Distict query requires a matching order by in order to return a continuation token. If you would like to serve this query through continuation tokens, then please rewrite the query in the form 'SELECT DISTINCT VALUE c.blah FROM c ORDER BY c.blah' and please make sure that there is a range index on 'c.blah'.
So I tried adding ORDER BY like this (Query B):
SELECT DISTINCT VALUE
u
FROM
u
JOIN loc IN u.locations
WHERE
ST_WITHIN(
{'type':'Point','coordinates':[loc.longitude,loc.latitude]},
{'type':'Polygon','coordinates':[[[-108,-43],[-108,-40],[-110,-40],[-110,-43],[-108,-43]]]})
ORDER BY
u.created
The problem is, the DISTINCT no longer appears to be taking effect because it returns, for example, the same record twice.
To reproduce this, create a single document with this data:
{
"id": "b6dd3e9b-e6c5-4e5a-a257-371e386f1c2e",
"locations": [
{
"latitude": -42,
"longitude": -109
},
{
"latitude": -42,
"longitude": -109
}
],
"created": "2019-03-06T03:43:52.328Z"
}
Then run Query A above. You will get a single result, despite the fact that both locations match the predicate. If you remove the DISTINCT, you'll get the same document twice.
Now run Query B and you'll see it returns the same document twice, despite the DISTINCT clause.
What am I doing wrong here?
I reproduced your issue; based on my research, it appears to be a defect in Cosmos DB's DISTINCT query. Please refer to this feedback item: Provide support for DISTINCT.
This feature is broke in the data explorer. Because cosmos can only return 100 results per page at a time, the distinct keyword will only apply to a single page. So, if your result set contains more than 100 results, you may still get duplicates back - they will simply be on separately paged result sets.
You could describe your own situation and vote up this feedback case.
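As a possible workaround (a sketch, not something the answer above proposed): Cosmos DB does support correlated subqueries, so the JOIN can be rewritten as an EXISTS predicate. Each document is then emitted at most once, and DISTINCT is no longer needed at all:

```sql
SELECT VALUE u
FROM u
WHERE EXISTS (
    -- correlated subquery: references the outer u, so it is supported
    SELECT VALUE loc
    FROM loc IN u.locations
    WHERE ST_WITHIN(
        {'type':'Point','coordinates':[loc.longitude,loc.latitude]},
        {'type':'Polygon','coordinates':[[[-108,-43],[-108,-40],[-110,-40],[-110,-43],[-108,-43]]]}))
ORDER BY u.created
```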

Azure Document Query Sub Dictionaries

I have stored the following JSON document in the Azure Document DB:
{
"JobId": "04e63d1d-2af1-42af-a349-810f55817602",
"JobType": 3,
"Properties": [
{
"Key": "Value1",
"Value": "testing1"
},
{
"Key": "Value",
"Value": "testing2"
}
]
}
When I try to query the document back, I can easily perform:
Select f.id,f.Properties, C.Key from f Join C IN f.Properties where C.Key = 'Value1'
However, when I try to query:
Select f.id,f.Properties, C.Key from f Join C IN f.Properties where C.Value = 'testing1'
I get an error that the query cannot be computed. I assume this is because 'Value' is a reserved keyword in the query language.
I cannot rely on a specific order in the Properties array, because different subclasses can add different properties in different orders as they need them.
Does anybody have a suggestion for how I can still perform this query?
To escape keywords in DocumentDB, you can use the [] syntax. For example, the above query would be:
Select f.id,f.Properties, C.Key from f Join C IN f.Properties where C["Value"] = 'testing1'
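The same escaping works anywhere the reserved word appears, for example in the projection as well (illustrative):

```sql
Select f.id, C["Key"], C["Value"] from f Join C IN f.Properties where C["Value"] = 'testing1'
```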

Logic for reverse search result

I have a use case where, for a given result value, I want to reverse-look-up all the search conditions defined that would produce it as a result.
So, I have a set of search conditions defined in a table as key-value lists; each entry in this table is a search query. Now, I have an arbitrary value in a dataset which could be a result of any of the search entries defined in the table. I want to look up that table so that, for this value, I can get all the possible search queries in whose results this value would appear.
The search table consists of the fields search_conditions and search_table_id, among others.
Schema would be like
Search_Table
id (long)
search_table_id (long)
search_conditions (json array as text)
This is the value of one such search condition:
[
{
"key": "name",
"operator": "equals",
"value": "jeff"
},
{
"key": "age",
"operator": "between",
"value": [
20,
40
]
}
]
The value I have to search for can be, say, a random user: {"name": "mr x", "age": 12}.
This may not be exactly a technology-based question, but its solution may require technology. Any help will be appreciated. The main concern is optimization, as this has to be done in real time.
