Azure Stream Analytics to Cosmos DB

I'm having trouble saving telemetry coming from Azure IoT Hub to Cosmos DB. I have the following setup:
IoT Hub - for events aggregation
Azure Stream Analytics - for event stream processing
Cosmos DB with Table API. Here I created 1 table.
The sample message from IoT Hub:
{"id":33,"deviceId":"test2","cloudTagId":"cloudTag1","value":24.79770721657087}
The query in stream analytics which processes the events:
SELECT
concat(deviceId, cloudtagId) as telemetryid, value as temperature, id, deviceId, 'asd' as '$pk', deviceId as PartitionKey
INTO
[TableApiCosmosDb]
From
[devicesMessages]
The problem is the following: every time the job tries to save the output to Cosmos DB, I get the error "An error occurred while preparing data for DocumentDB. The output record does not contain the column '$pk' to use as the partition key property by DocumentDB".
Note: I added the $pk and PartitionKey columns while trying to solve the problem.
EDIT: Here is the output configuration (screenshot):
Does anyone know what I'm doing wrong?

Unfortunately, the Cosmos DB Table API is not yet supported as an output sink for ASA.
If you want to use Table as an output, you can use the Table service under a Storage Account.
Sorry for the inconvenience.
We will add the Cosmos DB Table API in the future.
Thanks!
JS - Azure Stream Analytics team
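If you do go the Storage Account Table route, note that the Table output asks you to name the partition key and row key columns in its configuration, so the query just has to project columns you can point that configuration at. A rough sketch, assuming a placeholder output alias of [TableStorageOutput]:
SELECT
    deviceId AS PartitionKey, CAST(id AS nvarchar(max)) AS RowKey, value AS temperature, cloudTagId
INTO
    [TableStorageOutput]
FROM
    [devicesMessages]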

I had this problem also. Although it isn't clear in the UI, only the SQL API for Cosmos DB is currently supported. I switched over to that and everything worked fantastically.
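For reference, with the SQL API output none of the '$pk' / PartitionKey workarounds are needed; the partition key lives on the Cosmos DB container, so a query along these lines is enough (the output alias [SqlApiCosmosDb] is a placeholder):
SELECT
    CONCAT(deviceId, cloudTagId) AS telemetryid, value AS temperature, id, deviceId
INTO
    [SqlApiCosmosDb]
FROM
    [devicesMessages]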

Try with
SELECT
concat(deviceId, cloudtagId) as telemetryid, value as temperature, id, deviceId, 'asd' as 'pk', deviceId as PartitionKey
INTO
[TableApiCosmosDb]
From
[devicesMessages]
The special character is the problem.
If you create the output with 'id' as the partition key but the insert query uses 'deviceId' as the PartitionKey, it will not partition correctly.
Example:
SELECT
id as PartitionKey, SUM(CAST(temperature AS float)) AS temperaturesum ,AVG(CAST(temperature AS float)) AS temperatureavg
INTO streamout
FROM
Streaminput TIMESTAMP by Time
GROUP BY
id ,
TumblingWindow(second, 60)

Related

Writing to Azure Table Storage - different behaviour locally and in the cloud

I have a simple Azure Function that periodically writes some data into Azure Table Storage.
var storageAccount = new CloudStorageAccount(new Microsoft.WindowsAzure.Storage.Auth.StorageCredentials("mystorage","xxxxx"),true);
var tableClient = storageAccount.CreateCloudTableClient();
myTable = tableClient.GetTableReference("myData");
TableOperation insertOperation = TableOperation.Insert(data);
myTable.ExecuteAsync(insertOperation);
The code runs well locally in Visual Studio and all data is written correctly into the Azure located Table Storage.
But if I deploy this code 1:1 to Azure as an Azure Function, it also runs without any exception, and logging shows it runs through every line of code.
But no data is written in the Table Storage - same name, same credentials, same code.
Is Azure blocking this connection (Azure Function in Azure > Azure Table Storage) in some way, in contrast to local Azure Function > Azure Table Storage?
Is Azure blocking this connection (Azure Function in Azure > Azure Table Storage) in some way, in contrast to local Azure Function > Azure Table Storage?
No, it's not Azure blocking the connection or anything of that sort.
You have to await the table operation you run with ExecuteAsync, because control moves on in the program before that method has completed. Change your last line of code to
await myTable.ExecuteAsync(insertOperation);
This is exactly the case the compiler warns about: "Because this call is not awaited, execution of the current method continues before the call is completed."
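Put together, the function body would look roughly like this; a sketch assuming a timer-triggered C# function and reusing the snippet from the question (data is whatever entity you were already inserting), the key change being that the method is async and the insert is awaited:
public static async Task Run(TimerInfo myTimer, ILogger log)
{
    var storageAccount = new CloudStorageAccount(new Microsoft.WindowsAzure.Storage.Auth.StorageCredentials("mystorage", "xxxxx"), true);
    var tableClient = storageAccount.CreateCloudTableClient();
    var myTable = tableClient.GetTableReference("myData");
    TableOperation insertOperation = TableOperation.Insert(data);
    // Awaiting keeps the function alive until the insert has actually completed.
    await myTable.ExecuteAsync(insertOperation);
}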
The problem was the RowKey:
I used DateTime.Now for the RowKey (since auto-increment values are not provided by Table Storage).
My local format was "1.1.2019 18:19:20" while the server's format was "1/1/2019 ...",
and "/" is not allowed in the RowKey string.
Now that the DateTime string is formatted correctly, everything works fine.
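One way to sidestep both the culture issue and the forbidden characters ('/', '\', '#' and '?' are all rejected in PartitionKey/RowKey values) is to build the RowKey from an invariant, sortable timestamp; a small sketch (needs System.Globalization):
// Invariant, lexically sortable, and free of characters Table Storage rejects.
string rowKey = DateTime.UtcNow.ToString("yyyyMMddHHmmssfff", CultureInfo.InvariantCulture);
// Alternative if you want the newest rows first within a partition:
string descendingRowKey = (DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks).ToString("D19");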

How to use CosmosDb with partition key as a Stream Analytics output?

I'm setting up CosmosDb with a partition key as a Stream Analytics Job output and the connection test fails with the following error:
Error connecting to Cosmos DB Database: Invalid or no matching collections found with collection pattern 'containername/{partition}'. Collections must exist with case-sensitive pattern in increasing numeric order starting with 0.
NOTE: I'm using Cosmos DB with the SQL API, and the configuration is done through portal.azure.com.
I have confirmed I can manually insert documents into the DocumentDB through the portal Data Explorer. Those inserts succeed and the partition key value is correctly identified.
I set up the Cosmos container like this
Database Id: testdb
Container id: containername
Partition key: /partitionkey
Throughput: 1000
I set up the Stream Analytics Output like this
Output Alias: test-output-db
Subscription: My-Subscription-Name
Account id: MyAccountId
Database -> Use Existing: testdb
Collection name pattern: containername/{partition}
Partition Key: partitionkey
Document id:
When testing the output connection I get a failure and the error listed above.
I received a response from Microsoft support that specifying the partition via the "{partition}" token pattern is no longer supported by Azure Stream Analytics. Furthermore, writing to multiple containers from ASA has been deprecated in general. Now, if ASA outputs to a Cosmos DB container with a partition key configured, Cosmos DB should automatically take care of partitioning on its side.
after discussion with our ASA developer/product group team, the collection pattern such as MyCollection{partition} or MyCollection/{partition} is no longer supported. Writing to multiple fixed containers is being deprecated and it is not the recommended approach for scaling out the Stream Analytics job [...] In summary, you can define the collection name simply as "apitraffic". You don't need to specify any partition key as we detect it automatically from Cosmos DB.
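Applied to the configuration from the question, the output would reduce to roughly this (my reading of the support reply, not an official template):
Output Alias: test-output-db
Database -> Use Existing: testdb
Collection name pattern: containername
Partition Key: (leave empty - detected from the container)
Document id: (optional)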

Filter data from Azure Table Storage in Data Factory v2

I am new to Azure Data Factory v2. We have a table in Azure Table Storage, and I am able to load all of the data into an Azure SQL database by using the Copy Data option.
But what I would like to achieve is to filter the data in Table Storage by the field Status, which is an integer field. I tried some examples from the Microsoft website, but every time I get a bad syntax error when I run the pipeline.
So here is what I tried: in the source tab I chose my data store as the source dataset, with the source table documentStatus. Then I clicked on 'use query' and put this line in:
"azureTableSourceQuery": "$$Text.Format('Status = 2')"
But when I run this I get this error: The remote server returned an error: (400) Bad Request.
Can anybody help me with writing a correct query so I can filter my source on this status field?
Thanks
Please set "azureTableSourceQuery": "Status eq 2".
Please refer to the documentation on the Azure Table Storage filter syntax.
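In the copy activity JSON this filter sits on the source; a minimal sketch (everything except azureTableSourceQuery is a placeholder for your own pipeline):
"source": {
    "type": "AzureTableSource",
    "azureTableSourceQuery": "Status eq 2"
}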

Preserve last value with Azure Stream Analytics

We are collecting device events via IoT Hub which are then processed with Stream Analytics. We want to generate a status overview containing the last value of every measurement. The status is then written to a CosmosDB output, one document per device.
The simplified query looks like this:
SELECT
device_id as id,
LAST(value) OVER (PARTITION BY device_id LIMIT DURATION(day, 1) WHEN name = 'battery_status') AS battery_status
INTO status
FROM iothub
The resulting document should be (also simplified):
{
"id": "8c03b6cef760",
"battery_status": 95
}
The problem is that not all events contain a battery_status and whenever the last event with battery_status is older than the specified duration, the last value in the CosmosDB document is overwritten with NULL.
What I would need is some construct to omit the value entirely when there is no data and consequently preserve the last value in the output document. Any ideas how I could achieve this?
Currently, Azure Stream Analytics does not support partitioning your output to Cosmos DB per device.
There are two options to work around this.
You can use an Azure Function. In the function you can create an IoT Hub trigger, filter the data on the battery_status property, and then store the data in Cosmos DB per device programmatically (see the sketch after this list).
You can use an Azure Storage container instead of Cosmos DB, and then configure the Azure Storage container as an endpoint and message route in Azure IoT Hub; please refer to IoT Hub Endpoints and the tutorial on how to save IoT Hub messages that contain sensor data to Azure Blob storage. In the route configuration, you can add a query string to filter the data.
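A minimal sketch of the first option, assuming a C# function with an IoT Hub (Event Hubs-compatible) trigger and a Cosmos DB output binding; the hub path, database, collection and connection setting names are all placeholders:
[FunctionName("BatteryStatusToCosmos")]
public static void Run(
    [IoTHubTrigger("messages/events", Connection = "IoTHubConnection")] EventData message,
    [CosmosDB("statusdb", "status", ConnectionStringSetting = "CosmosConnection")] out dynamic document,
    ILogger log)
{
    // Parse the device event (needs Newtonsoft.Json.Linq and System.Text).
    var body = JObject.Parse(Encoding.UTF8.GetString(message.Body.Array));
    document = null; // emitting null means no document is written
    if ((string)body["name"] == "battery_status")
    {
        // One document per device: use the device id as the document id so the latest value replaces the previous one.
        document = new { id = (string)body["device_id"], battery_status = (double)body["value"] };
    }
}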
If I understand your problem correctly, you can just add a WHERE condition to filter out non-battery_status data. You can write multiple queries to process your data and handle battery event data separately.
Sample Input
1.
{
"device_id": "8c03b6cef760",
"name": "battery_status",
"value": 67
}
2.
{
"device_id": "8c03b6cef760",
"name": "cellular_connectivity",
"value": 67
}
Output
{
"id": "8c03b6cef760",
"battery_status": 67,
"_rid": "vYYFAIRr5b8LAAAAAAAAAA==",
"_self": "dbs/vYYFAA==/colls/vYYFAIRr5b8=/docs/vYYFAIRr5b8LAAAAAAAAAA==/",
"_etag": "\"8d001092-0000-0000-0000-5b7ffe8e0000\"",
"_attachments": "attachments/",
"_ts": 1535114894
}
ASA Query
SELECT
device_id as id,
LAST(value) OVER (PARTITION BY device_id LIMIT DURATION(day, 1) WHEN name = 'battery_status') AS battery_status
INTO status
FROM iothub
WHERE name = 'battery_status'
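If you also need the other measurements, the same pattern can be repeated as an additional statement in the job (a sketch; statuscellular is a made-up second output alias):
SELECT
device_id as id,
LAST(value) OVER (PARTITION BY device_id LIMIT DURATION(day, 1) WHEN name = 'cellular_connectivity') AS cellular_connectivity
INTO statuscellular
FROM iothub
WHERE name = 'cellular_connectivity'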

In Azure Event Hub, how to send incoming data to a SQL database

I have some data being collected that is in an xml format. Something that looks like
<OLDI_MODULE xmlns="">
<StStoHMI_IBE>
<PRack>0</PRack>
<PRackSlotNo>0</PRackSlotNo>
<RChNo>0</RChNo>
<RChSlotNo>0</RChSlotNo>
This data is sent to Azure Event Hub. I want to send this data to a SQL database. I created a job in Azure Stream Analytics that takes this input and puts it in a SQL database. But when the input format is asked for the input stream, there are only JSON, CSV and Avro. Which of these formats can I use? Or which of the Azure services should I use to move data from Event Hub to a SQL database?
By far the easiest option is to use Azure Stream Analytics as you intended to do. But yes, you will have to convert the XML to JSON or another supported format before you can use the data.
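For that conversion step, Json.NET's XML support is usually enough; a sketch assuming you can change the sender to transform the payload before it goes to the Event Hub (xmlPayload is a placeholder for the raw XML string):
// Requires Newtonsoft.Json and System.Xml.Linq.
XDocument doc = XDocument.Parse(xmlPayload);
string json = JsonConvert.SerializeXNode(doc, Newtonsoft.Json.Formatting.None, omitRootObject: false);
// Send 'json' to the Event Hub instead of the raw XML so Stream Analytics can read it as JSON.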
The other option is more complex and requires some code plus a way to host it (using a worker role or WebJob, for instance), but it gives the most flexibility. That option is to use an EventProcessor to read the data from the Event Hub and put it in a database.
See https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/ for how to set this up.
The main work is done in the Task IEventProcessor.ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages) method. Based on the example it will be something like:
async Task IEventProcessor.ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
{
    foreach (EventData eventData in messages)
    {
        string xmlData = Encoding.UTF8.GetString(eventData.GetBytes());
        // Parse the xml and store the data in db using Ado.Net or whatever you're comfortable with
    }
    // Call checkpoint every 5 minutes, so that the worker can resume processing from 5 minutes back if it restarts.
    if (this.checkpointStopWatch.Elapsed > TimeSpan.FromMinutes(5))
    {
        await context.CheckpointAsync();
        this.checkpointStopWatch.Restart();
    }
}
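The parsing-and-storing part that the comment glosses over could, for example, look like this (a sketch using System.Xml.Linq and plain ADO.NET; the connection string, table and columns are invented):
var doc = System.Xml.Linq.XDocument.Parse(xmlData);
var module = doc.Root.Element("StStoHMI_IBE");
using (var conn = new System.Data.SqlClient.SqlConnection(sqlConnectionString))
{
    await conn.OpenAsync();
    var cmd = new System.Data.SqlClient.SqlCommand(
        "INSERT INTO Telemetry (PRack, PRackSlotNo) VALUES (@PRack, @PRackSlotNo)", conn);
    cmd.Parameters.AddWithValue("@PRack", (int)module.Element("PRack"));
    cmd.Parameters.AddWithValue("@PRackSlotNo", (int)module.Element("PRackSlotNo"));
    await cmd.ExecuteNonQueryAsync();
}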
JSON would be a good data format to use in Azure Event Hub. Once you receive the data in Azure Event Hub, you can use Azure Stream Analytics to move the data to a SQL DB.
Azure Stream Analytics consists of 3 parts: input, query and output, where the input is the Event Hub and the output is the SQL DB. The query should be written by you to select the desired fields and output them.
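Once the events arrive as JSON, the query itself can be as small as a pass-through of the fields you need; a sketch (the input/output aliases and field names are placeholders matching the sample above):
SELECT
    PRack, PRackSlotNo, RChNo, RChSlotNo
INTO
    [sqldboutput]
FROM
    [eventhubinput]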
Check out the below article:
https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-define-outputs/
Stream Analytics is the Azure resource you should look into for moving the data from Event Hub.
