Preserve last value with Azure Stream Analytics

We are collecting device events via IoT Hub which are then processed with Stream Analytics. We want to generate a status overview containing the last value of every measurement. The status is then written to a CosmosDB output, one document per device.
The simplified query looks like this:
SELECT
device_id as id,
LAST(value) OVER (PARTITION BY device_id LIMIT DURATION(day, 1) WHEN name = 'battery_status') AS battery_status
INTO status
FROM iothub
The resulting document should be (also simplified):
{
"id": "8c03b6cef760",
"battery_status": 95
}
The problem is that not all events contain a battery_status and whenever the last event with battery_status is older than the specified duration, the last value in the CosmosDB document is overwritten with NULL.
What I would need is some construct to omit the value entirely when there is no data and consequently preserve the last value in the output document. Any ideas how I could achieve this?

Currently, Azure Stream Analytics does not support partitioning your output to CosmosDB per device.
There are two options to work around this.
Option 1: Use an Azure Function. In the function you can create an IoT Hub trigger, filter the data on the battery_status property, and then store the data in CosmosDB per device programmatically.
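For illustration, here is a minimal Node.js sketch of such a function, assuming function.json declares an Event Hub (IoT Hub) trigger binding named IoTHubMessages with cardinality "many" and a Cosmos DB output binding named outputDocument; the binding names and the upsert-by-id behavior of the output binding are assumptions.
// index.js - sketch only: forward battery_status readings to Cosmos DB, one document per device
module.exports = function (context, IoTHubMessages) {
    var documents = [];
    IoTHubMessages.forEach(function (message) {
        // Only keep events that actually carry a battery_status value,
        // so existing values in Cosmos DB are never overwritten with null.
        if (message.name === 'battery_status') {
            documents.push({
                id: message.device_id,          // one document per device
                battery_status: message.value
            });
        }
    });
    if (documents.length > 0) {
        // Assumes the Cosmos DB output binding upserts documents by id.
        context.bindings.outputDocument = documents;
    }
    context.done();
};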
Option 2: Use an Azure Storage container instead of CosmosDB. Configure the storage container as a custom endpoint and add a message route in Azure IoT Hub; please refer to IoT Hub Endpoints and the tutorial on saving IoT Hub messages that contain sensor data to Azure Blob storage. In the route configuration you can add a query string to filter the data.
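For example, a route query along these lines would forward only the battery events to the storage endpoint (this assumes the device sends messages with contentType application/json and contentEncoding utf-8 so the body can be queried):
$body.name = 'battery_status'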

If I understand your problem correctly, you can simply add a WHERE condition to filter out non-battery_status data. You can write multiple queries to process your data, processing the battery event data separately.
Sample Input
1.
{
"device_id": "8c03b6cef760",
"name": "battery_status",
"value": 67
}
2.
{
"device_id": "8c03b6cef760",
"name": "cellular_connectivity",
"value": 67
}
Output
{
"id": "8c03b6cef760",
"battery_status": 67,
"_rid": "vYYFAIRr5b8LAAAAAAAAAA==",
"_self": "dbs/vYYFAA==/colls/vYYFAIRr5b8=/docs/vYYFAIRr5b8LAAAAAAAAAA==/",
"_etag": "\"8d001092-0000-0000-0000-5b7ffe8e0000\"",
"_attachments": "attachments/",
"_ts": 1535114894
}
ASA Query
SELECT
device_id as id,
LAST(value) OVER (PARTITION BY device_id LIMIT DURATION(day, 1) WHEN name = 'battery_status') AS battery_status
INTO status
FROM iothub
WHERE name = 'battery_status'

Related

Can't debug Azure Stream Analytics Cosmos DB output in VS Code

I am trying to build an Azure Stream Analytics job in VS Code using the Azure Stream Analytics Tools extension. I have added an event hub as an input and a data lake gen 2 storage account as an output and I can successfully run the job in VS Code using "Use Live Input and Live Output".
The issue I'm having is that when I try to set the output to an Azure Cosmos DB (DocumentDB) instead, I get the error "Failed to convert output 'cosmosdb': Unsupported data source type." when trying to use live input and live output. I can, however, successfully run the job using "Live Input and Local Output".
Is this a limitation of the VS Code extension, i.e. that you can't debug live output against Cosmos DB? Or have I set something up incorrectly in my Cosmos DB output? See the Cosmos DB output configuration below:
{
    "Name": "cosmosdb",
    "DataSourceType": "DocumentDB",
    "DocumentDbProperties": {
        "AccountId": "cosmosdb-dev-eastau-001",
        "AccountKey": null,
        "Database": "cosmosdb_db",
        "ContainerName": "container1",
        "DocumentId": ""
    },
    "DataSourceCredentialDomain": "xxxxxxxxxxxxxxxxxxxxxxxxxxxx.StreamAnalystics",
    "ScriptType": "Output"
}
For Live Input to Live Output mode, the only supported output adapters (for now) are Event Hub, Storage Account, and Azure SQL. https://learn.microsoft.com/en-us/azure/stream-analytics/visual-studio-code-local-run-all#local-run-modes

How to get all blobs (new or updated ones) using an Azure Functions timer trigger

I have a requirement where I need to get all the blobs that were updated or added after a particular time.
Example: in a container I have a list of zip files stored as blobs, and I need to get all the updated or newly added blobs in a given interval, e.g. after every hour I need to get all the blobs that were newly added or updated.
So I have used an Azure Function with a timer trigger, but I am not able to get all the (updated or newly added) blobs.
Could anyone let me know how I can solve this problem?
function.json file
{
    "bindings": [
        {
            "name": "myTimer",
            "type": "timerTrigger",
            "direction": "in",
            "schedule": "0 */2 * * * *"
        },
        {
            "type": "blob",
            "name": "myBlob",
            "path": "*****/******.zip",
            "connection": "***************",
            "direction": "in"
        }
    ],
    "disabled": false
}
index.js
module.exports = function (context, myTimer, myBlob) {
    context.log(myBlob);
    // it's also available on context.bindings
    context.log(context.bindings.myBlob); // will log the same thing as above
    context.done();
};
Thanks in Advance.
Ideally, Functions work best if you use them in a reactive way, i.e. when the Function runs on a blob change event directly (or via Event Grid).
If you have to stick to a timer and then find all changed blobs, Azure Functions bindings won't help you. In this case, remove the input binding you were trying to declare and search for changed blobs with the Blob Storage API. I believe the Azure Storage SDK for Node.js supports listing blobs, but I haven't used it.
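As a rough sketch of that approach (assuming the azure-storage npm package; the container name and the use of the AzureWebJobsStorage connection string are placeholders), the timer function could list the blobs and keep only those modified since the last run:
// index.js - sketch only: list blobs and filter by lastModified inside a timer-triggered function
var azure = require('azure-storage');

module.exports = function (context, myTimer) {
    var blobService = azure.createBlobService(process.env.AzureWebJobsStorage);
    var cutoff = new Date(Date.now() - 60 * 60 * 1000); // e.g. blobs changed in the last hour

    blobService.listBlobsSegmented('mycontainer', null, function (err, result) {
        if (err) {
            context.log.error(err);
            return context.done(err);
        }
        // For large containers, keep calling listBlobsSegmented with result.continuationToken.
        var changed = result.entries.filter(function (blob) {
            return new Date(blob.lastModified) > cutoff;
        });
        changed.forEach(function (blob) {
            context.log('New or updated blob:', blob.name);
        });
        context.done();
    });
};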
Your scenario is a good candidate for an Azure Event Grid solution (at the time of writing, in preview) with an event-driven blob storage publisher.
Basically, there is no limitation on the number of containers, blob storage accounts, or Azure subscriptions. If your blob storage is subscribed for the events of interest, such as a blob being created or deleted, the (optionally filtered) event message is delivered to the subscriber, for instance an EventGridTrigger Function.
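A minimal Node.js sketch of such a subscriber (assuming function.json contains an eventGridTrigger binding named eventGridEvent):
// index.js - sketch only: log blob created/deleted events delivered by Event Grid
module.exports = function (context, eventGridEvent) {
    // e.g. "Microsoft.Storage.BlobCreated" or "Microsoft.Storage.BlobDeleted"
    context.log('Event type:', eventGridEvent.eventType);
    context.log('Subject:', eventGridEvent.subject);
    context.log('Blob URL:', eventGridEvent.data && eventGridEvent.data.url);
    context.done();
};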
(Screenshot: an example of the event-driven blob storage subscriptions.)
(Screenshot: the function log showing an event message received when a blob was deleted.)
Note that the event message sent by the blob storage publisher can be filtered in the subscription based on the subject and/or eventType properties. In other words, each subscription tells Event Grid which events from the source it is interested in.
For streaming and analyzing the events, Event Grid can also be subscribed with an Event Hub as the subscriber:
(Screenshot: Event Hub as an Event Grid subscriber.)
All events of interest from the source are ingested into the Event Hub, which represents the entry point of the stream pipeline. The stream of events, such as the messages for created/deleted blobs across accounts and/or Azure subscriptions, is then analyzed by an ASA job based on your needs. The output of the ASA job can trigger an Azure Function to fulfill the business requirements.
More details about Event Hub as a destination for Event Grid can be found in the Azure documentation.

Azure Stream Analytics to Cosmos DB

I have trouble saving telemetry coming from Azure IoT Hub to Cosmos DB. I have the following setup:
IoT Hub - for events aggregation
Azure Stream Analytics - for event stream processing
Cosmos DB with Table API. Here I created 1 table.
The sample message from IoT Hub:
{"id":33,"deviceId":"test2","cloudTagId":"cloudTag1","value":24.79770721657087}
The query in stream analytics which processes the events:
SELECT
concat(deviceId, cloudtagId) as telemetryid, value as temperature, id, deviceId, 'asd' as '$pk', deviceId as PartitionKey
INTO
[TableApiCosmosDb]
From
[devicesMessages]
The problem is the following: every time the job tries to save the output to CosmosDB, I get an error: "An error occurred while preparing data for DocumentDB. The output record does not contain the column '$pk' to use as the partition key property by DocumentDB".
Note: I added the $pk column and PartitionKey while trying to solve the problem.
EDIT: (Screenshot: the output configuration.)
Does anyone know what I'm doing wrong?
Unfortunately, the Cosmos DB Table API is not yet supported as an output sink for ASA.
If you want to use a Table as output, you can use the one under a Storage Account (Azure Table storage).
Sorry for the inconvenience.
We will add the Cosmos DB Table API in the future.
Thanks!
JS - Azure Stream Analytics team
I had this problem as well. Although it isn't clear in the UI, only the SQL API for Cosmos DB is currently supported. I switched over to that and everything worked fantastically.
Try with
SELECT
concat(deviceId, cloudtagId) as telemetryid, value as temperature, id, deviceId, 'asd' as 'pk', deviceId as PartitionKey
INTO
[TableApiCosmosDb]
From
[devicesMessages]
The special character ('$') is the problem.
You created the output with 'id' as the partition key, but the insert query uses 'deviceId' as the PartitionKey; because of that mismatch it does not partition correctly.
Example:
SELECT
id as PartitionKey, SUM(CAST(temperature AS float)) AS temperaturesum ,AVG(CAST(temperature AS float)) AS temperatureavg
INTO streamout
FROM
Streaminput TIMESTAMP by Time
GROUP BY
id ,
TumblingWindow(second, 60)

In Azure Event Hub, how to send incoming data to a SQL database

I have some data being collected that is in an XML format. Something that looks like:
<OLDI_MODULE xmlns="">
<StStoHMI_IBE>
<PRack>0</PRack>
<PRackSlotNo>0</PRackSlotNo>
<RChNo>0</RChNo>
<RChSlotNo>0</RChSlotNo>
This data is sent to Azure Event Hub. I wanted to send this data to a SQL database. I created a job in Azure Stream Analytics that takes this input and puts it in a SQL database. But when the input format is asked for the input stream, the only options are JSON, CSV and Avro. Which of these formats can I use? Or which Azure services should I use to move data from Event Hub to a SQL database?
By far the easiest option is to use Azure Stream Analytics, as you intended. But yes, you will have to convert the XML to JSON or another supported format before you can use the data.
The other option is more complex, requires some code and a way to host that code (using a worker role or WebJob, for instance), but gives the most flexibility. That option is to use an EventProcessor to read the data from the Event Hub and put it in a database.
See https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/ for how to set this up.
The main work is done in the Task IEventProcessor.ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages) method. Based on the example, it will be something like:
async Task IEventProcessor.ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
{
    foreach (EventData eventData in messages)
    {
        string xmlData = Encoding.UTF8.GetString(eventData.GetBytes());
        // Parse the xml and store the data in the db using ADO.NET or whatever you're comfortable with
    }

    // Call checkpoint every 5 minutes, so that the worker can resume processing
    // from 5 minutes back if it restarts.
    if (this.checkpointStopWatch.Elapsed > TimeSpan.FromMinutes(5))
    {
        await context.CheckpointAsync();
        this.checkpointStopWatch.Restart();
    }
}
JSON would be a good data format to use in Azure Event Hub. Once you receive the data in Azure Event Hub, you can use Azure Stream Analytics to move the data to the SQL DB.
An Azure Stream Analytics job consists of three parts: input, query and output, where the input is the Event Hub and the output is the SQL DB. The query should be written by you to select the desired fields and output them.
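As a minimal sketch (the input and output aliases eventhubinput and sqloutput, and the field names taken from the XML above, are assumptions), the query could be as simple as:
SELECT
    PRack,
    PRackSlotNo,
    RChNo,
    RChSlotNo
INTO
    [sqloutput]
FROM
    [eventhubinput]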
Check out the below article:
https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-define-outputs/
Stream Analytics would be the Azure resource you should look into for moving the data from Event Hub.

How can you fetch data from an HTTP REST endpoint as an input for an Azure Data Factory?

How can you fetch data from an HTTP REST endpoint as an input for a data factory?
My use case is to fetch new data hourly from a REST HTTP GET and update/insert it into a DocumentDB in Azure.
Can you just create an endpoint like this and point it at the REST endpoint?
{
    "name": "OnPremisesFileServerLinkedService",
    "properties": {
        "type": "OnPremisesFileServer",
        "description": "",
        "typeProperties": {
            "host": "<host name which can be either UNC name e.g. \\\\server or localhost for the same machine hosting the gateway>",
            "gatewayName": "<name of the gateway that will be used to connect to the shared folder or localhost>",
            "userId": "<domain user name e.g. domain\\user>",
            "password": "<domain password>"
        }
    }
}
And what kind of component do I add to create the data transformation job? I see that there are a bunch of options like HDInsight, Data Lake and Batch, but I'm not sure what the differences are or which service would be appropriate to simply upsert the new data set into Azure DocumentDB.
I think the simplest way will be to use Azure Logic Apps.
You can make a call to any RESTful service using the HTTP connector in the Azure Logic Apps connectors.
So you can do GET and POST/PUT etc. in a flow based on a schedule or based on some other trigger (such as an incoming HTTP request).
Here is the documentation for it:
https://azure.microsoft.com/en-us/documentation/articles/app-service-logic-connector-http/
To do this with Azure Data Factory you will need to utilize Custom Activities.
Similar question here:
Using Azure Data Factory to get data from a REST API
If Azure Data Factory is not an absolute requirement, Aram's suggestion of using Logic Apps might serve you better.
Hope that helps.
This can be achieved with Data Factory. This is especially good if you want to run batches on a schedule and have a single place for monitoring and management. There is sample code in our GitHub repo for an HTTP loader to blob here: https://github.com/Azure/Azure-DataFactory. Then, the act of moving data from the blob to DocumentDB will do the insert for you using our DocDB connector. There is a sample on how to use this connector here: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-documentdb-connector/. Here are the brief steps you will take to fulfill your scenario:
1. Create a custom .NET activity to get your data to blob.
2. Create a linked service of type DocumentDb.
3. Create a linked service of type AzureStorage.
4. Use an input dataset of type AzureBlob.
5. Use an output dataset of type DocumentDbCollection.
6. Create and schedule a pipeline that includes your custom activity and a Copy Activity that uses BlobSource and DocumentDbCollectionSink; schedule the activities to the required frequency and availability of the datasets (a rough pipeline sketch is shown below).
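As a rough sketch of step 6 in Data Factory (v1) JSON, the copy part of the pipeline could look like the following; the pipeline, dataset and activity names and the schedule window are placeholders, and the custom HTTP-loader activity is omitted:
{
    "name": "BlobToDocDbPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyBlobToDocDb",
                "type": "Copy",
                "inputs": [ { "name": "AzureBlobInput" } ],
                "outputs": [ { "name": "DocumentDbOutput" } ],
                "typeProperties": {
                    "source": { "type": "BlobSource" },
                    "sink": { "type": "DocumentDbCollectionSink" }
                },
                "scheduler": { "frequency": "Hour", "interval": 1 }
            }
        ],
        "start": "2016-01-01T00:00:00Z",
        "end": "2016-01-02T00:00:00Z"
    }
}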
Aside from that, choosing where to run your transforms (HDInsight, Data Lake, Batch) will depend on your I/O and performance requirements. You can choose to run your custom activity on Azure Batch or HDInsight in this case.
