Stream Analytics: Dynamic output path based on message payload

I am working on an IoT analytics solution that consumes Avro-formatted messages sent to an Azure IoT Hub and (hopefully) uses Stream Analytics to store the messages in Data Lake and Blob storage. A key requirement is that the Avro containers must appear exactly the same in storage as they did when presented to the IoT Hub, for the benefit of downstream consumers.
I am running into a limitation in Stream Analytics around granular control over individual file creation. When setting up a new output stream path, I can only provide a date/day and hour in the path prefix, resulting in one file for every hour instead of one file for every message received. The customer requires separate blob containers for each device and separate blobs for each event. Similarly, the Data Lake requirement dictates at least a sane naming convention delineated by device, with separate files for each event ingested.
Has anyone successfully configured Stream Analytics to create a new file every time it pops a message off the input? Is this a hard product limitation?

Stream Analytics is indeed oriented toward efficient processing of large streams.
For your use case, you need an additional component to implement your custom logic.
Stream Analytics can output to Blob storage, Event Hubs, Table storage, or Service Bus. Another option is to use the new IoT Hub routes to route directly to an Event Hub or a Service Bus queue or topic.
From there you can write an Azure Function (or, from Blob or Table Storage, a custom Data Factory activity) and use the Data Lake Store SDK to write files with the logic that you need.
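
As a rough illustration of that custom component, here is a hedged Python sketch of a standalone consumer that reads from the IoT Hub's Event Hub-compatible endpoint (or a routed Event Hub) and writes each message as its own blob in a per-device container, which is the per-event, per-device layout the built-in Blob output cannot produce. The connection strings and the container naming scheme are placeholders, the device id is assumed to arrive in the `iothub-connection-device-id` system property, and the same handler could call the Data Lake Store SDK instead of Blob storage.

```python
# Sketch only: one blob per event, grouped by device, written by a custom consumer.
# Requires the azure-eventhub and azure-storage-blob packages; every name and
# connection string below is a placeholder.
import uuid

from azure.core.exceptions import ResourceExistsError
from azure.eventhub import EventHubConsumerClient
from azure.storage.blob import BlobServiceClient

EVENTHUB_CONN_STR = "<event-hub-compatible-connection-string>"  # placeholder
EVENTHUB_NAME = "<event-hub-compatible-name>"                   # placeholder
STORAGE_CONN_STR = "<storage-account-connection-string>"        # placeholder

blob_service = BlobServiceClient.from_connection_string(STORAGE_CONN_STR)


def device_id_of(event):
    """Read the device id from the IoT Hub system properties (keys may be bytes)."""
    for key, value in (event.system_properties or {}).items():
        name = key.decode() if isinstance(key, (bytes, bytearray)) else str(key)
        if name == "iothub-connection-device-id":
            return value.decode() if isinstance(value, (bytes, bytearray)) else str(value)
    return "unknown-device"


def event_bytes(event):
    """Return the raw Avro payload untouched (body may be bytes or an iterable of bytes)."""
    body = event.body
    return bytes(body) if isinstance(body, (bytes, bytearray)) else b"".join(body)


def on_event(partition_context, event):
    # One container per device (assumed naming; container names must be lowercase).
    container_name = device_id_of(event).lower()
    try:
        blob_service.create_container(container_name)
    except ResourceExistsError:
        pass  # container already exists
    # One blob per event; checkpointing is omitted here for brevity.
    blob_name = f"{uuid.uuid4()}.avro"
    blob_service.get_blob_client(container_name, blob_name).upload_blob(
        event_bytes(event), overwrite=True
    )


consumer = EventHubConsumerClient.from_connection_string(
    EVENTHUB_CONN_STR, consumer_group="$Default", eventhub_name=EVENTHUB_NAME
)
with consumer:
    consumer.receive(on_event=on_event, starting_position="-1")
```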

Related

How to ingest blobs created by Azure Diagnostics into Azure Data Explorer by subscribing to Event Grid notifications

I want to send Azure Diagnostics to Kusto tables.
The idea is to get logs and metrics from various Azure resources by sending them to a storage account.
I'm following both Ingest blobs into Azure Data Explorer by subscribing to Event Grid notifications and Tutorial: Ingest and query monitoring data in Azure Data Explorer,
trying to get the best of both worlds: cheap intermediate storage for the logs, and using Event Hub only for notifications about the new blobs.
The problem is that only part of the data is being ingested.
I think the problem is the append blobs that monitoring creates. When Kusto receives the "Created" notification, only part of the blob has been written, and the rest of the events are never ingested because the blob keeps being appended to.
My question is: how can I make this scenario work? Is it possible at all, or should I stick with sending logs to Event Hub without using blobs with Event Grid?
Append blobs do not work nicely with Event Grid ADX ingestion, as they generate multiple BlobCreated events.
If you are able to cause a blob rename on update completion, that would solve the problem.
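
Blob storage has no true rename operation, so if you control the writer, "rename on completion" can be approximated with a server-side copy to the final name followed by deleting the staging blob, and by pointing the Event Grid subscription that feeds ADX only at the final names (for example with a subject suffix filter). A rough Python sketch under those assumptions, with placeholder names and connection string:

```python
# Sketch only: Blob storage has no rename operation, so "rename on completion"
# is approximated by a server-side copy to the final name followed by deleting
# the staging blob. Names and the connection string are placeholders.
import time

from azure.storage.blob import BlobServiceClient

STORAGE_CONN_STR = "<storage-account-connection-string>"  # placeholder

service = BlobServiceClient.from_connection_string(STORAGE_CONN_STR)
container = service.get_container_client("diagnostics-logs")  # placeholder container


def finalize_blob(staging_name: str, final_name: str) -> None:
    """Copy the finished blob to its final name, then delete the staging blob.

    Point the Event Grid subscription that feeds ADX only at the final names
    (e.g. with a subject prefix/suffix filter) so ingestion sees a single,
    complete blob."""
    source = container.get_blob_client(staging_name)
    target = container.get_blob_client(final_name)

    # Same-account copy; a cross-account copy would need a SAS on the source URL.
    target.start_copy_from_url(source.url)

    # Wait for the server-side copy to complete before removing the source.
    while target.get_blob_properties().copy.status == "pending":
        time.sleep(1)

    source.delete_blob()
```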

Azure IoT: How to access device twin metadata in stream job and azure function?

Is it possible to access device twin metadata in stream jobs? I know I can upload data to blob storage and access it in my stream job, but that becomes very cumbersome.
Is there a way I can access device metadata (tags, desired properties, reported properties) in a stream job so I can persist it and make decisions accordingly?
Basically, there are two ways to persist device twins.
The first is to run a bulk job that exports all devices to Azure Blob Storage as JSON-formatted text. See more details here.
You can use a BlobTrigger function to evaluate the blob contents, or reference the blob as input to the stream job.
The second way is to persist the changes made to a device twin. Azure IoT Hub routes can be configured with a route for TwinChangeEvents to a custom endpoint such as Azure Blob Storage. Note that the blob data is Avro-serialized. More details about these routes are here.
Either way, the resulting blobs can be referenced by the stream job for analysis.
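
As a further illustration (not part of the answer above), the twin can also be read on demand with the IoT Hub service SDK, for example from an Azure Function, and persisted or joined with telemetry yourself. A minimal Python sketch, assuming the azure-iot-hub package and placeholder connection string and device id:

```python
# Sketch only: fetch a device twin with the IoT Hub service SDK (azure-iot-hub
# package) so its tags and properties can be persisted or joined with telemetry.
# The connection string and device id are placeholders.
from azure.iot.hub import IoTHubRegistryManager

IOTHUB_CONN_STR = "<iot-hub-service-connection-string>"  # placeholder
DEVICE_ID = "<device-id>"                                # placeholder

registry_manager = IoTHubRegistryManager(IOTHUB_CONN_STR)
twin = registry_manager.get_twin(DEVICE_ID)

print(twin.tags)                 # twin tags
print(twin.properties.desired)   # desired properties
print(twin.properties.reported)  # reported properties
```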

Azure Event Hub: What type of data it can ingest

Is it true that Azure Stream Analytics can only accept JSON files?
Can you possibly ingest/send pipe delimited or other file formats to Event Hub and consume them from Stream Analytics?
Stream Analytics has a drop-down menu labeled Serialization (in the old management portal at manage.windowsazure.com) that allows you to choose CSV or Avro as well.
What is the file content? Maybe it makes sense to put the file into Azure Storage and send a link to it to the Event Hub/Service Bus queue? Or just put the files into storage and consume them with Stream Analytics directly.
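
To make the serialization point concrete: Event Hubs itself treats the event body as opaque bytes, so pipe-delimited or CSV payloads can be sent without any special handling; it is only the consumer (Stream Analytics with its JSON/CSV/Avro serialization setting, or your own code) that has to understand the format. A small Python sketch, assuming the azure-eventhub package and placeholder connection details:

```python
# Sketch only: Event Hubs accepts arbitrary bytes as the event body, so CSV or
# pipe-delimited payloads can be sent as-is. Requires the azure-eventhub package;
# the connection string and hub name are placeholders.
from azure.eventhub import EventData, EventHubProducerClient

EVENTHUB_CONN_STR = "<event-hub-namespace-connection-string>"  # placeholder
EVENTHUB_NAME = "<event-hub-name>"                             # placeholder

producer = EventHubProducerClient.from_connection_string(
    EVENTHUB_CONN_STR, eventhub_name=EVENTHUB_NAME
)

with producer:
    batch = producer.create_batch()
    # Pipe-delimited: Stream Analytics will not parse this directly,
    # but a custom consumer (Function, worker role, ...) can.
    batch.add(EventData("device-001|2016-08-01T12:00:00Z|23.5"))
    # CSV: Stream Analytics can deserialize this if the input serialization is CSV.
    batch.add(EventData("device-001,2016-08-01T12:00:00Z,23.5"))
    producer.send_batch(batch)
```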

Where are Azure Event Hub messages stored?

I generated a SAS signature using this RedDog tool and successfully sent a message to the Event Hub using the Event Hubs API reference. I know it was successful because I got a 201 Created response from the endpoint.
This small success brought up a question that I have not been able to find an answer to:
I went to the Azure portal and could not see the messages I had created anywhere. Further reading revealed that I needed to create a storage account; I stumbled on some C# examples (EventProcessorHost) which require the storage account credentials, etc.
My question is: are there any APIs I can use to persist the data? I do not want to use the C# tool.
Please correct me if my approach is wrong, but my aim is to be able to post telemetries to EventHub, persist the data and perform some analytics operations on it. The telemetry data should be viewable on Azure.
You don't have direct access to the transient storage used for Event Hub messages, but you can write a consumer that reads from the Event Hub continuously and persists the messages to Azure Table or Azure Blob storage.
The closest thing you will find to a way to automatically persist messages (in the way Amazon Kinesis Firehose relates to Amazon Kinesis, which Event Hubs is roughly equivalent to) would be to use Azure Stream Analytics configured to write its output either to Azure Blob or to Azure Table. This example shows how to set up a Stream Analytics job that passes the data through and stores it in SQL, but you can see the UI where you can pick an output such as Azure Table. Or you can get an idea of the options from the output API.
Of course you should be aware of the requirements around serialization that led to this question.
Event Hubs stores data for a maximum of 7 days, and that only in the Standard pricing tier. If you want to persist the data for longer in a storage account, you can use the Event Hubs Capture feature. You don't have to write a single line of code to achieve this; you can configure it through the portal or an ARM template. This is described in this document: https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-capture-overview
Event Hubs stores its transient data in Azure Storage, and the documentation does not give any more detail about that storage. This is evident from this documentation: https://learn.microsoft.com/en-us/azure/event-hubs/configure-customer-managed-key
The storage account you need for EventProcessorHost is only used for checkpointing, i.e. maintaining the offset of the last event read in each partition.
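
Putting the first answer and the checkpointing note together, here is a hedged Python sketch of such a consumer: the azure-eventhub equivalent of EventProcessorHost, with one blob container used purely for checkpoints and another used to persist every message for later analytics. All names and connection strings are placeholders, and both containers are assumed to already exist.

```python
# Sketch only: the azure-eventhub equivalent of EventProcessorHost. One container
# holds only checkpoints (the per-partition offset of the last event read), the
# other persists the message payloads. Requires the azure-eventhub,
# azure-eventhub-checkpointstoreblob and azure-storage-blob packages; all names
# and connection strings are placeholders, and both containers must already exist.
from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore
from azure.storage.blob import ContainerClient

EVENTHUB_CONN_STR = "<event-hub-connection-string>"       # placeholder
EVENTHUB_NAME = "<event-hub-name>"                        # placeholder
STORAGE_CONN_STR = "<storage-account-connection-string>"  # placeholder

# Used only for checkpointing, like the storage account EventProcessorHost asks for.
checkpoint_store = BlobCheckpointStore.from_connection_string(
    STORAGE_CONN_STR, "eventhub-checkpoints"
)
# Where the telemetry itself is persisted for later analytics.
archive = ContainerClient.from_connection_string(STORAGE_CONN_STR, "eventhub-archive")


def on_event(partition_context, event):
    # One blob per event, named by partition and sequence number.
    name = f"{partition_context.partition_id}/{event.sequence_number}.json"
    archive.upload_blob(name, event.body_as_str(), overwrite=True)
    # Record progress so a restart resumes after the last persisted event.
    partition_context.update_checkpoint(event)


client = EventHubConsumerClient.from_connection_string(
    EVENTHUB_CONN_STR,
    consumer_group="$Default",
    eventhub_name=EVENTHUB_NAME,
    checkpoint_store=checkpoint_store,
)
with client:
    client.receive(on_event=on_event, starting_position="-1")
```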

Use Azure Stream Analytics for a simple Data pass-through

I want to build an IoT architecture with Azure services. The data comes from different IoT devices and is received by an Event Hub. The Event Hub passes the data to a Stream Analytics job and to a Worker Role. The Worker Role should calculate parameters and pass them to a Service Bus queue. The Stream Analytics job should simply act as a "storage writer" and pass the data through to Blob storage, in case we need the more detailed data later.
Is Stream Analytics the right service for this purpose, or is it overkill?
Yes, using Azure Stream Analytics to perform low-latency writes to Blob storage without any data transformation (as a passthrough) is a supported scenario. You would implement this with a SELECT * FROM [input] query.
