Azure IoT: How to access device twin metadata in a stream job and Azure Function?

Is it possible to access device twin metadata in stream jobs? I know I can upload data to blob storage and access it in my stream job, but that becomes very cumbersome.
Is there a way I can access device metadata (tags, desired properties, reported properties) in a stream job so I can persist it and make decisions accordingly?

Basically, there are two ways to persist the device twins.
The first is to run a bulk job that exports all devices to Azure Blob Storage as JSON-formatted text. See more details here.
You can use a BlobTrigger function to evaluate the blob contents, or reference the blob as an input to the stream job.
The second way to persist a device twin is to persist changes to the device twin. Azure IoT Hub routes can be configured with a route for TwinChangeEvents to a custom endpoint such as Azure Blob Storage. Note that the blob data is in an Avro-serialized format. More details about these routes are here.
Based on the above, both kinds of blobs can be referenced by the stream job for analysis.
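Below is a minimal sketch of the bulk-export approach, assuming the Microsoft.Azure.Devices service SDK; the connection string and the SAS URI of a pre-created, writable blob container are placeholders.

```csharp
// Minimal sketch of the bulk-export approach, assuming the Microsoft.Azure.Devices
// service SDK. The connection string and the SAS URI of a pre-created, writable
// blob container are placeholders.
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Devices;

class ExportTwinsSample
{
    const string HubConnectionString = "<iothubowner-connection-string>";
    const string ContainerSasUri = "<blob-container-sas-uri>";

    static async Task Main()
    {
        using var registry = RegistryManager.CreateFromConnectionString(HubConnectionString);

        // Start the bulk export job; each device (including its twin tags and
        // desired/reported properties) is written as one JSON line to a
        // devices.txt blob in the target container.
        JobProperties job = await registry.ExportDevicesAsync(ContainerSasUri, excludeKeys: true);

        // Poll until the job reaches a terminal state.
        while (job.Status != JobStatus.Completed &&
               job.Status != JobStatus.Failed &&
               job.Status != JobStatus.Cancelled)
        {
            await Task.Delay(TimeSpan.FromSeconds(5));
            job = await registry.GetJobAsync(job.JobId);
        }

        Console.WriteLine($"Export job finished with status: {job.Status}");
    }
}
```

The exported blob can then be referenced by the stream job (for example as reference data joined against the telemetry stream).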

Related

How to ingest blobs created by Azure Diagnostics into Azure Data Explorer by subscribing to Event Grid notifications

I want to send Azure Diagnostics to Kusto tables.
The idea is to get logs and metrics from various Azure resources by sending them to a storage account.
I'm following both Ingest blobs into Azure Data Explorer by subscribing to Event Grid notifications and Tutorial: Ingest and query monitoring data in Azure Data Explorer,
trying to use the best of all worlds - cheap intermediate storage for logs, and using EventHub only for notifications about the new blobs.
The problem is that only part of the data is being ingested.
I'm thinking that the problem is in the append blobs which monitoring creates. When Kusto receives the "Created" notification, only part of the blob has been written, and the rest of the events are never ingested as the blob is appended to.
My question is, how do I make this scenario work? Is it possible at all, or should I stick with sending logs to EventHub without using the blobs with Event Grid?
Append blobs do not work nicely with Event Grid ADX ingestion, as they generate multiple BlobCreated events.
If you are able to cause a blob rename on update completion, that would solve the problem.
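Blob Storage has no true rename operation, so one hedged sketch of the "rename on completion" idea is to copy the finished append blob to a new block blob (which raises its own BlobCreated event for Event Grid) and then delete the original. The Azure.Storage.Blobs package and the "-final" naming convention below are assumptions, not part of any documented pattern.

```csharp
// Hedged sketch: Blob Storage has no true rename, so "rename on completion" is
// approximated here by copying the finished append blob to a new block blob
// (which raises its own BlobCreated event for Event Grid / ADX ingestion) and
// then deleting the original. Assumes the Azure.Storage.Blobs package; the
// "-final" suffix is just an illustrative naming convention.
using System.Threading.Tasks;
using Azure.Storage.Blobs;

class RenameOnCompletion
{
    public static async Task CopyToFinalAsync(string connectionString, string containerName, string blobName)
    {
        var container = new BlobContainerClient(connectionString, containerName);
        BlobClient source = container.GetBlobClient(blobName);
        BlobClient target = container.GetBlobClient(blobName + "-final");

        // Server-side copy; the newly created blob triggers a fresh BlobCreated notification.
        var copyOperation = await target.StartCopyFromUriAsync(source.Uri);
        await copyOperation.WaitForCompletionAsync();

        // Remove the original append blob so it is not picked up a second time.
        await source.DeleteIfExistsAsync();
    }
}
```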

Azure solution to save stream to blob files as parquet

I have read about a few different Azure services - Event Hubs Capture, Azure Data Factory, Event Hubs, and more. I am trying to find ways, using Azure services, to:
Write data to some "endpoint" or place from my application (preferably an Azure service)
Have the data batched and saved as files in blob storage
Eventually, the files in blob storage should be in Parquet format
My questions are:
I read that Event Hubs Capture only saves files as Avro. So I might also consider a second pipeline that copies from the original Avro blobs to destination Parquet blobs. Is there a service in Azure that can listen to my blob storage, convert all the files to Parquet, and save them again (I'm not sure from the documentation whether Data Factory can do this)?
What other alternatives would you consider (except Kafka, which I know about) to save a stream of data as batches of Parquet files in blob storage?
Thank you!
For the least amount of effort, you can look into a combination of an Event Hub as your endpoint and then connect Azure Stream Analytics to it. Stream Analytics can natively write Parquet to blob: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-outputs#blob-storage-and-azure-data-lake-gen2
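As a rough sketch of the application ("endpoint") side, assuming the Azure.Messaging.EventHubs package with placeholder connection string and hub name: the application only sends JSON events to the Event Hub, while a Stream Analytics job with a Parquet-formatted Blob/ADLS Gen2 output handles the batching and file writing.

```csharp
// Rough sketch of the application ("endpoint") side, assuming the
// Azure.Messaging.EventHubs package: the app just sends JSON events to an Event Hub,
// and a Stream Analytics job with a Parquet-formatted Blob/ADLS Gen2 output does the
// batching and file writing. Connection string and hub name are placeholders.
using System.Text;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;

class SendToEventHub
{
    public static async Task SendAsync(string connectionString, string eventHubName, string jsonPayload)
    {
        await using var producer = new EventHubProducerClient(connectionString, eventHubName);

        // Batch the event and send it to the hub; Stream Analytics picks it up from there.
        using EventDataBatch batch = await producer.CreateBatchAsync();
        batch.TryAdd(new EventData(Encoding.UTF8.GetBytes(jsonPayload)));

        await producer.SendAsync(batch);
    }
}
```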

Uploading file to Azure BLOB using IoT Hub - Permissions

I'm uploading files from a Raspberry Pi to Azure Blob storage via an Azure IoT Hub, using this Microsoft tutorial as the basis for my C# code, and it's working fine.
Looking at the Microsoft documentation for the method UploadToBlobAsync(), "If the blob already exists, it will be overwritten."
I'm wondering if there's any way to restrict the device's permissions to create-only in the Azure portal or via PowerShell. My concern is that should someone access the device's storage and get the device id and key they would have the means to delete or overwrite files previously uploaded by that device in the storage container.
As a work-around I could have a server-side process pick up files once they've been received and move them elsewhere, but if the device id/key was restricted to create-only then I wouldn't need this overhead.
The method UploadToBlobAsync (assembly Microsoft.Azure.Devices.Client.UWP) is a wrapper around a sequence of REST API calls for uploading a blob to the Azure Storage container.
The following sequence is processed:
1. A REST API call to the Azure IoT Hub obtains a reference for uploading the blob. The sasToken generated for this operation grants read/write access.
2. Once the device has received that response, it can call the Azure Storage REST API to PUT the blob. Here is my suggestion: before uploading, the device can call the REST API to get the metadata of the blob and, based on the result, this step can either be skipped or continue with actually uploading the blob using the REST API PUT.
3. This is the last step of the sequence (very important): the device needs to send a notification to the Azure IoT Hub with the status of the upload sequence.
Well, as you can see, step #2 above is where you can decide whether to skip or overwrite the blob being uploaded.
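A hedged sketch of that "check before overwrite" idea is shown below, using the newer file-upload API of Microsoft.Azure.Devices.Client (which exposes the SAS URI that UploadToBlobAsync hides) together with Azure.Storage.Blobs; the helper and the create-only policy are illustrative, not part of the SDK.

```csharp
// Hedged sketch of the "check before overwrite" idea, using the newer file-upload
// API of Microsoft.Azure.Devices.Client (which exposes the SAS URI that
// UploadToBlobAsync hides) together with Azure.Storage.Blobs. The helper and the
// create-only policy are illustrative, not part of the SDK.
using System;
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs.Specialized;
using Microsoft.Azure.Devices.Client;
using Microsoft.Azure.Devices.Client.Transport;

class GuardedUpload
{
    public static async Task UploadIfNewAsync(DeviceClient device, string blobName, Stream content)
    {
        // Step 1: ask IoT Hub for a blob reference (correlation ID + SAS URI).
        FileUploadSasUriResponse sasUri = await device.GetFileUploadSasUriAsync(
            new FileUploadSasUriRequest { BlobName = blobName });
        Uri blobUri = sasUri.GetBlobUri();

        var blob = new BlockBlobClient(blobUri);
        bool uploaded = false;

        // Step 2: skip the PUT if the blob already exists (create-only behaviour).
        if (!await blob.ExistsAsync())
        {
            await blob.UploadAsync(content);
            uploaded = true;
        }

        // Step 3: always notify IoT Hub about the outcome of the upload sequence.
        await device.CompleteFileUploadAsync(new FileUploadCompletionNotification
        {
            CorrelationId = sasUri.CorrelationId,
            IsSuccess = uploaded
        });
    }
}
```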

Stream Analytics: Dynamic output path based on message payload

I am working on an IoT analytics solution which consumes Avro-formatted messages fired at an Azure IoT Hub and (hopefully) uses Stream Analytics to store messages in Data Lake and blob storage. A key requirement is that the Avro containers must appear in storage exactly as they did when presented to the IoT Hub, for the benefit of downstream consumers.
I am running into a limitation in Stream Analytics with granular control over individual file creation. When setting up a new output stream path, I can only provide date/day and hour in the path prefix, resulting in one file for every hour instead of one file for every message received. The customer requires separate blob containers for each device and separate blobs for each event. Similarly, the Data Lake requirement dictates at least a sane naming convention that is delineated by device, with separate files for each event ingested.
Has anyone successfully configured Stream Analytics to create a new file every time it pops a message off of the input? Is this a hard product limitation?
Stream Analytics is indeed oriented for efficient processing of large streams.
For your use case, you need an additional component to implement your custom logic.
Stream Analytics can output to Blob, Event Hub, Table Storage or Service Bus. Another option is to use the new IoT Hub routes to route directly to an Event Hub or a Service Bus queue or topic.
From there you can write an Azure Function (or, from Blob or Table Storage, a custom Data Factory activity) and use the Data Lake Store SDK to write files with the logic that you need.
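Here is a minimal sketch of that custom-logic step, assuming an Event Hub-triggered Azure Function and ADLS Gen2 (Azure.Storage.Files.DataLake) rather than the original Data Lake Store (Gen1) SDK; the hub name, connection settings, file-system name and file-naming convention are all placeholders.

```csharp
// Minimal sketch of the custom-logic step, assuming an Event Hub-triggered Azure
// Function and ADLS Gen2 (Azure.Storage.Files.DataLake) rather than the original
// Data Lake Store (Gen1) SDK. Hub name, connection settings, file-system name and
// file-naming convention are placeholders; the device ID comes from the
// iothub-connection-device-id system property stamped by IoT Hub routing.
using System;
using System.IO;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Storage.Files.DataLake;
using Microsoft.Azure.WebJobs;

public static class PerEventFileWriter
{
    [FunctionName("PerEventFileWriter")]
    public static async Task Run(
        [EventHubTrigger("telemetry", Connection = "EventHubConnection")] EventData eventData)
    {
        string deviceId = eventData.SystemProperties["iothub-connection-device-id"].ToString();

        var fileSystem = new DataLakeFileSystemClient(
            Environment.GetEnvironmentVariable("DataLakeConnection"), "events");

        // One folder per device, one file per event; the body is written as received
        // (e.g. the original Avro payload).
        string path = $"{deviceId}/{Guid.NewGuid():N}.avro";
        DataLakeFileClient file = fileSystem.GetFileClient(path);

        using var body = new MemoryStream(eventData.EventBody.ToArray());
        await file.UploadAsync(body, overwrite: false);
    }
}
```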

Where is Azure Event Hub messages stored?

I generated a SAS signature using this RedDog tool and successfully sent a message to the Event Hub using the Event Hubs REST API reference. I know it was successful because I got a 201 Created response from the endpoint.
This tiny success brought about a question that I have not been able to find an answer to:
I went to the Azure portal and could not see the messages I created anywhere. Further reading revealed that I needed to create a storage account; I stumbled on some C# examples (EventProcessorHost) which require the storage account credentials, etc.
Question is, are there any APIs I can use to persist the data? I do not want to use the C# tool.
Please correct me if my approach is wrong, but my aim is to be able to post telemetries to EventHub, persist the data and perform some analytics operations on it. The telemetry data should be viewable on Azure.
You don't have direct access to the transient storage used for EventHub messages, but you could write a consumer that reads from the EventHub continuously and persist the messages to Azure Table or to Azure Blob.
The closest thing you will find to a way to automatically persist messages (as with Amazon Kinesis Firehose vs. Amazon Kinesis, which Event Hubs are basically equivalent to) would be to use Azure Stream Analytics configured to write the output either to Azure Blob or to Azure Table. This example shows how to set up a Stream Analytics job that passes the data through and stores it in SQL, but you can see the UI where you can choose an output such as Azure Table. Or you can get an idea of the options from the output API.
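If you go the "write your own consumer" route mentioned above, here is a minimal sketch assuming the Azure.Messaging.EventHubs and Azure.Storage.Blobs packages; the container name and blob-naming scheme are just placeholders.

```csharp
// Minimal sketch of the "write a consumer and persist to Blob" approach, assuming
// the Azure.Messaging.EventHubs and Azure.Storage.Blobs packages. Container name
// and blob-naming scheme are placeholders; a production consumer would also
// checkpoint its position (which is what EventProcessorHost uses the storage
// account for).
using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs.Consumer;
using Azure.Storage.Blobs;

class PersistEventsToBlob
{
    public static async Task RunAsync(string eventHubConnectionString, string eventHubName,
                                      string storageConnectionString, CancellationToken token)
    {
        var container = new BlobContainerClient(storageConnectionString, "telemetry");
        await container.CreateIfNotExistsAsync(cancellationToken: token);

        await using var consumer = new EventHubConsumerClient(
            EventHubConsumerClient.DefaultConsumerGroupName, eventHubConnectionString, eventHubName);

        try
        {
            // Read continuously and write each event body to its own blob.
            await foreach (PartitionEvent partitionEvent in consumer.ReadEventsAsync(token))
            {
                string name = $"{partitionEvent.Partition.PartitionId}-{partitionEvent.Data.SequenceNumber}.json";
                await container.UploadBlobAsync(name, partitionEvent.Data.BodyAsStream);
            }
        }
        catch (OperationCanceledException)
        {
            // Expected when the token is cancelled; stop reading.
        }
    }
}
```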
Of course, you should be aware of the requirements around serialization that led to this question.
The Event Hub stores data for a maximum of 7 days, and that only in the Standard pricing tier. If you want to persist the data for longer in a storage account, you can use the Event Hubs Capture feature. You don't have to write a single line of code to achieve this; you can configure it through the Portal or an ARM template. This is described in this document - https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-capture-overview
The Event Hub stores its transient data in Azure Storage. The documentation doesn't give any more detail about that data storage; this is evident from this document - https://learn.microsoft.com/en-us/azure/event-hubs/configure-customer-managed-key
The storage account you need for EventProcessorHost is only used for checkpointing or maintaining the offset of the last read event in a partition.
