Use Azure Stream Analytics for a simple Data pass-through - azure

I want to build an IoT-Architecture with Azure Services. The Data comes from different IoT-Devices and gets received by an Event-Hub. The Event-Hub passes the Data to a Stream Analytics Service and to a Worker Role. The Worker Role should calculate parameters and pass them to a Service-Bus-Queue. The Stream Analytics Service should simply act as a "Storage Writer" and pass the Data through into a Blob-Storage, for the case that we need more explicit Data later.
Is Stream Analytics the right Service for this purpose or is it kind of oversized?

Yes, using Azure Stream Analytics to perform low latency writes to blob storage without any data transformation (as a passthrough) is a supported scenario. You would implement this with a SELECT * FROM [input] query.

Related

Grafana as Azure Stream Analytics output

I am pushing events to my Event hub, then this data is being analyzed in Azure Stream Analytics. I'd like to visualize output from stream analytics in Grafana.
What is the easiest approach to achieve this?
Azure Stream Analytics job can natively ingest the data into Azure Data Explorer. https://techcommunity.microsoft.com/t5/azure-data-explorer/azure-data-explorer-is-now-supported-as-output-for-azure-stream/ba-p/2923654
You can then use the Azure Data Explorer plugin in Grafana. https://techcommunity.microsoft.com/t5/azure-data-explorer/azure-data-explorer-is-now-supported-as-output-for-azure-stream/ba-p/2923654
Another option is to use Power BI instead of Grafana. https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-power-bi-dashboard
If I remember correctly, Grafana doesn't store data locally, you need to define a data source on top of one of the compatible storage systems.
Azure Stream Analytics doesn't come with a storage layer either, it's compute only.
So if you want to use ASA and Grafana, you need to output data from ASA to a data source that is supported by Grafana in ingress.
Looking at both lists that leaves only MSSQL via Azure SQL (hopefully it's compatible) as a native option. It's not a bad option for narrow dashboards, or if you intend to store your data in a RDBMS anyway. You can store your entire payload in an NVARCHAR(MAX) if you don't plan to consume the data in SQL.
But being clever, we can actually use the Functions output to write to any other store, or call any API. I'm not sure if Grafana has a direct ingestion API, but Azure Monitor does and it's a supported data source in Grafana.
The other option would be to go through ADX as explained in the other answer.
Not straightforward but doable ;)

Azure SQL can't handle incoming stream analytics data

I have a scenario where event hub gets data in every 10 seconds, which pass to the stream analytics and then which is passed to the Azure SQL Server. The technical team raised the concerns that Azure SQL is unable to handler so much of data, if data raises 2,00,00,000. then it stops to work.
Can you please guide me is it actual problem of Azure SQL, if it is then can you please suggest me the solution.
Keep in mind that 4TB is the absolute maximum size of an Azure SQL Premium instance. If you plan to store all events for your use case, then this will fill up very quickly. Consider using CosmosDb or Event Hub Capture if you really need to store the messages indefinitely and use SQL for aggregates after processing with SQL DW or ADLS.
Remeber that to optimise Event Hubs you must have a partitioning strategy to optimise the throughput. See the docs.

Stream Analytics: Dynamic output path based on message payload

I am working on an IoT analytics solution which consumes Avro formatted messages fired at an Azure IoT Hub and (hopefully) uses Stream Analytics to store messages in Data Lake and blob storage. A key requirement is the Avro containers must appear exactly the same in storage as they did when presented to the IoT Hub, for the benefit of downstream consumers.
I am running into a limitation in Stream Analytics with granular control over individual file creation. When setting up a new output stream path, I can only provide date/day and hour in the path prefix, resulting in one file for every hour instead of one file for every message received. The customer requires separate blob containers for each device and separate blobs for each event. Similarly, the Data Lake requirement dictates at least a sane naming convention that is delineated by device, with separate files for each event ingested.
Has anyone successfully configured Stream Analytics to create a new file every time it pops a message off of the input? Is this a hard product limitation?
Stream Analytics is indeed oriented for efficient processing of large streams.
For your use case, you need an additional component to implement your custom logic.
Stream Analytics can output to Blob, Event Hub, Table Store or Service Bus. Another option is to use the new Iot Hub Routes to route directly to an Event Hub or a Service Bus Queue or Topic.
From there you can write an Azure Function (or, from Blob or Table Storage, a custom Data Factory activity) and use the Data Lake Store SDK to write files with the logic that you need.

Where is Azure Event Hub messages stored?

I generated a SAS signature using this RedDog tool and successfully sent a message to Event Hub using the Events Hub API refs. I know it was successful because I got a 201 Created response from the endpoint.
This tiny success brought about a question that I have not been able to find an answer to:
I went to the azure portal and could not see the messages I created anywhere. Further reading revealed that I needed to create a storage account; I stumbled on some C# examples (EventProcessorHost) which requires the storage account creds etc.
Question is, are there any APIs I can use to persist the data? I do not want to use the C# tool.
Please correct me if my approach is wrong, but my aim is to be able to post telemetries to EventHub, persist the data and perform some analytics operations on it. The telemetry data should be viewable on Azure.
You don't have direct access to the transient storage used for EventHub messages, but you could write a consumer that reads from the EventHub continuously and persist the messages to Azure Table or to Azure Blob.
The closest thing you will find to a way to automatically persist messages (as with Amazon Kinesis Firehose vs Amazon Kinesis which EventHubs are basically equivalent to), would be to use Azure Streaming Analytics configured to write the output either to Azure Blob or to Azure Table. This example shows how to set up a Streaming Analytics job that passes the data through and stores it in SQL, but you can see the UI where you can choose a choice such as Azure Table. Or you can get an idea of the options from the output API.
Of course you should be aware of the requirements around serialization that led to this question
The Event Hub stores data for maximum of 7 days; that’s too in standard pricing tier. If you want to persist the data for longer in a storage account, you can use the Event Hub Capture feature. You don’t have to write a single line of code to achieve this. You can configure it through Portal or ARM template. This is described in this document - https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-capture-overview
The event hub stores it’s transient data in Azure storage. It doesn’t give any more detail in relation to the data storage. This is evident from this documentation - https://learn.microsoft.com/en-us/azure/event-hubs/configure-customer-managed-key
The storage account you need for EventProcessorHost is only used for checkpointing or maintaining the offset of the last read event in a partition.

Azure Storm vs Azure Stream Analytics

Looking to do real time metric calculations on event streams, what is a good choice in Azure? Stream Analytics or Storm? I am comfortable with either SQL or Java, so wondering what are the other differences.
It depends on your needs and requirements. I'll try to lay out the strengths and benefits of both. In terms of setup, Stream Analytics has Storm beat. Stream Analytics is great if you need to ask a lot of different questions often. Stream Analytics can also only handle CSV or JSON type data. Stream Analytics is also at the mercy of only sending outputs to Azure Blob, Azure Tables, Azure SQL, PowerBI; any other output will require Storm. Stream Analytics lacks the data transformation capabilities of Storm.
Storm:
Data Transformation
Can handle more dynamic data (if you're willing to program)
Requires programming
Stream Analytisc
Ease of Setup
JSON and CSV format only
Can change queries within 4 minutes
Only takes inputs from Event Hub, Blob Storage
Only outputs to Azure Blob, Azure Tables, Azure SQL, PowerBI
If you are looking for versatility over flexibility. I'd go with Stream Analytics, if you require specific operations that are limited by Stream Analytics, it's worth looking into Spark, which gives you data persistence options. On the Stream Analytics outputs side, one interesting thing would be to output into an Event Hub and consume it from there giving you unlimited flexibility on how you want to consume the data.
Below is the output options for Stream Analytics and the link for Apache Spark on Azure
Hope this helps.

Resources