Azure Event Hub: what type of data can it ingest?

Is it true that Azure Stream Analytics can only accept JSON?
Can you ingest/send pipe-delimited or other file formats to Event Hub and consume them from Stream Analytics?

Stream Analytics has a drop-down menu labeled Serialization (in the old management portal at manage.windowsazure.com) that also allows you to choose CSV or Avro.
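Event Hub itself is format-agnostic: the event body is just bytes, and the serialization setting only tells Stream Analytics how to deserialize the input (CSV input lets you pick a delimiter, including the vertical bar). As a minimal sketch, assuming the Python azure-eventhub (v5) SDK and hypothetical connection details, sending pipe-delimited rows could look like this:

```python
# Minimal sketch, assuming the azure-eventhub (v5) SDK; the connection string
# and hub name are hypothetical placeholders.
from azure.eventhub import EventHubProducerClient, EventData

CONN_STR = "<event-hubs-namespace-connection-string>"
EVENTHUB_NAME = "telemetry"

# Pipe-delimited rows: Event Hub just carries the bytes as-is.
rows = [
    "device01|2021-01-01T00:00:00Z|21.5",
    "device02|2021-01-01T00:00:05Z|19.8",
]

producer = EventHubProducerClient.from_connection_string(
    CONN_STR, eventhub_name=EVENTHUB_NAME
)
with producer:
    batch = producer.create_batch()
    for row in rows:
        batch.add(EventData(row))
    producer.send_batch(batch)
```

On the Stream Analytics side, the input serialization would then be set to CSV with the vertical bar chosen as the delimiter.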

What is the file content? It may make sense to put the file in Azure Storage and send a link to it to the Event Hub/Service Bus queue, or simply put the files in storage and consume them with Stream Analytics.

Related

Process event files into Azure EventHub

I am fairly new to Azure.
I have a requirement where the source will send event data in flat files. Each file will contain header and trailer records, with the events as data records. Each file will be about 10 MB in size and can contain about 50,000-60,000 events.
I want to process these files using Python/Scala and send the data into Azure Event Hub. Can someone suggest whether this is the best solution and how I can achieve it?
It's an architectural question, but you can use either Azure Logic Apps or Azure Functions.
First of all, whichever you choose, trigger it by uploading the file to Blob Storage. The file gets picked up, processed, and then sent.
Use Azure Logic Apps if you can parse the files simply, for instance because they are JSON files: then just loop over each event and direct it to the Event Hub you want.
If parsing the files is more complex, use Azure Functions: write the code and output the events to an Event Hub, as sketched below.
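A rough sketch of that Functions route, assuming a Python blob-triggered function and the azure-eventhub SDK; the app settings (EVENTHUB_CONN_STR, EVENTHUB_NAME) and the simple header/trailer layout are hypothetical:

```python
# Rough sketch: blob-triggered Azure Function that strips header/trailer
# records and forwards the data records to an Event Hub.
import os
import azure.functions as func
from azure.eventhub import EventHubProducerClient, EventData


def main(inputblob: func.InputStream):
    lines = inputblob.read().decode("utf-8").splitlines()
    data_records = lines[1:-1]  # drop the header and trailer records

    producer = EventHubProducerClient.from_connection_string(
        os.environ["EVENTHUB_CONN_STR"],
        eventhub_name=os.environ["EVENTHUB_NAME"],
    )
    with producer:
        batch = producer.create_batch()
        for record in data_records:
            event = EventData(record)
            try:
                batch.add(event)
            except ValueError:
                # Batch is full (~1 MB): send it and start a new one, since a
                # 10 MB file will not fit in a single batch.
                producer.send_batch(batch)
                batch = producer.create_batch()
                batch.add(event)
        if data_records:
            producer.send_batch(batch)
```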

Azure stream analytics how to create a single parquet file

I have a few IoT devices sending telemetry data to Azure Event Hub. I want to write the data to Parquet files in Azure Data Lake so that I can query them using Azure Synapse.
I have an Azure Function triggered by the Event Hub, but I did not find a way to directly write the data received from a device to Azure Data Lake in Parquet format.
So what I am doing instead is running a Stream Analytics job with input from the Event Hub and output to Azure Data Lake in Parquet format.
I have configured different Stream Analytics output path formats, but they all create multiple small files within folders such as:
device-data/{unitNumber}/
device-data/{unitNumber}/{datetime:MM}/{datetime:dd}
I want to have a single Parquet file per device. Can someone help with this?
I have tried configuring the maximum time, but then the data doesn't get written to the Parquet file until that time has elapsed, which I don't want either.
I want simple functionality: as soon as data is received from the device at the Event Hub, it should be appended to a Parquet file in Azure Data Lake.

How to ingest blobs created by Azure Diagnostics into Azure Data Explorer by subscribing to Event Grid notifications

I want to send Azure Diagnostics to Kusto tables.
The idea is to get logs and metrics from various Azure resources by sending them to a storage account.
I'm following both Ingest blobs into Azure Data Explorer by subscribing to Event Grid notifications and Tutorial: Ingest and query monitoring data in Azure Data Explorer,
trying to use the best of all worlds - cheap intermediate storage for logs, and using EventHub only for notifications about the new blobs.
The problem is that only part of the data is being ingested.
I'm thinking that the problem is the append blobs that monitoring creates: when Kusto receives the "Created" notification, only part of the blob has been written, and the rest of the events are never ingested as the blob keeps being appended to.
My question is: how do I make this scenario work? Is it possible at all, or should I stick with sending logs to Event Hub without using blobs and Event Grid?
Append blobs do not work nicely with Event Grid ADX ingestion, as they generate multiple BlobCreated events.
If you are able to trigger a blob rename once the blob is complete, that would solve the problem.
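Blob Storage has no true rename, so the usual workaround is to copy the completed append blob to a block blob in the container the Event Grid subscription watches, producing a single BlobCreated event for the full content. A minimal sketch, assuming the azure-storage-blob SDK and hypothetical container names:

```python
# Minimal sketch of the "rename on completion" workaround: re-materialize the
# finished append blob as a block blob in the container ADX ingests from.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
SRC_CONTAINER = "insights-logs-raw"    # where diagnostics writes append blobs
DST_CONTAINER = "insights-logs-ready"  # container the Event Grid subscription watches


def finalize(blob_name: str) -> None:
    src = service.get_blob_client(SRC_CONTAINER, blob_name)
    dst = service.get_blob_client(DST_CONTAINER, blob_name)
    data = src.download_blob().readall()   # read the completed append blob
    dst.upload_blob(data)                  # written as a block blob -> one BlobCreated
    src.delete_blob()
```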

Azure solution to save stream to blob files as parquet

I read about a few different Azure services: Event Hubs Capture, Azure Data Factory, Event Hubs, and more. I am trying to find ways, using Azure services, to do the following:
Write data to some "endpoint" or place from my application (preferably an Azure service)
The data would be batched and saved in files to blob storage
Eventually, the format of the blob files should be Parquet
My questions are:
I read that Event Hubs Capture only saves files as Avro, so I might also consider a second pipeline that copies from the original Avro blobs to destination Parquet blobs (sketched below). Is there a service in Azure that can listen to my blob container, convert all files to Parquet, and save them again (I'm not sure from the documentation whether Data Factory can do this)?
What other alternatives would you consider (apart from Kafka, which I know about) to save a stream of data as batches of Parquet files in blob storage?
Thank you!
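A rough sketch of that Avro-to-Parquet "second pipeline", assuming the capture files follow the standard Event Hubs Capture Avro schema (payload in the "Body" field), that the payload itself is JSON, and using fastavro and pyarrow for illustration; the file paths are hypothetical:

```python
# Rough sketch: convert an Event Hubs Capture Avro file into a Parquet file.
import json
import fastavro
import pyarrow as pa
import pyarrow.parquet as pq


def capture_avro_to_parquet(avro_path: str, parquet_path: str) -> None:
    records = []
    with open(avro_path, "rb") as fo:
        for row in fastavro.reader(fo):
            records.append(json.loads(row["Body"]))  # Body holds the original event payload
    if records:
        pq.write_table(pa.Table.from_pylist(records), parquet_path)


# Could be run from an Azure Function or a Data Factory custom activity that
# fires whenever Capture drops a new Avro blob.
capture_avro_to_parquet("capture/0.avro", "out/0.parquet")
```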
For the least amount of effort, you can look into a combination of an Event Hub as your endpoint and Azure Stream Analytics connected to it. Stream Analytics can natively write Parquet to blob storage: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-outputs#blob-storage-and-azure-data-lake-gen2

Stream Analytics: Dynamic output path based on message payload

I am working on an IoT analytics solution which consumes Avro-formatted messages fired at an Azure IoT Hub and (hopefully) uses Stream Analytics to store the messages in Data Lake and blob storage. A key requirement is that the Avro containers must appear exactly the same in storage as they did when presented to the IoT Hub, for the benefit of downstream consumers.
I am running into a limitation in Stream Analytics with granular control over individual file creation. When setting up a new output stream path, I can only provide date/day and hour in the path prefix, resulting in one file for every hour instead of one file for every message received. The customer requires separate blob containers for each device and separate blobs for each event. Similarly, the Data Lake requirement dictates at least a sane naming convention that is delineated by device, with separate files for each event ingested.
Has anyone successfully configured Stream Analytics to create a new file every time it pops a message off of the input? Is this a hard product limitation?
Stream Analytics is indeed oriented for efficient processing of large streams.
For your use case, you need an additional component to implement your custom logic.
Stream Analytics can output to Blob, Event Hub, Table Storage or Service Bus. Another option is to use the new IoT Hub Routes to route directly to an Event Hub or a Service Bus Queue or Topic.
From there you can write an Azure Function (or, from Blob or Table Storage, a custom Data Factory activity) and use the Data Lake Store SDK to write files with the logic that you need.
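A rough sketch of that last hop, assuming an Event Hub-triggered Python Function and the (Gen1) azure-datalake-store package; the service principal settings, store name, folder layout and device-id lookup are all hypothetical:

```python
# Rough sketch: write one Data Lake Store file per event, keyed by device.
import os
import uuid
import azure.functions as func
from azure.datalake.store import core, lib

token = lib.auth(
    tenant_id=os.environ["TENANT_ID"],
    client_id=os.environ["CLIENT_ID"],
    client_secret=os.environ["CLIENT_SECRET"],
)
adls = core.AzureDLFileSystem(token, store_name=os.environ["ADLS_STORE"])


def device_id_for(event: func.EventHubEvent) -> str:
    # In a real function this would come from the message's IoT Hub system
    # properties; hard-coded here to keep the sketch self-contained.
    return "device01"


def main(event: func.EventHubEvent):
    payload = event.get_body()  # the original Avro container bytes, stored as-is
    path = f"/events/{device_id_for(event)}/{uuid.uuid4()}.avro"
    with adls.open(path, "wb") as f:
        f.write(payload)
```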

Resources