Custom output path from Azure Stream Analytics

I have an Event Hub which receives telemetry data from different devices. I created a Stream Analytics job to process this data and output it to various sinks (Power BI, Cosmos DB and Data Lake). While creating the Data Lake output I found that I couldn't set the output path based on the message payload. The path I can set inside the sink is of the format [folder_structure]/{date}{time}. I need a very specific folder structure that depends on the message payload and puts each file in the corresponding location. Is there any way to do that?

This capability is currently available in private preview - for output to blob storage.
https://azure.microsoft.com/en-us/blog/4-new-features-now-available-in-azure-stream-analytics/
If this is something you can use, please provide details at the following URL and we will add you to the preview program.
https://forms.office.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR8zMnUkKzk5Elg9i6hoUmJVUNDhIMjJESFdVNDhRODNMTVZTNDVIR0w2Qi4u

Related

Azure Stream Analytics: how to create a single Parquet file

I have a few IoT devices sending telemetry data to Azure Event Hub. I want to write that data to Parquet files in Azure Data Lake so that I can query it using Azure Synapse.
I have an Azure Function triggered by the Event Hub, but I did not find a way to write the data received from a device directly to Azure Data Lake in Parquet format.
So what I am doing instead is this: I have a Stream Analytics job with input from the Event Hub and output to Azure Data Lake in Parquet format.
I have configured the Stream Analytics output path pattern in different formats, but it creates multiple small files within the following folders:
device-data/{unitNumber}/
device-data/{unitNumber}/{datetime:MM}/{datetime:dd}
I want to have a single Parquet file per device. Can someone help with this?
I have tried configuring the Maximum time setting, but the data won't get written to the Parquet file until that time has elapsed, which I don't want either.
I want simple functionality: as soon as data is received from a device in the Event Hub, it should be appended to the Parquet file in Azure Data Lake.

Data Factory: email errors when found on rows

I'm using the Copy Data task in Data Factory to copy data from CSV files in Azure Files to a SQL Azure DB.
Within the task there is a setting called Fault tolerance, which can be set to skip and log incompatible rows; this writes an error log to Azure Blob Storage.
However, I'd like the errors picked up from the file to be emailed to a user to action, and I'd also like to store the list of errors in a DB rather than in a log file in blob storage.
The feature set of Fault tolerance is fixed, and there is no email alert mechanism in it. However, you could use a workaround to implement your requirements.
Use a blob-triggered Azure Function to monitor the blob path you configured in the fault tolerance settings. Once the error log streams into your blob file, you can collect the log and use an email SDK (for example, the SendGrid service from Microsoft) to send it to the destinations you want.
As for storing the errors in a DB, you could create another triggered function and configure its output as Table Storage.
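As a rough illustration of that workaround, here is a minimal Python sketch of a blob-triggered Azure Function that emails the log via SendGrid and writes each error line to Table Storage. The blob trigger path is configured separately in the function's binding (function.json); the sender and recipient addresses, table name, and app setting names below are placeholders, not anything from the question.

```python
import os
import azure.functions as func
from azure.data.tables import TableServiceClient
from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail


def main(errorlog: func.InputStream):
    """Runs whenever ADF writes a new fault-tolerance log to the monitored path."""
    log_text = errorlog.read().decode("utf-8")

    # 1) Email the skipped-row errors to someone who can action them.
    message = Mail(
        from_email="adf-alerts@example.com",        # placeholder sender
        to_emails="data-team@example.com",          # placeholder recipient
        subject=f"ADF copy activity skipped rows: {errorlog.name}",
        plain_text_content=log_text,
    )
    SendGridAPIClient(os.environ["SENDGRID_API_KEY"]).send(message)

    # 2) Persist each error line to Table Storage instead of a log file in blob.
    tables = TableServiceClient.from_connection_string(os.environ["STORAGE_CONNECTION"])
    table = tables.create_table_if_not_exists("CopyErrors")   # placeholder table name
    for i, line in enumerate(log_text.splitlines()):
        table.create_entity({
            "PartitionKey": errorlog.name.replace("/", "_"),  # blob path as partition
            "RowKey": str(i),
            "Error": line,
        })
```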
Just a reminder: ADF has its own monitoring and alerting mechanism. It applies to all pipelines in ADF, not specifically to the Copy activity. You can get an idea of it from this link.

Not working: Azure Stream Analytics with Blob or Event Hub as input and DocumentDB as output sink

I am using Azure Stream Analytics to stream events from Blob or Event Hubs to DocumentDB. Configuration has been done as per the Microsoft documentation: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-get-started
When I set the input to Event Hubs or Blob in Stream Analytics and the output sink to DocumentDB, I am not able to see any JSON data in Document Explorer.
When I test the Stream Analytics query with an uploaded JSON file, I am getting output in a single line. I followed some links (Azure Stream Analytics is not feeding DocumentDB output sink, and Getting error in Azure Stream Analytics with DocumentDB as sink),
but I am not able to figure out what's wrong.
Are you writing to a partitioned DocumentDB collection? If so, try providing criteria to filter on your partition key. I had a few collections that wouldn't show me documents in Document Explorer unless I provided criteria, even though I could return documents via Query Explorer.
Edit: you can provide these criteria by adding a filter. Click the filter button, then add filter criteria that includes whatever field you use as the partition key. In my case I use deviceID, so my filter is c.deviceID = "SomeDeviceIDThatIWantToFind".
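If it helps to confirm outside the portal that the documents were written, here is a minimal sketch using the Python azure-cosmos SDK. The account endpoint, key, database, and container names are placeholders; deviceID follows the example above.

```python
from azure.cosmos import CosmosClient

# Endpoint, key, database, and container names are placeholders.
client = CosmosClient("https://<your-account>.documents.azure.com:443/",
                      credential="<your-key>")
container = client.get_database_client("telemetry-db").get_container_client("events")

# Filtering on the partition key (deviceID here, as in the filter above)
# targets a single partition, just like the Document Explorer criteria.
items = container.query_items(
    query="SELECT * FROM c WHERE c.deviceID = @id",
    parameters=[{"name": "@id", "value": "SomeDeviceIDThatIWantToFind"}],
    partition_key="SomeDeviceIDThatIWantToFind",
)
for item in items:
    print(item)
```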

Stream Analytics: Dynamic output path based on message payload

I am working on an IoT analytics solution which consumes Avro-formatted messages fired at an Azure IoT Hub and (hopefully) uses Stream Analytics to store messages in Data Lake and blob storage. A key requirement is that the Avro containers must appear exactly the same in storage as they did when presented to the IoT Hub, for the benefit of downstream consumers.
I am running into a limitation in Stream Analytics regarding granular control over individual file creation. When setting up a new output stream path, I can only provide date/day and hour in the path prefix, resulting in one file for every hour instead of one file for every message received. The customer requires separate blob containers for each device and separate blobs for each event. Similarly, the Data Lake requirement dictates at least a sane naming convention delineated by device, with separate files for each event ingested.
Has anyone successfully configured Stream Analytics to create a new file every time it pops a message off of the input? Is this a hard product limitation?
Stream Analytics is indeed oriented toward efficient processing of large streams.
For your use case, you need an additional component to implement your custom logic.
Stream Analytics can output to Blob, Event Hub, Table Storage or Service Bus. Another option is to use the new IoT Hub Routes to route directly to an Event Hub or a Service Bus Queue or Topic.
From there you can write an Azure Function (or, from Blob or Table Storage, a custom Data Factory activity) and use the Data Lake Store SDK to write files with the logic that you need.
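For the Event Hub route, here is a minimal Python sketch of such an Azure Function using the azure-datalake-store SDK (Data Lake Store Gen1). The store name, service principal credentials, and the deviceId payload field are assumptions, and the sketch assumes JSON payloads for simplicity; the Avro messages from the question would need an Avro decoder to extract the device id instead.

```python
import json
import uuid
import azure.functions as func
from azure.datalake.store import core, lib

# Service principal credentials and store name are placeholders.
_token = lib.auth(tenant_id="<tenant-id>",
                  client_id="<app-id>",
                  client_secret="<app-secret>")
_adls = core.AzureDLFileSystem(_token, store_name="<datalake-store-name>")


def main(event: func.EventHubEvent):
    """Write each incoming event to its own file, in a folder per device."""
    body = event.get_body()                          # raw bytes of one message
    payload = json.loads(body)                       # assumes a JSON payload
    device_id = payload.get("deviceId", "unknown")   # hypothetical field name

    # One file per event, partitioned by device -- the per-message logic that
    # the built-in {date}/{time} path prefix cannot express.
    path = f"/device-data/{device_id}/{uuid.uuid4()}.json"
    with _adls.open(path, "wb") as f:
        f.write(body)
```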

Azure Blob storage and Stream Analytics

I read that Azure Blob storage is a nice place to save data for statistics or similar purposes, and that afterwards you can query the blob and show the statistics on a website (dashboard).
But I don't know how to use Stream Analytics for showing statistics. Is there some SDK for creating queries against the blob and generating JSON data? Or ... I don't know.
And I have more questions about it:
How do I save data to the blob (should it be JSON data or something else)? I don't know which data format to use.
How do I use Stream Analytics to query the blob and then get the data for showing in a dashboard?
Maybe you know how to use this technology. Please help me. Thanks, and have a nice day.
#Taras - did you get a chance to toy with the Stream Analytics UI?
When you add a blob input you can either add an entire container, which means Stream Analytics will scan the entire container for new files, or you can specify a path prefix pattern, which will make Stream Analytics look only in that path.
You can also specify tokens such as {date} and {time} in the path prefix pattern to help guide Stream Analytics on which files to read.
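To make the input side concrete, here is a minimal sketch of saving telemetry to blob storage as line-delimited JSON under a {date}/{time}-style folder layout that a path prefix pattern can pick up. It uses the Python azure-storage-blob SDK; the connection string, container name, and telemetry fields are placeholders.

```python
import json
from datetime import datetime, timezone
from azure.storage.blob import BlobServiceClient

# Connection string, container name, and telemetry fields are placeholders.
service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("telemetry")

events = [{"deviceId": "dev-001", "temperature": 21.4},
          {"deviceId": "dev-002", "temperature": 19.8}]

# One JSON object per line, stored under YYYY/MM/DD/HH folders so that a
# path prefix pattern like telemetry/{date}/{time} picks the files up.
now = datetime.now(timezone.utc)
blob_name = f"{now:%Y/%m/%d}/{now:%H}/events-{now:%M%S}.json"
container.upload_blob(name=blob_name,
                      data="\n".join(json.dumps(e) for e in events))
```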
Generally speaking - it is highly recommended to use Event Hub as input for the improved latency.
As for output - you can either use Power BI which would give you an interactive dashboard or you can output to some storage (blob, table, SQL, etc...) and build a dashboard on top of that.
You can also try to do one of the walkthroughs to get a feel for Stream Analytics: https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-twitter-sentiment-analysis-trends/
Thanks!
Ziv.
