Process event files into Azure Event Hub

I am fairly new to Azure.
I have a requirement where the source will send event data in flat files. Each file will contain header and trailer records, with the events as data records. Each file will be about 10 MB in size and can contain about 50,000-60,000 events.
I want to process these files using Python/Scala and send the data to Azure Event Hubs. Can someone suggest whether this is the best approach and how I can achieve it?

It's an architectural question, but you can use either Azure Logic Apps or Azure Functions.
In either case, trigger the processing by uploading the file to Blob Storage; the file then gets picked up, processed, and the events are sent on.
Use Azure Logic Apps if the files are simple to parse, for instance because they are JSON: loop over each event with a for-each and direct it to the Event Hub you want.
If the parsing is more complex, use Azure Functions: write the parsing code and output the events to an Event Hub.
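To make the Functions route concrete, here is a minimal Python sketch, assuming a blob trigger (binding configuration omitted) and the azure-eventhub SDK; the connection string, hub name, and the header/trailer convention are placeholders taken from the question:

```python
import azure.functions as func
from azure.eventhub import EventHubProducerClient, EventData

# Placeholder connection details -- replace with your own settings.
EVENTHUB_CONN_STR = "<event-hub-namespace-connection-string>"
EVENTHUB_NAME = "<event-hub-name>"

def main(inputblob: func.InputStream):
    """Blob-triggered function: parse a flat file and forward its events to Event Hubs."""
    lines = inputblob.read().decode("utf-8").splitlines()

    # Assumption: first line is the header, last line is the trailer,
    # and everything in between is one event per line.
    data_records = lines[1:-1]

    producer = EventHubProducerClient.from_connection_string(
        EVENTHUB_CONN_STR, eventhub_name=EVENTHUB_NAME
    )
    with producer:
        batch = producer.create_batch()
        events_in_batch = 0
        for record in data_records:
            try:
                batch.add(EventData(record))
                events_in_batch += 1
            except ValueError:
                # Current batch is full: send it and start a new one.
                producer.send_batch(batch)
                batch = producer.create_batch()
                batch.add(EventData(record))
                events_in_batch = 1
        if events_in_batch:
            producer.send_batch(batch)
```

Batching the sends keeps each request under Event Hubs' size limit, which matters when a single file carries 50,000-60,000 events.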

Related

Email notification about new files on Azure File share

What are possible ways to implement such a scenario?
I can think of an Azure Function that periodically checks the share for new files. Are there any other possibilities?
I have also been thinking about duplicating the files to Blob Storage and generating the notifications from there.
A storage content trigger is available out of the box for blobs, so if you are open to migrating to Blob Storage you can use a BlobTrigger Azure Function. For triggering on a File Share, these are my suggestions:
A TimerTrigger Azure Function that polls for files added since the previous run (see the sketch below).
A Recurrence trigger in a Logic App that polls and checks for new content.
A continuous WebJob that keeps polling the File Share for new content.
In my opinion, duplicating the files to Blob Storage just to drive the notification is not a great option, because that copy operation itself needs a polling mechanism like the ones above, so it only adds an unnecessary step.
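A minimal sketch of the TimerTrigger option in Python, assuming the azure-storage-file-share SDK; the connection string, share name, and the in-memory way of remembering previously seen files are illustrative placeholders only:

```python
import logging
import azure.functions as func
from azure.storage.fileshare import ShareDirectoryClient

# Placeholder settings -- replace with your own.
STORAGE_CONN_STR = "<storage-account-connection-string>"
SHARE_NAME = "<file-share-name>"

# In a real function this state should live in durable storage
# (e.g. a small blob or table entity), not in a module-level variable.
seen_files = set()

def main(mytimer: func.TimerRequest) -> None:
    """Timer-triggered poll: report files that appeared on the share since the last run."""
    directory = ShareDirectoryClient.from_connection_string(
        STORAGE_CONN_STR, share_name=SHARE_NAME, directory_path=""
    )
    current = {item.name for item in directory.list_directories_and_files()}

    new_files = current - seen_files
    for name in sorted(new_files):
        # Replace the log call with your notification of choice
        # (e.g. an HTTP call to a Logic App or a mail output binding).
        logging.info("New file detected on share: %s", name)

    seen_files.update(current)
```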

Security concerns when uploading data to Azure Blob Storage directly from frontend

I am working on a project where we have to store some audio/video files in Azure Blob Storage, and after a file is uploaded we need to calculate a price based on the length of the file in minutes. We have an Angular frontend, and the idea was to upload the file directly from the frontend, get the response from Azure with the file stats, then call a backend API to put that data in the database.
What I am wondering is: what are the chances of the data being manipulated between getting the file stats back from Azure and calling our backend API? Is there any chance the length could be modified before it is sent to our API?
One possible solution would be to make use of Azure Event Grid with Blob integration. Whenever a blob is uploaded, an event will be raised automatically that you can consume in an Azure Function and save the data in your database.
There's a possibility that a user might re-upload the same file with a different size. If that happens, you will get another event (apart from the original event raised when the blob was first created). How you handle such updates is entirely up to you.
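A minimal sketch of that approach in Python, assuming an Event Grid-triggered function subscribed to BlobCreated events; the save_to_database helper is hypothetical and stands in for your own persistence code:

```python
import azure.functions as func

def main(event: func.EventGridEvent):
    """Handle Microsoft.Storage.BlobCreated events and record trusted blob stats."""
    if event.event_type != "Microsoft.Storage.BlobCreated":
        return

    data = event.get_json()
    blob_url = data["url"]
    size_in_bytes = data.get("contentLength")
    content_type = data.get("contentType")

    # These values come from the Event Grid notification, not from the
    # frontend, so the client has no opportunity to tamper with them.
    save_to_database(blob_url, size_in_bytes, content_type)  # hypothetical helper

def save_to_database(blob_url, size_in_bytes, content_type):
    """Placeholder: persist the values with your own data-access code."""
    ...
```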

Sending excel file from a Blob Storage to a REST Endpoint in Azure Functions with Node

Working on a small personal project where I can drop an .xlsx file into Azure Blob Storage and it will trigger a Node.js blob-trigger function that sends the file to a REST endpoint to be parsed and worked with.
I've been able to set it up so that the file is moved to another blob (I intend to add logic on the HTTP response from the REST endpoint to then archive said file).
I'm not exactly sure how to set up the correct code and bindings to take the ingested .xlsx file and send the whole thing to an endpoint.
Bonus question: is it better practice to zip the file or convert it to binary before sending? Performance isn't a big concern currently.
Thanks for any information or pointers.
The recommended approach is to use the Blob Storage Event Grid integration to fire an Event Grid-triggered Azure Function. Please refer to these, which seem to meet your requirement:
Blob Event and Event Grid Trigger For Azure Function.
Note: the blob trigger of Azure Functions may not be as reliable as the Event Grid trigger for high-volume scenarios.
To answer your bonus question: zipping your .xlsx files won't gain you much, since they are already compressed behind the scenes (.xlsx is itself a zip archive).
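For the code-and-bindings part, here is a minimal sketch in Python rather than Node.js (the pattern is the same): an Event Grid-triggered function downloads the blob named in the event and POSTs it to a REST endpoint. The endpoint URL and the use of DefaultAzureCredential (i.e. the function's managed identity having read access to the container) are assumptions for illustration:

```python
import requests
import azure.functions as func
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobClient

# Placeholder endpoint -- replace with your own REST service.
REST_ENDPOINT = "https://example.com/api/parse-xlsx"

def main(event: func.EventGridEvent):
    """Download the newly created .xlsx blob and forward it to a REST endpoint."""
    data = event.get_json()
    blob_url = data["url"]

    if not blob_url.lower().endswith(".xlsx"):
        return

    # Assumes the function's identity can read the source container.
    blob = BlobClient.from_blob_url(blob_url, credential=DefaultAzureCredential())
    content = blob.download_blob().readall()

    response = requests.post(
        REST_ENDPOINT,
        data=content,
        headers={
            "Content-Type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
        },
        timeout=60,
    )
    response.raise_for_status()
    # On success you could archive the source blob here, as described above.
```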

Azure Data Factory Lookup Source Data and Mail Notification

I am trying my best to solve the following scenario.
I am using PowerShell scripts to collect information about my server environments and saving it as .csv files.
The .csv files contain information about hardware, running services, etc.
I am sending these .csv files to Blob Storage and using Azure Data Factory V2 pipelines to write the information into Azure SQL. I have successfully configured a mail notification via Azure Logic Apps that tells me whether the pipeline run was successful or unsuccessful.
Now I am trying to look into the source data to find a specific column. In my scenario it is a column named after a Windows service, for example column: PrintSpooler, row: Running.
So I need to look up that specific column and send a mail notification depending on whether the service is running or stopped.
Is there any way to do that?
Ideally I want to receive a mail only when the service in my source data is stopped.
Thank you for any ideas.
Do you update the existing .csv file or upload a new .csv file?
If you upload a new .csv, then you can use an Azure Function with a blob trigger.
The trigger fires for the newly uploaded blob, so you can process that blob, read the data in the .csv file, and create an alert to your email.
This is the official documentation for timer-triggered Azure Functions:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-create-scheduled-function
In the blob trigger, you can check whether the value you are looking for is present in the .csv file and then use an output binding to send the notification.
You will then get an alert in your email whenever the data in the .csv file meets your condition.
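A minimal sketch of that blob-triggered check in Python, assuming the .csv has one row per server with a PrintSpooler column; the send_alert helper is a stand-in for whatever notification output you wire up (a mail output binding, a Logic App call, etc.):

```python
import csv
import io
import logging
import azure.functions as func

def main(inputblob: func.InputStream):
    """Blob-triggered function: alert when the PrintSpooler service is not running."""
    text = inputblob.read().decode("utf-8")
    reader = csv.DictReader(io.StringIO(text))

    for row in reader:
        # Assumption: each row holds one server and a 'PrintSpooler' column
        # whose value is the service state, e.g. 'Running' or 'Stopped'.
        state = (row.get("PrintSpooler") or "").strip()
        if state and state.lower() != "running":
            send_alert(inputblob.name, row, state)

def send_alert(blob_name, row, state):
    """Placeholder: replace with your mail output binding or Logic App call."""
    logging.warning("PrintSpooler is %s in %s: %r", state, blob_name, row)
```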

Stream Analytics: Dynamic output path based on message payload

I am working on an IoT analytics solution which consumes Avro formatted messages fired at an Azure IoT Hub and (hopefully) uses Stream Analytics to store messages in Data Lake and blob storage. A key requirement is the Avro containers must appear exactly the same in storage as they did when presented to the IoT Hub, for the benefit of downstream consumers.
I am running into a Stream Analytics limitation around granular control over individual file creation. When setting up a new output stream path, I can only provide date/day and hour in the path prefix, resulting in one file for every hour instead of one file for every message received. The customer requires separate blob containers for each device and separate blobs for each event. Similarly, the Data Lake requirement dictates at least a sane naming convention delineated by device, with separate files for each event ingested.
Has anyone successfully configured Stream Analytics to create a new file every time it pops a message off of the input? Is this a hard product limitation?
Stream Analytics is indeed oriented toward efficient processing of large streams.
For your use case, you need an additional component to implement your custom logic.
Stream Analytics can output to Blob Storage, Event Hubs, Table storage, or Service Bus. Another option is to use the new IoT Hub routes to route directly to an Event Hub or a Service Bus queue or topic.
From there you can write an Azure Function (or, from Blob or Table Storage, a custom Data Factory activity) and use the Data Lake Store SDK to write files with the logic that you need.
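A minimal sketch of that custom step, assuming the messages are routed to an Event Hub and picked up by an Event Hub-triggered Python function that writes one file per event under a per-device path. It uses the Blob Storage SDK (which also works against Data Lake Storage Gen2) rather than the Data Lake Store SDK mentioned above, and the container name, connection string, and extract_device_id helper are illustrative assumptions:

```python
import uuid
import azure.functions as func
from azure.storage.blob import BlobServiceClient

# Placeholder settings -- replace with your own.
STORAGE_CONN_STR = "<storage-account-connection-string>"
CONTAINER = "devicetelemetry"

blob_service = BlobServiceClient.from_connection_string(STORAGE_CONN_STR)

def main(event: func.EventHubEvent):
    """Write each incoming message as its own blob, partitioned by device id."""
    body = event.get_body()  # the raw Avro payload, passed through untouched

    device_id = extract_device_id(event)
    blob_name = f"{device_id}/{uuid.uuid4()}.avro"
    blob_service.get_blob_client(container=CONTAINER, blob=blob_name).upload_blob(body)

def extract_device_id(event: func.EventHubEvent) -> str:
    """Placeholder: pull the device id from the event's system or application
    properties (e.g. iothub-connection-device-id); how that metadata arrives
    depends on how the messages are routed in your setup."""
    return "unknown-device"
```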
