I'm a newbie to .NET and Azure. I'm trying to design a pipeline to process files created in an Azure Storage container. I have created an Event Grid blob trigger on the container to get the metadata of the created files. I have two options now:

1. Use an Azure Function to consume the metadata from Event Grid and process the file. I believe the Azure Function can scale out based on the Event Grid traffic, but the files could be large, up to 60 GB in size, and I have read that Azure Functions are not ideal for long-running processing. Does an Azure Function work for my case?
2. Use a Storage queue to consume the metadata from Event Grid, and create an application that consumes from the queue and processes the files.

Please suggest what kind of application I could develop and deploy so that it scales out/in based on the queue traffic and processes large blob files efficiently.
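To make option 2 concrete, this is a rough sketch of the kind of worker I have in mind (the queue name, app setting, and message format are placeholders, and I'm assuming the queue message carries the blob URL from the Event Grid event):

```csharp
using System;
using System.Threading.Tasks;
using Azure.Identity;
using Azure.Storage.Blobs;
using Azure.Storage.Queues;

// Console worker: poll the queue, stream each blob, delete the message when done.
string connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION");
var queue = new QueueClient(connectionString, "file-events");

while (true)
{
    // Keep the message invisible long enough to finish a large file.
    var response = await queue.ReceiveMessageAsync(visibilityTimeout: TimeSpan.FromMinutes(60));
    if (response.Value is null)
    {
        await Task.Delay(TimeSpan.FromSeconds(10));
        continue;
    }

    // Assumption: the message body is the blob URL taken from the Event Grid payload.
    var blob = new BlobClient(new Uri(response.Value.Body.ToString()), new DefaultAzureCredential());

    // Stream the blob instead of loading 60 GB into memory.
    using (var stream = await blob.OpenReadAsync())
    {
        // ... process the stream ...
    }

    await queue.DeleteMessageAsync(response.Value.MessageId, response.Value.PopReceipt);
}
```

Something like this could run on a platform that scales instances on queue length, which is what I mean by scale out/in.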
I am working on a project where I need to consume the entries of a Storage queue from an Azure Data Factory pipeline.
Files will be uploaded to Blob Storage, which triggers an Azure Function. This Azure Function writes into a Storage queue. Now I want to consume the entries of this Storage queue. Since the Storage queue provides a REST API for consuming data, I could use a web client in Azure Data Factory scheduled to run every few minutes, but I would prefer a more direct way, so that my pipeline starts as soon as the Storage queue has been filled.
I am quite new to the Azure world, so I am searching for a solution. Is there a way to subscribe to the Storage queue? I can see that there is the possibility to create custom triggers in Data Factory; how can I connect to a Storage queue there? Or is there another way?
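For context, the Azure Function mentioned above is essentially this (a simplified sketch; the container, queue, and connection names are placeholders):

```csharp
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class EnqueueUploadedFile
{
    // Blob-triggered function that writes the uploaded file's name to a Storage queue.
    [FunctionName("EnqueueUploadedFile")]
    [return: Queue("uploaded-files", Connection = "AzureWebJobsStorage")]
    public static string Run(
        [BlobTrigger("uploads/{name}", Connection = "AzureWebJobsStorage")] Stream blob,
        string name,
        ILogger log)
    {
        log.LogInformation($"Enqueueing message for blob {name}");
        return name; // the return value becomes the queue message
    }
}
```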
Thank you @Scott Mildenberger for pointing me in the right direction. After taking your inputs and reproducing this on my end, it worked when I used the queue trigger called "When a specified number of messages are in a given queue (V2)", where we can specify the threshold of the queue at which the flow gets triggered. Below is the flow of my Logic App.
RESULTS: (screenshots of the run in the Logic App and of the triggered pipeline in ADF)
I want to send Azure Diagnostics to Kusto tables.
The idea is to get logs and metrics from various Azure resources by sending them to a storage account.
I'm following both "Ingest blobs into Azure Data Explorer by subscribing to Event Grid notifications" and "Tutorial: Ingest and query monitoring data in Azure Data Explorer",
trying to get the best of both worlds: cheap intermediate storage for the logs, with Event Hub used only for notifications about the new blobs.
The problem is that only part of the data is being ingested.
I think the problem is the append blobs that monitoring creates: when Kusto receives the "Created" notification, only part of the blob has been written, and the rest of the events are never ingested as the blob is appended to.
My question is: how do I make this scenario work? Is it possible at all, or should I stick with sending logs to Event Hub without using the blobs with Event Grid?
Append blobs do not work nicely with Event Grid ADX ingestion, as they generate multiple BlobCreated events.
If you are able to cause a blob rename on update completion, that would solve the problem.
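One way to approximate that "rename on completion", assuming you can tell when monitoring has finished writing a blob (the container and blob names below are purely illustrative), is a server-side copy of the finished blob into a separate container that the ADX Event Grid data connection watches, so that a single BlobCreated event fires for the complete content:

```csharp
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

class CopyCompletedLog
{
    static async Task Main()
    {
        var service = new BlobServiceClient(Environment.GetEnvironmentVariable("STORAGE_CONNECTION"));

        // Source: an append blob that diagnostics has finished writing (e.g. the previous hour's log).
        var source = service.GetBlobContainerClient("insights-logs-raw").GetBlobClient("PT1H.json");

        // Destination: a container the ADX Event Grid data connection is subscribed to.
        var target = service.GetBlobContainerClient("adx-ready").GetBlobClient("PT1H.json");

        // Server-side copy; the destination shows up as one complete blob with one BlobCreated event.
        var copy = await target.StartCopyFromUriAsync(source.Uri);
        await copy.WaitForCompletionAsync();
    }
}
```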
I have an Azure Function which uses the webhook bindings to be triggered by each upload or modification of a blob in an Azure Storage container.
This seems to work fine on an empty test container, i.e. when uploading the first blob or modifying one of two or three blobs in the test container.
However, when I point it towards a container with approximately a million blobs, it receives a continuous stream of historical blob events.
I've read that "if the blob container being monitored contains more than 10,000 blobs, the Functions runtime scans log files to watch for new or changed blobs" [source].
Is there any way I can ignore these historical events and consider only current events?
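The only workaround I can think of so far is to check each blob's LastModified timestamp inside the function and return early for anything older than a cutoff, roughly like this (a sketch only; the container name and cutoff date are placeholders, and it assumes the newer Storage blobs binding extension that can bind the trigger to a BlobClient):

```csharp
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ProcessRecentBlobs
{
    // Placeholder cutoff: anything modified before this is treated as historical and skipped.
    private static readonly DateTimeOffset Cutoff = new DateTimeOffset(2018, 1, 1, 0, 0, 0, TimeSpan.Zero);

    [FunctionName("ProcessRecentBlobs")]
    public static async Task Run(
        [BlobTrigger("mycontainer/{name}")] BlobClient blob, // container name is a placeholder
        string name,
        ILogger log)
    {
        var properties = await blob.GetPropertiesAsync();
        if (properties.Value.LastModified < Cutoff)
        {
            log.LogInformation($"Skipping historical blob {name}");
            return;
        }

        // ... handle the genuinely new or changed blob here ...
    }
}
```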
I have a use case where I'd like to launch a job on an Azure Container Service cluster to process a file being uploaded to Blob storage. I know that I can trigger an Azure Functions instance from the upload, but I haven't been able to find examples in the documentation of starting a job within Functions.
This diagram illustrates the AWS equivalent of what I want:
Thanks!
The Azure Event Grid feature is what you need. It is still in preview, but you can subscribe to the Blob Created event. You can set the subscriber endpoint to an Azure Function that puts a message in a queue to trigger your job, or you can expose a service on your cluster that will accept the request and do whatever you need done.
Microsoft provides a guide at https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-event-quickstart?toc=%2fazure%2fevent-grid%2ftoc.json#create-a-message-endpoint
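A minimal sketch of the first option, an Event Grid triggered function that just drops a message on a queue for the job runner (the function name, queue name, and connection setting are placeholders):

```csharp
using Azure.Messaging.EventGrid;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class BlobCreatedSubscriber
{
    // Event Grid delivers the Blob Created event here; the output binding writes
    // a message to the queue that the job runner on the cluster watches.
    [FunctionName("BlobCreatedSubscriber")]
    public static void Run(
        [EventGridTrigger] EventGridEvent eventGridEvent,
        [Queue("job-requests", Connection = "AzureWebJobsStorage")] out string jobMessage,
        ILogger log)
    {
        log.LogInformation($"Received {eventGridEvent.EventType} for {eventGridEvent.Subject}");
        jobMessage = eventGridEvent.Subject; // the path of the created blob
    }
}
```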
We have a very large blob container containing block blobs which we encrypted using our own encryption.
We now wish to move all these blobs to a new storage container that will use Azure's encryption at rest.
The only way I can think of is to write a worker role that downloads each blob to a stream, decrypts it, and uploads it. One at a time...
That will probably take a while. Are there any other, faster ways one can think of? Is there a way to parallelize this?
As David Makogon mentioned, there are lots of ways to copy blobs.

"Is there a way to parallelize this?"

Based on my experience, we could use a WebJob queue trigger to do that. We could list the blob names in the container and write the blob URLs to a Storage queue or Service Bus queue. In the WebJob, with your own logic, download each blob, strip your encryption, and upload the result to the other container. Then we could scale out the App Service plan or use multiple WebJobs to do that.
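A rough sketch of that WebJob function (queue and container names are placeholders, and Decrypt stands in for your own decryption logic):

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public class Functions
{
    // Triggered once per queued blob name; several WebJob instances can drain
    // the queue in parallel, which is where the scale-out comes from.
    public static async Task MigrateBlobAsync(
        [QueueTrigger("blobs-to-migrate")] string blobName,
        [Blob("encrypted-container/{queueTrigger}", FileAccess.Read)] Stream source,
        [Blob("plain-container/{queueTrigger}", FileAccess.Write)] Stream destination,
        ILogger log)
    {
        log.LogInformation($"Migrating {blobName}");
        using (var decrypted = Decrypt(source))       // your existing decryption
        {
            await decrypted.CopyToAsync(destination); // re-uploaded; Azure encrypts it at rest
        }
    }

    private static Stream Decrypt(Stream encrypted)
    {
        // Placeholder for your own decryption; should return the plaintext stream.
        return encrypted;
    }
}
```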