Trigger Azure Data Factory pipeline on blob upload to ADLS Gen2 (programmatically)

We are uploading files into Azure Data Lake Storage using the Azure SDK for Java. After a file is uploaded, an Azure Data Factory pipeline needs to be triggered; a Blob Created event trigger is configured on the pipeline.
The main problem is that after each file upload the trigger fires twice.
To upload a file into ADLS Gen2, Azure provides a different SDK than the one for conventional blob storage.
The SDK uses the package azure-storage-file-datalake. The calls involved (sketched below) are:
DataLakeFileSystemClient - to get the container
DataLakeDirectoryClient.createFile - to create a file // this call may be raising a Blob Created event
DataLakeFileClient.uploadFromFile - to upload the file // this call may also be raising a Blob Created event
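For context, here is a minimal sketch of that upload flow with the Java azure-storage-file-datalake SDK; the account, key, container, directory and file paths are placeholders, and the explicit createFile followed by uploadFromFile simply mirrors the sequence described above:

import com.azure.storage.common.StorageSharedKeyCredential;
import com.azure.storage.file.datalake.DataLakeDirectoryClient;
import com.azure.storage.file.datalake.DataLakeFileClient;
import com.azure.storage.file.datalake.DataLakeFileSystemClient;
import com.azure.storage.file.datalake.DataLakeServiceClient;
import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;

public class AdlsGen2UploadSketch {
    public static void main(String[] args) {
        // Placeholder account details for illustration only.
        DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder()
                .endpoint("https://<account>.dfs.core.windows.net")
                .credential(new StorageSharedKeyCredential("<account>", "<account-key>"))
                .buildClient();

        // DataLakeFileSystemClient - get the container (file system).
        DataLakeFileSystemClient fileSystem = serviceClient.getFileSystemClient("data");
        DataLakeDirectoryClient directory = fileSystem.getDirectoryClient("out");

        // DataLakeDirectoryClient.createFile - creates an empty file
        // (this call may already raise a Blob Created event).
        DataLakeFileClient file = directory.createFile("file.tar.gz");

        // DataLakeFileClient.uploadFromFile - uploads the content
        // (this call may raise a second Blob Created event);
        // overwrite = true because createFile already created the empty file.
        file.uploadFromFile("/local/path/file.tar.gz", true);
    }
}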
I think the ADF trigger has not been upgraded to capture Blob Created events from ADLS Gen2 appropriately.
Is there any option to achieve this? There are restrictions in my organization against using Azure Functions; otherwise, an Azure Function could be triggered by a Storage Queue or Service Bus message and the ADF pipeline could be started using the Data Factory REST API (sketched below).
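For reference, starting the pipeline through the Data Factory REST API mentioned above amounts to a single authenticated POST to the createRun endpoint; a minimal sketch, where the subscription, resource group, factory and pipeline names are placeholders and an AAD access token for https://management.azure.com/ is assumed to be obtained elsewhere (for example via azure-identity):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StartAdfPipelineRun {
    public static void main(String[] args) throws Exception {
        // Placeholder resource names for illustration only.
        String url = "https://management.azure.com/subscriptions/<subscription-id>"
                + "/resourceGroups/<resource-group>/providers/Microsoft.DataFactory"
                + "/factories/<factory-name>/pipelines/<pipeline-name>/createRun"
                + "?api-version=2018-06-01";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Authorization", "Bearer " + "<aad-access-token>")
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{}")) // optional pipeline parameters go here
                .build();

        // The response body contains the runId of the started pipeline run.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}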

You could try Azure Logic Apps with a blob trigger and a Data Factory action:
Trigger: When a blob is added or modified (properties only)
This operation triggers a flow when one or more blobs are added or modified in a container. The trigger only fetches the file metadata; to get the file content, you can use the "Get file content" operation. The trigger does not fire if a file is added or updated in a subfolder; if it is required to trigger on subfolders, multiple triggers should be created.
Action: Get a pipeline run
Gets a particular pipeline run execution.
Hope this helps.
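The "Get a pipeline run" action corresponds to the Data Factory "Pipeline Runs - Get" REST operation; roughly, with placeholder names, a run id obtained from createRun, and a pre-acquired AAD token:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GetAdfPipelineRun {
    public static void main(String[] args) throws Exception {
        // Placeholder resource names; <run-id> comes from the createRun response.
        String url = "https://management.azure.com/subscriptions/<subscription-id>"
                + "/resourceGroups/<resource-group>/providers/Microsoft.DataFactory"
                + "/factories/<factory-name>/pipelineruns/<run-id>"
                + "?api-version=2018-06-01";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Authorization", "Bearer " + "<aad-access-token>")
                .GET()
                .build();

        // The response body contains the run status, start/end times, and parameters.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}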

Related

Azure blob trigger fired once for multiple file uploads to Azure Blob Storage

I need help with the following scenario.
I have set up an Azure blob trigger on one of my Azure Blob Storage accounts.
Now suppose I upload 3 files from Media Shuttle to that blob storage, which has the blob trigger set up on it. What I want is to run some logic only when the blob trigger function is called for the last (3rd) file; for the 1st and 2nd files the blob trigger function should still be called, but it should not execute the logic that is supposed to run for the last (3rd) file.
So basically we need to maintain a count of the total uploads and a count of how many times the blob trigger function has been called, compare the two, and run the final logic when the condition is satisfied, but I am not aware of how to do that.
I'm using .NET Core to write the Azure blob trigger function.

Azure Synapse Pipeline Execution based on file copy in DataLake

I want to execute an Azure Synapse pipeline whenever a file is copied into a folder in the data lake.
Can we do that, and how can we achieve it?
You can trigger a pipeline (start a pipeline execution) when a file is copied to a data lake folder by using a storage event trigger. A storage event trigger starts the pipeline execution based on a selected storage event.
You can follow the steps below to create a storage event trigger.
Assuming you have a pipeline named ‘pipeline1’ in Azure Synapse which you want to execute when a file is copied to the data lake folder, click on Trigger and select New/Edit.
Choose a new trigger. Select the trigger type Storage events and specify the details of the data lake storage on which the trigger should fire when a file is copied into it. Specify the container name, ‘Blob path begins with’ and ‘Blob path ends with’ according to your data lake directory structure and the type of files.
Since you need to start the pipeline when a blob file appears in the data lake folder, check the Blob created event. Check the option to start the trigger, complete creating the trigger, and publish it.
These steps create a storage event trigger for your pipeline on the data lake storage. As soon as files are uploaded or copied to the specified directory of the data lake container, the pipeline execution will start, and you can work on further steps. You can refer to the following document to understand more about event triggers.
https://learn.microsoft.com/en-us/azure/data-factory/how-to-create-event-trigger?tabs=data-factory

Azure Logic App action to run a pipeline in Azure Data Factory

I created an Azure Logic App for running actions from my application.
My requirement is to upload the blob from Azure Blob Storage to a shared folder; for that, my team created an Azure Data Factory pipeline.
From the Logic App designer I am running the trigger when the blob is added or modified, and then I need to run the pipeline in Azure Data Factory.
When I run it, the trigger fires successfully, but the run stops at the blob trigger and never reaches the second action.
Can you please give me guidance on how to resolve this issue?
Azure Data Factory provides a blob event trigger which works automatically: when a new blob is added, your pipeline will run.
Data Factory Event Trigger
In this case you don't need a separate Logic App for the triggering event.

How to get a CSV file from Azure Blob Storage and ingest it into Azure Event Hub using a Logic App?

I have many CSV files stored in an Azure Blob Storage container. I need to get those files from blob storage and push them into Azure Event Hub using an Azure Logic App.
Scenarios:
If any new CSV file is added to the storage container, only that new file should be fetched from the blob storage and pushed to the event hub.
If any old file is updated, only those files and any newly added files should be fetched from the blob storage by the Azure Logic App.
You can use "When a blob is added or modified (properties only)" as the trigger.
Then use "Get blob content" to get the content of your blob and, within a For each loop, use the "Send event" action.
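If it helps to see what those two steps do outside the designer, here is a rough programmatic equivalent of the "Get blob content" and "Send event" actions using the Java Storage Blob and Event Hubs SDKs; the connection strings, container name and blob name are placeholders:

import com.azure.messaging.eventhubs.EventData;
import com.azure.messaging.eventhubs.EventHubClientBuilder;
import com.azure.messaging.eventhubs.EventHubProducerClient;
import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobServiceClientBuilder;
import java.util.Collections;

public class CsvBlobToEventHubSketch {
    public static void main(String[] args) {
        // Placeholder connection strings and names for illustration only.
        BlobContainerClient container = new BlobServiceClientBuilder()
                .connectionString("<storage-connection-string>")
                .buildClient()
                .getBlobContainerClient("csv-container");

        EventHubProducerClient producer = new EventHubClientBuilder()
                .connectionString("<event-hub-connection-string>", "<event-hub-name>")
                .buildProducerClient();

        // "Get blob content": download the new or updated CSV file.
        BlobClient blob = container.getBlobClient("incoming/sample.csv");
        byte[] csvBytes = blob.downloadContent().toBytes();

        // "Send event": push the CSV payload to the event hub.
        producer.send(Collections.singletonList(new EventData(csvBytes)));
        producer.close();
    }
}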

Duplicate Blob Created Events When Writing to Azure Blob Storage from Azure Databricks

We are using an Azure Storage Account (Blob, StorageV2) with a single container in it. We are also using Azure Data Factory to trigger data copy pipelines from blobs (.tar.gz) created in the container. The trigger works fine when creating the blobs from an Azure App Service or by manually uploading via the Azure Storage Explorer. But when creating the blob from a Notebook on Azure Databricks, we get two (2) events for every blob created (same parameters for both events). The code for creating the blob from the notebook resembles:
dbutils.fs.cp(
  "/mnt/data/tmp/file.tar.gz",
  "/mnt/data/out/file.tar.gz"
)
The tmp folder is just used to assemble the package, and the event trigger is attached to the out folder. We also tried dbutils.fs.mv, but with the same result. The trigger rules in Azure Data Factory are:
Blob path begins with: out/
Blob path ends with: .tar.gz
The container name is data.
We did find some similar posts relating to zero-length files, but at least we can't see any such files anywhere (in case they are some kind of by-product of dbutils).
As mentioned, just manually uploading file.tar.gz works fine - a single event is triggered.
We had to revert to uploading the files from Databricks to Blob Storage using the azure-storage-blob library (sketched below). Kind of a bummer, but it now works as expected. Just in case anyone else runs into this.
More information:
https://learn.microsoft.com/en-gb/azure/storage/blobs/storage-quickstart-blobs-python
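For reference, a minimal sketch of that direct-upload approach with the azure-storage-blob SDK, shown in Java for consistency with the rest of the page (the linked quickstart covers the Python client); the connection string, container name and the /dbfs/... local path are placeholders:

import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobServiceClientBuilder;

public class DirectBlobUploadSketch {
    public static void main(String[] args) {
        // Placeholder connection string and names for illustration only.
        BlobContainerClient container = new BlobServiceClientBuilder()
                .connectionString("<storage-connection-string>")
                .buildClient()
                .getBlobContainerClient("data");

        // A single upload call to out/file.tar.gz, the intent being that
        // only one Blob Created event is raised for the finished blob.
        BlobClient blob = container.getBlobClient("out/file.tar.gz");
        blob.uploadFromFile("/dbfs/mnt/data/tmp/file.tar.gz", true); // overwrite = true
    }
}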
