Azure Data Factory end time trigger

I have a scenario: an ADF instance named XYZ contains one pipeline with a schedule trigger that starts at 12:00 AM. The run sometimes finishes within 1 hour and sometimes takes more than 2 hours, depending on the data load.
I have another ADF instance, ABC, which also contains one pipeline. My requirement is to schedule the ABC pipeline so that it runs only after the XYZ trigger has completed.
Kindly help with this requirement. The two pipelines live in different ADF instances, and the trigger end time varies with load.

The simplest way is to use a Logic App. In the Logic App designer, we can create two pipeline-run steps that trigger the two pipelines in the different Data Factories.
Create a Recurrence trigger to run this Logic App on a schedule.
From the Azure Data Factory operations, select the Create a pipeline run action.
In summary: the Logic App triggers the pipeline run in the ADF instance XYZ, waits for it to complete, and then triggers the pipeline run in the ADF instance ABC.
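If you prefer code over the designer, the same run-and-wait chaining can be sketched with the azure-mgmt-datafactory Python SDK. This is only an illustration of the logic, not the Logic App itself; the subscription, resource group, factory, and pipeline names are placeholders:

```python
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"  # placeholder
client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

def run_and_wait(resource_group, factory, pipeline, poll_seconds=60):
    """Start a pipeline run and block until it reaches a terminal state."""
    run = client.pipelines.create_run(resource_group, factory, pipeline)
    while True:
        status = client.pipeline_runs.get(resource_group, factory, run.run_id).status
        if status in ("Succeeded", "Failed", "Cancelled"):
            return status
        time.sleep(poll_seconds)

# Run XYZ's pipeline first; only start ABC's pipeline if XYZ succeeded.
if run_and_wait("rg-xyz", "XYZ", "nightly-load") == "Succeeded":
    client.pipelines.create_run("rg-abc", "ABC", "downstream-pipeline")
```

Polling each minute mirrors what the Logic App's "until completed" loop does; the trade-off is that whatever hosts this script must stay alive for the whole XYZ run.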

Related

How to set the dependency between two triggered pipelines in Azure Synapse Analytics

I have two triggered Synapse pipelines, one of which is scheduled at 3 AM CST. What I'm looking for is for the second pipeline to trigger after the completion of the first, i.e. after 3 AM CST.
Is there a way to create this dependency in Synapse? If yes, please suggest.
There are three options:
1. Create an event trigger for the 2nd pipeline and add a copy-file activity at the end of the 1st pipeline. Whenever the 1st pipeline completes, it generates a file, which triggers the 2nd pipeline.
2. Use an Execute Pipeline activity at the end of the 1st pipeline to trigger the 2nd pipeline (you can even use a Web activity, but that takes additional effort).
3. Create a tumbling window trigger for both pipelines. While creating the tumbling window trigger for the second pipeline, add a dependency trigger under the Advanced section and select the pipeline 1 trigger. The second trigger then runs only upon completion of the dependency trigger.
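For reference, the same dependency that option 3 configures in the UI can be created programmatically. A minimal sketch with the azure-mgmt-datafactory Python SDK (Synapse has its own artifacts SDK, but the trigger shape is the same; all names here are placeholders):

```python
from datetime import datetime

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineReference, TriggerPipelineReference, TriggerReference,
    TriggerResource, TumblingWindowTrigger,
    TumblingWindowTriggerDependencyReference,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

trigger2 = TumblingWindowTrigger(
    pipeline=TriggerPipelineReference(
        pipeline_reference=PipelineReference(reference_name="Pipeline2")),
    frequency="Hour", interval=24,           # one window per day
    start_time=datetime(2023, 1, 1, 3, 0),   # windows aligned to 3 AM
    max_concurrency=1,
    # The dependency: trigger2's window only fires after Trigger_Pipeline1's
    # matching window has completed.
    depends_on=[TumblingWindowTriggerDependencyReference(
        reference_trigger=TriggerReference(reference_name="Trigger_Pipeline1"))],
)
client.triggers.create_or_update(
    "<resource-group>", "<factory-name>", "Trigger_Pipeline2",
    TriggerResource(properties=trigger2))
```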

How to stop a trigger if a pipeline fails in Azure Data Factory

I have a pipeline that needs to run every hour, using a tumbling window trigger.
If the pipeline succeeds, it should continue to run every hour.
If the pipeline fails for some reason, the next instance should not run, i.e. the pipeline should not run for the next hour.
How can we stop the trigger if the pipeline fails?
Currently, this feature is not available in Azure Data Factory. You can raise a feature request through the Azure Data Factory feedback channel.
Alternatively, you can add a Web activity to your pipeline that runs upon the failure of the previous activity. In the Web activity, you can make an HTTP request to stop the trigger.
Refer to this document on how to stop a trigger.
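The Web activity's call is a single management-plane POST. A minimal sketch of the equivalent request from Python, assuming the caller (e.g. the factory's managed identity) has permission to manage the factory; the subscription, group, factory, and trigger names are placeholders:

```python
import requests
from azure.identity import DefaultAzureCredential

SUB, RG, FACTORY, TRIGGER = "<sub-id>", "<rg>", "<factory>", "<hourly-trigger>"

# Acquire an ARM token; in the Web activity you would instead pick
# "System Assigned Managed Identity" with resource https://management.azure.com/
token = DefaultAzureCredential().get_token(
    "https://management.azure.com/.default").token

url = (f"https://management.azure.com/subscriptions/{SUB}/resourceGroups/{RG}"
       f"/providers/Microsoft.DataFactory/factories/{FACTORY}"
       f"/triggers/{TRIGGER}/stop?api-version=2018-06-01")
resp = requests.post(url, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()  # 200 means the trigger was stopped
```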

How to Trigger ADF Pipeline from Synapse Pipelines

Problem
Due to internal requirements, I need to run a Synapse pipeline and then trigger an ADF pipeline. It does not seem that there is a Microsoft-approved method of doing this. The pipelines run infrequently (every week or month) and the ADF pipeline must run after the Synapse pipeline.
Options
It seems that other answers pose several options:
Azure Functions. Create an Azure function that calls the CreatePipelineRun function on the ADF pipeline. At the end of the Synapse pipeline, insert a block that calls the Azure function.
Use the REST API and Web Activity. Use the REST API to make a call to run the ADF pipeline. Insert a Web Activity block at the end of the Synapse pipeline to make the API call (a minimal sketch of this call follows the list).
Tables and polling. Insert a record into a table in a managed database with data about the Synapse pipeline run. Have regular polling from the ADF pipeline to check for new records and run when ready.
Storage Event. Create a timestamped blob file at the end of the Synapse run. Use the "storage event trigger" within ADF to trigger the ADF pipeline.
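For reference, option 2 boils down to one REST call. A minimal Python sketch of that call, with all names as placeholders (in the Web activity itself you would typically authenticate with the Synapse workspace's managed identity instead):

```python
import requests
from azure.identity import DefaultAzureCredential

SUB, RG, FACTORY, PIPELINE = "<sub-id>", "<rg>", "<adf-name>", "<pipeline>"

token = DefaultAzureCredential().get_token(
    "https://management.azure.com/.default").token

# CreatePipelineRun on the target ADF pipeline; the empty JSON body can
# instead carry pipeline parameters as key/value pairs.
url = (f"https://management.azure.com/subscriptions/{SUB}/resourceGroups/{RG}"
       f"/providers/Microsoft.DataFactory/factories/{FACTORY}"
       f"/pipelines/{PIPELINE}/createRun?api-version=2018-06-01")
resp = requests.post(url, headers={"Authorization": f"Bearer {token}"}, json={})
resp.raise_for_status()
print(resp.json()["runId"])  # id of the new ADF pipeline run
```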
Question
Which of these would be closest to the "approved" option? Are there any clear disadvantages to any of these?
As you mentioned, there is no "approved" solution for this problem. All the approaches you mentioned have pros and cons and should work. For me, Option #3 has been very successful.
We have built a Queue Manager based on Tables & Stored Procedures in Azure SQL. We use Logic Apps to process the Triggers, which can be Scheduled, Blob Events, or REST calls. Those Logic Apps insert jobs into the Queue table via a Stored Procedure. That Stored Procedure can be called directly by virtually any system, so your Synapse pipeline could insert a Queue job to execute the ADF pipeline. Other benefits include a log of all the pipeline runs, support for multiple Data Factories (and now Synapse Workspaces), and a web interface we wrapped around the database for management and tracking.
We have 2 other Logic Apps that process the Queue (a Status manager and an Executor). These run constantly (every 1 minute and every 3 minutes). The actions to check status and create pipeline runs are both implemented as .NET Azure Functions [you'll need different SDKs for Synapse vs. ADF]. This system runs thousands of pipelines a month, sometimes more, across numerous Data Factories and Synapse Workspaces.
The PROs here are many, but this disconnected approach permits facets of your system to operate in isolation. And it is flexible, in that you can tie virtually any system into the Queue. Your example of a pipeline that needs to execute another pipeline in a different system is a perfect example.
The CON here is that this is the most involved approach. If this is a one-off problem you are trying to solve, choose one of the other options.

Can Data Factory start or stop a self-hosted integration runtime?

I have a self-hosted integration runtime configured on a virtual machine in Azure, because I need to access an on-premises database to load some information, and this database can be accessed only through a VPN connection.
Is there a way to turn the virtual machine on/off when the loading process is going to run (once a week) in order to optimize cost in the cloud? It makes no sense to me to leave a VM billing at idle times.
Thank you
Updated:
We can create an Azure Automation runbook with PowerShell code to turn the VM on/off, and call it from ADF v2 using a Webhook activity, according to this post.
Create a trigger that runs the pipeline on a schedule. When creating a schedule trigger, you specify a schedule (start date, recurrence, end date, etc.) for the trigger and associate it with the pipeline.
At the start of the pipeline, use a Webhook activity to start the VM, then copy the data, and at the end stop the VM.
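Azure Automation also supports Python runbooks, so the runbook body could be sketched as follows with a recent azure-mgmt-compute; the subscription, resource group, and VM names are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

compute = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

def set_vm_power(resource_group, vm_name, on):
    """Start the VM before the copy, deallocate it afterwards to stop billing."""
    if on:
        compute.virtual_machines.begin_start(resource_group, vm_name).result()
    else:
        # begin_deallocate (not begin_power_off) releases the compute,
        # so the VM stops accruing charges while idle.
        compute.virtual_machines.begin_deallocate(resource_group, vm_name).result()

set_vm_power("<rg>", "<shir-vm>", on=True)  # called by the pipeline's first webhook
```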
If you're looking to build data pipelines in Azure Data Factory, your cost will be split into two categories:
Data Factory Operations
Pipeline Orchestration and Execution
Data Factory Operations
Read/Write: Every time you create/edit/delete a pipeline activity or a Data Factory entity such as a dataset, linked service, integration runtime or trigger, it counts towards your Data Factory Operations cost. These are billed at $0.50 per 50,000 operations.
Monitoring: You can monitor each pipeline run and view the status for each individual activity. For each pipeline run, you can expect to retrieve one record for the pipeline and one record for each activity or trigger. For instance, you would be charged for 3 Monitoring activities if you debug a pipeline containing 2 activities. Monitoring activities are charged at $0.25 per 50,000 run records retrieved.
Pipeline Orchestration
Self Hosted
Every time you run a pipeline, you are charged for every activity and trigger inside that pipeline that is executed at a rate of $1.50 per 1000 Activity runs.
As an example, executing a pipeline with a trigger and two activities would be charged as 3 Activity runs.
Pipeline Execution
Self Hosted
Data movement : $0.10/hour
Pipeline activities : $0.002/hour
External activities : $0.0001/hour
An inactive pipeline is charged at $0.80 per month.
Summary:
Suppose a month has 30 days, and your pipeline runs for 10 hours on one day to move data and is inactive for the remaining 29 days. The data movement is charged at 10 h × $0.10/h = $1.00. The inactive period is billed pro rata at the "inactive pipeline" rate: 29 days / 30 days × $0.80 ≈ $0.77. So your total cost is about $1.77.
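Spelled out as a quick calculation with the rates quoted above:

```python
# Recomputing the example with the self-hosted IR rates from this answer.
data_movement_rate = 0.10   # $/hour of data movement
inactive_rate = 0.80        # $/month, prorated over inactive days
hours_moving = 10
inactive_days, days_in_month = 29, 30

data_movement = hours_moving * data_movement_rate         # $1.00
inactive = inactive_rate * inactive_days / days_in_month  # ~$0.77
print(f"total = ${data_movement + inactive:.2f}")         # ~$1.77
```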

How to set an alert for Azure Data factory when Pipeline takes more than N minutes to complete

I need to set up an alert for my Azure Data Factory pipeline that fires when the pipeline runs for more than 20 minutes. The alert should come while the pipeline is running, once its duration passes 20 minutes, not after the pipeline completes. How can I do this? I think it can be done with an Azure Function, but I am not familiar with them, so I am looking for a script that does this.
Yes, an Azure Function is a solution to achieve your requirement.
For example, if you are using Python, you need an Azure Function that runs periodically to monitor the status of the pipeline. The key metric is the duration of the pipeline run. A pipeline is composed of activities, so you can monitor every activity.
In Python, this is how to query the activity runs you want:
https://learn.microsoft.com/en-us/python/api/azure-mgmt-datafactory/azure.mgmt.datafactory.operations.activityrunsoperations?view=azure-python#query-by-pipeline-run-resource-group-name--factory-name--run-id--filter-parameters--custom-headers-none--raw-false----operation-config-
This shows how to get the duration of an Azure Data Factory activity run:
https://learn.microsoft.com/en-us/python/api/azure-mgmt-datafactory/azure.mgmt.datafactory.models.activityrun?view=azure-python#variables
(There is a variable named duration_in_ms; you can use it to get the duration of the activity run.)
This shows how to monitor a pipeline with Python:
https://learn.microsoft.com/en-us/azure/data-factory/monitor-programmatically#python
You can create an Azure Function app with a timer trigger to monitor the Azure Data Factory pipeline. This is the documentation for the Azure Functions timer trigger:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-timer?tabs=python
The basic idea is to put the code that checks whether the pipeline has been running for more than N minutes into the body of the timer-triggered Azure Function, and then use the status of the Azure Function to reflect whether the pipeline's running time exceeds N minutes.
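A minimal sketch of that timer-function body, using the azure-mgmt-datafactory SDK from the links above (Python v1 programming model, so the schedule lives in function.json); the subscription, resource group, and factory names are placeholders, and raising an exception is just one possible alerting hook:

```python
from datetime import datetime, timedelta, timezone

import azure.functions as func
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

N_MINUTES = 20
RG, FACTORY = "<resource-group>", "<factory-name>"

def main(mytimer: func.TimerRequest) -> None:
    client = DataFactoryManagementClient(
        DefaultAzureCredential(), "<subscription-id>")
    now = datetime.now(timezone.utc)

    # Pull pipeline runs updated in the last day and look for long runners.
    runs = client.pipeline_runs.query_by_factory(
        RG, FACTORY,
        RunFilterParameters(last_updated_after=now - timedelta(days=1),
                            last_updated_before=now))
    for run in runs.value:
        # run_start is timezone-aware, so it compares cleanly with `now`.
        if (run.status == "InProgress"
                and now - run.run_start > timedelta(minutes=N_MINUTES)):
            # Failing the function is what the portal alert below keys off.
            raise RuntimeError(
                f"{run.pipeline_name} run {run.run_id} exceeded {N_MINUTES} min")
```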
Then use the alert events of the Azure Function: Azure can alert on function failures, and you can also set an output binding on your Azure Function to push a notification directly.
In the Azure portal, you can configure this under the Function app's Monitoring > Alerts. Select Email/SMS message as the action type and give it your email address.
