How to retry an ADF pipeline execution until conditions are met - azure

An ADF pipeline needs to be executed on a daily basis, let's say at 03:00 AM.
But prior to execution we also need to check whether the data sources are available.
Data is provided by an external agent that periodically loads the corresponding data into each source table and lets us know when this process is completed using a flag table: if data source 1 is ready, it sets its flag to 1.
I can't find a way to implement this logic with ADF.
We would need something that, at 03:00 for instance, triggers an 'element' that checks the flags; if the flags are not up, it doesn't launch the pipeline. After, let's say, 10 minutes, it checks the flags again, and keeps doing this for at most X times OR until the flags are up.
If the flags are up, it launches the pipeline execution and stops trying to launch the pipeline any further.
How would you do it?
The logic per se is not complicated in any way, but I wouldn't know where to implement it. Should I develop an Azure Function that launches the pipeline, or is there a way to achieve it with an out-of-the-box ADF activity?

There is an UNTIL iteration activity where you can check your condition.
Example:
Your Azure Function (AF) checks the flag and returns 0 or 1.
Build an ADF pipeline with an UNTIL activity where you check the output of the AF (if it's 1, do something). Inside the UNTIL activity you can have your processing step. For example, you have a flag variable that is 0 before the UNTIL activity. In your UNTIL you check whether it's 1: if it is, do your processing step; if it's not, put a WAIT activity of 10 minutes or so.
So you have the ability in ADF to iterate until a condition is satisfied.
Hope this helps you :)
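A minimal sketch of one way to lay this out, with the processing step placed after the loop rather than inside it. The linked service AzureFunctionLS, the function CheckSourceFlags (assumed to return a JSON body like {"flag": 1}), and the pipeline, variable and activity names are all made up for illustration:

{
    "name": "WaitForFlagsAndRun",
    "properties": {
        "variables": {
            "flagReady": { "type": "String", "defaultValue": "0" }
        },
        "activities": [
            {
                "name": "Until flags are up",
                "type": "Until",
                "typeProperties": {
                    "expression": {
                        "value": "@equals(variables('flagReady'), '1')",
                        "type": "Expression"
                    },
                    "timeout": "0.02:00:00",
                    "activities": [
                        {
                            "name": "Check flags",
                            "description": "Assumed Azure Function; it must return a JSON body such as {\"flag\": 1}",
                            "type": "AzureFunctionActivity",
                            "linkedServiceName": {
                                "referenceName": "AzureFunctionLS",
                                "type": "LinkedServiceReference"
                            },
                            "typeProperties": {
                                "functionName": "CheckSourceFlags",
                                "method": "GET"
                            }
                        },
                        {
                            "name": "Store flag",
                            "type": "SetVariable",
                            "dependsOn": [
                                { "activity": "Check flags", "dependencyConditions": [ "Succeeded" ] }
                            ],
                            "typeProperties": {
                                "variableName": "flagReady",
                                "value": {
                                    "value": "@string(activity('Check flags').output.flag)",
                                    "type": "Expression"
                                }
                            }
                        },
                        {
                            "name": "Wait 10 minutes",
                            "type": "Wait",
                            "dependsOn": [
                                { "activity": "Store flag", "dependencyConditions": [ "Succeeded" ] }
                            ],
                            "typeProperties": { "waitTimeInSeconds": 600 }
                        }
                    ]
                }
            },
            {
                "name": "If flags are up",
                "type": "IfCondition",
                "dependsOn": [
                    { "activity": "Until flags are up", "dependencyConditions": [ "Succeeded" ] }
                ],
                "typeProperties": {
                    "expression": {
                        "value": "@equals(variables('flagReady'), '1')",
                        "type": "Expression"
                    },
                    "ifTrueActivities": [
                        {
                            "name": "Run daily load",
                            "description": "Hypothetical name of the processing pipeline",
                            "type": "ExecutePipeline",
                            "typeProperties": {
                                "pipeline": { "referenceName": "DailyLoadPipeline", "type": "PipelineReference" },
                                "waitOnCompletion": true
                            }
                        }
                    ]
                }
            }
        ]
    }
}

The Until activity's timeout bounds the retries (0.02:00:00 is two hours, roughly twelve checks at 10-minute intervals), and the final If Condition makes sure the daily load is only launched when the flag actually came up, and only once. As written, the sketch waits a final 10 minutes after the flag comes up; moving the Wait in front of the flag check avoids that at the cost of delaying the first check.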

Related

Am I using switch/case wrong here to control?

I am trying to check the logs and, depending on the last log, run a different step in the transformation. Am I supposed to use some other steps, or am I making another mistake here?
For example, if the query returns 1 I want the Execute SQL script step to run, for 2 I want Execute SQL script 2 to run, and for 3 I want the transformation to abort. But it keeps running all the steps even if only one value returns from the CONTROL step.
The transformation looks like this
And the switch/case step looks like this
It looks like it's correctly configured, but keep in mind that in a transformation all steps are initiated at the beginning of the transformation, waiting to receive streaming data from the previous step. So the Abort and Execute SQL script steps are going to be started as soon as the transformation starts; if they don't need data from the previous step to run, they are going to run at the beginning.
If you want the scripts to be executed depending on the result of the CONTROL output, you'll need to use a job, which runs the steps (actions) sequentially:
A transformation runs the CONTROL step and afterwards you put a "Copy rows to result" step to make the data produced from the CONTROL step available to the job
After the transformation, you use a "Simple evaluation" action in the job, to determine which script (or abort) to run. Jobs also have an "Execute SQL Script" action, so you can put it afterwards.
I'm supposing your CONTROL step only produces one row; if the output is more than one row, the "Simple evaluation" action won't do the job, and you'll have to design one or more transformations to execute for each row of the previous transformation, running what you need.

Azure data factory IF condition is taking a lot longer to complete than the activities inside it take to complete

Ok guys, this is a strange one and I can't see anything obvious that would explain it...
I've got a pipeline with an IF condition, and this IF condition only has a single 'copy data' activity in it. My confusion is that when monitoring this pipeline as it's been triggered by a scheduled trigger, the IF condition often takes a lot longer than the only activity it contains, the 'copy data' activity. See the screenshot below, where the 'copy data' has only taken 7:47 mins but the IF condition has a duration of 16:16 mins!?!?!
Does anyone know what this means, and what might be causing it? Note... the IF condition itself is only a simple check of a variable that has previously been set...
At first I thought it was because the 'copy data' was queueing, but as there's no input/output information on an IF condition in the monitor I've no idea what's going on. Surely the IF condition isn't taking several minutes to evaluate its expression??
The "Duration" time you read is end-to-end for the pipeline activity. It takes into account all factors like marshaling of your data flow script from ADF to the Spark cluster, cluster acquisition time, job execution, and I/O write time. So the "Duration" time will be longer than the actual execution time.
Therefore, the IF condition activity is waiting for the copy activity to end successfully and release all DIU resources. But there is very little official information about this explanation. :(
By the way, the "Duration" time of the IF condition activity is not chargeable. You can click this link to see run consumption.
The IF condition activity is billed according to activity runs, as shown in the first line. The copy activity is billed according to DIUs. So we don't need to worry about the "Duration" time of the IF condition activity. :)

Azure Data Factory - Tumbling Window Trigger - Limit hours it is running

With an Azure Data Factory "Tumbling Window" trigger, is it possible to limit the hours of each day that it triggers during (adding a window you might say)?
For example I have a Tumbling Window trigger that runs a pipeline every 15 minutes. This is currently running 24/7 but I'd like it to only run during business hours (0700-1900) to reduce costs.
Edit:
I played around with this, and found another option which isn't ideal from a monitoring perspective, but it appears to work:
Create a new pipeline with a single "If Condition" step with a dynamic Expression like this:
@and(greater(int(formatDateTime(utcnow(),'HH')),6),less(int(formatDateTime(utcnow(),'HH')),20))
In the true case activity, add an Execute Pipeline step executing your original pipeline (with "Wait on completion" ticked)
In the false case activity, add a wait step which sleeps for X minutes
The longer you sleep for, the longer you can possibly encroach on your window, so adjust that to match.
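For reference, a sketch of what that wrapper pipeline might look like; the pipeline and activity names are placeholders, OriginalPipeline stands for the pipeline being limited, and the false branch sleeps 10 minutes as an example:

{
    "name": "BusinessHoursWrapper",
    "properties": {
        "activities": [
            {
                "name": "Within business hours",
                "type": "IfCondition",
                "typeProperties": {
                    "expression": {
                        "value": "@and(greater(int(formatDateTime(utcnow(),'HH')),6),less(int(formatDateTime(utcnow(),'HH')),20))",
                        "type": "Expression"
                    },
                    "ifTrueActivities": [
                        {
                            "name": "Run original pipeline",
                            "description": "Placeholder for the pipeline being limited to business hours",
                            "type": "ExecutePipeline",
                            "typeProperties": {
                                "pipeline": { "referenceName": "OriginalPipeline", "type": "PipelineReference" },
                                "waitOnCompletion": true
                            }
                        }
                    ],
                    "ifFalseActivities": [
                        {
                            "name": "Sleep outside window",
                            "type": "Wait",
                            "typeProperties": { "waitTimeInSeconds": 600 }
                        }
                    ]
                }
            }
        ]
    }
}

You would then point the existing Tumbling Window trigger at this wrapper pipeline instead of the original one.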
I need to give it a couple of days before I check the billing on the portal to see if it has reduced costs. At the moment I'm assuming a job which just sleeps for 15 minutes won't incur the costs that one running and processing data would.
There is no easy way, but you can create two deployment pipelines for the same job in Azure DevOps, and as soon as your window 0700 to 1900 expires you replace that job with a dummy job using an Azure DevOps pipeline.

Azure data factory activity execute after all other copy data activities have completed

I've got an Azure Data Factory V2 pipeline with multiple Copy Data activities that run in parallel.
I have a Pause DW webhook activity that pauses an Azure data warehouse after each run. This activity is set to run after the completion of one of the longest-running activities in the pipeline. The pipeline is set to trigger nightly.
Unfortunately, the time taken to run the copy data activities varies because it depends on the transactions that have been processed in the business, which varies each day. This means I can't predict which of the activities that run in parallel will finish last, so the whole pipeline often fails because the DW has been paused before some of the activities have started.
What's the best way of running an activity only after all other activities in the pipeline have completed?
I have tried to add an If activity to the pipeline like this:
However, I then run into this error during validation:
If Condition1
The output of activity 'Copy small tables' can't be referenced since it has no output.
Does anyone have any idea how I can move this forwards?
thanks
Just orchestrate all your parallel activities towards the PAUSE DWH activity. Then it will be executed after all your activities are completed.
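In other words, list every copy activity in the dependsOn collection of the pause activity, so it only fires once each of them has succeeded. A sketch, with "Copy small tables" taken from the error above and the other activity names and the webhook details as placeholders:

{
    "name": "Pause DW",
    "type": "WebHook",
    "dependsOn": [
        { "activity": "Copy small tables", "dependencyConditions": [ "Succeeded" ] },
        { "activity": "Copy large tables", "dependencyConditions": [ "Succeeded" ] },
        { "activity": "Copy other tables", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "url": "https://example.com/pause-dw",
        "method": "POST"
    }
}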
I think you can use the Execute Pipeline activity.
Let the trigger point to a new pipeline which has the Execute Pipeline activity pointing to the current pipeline with the copy activities; please do select the option Advanced -> Wait on completion. Once the Execute Pipeline activity is done, it should move to the webhook activity, which should have the logic to pause the DW.
Let me know how this goes.
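A sketch of that parent pipeline; the pipeline name CopyPipeline, the activity names and the webhook URL are placeholders:

{
    "name": "NightlyOrchestrator",
    "properties": {
        "activities": [
            {
                "name": "Run copy pipeline",
                "description": "Placeholder name for the existing pipeline with the parallel copy activities",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": { "referenceName": "CopyPipeline", "type": "PipelineReference" },
                    "waitOnCompletion": true
                }
            },
            {
                "name": "Pause DW",
                "type": "WebHook",
                "dependsOn": [
                    { "activity": "Run copy pipeline", "dependencyConditions": [ "Succeeded" ] }
                ],
                "typeProperties": {
                    "url": "https://example.com/pause-dw",
                    "method": "POST"
                }
            }
        ]
    }
}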

Any intelligence to run the Azure Data Factory other than Schedule Basis

I have a Client Request for my Data Factory Solution
They want to run my Data Factory whenever the input file is available in Blob Storage or any other location. To be very clear, they don't want to run the solution on a schedule basis, because some days the file won't show up. So I want some intelligence to check whether or not the file is available to be processed in that location. If yes, then I have to run my Data Factory solution to process that file; else there is no need to run the Data Factory.
Thanks in Advance
Jay
I think you've currently got 3 options for dealing with this, none of which are exactly what you want...
Option 1 - use C# to create a custom activity that does some sort of checking on the directory before proceeding with other downstream pipelines.
Option 2 - Add a long delay to the activity so the processing retries for the next X days. Sadly, only a maximum of 10 long retries is allowed currently.
Option 3 - Wait for a newer version of Azure Data Factory that might allow the possibility of more event driven activities, rather than using a scheduled time slice approach.
Apologies this isn't exactly the answer you want, but it gives you the current options.
