I have an ADF pipeline with around 30 activities that call Databricks notebooks. The activities are arranged sequentially, that is, each one is executed only after the successful completion of the previous one.
However, at times, even when there is a runtime error in a particular notebook, the activity that calls the notebook does not fail, and the next activity is triggered. Ideally, this should not happen.
So I want to add an additional check on the link condition between the activities. I plan to put a condition on the status of the commands running in the notebook (imagine a notebook has 10 Python commands; I want to capture the status of the 10th command).
Is there a way to configure this? Appreciate ideas. Thank you.
I did try this at my end: when there was an exception in the code, I did see the error in the activity output. But in my case the activity failed, as @Alex mentioned.
In your case you could check the output of the activity and see whether there is any run error. If there is no runError, then proceed with the next activity.
@activity('Notebook2').output.runError
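To make sure a runtime error in the notebook actually surfaces, one option is to wrap the notebook's commands and re-raise any exception. A minimal Python sketch, where `load_data` and `transform` are hypothetical stand-ins for your notebook's own commands:

```python
# Sketch of a Databricks notebook cell that makes failures explicit.
# load_data and transform are hypothetical stand-ins for your commands.
def load_data():
    return [1, 2, 3]

def transform(rows):
    return [r * 2 for r in rows]

def run_notebook_steps():
    try:
        rows = load_data()
        result = transform(rows)
    except Exception as exc:
        # Re-raising makes the calling activity report a runError,
        # so a check like @activity('Notebook2').output.runError
        # (or the normal Success dependency) can stop the next activity.
        raise RuntimeError(f"Notebook step failed: {exc}") from exc
    return {"status": "succeeded", "rowCount": len(result)}

print(run_notebook_steps())  # {'status': 'succeeded', 'rowCount': 3}
```

If every command is wrapped this way, the last cell's status reflects the whole run, which is effectively the "status of the 10th command" check described above.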
I am trying to check the logs and, depending on the last log, run a different step in the transformation. Am I supposed to use some other steps, or am I making another mistake here?
For example, if the query returns 1 I want the Execute SQL script step to run, for 2 I want the Execute SQL script 2 step to run, and for 3 I want the transformation to abort. But it keeps running all the steps even though only one value is returned from the CONTROL step.
The transformation looks like this
And the switch/case step looks like this
It looks like it's correctly configured, but keep in mind that in a transformation all steps are initiated at the beginning of the transformation, each waiting to receive streaming data from the previous step. So the Abort and Execute SQL script steps are started as soon as the transformation starts; if they don't need data from the previous step to run, they will run right at the beginning.
If you want the scripts to be executed depending on the result of the CONTROL output, you'll need to use a job, which runs its steps (actions) sequentially:
A transformation runs the CONTROL step, and afterwards you add a "Copy rows to result" step to make the data produced by the CONTROL step available to the job.
After the transformation, you use a "Simple evaluation" action in the job to determine which script to run (or whether to abort). Jobs also have an "Execute SQL script" action, so you can put it afterwards.
I'm assuming your CONTROL step produces only one row. If the output is more than one row, the "Simple evaluation" action won't do the job; you'll have to design one or more transformations to execute for each row of the previous transformation, running what you need.
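For clarity, the job-level branching that the "Simple evaluation" action performs can be sketched in Python (the function name and return strings are purely illustrative, not Kettle API):

```python
def route_control_value(value):
    """Pick the action for the single row produced by the CONTROL step,
    mimicking the job's 'Simple evaluation' branching: each outcome is
    checked in sequence, and exactly one action runs."""
    if value == 1:
        return "run Execute SQL script 1"
    if value == 2:
        return "run Execute SQL script 2"
    if value == 3:
        raise RuntimeError("abort job")
    raise ValueError(f"unexpected CONTROL value: {value}")

print(route_control_value(2))  # run Execute SQL script 2
```

The key difference from the transformation is that only the selected branch executes, instead of all steps starting at once.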
Ok guys, this is a strange one and I can't see anything obvious that would explain it...
I've got a pipeline with an If Condition, and this If Condition contains only a single Copy Data activity. My confusion is that when monitoring this pipeline after it's been triggered by a scheduled trigger, the If Condition often takes a lot longer than the only activity it contains. See the screenshot below, where the Copy Data activity has taken only 7:47 mins but the If Condition has a duration of 16:16 mins!?
Does anyone know what this means and what might be causing it? Note: the If Condition itself is only a simple check of a variable that was set earlier.
At first I thought it was because the Copy Data activity was queueing, but as there's no input/output information on an If Condition in the monitor, I've no idea what's going on. Surely the If Condition isn't taking several minutes to evaluate its expression??
The "Duration" you read is the end-to-end time for the pipeline activity. It takes into account all factors, such as marshaling your data flow script from ADF to the Spark cluster, cluster acquisition time, job execution, and I/O write time. So the "Duration" will be longer than the actual execution time.
Therefore, the If Condition activity is waiting for the copy activity to report that it ended successfully and to release all DIU resources. But there is very little official information about this explanation. :(
By the way, the "Duration" of the If Condition activity is not chargeable. You can click this link to see run consumption.
The If Condition activity is billed according to activity runs (the first line), while the copy activity is billed according to DIUs. So we don't need to worry about the "Duration" of the If Condition activity. :)
In the RunDetails Jupyter module, what does the table (see screenshot below) represent?
The RunDetails(run_instance).show() method from the azureml-widgets package shows the progress of your job along with streaming the log files. The widget is asynchronous and provides updates until the training run finishes.
Since the output shown is specific to your Pipeline run, you can troubleshoot it further from the logs from pipeline runs, which can be found in either the Pipelines or Experiments section of the studio.
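Conceptually, the widget's table is just the result of repeatedly polling the run's status until a terminal state is reached. A hedged Python sketch of that loop (`get_status` is a stand-in for the real service call, and the status names are illustrative):

```python
import time

def watch_run(get_status, poll_interval=0.01):
    """Poll a run's status until it reaches a terminal state,
    returning every status observed, similar to the widget's
    asynchronous updates while the run is in progress."""
    terminal = {"Completed", "Failed", "Canceled"}
    seen = []
    while True:
        status = get_status()
        seen.append(status)
        if status in terminal:
            return seen
        time.sleep(poll_interval)

# usage with a faked status sequence
statuses = iter(["Running", "Running", "Completed"])
print(watch_run(lambda: next(statuses)))  # ['Running', 'Running', 'Completed']
```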
I've got an Azure Data Factory V2 pipeline with multiple Copy Data activities that run in parallel.
I have a Pause DW Web Hook activity that pauses an Azure data warehouse after each run. This activity is set to run after the completion of one of the longest-running activities in the pipeline. The pipeline is set to trigger nightly.
Unfortunately, the time taken to run the copy data activities varies because it depends on the transactions that have been processed in the business, which vary each day. This means I can't predict which of the activities that run in parallel will finish last, so the whole pipeline often fails because the DW has been paused before some of the activities have started.
What's the best way of running an activity only after all other activities in the pipeline have completed?
I have tried to add an If activity to the pipeline like this:
However, I then run into this error during validation:
If Condition1
The output of activity 'Copy small tables' can't be referenced since it has no output.
Does anyone have any idea how I can move this forwards?
thanks
Just orchestrate all your parallel activities towards the Pause DWH activity. Then it will be executed only after all your activities have completed.
I think you can use the Execute Pipeline activity.
Let the trigger point to a new pipeline that has an Execute Pipeline activity pointing to the current pipeline with the copy activities, and be sure to select the option Advanced -> Wait for completion. Once the executed pipeline is done, it should move on to the Web Hook activity, which has the logic to pause the DW.
Let me know how this goes .
An ADF pipeline needs to be executed on a daily basis, let's say at 03:00 AM.
But prior to execution we also need to check whether the data sources are available.
Data is provided by an external agent, which periodically loads the corresponding data into each source table and lets us know when this process is completed using a flag table: if data source 1 is ready, it sets its flag to 1.
I can't find a way to implement this logic with ADF.
We would need something that, for instance, at 03:00 triggers an "element" that checks the flags; if the flags are not up, don't launch the pipeline. After, let's say, 10 minutes, check the flags again, and keep doing so at most X times or until the flags are up.
If the flags are up, launch the pipeline execution and stop any further launch attempts.
How would you do it?
The logic per se is not complicated in any way, but I wouldn't know where to implement it. Should I develop an Azure Function that launches the pipeline, or is there a way to achieve it with an out-of-the-box ADF activity?
There is an Until iteration activity where you can check your clause.
Example:
Your Azure Function (AF) checks the flag and returns 0 or 1.
Build an ADF pipeline with an Until activity where you check the output of the AF (if it's 1, do something). Inside the Until activity you can have your processing step. For example, you have a flag variable that is 0 before the Until activity; inside the Until you check whether it's 1. If it is, run your processing step; if it's not, put a Wait activity of 10 minutes or so.
So in ADF you have the ability to iterate until a condition is satisfied.
Hope that this will help you :)
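The Until-plus-Wait pattern described above can be sketched in plain Python to show the control flow (`check_flag` stands in for the Azure Function call; the attempt cap and wait time are placeholders for your X tries and 10-minute interval):

```python
import time

def wait_for_flag(check_flag, max_attempts=5, wait_seconds=0.01):
    """Re-check the flag up to max_attempts times, waiting between
    attempts. In ADF this loop is the Until activity containing the
    Azure Function call and a Wait activity."""
    for _ in range(max_attempts):
        if check_flag() == 1:
            return True   # flags are up: run the pipeline steps
        time.sleep(wait_seconds)
    return False          # give up after max_attempts tries

# usage: the flag comes up on the third check
flags = iter([0, 0, 1])
print(wait_for_flag(lambda: next(flags)))  # True
```

Putting the processing step inside the Until (guarded by the flag check) gives exactly the "launch once the flags are up, otherwise retry, then give up" behavior the question asks for.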