Azure data factory: Handling inner failure in until/for activity - azure

I have an Azure data factory v2 pipeline containing an until activity.
Inside the until is a copy activity - if this fails, the error is logged, exactly as in this post, and I want the loop to continue.
Azure Data Factory Pipeline 'On Failure'
Although the inner copy activity’s error is handled, the until activity is deemed to have failed because an inner activity has failed.
Is there any way to configure the until activity to continue when an inner activity fails?

Solution
Put the error-handling steps in their own pipeline and run them from an ExecutePipeline activity. You'll need to pass-in all the parameters required from the outer pipeline.
You can then use the completion (blue) dependency from the ExecutePipeline (rather than success (green)) so the outer pipeline continues to run despite the inner error.
Note that if you want the outer to know what happened in the inner then there is currently no way to pass data out of the ExecutePipeline to its parent (https://feedback.azure.com/forums/270578-data-factory/suggestions/38690032-add-ability-to-customize-output-fields-from-execut).
To solve this, use an sp activity inside the ExecutePipeline to write data to a SQL table, identified with the pipeline run id. This can be referenced inside the pipeline with #pipeline().RunId.
Then outside the pipeline you can do a lookup in the SQL table, using the run ID to get the right row.
HEALTH WARNING:
For some weird reason, the output of ExecutePipeline is returned not as a JSON object but as a string. So if you try to select a property of output like this #activity('ExecutePipelineActivityName').output.something then you get this error:
Property selection is not supported on values of type 'String'
So, to get the ExecutePipeine's run ID from outside you need:
#json(activity('ExecutePipelineActivityName').output).pipelineRunId
I couldn't find this documented in Microsoft's documentation anywhere, hence posting gory details here.

Related

In Azure Data Factory, how do I pass the Index of a ForEach as a parameter properly

Sorry if this is a bit vague or rambly, I'm still getting to grips with Data Factory and a lot of it seems a bit obtuse...
What I want to do is query my Cosmos Database for a list of Ids of records that need to be updated. For each of these records, I want to call a REST API using the Id (i.e. /Record/{Id}/Details)
I've created a Data Flow that took a string as a parameter and then called the REST API fine.
I then made a pipeline using a Lookup with a query (select c.RecordId from c where...) and pass that into a ForEach with items set to #activity('Lookup1').output.value
I then setup the Activity of the ForEach to my Data flow. From research, I think I'm supposed to set the Parameter value to "#item().RecordId", but that gives an error "parameter [name] does not match parameter type 'string'".
I can change the type of the parameter to any (and use toString([parameter]) to cast it ) and then when I try and debug it passes the parameter in, but it gives an error of "Job failed due to reason: at (Line 2/Col 14): Datatype any not found".
I'm not sure what the solution is. Is there a way to cast the result of the lookup to an integer or string? Is there a way to narrow an any down? Is there a better way than toString() that would work? Is there a better way than ForEach?
I tried to reproduce similar scenario what you are trying.
My sample data in cosmos
To query Cosmos Database for a list of Ids and call a REST API using the Id For each of these records.
First, I took Lookup activity in data factory and selected the id's where the last_name is Bluth
Its output and settings are as below:
Then I passed the output of lookup activity to For-each activity.
Then inside for each activity I created Dataflow activity and for that DataSource I gave the source as Rest API. My Rest API to call specific user is https://reqres.in/api/users/2 I gave base URL as https://reqres.in/api/users.
Then I created parameter called demoId as datatype string and in relative URL I gave that dynamic value as #dataset().demoId
After this I gave value source parameter as #item().id as after https://reqres.in/api/users there is only id should be provided to get data in you case you can try Record/#{item().id}/Details.
For each id it is successfully passing id to rest API and fetching data:

Azure Data Factory - Capture error details of a dataflow activity

I have a data flow and my requirement is to capture the error details into a variable when it fails and assign this variable to a parameter in the next data flow. I tried to achieve this until the second stage(With help) as below, but I'm unable to get this variable assigned to a parameter in the next data flow. The error I get is - Expression cannot be parsed
To retrieve the dataflow error message, connect the dataflow activity upon failure to the set variable activity to store the error message using the expression:
#string(json(activity('Data flow1').error.message).Message)
Error Message:
Output:

How to capture the error message of the each activity among the multiple activities(3) in a pipeline using single stored procedure

I have a ADF pipeline with 3 different activities in sequence. I want to capture the error message of the activity whichever gets failed. So I have created a stored procedure by passing a parameter with which it can capture using #activity('Activity name').Error.Message
But using that expression I can only get the error message for the specified activity.
How can I capture the error message of any activity (of three activities in the pipeline)
This would be the pipeline output as the activities are in sequence.
As far as I know, you need to create three stored procedure to capture error message of correspond activity.
#concat(activity('Activity1').Error?.message,'|',activity('Activity2')?.Error?.message,'|',activity('Activity3')?.Error?.message)

Using If-Condition ADF V2

I have one Copy activity and two stored Proc Activity and i want to basically update the status of my pipeline as Failed in Logtable if any of these activities failed with error message details. Below is the flow of my pipeline
I wanted to use If-Condition activity and need help in setting the expression for it. For Copy activity i can use the below expression, but not sure about getting the status of stored Proc activity
#or(equals(activity('Copy Data Source1').output.executionDetails[0].status, 'Failed'), <expression to get the status of Stored Proc>)
If the above expression is true then i want to have one common stored proc activity that i will set in Add If True Activity to log the error details
Let me know if this possible.
I think you have overcomplicated this.
A much easier way to do that is to leverage a Failure path for required activities. Furthermore, SP would not be executed when Copy Data fails, therefore checking the status of execution of SP doesn't really make sense.
My pipeline would look like this:

Azure Data Factory v2: Activity execute pipeline output

Is there a way to reference the output of an executed pipeline in the activity "Execute pipeline"?
I.e.: master pipeline executes 2 pipelines in sequence. The first pipeline generates an own created run_id that needs to be forwarded as a parameter to the second pipeline.
I've read the documentation and checked that the master pipeline log the output of the first pipeline, but it looks like that this is not directly possible?
We've used until now only 2 pipelines without a master pipeline, but we want to re-use the logic more. Currently we have 1 pipeline that calls the next pipeline and forwards the run_id.
ExecutePipline currently cannot pass anything from its insides to its output. You can only get the runID or name.
For some weird reason, the output of ExecutePipeline is returned not as a JSON object but as a string. So if you try to select a property of output like this #activity('ExecutePipelineActivityName').output.something then you get this error:
Property selection is not supported on values of type 'String'
I found that I had to use the following to get the run ID:
#json(activity('ExecutePipelineActivityName').output).pipelineRunId
The execute pipeline activity is just another activity with outputs that can be captured by other activities. https://learn.microsoft.com/en-us/azure/data-factory/control-flow-execute-pipeline-activity#type-properties
If you want to use the runId of the pipeline executed previosly, it would look like this:
#activity('ExecutePipelineActivityName').output.pipeline.runId
Hope this helped!

Resources