How to output a value from ADF .NET Custom Activity - azure

I have an ADF pipeline that takes a dataset input from Azure Data Lake Storage and runs a custom .NET activity. The activity moves files from their Import folder into a custom folder location ready for processing, and then deletes the original files.
I want to pass the custom folder location back out to the pipeline so that I can give it to the next activity.
Is there a way of outputting the custom folder string to the pipeline?
Thank you,

We have an improvement item in progress: the output of one custom activity can become the next custom activity's input. It's pending deployment; you can try it by the end of this month :)
For how to use this feature:
Update code, Activity1:
Execute(...)
{
    return new Dictionary<string, string> { { "Foo", "Value" } };
}
Update pipeline, Activity2 JSON:
"extendedProperties": { "ValueOfFoo": "$$Foo" }

If you want to use a "custom activity" as the title of the question suggests, this is possible using the activity's execution outputs:
Reference: https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-custom-activity#retrieve-execution-outputs
@activity('MyCustomActivity').output.outputs[0]
You can also consume the output in another activity as described here:
Reference : https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-custom-activity#pass-outputs-to-another-activity
You can send custom values from your code in a Custom Activity back to the service. You can do so by writing them into outputs.json from your application. The service copies the content of outputs.json and appends it into the Activity Output as the value of the customOutput property. (The size limit is 2MB.) If you want to consume the content of outputs.json in downstream activities, you can get the value by using the expression
@activity('<MyCustomActivity>').output.customOutput
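For illustration, a minimal sketch of what the application inside the Custom Activity might write to outputs.json before exiting (the property name processedFolder and the path are made-up examples, not part of the service's contract):

```json
{
    "processedFolder": "adl://myaccount.azuredatalakestore.net/processing/2021/10/11"
}
```

A downstream activity could then read it with an expression such as @activity('MyCustomActivity').output.customOutput.processedFolder.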

Related

Azure Data Factory - copy Lookup activity output to json in blob

In ADF, below is the output of my Lookup activity (this is the header of a flat file from SFTP):
{
"firstRow": {
"Prop_0": "000",
"Prop_1": "IN",
"Prop_2": "12123",
"Prop_3": "XYZ_ABC",
"Prop_4": "20211011",
"Prop_5": "034255",
"Prop_6": "272023"
}
}
Can someone help me with the approach to transform this to a JSON file with custom field names instead of Prop_x, and save it to Blob storage?
You can simply leverage Additional columns in a copy activity.
Follow your lookup activity with a copy activity:
In the source settings of the copy activity, add the new column names (i.e. the ones you expect in the JSON). Here I used p0, p1...
Taking p0 as an example, you can simply put @activity('Lookup1').output.firstRow.Prop_0 in the dynamic content.
Then in the Mapping tab, you just map to the target columns of your JSON file. (This assumes you have already imported the schema of your target JSON.)
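As a sketch, the copy activity's source might look like this in JSON (the activity name Lookup1 and column names p0, p1 are illustrative):

```json
"source": {
    "type": "DelimitedTextSource",
    "additionalColumns": [
        {
            "name": "p0",
            "value": {
                "value": "@activity('Lookup1').output.firstRow.Prop_0",
                "type": "Expression"
            }
        },
        {
            "name": "p1",
            "value": {
                "value": "@activity('Lookup1').output.firstRow.Prop_1",
                "type": "Expression"
            }
        }
    ]
}
```

Each additional column is then available in the Mapping tab as a source column to map onto the target JSON schema.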

Azure data factory: Handling inner failure in until/for activity

I have an Azure data factory v2 pipeline containing an until activity.
Inside the until is a copy activity - if this fails, the error is logged, exactly as in this post, and I want the loop to continue.
Azure Data Factory Pipeline 'On Failure'
Although the inner copy activity’s error is handled, the until activity is deemed to have failed because an inner activity has failed.
Is there any way to configure the until activity to continue when an inner activity fails?
Solution
Put the error-handling steps in their own pipeline and run them from an Execute Pipeline activity. You'll need to pass in all the parameters required from the outer pipeline.
You can then use the completion (blue) dependency from the Execute Pipeline activity (rather than success (green)) so the outer pipeline continues to run despite the inner error.
Note that if you want the outer pipeline to know what happened in the inner one, there is currently no way to pass data out of the Execute Pipeline activity to its parent (https://feedback.azure.com/forums/270578-data-factory/suggestions/38690032-add-ability-to-customize-output-fields-from-execut).
To solve this, use a Stored Procedure activity inside the inner pipeline to write data to a SQL table, identified by the pipeline run ID. This can be referenced inside the pipeline with @pipeline().RunId.
Then, outside the pipeline, you can do a lookup against the SQL table, using the run ID to get the right row.
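For example, the Stored Procedure activity inside the inner pipeline might pass the run ID like this (the procedure and parameter names are illustrative, not from the original post):

```json
"typeProperties": {
    "storedProcedureName": "[dbo].[LogPipelineError]",
    "storedProcedureParameters": {
        "RunId": {
            "value": {
                "value": "@pipeline().RunId",
                "type": "Expression"
            },
            "type": "String"
        }
    }
}
```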
HEALTH WARNING:
For some weird reason, the output of Execute Pipeline is returned not as a JSON object but as a string. So if you try to select a property of the output like this: @activity('ExecutePipelineActivityName').output.something, then you get this error:
Property selection is not supported on values of type 'String'
So, to get the Execute Pipeline activity's run ID from outside, you need:
@json(activity('ExecutePipelineActivityName').output).pipelineRunId
I couldn't find this documented in Microsoft's documentation anywhere, hence posting gory details here.
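Putting it together, the outer pipeline's Lookup query might look like this (the table and column names are illustrative):

```json
"typeProperties": {
    "source": {
        "type": "AzureSqlSource",
        "sqlReaderQuery": {
            "value": "SELECT ErrorMessage FROM dbo.PipelineLog WHERE RunId = '@{json(activity('ExecutePipelineActivityName').output).pipelineRunId}'",
            "type": "Expression"
        }
    }
}
```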

Using If-Condition ADF V2

I have one Copy activity and two Stored Procedure activities, and I want to update the status of my pipeline as Failed in the log table, with error message details, if any of these activities fail. Below is the flow of my pipeline.
I wanted to use an If Condition activity and need help setting the expression for it. For the Copy activity I can use the expression below, but I'm not sure how to get the status of the Stored Procedure activities:
@or(equals(activity('Copy Data Source1').output.executionDetails[0].status, 'Failed'), <expression to get the status of the Stored Proc>)
If the above expression is true, then I want one common Stored Procedure activity, set in the Add If True Activity, to log the error details.
Let me know if this is possible.
I think you have overcomplicated this.
A much easier way to do this is to leverage a Failure path for the required activities. Furthermore, the stored procedure would not be executed when the Copy Data activity fails, so checking its execution status doesn't really make sense.
My pipeline would look like this:
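In JSON terms, a failure path is just a dependency condition on the logging activity (the activity and procedure names here are illustrative):

```json
{
    "name": "LogFailure",
    "type": "SqlServerStoredProcedure",
    "dependsOn": [
        {
            "activity": "Copy Data Source1",
            "dependencyConditions": [ "Failed" ]
        }
    ]
}
```

The same logging activity can depend on several upstream activities' Failed conditions, so one common stored procedure call covers the whole pipeline.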

How to set up output path while copying data from Azure Cosmos DB to ADLS Gen 2 via Azure Data Factory

I have a cosmos DB collection in the following format:
{
"deviceid": "xxx",
"partitionKey": "key1",
.....
"_ts": 1544583745
}
I'm using Azure Data Factory to copy data from Cosmos DB to ADLS Gen 2. If I copy using a copy activity, it is quite straightforward. However, my main concern is the output path in ADLS Gen 2. Our requirements state that we need to have the output path in a specific format. Here is a sample of the requirement:
outerfolder/version/code/deviceid/year/month/day
Now since deviceid, year, month, and day are all in the payload itself, I can't find a way to use them except to create a Lookup activity and use its output in the copy activity.
And this is how I set the output folder using the dataset property:
I'm using SQL API on Cosmos DB to query the data.
Is there a better way I can achieve this?
I think your way works, but it's not the cleanest. What I'd do is create a separate pipeline variable for each value: version, code, deviceid, etc. Then, after the lookup, you can assign the variables and finally do the copy activity referencing the pipeline variables.
It may look somewhat redundant, but think of someone (or you, two years from now) having to modify the pipeline: if you are not around (or have forgotten), this way makes it clear how it works and what should be modified.
Hope this helped!!
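A sketch of one of those Set Variable activities, assigning from the lookup output (the activity and variable names are illustrative):

```json
{
    "name": "SetDeviceId",
    "type": "SetVariable",
    "dependsOn": [
        { "activity": "Lookup1", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "variableName": "deviceid",
        "value": {
            "value": "@activity('Lookup1').output.firstRow.deviceid",
            "type": "Expression"
        }
    }
}
```

The sink dataset's folder path could then be built from the variables, for example @concat('outerfolder/version/code/', variables('deviceid'), '/', variables('year'), '/', variables('month'), '/', variables('day')).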

Is it possible to generate a unique BlobOutput name from an Azure WebJobs QueueInput item?

I have a continuous Azure WebJob that is running off of a QueueInput, generating a report, and outputting a file to a BlobOutput. This job will run for differing sets of data, each requiring a unique output file. (The number of inputs is guaranteed to scale significantly over time, so I cannot write a single job per input.) I would like to be able to run this off of a QueueInput, but I cannot find a way to set the output based on the QueueInput value, or any value except for a blob input name.
As an example, this is basically what I want to do, though it is invalid code and will fail.
public static void Job([QueueInput("inputqueue")] InputItem input, [BlobOutput("fileoutput/{input.Name}")] Stream output)
{
//job work here
}
I know I could do something similar if I used BlobInput instead of QueueInput, but I would prefer to use a queue for this job. Am I missing something or is generating a unique output from a QueueInput just not possible?
There are two alternatives:
Use IBinder to generate the blob name, as shown in these samples.
Have an autogenerated property in the queue message object and bind the blob name to that property. See here (the BlobNameFromQueueMessage method) for how to bind a queue message property to a blob name.
Found the solution at Advanced bindings with the Windows Azure Web Jobs SDK via Curah's Complete List of Web Jobs Tutorials and Videos.
Quote for posterity:
One approach is to use the IBinder interface to bind the output blob and specify the name that equals the order id. The better and simpler approach (SimpleBatch) is to bind the blob name placeholder to the queue message properties:
public static void ProcessOrder(
[QueueInput("orders")] Order newOrder,
[BlobOutput("invoices/{OrderId}")] TextWriter invoice)
{
// Code that creates the invoice
}
The {OrderId} placeholder in the blob name gets its value from the OrderId property of the newOrder object. For example, if newOrder is (JSON) {"CustomerName":"Victor","OrderId":"abc42"}, then the output blob name is "invoices/abc42". The placeholder is case-sensitive.
So, you can reference individual properties from the QueueInput object in the BlobOutput string and they will be populated correctly.
