Azure DF Parameterizing - azure

I am trying to parameterize a mapping data flow in Azure DF UI. I am declaring a variable in the pipeline, which is taking the result from a lookup value - the current timestamp of a table. In the mapping dataflow though I am defining a parameter, which I would like to take the value of the variable declared in the pipeline. The variable is of type string, holding the current timestamp value.
this is the pipeline in debug
the dataflow parameter is taking the value of the variable
I am setting empty string as a default value - I tried also setting it to take variable value
that is a filter expression to filter based on this timestamp
So I guess I have to add a default value to the parameter value inside the Data Flow. How should this be done using the expression language? Thank you in advance!

When you are passing the Data flow parameter value from the Pipeline, the default value is not at all required. You can leave it as it is without giving any value.
When you debug the pipeline, the value is taken from the pipeline only and not from the default value.
For the Data preview of each transformation, you can provide a temporary hardcoded value by going through the click to see parameters below. It will help us to check whether the Data flow preview is giving correct results or not.
Please go through my repro for your reference:
This is my sample data
After creating the myparam parameter in the Data flow, I have passed my variable myname to it from pipeline. Here, I have given the value of the myname variable as #string('Rakesh') to filter this name.
Don't check that checkbox.
The condition in the Filter as Name==$myparam.
Result:

The best way to add a default value you can add a previous timestamp JUST LIKE THIS to hold and btw it will not be used as its an dynamic paramter so dont worry just place it there, but it will allow the debug to preview data and example is and it run in my case as i am also getting the same lookup max date and using in an dynamic Query

Related

Not able to change datatype of Additional Column in Copy Activity - Azure Data Factory

I am facing very simple problem of not able to change the datatype of additional column in copy activity in ADF pipeline from String to Datetime
I am trying to change source datatype for additional column in mapping using JSON but still it doesn't work with polybase cmd
When I run my pipeline it gives same error
Is it not possible to change datatype of additional column, by default it takes string only
Dynamic columns return string.
Try to put the value [Ex. utcnow()] in the dynamic content of query and cast it to the required target datatype.
Otherwise you can use data-flow-derived-column :
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-derived-column
Since your source is a query, you can choose to bring current date in source SQL query itself in the desired format rather than adding it in the additional column.
Thanks
Try to use formatDateTime as shown below and define the desired Date format:
Here since format given is ‘yyyy-dd-MM’, the result will look as below:
Note: The output here will be of string format only as in Copy activity we could not cast data type as of the date.
We could either create current date in the Source sql query or use above way so that the data would load into the sink in expected format.

Aggregation in Azure Data Flow is Returning Invalid Value

I have created a data flow in Data Factory.
Step 1. Read the parquet file.
Step 2. Aggregate the file to get the Max(DateField)
Step 3. Use a derived column to write in a Value.
Step 4. Alter row task with Value and the DateField.
Step 5. Sink select the Watermark table to update.
The flow updates the value, but it isn't putting in the max value. The date value is incorrect. Any ideas?
Flow_image
max() aggregate function doesn't work on date/string format type. You must pass any column which contains numerical values. Date is not a valid input on which you can apply max function. There is no maximum date term.
Instead you can filter the timestamp and get the latest or oldest date using ADF.
Refer this answer by #Leon to know how to implement the same.

Passing the Dataflow Parameter to Sink Key column in Azure Data factory

I wanted to implement SCD type 2 logic but using dynamic tables and dynamic key fields from Config Table, I have a challenge to pass the Data Flow Parameter as Sink Key Column for my Alter Row activity, it is not taking the parameter values and always gives the error as invalid key column name, I tried picking the Dataflow parameter for the expression builder at sink key column and trying to pass the value from alter row transformation and I have named the field with parameter in the select statement as well , any help or suggestion highly appreciated
Please clink below image
Sample How I wanted to Pass Dynamic Values in Sink Mapping
Trying to Give the Dynamic Value to Key Value
You have "List of columns" selected, so ADF is looking for a column in your target table that is literally called "$TargetPK1Parameter".
Change the selector to "Custom expression" and enter a string array parameter. The parameter can be an array of strings that represent names of key columns in your target table.
It should look something like this:
I encountered a similar problem when trying to pass a composite key, parameterized, as part of the update method to sink. This now allows me to fully parameterise my dataflow and it handles both composite keys and single columns keys.
Here's how the data looks in my config table:
UpsertKeyColumn = DOMNAME,DDLANGUAGE,AS4LOCAL,VALPOS,AS4VERS
A parameter value is set in the dataflow
Upsert_Key_Column = #item().UpsertKeyColumn
Finally, in the Sink settings, Custom Expression is selected for Key columns and the following expression is entered - split($upsert_key_column,',')

Azure-Data-Factory - If Condition returns false despite being logically true

I'm trying to do a logical test to compare two activity outputs.
The first one is giving back a file name (derived from GetMetaData)and the other one distinct filenames that are already in the database (derived from a lookup Activity).
So the first activity is giving X.csv (a file in a Blob0 while the second one is giving a list Y.csv; Z.csv (the result of the lookup Select distinct from table X)
Based on this outcome I would say that the logical test is true so ADF has to start a particular activity. I'm using the expresion below, but despite the fact there are no errors the outcome is always false. What am I doing wrong? I guess it has something to do with the lookup activity because the query will give a list of values I think.
please help thanks in advance!
#equals(activity('GetBlobName').output,activity('LookupBestandsnaam').output)
Output activity LookupBestandsnaam:
Output activity GetBlobName:
The output of Lookup and Get Metadata are different:
Lookup activity reads and returns the content of a configuration file
or table.
Get Metadata activity to retrieve the metadata of any data in Azure
Data Factory
We can't compare the output directly. You will always get false in the if condition expression.
Please try bellow expression:
#equals(activity('GetBlobName').output.value.name,activity('LookupBestandsnaam').output.value.bestandsnaam)
Update:
Congratulations that you use another way to solved it:
"I have now replaced the if condition with a stored procedure that uses an IF exists script running on the basis of look-up activity in ADF."

How to pass a Data Flow Parameter in Key Column in Sink Tanformation while updating a data?

I am implementing SCD Type2 through Data Flow. I having created a Parameter in it where I will pass a column name and this Parameter I am using in Sink Transformation in Key Column.
Passing a parameter in Key Column in Data Flow
I have selected the Add Dynamic Content and then Parameter, after that I selected the parameter I have created in Data Flow. Then it shows like "$Key_col".
But when I run the pipeline it gives me an error-
{"message":"at Sink 'sink1'(Line 56/Col 6): Column operands are not allowed in literal expressions. Details:at Sink 'sink1'(Line 56/Col 6): Column operands are not allowed in literal expressions","failureType":"UserError","target":"Update_Existing_Records","errorCode":"DFExecutorUserError"}
Can anyone please tell me how resolve this error or any workaround for this Problem.
Yes, this work. You just need to put single quotes around the parameter value like this:
"'$Key_col'"
I'm using double-quotes for string interpolation in this solution, so paste it in your expression exactly as that.
Key column doesn't support set with parameter. You only can choose the exist column in sink.
The column name that you pick as the key here will be used by ADF as part of the subsequent update, upsert, delete. Therefore, you must pick a column that exists in the Sink mapping. If you wish to not write the value to this key column, then click "Skip writing key columns".
Please reference: Mapping data flow properties.
The parameter Key_col is not exist in the sink, even if it has the same name.
Update:
Data Flow parameter:
If we want to using update, we must add an Alter row active:
Sink, key column choose exist column 'name':
Pipeline runs successful:
Hope this helps.

Resources