How do we create a generic mapping data flow in Data Factory that will dynamically extract data from different tables with different schemas? - azure

I am trying to create an Azure Data Factory mapping data flow that is generic for all tables. I am going to pass the table name, the primary column (for join purposes), and the other columns to be used in groupBy and aggregate functions as parameters to the data flow.
[Screenshot: parameters passed to the data flow]
I am unable to reference this parameter in groupBy.
Error: DF-AGG-003 - Groupby should reference atleast one column -
MapDrifted1 aggregate(
) ~> Aggregate1,[486 619]
Has anyone tried this scenario? Please help if you have some knowledge on this, or if it can be handled in a U-SQL script.

We need to first look up your parameter's string value from your incoming source data to locate the column metadata and assign it.
Just add a Derived Column before your Aggregate and it will work. Call the column 'groupbycol' in your Derived Column and use this formula: byName($group1).
In your Aggregate, select 'groupbycol' as your group-by column.
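A minimal sketch of what the resulting data flow script looks like, assuming a string parameter $group1 holding the group-by column name and, purely for illustration, a hypothetical second string parameter $aggCol holding the column to average:

/* Materialize the dynamic group-by column so the Aggregate can reference it */
source1 derive(groupbycol = toString(byName($group1))) ~> DerivedColumn1
/* Group on the materialized column; the aggregated column can be dynamic too */
DerivedColumn1 aggregate(groupBy(groupbycol),
    avgValue = avg(toDouble(byName($aggCol)))) ~> Aggregate1

Once the Derived Column gives the Aggregate a concrete column to reference, the DF-AGG-003 error goes away.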

Related

Azure Data Factory DataFlow exclude 1 column from expression columns()

I'm looking for a solution for the following problem.
I've created the following expression in a Derived Column in Azure Data Factory DataFlow
md5(concatWS("||", toString(columns())))
But from the columns() in the above expression I want to exclude one column.
So something like this: md5(concatWS("||", toString(columns()-'PrimaryKey'))). I cannot exclude the primary key column with a Select in front of the Derived Column because I need it in a later stage.
In Databricks I'm executing the following, but I want to achieve the same in ADF:
non_key_columns = [column for column in dfsourcechanges.columns if column not in key_columns]
Are there any suggestions for how I can solve this?
You can try to use the byNames() function to do this. Create an array and add all your column names into it except 'PrimaryKey', then pass it to byNames() as the first parameter. Something like this expression: md5(concatWS("||", toString(byNames(['yourColumn1','yourColumn2',...]))))
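If you would rather not hard-code the list, the same idea works with a parameter. A minimal sketch, assuming a hypothetical string-array data flow parameter $nonKeyColumns that the pipeline fills with every column name except the key (mirroring the Databricks list comprehension above):

/* $nonKeyColumns is a string[] parameter, e.g. ['Column1','Column2'] */
md5(concatWS("||", toString(byNames($nonKeyColumns))))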

Passing the Dataflow Parameter to Sink Key column in Azure Data factory

I wanted to implement SCD Type 2 logic using dynamic tables and dynamic key fields from a config table. My challenge is passing the data flow parameter as the sink key column for my Alter Row activity: it does not take the parameter value and always gives the error "invalid key column name". I tried picking the data flow parameter in the expression builder at the sink key column, passing the value from the Alter Row transformation, and naming the field with the parameter in the Select statement as well. Any help or suggestion is highly appreciated.
[Screenshots: how the dynamic value is passed in the sink mapping and assigned to the key column]
You have "List of columns" selected, so ADF is looking for a column in your target table that is literally called "$TargetPK1Parameter".
Change the selector to "Custom expression" and enter a string array parameter. The parameter can be an array of strings that represent names of key columns in your target table.
I encountered a similar problem when trying to pass a composite key, parameterized, as part of the update method to the sink. This approach now allows me to fully parameterise my data flow, and it handles both composite keys and single-column keys.
Here's how the data looks in my config table:
UpsertKeyColumn = DOMNAME,DDLANGUAGE,AS4LOCAL,VALPOS,AS4VERS
A parameter value is then set in the data flow:
Upsert_Key_Column = @item().UpsertKeyColumn
Finally, in the sink settings, "Custom expression" is selected for Key columns and the following expression is entered: split($upsert_key_column,',')
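Pulling the pieces together, a minimal sketch of the moving parts, assuming a ForEach over the config-table rows and a data flow string parameter named upsert_key_column (names are illustrative):

/* Pipeline side: data flow parameter assignment inside the ForEach */
upsert_key_column = @item().UpsertKeyColumn

/* Sink > Settings > Key columns > Custom expression:
   turns 'DOMNAME,DDLANGUAGE,AS4LOCAL,VALPOS,AS4VERS' into a string array */
split($upsert_key_column, ',')

Because split() returns an array of strings, the same expression handles both composite keys and single-column keys without any change.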

Azure Data Factory - Exists transformation in Data Flow with generic dataset

I'm having issues using the Exists Transformation within a Data Flow with a generic dataset.
I have two sources (one from staging table "sourceStg", one from DWH table "sourceDwh") and want to compare if the UniqueIdentifier-Column in the staging table is existing in the UniqueIdentifier-Column in the DWH table. For that I have a generic data set which I query with a SQL statement containing parameters.
When I open the Exists settings I cannot choose any column from the source in the conditions, since the source is generic and has no projection until I run the data flow. However, I have a parameter from the parent pipeline which provides the name of the column containing the UniqueIdentifier (both column names in staging and DWH are the same).
I tried to add the statement byName($UniqueIdentifier) in both the left and right column fields, but the engine resolves both to the sourceStg column, since the prefix of the source transformation is missing and it defaults to the first one. What I am basically trying to achieve is a statement like the following, which identifies the correct source transformation and the column containing the unique identifier via a parameter:
exists(sourceStg#$UniqueIdentifier == sourceDwh#$UniqueIdentifier)
But either the expression cannot be parsed, or the result does not retrieve the actual UniqueIdentifier value from the column but instead writes the statement itself (e.g. sourceStg#$UniqueIdentifier) as the column value.
The only workaround I have found so far is having two Derived Columns which add a suffix to the UniqueIdentifier column in one source, plus a new parameter $UniqueIdentiferDwh which is populated with the parameter $UniqueIdentifier and the same suffix as used in the Derived Column.
Any Azure Data Factory experts out there to help?
Thanks in advance!
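For reference, a sketch of that workaround in data flow script terms, assuming the string parameters $UniqueIdentifier and $UniqueIdentiferDwh described above (transformation names are illustrative):

/* Materialize the key column under a distinct name in each source */
sourceStg derive(keyStg = toString(byName($UniqueIdentifier))) ~> DeriveStgKey
sourceDwh derive(keyDwh = toString(byName($UniqueIdentiferDwh))) ~> DeriveDwhKey
/* With distinct names, the Exists condition needs no source prefixes */
DeriveStgKey, DeriveDwhKey exists(keyStg == keyDwh,
    negate:false,
    broadcast: 'auto') ~> Exists1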

Power Query how to make a Table with multiple values a parameter that uses OR

I have a question regarding Power Query and tables as parameters for Excel.
Right now I can create a table and use it as a parameter for Power Query via drill-down.
But I'm unsure how I would proceed with a table that has multiple values. How can a table with multiple "values" be recognized as a parameter?
For example:
I have the following raw data and parameter tables:
[Screenshot: raw data and parameter tables]
Now if I wanted to filter on Value2 with parameter tables, I would do a drill-down of the parameter tables and load them to Excel.
After that I have two tables with which I can filter Value2, using an OR condition, by 1 and 2.
Is it possible to somehow combine this into one table so that it still uses an OR condition to search Value2?
I'm asking because I want it to be possible to just add more and more parameters into the table without creating a new table every time; basically just copy and paste some parameters into the parameter table and be done with it.
Thanks for any help in advance
Assuming you use Parameters only for filtering: there are other ways, but this one looks the best from a performance point of view.
You may create a Parameters table, so you have two tables, RawData and Parameters.
Note that it's handy to have the same name (Value2) for the key column in both tables; otherwise Table.Join will create additional column(s) after merging the tables.
Add a step like this to filter the RawData table:
join = Table.Join(RawData, "Value2", Parameters, "Value2")
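For context, a minimal sketch of the full query, assuming the raw data query is named RawData and the Parameters table has a single column also called Value2. Table.Join defaults to an inner join, so only rows whose Value2 appears in the Parameters table survive, which behaves like an OR across all parameter values:

let
    // Inner join against the parameter table acts as an OR filter on Value2
    Filtered = Table.Join(RawData, "Value2", Parameters, "Value2")
in
    Filtered

Adding more rows to the Parameters table then widens the filter without editing the query.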

How to add Dynamic GroupBy Column in Data Flow Aggregate Activity in Azure Data Factory

I am using Data Flow (preview). My Aggregate transformation requires a group-by column, and that setting is not dynamic, hence I can't group by a dynamically named column. I just want to map the column by name.
For example:
These are the two schemas:
1) Columns: M Id, Date/Time, Data Type, Values
2) Columns: MID, Date, DataType, Units
Both actually have the same data types and structure. I want to group by DataType and take avg(Units).
The problem is that one field is named "Data Type" and the other "DataType". How do I map them together?
I have created a Derived Column transformation with this:
Column: DataType
Expression: case(startsWith(toString(byPosition(7)), 'D'), toString(byName('Data Type')),toString(byName('DataType')))
But it doesn't work. Any help is highly appreciated.
I just want to know how to map the column by name.
You can write a dynamic expression directly in the Group by field in the Aggregate transformation. Hover over the Group by field and select "Computed Column" to enter the Expression Builder.
Are you trying to determine whether to use the column called "Data Type" or the column called "DataType"? If so, just enter your conditional expression directly into the expression builder on the Aggregate group-by. Note that in your expression above you are using byPosition(), which takes a number representing the incoming columns counted left to right, starting at position 1. Is that what you intended?
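For example, a minimal computed-column sketch that picks whichever spelling the current source actually has, assuming (as with drifted columns) that byName() returns NULL for a column that is absent:

/* Aggregate > Group by > Computed column */
iif(isNull(byName('Data Type')),
    toString(byName('DataType')),
    toString(byName('Data Type')))

Unlike byPosition(7), this does not silently break if the column order differs between the two schemas.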
