ADFv2 trouble with column mapping (reposting) - azure

I have a source .csv with 21 columns and a destination table with 25 columns.
Not ALL columns within the source have a home in the destination table and not all columns in the destination table come from the source.
I cannot get my CopyData task to let me pick and choose how I want the mapping to be. The only way I can get it to work so far is to load the source data to a "holding" table that has a 1:1 mapping and then execute a stored procedure to insert data from that table into the final destination.
I've tried altering the schemas on both the source and destination to match but it still errors out because the ACTUAL source has more columns than the destination or vice versa.
This can't possibly be the most efficient way to accomplish this but I'm at a loss as to how to make it work.
Yes I have tried the user interface, yes I have tried the column schemas, no I can't modify the source file and shouldn't need to.
The error code that is returned is some variation on:
"errorCode": "2200",
"message": "ErrorCode=UserErrorInvalidColumnMappingColumnCountMismatch,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Invalid column mapping provided to copy activity: '{LONG LIST OF COLUMN MAPPING HERE}', Detailed message: Different column count between target structure and column mapping. Target column count:25, Column mapping count:16. Check column mapping in table definition.,Source=Microsoft.DataTransfer.Common,'",
"failureType": "UserError",
"target": "LoadPrimaryOwner"

Tim F. Please view the statements in this Schema mapping in copy activity:
Column mapping supports mapping all or subset of columns in the source
dataset "structure" to all columns in the sink dataset "structure".
The following are error conditions that result in an exception:
1.Source data store query result does not have a column name that is specified in the input dataset "structure" section.
2.Sink data store (if with pre-defined schema) does not have a column name that is specified in the output dataset "structure" section.
3.Either fewer columns or more columns in the "structure" of sink dataset than specified in the mapping.
4.Duplicate mapping.
So,you could know that all the columns in the sink dataset need to be mapped. Since you can't change the destination,maybe you don't have to struggle in an unsupported feature.
Of course ,you could use stored procedure mentioned in your description.That's a perfect workaround and not very troublesome. About the using details, you could refer to my previous cases:
1.Azure Data Factory activity copy: Evaluate column in sink table with #pipeline().TriggerTime
2.Azure Data factory copy activity failed mapping strings (from csv) to Azure SQL table sink uniqueidentifier field
In addition, if you really don't want avoid above solution,you could submit feedback to ADF team about your desired feature.

Related

I'm unable to map drift columns in Azure Data Factory (ADF)

I can't map drift my columns in ADF data flow. I'm able to manually, but this isn't possible as I have 1020 columns. File is .csv
I see a message: 'This drifted column is not in the source schema and therefore can only be referenced with pattern matching expressions'
I was hoping to have a map drifted data flow from my source data.
With > 1k columns, you should consider NOT mapping those columns. Just use column patterns inside your transformation expressions to access columns. Otherwise, ADF will have to materialize the entire 1k+ columns as a physical projection.

Upsert Option in ADF Copy Activity

With the "upsert option" , should I expect to see "0" as "Rows Written" in a copy activity result summary?
My situation is this: The source and sink table columns are not exactly the same but the Key columns to tell it how to know the write behavior are correct.
I have tested and made sure that it does actually do insert or update based on the data I give to it BUT what I don't understand is if I make ZERO changes and just keep running the pipeline , why does it not show "zero" in the Rows Written summary?
The main reason why rowsWritten is not shown as 0 even when the source and destination have same data is:
Upsert inserts data when a key column value is absent in target table and updates the values of other rows whenever the key column is found in target table.
Hence, it is modifying all records irrespective of the changes in data. As in SQL Merge, there is no way to tell copy activity that if an entire row already exists in target table, then ignore that case.
So, even when key_column matches, it is going to update the values for rest of the columns and hence counted as row written. The following is an example of 2 cases
The rows of source and sink are same:
The rows present:
id,gname
1,Ana
2,Ceb
3,Topias
4,Jerax
6,Miracle
When inserting completely new rows:
The rows present in source are (where sink data is as above):
id,gname
8,Sumail
9,ATF

How to change types of multiple columns on Datafactory

I have a table with hundreds of columns.
This table will be modified in a pipeline on Datafactory.
One of these modifications is: change columns types from original type to desired type.
For example: the table have 50+ columns with type 'decimal' but i need to change them to 'integer'. All of this columns have a name starting with 'CD_'
How can i do that on Datafactory's Dataflow?
I'm trying to use Column Patterns in Derived Column Transformation. I created this following pattern (see picture attached).
My Column Pattern
But when i update the Data Visualization tab, i got the following error:
DF-EXPR-021 at Derive 'ChangeTypes'(Line 589/Col 30): Return types do not match
Thank you very much.

Passing the Dataflow Parameter to Sink Key column in Azure Data factory

I wanted to implement SCD type 2 logic but using dynamic tables and dynamic key fields from Config Table, I have a challenge to pass the Data Flow Parameter as Sink Key Column for my Alter Row activity, it is not taking the parameter values and always gives the error as invalid key column name, I tried picking the Dataflow parameter for the expression builder at sink key column and trying to pass the value from alter row transformation and I have named the field with parameter in the select statement as well , any help or suggestion highly appreciated
Please clink below image
Sample How I wanted to Pass Dynamic Values in Sink Mapping
Trying to Give the Dynamic Value to Key Value
You have "List of columns" selected, so ADF is looking for a column in your target table that is literally called "$TargetPK1Parameter".
Change the selector to "Custom expression" and enter a string array parameter. The parameter can be an array of strings that represent names of key columns in your target table.
It should look something like this:
I encountered a similar problem when trying to pass a composite key, parameterized, as part of the update method to sink. This now allows me to fully parameterise my dataflow and it handles both composite keys and single columns keys.
Here's how the data looks in my config table:
UpsertKeyColumn = DOMNAME,DDLANGUAGE,AS4LOCAL,VALPOS,AS4VERS
A parameter value is set in the dataflow
Upsert_Key_Column = #item().UpsertKeyColumn
Finally, in the Sink settings, Custom Expression is selected for Key columns and the following expression is entered - split($upsert_key_column,',')

Azure Data Factory - Exists transformation in Data Flow with generic dataset

I'm having issues using the Exists Transformation within a Data Flow with a generic dataset.
I have two sources (one from staging table "sourceStg", one from DWH table "sourceDwh") and want to compare if the UniqueIdentifier-Column in the staging table is existing in the UniqueIdentifier-Column in the DWH table. For that I have a generic data set which I query with a SQL statement containing parameters.
When I open the "Exists settings" I cannot choose any Column from the source in the conditions since the source is generic and has no Projection until I run the data flow. However, I have a parameter which I get from the parent pipeline which provides me the name of the Column containing the UniqueIdentifier (both column names in staging / DWH are the same).
I tried to add following statement "byName($UniqueIdentifier)" in the left and right column field but the engine resolves them both as the sourceStg-Column since the prefix of the source-transformations is missing and it defaults to the first one. What I basically now try to achieve is having some statement as followed defining the correct source-transformation and the column containing the unique identifier with a parameter.
exists(sourceStg#$UniqueIdentifier == sourceDwh#$UniqueIdentifier)
But either the expression cannot be parsed or the result does not retrieve the actual UniqueIdentifier value from the column but writes the statement (e.g. sourceStg#$UniqueIdentifier) as column value.
The only workaround I found so far is having two derived columns which adds a suffix to the UniqueIdentifier-Column in one source and a new parameter $UniqueIdentiferDwh which is populate with the parameter $UniqueIdentifier and the same suffix as used in the derived column.
Any Azure Data Factory experts out there to help?
Thanks in advance!

Resources