How to change the types of multiple columns in Azure Data Factory

I have a table with hundreds of columns.
This table will be modified in a pipeline in Data Factory.
One of these modifications is to change column types from the original type to a desired type.
For example: the table has 50+ columns of type 'decimal', but I need to change them to 'integer'. All of these columns have a name starting with 'CD_'.
How can I do that in a Data Factory Data Flow?
I'm trying to use Column Patterns in a Derived Column transformation. I created the following pattern (see attached picture).
[Image: My Column Pattern]
But when I refresh the Data Preview tab, I get the following error:
DF-EXPR-021 at Derive 'ChangeTypes'(Line 589/Col 30): Return types do not match
Thank you very much.
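The intended rule (every column whose name starts with 'CD_' becomes an integer, everything else passes through unchanged) can be sketched outside Data Factory in plain Python. This is only an analogy, not Data Flow syntax: the function name, the sample column names, and the values are hypothetical. In a column pattern the same idea would presumably be a matching condition such as startsWith(name, 'CD_') with toInteger($$) as the value expression, but that mapping is an assumption, since the original pattern is only available as a screenshot.

```python
from decimal import Decimal


def cast_matching_columns(row, prefix="CD_"):
    """Return a copy of row where every column whose name starts with
    prefix is cast to int; all other columns are passed through."""
    result = {}
    for name, value in row.items():
        if name.startswith(prefix) and value is not None:
            result[name] = int(value)  # int() truncates toward zero
        else:
            result[name] = value
    return result


# Hypothetical sample row: two 'CD_' columns and one unrelated decimal column.
row = {
    "CD_REGION": Decimal("12"),
    "CD_STORE": Decimal("7"),
    "AMOUNT": Decimal("19.90"),
}
converted = cast_matching_columns(row)
```

Note that the rule is driven by the column name only, so it keeps working when more 'CD_' columns are added later.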

Related

Couldn't convert to Number when expanding a table in Power Query

I have a very annoying problem when I try to merge two tables in Power Query in Excel. I use one column to match records from both tables, and when I try to expand the second table the following message pops up:
DataFormat.Error: We couldn't convert to Number.
Details:
ECS
I have no idea how to fix this. The matched columns contain text, not numbers. There are no errors when I import the data. Can anyone help?
Try the following:
Delete the step #"Changed Type" in both queries
Make sure that the two merged columns have the same type (text ABC, in your case)
When you create a query from a table, Power Query tries to guess (based on the first 200 rows) the type of each column. The value "ECS" shown in the error details is probably in a different column (not one of the two merged columns) that has been typed as Number (123). It is effectively a mixed column, either because of the source data itself or because of a delimiter issue.
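To illustrate why a sampled type guess can fail later, here is a minimal Python sketch. It is not how Power Query is implemented; the function names and the sample data are hypothetical, and only the 200-row sample size comes from the explanation above.

```python
def infer_column_type(values, sample_size=200):
    """Guess 'number' if every sampled value parses as a float, else 'text'."""
    for value in values[:sample_size]:
        try:
            float(value)
        except ValueError:
            return "text"
    return "number"


def apply_type(values, column_type):
    """Convert every value (not just the sample) to the guessed type."""
    if column_type == "number":
        return [float(value) for value in values]
    return [str(value) for value in values]


# 200 numeric-looking rows followed by one text value outside the sample.
column = ["1"] * 200 + ["ECS"]
guessed = infer_column_type(column)

try:
    apply_type(column, guessed)
    conversion_error = None
except ValueError as exc:
    conversion_error = str(exc)
```

The guess is "number" because the sample never sees "ECS", and the conversion then fails on row 201. Removing the automatic type step, or typing the column as text, avoids the failure.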

Azure Data Factory - Exists transformation in Data Flow with generic dataset

I'm having issues using the Exists Transformation within a Data Flow with a generic dataset.
I have two sources (one from the staging table "sourceStg", one from the DWH table "sourceDwh") and want to check whether the value in the UniqueIdentifier column of the staging table exists in the UniqueIdentifier column of the DWH table. For that I have a generic dataset which I query with a SQL statement containing parameters.
When I open the "Exists settings" I cannot choose any Column from the source in the conditions since the source is generic and has no Projection until I run the data flow. However, I have a parameter which I get from the parent pipeline which provides me the name of the Column containing the UniqueIdentifier (both column names in staging / DWH are the same).
I tried to add the statement "byName($UniqueIdentifier)" in the left and right column fields, but the engine resolves both as the sourceStg column, since the prefix of the source transformation is missing and it defaults to the first one. What I'm basically trying to achieve is a statement like the following, which defines both the correct source transformation and, via a parameter, the column containing the unique identifier.
exists(sourceStg@$UniqueIdentifier == sourceDwh@$UniqueIdentifier)
But either the expression cannot be parsed or the result does not retrieve the actual UniqueIdentifier value from the column but writes the statement (e.g. sourceStg@$UniqueIdentifier) as column value.
The only workaround I have found so far is to use two derived columns that add a suffix to the UniqueIdentifier column in one source, plus a new parameter $UniqueIdentiferDwh which is populated with the value of $UniqueIdentifier and the same suffix as used in the derived column.
Any Azure Data Factory experts out there to help?
Thanks in advance!

Azure Data Flow - Can we have dynamic columns or a changing projection for Unpivot functionality

The Excel file consists of 62 columns: 7 columns are fixed and the rest hold the weeks of the year (week 1 to week 52).
I have used a data flow task to unpivot the 53 columns into rows, with 2 extra columns, year and value.
The problem is that the names of the 52 week columns keep changing with every weekly data load. How do I handle this change in column names in the data flow? For a single run it gives the exact output.
What you'll want to do here is to implement late-binding of your schema, or what ADF refers to as "schema drift". Instead of setting a hardened "early binding" schema in your Source projection, leave the dataset schema and projection empty.
Next, add a Derived Column after your source and call it "Projection". This is where you'll build your projection using rules to account for your evolving schema.
Build out your canonical model with the column names for your entire year using byName('columnname'). That will tell ADF to look for the existence of the column in single quotes from your source data while also providing a schema that you can use to build out your pivot table.
If you need to cast the values, wrap byName() inside a casting function, e.g. toString(), toDate(), etc.
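The late-binding idea above can be sketched in plain Python: build the projection from a fixed, canonical list of column names and look each one up by name in whatever the source delivers. This is an analogy only, not Data Flow syntax. The helper names, the "week1" to "week52" naming, and the sample row are hypothetical, and representing a missing column as None is an assumption about how it should surface.

```python
def by_name(row, column_name):
    """Look a column up by name; return None when the source lacks it."""
    return row.get(column_name)


def project(row, canonical_columns, cast=str):
    """Build a row with exactly the canonical columns, casting present values."""
    projected = {}
    for column_name in canonical_columns:
        value = by_name(row, column_name)
        projected[column_name] = cast(value) if value is not None else None
    return projected


# Canonical model for the whole year (hypothetical naming scheme).
canonical_columns = [f"week{n}" for n in range(1, 53)]

# This week's load only contains some of the week columns.
incoming = {"product": "A", "week1": 10, "week2": 12}
projected = project(incoming, canonical_columns)
```

The output always has the same 52 columns regardless of which ones arrived, which is what gives the downstream unpivot a stable schema to work against.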

How to add Dynamic GroupBy Column in Data Flow Aggregate Activity in Azure Data Factory

I am using Data Flow (preview). My "Aggregate" activity requires a GroupBy column, and that setting is not dynamic, so I can't group by the column I need. I just want to map the column by name.
For example:
These are the two schemas:
1) Columns: M Id, Date/Time, Data Type, Values
2) Columns: MID, Date, DataType, Units
Both actually have the same data types and structure. I want to group by DataType and compute avg(units).
The problem is that the field is named "Data Type" in one schema and "DataType" in the other. How do I map them together?
I have created a "Derived" activity with this
Column: DataType
Expression: case(startsWith(toString(byPosition(7)), 'D'), toString(byName('Data Type')),toString(byName('DataType')))
But it doesn't work. Any help is highly appreciated.
I just want to know how to map the column by name.
You can write a dynamic expression directly in the Group by field in the Aggregate transformation. Hover over the Group By field and select "Computed Column" to enter into the Expression Builder.
Are you trying to determine whether to use the column "Data Type" or the column called "DataType"? If so, just enter your conditional expression directly into the expression builder on the Aggregate group by. Note that in your expression above you are using byPosition(), which takes a number giving the position of an incoming column, counted left to right and starting at position 1. Is that what you intended?
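As a rough illustration of a computed group-by key, the following Python sketch picks whichever column name a row actually has and then averages per key. It is not Data Flow syntax; the helper names and the sample rows are hypothetical, and treating "Values" and "Units" as the same measure is an assumption based on the two schemas in the question.

```python
from collections import defaultdict


def first_present(row, *names):
    """Return the value of the first column name that exists in row."""
    for name in names:
        if name in row:
            return row[name]
    raise KeyError(f"none of {names} found in row")


def average_by_data_type(rows):
    """Group by 'Data Type' / 'DataType' and average 'Values' / 'Units'."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for row in rows:
        key = first_present(row, "Data Type", "DataType")
        value = first_present(row, "Values", "Units")
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical rows, one from the first schema and two from the second.
rows = [
    {"M Id": 1, "Data Type": "temp", "Values": 20.0},
    {"MID": 2, "DataType": "temp", "Units": 22.0},
    {"MID": 3, "DataType": "pressure", "Units": 1.0},
]
averages = average_by_data_type(rows)
```

The key is resolved by column name rather than by position, so it does not depend on the order in which columns arrive.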

ADFv2 trouble with column mapping (reposting)

I have a source .csv with 21 columns and a destination table with 25 columns.
Not ALL columns within the source have a home in the destination table and not all columns in the destination table come from the source.
I cannot get my CopyData task to let me pick and choose how I want the mapping to be. The only way I can get it to work so far is to load the source data to a "holding" table that has a 1:1 mapping and then execute a stored procedure to insert data from that table into the final destination.
I've tried altering the schemas on both the source and destination to match but it still errors out because the ACTUAL source has more columns than the destination or vice versa.
This can't possibly be the most efficient way to accomplish this but I'm at a loss as to how to make it work.
Yes I have tried the user interface, yes I have tried the column schemas, no I can't modify the source file and shouldn't need to.
The error code that is returned is some variation on:
"errorCode": "2200",
"message": "ErrorCode=UserErrorInvalidColumnMappingColumnCountMismatch,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Invalid column mapping provided to copy activity: '{LONG LIST OF COLUMN MAPPING HERE}', Detailed message: Different column count between target structure and column mapping. Target column count:25, Column mapping count:16. Check column mapping in table definition.,Source=Microsoft.DataTransfer.Common,'",
"failureType": "UserError",
"target": "LoadPrimaryOwner"
Tim F., please see the statements in the "Schema mapping in copy activity" documentation:
Column mapping supports mapping all or subset of columns in the source dataset "structure" to all columns in the sink dataset "structure".
The following are error conditions that result in an exception:
1. Source data store query result does not have a column name that is specified in the input dataset "structure" section.
2. Sink data store (if with pre-defined schema) does not have a column name that is specified in the output dataset "structure" section.
3. Either fewer columns or more columns in the "structure" of sink dataset than specified in the mapping.
4. Duplicate mapping.
So all the columns in the sink dataset need to be mapped. Since you can't change the destination, there may be no point in struggling with an unsupported feature.
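The four error conditions quoted above can be written down as a small Python check. This is a sketch of the rules as stated, not the copy activity's actual implementation; the function name, message texts, and column names are hypothetical. The counts (21 source columns, 25 sink columns, 16 mapped) are taken from the question.

```python
def validate_mapping(mapping, source_columns, sink_columns):
    """Check a list of (source_column, sink_column) pairs against the four
    documented error conditions and return a list of problems found."""
    problems = []
    for source, sink in mapping:
        if source not in source_columns:
            problems.append(f"source column not found: {source}")
        if sink not in sink_columns:
            problems.append(f"sink column not found: {sink}")
    sink_targets = [sink for _, sink in mapping]
    if len(set(sink_targets)) != len(sink_targets):
        problems.append("duplicate mapping")
    if len(mapping) != len(sink_columns):
        problems.append(
            f"column count mismatch: target column count {len(sink_columns)}, "
            f"mapping count {len(mapping)}"
        )
    return problems


# Hypothetical names reproducing the counts from the question.
source_columns = [f"src{i}" for i in range(21)]
sink_columns = [f"dst{i}" for i in range(25)]
mapping = [(f"src{i}", f"dst{i}") for i in range(16)]

problems = validate_mapping(mapping, source_columns, sink_columns)
```

With 16 mapped columns against a 25-column sink, only the count check fires, which matches the "Target column count:25, Column mapping count:16" message in the error.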
Of course, you could use the stored procedure approach mentioned in your description. That's a perfectly good workaround and not much trouble. For details on how to use it, you could refer to my previous answers:
1. Azure Data Factory activity copy: Evaluate column in sink table with @pipeline().TriggerTime
2. Azure Data factory copy activity failed mapping strings (from csv) to Azure SQL table sink uniqueidentifier field
In addition, if you really want to avoid the above solution, you could submit feedback to the ADF team about your desired feature.
