Extracting metadata in Azure Data Factory

I have a CSV file:
Customer,Gender,Age,City
1,Male,23,Chennai
4,Female,34,Madurai
3,Male,23,Bangalore
My Azure SQL DB's table TAB_A has only one column: Column_Name
I need to move the header of the CSV file into TAB_A so that the result is:
Column_Name
Customer
Gender
Age
City
Is it possible to achieve this with ADF Mapping Data Flow, without using Databricks/Python?
I tried Source - Surrogate Key - Filter. I am able to extract the header as a row, but unable to transpose it. Any pointers? Thanks.

I created a simple test and successfully inserted the header into the SQL table.
I created a test.csv file, set it as the source data, and left 'First row as header' unchecked, so the header comes through as an ordinary data row in the source data preview.
Use a Surrogate Key transformation (SurrogateKey1) to generate a Row_No column; the data preview now shows the rows numbered 1 to 4.
Use a Filter transformation (Filter1) to keep only the header row via the expression Row_No == 1; the preview now contains just that single row.
Use an Unpivot transformation (Unpivot1) to perform the row-to-column conversion:
Ungroup by: Row_No.
Unpivot key: just fill in any column name.
Unpivoted columns: this column name must match the column name in your SQL table (Column_Name here), so that ADF can do the mapping automatically.
The resulting data preview holds one row per header value (see the sketch after these steps).
That's all.
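For reference, with the sample file above the intermediate results would look roughly like this (a sketch only; ADF auto-generates column names such as _col0_ when no header row is defined, and the exact preview layout may differ):

After SurrogateKey1:
Row_No, _col0_, _col1_, _col2_, _col3_
1, Customer, Gender, Age, City
2, 1, Male, 23, Chennai
3, 4, Female, 34, Madurai
4, 3, Male, 23, Bangalore

After Filter1 (Row_No == 1) only row 1 remains, and after Unpivot1 the header values become rows of the Column_Name column (the unpivot key column is omitted here), which is exactly the desired TAB_A content:
Column_Name
Customer
Gender
Age
City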

Related

Azure Data Factory ForEach Copy Upsert, how to use key column

I'm using Azure Data Factory to copy a table from a MySQL source to a SQL Server destination. I'm using the default 'record' functionality. In the copy step, I want to enable upsert. I then need to enter the key columns, and I'm wondering how to make sure that each table can have its own key column(s).
I tried entering column names, but the end result looks confusing: which key then belongs to which table?
To supply the key columns dynamically, a field called key_column is added to the lookup table for every table_name. Below is the detailed approach.
A lookup table is created with the fields table_name and key_column (a sketch of such a table is shown after these steps). In ADF, a Lookup activity is configured with a dataset pointing to this lookup table.
In a ForEach activity, the Lookup activity's output is used as the items to iterate over.
A Copy activity is placed inside the ForEach activity, with the source dataset parameterized by the current table_name.
In the Copy activity's sink, the write behavior is set to 'Upsert' and the key columns are given dynamically as
@array(item().key_column)
In this way the key columns can be assigned dynamically and the upsert can be performed in the Copy activity.
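A minimal sketch of such a lookup (control) table in T-SQL, with hypothetical names and rows (dbo.copy_control and the sample values are assumptions; key_column is the field consumed by @array(item().key_column) above):

-- Control table the Lookup activity reads: one row per table to copy
CREATE TABLE dbo.copy_control (
    table_name nvarchar(128) NOT NULL,
    key_column nvarchar(128) NOT NULL
);

INSERT INTO dbo.copy_control (table_name, key_column) VALUES
    ('customers', 'customer_id'),
    ('orders', 'order_id');

The Lookup activity returns these rows, the ForEach iterates over them, and each Copy run reads item().table_name for the source and item().key_column for the sink's key columns.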

How to convert currency to decimal in Azure Data Factory's copy data activity mapping

In Azure Data Factory I have a pipeline, and the pipeline has one Copy data activity whose source is a REST API and whose destination is a SQL DB table.
In the mapping of this copy activity I specify which columns from the REST dataset (on the left) are mapped to which columns of the SQL dataset (on the right).
There is a JSON property in the REST response, "totalBalance", that is supposed to be mapped to the "Balance" field in the DB table.
The JSON has "totalBalance" as a string, for example "$36,970,267.07", so how do I convert this into a decimal so that I can map it to the database table?
Do I need to somehow use a mapping data flow instead of the copy activity, or can the copy activity do this by itself?
What finally worked for me was having a Copy activity and a mapping data flow.
The Copy activity copies the data from REST into a SQL staging table where all the columns are VARCHAR, and from that all-string table a mapping data flow sinks the data into the actual destination SQL table.
Between the source and the sink of the data flow I added a Derived Column transformation for each source property I want to convert, and in the expression of that derived column I use an expression like this:
toDecimal(replace(replace(totalAccountReceivable, '$', ''),',',''))
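For the "totalBalance" to "Balance" mapping specifically, the same pattern applies; if the target column has a defined precision and scale, toDecimal can take them explicitly (decimal(18,2) below is only an assumed example, so match it to your actual column definition):

toDecimal(replace(replace(totalBalance, '$', ''), ',', ''), 18, 2)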
The copy activity cannot do that directly.
There are two ways I think this can be done:
First: change the decimal column to varchar in the DB table.
Second: add a Lookup activity before the copy activity to remove the '$' in the 'totalBalance' column, then add an additional column holding the cleaned value.
Finally, map this additional column to the 'Balance' column.
Hope this can help you.

Why is the column length of auto-created SQL tables -1 in Azure Data Factory, and how can it be made a fixed length?

I am trying to load data from CSV into Azure SQL DB using a copy activity. I have selected the auto-create option for the destination table in the sink dataset properties.
The copy activity works fine, but all columns are created as nvarchar(max), i.e. with length -1. I don't want -1 as the default length for my auto-created sink table columns.
Does anyone know how to change the column length or create fixed-length columns during auto table creation in Azure Data Factory?
As you have clearly detailed, the auto create table option in the copy data activity is going to create a table with generic column definitions. You can run the copy activity initially in this way and then return to the target table and run T-SQL statements to further define the desired column definitions. There is no option to define table schema with the auto create table option.
ALTER TABLE table_name ALTER COLUMN column_name new_data_type(size);
Clearly, the new column definition(s) must match the data that was initially copied via the first copy activity, but this is how you would combine the auto-create table option with clearly defined column definitions.
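For example, to shrink an auto-created nvarchar(max) column to a fixed length (the table and column names below are hypothetical; choose a length that fits the longest value already copied):

ALTER TABLE dbo.SalesStaging ALTER COLUMN CustomerName nvarchar(100);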
It would be useful if you created an Azure Data Factory UserVoice entry to request that the auto create table functionality include creation of column definitions, as a means of saving the time needed to go back and manually alter the columns to meet specific requirements, etc.

Filter CSV file according to null values using Azure Data Factory

I have a CSV file in blob storage and I want to push that CSV file into a SQL table using Azure Data Factory. What I want is to put a check condition on the CSV data: if any cell in a row has a null value, that row should be copied into an error table instead. For example, I have ID, Name and Contact columns in the CSV, so if for some record Contact is null (1, 'Gaurav', NULL), that row should be inserted into the error table, and if there is no null in the row then the row should go into the master table.
Note: as the sink is SQL on a VM, we can't create anything over there; we have to handle this at the Data Factory level only.
This can be done using a mapping data flow in ADF. One way of doing it is to use a derived column with an expression that does the null check, for example with the isNull() function. That way you can populate a new column with a value for the different cases, which you can then use in a conditional split to redirect the different streams to different sinks.
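A minimal sketch, assuming the three columns from the question (ID, Name, Contact; the flag and stream names are placeholders):

Derived Column - add a boolean column, e.g. HasNull, with the expression:
isNull(ID) || isNull(Name) || isNull(Contact)

Conditional Split - condition for the stream going to the error-table sink:
HasNull == true()

The default stream (rows where HasNull is false) then goes to the master-table sink.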

Need to insert csv source data to Azure SQL Database

I have source data in CSV. I created a SQL table to insert the CSV data into. My SQL table has a primary key column and a foreign key column in it. I cannot skip these 2 columns while mapping in Data Factory. How can I overcome this and insert the data?
Please refer to the rules for schema mapping in the copy activity. Column mapping fails in the following cases:
The source data store query result does not have a column name that is specified in the input dataset "structure" section.
The sink data store (if with pre-defined schema) does not have a column name that is specified in the output dataset "structure" section.
There are either fewer columns or more columns in the "structure" of the sink dataset than specified in the mapping.
Duplicate mapping.
So, if your CSV file does not cover all the columns in the SQL database table, the copy activity can't work directly.
You could consider creating a staging table in the SQL database that matches your CSV file, then using a stored procedure to fill the actual table. Please refer to the detailed steps in this case to implement your requirement: Azure Data Factory mapping 2 columns in one column
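A minimal sketch of that approach in T-SQL, with hypothetical names (the staging table, target table, columns and identity/foreign-key handling below are assumptions; adjust them to your schema):

-- Staging table that matches only the CSV columns
CREATE TABLE dbo.stg_Customer (Gender nvarchar(10), Age int, City nvarchar(50));

-- Stored procedure that fills the real table after the copy activity loads the staging table
CREATE PROCEDURE dbo.usp_Load_Customer
AS
BEGIN
    -- The primary key is assumed to be an identity column, so it is omitted here;
    -- the foreign key is assumed to allow a default/NULL or to be resolved in this procedure.
    INSERT INTO dbo.Customer (Gender, Age, City)
    SELECT Gender, Age, City
    FROM dbo.stg_Customer;
END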
