I'm using Azure Data Factory to copy a table from a MySQL source to a SQL Server destination. I'm using the default 'record' functionality. In the copy step, I want to enable upsert. I then need to enter the key columns, and I'm wondering how to make sure that each table can have its own key column(s).
I tried entering column names, but the end result is confusing: which key then applies to which table?
To supply the key columns dynamically, add a field called key_column to the lookup table for every table_name. The detailed approach is below.
Create a lookup table with the fields table_name and key_column. In ADF, point the dataset of a Lookup activity at this lookup table.
Pass the output of the Lookup activity to a ForEach activity.
Place a Copy activity inside the ForEach activity. Configure the source dataset as in the image below.
In the sink settings, set the write behaviour to 'upsert' and supply the key columns dynamically as
@array(item().key_column)
This way the key columns are assigned dynamically per table and the Copy activity can perform the upsert.
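As a rough illustration of what the ForEach loop sees, here is a minimal Python sketch of turning lookup rows into a per-table list of key columns. The table names, column names and the comma-separated format for composite keys are assumptions for illustration; the approach above only shows a single key column per table. In ADF itself, a comma-separated value would probably need split(item().key_column, ',') instead of array(item().key_column).

```python
# Hypothetical lookup rows; table and column names are made up for illustration.
lookup_rows = [
    {"table_name": "customers", "key_column": "customer_id"},
    {"table_name": "order_lines", "key_column": "order_id, line_no"},
]


def key_columns(row):
    """Return the upsert key columns for one lookup row as a list."""
    return [col.strip() for col in row["key_column"].split(",") if col.strip()]


# One entry per ForEach iteration: table name -> its own key columns.
keys_by_table = {row["table_name"]: key_columns(row) for row in lookup_rows}
print(keys_by_table)
```

Each table ends up with its own key list, which is the value the sink's key columns setting expects for that iteration.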
I have metadata in my Azure SQL DB / CSV file, as below, which holds the old column names, the data types and the new column names.
I want to rename and change the data type of oldfieldname based on those metadata in ADF.
The idea is to store the metadata file in cache and use this in lookup but I am not able to do it in data flow expression builder. Any idea which transform or how I should do it?
I reproduced this and was able to change the column names and data types as shown below.
This is the sample CSV file I took from blob storage; it holds the metadata of the table.
In your case, take care with the new data types: if a new type does not fit the data already in the table, the script will fail with an error.
Create a dataset for this file, give it to a Lookup activity, and leave the 'First row only' option unchecked.
This is my sample SQL table:
Give the Lookup output array to a ForEach activity.
Inside the ForEach, use a Script activity to run the script that changes the column name and data type.
Script:
EXEC SP_RENAME 'mytable2.@{item().OldName}', '@{item().NewName}', 'COLUMN';
ALTER TABLE mytable2
ALTER COLUMN @{item().NewName} @{item().Newtype};
Execute this and below is my SQL table with changes.
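To see what the Script activity ends up running on each iteration, here is a small Python sketch that expands the same template for a list of metadata rows. The field names OldName, NewName and Newtype come from the script above; the sample values are hypothetical, and the sketch only builds the statement text without executing anything.

```python
def build_statements(table, rows):
    """Build the rename and alter statements for each metadata row."""
    statements = []
    for row in rows:
        old, new, new_type = row["OldName"], row["NewName"], row["Newtype"]
        # Rename first, then change the type of the renamed column.
        statements.append(f"EXEC SP_RENAME '{table}.{old}', '{new}', 'COLUMN';")
        statements.append(f"ALTER TABLE {table} ALTER COLUMN {new} {new_type};")
    return statements


# Hypothetical metadata rows, standing in for the Lookup output array.
metadata = [
    {"OldName": "id", "NewName": "customer_id", "Newtype": "int"},
    {"OldName": "nm", "NewName": "customer_name", "Newtype": "varchar(50)"},
]

for statement in build_statements("mytable2", metadata):
    print(statement)
```

Printing the statements before running them is a cheap way to check that every new type fits the data already in the table.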
I want to create an ADF data pipeline that compares both tables and after the comparison to add the missing rows from table A to table B
table A - 100 records
table B - 90 records
add the difference of 10 rows from table A to table B
This is what I tried:
if condition 1 - @greaterOrEquals(activity('GetLastModifiedDate').output.lastModified,adddays(utcnow(),-7))
if condition 2 - @and(equals(item().name,'master_data'),greaterOrEquals(activity('GetLastModifiedDate').output.lastModified,adddays(utcnow(),-7)))
The Copy activity has an Upsert mode which I think would help here. Simple instructions:
Create one Copy activity
Set your source database in the Source tab of the Copy activity
Set your target (or sink) database in the Sink tab. Set the mode to Upsert
Specify the interim schema. This is used to create a transient table which holds data during the Upsert
Specify the unique keys for the source and target table in the Key columns section so the Upsert can take place successfully
A simple example:
Failing that, simply use a Copy activity to land the data into a table in your target database and use a Stored Proc activity to implement your more complicated logic.
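If you go the stored procedure route, the core of the logic is an insert of the rows from table A that have no match in table B. Here is a minimal sketch of that statement, run against an in-memory SQLite database from Python so it can be tried anywhere. The table layout and the id key column are assumptions, and a real stored procedure would be written in T-SQL against your own key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT)")

# Table A holds 100 rows, table B only the first 90 of them.
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?)", [(i, f"row{i}") for i in range(1, 101)]
)
conn.executemany(
    "INSERT INTO table_b VALUES (?, ?)", [(i, f"row{i}") for i in range(1, 91)]
)

# Insert only the rows of A whose key is not yet present in B.
cursor = conn.execute(
    """
    INSERT INTO table_b (id, name)
    SELECT a.id, a.name
    FROM table_a AS a
    WHERE NOT EXISTS (SELECT 1 FROM table_b AS b WHERE b.id = a.id)
    """
)
inserted = cursor.rowcount
conn.commit()

total_b = conn.execute("SELECT COUNT(*) FROM table_b").fetchone()[0]
print(inserted, total_b)
```

Unlike an upsert, this leaves existing rows in table B untouched, so it fits when you only want to add what is missing.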
I am new to ADF. I have a pipeline which should delete all rows where any of the attributes is null. Schema: { Name, Value, Key }
I tried using a data flow with Alter Table and set both source and sink to the same table, but it always appends to the table instead of overwriting it. This creates duplicate rows, and the rows I want to delete still remain. Is there a way to overwrite the table?
Assuming that your table is a SQL table, I tried to overwrite the source table after deleting the rows with null values. The records were deleted successfully, but I still got duplicate records with every method I tried.
So, as an alternative, you can try the methods below to achieve your requirement:
By creating a new table and deleting the old table:
This is my sample source table, named mytable.
Alter Row transformation
Set the new table as the sink and, under Settings -> Post SQL scripts, give the drop command below to delete the source table. Your sink table is then your required table.
drop table [dbo].[mytable]
Result table(named newtable) and old table.
Source table deleted.
Deleting null values from source table using script activity
Use script activity to delete the null values from source table.
Source table after execution.
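The script itself is a single DELETE with one IS NULL check per column. Here is a minimal sketch, run against an in-memory SQLite table from Python so it can be tried without a database; the column names come from the schema in the question and the sample rows are made up. In T-SQL you would likely write [Key] with brackets, since KEY is a reserved word there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable ("Name" TEXT, "Value" TEXT, "Key" TEXT)')

# Hypothetical sample rows; three of them have a null attribute.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("a", "1", "k1"),
        (None, "2", "k2"),
        ("c", None, "k3"),
        ("d", "4", None),
        ("e", "5", "k5"),
    ],
)

# Delete every row where any of the attributes is null.
conn.execute(
    'DELETE FROM mytable WHERE "Name" IS NULL OR "Value" IS NULL OR "Key" IS NULL'
)
conn.commit()

remaining = conn.execute(
    'SELECT "Name", "Value", "Key" FROM mytable ORDER BY "Name"'
).fetchall()
print(remaining)
```

Because the delete happens in place, no second table is needed and nothing gets appended.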
I am trying to load data from a CSV file into Azure SQL DB using a copy activity. I have selected the auto create option for the destination table in the sink dataset properties.
The copy activity works fine, but all columns are created as nvarchar(max), i.e. with length -1. I don't want -1 as the default length for the columns of my auto-created sink table.
Does anyone know how to change the column length or create fixed-length columns during auto table creation in Azure Data Factory?
As you have clearly detailed, the auto create table option in the copy data activity is going to create a table with generic column definitions. You can run the copy activity initially in this way and then return to the target table and run T-SQL statements to further define the desired column definitions. There is no option to define table schema with the auto create table option.
ALTER TABLE table_name ALTER COLUMN column_name new_data_type(size);
Clearly, the new column definitions must be compatible with the data already copied by the first copy activity, but this is how you can combine the auto create table option with clearly defined column definitions.
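One way to pick sizes that fit the copied data is to measure the longest value per column in the source CSV and generate the ALTER statements from that. Below is a minimal Python sketch of this idea. The helper names, the sample data and the choice of nvarchar for every column are assumptions, and it expects a well-formed CSV with a header row.

```python
import csv
import io


def max_lengths(csv_text):
    """Return the longest value length seen for each CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lengths = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            # Missing trailing fields come back as None; treat them as empty.
            lengths[name] = max(lengths[name], len(value or ""))
    return lengths


def alter_statements(table, lengths, minimum=1):
    """Build one ALTER COLUMN statement per column (all nvarchar here)."""
    return [
        f"ALTER TABLE {table} ALTER COLUMN {column} nvarchar({max(length, minimum)});"
        for column, length in lengths.items()
    ]


# Hypothetical sample data standing in for the source CSV file.
sample = "name,city\nAlice,Oslo\nBartholomew,Rio\n"
lengths = max_lengths(sample)
for statement in alter_statements("mytable", lengths):
    print(statement)
```

In practice you would probably add some headroom to the measured lengths so that later loads with longer values do not fail.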
It would be useful to create an Azure Data Factory UserVoice entry requesting that the auto create table functionality support column definitions, which would save the time needed to go back and manually alter the columns to meet specific requirements.
I want to achieve an incremental load from Oracle to Azure SQL Data Warehouse using Azure Data Factory. The issue I am facing is that I don't have any date column or key column to drive the incremental load. Is there any other way to achieve this?
You will either have to:
A. Identify a field in each table you want to use to determine if the row has changed
B. Implement some kind of change capture feature on the source data
Those are really the only two ways to limit the amount of data you pull from the source.
It wouldn't be very efficient, but if you are just trying not to update rows that haven't changed in your destination, you can hash your source values and hash the values in the destination, and only insert/update rows where the hashes don't match. Here's an example of how this works in T-SQL.
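The hashing idea can also be sketched outside the database. Below is a minimal Python version, assuming rows are plain tuples and that there is no key column, so a source row is loaded whenever its hash is not found among the destination hashes. The function names and sample rows are hypothetical, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


def row_hash(row):
    """Hash all values of a row into one hex digest."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    # Note that None and the empty string hash the same in this sketch.
    joined = "\x1f".join("" if value is None else str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def rows_to_load(source_rows, destination_rows):
    """Return the source rows whose hash is not present in the destination."""
    known = {row_hash(row) for row in destination_rows}
    return [row for row in source_rows if row_hash(row) not in known]


# Hypothetical data: row 2 differs in the destination and row 3 is missing there.
source = [(1, "a"), (2, "b"), (3, "c")]
destination = [(1, "a"), (2, "x")]
changed = rows_to_load(source, destination)
print(changed)
```

This still reads every source row, so it only saves writes on the destination side, not the data pulled from the source.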
There is a section of the Data Factory documentation dedicated to incrementally loading data. Please check it out if you haven't.