Why is the default column length -1 (nvarchar(max)) in auto-created SQL tables in Azure Data Factory, and how can I make it a fixed length? - azure

I am trying to read data from a CSV file into Azure SQL DB using a copy activity. I have selected the auto create option for the destination table in the sink dataset properties.
The copy activity is working fine, but all columns are getting created as nvarchar(max), i.e. with length -1. I don't want -1 as the default length for my auto-created sink table columns.
Does anyone know how to change the column length, or create fixed-length columns, during auto table creation in Azure Data Factory?

As you have clearly detailed, the auto create table option in the copy data activity is going to create a table with generic column definitions (the -1 length is simply how SQL Server metadata reports the (max) length of nvarchar(max) columns). You can run the copy activity initially in this way and then return to the target table and run T-SQL statements to further define the desired column definitions. There is no option to define the table schema with the auto create table option.
ALTER TABLE table_name ALTER COLUMN column_name new_data_type(size);
Clearly, the new column definition(s) must match the data that has already been copied via the first copy activity, but this is how you would combine the auto create table option with clearly defined column definitions.
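For example, assuming the auto-created table and its columns are named as below (hypothetical names), fixed lengths could be applied like this:
-- Hypothetical table/column names; the data already copied must fit the new lengths or the ALTER will fail.
ALTER TABLE dbo.MyAutoCreatedTable ALTER COLUMN CustomerName nvarchar(100);
ALTER TABLE dbo.MyAutoCreatedTable ALTER COLUMN PostalCode varchar(10);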
It would be useful if you created an Azure Data Factory UserVoice entry to request that the auto create table functionality include creation of column definitions, as a means of saving the time needed to go back and manually alter the columns to meet specific requirements.

Related

Azure Data Factory ForEach Copy Upsert, how to use key column

I'm using Azure Data Factory to copy a table from a MySQL source to a SQL Server destination. I'm using the default 'record' functionality. In the copy step, I want to enable upsert. I then need to enter the key columns, and I'm wondering how to make sure that each table can have its own key column(s).
I tried entering column names, but the end result looks confusing: which key then belongs to which table?
In order to supply the key columns dynamically, a field called key_column is added to the lookup table for every table_name. Below is the detailed approach; a sketch of the lookup table follows the steps.
The lookup table has the fields table name and key column. In ADF, the Lookup activity dataset points at this lookup table.
The lookup output is passed to a ForEach activity.
A copy activity is placed inside the ForEach activity. The source dataset is configured as in the image below.
In the sink dataset, write behaviour is set to 'upsert' and the key columns are given dynamically as
@array(item().key_column)
In this way the key columns can be assigned dynamically and the upsert can be performed in the copy activity.
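As a minimal sketch (the table and column names here are assumptions, not from the original post), the lookup table driving the ForEach could be created like this:
-- Hypothetical control table: one row per table to copy, holding that table's key column.
CREATE TABLE dbo.lookup_table (
    table_name nvarchar(128) NOT NULL,
    key_column nvarchar(128) NOT NULL
);
INSERT INTO dbo.lookup_table (table_name, key_column)
VALUES ('customers', 'customer_id'),
       ('orders', 'order_id');
Each row then drives one iteration of the ForEach, and @array(item().key_column) turns that row's key_column value into the single-element array the sink's Key columns property expects.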

How to rename column names from lookup in ADF?

I have metadata in my Azure SQL DB / CSV file as below, which has the old column names, their data types, and the new column names.
I want to rename and change the data type of oldfieldname based on that metadata in ADF.
The idea is to store the metadata file in cache and use it in a lookup, but I am not able to do it in the data flow expression builder. Any idea which transform to use or how I should do it?
I have reproduced the above and was able to change the column names and data types as below.
This is the sample CSV file I have taken from blob storage, which has the metadata of the table.
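Since the image of that file isn't reproduced here, a sketch of its shape (the data rows are hypothetical examples; the header names match the properties used in the script below):
OldName,NewName,Newtype
empname,employee_name,varchar(50)
empid,employee_id,int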
In your case, take care with the new data types: if the correct types are not given, the script will generate an error because of the data already inside the table.
Create a dataset for it, give it to the Lookup activity, and don't check the 'First row only' option.
This is my sample SQL table:
Give the lookup output array to a ForEach activity.
Inside the ForEach, use a Script activity to execute the script that changes the column name and data type.
Script:
EXEC sp_rename 'mytable2.@{item().OldName}', '@{item().NewName}', 'COLUMN';
ALTER TABLE mytable2
ALTER COLUMN @{item().NewName} @{item().Newtype};
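For one metadata row, say OldName = empname, NewName = employee_name and Newtype = varchar(50) (hypothetical values), the interpolated script would expand to:
-- Hypothetical values for illustration only.
EXEC sp_rename 'mytable2.empname', 'employee_name', 'COLUMN';
ALTER TABLE mytable2
ALTER COLUMN employee_name varchar(50);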
Execute this; below is my SQL table with the changes.

ADF - how to compare two Azure SQL Database tables (A and B) with the same structure and insert only the missing values from table A into table B

I want to create an ADF data pipeline that compares both tables and, after the comparison, adds the missing rows from table A to table B.
table A - 100 records
table B - 90 records
add the difference of 10 rows from table A to table B
This is what I tried:
picture1
picture2
if condition 1 - @greaterOrEquals(activity('GetLastModifiedDate').output.lastModified,adddays(utcnow(),-7))
if condition 2 - @and(equals(item().name,'master_data'),greaterOrEquals(activity('GetLastModifiedDate').output.lastModified,adddays(utcnow(),-7)))
The Copy activity has an Upsert mode which I think would help here. Simple instructions:
Create one Copy activity
Set your source database in the Source tab of the Copy activity
Set your target (or sink) database in the Sink tab. Set the mode to Upsert
Specify the interim schema. This is used to create a transient table which holds data during the Upsert
Specify the unique keys for the source and target table in the Key columns section so the Upsert can take place successfully
A simple example:
Failing that, simply use a Copy activity to land the data into a table in your target database and use a Stored Proc activity to implement your more complicated logic.
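If you take the stored procedure route, a minimal sketch of the 'insert only the missing rows' logic might look like this (the table and column names are assumptions; adjust them to your schema):
-- Hypothetical names: insert rows from table A that do not yet exist in table B.
INSERT INTO dbo.TableB (Id, Col1, Col2)
SELECT a.Id, a.Col1, a.Col2
FROM dbo.TableA AS a
WHERE NOT EXISTS (SELECT 1 FROM dbo.TableB AS b WHERE b.Id = a.Id);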

Storing dictionary from API into Table

I'm querying an API using Azure Data Factory and the data I receive from the API looks like this.
{
"96":"29/09/2022",
"95":"31/08/2022",
"93":"31/07/2022"
}
When I come to write this data to a table, ADF assumes the column names are the numbers and the dates are stored as rows like this
96          95          93
29/09/2022  31/08/2022  31/07/2022
when I would like it to look like this:
Date        ID
29/09/2022  96
31/08/2022  95
31/07/2022  93
Does anyone have any suggestions on how to handle this? I ideally want to avoid using USPs and dynamic SQL. I really only need the ID for the previous month.
PS - API doesn't support any filtering on this object
Updates
I'm querying the API using a Web activity, and if I try to store the data in an Array variable the activity fails because the output is an object.
When I use a copy data activity, I've set the sink to automatically create the table and the mapping looks like this
mapping image
Thanks
Instead of directly trying to copy the JSON response to the SQL table, convert the response to a string, extract the required values and insert them into the SQL table.
Look at the following demonstration. I have taken the sample response provided as a parameter (object type). I used a Set Variable activity for extracting the values.
My parameter:
{"93":"31/07/2022","95":"31/08/2022","96":"29/09/2022"}
Dynamic content used in set variable activity:
@split(replace(replace(replace(replace(string(pipeline().parameters.api_op),' ',''),'"',''),'{',''),'}',''),',')
The output of the Set Variable activity will be the array ["93:31/07/2022","95:31/08/2022","96:29/09/2022"].
Now, inside a ForEach activity (pass the previous variable value as the items value of the ForEach), I used a copy data activity to copy each row separately to my sink table (auto create option enabled). I have taken a sample JSON file as my source (we are going to ignore all of its columns anyway).
Create the required 2 additional columns called id and date with the following dynamic content:
id: @split(item(),':')[0]
date: @split(item(),':')[1]
Configure the sink. Select the database, create a dataset, give a name for the table (I have given dbo.opTable) and select Auto create table under the sink settings.
The following is an image of the mapping. Delete the column mappings which are not required and only use the additional columns created above.
When I debug the pipeline, it runs successfully and the required values are inserted into the table. The following is the output sink table for reference.
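For reference (derived from the sample parameter above, since the screenshot isn't reproduced here), the auto-created sink table ends up with contents along these lines:
id    date
93    31/07/2022
95    31/08/2022
96    29/09/2022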

How to overwrite source table in Azure Data Factory

I am new to ADF. I have a pipeline which deletes all rows where any of the attributes are null. Schema: { Name, Value, Key }
I tried using a data flow with an Alter Row transformation and set both source and sink to be the same table, but it always appends to the table instead of overwriting it, which creates duplicate rows, and the rows I want to delete still remain. Is there a way to overwrite the table?
Assuming that your table is a SQL table, I tried to overwrite the source table after deleting the rows with null values. It successfully deleted the records but produced duplicate records even after exploring various methods.
So, as an alternative, you can try the below methods to achieve your requirement:
By creating a new table and deleting the old table:
This is my sample source table, named mytable.
Alter Row transformation
Give a new table in the sink and, in Settings -> Post SQL scripts, give the drop command to delete the source table. Now your sink table is your required table. drop table [dbo].[mytable]
Result table (named newtable) and the old table.
Source table deleted.
Deleting null values from the source table using a Script activity:
Use a Script activity to delete the rows with null values from the source table; a sketch is below.
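A minimal sketch of that delete, assuming the { Name, Value, Key } schema from the question (the table name is hypothetical):
-- Hypothetical table name; removes any row where at least one attribute is null.
DELETE FROM dbo.mytable
WHERE [Name] IS NULL OR [Value] IS NULL OR [Key] IS NULL;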
Source table after execution.
