I have source data in a CSV file and created a SQL table to insert the CSV data into. My SQL table has a primary key column and a foreign key column, and I cannot skip these two columns while mapping in Data Factory. How do I overcome this and insert the data?
Please refer to the rules of schema mapping in the copy activity. The copy activity fails when any of the following applies:
The source data store query result does not have a column name that is specified in the input dataset "structure" section.
The sink data store (if it has a pre-defined schema) does not have a column name that is specified in the output dataset "structure" section.
The "structure" of the sink dataset has either fewer or more columns than specified in the mapping.
There is a duplicate mapping.
So, if your CSV file does not cover all the columns in the SQL table, the copy activity can't work.
You could consider creating a temporary (staging) table in the SQL database that matches your CSV file, then use a stored procedure to fill the actual table. Please refer to the detailed steps in this case to implement your requirement: Azure Data Factory mapping 2 columns in one column
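A minimal sketch of that staging-table approach, assuming a hypothetical target table dbo.Orders with an IDENTITY primary key and a CustomerId foreign key resolved from a dbo.Customers table (all object and column names here are illustrative):
-- Staging table that matches the CSV layout exactly (no PK/FK columns).
CREATE TABLE dbo.Orders_Staging
(
    OrderDate   date           NOT NULL,
    Amount      decimal(18, 2) NOT NULL,
    CustomerRef varchar(50)    NOT NULL  -- business key used to resolve the FK
);
GO
-- Stored procedure the pipeline calls after the copy activity loads the staging table.
CREATE PROCEDURE dbo.usp_LoadOrdersFromStaging
AS
BEGIN
    -- OrderId is IDENTITY, so it is omitted from the insert list.
    INSERT INTO dbo.Orders (CustomerId, OrderDate, Amount)
    SELECT c.CustomerId, s.OrderDate, s.Amount
    FROM dbo.Orders_Staging AS s
    JOIN dbo.Customers AS c ON c.CustomerRef = s.CustomerRef;  -- resolve the foreign key

    TRUNCATE TABLE dbo.Orders_Staging;  -- clear the staging table for the next run
END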
I have metadata in my Azure SQL DB / CSV file, as below, which has the old column names, their data types, and the new column names.
I want to rename and change the data type of oldfieldname based on that metadata in ADF.
The idea is to store the metadata file in a cache and use it in a lookup, but I am not able to do it in the data flow expression builder. Any idea which transformation I should use, or how I should do it?
I have reproduced the above and was able to change the column names and data types as shown below.
This is the sample CSV file I have taken from blob storage, which has the metadata of the table.
In your case, take care with the new data types: if you don't give the correct types, it will generate an error because of the data already inside the table.
Create a dataset for this file, give it to a Lookup activity, and don't check the 'First row only' option.
This is my sample SQL table:
Give the Lookup output array to a ForEach activity.
Inside the ForEach, use a Script activity to execute the script that changes the column name and data type.
Script:
EXEC sp_rename 'mytable2.@{item().OldName}', '@{item().NewName}', 'COLUMN';
ALTER TABLE mytable2
ALTER COLUMN @{item().NewName} @{item().Newtype};
Execute this and below is my SQL table with changes.
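For illustration, with a hypothetical metadata row of OldName = custname, NewName = CustomerName, Newtype = varchar(100), the expressions above resolve to the following T-SQL for that iteration:
EXEC sp_rename 'mytable2.custname', 'CustomerName', 'COLUMN';
ALTER TABLE mytable2
ALTER COLUMN CustomerName varchar(100);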
I am new to ADF. I have a pipeline which deletes all rows where any of the attributes are null. Schema: { Name, Value, Key }
I tried using a data flow with an Alter Row transformation and set both the source and the sink to the same table, but it always appends to the table instead of overwriting it, which creates duplicate rows, and the rows I want to delete still remain. Is there a way to overwrite the table?
Assuming that your table is a SQL table, I tried to overwrite the source table after deleting the specific null values. It successfully deleted the records, but I still got duplicate records even after exploring various methods.
So, as an alternative, you can try the below methods to achieve your requirement:
By creating a new table and deleting the old table:
This is my sample source table, named mytable.
Alter Row transformation:
Give a new table in the sink and, in Settings -> Post SQL scripts, give the drop command to delete the source table. Now your sink table is your required table.
drop table [dbo].[mytable]
Result table (named newtable) and old table.
Source table deleted.
Deleting null values from the source table using a Script activity:
Use a Script activity to delete the null values from the source table.
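A sketch of that script, assuming the table is named [dbo].[mytable] and has the { Name, Value, Key } schema from the question:
-- Remove every row where any of the three attributes is null.
DELETE FROM [dbo].[mytable]
WHERE [Name] IS NULL OR [Value] IS NULL OR [Key] IS NULL;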
Source table after execution.
I have an Excel file as the source that needs to be copied into an Azure SQL database using Azure Data Factory.
The ADF pipeline needs to copy the rows from the Excel source to the SQL database only if they do not already exist there. If a row already exists in the SQL database, no action needs to be taken.
I am looking forward to the best optimized solution.
You can achieve this with an Azure Data Factory data flow by joining the source and sink data and filtering out the rows that already exist in the sink database, so that only the new rows are inserted.
Example:
Connect the Excel source to a source transformation in the data flow.
Source preview:
You can transform the source data if required using a derived column transformation. This is optional.
Add another source transformation and connect it to the sink dataset (Azure SQL database). Here, in the Source options, you can select a table if you are comparing all columns of the sink dataset with the source dataset, or you can select Query and write a query that selects only the matching columns.
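For instance, if Id is the only column needed for the comparison, the query for the second source could be as simple as the following (the table name is an illustrative assumption):
SELECT Id FROM [dbo].[TargetTable]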
Source2 output:
Join the source1 and source2 transformations using a Join transformation with the join type set to Left outer, and add the join conditions based on the requirement.
Join output:
Using a Filter transformation, filter out the existing rows from the join output.
Filter condition: isNull(source2@Id)==true()
Filter output:
Using the Select transformation, you can remove the duplicate columns (such as the source2 columns) from the list. You can also do this in the sink mapping by editing it manually and deleting the duplicate columns.
Add a sink and connect it to the sink dataset (Azure SQL database) to get the required output.
You should do this using a Copy activity with a stored procedure as the sink. Write code in the stored procedure (e.g. MERGE or INSERT ... WHERE NOT EXISTS ...) to handle the record existing or not existing.
An example of a MERGE proc from the documentation:
CREATE PROCEDURE usp_OverwriteMarketing
@Marketing [dbo].[MarketingType] READONLY,
@category varchar(256)
AS
BEGIN
MERGE [dbo].[Marketing] AS target
USING @Marketing AS source
ON (target.ProfileID = source.ProfileID and target.Category = @category)
WHEN MATCHED THEN
UPDATE SET State = source.State
WHEN NOT MATCHED THEN
INSERT (ProfileID, State, Category)
VALUES (source.ProfileID, source.State, source.Category);
END
This article runs through the process in more detail.
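The INSERT ... WHERE NOT EXISTS variant could look like the sketch below (it reuses the table type and column names from the MERGE example; the procedure name is illustrative):
CREATE PROCEDURE usp_InsertMarketingIfNotExists
@Marketing [dbo].[MarketingType] READONLY,
@category varchar(256)
AS
BEGIN
-- Insert only the incoming rows that do not already exist in the target table.
INSERT INTO [dbo].[Marketing] (ProfileID, State, Category)
SELECT source.ProfileID, source.State, source.Category
FROM @Marketing AS source
WHERE NOT EXISTS (
    SELECT 1
    FROM [dbo].[Marketing] AS target
    WHERE target.ProfileID = source.ProfileID
      AND target.Category = @category
);
END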
In Azure Data Factory I have a pipeline, and the pipeline has one Copy data activity that has a REST API as the source and a SQL DB table as the destination.
In the mapping of this copy activity I specify which columns from the REST dataset (on the left) are mapped to which columns of the SQL dataset (on the right).
There is a JSON property in the REST response, "totalBalance", that is supposed to be mapped to the "Balance" field in the DB table.
The JSON has "totalBalance" as a string, for example "$36,970,267.07", so how do I convert this into a decimal so that I can map it to the database table?
Do I somehow need to use a mapping data flow instead of the copy activity, or can the copy activity do that by itself?
What finally worked for me was having a copy activity and a mapping data flow.
The copy activity copies data from REST to a SQL table where all the columns are of VARCHAR type, and from that table a mapping data flow sinks the data from the SQL (all-string) table into the actual destination SQL table.
Between the source and the sink in the data flow I added a Derived Column transformation for each source property I want to convert, and in the expression of that derived column I use an expression like this:
toDecimal(replace(replace(totalAccountReceivable, '$', ''),',',''))
The copy activity cannot do that directly.
There are two ways I think you can do this:
First: change decimal to varchar in the DB table (a T-SQL conversion sketch follows this answer).
Second: add a Lookup activity before the copy activity to remove the '$' in the 'totalBalance' column, then add an additional column with the cleaned value.
Finally, use this additional column to map to the 'Balance' column.
Hope this can help you.
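If you go with the first option and later need the numeric value in SQL, a possible follow-up conversion is the sketch below (the staging and target table names, [dbo].[BalanceStaging] and [dbo].[Balances], are illustrative assumptions):
-- Strip the currency symbol and thousands separators, then cast to decimal.
-- TRY_CAST returns NULL instead of failing if a value cannot be converted.
INSERT INTO [dbo].[Balances] ([Balance])
SELECT TRY_CAST(REPLACE(REPLACE(totalBalance, '$', ''), ',', '') AS decimal(18, 2))
FROM [dbo].[BalanceStaging];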
I am trying to read data from a CSV into an Azure SQL DB using a copy activity. I have selected the auto-create option for the destination table in the sink dataset properties.
The copy activity is working fine, but all the columns are getting created as nvarchar(max), i.e. with length -1. I don't want -1 as the default length for my auto-created sink table columns.
Does anyone know how to change the column length or create fixed-length columns while using auto table creation in Azure Data Factory?
As you have clearly detailed, the auto-create table option in the copy data activity is going to create a table with generic column definitions. You can run the copy activity initially in this way and then return to the target table and run T-SQL statements to further refine the column definitions. There is no option to define the table schema with the auto-create table option.
ALTER TABLE table_name ALTER COLUMN column_name new_data_type(size);
Clearly, the new column definition(s) must match the data that was initially copied via the first copy activity, but this is how you would combine the auto-create table option with clearly defined column definitions.
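For example, if the auto-created table landed two string columns as nvarchar(max), the follow-up statements might look like this (the table and column names and the chosen sizes are illustrative and must fit the data that was copied):
-- Shrink the auto-created nvarchar(max) columns to fixed lengths.
ALTER TABLE [dbo].[MyAutoCreatedTable] ALTER COLUMN [CustomerName] nvarchar(100);
ALTER TABLE [dbo].[MyAutoCreatedTable] ALTER COLUMN [Country] nvarchar(50);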
It would be useful if you created an Azure Data Factory UserVoice entry to request that the auto-create table functionality include creation of column definitions, as a means of saving the time needed to go back and manually alter the columns to meet specific requirements.