How to overwrite the source table in Azure Data Factory

I am new to ADF. I have a pipeline that should delete every row in which any of the attributes is null. Schema: { Name, Value, Key }
I tried a data flow with an Alter Row transformation and set both the source and the sink to the same table, but it always appends to the table instead of overwriting it. This creates duplicate rows, and the rows I want to delete still remain. Is there a way to overwrite the table?

Assuming that your table is a SQL table, I tried to overwrite the source table after deleting the rows with null values. The rows were deleted, but I still got duplicate records with every method I tried.
As an alternative, you can try one of the methods below:
By creating a new table and deleting the old table:
My sample source table is named mytable.
Add an Alter Row transformation to the data flow.
In the sink, specify a new table, and under Settings -> Post SQL scripts give the drop command that deletes the source table. The sink table is then your required table.
drop table [dbo].[mytable]
Result: the result table (named newtable) is created and the source table is deleted.
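If the cleaned data should end up under the original table name, the post SQL script could also rename the new table after dropping the old one. This is a minimal sketch, assuming the sink wrote to [dbo].[newtable] as in the example above and that the database is SQL Server or Azure SQL Database, where sp_rename is available. The rename step is an addition and not part of the original answer.

```sql
-- Post SQL script for the data flow sink; it runs after the sink has written newtable.
-- Table names follow the example above; adjust them to your schema.
DROP TABLE [dbo].[mytable];

-- Assumption: give the cleaned table the original name so existing queries keep working.
EXEC sp_rename 'dbo.newtable', 'mytable';
```

Indexes, constraints, and permissions defined on the old table are dropped with it, so they would have to be recreated on the renamed table.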
By deleting the null values from the source table with a Script activity:
Use a Script activity to delete the rows with null values directly from the source table.
The source table after execution contains only the rows without null values.
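A minimal sketch of a statement the Script activity could run, assuming the table and columns from the question ([dbo].[mytable] with Name, Value, and Key):

```sql
-- Delete every row in which at least one attribute is null.
-- [Key] is bracketed because KEY is a reserved word in T-SQL.
DELETE FROM [dbo].[mytable]
WHERE [Name] IS NULL
   OR [Value] IS NULL
   OR [Key] IS NULL;
```

Because the delete happens in place, no data flow or second table is needed, which avoids the duplicate rows described in the question.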

Related

Azure Data Factory ForEach Copy upsert: how to use key columns

I'm using Azure Data Factory to copy a table from a MySQL source to a SQL Server destination. I'm using the default 'record' functionality. In the copy step, I want to enable upsert. I then need to enter the key columns, and I'm wondering how to make sure that each table can have its own key column(s).
I tried entering column names, but the end result looks confusing: which key then applies to which table?

To supply the key columns dynamically, add a field called key_column for every table_name in the lookup table. The detailed approach is below.
The lookup table has two fields: table name and key column. In ADF, the Lookup activity's dataset points to this lookup table.
The ForEach activity takes the output of the Lookup activity as its items.
A Copy activity is placed inside the ForEach activity, with the source dataset set up for the current table.
In the sink, the write behavior is set to 'upsert' and the key columns are given dynamically as
@array(item().key_column)
This way, the key columns are assigned dynamically for each table and the upsert is performed in the Copy activity.
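The lookup table described above could look like the following sketch. The table name dbo.copy_config and the sample rows are hypothetical; only the table_name and key_column fields come from the answer.

```sql
-- Hypothetical lookup table read by the Lookup activity: one row per table to copy.
CREATE TABLE dbo.copy_config (
    table_name NVARCHAR(128) NOT NULL PRIMARY KEY,
    key_column NVARCHAR(128) NOT NULL
);

-- Sample rows (made up for illustration).
INSERT INTO dbo.copy_config (table_name, key_column)
VALUES (N'customers', N'customer_id'),
       (N'orders',    N'order_id');
```

With one key column per row, @array(item().key_column) yields a one-element array for each table. For composite keys, one option (an assumption, not from the answer) is to store a comma-separated list in key_column and use @split(item().key_column, ',') instead.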

ADF - how to compare two Azure SQL Database tables (A and B) with the same structure and to insert only the missing values from table A to table B

I want to create an ADF pipeline that compares the two tables and then adds the missing rows from table A to table B. For example:
table A - 100 records
table B - 90 records
Add the missing 10 rows from table A to table B.
This is what I tried:
If Condition 1 - @greaterOrEquals(activity('GetLastModifiedDate').output.lastModified,adddays(utcnow(),-7))
If Condition 2 - @and(equals(item().name,'master_data'),greaterOrEquals(activity('GetLastModifiedDate').output.lastModified,adddays(utcnow(),-7)))

The Copy activity has an Upsert mode which I think would help here. Simple instructions:
Create one Copy activity.
Set your source database in the Source tab of the Copy activity.
Set your target (or sink) database in the Sink tab, and set the write behavior to Upsert.
Specify the interim schema. This is used to create a transient table which holds the data during the upsert.
Specify the unique keys for the source and target table in the Key columns section so the upsert can take place successfully.
Failing that, simply use a Copy activity to land the data in a table in your target database and use a Stored Procedure activity to implement your more complicated logic.
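For the fallback in the last step, the stored procedure could insert only the rows that are missing from table B. This is a minimal sketch, assuming both tables are in the same database once the data has landed and that they share a key column. All names (dbo.table_a, dbo.table_b, id, col1, col2, dbo.usp_insert_missing_rows) are hypothetical.

```sql
CREATE OR ALTER PROCEDURE dbo.usp_insert_missing_rows
AS
BEGIN
    SET NOCOUNT ON;

    -- Insert the rows of table A whose key does not exist in table B yet.
    INSERT INTO dbo.table_b (id, col1, col2)
    SELECT a.id, a.col1, a.col2
    FROM dbo.table_a AS a
    WHERE NOT EXISTS (
        SELECT 1
        FROM dbo.table_b AS b
        WHERE b.id = a.id
    );
END;
```

Unlike an upsert, this leaves rows that already exist in table B untouched, which matches the "insert only the missing rows" requirement in the question.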

Having access to Records that went to Sink from a Copy Activity

The source of my Copy activity is the result of calling a REST API, and the sink is an Azure SQL table into which I insert those records.
Now I want to know which records were just inserted into the sink so that I can run an update statement on them. So my question is: how can I find out what went into the sink so I can update those records?

Two methods come to mind:
If the table has a timestamp of when the records were inserted, you could use that value to know which ones were just inserted.
Instead of putting the records directly into the final table, put them in a staging table. Then you can either apply the updates as part of the insert that moves them to the final table, or update them in the staging table and then copy them over to the final table. Just remember to truncate the staging table on every run so it only holds the new records.
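The staging-table approach could be sketched as follows. All table and column names (dbo.staging_records, dbo.final_records, id, payload, status) are hypothetical, and the UPDATE stands in for whatever change needs to be applied to the new records.

```sql
-- 1. Before the copy (for example as the sink's pre-copy script): empty the staging table.
TRUNCATE TABLE dbo.staging_records;

-- 2. The Copy activity then loads the REST API results into dbo.staging_records.

-- 3. After the copy: update the new records in staging, then move them to the final table.
BEGIN TRANSACTION;

UPDATE dbo.staging_records
SET status = N'processed';

INSERT INTO dbo.final_records (id, payload, status)
SELECT id, payload, status
FROM dbo.staging_records;

COMMIT TRANSACTION;
```

Because the staging table only ever holds the current run's rows, no timestamp column is needed to tell new records from old ones.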

Why is the default column length of auto-created SQL tables -1 in Azure Data Factory, and how can it be made fixed?

I am trying to load data from a CSV file into an Azure SQL database using a Copy activity. I have selected the auto create option for the destination table in the sink dataset properties.
The Copy activity works fine, but all columns are created as nvarchar(max), that is, with length -1. I don't want -1 as the default length for the columns of my auto-created sink table.
Does anyone know how to change the column length or create fixed-length columns during auto table creation in Azure Data Factory?

As you have clearly detailed, the auto create table option in the copy data activity is going to create a table with generic column definitions. You can run the copy activity initially in this way and then return to the target table and run T-SQL statements to further define the desired column definitions. There is no option to define table schema with the auto create table option.
ALTER TABLE table_name ALTER COLUMN column_name new_data_type(size);
The new column definition(s) must be compatible with the data that was copied by the first Copy activity run, but this is how you can combine the auto create table option with clearly defined column definitions.
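Before narrowing a column, it helps to check the longest value already stored, because ALTER COLUMN fails if existing data would be truncated. The following is a sketch with hypothetical names (dbo.customers, customer_name) and a hypothetical target length of 100:

```sql
-- Find the longest existing value (LEN counts characters and ignores trailing spaces).
SELECT MAX(LEN(customer_name)) AS max_length
FROM dbo.customers;

-- If max_length fits, narrow the auto-created NVARCHAR(MAX) column.
ALTER TABLE dbo.customers
ALTER COLUMN customer_name NVARCHAR(100) NULL;
```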
It would be useful if you created an Azure Data Factory UserVoice entry to request that the auto create table functionality include the creation of column definitions, to save the time needed to go back and manually alter the columns to meet specific requirements.

Datastax rename table

I have deployed a 9-node cluster on Google Cloud.
I created a table and loaded the data. Now I want to change the table name.
Is there any way to change a table name in Cassandra?
Thanks

You can't rename a table in Cassandra.
You have to drop the table and create it again.
You can use ALTER TABLE to manipulate the table metadata: change the datatype of a column, add new columns, drop existing columns, and change table properties. The command returns no results.
Start the command with the keywords ALTER TABLE, followed by the table name, followed by the instruction: ALTER, ADD, DROP, RENAME, or WITH. Note that RENAME here renames a column, not the table.
If you need to keep the data, back it up before dropping the table and restore it with the COPY command in cqlsh. Create the new table with the same schema first, because COPY FROM loads into an existing table.
To back up the data:
COPY old_table_name TO 'data.csv';
To restore the data:
COPY new_table_name FROM 'data.csv';
