is it posible update row values from tables in Azure Data Factory? - azure

I have a dataset in Data Factory, and I would like to know if is possible update row values using only data factory activities, without data flow, store procedures, queries...

There is a way to do update (and probably any other SQL statement) from Data Factory, it's a bit tacky though.
The Loopup activity, can execute a set of statements in Query mode, ie:
The only condition is to end it with select, otherwise Lookup activity throws error.
This works for Azure SQL, PostgreSQL, and most likely for any other DB Data Factory can connect to.

Concepts:
Datasets:
Datasets represent data structures within the data stores, which simply point to or reference the data you want to use in your activities as inputs or outputs.
Now, a dataset is a named view of data that simply points or references the data you want to use in your activities as inputs and outputs. Datasets identify data within different data stores, such as tables, files, folders, and documents. For example, an Azure Blob dataset specifies the blob container and folder in Blob storage from which the activity should read the data.
Currently, according to my experience, it's impossible to update row values using only data factory activities. Azure Data Factory doesn't support this now.
Fore more details,please reference:
Datasets
Datasets and linked services in Azure Data Factory.
For example, when I use Copy Active, Data Factory doesn't provide my any ways to update the rows:
Hope this helps.

This is now possible in Azure Data Factory, your Data flow should have an Alter Row stage, and the Sink has a drop-down where you can select the key column for doing updates.
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-alter-row

As mentioned in Above comment regarding ADF data flow, ADF data flow does not support on-permise sink or source, the sink & source should reside in Azure SQL or Azure Data lake or any other AZURE data services.

Related

File is not readed completely by Copy Data in Azure Data Factory

I'm developing a pipeline that be able to insert data from a .txt file located in the Blob Storage into a table in a SQL Data Base.
Problem: Somehow the activity configuration is not working properly cause' is not reading all the records in the file and in consequence is not loading all the data into the Data Base (I realized this issue when I opened the file and compared the number of records from .text file against SQL table. Also, when I searched records from the last month in the table on SQL I didn't find them)
Note: I checked out the size limit of characters in the table from SQL and that isn't the problem.
I'd like to share with you the Data Copy activity and Source Data Set configuration as well:
Sink Dataset:
Do you know, guys what I'm doing wrong here? Hope you can help me, best regards.
P.S. Here's the Source Dataset
As discussed in comments, while using copy activity you would have to make sure to set the schema before running the activity. By design the schema mapping is left empty and has to be configured by the user either manually or asking adf to import the schema from the dataset.
Note: While using Auto create table option in sink, it automatically creates sink table (if nonexistent) in source schema,
but won't be supported when a stored procedure is specified (on the
sink side) or when staging is enabled.
Using COPY statement to load data into Azure Synapse Analytics as sink, the connector supports automatically creating destination table with DISTRIBUTION = ROUND_ROBIN if not exists based on the source schema.
Refer official doc: Copy and transform data in Azure Synapse Analytics by using Azure Data Factory or Synapse pipelines
Source...
Sink...
So Azure Synapse will be used as the sink. Additionally, an Azure Synapse table has to be created which matches the column names, column order, and column data types of source.
For dynamic mapping
If you view the pipeline code, you can see in the Translator section the JSON equivalent of the mapping section from UI.
You can reuse this as a base in Dynamic mapping to enable further copying similar files without having to manually configure schema.
Copy the JSON under mappings in translator

How to perform data factory transformations on large datasets in Azure data warehouse

We have Data warehouse tables that we perform transformations using ADF.
If I have a group of ADW tables, and I need to perform transformations on them to land them back onto ADW, should I save the transformations into Azure Blob Storage? or go direct into the target table.
The ADW tables are in excess of 100Mil records.
Is it an acceptable practice to use Blob Storage as the middle piece.
I can think of two ways to do this (they do not require moving the data into blob storage),
Do the transformation within SQL DW using stored procedure and use ADF to orchestrate the stored procedure call
Use ADF's data flow to apply the transformation to read from SQL DW and write back to SQL DW
Yes, you'd better using the use Blob Storage as the middle piece.
You can not copy the tables from SQL DW(Source) to the same SQL DW(Sink) directly! If you have tried this, you will have the problems:
Copy data: has the error in data mapping, copy data to the same table, not create new tales.
Copy Active: Table is required for Copy activity.
If you want to copy the data from SQL DW tables to new tables with Data Factor, you need at least two steps:
copy the data from the SQL DW tables to Blob storage(create the csv
files).
Load these csv files to SQL DW and create new tables.
Reference tutorials:
Copy and transform data in Azure Synapse Analytics (formerly Azure
SQL Data Warehouse) by using Azure Data Factory
Copy and transform data in Azure Blob storage by using Azure Data
Factory
Data Factory is good at transfer big data. Reference Copy performance of Data Factory. I think it may faster than SELECT - INTO Clause (Transact-SQL).
Hope this helps.

Can i populate different SQL tables at once inside azure data factory when the source data set is Blob storage?

I want to copy data from azure blob storage to azure sql database. The destination database is divided among different tables.
So is there any way in which i directly send the blob data to different sql tables using a single pipeline in one copy activity?
As this should be a trigger based pipeline so it is a continuous process, i created trigger for every hour but right now i can just send blob data to one table and then divide them into different table by invoking another pipeline where source and sink dataset both are SQL database.
Finding a solution for this
You could use a stored procedure in your database as a sink in the copy activity. This way, you can define the logic in the stored procedure to write the data to your destination tables. You can find the description of the stored procedure sink here.
You'll have to use a user defined table type for this solution, maintaining them can be difficult, if you run into issues, you can have a look at my & BioEcoSS' answer in this thread.
According to my experience and Azure Data Factory doucmentation, we could not directly send the blob data to different sql tables using a single pipeline in one copy activity.
Because during Table mapping settings, One Copy Data Active only allows us select one corresponding table in the destination data store or specify the stored procedure to run at the destination.
You don't need to create a new pipeline, just add a new copy data active, each copy active call different stored procedure.
Hope this helps.

How to upsert/insert records in all tables in an Azure SQL Database with Azure Data Factory v2

I have an Azure SQL database with many tables that I want to update frequently with any change made, be it an update or an insert, using Azure Data Factory v2.
There is a section in the documentation that explains how to do this.
However, the example is about two tables, and for each table a TYPE needs to be defined, and for each table a Stored Procedure is built.
I don't know how to generalize this for a large number of tables.
Any suggestion would be welcome.
You can follow my answer https://stackoverflow.com/a/69896947/7392069 but I don't know how to generalise creation of table types and stored procedures, but at least the metadata table of the metadata driven copy task provides a lot of comfort to achieve what you need.

azure data factory dataset cleanup

Does anyone tell me how Azure Data factory datasets are cleaned up (removed, deleted etc). Is there any policy or settings to control it?
From what I can say, all the time series of data sets are left behind intact.
Say, I want to develop an activity which overwrites data daily in the destination folder in Azure Blob or Data Lake storage (for example which is mapped to external table in Azure Datawarehouse and it is a full data extract). How can I achieve this with just copy activity? Shall I add custom .Net activity to do the cleanup no longer needed datasets myself?
Yes, you would need a custom activity to delete pipepline output.
You could have pipeline activities that overwrite the same output but you must be careful to understand how ADF slicing model and activity dependency works, so that anything using the output gets a clear and repeatable set.

Resources