Can i populate different SQL tables at once inside azure data factory when the source data set is Blob storage? - azure

I want to copy data from azure blob storage to azure sql database. The destination database is divided among different tables.
So is there any way in which i directly send the blob data to different sql tables using a single pipeline in one copy activity?
As this should be a trigger based pipeline so it is a continuous process, i created trigger for every hour but right now i can just send blob data to one table and then divide them into different table by invoking another pipeline where source and sink dataset both are SQL database.
Finding a solution for this

You could use a stored procedure in your database as a sink in the copy activity. This way, you can define the logic in the stored procedure to write the data to your destination tables. You can find the description of the stored procedure sink here.
You'll have to use a user defined table type for this solution, maintaining them can be difficult, if you run into issues, you can have a look at my & BioEcoSS' answer in this thread.

According to my experience and Azure Data Factory doucmentation, we could not directly send the blob data to different sql tables using a single pipeline in one copy activity.
Because during Table mapping settings, One Copy Data Active only allows us select one corresponding table in the destination data store or specify the stored procedure to run at the destination.
You don't need to create a new pipeline, just add a new copy data active, each copy active call different stored procedure.
Hope this helps.

Related

File is not readed completely by Copy Data in Azure Data Factory

I'm developing a pipeline that be able to insert data from a .txt file located in the Blob Storage into a table in a SQL Data Base.
Problem: Somehow the activity configuration is not working properly cause' is not reading all the records in the file and in consequence is not loading all the data into the Data Base (I realized this issue when I opened the file and compared the number of records from .text file against SQL table. Also, when I searched records from the last month in the table on SQL I didn't find them)
Note: I checked out the size limit of characters in the table from SQL and that isn't the problem.
I'd like to share with you the Data Copy activity and Source Data Set configuration as well:
Sink Dataset:
Do you know, guys what I'm doing wrong here? Hope you can help me, best regards.
P.S. Here's the Source Dataset
As discussed in comments, while using copy activity you would have to make sure to set the schema before running the activity. By design the schema mapping is left empty and has to be configured by the user either manually or asking adf to import the schema from the dataset.
Note: While using Auto create table option in sink, it automatically creates sink table (if nonexistent) in source schema,
but won't be supported when a stored procedure is specified (on the
sink side) or when staging is enabled.
Using COPY statement to load data into Azure Synapse Analytics as sink, the connector supports automatically creating destination table with DISTRIBUTION = ROUND_ROBIN if not exists based on the source schema.
Refer official doc: Copy and transform data in Azure Synapse Analytics by using Azure Data Factory or Synapse pipelines
Source...
Sink...
So Azure Synapse will be used as the sink. Additionally, an Azure Synapse table has to be created which matches the column names, column order, and column data types of source.
For dynamic mapping
If you view the pipeline code, you can see in the Translator section the JSON equivalent of the mapping section from UI.
You can reuse this as a base in Dynamic mapping to enable further copying similar files without having to manually configure schema.
Copy the JSON under mappings in translator

Using existing data sets withing Azure data factory in a new data flow

I'm having trouble using existing data sets withing Azure data factory when I want to create a new data flow. under data set combo box in data factory doesn't load the existing data sets.
Also when I want to create a new data set most of the data sources such as SQL server is disable.
Is there any idea?
please see this screen shot
I Can not select the data sets which was built before
When I press new data set sql server is disable and I can not select it.
SQL server in this list is disabled and can not be selected
Data flow supports seven type source by now, SQL server is not supported. If your exist dataset isn't within these types, you can't choose them. More Details please refer to this documentation.
Since you can't use SQL Server within a Data Flow, prior to the Data Flow you need to run a Copy activity to stage your data from SQL Server into a location reachable by the subsequent Data Flow. The easiest option to use is a Delimited text file (CSV) in Azure Blob Storage.

Get files list after azure data factory copy activity

Is there a method that gives me the list of files copied in azure data lake storage after a copy activity in azure data factory? I have to copy data from a datasource and after i have to skip files based on a particular condition. Condition must check also file path and name with other data from sql database. any idea?
As of now, there's no function to get the files list after a copy activity. You can however use a get Metadata activity or a Lookup Activity and chain a Filter activity to it to get the list of files based on your condition.
There's a workaround that you can check out here.
"The solution was actually quite simple in this case. I just created another pipeline in Azure Data Factory, which was triggered by a Blob Created event, and the folder and filename passed as parameters to my notebook. Seems to work well, and a minimal amount of configuration or code required. Basic filtering can be done with the event, and the rest is up to the notebook.
For anyone else stumbling across this scenario, details below:
https://learn.microsoft.com/en-us/azure/data-factory/how-to-create-event-trigger"

is it posible update row values from tables in Azure Data Factory?

I have a dataset in Data Factory, and I would like to know if is possible update row values using only data factory activities, without data flow, store procedures, queries...
There is a way to do update (and probably any other SQL statement) from Data Factory, it's a bit tacky though.
The Loopup activity, can execute a set of statements in Query mode, ie:
The only condition is to end it with select, otherwise Lookup activity throws error.
This works for Azure SQL, PostgreSQL, and most likely for any other DB Data Factory can connect to.
Concepts:
Datasets:
Datasets represent data structures within the data stores, which simply point to or reference the data you want to use in your activities as inputs or outputs.
Now, a dataset is a named view of data that simply points or references the data you want to use in your activities as inputs and outputs. Datasets identify data within different data stores, such as tables, files, folders, and documents. For example, an Azure Blob dataset specifies the blob container and folder in Blob storage from which the activity should read the data.
Currently, according to my experience, it's impossible to update row values using only data factory activities. Azure Data Factory doesn't support this now.
Fore more details,please reference:
Datasets
Datasets and linked services in Azure Data Factory.
For example, when I use Copy Active, Data Factory doesn't provide my any ways to update the rows:
Hope this helps.
This is now possible in Azure Data Factory, your Data flow should have an Alter Row stage, and the Sink has a drop-down where you can select the key column for doing updates.
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-alter-row
As mentioned in Above comment regarding ADF data flow, ADF data flow does not support on-permise sink or source, the sink & source should reside in Azure SQL or Azure Data lake or any other AZURE data services.

azure data factory dataset cleanup

Does anyone tell me how Azure Data factory datasets are cleaned up (removed, deleted etc). Is there any policy or settings to control it?
From what I can say, all the time series of data sets are left behind intact.
Say, I want to develop an activity which overwrites data daily in the destination folder in Azure Blob or Data Lake storage (for example which is mapped to external table in Azure Datawarehouse and it is a full data extract). How can I achieve this with just copy activity? Shall I add custom .Net activity to do the cleanup no longer needed datasets myself?
Yes, you would need a custom activity to delete pipepline output.
You could have pipeline activities that overwrite the same output but you must be careful to understand how ADF slicing model and activity dependency works, so that anything using the output gets a clear and repeatable set.

Resources