Azure Data Factory: Moving data from Table Storage to SQL Azure

While moving data from Table Storage to SQL Azure, is it possible to obtain only the delta (the data that hasn't already been moved) using Azure Data Factory?
A more detailed explanation:
There is an Azure Storage table that contains some data and is updated periodically. I want to create a Data Factory pipeline that moves this data to a SQL Azure database, but on each run I only want the newly added data to be written to the SQL DB. Is this possible with Azure Data Factory?

See more information on azureTableSourceQuery and the copy activity at this link: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-table-connector/#azure-table-copy-activity-type-properties.
Also see this link for invoking a stored procedure on the SQL sink: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-sql-connector/#invoking-stored-procedure-for-sql-sink
You can filter on the Timestamp column in each run's query to achieve something similar to a delta copy, but this is not a true delta copy.
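A minimal sketch of what the sink side of such a pseudo-delta copy could look like, assuming the copy activity lands each run into a hypothetical staging table dbo.StagingEntities and a sink stored procedure then moves only rows newer than the high-water mark already in dbo.TargetEntities (all object and column names here are assumptions, not from the linked docs):

    -- Assumed tables: the copy activity writes each slice into dbo.StagingEntities,
    -- and dbo.TargetEntities is the table the SQL database actually serves.
    CREATE PROCEDURE dbo.usp_MergeNewEntities
    AS
    BEGIN
        SET NOCOUNT ON;

        -- High-water mark: the newest entity timestamp already present in the target.
        DECLARE @watermark DATETIME2 =
            ISNULL((SELECT MAX(EntityTimestamp) FROM dbo.TargetEntities), '1900-01-01');

        -- Insert only rows that arrived after the watermark. This approximates a delta
        -- copy; rows updated in place in Table storage reappear with a newer Timestamp.
        INSERT INTO dbo.TargetEntities (PartitionKey, RowKey, EntityTimestamp, Payload)
        SELECT s.PartitionKey, s.RowKey, s.EntityTimestamp, s.Payload
        FROM dbo.StagingEntities AS s
        WHERE s.EntityTimestamp > @watermark;

        -- Clear the staging table for the next run.
        TRUNCATE TABLE dbo.StagingEntities;
    END;

On the source side, the azureTableSourceQuery described in the first link can apply the same kind of Timestamp filter so that each run reads only recent rows from the Azure table.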

Related

Can we pass all the data of a CSV from Blob storage to an on-premises SQL database using Azure Data Factory?

I am trying to pass all records of a CSV from Blob storage to an on-premises SQL database using Azure Data Factory. I know how to pass data one row at a time using Lookup and Copy activities, but I don't know how to pass all records of the CSV.
You can directly use a Copy activity with the blob file as the source and the SQL database table as the sink, wherein all records of the file will be copied into the table.
There is no need for a Lookup activity.
You will have to use a Copy activity to copy the data from Azure Blob storage to the on-premises SQL database (connecting to an on-premises SQL database also requires a self-hosted integration runtime).
You can follow the steps below:
1. Select the Copy activity in Data Factory.
2. Select the Azure Blob storage dataset as the source.
3. Select the on-premises SQL database as the sink.
4. Click Import schema to do the column mapping (a sketch of a matching sink table follows these steps).
5. Finally, execute the Copy activity. No need to use a Lookup activity here.
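For the schema import in step 4 to line up, the sink table should already exist with columns matching the CSV header. A minimal sketch, assuming a hypothetical CSV with Id, Name, and CreatedDate columns (adjust to your own file):

    -- Hypothetical sink table for the CSV; column names and types are assumptions.
    CREATE TABLE dbo.CsvImport
    (
        Id          INT            NOT NULL,
        Name        NVARCHAR(200)  NULL,
        CreatedDate DATETIME2      NULL
    );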

How to perform Data Factory transformations on large datasets in Azure Data Warehouse

We have data warehouse tables on which we perform transformations using ADF.
If I have a group of ADW tables and I need to perform transformations on them and land the results back in ADW, should I stage the transformed data in Azure Blob Storage, or go directly into the target table?
The ADW tables are in excess of 100 million records.
Is it acceptable practice to use Blob Storage as the middle piece?
I can think of two ways to do this (neither requires moving the data into Blob storage):
1. Do the transformation within SQL DW using a stored procedure and use ADF to orchestrate the stored procedure call, as sketched after this list.
2. Use ADF's data flow to read from SQL DW, apply the transformation, and write back to SQL DW.
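For option 1, a minimal sketch of a transformation stored procedure inside SQL DW that ADF's Stored Procedure activity could call on a schedule (the table, column, and procedure names are assumptions, not from the question):

    -- Hypothetical transformation in a SQL DW / Synapse dedicated pool.
    -- CTAS is the idiomatic way to materialize a transformed copy of a large table.
    CREATE PROCEDURE dbo.usp_TransformSales
    AS
    BEGIN
        -- Drop the previous output, then rebuild it from the source table.
        IF OBJECT_ID('dbo.SalesAggregated') IS NOT NULL
            DROP TABLE dbo.SalesAggregated;

        CREATE TABLE dbo.SalesAggregated
        WITH (DISTRIBUTION = HASH(CustomerId), CLUSTERED COLUMNSTORE INDEX)
        AS
        SELECT CustomerId,
               CAST(OrderDate AS DATE) AS OrderDay,
               SUM(Amount)             AS TotalAmount
        FROM dbo.SalesRaw
        GROUP BY CustomerId, CAST(OrderDate AS DATE);
    END;

With this approach ADF only orchestrates the call, so the 100-million-plus rows never leave the warehouse.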
Yes, you had better use Blob Storage as the middle piece.
You cannot copy the tables from SQL DW (source) to the same SQL DW (sink) directly. If you try this, you will hit these problems:
Copy Data tool: errors in the data mapping, and the data is copied into the same table rather than creating new tables.
Copy activity: a table is required for the Copy activity.
If you want to copy the data from SQL DW tables to new tables with Data Factory, you need at least two steps:
1. Copy the data from the SQL DW tables to Blob storage (creating the CSV files).
2. Load these CSV files into SQL DW and create the new tables.
Reference tutorials:
Copy and transform data in Azure Synapse Analytics (formerly Azure SQL Data Warehouse) by using Azure Data Factory
Copy and transform data in Azure Blob storage by using Azure Data Factory
Data Factory is good at transferring big data; see Copy activity performance of Data Factory. I think it may be faster than the SELECT - INTO clause (Transact-SQL).
Hope this helps.

Is there any way to filter out some data in a copy activity when the source is Blob storage and the sink is a SQL database?

I am trying to copy data from Azure blobs to an Azure SQL database using Azure Data Factory.
The blobs are stored incrementally in the storage account and are just JSON files containing key-value pairs. I want to filter the data on the basis of one key-value before it gets copied into the SQL database.
You could use a stored procedure on the SQL sink to do some filtering before writing into your database; see the sketch after the link below.
https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database#invoking-stored-procedure-for-sql-sink
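A minimal sketch of that pattern, assuming the copy activity's SQL sink is pointed at a hypothetical stored procedure and table type as described in the linked article (the names, columns, and the Status filter are all assumptions about your JSON):

    -- User-defined table type matching the columns ADF sends from the JSON blobs.
    CREATE TYPE dbo.DeviceReadingType AS TABLE
    (
        DeviceId NVARCHAR(50),
        Status   NVARCHAR(20),
        Reading  FLOAT
    );
    GO

    -- Stored procedure the SQL sink invokes per batch; only rows that pass the
    -- filter are written to the real table.
    CREATE PROCEDURE dbo.usp_InsertDeviceReadings
        @readings dbo.DeviceReadingType READONLY
    AS
    BEGIN
        INSERT INTO dbo.DeviceReadings (DeviceId, Status, Reading)
        SELECT DeviceId, Status, Reading
        FROM @readings
        WHERE Status = 'active';   -- the key-value filter from the question (assumed)
    END;
    GO

In the copy activity's sink settings you then reference the stored procedure and the table type by name, as described in the linked article.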

Is it possible to read an Azure Databricks table from Azure Data Factory?

I have a table in an Azure Databricks cluster, and I would like to replicate this data into an Azure SQL database to let other users analyze it from Metabase.
Is it possible to access Databricks tables through Azure Data Factory?
No, unfortunately not. Databricks tables are typically temporary and last only as long as your job/session is running. See here.
You would need to persist your Databricks table to some storage in order to access it. Change your Databricks job to dump the table to Blob storage as its final action. In the next step of your Data Factory job, you can then read the dumped data from the storage account and process it further.
Another option may be Databricks Delta, although I have not tried this yet...
If you register the table in the Databricks Hive metastore, then ADF could read from it using ADF's ODBC source, though this would require a self-hosted integration runtime (IR).
Alternatively, you could write the table out to external storage such as Blob storage or a data lake; ADF can then read that file and push it to your SQL database. A sketch of both options follows.
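A minimal Spark SQL sketch of those two options, run inside a Databricks notebook (the database, table names, and storage path are placeholders, not from the question):

    -- Option 1: register a permanent table in the Hive metastore so it can be
    -- queried externally (e.g. by ADF's ODBC source through a self-hosted IR).
    CREATE TABLE analytics.events_snapshot
    AS SELECT * FROM events;

    -- Option 2: materialize the data as Parquet files in external storage (the
    -- path is a placeholder) so ADF can pick the files up with a plain Copy
    -- activity and push them to the SQL database.
    CREATE TABLE analytics.events_export
    USING PARQUET
    LOCATION 'abfss://exports@yourstorageaccount.dfs.core.windows.net/events'
    AS SELECT * FROM events;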

ETL using Azure Table storage

Is there a way I can transform per-minute data logged in Azure Table storage into hourly, daily, and monthly tables?
I have heard of Stream Analytics and Data Lake but don't see how this can be done with those two technologies.
As far as I know, we can do that easily with Azure Data Factory in the Azure portal. Please try following my detailed steps (a SQL rollup sketch follows the list):
1. Log in to the new Azure portal.
2. Add a Data Factory.
3. Click [Copy data (preview)] to set the properties; we can set the recurring pattern to minute, hourly, daily... as we like.
4. Choose the source data store as we like; in this demo I chose an Azure Storage table.
5. Specify a new Azure Storage connection.
6. Select the tables from Azure Storage from which to copy data.
7. Apply a filter if we want to.
8. Select the destination data store.
9. Do the table mapping.
10. Select the parallel copy settings.
11. Review the settings summary.
12. We can check from Data Factory that the copy action has completed.
13. Check the data in the Azure Storage table.
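The Copy Data wizard only moves the rows on a schedule; the aggregation itself still has to happen somewhere. Assuming the destination data store chosen in step 8 is an Azure SQL database (an assumption; the wizard lets you pick any supported sink), an hourly rollup could be produced by a stored procedure like this sketch, which ADF can also invoke on its own schedule (table and column names are hypothetical):

    -- Hypothetical per-minute landing table: dbo.MinuteLogs(LogTime, MetricValue).
    -- Rolls the per-minute rows up into an hourly table; daily and monthly versions
    -- would group on CAST(LogTime AS DATE) or on year/month instead.
    CREATE PROCEDURE dbo.usp_RollupHourly
    AS
    BEGIN
        SET NOCOUNT ON;

        TRUNCATE TABLE dbo.HourlyLogs;

        INSERT INTO dbo.HourlyLogs (HourStart, AvgValue, MaxValue, SampleCount)
        SELECT DATEADD(HOUR, DATEDIFF(HOUR, 0, LogTime), 0) AS HourStart,
               AVG(MetricValue)                             AS AvgValue,
               MAX(MetricValue)                             AS MaxValue,
               COUNT(*)                                     AS SampleCount
        FROM dbo.MinuteLogs
        GROUP BY DATEADD(HOUR, DATEDIFF(HOUR, 0, LogTime), 0);
    END;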
