How to bulk load Azure SQL DB from ADLS

I am aware that the ADF copy activity can be used to load data from ADLS to Azure SQL DB.
Is there any possibility of bulk loading?
For example, ADLS --> Synapse has the option of PolyBase for bulk loading.
Is there an efficient way to load a huge number of records from ADLS to Azure SQL DB?
Thanks
Madhan

You can use either BULK INSERT or OPENROWSET to get data from blob storage into Azure SQL Database. A simple example with OPENROWSET:
SELECT *
FROM OPENROWSET (
    BULK 'someFolder/somecsv.csv',
    DATA_SOURCE = 'yourDataSource',
    FORMAT = 'CSV',
    FORMATFILE = 'yourFormatFile.fmt',
    FORMATFILE_DATA_SOURCE = 'yourDataSource'
) AS yourFile;
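To actually land that result in a table rather than just query it, you can wrap it in an INSERT ... SELECT (the target table and column names here are placeholders):
INSERT INTO dbo.yourTable (Col1, Col2)
SELECT Col1, Col2
FROM OPENROWSET (
    BULK 'someFolder/somecsv.csv',
    DATA_SOURCE = 'yourDataSource',
    FORMAT = 'CSV',
    FORMATFILE = 'yourFormatFile.fmt',
    FORMATFILE_DATA_SOURCE = 'yourDataSource'
) AS yourFile;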
A simple example with BULK INSERT:
BULK INSERT yourTable
FROM 'someFolder/somecsv.csv'
WITH (
    DATA_SOURCE = 'yourDataSource',
    FORMAT = 'CSV'
);
There is some setup to do first, i.e. you have to use the CREATE EXTERNAL DATA SOURCE statement, but I find it a very effective way of getting data into Azure SQL DB without the overhead of setting up an ADF pipeline. It's especially good for ad hoc loads.
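As a rough sketch of that setup (the credential name, SAS token, and storage URL below are placeholders rather than anything from the article):
-- A database master key must exist before a database scoped credential can be created.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strongPassword>';

CREATE DATABASE SCOPED CREDENTIAL yourCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token without the leading ?>';

CREATE EXTERNAL DATA SOURCE yourDataSource
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://yourstorageaccount.blob.core.windows.net/yourcontainer',
    CREDENTIAL = yourCredential
);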
This article walks through the steps in more detail:
https://learn.microsoft.com/en-us/sql/relational-databases/import-export/examples-of-bulk-access-to-data-in-azure-blob-storage?view=sql-server-ver15

Data Factory offers good performance for transferring big data; see Copy performance and scalability achievable using ADF. You can follow that document to improve copy performance for a huge number of records in ADLS. I think it may be better than BULK INSERT.
We cannot use BULK INSERT (Transact-SQL) directly in Data Factory, but we can use bulk copy from ADLS to an Azure SQL database. Data Factory provides a tutorial and example for this.
See Bulk copy from files to database:
This article describes a solution template that you can use to copy data in bulk from Azure Data Lake Storage Gen2 to Azure Synapse Analytics / Azure SQL Database.
Hope it's helpful.

Related

How to perform data factory transformations on large datasets in Azure data warehouse

We have data warehouse tables that we transform using ADF.
If I have a group of ADW tables and need to perform transformations on them and land the results back in ADW, should I stage the transformed data in Azure Blob Storage, or write directly into the target table?
The ADW tables are in excess of 100 million records.
Is it acceptable practice to use Blob Storage as the middle piece?
I can think of two ways to do this (neither requires moving the data into blob storage):
Do the transformation within SQL DW using a stored procedure and use ADF to orchestrate the stored procedure call (see the sketch below)
Use ADF's data flow to read from SQL DW, apply the transformation, and write back to SQL DW
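A minimal sketch of the first option, assuming hypothetical table names and a hypothetical aggregation (ADF would invoke this via a Stored Procedure activity):
CREATE PROCEDURE dbo.usp_TransformSales
AS
BEGIN
    -- Rebuild the target table from the source table with CTAS, keeping the
    -- transformation entirely inside SQL DW instead of staging it in blob storage.
    IF OBJECT_ID('dbo.SalesTransformed') IS NOT NULL
        DROP TABLE dbo.SalesTransformed;

    CREATE TABLE dbo.SalesTransformed
    WITH (DISTRIBUTION = HASH(CustomerId), CLUSTERED COLUMNSTORE INDEX)
    AS
    SELECT CustomerId,
           SUM(Amount) AS TotalAmount
    FROM dbo.Sales
    GROUP BY CustomerId;
END;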
Yes, you'd be better off using Blob Storage as the middle piece.
You cannot copy tables from a SQL DW (source) to the same SQL DW (sink) directly. If you try this, you will hit these problems:
Copy Data tool: errors in data mapping; it copies data into the existing table rather than creating new tables.
Copy activity: "Table is required for Copy activity", i.e. the sink table must already exist.
If you want to copy the data from SQL DW tables into new tables with Data Factory, you need at least two steps:
copy the data from the SQL DW tables to Blob Storage (creating the CSV files);
load those CSV files back into SQL DW and create the new tables.
Reference tutorials:
Copy and transform data in Azure Synapse Analytics (formerly Azure SQL Data Warehouse) by using Azure Data Factory
Copy and transform data in Azure Blob storage by using Azure Data Factory
Data Factory is good at transferring big data; see Copy performance of Data Factory. I think it may be faster than the SELECT - INTO clause (Transact-SQL).
Hope this helps.

Is possible to read an Azure Databricks table from Azure Data Factory?

I have a table in an Azure Databricks cluster, and I would like to replicate this data into an Azure SQL Database so that other users can analyze it from Metabase.
Is it possible to access Databricks tables through Azure Data Factory?
No, unfortunately not. Databricks tables are typically temporary and only last as long as your job/session is running. See here.
You would need to persist your Databricks table to some storage in order to access it. Change your Databricks job to dump the table to Blob storage as its final action. In the next step of your Data Factory job, you can then read the dumped data from the storage account and process it further.
Another option may be Databricks Delta, although I have not tried this yet...
If you register the table in the Databricks Hive metastore, then ADF can read from it using the ODBC source in ADF, though this would require an IR.
Alternatively, you could write the table to external storage such as blob or a data lake. ADF can then read that file and push it to your SQL database.

Azure SQL DW to Azure SQL DW using Polybase

I know you can use PolyBase with external tables to load large volumes of data from Blob Storage into Azure SQL DW. But is it possible to import data from one SQL DW into another SQL DW using PolyBase directly? Or is there some other way? There must be some way to avoid the control node in both SQL DWs.
You might be better off using Azure Data Factory to move data between two Azure SQL Data Warehouses. It would make light work of moving the data, but beware of any data movement costs, particularly when moving across regions. Start here. Check the 'Use PolyBase' checkbox.
If you do just want to use PolyBase and Blob Storage, then you would have to (see the sketch below):
first export the data from the source system's internal tables to blob storage using CETAS;
in the target system, create external tables over the files in blob storage;
in the target system, import the data from the external tables into the database using CTAS.
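A rough sketch of those steps, with placeholder table, data source, and file format names (the external data source and file format are assumed to already exist on both sides):
-- On the source SQL DW: export an internal table to blob storage using CETAS.
CREATE EXTERNAL TABLE ext.FactSalesExport
WITH (
    LOCATION = '/export/factsales/',
    DATA_SOURCE = BlobStore,
    FILE_FORMAT = ParquetFormat
)
AS
SELECT * FROM dbo.FactSales;

-- On the target SQL DW: define an external table over the same files (same shape
-- as above), then import it into an internal table using CTAS.
CREATE TABLE dbo.FactSales
WITH (DISTRIBUTION = HASH(SaleId), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT * FROM ext.FactSalesExport;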
As far as I know, you have to use PolyBase and either Blob Storage or Data Lake Store to get the maximum throughput (bypassing the control node).
You can create a new SQL DW from a geo-backup, which should be a complete copy of the SQL DW with a 24-hour SLA. First click to create a new SQL DW and select backup as the source option, as opposed to blank or sample.

How do I bulk insert into an Azure SQLServer Database?

I would like to do bulk inserts into my Azure database from Python, but I can't find the documentation for how it's done.
This page says:
The following table summarizes the options for moving data to an Azure SQL Database.
The section linked from that table says:
The steps for the procedure using the Bulk Insert SQL Query are similar to those covered in the sections for moving data from a flat file source to SQL Server on an Azure VM.
And that provides the following query:
BULK INSERT <tablename>
FROM '<datafilename>'
WITH
(
    FIRSTROW = 2,
    FIELDTERMINATOR = ',', -- this should be the column separator in your data
    ROWTERMINATOR = '\n'   -- this should be the row separator in your data
)
Presumably that data file has to live somewhere, but I can't find anywhere in the documentation that confirms where it should live. I can create a CSV file and upload it as a blob to Azure Storage, but nobody in the last year has had an answer for how to get it from there into SQL Azure.
How can I bulk insert into SQL Azure?
SQL Server vNext CTP supports T-SQL commands that load from Azure Blob Storage. This will be available in Azure SQL Database soon, so you will be able to use the BULK INSERT command from your example to load data from Azure Blob Storage.
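A hedged sketch of what that could look like once the feature is available; the data source, path, and table names below are placeholders, and the data source would be created beforehand with CREATE EXTERNAL DATA SOURCE ... TYPE = BLOB_STORAGE:
BULK INSERT dbo.yourTable
FROM 'yourcontainer/yourfile.csv'
WITH
(
    DATA_SOURCE = 'yourBlobDataSource', -- external data source pointing at your storage account
    FIRSTROW = 2,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);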

Azure Data Factory: Moving data from Table Storage to SQL Azure

While moving data from Table Storage to SQL Azure, is it possible to obtain only the delta (the data that hasn't already been moved) using Azure Data Factory?
A more detailed explanation:
There is an Azure Storage table which contains some data that will be updated periodically, and I want to create a Data Factory pipeline that moves this data to an Azure SQL Database. But during each move I only want the newly added data to be written to the SQL DB. Is this possible with Azure Data Factory?
See more information on azureTableSourceQuery and the copy activity at this link: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-table-connector/#azure-table-copy-activity-type-properties.
Also see this link for invoking a stored procedure for the SQL sink: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-sql-connector/#invoking-stored-procedure-for-sql-sink
You can filter on a timestamp in each run to achieve something similar to a delta copy, but this is not a true delta copy.
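On the SQL side, a sink stored procedure can at least make repeated copies idempotent. A rough sketch, assuming a hypothetical table type and entity columns (none of these names come from the linked docs):
-- ADF passes the copied rows as a table-valued parameter; MERGE upserts them so
-- rows copied more than once are not duplicated.
CREATE TYPE dbo.EntityTableType AS TABLE
(
    PartitionKey NVARCHAR(255),
    RowKey       NVARCHAR(255),
    Payload      NVARCHAR(MAX),
    [Timestamp]  DATETIMEOFFSET
);
GO
CREATE PROCEDURE dbo.usp_UpsertEntities
    @Entities dbo.EntityTableType READONLY
AS
BEGIN
    MERGE dbo.Entities AS target
    USING @Entities AS source
        ON target.PartitionKey = source.PartitionKey
       AND target.RowKey = source.RowKey
    WHEN MATCHED THEN
        UPDATE SET target.Payload = source.Payload,
                   target.[Timestamp] = source.[Timestamp]
    WHEN NOT MATCHED THEN
        INSERT (PartitionKey, RowKey, Payload, [Timestamp])
        VALUES (source.PartitionKey, source.RowKey, source.Payload, source.[Timestamp]);
END;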
