Extracting and Transforming Data from local MySQL to Azure Synapse Data Warehouse

I'm trying to set up a demo data warehouse in Azure Synapse. I would like to extract data from a local MySQL database, transform and aggregate some of it, and store it in fact and dimension tables in Azure Synapse Analytics.
Currently I have an instance of Azure SQL Data Warehouse and a Data Factory. I created a connection to my MySQL database in Data Factory, and my thought was that I could use this connector as the input for a new Data Flow, which would transform the dataset and store it in my destination dataset, which is linked to my Azure Synapse data warehouse.
The problem is that Data Factory only supports certain Azure services, such as Azure Data Lake or Azure SQL Database, as the source of a new Data Flow.
What would be the best practice for solving this problem? Should I create an Azure SQL Database instance, copy the data from the local MySQL database into it, and then use that as the source for a new Data Flow?

Best practice here is to use the Copy activity in an ADF pipeline to land the data from MySQL as Parquet files in Blob Storage or ADLS Gen2, then transform the data using Data Flows.
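For reference, here is a minimal sketch of that landing step scripted by hand (outside ADF), assuming pymysql, pandas, pyarrow, and azure-storage-file-datalake are installed; every connection value, container, and path below is a placeholder:

```python
import io

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pymysql
from azure.storage.filedatalake import DataLakeServiceClient

# Extract from the local MySQL database (placeholder credentials).
conn = pymysql.connect(host="localhost", user="demo", password="<password>", database="sales")
df = pd.read_sql("SELECT * FROM orders", conn)
conn.close()

# Serialize the result to Parquet in memory.
buffer = io.BytesIO()
pq.write_table(pa.Table.from_pandas(df), buffer)
buffer.seek(0)

# Land the file in ADLS Gen2, where a Data Flow can pick it up as its source.
service = DataLakeServiceClient(
    account_url="https://<storageaccount>.dfs.core.windows.net",
    credential="<account-key>",
)
file_client = service.get_file_system_client("landing").get_file_client("mysql/orders.parquet")
file_client.upload_data(buffer.read(), overwrite=True)
```

In the actual pipeline the Copy activity does all of this for you; the sketch is only meant to show what ends up in the landing zone that the Data Flow then reads.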

Related

Export data from Azure SQL managed instance to Azure Data Lake Storage as json

I have a requirement to export data from an Azure SQL Managed Instance to Data Lake Storage as JSON documents, and I have to use SQL Server Integration Services to accomplish this. I tried the Flexible File Destination data flow task, but JSON is not among its supported file formats. What other options do I have to accomplish this?
Azure Data Factory supports data movement between an Azure SQL Managed Instance and a Data Lake account, but unfortunately, when the destination is Azure Data Lake Storage, SSIS does not support the JSON format either.
Azure Data Lake Store Destination
The Azure Data Lake Store Destination component enables an SSIS package to write data to an Azure Data Lake Store. The supported file formats are: Text, Avro, and ORC.
Workaround: A possible workaround is to use the Data Flow activity in Azure Data Factory: load the data from the Managed Instance, transform it (for example with a Pivot transformation), and store the processed data in the Data Lake. This approach doesn't involve SSIS. A similar request and approach are described here.
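If scripting the export outside SSIS is acceptable, a minimal sketch of writing the rows out as JSON documents could look like the following, assuming pyodbc and azure-storage-file-datalake; the server, table, container, and credential values are placeholders:

```python
import json

import pyodbc
from azure.storage.filedatalake import DataLakeServiceClient

# Read from the Managed Instance (placeholder connection string and table).
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=<managed-instance>.database.windows.net;Database=<db>;"
    "Uid=<user>;Pwd=<password>;Encrypt=yes;"
)
cursor = conn.cursor()
cursor.execute("SELECT CustomerId, Name, City FROM dbo.Customers")
columns = [c[0] for c in cursor.description]

# One JSON document per row, newline-delimited.
payload = "\n".join(
    json.dumps(dict(zip(columns, row)), default=str) for row in cursor.fetchall()
)

# Write the documents to Data Lake Storage (placeholder account, container, path).
service = DataLakeServiceClient(
    account_url="https://<storageaccount>.dfs.core.windows.net",
    credential="<account-key>",
)
service.get_file_system_client("export").get_file_client("customers.json").upload_data(
    payload, overwrite=True
)
```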

Loading data into Azure Synapse Analytics from Azure SQL Database

I am following this tutorial to move data from SQL to Azure Synapse: https://learn.microsoft.com/en-us/azure/data-factory/load-azure-sql-data-warehouse?tabs=data-factory
However, once I get to step 5c I cannot select a database name. Do I have to create an Azure Synapse database first in order to copy data over there? I thought that is what this tutorial would do.
I have a SQL database and I want to move the data into Azure Synapse.
Thanks
Yes, in Azure Data Factory your source and sink need to already exist for database scenarios.
So it is expected that you already have an Azure SQL Database and an Azure SQL Data Warehouse (Synapse) in place before proceeding with the Copy activity.
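For example, the sink table has to exist in the dedicated SQL pool before the Copy activity can land anything in it. Below is a minimal sketch of pre-creating it with pyodbc; the server, pool, table names, and schema are purely illustrative placeholders:

```python
import pyodbc

# Placeholder connection to the Synapse dedicated SQL pool.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=<workspace>.sql.azuresynapse.net;Database=<dedicated-pool>;"
    "Uid=<user>;Pwd=<password>;Encrypt=yes;"
)

# Create the target table the Copy activity will write into (illustrative schema).
conn.execute(
    """
    IF OBJECT_ID('dbo.FactSales') IS NULL
    CREATE TABLE dbo.FactSales
    (
        SaleId     INT            NOT NULL,
        CustomerId INT            NOT NULL,
        Amount     DECIMAL(18, 2) NOT NULL,
        SaleDate   DATE           NOT NULL
    )
    WITH (DISTRIBUTION = HASH(CustomerId), CLUSTERED COLUMNSTORE INDEX);
    """
)
conn.commit()
```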

Data Lake Gen2 for a MongoDB migration

Which Azure pipeline and data storage would you prefer for a MongoDB migration?
I know there is an Azure Migration Service that can shift MongoDB data directly into Azure Cosmos DB, but it seems to be available only for specific licenses, and with Cosmos DB it is also necessary to keep an eye on costs.
Another possibility is to use Stitch to shift MongoDB data directly into Azure.
Since we don't want to use an additional tool, we want to use Azure Data Factory to move the MongoDB data into Azure storage. We want to use Data Lake Storage Gen2, as it combines the advantages of Blob Storage and Data Lake Storage Gen1.
Which pipeline would you prefer? Any experience with storing MongoDB data in Azure Data Lake Storage Gen2?
Please see the following Azure Data Factory documentation pertaining to pipelines and activities, which details the source and target data stores that are currently supported:
Copy data to or from Azure Data Lake Storage Gen1 using Azure Data Factory
Copy and transform data in Azure Data Lake Storage Gen2 using Azure Data Factory
Copy data from MongoDB using Azure Data Factory
Using the MongoDB connector as a source and Azure Data Lake Storage Gen2 as a sink, you can then perform any transformations and finally migrate the data to Azure Cosmos DB, if desired.
Copy and transform data in Azure Cosmos DB (SQL API) by using Azure Data Factory
Copy data to or from Azure Cosmos DB's API for MongoDB by using Azure Data Factory
If Azure Cosmos DB is the ultimate goal of the migration and you experience any issues getting the data there, then consider the following direct migration paths: Options to migrate your on-premises or cloud data to Azure Cosmos DB
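If you want to prototype the MongoDB-to-ADLS-Gen2 landing step in code before wiring up the ADF connectors, a rough sketch with pymongo and azure-storage-file-datalake might look like this; the connection string, database, collection, and path names are placeholders:

```python
from azure.storage.filedatalake import DataLakeServiceClient
from bson import json_util
from pymongo import MongoClient

# Read the source collection (placeholder connection string, database, collection).
mongo = MongoClient("mongodb://<host>:27017")
docs = mongo["shop"]["orders"].find()

# Serialize to newline-delimited JSON; json_util handles BSON types such as ObjectId.
payload = "\n".join(json_util.dumps(doc) for doc in docs)

# Land the documents in Data Lake Storage Gen2 (placeholder account, container, path).
service = DataLakeServiceClient(
    account_url="https://<storageaccount>.dfs.core.windows.net",
    credential="<account-key>",
)
service.get_file_system_client("raw").get_file_client("mongo/orders.json").upload_data(
    payload, overwrite=True
)
```

From that landing zone you can then use Data Flows for transformation, or the Cosmos DB connectors linked above, as described in the answer.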

Az MySql to Az SQL Server - Data Lake Gen2

I am creating Data Factory pipelines to do initial and incremental loads from an Azure MySQL database, via the Data Lake, into an Azure SQL Server database.
The initial pipeline to load data from MySQL into the Data Lake is all good; the data is being persisted as .parquet files.
Now I need to load these into a SQL Server table with some basic type conversions. What is the best way?
Databricks: mount these .parquet files, standardise them, and load them into SQL Server tables?
Or can I create an external data source over these files in SQL Server on Azure and do the standardisation there? We are not on Synapse (DWH) yet.
Or is there a better way?
Since you are already using ADF, you can explore Mapping Data Flows.
https://learn.microsoft.com/en-us/azure/data-factory/concepts-data-flow-overview
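For comparison, here is a rough sketch of the do-it-yourself route from the question (read the landed .parquet files, apply basic type conversions, insert into the SQL Server table), assuming pandas, pyarrow, adlfs, and pyodbc; every table, column, and connection value below is a placeholder:

```python
import pandas as pd
import pyodbc

# adlfs lets pandas read straight from ADLS Gen2 via the abfs:// scheme.
df = pd.read_parquet(
    "abfs://landing/mysql/orders.parquet",
    storage_options={"account_name": "<storageaccount>", "account_key": "<account-key>"},
)

# Basic standardisation / type conversions before loading.
df["order_date"] = pd.to_datetime(df["order_date"]).dt.date
df["amount"] = df["amount"].astype(float).round(2)

# Bulk insert into the SQL Server table (placeholder connection and schema).
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=<server>.database.windows.net;Database=<db>;"
    "Uid=<user>;Pwd=<password>;Encrypt=yes;"
)
cursor = conn.cursor()
cursor.fast_executemany = True
cursor.executemany(
    "INSERT INTO dbo.Orders (order_id, order_date, amount) VALUES (?, ?, ?)",
    list(df[["order_id", "order_date", "amount"]].itertuples(index=False, name=None)),
)
conn.commit()
```

A Mapping Data Flow does the same reshaping inside ADF without any extra compute to manage, which is usually the simpler option if you are staying in Data Factory anyway.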

Azure Data Factory: Moving data from Table Storage to SQL Azure

While moving data from Table Storage to SQL Azure, is it possible to obtain only the delta (the data that hasn't already been moved) using Azure Data Factory?
A more detailed explanation:
There is an Azure Storage table that contains some data and is updated periodically. I want to create a Data Factory pipeline that moves this data to an Azure SQL Database, but on each run I only want the newly added data to be written to the SQL database. Is that possible with Azure Data Factory?
See more information on azureTableSourceQuery and the Copy activity at this link: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-table-connector/#azure-table-copy-activity-type-properties.
Also see this link for invoking a stored procedure for the SQL sink: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-sql-connector/#invoking-stored-procedure-for-sql-sink
You can query on the Timestamp each time to achieve something similar to a delta copy, but this is not a true delta copy.
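As an illustration of that Timestamp-based approach with the current SDKs rather than the original connector syntax, here is a minimal sketch using azure-data-tables and pyodbc; the watermark handling and all names are placeholders, and as noted above it only approximates a delta copy:

```python
import pyodbc
from azure.data.tables import TableClient

# Only pull entities modified after the last successful run; the watermark
# must be persisted and updated by you between runs (hard-coded here).
last_watermark = "2024-01-01T00:00:00Z"
table = TableClient.from_connection_string("<storage-connection-string>", table_name="Events")
new_entities = table.query_entities(f"Timestamp gt datetime'{last_watermark}'")

# Write the new entities to the Azure SQL Database (placeholder connection and schema).
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=<server>.database.windows.net;Database=<db>;"
    "Uid=<user>;Pwd=<password>;Encrypt=yes;"
)
cursor = conn.cursor()
for entity in new_entities:
    cursor.execute(
        "INSERT INTO dbo.Events (PartitionKey, RowKey) VALUES (?, ?)",
        entity["PartitionKey"], entity["RowKey"],
    )
conn.commit()
```

Note that Timestamp changes on updates as well as inserts, which is another reason this is an approximation rather than a true delta copy.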
