Data Lake Gen2 for a MongoDB migration - azure

Which Azure pipeline and data storage would you prefer for a MongoDB migration?
I know the Azure Database Migration Service can shift MongoDB data directly into Azure Cosmos DB, but it seems to be available only for specific licenses. Using Cosmos DB also means keeping an eye on costs.
Another possibility is to use Stitch to shift the MongoDB data directly into Azure.
Since we don't want to use an additional tool, we want to use Azure Data Factory to shift the MongoDB data into Azure storage. We want to use Data Lake Storage Gen2, as it combines the advantages of Blob Storage and Data Lake Storage Gen1.
Which pipeline would you prefer? Any experience with storing MongoDB data in Azure Data Lake Storage Gen2?

Please see the following Azure Data Factory documentation on pipelines and activities, which details the source and sink data stores that are currently supported.
Copy data to or from Azure Data Lake Storage Gen1 using Azure Data Factory
Copy and transform data in Azure Data Lake Storage Gen2 using Azure Data Factory
Copy data from MongoDB using Azure Data Factory
Using the MongoDB connector as a source and Azure Data Lake Storage Gen2 as a sink, you can perform any transformations and, finally, migrate the data to Azure Cosmos DB if desired.
Copy and transform data in Azure Cosmos DB (SQL API) by using Azure Data Factory
Copy data to or from Azure Cosmos DB's API for MongoDB by using Azure Data Factory
If Azure Cosmos DB is the end goal of the migration and you experience any issues moving the data, then also consider the direct migration paths described in: Options to migrate your on-premises or cloud data to Azure Cosmos DB
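For illustration, here is a minimal sketch of that first hop as an ADF Copy activity, with the MongoDB connector as source and JSON files in Data Lake Storage Gen2 as sink. The activity and dataset names are placeholders, not anything defined in this thread:

```json
{
    "name": "CopyMongoDbToAdlsGen2",
    "type": "Copy",
    "description": "Land MongoDB documents as JSON files in Data Lake Storage Gen2",
    "inputs": [ { "referenceName": "MongoSourceDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "AdlsGen2JsonDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": { "type": "MongoDbV2Source", "batchSize": 100 },
        "sink": {
            "type": "JsonSink",
            "storeSettings": { "type": "AzureBlobFSWriteSettings" }
        }
    }
}
```

From ADLS Gen2, a second Copy activity or a Mapping Data Flow can then push the (transformed) data into a Cosmos DB sink if that is the end goal.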

Related

Copy data from Azure Data Lake to Snowflake without a stage using Azure Data Factory

All the Azure Data Factory examples of copying data from Azure Data Lake Gen2 to Snowflake use a storage account as a stage. If the stage is not configured, I get this error in Data Factory even when my source is a CSV file in Azure Data Lake: "Direct copying data to Snowflake is only supported when source dataset is DelimitedText, Parquet, JSON with Azure Blob Storage or Amazon S3 linked service, for other dataset or linked service, please enable staging".
At the same time, the Snowflake documentation says the external stage is optional. How can I copy data from Azure Data Lake to Snowflake using Data Factory's Copy Data activity without having an external storage account as a stage?
If staging storage is needed to make it work, we shouldn't say that a data copy from Data Lake to Snowflake is supported. It only works when the Data Lake data is first copied to a storage blob and then to Snowflake.
Though Snowflake supports Blob storage, Data Lake Storage Gen2, and general-purpose v1 & v2 storage accounts, loading data into Snowflake through the ADF connector is supported via Blob storage only.
The source linked service has to be Azure Blob storage with shared access signature (SAS) authentication. If you want to copy data directly from Azure Data Lake Storage Gen2 in one of the supported formats (DelimitedText, Parquet, JSON), you can create an Azure Blob storage linked service with SAS authentication against your ADLS Gen2 account, to avoid using staged copy to Snowflake.
Select Azure Blob storage as the linked service type and provide the SAS URI of the Azure Data Lake Gen2 source file.
[Screenshot: Blob storage linked service pointing at the Data Lake Gen2 file]
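For reference, a minimal sketch of what such a linked service definition might look like; the name and SAS URI are placeholders, and the URI points at the Blob endpoint of the ADLS Gen2 account:

```json
{
    "name": "AdlsGen2ViaBlobSas",
    "properties": {
        "type": "AzureBlobStorage",
        "description": "Blob storage linked service with SAS auth, pointed at the ADLS Gen2 account",
        "typeProperties": {
            "sasUri": "https://<storage-account>.blob.core.windows.net/<container>?<sas-token>"
        }
    }
}
```

The DelimitedText/Parquet/JSON source dataset then references this linked service instead of an ADLS Gen2 linked service.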
You'll have to configure Blob storage and use it as staging. As an alternative, you can use an external stage: create a FILE FORMAT and a STORAGE INTEGRATION in Snowflake, point an external stage at the ADLS location, and load the data into Snowflake with the COPY INTO command. Let me know if you need more help on this.
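If you take the staging route, the relevant piece is the enableStaging/stagingSettings block on the Copy activity. A hedged sketch, assuming a Blob storage linked service named StagingBlobStorageLs (a made-up name) and placeholder datasets:

```json
{
    "name": "CopyDataLakeToSnowflakeStaged",
    "type": "Copy",
    "inputs": [ { "referenceName": "AdlsGen2CsvDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "SnowflakeTableDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": {
            "type": "DelimitedTextSource",
            "storeSettings": { "type": "AzureBlobFSReadSettings" }
        },
        "sink": { "type": "SnowflakeSink" },
        "enableStaging": true,
        "stagingSettings": {
            "linkedServiceName": { "referenceName": "StagingBlobStorageLs", "type": "LinkedServiceReference" },
            "path": "staging-container/snowflake"
        }
    }
}
```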

Move data from Cosmos DB to Azure Data Lake Storage

Can we move/load data from Cosmos DB to Azure Data Lake Storage? If it can be done, what are the prerequisites?
New to this, so any help is appreciated.
Have you looked at Azure Data Factory?
You can use the existing Copy activity, with the Cosmos DB connector as the source (to export documents from Cosmos DB) and the Azure Data Lake connector as the sink (to land the data in Azure Data Lake).
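In other words, a single Copy activity covers it; a minimal sketch with a Cosmos DB (SQL API) source and a JSON sink in Data Lake Storage Gen2 (the dataset names are placeholders):

```json
{
    "name": "CopyCosmosToDataLake",
    "type": "Copy",
    "inputs": [ { "referenceName": "CosmosDbCollectionDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "DataLakeJsonDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": { "type": "CosmosDbSqlApiSource" },
        "sink": {
            "type": "JsonSink",
            "storeSettings": { "type": "AzureBlobFSWriteSettings" }
        }
    }
}
```

The prerequisites come down to a Data Factory instance plus linked services and datasets for both the Cosmos DB account and the Data Lake account.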

How to store data from Azure Analysis Services into Azure Data Lake using Azure Data Factory?

I have two tables in Azure Analysis Services and I need to copy that data into Azure Data Lake using Azure Data Factory.
Could you please help me or share a reference URL?
There is no direct connector for Azure Analysis Services in ADF, but there are a few ways you can copy the data.
One such way is to create a linked server between an on-premises/IaaS SQL Server or SQL Managed Instance and Analysis Services, and then pull the data from that SQL instance through ADF (see the sketch after the links below).
Can we copy data from Azure Analysis Services using Azure Data Factory?
The link above explains the setup in detail, as does this walkthrough:
https://datasharkx.wordpress.com/2021/03/16/copy-data-from-ssas-aas-through-azure-data-factory/
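To make the linked-server route concrete, here is a hedged sketch of the ADF side: the Copy activity queries the SQL instance, and the SQL query itself goes through a linked server (AAS_LINKED is a made-up name) that points at Analysis Services and evaluates a DAX expression:

```json
{
    "name": "CopyAasTableViaLinkedServer",
    "type": "Copy",
    "inputs": [ { "referenceName": "SqlServerDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "DataLakeParquetDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": {
            "type": "SqlServerSource",
            "sqlReaderQuery": "SELECT * FROM OPENQUERY(AAS_LINKED, 'EVALUATE ''MyTable''')"
        },
        "sink": {
            "type": "ParquetSink",
            "storeSettings": { "type": "AzureBlobFSWriteSettings" }
        }
    }
}
```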

Extracting and Transforming Data from local MySQL to Azure Synapse Data Warehouse

I'm trying to set up a demo data warehouse in Azure Synapse. I would like to extract data from a local MySQL database, transform and aggregate some of the data, and store it in fact/dimension tables in Azure Synapse Analytics.
Currently I have an instance of Azure SQL Data Warehouse and Data Factory. I created a connection to my MySQL database in Data Factory, and my thought was that I could use this connector as the input for a new Data Flow, which transforms the dataset and stores it in my destination dataset, which is linked to my Azure Synapse data warehouse.
The problem is that Data Factory only supports certain Azure services, such as Azure Data Lake or Azure SQL Database, as the source for a new Data Flow.
What would be the best practice for solving this problem? Create an instance of Azure SQL Database, copy the data from the local MySQL database to the Azure SQL Database, and then use it as the source for a new Data Flow?
Best practice here is to use the Copy Activity in an ADF pipeline to land the data from MySQL into Parquet in Blob or ADLS G2, then transform the data using Data Flows.
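A minimal sketch of that landing step, assuming a self-hosted integration runtime is already in place for the local MySQL database (the dataset names and the table name are placeholders):

```json
{
    "name": "LandMySqlToParquet",
    "type": "Copy",
    "inputs": [ { "referenceName": "MySqlTableDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "AdlsGen2ParquetDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": { "type": "MySqlSource", "query": "SELECT * FROM sales_orders" },
        "sink": {
            "type": "ParquetSink",
            "storeSettings": { "type": "AzureBlobFSWriteSettings" }
        }
    }
}
```

A Mapping Data Flow can then read the Parquet files from ADLS Gen2 (which Data Flows support as a source), do the transformations and aggregations, and write the result to the Synapse sink.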

Connection between Azure Data Factory and Databricks

I'm wondering what the most appropriate way is to access Databricks from Azure Data Factory.
Currently I've got Databricks set up as a linked service, which I access via a generated access token.
What do you want to do?
Do you want to trigger a Databricks notebook from ADF?
Do you want to supply Databricks with data? (blob storage or Azure Data Lake store)
Do you want to retrieve data from Databricks? (blob storage or Azure Data Lake store)
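If the goal is the first of these (triggering a notebook), the token-based linked service plus a Databricks Notebook activity is the usual pattern; a minimal sketch with placeholder names, workspace URL, cluster id and notebook path:

```json
{
    "name": "AzureDatabricksLs",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://<region>.azuredatabricks.net",
            "accessToken": { "type": "SecureString", "value": "<personal-access-token>" },
            "existingClusterId": "<cluster-id>"
        }
    }
}
```

```json
{
    "name": "RunTransformNotebook",
    "type": "DatabricksNotebook",
    "linkedServiceName": { "referenceName": "AzureDatabricksLs", "type": "LinkedServiceReference" },
    "typeProperties": {
        "notebookPath": "/Shared/transform",
        "baseParameters": { "run_date": "@pipeline().parameters.runDate" }
    }
}
```

For the other two cases (supplying or retrieving data), Databricks usually reads and writes the storage account directly (abfss:// paths or mounts) rather than going through ADF; and instead of a raw token, the accessToken can reference an Azure Key Vault secret.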
