Move data from Cosmos DB to Azure Data Lake Storage - azure

Can we move/load data from Cosmos DB to Azure Data Lake Storage? If it can be done, what are the prerequisites?
I'm new to this; any help is appreciated.

Have you looked at Azure Data Factory?
You can use its built-in Copy activity with Cosmos DB as the source to export the documents, and with Azure Data Lake Storage as the sink to import the data into the lake.
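As an illustration of the same movement outside the Data Factory portal, here is a minimal Python sketch using the azure-cosmos and azure-storage-file-datalake SDKs; the account URLs, keys, database/container names, and output path are placeholders, and the Cosmos DB SQL (Core) API plus ADLS Gen2 are assumed.

```python
# Minimal sketch: export Cosmos DB (SQL API) documents to ADLS Gen2 as JSON lines.
# Account URLs, keys, and names below are placeholders, not real values.
import json

from azure.cosmos import CosmosClient                          # pip install azure-cosmos
from azure.storage.filedatalake import DataLakeServiceClient   # pip install azure-storage-file-datalake

cosmos = CosmosClient("https://<cosmos-account>.documents.azure.com:443/", credential="<cosmos-key>")
container = cosmos.get_database_client("<database>").get_container_client("<container>")

# Read all documents (cross-partition query) and serialize one JSON document per line.
docs = container.query_items("SELECT * FROM c", enable_cross_partition_query=True)
payload = "\n".join(json.dumps(d, default=str) for d in docs)

# Write the result as a single JSON-lines file into the lake.
lake = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential="<storage-key>",
)
file_client = lake.get_file_system_client("<filesystem>").get_file_client("export/cosmos-dump.json")
file_client.upload_data(payload, overwrite=True)
```

Note that this sketch buffers everything in memory, so for anything beyond small datasets the ADF Copy activity (or a paginated export) is the more robust route.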

Related

How to Ingest data into Azure data explorer from the Azure Table Storage source without Data Factory

I'm new to Azure Data Explorer. I need to migrate data from an Azure Table Storage table into a table in an Azure Data Explorer cluster's database without using Azure Data Factory.
If it can be done programmatically, for example with .NET, kindly suggest how.
Thanks in advance.
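Purely as an illustration of the ingestion flow without Data Factory (the asker mentions .NET; this hedged sketch uses the Python azure-data-tables and azure-kusto-ingest SDKs instead), all names and credentials below are placeholders and the target ADX database and table are assumed to already exist.

```python
# Hedged sketch: read entities from Azure Table Storage and queue them for
# ingestion into an Azure Data Explorer table. Names and credentials are placeholders.
import io
import json

from azure.data.tables import TableServiceClient                       # pip install azure-data-tables
from azure.kusto.data import KustoConnectionStringBuilder              # pip install azure-kusto-ingest
from azure.kusto.data.data_format import DataFormat
from azure.kusto.ingest import QueuedIngestClient, IngestionProperties

# 1. Read the source entities from Table Storage.
tables = TableServiceClient.from_connection_string("<storage-connection-string>")
entities = tables.get_table_client("<source-table>").list_entities()

# 2. Serialize them as newline-delimited JSON in an in-memory stream.
payload = "\n".join(json.dumps(dict(e), default=str) for e in entities)
stream = io.BytesIO(payload.encode("utf-8"))

# 3. Queue the stream for ingestion into the (pre-created) ADX table.
#    Depending on the table schema, a JSON ingestion mapping may also be needed
#    (via the ingestion_mapping_reference parameter).
kcsb = KustoConnectionStringBuilder.with_aad_application_key_authentication(
    "https://ingest-<cluster>.<region>.kusto.windows.net", "<app-id>", "<app-key>", "<tenant-id>"
)
ingest_client = QueuedIngestClient(kcsb)
props = IngestionProperties(database="<database>", table="<target-table>", data_format=DataFormat.JSON)
ingest_client.ingest_from_stream(stream, ingestion_properties=props)
```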

Export data from Azure SQL Managed Instance to Azure Data Lake Storage as JSON

I have a requirement to export data from an Azure SQL Managed Instance to Data Lake Storage as JSON documents, and I have to use SQL Server Integration Services (SSIS) to accomplish this. I tried the Flexible File Destination data flow task, but JSON is not among its supported file formats. What other options do I have to accomplish this?
Azure Data Factory supports data movement between an Azure SQL Managed Instance and a Data Lake account, but unfortunately, when the destination is Azure Data Lake Storage, SSIS also doesn't support the JSON format.
Azure Data Lake Store Destination
The Azure Data Lake Store Destination component enables an SSIS package to write data to an Azure Data Lake Store. The supported file formats are: Text, Avro, and ORC.
Workaround: A possible workaround is to use the Data Flow activity in Azure Data Factory: load the data from the Managed Instance, transform it using the Pivot transformation, and store the processed data in the Data Lake. This approach doesn't involve SSIS. Check a similar request and approach here.
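If hand-rolling the export outside SSIS and ADF is acceptable, the sketch below is a rough illustration of what the JSON export could look like with pyodbc and the ADLS Gen2 SDK; the connection string, query, and storage names are placeholders.

```python
# Rough sketch: export rows from Azure SQL Managed Instance to ADLS Gen2 as JSON lines.
# Connection string, query, and storage names are placeholders.
import json

import pyodbc                                                   # pip install pyodbc
from azure.storage.filedatalake import DataLakeServiceClient   # pip install azure-storage-file-datalake

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<managed-instance>.database.windows.net;DATABASE=<db>;UID=<user>;PWD=<password>"
)
cursor = conn.cursor()
cursor.execute("SELECT CustomerId, Name, City FROM dbo.Customers")  # placeholder query
columns = [col[0] for col in cursor.description]

# One JSON document per row, newline-delimited; default=str handles dates/decimals.
lines = "\n".join(json.dumps(dict(zip(columns, row)), default=str) for row in cursor.fetchall())

lake = DataLakeServiceClient("https://<storage-account>.dfs.core.windows.net", credential="<storage-key>")
file_client = lake.get_file_system_client("<filesystem>").get_file_client("exports/customers.json")
file_client.upload_data(lines, overwrite=True)
```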

Migrate data from Azure data lake in one subscription to another

I have been looking for options to migrate data present in my ADLS in one subscription to ADLS in another subscription within Azure. I tried ADF for this purpose and it worked fine.
But the copy speed in ADF is too slow: it copies at 10-15 KB/sec. Is there some way to increase the copy speed while using ADF?
Yes, Data Factory is one way to migrate data between Azure Data Lake accounts in different subscriptions.
Whether it's Data Lake Storage Gen1 or Gen2, Data Factory supports both as connectors. Please refer to these tutorials:
Copy data to or from Azure Data Lake Storage Gen1 using Azure Data Factory.
Copy and transform data in Azure Data Lake Storage Gen2 using Azure Data Factory.
You can create the source and sink datasets in different subscriptions through linked services.
But this option may cost you some money. You could also refer to the AzCopy tutorial: Copy blobs between Azure storage accounts by using AzCopy.
Here is another blog, How To Copy Files From One Azure Storage Account To Another:
In this post, the blogger outlines how to copy data from a Storage Account in one subscription to a Storage Account in another subscription.
These may be what you're looking for.
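Along the lines of the AzCopy suggestion, a server-side copy can also be scripted. The hedged sketch below uses the azure-storage-blob SDK with placeholder account, container, and SAS values; since the copy is authorized by the source SAS and destination credentials, the two accounts can sit in different subscriptions.

```python
# Hedged sketch: server-side copy of blobs between storage accounts in different
# subscriptions. Account/container names and the SAS token are placeholders.
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

SOURCE_ACCOUNT = "https://<source-account>.blob.core.windows.net"
SOURCE_SAS = "<source-sas-token>"   # read/list SAS for the source container, without the leading '?'

source = BlobServiceClient(SOURCE_ACCOUNT, credential=SOURCE_SAS)
dest = BlobServiceClient("https://<dest-account>.blob.core.windows.net", credential="<dest-key>")

src_container = source.get_container_client("<source-container>")
dst_container = dest.get_container_client("<dest-container>")

for blob in src_container.list_blobs():
    # start_copy_from_url kicks off an asynchronous, service-side copy;
    # the data never flows through this client machine.
    source_url = f"{SOURCE_ACCOUNT}/<source-container>/{blob.name}?{SOURCE_SAS}"
    dst_container.get_blob_client(blob.name).start_copy_from_url(source_url)
```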

Data Lake Gen2 for a MongoDB migration

Which Azure pipeline and data storage would you prefer for a MongoDB migration?
I know there is the Azure Migration Service, which can shift MongoDB data directly into Azure Cosmos DB, but it seems to be available only for specific licenses. With Cosmos DB it is also necessary to keep an eye on costs.
Another possibility is to use Stitch to shift MongoDB directly into Azure.
Since we don't want to use an additional tool, we want to use Azure Data Factory to shift the MongoDB data into Azure storage. We want to use Data Lake Storage Gen2, as it combines the advantages of Blob Storage and Data Lake Storage Gen1.
Which pipeline would you prefer? Any experience with storing MongoDB data in Azure Data Lake Storage Gen2?
Please see the following Azure Data Factory documents pertaining to pipelines and activities, which detail the source and sink data endpoints that are currently supported.
Copy data to or from Azure Data Lake Storage Gen1 using Azure Data Factory
Copy and transform data in Azure Data Lake Storage Gen2 using Azure Data Factory
Copy data from MongoDB using Azure Data Factory
Using the MongoDB connector as the source and Azure Data Lake Storage Gen2 as the sink, you can then perform any transformations and finally migrate the data to Azure Cosmos DB, if desired.
Copy and transform data in Azure Cosmos DB (SQL API) by using Azure Data Factory
Copy data to or from Azure Cosmos DB's API for MongoDB by using Azure Data Factory
If migrating to Azure Cosmos DB is the end goal and you run into issues along that path, consider the following direct migration paths: Options to migrate your on-premises or cloud data to Azure Cosmos DB
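As a rough illustration of the same flow outside Data Factory, MongoDB documents can be pulled with pymongo and landed in ADLS Gen2 as JSON lines; the connection string, database/collection, and storage names below are placeholders.

```python
# Rough sketch: dump a MongoDB collection into ADLS Gen2 as JSON lines.
# Connection string, database/collection, and storage names are placeholders.
import json

from pymongo import MongoClient                                 # pip install pymongo
from azure.storage.filedatalake import DataLakeServiceClient   # pip install azure-storage-file-datalake

mongo = MongoClient("mongodb://<user>:<password>@<host>:27017")
collection = mongo["<database>"]["<collection>"]

# default=str keeps ObjectId and datetime values serializable.
lines = "\n".join(json.dumps(doc, default=str) for doc in collection.find())

lake = DataLakeServiceClient("https://<storage-account>.dfs.core.windows.net", credential="<storage-key>")
file_client = lake.get_file_system_client("<filesystem>").get_file_client("raw/mongo/collection-dump.json")
file_client.upload_data(lines, overwrite=True)
```

Landing the raw documents as JSON lines in the lake keeps the downstream Data Factory (or Cosmos DB) steps independent of the MongoDB source.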

Extracting and Transforming Data from local MySQL to Azure Synapse Data Warehouse

I'm trying to set up a demo data warehouse in Azure Synapse. I would like to extract data from a local MySQL database, transform and aggregate some of it, and store it in fact/dimension tables in Azure Synapse Analytics.
Currently I have an instance of Azure SQL Data Warehouse and Data Factory. I created a connection to my MySQL database in Data Factory, and my thought was that I could use this connector as the input for a new Data Flow, which transforms the dataset and stores it in my destination dataset, which is linked to my Azure Synapse data warehouse.
The problem is that Data Factory only supports certain Azure services, such as Azure Data Lake or Azure SQL Database, as the source for a new Data Flow.
What would be the best practice for solving this problem? Create an instance of Azure SQL Database, copy the data from the local MySQL database to the Azure SQL Database, and then use it as the source for a new Data Flow?
Best practice here is to use the Copy activity in an ADF pipeline to land the data from MySQL as Parquet in Blob Storage or ADLS Gen2, then transform the data using Data Flows.
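To make that landing step concrete, the hedged sketch below reads from MySQL with pandas, writes Parquet in memory (pyarrow assumed as the engine), and uploads it to ADLS Gen2; connection details, the query, and storage names are placeholders.

```python
# Hedged sketch: land MySQL data as Parquet in ADLS Gen2 (the staging step before
# Data Flows). Connection details, query, and storage names are placeholders.
import io

import pandas as pd                                             # pip install pandas pyarrow
import mysql.connector                                          # pip install mysql-connector-python
from azure.storage.filedatalake import DataLakeServiceClient   # pip install azure-storage-file-datalake

conn = mysql.connector.connect(host="<mysql-host>", user="<user>", password="<password>", database="<db>")
df = pd.read_sql("SELECT * FROM sales", conn)  # placeholder query

# Serialize to Parquet in memory so nothing touches local disk.
buffer = io.BytesIO()
df.to_parquet(buffer, index=False)

lake = DataLakeServiceClient("https://<storage-account>.dfs.core.windows.net", credential="<storage-key>")
file_client = lake.get_file_system_client("<filesystem>").get_file_client("staging/sales.parquet")
file_client.upload_data(buffer.getvalue(), overwrite=True)
```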
