Copy data from Azure Data Lake to Snowflake without a stage using Azure Data Factory

All the Azure Data Factory examples of copying data from Azure Data Lake Gen2 to Snowflake use a storage account as a stage. If the stage is not configured (as shown in the picture), I get this error in Data Factory even when my source is a CSV file in Azure Data Lake: "Direct copying data to Snowflake is only supported when source dataset is DelimitedText, Parquet, JSON with Azure Blob Storage or Amazon S3 linked service, for other dataset or linked service, please enable staging".
At the same time, the Snowflake documentation says that the external stage is optional. How can I copy data from Azure Data Lake to Snowflake using Data Factory's Copy Data activity without having an external storage account as a stage?
If staging storage is needed to make it work, we shouldn't say that copying data from Data Lake to Snowflake is supported; it only works when the Data Lake data is first copied into a storage blob and then into Snowflake.

Though Snowflake supports Blob Storage, Data Lake Storage Gen2, and General Purpose v1 & v2 storage accounts, loading data into Snowflake is supported through Blob Storage only.
The source linked service must be Azure Blob Storage with shared access signature (SAS) authentication. If you want to copy data directly from Azure Data Lake Storage Gen2 in a supported format (DelimitedText, Parquet, or JSON), you can create an Azure Blob Storage linked service with SAS authentication against your ADLS Gen2 account, to avoid using staged copy to Snowflake.
Select Azure Blob Storage as the linked service type and provide the SAS URI details of the Azure Data Lake Gen2 source file.
[Screenshot: Blob Storage linked service configured with a Data Lake Gen2 file]
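As a rough sketch of that SAS step (assuming the azure-storage-blob package; the account name and key below are placeholders), an account-level SAS for the ADLS Gen2 account's blob endpoint can be generated and pasted into the Blob Storage linked service:

```python
from datetime import datetime, timedelta
from azure.storage.blob import AccountSasPermissions, ResourceTypes, generate_account_sas

# Account SAS with read/list rights, valid for 12 hours
sas_token = generate_account_sas(
    account_name="myadlsgen2account",
    account_key="<storage-account-key>",
    resource_types=ResourceTypes(container=True, object=True),
    permission=AccountSasPermissions(read=True, list=True),
    expiry=datetime.utcnow() + timedelta(hours=12),
)

# SAS URI for the linked service: the ADLS Gen2 account's *blob* endpoint, not the dfs endpoint
print(f"https://myadlsgen2account.blob.core.windows.net/?{sas_token}")
```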

You'll have to configure Blob Storage and use it as staging. As an alternative, you can use an external stage: create a FILE FORMAT and a STORAGE INTEGRATION, point a stage at the ADLS location, and load the data into Snowflake with the COPY command. Let me know if you need more help on this.
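A minimal sketch of that external-stage route, using snowflake-connector-python; the integration, stage, table, and URL values below are placeholders, and the storage integration still has to be granted access to the container on the Azure side (DESC STORAGE INTEGRATION shows the consent URL):

```python
import snowflake.connector

# Connection parameters are placeholders
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="my_db", schema="public",
)
cur = conn.cursor()

# Integration that lets Snowflake read the ADLS Gen2 container
cur.execute("""
    CREATE STORAGE INTEGRATION IF NOT EXISTS adls_int
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'AZURE'
      ENABLED = TRUE
      AZURE_TENANT_ID = '<tenant-id>'
      STORAGE_ALLOWED_LOCATIONS = ('azure://myaccount.blob.core.windows.net/mycontainer/')
""")

# CSV file format and an external stage over the ADLS location
cur.execute("CREATE FILE FORMAT IF NOT EXISTS my_csv_format TYPE = CSV SKIP_HEADER = 1")
cur.execute("""
    CREATE STAGE IF NOT EXISTS my_adls_stage
      URL = 'azure://myaccount.blob.core.windows.net/mycontainer/source/'
      STORAGE_INTEGRATION = adls_int
      FILE_FORMAT = my_csv_format
""")

# Load the staged files into the target table
cur.execute("COPY INTO my_db.public.my_table FROM @my_adls_stage")
```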

Related

Migrate data from Azure data lake in one subscription to another

I have been looking for options to migrate data present in my ADLS in one subscription to ADLS in another subscription within Azure. I tried ADF for this purpose and it worked fine.
But the copy speed in ADF is too slow; it copies at 10-15 KB/sec. Is there some way to increase the copy speed while using ADF?
Yes, there is a way to migrate Azure Data Lake data between different subscriptions: Data Factory.
Data Factory supports both Data Lake Gen1 and Gen2 as connectors. Please refer to these tutorials:
Copy data to or from Azure Data Lake Storage Gen1 using Azure Data Factory
Copy and transform data in Azure Data Lake Storage Gen2 using Azure Data Factory
You can create the source and sink datasets in different subscriptions through linked services.
But this option may cost you some money. You could also refer to the AzCopy tutorial: Copy blobs between Azure storage accounts by using AzCopy.
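As a rough sketch of the AzCopy route (the account names, container names, and SAS tokens below are placeholders), the tool can also be driven from a small script:

```python
import subprocess

# Source and destination are SAS-authenticated container URLs in different subscriptions
src = "https://sourceaccount.blob.core.windows.net/sourcecontainer?<source-sas>"
dst = "https://destaccount.blob.core.windows.net/destcontainer?<dest-sas>"

# --recursive copies the whole container, preserving the folder layout;
# AzCopy v10 parallelizes transfers, which usually helps with throughput
subprocess.run(["azcopy", "copy", src, dst, "--recursive"], check=True)
```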
Here is another blog, How To Copy Files From One Azure Storage Account To Another:
"In this post, the blogger outlines how to copy data from one Azure Storage Account in one subscription to another Storage Account in another subscription."
These may be what you're looking for.

Data Lake Gen2 for a MongoDB migration

Which Azure pipeline and data storage would you prefer for a MongoDB migration?
I know there is an Azure migration service that can shift MongoDB data directly into Azure Cosmos DB. It seems to be available only for specific licenses, and with Cosmos DB it is also necessary to keep an eye on costs.
Another possibility is to use Stitch to shift MongoDB directly into Azure.
Since we don't want to use an additional tool, we want to use Azure Data Factory to shift the MongoDB data into an Azure Data Storage. We want to use the Data Lake Storage Gen2, as it combines the advantages of the Blob Storage and the Data Lake Storage Gen1.
Which pipeline would you prefer? Any experiences with storing MongoDB data in Azure Data Lake Storage Gen2?
Please see the following Azure Data Factory documents pertaining to pipelines and activities, which detail the source and target data endpoints that are currently supported.
Copy data to or from Azure Data Lake Storage Gen1 using Azure Data Factory
Copy and transform data in Azure Data Lake Storage Gen2 using Azure Data Factory
Copy data from MongoDB using Azure Data Factory
Using the MongoDB connector as a source and Azure Data Lake Storage Gen2 as a sink, you can perform any transformations and finally migrate the data to Azure Cosmos DB, if desired.
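Outside of ADF, the same source/sink pattern can be sketched in a few lines with pymongo and the ADLS Gen2 SDK (azure-storage-file-datalake); the connection strings, file system, and paths below are placeholders:

```python
import json
from pymongo import MongoClient
from azure.storage.filedatalake import DataLakeServiceClient

# Source: a MongoDB collection
mongo = MongoClient("mongodb://user:pass@host:27017")
docs = mongo["mydb"]["mycollection"].find()

# Serialize to JSON Lines (default=str crudely handles ObjectId/date values);
# fine for small collections, stream in chunks for large ones
payload = "\n".join(json.dumps(d, default=str) for d in docs)

# Sink: a file in an ADLS Gen2 file system (hierarchical namespace enabled)
dls = DataLakeServiceClient(
    account_url="https://myadlsaccount.dfs.core.windows.net",
    credential="<account-key-or-sas>",
)
file_client = dls.get_file_system_client("raw").get_file_client("mongodb/mycollection.jsonl")
file_client.upload_data(payload, overwrite=True)
```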
Copy and transform data in Azure Cosmos DB (SQL API) by using Azure Data Factory
Copy data to or from Azure Cosmos DB's API for MongoDB by using Azure Data Factory
If you experience any issues migrating the data to Azure Cosmos DB (if that is the goal of the migration), then consider the following direct migration paths: Options to migrate your on-premises or cloud data to Azure Cosmos DB

Connection between Azure Data Factory and Databricks

I'm wondering what the most appropriate way is to access Databricks from Azure Data Factory.
Currently I've got Databricks set up as a linked service, which I access via a generated token.
What do you want to do?
Do you want to trigger a Databricks notebook from ADF? (See the sketch after this list.)
Do you want to supply Databricks with data? (blob storage or Azure Data Lake store)
Do you want to retrieve data from Databricks? (blob storage or Azure Data Lake store)
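For the first of those scenarios, here is a minimal sketch of triggering a notebook directly against the Databricks Jobs REST API with a personal access token, i.e. the same kind of generated token the ADF linked service uses; the workspace URL, cluster ID, and notebook path below are placeholders.

```python
import requests

workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
token = "<databricks-personal-access-token>"

# Submit a one-time notebook run on an existing interactive cluster
resp = requests.post(
    f"{workspace_url}/api/2.0/jobs/runs/submit",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "run_name": "notebook-run-from-script",
        "existing_cluster_id": "<cluster-id>",
        "notebook_task": {
            "notebook_path": "/Shared/my_notebook",
            "base_parameters": {"input_path": "abfss://raw@myaccount.dfs.core.windows.net/"},
        },
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["run_id"])  # poll /api/2.0/jobs/runs/get?run_id=... for status
```

Within ADF itself, the equivalent is a Databricks Notebook activity that references the token-authenticated Databricks linked service.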

Is there a way to load data to Azure data lake storage gen 2 using logic app?

I have to load data to Azure Data Lake Storage Gen2 using a Logic App. I tried using the Azure File Storage connector, but I couldn't get any file system folder in it. Can someone help me with this issue?
Note: without using a Copy activity.
Currently, there is no connector for Data Lake Gen2 in Logic Apps: https://feedback.azure.com/forums/287593-logic-apps/suggestions/37118125-connector-for-azure-data-lake-gen-2.
Here is a workaround which I have tested and which works.
1. Create an Azure Data Factory service.
2. Create a pipeline to copy files from Data Lake Gen1 to Data Lake Gen2.
https://learn.microsoft.com/en-us/azure/data-factory/load-azure-data-lake-storage-gen2#load-data-into-azure-data-lake-storage-gen2.
3. Use the Data Factory connector in the Logic App to create a pipeline run (an equivalent call from code is sketched below).
Once it runs successfully, the related files will be copied to the target folder under Data Lake Gen2.
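For reference, a hedged sketch of that "create a pipeline run" step done with the azure-mgmt-datafactory SDK, which performs the same operation as the Logic App's Data Factory connector; the subscription, resource group, factory, and pipeline names are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Authenticate and point at the factory that hosts the copy pipeline
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Kick off the pipeline that copies the files into Data Lake Gen2
run = adf_client.pipelines.create_run(
    resource_group_name="my-rg",
    factory_name="my-adf",
    pipeline_name="CopyToAdlsGen2",
)
print(run.run_id)  # status can be polled with adf_client.pipeline_runs.get(...)
```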
Isn't ADLS Gen2 just a blob container? Select the Azure Blob Storage connector, then the Create blob action.
I selected "Azure Blob Storage" as the action in the Logic App and then selected my ADLS Gen2 storage account name. It is working fine. Do you see any issue?

Read from ADLS gen 2 with SSIS

Does anyone know which connection manager and data flow component to use for ADLS (Azure Data Lake Store) Gen2?
I've managed to use the blob connector in the connection manager and successfully connect to ADLS Gen2, but when I try to use the blob source component I get a 400 Bad Request. It works fine if it's just blob storage without HNS.
The ADLS components state they're just for ADLS Gen1.
So how to read and write to/from ADLS Gen 2?
The current version of the SSIS Azure Feature Pack supports ADLS Gen2. It can be used as a data source or destination in a data flow:
The screenshot shows it as a destination, but ADLS Gen2 also works well as a source, via the corresponding "Flexible File Destination" and "Flexible File Source" components.
First of all, based on the great link provided by @rickvdbosch, it looks like there are many temporary limitations with Azure Data Lake Storage Gen2 concerning the Blob Storage API. This means it is not a component limitation, and maybe you should wait until it is integrated with SSIS.
Microsoft SQL Server Feature Pack for Azure
If you meant these components when you mentioned that:
The ADLS components state they're just for ADLS Gen1.
then ignore this part.
I am not entirely sure whether it supports Gen2, but I think you can use the Azure Data Lake Store components, which are part of the Microsoft SQL Server Feature Pack for Azure. For more information you can refer to:
Azure Data Lake Store in SSIS
Azure Data Lake Store Source
Azure Data Lake Store Destination
Download Link
Azure Feature Pack for Integration Services (SSIS)
Other methods
If the suggestion above doesn't work, then you could use Azure Data Factory, or a command-line approach by installing and using AzCopy v10.
I got the following info:
"At the moment Gen 2 don’t support BLOB API (but it will in a short time) and hence, SSIS is not able to connect."
So for SSIS it's currently either ADLS Gen1 or blob storage.
I used the Script Task to write files or System.Object data (converted to CSV in memory) to Azure Storage Gen2 (hierarchical namespace enabled) using the REST API. I did this as a demo until the SSIS components are released.
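The Script Task itself is C#, but the underlying REST calls against the DFS endpoint (create, append, flush) can be sketched as follows; the account, file system, path, and SAS token are placeholders:

```python
import requests

account = "myadlsgen2account"
sas = "<sas-token-without-leading-question-mark>"
base = f"https://{account}.dfs.core.windows.net/myfilesystem/demo/output/sample.csv"
headers = {"x-ms-version": "2019-12-12"}
data = b"id,value\n1,foo\n"

# 1. Create a zero-length file at the target path
requests.put(f"{base}?resource=file&{sas}", headers=headers).raise_for_status()

# 2. Append the payload at offset 0
requests.patch(f"{base}?action=append&position=0&{sas}",
               headers=headers, data=data).raise_for_status()

# 3. Flush (commit) the appended bytes; position is the final file length
requests.patch(f"{base}?action=flush&position={len(data)}&{sas}",
               headers={**headers, "Content-Length": "0"}).raise_for_status()
```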
You can't write to ADLS Gen2 using the old components from the Azure Feature Pack, but you can connect to Gen2 blob storage (non-hierarchical) using the Azure Blob Destination component.
