Connection between Azure Data Factory and Databricks

I'm wondering what the most appropriate way is of accessing Databricks from Azure Data Factory.
Currently I've got Databricks as a linked service, which I access via a generated token.

What do you want to do?
Do you want to trigger a Databricks notebook from ADF?
Do you want to supply Databricks with data? (blob storage or Azure Data Lake store)
Do you want to retrieve data from Databricks? (blob storage or Azure Data Lake store)
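If the goal is simply to trigger a notebook, the usual pattern is exactly what you describe: a Databricks linked service authenticated with the generated token, plus a Databricks Notebook activity in a pipeline. Below is a minimal sketch using the azure-mgmt-datafactory Python SDK; the subscription, resource group, factory, workspace URL, cluster ID and notebook path are placeholders, and constructor signatures vary a little between SDK versions.

```python
# Sketch: wire an ADF pipeline to a Databricks notebook via a token-based linked service.
# Subscription, resource group, factory, workspace URL and cluster ID are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureDatabricksLinkedService, DatabricksNotebookActivity,
    LinkedServiceReference, LinkedServiceResource, PipelineResource, SecureString,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, factory = "my-rg", "my-adf"

# Linked service: Databricks workspace + generated access token + existing cluster.
dbx_ls = AzureDatabricksLinkedService(
    domain="https://adb-1234567890123456.7.azuredatabricks.net",
    access_token=SecureString(value="<databricks-pat>"),
    existing_cluster_id="0000-000000-abcdefgh",
)
adf.linked_services.create_or_update(
    rg, factory, "DatabricksLS", LinkedServiceResource(properties=dbx_ls)
)

# Pipeline with a single Databricks Notebook activity.
notebook = DatabricksNotebookActivity(
    name="RunNotebook",
    notebook_path="/Shared/my_notebook",
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="DatabricksLS"
    ),
)
adf.pipelines.create_or_update(
    rg, factory, "RunDatabricksNotebook", PipelineResource(activities=[notebook])
)
run = adf.pipelines.create_run(rg, factory, "RunDatabricksNotebook", parameters={})
print(run.run_id)
```

The same objects can of course be authored in the portal UI instead; the SDK version just makes the moving parts (linked service, activity, pipeline run) explicit.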

Related

Copy Data from Azure Data Lake to SnowFlake without stage using Azure Data Factory

All the Azure Data Factory examples of copying data from Azure Data Lake Gen2 to Snowflake use a storage account as a stage. If the stage is not configured (as shown in the picture), I get this error in Data Factory even when my source is a CSV file in Azure Data Lake: "Direct copying data to Snowflake is only supported when source dataset is DelimitedText, Parquet, JSON with Azure Blob Storage or Amazon S3 linked service, for other dataset or linked service, please enable staging".
At the same time, the Snowflake documentation says the external stage is optional. How can I copy data from Azure Data Lake to Snowflake using Data Factory's Copy Data activity without having an external storage account as a stage?
If staging storage is needed to make it work, we shouldn't say that data copy from Data Lake to Snowflake is supported. It works only when the Data Lake data is first copied into a storage blob and then into Snowflake.
Although Snowflake supports Blob storage, Data Lake Storage Gen2, and General Purpose v1 & v2 storage accounts, loading data into Snowflake is supported through Blob storage only.
The source linked service is Azure Blob storage with shared access signature (SAS) authentication. If you want to copy data directly from Azure Data Lake Storage Gen2 in a supported format, you can create an Azure Blob linked service with SAS authentication against your ADLS Gen2 account to avoid using a staged copy to Snowflake.
Select Azure Blob storage in the linked service and provide the SAS URI details of the Azure Data Lake Gen2 source file.
Blob storage linked service with data lake gen2 file:
You'll have to configure Blob storage and use it as staging. As an alternative you can use an external stage: create a FILE FORMAT and a STORAGE INTEGRATION (or SAS credentials), point a stage at the ADLS container, and load the data into Snowflake with the COPY command, roughly as sketched below. Let me know if you need more help on this.
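For the external-stage route, the Snowflake side looks roughly like this. It is a sketch using the snowflake-connector-python driver; the account, SAS token, stage, file format and table names are all placeholders, and a STORAGE INTEGRATION can replace the SAS credentials.

```python
# Sketch: load ADLS Gen2 / Blob data into Snowflake with a named external stage
# and COPY INTO, instead of ADF's staged copy. Connection details, the SAS token
# and the target table are placeholders; the table is assumed to exist already.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="MY_WH", database="MY_DB", schema="PUBLIC",
)
cur = conn.cursor()

# File format describing the CSV files sitting in the lake.
cur.execute("""
    CREATE OR REPLACE FILE FORMAT my_csv_format
    TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1
""")

# External stage pointing at the ADLS Gen2 container (blob endpoint), authenticated
# here with a SAS token; a STORAGE INTEGRATION is the managed alternative.
cur.execute("""
    CREATE OR REPLACE STAGE my_adls_stage
    URL = 'azure://mystorageaccount.blob.core.windows.net/mycontainer/path/'
    CREDENTIALS = (AZURE_SAS_TOKEN = '?sv=...')
    FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
""")

# Bulk load straight from the stage into the target table.
cur.execute("COPY INTO my_table FROM @my_adls_stage")
cur.close()
conn.close()
```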

How to store data from Azure Analysis services into Azure Datalake using Azure Data Factory?

I have two tables in Azure Analysis Services and I need to copy and store that data in Azure Data Lake using Azure Data Factory.
Could you please help me or share a reference URL?
There is no direct connector for Azure Analysis Services in ADF, but there are a few ways you can copy the data.
One such way is to create a linked server between an on-premises/IaaS or SQL Managed Instance database and Analysis Services, and then pull the data from the SQL instance through ADF (a rough sketch follows the links below).
Can we copy data from Azure Analysis Services using Azure Data Factory?
The links above and below explain the setup in detail.
https://datasharkx.wordpress.com/2021/03/16/copy-data-from-ssas-aas-through-azure-data-factory/
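As a rough illustration of that linked-server route: once the SQL instance has a linked server pointing at the Analysis Services model, the data comes back as an ordinary rowset that ADF (or any SQL client) can copy. The sketch below uses pyodbc with placeholder server, login, linked-server and table names.

```python
# Sketch: query an Azure Analysis Services model through a SQL Server linked server,
# which is the same rowset an ADF copy activity against the SQL instance would see.
# Server, credentials, linked-server name and table name are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=my-sql-mi.database.windows.net;DATABASE=master;"
    "UID=my_user;PWD=***"
)
cur = conn.cursor()

# OPENQUERY sends the inner DAX statement to the AAS linked server ([AAS_LINK])
# and returns the model table as a normal result set.
cur.execute("SELECT * FROM OPENQUERY([AAS_LINK], 'EVALUATE SalesTable')")
for row in cur.fetchmany(5):
    print(row)

conn.close()
```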

Delta Lake Gen2 for a MongoDB migration

Which Azure pipeline and data storage would you prefer for a MongoDB migration?
I know there exists an Azure migration service that can shift MongoDB data directly to Azure Cosmos DB, but it seems to be available only for specific licenses. With Cosmos DB it is also necessary to keep an eye on costs.
Another possibility is to use Stitch to shift MongoDB data directly into Azure.
Since we don't want to use an additional tool, we want to use Azure Data Factory to shift the MongoDB data into Azure storage. We want to use Data Lake Storage Gen2, as it combines the advantages of Blob Storage and Data Lake Storage Gen1.
Which pipeline would you prefer? Any experiences with storing MongoDB data in Azure Data Lake Storage Gen2?
Please see the following Azure Data Factory documents pertaining to pipelines and activities, which detail the source and sink data stores that are currently supported.
Copy data to or from Azure Data Lake Storage Gen1 using Azure Data Factory
Copy and transform data in Azure Data Lake Storage Gen2 using Azure Data Factory
Copy data from MongoDB using Azure Data Factory
Using the MongoDB connector as a source and Azure Data Lake Storage Gen2 as a sink (sketched below), you can then perform any transformation and finally migrate the data to Azure Cosmos DB...if desired.
Copy and transform data in Azure Cosmos DB (SQL API) by using Azure Data Factory
Copy data to or from Azure Cosmos DB's API for MongoDB by using Azure Data Factory
If you experience any issues migrating the data to Azure Cosmos DB (if that is the goal of the migration), then consider the following direct migration paths: Options to migrate your on-premises or cloud data to Azure Cosmos DB
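To give a feel for the MongoDB-to-Data Lake hop itself, here is a trimmed sketch of the copy activity using the azure-mgmt-datafactory Python SDK. It assumes the MongoDB and ADLS Gen2 linked services and datasets ("MongoSourceDS", "LakeSinkDS") have already been created, and the model names and constructor signatures may differ slightly between SDK versions.

```python
# Sketch: ADF copy activity from a MongoDB collection dataset to an ADLS Gen2
# (delimited text) dataset. Subscription, factory and dataset names are placeholders
# and the datasets are assumed to exist already.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    CopyActivity, DatasetReference, DelimitedTextSink, MongoDbV2Source, PipelineResource,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, factory = "my-rg", "my-adf"

copy = CopyActivity(
    name="MongoToLake",
    inputs=[DatasetReference(type="DatasetReference", reference_name="MongoSourceDS")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="LakeSinkDS")],
    source=MongoDbV2Source(batch_size=1000),  # server-side filter/cursor options also exist
    sink=DelimitedTextSink(),
)
adf.pipelines.create_or_update(rg, factory, "MongoToLakePipeline",
                               PipelineResource(activities=[copy]))
adf.pipelines.create_run(rg, factory, "MongoToLakePipeline", parameters={})
```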

Is there a way to load data to Azure data lake storage gen 2 using logic app?

I have to load data to Azure Data Lake Storage Gen2 using a Logic App. I tried using the Azure File Storage connector, but I couldn't get any file system folder in it. Can someone help me with this issue?
Note: without using a Copy activity.
Currently, there is no connector for Data Lake Gen2 in Logic Apps: https://feedback.azure.com/forums/287593-logic-apps/suggestions/37118125-connector-for-azure-data-lake-gen-2.
Here is a workaround which I have tested to work:
1. Create an Azure Data Factory service.
2. Create a pipeline to copy files from Data Lake Gen1 to Data Lake Gen2: https://learn.microsoft.com/en-us/azure/data-factory/load-azure-data-lake-storage-gen2#load-data-into-azure-data-lake-storage-gen2.
3. Use the Data Factory connector in the Logic App to create a pipeline run.
Once the run succeeds, the related files will be copied to the target folder under Data Lake Gen2.
Isn't ADLS Gen2 just a blob container? Select the Azure Blob Storage connector, then the Create blob action.
I selected "Azure Blob Storage" as action in logic app and then selected my ADLSGen2 storage account name. it is working fine. Do you guys see any issue ??

Is possible to read an Azure Databricks table from Azure Data Factory?

I have a table in an Azure Databricks cluster, and I would like to replicate this data into an Azure SQL Database so that other users can analyze it from Metabase.
Is it possible to access Databricks tables through Azure Data Factory?
No, unfortunately not, at least not directly. Databricks tables are typically only available while your job/session/cluster is running.
You would need to persist your Databricks table to some storage in order to access it. Change your Databricks job to dump the table to Blob storage as its final action (see the sketch below). In the next step of your Data Factory job, you can then read the dumped data from the storage account and process it further.
Another option may be Databricks Delta, although I have not tried this yet...
If you register the table in the Databricks Hive metastore, then ADF could read from it using the ODBC source in ADF, though this would require an integration runtime (IR).
Alternatively, you could write the table to external storage such as Blob or the lake. ADF can then read that file and push it to your SQL database.
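For the "dump the table to storage as the notebook's last step" approach, the Databricks side can be as small as the sketch below (PySpark in a notebook cell; the table name and storage path are placeholders). An ADF copy activity can then read the Parquet output and load it into Azure SQL Database for Metabase.

```python
# Sketch (Databricks notebook cell): persist a table to ADLS Gen2 as Parquet so ADF
# can read it and push it on to Azure SQL Database. Table and storage names are placeholders;
# `spark` is the SparkSession that Databricks provides in every notebook.
output_path = "abfss://export@mydatalakegen2.dfs.core.windows.net/metabase/my_table"

(
    spark.table("my_table")      # the Databricks table to replicate
         .write
         .mode("overwrite")
         .parquet(output_path)   # ADF's copy activity reads this folder as its source
)
```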
