Is it possible to read an Azure Databricks table from Azure Data Factory? - azure

I have a table in an Azure Databricks cluster, and I would like to replicate this data into an Azure SQL Database so that other users can analyze it from Metabase.
Is it possible to access Databricks tables through Azure Data Factory?

No, unfortunately not. Databricks tables are typically temporary and last only as long as your job/session is running.
You would need to persist your Databricks table to some storage in order to access it. Change your Databricks job to dump the table to Blob storage as its final action. In the next step of your Data Factory pipeline, you can then read the dumped data from the storage account and process it further.
Another option may be Databricks Delta, although I have not tried this yet...

If you register the table in the Databricks Hive metastore, then ADF can read from it using the ODBC source in ADF, though this would require an integration runtime (IR).
Alternatively, you could write the table to external storage such as Blob storage or a data lake. ADF can then read that file and push it to your SQL database.
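The "write to external storage" route above hinges on Databricks and ADF agreeing on the same storage location. A minimal sketch of building that shared path; the storage account, container, and folder names below are placeholders, and the actual write would happen inside a Databricks notebook:

```python
def wasbs_url(storage_account: str, container: str, path: str) -> str:
    """Build the wasbs:// URL that both Databricks (for writing) and an
    ADF dataset (for reading) would point at. All names used with this
    helper are illustrative placeholders, not real resources."""
    return (
        f"wasbs://{container}@{storage_account}"
        f".blob.core.windows.net/{path.lstrip('/')}"
    )

# In a Databricks notebook you would then dump the table with something like:
#   spark.table("my_table").write.mode("overwrite").parquet(wasbs_url(...))
url = wasbs_url("mystorageacct", "staging", "/exports/my_table")
print(url)
# → wasbs://staging@mystorageacct.blob.core.windows.net/exports/my_table
```

An ADF dataset pointing at the same container and folder can then pick the files up and copy them into the SQL database.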

Related

Export data from Azure Synapse SQL

We have Azure Synapse with an external data source on Azure Data Lake Gen2.
We need to export T-SQL query results as a CSV file on a weekly schedule from Azure Synapse to any blob storage or FTP. I could not find documentation about exporting from Synapse. Please guide me through this - I've been stuck here for a long time.
Per this answer, I think the answer is:
Make a Data Flow where:
- the source is the Synapse database and you pass the query you want
- the sink is a CSV in ADLS Gen2
Then make an ADF pipeline with a weekly schedule trigger that calls your Data Flow.
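If you ever need a scripted fallback to the Data Flow, the export reduces to "run the query, serialize the rows as CSV, upload the file". A minimal sketch of the CSV half using only the standard library; the rows here are made up, and in practice they would come from a database cursor (e.g. pyodbc) against Synapse:

```python
import csv
import io

def rows_to_csv(headers, rows) -> str:
    """Serialize query results to CSV text, ready to upload to blob storage."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(headers)
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical result set standing in for a T-SQL query's output.
csv_text = rows_to_csv(["id", "name"], [(1, "alpha"), (2, "beta")])
print(csv_text)
```

The resulting string can be written to a file and pushed to blob storage or FTP on whatever schedule the pipeline runs.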

Is it possible to access Databricks DBFS from Azure Data Factory?

I am trying to use the Copy Data Activity to copy data from Databricks DBFS to another place on the DBFS, but I am not sure if this is possible.
When I select Azure Delta Storage as a dataset source or sink, I am able to access the tables in the cluster and preview the data, but validation fails saying the tables are not Delta tables (which they aren't, but I don't seem to be able to access the persistent data on DBFS either).
Furthermore, what I want to access is the DBFS, not the cluster tables. Is there an option for this?

Connection between Azure Data Factory and Databricks

I'm wondering what the most appropriate way of accessing Databricks from Azure Data Factory is.
Currently I've got Databricks as a linked service, to which I gain access via a generated token.
What do you want to do?
Do you want to trigger a Databricks notebook from ADF?
Do you want to supply Databricks with data? (blob storage or Azure Data Lake store)
Do you want to retrieve data from Databricks? (blob storage or Azure Data Lake store)

Azure Data Factory: Moving data from Table Storage to SQL Azure

While moving data from Table Storage to SQL Azure, is it possible to obtain only the delta (the data that hasn't already been moved) using Azure Data Factory?
A more detailed explanation:
There is an Azure Storage Table which contains some data that is updated periodically, and I want to create a Data Factory pipeline that moves this data to an Azure SQL Database. But during each run I only want the newly added data to be written to the SQL DB. Is this possible with Azure Data Factory?
See more information on azureTableSourceQuery and the copy activity at this link: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-table-connector/#azure-table-copy-activity-type-properties.
Also see this link for invoking a stored procedure for the SQL sink: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-sql-connector/#invoking-stored-procedure-for-sql-sink
You can filter on the Timestamp property on each run to achieve something similar to a delta copy, but this is not a true delta copy.
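The timestamp-based approach amounts to building an OData filter on the table's Timestamp system property and passing it via azureTableSourceQuery. A small sketch; the cutoff would come from the last successful pipeline run, and the date used here is arbitrary:

```python
from datetime import datetime, timezone

def incremental_filter(last_run: datetime) -> str:
    """Build the OData filter string an azureTableSourceQuery could use
    to fetch only entities modified after the previous copy run."""
    stamp = last_run.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"Timestamp gt datetime'{stamp}'"

cutoff = datetime(2016, 1, 1, tzinfo=timezone.utc)
print(incremental_filter(cutoff))
# → Timestamp gt datetime'2016-01-01T00:00:00Z'
```

Note that filtering on Timestamp picks up inserts and updates since the cutoff but never sees deletions, which is one reason this is not a true delta copy.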

How to read/write data from/to Azure Table using Hadoop?

I'd like to have a Hadoop job which reads data from Azure Table storage and writes data back into it. How can I do that?
I'm mostly interested in writing data into Azure tables from HDInsight.
