Dynamically get the database name from a Databricks DLT pipeline - databricks

How do I dynamically get the database name from a DLT pipeline?
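The thread does not contain an answer, but one common pattern is to pass the database name into the pipeline as a configuration value and read it from Spark conf inside the DLT notebook. A minimal sketch, assuming a pipeline configuration entry named mypipeline.target_db; the key name and table names are illustrative, not from the original post:

```python
import dlt

# Runs inside a Databricks DLT notebook, where `spark` is predefined.
# DLT exposes the pipeline's Configuration key/value pairs through spark.conf;
# "mypipeline.target_db" is an assumed key set under Pipeline settings -> Configuration.
target_db = spark.conf.get("mypipeline.target_db", "default")

@dlt.table(name="example_table", comment=f"Built against database {target_db}")
def example_table():
    # Use the database name wherever it is needed, e.g. to read an existing table.
    return spark.table(f"{target_db}.source_table")
```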

Related

Getting the Azure Synapse pipeline run ID when running a notebook

How can I get the Azure Synapse pipeline run ID using the Python SDK in a Synapse notebook? I have an ML pipeline that does a batch prediction every day. What I want to do is get the pipeline run ID and save it inside a DataFrame that is created in one of the notebooks I run in Synapse.
I'm running my pipeline using the following major activities:
Notebook: I'm using a Synapse notebook to read some Parquet files from a blob, preprocess them using pandas, and save them to another blob.
I saw that you can put a system variable with the pipeline run ID in the file name, but I want to know if there is a way to get the current pipeline run ID while my notebook is being executed, using the Azure Python SDK.
You can do this using the Toggle parameter cell option on a Synapse notebook cell.
Use the Toggle parameter cell option on a notebook cell, give the parameter any name, and assign it any default value (here I have used an empty string).
In the Notebook activity of the pipeline, add a Base parameter with the same name and data type. In the dynamic content of the parameter, enter @pipeline().RunId.
Execute the activity, then go to Monitor -> Pipeline runs -> your pipeline -> Notebook activity snapshot, and you can see the output of the notebook.
You can use this parameter in the notebook as per your requirement.
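As an illustration, a minimal sketch of the notebook side, assuming the parameter cell is named run_id; the storage path is hypothetical:

```python
# --- Parameter cell (enable "Toggle parameter cell" on it) ---
# The default value is overwritten at runtime by the Base parameter
# that the Notebook activity fills with @pipeline().RunId.
run_id = ""

# --- Regular cell: stamp the run ID onto the batch-prediction output ---
from pyspark.sql import functions as F

predictions_df = spark.read.parquet(
    "abfss://data@myaccount.dfs.core.windows.net/predictions/"  # hypothetical path
)
predictions_df = predictions_df.withColumn("pipeline_run_id", F.lit(run_id))
```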

DLT table created in target is not queryable

I've created a DLT table by specifying a database in the target, but while querying it through a SQL endpoint it throws the exception
'Failure to initialize configuration'
Please note that the source is defined as ADLS. The DLT table should be queryable.
The metastore is the hive_metastore catalog.

How to set environment variable for Azure Databricks cluster which is triggered from ADF

I have a Python script which checks whether the APP_HOME directory is set as an environment variable and picks up a few files from this directory to proceed with further execution.
If it is running on Windows, I set the environment variable pointing to the APP_HOME directory. If the Python script is created as a workflow in Databricks, the workflow gives me an option to set environment variables while choosing the cluster for the task.
But if the Python script runs as a Databricks Python activity from Azure Data Factory, I have not found an option to set the environment variable for the Databricks cluster that ADF will create. Is there a way to set the environment variable APP_HOME in ADF for the Databricks cluster when the Databricks Python activity is used?
I created a data factory and a pipeline inside it, and added Azure Databricks as a linked service.
Fill in the required fields; under the additional cluster settings of the linked service you will find the environment variable field.
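For completeness, a minimal sketch of the script side, assuming APP_HOME has been set through the linked service's cluster environment variables; the file pattern is illustrative:

```python
import os
from pathlib import Path

# APP_HOME is expected to be set on the job cluster via the ADF Databricks
# linked service (additional cluster settings -> environment variables).
app_home = os.environ.get("APP_HOME")
if app_home is None:
    raise RuntimeError("APP_HOME is not set on this cluster.")

# Pick up the files needed for further execution (pattern is illustrative).
config_files = sorted(Path(app_home).glob("*.conf"))
```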

How to reuse the Azure Data Factory pipeline for multiple users

I have an Azure Data Factory pipeline that calls a Databricks notebook.
I have parameterized the pipeline, and through it I am passing the product name to the Databricks notebook.
Based on the parameter, Databricks pushes the processed data into a specific ADLS directory.
Now the problem is: how do I make my pipeline aware of which parameter needs to be passed to Databricks?
Example: if I pass Nike via ADF to Databricks, the data gets pushed into the Nike directory; if I pass Adidas, the data gets pushed into the Adidas directory.
Please note that I am triggering ADF from an Automation account.
As I understand it, you are using product_name = dbutils.widgets.get('product_name') in the Databricks notebook to get the parameter, and based on that parameter you process the data (the notebook side is sketched below). The question is how to pass different parameters to the notebook. You create one ADF pipeline and pass different parameters through the triggers that execute it:
Create the ADF pipeline.
Create a trigger that passes the parameters to the ADF pipeline (one trigger per parameter value).
This way you will have one ADF pipeline with multiple instances of it carrying different parameters such as Adidas, Nike, etc.
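A minimal sketch of the notebook side, assuming the parameter arrives as a widget named product_name; the ADLS account and container names are placeholders:

```python
# Read the parameter passed from the ADF Notebook activity.
product_name = dbutils.widgets.get("product_name")

# Hypothetical source path; replace with the real input location.
processed_df = spark.read.parquet("abfss://raw@myaccount.dfs.core.windows.net/input/")

# Push the processed data into a product-specific directory, e.g. .../processed/Nike.
processed_df.write.mode("overwrite").parquet(
    f"abfss://curated@myaccount.dfs.core.windows.net/processed/{product_name}"
)
```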

PowerShell to download an ARM template for an Azure Data Factory pipeline

I have a requirement to create an ADF pipeline using an ARM template in PowerShell, and it has to take inputs from / validate a few things against an existing ADF pipeline. For that reason I have to download the ARM template for the existing ADF pipeline through PowerShell. Can we do that for a single ADF pipeline, or for multiple ones?
Note: the existing pipeline was not created through an ARM deployment, so I can't use Save-AzureRmDeploymentTemplate, as I don't have a deployment name when the pipeline is created through the portal.
Any help is really appreciated.
Maybe you want to take a look at Export-AzureRmResourceGroup.
But I guess you can only export the entire resource group, which may contain other things. You would need to put your data factory in a dedicated resource group if you only want to export the ADF.
Another way is to export the ARM template from the ADF UI.
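Since the other examples in this thread are Python, here is a hedged sketch of the same resource-group export done with the Azure SDK for Python (azure-identity + azure-mgmt-resource) rather than the PowerShell cmdlet; the subscription ID, resource group, and output file are placeholders, and like Export-AzureRmResourceGroup it exports the whole resource group:

```python
import json
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<subscription-id>"    # placeholder
resource_group = "<adf-resource-group>"  # placeholder

client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

# Export every resource in the group; "*" can be narrowed to a list of
# specific resource IDs if you only want the data factory resources.
poller = client.resource_groups.begin_export_template(
    resource_group,
    {"resources": ["*"], "options": "IncludeParameterDefaultValues"},
)
result = poller.result()

# Save the generated ARM template to disk for inspection/validation.
with open("adf_template.json", "w") as f:
    json.dump(result.template, f, indent=2)
```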
