Calling Azure AD secured functions from Azure Data Factory

I recently secured my Azure Functions with Azure Active Directory, so callers now need an access token in the Authorization header. I can do that successfully from my front-end Angular apps, but my backend also uses Azure Data Factory. How can I make Azure Data Factory authenticate with Azure AD when calling the functions, instead of using the host key?

You can use the Web activity to get the bearer token and then pass it to the subsequent calls.

Azure Data Factory supports Managed Identity:
Managed identity for Data Factory
When creating a data factory, a managed identity can be created along with it. The managed identity is a managed application registered in Azure Active Directory and represents this specific data factory.
The managed identity for Data Factory is used by the following features:
Storing credentials in Azure Key Vault, in which case the data factory's managed identity is used for Key Vault authentication.
Connectors including Azure Blob storage, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure SQL Database, and Azure SQL Data Warehouse.
Web activity (a sketch of the equivalent token flow is shown below).
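To make the flow concrete, here is a minimal Python sketch of what the Web activity does under the hood: obtain an Azure AD token for the app registration that protects the function, then call the function with a bearer header instead of a host key. The app ID URI, function URL, and use of the azure-identity package are illustrative assumptions; inside Data Factory itself you would configure the Web activity's managed identity (MSI) authentication rather than run code.

import requests
from azure.identity import DefaultAzureCredential  # resolves to the managed identity when one is available

# <app-id-uri> is a placeholder for the Application ID URI of the AAD app protecting the function.
credential = DefaultAzureCredential()
token = credential.get_token("<app-id-uri>/.default")

# Call the AAD-secured function with a bearer token rather than a host key.
response = requests.get(
    "https://<function-app-name>.azurewebsites.net/api/<function-name>",
    headers={"Authorization": f"Bearer {token.token}"})
response.raise_for_status()
print(response.text)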

Related

How to access ADLS blob containers from Databricks using User Assigned Identity

I have an ADLS storage account with blob containers. I have successfully mounted ADLS in Databricks with a service principal and can run my transformations on the data.
Now I'm in the process of moving to user-assigned managed identities to avoid keeping secrets in my code. For this, I have created the required managed identity and enabled it by assigning the necessary role on the storage account.
My question is: how can I use the managed identity, or how can I run my transformations on the ADLS storage from Databricks, without mounting or using secrets?
Please suggest a working solution or a helpful forum for the same.
Thanks.
You can authenticate automatically to Azure Data Lake Storage Gen1 (ADLS Gen1) and Azure Data Lake Storage Gen2 (ADLS Gen2) from Azure Databricks clusters using the same Azure Active Directory (Azure AD) identity that you use to log into Azure Databricks. When you enable Azure Data Lake Storage credential passthrough for your cluster, commands that you run on that cluster can read and write data in Azure Data Lake Storage without requiring you to configure service principal credentials for access to storage.
Enable Azure Data Lake Storage credential passthrough for a High Concurrency cluster
High concurrency clusters can be shared by multiple users. They support only Python and SQL with Azure Data Lake Storage credential passthrough.
When you create a cluster, set Cluster Mode to High Concurrency.
Under Advanced Options, select Enable credential passthrough for user-level data access and only allow Python and SQL commands.
Enable Azure Data Lake Storage credential passthrough for a Standard cluster
When you create a cluster, set the Cluster Mode to Standard.
Under Advanced Options, select Enable credential passthrough for user-level data access and select the user name from the Single User Access drop-down.
Access Azure Data Lake Storage directly using credential passthrough
After configuring Azure Data Lake Storage credential passthrough and creating storage containers, you can access data directly in Azure Data Lake Storage Gen1 using an adl:// path and Azure Data Lake Storage Gen2 using an abfss:// path.
Example (Python):
spark.read.csv("adl://<storage-account-name>.azuredatalakestore.net/MyData.csv").collect()
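For an ADLS Gen2 account, the equivalent read uses an abfss:// path; the file system and account names below are placeholders:

spark.read.csv("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/MyData.csv").collect()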
Refer to this official documentation: Access Azure Data Lake Storage using Azure Active Directory credential passthrough

Source linked service should not have "service principal" as authentication method

I am trying to copy data from Azure Data Lake Gen2 to Azure Synapse (SQL Data Warehouse) through Azure Data Factory. Following are some details:
source (ADLS) linked service authentication type: service principal
sink (Synapse) linked service authentication type: managed identity
copy method selected: PolyBase
While validating, I am getting this error: "Source linked service should not have authentication method as Service principal".
When I select the "bulk insert" copy method, it works fine. Can anyone help me understand this? Is it written anywhere that for PolyBase both linked services should use the same authentication type?
This is because direct copy using PolyBase from Azure Data Lake Gen2 only supports account key authentication or managed identity authentication. You can refer to this documentation.
So if you want a direct copy using PolyBase, you need to change your source authentication method to account key or managed identity.
There is a workaround: staged copy using PolyBase. You can refer to this documentation about it.

Removing Secrets from Azure Function Config

Like most Azure Functions, in the beginning we have a connection string to the associated storage account that includes the account key, like this:
DefaultEndpointsProtocol=https;AccountName=ourstorageAccount;EndpointSuffix=core.windows.net;AccountKey=WQfbn+VBhaY1fi/l0eRBzvAvngiCiOwPmx/==
We obviously want to remove that account key. I had hoped we could use a managed identity and the 'Contributor' role, but what I am reading tells me you cannot use a managed identity to access tables in a storage account, only blobs.
I know we could move the whole connection string to Key Vault, but that just becomes an Azure management issue if we want to rotate the keys.
Has anyone successfully controlled access to Azure Table Storage with managed identities?
If not, what is the next best approach, preferably one that allows for simple rotation of keys?
Has anyone successfully controlled access to Azure Table Storage with Managed Identities?
Definitely it is not possible to access Azure Table Storage with MSI (managed identity, which is essentially a service principal in Azure AD). When using MSI to access an Azure resource, it essentially uses the Azure AD client credentials flow to get a token, then uses that token to access the resource.
However, Azure AD auth is currently supported only by Azure Blob and Queue storage; Table storage doesn't support it. See: Authorize access to blobs and queues using Azure Active Directory.
If not what is the next best approach that preferably allows for simple rotation of keys?
You could use an Azure Function to do that; follow this doc: Automate the rotation of a secret for resources with two sets of authentication credentials. I think it completely meets your requirement: the tutorial rotates Azure Storage account keys stored in Azure Key Vault as secrets, using a function triggered by an Azure Event Grid notification.
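If the rotation function keeps the current connection string (or key) as a Key Vault secret, the consuming code can simply read the secret at run time instead of storing it in app settings. A minimal Python sketch, assuming the azure-identity, azure-keyvault-secrets, and azure-data-tables packages and placeholder vault and secret names:

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
from azure.data.tables import TableServiceClient

# The function's managed identity needs "get" permission on secrets in this vault (placeholder name).
vault_url = "https://<your-key-vault-name>.vault.azure.net"
secrets = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())

# "StorageConnectionString" is a placeholder secret name that the rotation function keeps up to date.
connection_string = secrets.get_secret("StorageConnectionString").value

# Build the table client from the freshly read connection string rather than from configuration.
table_service = TableServiceClient.from_connection_string(connection_string)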

How to mount Azure Data Lake Store on DBFS

I need to mount Azure Data Lake Store Gen1 data folders on the Azure Databricks File System (DBFS) using Azure service principal client credentials. Please help with this.
There are three ways of accessing Azure Data Lake Storage Gen1:
Pass your Azure Active Directory credentials, also known as credential passthrough.
Mount an Azure Data Lake Storage Gen1 filesystem to DBFS using a service principal and OAuth 2.0.
Use a service principal directly.
1. Pass your Azure Active Directory credentials, also known as credential passthrough:
You can authenticate automatically to Azure Data Lake Storage Gen1 from Azure Databricks clusters using the same Azure Active Directory (Azure AD) identity that you use to log into Azure Databricks. When you enable your cluster for Azure AD credential passthrough, commands that you run on that cluster will be able to read and write your data in Azure Data Lake Storage Gen1 without requiring you to configure service principal credentials for access to storage.
Enable Azure Data Lake Storage credential passthrough for a standard cluster
For complete setup and usage instructions, see Secure access to Azure Data Lake Storage using Azure Active Directory credential passthrough.
2. Mount an Azure Data Lake Storage Gen1 filesystem to DBFS using a service principal and OAuth 2.0.
Step 1: Create and grant permissions to a service principal
If your selected access method requires a service principal with adequate permissions, and you do not have one, follow these steps:
Create an Azure AD application and service principal that can access resources. Note the following properties:
application-id: An ID that uniquely identifies the client application.
directory-id: An ID that uniquely identifies the Azure AD instance.
service-credential: A string that the application uses to prove its identity.
Register the service principal, granting the correct role assignment, such as Contributor, on the Azure Data Lake Storage Gen1 account.
Step 2: Mount the Azure Data Lake Storage Gen1 resource using a service principal and OAuth 2.0
Python code:
configs = {"<prefix>.oauth2.access.token.provider.type": "ClientCredential",
"<prefix>.oauth2.client.id": "<application-id>",
"<prefix>.oauth2.credential": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name-for-service-credential>"),
"<prefix>.oauth2.refresh.url": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}
# Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
source = "adl://<storage-resource>.azuredatalakestore.net/<directory-name>",
mount_point = "/mnt/<mount-name>",
extra_configs = configs)
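Once mounted, the data is available under the mount point like any other DBFS path; the directory and file names below are placeholders:

# List the mounted folder and read a file through the mount point.
display(dbutils.fs.ls("/mnt/<mount-name>"))
df = spark.read.csv("/mnt/<mount-name>/<file-name>.csv", header=True)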
3. Access directly with Spark APIs using a service principal and OAuth 2.0
You can access an Azure Data Lake Storage Gen1 storage account directly (as opposed to mounting with DBFS) with OAuth 2.0 using the service principal.
Access using the DataFrame API:
To read from your Azure Data Lake Storage Gen1 account, you can configure Spark to use service credentials with the following snippet in your notebook:
spark.conf.set("<prefix>.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("<prefix>.oauth2.client.id", "<application-id>")
spark.conf.set("<prefix>.oauth2.credential","<key-name-for-service-credential>"))
spark.conf.set("<prefix>.oauth2.refresh.url", "https://login.microsoftonline.com/<directory-id>/oauth2/token")
Reference: Azure Databricks - Azure Data Lake Storage Gen1

Using service principal to access blob storage from Databricks

I followed Access an Azure Data Lake Storage Gen2 account directly with OAuth 2.0 using the Service Principal and want to achieve the same with a general-purpose v2 blob storage account (with the hierarchical filesystem disabled). Is it possible to get this working, or is authenticating with an access key or SAS the only way?
No, that is not possible as of now. OAuth bearer tokens are supported only for Azure Data Lake Storage Gen2 (with the hierarchical namespace enabled when creating the storage account). To access Azure Data Lake Storage Gen2 the ABFS driver is used:
abfss://<your-file-system-name>@<your-storage-account-name>.dfs.core.windows.net/
To access Blob Storage you use WASB:
wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net
which supports only access key or SAS token based access.
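For reference, a hedged sketch of the two access patterns in a Databricks notebook: the Gen2/ABFS path can use a service principal via OAuth client credentials, while the plain blob (wasbs) path falls back to an account key or SAS. All account, container, and secret-scope names are placeholders.

# ADLS Gen2 (hierarchical namespace enabled): service principal via OAuth client credentials.
spark.conf.set("fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net",
               dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key>"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net",
               "https://login.microsoftonline.com/<directory-id>/oauth2/token")
df = spark.read.csv("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/<path>.csv")

# Plain blob storage (no hierarchical namespace): account key (or SAS), not a service principal.
spark.conf.set("fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
               dbutils.secrets.get(scope="<scope-name>", key="<account-key-key>"))
df = spark.read.csv("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<path>.csv")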
