Azure Databricks: can't connect to Azure Data Lake Storage Gen2 - azure

I have a storage account kagsa1 with a container cont1 inside, and I need it to be accessible (mounted) via Databricks.
If I use the storage account key from Key Vault, it works correctly:
configs = {
  "fs.azure.account.key.kagsa1.blob.core.windows.net": dbutils.secrets.get(scope = "kv-db1", key = "storage-account-access-key")
}
dbutils.fs.mount(
  source = "wasbs://cont1@kagsa1.blob.core.windows.net",
  mount_point = "/mnt/cont1",
  extra_configs = configs)
dbutils.fs.ls("/mnt/cont1")
...but if I try to connect using Azure Active Directory credentials:
configs = {
  "fs.azure.account.auth.type": "CustomAccessToken",
  "fs.azure.account.custom.token.provider.class": spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}
dbutils.fs.ls("abfss://cont1@kagsa1.dfs.core.windows.net/")
...it fails:
ExecutionError: An error occurred while calling z:com.databricks.backend.daemon.dbutils.FSUtils.ls.
: GET https://kagsa1.dfs.core.windows.net/cont1?resource=filesystem&maxResults=5000&timeout=90&recursive=false
StatusCode=403
StatusDescription=This request is not authorized to perform this operation using this permission.
ErrorCode=AuthorizationPermissionMismatch
ErrorMessage=This request is not authorized to perform this operation using this permission.
Databricks Workspace tier is Premium,
Cluster has Azure Data Lake Storage Credential Passthrough option enabled,
Storage account has hierarchical namespace option enabled,
Filesystem was initialized with
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
dbutils.fs.ls("abfss://cont1@kagsa1.dfs.core.windows.net/")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")
and I have full access to the container in the storage account.
What am I doing wrong?

Note: When performing the steps in the "Assign the application to a role" section, make sure to assign the Storage Blob Data Contributor role to the service principal.
As part of the repro, I granted the Owner permission to the service principal and tried to run dbutils.fs.ls("/mnt/azure/"); it returned the same error message as above.
I then assigned the Storage Blob Data Contributor role to the service principal.
After that, the command finally produced output without any error message.
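Once the role is in place, the passthrough configuration from the question can also be used to mount the container rather than only list it. A minimal sketch, reusing the names from the question (it assumes a Premium workspace and a cluster with credential passthrough enabled, as described above):
# Sketch: mount cont1 with ADLS credential passthrough (names taken from the question).
configs = {
  "fs.azure.account.auth.type": "CustomAccessToken",
  "fs.azure.account.custom.token.provider.class": spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}
dbutils.fs.mount(
  source = "abfss://cont1@kagsa1.dfs.core.windows.net/",
  mount_point = "/mnt/cont1",
  extra_configs = configs)
dbutils.fs.ls("/mnt/cont1")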
For more details, refer “Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark”.
Reference: Azure Databricks - ADLS Gen2 throws 403 error message.

Related

Is the Databricks access connector a trusted resource in Azure?

I'm trying to set up an external location for Unity Catalog. The connection test succeeds against a storage account whose access is limited to selected VNets and IPs, but I'm getting a 403 error when accessing the storage from a notebook, even after adding Storage Blob Data Contributor access to the managed identity. Did I miss anything?
My assumption was that, since I added the access connector to the trusted resources, it would bypass the network rules.
Databricks throwing a 403 error
The main reason for a 403 error is an authorization issue when accessing the Azure storage account. To avoid access-related issues, when you assign the application to a role, make sure to assign the Storage Blob Data Contributor role to the service principal.
You only need the Storage Blob Data Contributor role specified on your storage account for the service principal. To assign the Storage Blob Data Contributor role using the portal, follow this link.
I have created a demt1 storage account for the demo. Open Access control (IAM) -> Role assignments.
![enter image description here](https://i.imgur.com/a140fKd.png)
Under Role assignments, select the Storage Blob Data Contributor assignment created earlier.
To check whether the role is assigned, open Access control (IAM) -> Check access -> Check access,
and search for Databricks.
Under Current assignments, the assigned role will be listed.
Open the Databricks workspace and try to access the storage by mounting an existing container.
Additional Settings
Try adding the Databricks workspace's managed identity as a Storage Blob Data Contributor.
IAM for Databricks Managed Identity
You'll also want to add the relevant IAM conditional access, such as Read / Write permissions.
Conditional Access for Managed Identity
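Once the role assignment is in place and the access connector is allowed through the storage firewall, access can be verified directly from a notebook against the path backing the external location. A minimal sketch, with placeholder container and storage account names:
# Sketch: verify the external location's underlying path is reachable from a notebook.
# <container> and <storage-account> are placeholders for your own names.
files = dbutils.fs.ls("abfss://<container>@<storage-account>.dfs.core.windows.net/")
display(files)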

I am unable to mount ADLS Gen2. Please assist

Code to mount ADLS Gen2:
Error while mounting ADLS Gen2:
The error message clearly says that the provided tenant was not found in your subscription. Please make sure the tenant ID is available in your subscription.
Prerequisites:
Create and grant permissions to a service principal. If your selected access method requires a service principal with adequate permissions and you do not have one, follow these steps:
Step 1: Create an Azure AD application and service principal that can access resources. Note the following properties:
application-id: An ID that uniquely identifies the application.
directory-id: An ID that uniquely identifies the Azure AD instance.
service-credential: A string that the application uses to prove its identity.
storage-account-name: The name of the storage account.
filesystem-name: The name of the filesystem.
Step 2: Register the service principal, granting it the correct role assignment, such as the Storage Blob Data Contributor role, on the Azure Data Lake Storage Gen2 account.
Step 3: Mount the Azure Data Lake Storage Gen2 filesystem by passing the values directly.
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "06ecxx-xxxxx-xxxx-xxx-xxxef",  # Enter <appId> = Application ID
           "fs.azure.account.oauth2.client.secret": "1i_7Q-XXXXXXXXXXXXXXXXXXgyC.Szg",  # Enter <password> = client secret created in AAD
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/72f98-xxxx-xxxx-xxx-xx47/oauth2/token",  # Enter <tenant> = Tenant ID
           "fs.azure.createRemoteFileSystemDuringInitialization": "true"}
dbutils.fs.mount(
  source = "abfss://<filesystem>@<StorageName>.dfs.core.windows.net/<Directory>",  # Enter <container-name> = filesystem name, <storage-account-name> = storage name
  mount_point = "/mnt/<mountname>",
  extra_configs = configs)
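Once the mount succeeds, the container can be read through the mount point like any other DBFS path. A brief sketch, where the file name is hypothetical:
# Sketch: read a file through the mount point created above (the file path is hypothetical).
df = spark.read.option("header", "true").csv("/mnt/<mountname>/sample.csv")
df.show()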
For more details, refer to Azure Databricks - Azure Data Lake Storage Gen2.

Accessing ADLS Gen 2 storage from Databricks

I am trying to read files in ADLS Gen 2 Storage from Databricks Notebook using Python.
The storage container, however, has its public access level set to "Private".
I have Storage Account Contributor and Storage Blob Data Contributor access.
How can Databricks be allowed to read and write to ADLS storage?
According to the information you provided, the Storage Account Contributor role has been assigned to your account, so you have permission to retrieve the storage account access key. We can therefore use the access key to authenticate and then read and write to ADLS Gen2 storage. For more details, please refer here.
For example:
spark.conf.set(
  "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
  dbutils.secrets.get(scope="<scope-name>", key="<storage-account-access-key-name>"))
dbutils.fs.ls("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>")
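With the account key set in the Spark configuration, reads and writes use the same abfss:// URIs. A short sketch, with hypothetical directory names:
# Sketch: read and write via abfss:// once the account key is configured (paths are hypothetical).
df = spark.read.parquet("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>/input")
df.write.mode("overwrite").parquet("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>/output")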

unable to create mount point in databricks for adls gen 2

I am trying to create a mount point to ADLS Gen2 using Key Vault in Databricks, but I am not able to do so because of an error I am getting.
I have Contributor access, and I tried giving the SPN both Storage Blob Data Contributor and Contributor access, but I am still not able to create the mount point.
I would appreciate some help.
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "abcdefgh",
           "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="myscope", key="mykey"),
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/tenantid/oauth2/token",
           "fs.azure.createRemoteFileSystemDuringInitialization": "true"}
dbutils.fs.mount(
  source = "abfss://cont1@storageaccount.dfs.core.windows.net/",
  mount_point = "/mnt/cont1",
  extra_configs = configs)
The error I am getting is:
An error occurred while calling o280.mount.
: HEAD https://storageaccount.dfs.core.windows.net/cont1?resource=filesystem&timeout=90
StatusCode=403
StatusDescription=This request is not authorized to perform this operation.
When performing the steps in the "Assign the application to a role" section, make sure that your user account has the Storage Blob Data Contributor role assigned to it.
Repro: I granted the Owner permission to the service principal and tried to run dbutils.fs.ls("/mnt/azure/"); it returned the same error message as above.
Solution: Assign the Storage Blob Data Contributor role to the service principal.
After assigning the Storage Blob Data Contributor role to the service principal, the command finally returned output without any error message.
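If earlier attempts left a stale mount behind, it can also help to check the existing mounts and unmount before retrying. A small sketch using the mount point from the question:
# Sketch: inspect existing mounts and remove a stale one before retrying.
if any(m.mountPoint == "/mnt/cont1" for m in dbutils.fs.mounts()):
    dbutils.fs.unmount("/mnt/cont1")
display(dbutils.fs.mounts())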
For more details, refer “Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark”.

Azure Data Factory: map reduce activity fails when Linked Service configured with Service Principal to access Storage Account

We have an issue configuring the MapReduce activity in ADF.
If we use a Linked Service with Service Principal as the authentication method, it causes the following issue: Activity Selector failed: The storage connection string is invalid.
The Linked Service's test connection is successful.
MapReduce activity allows browsing the storage.
The Service Principal has sufficient rights assigned in the Storage Account's resource group: "Storage Blob Data Owner".
When the Linked Service is configured with an Account Key, everything works correctly.
