I am trying to create a mount point to ADLS Gen2 from Databricks, using a secret stored in Key Vault, but the mount fails with the error shown below.
I have Contributor access myself, and I have tried giving the SPN both the Storage Blob Data Contributor and the Contributor role, but I still cannot create the mount point.
Any help would be appreciated.
# OAuth settings for the service principal; the secret is read from the secret scope.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "abcdefgh",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="myscope", key="mykey"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/tenantid/oauth2/token",
    "fs.azure.createRemoteFileSystemDuringInitialization": "true",
}
# The container and the storage account are separated by "@" in an abfss URI.
dbutils.fs.mount(
    source="abfss://cont1@storageaccount.dfs.core.windows.net/",
    mount_point="/mnt/cont1",
    extra_configs=configs)
The error I am getting is:
An error occurred while calling o280.mount.
: HEAD https://storageaccount.dfs.core.windows.net/cont1?resource=filesystem&timeout=90
StatusCode=403
StatusDescription=This request is not authorized to perform this operation.
When performing the steps in the "Assign the application to a role" section, make sure that the Storage Blob Data Contributor role is assigned to the service principal.
Repro: I gave the service principal the Owner role and ran dbutils.fs.ls("mnt/azure/"); it returned the same error message as above. Owner is a management role and does not by itself grant access to the blob data.
Solution: I then assigned the Storage Blob Data Contributor role to the service principal.
After that, the command returned its output without any error message.
For more details, refer to “Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark”.
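For reference, here is a minimal sketch of how the mount settings from the question could be assembled once the role is in place. The helper names build_oauth_configs and abfss_source are hypothetical, not part of any Databricks API; the keys and the endpoint format are copied from the question. The optional fs.azure.createRemoteFileSystemDuringInitialization setting is left out, and the mount call itself is shown only as a comment because dbutils exists only inside a Databricks notebook.

```python
def build_oauth_configs(client_id, client_secret, tenant_id):
    """Return the extra_configs dict for a service principal (OAuth) mount."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def abfss_source(container, storage_account, path=""):
    """Build an abfss:// URI; container and account are joined with '@'."""
    return (f"abfss://{container}@{storage_account}"
            f".dfs.core.windows.net/{path.lstrip('/')}")


# Inside a Databricks notebook the two helpers would be used like this:
# dbutils.fs.mount(
#     source=abfss_source("cont1", "storageaccount"),
#     mount_point="/mnt/cont1",
#     extra_configs=build_oauth_configs(
#         "abcdefgh",
#         dbutils.secrets.get(scope="myscope", key="mykey"),
#         "tenantid"))
```

Keeping the URI construction in one place makes it harder to mistype the separator between the container and the account name.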
Related
I'm trying to set up an external location for Unity Catalog. Storage access is limited to selected VNets and IPs, and the connection test succeeded. However, I get a 403 error when accessing the storage from a notebook, even after granting blob contributor access to the managed identity. Did I miss anything?
My assumption is that, since I added the connector to the trusted resources, it will bypass the network rules.
Databricks throwing 403 error
The main cause of a 403 error is an authorization problem when accessing the Azure storage account. To avoid it, when following the "Assign the application to a role" steps, make sure to assign the Storage Blob Data Contributor role to the service principal.
The service principal only needs the Storage Blob Data Contributor role on the storage account. The role can be assigned in the Azure portal.
For this demo I created a storage account named demt1; open Access control -> Role assignments.
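As an alternative to the portal, the same assignment can be scripted. The sketch below only builds the scope string and the Azure CLI command line; it does not run anything. The function names and the subscription, resource group and assignee values are placeholders I made up for illustration, so substitute your own.

```python
import shlex

ROLE = "Storage Blob Data Contributor"


def storage_scope(subscription_id, resource_group, storage_account, container=None):
    """Return the resource ID of a storage account, or of one container in it."""
    scope = (f"/subscriptions/{subscription_id}"
             f"/resourceGroups/{resource_group}"
             f"/providers/Microsoft.Storage/storageAccounts/{storage_account}")
    if container:
        scope += f"/blobServices/default/containers/{container}"
    return scope


def role_assignment_command(assignee, scope, role=ROLE):
    """Return an 'az role assignment create' command line as a shell-safe string."""
    args = ["az", "role", "assignment", "create",
            "--assignee", assignee, "--role", role, "--scope", scope]
    return " ".join(shlex.quote(a) for a in args)


# Placeholder values: replace with the real subscription, resource group
# and the application (client) ID of the service principal.
scope = storage_scope("my-subscription-id", "my-resource-group", "demt1")
print(role_assignment_command("my-app-client-id", scope))
```

Scoping the role to a single container instead of the whole account is possible by passing the container argument. Role assignments can take a few minutes to propagate, so a retry shortly after assigning may still return 403.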
 via Databricks
If I use storage account key in KeyVault it works correctly:
# Authenticate with the storage account access key stored in the secret scope.
configs = {
    "fs.azure.account.key.kagsa1.blob.core.windows.net": dbutils.secrets.get(scope="kv-db1", key="storage-account-access-key")
}
dbutils.fs.mount(
    source="wasbs://cont1@kagsa1.blob.core.windows.net",
    mount_point="/mnt/cont1",
    extra_configs=configs)
dbutils.fs.ls("/mnt/cont1")
..but if I try to connect using Azure Active Directory credentials:
configs = {
    "fs.azure.account.auth.type": "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class": spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}
dbutils.fs.ls("abfss://cont1@kagsa1.dfs.core.windows.net/")
..it fails:
ExecutionError: An error occurred while calling z:com.databricks.backend.daemon.dbutils.FSUtils.ls.
: GET https://kagsa1.dfs.core.windows.net/cont1?resource=filesystem&maxResults=5000&timeout=90&recursive=false
StatusCode=403
StatusDescription=This request is not authorized to perform this operation using this permission.
ErrorCode=AuthorizationPermissionMismatch
ErrorMessage=This request is not authorized to perform this operation using this permission.
Databricks workspace tier is Premium,
Cluster has Azure Data Lake Storage Credential Passthrough option enabled,
Storage account has hierarchical namespace option enabled,
Filesystem was initialized with
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
dbutils.fs.ls("abfss://cont1@kagsa1.dfs.core.windows.net/")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")
and I have full access to the container in the storage account:
What am I doing wrong?
Note: When performing the steps in the "Assign the application to a role" section, make sure to assign the Storage Blob Data Contributor role to the service principal.
As part of the repro, I gave the service principal the Owner role and ran dbutils.fs.ls("mnt/azure/"); it returned the same error message as above.
I then assigned the Storage Blob Data Contributor role to the service principal.
After that, the command returned its output without any error message.
For more details, refer to “Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark”.
Reference: Azure Databricks - ADLS Gen2 throws 403 error message.
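When comparing 403 responses like the ones quoted in these threads, it can help to pull out the Key=Value fields (StatusCode, ErrorCode and so on) rather than reading the whole stack trace. The following is a small sketch of that idea; parse_abfs_error is a hypothetical helper written for this post, and the sample text is the error copied from the question above.

```python
def parse_abfs_error(text):
    """Extract Key=Value fields (StatusCode, ErrorCode, ...) from an ABFS error message."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        # Skip lines without '=' and lines such as the request URL, whose text
        # before the first '=' is not a plain field name.
        if sep and key.isidentifier():
            fields[key] = value
    return fields


sample = """ExecutionError: An error occurred while calling z:com.databricks.backend.daemon.dbutils.FSUtils.ls.
: GET https://kagsa1.dfs.core.windows.net/cont1?resource=filesystem&maxResults=5000&timeout=90&recursive=false
StatusCode=403
StatusDescription=This request is not authorized to perform this operation using this permission.
ErrorCode=AuthorizationPermissionMismatch
ErrorMessage=This request is not authorized to perform this operation using this permission."""

fields = parse_abfs_error(sample)
print(fields["StatusCode"], fields.get("ErrorCode"))
```

In the threads above, the 403 went away once the identity making the request had the Storage Blob Data Contributor role, so the role assignment is the first thing to check for this ErrorCode.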