I have storage account kagsa1 with container cont1 inside, and I need it to be accessible (mounted) via Databricks.
If I use the storage account key stored in Key Vault, it works correctly:
configs = {
  "fs.azure.account.key.kagsa1.blob.core.windows.net": dbutils.secrets.get(scope = "kv-db1", key = "storage-account-access-key")
}

dbutils.fs.mount(
  source = "wasbs://cont1@kagsa1.blob.core.windows.net",
  mount_point = "/mnt/cont1",
  extra_configs = configs)
dbutils.fs.ls("/mnt/cont1")
..but if I try to connect using Azure Active Directory credentials:
configs = {
  "fs.azure.account.auth.type": "CustomAccessToken",
  "fs.azure.account.custom.token.provider.class": spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}

dbutils.fs.ls("abfss://cont1@kagsa1.dfs.core.windows.net/")
..it fails:
ExecutionError: An error occurred while calling z:com.databricks.backend.daemon.dbutils.FSUtils.ls.
: GET https://kagsa1.dfs.core.windows.net/cont1?resource=filesystem&maxResults=5000&timeout=90&recursive=false
StatusCode=403
StatusDescription=This request is not authorized to perform this operation using this permission.
ErrorCode=AuthorizationPermissionMismatch
ErrorMessage=This request is not authorized to perform this operation using this permission.
Databricks workspace tier is Premium,
Cluster has Azure Data Lake Storage Credential Passthrough option enabled,
Storage account has hierarchical namespace option enabled,
Filesystem was initialized with
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
dbutils.fs.ls("abfss://cont1#kagsa1.dfs.core.windows.net/")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")
and I have full access to the container in the storage account.
What am I doing wrong?
Note: When performing the steps in the Assign the application to a role section of the tutorial, make sure to assign the Storage Blob Data Contributor role to the service principal.
As part of the repro, I granted the Owner role to the service principal and ran dbutils.fs.ls("/mnt/azure/"); it returned the same error message as above.
I then assigned the Storage Blob Data Contributor role to the service principal.
After assigning the Storage Blob Data Contributor role, I was finally able to get the output without any error message.
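Once the role assignment has propagated (Azure RBAC changes can take a few minutes to take effect), the same container can also be mounted rather than accessed directly. Below is a minimal sketch of a credential passthrough mount, reusing the container, account, and mount point names from the question; the cluster must still have the passthrough option enabled:
configs = {
  "fs.azure.account.auth.type": "CustomAccessToken",
  "fs.azure.account.custom.token.provider.class": spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}

# Mount the ADLS Gen2 container over abfss; the identity running the
# notebook needs Storage Blob Data Contributor (or Reader) on the account.
dbutils.fs.mount(
  source = "abfss://cont1@kagsa1.dfs.core.windows.net/",
  mount_point = "/mnt/cont1",
  extra_configs = configs)

dbutils.fs.ls("/mnt/cont1")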
For more details, refer to “Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark”.
Reference: Azure Databricks - ADLS Gen2 throws 403 error message.
Related
I'm trying to set up an external location for Unity Catalog. The connection test succeeds against a storage account whose access is limited to selected VNets and IPs, but I'm getting a 403 error when accessing the storage from a notebook, even after adding Storage Blob Data Contributor access to the managed identity. Did I miss anything?
My assumption was that, since I added the access connector to the trusted resources, it would bypass the network rules.
Databricks throwing 403 error
The main reason for the 403 error is an authorization issue when accessing the Azure storage account. To avoid access-related issues, when you assign the application to a role, make sure to assign the Storage Blob Data Contributor role to the service principal.
You only need the Storage Blob Data Contributor role specified on your storage account for your service principal. To assign the Storage Blob Data Contributor role using the portal, follow this link.
I have created a storage account named demt1 for this demo; open Access Control (IAM) -> Role assignments.

For more details, refer to Azure Databricks - Azure Data Lake Storage Gen2.
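Once the role is in place, a quick way to confirm access is to set the service principal's client credentials on the Spark session and list the container directly. This is only a minimal sketch; all of the bracketed names below are placeholders:
# Placeholder values throughout: substitute your own storage account,
# container, application (client) id, tenant id, and secret scope/key.
spark.conf.set("fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net",
    "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope-name>", key="<client-secret-key>"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# A successful listing confirms the role assignment has taken effect.
dbutils.fs.ls("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/")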
I am trying to read files in ADLS Gen2 storage from a Databricks notebook using Python.
The storage container, however, has its public access level set to "Private".
I have Storage Account Contributor and Storage Blob Data Contributor access.
How can Databricks be allowed to read from and write to the ADLS storage?
According to the information you provided, the Storage Account Contributor role has been assigned to your account, so you have permission to get the storage account access key. We can therefore use the access key for authentication and then read and write to ADLS Gen2 storage. For more details, please refer to here.
For example:
spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope-name>", key="<storage-account-access-key-name>"))

dbutils.fs.ls("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>")
I am trying to create a mount point to ADLS Gen2 using Key Vault in Databricks, but I am not able to do so because of an error I am getting.
I have Contributor access, and I also tried giving the SPN both Storage Blob Data Contributor and Contributor access, but I am still not able to create the mount point.
I would appreciate some help.
configs= {"fs.azure.account.auth.type":"OAuth",
"fs.azure.account.oauth.provider.type":"org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
"fs.azure.account.oauth2.client.id":"abcdefgh",
"fs.azure.account.oauth2.client.secret":dbutils.secrets.get(scope="myscope",key="mykey"),
"fs.azure.account.oauth2.client.endpoint":"https://login.microsoftonline.com/tenantid/oauth2/token",
"fs.azure.createRemoteFileSystemDuringInitialization": "true"}
dbutils.fs.mount(
source= "abfss://cont1#storageaccount.dfs.core.windows.net/",
mount_point="/mnt/cont1",
extra_configs=configs)
The error I am getting is:
An error occurred while calling o280.mount.
: HEAD https://storageaccount.dfs.core.windows.net/cont1?resource=filesystem&timeout=90
StatusCode=403
StatusDescription=This request is not authorized to perform this operation.
When performing the steps in the Assign the application to a role section, make sure to assign the Storage Blob Data Contributor role to the service principal.
Repro: I granted the Owner role to the service principal and ran dbutils.fs.ls("/mnt/azure/"); it returned the same error message as above.
Solution: I then assigned the Storage Blob Data Contributor role to the service principal.
After assigning the Storage Blob Data Contributor role, I was finally able to get the output without any error message.
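If an earlier failed attempt left a stale mount entry behind, it can also help to unmount before retrying. The snippet below is a minimal sketch that reuses the configs dictionary and the names from the question:
# Remove a stale mount entry, if any, before retrying the mount.
if any(m.mountPoint == "/mnt/cont1" for m in dbutils.fs.mounts()):
    dbutils.fs.unmount("/mnt/cont1")

# Retry the mount once the Storage Blob Data Contributor role is assigned.
dbutils.fs.mount(
    source="abfss://cont1@storageaccount.dfs.core.windows.net/",
    mount_point="/mnt/cont1",
    extra_configs=configs)

dbutils.fs.ls("/mnt/cont1")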
For more details, refer to “Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark”.
We have an issue configuring the MapReduce activity in ADF.
If we use a Linked Service with Service Principal as the authentication method, it causes the following error: "Activity Selector failed: The storage connection string is invalid."
The Linked Service's test connection is successful.
MapReduce activity allows browsing the storage.
The Service Principal has sufficient rights assigned in the Storage Account's resource group: "Storage Blob Data Owner".
When the Linked Service is configured with an Account Key, everything works correctly.