Cannot connect Databricks with Azure blob storage - azure

I successfully mounted Azure Blob Storage using a SAS key with the code below:
dbutils.fs.mount(
    source = "wasbs://"+user+"@"+account+".blob.core.windows.net",
    mount_point = "/mnt/"+mountName,
    extra_configs = {"fs.azure.sas."+user+"."+account+".blob.core.windows.net": key})
However, I cannot save output to the blob or list the blob directory (both of which worked before).
Here is part of the error:
"shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details."
"Caused by: com.microsoft.azure.storage.StorageException: This request is not authorized to perform this operation using this resource type.
at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:305)
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:178)
at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109)"
Does anyone know the cause of this issue?
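The "resource type" wording in the error often points at an account SAS that was generated without all of the Service, Container, and Object resource types (listing a directory needs Container-level access). A minimal sketch of remounting with a regenerated SAS, assuming the same user (container), account, mountName, and key variables as in the question:
# Sketch, not a confirmed fix: unmount, then remount with a SAS regenerated to allow
# Service + Container + Object resource types and list/read/write permissions.
dbutils.fs.unmount("/mnt/" + mountName)

dbutils.fs.mount(
    source = "wasbs://" + user + "@" + account + ".blob.core.windows.net",
    mount_point = "/mnt/" + mountName,
    extra_configs = {"fs.azure.sas." + user + "." + account + ".blob.core.windows.net": key})

display(dbutils.fs.ls("/mnt/" + mountName))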

Related

Azure blob storage - SAS - Data Factory

I was able to test the blob connection successfully, but when I attempt to browse the storage path it shows the error below.
Full error:
Failed to load
Blob operation failed for: Blob Storage on container '' and path '/' get failed with 'The remote server returned an error: (403) Forbidden.'. Possible root causes: (1). Grant service principal or managed identity appropriate permissions to do copy. For source, at least the “Storage Blob Data Reader” role. For sink, at least the “Storage Blob Data Contributor” role. For more information, see https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage?tabs=data-factory#service-principal-authentication. (2). It's possible because some IP address ranges of Azure Data Factory are not allowed by your Azure Storage firewall settings. Azure Data Factory IP ranges please refer https://docs.microsoft.com/en-us/azure/data-factory/azure-integration-runtime-ip-addresses. If you allow trusted Microsoft services to access this storage account option in firewall, you must use https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage?tabs=data-factory#managed-identity. For more information on Azure Storage firewalls settings, see https://docs.microsoft.com/en-us/azure/storage/common/storage-network-security?tabs=azure-portal.. The remote server returned an error: (403) Forbidden.StorageExtendedMessage=Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
Context: I'm trying to copy data from a SQL database to Snowflake, and I am using Azure Data Factory for that. Since the pipeline doesn't publish without it, I enabled staged copy and connected Blob Storage as the staging area.
I already checked the networking settings, and the storage account allows access from all networks. I'm not sure what I'm missing here, because I found a YouTube video where this works, but they didn't run into an issue related or similar to this one: https://www.youtube.com/watch?v=5rLbBpu1f6E.
I also tried leaving the storage path empty, but the trigger for the copy data pipeline doesn't succeed either.
Full error from trigger:
Operation on target Copy Contacts failed: Failure happened on 'Sink' side. ErrorCode=FileForbidden,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error occurred when trying to upload a blob, detailed message: dbo.vw_Contacts.txt,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.WindowsAzure.Storage.StorageException,Message=The remote server returned an error: (403) Forbidden.,Source=Microsoft.WindowsAzure.Storage,StorageExtendedMessage=Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
I created a Blob Storage account and generated a SAS token for it. I then created a Blob Storage linked service using the SAS URI, and it was created successfully.
Image for reference:
When I tried to retrieve the path, I got the error below.
I changed the networking settings of the storage account by enabling access from all networks.
Image for reference:
I tried to retrieve the path again in Data Factory, and this time it worked; I was able to retrieve the path successfully.
Image for reference:
Alternatively, this issue can be resolved by whitelisting the Data Factory IP address ranges in the storage account firewall.
From the error message:
'The remote server returned an error: (403) Forbidden.'
It's likely the authentication method you're using doesn't have enough permissions on the blob storage to list the paths. I would recommend using the Managed Identity of the Data Factory to do this data transfer.
1. Take note of the name of the Data Factory.
2. Assign the Storage Blob Data Contributor role, scoped to the container or the storage account, to the ADF managed identity from step 1.
3. On your blob linked service inside Data Factory, choose the managed identity authentication method.
Also, if you stage your data transfer on the blob storage, you have to make sure that identity can write to the blob storage, and that it has the required bulk-load permissions on SQL Server. A quick way to sanity-check the storage roles is sketched below.
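Not part of the original answer, but one hedged way to verify that an identity actually has the required roles on the staging container is the Azure Storage Python SDK. The account URL and container name below are placeholders, and DefaultAzureCredential will use whatever identity is available where you run it (e.g. a service principal granted the same roles as the ADF managed identity):
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Sketch: list and write a test blob in the staging container.
service = BlobServiceClient(
    account_url="https://<storage-account-name>.blob.core.windows.net",
    credential=DefaultAzureCredential())
container = service.get_container_client("<staging-container>")

print([b.name for b in container.list_blobs()])  # needs at least Storage Blob Data Reader
container.upload_blob("adf-staging-check.txt", b"ok", overwrite=True)  # needs Storage Blob Data Contributor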

Access data from ADLS using Azure Databricks

I am trying to access data files stored in an ADLS location via Azure Databricks using storage account access keys.
To access the data files, I am using a Python notebook in Azure Databricks, and the command below works fine:
spark.conf.set(
"fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
"<access-key>"
)
However, when I try to list the directory using the command below, it throws an error:
dbutils.fs.ls("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net")
ERROR:
ExecutionError: An error occurred while calling z:com.databricks.backend.daemon.dbutils.FSUtils.ls.
: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, GET, https://<storage-account-name>.dfs.core.windows.net/<container-name>?upn=false&resource=filesystem&maxResults=500&timeout=90&recursive=false, AuthorizationPermissionMismatch, "This request is not authorized to perform this operation using this permission. RequestId:<request-id> Time:2021-08-03T08:53:28.0215432Z"
I am not sure what permission it requires or how to proceed.
Also, I am using ADLS Gen2 and Azure Databricks (trial, Premium tier).
Thanks in advance!
The complete config key is called "spark.hadoop.fs.azure.account.key.adlsiqdigital.dfs.core.windows.net"
However, for a production environment it would be beneficial to use a service principal and a mount point. That way, actions on the storage can be traced back to this application more easily than with the generic access key, and the mount point avoids specifying the connection string everywhere in your code.
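A minimal sketch of such a service-principal (OAuth) mount, assuming an app registration whose client ID, tenant ID, and secret (kept in a Databricks secret scope) are placeholders, and which has been granted a Storage Blob Data role on the account:
# Sketch: mount ADLS Gen2 with a service principal instead of the account key.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-client-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<client-secret-key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"}

dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point = "/mnt/<mount-name>",
    extra_configs = configs)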
Try this out.
spark.conf.set("fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net","<your-storage-account-access-key>")
dbutils.fs.mount(source = "abfss://<container-name>#<your-storage-account-name>.dfs.core.windows.net/", mount_point = "/mnt/test")
You can mount the ADLS storage account using the access key via Databricks and then read/write data. Please try the code below:
dbutils.fs.mount(
    source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
    mount_point = "/mnt/<mount-name>",
    extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})
dbutils.fs.ls("/mnt/<mount-name>")

getting error while mounting azure storage account with Databricks file system

I am new to Azure Databricks and to Spark in general. I am trying to mount my Azure storage on DBFS using the method below, but it gives the error shown. Can somebody please help me fix this? In the notebook, I have selected Scala as the language.
dbutils.fs.mount(
    source = "wasbs://rmwblobcontainer@rmwsa1.blob.core.windows.net/",
    mountPoint = "/mnt/mypath",
    extraConfigs = Map("fs.azure.account.key.rmwsa1.blob.core.windows.net" -> "{MX6BzXjcdIW+SJrvfocw8uFLT99Gs1aLtWBWkpQK7OyXIlctaoW1A/WQ9gBEGaxXcQ76FjEAI2hJGTiOQ6lCAA==}"))
Error --
shaded.databricks.org.apache.hadoop.fs.azure.AzureException:
java.lang.IllegalArgumentException: The String is not a valid Base64-encoded string
You will receive the error message "shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.lang.IllegalArgumentException: The String is not a valid Base64-encoded string" when there are extra characters in the Azure Storage access key.
Here the access key is wrapped in curly braces "{}" at the start and end; remove the curly brackets and rerun the cell.

save spark ML model in azure blobs

I tried saving my machine learning model in PySpark to Azure Blob Storage, but it gives an error.
lr.save('wasbs:///user/remoteuser/models/')
Illegal Argument Exception: Cannot initialize WASB file system, URI authority not recognized.
I also tried:
m = lr.save('wasbs://'+container_name+'@'+storage_account_name+'.blob.core.windows.net/models/')
But I get "unable to identify user identity" in the stack trace.
P.S.: I am not using Azure HDInsight, just Databricks and Azure Blob Storage.
To access Azure Blob Storage from Azure Databricks directly (not mounted), you have to set an account access key:
spark.conf.set(
"fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
"<your-storage-account-access-key>")
or a SAS for a container. Then you should be able to access the Blob Storage:
val df = spark.read.parquet("wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>")
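Applied to the original question, the save should then point at the fully qualified wasbs:// path rather than wasbs:/// (a PySpark sketch; lr is the asker's model, and the container, account, and path are placeholders):
# Sketch: set the account key for the session, then save to the full wasbs:// URI.
spark.conf.set(
    "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
    "<your-storage-account-access-key>")
lr.save("wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/models/lr")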

CannotVerifyCopySource, extended message: The specified blob does not exist. in Azure Data Factory

While connecting from Azure Data Factory to my Azure Blob Storage, I am getting the following error.
Error: The remote server returned an error: (404) Not Found., HTTP status code: 404, HTTP status message The specified blob does not exist., extended status code: CannotVerifyCopySource, extended message: The specified blob does not exist.
The error information shows the reason, The specified blob does not exist., which is listed in the Blob Service Error Codes.
My guess is that you were copying blobs that were also being used by another process, such as HDInsight (as your thread tag suggests). By the time Azure Data Factory was ready to perform the copy operation, the specified blob, being a temporary object, had already been deleted. Please check whether this is the case.
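Not from the original answer, but one quick way to confirm this scenario from Python is to check whether the source blob still exists right before the copy runs (the connection string, container, and blob name below are placeholders):
from azure.storage.blob import BlobClient

# Sketch: returns False if the blob was already deleted by another process
# (e.g. a temporary object cleaned up by HDInsight) before the copy started.
blob = BlobClient.from_connection_string(
    conn_str="<storage-connection-string>",
    container_name="<container-name>",
    blob_name="<path/to/source-blob>")
print(blob.exists())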
