Databricks: How to solve DBFS DOWN?

I use Azure Databricks. In a notebook I tried to do:
dbutils.fs.ls("dbfs:/")
And I have this error:
java.lang.RuntimeException: java.io.IOException: Failed to perform 'getMountFileState(forceRefresh=true)' for mounts after 3 attempts. Please, retry the operation.
Original exception: 'shaded.databricks.org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: This request is not authorized to perform this operation.
In Event logs in Databricks I see DBFS DOWN.
How can I solve the DBFS DOWN error?

One of the reasons for a DBFS DOWN error could be that the driver node is overloaded, for example if many users share the same cluster. To resolve that, you may need to increase driver memory, reduce the number of users on the cluster, etc.
But the real error here is This request is not authorized to perform this operation. That usually happens when access to the storage account from other networks isn't allowed: check the firewall rules for the storage account and add the Databricks workspace networks to the allowed ranges.
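If you want to confirm which storage account is being rejected before touching the firewall, a minimal sketch like the one below (assuming it runs in a Databricks notebook where dbutils is available) lists the configured mounts and tries each one:

# Minimal sketch for a Databricks notebook: list the configured mounts and
# try each one to see which storage account is rejecting requests.
for mount in dbutils.fs.mounts():
    try:
        dbutils.fs.ls(mount.mountPoint)
        print(f"OK      {mount.mountPoint} -> {mount.source}")
    except Exception as e:
        # A "This request is not authorized" message here usually points at
        # the storage account firewall rather than at DBFS itself.
        print(f"FAILED  {mount.mountPoint} -> {mount.source}: {e}")

Once the failing mount is identified, that is the storage account whose firewall rules need the Databricks workspace networks added.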

Related

Unable to replicate (two-way) objects in Azure Storage account

We are trying to achieve object replication in an Azure storage account. Currently we are able to replicate from source to destination, but not from destination back to source. What we want to achieve is that each region has its own storage account, and ours is a kind of blue/green deployment, so we need two-way replication. For example:
our Env1 storage replicates to the Env2 storage account, then we bring in an Env3 storage account which starts replicating from the Env2 storage account, and after that we scrap the Env1 storage account. I understand that this is currently not possible with Azure Storage. Is there an alternate PaaS service we can use?
I was thinking of a custom solution, like a Logic App or Function App, which might do the job. Is there any other way to achieve this?
Two-way replication requires the destination to be write-enabled. As per the docs, we have the following note:
“After you create the replication policy, write operations to the destination container aren't permitted. Any attempts to write to the destination container fail with error code 409 (Conflict). To write to a destination container for which a replication rule is configured, you must either delete the rule that is configured for that container, or remove the replication policy. Read and delete operations to the destination container are permitted when the replication policy is active.”
https://learn.microsoft.com/en-us/azure/storage/blobs/object-replication-overview#replication-rules
If it helps, you should be able to set up replication from B -> A on a different set of containers than the ones in the initial rule (see the sketch after the answers below).
That said, if the container used to receive the replica needs to be write-enabled in an active-active fashion, can you please submit detailed technical feedback on the feature and the associated opportunity? We are also collecting use cases and requests for active-active replication.
https://feedback.azure.com/d365community
Currently, two-way replication is not possible. You have one storage account which acts as the driver, and it is replicated to the other storage accounts where replication is enabled.
It is currently not possible to have replication happen from storage account 1 to 2 and vice versa; it always happens from storage account 1 to 2.
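For the suggestion above (a one-way policy in each direction, on different container pairs), a rough illustration with the azure-mgmt-storage Python SDK could look like the following. All account, container and resource group names are hypothetical, and exact model or parameter names may differ between SDK versions:

# Hedged sketch (azure-mgmt-storage): two independent one-way policies on
# different container pairs, since a replication destination container is
# read-only while its policy is active.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import (
    ObjectReplicationPolicy,
    ObjectReplicationPolicyRule,
)

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

def one_way(rg, src_account, dst_account, src_container, dst_container):
    policy = ObjectReplicationPolicy(
        source_account=src_account,
        destination_account=dst_account,
        rules=[ObjectReplicationPolicyRule(
            source_container=src_container,
            destination_container=dst_container,
        )],
    )
    # Create the policy on the destination account first ("default" lets the
    # service assign a policy ID), then mirror it on the source account.
    created = client.object_replication_policies.create_or_update(
        rg, dst_account, "default", policy)
    client.object_replication_policies.create_or_update(
        rg, src_account, created.policy_id, created)

one_way("my-rg", "env1storage", "env2storage", "data-a", "data-a")  # Env1 -> Env2
one_way("my-rg", "env2storage", "env1storage", "data-b", "data-b")  # Env2 -> Env1

This gives replication in both directions, but only between disjoint container pairs; it is not active-active replication of the same container.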

Copy activity (from Cosmos SQL API to ADLS Gen2) failing in Synapse

I am trying to run a pipeline which copies data from Cosmos DB (SQL API) to ADLS Gen2 for multiple tables. A Lookup activity passes the list of queries, and the Copy activity runs within a ForEach, using a self-hosted IR.
However, it keeps failing after the 1st iteration with the error below:
Operation on target Copy data1_copy1 failed: Failure happened on 'Sink' side. ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Upload file failed at path tfs/OU Cosmos Data/LATAM/fact\dl-br-prod.,Source=Microsoft.DataTransfer.Common,''Type=Microsoft.Azure.Documents.RequestTimeoutException,Message=Request timed out.
Also, I'm sure it is not an issue with any one specific table, since I have tried passing the queries in different orders; in each attempt the first query completes successfully, and for the rest of the iterations the Copy activity runs for some time and eventually fails.
I have tried the following so far:
Running ForEach in sequential mode
Changing the block size (in MB) on the sink side to 20 MB (by default it is 100 MB)
Can you check the workaround suggested in the official MS docs, since this involves a self-hosted IR?
Request to Azure Data Lake Storage Gen2 account caused a timeout error
Cause: The issue is caused by the Azure Data Lake Storage Gen2 sink timeout error, which usually occurs on the Self-hosted Integration Runtime (IR) machine.
Recommendation:
Place your Self-hosted IR machine and target Azure Data Lake Storage Gen2 account in the same region, if possible. This can help avoid a random timeout error and produce better performance.
Check whether there's a special network setting, such as ExpressRoute, and ensure that the network has enough bandwidth. We suggest that you lower the Self-hosted IR concurrent jobs setting when the overall bandwidth is low. Doing so can help avoid network resource competition across multiple concurrent jobs.
If the file size is moderate or small, use a smaller block size for nonbinary copy to mitigate such a timeout error. For more information, see Blob Storage Put Block.
I was able to get a response from the Microsoft Cosmos product team:
Root cause:
The SDK client is configured with some timeout value and the request is taking longer than that.
The reason for the timeouts is an increase in Gateway latency (the Gateway has no latency SLA) due to the large result size. This is probably expected (more data takes longer to be read, sent, and received).
Resolution:
Increase the RequestTimeout used in the client.
The team owning the Synapse data transfer (which uses the .NET 2.5.1 SDK and owns the Microsoft.DataTransfer application) can increase the RequestTimeout used in the .NET SDK to a higher value. In newer SDK versions, this value is 65 seconds by default.
Though we have opted to bypass this route altogether and use either Synapse Link or a Private Endpoint instead.
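For anyone hitting the same RequestTimeoutException from their own client code (rather than from the Synapse-managed copy, where only Microsoft can change the .NET RequestTimeout), a hedged sketch of raising the client-side timeouts with the azure-cosmos Python SDK might look like this. The endpoint, key and names are hypothetical, and the timeout keyword arguments are azure-core transport options whose names may vary between SDK versions:

# Hedged sketch: raising client-side timeouts in the azure-cosmos Python SDK.
# Only useful when you control the client; the Synapse copy activity uses
# Microsoft's own .NET SDK internally.
from azure.cosmos import CosmosClient

client = CosmosClient(
    "https://<account>.documents.azure.com:443/",  # hypothetical endpoint
    credential="<account-key>",
    connection_timeout=60,   # seconds to establish a connection
    read_timeout=300,        # seconds to wait for a (large) response
)

container = client.get_database_client("<db>").get_container_client("<container>")
rows = list(container.query_items(
    query="SELECT * FROM c",
    enable_cross_partition_query=True,
))
print(len(rows))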

Databricks, Spark UI, SQL logs: Retrieve with REST API

Is it possible to retrieve Databricks Spark UI / SQL logs using the REST API? Is there any retention limit? I can't see any related API.
Note: Cluster > Advanced Options > Logging has not been set.
cluster_log_conf: The configuration for delivering Spark logs to a long-term storage destination. Only one destination can be specified for one cluster. If the conf is given, the logs will be delivered to the destination every 5 mins. The destination of driver logs is <destination>/<cluster-id>/driver, while the destination of executor logs is <destination>/<cluster-id>/executor.
Refer to the official documentation.
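There is no REST endpoint that returns the Spark UI or SQL logs directly, but once cluster_log_conf points at a DBFS destination, the delivered files can be listed and downloaded through the DBFS REST API. A minimal sketch, assuming a hypothetical workspace URL, token and log destination:

# Minimal sketch: list and read delivered driver logs via the DBFS REST API.
# Workspace URL, token and destination path are hypothetical.
import base64
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

# List the driver log files delivered for a given cluster
resp = requests.get(
    f"{HOST}/api/2.0/dbfs/list",
    headers=HEADERS,
    params={"path": "dbfs:/cluster-logs/<cluster-id>/driver"},
)
resp.raise_for_status()
for f in resp.json().get("files", []):
    print(f["path"], f["file_size"])

# Read one file (the DBFS read API returns base64-encoded chunks)
r = requests.get(
    f"{HOST}/api/2.0/dbfs/read",
    headers=HEADERS,
    params={"path": "dbfs:/cluster-logs/<cluster-id>/driver/log4j-active.log",
            "offset": 0, "length": 1_000_000},
)
print(base64.b64decode(r.json()["data"]).decode("utf-8", "replace")[:500])

The read API returns at most about 1 MB per call, so larger files need to be paged with offset/length.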

Azure Synapse: Target Spark pool specified in Spark job definition is not in succeeded state. Current state: Provisioning

I am trying to assign a few workspace packages to the Apache Spark pools in an Azure Synapse Analytics workspace. The corresponding wheel files were uploaded to the workspace package manager, and I am assigning them to the specific Spark pools. But when I apply these settings for the Spark pool, it says:
Target Spark pool specified in Spark job definition is not in succeeded state. Current state: Provisioning
Can someone guide me on how to overcome this error and successfully assign packages to my Spark pool?
This error message is displayed when the Spark pool resource takes longer than expected to provision. Once provisioning is completed, you shouldn't see this error. If you still see it, we recommend that you file a support ticket for deeper analysis.
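If the pool sits in Provisioning for a while, one option is simply to wait until it reaches Succeeded before applying the package assignment. A rough sketch with the azure-mgmt-synapse Python SDK, using hypothetical resource names (attribute names may differ by SDK version):

# Hedged sketch (azure-mgmt-synapse): wait for the Spark pool to leave the
# "Provisioning" state before pushing workspace package assignments.
import time
from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient

client = SynapseManagementClient(DefaultAzureCredential(), "<subscription-id>")

def wait_until_succeeded(rg, workspace, pool, timeout_s=1800, poll_s=30):
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        state = client.big_data_pools.get(rg, workspace, pool).provisioning_state
        print(f"{pool}: {state}")
        if state == "Succeeded":
            return True
        time.sleep(poll_s)
    return False

wait_until_succeeded("my-rg", "my-synapse-ws", "mysparkpool")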

Can't connect to Azure Data Lake Gen2 using PySpark and Databricks Connect

Recently, Databricks launched Databricks Connect that
allows you to write jobs using Spark native APIs and have them execute remotely on an Azure Databricks cluster instead of in the local Spark session.
It works fine except when I try to access files in Azure Data Lake Storage Gen2. When I execute this:
spark.read.json("abfss://...").count()
I get this error:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
Does anybody know how to fix this?
Further information:
databricks-connect version: 5.3.1
If you mount the storage rather than using a service principal, you should find this works: https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake-gen2.html
I posted some instructions around the limitations of databricks connect here. https://datathirst.net/blog/2019/3/7/databricks-connect-limitations
Likely too late, but for completeness' sake, there's one issue to look out for here. If you have this Spark conf set, you'll see that exact error (which is pretty hard to unpack):
fs.abfss.impl org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem
So double-check the Spark configs, and make sure you have the permissions to access ADLS Gen2 directly using the storage account access key.
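If that config is present, the answer suggests removing the fs.abfss.impl override and configuring direct access instead. A minimal sketch using a storage account access key; the account, container, secret scope and path below are hypothetical:

# Minimal sketch: direct access to ADLS Gen2 with a storage account access key.
# Do NOT set fs.abfss.impl yourself; Databricks wires up its own shaded driver.
spark.conf.set(
    "fs.azure.account.key.mystorageacct.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-account-key"),
)

df = spark.read.json("abfss://mycontainer@mystorageacct.dfs.core.windows.net/path/to/data")
print(df.count())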
