Job aborted when writing table using different cluster on Databricks

I have two clusters on Databricks, and I used one (cluster1) to write a table to the data lake. I need to use the other cluster (cluster2) to schedule the job in charge of writing this table. However, this error occurs:
Py4JJavaError: An error occurred while calling o344.saveAsTable.
: org.apache.spark.SparkException: Job aborted.
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 3740.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3740.0 (TID
113976, 10.246.144.215, executor 13): org.apache.hadoop.security.AccessControlException:
CREATE failed with error 0x83090aa2 (Forbidden. ACL verification failed. Either the
resource does not exist or the user is not authorized to perform the requested operation.).
[7974c88e-0300-4e1b-8f07-a635ad8637fb] failed with error 0x83090aa2 (Forbidden.
ACL verification failed. Either the resource does not exist or the user is not authorized
to perform the requested operation.).
From the "Caused by" message it seems that I do not have the authorization to write on the datalake, but if i change the table name it successfully write the df onto the datalake.
I am trying to write the table with the following command:
df.write \
    .format('delta') \
    .mode('overwrite') \
    .option('path', path) \
    .option('overwriteSchema', 'true') \
    .saveAsTable(table_name)
I tried dropping the table and rewriting it with cluster2, but this doesn't work; it is as if the location on the data lake is already occupied, and only cluster1 can write to that location.
In the past I simply changed the table name as a workaround, but this time I need to keep the old name.
How can I solve this? Why is the data lake location tied to the cluster I used to write the table?

The issue was caused by different Service Principals being used for the two clusters.
To solve the problem I had to drop the table and remove the path in the data lake using cluster1. Then I could write the table again using cluster2.
The command to delete the path is:
rm -r 'adl://path/to/table'
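For reference, the same cleanup can be done from a notebook cell (a minimal sketch, assuming a Python notebook on Databricks; the table name is illustrative and the path reuses the placeholder above):

# Run on cluster1, whose Service Principal owns the existing files.
spark.sql("DROP TABLE IF EXISTS my_table")   # hypothetical table name; drops the metastore entry
dbutils.fs.rm("adl://path/to/table", True)   # recursively delete the old location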

Related

Delta lake error on DeltaTable.forName in k8s cluster mode cannot assign instance of java.lang.invoke.SerializedLambda

I am trying to merge some data into a Delta table in a streaming application on k8s, using spark-submit in cluster mode.
I am getting the error below, but it works fine in k8s local mode and on my laptop; none of the Delta Lake operations work in k8s cluster mode.
Below are the library versions I am using. Is it some compatibility issue?
SPARK_VERSION_DEFAULT=3.3.0
HADOOP_VERSION_DEFAULT=3
HADOOP_AWS_VERSION_DEFAULT=3.3.1
AWS_SDK_BUNDLE_VERSION_DEFAULT=1.11.974
Below is the error message:
py4j.protocol.Py4JJavaError: An error occurred while calling o128.saveAsTable. : java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 4) (192.168.15.250 executor 2): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.sql.catalyst.expressions.ScalaUDF.f of type scala.Function1 in instance of org.apache.spark.sql.catalyst.expressions.ScalaUDF
I was finally able to resolve this issue. For some reason, dependent jars such as Delta and Kafka were not available on the executors, as per the SO answer below:
cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.sql.execution.datasources.v2.DataSourceRDD
I added the jars to the spark/jars folder via the Docker image and the issue got resolved.
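As an alternative to baking the jars into the image, the same dependencies can usually be pulled at startup via spark.jars.packages (a sketch only; the app name and Maven coordinates are assumptions for Spark 3.3 / Scala 2.12):

from pyspark.sql import SparkSession

# Sketch: fetch the Delta and Kafka jars at startup so they are shipped to the executors.
spark = (
    SparkSession.builder
    .appName("delta-streaming-merge")  # hypothetical app name
    .config("spark.jars.packages",
            "io.delta:delta-core_2.12:2.2.0,"
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)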

Spark error when trying to load new View from Power BI

I am using the Spark CLI service in Power BI, and it throws the error below when trying to load a view from Spark.
DataSource.Error: ODBC: ERROR [HY000] [Microsoft][Hardy] (35) Error from server: error code: '0' error message: 'org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2891.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2891.0 (TID 1227) (ip-XXX-XXX-XXX.compute.internal executor driver): java.io.FileNotFoundException: /tmp/blockmgr-51aefd41-4d64-49fb-93d0-10deca23cad3/03/temp_shuffle_39d969f9-b0af-4d4a-b476-b264eb18fd1c (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
The view returns data in the spark-sql CLI.
New tables work fine in the refresh; the error happens only with views.
I also verified the disk space; it is not full.
It seems it was a bug in spark-core:
https://issues.apache.org/jira/browse/SPARK-36500
Others have similar issues:
Spark - java IOException :Failed to create local dir in /tmp/blockmgr*
After some research, in my case the solution was to increase the executor memory.
In spark-defaults.conf:
spark.executor.memory 5g
Then restart.
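The same setting can also be passed when the session is created, before any executors are launched (a sketch in PySpark; it has no effect on an already-running cluster):

from pyspark.sql import SparkSession

# Sketch: equivalent of the spark-defaults.conf entry, applied at session startup.
spark = (
    SparkSession.builder
    .config("spark.executor.memory", "5g")
    .getOrCreate()
)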

Unable to fetch value from SNOWFLAKE_SAMPLE_DATA database in spark

I have a test account for Snowflake. I am able to fetch data from Python but unable to fetch it from PySpark. The error says something like "unable to create a stage for shared DB". How is the stage created in the Python Snowflake connector?
I think the Python connector creates a temp stage for the results in S3.
You can see that when you run the connector in debug mode.
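To watch the stage being created, the connector's logging can be turned up to DEBUG (a sketch; the connection values are placeholders):

import logging
import snowflake.connector

# Sketch: enable DEBUG logging so stage creation and result fetching are visible.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("snowflake.connector").setLevel(logging.DEBUG)

conn = snowflake.connector.connect(
    account="my_account",    # placeholder
    user="my_user",          # placeholder
    password="my_password",  # placeholder
)
cur = conn.cursor()
cur.execute("SELECT 1")
print(cur.fetchone())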

Spark Dataproc job failing due to unable to rename error in GCS

I have a Spark job which is failing due to the following error.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 34338.0 failed 4 times, most recent failure: Lost task 0.3 in stage 34338.0 (TID 61601, homeplus-cmp-transient-20190128165855-w-0.c.dh-homeplus-cmp-35920.internal, executor 80): java.io.IOException: Failed to rename FileStatus{path=gs://bucket/models/2018-01-30/model_0002002525030015/metadata/_temporary/0/_temporary/attempt_20190128173835_34338_m_000000_61601/part-00000; isDirectory=false; length=357; replication=3; blocksize=134217728; modification_time=1548697131902; access_time=1548697131902; owner=yarn; group=yarn; permission=rwx------; isSymlink=false} to gs://bucket/models/2018-01-30/model_0002002525030015/metadata/attempt_20190128173835_34338_m_000000_61601/attempt_20190128173835_34338_m_000000_61601/attempt_20190128173835_34338_m_000000_61601/attempt_20190128173835_34338_m_000000_61601/attempt_20190128173835_34338_m_000000_61601/attempt_20190128173835_34338_m_000000_61601/attempt_20190128173835_34338_m_000000_61601/part-00000
I'm unable to figure out what permission is missing; since the Spark job was able to write the temporary files, I'm assuming write permissions are already in place.
Per the OP's comment, the issue was in the permissions configuration:
So I figured out that I had only the Storage Legacy Owner role on the bucket. I added the Storage Admin role as well and that seemed to solve the issue. Thanks.

Spark - Cosmos - connector problems

I am playing around with the Azure Spark-CosmosDB connector, which lets you access Cosmos DB directly from a Spark cluster for analytics using Jupyter on HDInsight.
I have been following the steps described here, including uploading the required jars to Azure storage and executing the %%configure magic to prepare the environment.
But it always seems to terminate due to an I/O exception when trying to open the jar (see the YARN log below):
17/10/09 20:10:35 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.io.IOException: Error accessing /mnt/resource/hadoop/yarn/local/usercache/livy/appcache/application_1507534135641_0014/container_1507534135641_0014_01_000001/azure-cosmosdb-spark-0.0.3-SNAPSHOT.jar)
17/10/09 20:10:35 ERROR ApplicationMaster: RECEIVED SIGNAL TERM
17/10/09 20:10:35 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: java.io.IOException: Error accessing /mnt/resource/hadoop/yarn/local/usercache/livy/appcache/application_1507534135641_0014/container_1507534135641_0014_01_000001/azure-cosmosdb-spark-0.0.3-SNAPSHOT.jar)
Not sure whether this is related to the jar not being copied to the worker nodes.
Any idea? Thanks, Nick
