The root scratch dir: /tmp/hive on HDFS should be writable (Spark app error)

I have created a Spark application that uses the Hive metastore, but when I run it, the line that creates an external Hive table fails with the following error (from the Spark driver logs):
Exception in thread "main" org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwxrwxr-x;
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:214)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwxrwxr-x
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:183)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:117)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
I run the application using the Spark operator for K8s.
So I checked the permissions of the directories on the driver pod of the Spark application:
ls -l /tmp
...
drwxrwxr-x 1 1001 1001 4096 Feb 22 16:47 hive
Changing the permissions has no effect.
I run the Hive metastore and HDFS in K8s as well.
How can this problem be fixed?

This is a common error. It can be fixed by creating a scratch directory somewhere else and pointing Spark to the new directory.
Step 1: Create a new directory called tmpops at /tmp/tmpops.
Step 2: Make the directory writable: chmod 777 /tmp/tmpops
Note: 777 is for local testing. If you are working with sensitive data, make sure access to this path is restricted (for example, through security groups) to avoid accidental data leakage and security loopholes.
Step 3: Add the following property to the hive-site.xml that the Spark application reads:
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/tmpops</value>
</property>
Once you do this, the error will no longer appear unless someone deletes that dir.
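A note on why rwxrwxr-x is rejected even though the owner and group can write: as far as I can tell from the Hive 1.2 code that Spark 2.x bundles, SessionState requires the root scratch dir to carry at least the 0733 bits (rwx-wx-wx), i.e. write and execute for group and others too, which is why 777 works. The path is also resolved against the default filesystem (fs.defaultFS), so if that points at HDFS, the permissions of the local /tmp/hive inside the driver pod do not matter. The shell sketch below reproduces that bitmask test on a symbolic permission string such as the one in the error message; perm_to_mode and hive_scratch_writable are hypothetical helper names, not Hive APIs.

```shell
#!/bin/sh
# Minimal sketch (assumption): mimics the Hive 1.2-era root scratch dir check,
# which requires at least the 0733 bits (rwx-wx-wx). Helper names are hypothetical.

# Convert a symbolic permission string such as "rwxrwxr-x" or "drwxrwxr-x"
# into its numeric mode, printed in decimal (e.g. rwxrwxr-x -> 509, i.e. 0775).
perm_to_mode() {
  perms=$1
  # Drop the leading file-type character ("d", "-", ...) if present.
  if [ "${#perms}" -eq 10 ]; then
    perms=${perms#?}
  fi
  mode=0
  rest=$perms
  for bit in 256 128 64 32 16 8 4 2 1; do
    c=${rest%"${rest#?}"}   # first character of $rest
    rest=${rest#?}          # remove that character
    case $c in
      -|S|T|'') ;;          # bit not set ("S"/"T": special bit set but no x)
      *) mode=$((mode + bit)) ;;
    esac
  done
  printf '%s\n' "$mode"
}

# Exit status 0 if the permissions include every bit of 0733.
hive_scratch_writable() {
  required=475              # 0733 in octal = rwx-wx-wx
  mode=$(perm_to_mode "$1")
  [ $((mode & required)) -eq "$required" ]
}
```

With it, hive_scratch_writable rwxrwxr-x fails because others lack the write bit, while rwxrwxrwx and rwx-wx-wx pass. If the directory lives on HDFS, the matching fix would be something like hdfs dfs -chmod 777 /tmp/hive, run as a user allowed to change that directory.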

I faced the same issue on Windows 10. The following steps solved my problem:
Open Command Prompt in admin mode
Run winutils.exe chmod 777 /tmp/hive
Start spark-shell --master local[2]

Related

Why does a Spark app with deploy-mode local use "ls -F" on Windows? [duplicate]

I get the following errors when starting spark-shell. I'm going to use Spark to process data in SQL Server. Can I ignore them?
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState'
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'
Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F C:\tmp\hive
Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F C:\tmp\hive
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F C:\tmp\hive
tl;dr You'd rather not.
Well, it may be possible, but given that you've only just started your journey into Spark's land, the effort would not pay off.
Windows has never been a developer-friendly OS to me, and whenever I teach Spark to people who use Windows, I take it for granted that we'll have to go through the winutils.exe setup, and often also how to work on the command line.
Please install winutils.exe as follows:
Run cmd as administrator
Download winutils.exe binary from https://github.com/steveloughran/winutils repository (use hadoop-2.7.1 for Spark 2)
Save winutils.exe binary to a directory of your choice, e.g. c:\hadoop\bin
Set HADOOP_HOME to reflect the directory with winutils.exe (without bin), e.g. set HADOOP_HOME=c:\hadoop
Set PATH environment variable to include %HADOOP_HOME%\bin
Create c:\tmp\hive directory
Execute winutils.exe chmod -R 777 \tmp\hive
Open spark-shell and run spark.range(1).show to see a one-row dataset.

Unable to start bin/dse spark-sql. File not exception /tmp/hive

I'm trying to run the following command on DSE Cassandra:
dse$ bin/dse spark-sql
It gives the following error:
2018-05-24 16:59:41 [main] ERROR o.a.s.d.DseSparkSubmitBootstrapper - Failed to start or submit Spark application - see details in the log file(s): /home/aditya/.spark-sql-shell.log
java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwxrwxr-x
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522) ~[hive-exec-1.2.1.spark2.jar:1.2.1.spark2]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:114) ~[spark-hive-thriftserver_2.11-2.0.2.16.jar:2.0.2.16]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) ~[spark-hive-thriftserver_2.11-2.0.2.16.jar:2.0.2.16]
I don't understand whether this is a permission issue or something else, because the directory has all permissions.
Thanks,
I solved my issue. I was not starting Cassandra in Analytics mode, so if you face this problem, make sure you have started Cassandra in Analytics mode with:
bin/dse cassandra -k
Thanks,

New Spark StreamingContext fails with HDFS errors

I'm using DC/OS installed via Azure ACS, and I installed HDFS and Spark with the dcos tool using the default options.
Creating a Spark StreamingContext gives:
16/07/22 01:51:04 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn1. Check your hdfs-site.xml file to ensure namenodes are configured properly.
16/07/22 01:51:04 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn2. Check your hdfs-site.xml file to ensure namenodes are configured properly.
Exception in thread "main" java.lang.IllegalArgumentException:
java.net.UnknownHostException: namenode1.hdfs.mesos
I expect I have to redeploy the Spark package with dcos package install --options=, but I can't figure out what the hdfs.config-url should be. The https://docs.mesosphere.com/1.7/usage/service-guides/spark/install/#hdfs docs seem out of date.
Yes, it is out of date. We'll fix that.
DC/OS HDFS now serves its config on http://hdfs.marathon.mesos:[port]/v1/connect

./spark-shell doesn't start correctly (spark1.6.1-bin.hadoop2.6 version)

I installed this spark version: spark-1.6.1-bin-hadoop2.6.tgz.
Now when I start Spark with the ./spark-shell command, I get these errors (it prints a lot of error lines, so I've only included the ones that seem important):
Cleanup action completed
16/03/27 00:19:35 ERROR Schema: Failed initialising database.
Failed to create database 'metastore_db', see the next exception for details.
org.datanucleus.exceptions.NucleusDataStoreException: Failed to create database 'metastore_db', see the next exception for details.
at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:516)
Caused by: java.sql.SQLException: Directory /usr/local/spark-1.6.1-bin-hadoop2.6/bin/metastore_db cannot be created.
org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
... 128 more
Caused by: ERROR XBM0H: Directory /usr/local/spark-1.6.1-bin-hadoop2.6/bin/metastore_db cannot be created.
Nested Throwables StackTrace:
java.sql.SQLException: Failed to create database 'metastore_db', see the next exception for details.
org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
... 128 more
Caused by: ERROR XBM0H: Directory /usr/local/spark-1.6.1-bin-hadoop2.6/bin/metastore_db cannot be created.
at org.apache.derby.iapi.error.StandardException.newException
Caused by: java.sql.SQLException: Directory /usr/local/spark-1.6.1-bin-hadoop2.6/bin/metastore_db cannot be created.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
at
... 128 more
<console>:16: error: not found: value sqlContext
import sqlContext.implicits._
^
<console>:16: error: not found: value sqlContext
import sqlContext.sql
^
scala>
I tried some configurations that I found in other questions about the "not found: value sqlContext" issue, such as:
/etc/hosts file:
127.0.0.1 hadoophost localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.2.0.15 hadoophost
echo $HOSTNAME returns:
hadoophost
.bashrc file contains:
export SPARK_LOCAL_IP=127.0.0.1
But it doesn't work. Can you help me understand why Spark is not starting correctly?
hive-default.xml.template
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
--><configuration>
<!-- WARNING!!! This file is auto generated for documentation purposes ONLY! -->
<!-- WARNING!!! Any changes you make to this file will be ignored by Hive. -->
<!-- WARNING!!! You must make your changes in hive-site.xml instead. -->
In the home folder I get the same issues:
[hadoopadmin#hadoop home]$ pwd
/home
[hadoopadmin#hadoop home]$
Folder permissions:
[hadoopdadmin#hadoop spark-1.6.1-bin-hadoop2.6]$ ls -la
total 1416
drwxr-xr-x. 12 hadoop hadoop 4096 .
drwxr-xr-x. 16 root root 4096 ..
drwxr-xr-x. 2 hadoop hadoop 4096 bin
-rw-r--r--. 1 hadoop hadoop 1343562 CHANGES.txt
drwxr-xr-x. 2 hadoop hadoop 4096 conf
drwxr-xr-x. 3 hadoop hadoop 4096 data
drwxr-xr-x. 3 hadoop hadoop 4096 ec2
drwxr-xr-x. 3 hadoop hadoop 4096 examples
drwxr-xr-x. 2 hadoop hadoop 4096 lib
-rw-r--r--. 1 hadoop hadoop 17352 LICENSE
drwxr-xr-x. 2 hadoop hadoop 4096 licenses
-rw-r--r--. 1 hadoop hadoop 23529 NOTICE
drwxr-xr-x. 6 hadoop hadoop 4096 python
drwxr-xr-x. 3 hadoop hadoop 4096 R
-rw-r--r--. 1 hadoop hadoop 3359 README.md
-rw-r--r--. 1 hadoop hadoop 120 RELEASE
drwxr-xr-x. 2 hadoop hadoop 4096 sbin
Apparently you don't have permission to write to that directory. I recommend running ./spark-shell from your HOME directory (you might want to add Spark's bin directory to your PATH), or from any other directory your user can access and write to.
This might also be relevant for you: Notebooks together with Spark
You are using Spark built with Hive support.
There are two possible solutions, depending on what you want to do later in spark-shell or in your Spark jobs:
You want to access Hive tables in your Hadoop+Hive installation.
Place hive-site.xml in your Spark installation's conf subdirectory. You can find hive-site.xml in your existing Hive installation; for example, in my Cloudera VM it is in /usr/lib/hive/conf. After this step, spark-shell should connect to the existing Hive metastore instead of trying to create a temporary metastore_db database in your current working directory.
You do NOT want to access Hive tables in your Hadoop+Hive installation.
If you don't need to connect to Hive tables, you can follow Alberto's solution: fix the permission issues in the directory from which you launch spark-shell, making sure you are allowed to create directories and files there.
Hope this helps.
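A minimal sketch of the second option (launch spark-shell from a directory your user can write to, so the embedded Derby metastore_db can be created there). The helper pick_writable_dir is hypothetical; it simply prints the first candidate that is an existing, writable directory.

```shell
#!/bin/sh
# Hypothetical helper: print the first argument that is an existing,
# writable directory; return non-zero if none of the candidates qualifies.
pick_writable_dir() {
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1   # no usable candidate found
}
```

For example: dir=$(pick_writable_dir "$PWD" "$HOME") && cd "$dir" && /usr/local/spark-1.6.1-bin-hadoop2.6/bin/spark-shell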
