Spark 1.5.1 spark-shell throws RuntimeException - apache-spark

I am simply trying to launch the spark-shell on my local Windows 8 machine, and here's the error message I get:
java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are:
rw-rw-rw-
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:167)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
at $iwC$$iwC.<init>(<console>:9)
at $iwC.<init>(<console>:18)
at <init>(<console>:20)
at .<init>(<console>:24)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 56 more
Somehow the REPL is there, but I can't use the sqlContext.
Has anyone faced this problem before? Any answer would be helpful, thanks.

RESOLVED: Downloading the correct winutils version fixed the issue. Ideally it should be compiled locally, but if you download a prebuilt binary, make sure it is the right 32/64-bit build for your system.
I tried this on Windows 7 64-bit with Spark 1.6, downloaded winutils.exe from https://www.barik.net/archive/2015/01/19/172716/, and it worked!
Complete steps are at: http://letstalkspark.blogspot.com/2016/02/getting-started-with-spark-on-window-64.html

First, download the winutils.exe build that is compatible with your Spark version and operating system, and place it in a folder with a bin subdirectory, let's say D:\winutils\bin\winutils.exe.
Now, if /tmp/hive is on your D: drive, run the following command:
D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
For more details, refer to these posts:
Frequent Issues occurred during Spark Development
https://issues.apache.org/jira/browse/SPARK-10528
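As a sanity check, the same fix can be scripted; here is a minimal Python sketch, assuming the D:\winutils and D:\tmp\hive paths from the example above:
import os
import subprocess

# Minimal sketch (paths are assumptions): make the Hive scratch dir writable
# with winutils before starting any Spark SQL session on Windows.
os.environ["HADOOP_HOME"] = r"D:\winutils"
winutils = os.path.join(os.environ["HADOOP_HOME"], "bin", "winutils.exe")

# Create the scratch dir if it does not exist yet, then open up its permissions.
os.makedirs(r"D:\tmp\hive", exist_ok=True)
subprocess.run([winutils, "chmod", "777", r"D:\tmp\hive"], check=True)

# Verify: winutils should now report rwxrwxrwx for the directory.
subprocess.run([winutils, "ls", r"D:\tmp\hive"], check=True)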

Related

Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

I am getting this error while trying to write a text file to a local path on Windows.
Error:
Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
Spark/Hadoop version: spark-3.0.3-bin-hadoop2.7.
winutils is placed in C:\winutils\bin
hadoop.dll is placed in C:\winutils\bin and c:\System32
Environment Variables set
HADOOP_HOME C:\winutils
Path %HADOOP_HOME%\bin
Tried restarting
I found the cause and solution below.
Root cause: the Gradle dependency pulled in a higher Spark version than the installed one. I have Spark 3.0.3 installed, but the build declared 3.2.0:
implementation 'org.apache.spark:spark-core_2.13:3.2.0'
Fix: replace it with the artifact matching the installed distribution (spark-3.0.3-bin-hadoop2.7 is built against Scala 2.12, so the _2.12 suffix is needed as well):
implementation 'org.apache.spark:spark-core_2.12:3.0.3'

can't run "spark-submit" command

I'm following a big data online course and ran into a problem while installing Apache Spark.
I've done everything correctly, but when I try to run spark-submit there seems to be an issue with Java.
When I run this:
(base) C:\SparkCourse>spark-submit ratings-counter.py
I get this error:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.unsafe.array.ByteArrayMethods.<clinit>(ByteArrayMethods.java:54)
at org.apache.spark.internal.config.package$.<init>(package.scala:1095)
at org.apache.spark.internal.config.package$.<clinit>(package.scala)
at org.apache.spark.deploy.SparkSubmitArguments.$anonfun$loadEnvironmentArguments$3(SparkSubmitArguments.scala:157)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:157)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:115)
at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$3.<init>(SparkSubmit.scala:1022)
at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:1022)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make private java.nio.DirectByteBuffer(long,int) accessible: module java.base does not "opens java.nio" to unnamed module #5b94b04d
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
at java.base/java.lang.reflect.Constructor.checkCanSetAccessible(Constructor.java:188)
at java.base/java.lang.reflect.Constructor.setAccessible(Constructor.java:181)
at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:56)
... 13 more
Any ideas?
Cheers!
I reinstalled Windows and started everything from scratch.
I installed JDK 8 and this version of Spark: spark-3.0.3-bin-hadoop2.7.tgz, and set all the paths correctly. It worked, in that I can open the pyspark shell and run spark-submit, but there is still a lot of log output in the cmd window that I can't get rid of.
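For reference, the InaccessibleObjectException above comes from running Spark 3.0.x on a newer JDK whose module system blocks the reflective access Spark needs, which is why switching to JDK 8 fixed it. A minimal sketch for making sure spark-submit picks up JDK 8 (the install path is an assumption):
import os
import subprocess

# Minimal sketch: force spark-submit to use a JDK 8 install.
# The path below is an assumption; adjust it to your local JDK 8 location.
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_301"
os.environ["PATH"] = os.path.join(os.environ["JAVA_HOME"], "bin") + os.pathsep + os.environ["PATH"]

# Confirm which JVM is picked up, then submit the job.
subprocess.run("java -version", shell=True, check=True)
subprocess.run("spark-submit ratings-counter.py", shell=True, check=True)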

Issue upon Spark Upgrade: key not found: _PYSPARK_DRIVER_CONN_INFO_PATH

I downloaded the latest Spark version because of the fix for
ERROR AsyncEventQueue:70 - Dropping event from queue appStatus.
After setting the environment variables and running the same code in PyCharm, I'm getting this error, for which I can't find a solution:
Exception in thread "main" java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CONN_INFO_PATH
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:59)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:59)
at org.apache.spark.api.python.PythonGatewayServer$.main(PythonGatewayServer.scala:64)
at org.apache.spark.api.python.PythonGatewayServer.main(PythonGatewayServer.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Any help?
I ran into this issue too. Here is what I did, in case it helps:
1. Find your Spark version; mine is 2.4.3.
2. Find your pyspark version; mine was 2.2.0.
3. Reinstall pyspark so that it matches the Spark version:
pip install pyspark==2.4.3
Then everything works. I hope this helps.
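If you are not sure whether the two versions line up, here is a minimal check; it assumes a standard binary distribution, which ships a RELEASE file under SPARK_HOME:
import os
import pyspark

# Minimal sketch: compare the pip-installed pyspark package with the Spark
# distribution that SPARK_HOME points to; the two versions must match.
print("pyspark package:", pyspark.__version__)

spark_home = os.environ["SPARK_HOME"]
with open(os.path.join(spark_home, "RELEASE")) as release:
    print("Spark install  :", release.readline().strip())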
I am using PySpark 2.3.1 with PyCharm 2018.1.4 and facing a similar issue on my Windows machine.
When I run the same Python file using spark-submit, it executes successfully.
I have followed the steps below:
Created a new project in PyCharm; let's call it Demo.
Go to Settings -> Project: Demo -> Project Interpreter. Make sure the project interpreter is Python 2.7.
Go to Settings -> Project: Demo -> Project Structure. Add Content Root.
I added two content roots: one pointing to the directory containing the Apache Spark files, and the other to py4j-0.10.7-src.zip.
In my case these locations are
C:\apache-spark
and
C:\apache-spark\python\lib\py4j-0.10.7-src.zip
Created a new Python file (Demo1.py) and pasted the content below into it.
from pyspark import SparkContext

# Build a word count over the Spark README on the local machine.
sc = SparkContext(master="local", appName="Spark Demo")
rdd = sc.textFile("C:/apache-spark/README.md")
wordsRDD = rdd.flatMap(lambda words: words.split(" "))  # split each line into words
wordsRDD = wordsRDD.map(lambda word: (word, 1))         # pair each word with a count of 1
wordsCount = wordsRDD.reduceByKey(lambda x, y: x + y)   # sum the counts per word
print(wordsCount.collect())
Running this Python file in PyCharm gives the error below:
Exception in thread "main" java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CONN_INFO_PATH
whereas the same program, when executed from the command prompt, yields the correct result:
C:\Users\manish>spark-submit C:\Demo\demo1.py
Any suggestions to solve this problem?
I have had a similar exception. My problem was running Jupyter and Spark as different users; when I run them as the same user, the problem is solved.
Details:
When I updated Spark from v2.2.0 to v2.3.1 and then ran the Jupyter notebook, the error log was as follows:
Exception in thread "main" java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CONN_INFO_PATH
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:59)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:59)
at org.apache.spark.api.python.PythonGatewayServer$.main(PythonGatewayServer.scala:64)
at org.apache.spark.api.python.PythonGatewayServer.main(PythonGatewayServer.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
While googling, I came across the following link:
spark-commits mailing list archives
In the file
/core/src/main/scala/org/apache/spark/api/python/PythonGatewayServer.scala
there is a change:
+ // Communicate the connection information back to the python process by writing the
+ // information in the requested file. This needs to match the read side in java_gateway.py.
+ val connectionInfoPath = new File(sys.env("_PYSPARK_DRIVER_CONN_INFO_PATH"))
+ val tmpPath = Files.createTempFile(connectionInfoPath.getParentFile().toPath(),
+ "connection", ".info").toFile()
According to this change, a temp file is now created next to the requested connection-info path. My problem was running Jupyter and Spark as different users, so I think the process could not create that temp file. When I run them as the same user, the problem is solved. I hope it helps.
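To check that theory on your own machine, here is a minimal sketch that prints which user the driver process runs as and whether it can create a "connection...info" style temp file (the prefix and suffix simply mirror the commit quoted above):
import getpass
import tempfile

# Minimal sketch: confirm which user this process runs as and that it can
# create a temp file like the one the commit above writes.
print("running as:", getpass.getuser())

with tempfile.NamedTemporaryFile(prefix="connection", suffix=".info") as info_file:
    print("temp file created at:", info_file.name)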
I had this problem too, and it ended up being that the pyspark code I was importing/running from PyCharm was still the Spark 2.2 install instead of the Spark 2.3 installation that I had updated SPARK_HOME to point to.
Specifically, I had added spark-2.2 to my PyCharm project structure and then marked its python folder as "Sources" so PyCharm would recognize all its symbols. So the PyCharm code was importing from there instead of spark-2.3, and the older code didn't set the _PYSPARK_DRIVER_CONN_INFO_PATH environment variable.
If Vezir's answer didn't fix your case, try tracing into the creation of the SparkContext and compare carefully the path pyspark is imported from against the path of your Spark install. Similarly, if you installed pyspark into your Python project via pip, make sure you installed 2.3.1 to match your installed Spark version.
This can happen when you are running Spark 2.3.1 jars with an older version of pyspark (e.g. 2.3.0).
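A minimal sketch of that trace: print where pyspark is actually imported from and compare it against the Spark install that SPARK_HOME points to.
import os
import pyspark

# Minimal sketch: show which pyspark install the interpreter resolves, so it
# can be compared with the Spark distribution SPARK_HOME points to.
print("pyspark imported from:", os.path.dirname(pyspark.__file__))
print("SPARK_HOME           :", os.environ.get("SPARK_HOME"))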

What is causing this `ClassNotFoundException` while opening spark shell?

I've downloaded a prebuilt version of Spark on my Mac (OS X Mavericks), but when I try to open an interactive shell by typing bin/pyspark, I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/launcher/Main
Caused by: java.lang.ClassNotFoundException: org.apache.spark.launcher.Main
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
I have googled every part of the error and checked out some other Stack Overflow threads, but I can't find anything that addresses this error. Any idea what's going on / how to fix it?
One idea I have is that Scala is a dependency I need to download separately... but I really don't know.
I had the same issue before, and it turned out to be a permissions problem: I was running as a user who had no access to the Spark files (root had downloaded Spark).
Another possibility is that you downloaded the source distribution and did not build the project from source. :P
Hope it helps.
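One quick way to check both possibilities is to look for the Spark jars and whether your user can read them; a minimal sketch (the SPARK_HOME fallback is an assumption):
import glob
import os

# Minimal sketch: a prebuilt Spark 1.x ships its jars under lib/ (newer
# releases use jars/); a source-only download has neither, and unreadable jars
# point to a permissions problem.
spark_home = os.environ.get("SPARK_HOME", ".")  # "." assumes you run this from the Spark directory
jars = glob.glob(os.path.join(spark_home, "lib", "*.jar")) + \
       glob.glob(os.path.join(spark_home, "jars", "*.jar"))

print("found %d jar(s) under %s" % (len(jars), spark_home))
print("all readable by current user:", all(os.access(j, os.R_OK) for j in jars))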

Can not install eclim

Has anyone used eclim? I wanted to try it out, and since I use Vim as my primary editor I want to run it as a headless instance. Anyway, I installed it via the unattended (automated) install:
$ java \
-Dvim.files=$HOME/.vim \
-Declipse.home=/opt/eclipse \
-jar eclim_2.4.0.jar install
I had already downloaded Eclipse Luna, and I have JDK 7 installed (though I don't know whether it is referenced in my environment variables), and I ended up with:
2014-08-30 10:37:40,569 INFO [ANT] [eclim:unattended] Finished analyzing your eclipse installation.
2014-08-30 10:37:40,572 ERROR [ANT]
jar:file:/home/jim/Downloads/eclim_2.4.0.jar!/installer.xml:119: java.lang.NullPointerException
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:116)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.Target.execute(Target.java:390)
at org.apache.tools.ant.Target.performTasks(Target.java:411)
at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
at org.apache.tools.ant.Project.executeTarget(Project.java:1368)
at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
at org.apache.tools.ant.Project.executeTargets(Project.java:1251)
at org.formic.ant.Main.runBuild(Main.java:232)
at org.formic.ant.Main.startAnt(Main.java:81)
at org.formic.ant.Main.main(Main.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.simontuffs.onejar.Boot.run(Boot.java:306)
at com.simontuffs.onejar.Boot.main(Boot.java:159)
Caused by: java.lang.NullPointerException
at org.formic.Installer.getString(Installer.java:201)
at org.eclim.installer.step.FeatureProvider.getFeatures(FeatureProvider.java:99)
at org.eclim.installer.ant.UnattendedInstallTask.execute(UnattendedInstallTask.java:73)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
... 16 more
2014-08-30 10:37:40,582 DEBUG [ANT]
BUILD SUCCESSFUL
Total time: 19 seconds
java.lang.NullPointerException
So I have no idea what happened, but I cannot find eclimd anywhere on my system.
/opt is owned by root by default. My guess is that this is the case in your setup too, and since eclim needs to write to /opt/eclipse during installation, that results in an error. Try changing the ownership of /opt/eclipse (chown with the -R option) or run the installation as root. Note, though, that using $HOME will then probably not lead to the desired result.
I had the same issue. I followed the instructions to build from the source code, and that worked for me.
I checked out the master branch from the Git repository and used Ant to build and install eclim. At the time of this writing that resulted in version 2.4.0.11-ge560abe being installed without errors. Running eclimd and then :PingEclim and :EclimValidate from Vim reported that everything was fine.
Note that eclimd dumped an exception on startup:
java.lang.RuntimeException: Unable to aquire PluginConverter service during generation for: /home/pappmar/dev/eclipse/plugins/org.eclim.installer_2.4.0.11-ge560abe.jar
I don't know if that's a problem or not. It seems to be running all the same.

Resources