Apache Spark exception when executing spark-shell - apache-spark

I have installed apache-spark on a single node. When I run the spark-shell I get the below exception. I can still create RDDs and run scala code snippets inspite of the exception.
This is the exception:
16/02/15 14:21:29 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/02/15 14:21:31 WARN : Your hostname, Rahul-PC resolves to a loopback/non-reachable address: fe80:0:0:0:c0c1:cd2e:990d:17ac%e
java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:167)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
at $iwC$$iwC.<init>(<console>:9)
at $iwC.<init>(<console>:18)
at <init>(<console>:20)
at .<init>(<console>:24)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
My JAVA_HOME is set to point to the right jdk installation folder.
JAVA_HOME = C:\Program Files\Java\jdk1.8.0
Is there anything else I need to do. Please advise.

I found the solution. Spark needs winutils.exe in order to initialize hive context. The C:\Windows\tmp folder that created when running the spark shell needs to have sufficient permissions as well.
http://blogs.msdn.com/b/arsen/archive/2016/02/09/resolving-spark-1-6-0-quot-java-lang-nullpointerexception-not-found-value-sqlcontext-quot-error-when-running-spark-shell-on-windows-10-64-bit.aspx

Related

Unable to run Spark in yarn-cluster mode

I'm trying to run spark job with YARN in cluster deploy mode.
I tried to run the simpliest spark-submit command only with jar path, class parameter and master yarn-cluster. However I still have the same error, which actually tells me nothing.
Exception in thread "main" org.apache.spark.SparkException: Application application_1506196351647_0032 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1078)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1125)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
If anyone had the similar problem please let me know, I'm using spark 1.6, hadoop 2.6.

adding multiple jars in Oozie-Spark action

I'm using HDP2.6. where is installed oozie 4.2. and Spark2.
After I tracked Hortonworks guide on this site: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/ch_oozie-spark-action.html for adding libs for Spark2 in 4.2. version of Oozie.
After I submit the job with this add-on:
oozie.action.sharelib.for.spark=spark2
The error I'm getting is this:
2017-07-19 12:36:53,271 WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000012-170717153234639-oozie-oozi-W] ACTION[0000012-170717153234639-oozie-oozi-W#spark_1] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
2017-07-19 12:36:53,275 WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000012-170717153234639-oozie-oozi-W] ACTION[0000012-170717153234639-oozie-oozi-W#spark_1] Launcher exception: Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
java.lang.IllegalArgumentException: Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13$$anonfun$apply$8.apply(Client.scala:629)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13$$anonfun$apply$8.apply(Client.scala:620)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13.apply(Client.scala:620)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13.apply(Client.scala:619)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:619)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:892)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1228)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1287)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:311)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:232)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:239)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
I have read that new Spark2 will not work with Spark 2.1 (via oozie anyway) due to a change in how Spark handles multiple files found in distributed cache, as mentioned here: see here
Keep in mind that I'm using Ambari and HDP2.6. How can I deal with this?
You need to check the content of the oozie directory and spark2 directory into the Oozie sharelib. If there are any jars present into both, just remove them from one place and try again. Also, do execute the oozie admin sharelub update command to update it.
Hope this will help you.

IllegalArgumentException with Spark 1.6

I'm running Spark 1.6.0 on CDH 5.7 and I've upgraded my original application from 1.4.1 to 1.6.0. When I try to run my application (which previously worked fine) I get the following error:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:221)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$3.apply(Client.scala:473)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$3.apply(Client.scala:471)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6.apply(Client.scala:471)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6.apply(Client.scala:469)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:469)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:725)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:143)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1023)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1083)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I submit the application with:
--jars is a comma-separated list of jars (with absolute paths)
--files is a comma-separated list of files (with absolute paths)
--driver-class-path is a colon-separated list of resources (without the full path, just the file names)
I have tried it with full paths for the driver (and executor) class paths, but that gives me the same issue. All files and jars submitted with the app exist, I checked.
Could this be related to the issue with duplicates in the distributed cache or is this another issue?
From the source code I see that the only calls to require without a custom message (as in the stack trace) are related to the distribute() method. If so, how can I run applications without upgrading Spark?
This is the exception which results from having the same path/URI appear twice in the argument to the --files option.

Apache Spark security.NosuchAlgorithm exception on changing from java 7 to java 8

I have a cluster (Hortonworks supported system) running Apache spark on 1.5.2 on Java 7, installed by admin. Java 8 is also installed.
I don't have admin access to this cluster and would like to run spark (1.5.2 and later versions) on java 8.
I come from HPC/MPI background. So I naively copied all executables of spark "/usr/hdp/current/spark-client/" into my root folder.
When I run spark-shell from my copied folder, it runs as expected on java 7.
When I change $JAVA_HOME to point to java 8, and run spark-shell, I get the following error.
Could you please help me fix this error?
Edit: by comparing log output when using java 7 and java 8, it seems that this exception occurs when starting http server
Exception in thread "main" java.security.NoSuchAlgorithmException: Error constructing implementation (algorithm: Default, provider: SunJSSE, class: sun.security.ssl.SSLContextImpl$DefaultSSLContext)
at java.security.Provider$Service.newInstance(Provider.java:1617)
at sun.security.jca.GetInstance.getInstance(GetInstance.java:236)
at sun.security.jca.GetInstance.getInstance(GetInstance.java:164)
at javax.net.ssl.SSLContext.getInstance(SSLContext.java:156)
at javax.net.ssl.SSLContext.getDefault(SSLContext.java:96)
at org.apache.spark.SSLOptions.liftedTree1$1(SSLOptions.scala:122)
at org.apache.spark.SSLOptions.<init>(SSLOptions.scala:114)
at org.apache.spark.SSLOptions$.parse(SSLOptions.scala:199)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:243)
at org.apache.spark.repl.SparkIMain.<init>(SparkIMain.scala:118)
at org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.<init>(SparkILoop.scala:187)
at org.apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:217)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:949)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:653)
at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:56)
at sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:225)
at sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:70)
at java.security.KeyStore.load(KeyStore.java:1445)
at sun.security.ssl.TrustManagerFactoryImpl.getCacertsKeyStore(TrustManagerFactoryImpl.java:226)
at sun.security.ssl.SSLContextImpl$DefaultSSLContext.getDefaultTrustManager(SSLContextImpl.java:767)
at sun.security.ssl.SSLContextImpl$DefaultSSLContext.<init>(SSLContextImpl.java:733)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at java.security.Provider$Service.newInstance(Provider.java:1595)
... 28 more

spark-shell - failure to initialize terminal (Windows 10, official cmd.exe)

I'm trying to run Spark on Windows 10 with hadoop on the official Microsoft terminal cmd.exe.
I don't have problem with Hadoop. The installation and stating is OK.
I'm using Java 8 x64 (jdk1.8.0_92)
When I start Spark with the command spark-shell, I got the Java error bellow :
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.NoClassDefFoundError: Could not initialize class scala.tools.fusesource_embedded.jansi.internal.Kernel32
at scala.tools.fusesource_embedded.jansi.internal.WindowsSupport.getConsoleMode(WindowsSupport.java:50)
at scala.tools.jline_embedded.WindowsTerminal.getConsoleMode(WindowsTerminal.java:204)
at scala.tools.jline_embedded.WindowsTerminal.init(WindowsTerminal.java:82)
at scala.tools.jline_embedded.TerminalFactory.create(TerminalFactory.java:101)
at scala.tools.jline_embedded.TerminalFactory.get(TerminalFactory.java:158)
at scala.tools.jline_embedded.console.ConsoleReader.(ConsoleReader.java:229)
at scala.tools.jline_embedded.console.ConsoleReader.(ConsoleReader.java:221)
at scala.tools.jline_embedded.console.ConsoleReader.(ConsoleReader.java:209)
at scala.tools.nsc.interpreter.jline_embedded.JLineConsoleReader.(JLineReader.scala:61)
at scala.tools.nsc.interpreter.jline_embedded.InteractiveReader.(JLineReader.scala:33)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at scala.tools.nsc.interpreter.ILoop$$anonfun$scala$tools$nsc$interpreter$ILoop$$instantiate$1$1.apply(ILoop.scala:865)
at scala.tools.nsc.interpreter.ILoop$$anonfun$scala$tools$nsc$interpreter$ILoop$$instantiate$1$1.apply(ILoop.scala:862)
at scala.tools.nsc.interpreter.ILoop.scala$tools$nsc$interpreter$ILoop$$mkReader$1(ILoop.scala:871)
at scala.tools.nsc.interpreter.ILoop$$anonfun$15$$anonfun$apply$8.apply(ILoop.scala:875)
at scala.tools.nsc.interpreter.ILoop$$anonfun$15$$anonfun$apply$8.apply(ILoop.scala:875)
at scala.util.Try$.apply(Try.scala:192)
at scala.tools.nsc.interpreter.ILoop$$anonfun$15.apply(ILoop.scala:875)
at scala.tools.nsc.interpreter.ILoop$$anonfun$15.apply(ILoop.scala:875)
at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1233)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1223)
at scala.collection.immutable.Stream.collect(Stream.scala:435)
at scala.tools.nsc.interpreter.ILoop.chooseReader(ILoop.scala:877)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1$$anonfun$apply$mcZ$sp$2.apply(ILoop.scala:916)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:916)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:911)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:911)
at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:911)
at org.apache.spark.repl.Main$.main(Main.scala:49)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I get exactly the same stack trace when opening the scala console in Netbeans. If I type any scala expression below the stack trace, it works fine though.

Resources