Spark streaming - class not found - HDFS file streaming - java.lang.ClassNotFoundException: com.pepperdata.spark.metrics.PepperdataSparkListener - apache-spark

I have submitted the spark streaming job with the yarn cluster mode.
But I am getting the following error.
SparkSubmit Command:
export SPARK_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar
spark-submit --master yarn-cluster --keytab /etc/security/keytabs/srvc_egsc_hdpuser.service.keytab --principal srvc_egsc_hdpuser#EAPKDC.HOUSTON.HP.COM --queue sc_streaming --class com.reni.scmplatform.data.producer.DPMain --executor-memory 5g --driver-memory 8g --conf spark.sql.shuffle.partitions=10 --conf spark.default.parallelism=50 --jars /usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar,/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar --files /etc/spark/conf/hbase-site.xml,/etc/spark/conf/hive-site.xml hdfs://EAPROD/EA/supplychain/streaming/logistics/entaly/jars/DataProducer-assembly-1.0.15-SNAPSHOT.jar --platform.framework.hdfs.logging.dir=/EA/supplychain/process/logs/logistics/entaly/dataProducer --platform.framework.logging.level=info --platform.framework.logging.publish=true
Error:
18/03/12 05:14:30 ERROR ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Exception when registering SparkListener
org.apache.spark.SparkException: Exception when registering SparkListener
at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2154)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:578)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2280)
at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:140)
at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:877)
at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:877)
at scala.Option.map(Option.scala:145)
at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:877)
at com.reni.scmplatform.data.producer.helper.DPStreamEventHandler.start(DPStreamEventHandler.scala:63)
at com.reni.scmplatform.data.producer.DPMain$.main(DPMain.scala:27)
at com.reni.scmplatform.data.producer.DPMain.main(DPMain.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:561)
Caused by: java.lang.ClassNotFoundException: com.pepperdata.spark.metrics.PepperdataSparkListener
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:175)
at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2122)
at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2119)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2119)
... 15 more
18/03/12 05:14:30 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
18/03/12 05:14:30 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
18/03/12 05:14:30 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.SparkException: Exception when registering SparkListener)

You should add the JAR containing the missing class to the job classpath by using the --jars option (see this answer: spark submit add multiple jars in classpath)
Moreover, I use sbt-assembly plugin to take care of these things for you:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
Then build with sbt compile assemble and all the jars needed for your application will be included in the job jar sent to Yarn.

Related

Spark job not running when jar is in HDFS

I am trying to run a spark job in standalone mode but the command is not picking up the jar from HDFS.The jar is present in the HDFS location and Its working fine when I run it in local mode.
Below is the command I am using
spark-submit --deploy-mode client --master yarn --class com.main.WordCount /spark/wc.jar
Below is my program:
val conf = new SparkConf().setAppName("WordCount").setMaster("yarn")
val spark = new SparkContext(conf)
val file = spark.textFile(args(0))
val count = file.flatMap(f=>f.split(" ")).map(word=>(word,1)).reduceByKey(_+_).collect
count.foreach(println)
And I am getting below error:
Warning: Local jar /spark/wc.jar does not exist, skipping.
java.lang.ClassNotFoundException: com.main.WordCount
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:228)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:693)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
But If i use deploy mode cluster I am getting below error:
Exception in thread "main" java.io.FileNotFoundException: File file:/spark/wc.jar does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:340)
at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:433)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$10.apply(Client.scala:530)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$10.apply(Client.scala:529)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:529)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1119)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1178)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Could you please clarify what is local mode. There are only two deploy mode client and cluster, the only difference is in client mode Driver program will run on the system and in cluster mode driver program will run from random node in the cluster.
For spark submit command:
When you execute spark submit command spark will pull all the local resources/files defined with --files , --py-files argument as well as Spark Main Jar to temporary HDFS location/directory, which is created by that particular spark application with the application name. when you give HDFS location, it will fail to location the Jar on local machine. It is mandatory to keep the Jar on local.

Spark Streaming - java.io.IOException: Lease timeout of 0 seconds expired

I have spark streaming application using checkpoint writing on HDFS.
Has anyone know the solution?
Previously we were using the kinit to specify principal and keytab and got the suggestion to specify these via spark-submit command instead kinit but still this error and cause spark streaming application down.
spark-submit --principal sparkuser#HADOOP.ABC.COM --keytab /home/sparkuser/keytab/sparkuser.keytab --name MyStreamingApp --master yarn-cluster --conf "spark.driver.extraJavaOptions=-XX:+UseConcMarkSweepGC --conf "spark.eventLog.enabled=true" --conf "spark.streaming.backpressure.enabled=true" --conf "spark.streaming.stopGracefullyOnShutdown=true" --conf "spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC --class com.abc.DataProcessor myapp.jar
I see multiple occurrences of following exception in logs and finally SIGTERM 15 that kills the executor and driver. We are using CDH 5.5.2
2016-10-02 23:59:50 ERROR SparkListenerBus LiveListenerBus:96 -
Listener EventLoggingListener threw an exception
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:148)
at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:148)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:148)
at org.apache.spark.scheduler.EventLoggingListener.onUnpersistRDD(EventLoggingListener.scala:184)
at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:50)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:56)
at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:79)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1135)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
Caused by: java.io.IOException: Lease timeout of 0 seconds expired.
at org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:2370)
at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:964)
at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:932)
at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:423)
at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448)
at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
at java.lang.Thread.run(Thread.java:745)

How to get the working directory in executor

I am using the following command to submit Spark job, I hope to send jar and config files to each executor and load it there
spark-submit --verbose \
--files=/tmp/metrics.properties \
--jars /tmp/datainsights-metrics-source-assembly-1.0.jar \
--total-executor-cores 4\
--conf "spark.metrics.conf=metrics.properties" \
--conf "spark.executor.extraClassPath=datainsights-metrics-source-assembly-1.0.jar" \
--class org.microsoft.ofe.datainsights.StartServiceSignalPipeline \
./target/datainsights-1.0-jar-with-dependencies.jar
--files and --jars is used to send files to executors, I found that the files are sent to the working directory of executor like 'worker/app-xxxxx-xxxx/0/
But when job is running, the executor always throws exception saying that it could not find the file 'metrics.properties'or the class which is contained in 'datainsights-metrics-source-assembly-1.0.jar'. It seems that the job is looking for files under another dir rather than working directory.
Do you know how to load the file which is sent to executors?
Here is the trace (The class 'org.apache.spark.metrics.PerfCounterSource' is contained in the jar 'datainsights-metrics-source-assembly-1.0.jar'):
ERROR 2016-01-14 16:10:32 Logging.scala:96 - org.apache.spark.metrics.MetricsSystem: Source class org.apache.spark.metrics.PerfCounterSource cannot be instantiated
java.lang.ClassNotFoundException: org.apache.spark.metrics.PerfCounterSource
at java.net.URLClassLoader$1.run(URLClassLoader.java:366) ~[na:1.7.0_80]
at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_80]
at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_80]
at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[na:1.7.0_80]
at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[na:1.7.0_80]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) ~[na:1.7.0_80]
at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[na:1.7.0_80]
at java.lang.Class.forName0(Native Method) ~[na:1.7.0_80]
at java.lang.Class.forName(Class.java:195) ~[na:1.7.0_80]
It looks like you have a typo in your --jars argument, so it could be that it's not actually loading the file and and continuing silently.

JDBC Driver not found - On submitting to YARN from Spark

Trying to read all rows from a DB table and write the same to another empty target table. So when I issue the following command at the main node, it works as expected -
$./bin/spark-submit --class cs.TestJob_publisherstarget --driver-class-path ./lib/mysql-connector-java-5.1.35-bin.jar --jars ./lib/mysql-connector-java-5.1.35-bin.jar,./lib/univocity-parsers-1.5.6.jar,./lib/commons-csv-1.1.1-SNAPSHOT.jar ./lib/uber-ski-spark-job-0.0.1-SNAPSHOT.jar
(Where: uber-ski-spark-job-0.0.1-SNAPSHOT.jar is the packaged jar in ../spark/lib folder and cs.TestJob_publisherstarget is the class)
The above command works perfectly for the code and reads all rows from a table in MySQL and dumps all roes to target table, using the JDBC driver mentioned with --jars option.
Here is the issue:
Everything remaining the same as above, when I submit the same job to YARN, it fails with en exception indicating - can't find the driver
$./bin/spark-submit --verbose --class cs.TestJob_publisherstarget --master yarn-cluster --driver-class-path ./lib/mysql-connector-java-5.1.35-bin.jar --jars ./lib/mysql-connector-java-5.1.35-bin.jar ./lib/uber-ski-spark-job-0.0.1-SNAPSHOT.jar
Exception in YARN Console:
Error: application failed with exception
org.apache.spark.SparkException: Application finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:625)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:650)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:577)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:174)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
EXCEPTION AT LOG:
5/10/12 20:38:59 ERROR yarn.ApplicationMaster: User class threw exception: No suitable driver found for jdbc:mysql://localhost:3306/pubs?user=root&password=root
java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3306/pubs?user=root&password=root
at java.sql.DriverManager.getConnection(DriverManager.java:596)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:96)
at org.apache.spark.sql.jdbc.JDBCRelation.<init>(JDBCRelation.scala:133)
at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:121)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
at com.cambridgesemantics.application.sdi.compiler.spark.DataSource.getDataFrame(DataSource.scala:20)
at cs.TestJob_publisherstarget$.main(TestJob_publisherstarget.scala:29)
at cs.TestJob_publisherstarget.main(TestJob_publisherstarget.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:484)
15/10/12 20:38:59 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: No suitable driver found for jdbc:mysql://localhost:3306/pubs?user=root&password=root)
Anyway: Where am I supposed to put the JDBC driver jar file? I have copied it over to the lib of each child node, still no luck!
I was having the same issue, it was working in local mode but not in yarn-client.
I added to spark submit:
--conf "spark.executor.extraClassPath=/path/to/mysql-connector-java-5.1.34.jar
and that worked for me
For Spark 1.6, I have the issue to store DataFrame to Oracle by using org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable
In yarn-cluster mode, I put these options in the submit script:
--conf "spark.driver.extraClassPath=$HOME/jdbc-11.2.0.3.0.jar" \
--conf "spark.executor.extraClassPath=$HOME/jdbc-11.2.0.3.0.jar" \
I also have to put Class.forName("..") like below before the saving line:
try {
Class.forName("oracle.jdbc.OracleDriver");
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, url, "RD_SPARK_DTL_INCL_HY ", p);
} catch (Exception e) {....
Of course, you have to copy the lib to each node. Not pretty, but it works. Hope someone can come up better solution later.
I do strongly recommend to use this API -- amazingly convenient and fast.

Spark 1.1.0 on cdh5.1.3 does not work in yarn-cluster mode

I am having CDH 5.1 (Hadoop 2.3.0-cdh5.1.3) installed on my cluster, version:
I have installed and configured a prebuilt version of Spark 1.1.0 (Apache Version), built for hadoop 2.3 on my cluster.
when I run the Pi example in the ‘client mode’, it runs successfully, but it fails in the ‘yarn-cluster’ mode. The spark job is successfully submitted, but fails after polling the application master for sometime:
More Logs:
Application application_1415193640322_0016 failed 2 times due to Error launching appattempt_1415193640322_0016_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: java.io.EOFException
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:710)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)
at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readUTF(DataInputStream.java:609)
at java.io.DataInputStream.readUTF(DataInputStream.java:564)
at org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:151)
at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:142)
at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:262)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:696)
... 10 more
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:99)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:118)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnException): java.io.EOFException
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:710)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)
at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readUTF(DataInputStream.java:609)
at java.io.DataInputStream.readUTF(DataInputStream.java:564)
at org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:151)
at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:142)
at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:262)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:696)
... 10 more
at org.apache.hadoop.ipc.Client.call(Client.java:1409)
at org.apache.hadoop.ipc.Client.call(Client.java:1362)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy69.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
... 5 more
. Failing the application.
When I go to node Manager logs:
Log Type: stderr
Log Length: 87
Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
Can you please suggest any solution.Do you think I should compile the spark code on my cluster. Or should I use Spark provided with CDH5.1.
Any help will be appreciated!
spark-shell does not work with spark yarn-cluster mode. You should add --master yarn-client
Example:
path/to/pyspark --master yarn-client

Resources