I am using Spark 2.0.0 with Hadoop 2.7 and use the yarn-cluster mode. Every time, I get the following error:
17/01/04 11:18:04 INFO spark.SparkContext: Successfully stopped SparkContext
17/01/04 11:18:04 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
17/01/04 11:18:04 INFO util.ShutdownHookManager: Shutdown hook called
17/01/04 11:18:04 INFO util.ShutdownHookManager: Deleting directory /tmp/hadoop-hduser/nm-local-dir/usercache/harry/appcache/application_1475261544699_0833/spark-42e40ac3-279f-4c3f-ab27-9999d20069b8
17/01/04 11:18:04 INFO spark.SparkContext: SparkContext already stopped.
However, I do get the correct printed output.
The same code works fine in Spark 1.4.0-Hadoop 2.4.0 where I do not see any exit codes.
This issue .sparkStaging not cleaned if application exited incorrectly https://issues.apache.org/jira/browse/SPARK-17340 started after Spark 1.4 (Affects Version/s: 1.5.2, 1.6.1, 2.0.0)
The issue is: When running Spark (yarn,cluster mode) and killing application
.sparkStaging is not cleaned.
When this issue happened exitCode 16 raised in Spark 2.0.X
ERROR ApplicationMaster: RECEIVED SIGNAL TERM
INFO ApplicationMaster: Final app status: FAILED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
Is it possible that in your code, something is killing the application?
If so - it shouldn't be seen in Spark 1.4, but should be seen in Spark 2.0.0
Please search your code for "exit" (as if you have such in your code, the error won't be shown in Spark 1.4, but will in Spark 2.0.0)
Seems like excess memory is used by JVM, try adding the property
yarn.nodemanager.vmem-check-enabled to false in your yarn-site.xml
Reference
Related
I added to my spark config a package (in spark-default.conf) but when I create a new session with livy it causes me a problem (see the error below) and the session and death .
ps: when I remove this package all work fine .
20/05/04 00:17:35 WARN RSCClient: Error stopping RPC.
io.netty.util.concurrent.BlockingOperationException: DefaultChannelPromise#6d493840(uncancellable)
at io.netty.util.concurrent.DefaultPromise.checkDeadLock(DefaultPromise.java:394)
at io.netty.channel.DefaultChannelPromise.checkDeadLock(DefaultChannelPromise.java:157)
...........
Exception in thread "Thread-32" java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:283)
........
at org.apache.livy.utils.LineBufferedStream$$anon$1.run(LineBufferedStream.scala:46)
20/05/04 00:17:36 WARN ContextLauncher: Child process exited with code 143.
20/05/04 00:17:36 ERROR SparkProcApp: job was killed by user
20/05/04 00:17:36 INFO InteractiveSession: Stopped InteractiveSession 0.
20/05/04 00:28:17 INFO InteractiveSessionManager: Deleting InteractiveSession 0 because it was inactive for more than 3600000.0 ms.
20/05/04 00:28:17 INFO InteractiveSessionManager: Deleting session 0
20/05/04 00:28:17 INFO InteractiveSession: Stopping InteractiveSession 0...
20/05/04 00:28:17 INFO InteractiveSession: Stopped InteractiveSession 0.
20/05/04 00:28:17 INFO InteractiveSessionManager: Deleted session 0
I use :
cloudera hdp2.6.5 :
spark 2.3
livy 0.7.0
Hadoop 2.7
lib unsupervised (https://github.com/unsupervise/spark-tss)
step :
livy conf => livy.spark.master yarn-cluster
spark-default conf => spark.jars.repositories https://dl.bintray.com/unsupervise/maven/
spark-defaultconf => spark.jars.packages com.github.unsupervise:spark-tss:0.1.1
Please try also to add to spark defaults
spark.jars.repositories https://dl.bintray.com/unsupervise/maven/
Alternatively (just to be sure that you have no issues with spark-default.conf file) please try to include these configs to the Livy request body instead when submitting the Spark job (refer Livy API).
​Hello, I am using Hive 2.3.0 with spark version 2.0.2, When i am trying to run Hive commands on Spark from hive console,
The Job is getting stuck and i have to manually kill it.
Following was the error message in Spark worker Log. Could you please advise if i am doing something wrong.
INFO worker.Worker: Executor app-20171114093447-0000/0 finished with state KILLED exitStatus 143
I was able to fix it by configuring correct EventLog directory in Spark.
I am trying to run a spark job via Mesos
it throws an exception
WARN MesosExternalShuffleClient: Unable to register app
6c1b7274-960f-47ef-9fa7-1dd06b05d4f1-0010
with external shuffle service.
Please manually remove shuffle data after driver exit
Error:java.lang.RuntimeException:
java.lang.UnsupportedOperationException: Unexpected message:
org.apache.spark.network.shuffle.protocol.mesos.RegisterDriver#88d86a24
"
and the logs in Stderr as
ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
INFO DiskBlockManager: Shutdown hook called
E0911 05:32:34.711486 6619 process.cpp:951] Failed to accept socket: future discarded**
In Spark-Defaults.conf
spark.mesos.coarse true
spark.network.timeout 3600s
spark.shuffle.io.connectionTimeout 3600s
Who is killing my application..?
I have a spark streaming job which reads in data from Kafka and does some operations on it. I am running the job over a yarn cluster, Spark 1.4.1, which has two nodes with 16 GB RAM each and 16 cores each.
I have these conf passed to the spark-submit job :
--master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 3
The job returns this error and finishes after running for a short while :
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 11,
(reason: Max number of executor failures reached)
.....
ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0:
Stopped by driver
Updated :
These logs were found too :
INFO yarn.YarnAllocator: Received 3 containers from YARN, launching executors on 3 of them.....
INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down.
....
INFO yarn.YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them.
INFO yarn.ExecutorRunnable: Starting Executor Container.....
INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down...
INFO yarn.YarnAllocator: Completed container container_e10_1453801197604_0104_01_000006 (state: COMPLETE, exit status: 1)
INFO yarn.YarnAllocator: Container marked as failed: container_e10_1453801197604_0104_01_000006. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_e10_1453801197604_0104_01_000006
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at org.apache.hadoop.util.Shell.run(Shell.java:487)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
What might be the reasons for this? Appreciate some help.
Thanks
can you please show your scala/java code that is reading from kafka? I suspect you probably not creating your SparkConf correctly.
Try something like
SparkConf sparkConf = new SparkConf().setAppName("ApplicationName");
also try running application in yarn-client mode and share the output.
I got the same issue. and I have found 1 solution to fix the issue by removing sparkContext.stop() at the end of main function, leave the stop action for GC.
Spark team has resolved the issue in Spark core, however, the fix has just been master branch so far. We need to wait until the fix has been updated into the new release.
https://issues.apache.org/jira/browse/SPARK-12009
I have a 6 node cluster, one of those is spark enabled.
I also have a spark job that I would like to submit to the cluster / that node, so I enter the following command
spark-submit --class VDQConsumer --master spark://node-public-ip:7077 target/scala-2.10/vdq-consumer-assembly-1.0.jar
it launches the spark ui on that node, but eventually gets here:
15/05/14 14:19:55 INFO SparkContext: Added JAR file:/Users/cwheeler/dev/git/vdq-consumer/target/scala-2.10/vdq-consumer-assembly-1.0.jar at http://node-ip:54898/jars/vdq-consumer-assembly-1.0.jar with timestamp 1431627595602
15/05/14 14:19:55 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster#node-ip:7077/user/Master...
15/05/14 14:19:55 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster#node-ip:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/05/14 14:20:15 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster#node-ip:7077/user/Master...
15/05/14 14:20:35 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster#node-ip:7077/user/Master...
15/05/14 14:20:55 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/05/14 14:20:55 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
15/05/14 14:20:55 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
Does anyone have any idea what just happened?