Executor finished with state KILLED exitStatus 1 - apache-spark

After starting the master and a worker on a single machine...
spark-class org.apache.spark.deploy.master.Master -i 127.0.0.1 -p 7070
spark-class org.apache.spark.deploy.worker.Worker 127.0.0.1:7070
and submitting the following Spark job...
spark-submit --class Main --master spark://127.0.0.1:7070 --deploy-mode client /path/to/app.jar
the application executes successfully, but the executor is for some reason forcefully killed:
19/05/10 09:28:31 INFO Worker: Asked to kill executor app-20190510092810-0000/0
19/05/10 09:28:31 INFO ExecutorRunner: Runner thread for executor app-20190510092810-0000/0 interrupted
19/05/10 09:28:31 INFO ExecutorRunner: Killing process!
19/05/10 09:28:31 INFO Worker: Executor app-20190510092810-0000/0 finished with state KILLED exitStatus 1
Is this normal behavior? If not, how can I prevent this from happening?
I am using Spark 2.4.0.
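For context, the application submitted above is an ordinary batch job; its code is not shown in the question, so the following Java Main is only a minimal sketch of that shape (class name and logic are assumptions):

import org.apache.spark.sql.SparkSession;

public class Main {
    public static void main(String[] args) {
        // Master URL is supplied by spark-submit (--master spark://127.0.0.1:7070).
        SparkSession spark = SparkSession.builder()
                .appName("example")
                .getOrCreate();

        // ... application logic runs to completion ...
        long count = spark.range(1000).count();
        System.out.println("count = " + count);

        // Stopping the session ends the application; in standalone mode the master
        // then asks the worker to tear the executor down, which is when the worker
        // logs "Asked to kill executor ... finished with state KILLED".
        spark.stop();
    }
}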

Related

WARN - Running Spark Locally with Docker - Initial job has not accepted any resources

I launched a Spark master and a worker on my laptop using a Docker bridge network named spark:
docker network create spark
I then ran the following commands:
docker run -ti -p 8080:8080 -p 7077:7077 -p 4040:4040 -e SPARK_NO_DAEMONIZE=true --network=spark --name spark-master apache/spark:v3.3.0 /opt/spark/sbin/start-master.sh
docker run -ti -p 8081:8081 -e SPARK_NO_DAEMONIZE=true --network=spark --name spark-worker apache/spark:v3.3.0 /opt/spark/sbin/start-worker.sh spark://<master>:7077
Once they both start, I try to launch the following application code from my IDE (it's written in Kotlin, but it doesn't matter if it's in Java):
import org.apache.spark.api.java.function.MapFunction
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.SparkSession

// Simple word-mapping job used to test the standalone cluster.
val sparkSession = SparkSession.builder()
    .appName("mapreduce")
    .master("spark://localhost:7077")
    .config("spark.dynamicAllocation.enabled", "false")
    .orCreate
var dataset: Dataset<String> = sparkSession.createDataset(
    listOf("Banana", "Car", "Glass", "Banana", "Computer", "Car"), Encoders.STRING())
dataset = dataset.map(MapFunction { c: String -> "word: $c" }, Encoders.STRING())
dataset.show()
The code works if the master is local. I opened localhost:8080 and localhost:8081 and I can see the job getting registered. So why am I getting a warning like the following?
22/09/05 01:17:38 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20220905001717-0003/1 is now EXITED (Command exited with code 1)
22/09/05 01:17:38 INFO StandaloneSchedulerBackend: Executor app-20220905001717-0003/1 removed: Command exited with code 1
22/09/05 01:17:38 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
22/09/05 01:17:38 INFO BlockManagerMaster: Removal of executor 1 requested
22/09/05 01:17:38 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20220905001717-0003/2 on worker-20220905000436-172.18.0.3-8082 (172.18.0.3:8082) with 8 core(s)
22/09/05 01:17:38 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 1
22/09/05 01:17:38 INFO StandaloneSchedulerBackend: Granted executor ID app-20220905001717-0003/2 on hostPort 172.18.0.3:8082 with 8 core(s), 1024.0 MiB RAM
22/09/05 01:17:38 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20220905001717-0003/2 is now RUNNING
22/09/05 01:17:40 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
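For reference, in this kind of setup the executors running inside the Docker bridge network must be able to open connections back to the driver running in the IDE on the host. A minimal Java sketch of the driver-side settings that are typically involved is below; the host address and port numbers are assumptions, not taken from the question:

import org.apache.spark.sql.SparkSession;

public class DockerDriverSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("mapreduce")
                .master("spark://localhost:7077")
                .config("spark.dynamicAllocation.enabled", "false")
                // Address at which the containers can reach the host-side driver (assumption).
                .config("spark.driver.host", "host.docker.internal")
                // Pin the driver and block-manager ports so they are predictable and reachable.
                .config("spark.driver.port", "35000")
                .config("spark.blockManager.port", "36000")
                .getOrCreate();

        spark.range(10).show();
        spark.stop();
    }
}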

All executors finish with state KILLED and exitStatus 1

I am trying to set up a local Spark cluster. I am using Spark 2.4.4 on a Windows 10 machine.
To start the master and one worker, I run:
spark-class org.apache.spark.deploy.master.Master
spark-class org.apache.spark.deploy.worker.Worker 172.17.1.230:7077
After submitting an application to the cluster, it finishes successfully, but the Spark web UI reports the application as KILLED, which is also what the worker logs show. I have tried running both my own examples and the examples included in the Spark installation; they all get killed with exitStatus 1.
To start the JavaSparkPi example from the Spark installation folder:
Spark> spark-submit --master spark://172.17.1.230:7077 --class org.apache.spark.examples.JavaSparkPi .\examples\jars\spark-examples_2.11-2.4.4.jar
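The submitted example essentially parallelizes random points and sums the per-point hits; a condensed Java sketch of what it does (not the exact bundled source) is:

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class PiSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("JavaSparkPi").getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        int slices = 2;
        int n = 100000 * slices;
        List<Integer> l = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            l.add(i);
        }

        // For each point, check whether it falls inside the unit circle, then sum the
        // hits -- the "reduce at JavaSparkPi.java" step seen in the log below.
        int count = jsc.parallelize(l, slices).map(i -> {
            double x = Math.random() * 2 - 1;
            double y = Math.random() * 2 - 1;
            return (x * x + y * y <= 1) ? 1 : 0;
        }).reduce((a, b) -> a + b);

        System.out.println("Pi is roughly " + 4.0 * count / n);

        // Stopping the session is what triggers "Driver commanded a shutdown" on the
        // executor and the KILLED/exitStatus 1 lines in the worker log below.
        spark.stop();
    }
}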
Part of the log after the calculation finishes:
20/01/19 18:55:11 INFO DAGScheduler: Job 0 finished: reduce at JavaSparkPi.java:54, took 4.183853 s
Pi is roughly 3.13814
20/01/19 18:55:11 INFO SparkUI: Stopped Spark web UI at http://Nikola-PC:4040
20/01/19 18:55:11 INFO StandaloneSchedulerBackend: Shutting down all executors
20/01/19 18:55:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
20/01/19 18:55:11 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/01/19 18:55:11 WARN TransportChannelHandler: Exception in connection from /172.17.1.230:58560
java.io.IOException: An existing connection was forcibly closed by the remote host
The stderr log of the completed application ends with:
20/01/19 18:55:11 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 910 bytes result sent to driver
20/01/19 18:55:11 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 910 bytes result sent to driver
20/01/19 18:55:11 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
The worker log outputs
20/01/19 18:55:06 INFO ExecutorRunner: Launch command: "C:\Program Files\Java\jdk1.8.0_231\bin\java" "-cp" "C:\Users\nikol\Spark\bin\..\conf\;C:\Users\nikol\Spark\jars\*" "-Xmx1024M" "-Dspark.driver.port=58484" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@Nikola-PC:58484" "--executor-id" "0" "--hostname" "172.17.1.230" "--cores" "12" "--app-id" "app-20200119185506-0001" "--worker-url" "spark://Worker@172.17.1.230:58069"
20/01/19 18:55:11 INFO Worker: Asked to kill executor app-20200119185506-0001/0
20/01/19 18:55:11 INFO ExecutorRunner: Runner thread for executor app-20200119185506-0001/0 interrupted
20/01/19 18:55:11 INFO ExecutorRunner: Killing process!
20/01/19 18:55:11 INFO Worker: Executor app-20200119185506-0001/0 finished with state KILLED exitStatus 1
I have tried with Spark 2.4.4 for Hadoop 2.6 and 2.7. The problem remains in both cases.
This problem is the same as this one.

Spark in Yarn Cluster Mode - Yarn client reports FAILED even when job completes successfully

I am experimenting with running Spark in yarn cluster mode (v2.3.0). We have traditionally been running in yarn client mode, but some jobs are submitted from .NET web services, so we have to keep a host process running in the background when using client mode (HostingEnvironment.QueueBackgroundWorkTime...). We are hoping we can execute these jobs in a more "fire and forget" style.
Our jobs continue to run successfully, but we see a curious entry in the logs where the yarn client that submits the job to the application manager is always reporting failure:
18/11/29 16:54:35 INFO yarn.Client: Application report for application_1539978346138_110818 (state: RUNNING)
18/11/29 16:54:36 INFO yarn.Client: Application report for application_1539978346138_110818 (state: RUNNING)
18/11/29 16:54:37 INFO yarn.Client: Application report for application_1539978346138_110818 (state: FINISHED)
18/11/29 16:54:37 INFO yarn.Client:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: N/A
ApplicationMaster host: <ip address>
ApplicationMaster RPC port: 0
queue: root.default
start time: 1543510402372
final status: FAILED
tracking URL: http://server.host.com:8088/proxy/application_1539978346138_110818/
user: p800s1
Exception in thread "main" org.apache.spark.SparkException: Application application_1539978346138_110818 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1153)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1568)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:892)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/11/29 16:54:37 INFO util.ShutdownHookManager: Shutdown hook called
We always create a SparkSession and always return sys.exit(0) (although that appears to be ignored by the Spark framework regardless of how we submit a job). We also have our own internal error logging that routes to Kafka/ElasticSearch. No errors are reported during the job run.
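For illustration, a minimal Java sketch of a driver of the shape described (the job logic is an assumption; the question's actual code is not shown):

import org.apache.spark.sql.SparkSession;

public class MainClass {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("cluster-mode-job")
                .getOrCreate();
        try {
            // ... job logic, with errors routed to the internal Kafka/ElasticSearch logging ...
            spark.range(1_000_000).count();
        } finally {
            spark.stop();
        }
        // The driver described above additionally returns the equivalent of
        // System.exit(0) at this point, after the session has been stopped.
    }
}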
Here's an example of the submit command: spark2-submit --keytab /etc/keytabs/p800s1.ktf --principal p800s1@OURDOMAIN.COM --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 4g --class com.path.to.MainClass /path/to/UberJar.jar arg1 arg2
This seems to be harmless noise, but I don't like noise that I don't understand. Has anyone experienced something similar?

Spark streaming job fails after getting stopped by Driver

I have a Spark Streaming job that reads data from Kafka and does some operations on it. I am running the job on a YARN cluster (Spark 1.4.1) that has two nodes, each with 16 GB of RAM and 16 cores.
These configs are passed to the spark-submit job:
--master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 3
The job returns this error and finishes after running for a short while:
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 11,
(reason: Max number of executor failures reached)
.....
ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0:
Stopped by driver
Update:
These logs were found too:
INFO yarn.YarnAllocator: Received 3 containers from YARN, launching executors on 3 of them.....
INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down.
....
INFO yarn.YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them.
INFO yarn.ExecutorRunnable: Starting Executor Container.....
INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down...
INFO yarn.YarnAllocator: Completed container container_e10_1453801197604_0104_01_000006 (state: COMPLETE, exit status: 1)
INFO yarn.YarnAllocator: Container marked as failed: container_e10_1453801197604_0104_01_000006. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_e10_1453801197604_0104_01_000006
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at org.apache.hadoop.util.Shell.run(Shell.java:487)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
What might be the reasons for this? I'd appreciate some help.
Thanks.
Can you please show your Scala/Java code that is reading from Kafka? I suspect you are probably not creating your SparkConf correctly.
Try something like
SparkConf sparkConf = new SparkConf().setAppName("ApplicationName");
Also try running the application in yarn-client mode and share the output.
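For reference, a driver of the shape the comment is asking about, using the Spark 1.4-era direct Kafka stream API, might look roughly like the sketch below; the broker address, topic name, and batch interval are assumptions:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaStreamSketch {
    public static void main(String[] args) throws Exception {
        // Create the conf once and hand it to the streaming context.
        SparkConf sparkConf = new SparkConf().setAppName("ApplicationName");
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(10));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "kafka-broker:9092"); // assumption
        Set<String> topics = Collections.singleton("events");          // assumption

        // Direct stream (no receiver) from Kafka.
        JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
                jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

        // Some operation on the records: keep the values and print a sample per batch.
        messages.map(t -> t._2()).print();

        jssc.start();
        jssc.awaitTermination();
    }
}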
I got the same issue, and I found one way to fix it: remove sparkContext.stop() at the end of the main function and leave the stop action to be handled at shutdown rather than calling it explicitly.
The Spark team has resolved the issue in Spark core; however, the fix has only reached the master branch so far, so we need to wait until it is included in a new release.
https://issues.apache.org/jira/browse/SPARK-12009

Spark driver program launching in `cluster` mode failed in a weird way

I'm new to Spark and I have encountered a problem: when I launch a program on a standalone Spark cluster with the following command line:
./spark-submit --class scratch.Pi --deploy-mode cluster --executor-memory 5g --name pi --driver-memory 5g --driver-java-options "-XX:MaxPermSize=1024m" --master spark://bx-42-68:7077 hdfs://bx-42-68:9000/jars/pi.jar
it throws the following error:
15/01/28 19:48:51 INFO Slf4jLogger: Slf4jLogger started
15/01/28 19:48:51 INFO Utils: Successfully started service 'driverClient' on port 59290.
Sending launch command to spark://bx-42-68:7077
Driver successfully submitted as driver-20150128194852-0003
... waiting before polling master for driver state
... polling master for driver state
State of driver-20150128194852-0003 is FAILED
The cluster master outputs the following log:
15/01/28 19:48:52 INFO Master: Driver submitted org.apache.spark.deploy.worker.DriverWrapper
15/01/28 19:48:52 INFO Master: Launching driver driver-20150128194852-0003 on worker worker-20150126133948-bx-42-151-26286
15/01/28 19:48:55 INFO Master: Removing driver: driver-20150128194852-0003
15/01/28 19:48:57 INFO Master: akka.tcp://driverClient@bx-42-68:59290 got disassociated, removing it.
15/01/28 19:48:57 INFO Master: akka.tcp://driverClient@bx-42-68:59290 got disassociated, removing it.
15/01/28 19:48:57 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://driverClient@bx-42-68:59290] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/01/28 19:48:57 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.16.42.68%3A48091-16#-1393479428] was not delivered. [9] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
And the worker that launched the driver program outputs:
15/01/28 19:48:52 INFO Worker: Asked to launch driver driver-20150128194852-0003
15/01/28 19:48:52 INFO DriverRunner: Copying user jar hdfs://bx-42-68:9000/jars/pi.jar to /data11/spark-1.2.0-bin-hadoop2.4/work/driver-20150128194852-0003/pi.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/01/28 19:48:55 INFO DriverRunner: Launch Command: "/opt/apps/jdk-1.7.0_60/bin/java" "-cp" "/data11/spark-1.2.0-bin-hadoop2.4/work/driver-20150128194852-0003/pi.jar:::/data11/spark-1.2.0-bin-hadoop2.4/sbin/../conf:/data11/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar:/data11/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/data11/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/data11/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-XX:MaxPermSize=128m" "-Dspark.executor.memory=5g" "-Dspark.akka.askTimeout=10" "-Dspark.rdd.compress=true" "-Dspark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" "-Dspark.serializer=org.apache.spark.serializer.KryoSerializer" "-Dspark.app.name=YANL" "-Dspark.driver.extraJavaOptions=-XX:MaxPermSize=1024m" "-Dspark.jars=hdfs://bx-42-68:9000/jars/pi.jar" "-Dspark.master=spark://bx-42-68:7077" "-Dspark.storage.memoryFraction=0.6" "-Dakka.loglevel=WARNING" "-XX:MaxPermSize=1024m" "-Xms5120M" "-Xmx5120M" "org.apache.spark.deploy.worker.DriverWrapper" "akka.tcp://sparkWorker@bx-42-151:26286/user/Worker" "scratch.Pi"
15/01/28 19:48:55 WARN Worker: Driver driver-20150128194852-0003 exited with failure
My spark-env.sh is:
export SCALA_HOME=/opt/apps/scala-2.11.5
export JAVA_HOME=/opt/apps/jdk-1.7.0_60
export SPARK_HOME=/data11/spark-1.2.0-bin-hadoop2.4
export PATH=$JAVA_HOME/bin:$PATH
export SPARK_MASTER_IP=`hostname -f`
export SPARK_LOCAL_IP=`hostname -f`
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=10.16.42.68:2181,10.16.42.134:2181,10.16.42.151:2181,10.16.42.150:2181,10.16.42.125:2181 -Dspark.deploy.zookeeper.dir=/spark"
SPARK_WORKER_MEMORY=43g
SPARK_WORKER_CORES=22
And my spark-defaults.conf is:
spark.executor.extraJavaOptions -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
spark.executor.memory 20g
spark.rdd.compress true
spark.storage.memoryFraction 0.6
spark.serializer org.apache.spark.serializer.KryoSerializer
However, when I launch the program in client mode with the following command, it works fine.
./spark-submit --class scratch.Pi --deploy-mode client --executor-memory 5g --name pi --driver-memory 5g --driver-java-options "-XX:MaxPermSize=1024m" --master spark://bx-42-68:7077 /data11/pi.jar
The reason it works in "client" mode and not in "cluster" mode is that there is no support for "cluster" mode in a standalone cluster (as mentioned in the Spark documentation):
Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors.
Note that cluster mode is currently not supported for standalone clusters, Mesos clusters, or Python applications.
If you look at the "Submitting Applications" section of the Spark documentation, it clearly states that support for cluster mode is not available for standalone clusters.
Reference link: http://spark.apache.org/docs/1.2.0/submitting-applications.html
Go to the above link and have a look at the "Launching Applications with spark-submit" section.
I think it will help. Thanks.
