Executor app finished with state KILLED exitStatus 143 - apache-spark

Hello, I am using Hive 2.3.0 with Spark 2.0.2. When I try to run Hive commands on Spark from the Hive console, the job gets stuck and I have to kill it manually.
The following error message appears in the Spark worker log. Could you please advise if I am doing something wrong?
INFO worker.Worker: Executor app-20171114093447-0000/0 finished with state KILLED exitStatus 143

I was able to fix it by configuring the correct event log directory in Spark.
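For reference, a minimal sketch of the settings involved, assuming a spark-defaults.conf on the Hive host; the HDFS path is only a placeholder and the directory must already exist (the same properties can also be passed from the Hive console with SET spark.eventLog.dir=...):
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs:///spark-logs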

Related

Application failed 2 times due to AM Container, exited with exitcode -104

I am running a Spark application with two input files and a jar file which is taken up from Amazon S3 bucket. I am creating a cluster using AWS CLI with instance type as m5.12xlarge and instance-count as 11 and spark properties as:
--deploy-mode cluster
--num-executors 10
--executor-cores 45
--executor-memory 155g
My Spark job ran for some time, then failed and restarted automatically, ran again for a while, and then showed these diagnostics (pulled from the logs):
diagnostics: Application application_1557259242251_0001 failed 2 times due to AM Container for appattempt_1557259242251_0001_000002 exited with exitCode: -104
Failing this attempt.Diagnostics: Container [pid=11779,containerID=container_1557259242251_0001_02_000001] is running beyond physical memory limits. Current usage: 1.4 GB of 1.4 GB physical memory used; 3.5 GB of 6.9 GB virtual memory used. Killing container.
Dump of the process-tree for container_1557259242251_0001_02_000001 :
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Exception in thread "main" org.apache.spark.SparkException: Application application_1557259242251_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1165)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1520)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/05/07 20:03:35 INFO ShutdownHookManager: Shutdown hook called
19/05/07 20:03:35 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-3deea823-45e5-4a11-a5ff-833b01e6ae79
19/05/07 20:03:35 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-d6c3f8b2-34c6-422b-b946-ad03b1ee77d6
Command exiting with ret '1'
I am not able to figure out what the problem is.
I have tried changing the instance type and lowering the executor memory and executor cores, but the same problem keeps occurring.
Sometimes the same configuration settings complete successfully and results are generated, but many times these errors come up.
Can someone please help?
If you are providing more than one input file to the Spark job, package the Python files into a zip and then submit it.
Step 1: Make a zip file
zip abc.zip file1.py file2.py
Step 2: Submit the job with the zip file
spark2-submit --master yarn --deploy-mode cluster --py-files /home/abc.zip /home/main_program_file.py
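A separate, hedged observation: exit code -104 generally means the ApplicationMaster container exceeded its physical memory limit (1.4 GB of 1.4 GB used in your diagnostics). In cluster mode the driver runs inside the AM, so explicitly raising the driver memory may also help; the 4g value below is only an illustration:
spark2-submit --master yarn --deploy-mode cluster --driver-memory 4g --py-files /home/abc.zip /home/main_program_file.py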

spark error: in state DEFINE instead of RUNNING

I'm using spark-shell to run a Spark HBase script.
When I run this command:
val job = Job.getInstance(conf)
I get this error:
java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
The error is due to running the code in spark-shell. Use spark-submit instead; this should solve your problem. A minimal sketch of that route follows.
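A sketch only, assuming Spark 2.x and the usual HBase TableInputFormat setup; the object name, table name, and jar name are placeholders, not part of the original question:
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.SparkSession

object HBaseJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("HBaseJob").getOrCreate()
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "my_table")  // placeholder table name
    // Outside the REPL nothing echoes the Job value, so the state check that fails in spark-shell is not triggered
    val job = Job.getInstance(conf)
    // ... configure the job and read from HBase here ...
    spark.stop()
  }
}
Package this as a jar and run it with, for example:
spark-submit --class HBaseJob --master yarn --deploy-mode cluster hbase-job.jar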

Why does Spark exit with exitCode: 16?

I am using Spark 2.0.0 with Hadoop 2.7 in yarn-cluster mode. Every time, I get the following error:
17/01/04 11:18:04 INFO spark.SparkContext: Successfully stopped SparkContext
17/01/04 11:18:04 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
17/01/04 11:18:04 INFO util.ShutdownHookManager: Shutdown hook called
17/01/04 11:18:04 INFO util.ShutdownHookManager: Deleting directory /tmp/hadoop-hduser/nm-local-dir/usercache/harry/appcache/application_1475261544699_0833/spark-42e40ac3-279f-4c3f-ab27-9999d20069b8
17/01/04 11:18:04 INFO spark.SparkContext: SparkContext already stopped.
However, I do get the correct printed output.
The same code works fine in Spark 1.4.0-Hadoop 2.4.0 where I do not see any exit codes.
The issue ".sparkStaging not cleaned if application exited incorrectly" (https://issues.apache.org/jira/browse/SPARK-17340) started after Spark 1.4 (affects versions 1.5.2, 1.6.1, 2.0.0).
The issue is: when running Spark in yarn-cluster mode and killing the application, .sparkStaging is not cleaned.
When this issue happens, exit code 16 is raised in Spark 2.0.x:
ERROR ApplicationMaster: RECEIVED SIGNAL TERM
INFO ApplicationMaster: Final app status: FAILED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
Is it possible that something in your code is killing the application?
If so, it wouldn't be seen in Spark 1.4 but would be seen in Spark 2.0.0.
Please search your code for "exit"; if you have such a call, the error won't show in Spark 1.4 but will in Spark 2.0.0. A quick way to search is shown below.
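A hedged example of such a search, assuming a Scala/Java source tree under src/ (adapt the pattern, e.g. to sys.exit, for Python code):
grep -rn "System.exit" src/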
It seems that excess virtual memory is being used by the JVM; try setting the property yarn.nodemanager.vmem-check-enabled to false in your yarn-site.xml, for example as sketched below.
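A minimal sketch of the yarn-site.xml entry, to be applied on the NodeManager hosts (restart YARN afterwards):
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>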

Spark 1.5.1 standalone cluster - wrong Akka remoting config?

Taking my first steps with Spark, I'm facing problems submitting jobs to the cluster from application code. Digging through the logs, I noticed periodic WARN messages in the master log:
15/10/08 13:00:00 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver#192.168.254.167:64014] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
The problem is that this IP address does not exist on our network and wasn't configured anywhere. The same wrong IP shows up in the worker log when the worker tries to execute the task (the wrong IP is passed as --driver-url):
15/10/08 12:58:21 INFO worker.ExecutorRunner: Launch command: "/usr/java/latest//bin/java" "-cp" "/path/spark/spark-1.5.1-bin-hadoop2.6/sbin/../conf/:/path/spark/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/path/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/path/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/path/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/path/hadoop/2.6.0//etc/hadoop/" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=64014" "-Dspark.driver.port=53411" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver#192.168.254.167:64014/user/CoarseGrainedScheduler" "--executor-id" "39" "--hostname" "192.168.10.214" "--cores" "16" "--app-id" "app-20151008123702-0003" "--worker-url" "akka.tcp://sparkWorker#192.168.10.214:37625/user/Worker"
15/10/08 12:59:28 INFO worker.Worker: Executor app-20151008123702-0003/39 finished with state EXITED message Command exited with code 1 exitStatus 1
Any idea what I did wrong and how this can be fixed?
The Java version is 1.8.0_20, and I'm using pre-built Spark binaries.
Thanks!
Maybe my answer to a similar question, which also involves "Association with remote system has failed", will give you some clues.
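In cases like this the usual culprit is that the driver advertises an address the workers cannot reach. A hedged sketch of the fix: set spark.driver.host (or export SPARK_LOCAL_IP on the driver machine) to an address that is routable from the workers; the master host, application jar, and the 192.168.10.50 value below are only placeholders:
./bin/spark-submit --master spark://<master-host>:7077 --conf spark.driver.host=192.168.10.50 your-app.jar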

Spark cluster set up error

With some research over the internet, I can use
sbin/start-master.sh
to start the Spark master service on my Ubuntu Linux computers, and
bin/spark-class org.apache.spark.deploy.worker.Worker spark://...
to bring the slave node service up and running.
The good news was that I could see the local web page with the workers shown as alive.
However, after that, I tried to launch the shell:
MASTER=spark://localhost:7077 bin/spark-shell
but it returned:
sparkMaster#localhost:7077 ...
I therefore modified the command to
MASTER=spark://sparkuser#localhost:7077 bin/spark-shell
where sparkuser is the user account connected to the two nodes.
However, with this modification, I got:
ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
and when I tried
MASTER=local-cluster[3,2,1024] bin/spark-shell
The Spark logo did come up in the shell, but I was afraid the slave nodes were not actually connected.
Did I miss anything in the Spark cluster setup?
Just launch spark-shell against the cluster with the --master flag, as follows:
./bin/spark-shell --master spark://localhost:7077
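One hedged caveat: the URL after --master should match the spark://host:port string shown at the top of the master's web UI (port 8080 by default); if the workers run on other machines, use the master's hostname rather than localhost. The hostname below is a placeholder:
./bin/spark-shell --master spark://<master-hostname>:7077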
