Why does a Spark executor receive SIGTERM? - apache-spark

I am using the Spark core API (not Streaming, SQL, etc.).
I often see this kind of error in the dumped Spark logs:
Spark environment: 1.3.1 yarn-client
ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
Who triggers the SIGTERM: YARN, Spark, or me?
Will this signal terminate the Spark executor? If not, how will it affect the Spark program?
I did press Ctrl+C, but that would be SIGINT. If YARN killed the executor, that would be SIGKILL.

You will likely find the reason in the YARN logs. If you have enabled log aggregation, you can run
yarn logs -applicationId [app_id]
and look for exceptions.

Related

Spark job blocked and runs indefinitely

We encounter a problem with a Spark 1.6 job (on YARN) that never ends when several jobs are launched simultaneously.
We found that by launching the Spark job in yarn-client mode we do not have this problem, unlike launching it in yarn-cluster mode.
That could be a clue to finding the cause.
We changed the code to add a sparkContext.stop().
Indeed, the SparkContext was created (val sparkContext = createSparkContext) but never stopped. This change has reduced the number of jobs that remain blocked, but we still have some blocked jobs.
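As an illustration, here is a minimal sketch of that pattern, wrapping the job body in try/finally so stop() always runs (createSparkContext stands in for whatever builds the context in the original code; the app name is made up):

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-in for the post's createSparkContext helper.
def createSparkContext: SparkContext =
  new SparkContext(new SparkConf().setAppName("blocked-job-example"))

val sparkContext = createSparkContext
try {
  // ... job body: build RDDs, run actions ...
} finally {
  // Always stop the context so the YARN application can finish and release its containers.
  sparkContext.stop()
}

Putting stop() in a finally block means the context is also released when the job fails with an exception, not only on the success path.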
By analyzing the logs we found this message, which repeats without stopping:
17/09/29 11:04:37 DEBUG SparkEventPublisher: Enqueue SparkListenerExecutorMetricsUpdate(1,WrappedArray())
17/09/29 11:04:41 DEBUG ApplicationMaster: Sending progress
17/09/29 11:04:41 DEBUG ApplicationMaster: Number of pending allocations is 0. Sleeping for 5000.
It seems that the job blocks when we call newAPIHadoopRDD to get data from HBase. That may be the issue!
Does anyone have any idea about this issue?
Thank you in advance
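For context, the newAPIHadoopRDD call mentioned above typically looks like the sketch below (the table name and SparkContext setup are assumptions; the actual HBase configuration used in the original job is not shown in the post):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("hbase-read-example"))

// HBase configuration; "my_table" is a placeholder table name.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

// Reads HBase rows as (row key, Result) pairs through the new Hadoop API.
val hbaseRdd = sc.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

println(s"Row count: ${hbaseRdd.count()}")

If the HBase connection cannot be established or hangs, tasks built on this RDD can appear to stall without logging errors, which would be consistent with the behaviour described above.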

Spark streaming job exited abruptly - RECEIVED SIGNAL TERM

The running Spark Streaming job, which is supposed to run continuously, exited abruptly with the following error (found in the executor logs):
2017-07-28 00:19:38,807 [SIGTERM handler] ERROR org.apache.spark.util.SignalUtils$$anonfun$registerLogger$1$$anonfun$apply$1 (SignalUtils.scala:43) - RECEIVED SIGNAL TERM
The Spark Streaming job ran for ~62 hours before receiving this signal.
I couldn't find any other ERROR/WARN messages in the executor logs. Unfortunately I haven't set up driver logging yet, so I cannot dig deeper into this specific issue.
I am using a Spark cluster in standalone mode.
Is there any reason why the driver might send this signal (after the streaming job ran fine for more than 60 hours)?

Spark streaming fails to launch worker on worker failure

I'm trying to set up a Spark cluster and I've come across an annoying bug...
When I submit a Spark application it runs fine on the workers until I kill one (for example by using stop-slave.sh on the worker node).
When the worker is killed, Spark then tries to relaunch an executor on an available worker node, but it fails every time (I know because the web UI displays either FAILED or LAUNCHING for the executor; it never succeeds).
I can't seem to find any help, even in the documentation, so can someone assure me that Spark can and will try to relaunch a worker on an available node if one is killed (on the same node where the worker previously ran, or on another available node if the node where it previously ran is unreachable)?
Here's the output from the worker node:
Spark worker error
Thank you for your help !

What is the difference between Spark executor states Exited vs Killed?

I'm working with Spark 1.2.1.
When I run Spark jobs I sometimes get the executor state "Exited" and sometimes "Killed"; in both scenarios the job finishes successfully and I invoke SparkContext.stop()...
I fail to understand the meaning of those states.
What is the difference between Spark executor states Exited vs Killed?
Exited - It means that the executor finished its processing and exited cleanly, without any errors or exceptions.
Killed - It means that the executor was killed by the Worker, which stopped and asked the executor to terminate. This can happen for many reasons, for example a user-driven action, or the executor finished its processing but for some reason had not yet exited while the Worker was shutting down, so the Worker had to kill it.
Also, as a good practice, we should invoke the SparkContext.stop() method at the end of the job. This does not guarantee that you will always get the "Exited" status, but it does ensure that cleanup is executed and resources are de-allocated.

Spark: executor.CoarseGrainedExecutorBackend: Driver Disassociated

I am learning how to use Spark and I have a simple program. When I run the jar file it gives me the right result, but there are some errors in the stderr file, like this:
15/05/18 18:19:52 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@localhost:51976] -> [akka.tcp://sparkDriver@172.31.34.148:60060] disassociated! Shutting down.
15/05/18 18:19:52 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@172.31.34.148:60060] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
You can see the whole stderr file here:
http://172.31.34.148:8081/logPage/?appId=app-20150518181945-0026&executorId=0&logType=stderr
I searched for this problem and found this:
Why spark application fail with "executor.CoarseGrainedExecutorBackend: Driver Disassociated"?
I increased spark.yarn.executor.memoryOverhead as it suggested, but that doesn't work.
I have just one master node (8 GB memory), and in Spark's slaves file there is only one slave node: the master itself. I submit like this:
./bin/spark-submit --class .... --master spark://master:7077 --executor-memory 6G --total-executor-cores 8 /path/..jar hdfs://myfile
I don't know what the executor is and what the driver is... sorry about that.
Can anybody help me?
If the Spark driver fails, it gets disassociated (from the YARN ApplicationMaster). Try the following to make it more fault-tolerant:
spark-submit with --supervise flag on Spark Standalone cluster
yarn-cluster mode on YARN
spark.yarn.driver.memoryOverhead parameter for increasing Driver's memory allocation on YARN
Note: driver supervision (spark.driver.supervise) is not supported on a YARN cluster (yet).
An overview of driver vs. executor (and others) can be found at http://spark.apache.org/docs/latest/cluster-overview.html or https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-architecture.html
They are Java processes that can run on different machines or on the same machine, depending on your configuration. The driver contains the SparkContext and declares the RDD transformations (think of the execution plan, if I'm not mistaken), then communicates that to the Spark master, which creates task definitions and asks the cluster manager (its own standalone manager, YARN, or Mesos) for resources (worker nodes); those tasks in turn get sent to executors for execution.
Executors communicate certain information back to the master, and as far as I understand, if the driver encounters a problem or crashes, the master takes note and tells the executor (which in turn logs) what you see: "driver is disassociated". This can happen for many reasons, but the most common one is that the Java process (the driver) runs out of memory (try increasing spark.driver.memory).
There are some differences when running on YARN vs. standalone vs. Mesos, but I hope this helps. If the driver is disassociated, the Java process running as the driver likely encountered an error; the master logs might have something, and I'm not sure whether there are driver-specific logs. Hopefully someone more knowledgeable than me can provide more info.
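As a rough illustration of that split (the app name and data below are made up), here is a minimal sketch of a driver program: the SparkContext and the RDD lineage are declared in the driver, while the per-partition map and reduce work runs in the executor JVMs:

import org.apache.spark.{SparkConf, SparkContext}

object MinimalDriver {
  def main(args: Array[String]): Unit = {
    // The driver process owns the SparkContext and declares the RDD lineage.
    val sc = new SparkContext(new SparkConf().setAppName("minimal-driver-example"))

    val numbers = sc.parallelize(1 to 1000, numSlices = 8)
    // The transformation and action are planned here in the driver,
    // but the per-partition work is executed by the executors.
    val sumOfSquares = numbers.map(n => n.toLong * n).reduce(_ + _)
    println(s"Sum of squares: $sumOfSquares")

    sc.stop() // release the executors and let the cluster manager clean up
  }
}

If this driver JVM dies or runs out of memory, the executors lose their connection to it, which is when messages like "Driver Disassociated" show up in the executor logs.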
