Do successful tasks also get reprocessed on an executor crash? - apache-spark

I am seeing about 3018 failed tasks for the job because about 4 executors died.
The Executors summary in the Spark UI shows completely different statistics: out of 3018 tasks, about 2994 completed properly. My questions are:
Will they be re-tried again?
Is there a config to override/limit this?

After monitoring the job and manually validating the attempt counts, even for successful tasks, I realised:
Will they be re-tried again?
- Yes, even the successful tasks are retried.
Is there a config to override/limit this?
- Did not find any config to override this behaviour.
If an executor (Kubernetes pod) dies (for example with an OOM or a timeout), all of its tasks, even the ones that completed successfully, are re-executed. One of the main reasons is that the shuffle writes from the executor are lost along with the executor itself.
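One hedged way to do the manual validation mentioned above is to log the attempt number inside each task; a minimal sketch, assuming a trivial job (the object name and partition count are made up for illustration):

import org.apache.spark.{SparkConf, SparkContext, TaskContext}

object AttemptNumberCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("attempt-number-check"))
    try {
      sc.parallelize(1 to 1000, numSlices = 100).foreachPartition { _ =>
        val ctx = TaskContext.get()
        // attemptNumber() > 0 means this partition is being re-executed, for
        // example because the executor that first ran it (and held its shuffle
        // output) was lost.
        println(s"stage=${ctx.stageId()} partition=${ctx.partitionId()} attempt=${ctx.attemptNumber()}")
      }
    } finally {
      sc.stop()
    }
  }
}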

Related

Spark keeps relaunching executors after yarn kills them

I was testing with Spark in YARN cluster mode.
The Spark job runs in a lower-priority queue, and its containers are preempted when a higher-priority job arrives.
However, Spark relaunches the containers right after they are killed, and the higher-priority app kills them again, so the apps are stuck in this deadlock.
Infinite retry of executors is discussed here.
I found the trace below in the logs.
2019-05-20 03:40:07 [dispatcher-event-loop-0] INFO TaskSetManager :54 Task 95 failed because while it was being computed, its executor exited for a reason unrelated to the task. Not counting this failure towards the maximum number of failures for the task.
So it seems that any retry count I set is not even considered.
Is there a flag to indicate that all failures in an executor should be counted, and that the job should fail when maxFailures is reached?
spark version 2.11
Spark distinguishes between code throwing an exception and external issues, i.e. code failures and container failures.
But Spark does not count preemption as a container failure.
See ApplicationMaster.scala: this is where Spark decides to quit if the container failure limit is hit.
It gets the number of failed executors from YarnAllocator.
YarnAllocator updates its failed-container count in some cases, but not for preemptions; see case ContainerExitStatus.PREEMPTED in the same function.
We use Spark 2.0.2, where the code is slightly different but the logic is the same.
The fix seems to be to update the failed-containers collection for preemptions too.
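A simplified, paraphrased sketch of that classification (not the actual Spark source; the function name and parameter are illustrative only):

import org.apache.hadoop.yarn.api.records.{ContainerExitStatus, ContainerStatus}

// Illustrative paraphrase: only containers whose exit is classified as a
// failure increment the counter that ApplicationMaster compares against the
// maximum-failures limit; preempted containers are deliberately left out.
def updateFailedExecutors(status: ContainerStatus, failedExecutors: Int): Int =
  status.getExitStatus match {
    case ContainerExitStatus.SUCCESS   => failedExecutors     // clean exit, not a failure
    case ContainerExitStatus.PREEMPTED => failedExecutors     // preemption, not counted as a failure
    case _                             => failedExecutors + 1 // genuine container failure
  }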

How to handle executor failure in apache spark

I ran the job using spark-submit and during that time we lost an executor. At that point, can we recover or not? If we can, how do we recover, and how do we get that executor back?
You cannot handle executor failures programmatically in your application, if that's what you are asking.
You can configure Spark configuration properties that guide the actual job execution, including how YARN schedules jobs and handles task and executor failures.
https://spark.apache.org/docs/latest/configuration.html#scheduling
Some important properties you may want to check out (a minimal sketch of setting them follows the list):
spark.task.maxFailures (default=4): Number of failures of any particular task before giving up on the job. The total number of failures spread across different tasks will not cause the job to fail; a particular task has to fail this number of attempts. Should be greater than or equal to 1. Number of allowed retries = this value - 1.
spark.blacklist.application.maxFailedExecutorsPerNode (default=2): (Experimental) How many different executors must be blacklisted for the entire application, before the node is blacklisted for the entire application. Blacklisted nodes will be automatically added back to the pool of available resources after the timeout specified by spark.blacklist.timeout. Note that with dynamic allocation, though, the executors on the node may get marked as idle and be reclaimed by the cluster manager.
spark.blacklist.task.maxTaskAttemptsPerExecutor (default=1): (Experimental) For a given task, how many times it can be retried on one executor before the executor is blacklisted for that task.
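As a minimal sketch, the properties above can be set on the SparkConf before submitting; the values here are only examples, not recommendations, and spark.blacklist.enabled is an extra assumption needed for the blacklist settings to take effect:

import org.apache.spark.SparkConf

// Illustrative values only; adjust to your workload.
val conf = new SparkConf()
  .setAppName("failure-handling-example")
  .set("spark.task.maxFailures", "8")
  .set("spark.blacklist.enabled", "true")
  .set("spark.blacklist.application.maxFailedExecutorsPerNode", "2")
  .set("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")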

Oozie: kill a job after a timeout

Sorry, but I can't find the configuration point I need. I schedule Spark applications; sometimes they do not succeed after 1 hour, and in that case I want to kill the task automatically (because I am sure it will never succeed, and another scheduled run may start).
I found a timeout configuration, but as I understand it, it is used to delay the start of a workflow.
So is there a kind of 'living' timeout?
Oozie cannot kill a workflow that it triggered. However, you can ensure that only a single workflow instance runs at a time by setting Concurrency = 1 in the Coordinator.
You can also have a second Oozie workflow monitor the status of the Spark job.
In any case, you should investigate the root cause of the Spark job not succeeding or being blocked.

Spark job blocked and runs indefinitely

We are encountering a problem with a Spark 1.6 job (on YARN) that never ends when several jobs are launched simultaneously.
We found that we do not have this problem when launching the Spark job in yarn-client mode, unlike in yarn-cluster mode.
That could be a lead to find the cause.
We changed the code to add a sparkContext.stop().
Indeed, the SparkContext was created (val sparkContext = createSparkContext) but never stopped. This change allowed us to decrease the number of jobs that remain blocked, but we still have some blocked jobs.
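A minimal sketch of that change; createSparkContext is the post's own helper, runJob is a hypothetical placeholder for the job logic, and wrapping it in try/finally is our assumption about how to guarantee the stop:

val sparkContext = createSparkContext  // helper from the original code
try {
  runJob(sparkContext)                 // hypothetical placeholder for the actual job logic
} finally {
  sparkContext.stop()                  // always stop, even if the job body throws
}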
By analyzing the logs, we found this log line that repeats without stopping:
17/09/29 11:04:37 DEBUG SparkEventPublisher: Enqueue SparkListenerExecutorMetricsUpdate(1,WrappedArray())
17/09/29 11:04:41 DEBUG ApplicationMaster: Sending progress
17/09/29 11:04:41 DEBUG ApplicationMaster: Number of pending allocations is 0. Sleeping for 5000.
It seems that the job blocks when we call newAPIHadoopRDD to get data from HBase; that may be the issue (a sketch of what that call typically looks like follows at the end of this post).
Does someone have any idea about this issue?
Thank you in advance
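For reference, a typical newAPIHadoopRDD read from HBase looks roughly like the sketch below; it uses the standard HBase TableInputFormat classes and a hypothetical table name, not the poster's actual code:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")  // hypothetical table name

// Building the RDD is lazy; the HBase scan only runs when an action is called,
// which is where a blocked connection would show up.
val hbaseRdd = sparkContext.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])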

Scheduler delay time in spark and YARN

I'm doing some instrumentation in Spark, and I've realised that some of my tasks take a really long time to complete because of the scheduler delay time that can be extracted from TaskMetrics.
I know there are already some questions about this topic, like "What is scheduler delay in spark UI's event timeline", but the answers there have not been accepted, and they say that a task waiting for an open slot counts as scheduler delay, which I think is not true (as far as I know, if a task doesn't have a slot on an executor it doesn't start generating metrics).
I'm a bit confused about where this delay really starts. I was wondering whether this delay also takes into account the period between the app being accepted by the YARN client and the first job of the app being submitted; in other words, between the moment the app is accepted and the moment it is running.
I checked directly by launching one app with few resources available in the cluster. It stayed in the queue until enough executors could be launched for the stage, and then the yarn.Client launched the stage in the cluster. The Spark metrics do not count this time in the queue as any kind of delay. It also doesn't matter if you have more tasks than cores, as in the Stack Overflow answer linked above: the tasks are allocated to the executors as they become available.
In short, scheduler delay time only covers sending the task to the executor. If there is a delay here, YARN is not the bottleneck; the bottleneck is the load on the nodes involved (normally the driver and the worker nodes running the executors for the app).
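As a rough illustration of that definition, a SparkListener can approximate per-task scheduler delay from the same task metrics. This is a sketch under the assumption that the delay is the task duration minus the time spent deserializing, running, and serializing the result; it is not quoted from Spark's own code:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Prints an approximate scheduler delay for every finished task.
class SchedulerDelayListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val info = taskEnd.taskInfo
    val metrics = taskEnd.taskMetrics
    if (info != null && metrics != null) {
      val approxDelayMs = math.max(0L,
        info.duration -
          metrics.executorDeserializeTime -
          metrics.executorRunTime -
          metrics.resultSerializationTime)
      println(s"task ${info.taskId}: approx scheduler delay $approxDelayMs ms")
    }
  }
}

// Register it with: sparkContext.addSparkListener(new SchedulerDelayListener)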
