Spark Streaming failed executor tasks - apache-spark

When I look at the Jobs tab on the Spark UI, I can see a task status like 20/20 (4 failed).
Does it mean there is data loss on the failed tasks? Aren't those failed tasks moved to a different executor?

While you should be wary of failing tasks (they are frequently an indicator of an underlying memory issue), you need not worry about data loss. The stages have been marked as successfully completed, so the tasks that failed were in fact (eventually) successfully processed.
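For context, a failed task is simply retried, possibly on a different executor, up to spark.task.maxFailures times (default 4) before the stage, and with it the job, is failed. As a minimal PySpark sketch of raising that limit (the app name and value here are only illustrative, not taken from your setup):

from pyspark.sql import SparkSession

# Illustrative only: raise the per-task retry limit from its default of 4.
spark = (
    SparkSession.builder
    .appName("task-retry-demo")             # hypothetical app name
    .config("spark.task.maxFailures", "8")  # must be set before the context starts
    .getOrCreate()
)

print(spark.conf.get("spark.task.maxFailures"))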

Related

Spark is dropping all executors at the beginning of a job

I'm trying to configure a Spark job to run with fixed resources on a Dataproc cluster; however, after the job had been running for 6 minutes I noticed that all but 7 executors had been dropped. 45 minutes later the job has not progressed at all, and I cannot find any errors or logs to explain why.
When I check the timeline in the job details, it shows all but 7 executors being removed at the 6-minute mark, with the message Container [really long number] exited from explicit termination request.
The command I am running is:
gcloud dataproc jobs submit spark --region us-central1 --cluster [mycluster] \
--class=path.to.class.app --jars="gs://path-to-jar-file" --project=my-project \
--properties=spark.executor.instances=72,spark.driver.memory=28g,spark.executor.memory=28g
My cluster is 1 + 24 n2-highmem16 instances if that helps.
EDIT: I terminated the job, reset, and tried again. The exact same thing happened at the same point in the job (Job 9 Stage 9/12)
Typically that message is expected to be associated with Spark Dynamic Allocation; if you want to always have a fixed number of executors, you can try to add the property:
...
--properties=spark.dynamicAllocation.enabled=false,spark.executor.instances=72...
However, that probably won't address the root problem in your case, aside from letting idle executors stick around. If dynamic allocation was relinquishing those executors, it is because their tasks had already completed while your remaining executors, for whatever reason, were still not done after a long time. This often indicates some kind of data skew, where the remaining executors have much more work to do than the ones that already finished, unless the remaining executors were simply all equally loaded as part of a smaller phase of the pipeline, perhaps a "reduce" phase.
If you're seeing lagging tasks out of a large number of equivalent tasks, you might consider adding a repartition() step to your job to chop it up more finely, in the hope of spreading out those skewed partitions, or otherwise changing the way you group or partition your data.
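As a rough PySpark sketch of that repartition() suggestion (the path, partition count and column name are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("repartition-demo").getOrCreate()

df = spark.read.parquet("gs://some-bucket/input/")   # placeholder path

# More partitions means more, smaller tasks; repartitioning by a well-distributed
# key can also help spread out a skewed join or aggregation.
df = df.repartition(720, "some_key")                 # hypothetical count and column

df.write.mode("overwrite").parquet("gs://some-bucket/output/")  # placeholder path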
Fixed. The job was running out of resources. Allocated some more executors to the job and it completed.

What does "failed" mean in a completed Spark job?

I have jobs that repartition huge datasets in Parquet format, and the file system used is s3a (S3).
Browsing through the Spark UI, I stumbled upon a job which has incomplete tasks but is marked as successful.
The different categories of jobs are: i) Active, ii) Completed, iii) Failed.
I am unable to deduce why this job shows failures, nor am I able to tell whether it actually failed, given that there is a separate category for failed jobs.
How do I resolve this ambiguity?

Distribution of spark code into jobs, stages and tasks [duplicate]

This question already has answers here:
What is the concept of application, job, stage and task in spark?
(5 answers)
Closed 5 years ago.
As per my understanding, each action in the whole job is translated to a job, while each shuffle stage within a job is translated into a stage, and each partition of each stage's input is translated into a task.
Please correct me if I am wrong; I am unable to find any actual definition.
Invoking an action inside a Spark application triggers the launch of a Spark job to fulfill it. Spark examines the DAG and formulates an execution plan. The execution plan consists of assembling the job's transformations into stages.
When Spark optimises code internally, it splits it into stages, where each stage consists of many little tasks. Each stage contains a sequence of transformations that can be completed without shuffling the full data.
Every task for a given stage is a single-threaded atom of computation consisting of exactly the same code, just applied to a different set of data. The number of tasks is determined by the number of partitions.
To manage the job flow and schedule tasks, Spark relies on an active driver process.
The executor processes are responsible for executing this work, in the form of tasks, as well as for storing any data that the user chooses to cache.
A single executor has a number of slots for running tasks and will run many concurrently throughout its lifetime.
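As a small, hedged PySpark illustration of that mapping (one job per action, a new stage at each shuffle boundary, and one task per partition of each stage's input):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("job-stage-task-demo").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize(range(1000000), numSlices=8)  # 8 partitions -> 8 tasks per stage

pairs = rdd.map(lambda x: (x % 10, 1))             # narrow transformation: same stage
counts = pairs.reduceByKey(lambda a, b: a + b)     # shuffle: starts a new stage

# collect() is the action: it triggers one job with two stages (map side and
# reduce side), each made up of one task per partition.
print(counts.collect())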

spark - continue job processing after task failure

Is there a way to tell spark to continue a job after a single task failed?
Or even better:
Can we configure a job to fail only if a certain percent of the tasks fails?
My scenario is like this:
I'm using pyspark to do some parallel computations.
I have a job that is composed of thousands of tasks (which are more or less independent from each other - I can allow some to fail).
One task fails (throws an exception), and after a few retries for this task the entire job is aborted.
Is there a way to change this (weird) behavior?
No, there is no such feature in Spark.
There is an open JIRA ticket (SPARK-10781) for it, but I don't see any activity there.
You can do it in MapReduce using the configs mapreduce.map.failures.maxpercent and mapreduce.reduce.failures.maxpercent.
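A common workaround, rather than a built-in Spark feature, is to catch exceptions inside the task so that an individual record failure never propagates and fails the task. A minimal PySpark sketch, where risky_work is just a made-up stand-in for your computation:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tolerate-failures-demo").getOrCreate()
sc = spark.sparkContext

def risky_work(x):            # hypothetical computation that sometimes throws
    return 100 // (x % 7)

def safe_work(x):
    try:
        return ("ok", x, risky_work(x))
    except Exception as e:    # record the error instead of failing the task
        return ("failed", x, str(e))

results = sc.parallelize(range(1000)).map(safe_work).cache()
print(results.filter(lambda r: r[0] == "ok").count(),
      results.filter(lambda r: r[0] == "failed").count())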

What is scheduler delay in spark UI's event timeline

I am using a YARN environment to run Spark programs,
with the option --master yarn-cluster.
When I open a Spark application's application master, I see a lot of Scheduler Delay in a stage. Some of the delays are even more than 10 minutes. I wonder what they are and why they take so long?
Update:
Usually operations like aggregateByKey take much more time (i.e. scheduler delay) before the executors really start doing tasks. Why is that?
Open the "Show Additional Metrics" (click the right-pointing triangle so it points down) and mouse over the check box for "Scheduler Delay". It shows this tooltip:
Scheduler delay includes time to ship the task from the scheduler to the executor, and time to send the task result from the executor to the scheduler. If scheduler delay is large, consider decreasing the size of tasks or decreasing the size of task results.
The scheduler is part of the driver (which, in yarn-cluster mode, runs inside the application master); it divides the job into stages of tasks and works with the underlying cluster infrastructure to distribute them around the cluster.
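As a hedged sketch of the tooltip's two suggestions, smaller tasks and smaller task results (the paths and partition count are only illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("scheduler-delay-demo").getOrCreate()

df = spark.read.parquet("s3a://some-bucket/input/")  # placeholder path

# Smaller tasks: with more partitions, each task covers less data.
df = df.repartition(400)                             # hypothetical count

# Smaller task results: aggregate or write on the executors rather than pulling
# large results back to the driver with collect().
df.write.mode("overwrite").parquet("s3a://some-bucket/output/")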
Have a look at TaskSetManager's class comment:
..Schedules the tasks within a single TaskSet in the TaskSchedulerImpl. This class keeps track of each task, retries tasks if they fail (up to a limited number of times), and handles locality-aware scheduling for this TaskSet via delay scheduling...
I assume it is the result of the following work on delay scheduling, on which Matei Zaharia (co-founder and Chief Technologist of Databricks, which develops Spark) also worked: https://cs.stanford.edu/~matei/
Thus, Spark checks the locality of a pending task's partitions. If the locality level is low (e.g. the data is not in the local JVM), the task is not directly killed or ignored; instead it gets a launch delay, which is fair.
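The time Spark waits at each locality level before falling back to a less local executor is governed by spark.locality.wait (default 3s). A minimal sketch of lowering it, with a purely illustrative value:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("locality-wait-demo")         # hypothetical app name
    .config("spark.locality.wait", "1s")   # wait less before giving up on data locality
    .getOrCreate()
)

print(spark.conf.get("spark.locality.wait"))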
