SLURM add nodes to suspended job

Is it possible to add nodes (cores) to a suspended job?
As an example:
scontrol update jobid= then something to accomplish the task
Thank you in advance.
Regards,
Wahi

According to the Slurm documentation, changing the size of a suspended job is not possible; only pending and running jobs can be resized.
Job(s) changing size must not be in a suspended state...
Hope it helps!
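For a running (or resumed) job, the resize commands from the Slurm FAQ look roughly like this (a sketch with a hypothetical job id; note that, as the related answer below explains, scontrol can only shrink a running job's allocation, not grow it):
$ scontrol resume 1234                    # a suspended job must be resumed first
$ scontrol update JobId=1234 NumNodes=1   # then a new (smaller) node count can be requested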

Related

Change CPU count for RUNNING Slurm Jobs

I have a SLURM cluster and a RUNNING job where I have requested 60 threads by
#SBATCH --cpus-per-task=60
(I am sharing threads on a node using cgroups)
I now want to reduce the amount of threads to 30.
$ scontrol update jobid=274332 NumCPUs=30
Job is no longer pending execution for job 274332
The job still has 60 threads allocated.
$ scontrol show job 274332
JobState=RUNNING Reason=None Dependency=(null)
NumNodes=1 NumCPUs=60 NumTasks=1 CPUs/Task=60 ReqB:S:C:T=0:0:*:*
What would be the correct way to accomplish this?
Thanks!
In the current version of Slurm, scontrol only allows reducing the number of nodes allocated to a running job, not the number of CPUs (or the memory).
The FAQ says:
Use the scontrol command to change a job's size either by specifying a new node count (NumNodes=) for the job or identify the specific nodes (NodeList=) that you want the job to retain.
(Emphasis mine)
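Concretely, the two supported forms for shrinking a running job are (a sketch; the node name is hypothetical):
$ scontrol update JobId=274332 NumNodes=1         # shrink to a new node count
$ scontrol update JobId=274332 NodeList=node001   # or name the specific nodes the job should retain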

Apache Spark Executors Dead - is this the expected behaviour?

I am running a pipeline to process my data on Spark. It seems like my executors die every now and then when they get near the Storage Memory limit. The job continues and eventually finishes, but is this the normal behaviour? Is there something I should be doing to prevent this from happening? Every time this happens, the job hangs for some time until (and I am guessing here) YARN provides some new executors for the job to continue.
I think this turned out to be related to a YARN bug. It doesn't happen anymore after I set the following YARN options, as suggested in section 4 of this blog post:
Best practice 5: Always set the virtual and physical memory check flag
to false.
"yarn.nodemanager.vmem-check-enabled":"false",
"yarn.nodemanager.pmem-check-enabled":"false"

Is it possible to make Spark run a whole TaskSet on a single executor?

I run a single Spark job on a local cluster (1 master, 2 workers/executors).
From what I have understood so far, all stages of a job are split into tasks, each stage has its own task set, and each task of that TaskSet is scheduled on an executor of the local cluster.
I want to make Spark's TaskSetManager schedule all tasks of a TaskSet (of a single stage) on the same (local) executor, but I have not figured out how to do so.
Thanks,
Jim
When submitting the job, set the number of executors to one.
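For example, on a standalone cluster you can cap the total cores at the per-executor core count, so the scheduler can only ever launch one executor (a sketch; the master URL, core count, and application jar are hypothetical):
$ spark-submit --master spark://master:7077 \
    --executor-cores 4 --total-executor-cores 4 \
    your_app.jar
On YARN, the same effect can be achieved with --num-executors 1.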

Why are the durations of tasks belonging to the same job quite different in Spark Streaming?

Look at the picture below: these 24 tasks belong to the same job, the amount of data to be processed by each task is basically the same, and the time spent in GC is very short. My question is: why are the durations of tasks belonging to the same job so different?
Maybe you can check the Event Timeline for tasks in your Spark UI and see why the slow tasks are running slow.
Are they taking more time in serialization/deserialization?
Is it because of scheduler delay?
Or is it executor computing time?

What is scheduler delay in spark UI's event timeline

I am using a YARN environment to run Spark programs, with the option --master yarn-cluster.
When I open a Spark application's Application Master, I see a lot of Scheduler Delay in a stage. Some of them are even more than 10 minutes. I wonder what they are and why it takes so long?
Update:
Usually operations like aggregateByKey take much more time (i.e. scheduler delay) before the executors really start doing tasks. Why is that?
Open the "Show Additional Metrics" (click the right-pointing triangle so it points down) and mouse over the check box for "Scheduler Delay". It shows this tooltip:
Scheduler delay includes time to ship the task from the scheduler to the executor, and time to send the task result from the executor to the scheduler. If scheduler delay is large, consider decreasing the size of tasks or decreasing the size of task results.
The scheduler is part of the master that divides the job into stages of tasks and works with the underlying cluster infrastructure to distribute them around the cluster.
Have a look at TaskSetManager's class comment:
..Schedules the tasks within a single TaskSet in the TaskSchedulerImpl. This class keeps track of each task, retries tasks if they fail (up to a limited number of times), and handles locality-aware scheduling for this TaskSet via delay scheduling...
I assume it is the result of the following paper, on which Matei Zaharia (co-founder and Chief Technologist of Databricks, which develops Spark) also worked: https://cs.stanford.edu/~matei/
Thus, Spark checks the locality of a pending task's partition. If the locality level is low (e.g. the data is not on the local JVM), the task is not directly killed or ignored; instead it gets a launch delay, which is fair.
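How long the scheduler waits for a better locality level before falling back is controlled by spark.locality.wait, so this part of the delay can be tuned at submit time (a sketch; the value and application jar are hypothetical):
$ spark-submit --conf spark.locality.wait=1s your_app.jar
Lowering the wait trades data locality for faster task launches; raising it does the opposite.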
