How to make Slurm make a scheduling decision when jobs are submitted?

I'm using the backfill scheduler with Slurm to manage a small GPU cluster. The backfill scheduler makes a scheduling decision every bf_interval seconds (the default is 30 seconds). This means that even when GPU resources are available, I sometimes have to wait a while until they are allocated. I could obviously reduce bf_interval, but given that we don't have many job submissions, it would be good if I could force Slurm to run the scheduling routine the moment a job is queued. Is this possible?

By default Slurm does it. From the documentation:
Slurm is designed to perform a quick and simple scheduling attempt at events such as job submission or completion and configuration changes.
Have you changed the default configuration for this? And are you sure that not scheduling on submission is your problem?
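To make the difference concrete, here is a toy model (not Slurm code) comparing a scheduler that only runs on a fixed timer with one that also runs a pass on submission. The function names are hypothetical, and the model assumes periodic passes happen at exact multiples of the interval.

```python
import math


def wait_periodic_only(submit_time: float, interval: float = 30.0) -> float:
    """Seconds a job waits if scheduling passes happen only at multiples of `interval`."""
    next_pass = math.ceil(submit_time / interval) * interval
    return next_pass - submit_time


def wait_event_triggered(submit_time: float) -> float:
    """Seconds a job waits if a scheduling attempt is made at submission time."""
    return 0.0


if __name__ == "__main__":
    for t in (5.0, 29.0, 31.0):
        print(t, wait_periodic_only(t), wait_event_triggered(t))
```

In this model a job submitted 5 seconds after a pass waits 25 seconds under the timer-only policy, which is the delay the question describes. If you see that kind of delay in practice, it is worth checking whether something other than the timer is holding the job back.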

Related

Does memory configuration really matter with fair scheduler?

We have a Hadoop cluster with the Fair Scheduler configured. We used to see that when there were not many jobs running in the cluster, the running job tried to take as much memory and as many cores as were available.
With the Fair Scheduler, do executor memory and cores really matter for Spark jobs? Or is it up to the Fair Scheduler to decide how much to give?
Under the Fair Scheduler's policy, the first job assigned to it gets all the resources available.
When a second job runs, the resources are divided as (available resources) / (number of jobs).
The main thing to focus on is the maximum container memory you have given the job. If it is equal to the total resources available, then it is expected that your job uses all the resources.
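The arithmetic in this answer can be sketched as follows. This is a simplified illustration of an equal split, not the Fair Scheduler's actual algorithm; the function names and the cluster sizes are made up for the example.

```python
from typing import Tuple


def fair_share(total_memory_mb: int, total_vcores: int, running_jobs: int) -> Tuple[int, int]:
    """Equal split of cluster resources between the jobs currently running."""
    if running_jobs < 1:
        raise ValueError("running_jobs must be at least 1")
    return total_memory_mb // running_jobs, total_vcores // running_jobs


def effective_memory(share_mb: int, requested_max_mb: int) -> int:
    """A job uses no more than it asked for, even if its share is larger."""
    return min(share_mb, requested_max_mb)


if __name__ == "__main__":
    # Hypothetical cluster: 64000 MB of memory and 16 vcores.
    print(fair_share(64000, 16, 1))   # a single job can take everything
    print(fair_share(64000, 16, 2))   # a second job halves each share
    print(effective_memory(32000, 8000))
```

The second function reflects the last point of the answer: the memory and cores a job requests still matter, because they cap what it will take from its share.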

How does spark.dynamicAllocation.enabled influence the order of jobs?

I need to understand when to use spark.dynamicAllocation.enabled. What are the advantages and disadvantages of using it? I have a queue where jobs get submitted.
9:30 AM --> Job A gets submitted with dynamicAllocation enabled.
10:30 AM --> Job B gets submitted with dynamicAllocation enabled.
Note: My Data is huge (processing will be done on 10GB data with transformations).
Which job gets preference in the allocation of executors, Job A or Job B, and how does Spark coordinate between the two applications?
Dynamic Allocation of Executors is about resizing your pool of executors.
Quoting Dynamic Allocation:
spark.dynamicAllocation.enabled Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload.
And later on in Dynamic Resource Allocation:
Spark provides a mechanism to dynamically adjust the resources your application occupies based on the workload. This means that your application may give resources back to the cluster if they are no longer used and request them again later when there is demand. This feature is particularly useful if multiple applications share resources in your Spark cluster.
In other words, job A will usually finish before job B is executed. Spark jobs are usually executed sequentially, i.e. a job has to finish before another can start.
Usually...
SparkContext is thread-safe and can handle jobs from a Spark application. That means that you may submit jobs at the same time or one after another and, in some configurations, expect that these two jobs will run in parallel.
Quoting Scheduling Within an Application:
Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads. By “job”, in this section, we mean a Spark action (e.g. save, collect) and any tasks that need to run to evaluate that action. Spark’s scheduler is fully thread-safe and supports this use case to enable applications that serve multiple requests (e.g. queries for multiple users).
By default, Spark’s scheduler runs jobs in FIFO fashion. Each job is divided into “stages” (e.g. map and reduce phases), and the first job gets priority on all available resources while its stages have tasks to launch, then the second job gets priority, etc.
It is also possible to configure fair sharing between jobs. Under fair sharing, Spark assigns tasks between jobs in a “round robin” fashion, so that all jobs get a roughly equal share of cluster resources. This means that short jobs submitted while a long job is running can start receiving resources right away and still get good response times, without waiting for the long job to finish. This mode is best for multi-user settings.
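The difference between the two policies quoted above can be illustrated with a small simulation. This is a toy model in plain Python, not Spark's scheduler code: it only decides which job each free task slot goes to, and the function names are hypothetical.

```python
from typing import Dict, List


def fifo_order(pending: Dict[str, int], slots: int) -> List[str]:
    """Fill task slots in submission order: the first job takes what it needs first."""
    order: List[str] = []
    for job, tasks in pending.items():  # dicts keep insertion (submission) order
        take = min(tasks, slots - len(order))
        order.extend([job] * take)
        if len(order) == slots:
            break
    return order


def fair_order(pending: Dict[str, int], slots: int) -> List[str]:
    """Fill task slots round robin, one task per job per round."""
    remaining = dict(pending)
    order: List[str] = []
    while len(order) < slots and any(remaining.values()):
        for job in remaining:
            if remaining[job] > 0 and len(order) < slots:
                order.append(job)
                remaining[job] -= 1
    return order


if __name__ == "__main__":
    pending = {"A": 6, "B": 2}  # job name -> tasks waiting to launch
    print(fifo_order(pending, 4))
    print(fair_order(pending, 4))
```

With four free slots, the FIFO model gives all of them to job A, while the round robin model alternates between A and B. That is why a short job submitted behind a long one gets resources sooner under fair sharing.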
Wrapping up...
Which job gets preference in the allocation of executors, Job A or Job B, and how does Spark coordinate between the two applications?
Job A.
Unless you have enabled Fair Scheduler Pools:
The fair scheduler also supports grouping jobs into pools, and setting different scheduling options (e.g. weight) for each pool. This can be useful to create a “high-priority” pool for more important jobs, for example, or to group the jobs of each user together and give users equal shares regardless of how many concurrent jobs they have instead of giving jobs equal shares.

Spark Schedule: FIFO or FAIR?

How to choose the Spark scheduler: FIFO or FAIR?
What is the difference between the Spark scheduler and the YARN scheduler?
When you submit your jobs to the cluster, either with spark-submit or by any other means, they are handed to the Spark scheduler, which is responsible for materializing the logical plan of your jobs. In Spark, we have two modes.
1. FIFO
By default, Spark’s scheduler runs jobs in FIFO fashion. Each job is divided into stages (e.g. map and reduce phases), and the first job gets priority on all available resources while its stages have tasks to launch, then the second job gets priority, etc. If the jobs at the head of the queue don’t need to use the whole cluster, later jobs can start to run right away, but if the jobs at the head of the queue are large, then later jobs may be delayed significantly.
2. FAIR
The fair scheduler also supports grouping jobs into pools and setting different scheduling options (e.g. weight) for each pool. This can be useful to create a high-priority pool for more important jobs, for example, or to group the jobs of each user together and give users equal shares regardless of how many concurrent jobs they have instead of giving jobs equal shares. This approach is modeled after the Hadoop Fair Scheduler.
Without any intervention, newly submitted jobs go into a default pool, but jobs’ pools can be set by adding the spark.scheduler.pool "local property" to the SparkContext in the thread that’s submitting them.
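As a rough sketch of how pools are defined, the snippet below builds a pool allocation file of the kind Spark reads when spark.scheduler.mode is FAIR and spark.scheduler.allocation.file points at it. The pool name "high_priority" and its values are made up for illustration. The PySpark calls are shown only as comments, since they need a running SparkContext.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_allocations(pools: Dict[str, Dict[str, object]]) -> str:
    """Return an <allocations> document with one <pool> element per entry."""
    root = ET.Element("allocations")
    for name, options in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        # Per-pool settings described in the Spark job scheduling documentation.
        for key in ("schedulingMode", "weight", "minShare"):
            if key in options:
                ET.SubElement(pool, key).text = str(options[key])
    return ET.tostring(root, encoding="unicode")


# Hypothetical pool used only for this example.
xml_text = build_allocations(
    {"high_priority": {"schedulingMode": "FAIR", "weight": 2, "minShare": 1}}
)

if __name__ == "__main__":
    print(xml_text)
    # In the submitting thread of a PySpark application (requires a SparkContext `sc`):
    # sc.setLocalProperty("spark.scheduler.pool", "high_priority")
    # ... run the actions that should go to this pool ...
    # sc.setLocalProperty("spark.scheduler.pool", None)  # back to the default pool
```

Because the pool is a thread-local property, jobs submitted from other threads keep going to the default pool unless those threads set it too.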

How do I limit the number of spark applications in state=RUNNING to 1 for a single queue in YARN?

I have multiple spark jobs. Normally I submit my spark jobs to yarn and I have an option that is --yarn_queue which tells it which yarn queue to enter.
But the jobs seem to run in parallel in the same queue. Sometimes the results of one Spark job are the inputs for the next Spark job. How do I run my Spark jobs sequentially rather than in parallel in the same queue?
I have looked at this page for the Capacity Scheduler. The closest thing I can see is the property yarn.scheduler.capacity.<queue>.maximum-applications, but this only sets the number of applications that can be in both PENDING and RUNNING. I'm interested in setting the number of applications that can be in the RUNNING state; I don't care about the total number of applications in PENDING (or ACCEPTED, which is the same thing).
How do I limit the number of applications in state=RUNNING to 1 for a single queue?
You can configure the queue to run one application at a time in the Capacity Scheduler configuration. I suggest using Ambari for that purpose. If that is not an option, follow the instructions in the guide.
From https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html:
The Fair Scheduler lets all apps run by default, but it is also possible to limit the number of running apps per user and per queue through the config file. This can be useful when a user must submit hundreds of apps at once, or in general to improve performance if running too many apps at once would cause too much intermediate data to be created or too much context-switching. Limiting the apps does not cause any subsequently submitted apps to fail, only to wait in the scheduler’s queue until some of the user’s earlier apps finish.
Specifically, you need to configure:
maxRunningApps: limit the number of apps from the queue to run at once
E.g.
<?xml version="1.0"?>
<allocations>
  <queue name="sample_queue">
    <maxRunningApps>1</maxRunningApps>
    <!-- other options -->
  </queue>
</allocations>
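The effect of maxRunningApps on a queue can be pictured with the toy model below. It is not YARN code, and the class and method names are hypothetical. It only shows the behaviour the documentation describes: extra apps wait instead of failing, and they start as earlier ones finish.

```python
from collections import deque
from typing import Deque, List


class LimitedQueue:
    """Toy queue that lets at most `max_running_apps` apps run at once."""

    def __init__(self, max_running_apps: int) -> None:
        self.max_running_apps = max_running_apps
        self.running: List[str] = []
        self.pending: Deque[str] = deque()

    def submit(self, app: str) -> None:
        """Start the app if there is room, otherwise keep it waiting."""
        if len(self.running) < self.max_running_apps:
            self.running.append(app)
        else:
            self.pending.append(app)

    def finish(self, app: str) -> None:
        """Mark an app as done and promote the oldest waiting app, if any."""
        self.running.remove(app)
        if self.pending:
            self.running.append(self.pending.popleft())


if __name__ == "__main__":
    queue = LimitedQueue(max_running_apps=1)
    for name in ("job1", "job2", "job3"):
        queue.submit(name)
    print(queue.running, list(queue.pending))
    queue.finish("job1")
    print(queue.running, list(queue.pending))
```

With a limit of 1, the jobs run one after another in submission order in this model, which is the sequential behaviour the question asks for.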

Run a multithreaded job in 1 slot?

What happens if I try to run a multithreaded job in 1 SGE slot? Would it fail to start multiple threads? Or would it still start these threads and potentially overload the SGE cluster node, because it would run more threads than there are slots?
I know I should use the -pe threaded nrThreads parameter. But I am running a program and I am not sure how many threads it uses at each step.
It's been a while since I've used SGE, but at least back then, a job that launched more computational threads than it was allocated would not be prevented from launching those threads, and would usually end up stealing CPU time from other jobs.
Perhaps current SGE versions are capable of using cpusets, which allow the administrator to limit the CPUs used by a job. At least the Slurm scheduler can do this.
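If the scheduler does not enforce the limit, one partial workaround is to cap the thread count yourself in a wrapper. The sketch below reads NSLOTS, which Grid Engine sets in the job environment, and copies it into the thread-count variables that OpenMP, OpenBLAS and MKL read. The function names are hypothetical, and this only helps with programs that honour those variables; it does not stop a program from starting more threads.

```python
import os
from typing import Dict, Mapping, Optional


def allowed_threads(env: Optional[Mapping[str, str]] = None, default: int = 1) -> int:
    """Number of slots granted to the job, or `default` if NSLOTS is missing or invalid."""
    env = os.environ if env is None else env
    try:
        slots = int(env.get("NSLOTS", ""))
    except ValueError:
        return default
    return max(1, slots)


def thread_limited_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with common thread-count variables set to the slot count."""
    base = dict(os.environ if env is None else env)
    count = str(allowed_threads(base))
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        base[var] = count
    return base


if __name__ == "__main__":
    limited = thread_limited_env()
    print("threads allowed:", limited["OMP_NUM_THREADS"])
    # The real program could then be launched with this environment, e.g.
    # subprocess.run(["./my_program"], env=limited)  # "./my_program" is a placeholder
```

For a program whose threading you cannot control this way, the safer options remain requesting enough slots with the parallel environment or relying on an administrator-side limit such as cpusets.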
