How to limit the number of jobs running on the same node using SLURM?

I have a job array of 100 jobs. I want at most 2 jobs from the job array to be allocated to the same node. How can I do this using SLURM? Thanks!

Assuming that jobs can share nodes, that the nodes have a homogeneous configuration, and that you are alone on the cluster:
use the sinfo -Nl command to find the number of CPUs per node;
submit jobs that request half that number, with either #SBATCH --ntasks-per-node=... or #SBATCH --cpus-per-task=..., depending on what your jobs do.
If you are administrating a cluster that is shared with other people, you can define a GRES of a dummy type, assign two of them to each node in slurm.conf, and then request one per job with --gres=dummy:1
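A rough sketch of both approaches (node names, CPU counts and the GRES name are illustrative assumptions, not values from the question):
# Approach 1, user side: if sinfo -Nl reports 16 CPUs per node,
# request half of them so that at most 2 array tasks fit on a node
#SBATCH --array=1-100
#SBATCH --cpus-per-task=8
# Approach 2, administrator side, in slurm.conf:
GresTypes=dummy
NodeName=node[01-10] Gres=dummy:2 ...
# and in gres.conf on each node (count-only GRES, no device file):
Name=dummy Count=2
# each job of the array then requests one dummy GRES:
#SBATCH --gres=dummy:1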

Related

Can I submit a job that may be executed in different partitions

When using the parameter -p you can define the partition for your job.
In my case a job can run in different partitions, so I do not want to restrict my job to only a single partition.
If my job can run equally well in partitions "p1" and "p3", how can I configure the sbatch command to allow more than one partition?
The --partition option accepts a list of partitions, so in your case you would write
#SBATCH --partition=p1,p3
The job will start in the partition that offers the resources the earliest.
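The same list can also be given on the command line at submission time, for example (job.sh being a placeholder for your batch script):
sbatch --partition=p1,p3 job.sh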

SLURM error: sbatch: error: Batch job submission failed: Requested node configuration is not available

I'm trying to use a cluster to run an MPI code. The cluster hardware consists of 30 nodes, each with the following specs:
16 cores on 2 sockets (Intel Xeon E5-2650 v2), i.e. 32 logical cores with multithreading enabled
64 GB of 1866 MT/s main memory
named: aria
The Slurm job script is as follows:
#SBATCH --ntasks=64 # Number of MPI ranks
#SBATCH --cpus-per-task=1 # Number of cores per MPI rank
#SBATCH --nodes=2 # Number of nodes
#SBATCH --ntasks-per-node=32 # How many tasks on each node
#SBATCH --ntasks-per-socket=16 # How many tasks on each CPU or socket
#SBATCH --mem-per-cpu=100mb # Memory per core
When I submit the job, it is rejected with the following message: sbatch: error: Batch job submission failed: Requested node configuration is not available
which is a little bit confusing. I'm submitting one task per CPU and dividing the tasks equally between nodes and sockets, so can anyone please advise on what is wrong with the aforementioned configuration? And one more thing: what is the optimum configuration given the hardware specs?
Thanks in advance
Look at exactly what the nodes offer with the sinfo -Nl command.
It could be that:
hyper-threading is not enabled (which is often the case on HPC clusters),
or one core is reserved for Slurm and the operating system,
or hyper-threading is enabled but Slurm is configured to schedule physical cores.
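For example, a node's socket/core/thread layout can be checked like this (aria01 is a guessed node name based on the question; use a name listed by sinfo):
sinfo -Nl
scontrol show node aria01 | grep -E 'CPUTot|Sockets|CoresPerSocket|ThreadsPerCore'
If ThreadsPerCore=1 there, Slurm only schedules 16 CPUs per node, and --ntasks-per-node=32 cannot be satisfied.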
As for the optimal job configuration, it depends on how 'optimal' is defined. For optimal time to solution, it is often better to let Slurm decide how to organise the ranks on the nodes, because it will then be able to start your job sooner:
#SBATCH --ntasks=64 # Number of MPI ranks
#SBATCH --mem-per-cpu=100mb # Memory per core
For optimal job performance (in the case of benchmarks, cost analysis, etc.), you will need to take switches into account as well (although with 30 nodes you probably have only one switch):
#SBATCH --ntasks=64 # Number of MPI ranks
#SBATCH --exclusive
#SBATCH --switches=1
#SBATCH --mem-per-cpu=100mb # Memory per core
Using --exclusive will make sure your job will not be bothered by other jobs.
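If you are unsure how many switches the cluster actually has, scontrol show topology prints the configured network topology (it is only populated when a topology plugin is configured, so treat it as a hint rather than a guarantee):
scontrol show topology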

Change CPU count for RUNNING Slurm Jobs

I have a SLURM cluster and a RUNNING job where I have requested 60 threads by
#SBATCH --cpus-per-task=60
(I am sharing threads on a node using cgroups)
I now want to reduce the amount of threads to 30.
$ scontrol update jobid=274332 NumCPUs=30
Job is no longer pending execution for job 274332
The job still has 60 threads allocated.
$ scontrol show job 274332
JobState=RUNNING Reason=None Dependency=(null)
NumNodes=1 NumCPUs=60 NumTasks=1 CPUs/Task=60 ReqB:S:C:T=0:0:*:*
How would be the correct way to accomplish this?
Thanks!
In the current version of Slurm, scontrol only allows you to reduce the number of nodes allocated to a running job, not the number of CPUs (or the memory).
The FAQ says:
Use the scontrol command to change a job's size either by specifying a new node count (NumNodes=) for the job or identify the specific nodes (NodeList=) that you want the job to retain.
(Emphasis mine)
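So a running job can only be shrunk at the node level, for instance (a generic sketch with the job ID from the question; it is only meaningful for multi-node jobs, and node001 is a placeholder):
scontrol update JobId=274332 NumNodes=1
# or, alternatively, name the nodes to keep:
scontrol update JobId=274332 NodeList=node001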

How to let slurm limit memory per node

Slurm manages a cluster with 8-core/64 GB RAM and 16-core/128 GB RAM nodes.
There is a low-priority "long" partition and a high-priority "short" partition.
Jobs running in the long partition can be suspended by jobs in the short partition, in which case pages from the suspended job get mostly pushed to swap. (Swap usage is intended for this purpose only, not for active jobs.)
How can I configure in Slurm the total amount of RAM+swap available for jobs on each node?
There is the MaxMemPerNode parameter, but that is a partition property and thus cannot accommodate different values for different nodes in the partition.
There is the MaxMemPerCPU parameter, but that prevents low-memory jobs from sharing unused memory with big-memory jobs.
You need to specify the memory of each node using the RealMemory parameter in the node definition (see the slurm.conf manpage).
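A minimal sketch of such node definitions in slurm.conf (node names and exact values are assumptions based on the hardware described in the question; RealMemory is given in megabytes):
NodeName=small[01-08] CPUs=8 RealMemory=64000 State=UNKNOWN
NodeName=big[01-08] CPUs=16 RealMemory=128000 State=UNKNOWN
Jobs then request memory with --mem or --mem-per-cpu, and Slurm will not allocate more memory on a node than its RealMemory.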

Task distribution among nodes

How are map tasks distributed among nodes? Can I set nodes to run in parallel?
For example, if there are 2 datanodes and 4 map tasks, which of them takes a task? I can see from the manager that sometimes one datanode takes the task and sometimes the other datanode does the work. The job is given to both nodes, and whichever of them is assigned a task runs one at a time.
On what basis does the namenode assign tasks?
Is this done manually in the code or automatically?
If there is a link to a tutorial on how tasks run on nodes, it would be great if you provided it.
