I have a SLURM cluster and a RUNNING job where I have requested 60 threads by
#SBATCH --cpus-per-task=60
(I am sharing threads on a node using cgroups)
I now want to reduce the amount of threads to 30.
$ scontrol update jobid=274332 NumCPUs=30
Job is no longer pending execution for job 274332
The job has still 60 threads allocated.
$ scontrol show job 274332
JobState=RUNNING Reason=None Dependency=(null)
NumNodes=1 NumCPUs=60 NumTasks=1 CPUs/Task=60 ReqB:S:C:T=0:0:*:*
How would be the correct way to accomplish this?
Thanks!
In the current version of Slurm, scontrol only allows reducing the number of nodes allocated to a running job, not the number of CPUs (or the memory).
The FAQ says:
Use the scontrol command to change a job's size either by specifying a new node count (NumNodes=) for the job or identify the specific nodes (NodeList=) that you want the job to retain.
(Emphasis mine)
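For example, a running multi-node job can be shrunk like this (the node name here is made up for illustration):

```shell
# Reduce the job to a single node; NumNodes can only be decreased, never increased
scontrol update JobId=274332 NumNodes=1

# ...or name the specific node(s) the job should retain
scontrol update JobId=274332 NodeList=node01
```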
Related
How can I set the maximum number of CPUs each job can ask for in Slurm?
We're running a GPU cluster and want a sensible number of CPUs to always be available for GPU jobs. This is mostly fine as long as a job asks for GPUs, because there is a GPU <-> CPU mapping in gres.conf. But it doesn't stop a job that asks for no GPUs from acquiring all the CPUs in the system.
To set the maximum number of CPUs a single job can use, at the cluster level, you can run the following command:
sacctmgr modify cluster <cluster_name> set MaxTRESPerJob=cpu=<nb of CPUs>
Note that you must have SelectType=select/cons_tres in your configuration file for this to work.
Alternatively the same restriction can be applied partition-wise, QOS-wise, account-wise, etc.
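For example, to cap every job at 16 CPUs cluster-wide, or only for jobs run under a given QOS (the cluster and QOS names below are made up):

```shell
# Cluster-wide limit: no single job may request more than 16 CPUs
sacctmgr modify cluster mycluster set MaxTRESPerJob=cpu=16

# The same limit applied only to the QOS "normal"
sacctmgr modify qos normal set MaxTRESPerJob=cpu=16
```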
I'm trying to use a cluster to run an MPI code. The cluster hardware consists of 30 nodes, each with the following specs:
16 cores across 2 sockets (Intel Xeon E5-2650 v2), i.e. 32 logical CPUs with hyper-threading enabled
64 GB of 1866 MT/s main memory
named: aria
The Slurm batch script header is as follows:
#SBATCH --ntasks=64 # Number of MPI ranks
#SBATCH --cpus-per-task=1 # Number of cores per MPI rank
#SBATCH --nodes=2 # Number of nodes
#SBATCH --ntasks-per-node=32 # How many tasks on each node
#SBATCH --ntasks-per-socket=16 # How many tasks on each CPU or socket
#SBATCH --mem-per-cpu=100mb # Memory per core
When I submit the job, sbatch returns the following error: sbatch: error: Batch job submission failed: Requested node configuration is not available
This is a little confusing: I'm submitting one task per CPU and dividing the tasks equally between nodes and sockets. Can anyone advise on the problem with the configuration above? And one more thing: what is the optimal configuration given the hardware specs?
Thanks in advance
Look at exactly what the nodes offer with the sinfo -Nl command.
It could be that:
hyper-threading is not enabled (which is often the case on HPC clusters)
or one core is reserved for Slurm and the operating system
or hyper-threading is enabled but Slurm is configured to schedule physical cores only
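If it turns out that Slurm schedules only the 16 physical cores per node, a request that fits could look like this (a sketch; the values follow the hardware described above):

```shell
#!/bin/bash
#SBATCH --ntasks=64            # Number of MPI ranks
#SBATCH --nodes=4              # 4 nodes x 16 physical cores = 64 ranks
#SBATCH --ntasks-per-node=16   # One task per physical core
#SBATCH --ntasks-per-socket=8  # Split evenly across the 2 sockets
#SBATCH --mem-per-cpu=100mb    # Memory per core
```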
As for the optimal job configuration, it depends on how 'optimal' is defined. For optimal time to solution, it is often better to let Slurm decide how to organise the ranks on the nodes, because it will then be able to start your job sooner.
#SBATCH --ntasks=64 # Number of MPI ranks
#SBATCH --mem-per-cpu=100mb # Memory per core
For optimal job performance (in the case of benchmarks, cost analysis, etc.) you will need to take switches into account as well (although with 30 nodes you probably have only one switch):
#SBATCH --ntasks=64 # Number of MPI ranks
#SBATCH --exclusive
#SBATCH --switches=1
#SBATCH --mem-per-cpu=100mb # Memory per core
Using --exclusive will make sure your job will not be bothered by other jobs.
What is the equivalent of
qmgr -c "print queue UCTlong"
in Slurm?
PS: the command up there displays all the characteristics of the queue "UCTlong" (max walltime, max running jobs, ....).
Slurm uses the word 'partition' rather than queue. The command would be
scontrol show partition UCTlong
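The output lists fields such as MaxTime, MaxNodes, DefaultTime and State; a single value can be extracted with grep, e.g.:

```shell
# Show only the maximum walltime of the partition
scontrol show partition UCTlong | grep -o 'MaxTime=[^ ]*'
```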
I have a job array of 100 jobs. I want at most 2 jobs from the job array can be allocated to the same node. How can I do this using SLURM? Thanks!
Assuming that jobs can share nodes, and that nodes have homogeneous configuration, and that you are alone on the cluster,
use the sinfo -Nl command to find the number of CPUs per nodes
submit jobs that request half that number, with either #SBATCH --ntasks-per-node=... or #SBATCH --cpus-per-task=..., depending on what your jobs do
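The "half the CPUs" computation can be scripted at submission time; the per-node CPU count is hard-coded below for illustration (in practice, read it from sinfo -Nl):

```shell
# CPUs per node as reported by `sinfo -Nl` (hard-coded here for illustration)
cpus_per_node=32

# Request half of them, so at most two such jobs fit on one node
half=$(( cpus_per_node / 2 ))
echo "--cpus-per-task=$half"   # prints --cpus-per-task=16
```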
If you are administering a cluster that is shared with other people, you can define a GRES of a dummy type, assign two of them to each node in slurm.conf, and then request one per job with --gres=dummy:1
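A sketch of the GRES approach (the type name dummy and the node names are made up; see the slurm.conf and gres.conf manpages for the exact syntax):

```shell
# In slurm.conf: declare the GRES type and give each node two of them
#   GresTypes=dummy
#   NodeName=node[01-30] Gres=dummy:2 ...
# In gres.conf on each node:
#   Name=dummy Count=2
# Then have each array task request one, so at most two land on any node:
sbatch --array=1-100 --gres=dummy:1 job.sh
```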
Slurm manages a cluster with 8core/64GB ram and 16core/128GB ram nodes.
There is a low-priority "long" partition and a high-priority "short" partition.
Jobs running in the long partition can be suspended by jobs in the short partition, in which case pages from the suspended job get mostly pushed to swap. (Swap usage is intended for this purpose only, not for active jobs.)
How can I configure in slurm the total amount of RAM+swap available in each node for jobs?
There is the MaxMemPerNode parameter, but that is a partition property and thus cannot accommodate different values for different nodes in the partition.
There is the MaxMemPerCPU parameter, but that prevents low-memory jobs from sharing unused memory with big-memory jobs.
You need to specify the memory of each node using the RealMemory parameter in the node definition (see the slurm.conf manpage).
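For instance, in slurm.conf (node names are made up; sizes follow the question; RealMemory is in megabytes, so to let suspended jobs spill to swap you would add the swap you reserve for that purpose on top of the physical RAM):

```shell
# 8-core/64GB nodes, plus e.g. 16GB of swap reserved for suspended jobs
NodeName=node[01-10] CPUs=8  RealMemory=80000  State=UNKNOWN
# 16-core/128GB nodes, plus the same 16GB of swap
NodeName=node[11-20] CPUs=16 RealMemory=144000 State=UNKNOWN
```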