Print the number of CPUs in use per job in Slurm?

We have switched our cluster job queuing system from SGE to Slurm. In SGE, the qstat command printed the number of CPUs/slots in use per job. Is there a simple way to do this in Slurm? squeue only shows the number of nodes used per job. Thanks.
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1981 q run-01 root R 16:21 1 node001
1982 q run-02 root R 16:21 1 node002
1983 q run-03 root R 16:21 1 node003

The squeue command has two parameters, --format and --Format, that let you choose the columns displayed in the output. Each has an option (%c and NumCPUs, respectively) to display the number of cores requested by the job.
Try with
squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R %c"
This will show the default columns and add the number of cores as the last column. You can fiddle with the format string to arrange the columns as you want. Then, when you are happy with the output, you can set it as the value of the SQUEUE_FORMAT variable in your .bash_profile or .bashrc.
export SQUEUE_FORMAT='%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R %c'
See the squeue man page for more details.
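If you prefer the long-form field names, a roughly equivalent --Format invocation would look something like the following (check the man page of your Slurm version for the exact field names):
squeue --Format="JobID,Partition,Name,UserName,StateCompact,TimeUsed,NumNodes,ReasonList,NumCPUs"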

Related

Cores assigned to SLURM job

Let's say I want to submit a Slurm job specifying only the total number of tasks (--ntasks=someNumber), without specifying the number of nodes or the tasks per node. Is there a way to know, within the launched Slurm script, how many cores Slurm has assigned on each of the reserved nodes? I need this info to properly create a machine file for the program I'm launching, which must be structured like this:
node02:7
node06:14
node09:3
Once the job is launched, the only way I have found to see which cores have been allocated on the nodes is the command:
scontrol show jobid -dd
Its output contains the above-mentioned info (together with plenty of other details).
Is there a better way to get this info?
The way the srun documentation illustrates creating a machine file is by running srun hostname. To get the output you want you could run
srun hostname -s | sort | uniq -c | awk '{print $2":"$1}' > $MACHINEFILE
You should check the documentation of your program to see whether it accepts a machine file with one line per core (repetitions) rather than a hostname:count format. If so, you can simplify the command to
srun hostname -s > $MACHINEFILE
And of course, the first step is to make sure you actually need a machine file at all, as many parallel programs and libraries have Slurm support and can gather the needed information from the environment variables set up by Slurm at job start.
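As an illustration, here is a minimal sketch of a batch script that builds such a machine file; my_prog and its -machinefile option are hypothetical placeholders for your actual program:
#!/bin/bash
#SBATCH --ntasks=24
# One line per allocated node, formatted as hostname:corecount
MACHINEFILE=machinefile.$SLURM_JOB_ID
srun hostname -s | sort | uniq -c | awk '{print $2":"$1}' > $MACHINEFILE
# my_prog is a placeholder; pass the machine file however your program expects it
./my_prog -machinefile $MACHINEFILE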

How to set the limit of CPUs per user using MaxTRESPerUser on a QOS in Slurm

I just set the QOS parameter MaxTRESPerUser to cpu=1 for testing purposes, but Slurm is still scheduling the jobs.
I used:
sacctmgr modify qos normal set maxtresperuser=cpu=1
and we can view it with (output trimmed to the populated columns):
sacctmgr show qos
   Name  Priority  GraceTime  PreemptMode  UsageFactor  MaxTRESPU
 normal         0   00:00:00      cluster     1.000000      cpu=1
but all the jobs submitted by the same user were allocated, each using 2 CPUs:
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
370 teste script.s root R 0:11 1 slurmcomputenode2.novalocal
371 teste script.s root R 0:11 1 slurmcomputenode2.novalocal
372 teste teste.sh root R 0:07 1 slurmcomputenode1.novalocal
The Slurm documentation doesn't say anything else about it.
Do I need to change something in the slurm.conf file?
Thanks
Make sure AccountingStorageEnforce is set to limits,qos. You also need accounting to be properly configured for limits to be enforced. See the Slurm accounting documentation.
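As a sketch, the relevant slurm.conf lines would look something like this (the slurmdbd storage type is an assumption about your setup):
# in slurm.conf
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageEnforce=limits,qos
After changing them, restart or reconfigure slurmctld so the enforcement flags take effect.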

SLURM: When we reboot the node, do jobID assignments start from 0?

For example:
sacct --start=1990-01-01 -A user returns a job table whose latest jobID is 136, but when I submit a new job with sbatch -A user -N1 run.sh, the submitted batch job gets ID 100, which is smaller than 136. And it seems that sacct -L -A user returns a list that ends with 100.
So it seems that newly submitted batch jobs overwrite previous jobs' information, which I don't want.
[Q] When we reboot the node, do jobID assignments start from 0? If yes, what should I do to continue from the latest jobID assigned before the reboot?
Thank you for your valuable time and help.
There are two main reasons why job IDs might be recycled:
the maximum job ID was reached (see MaxJobId in slurm.conf)
the Slurm controller was restarted with FirstJobId set to a new value
Other than that, Slurm will always increase the job IDs.
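Both parameters live in slurm.conf; here is a sketch with illustrative values (not necessarily your site's settings):
# in slurm.conf
FirstJobId=1          # lowest job ID Slurm will assign
MaxJobId=67043328     # after this value, job IDs wrap back around to FirstJobId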
Note that the job information in the database is not overwritten; each record has a unique internal ID which is different from the job ID. sacct has a -D, --duplicates option to view all jobs in the database. By default, it only shows the most recent one among all those that share the same job ID.
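As an illustration, to see every record that shares job ID 100 from the example above, something like:
sacct -D -j 100 --format=JobID,JobName,Submit,State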

Understanding the -t option in qsub

The documentation is a bit unclear about exactly what the -t option does on a job submission with qsub:
http://docs.adaptivecomputing.com/torque/4-0-2/Content/topics/commands/qsub.htm
From the documentation:
-t Specifies the task ids of a job array. Single task arrays are allowed.
The array_request argument is an integer id or a range of integers.
Multiple ids or id ranges can be combined in a comma delimited list.
Examples: -t 1-100 or -t 1,10,50-100
Here's an example where things go wrong. I requested 2 nodes, 8 processes per node, and an array of 16 jobs, which I had hoped would be distributed evenly across the 2 nodes, but the 16 tasks were spread ad hoc across more than 2 nodes.
$ echo 'hostname' | qsub -q gpu -l nodes=2:ppn=8 -t 1-16
52727[]
$ cat STDIN.o52727-* | sort
gpu-3.local
gpu-3.local
gpu-3.local
gpu-3.local
gpu-5.local
gpu-5.local
gpu-5.local
gpu-5.local
gpu-5.local
gpu-5.local
gpu-7.local
gpu-7.local
gpu-7.local
gpu-7.local
gpu-7.local
gpu-7.local
I suspect this will not completely answer your question, but what exactly you hope to accomplish remains unclear.
Specifying an array with qsub -t simply creates individual jobs, all with the same primary ID. Submitting the way you indicated will create 16 jobs, each requesting 16 total cores (2 nodes × 8 processes per node). This syntax merely makes it easier to submit a large number of jobs at once, without having to script a submission loop.
With Torque alone (i.e., disregarding the scheduler), you can force jobs to specific nodes by saying something like this:
qsub -l nodes=gpu-node01:ppn=8+gpu-node02:ppn=8
A more advanced scheduler gives you greater flexibility. For example, Moab and Maui allow "-l nodes=2:ppn=8,nallocpolicy=exactnode", which applies NODEALLOCATIONPOLICY EXACTNODE to the job when scheduling and will give you 8 cores each on exactly two nodes (any two nodes, in this case).
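If the goal is instead one core per array task, a sketch of the submission under that assumption would be:
echo 'hostname' | qsub -q gpu -l nodes=1:ppn=1 -t 1-16
so that each of the 16 sub-jobs requests a single slot and the scheduler is free to place them wherever cores are available.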

qsub job array, where each job uses a single process?

I have a job script with the following PBS specifications at the beginning:
#PBS -t 0-99
#PBS -l nodes=1:ppn=1
The first line says that this is a job array, with jobs indexed from 0 to 99.
I want each individual indexed job to use only a single node and a single core on that node, hence my second PBS line. But I am worried that TORQUE's qsub will interpret the second line as saying that the whole job array should run sequentially on a single core.
How does TORQUE qsub interpret the PBS second line?
It interprets it as 100 jobs, each of which should use 1 execution slot on one node. For more information, see the qsub documentation, in particular the details of the -t switch.
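For illustration, here is a minimal array script under that interpretation; process_chunk and the input file names are hypothetical placeholders:
#!/bin/bash
#PBS -t 0-99
#PBS -l nodes=1:ppn=1
# each sub-job gets one slot on one node; $PBS_ARRAYID selects its own input
cd $PBS_O_WORKDIR
./process_chunk input_${PBS_ARRAYID}.dat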
