Is there a way to specify a niceness value per partition in an sbatch command? - slurm

I launch a bunch of jobs with the following format:
sbatch -p partitionA,partitionB --nice=${NICE} script_to_run.sh
Is there a way to specify the nice value per partition, or is the way to do this to set a default niceness for each partition and use that?

Related

Cores assigned to SLURM job

Let's say I want to submit a Slurm job specifying only the total number of tasks (--ntasks=someNumber), without specifying the number of nodes or the tasks per node. Is there a way to find out, from within the launched Slurm script, how many cores Slurm has assigned on each of the reserved nodes? I need this information to create a machinefile for the program I'm launching, which must be structured like this:
node02:7
node06:14
node09:3
Once the job is launched, the only way I have found to see which cores have been allocated on each node is the command:
scontrol show jobid -dd
Its output contains the above-mentioned information (along with plenty of other details).
Is there a better way to get this info?
The srun documentation illustrates creating a machine file by running srun hostname. To get the output you want, you could run
srun hostname -s | sort | uniq -c | awk '{print $2":"$1}' > $MACHINEFILE
Check your program's documentation to see whether it accepts a machine file with repeated host names rather than a count suffix. If so, you can simplify the command to
srun hostname -s > $MACHINEFILE
Of course, the first step is to make sure you actually need a machine file at all, as many parallel programs/libraries have Slurm support and can gather the needed information from the environment variables set up by Slurm at job start.
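As a sketch of the environment-variable route (assuming the default of one CPU per task, so that tasks per node equal cores per node), a job script can build the same host:count file without launching srun. Slurm sets SLURM_JOB_NODELIST and SLURM_TASKS_PER_NODE in the batch environment; the latter lists counts in node order in a compressed form such as 2(x3),1 (2 tasks on each of three nodes, then 1). The function names below are hypothetical, and MACHINEFILE is assumed to be set by you. If you want CPUs rather than tasks, SLURM_JOB_CPUS_PER_NODE uses the same compressed format.

```shell
#!/bin/sh
# Expand a Slurm count list such as "7,14,3" or "2(x3),1" into one
# count per line; "2(x3)" means "2 on each of the next 3 nodes".
expand_counts() {
    printf '%s\n' "$1" | tr ',' '\n' |
        awk -F'[(x)]' '{ reps = ($3 == "") ? 1 : $3 + 0
                         for (i = 0; i < reps; i++) print $1 }'
}

# Read host names on stdin (one per line, in allocation order) and
# print "host:count", pairing the N-th host with the N-th count.
make_machinefile() {
    counts=$(expand_counts "$1" | tr '\n' ' ')
    awk -v counts="$counts" 'BEGIN { split(counts, c, " ") }
                             { print $0 ":" c[NR] }'
}

# Inside a job script (MACHINEFILE is a name you choose):
#   scontrol show hostnames "$SLURM_JOB_NODELIST" \
#       | make_machinefile "$SLURM_TASKS_PER_NODE" > "$MACHINEFILE"
```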

PBS Pro: setting job array slot limit by the user

With TORQUE, a user can specify a slot limit when submitting a job array by using %; e.g., qsub job.sh -t 1-20%5 creates a job array with 20 jobs, of which at most 5 run simultaneously.
I currently work with PBS Professional, but as far as I can see, the % option is not supported. What is the simplest way to achieve behavior similar to TORQUE's %?

SLURM: Changing the maximum number of simultaneously running tasks for a running array job

I have launched an array job as follows:
sbatch --array=1-100%5 ...
which limits the number of simultaneously running tasks to 5. The job is now running, and I would like to change this number to 10 (i.e. as if I had run sbatch --array=1-100%10 ...).
The documentation on array jobs mentions that you can use scontrol to change options after the job has started. Unfortunately, it's not clear what this option's variable name is, and I don't think it is listed in the sbatch documentation.
Any pointers well received.
You can change the array throttling limit with the following command:
scontrol update ArrayTaskThrottle=<count> JobId=<jobID>
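A minimal wrapper around that command, assuming a POSIX shell; the function name set_array_throttle and the job ID 12345 are hypothetical. According to the scontrol documentation, a count of 0 removes the limit, and the current limit is reported after a % in the ArrayTaskId field (e.g. ArrayTaskId=6-100%10).

```shell
#!/bin/sh
# Hypothetical helper: change how many tasks of an existing array job
# may run at the same time (a count of 0 removes the limit).
set_array_throttle() {
    case $2 in
        ''|*[!0-9]*)
            echo "usage: set_array_throttle JOBID COUNT (COUNT must be a number)" >&2
            return 1 ;;
    esac
    scontrol update JobId="$1" ArrayTaskThrottle="$2"
}

# Example: raise the limit of array job 12345 from 5 to 10, then check it.
#   set_array_throttle 12345 10
#   scontrol show job 12345 | grep -o 'ArrayTaskId=[^ ]*'
```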

How to find the default values for all the optional srun and sbatch parameters?

The question title pretty much says it all.
As an example, the --mem parameter for srun is optional (or at least this is the case for the SLURM instance I have access to). I would like to know the value for this option that would have the same effect as not specifying the option at all. (I realize that this particular default may depend on the values of other parameters passed to srun, such as the partition, etc.)
Ditto for all the other optional srun and sbatch parameters.
By default, --mem takes the value of DefMemPerNode.
You can check that value with the command:
scontrol show config
If it is not defined, its default value is 0, which means the job is given all the memory of the node.
You can find the other default values in the FAQ.
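A small sketch for checking these memory defaults; show_mem_defaults is a hypothetical name. It also looks for DefMemPerCPU, which Slurm applies instead of DefMemPerNode when it is configured (the two are mutually exclusive), and, if a partition name is given, for a partition-level default, which overrides the cluster-wide one for jobs in that partition.

```shell
#!/bin/sh
# Print the memory defaults Slurm would apply when --mem is not given.
# Cluster-wide values come from "scontrol show config"; if a partition
# name is passed, that partition's own DefMemPer* setting is shown too.
show_mem_defaults() {
    scontrol show config | grep -E '^DefMemPer(CPU|Node)'
    if [ -n "$1" ]; then
        scontrol show partition "$1" | grep -oE 'DefMemPer(CPU|Node)=[^ ]+'
    fi
}

# Usage (the partition name is only an example):
#   show_mem_defaults
#   show_mem_defaults partitionA
```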

qsub job array, where each job uses a single process?

I have a job script with the following PBS specifications at the beginning:
#PBS -t 0-99
#PBS -l nodes=1:ppn=1
The first line says that this is a job array, with jobs indexed from 0 to 99.
I want each individual indexed job to use only a single node and a single core per node, hence my second PBS line. But I am worried that TORQUE qsub will interpret the second line as saying that the whole job array should run sequentially on a single core.
How does TORQUE qsub interpret the PBS second line?
It interprets it as 100 jobs, each of which should use 1 execution slot on one node. For more information, see the details on the -t switch in the qsub documentation.
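To illustrate, a minimal TORQUE job script using the two directives from the question. Each sub-job reads its own index from PBS_ARRAYID; the input file naming is hypothetical, and the fallback index of 0 is only there so the script can be tried outside TORQUE.

```shell
#!/bin/sh
#PBS -t 0-99
#PBS -l nodes=1:ppn=1
# TORQUE turns this script into 100 independent sub-jobs (indices 0-99).
# The nodes=1:ppn=1 request applies to each sub-job on its own, so
# sub-jobs can run in parallel, subject to the scheduler's limits.

# Each sub-job finds its own index in PBS_ARRAYID.
idx=${PBS_ARRAYID:-0}

# Hypothetical per-task input name; replace with your real work.
input="input_${idx}.txt"
echo "sub-job ${idx}: would process ${input} on $(uname -n)"
```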
