What is the equivalent of
qmgr -c "print queue UCTlong"
in Slurm?
PS: the command above displays all the settings of the queue "UCTlong" (max walltime, max running jobs, etc.).
Slurm uses the term 'partition' rather than 'queue'. The equivalent command is
scontrol show partition UCTlong
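For scripting, the Key=Value output of scontrol show partition can be reduced to a single limit. The helper below is a sketch: partition_field is a hypothetical name, and it assumes values contain no spaces, which holds for the usual partition limits such as MaxTime or MaxNodes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `scontrol show partition` output on stdin and
# print the value of the key given as $1 (e.g. MaxTime, MaxNodes, State).
# Assumes values contain no spaces; values that contain '=' themselves
# (such as TRES=cpu=...,mem=...) are printed whole.
partition_field() {
    local key="$1"
    # Put every Key=Value token on its own line, then split at the first '='.
    tr -s ' \t' '\n\n' | awk -v k="$key" '
        {
            eq = index($0, "=")
            if (eq > 0 && substr($0, 1, eq - 1) == k) {
                print substr($0, eq + 1)
            }
        }'
}
```

For example:
scontrol show partition UCTlong | partition_field MaxTime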
The -p (--partition) parameter lets you define the partition for your job.
My job can run in several partitions, so I do not want to restrict it to a single partition.
If my job can run equally well in partitions "p1" and "p3", how can I configure the sbatch command to allow more than one partition?
The --partition option accepts a comma-separated list of partitions, so in your case you would write
#SBATCH --partition=p1,p3
The job will start in whichever partition can provide the requested resources earliest.
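A complete job script using this directive might look like the sketch below. The echo line is illustrative only; SLURM_JOB_PARTITION is the variable sbatch sets to the partition the job actually started in.

```bash
#!/usr/bin/env bash
# Job script sketch: the job is eligible for both p1 and p3, and Slurm
# starts it in whichever partition can run it first.
#SBATCH --partition=p1,p3

# SLURM_JOB_PARTITION holds the partition that was chosen; "unknown" is
# only a fallback for running the script outside Slurm.
partition="${SLURM_JOB_PARTITION:-unknown}"
echo "Job running in partition: ${partition}"
```

The same list can be given on the command line instead, e.g. sbatch --partition=p1,p3 job.sh (job.sh being a hypothetical script name).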
Is it possible to get the number of pending jobs in a MapR queue?
I tried the mapred -info job-queue-name [-showJobs] command, but it doesn't give the result I'm looking for.
I have a Slurm cluster and a RUNNING job for which I requested 60 threads with
#SBATCH --cpus-per-task=60
(I am sharing the threads of a node between jobs using cgroups.)
I now want to reduce the number of threads to 30.
$ scontrol update jobid=274332 NumCPUs=30
Job is no longer pending execution for job 274332
However, the job still has 60 threads allocated:
$ scontrol show job 274332
JobState=RUNNING Reason=None Dependency=(null)
NumNodes=1 NumCPUs=60 NumTasks=1 CPUs/Task=60 ReqB:S:C:T=0:0:*:*
What is the correct way to accomplish this?
Thanks!
In the current version of Slurm, scontrol only allows reducing the number of nodes allocated to a running job, not the number of CPUs (or the amount of memory).
The Slurm FAQ says:
Use the scontrol command to change a job's size either by specifying a new node count (NumNodes=) for the job or identify the specific nodes (NodeList=) that you want the job to retain.
(Emphasis mine)
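The resize operation the FAQ does describe can be wrapped as below; note that it cannot help in the single-node case in the question, where NumNodes is already 1. This is a sketch: shrink_job_nodes is a hypothetical name, and sourcing ./slurm_job_<jobid>_resize.sh assumes the environment-reset script that scontrol writes after a resize, as described in the same FAQ entry.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: shrink running job $1 to $2 nodes, then reload the
# Slurm environment variables (SLURM_JOB_NODELIST etc.) for the new size.
shrink_job_nodes() {
    local jobid="$1" nodes="$2"
    # Resize by node count (NumNodes=), the form the Slurm FAQ describes.
    scontrol update JobId="$jobid" NumNodes="$nodes" || return 1
    # scontrol writes a bash/sh reset script; source it if it is present.
    local resize_script="./slurm_job_${jobid}_resize.sh"
    if [ -f "$resize_script" ]; then
        # shellcheck source=/dev/null
        . "$resize_script"
    fi
}
```

Inside a running job script it would be called as:
shrink_job_nodes "$SLURM_JOB_ID" 2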
I have a Slurm job script a that internally calls sbatch to submit a second job script b. Thus, job a starts job b.
Job a also contains an srun command that depends on the successful completion of b, so I wrote
srun -d afterok:$jobid <command>
The issue is that dependencies are apparently not honoured for job steps, and my srun is a job step because it runs within the allocation of job a (see the --dependency section of https://slurm.schedmd.com/srun.html).
The question: I need to wait for job b to finish before I launch the job step. How can I do this without resorting to separate jobs?
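One way to get afterok-like behaviour inside a single allocation is to let job a wait for job b itself. The sketch below polls squeue until b leaves the active states and then checks its final state with sacct. wait_for_job is a hypothetical helper, the 30-second default interval is arbitrary, and it assumes b is a single (non-array) job and that accounting is enabled so sacct can report the final state. If job a has nothing else to do meanwhile, submitting b with sbatch --wait, which blocks until b ends and returns b's exit code, is a simpler alternative.

```bash
#!/usr/bin/env bash
# Hypothetical helper for job script a: block until job $1 is no longer
# pending/running, then succeed only if it ended in state COMPLETED
# (the condition an afterok dependency checks).
wait_for_job() {
    local jobid="$1" interval="${2:-30}" state
    while :; do
        # %T prints the full state name; squeue prints nothing (or fails)
        # once the job has been purged from the controller.
        state="$(squeue -h -j "$jobid" -o %T 2>/dev/null)" || true
        case "$state" in
            PENDING|CONFIGURING|RUNNING|COMPLETING|SUSPENDED)
                sleep "$interval" ;;
            *)
                break ;;
        esac
    done
    # The accounting database keeps the final state.
    # -X: allocation line only, -n: no header, -P: no padding.
    state="$(sacct -j "$jobid" -X -n -P -o State)"
    [ "$state" = "COMPLETED" ]
}
```

Inside job a it could be used like this:
jobid=$(sbatch --parsable b.sh)
wait_for_job "$jobid" 60 && srun <command>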
I have a job array of 100 jobs, and I want at most 2 jobs from the array to be allocated to the same node. How can I do this with Slurm? Thanks!
Assuming that jobs can share nodes, that all nodes have the same configuration, and that you are alone on the cluster:
1. use the sinfo -Nl command to find the number of CPUs per node;
2. submit jobs that request half that number with either #SBATCH --tasks-per-node=... or #SBATCH --cpus-per-task=..., depending on whether your jobs run several tasks or one multithreaded process.
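The two steps above can be scripted roughly as follows. half_node_cpus is a hypothetical helper; it queries sinfo with -o %c (CPUs per node) instead of reading the -Nl listing by eye, and, like the answer, it assumes homogeneous nodes, since on a larger node a half-of-the-smallest request would let more than two jobs fit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print half the CPU count of the smallest node, so
# that at most two jobs requesting that many CPUs can share a node.
# An optional partition name ($1) restricts the query to that partition.
half_node_cpus() {
    local -a args=(-N -h -o %c)
    local cpus
    if [ -n "${1:-}" ]; then
        args+=(-p "$1")
    fi
    # -N: one line per node, -h: no header, -o %c: number of CPUs on the node.
    cpus="$(sinfo "${args[@]}" | sort -n | head -n 1)"
    [ -n "$cpus" ] || return 1
    # Integer division rounds down, so two requests never exceed the node.
    echo $(( cpus / 2 ))
}
```

For a multithreaded job script (job.sh is a hypothetical name):
sbatch --array=1-100 --cpus-per-task="$(half_node_cpus)" job.sh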
If you administer a cluster that is shared with other people, you can define a dummy GRES type, assign two of them to each node in slurm.conf, and then request one per job with --gres=dummy:1.
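A sketch of the dummy-GRES approach as an array job script follows. The admin-side lines are shown as comments; the node names node[01-10] are hypothetical, and whether a gres.conf entry is needed in addition to slurm.conf may depend on the Slurm version.

```bash
#!/usr/bin/env bash
# Assumed admin-side configuration (hypothetical node names):
#
#   slurm.conf:  GresTypes=dummy
#                NodeName=node[01-10] Gres=dummy:2   (plus the existing CPUs=, RealMemory=, etc.)
#   gres.conf:   NodeName=node[01-10] Name=dummy Count=2   (if your version requires it)
#
# Each node then offers two "dummy" units, so at most two array tasks that
# each request one unit can run on the same node at the same time.
#SBATCH --array=1-100
#SBATCH --gres=dummy:1

task="${SLURM_ARRAY_TASK_ID:-unset}"
echo "Array task ${task} running on ${SLURMD_NODENAME:-$(uname -n)}"
```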