I am trying to hold all jobs submitted from my account. However, scontrol hold only takes a job array ID, and I have many arrays. Is there an alternative command, like scancel -u user?
Edit1:
If iterating over all job IDs is the only way, this is my method:
squeue -h -u user | awk '{print $1}' | while read -r jobid; do scontrol hold "$jobid"; done
While piping formatted text to sh is clever, I would probably do something like this:
squeue -u <user> --format "%i" --noheader | xargs scontrol hold
or
sacct --allocation --user=<user> --noheader --format=jobid | xargs scontrol hold
If you wanted to filter by state, you could do that as well:
squeue -u <user> --format "%i" --noheader --states=PENDING | xargs scontrol hold
or
sacct --allocation --user=<user> --noheader --format=jobid --state=PENDING | xargs scontrol hold
source: Slurm man pages
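To see what xargs would hand to scontrol before touching any jobs, a dry run along these lines can help. This is only a sketch: the job IDs are made-up sample data standing in for squeue or sacct output, and preview_hold is a hypothetical helper name, not a Slurm command.

```shell
# preview_hold: hypothetical helper, not part of Slurm.
# Reads job IDs on stdin (one per line, padding allowed) and prints the
# command line that xargs would build, instead of executing it.
preview_hold() {
    xargs echo scontrol hold
}

# Made-up job IDs, padded the way sacct pads its columns.
printf '  101 \n  102 \n  103 \n' | preview_hold
# On a real cluster the input would come from, for example:
#   squeue -u <user> --format "%i" --noheader | preview_hold
```

Because xargs splits on blanks and newlines, the column padding disappears and all IDs end up on one scontrol command line.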
An often-used method is to (ab)use the formatting options of squeue to build the scontrol command line:
squeue -u user --noheader --format "scontrol hold %i"
and then pipe that into a shell:
squeue -u user --noheader --format "scontrol hold %i" | sh
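Since whatever squeue prints gets executed by sh, it may be worth filtering the generated lines first. The sketch below is one possible guard, under the assumption that every real job ID starts with a digit; gen_hold_cmds is a hypothetical helper name and the input is made-up sample data.

```shell
# gen_hold_cmds: hypothetical helper, not part of Slurm.
# Reads squeue "%i" output on stdin and prints one "scontrol hold" line
# per job ID. Lines whose first field does not start with a digit (such
# as the JOBID header) are skipped. Assumption: real job IDs start with
# a digit.
gen_hold_cmds() {
    awk '$1 ~ /^[0-9]/ { print "scontrol hold " $1 }'
}

# Made-up sample: a header line followed by two job IDs.
printf 'JOBID\n101\n102_3\n' | gen_hold_cmds
# Inspect the result first; append "| sh" only once it looks right.
```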
Related
Often I need to cancel all jobs created after a certain time or job ID. Is there some syntax like scancel -j -gt <myid> or scancel -j $(< some call to sacct to get jobs after time "T">)?
Turns out that "> job id" is simple:
squeue -u $USER | awk '{if (NR!=1 && $1 > 8122014) {print $1}}' | xargs -n 1 scancel
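One caveat: for array-style IDs such as 8122020_3, awk no longer sees a plain number in $1 and falls back to string comparison. A possible variant that compares only the base job number is sketched below; jobs_after is a hypothetical helper name and the squeue output is made-up sample data.

```shell
# jobs_after: hypothetical helper, not part of Slurm.
# Reads squeue-style output on stdin (header line first, job ID in
# column 1) and prints the IDs whose base job number is greater than
# the threshold given as $1.
jobs_after() {
    awk -v min="$1" 'NR > 1 {
        split($1, parts, "_")          # "8122020_3" -> base "8122020"
        if (parts[1] + 0 > min + 0) print $1
    }'
}

# Made-up sample in the default squeue layout (only column 1 matters).
jobs_after 8122014 <<'EOF'
  JOBID PARTITION  NAME  USER ST  TIME NODES NODELIST(REASON)
8122010     debug  a.sh    me  R  1:00     1 node01
8122014     debug  b.sh    me  R  0:30     1 node01
8122015     debug  c.sh    me PD  0:00     1 (Priority)
8122020_3   debug  d.sh    me PD  0:00     1 (Priority)
EOF
# On a real cluster: squeue -u $USER | jobs_after 8122014 | xargs -n 1 scancel
```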
I have a bunch of job arrays that are running right now (SLURM).
For example, 2552376_1, 2552376_10, 2552376_20, 2552376_80, 2552377_1, 2552377_10, 2552377_20, 2552377_80 and so on.
Currently, I am only interested in those that end with _1.
Is there any way to hold all the others without specifying job IDs (I have several hundred of them)?
The following command works for holding all the jobs:
squeue -r -t PD -u $USER -o "scontrol hold %i" | tail -n +2 | sh
To release the ones with the needed ID, I use
squeue -r -u $USER -o "scontrol release %i" | tail -n +2 | grep "_1$" | sh
which picks the correct jobs.
Mass updates of jobs can be done by abusing the output formatting of squeue.
Hold all your pending jobs:
squeue -h -r -t PD -u $USER -o "scontrol hold %i" | sh
Then release all your jobs ending in _1:
squeue -h -r -t PD -u $USER -o "scontrol release %i" | grep "_1$" | sh
First run the commands without the | sh part to make sure they work as intended.
Note the -r option to display one job array element per line.
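The hold-then-release pair could also be collapsed into a single pass that never holds the _1 elements in the first place. The following is a sketch only; hold_cmds_except is a hypothetical helper name, and the IDs are the ones from the question used as sample input.

```shell
# hold_cmds_except: hypothetical helper, not part of Slurm.
# Reads one job ID per line on stdin and prints a "scontrol hold" command
# for every ID that does NOT end in the suffix given as $1.
hold_cmds_except() {
    grep -v -- "$1\$" | sed 's/^/scontrol hold /'
}

# Sample IDs taken from the question; only the _10 and _80 elements
# should produce a hold command.
printf '2552376_1\n2552376_10\n2552377_1\n2552377_80\n' | hold_cmds_except _1
# On a real cluster the input would be: squeue -h -r -t PD -u $USER -o "%i"
```

As with the commands above, the output is meant to be inspected before it is piped to sh.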
Is there a simple way to get SLURM to print out, for a given user, the number of jobs of each status (e.g., running, pending, completed, failed, etc.)?
One way to get that information is with:
squeue -h -u $USER -o%T -ST | uniq -c
The -h argument suppresses the header line, the -u argument filters jobs for the specific user, the -o%T argument outputs only the job state, and the -S argument sorts by it. Then uniq -c does the counting.
Example output:
$ squeue -h -u $USER -o%T -ST | uniq -c
147 PENDING
49 RUNNING
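uniq -c only merges adjacent identical lines, which is why the sort step matters. If the states arrive unsorted, a tally along these lines would still work. It is a sketch on made-up sample states, and count_states is a hypothetical helper name.

```shell
# count_states: hypothetical helper, not part of Slurm.
# Reads one job state per line on stdin and prints "<count> <state>" for
# each distinct state, sorted by state name. Input order does not matter.
count_states() {
    awk 'NF { n[$1]++ } END { for (s in n) print n[s], s }' | sort -k2
}

# Made-up, deliberately unsorted sample states.
printf 'RUNNING\nPENDING\nPENDING\nRUNNING\nPENDING\n' | count_states
# prints:
#   3 PENDING
#   2 RUNNING
```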
If I know the name of a job I have run, how can I return only its job ID from a script?
For example, running sacct --name run.sh returns the following output, from which I want only 50 (the job ID).
$ sacct --name run.sh
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
50               run.sh      debug      alper          1  COMPLETED      0:0
50.batch          batch                 alper          1  COMPLETED      0:0
As a workaround, I can run sacct --name run.sh | head -n3 | tail -n1 | awk '{print $1}', which returns 50, but for other jobs the order of 50 and 50.batch is sometimes swapped.
Use the following combination of options:
sacct -n -X --format jobid --name run.sh
where
-n suppresses the header
-X shows only the job allocation itself, which drops job steps such as .batch
--format jobid shows only the jobid column
This outputs only the job ID, but if several jobs share the given job name, you will get several results.
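When several jobs share the name, a script still has to pick one. The sketch below keeps the numerically largest ID, on the assumption that larger IDs were submitted later; newest_jobid is a hypothetical helper name and the IDs are made-up sample data.

```shell
# newest_jobid: hypothetical helper, not part of Slurm.
# Reads job IDs on stdin (one per line, padding allowed) and prints the
# numerically largest one. Assumption: a larger ID means a later job.
newest_jobid() {
    awk 'NF { print $1 }' | sort -n | tail -n 1
}

# Made-up sample standing in for: sacct -n -X --format jobid --name run.sh
jobid=$(printf '50           \n73           \n61           \n' | newest_jobid)
echo "newest job ID: $jobid"
# prints: newest job ID: 73
```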
How can I clear all the PBS jobs that have finished, i.e. that have status 'F'? I only want to see jobs that are currently queued or running. This would shorten the output of the qstat command.
With
$ qselect -x -s F
you can list the IDs of all jobs that have finished.
With
$ qdel -x <job_id>
you can delete the history of a job.
Combine them with xargs:
$ qselect -x -s F | xargs qdel -x
This will delete the history of every finished job.
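Depending on the xargs implementation, an empty qselect result may still run qdel -x once with no arguments (GNU xargs does this unless -r is given). A dry-run guard could look like the sketch below; purge_cmd is a hypothetical helper name and the job IDs are made up.

```shell
# purge_cmd: hypothetical helper, not part of PBS.
# Reads finished-job IDs on stdin and prints one "qdel -x" command that
# covers all of them, or prints nothing when there are no IDs.
purge_cmd() {
    awk 'NF { ids = ids " " $1 } END { if (ids != "") print "qdel -x" ids }'
}

# Made-up job IDs standing in for the output of: qselect -x -s F
printf '101.pbs01\n102.pbs01\n' | purge_cmd
# An empty input produces no command at all:
printf '' | purge_cmd
```

Once the printed command looks right, it can be piped to sh in the same way as the Slurm examples.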