Slurm: How to obtain only jobID using jobName through a script

If I know the name of a job I have run, how can I return only its jobID from a script?
For example, running sacct --name run.sh returns the following output, from which I want only 50 (the jobID).
$ sacct --name run.sh
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
50               run.sh      debug      alper          1  COMPLETED      0:0
50.batch          batch                 alper          1  COMPLETED      0:0
As a workaround I can run sacct --name run.sh | head -n3 | tail -n1 | awk '{print $1}', which returns 50, but for other jobs the order of the 50 and 50.batch lines is sometimes swapped.

Use the following combination of options:
sacct -n -X --format jobid --name run.sh
where
-n (--noheader) suppresses the header
-X (--allocations) shows only the job allocation itself, which drops step lines such as 50.batch
--format jobid shows only the JobID column
This will output only the jobid, but if several jobs correspond to the given job name, you will get several results.
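Because several jobs can share one name, a script usually has to pick a single ID from that list. The sketch below assumes that the most recently submitted job is the one wanted and that it has the numerically largest ID. The function names clean_ids and newest_id are hypothetical helpers, not part of Slurm.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Slurm.

# Strip sacct's column padding and blank lines, leaving one job ID per line.
clean_ids() {
    awk 'NF { print $1 }'
}

# Print the numerically largest ID read from stdin.
# Assumption: the newest job has the largest job ID.
newest_id() {
    sort -n | tail -n 1
}

# Usage (requires a working Slurm installation):
#   jobid=$(sacct -n -X --format jobid --name run.sh | clean_ids | newest_id)
#   echo "$jobid"
```

If no job matches the name, the pipeline prints nothing, so a calling script should check for an empty result before using the ID.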

Related

Which command is correct in Autosys

Which command is correct for listing the jobs currently in running status on a specific agent in Autosys?
1 ) Autorep -I machine_name | find RU
OR
2 ) Autorep -j ALL - M machine_name | find RU
The command below returns the list of running jobs on a particular machine/agent.
autorep -M <machine/agent_name> -d
example:
autorep -M host123#some.com -d
Output Columns:
JobName | Machine | Status | Load | Priority
If you are running this from a Linux server, use grep or awk as needed to filter or reformat the report.

Easy way to hold/release jobs by job array task id in slurm

I have a bunch of job arrays that are running right now (SLURM).
For example, 2552376_1, 2552376_10, 2552376_20, 2552376_80, 2552377_1, 2552377_10, 2552377_20, 2552377_80 and so on.
Currently, I am only interested in those that end with _1.
Is there a way to hold all the others without specifying job IDs (I have several hundred of them)?
The following command works for holding all the jobs:
squeue -r -t PD -u $USER -o "scontrol hold %i" | tail -n +2 | sh
To release the ones with the needed ID I use
squeue -r -u $USER -o "scontrol release %i" | tail -n +2 | grep "_1$" | sh
which picks correct jobs.
Mass update of jobs can be done by abusing the output formatting of squeue:
Hold all your pending jobs (-h suppresses the header line so that it is not passed to sh):
squeue -r -h -t PD -u $USER -o "scontrol hold %i" | sh
then release all your jobs ending in _1
squeue -r -t PD -u $USER -o "scontrol release %i" | grep "_1$" | sh
First run the commands without the | sh part to make sure they do what you intend.
Note the -r option to display one job array element per line.
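As a variation on the two-step approach above, the jobs that do not end in _1 can be selected directly, so nothing has to be released afterwards. This is only a sketch: only_task and except_task are hypothetical helper names, and they assume the IDs arrive one per line in the jobid_taskid form that squeue -r prints.

```shell
#!/bin/sh
# Hypothetical helpers that filter job-array element IDs (e.g. 2552376_10)
# read from stdin by their task ID, given as the first argument.

# Keep only elements whose task ID equals $1.
only_task() {
    grep -E "_$1\$"
}

# Keep every line that does not end in _$1. Note that IDs without an
# underscore (jobs that are not array elements) pass through as well.
except_task() {
    grep -v -E "_$1\$"
}

# Usage (requires Slurm); preview first, then append "| sh" to execute:
#   squeue -r -h -t PD -u "$USER" -o "%i" | except_task 1 | sed 's/^/scontrol hold /'
```

The trailing $ in the pattern matters: without it, a task ID of 1 would also match 10, 11 and so on.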

scontrol all jobs in user account

I am trying to hold all jobs submitted from my account. However, scontrol hold only takes a job or job array ID, and I have many arrays. Is there an alternative command like scancel -u user?
Edit1:
If iterating over all job IDs is the only way, this is my method:
squeue -h -u user | awk '{print $1;}' | while read jobid; do scontrol hold $jobid; done
While piping formatted text to sh is clever, I would probably do something like this:
squeue -u <user> --format "%i" --noheader | xargs scontrol hold
or
sacct --allocation --user=<user> --noheader --format=jobid | xargs scontrol hold
If you wanted to filter by state, you could do that as well:
squeue -u <user> --format "%i" --noheader --states=PENDING | xargs scontrol hold
or
sacct --allocation --user=<user> --noheader --format=jobid --state=PENDING | xargs scontrol hold
source: Slurm man pages
An often-used method is to (ab)use the formatting options of squeue to build the scontrol line:
squeue -u user --format "scontrol hold job %i"
and then pipe that into a shell, adding --noheader so that the header line is not executed as a command:
squeue -u user --noheader --format "scontrol hold job %i" | sh
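The command lines can also be generated in a separate step, which makes it easy to preview them and to drop lines that are not job IDs. A minimal sketch, assuming one job ID per line on stdin; hold_commands is a hypothetical helper name, not a Slurm command.

```shell
#!/bin/sh
# Hypothetical helper: turn job IDs on stdin into "scontrol hold" command
# lines. Lines whose first field does not start with a digit (such as the
# JOBID header or blank lines) are dropped.
hold_commands() {
    awk '$1 ~ /^[0-9]/ { print "scontrol hold " $1 }'
}

# Usage (requires Slurm); -r prints one job array element per line:
#   squeue -r -u "$USER" --format "%i" | hold_commands        # preview
#   squeue -r -u "$USER" --format "%i" | hold_commands | sh   # execute
```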

List number of jobs of each status

Is there a simple way to get SLURM to print out, for a given user, the number of jobs of each status (e.g., running, pending, completed, failed, etc.)?
One way to get that information is with:
squeue -h -u $USER -o%T -ST | uniq -c
The -h argument suppresses the header line, the -u argument filters jobs for the specific user, the -o%T argument outputs only the job state, and the -ST argument sorts the jobs by state. Then uniq -c does the counting.
Example output:
$ squeue -h -u $USER -o%T -ST | uniq -c
147 PENDING
49 RUNNING
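squeue mostly lists jobs that are still pending or running, so counts for finished states such as COMPLETED or FAILED normally have to come from the accounting records instead. The following is a sketch under the assumption that job accounting is enabled and sacct works on the cluster; count_states is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count how often each state name appears on stdin.
# Only the first word of each line is used, so "CANCELLED by 1000" counts
# as CANCELLED. Blank lines are ignored.
count_states() {
    awk 'NF { n[$1]++ } END { for (s in n) print n[s], s }' | sort -k 2
}

# Usage (requires Slurm with accounting enabled):
#   sacct -u "$USER" -X -n --format state | count_states
```

By default sacct typically reports only jobs since the start of the current day; its --starttime option widens that window.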

How to clear PBS job history

How can I clear all the PBS jobs that have finished, i.e. that have status 'F'? I just want to see the jobs that are currently queued or running. This would shorten the output of the qstat command.
With
$ qselect -x -s F
you can list the IDs of all finished jobs.
With
$ qdel -x <job_id>
you can delete the history of a job.
Combine the two with xargs:
$ qselect -x -s F | xargs qdel -x
This deletes the history of every finished job.
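Since the deletion cannot be undone, it may be worth previewing the command first. A minimal sketch, assuming an xargs that supports -r; delete_history and its --dry-run argument are hypothetical and not part of PBS.

```shell
#!/bin/sh
# Hypothetical helper: delete the history of the job IDs read from stdin.
# With --dry-run as first argument, the qdel command is printed instead of
# being executed. "xargs -r" (a GNU extension, also available in recent BSD
# xargs) skips the command entirely when the input is empty.
delete_history() {
    if [ "${1:-}" = "--dry-run" ]; then
        xargs -r echo qdel -x
    else
        xargs -r qdel -x
    fi
}

# Usage (requires PBS):
#   qselect -x -s F | delete_history --dry-run   # preview
#   qselect -x -s F | delete_history             # delete
```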
