Chain multiple SLURM jobs with dependency - slurm

In a previous question I asked how to queue a job B to start after job A, which is done with
sbatch --dependency=after:123456:+5 jobB.slurm
where 123456 is the ID of job A, and :+5 means that it will start five minutes after job A has started.
I now need to do this for several jobs. Job B should depend on job A, job C on B, job D on C.
sbatch jobA.slurm will return Submitted batch job 123456, and I will need to pass that job ID to the --dependency call for every job but the first. As I am using a busy cluster, I can't rely on the job IDs incrementing by one, because someone might queue a job in between.
As such I want to write a script that takes the job scripts (*.slurm) I want to run as arguments, e.g.
./run_jobs.sh jobA.slurm jobB.slurm jobC.slurm jobD.slurm
The script should then run, for all jobs scripts passed to it,
sbatch jobA.slurm # Submitted batch job 123456
sbatch --dependency=after:123456:+5 jobB.slurm # Submitted batch job 123457
sbatch --dependency=after:123457:+5 jobC.slurm # Submitted batch job 123458
sbatch --dependency=after:123458:+5 jobD.slurm # Submitted batch job 123459
What is an optimal way to do this with bash?

You can use the --parsable option to get the jobid of the previously submitted job:
#!/bin/bash
# Submit the first script without a dependency and capture its job ID.
ID=$(sbatch --parsable "$1")
shift
# Chain each remaining script onto the job submitted just before it.
for script in "$@"; do
    ID=$(sbatch --parsable --dependency=after:${ID}:+5 "$script")
done
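Note that with --parsable, sbatch prints just the job ID, or the job ID followed by a semicolon and the cluster name on a federated/multi-cluster setup; in the latter case you may want to strip the suffix before reusing the variable:
ID=${ID%%;*}   # keep only the numeric job ID if sbatch printed something like "12345;cluster"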

Related

Slurm - job name, job ids, how to know which job is which?

I often run many jobs on slurm. Some finish faster than others. However, it is always hard to keep track of which job is which. Can I give custom job names on slurm? If so, what is the option to set in the batch script? Would it show up when I do squeue --me?
The parameter is --job-name (or -J), for instance:
#SBATCH --job-name=exp1_run2
The squeue output will then list exp1_run2 under the NAME column for the corresponding job ID.
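A minimal batch script illustrating this (the job name, file names, and command are placeholders):
#!/bin/bash
#SBATCH --job-name=exp1_run2          # appears in the NAME column of squeue
#SBATCH --output=exp1_run2_%j.out     # %j expands to the job ID
srun ./my_experiment                  # placeholder for the actual command
You can then list only your own jobs, with their names, using for example:
squeue --me --format='%.10i %.25j %.8T'   # job ID, job name, job state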

Dealing with job submission limits

I am running slurm job arrays with --array, and I would like to run about 2000 tasks/array items. However, this is beyond the cluster's submission limit of ~500 jobs at a time.
Are there any tips/best practices for splitting this up? I'd like to submit it all at once and still be able to pass array ID arguments 1-2000 to my programs, if possible. I think something like waiting before submitting pieces of the array might help, but I'm not sure how to do that at the moment.
If the limit is on the size of an array:
You will have to split the array into several job arrays. The --array parameter accepts values of the form <START>-<END> so you can submit four jobs:
sbatch --array=1-500 ...
sbatch --array=501-1000 ...
sbatch --array=1001-1500 ...
sbatch --array=1501-2000 ...
This way you will bypass the 500-limit and still keep the SLURM_ARRAY_TASK_ID ranging from 1 to 2000.
To ease things a bit, you can write this all in one line like this:
paste -d- <(seq 1 500 2000) <(seq 500 500 2000) | xargs -I {} sbatch --array={} ...
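On its own, the paste/seq part just generates the range strings that xargs then feeds to sbatch:
paste -d- <(seq 1 500 2000) <(seq 500 500 2000)
prints
1-500
501-1000
1001-1500
1501-2000
so xargs ends up running one sbatch --array=<range> ... call per chunk.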
If the limit is on the number of submitted jobs:
Then one option is to have the last job of the array submit the following chunk.
#!/bin/bash
#SBATCH ...
...
...
# The task whose ID is a multiple of 500 submits the next chunk of 500 tasks.
if [[ $((SLURM_ARRAY_TASK_ID % 500)) == 0 ]] ; then
    sbatch --array=$((SLURM_ARRAY_TASK_ID+1))-$((SLURM_ARRAY_TASK_ID+500)) "$0"
fi
Note that, ideally, the next chunk should be submitted by the last task of the array that actually finishes, which may or may not be the one with the highest task ID; in practice, though, this approach has worked well in many situations.
Another option is to set up a cron job that monitors the queue and submits each chunk when possible, or to use a workflow manager that will do that for you.
You can also run a script that submits your jobs and sleeps for a few seconds after every 500 submissions; see https://www.osc.edu/resources/getting_started/howto/howto_submit_multiple_jobs_using_parameters
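A rough sketch of that throttling idea (the chunk size, the sleep duration, and the job.slurm script name are placeholders to adapt):
#!/bin/bash
# Submit 2000 independent jobs, pausing after every 500 submissions
# so the scheduler is not flooded all at once.
for i in $(seq 1 2000); do
    sbatch --export=ALL,TASK_ID=$i job.slurm   # pass the index to the job script
    if (( i % 500 == 0 )); then
        sleep 60                               # back off before the next chunk
    fi
done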

Submit N sequential jobs in SLURM

I want to train a neural network on a cluster that uses SLURM to manage jobs. There is a time limit of 10 hours imposed on each submitted job. Therefore, I need a script that automatically submits sequential jobs, i.e. the first job trains from scratch, and each subsequent job reloads the checkpoint from the most recent job to continue training.
I have written the following script. I would like to know if this is okay or if there is any standard way to handle this in SLURM.
#!/bin/bash
Njobs=1000
# Read the configuration variables
# Each training should have a different config
CONFIG=experiments/model.cfg
source $CONFIG
# Submit first job - no dependencies
j0=$(sbatch run-debug.slurm $CONFIG)
echo "ID of the first job: $j0"
# add first job to the list of jobs
jIDs+=($j0)
# for loop: submit Njobs: where job (i+1) is dependent on job i.
# and job (i+1) (i.e. new_job) resume from the checkpoint of job i
for i in $(seq 0 $Njobs); do
    # Submit job (i+1) with dependency ('afterok:') on job i
    RESUME_CHECKPOINT=$OUTPUTPATH/$EXPNAME/${jIDs[$i-1]}/checkpoint.pkl
    new_job=$(sbatch --dependency=afterok:${jIDs[$i-1]} run-debug.slurm $CONFIG $RESUME_CHECKPOINT)
    echo "Submitted job $new_job that will be executed once job ${jIDs[$i-1]} has completed with success."
    echo "This task will resume training from $RESUME_CHECKPOINT."
    jIDs+=($new_job)
    echo "List of jobs that have been submitted: $jIDs"
done
Thank you so much in advance for your help!
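For reference, here is a sketch of how the same chaining could be tightened up using --parsable (as in the first answer above), so that only the job ID is captured. OUTPUTPATH and EXPNAME are assumed to come from the sourced config, and run-debug.slurm is assumed to accept the config and checkpoint path as arguments, as in the script above:
#!/bin/bash
Njobs=1000
CONFIG=experiments/model.cfg
source "$CONFIG"    # assumed to define OUTPUTPATH and EXPNAME

# First job: no dependency, trains from scratch.
jid=$(sbatch --parsable run-debug.slurm "$CONFIG")
echo "Submitted first job $jid"

# Each subsequent job starts only if the previous one succeeded (afterok)
# and resumes from the checkpoint written by that job.
for i in $(seq 1 "$Njobs"); do
    checkpoint="$OUTPUTPATH/$EXPNAME/$jid/checkpoint.pkl"
    jid=$(sbatch --parsable --dependency=afterok:$jid run-debug.slurm "$CONFIG" "$checkpoint")
    echo "Submitted job $jid, resuming from $checkpoint"
done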

How to submit jobs when certain jobs have finished?

I submit jobs to a cluster (high-performance computer) using file1.sh and file2.sh.
The content of file1.sh is
qsub job1.sh
qsub job2.sh
qsub job3.sh
...
qsub job999.sh
qsub job1000.sh
The content of file2.sh is
qsub job1001.sh
qsub job1002.sh
qsub job1003.sh
...
qsub job1999.sh
qsub job2000.sh
After typing ./file1.sh in PuTTY, job1 to job1000 are submitted.
Is there an automatic way to run ./file2.sh ONLY after job1000 has completed? Please note, I want ./file2.sh to run automatically only after job1000 has finished (not just been successfully submitted).
The reason for doing this is that we can only submit 1000 jobs at a time. This limit of 1000 includes both running and queued jobs, and jobs held with -hold_jid still count towards it. So I have to wait for all of the first 1000 jobs to finish (not simply be submitted) before I can submit the next 1000.
Without the 1000-job submission limit, you could name your first jobs and then tell the next jobs to wait until the first ones have finished. But since all jobs would be submitted up front, I think you would still run into your 1000-job limit.
qsub -N job1 ./a.sh
qsub -N job2 ./b.sh
qsub -hold_jid job1,job2 -N job3 ./c.sh
You could write a shell script that submits the first 1000 jobs, then waits until some of them have finished and submits the next ones. The script can check how many jobs you currently have in the system with something like
qstat -u username | wc -l
If you have fewer than 1000 submitted jobs, the script can submit the next x jobs, where x = 1000 - #SubmittedJobs.
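A sketch of that approach in a shell script (job1.sh .. job2000.sh and the 1000 limit follow the question; the number of qstat header lines may differ on your system):
#!/bin/bash
# Submit job1.sh .. job2000.sh without ever having more than ~1000 jobs
# queued or running at the same time.
limit=1000
total=2000
next=1
while (( next <= total )); do
    # Count this user's jobs currently in the system (qstat prints 2 header lines).
    current=$(( $(qstat -u "$USER" | wc -l) - 2 ))
    (( current < 0 )) && current=0
    while (( current < limit && next <= total )); do
        qsub "job${next}.sh"
        next=$(( next + 1 ))
        current=$(( current + 1 ))
    done
    sleep 300   # wait a few minutes before checking the queue again
done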
Cluster operators vary in what user behaviour they tolerate, so it might be better to ask them whether this is OK. Also, some schedulers give new jobs from power users (here, in terms of the number of jobs) a lower priority, so your new jobs might end up spending more time in the queue.

Slurm: Is it possible to set or change the job ID of a submitted job via sbatch?

When we submit a job via sbatch, job IDs are assigned in incremental order, and, based on my observation, this numbering starts again from 1 at some point.
sbatch -N1 run.sh
Submitted batch job 20
// Goal: change the submitted batch job's ID, if possible.
[Q1] For example, suppose there is a running job under Slurm. When we reboot the node, does the job continue running? And does its job ID get updated, or does it stay as it was before?
[Q2] Is it possible to assign or change the job ID of a submitted job to a unique ID that the cluster owner wants to give?
Thank you for your valuable time and help.
If the node fails, the job is requeued, provided this is permitted by the JobRequeue parameter in slurm.conf. It will keep the same job ID as the previous run, since that ID is the only identifier used in the database to manage jobs. (Users can opt out of requeueing with the --no-requeue sbatch parameter.)
It is not possible to change job IDs, no.
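To check how a particular cluster is configured in that respect, something along these lines should work:
scontrol show config | grep -i requeue   # shows the JobRequeue setting, among others
sbatch --no-requeue run.sh               # opt a single job out of requeueing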
