Use Slurm job ID - Linux

When I launch a computation on the cluster, I usually have a separate program doing the post-processing at the end:
sbatch simulation
sbatch --dependency=afterok:JOBIDHERE postprocessing
I want to avoid mistyping and have the correct job ID inserted automatically. Any idea? Thanks

You can do something like this:
RES=$(sbatch simulation) && sbatch --dependency=afterok:${RES##* } postprocessing
The RES variable will hold the output of the sbatch command, something like Submitted batch job 102045. The construct ${RES##* } strips everything up to and including the last space, keeping only the last word, in this case the job ID. The && part ensures you do not try to submit the second job if the first submission fails.
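Alternatively, recent Slurm versions offer the --parsable flag, which makes sbatch print only the job ID (followed by the cluster name after a semicolon on multi-cluster setups), so no string manipulation is needed:
jobid=$(sbatch --parsable simulation) && sbatch --dependency=afterok:${jobid} postprocessing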

Related

Slurm - job name, job ids, how to know which job is which?

I often run many jobs on Slurm. Some finish faster than others. However, it is always hard to keep track of which job is which. Can I give custom job names on Slurm? If so, what is the directive in the batch script? Would it show up when I do squeue --me?
The parameter is --job-name (or -J), for instance:
#SBATCH --job-name=exp1_run2
The squeue output will then list exp1_run2 in the NAME column for the corresponding job ID.
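If long names get truncated in the default output, you can widen the NAME column with squeue's format option (the column widths here are only an illustration):
squeue --me --format="%.10i %.30j %.10T %.12M"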

Scheduling more jobs than MaxArraySize

Let's say I have 6233 simulations to run. The commands are generated and stored in a file, one per line. I would like to use Slurm to schedule and run these commands. However, the MaxArraySize limit is 2000, so I can't use one job array to schedule all of them.
One solution is given here, where we create four separate jobs and use arithmetic indexing into the file, with the last job having a smaller number of tasks to run (233).
Is it possible to do this using one sbatch script with one job ID?
I set ntasks=1 when using job arrays. Do larger ntasks help in such situations?
Update:
Following Damien's solution and examples given here, I ended up with the following line in my bash script:
curID=$(( ${SLURM_ARRAY_TASK_ID} * ${SLURM_NTASKS} + ${SLURM_PROCID} ))
The same can be done using Python (shown in the referenced page). The only difference is that the environment variables should be imported into the script.
Is it possible to do this using one sbatch script with one job ID?
No. That solution will give you multiple job IDs.
I set ntasks=1 when using job arrays. Do larger ntasks help in such situations?
Yes, that is a factor that you can leverage.
Each job in the array can spawn multiple tasks (--ntasks=...). In that case, the line number in the command file must be computed from $SLURM_ARRAY_TASK_ID and $SLURM_PROCID, and the program must be started with srun. The tasks of each job in the array will run in parallel. How large the job can be will depend on the MaxJobsize limit defined on the cluster/partition/qos you have access to.
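A minimal sketch of this variant, assuming the 6233 commands sit in a file named commands.txt, one per line (the file name and the sizes are assumptions):
#!/bin/bash
#SBATCH --array=0-1999
#SBATCH --ntasks=4
# 2000 array tasks x 4 tasks each covers 8000 lines, enough for 6233.
# sed prints nothing past the end of the file, so surplus tasks are no-ops.
srun bash -c '
  line=$(( SLURM_ARRAY_TASK_ID * SLURM_NTASKS + SLURM_PROCID + 1 ))
  cmd=$(sed -n "${line}p" commands.txt)
  if [ -n "$cmd" ]; then eval "$cmd"; fi
'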
Another option is to chain the tasks inside each job of the array with a Bash loop (for i in $(seq ...) ; do ... ; done). In that case, the line number in the command file must be computed from $SLURM_ARRAY_TASK_ID and $i. The tasks of each job in the array will run serially. How large the job can be will depend on the MaxWall limit defined on the cluster/partition/qos you have access to.
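And a sketch of the serial alternative, under the same assumptions about commands.txt:
#!/bin/bash
#SBATCH --array=0-1999
#SBATCH --ntasks=1
CHUNK=4    # 2000 array tasks x 4 lines each covers 8000 >= 6233 commands
for i in $(seq 0 $((CHUNK - 1))); do
  line=$(( SLURM_ARRAY_TASK_ID * CHUNK + i + 1 ))
  cmd=$(sed -n "${line}p" commands.txt)
  [ -z "$cmd" ] && break    # ran past the last command in the file
  eval "$cmd"
done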

one input file to yield many output files

This is a bit of a backwards approach to Snakemake, whose main paradigm is "one job -> one output", but I need many parallel reruns of my script on the same input matrix on a Slurm batch-submission cluster. How do I achieve that?
I tried specifying multiple threads and multiple nodes, each time indicating one CPU per task, but it never submits an array of many jobs, just an array of one job.
I don't think there is a nice way to submit an array job like that. In Snakemake you need to specify a unique output for each job, but the jobs can share the same input. If you want 1000 runs of a job:
ids = range(1000)

rule all:
    input: expand('output_{sample}_{id}', sample=samples, id=ids)  # 'samples' is assumed to be defined elsewhere

rule simulation:
    input: 'input_{sample}'
    output: 'output_{sample}_{id}'
    shell: 'echo {input} > {output}'
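Snakemake then schedules one job per missing output; on a Slurm cluster each run can become its own submission, for instance with the classic --cluster flag (illustrative; newer releases use executor plugins instead):
snakemake --jobs 100 --cluster "sbatch --ntasks=1"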
If that doesn't help, provide more information about the rule/job you are trying to run.

how to find the command used for a slurm job based on job id?

After submitting a slurm job using sbatch file.slurm, you get a job ID. You can use squeue and sacct to check the job's status. But neither returns the original submission command (sbatch file.slurm) for the job. Is there a command to show the submission command, namely sbatch file.slurm? I need to link job IDs with my submission commands.
So far, the only way I have found is to save the output of the sbatch command somewhere.
No, there is no command to show the submission command. One workaround is to use the file name as the job name:
#SBATCH -J "file_name"
So, when you do squeue or scontrol show job, then you can match your job id with the filename.
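For example (job ID illustrative):
scontrol show job 102045 | grep JobName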
There is no other way to achieve the desired objective.

SLURM: Changing the maximum number of simultaneously running tasks for a running array job

I have set of an array job as follows:
sbatch --array=1-100%5 ...
which will limit the number of simultaneously running tasks to 5. The job is now running, and I would like to change this limit to 10 (i.e. I wish I had run sbatch --array=1-100%10 ...).
The documentation on array jobs mentions that you can use scontrol to change options after the job has started. Unfortunately, it's not clear what this option's variable name is, and I don't think it is listed in the documentation of the sbatch command here.
Any pointers well received.
You can change the array throttling limit with the following command:
scontrol update ArrayTaskThrottle=<count> JobId=<jobID>
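For the example above, raising the limit from 5 to 10 would be (job ID illustrative):
scontrol update ArrayTaskThrottle=10 JobId=102045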
