How to find the command used for a Slurm job based on job ID?

After submitting a slurm job using sbatch file.slurm, you get a job ID. You can use squeue and sacct to check the job's status. But neither returns the original submission command (sbatch file.slurm) for the job. Is there a command to show the submission command, namely sbatch file.slurm? I need to link job IDs with my submission commands.
So far, the only way I have found is to save the output of the sbatch command somewhere.

No, there is no command that shows the submission command. One workaround is to use the file name as the job name:
#SBATCH -J "file_name"
Then, when you run squeue or scontrol show job, you can match the job ID with the file name.
Other than workarounds like this, there is no way to recover the original submission command.
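If you do go the route of saving the sbatch output yourself, a small wrapper around sbatch can record each job ID next to the command line that produced it. This is only a sketch; mysbatch and the log file location are made-up names, not anything Slurm provides:
#!/bin/bash
# mysbatch: submit a job and log "<jobid> sbatch <args>" for later lookup
LOGFILE="$HOME/sbatch_history.log"
output=$(sbatch "$@")        # e.g. "Submitted batch job 12345"
echo "$output"
jobid="${output##* }"        # last word of sbatch's output is the job ID
echo "$jobid sbatch $*" >> "$LOGFILE"
Submitting with ./mysbatch file.slurm then leaves a line like 12345 sbatch file.slurm in the log, which you can grep by job ID later.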

Related

Slurm - job name, job ids, how to know which job is which?

I often run many jobs on slurm. Some finish faster than others. However, it is always hard to keep track of which job is which. Can I give custom job names on slurm? If so, what is the option in the batch script? Would it show up when I do squeue --me?
The parameter is --job-name (or -J), for instance:
#SBATCH --job-name=exp1_run2
The squeue output will then list exp1_run2 under the NAME column for the corresponding job ID.
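If you adopt a naming scheme like that, squeue can also filter on the name directly; for example (depending on your Slurm version, --me may need to be replaced by --user=$USER):
squeue --me --name=exp1_run2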

How can I see in Slurm the details of all the jobs of a user?

I want to see the details of all jobs of a user.
I know that I can do the following:
scontrol show job
and then I can see all the details of all the jobs of all the users.
But I am searching for something like this:
scontrol show job UserId=Jon
Thanks.
One way to do that is to use squeue with the formatting option to build the command line and pipe that into a shell:
squeue --noheader --user Jon --format "scontrol show job %i" | sh
Here %i expands to the job ID, which is what scontrol show job expects, and --noheader keeps the header line from being piped into the shell. You can then use all the filtering options of squeue, like per partition, per state, etc.
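For example, restricting this to Jon's running jobs in one partition (the partition name here is only an illustration):
squeue --noheader --user Jon --state RUNNING --partition compute --format "scontrol show job %i" | sh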

slurm job status for an old already finished job

I want to see the status of one of my older jobs submitted using slurm. I have used sacct -j, but it does not tell me exactly when the job was submitted/terminated etc. I want to check the date and time of the job submission. I tried to use scontrol, but I suppose that only works for currently running/pending jobs, not for older jobs which have already finished. It would be great if someone could suggest a slurm command for checking the job status along with the job submission date and time for an already finished old job. Thanks in advance
As you mentioned that sacct -j is working but not providing the proper information, I'll assume that accounting is properly set up and working.
You can select the output of the sacct command with the -o flag, so to get exactly what you want you can use:
sacct -j JOBID -o jobid,submit,start,end,state
You can use sacct --helpformat to get the list of all the available fields for the output.
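If you no longer remember the job ID, you can also list everything that ran in a given time window and pick the job out from there; the dates below are placeholders:
sacct --starttime 2023-03-01 --endtime 2023-03-31 -X -o jobid,jobname,submit,start,end,state
The -X (--allocations) flag shows one line per job instead of one line per job step.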

SLURM: When we reboot the node, do job ID assignments start from 0?

For example:
sacct --start=1990-01-01 -A user returns a job table whose latest job ID is 136, but when I submit a new job with sbatch -A user -N1 run.sh, the Submitted batch job response returns 100, which is smaller than 136. And it seems like sacct -L -A user returns a list which ends with 100.
So it seems like newly submitted batch jobs overwrite previous jobs' information, which I don't want.
[Q] When we reboot the node, do job ID assignments start from 0? If yes, what should I do to continue from the latest job ID assigned before the reboot?
Thank you for your valuable time and help.
There are two main reasons why job IDs might be recycled:
the maximum job ID was reached (see MaxJobId in slurm.conf)
the Slurm controller was restarted with FirstJobId set to a new value
Other than that, Slurm will always increase the job IDs.
Note that the job information in the database is not overwritten; each record has a unique internal ID which is different from the job ID. sacct has a -D, --duplicates option to view all jobs in the database; by default, it only shows the most recent one among those that share the same job ID.
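For example, to see every record stored under a given job ID, including earlier jobs whose ID was recycled:
sacct -j JOBID -D -o jobid,submit,start,end,state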

Slurm: Is it possible to give or change pid of the submitted job via sbatch

When we submit a job via sbatch, the pid is given to jobs in incremental order. Based on my observation, this numbering starts again from 1.
sbatch -N1 run.sh
Submitted batch job 20
//Goal is to change submitted batch job's id, if possible.
[Q1] For example, there is a running job under slurm. When we reboot the node, does the job continue running? And does its pid get updated or stay as it was before?
[Q2] Is it possible to give or change the pid of a submitted job to a unique ID that the cluster owner wants to assign?
Thank you for your valuable time and help.
If the node fails, the job is requeued - if this is permitted by the JobRequeue parameter in slurm.conf. It will get the same Job ID as the previously started run since this is the only identifier in the database for managing the jobs. (Users can override requeueing with the --no-requeue sbatch parameter.)
It's not possible to change job IDs, no.
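For completeness, opting out of requeueing as mentioned above can be done either on the command line or inside the batch script:
sbatch --no-requeue -N1 run.sh
or, in run.sh:
#SBATCH --no-requeue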

Resources