Slurm job status for an old, already finished job

I want to see the status of one of my older jobs submitted using Slurm. I have used sacct -j, but it does not give me information on exactly when the job was submitted, terminated, etc. I want to check the date and time of the job submission. I tried to use scontrol, but I suppose that only works for currently running/pending jobs, not for older jobs which have already finished. It would be great if someone could suggest a Slurm command for checking the job status along with the job submission date and time for an already finished old job. Thanks in advance.

As you mentioned that sacct -j works but does not provide the information you need, I'll assume that accounting is properly set up and working.
You can choose which fields sacct outputs with the -o flag, so to get exactly what you want you can use:
sacct -j JOBID -o jobid,submit,start,end,state
You can use sacct --helpformat to get the list of all the available fields for the output.
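Depending on your Slurm version, you may also need to widen the accounting time window with -S for jobs that finished long ago, since sacct's default window can be as narrow as midnight of the current day. A minimal sketch (the date and the extra jobname/elapsed fields are illustrative):
sacct -j JOBID -S 2020-01-01 -o jobid,jobname,submit,start,end,elapsed,state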

Related

SLURM get job id of last run jobs

Is there a way to get the last x SLURM job IDs of (finished) jobs for a certain user (me)?
Or maybe all job IDs run in the last x hours?
(My use case is that I want to get some metrics via sacct, but ideally don't want to parse them from output files etc.)
For next time, it's maybe advisable to plan this in advance, like here…
To strictly answer the question, you can use sacct like this:
sacct -X --start now-3hours -o jobid
This will list the IDs of the jobs that started within the past 3 hours.
But then, if what you want is to feed those job IDs to sacct to get metrics, you can directly add the metrics to the -o option, or remove that -o option altogether.
Also, the -X flag is there to get one line per job, but memory-related metrics are stored per job step, so you might want to remove it at some point to get one line per job step instead.
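For instance, a sketch of querying the metrics directly, without collecting job IDs first (MaxRSS is recorded per step, hence no -X; all field names can be checked with sacct --helpformat):
sacct --user $USER --start now-3hours -o jobid,jobname,elapsed,maxrss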

How to find the command used for a Slurm job based on job ID?

After submitting a Slurm job using sbatch file.slurm, you get a job ID. You can use squeue and sacct to check the job's status, but neither returns the original submission command (sbatch file.slurm) for the job. Is there a command to show the submission command, namely sbatch file.slurm? I need to link job IDs with my submission commands.
So far, the only way I have found is saving the output of the sbatch command somewhere.
No, there is no command to show the submission command. One workaround is to set the job name to the file name:
#SBATCH -J "file_name"
Then, when you run squeue or scontrol show job, you can match your job ID with the file name.
There is no other way to achieve the desired objective.
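A minimal sketch of the matching step, assuming the job name was set as above (%i and %j are squeue's job-ID and job-name format fields; %30 sets the sacct column width):
squeue --user $USER --format "%i %j"
sacct -j JOBID -o jobid,jobname%30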

How can I see in Slurm the details of all the jobs per user

I want to see the details of all jobs of a user.
I know that I can do the following:
scontrol show job
and then I can see all the details of all the jobs of all the users.
But I am searching for something like this:
scontrol show job UserId=Jon
Thanks.
One way to do that is to use squeue with the formatting option to build one scontrol command per job and pipe them into a shell (%i expands to the job ID; --noheader keeps the header line out of the pipe):
squeue --noheader --user Jon --format "scontrol show job %i" | sh
You can then use all the filtering options of squeue like per partition, per state, etc.
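For example, a sketch restricted to running jobs in one partition (the partition name debug is illustrative):
squeue --noheader --user Jon --state RUNNING --partition debug --format "scontrol show job %i" | sh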

SLURM: When we reboot the node, does jobID assignments start from 0?

For example:
sacct --start=1990-01-01 -A user returns a job table whose latest job ID is 136, but when I submit a new job with sbatch -A user -N1 run.sh, the submitted batch job gets ID 100, which is smaller than 136. And it seems like sacct -L -A user returns a list that ends with 100.
So it seems like newly submitted batch jobs overwrite previous jobs' information, which I don't want.
[Q] When we reboot the node, does job ID assignment start from 0? If yes, what should I do to continue from the latest job ID assigned before the reboot?
Thank you for your valuable time and help.
There are two main reasons why job IDs might be recycled:
the maximum job ID was reached (see MaxJobId in slurm.conf)
the Slurm controller was restarted with FirstJobId set to a new value
Other than that, Slurm will always increase the job IDs.
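If you want numbering to resume above the last pre-reboot ID, a sketch of the relevant slurm.conf lines (the values are illustrative, and the controller must be restarted for FirstJobId to take effect):
FirstJobId=137
MaxJobId=67043328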
Note that the job information in the database is not overwritten; each record has a unique internal ID which is different from the job ID. sacct has a -D, --duplicates option to view all jobs in the database. By default, it only shows the most recent one among all those which share the same job ID.
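For instance, to see every accounting record behind a recycled job ID (the ID 100 is taken from the question):
sacct -j 100 --duplicates -o jobid,submit,start,end,state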

Slurm: Is it possible to set or change the ID of a job submitted via sbatch

When we submit a job via sbatch, the ID given to jobs increases incrementally. Based on my observation, this numbering starts again from 1.
sbatch -N1 run.sh
Submitted batch job 20
# Goal is to change the submitted batch job's ID, if possible.
[Q1] For example, suppose there is a running job under Slurm. When we reboot the node, does the job continue running? And does its ID get updated or stay as it was before?
[Q2] Is it possible to give the submitted job a unique ID that the cluster owner wants to assign, or to change its ID afterwards?
Thank you for your valuable time and help.
If the node fails, the job is requeued, provided this is permitted by the JobRequeue parameter in slurm.conf. It will get the same job ID as the previously started run, since this is the only identifier in the database for managing the jobs. (Users can override requeueing with the --no-requeue sbatch parameter.)
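For example, to opt a single job out of requeueing (run.sh is taken from the question):
sbatch --no-requeue -N1 run.sh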
It's not possible to change job IDs, no.
