I am quite new to Slurm. I am trying to figure out how to display ONLY the currently running and pending jobs, with no prolog steps.
> sacct -s PD,R
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
5049168 SRR600493 general cluster_u+ 1 RUNNING 0:0
5049168.0 prolog cluster_u+ 1 COMPLETED 0:0
Why is it printing the prolog, and what is the prolog?
You should use squeue for that rather than sacct. squeue lists running and pending jobs, can display more information (requested resources, etc.) than sacct, and does not show job steps (like 'prolog' here).
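For example, using squeue's -t/--states filter:

# show only pending and running jobs
squeue --states=PENDING,RUNNING
# same thing, using the short state codes from your sacct output
squeue -t PD,R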
When you submit a job with Slurm, two things happen: first, Slurm allocates resources; then, each time you launch something on those resources, it creates a step.
So the two lines you are showing belong to the same job. The first line is the allocation and the second is its first step. Someone launched a step running a binary named prolog; that step has finished, but the resource allocation has not been released yet. The user probably ran salloc first and then srun.
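For illustration, this is the kind of session that produces those two lines (the binary name here just mirrors the one in your output):

# grab an allocation of one CPU; this is the job itself (5049168)
salloc -n 1
# launch something inside the allocation; this creates step 0 (5049168.0)
srun ./prolog
# the step completes, but the job stays RUNNING until the shell exits
exit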
If you think that nobody launched a binary named prolog, it may be that Slurm is configured with a prolog that runs at the first step of each job.
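You can check that against the running configuration; scontrol show config dumps all active settings, so a quick grep shows whether any prolog is configured:

# list prolog-related settings (Prolog, TaskProlog, PrologFlags, ...)
scontrol show config | grep -i prolog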
Related
I used the following command to submit my dependent job.
qsub current_job_file -W depend=afterany:previous_job_id
Then I found that my current job is in status 'H', and it will not automatically run after the previous job finishes. Is this how it is supposed to be, or did I make a mistake somewhere? How can I make it run automatically after the previous job finishes?
I also tried the following command. The result is the same.
qsub -W depend=afterany:previous_job_id current_job_file
That is how it is supposed to be. If your current job is dependent on another job and the dependency is "after", it will be held until the other job finishes (or starts, depending on the kind of dependency you have used; in your case it is "any", so it waits for the other job to finish). Your current job is then moved to the 'Q' (queued) state, where the PBS scheduler will consider it for running.
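A minimal sketch of the whole workflow (script names are hypothetical; qsub prints the id of the job it just submitted):

# submit the first job and capture its id
PREV=$(qsub previous_job_file)
# submit the dependent job; it will sit in 'H' while $PREV runs
qsub -W depend=afterany:$PREV current_job_file
# watch it move from 'H' to 'Q' (and eventually 'R') once $PREV finishes
qstat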
I have a shell script that calls five other scripts. The first script creates 50 qsub jobs on the cluster. Individual job execution times vary from a couple of minutes to an hour. I need to know when all 50 jobs have finished, because only after they all complete can I run the second script. How can I find out whether all the qsub jobs are done? One possible solution is an infinite loop that checks the job status with the qstat command and the job IDs, but continuously polling the job status is not an elegant solution. Is it possible for a qsub job to notify me by itself when it finishes, so that I do not need to monitor the job status frequently?
qsub is capable of handling job dependencies, using -W depend=afterok:jobid.
e.g.
#!/bin/bash
# commands to run on the cluster
COMMANDS="script1.sh script2.sh script3.sh"
# initialize the job id list
JOBIDS=""
# queue all commands
for CMD in $COMMANDS; do
    # queue the command and append its job id to the list
    JOBIDS="$JOBIDS:$(qsub $CMD)"
done
# queue post-processing, dependent on all the submitted jobs
# (${JOBIDS#:} strips the leading colon accumulated by the loop)
qsub -W depend=afterok:${JOBIDS#:} postprocessing.sh
exit 0
More examples can be found here http://beige.ucs.indiana.edu/I590/node45.html
I have never heard of a way to do that, and I would be really interested if someone came up with a good answer.
In the meantime, I suggest you use file tricks: either have your script write a file at the end, or check for the existence of its log files (assuming they are created only at the end).
# poll every 30 seconds until the expected log file appears
while [ ! -e ~/logs/myscript.log-1 ]; do
    sleep 30
done
We execute our job application through the bsub command on Linux.
When the job completes, what is the command to retrieve the job information from the LSF archive? I know there is a command like bacct jobNo, but it does not retrieve the information.
Please help.
bacct retrieves summary information about sets of finished jobs for the purposes of accounting -- it gives you info like average turnaround time, resource usage, etc.
I think what you might be looking for is bhist -l <jobid>, which will give you the historical information about that job's submission and execution (similar to bjobs -l but more detailed and works for jobs that finished long ago).
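For example (the job id is a placeholder; if I remember correctly, bhist only searches the current event log by default, and -n 0 widens the search to all available log files):

# detailed history for one finished job
bhist -l 12345
# search all event log files, for jobs that finished long ago
bhist -n 0 -l 12345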
The command 'qstat -a' outputs many lines of information for completed jobs, all with status 'C'. It seems they will stay there forever. How can I clean up this unneeded job information, given that those jobs are already completed? Thanks!
This is controlled by the qmgr parameter keep_completed, which specifies the number of seconds after completion that a job remains visible. If you would like to delete a job immediately, without waiting that long, you can execute
qdel -p <jobid>
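To change how long completed jobs stay visible in the first place, you can set keep_completed via qmgr (the value is in seconds; 60 is just an example, and you need the appropriate admin rights):

# keep finished jobs visible for 60 seconds, then purge them automatically
qmgr -c "set server keep_completed = 60"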
Type qstat -r to get only the running jobs.
I've been using Sun Grid Engine to run my jobs on a node of a cluster.
Usually I would wait for the job to complete before exiting, using:
qsub -sync yes perl Script.pl
However, I no longer use Sun Grid Engine; I now use PBS Pro 10.4, and I am not able to find an option corresponding to -sync.
Could someone help me?
Thanks in advance
PBS Pro doesn't have a -sync equivalent, but you might be able to use the -I (interactive) option combined with expect, telling it what code to run, to get the same effect.
The equivalent of -sync for PBS is -Wblock=true.
This prevents qsub from exiting until the job has completed. It is perhaps unusual to need this, but I found it useful when using some software that was not designed for HPC. The software executes multiple instances of a worker program, which run simultaneously. However, it then has to wait for one (or sometimes more) of the instances to complete, and do some work with the results, before spawning the next. If the worker program completes without writing a particular file, it is assumed to have failed. I was able to write a wrapper script for the worker program, to qsub it, and used the -Wblock=true option to make it wait for the worker program job to complete.
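A minimal sketch of that wrapper idea (the script and file names here are hypothetical):

#!/bin/bash
# submit the worker job and block until it has completed
qsub -Wblock=true worker.sh
# at this point the job is finished, so the result file can be checked
if [ ! -e results/output.dat ]; then
    echo "worker failed: no output file" >&2
    exit 1
fi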