Wait for set of qsub jobs to complete - qsub

I have a batch script which starts off a couple of qsub jobs, and I want to trap when they are all completed.
I don't want to use the -sync option, because I want them to be running simultaneously. Each job has a different set of command line parameters.
I want my script to wait until all the jobs have completed, and then do something. I don't want to use sleep to poll, e.g. checking every 30 s whether certain files have been generated, because that is a drain on resources.
I believe Torque may have some options, but I am running SGE.
Any ideas on how I could implement this please?
Thanks
P.s.
I did find another thread
Link
which had a response
You can use wait to stop execution until all your jobs are done. You can even collect all the exit statuses and other running statistics (time it took, count of jobs done at the time, whatever) if you cycle around waiting for specific ids.
but I am not sure how to use it without polling on some value. Can bash trap be used, and if so, how would I use it with qsub?

Launch your qsub jobs, using the -N option to give them arbitrary names (job1, job2, etc):
qsub -N job1 -cwd ./job1_script
qsub -N job2 -cwd ./job2_script
qsub -N job3 -cwd ./job3_script
Launch your script and tell it to wait until the jobs named job1, job2 and job3 are finished before it starts:
qsub -hold_jid job1,job2,job3 -cwd ./results_script

If all the jobs have a common pattern in the name, you can use that pattern with -hold_jid when you submit the dependent job. https://linux.die.net/man/1/sge_types shows you what patterns you can use. For example:
-hold_jid "job_name_pattern*"

Another alternative (from here) is as follows:
FIRST=$(qsub job1.pbs)
echo $FIRST
SECOND=$(qsub -W depend=afterany:$FIRST job2.pbs)
echo $SECOND
THIRD=$(qsub -W depend=afterany:$SECOND job3.pbs)
echo $THIRD
The insight is that qsub returns the job ID, which is typically printed to standard output. Instead, capture it in a variable ($FIRST, $SECOND, $THIRD) and use the -W depend=afterany:[JOBIDs] flag when you enqueue your jobs to control the dependency structure of when they are dequeued.
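For instance, a minimal sketch of making one final job wait on all three at once (the script name collect_results.pbs is hypothetical); Torque/PBS accepts a colon-separated list of job IDs after afterany:
# hypothetical final job that runs only after FIRST, SECOND and THIRD have all ended
FINAL=$(qsub -W depend=afterany:$FIRST:$SECOND:$THIRD collect_results.pbs)
echo $FINAL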

qsub -hold_jid job1,job2,job3 -cwd ./myscript

This works in bash, but the ideas should be portable. Use -terse to facilitate building up a string with job ids to wait on; then submit a dummy job that uses -hold_jid to wait on the previous jobs and -sync y so that qsub doesn't return until it (and thus all prereqs) has finished:
# example where each of three jobs just sleeps for some time:
job_ids=$(qsub -terse -b y sleep 10)
job_ids=${job_ids},$(qsub -terse -b y sleep 20)
job_ids=${job_ids},$(qsub -terse -b y sleep 30)
qsub -hold_jid ${job_ids} -sync y -b y echo "DONE"
-terse option makes the output of qsub just be the job id
-hold_jid option (as mentioned in other answers) makes a job wait on specified job ids
-sync y option (referenced by the OP) asks qsub not to return until the submitted job is finished
-b y specifies that the command is not a path to a script file (for instance, I'm using sleep 30 as the command)
See the man page for more details.
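Alternatively, a small sketch of the wait-based approach mentioned in the question: submit each job with -sync y in the background, then use the shell built-in wait, so no polling is needed:
# each backgrounded qsub blocks until its own job finishes,
# so `wait` returns only once all three jobs are done
qsub -sync y -b y sleep 10 &
qsub -sync y -b y sleep 20 &
qsub -sync y -b y sleep 30 &
wait
echo "all jobs finished"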

#!/depot/Python-2.4.2/bin/python
import subprocess
import time

def trackJobs(jobs, waittime=4):
    # poll qstat until none of the given job IDs is known to the scheduler any more
    while len(jobs) != 0:
        for jobid in jobs:
            x = subprocess.Popen(['qstat', '-j', jobid],
                                 stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            std_out, std_err = x.communicate()
            if std_err:
                # qstat reports an error for unknown (i.e. finished) jobs
                jobs.remove(jobid)
                break
        time.sleep(waittime)
    return
This is simple code with which you can track the completion status of your qsub jobs.
The function accepts a list of job IDs (for example ['84210770', '84210774', '84210776', '84210777', '84210778']).
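A rough usage sketch in the shell (the wrapper name track_jobs.py is made up here): collect the IDs at submission time with -terse and hand them to a small script that calls trackJobs(sys.argv[1:]):
# hypothetical: gather job IDs via -terse, then pass them to the tracker
job_ids=""
for script in job1.sh job2.sh job3.sh ; do
    job_ids="$job_ids $(qsub -terse $script)"
done
python track_jobs.py $job_ids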

If you have, say, 150 files to process but can run only 15 jobs at a time, while the others sit holding in the queue, you can set up something like this:
# split my list of files into smaller lists of 10 files each
awk 'NR%10==1 {x="F"++i;}{ print > "list_part"x".txt" }' list.txt
qsub all the jobs in such a way that, within each list_part*.txt, the first job holds the second one, the second holds the third one, and so on.
for list in $( ls list_part*.txt ) ; do
PREV_JOB=$(qsub start.sh) # create a dummy script start.sh just for starting
for file in $(cat $list ) ; do
NEXT_JOB=$(qsub -v file=$file -W depend=afterany:$PREV_JOB myscript.sh )
PREV_JOB=$NEXT_JOB
done
done
This is useful if myscript.sh contains a procedure that requires moving or downloading many files, or that creates intense traffic on the cluster LAN.
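Since the original question is about SGE (where -W depend= is Torque/PBS syntax), a rough equivalent sketch of the same chaining using -hold_jid would be:
for list in list_part*.txt ; do
    PREV_JOB=$(qsub -terse start.sh)    # dummy starter job, as above
    while read -r file ; do
        NEXT_JOB=$(qsub -terse -hold_jid $PREV_JOB -v file=$file myscript.sh)
        PREV_JOB=$NEXT_JOB
    done < "$list"
done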

You can start a job array with qsub -N jobname -t 1-"$numofjobs" -tc 20; it then has only one job ID and runs at most 20 tasks at a time. Give it a name, and just hold until that array is done using qsub -hold_jid jid or qsub -hold_jid jobname.
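For example, a minimal sketch (task.sh is a hypothetical task script that reads $SGE_TASK_ID to pick its input):
# 150 tasks, at most 20 running at once, and a results job held
# until the whole array has finished
qsub -N myarray -t 1-150 -tc 20 -cwd ./task.sh
qsub -hold_jid myarray -cwd ./results_script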

I needed more flexibility, so I built a Python module for this and other purposes here. You can run the module directly as a script (python qsub.py) for a demo.
Usage:
$ git clone https://github.com/stevekm/util.git
$ cd util
$ python
Python 2.7.3 (default, Mar 29 2013, 16:50:34)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import qsub
>>> job = qsub.submit(command = 'echo foo; sleep 60', print_verbose = True)
qsub command is:
qsub -j y -N "python" -o :"/home/util/" -e :"/home/util/" <<E0F
set -x
echo foo; sleep 60
set +x
E0F
>>> qsub.monitor_jobs(jobs = [job], print_verbose = True)
Monitoring jobs for completion. Number of jobs in queue: 1
Number of jobs in queue: 0
No jobs remaining in the job queue
([Job(id = 4112505, name = python, log_dir = None)], [])
Designed with Python 2.7 and SGE since that's what our system runs. The only non-standard Python libraries required are the included tools.py and log.py modules, and sh.py (also included).
Obviously not as helpful if you wish to stay purely in bash, but if you need to wait on qsub jobs then I would imagine your workflow is edging towards a complexity that would benefit from using Python instead.

Related

Sun Grid Engine: submitted jobs by qsub command

I am using Sun Grid Engine queuing system.
Assume I submitted multiple jobs using a script that looks like:
#! /bin/bash
for i in 1 2 3 4 5
do
sh qsub.sh python run.py ${i}
done
qsub.sh looks like:
#! /bin/bash
echo cd `pwd` \; "$@" | qsub
Assuming that 5 jobs are running, I want to find out which command each job is executing.
By using qstat -f, I can see which node is running which jobID, but not what specific command each jobID is related to.
So for example, I want to check which jobID=xxxx is running python run.py 3 and so on.
How can I do this?
I think you'll see it if you use qstat -j *. See https://linux.die.net/man/1/qstat-ge .
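Another option (a sketch; the job names here are made up) is to encode the parameter in the job name at submission time, so that the NAME column of plain qstat output already tells the runs apart:
# give each job a name that encodes its parameter
for i in 1 2 3 4 5 ; do
    echo "cd $(pwd); python run.py ${i}" | qsub -N "run_py_${i}"
done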
You could try running array jobs. Array jobs are useful when you have multiple inputs to process in the same way. Qstat will identify each instance of the array of jobs. See the docs for more information.
http://docs.adaptivecomputing.com/torque/4-0-2/Content/topics/commands/qsub.htm#-t
http://wiki.gridengine.info/wiki/index.php/Simple-Job-Array-Howto
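For example, a rough sketch of the same five runs as a single SGE array job; qstat then lists one job ID with task IDs 1-5:
# \$SGE_TASK_ID is written literally into the job script and expanded at run time on the node
echo "cd $(pwd); python run.py \$SGE_TASK_ID" | qsub -N run_py -t 1-5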

Redirect output of my java program under qsub

I am currently running multiple Java executable program using qsub.
I wrote two scripts: 1) qsub.sh, 2) run.sh
qsub.sh
#! /bin/bash
echo cd `pwd` \; "$@" | qsub
run.sh
#! /bin/bash
for param in 1 2 3
do
./qsub.sh java -jar myProgram.jar -param ${param}
done
Given the two scripts above, I submit jobs by
sh run.sh
I want to redirect the messages generated by myProgram.jar -param ${param}
So in run.sh, I replaced the 4th line with the following
./qsub.sh java -jar myProgram.jar -param ${param} > output-${param}.txt
but the message stored in output-${param}.txt is "Your job 730 ("STDIN") has been submitted", which is not what I intended.
I know that qsub has an option -o for specifying the location of output, but I cannot figure out how to use this option for my case.
Can anyone help me?
Thanks in advance.
The issue is that qsub doesn't return the output of your job; it returns the output of the qsub command itself, which simply informs your resource manager / scheduler that you want that job to run.
You want to use the qsub -o option, but you need to remember that the output won't appear there until the job has run to completion. For Torque, you'd use qstat to check the status of your job, and all other resource managers / schedulers have similar commands.
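For instance, a minimal sketch (file names are hypothetical) that threads the log path through qsub.sh to the -o option:
#! /bin/bash
# hypothetical variant of qsub.sh: the first argument is the output file,
# the remaining arguments are the command to run
out=$1 ; shift
echo cd `pwd` \; "$@" | qsub -o "$out" -e "${out}.err"
The corresponding line in run.sh then becomes:
./qsub.sh output-${param}.txt java -jar myProgram.jar -param ${param}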

linux batch jobs in parallel

I have seven licenses for a particular piece of software, so I want to run 7 jobs simultaneously. I can do that using '&'. However, the 'wait' command waits for all 7 of those processes to finish before spawning the next 7. I would like to write a shell script where, after I start the first seven, another job is started as soon as any one of them completes. This is because some of those 7 jobs might take very long while others finish really quickly, and I don't want to waste time waiting for all of them. Is there a way to do this in Linux? Could you please help me?
Thanks.
GNU parallel is the way to go. It is designed for launching multiple instances of the same command, each with a different argument retrieved either from stdin or an external file.
Let's say your licensed script is called myScript, that each instance takes the same options --arg1 --arg2 plus one varying parameter --argVariable, and that those varying parameters are stored in the file myParameters:
cat myParameters | parallel -halt 1 --jobs 7 ./myScript --arg1 --argVariable {} --arg2
Explanations:
-halt 1 tells parallel to halt all jobs if one fails
--jobs 7 will launch 7 instances of myScript
On a Debian-based Linux system, you can install parallel using:
sudo apt-get install parallel
As a bonus, if your licenses allow it, you can even tell parallel to launch these 7 instances amongst multiple computers.
You could check how many are currently running and start more if you have fewer than 7:
while true; do
if [ "`ps ax -o comm | grep process-name | wc -l`" -lt 7 ]; then
process-name &
fi
sleep 1
done
Write two scripts: one that keeps restarting a job whenever it finishes (script2 below), and one that launches 7 instances of it, one per job (script1).
Like:
script1:
./script2 job1 &
...
./script2 job7 &
and
script2:
while ... ; do
    ./jobX
done
I found a fairly good solution using make, which is a part of the standard distributions. See here

can i delete a shell script after it has been submitted using qsub without affecting the job?

I want to submit a bunch of jobs using qsub - the jobs are all very similar. I have a script that has a loop, and in each iteration it overwrites a file tmpjob.sh and then runs qsub tmpjob.sh. Before the job has had a chance to run, tmpjob.sh may have been overwritten by the next iteration of the loop. Is another copy of tmpjob.sh stored while the job is waiting to run? Or do I need to be careful not to change tmpjob.sh before the job has begun?
Assuming you're talking about torque, then yes; torque reads in the script at submission time. In fact the submission script need never exist as a file at all; as given as an example in the documentation for torque, you can pipe in commands to qsub (from the docs: cat pbs.cmd | qsub.)
But several other batch systems (SGE/OGE, PBS PRO) use qsub as a queue submission command, so you'll have to tell us what queuing system you're using to be sure.
Yes. You can even create jobs and sub-jobs with HERE Documents. Below is an example of a test I was doing with a script initiated by a cron job:
#!/bin/env bash
printenv
qsub -N testCron -l nodes=1:vortex:compute -l walltime=1:00:00 <<QSUB
cd \$PBS_O_WORKDIR
printenv
qsub -N testsubCron -l nodes=1:vortex:compute -l walltime=1:00:00 <<QSUBNEST
cd \$PBS_O_WORKDIR
pwd
date -Isec
QSUBNEST
QSUB

Perl or Bash threadpool script?

I have a script - a linear list of commands - that takes a long time to run sequentially. I would like to create a utility script (Perl, Bash or other available on Cygwin) that can read commands from any linear script and farm them out to a configurable number of parallel workers.
So if myscript is
command1
command2
command3
I can run:
threadpool -n 2 myscript
Two threads would be created, one commencing with command1 and the other command2. Whichever thread finishes its first job first would then run command3.
Before diving into Perl (it's been a long time) I thought I should ask the experts if something like this already exists. I'm sure there should be something like this because it would be incredibly useful both for exploiting multi-CPU machines and for parallel network transfers (wget or scp). I guess I don't know the right search terms. Thanks!
If you need the output not to be mixed up (which xargs -P risks doing), then you can use GNU Parallel:
parallel -j2 ::: command1 command2 command3
Or if the commands are in a file:
cat file | parallel -j2
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time.
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
In Perl you can do this with Parallel::ForkManager:
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new( 8 ); # number of jobs to run in parallel

open FILE, "<commands.txt" or die $!;
while ( my $cmd = <FILE> ) {
    $pm->start and next;
    system( $cmd );
    $pm->finish;
}
close FILE or die $!;

$pm->wait_all_children;
There is xjobs, which is better at separating individual job output than xargs -P.
http://www.maier-komor.de/xjobs.html
You could also use make. Here is a very interesting article on how to use it creatively.
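For instance, a rough sketch (assuming a commands.txt file with one command per line) that generates a throwaway makefile with one target per command and lets make bound the parallelism:
# hypothetical: one target per line of commands.txt, run at most 3 at once
awk '{ printf "job%d:\n\t%s\n", NR, $0 }' commands.txt > jobs.mk
awk 'END { printf "all:"; for (i = 1; i <= NR; i++) printf " job%d", i; print "" }' commands.txt >> jobs.mk
make -f jobs.mk -j3 all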
Source: http://coldattic.info/shvedsky/pro/blogs/a-foo-walks-into-a-bar/posts/7
# That's commands.txt file
echo Hello world
echo Goodbye world
echo Goodbye cruel world
cat commands.txt | xargs -I CMD --max-procs=3 bash -c CMD
