Redirect output of my java program under qsub - pbs

I am currently running multiple instances of a Java executable using qsub.
I wrote two scripts: 1) qsub.sh, 2) run.sh
qsub.sh
#! /bin/bash
echo cd `pwd` \; "$@" | qsub
run.sh
#! /bin/bash
for param in 1 2 3
do
./qsub.sh java -jar myProgram.jar -param ${param}
done
Given the two scripts above, I submit jobs by
sh run.sh
I want to redirect the messages generated by myProgram.jar -param ${param}
So in run.sh, I replaced the 4th line with the following
./qsub.sh java -jar myProgram.jar -param ${param} > output-${param}.txt
but the message stored in output-${param}.txt is "Your job 730 ("STDIN") has been submitted", which is not what I intended.
I know that qsub has an option -o for specifying the location of output, but I cannot figure out how to use this option for my case.
Can anyone help me?
Thanks in advance.

The issue is that qsub doesn't return the output of your job; it prints only the output of the qsub command itself, which simply informs your resource manager / scheduler that you want that job to run.
You want to use the qsub -o option, but remember that the output may not appear there until the job has run to completion. For Torque, you'd use qstat to check the status of your job; other resource managers / schedulers have similar commands.
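As a rough sketch of how -o could fit the wrapper from the question: the function below takes the log file as its first argument and hands it to qsub. The name submit_with_log and the ".err" naming for the error file are assumptions made for illustration; -o and -e themselves are standard qsub options in both Torque and Grid Engine.

```shell
#!/bin/bash
# Hypothetical variant of qsub.sh: the first argument is the log file,
# the remaining arguments are the command to run inside the job.
submit_with_log() {
    local log=$1
    shift
    # "$*" joins the command words with spaces; the one-line job script
    # reaches qsub on standard input, as in the original wrapper.
    echo "cd $(pwd) ; $*" | qsub -o "$log" -e "$log.err"
}

# Usage from run.sh (commented out so that sourcing this file submits nothing):
# for param in 1 2 3; do
#     submit_with_log "$PWD/output-${param}.txt" java -jar myProgram.jar -param "${param}"
# done
```

The usage passes an absolute path because some schedulers resolve a relative -o path against the job's working directory, which may be the home directory rather than the directory you submitted from.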

Related

Sun Grid Engine: submitted jobs by qsub command

I am using Sun Grid Engine queuing system.
Assume I submitted multiple jobs using a script that looks like:
#! /bin/bash
for i in 1 2 3 4 5
do
sh qsub.sh python run.py ${i}
done
qsub.sh looks like:
#! /bin/bash
echo cd `pwd` \; "$@" | qsub
Assuming that 5 jobs are running, I want to find out which command each job is executing.
By using qstat -f, I can see which node is running which jobID, but not what specific command each jobID is related to.
So for example, I want to check which jobID=xxxx is running python run.py 3 and so on.
How can I do this?
I think you'll see it if you use qstat -j '*' (quote the * so that the shell does not expand it). See https://linux.die.net/man/1/qstat-ge .
You could try running array jobs. Array jobs are useful when you have multiple inputs to process in the same way. qstat will identify each instance of the array of jobs. See the docs for more information.
http://docs.adaptivecomputing.com/torque/4-0-2/Content/topics/commands/qsub.htm#-t
http://wiki.gridengine.info/wiki/index.php/Simple-Job-Array-Howto
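A minimal sketch of the array-job suggestion for Grid Engine: one submission covers all five inputs, and each task reads its index from SGE_TASK_ID. The helper task_command is hypothetical and exists only to make the mapping from index to command explicit; run.py is the script from the question.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -t 1-5
# The "#$" lines are Grid Engine directives: run under bash, start in the
# submission directory, and create tasks 1 through 5.

# Print the command that belongs to a given task index.
task_command() {
    echo "python run.py $1"
}

# Grid Engine sets SGE_TASK_ID to the task index inside an array job
# (and to "undefined" for ordinary jobs), so nothing runs outside a task.
if [ -n "${SGE_TASK_ID:-}" ] && [ "$SGE_TASK_ID" != "undefined" ]; then
    echo "task ${SGE_TASK_ID}: $(task_command "$SGE_TASK_ID")"
    python run.py "$SGE_TASK_ID"
fi
```

Submitted once with qsub, the tasks share a single job ID, and qstat lists the task index in its ja-task-ID column, so each running task can be traced back to its argument.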

SGE Cluster - script fails after submission - works in terminal

I have a script that I am trying to submit to a SGE cluster (on Redhat Linux). The very first part of the script defines the current folder from the full CWD path, as a variable to use downstream:
#!/usr/bin/bash
#
#$ -cwd
#$ -A username
#$ -M user@server
#$ -j y
#$ -m aes
#$ -N test
#$ -o test.log.txt
echo 'This is a test.'
result="${PWD##*/}"
echo $result
In bash, this works as expected:
CWD:
-bash-4.1$ pwd
/home/user/test
Run script:
-bash-4.1$ bash test.sh
This is a test.
test
When I submit the job to the cluster:
-bash-4.1$ qsub -V test.sh
and examine the log file:
This is a test.
Missing }.
Does anyone know why the job submission is saying "Missing } " when it works right from the command-line? I'm not sure what I'm missing here.
Thanks.
The POSIX standard for batch schedulers requires them to ignore the #! line and instead use either a shell configured for the cluster or one selected with qsub's -S option. The default is usually csh, which does not understand the ${PWD##*/} expansion, hence the "Missing }." error. Adding a line such as #$ -S /usr/bin/bash to the script will cause it to be interpreted by bash.
Alternatively, you could convince the cluster admin to change the queues' shell_start_mode from posix_compliant to unix_behavior.
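A minimal sketch of the script with the shell selected explicitly, assuming bash lives at /usr/bin/bash on the execution hosts as in the question (the account and mail directives are left out for brevity):

```shell
#!/usr/bin/bash
#$ -S /usr/bin/bash
#$ -cwd
#$ -j y
#$ -N test
#$ -o test.log.txt
# "#$ -S" tells Grid Engine which shell interprets the script, so the
# parameter expansion below is no longer handed to csh.
echo 'This is a test.'
# Strip everything up to the last "/" to get the current directory name.
result="${PWD##*/}"
echo "$result"
```

Because the "#$" lines are ordinary comments to bash, the same file still runs unchanged with bash test.sh.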

Torque nested/successive qsub call

I have a jobscript compile.pbs which runs on a single CPU and compiles source code to create an executable. I then have a 2nd job script jobscript.pbs which I call using 32 CPUs to run that newly created executable with MPI. They both work perfectly when I manually call them in succession, but I would like to automate the process by having the first script call the 2nd jobscript just before it ends. Is there a way to properly nest qsub calls or have them be called in succession?
Currently my attempt is to have the first script call the 2nd script right before it ends, but when I try that I get a strange error message from the 2nd (nested) qsub:
qsub: Bad UID for job execution MSG=ruserok failed validating masterhd/masterhd from s59-16.local
I think the 2nd script is being called properly, but maybe the permissions are not the same as when I called the original one. Obviously my user name masterhd is allowed to run the jobscripts because it works fine when I call the jobscript manually. Is there a way to accomplish what I am trying to do?
Here is a more detailed example of the procedure. First I call the first jobscript and specify a variable with -v:
qsub -v outpath='/home/dest_folder/' compile.pbs
That outpath variable just specifies where to copy the new executable; the first jobscript then changes to that output directory and attempts to submit jobscript.pbs.
compile.pbs:
#!/bin/bash
#PBS -N compile
#PBS -l walltime=0:05:00
#PBS -j oe
#PBS -o ocompile.txt
#Perform compiling stuff:
module load gcc-openmpi-1.2.7
rm *.o
make -f Makefile
#Copy the executable to the destination:
cp visct ${outpath}/visct
#Change to the output path before calling the next jobscript:
cd ${outpath}
qsub jobscript.pbs
jobscript.pbs:
#!/bin/bash
#PBS -N run_exe
#PBS -l nodes=32
#PBS -l walltime=96:00:00
#PBS -j oe
#PBS -o results.txt
cd $PBS_O_WORKDIR
module load gcc-openmpi-1.2.7
time mpiexec visct
You could write a submission script that submits both jobs but holds the second until the first has completed without errors:
# Torque's qsub prints the new job's ID (for example 1234.server) on stdout.
JOB1ID=$(qsub -v outpath='/home/dest_folder/' compile.pbs)
# Options go before the script name; afterok releases the job only if job 1 exits successfully.
qsub -W depend=afterok:"$JOB1ID" jobscript.pbs
It's possible that your system restricts submitting jobs from inside a running job. Your first job only runs for 5 minutes and then the second job needs 96 hours. If the second job is requested inside the first job, that would violate the time limit of the first job.
Why can't you just put the compile part at the beginning of the second script?

Can I delete a shell script after it has been submitted using qsub without affecting the job?

I want to submit a bunch of jobs using qsub - the jobs are all very similar. I have a script that has a loop, and in each iteration it overwrites a file tmpjob.sh and then does qsub tmpjob.sh . Before the job has had a chance to run, tmpjob.sh may have been overwritten by the next iteration of the loop. Is another copy of tmpjob.sh stored while the job is waiting to run? Or do I need to be careful not to change tmpjob.sh before the job has begun?
Assuming you're talking about Torque, then yes; Torque reads in the script at submission time. In fact, the submission script need never exist as a file at all; as shown in an example in the Torque documentation, you can pipe commands into qsub (from the docs: cat pbs.cmd | qsub).
But several other batch systems (SGE/OGE, PBS Pro) also use qsub as their queue submission command, so you'll have to tell us which queuing system you're using to be sure.
Yes. You can even create jobs and sub-jobs with here-documents. Below is an example of a test I was doing with a script initiated by a cron job:
#!/bin/env bash
printenv
qsub -N testCron -l nodes=1:vortex:compute -l walltime=1:00:00 <<QSUB
cd \$PBS_O_WORKDIR
printenv
qsub -N testsubCron -l nodes=1:vortex:compute -l walltime=1:00:00 <<QSUBNEST
cd \$PBS_O_WORKDIR
pwd
date -Isec
QSUBNEST
QSUB
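Applied to the loop from the question, a sketch that skips tmpjob.sh entirely and feeds each job to qsub on standard input. It assumes Torque (hence PBS_O_WORKDIR); the function name submit_job, the job names, and the ./process command are made up for illustration.

```shell
#!/bin/bash
# Submit one job per input by writing the job script to qsub's stdin.
submit_job() {
    local input=$1
    # The unquoted delimiter lets ${input} expand now, at submission time;
    # \$PBS_O_WORKDIR is escaped so that it expands later, inside the job.
    qsub -N "job_${input}" <<EOF
cd \$PBS_O_WORKDIR
./process ${input}
EOF
}

# Usage (commented out so that sourcing this file submits nothing):
# for input in a b c; do
#     submit_job "$input"
# done
```

Since no file is shared between iterations, there is nothing for the next iteration to overwrite.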

Redirecting Output of Bash Child Scripts

I have a basic script that outputs various status messages. e.g.
~$ ./myscript.sh
0 of 100
1 of 100
2 of 100
...
I wanted to wrap this in a parent script, in order to run a sequence of child-scripts and send an email upon overall completion, e.g. topscript.sh
#!/bin/bash
START=$(date +%s)
/usr/local/bin/myscript.sh
/usr/local/bin/otherscript.sh
/usr/local/bin/anotherscript.sh
RET=$?
END=$(date +%s)
echo -e "Subject:Task Complete\nBegan on $START and finished at $END and exited with status $RET.\n" | sendmail -v group@mydomain.com
I'm running this like:
~$ topscript.sh >/var/log/topscript.log 2>&1
However, when I run tail -f /var/log/topscript.log to inspect the log I see nothing, even though running top shows myscript.sh is currently being executed, and therefore, presumably outputting status messages.
Why isn't the stdout/stderr from the child scripts being captured in the parent's log? How do I fix this?
EDIT: I'm also running these on a remote machine, connected via ssh using pseudo-tty allocation, e.g. ssh -t user@host. Could the pseudo-tty be interfering?
I just tried the following: I have three files t1.sh, t2.sh, and t3.sh, all with the following content:
#!/bin/bash
for((i=0;i<10;i++)) ; do
echo $i of 9
sleep 1
done
And a script called myscript.sh with the following content:
#!/bin/bash
./t1.sh
./t2.sh
./t3.sh
echo "All Done"
When I run ./myscript.sh > topscript.log 2>&1 and then in another terminal run tail -f topscript.log I see the lines being output just fine in the log file.
Perhaps the programs run by your subscripts use a large output buffer? I know that when I've run Python scripts before, they used a pretty big output buffer when stdout was not a terminal, so no output showed up for a while. Do you actually see the entire output in the log once topscript.sh has finished? Is it just that you're not seeing the output while the processes run?
Try:
unbuffer topscript.sh >/var/log/topscript.log 2>&1
Note that unbuffer is not installed by default on every Unix platform; it typically ships with the expect package, which you may need to find and install first.
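If unbuffer is not available, stdbuf from GNU coreutils is a possible alternative. The sketch below wraps each child so that its stdout and stderr are line-buffered and its exit status can be recorded separately; the helper name run_child is hypothetical. stdbuf only influences programs that rely on C stdio's default buffering, so interpreters that manage their own buffers may need their own switch (for example python -u).

```shell
#!/bin/bash
# Run one child command, line-buffering its stdout/stderr when stdbuf exists.
run_child() {
    if command -v stdbuf >/dev/null 2>&1; then
        stdbuf -oL -eL "$@"
    else
        # Fall back to running the command unchanged.
        "$@"
    fi
}

# Usage inside topscript.sh (commented out; the paths are from the question):
# run_child /usr/local/bin/myscript.sh;    RET1=$?
# run_child /usr/local/bin/otherscript.sh; RET2=$?
```

Capturing $? after each child also avoids reporting only the status of the last script in the sequence.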
I hope this helps.
