Job submitted with qsub does not write output and enters E status

I have a job called test.sh:
#!/bin/sh -e
#PBS -S /bin/sh
#PBS -V
#PBS -o /my/many/directories/file.log
#PBS -e /my/many/directories/fileerror.log
#PBS -r n
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -V
#############################################################################
echo 'hello'
date
sleep 10
date
I submit it with qsub test.sh
It runs through the 10-second sleep, but it doesn't write hello to file.log or anywhere else. If I add a call to another script of mine (which runs fine outside the cluster), the job just goes to Exiting (E) status after those 10 seconds and plainly ignores the call.
Help, please?

Thanks Ott Toomet for your suggestion! I found the problem elsewhere. The .tcshrc file had "bash" written in it. Don't ask me why. I deleted it and now the jobs happily run.

Related

How to submit multiple jobs using a single .sh file

I am using the WinSCP SSH client to connect my Windows PC to an HPC system. I created a file called RUN1.sh containing the following code:
#PBS -S /bin/bash
#PBS -q batch
#PBS -N Rjobname
#PBS -l nodes=1:ppn=1:AMD
#PBS -l walltime=400:00:00
#PBS -l mem=20gb
cd $PBS_O_WORKDIR
module load mplus/7.4
mplus -i mplus1.inp -o mplus1.out > output_${PBS_JOBID}.log
It simply reads the mplus1.inp file and saves the results of the analysis in the mplus1.out file. I can run these on Linux one by one. I can do the same for the mplus2.inp file by running the RUN2.sh file below:
#PBS -S /bin/bash
#PBS -q batch
#PBS -N Rjobname
#PBS -l nodes=1:ppn=1:AMD
#PBS -l walltime=400:00:00
#PBS -l mem=20gb
cd $PBS_O_WORKDIR
module load mplus/7.4
mplus -i mplus2.inp -o mplus2.out > output_${PBS_JOBID}.log
However, I have 400 files like this (RUN1.sh, RUN2.sh, ..., RUN400.sh). I was wondering if there is a way to create a single file in order to run all 400 jobs on Linux.
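One common approach, as a minimal sketch assuming a Torque/PBS scheduler (where #PBS -t requests a job array and $PBS_ARRAYID holds the array index), is to fold the 400 scripts into a single array job. This relies on the numbered mplus1.inp ... mplus400.inp naming shown above:
#PBS -S /bin/bash
#PBS -q batch
#PBS -N Rjobname
#PBS -l nodes=1:ppn=1:AMD
#PBS -l walltime=400:00:00
#PBS -l mem=20gb
#PBS -t 1-400
cd $PBS_O_WORKDIR
module load mplus/7.4
# Each array element processes the input file matching its index
mplus -i mplus${PBS_ARRAYID}.inp -o mplus${PBS_ARRAYID}.out > output_${PBS_JOBID}.log
Alternatively, if you keep the 400 separate scripts, a one-line shell loop submits them all:
for n in $(seq 1 400); do qsub RUN${n}.sh; done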

Qsub job runs but doesn't write to file

I am running a parallelised code on an SGE cluster, via the qsub command.
The code (which compiled successfully on the system on which it is supposed to run) is meant to take a file of input values, minimise some function of those values, and then output the new values to the same input file.
The job executes successfully (exit code 0) and runs for about 40 minutes of walltime, but nothing is written to the input file.
This is my script to submit the jobs:
#!/bin/bash
#PBS -V
#PBS -l select=1:ncpus=20:mpiprocs=20,walltime=02:00:00
#PBS -o some/path
#PBS -e some/path
#PBS -q smp
#PBS -m ae
#PBS -M user@username.com
#PBS -P Name
#PBS -I
#PBS -N minMg-1
module load gcc/5.1.0
module load chpc/openmpi/1.10.2/gcc-5.1.0
mpirun -np 20 $SRCDIR/myexecutable args < inputfile.inp
I can't see why the job executes successfully but doesn't write to inputfile.inp. Strangely, I also don't get the standard ".o" and ".e" files. I am sure my mistake may be obvious to someone in the know! Any help would be deeply appreciated.

qsub array job delay

#!/bin/bash
#PBS -S /bin/bash
#PBS -N garunsmodel
#PBS -l mem=2g
#PBS -l walltime=1:00:00
#PBS -t 1-2
#PBS -e error/error.txt
#PBS -o error/output.txt
#PBS -A improveherds_my
#PBS -m ae
set -x
c=$PBS_ARRAYID
nodeDir=`mktemp -d /tmp/phuong.XXXXX`
cp -r /group/dairy/phuongho/garuns $nodeDir
cp /group/dairy/phuongho/jo/parity1/my/simplex.bin $nodeDir/garuns/simplex.bin
cp /group/dairy/phuongho/jo/parity1/nttp.txt $nodeDir/garuns/my.txt
cp /group/dairy/phuongho/jo/parity1/delay_input.txt $nodeDir/garuns/delay_input.txt
cd $nodeDir/garuns
module load gcc vle
XXX=`pwd`
sed -i "s|/group/dairy/phuongho/garuns/out|$XXX/out/|" exp/garuns.vpz
awk -v i="$c" 'NR == 1 || $8==i' my.txt > simplex-observed.txt
awk -v i="$c" 'NR == 1 || $7==i {print $6}' delay_input.txt > afm_param.txt
cp "/group/dairy/phuongho/garuns_param.txt" "$nodeDir/garuns/garuns_param.txt"
while true
do
./simplex.bin &
sleep 5m
done
awk 'NR > 1' < simplex-optimum-output.csv >> /group/dairy/phuongho/jo/parity1/my/finalresuls${c}.csv
cp simplex-all-output.csv "/group/dairy/phuongho/jo/parity1/my/simplex-all-output${c}.csv"
#awk '$28==1{print $1, $12,$26,$28,c}' c=$c out/exp_tempfile.csv > /group/dairy/phuongho/jo/parity1/my/simulated_my${c}.csv
cp out/exp_tempfile.csv /group/dairy/phuongho/jo/parity1/my/exp_tempfile${c}.csv
rm simplex-observed.txt
rm garuns_param.txt
I have the above bash script, which allows submitting multiple jobs at the same time via PBS_ARRAYID. My issue is that my model (simplex.bin) writes something to my home directory while it executes. If only one job runs at a time, or each job waits until the previous one has finished writing to home, it is fine. However, since I want >1000 jobs running at a time, 1000 of them try to write the same files to home at once, leading to a crash.
Is there a smart way to submit the second job only after the first one has already been running for a certain amount of time (let's say 5 minutes)?
I already checked and found two options: start the 2nd job when the 1st finishes, or start at a specific date/time.
Thanks
You can try something like the following:
while [ yes ]
do
./simplex.bin &
sleep 2
done
It endlessly starts a ./simplex.bin process in the background, waits for 2 seconds, starts a new ./simplex.bin, and so on.
Please note that you may also need nohup and standard input/output redirection for your ./simplex.bin, depending on your exact requirements.
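If you need the staggered start but also want the commands after the loop (the awk/cp/rm steps in the script above) to run, a bounded variant is a minimal sketch, assuming a fixed number of instances per job:
# Start a fixed number of instances five minutes apart, then wait for
# all of them so the collection/cleanup commands below still execute
for n in 1 2 3; do
    ./simplex.bin &
    sleep 5m
done
wait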
If you are using Torque, you can set a limit on the number of jobs that can run concurrently:
# Only allow 100 jobs to concurrently execute from this job array
qsub -t 0-10000%100 myscript.sh
I know this isn't exactly what you're looking for, but I'm guessing you can find a slot limit that'll make it run without crashing.

How to pass an argument to a job and keep it unchanged in parallel fashion

I am trying to execute a series of jobs in different directories. I want to pass the directory as an input argument to the job. So far I have understood that I can use an environment variable to send an argument to the jobs. But the problem is that since the jobs run in parallel, the last value of this variable will be used for all jobs. Let's look at my code:
for i in "${arr[@]}"
do
export dir=$i
qsub myBashFile.sh
done
In my job I use the variable dir to do some operations. I want each job to execute with its own input parameter.
Edit: this is my job
#!/bin/sh
#
#
#PBS -N Brownie
#PBS -o test.output.txt
#PBS -e test.error.txt
#PBS -l walltime=2:00:00
#PBS -m n
#PBS -V dir
cd $dir
./run_mycode.sh
I know this is not correct, but I am looking for an alternative way to keep the value of dir unchanged and unique for each job.
I also tried to modify a variable in the job file with a sed command, as below:
sed "s/dir/"'$i'"/g" my_job.sh > alljobs/my_jobNew.sh
but instead of putting in the actual value of $i, dir is changed to the literal string $i, which is meaningless in my_job.sh.
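As an aside, that sed call fails because '$i' is single-quoted, so the shell never expands it. Double-quoting the whole expression (with | as the delimiter to guard against slashes in $i) substitutes the real value:
sed "s|dir|$i|g" my_job.sh > alljobs/my_jobNew.sh
Note this rewrites every occurrence of the string dir in the script, so a more distinctive placeholder is safer.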
Have you tried passing the directory as command_args as explained in the manpage qsub(1)? That would be:
for i in "${arr[@]}"
do
qsub myBashFile.sh -- "$i"
done
You should be able to access it as $1 inside myBashFile.sh.
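A minimal sketch of the receiving side, with the directives copied from the job above; the only change is that the directory now arrives as the first positional parameter:
#!/bin/sh
#PBS -N Brownie
#PBS -o test.output.txt
#PBS -e test.error.txt
#PBS -l walltime=2:00:00
#PBS -m n
# $1 is the directory passed after -- on the qsub command line
cd "$1"
./run_mycode.sh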
I would use $PBS_O_WORKDIR for this. Change your submission script to this:
for i in "${arr[@]}"
do
cd /path/to/$i
qsub /path/to/myBashFile.sh
done
In your job you would then change 'cd $dir' to 'cd $PBS_O_WORKDIR'.
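A third route, assuming a Torque/PBS-style qsub: the -v flag exports a named variable with a per-submission value, so each job gets its own copy and the race on the globally exported dir disappears:
for i in "${arr[@]}"
do
    qsub -v dir="$i" myBashFile.sh
done
The 'cd $dir' line in the original job then works unchanged, and the #PBS -V dir line can be dropped.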

Running samtools from a qsub

I'm trying to run some samtools commands from a qsub call (to run on a cluster). For some reason, the commands do not seem to be recognized. However, if I copy-paste the command and run it directly in a terminal on the cluster, it works fine. Has anybody experienced such issues, or does anyone have an idea what I'm doing wrong?
Thanks,
Patrick
My qsub (this doesn't work):
#!/bin/bash
#./etc/sysconfig/pssc
#PBS -S /bin/bash
#PBS JOB_NAME="QSH_$(whoami)"
#PBS NODE_NUM="1"
#PBS NODE_PPN="${NODE_NCPUS}"
#PBS HOURS="24"
#PBS MINUTES="00"
#PBS SECONDS="00"
#PBS WALLTIME=${HOURS}:${MINUTES}:${SECONDS}
#PBS RES_LIST="nodes=${NODE_NUM}:ppn=${NODE_PPN}"
#PBS DIR_WORK="${PBS_O_WORKDIR}"
#PBS QUEUE="high"
#PBS cd ${DIR_WORK}
samtools index /data/test.bam /data/test.bai
If I run the command directly from the terminal, it works:
samtools index /data/test.bam /data/test.bai
Did you remember to cd into your working dir? I do not believe that qsub expands the $ variables in lines like #PBS cd ${DIR_WORK}.
Try with this script:
#!/bin/bash
#./etc/sysconfig/pssc
#PBS JOB_NAME=test
#PBS WALLTIME=24:00:00
cd ${PBS_O_WORKDIR}
echo `pwd`
dir
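For reference, PBS directives normally take option-flag form rather than the NAME=value lines above (which are not valid directives); a minimal sketch of the same job in standard syntax, assuming a Torque/PBS cluster:
#!/bin/bash
#PBS -N test
#PBS -l nodes=1:ppn=1
#PBS -l walltime=24:00:00
# Load whatever module provides samtools on your cluster, if your site
# uses environment modules (the module name here is a guess):
# module load samtools
cd ${PBS_O_WORKDIR}
samtools index /data/test.bam /data/test.bai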
