Write log on "TIME LIMIT" - slurm

I run a bunch of commands during my job. For example:
#!/bin/bash -l
#SBATCH --time 1-00:00:00
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --job-name foo
#SBATCH --out foo_%j.out
stdbuf -o0 -e0 mycommand input1.yaml
stdbuf -o0 -e0 mycommand input2.yaml
stdbuf -o0 -e0 mycommand input3.yaml
stdbuf -o0 -e0 mycommand input4.yaml
stdbuf -o0 -e0 mycommand input5.yaml
stdbuf -o0 -e0 mycommand input6.yaml
stdbuf -o0 -e0 mycommand input7.yaml
If my job gets cut when it reaches the time limit, I would like to know where it was cut so that I can easily continue and/or remove corrupted data.

I don't know whether this is exactly what you need, but anyway, you could just print progress markers using echo.
#!/bin/bash -l
#SBATCH --time 1-00:00:00
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --job-name foo
#SBATCH --out foo_%j.out
stdbuf -o0 -e0 mycommand input1.yaml
echo "Finished Job step 1"
stdbuf -o0 -e0 mycommand input2.yaml
echo "Finished Job step 2"
stdbuf -o0 -e0 mycommand input3.yaml
echo "Finished Job step 3"
You could also write this into a separate file, for example: echo "message" >> job_record.out
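Beyond plain echo markers, Slurm can warn the job shortly before the time limit: adding `#SBATCH --signal=B:USR1@300` asks Slurm to send SIGUSR1 to the batch shell roughly 300 seconds before the limit, and a bash trap can then log the last completed step. A minimal sketch, simulating the signal locally instead of running under Slurm (mycommand and the step counter are placeholders):

```shell
#!/bin/bash
# In a real job you would add:  #SBATCH --signal=B:USR1@300
step=0
on_warning() { echo "time-limit warning after step $step" >> job_record.out; }
trap on_warning USR1

for i in 1 2 3; do
  # stdbuf -o0 -e0 mycommand "input$i.yaml"   # real work would go here
  step=$i
done

kill -USR1 $$        # simulate Slurm's warning signal
echo "last completed step: $step"
```

On a real cluster the trap fires mid-loop, so job_record.out tells you exactly which step was interrupted.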


Is there any way to run more than one parallel job simultaneously using a single job script?

I have written a script like this; however, it is not processing four jobs simultaneously. Only 12 cores out of 48 are busy, and the four codes (from four different directories) run one after another.
#!/bin/sh
#SBATCH --job-name=my_job_name # Job name
#SBATCH --ntasks-per-node=48
#SBATCH --nodes=1
#SBATCH --time=24:00:00 # Time limit hrs:min:sec
#SBATCH -o cpu_srun_new.out
#SBATCH --partition=medium
module load compiler/intel/2019.5.281
cd a1
mpirun -np 12 ./a.out > output.txt
cd ../a2
mpirun -np 12 ./a.out > output.txt
cd ../a3
mpirun -np 12 ./a.out > output.txt
cd ../a4
mpirun -np 12 ./a.out > output.txt
Commands in sh (as in any other shell) are blocking, meaning that once you run one, the shell waits for it to complete before moving on to the next command, unless you append an ampersand (&) to the end of the command.
Your script should look like this:
#!/bin/sh
#SBATCH --job-name=my_job_name # Job name
#SBATCH --ntasks-per-node=48
#SBATCH --nodes=1
#SBATCH --time=24:00:00 # Time limit hrs:min:sec
#SBATCH -o cpu_srun_new.out
#SBATCH --partition=medium
module load compiler/intel/2019.5.281
cd a1
mpirun -np 12 ./a.out > output1.txt &
cd ../a2
mpirun -np 12 ./a.out > output2.txt &
cd ../a3
mpirun -np 12 ./a.out > output3.txt &
cd ../a4
mpirun -np 12 ./a.out > output4.txt &
wait
Note the & at the end of the mpirun lines, and the addition of the wait command at the end of the script. That command is necessary to make sure the script does not end before the mpirun commands are completed.
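As a quick local illustration of why & plus wait gives concurrency (plain sh, no Slurm needed): four two-second sleeps started in the background finish in about two seconds total rather than eight.

```shell
#!/bin/sh
# Each sleep is backgrounded with &, so all four run concurrently;
# wait blocks until every background job has finished.
start=$(date +%s)
sleep 2 &
sleep 2 &
sleep 2 &
sleep 2 &
wait
elapsed=$(( $(date +%s) - start ))
echo "elapsed: ${elapsed}s"
```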

Bash unable to assign variable

#!/usr/bin/env bash
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --mem=100G
USEAGE="metascript.sh <wd> <wd1>"
source ~/anaconda2/etc/profile.d/conda.sh
conda activate assembly
wd=$1
wd1=$2
cd $wd
cd $wd1
for f in SRR*/ ; do
[[ -e $f ]] || continue
SRR=${f::-1}
cd ../..
jdid=$(sbatch -J FirstQC_$SRR ./pipelines/preprocessingbowtietrinity/FirstFastqc.sh $wd $wd1 $SRR)
#echo ${jdid[0]}|grep -o '[0-9]\+'
jobid=${jdid[0]}
jobid1=${jobid[0]}|grep -o '[0-9]\+'
#echo $jobid1
Hi all, I'm having issues with my bash scripting: I can print the line ${jdid[0]}|grep -o '[0-9]+', but when I assign it to a variable, nothing is returned.
If the idea is to extract just the job ID from the output of sbatch, you can also use sbatch's --parsable option (see the sbatch documentation).
jdid=$(sbatch --parsable -J FirstQC_$SRR ./pipelines/preprocessingbowtietrinity/FirstFastqc.sh $wd $wd1 $SRR)
and jdid will contain only the job ID (if the cluster is not part of a federation).
jobid1=${jobid[0]}|grep -o '[0-9]\+'
I can print the line ${jdid[0]}|grep -o '[0-9]+' however when I assign it to a variable it is unable to return anything.
In order to assign the output of a pipeline to a variable or insert it into a command line for any other purpose, you have Command Substitution at hand:
jobid1=$(echo "${jobid[0]}" | grep -o '[0-9]\+')
Of course, with bash this is better written as:
jobid1=$(grep -o '[0-9]\+' <<< "${jobid[0]}")
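To illustrate command substitution on a canned string (the sbatch output line and job ID here are hypothetical):

```shell
#!/bin/bash
# Typical sbatch output looks like this (hypothetical job ID):
jdid="Submitted batch job 123456"
# Command substitution assigns the pipeline's stdout to the variable:
jobid1=$(echo "$jdid" | grep -o '[0-9]\+')
echo "$jobid1"   # 123456
```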
If the issue is just printing the line ${jdid[0]}|grep -o '[0-9]+' literally, as in your question:
put the line in double quotation marks, escaping the $ so it is not expanded.
Here is a little test I made:
jobid1="\${jobid[0]}|grep -o '[0-9]\+'"
echo "$jobid1"
the output is ${jobid[0]}|grep -o '[0-9]\+'

slurm/sbatch doesn't work when option `-o` is specified

I'm trying to run the following script with sbatch on our cluster.
#!/bin/bash
#SBATCH -o /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/work/chunkaa/work/a4/6d0605f453add1d97d609839cfd318/command.log
#SBATCH --no-requeue
#SBATCH --partition=Bird
set -e
echo "Hello" 1>&2
sbatch displays a job ID on stdout, but nothing is listed in squeue and it looks like nothing was written/executed.
If the line #SBATCH -o /SCRATCH-BIRD/users/... is removed , then the script works.
the directory exists
$ test -w /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/work/chunkaa/work/a4/6d0605f453add1d97d609839cfd318/ && echo OK
OK
Could it be a problem with the filesystem? How can I test this?
OK, got it: the partition is visible from the login node but not from the cluster nodes.

Set values for job variables from another file

I wanted to set the job name and other values of a job from another file, but I get an error:
sbatch: error: Unable to open file 10:12:35
file.sh
#!/bin/bash
DATE=`date '+%Y-%m-%d %H:%M:%S'`
name='test__'$DATE
sbatch -J $name -o $name'.out' -e $name'.err' job.sh
job.sh
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-node=2 # number of cores
#SBATCH --output=.out
#SBATCH --error=.err
#module load R
Rscript script.R
script.R
for(i in 1:1e6){print(i)}
You are quoting the variables incorrectly, and the space in the date format creates two arguments to sbatch, hence the complaint: sbatch tries to open 10:12:35 as the script file.
If I were you, I would avoid the space altogether (as a general rule, because spaces in filenames are error-prone and always require quoting):
file.sh:
#!/bin/bash
DATE=$(date '+%Y-%m-%dT%H:%M:%S')
name="test__$DATE"
sbatch -J "$name" -o "${name}.out" -e "${name}.err" job.sh
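A quick demonstration of the underlying word-splitting problem (the timestamp is a hypothetical stand-in): with a space in the date, the unquoted $name expands to two words, so sbatch sees 10:12:35 as a separate argument.

```shell
#!/bin/bash
DATE='2024-01-01 10:12:35'     # hypothetical timestamp containing a space
name="test__$DATE"
# Count the words each expansion produces using the positional parameters:
unquoted=$(set -- -J $name; echo $#)    # $name splits on the space
quoted=$(set -- -J "$name"; echo $#)    # quoting keeps it one argument
echo "unquoted: $unquoted args, quoted: $quoted args"
```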

SGE qsub: define variables using bash?

I am trying to automatically set several variables for the SGE system, but with no luck.
#!/bin/bash
myname="test"
totaltask=10
#$ -N $myname
#$ -cwd
#$ -t 1-$totaltask
Apparently $myname will not be recognized. Any solution?
Thanks a lot.
Consider making a wrapper script: the #$ lines are comments, so the shell never expands variables inside them, but qsub accepts the same settings as command-line options.
qsub_script.sh
#!/bin/bash
#$ -V
#$ -cwd
wrapper_script.sh
#!/bin/bash
myname="test"
totaltask=10
qsub -N "${myname}" -t "1-${totaltask}" qsub_script.sh
Note that the qsub options must come before the script name; anything after the script name is passed to the script as arguments.
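An alternative sketch, if you prefer to keep the directives in the file: generate the job script with a heredoc, so the shell expands the variables before qsub ever reads the #$ lines (generated_job.sh and the echo body are placeholders; the qsub call is commented out because it needs a real SGE cluster):

```shell
#!/bin/bash
myname="test"
totaltask=10
# An unquoted heredoc delimiter expands $myname and $totaltask now;
# the escaped \$ keeps the literal "#$" directive markers and $SGE_TASK_ID.
cat > generated_job.sh <<EOF
#!/bin/bash
#\$ -N $myname
#\$ -cwd
#\$ -t 1-$totaltask
echo "running task \$SGE_TASK_ID"
EOF
# qsub generated_job.sh    # submit on a real SGE cluster
cat generated_job.sh
```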
