Submit lots of jobs in a SHELL script - slurm

I am using a simple shell script to read and write lots of files (more than 300) on an HPC cluster, and I want to submit it using Slurm.
The script looks like this:
#!/bin/bash
#SBATCH -n 1
#SBATCH --ntasks-per-node=40
#SBATCH --exclusive
for in_file in ${in_files}; do
# do something with ${in_file} and ${out_file}
echo ${in_file} ${out_file}
done
I may not be able to submit all my tasks at one time because the number of files is larger than the number of nodes I can use. Is there a better way to deal with the large number of files?

Are the files totally independent? It sounds like you could write a bash script that schedules one job per file and passes the file name to that job. One way to pass it would be as an environment variable.
Something like
#!/bin/bash
for in_file in ${in_files}; do
export in_file=${in_file}
sbatch slurmjob.sh
done
Where your slurmjob.sh is something like
#!/bin/bash
#SBATCH -n 1
#SBATCH --ntasks-per-node=1
# do something with ${in_file} and ${out_file}
echo ${in_file} ${out_file} > somefile.txt
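As a variation on the loop above, the file name can also be passed as a positional argument instead of an environment variable, since sbatch forwards any arguments that follow the script name to the job script. This is only a minimal sketch: submit_each is a hypothetical helper name, and it assumes slurmjob.sh has been written to read its input file from "$1".

```shell
#!/bin/bash
# Hypothetical helper: submit one job per input file.
# Each file name is passed as a positional argument, so the job script
# sees it as "$1" (assumption: slurmjob.sh is written to read "$1").
submit_each() {
    local job_script=$1
    shift
    local in_file
    for in_file in "$@"; do
        # Quoting keeps file names that contain spaces intact.
        sbatch "$job_script" "$in_file"
    done
}
```

It could be called as submit_each slurmjob.sh data/*.txt. Jobs that cannot start immediately simply wait in the queue, although a cluster may cap how many jobs one user can have queued.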

Related

linux slurm - separate .out files for tasks run in parallel on 1 node

I am running jobs in parallel on linux using slurm by requesting a node and running one task per cpu.
However, the output as specified joins both streams into a single .out file. I tried the %t flag, expecting it to separate the tasks, but it just logs everything in one output file with _0 appended (e.g. sample_output__XXX_XX_0.out).
Any advice on how best to generate a separate .out log per task would be much appreciated.
#!/bin/bash
#SBATCH --job-name=recon_all_06172021_1829
#SBATCH --output=/path/recon_all_06172021_1829_%A_%a_%t.out
#SBATCH --error=/path/recon_all_06172021_1829_%A_%a.err
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=23:59:00
#! Always keep the following echo commands to monitor CPU, memory usage
echo "SLURM_MEM_PER_CPU: $SLURM_MEM_PER_CPU"
echo "SLURM_MEM_PER_NODE: $SLURM_MEM_PER_NODE"
echo "SLURM_JOB_NUM_NODES: $SLURM_JOB_NUM_NODES"
echo "SLURM_NNODES: $SLURM_NNODES"
echo "SLURM_NTASKS: $SLURM_NTASKS"
echo "SLURM_CPUS_PER_TASK: $SLURM_CPUS_PER_TASK"
echo "SLURM_JOB_CPUS_PER_NODE: $SLURM_JOB_CPUS_PER_NODE"
command 1 &
command 2
wait
You can redirect the standard output from the command itself, for example:
command 1 > file1 2>&1 &
command 2 > file2 2>&1
Not as neat as using the sbatch filename patterns, but it will separate the output from each command.
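The redirection approach can be wrapped in a small function so that each background task gets its own log. This is a sketch only; run_logged and the log file names are hypothetical, not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper: run a command in the background with its own log file.
# Both stdout and stderr of the command go to the file given as $1.
run_logged() {
    local log_file=$1
    shift
    "$@" > "$log_file" 2>&1 &
}
```

Inside the batch script this would be used as run_logged task_0.out command 1, then run_logged task_1.out command 2, followed by wait so the job does not end before both tasks finish.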

how to submit slurm job array with different input files

I have a list of text files (~200 files) that need to be processed. I was trying to submit a Slurm job array for this task, but I cannot find a solution. What I have tried so far is submitting multiple jobs (~200 jobs) by looping through the files, one single-task job per file. I am sure there is a way to create a job array for this problem. Could you please advise me?
This is my bash script:
#!/bin/bash
input_list=$1
prefix=$2
readarray list < $input_list
a=0
for i in "${list[@]}"
do
file=$i
sbatch -a1-1 -- $PWD/kscript.sh $file $prefix"_"$a
a=`expr $a + 1`
done
I figured out the solution, and it is simpler than I thought. Here it is:
#!/bin/bash
# Task name
#SBATCH -J myjob
# Run time limit
#SBATCH --time=4:00:00
# Standard and error output in different files
#SBATCH -o %j_%N.out.log
#SBATCH -e %j_%N.err.log
#SBATCH --ntasks=1
# Execute application code
input_list=$1
prefix=$2
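# -t strips the trailing newline from each list entry
readarray -t list < "$input_list"
# Array task IDs start at 1 here, bash array indices at 0
file=${list[$SLURM_ARRAY_TASK_ID-1]}
input="$PWD/$file"
output="${prefix}_${SLURM_ARRAY_TASK_ID}"
./kscript.sh "$input" "$output"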
Running the script is as usual:
sbatch --array=1-n
where n is the length of the list. For example, if there is a list of 100 files, this becomes sbatch --array=1-100
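The array size can also be computed from the list file rather than typed by hand. The sketch below assumes one input path per line; submit_array is a hypothetical helper, and its optional fourth argument uses the "%" suffix of --array to cap how many array tasks run at the same time.

```shell
#!/bin/bash
# Hypothetical helper: submit a job array sized to the input list.
# $1 = job script, $2 = file with one input path per line, $3 = output prefix,
# $4 = optional cap on simultaneously running tasks (the "%" array suffix).
submit_array() {
    local job_script=$1 input_list=$2 prefix=$3 max_running=${4:-}
    local n
    # grep -c '' counts lines, including a last line without a newline.
    n=$(grep -c '' "$input_list" || true)
    if [ "$n" -eq 0 ]; then
        echo "no inputs listed in $input_list" >&2
        return 1
    fi
    local range="1-${n}"
    if [ -n "$max_running" ]; then
        range="${range}%${max_running}"
    fi
    sbatch --array="$range" "$job_script" "$input_list" "$prefix"
}
```

For a list of 200 files, submit_array job.sh files.txt out 20 would submit --array=1-200%20, so at most 20 tasks run at once.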

passing a string argument to name a job in SLURM

I want to pass a parameter to a bash script on a cluster in order to name the job. I tried this:
#!/bin/bash
#SBATCH -J "$1" #<--- to name the job with the first parameter
#SBATCH --partition=shortq
#SBATCH -o %x-%j.out
#SBATCH -e %x-%j.err
echo "this is a test job named" $1
Gate main.mac
When I launch the job with
sbatch my_script.sh test_script
I get a file named $1-23472.out. It appears that "$1" was not interpreted. How can I get a file named "test_script-23472.out"?
Also, is the line Gate main.mac mandatory? Can anyone explain why we should include it?
Many thanks
You probably can't do it exactly as you want to, but here's a solution that comes pretty close:
Batch script:
#!/bin/bash
#SBATCH --partition=shortq
#SBATCH -o %x-%j.out
#SBATCH -e %x-%j.err
echo "this is a test job named" $SLURM_JOB_NAME
(rest of your script here)
Submit with:
$ sbatch -J jobname my_script.sh
Slurm will not interpret the Bash variable in the #SBATCH comment lines, and neither will Bash, since to Bash the line is just a comment.
One solution is a construct like this for submission (the variable is set in a separate statement so that it is already defined when the shell expands "$ARG"):
ARG="<something>"; sbatch -J "$ARG" my_script.sh "$ARG"
As for the Gate main.mac line, it is used to start the Gate program with main.mac as argument.
This is how I've been formatting slurm scripts to parse bash variables as job names.
#!/bin/bash
MYVAR=$1
sbatch --export=ALL -J "${MYVAR}" --wrap="run something"
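The command-line approach from the answers above can be put into a small wrapper. This is a sketch only: submit_named is a hypothetical name, and it puts the job name directly into the log file names instead of relying on %x.

```shell
#!/bin/bash
# Hypothetical wrapper: name the job and its log files from one argument.
# Options given on the sbatch command line take precedence over #SBATCH
# lines in the script, so the script itself needs no -J/-o/-e directives.
submit_named() {
    local name=$1
    shift
    sbatch -J "$name" -o "${name}-%j.out" -e "${name}-%j.err" "$@"
}
```

Calling submit_named test_script my_script.sh would then produce log files such as test_script-23472.out, with the job ID filled in by Slurm through %j.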

Insert system variables to SBATCH

I would like to ask whether it is possible to pass global system variables to #SBATCH directives.
I would like to do something like this:
SBATCH file:
#!/bin/bash -l
ARG=64.dat
NODES=4
TASK_PER_NODE=8
NP=$((NODES*TASK_PER_NODE))
#SBATCH -J 'MPI'+'_'+$NODES+'_'+$TASK_PER_NODE
#SBATCH -N $NODES
#SBATCH --ntasks-per-node=$TASK_PER_NODE
It isn't working, which is why I am asking.
Remember that SBATCH parameter lines are viewed by Bash as comments, so it will not try to interpret them at all.
Furthermore, the #SBATCH directives must be before any other Bash command for Slurm to handle them.
Alternatives include setting the parameters in the command line:
NODES=4; sbatch --nodes=$NODES ... submitscript.sh
or passing the submission script through stdin:
#!/bin/bash -l
ARG=64.dat
NODES=4
TASK_PER_NODE=8
NP=$((NODES*TASK_PER_NODE))
sbatch <<EOT
#!/bin/bash
#SBATCH -J "MPI_${NODES}_${TASK_PER_NODE}"
#SBATCH -N $NODES
#SBATCH --ntasks-per-node=$TASK_PER_NODE
srun ...
EOT
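In this latter case, you will need to run the submission script directly rather than handing it to sbatch, since it runs sbatch itself. Note also that string concatenation in Bash is not done with the + sign; variables are simply written next to each other, with braces (as in ${NODES}_) where the end of the name would otherwise be ambiguous.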
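To illustrate how expansion works inside such a here-document, the sketch below prints a job script instead of submitting it. render_job is a hypothetical helper; unescaped variables are filled in by the submitting shell, while a variable written with a backslash is left for the job itself to expand.

```shell
#!/bin/bash
# Hypothetical helper: print a job script with the resource values filled in.
# Unescaped variables are expanded now, by the submitting shell; variables
# written as \$NAME are left for the job to expand at run time.
render_job() {
    local nodes=$1 tasks_per_node=$2
    cat <<EOT
#!/bin/bash
#SBATCH -J MPI_${nodes}_${tasks_per_node}
#SBATCH -N ${nodes}
#SBATCH --ntasks-per-node=${tasks_per_node}
echo "running on \$SLURM_JOB_NUM_NODES nodes"
EOT
}
```

The output can be inspected first with render_job 4 8 and then submitted with render_job 4 8 | sbatch, since sbatch reads the script from standard input when no script file is given.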

Using SBATCH Job Name as a Variable in File Output

With SBATCH you can use the job-id in automatically generated output files using the following syntax with %j:
#!/bin/bash
# omitting some other sbatch commands here ...
#SBATCH -o slurm-%j.out-%N # name of the stdout, using the job number (%j) and the first node (%N)
#SBATCH -e slurm-%j.err-%N # name of the stderr, using job and first node values
I've been looking for a similar syntax for using the job-name instead of the job-id. Does anyone have a reference for what other slurm/sbatch values can be referenced in the %j style syntax?
In newer versions of Slurm there is a %x option that represents the job name.
See the "Changes in Slurm 17.02.1" section of the NEWS file on GitHub:
https://github.com/SchedMD/slurm/blob/master/NEWS
However, on many clusters the Slurm version is older than that and this option is not implemented. You can check the version of the Slurm scheduler on your system with:
sbatch --version
However, there is a workaround.
You can write your own bash script that takes a name as an argument, generates a submission script that uses that name for the job name and output files, and then submits it.
For example, you can create a script submit.sh:
#!/bin/bash
echo '#!/bin/bash' > jobscript.sh
echo "#SBATCH -o $1-%j.out-%N" >> jobscript.sh
echo "#SBATCH -e $1-%j.err-%N" >> jobscript.sh
echo "#SBATCH -J $1" >> jobscript.sh
#other echo commands with SBATCH options
echo "srun mycommand" >> jobscript.sh
#submit the job
sbatch jobscript.sh
Then execute it with an argument that corresponds to the name you want to give your job:
bash ./submit.sh myJobName
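Whether the workaround is needed at all can be decided by comparing the installed version against the 17.02.1 release mentioned above. The sketch below is only a generic version comparison; version_ge is a hypothetical helper and relies on the -V (version sort) option of sort from GNU coreutils.

```shell
#!/bin/bash
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on "sort -V" (version sort), available in GNU coreutils.
version_ge() {
    # The smaller of the two versions sorts first; $1 >= $2 exactly
    # when that smaller one is $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Assuming sbatch --version prints something like "slurm 17.02.1" (check this on your system), the second field can be passed in: version_ge "$(sbatch --version | awk '{print $2}')" 17.02.1 && echo "%x should be available".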
