how to submit slurm job array with different input files - slurm

I have a list of text files (~200 files) that need to be processed. I was trying to submit a Slurm job array for this task, but I cannot find a solution. What I have tried is submitting multiple jobs (~200 jobs) by looping through the files, each job running a single task. I am sure there is a way to create a job array for this problem; could you please advise me?
This is my bash script:
#!/bin/bash
input_list=$1
prefix=$2
# -t strips the trailing newline from each list entry
readarray -t list < "$input_list"
a=0
for i in "${list[@]}"
do
    file=$i
    sbatch -a1-1 -- "$PWD/kscript.sh" "$file" "${prefix}_${a}"
    a=$((a + 1))
done

I figured out the solution, and it is simpler than I thought. Here it is:
#!/bin/bash
# Task name
#SBATCH -J myjob
# Run time limit
#SBATCH --time=4:00:00
# Standard and error output in different files
#SBATCH -o %j_%N.out.log
#SBATCH -e %j_%N.err.log
#SBATCH --ntasks=1
# Execute application code
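input_list=$1
prefix=$2
# -t strips the trailing newline from each list entry
readarray -t list < "$input_list"
# Array task IDs start at 1 here, bash array indices at 0
file=${list[$SLURM_ARRAY_TASK_ID-1]}
input="$PWD/$file"
output="${prefix}_${SLURM_ARRAY_TASK_ID}"
./kscript.sh "$input" "$output"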
Running the script is as usual:
sbatch --array=1-n
where n is the length of the list. For example, if there is a list of 100 files, this becomes sbatch --array=1-100
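If you do not want to count the list by hand, the array size can be derived from the list file at submission time. The following is only a sketch: submit_array, job_script, input_list and prefix are names made up for illustration, and SUBMIT is a hypothetical override that lets you print the command instead of calling sbatch.

```shell
#!/bin/bash
# SUBMIT is a hypothetical override for dry runs; it defaults to sbatch.
SUBMIT=${SUBMIT:-sbatch}

# Submit job_script as an array with one task per line of input_list.
# job_script, input_list and prefix are placeholder names.
submit_array() {
    job_script=$1
    input_list=$2
    prefix=$3
    # grep -c '' counts lines, including a last line without a newline,
    # which matches what readarray reads inside the job script.
    n=$(grep -c '' "$input_list")
    "$SUBMIT" --array=1-"$n" "$job_script" "$input_list" "$prefix"
}

# Example: submit_array array_job.sh files.txt out
```

With SUBMIT=echo the function only prints the arguments it would pass, which is a cheap way to check the range before submitting for real.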

Related

Submit lots of jobs in a SHELL script

I am using a simple shell script to read and write lots of files (more than 300 files) in a HPC and I want to submit it using slurm.
The script looks like this:
#!/bin/bash
#SBATCH -n 1
#SBATCH --ntasks-per-node=40
#SBATCH --exclusive
for in_file in ${in_files}; do
# do something with ${in_file} and ${out_file}
echo ${in_file} ${out_file}
done
I cannot submit all my tasks at once because the number of files is larger than the number of nodes I can use. Is there a better way to deal with such a large number of files?
Are the files totally independent? It sounds like you could write a bash script that submits one job per file and passes the file name to each job. One way would be to pass it as an environment variable.
Something like
#!/bin/bash
for in_file in ${in_files}; do
export in_file=${in_file}
sbatch slurmjob.sh
done
Where your slurmjob.sh is something like
#!/bin/bash
#SBATCH -n 1
#SBATCH --ntasks-per-node=1
# do something with ${in_file} and ${out_file}
echo ${in_file} ${out_file} > somefile.txt
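A variant of the same idea, as a sketch only: instead of relying on the exported shell environment, the file name can be handed to each job explicitly through sbatch's --export option. submit_each is a made-up helper name, slurmjob.sh is the job script from above, and SUBMIT is a hypothetical override for dry runs.

```shell
#!/bin/bash
# SUBMIT is a hypothetical override for dry runs; it defaults to sbatch.
SUBMIT=${SUBMIT:-sbatch}

# Submit slurmjob.sh once per file given on the command line.
# --export=ALL,in_file=... keeps the caller's environment and adds in_file.
# Assumption: file names contain no commas, since --export is comma-separated.
submit_each() {
    for in_file in "$@"; do
        "$SUBMIT" --export=ALL,in_file="$in_file" slurmjob.sh
    done
}

# Example: submit_each data/*.txt
```

Because each sbatch call returns as soon as the job is queued, the scheduler decides how many of these jobs run at the same time, so the number of files can exceed the number of nodes.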

linux slurm - separate .out files for tasks run in parallel on 1 node

I am running jobs in parallel on linux using slurm by requesting a node and running one task per cpu.
However, the output as specified joins both streams into a single .out file. I tried the %t pattern in the expectation that it would separate the tasks, but it just logs everything to one output file with _0 appended (e.g. sample_output__XXX_XX_0.out).
Any advice on how to best generate a separate .out log per task would be much appreciated.
#!/bin/bash
#SBATCH --job-name=recon_all_06172021_1829
#SBATCH --output=/path/recon_all_06172021_1829_%A_%a_%t.out
#SBATCH --error=/path/recon_all_06172021_1829_%A_%a.err
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=23:59:00
#! Always keep the following echo commands to monitor CPU, memory usage
echo "SLURM_MEM_PER_CPU: $SLURM_MEM_PER_CPU"
echo "SLURM_MEM_PER_NODE: $SLURM_MEM_PER_NODE"
echo "SLURM_JOB_NUM_NODES: $SLURM_JOB_NUM_NODES"
echo "SLURM_NNODES: $SLURM_NNODES"
echo "SLURM_NTASKS: $SLURM_NTASKS"
echo "SLURM_CPUS_PER_TASK: $SLURM_CPUS_PER_TASK"
echo "SLURM_JOB_CPUS_PER_NODE: $SLURM_JOB_CPUS_PER_NODE"
command 1 &
command 2
wait
You can redirect the standard output from the command itself, for example:
command 1 > file1 2>&1
command 2 > file2 2>&1
Not as neat as using the sbatch filename patterns, but it will separate the output from each command.
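Combined with the background/wait pattern from the question, this could look like the sketch below. The task function only stands in for "command 1" and "command 2", and the log directory is a temporary one chosen for illustration.

```shell
#!/bin/bash
# task is a stand-in for "command 1" / "command 2"; replace it with the real programs.
task() {
    echo "task $1 stdout"
    echo "task $1 stderr" >&2
}

# Hypothetical log location; in a real job this would be your own log directory.
logdir=$(mktemp -d)

for t in 1 2; do
    # Each background command writes stdout and stderr to its own log file.
    task "$t" > "$logdir/task_${t}.out" 2>&1 &
done

# Block until every background command has finished.
wait
echo "logs written to $logdir"
```

Anything the batch script itself prints still goes to the file named by #SBATCH --output; only the redirected commands get their own files.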

passing a string argument to name a job in SLURM

I want to pass a parameter to a bash script on a cluster in order to name the job. I tried this:
#!/bin/bash
#SBATCH -J "$1" #<--- to name the job with the first parameter
#SBATCH --partition=shortq
#SBATCH -o %x-%j.out
#SBATCH -e %x-%j.err
echo "this is a test job named" $1
Gate main.mac
When I launch the job with
sbatch my_script.sh test_script
I'm getting a file named $1-23472.out. It appears that "$1" wasn't interpreted. How can I get a file named "test_script-23472.out"?
Also, is the line Gate main.mac mandatory? Can anyone explain why we should include it?
Many thanks
You probably can't do it exactly as you want to, but here's a solution that comes pretty close:
Batch script:
#!/bin/bash
#SBATCH --partition=shortq
#SBATCH -o %x-%j.out
#SBATCH -e %x-%j.err
echo "this is a test job named" $SLURM_JOB_NAME
(rest of your script here)
Submit with:
$ sbatch -J jobname my_script.sh
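Slurm will not interpret Bash variables in the #SBATCH comments, and neither will Bash, since to Bash the line is just a comment.
One solution is a construct like this for submission (the variable has to be set before the sbatch command line is expanded, so it goes in its own statement):
ARG="<something>"; sbatch -J "$ARG" my_script.sh "$ARG"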
As for the Gate main.mac line, it is used to start the Gate program with main.mac as argument.
This is how I've been formatting slurm scripts to parse bash variables as job names.
#!/bin/bash
MYVAR=$1
sbatch --export=ALL -J "${MYVAR}" --wrap="run something"

set values for job variables from another file

I want to set the name and other variables of a job from another file, but I get an error:
sbatch: error: Unable to open file 10:12:35
file.sh
#!/bin/bash
DATE=`date '+%Y-%m-%d %H:%M:%S'`
name='test__'$DATE
sbatch -J $name -o $name'.out' -e $name'.err' job.sh
job.sh
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-node=2 # number of cores
#SBATCH --output=.out
#SBATCH --error=.err
#module load R
Rscript script.R
script.R
for(i in 1:1e6){print(i)}
You are quoting the variables incorrectly, and the space in the date format splits the name into two arguments to sbatch, which is why it complains about that extra parameter.
If I were you, I would avoid the space (as a general rule, because it is more error prone and always requires quoting):
file.sh:
#!/bin/bash
DATE=$(date '+%Y-%m-%dT%H:%M:%S')
name="test__$DATE"
sbatch -J "$name" -o "${name}.out" -e "${name}.err" job.sh
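To see the splitting without involving sbatch at all, here is a small sketch. count_args is a made-up helper that just reports how many arguments it received, and the date string is an example value in the format from the question.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() { echo "$#"; }

# A name with a space in it, as produced by date '+%Y-%m-%d %H:%M:%S'.
name='test__2021-06-17 10:12:35'

# Unquoted: the shell splits the name at the space, so -J is followed by
# two separate words and the second one looks like a script file name.
unquoted=$(count_args -J $name)

# Quoted: the name stays a single argument.
quoted=$(count_args -J "$name")

echo "unquoted: $unquoted arguments, quoted: $quoted arguments"
# prints "unquoted: 3 arguments, quoted: 2 arguments"
```

That stray third word is what sbatch reported as the file it was unable to open.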

SLURM sbatch multiple parallel calls to executable

I have an executable that takes multiple options and multiple file inputs in order to run. The executable can be called with a variable number of cores to run.
E.g. executable -a -b -c -file fileA --file fileB ... --file fileZ --cores X
I'm trying to create an sbatch file that will enable me to have multiple calls of this executable with different inputs. Each call should be allocated to a different node (in parallel with the rest), using X cores. The parallelization at the core level is taken care of by the executable, and at the node level by SLURM.
I tried with ntasks and multiple sruns but the first srun was called multiple times.
Another take was to rename the files and use a SLURM process or node number as filename before the extension but it's not really practical.
Any insight on this?
I always do these kinds of jobs with the help of a bash script that I run with an sbatch command. The easiest approach would be to have a loop in an sbatch script where you spawn the different job steps for your executable with srun, specifying e.g. the corresponding node name in your partition with -w. You may also read up on the documentation of Slurm job arrays (if that suits you better). Alternatively, you could store all parameter combinations in a file and then loop over them in the script.
Maybe the following script (I just put it together) helps you get a feeling for what I have in mind (I hope it's what you need). It's not tested, so don't just copy and paste it!
#!/bin/bash
parameters=(10 5 2)
node_names=(node1 node2 node3)
# Run one job step per node, each time taking one parameter
for parameter in "${parameters[@]}"
do
    # Script some if/else condition here to specify parameters
    # Assign the first node in the list to this job step
    node=${node_names[0]}
    JOBNAME="myjob$node-$parameter"
    # Delete the first node from the list
    unset 'node_names[0]'
    # Re-index the list so the next node becomes element 0
    node_names=("${node_names[@]}")
    # -w specifies the name of the node to use
    # -N specifies the number of nodes
    srun -N1 -w"$node" -psomepartition -J"$JOBNAME" executable.sh "$parameter" &
done
You will run into the problem that your sbatch script has to wait for the last job step to finish. In that case the following additional while loop might help you.
# Wait for the last job step to complete
while true;
do
# wait for last job to finish use the state of sacct for that
echo "waiting for last job to finish"
sleep 10
# sacct shows your jobs; -s R,PD keeps only running and pending ones
sacct -s R,PD | grep "myjob" # your job name indicator
# check the status code of grep (1 if nothing found)
if [ "$?" == "1" ];
then
echo "found no running jobs anymore"
sacct -s R |grep "myjob*"
echo "stopping loop"
break;
fi
done;
I managed to find one possible solution, so I'm posting it for reference:
I declared as many tasks as calls to the executable, as well as nodes and the desired number of cpus per call.
And then a separate srun for each call, declaring the number of nodes and tasks at each call. All the sruns are bound with ampersands (&):
srun -n 1 -N 1 --exclusive executable -a1 -b1 -c1 -file fileA1 --file fileB1 ... --file fileZ1 --cores X1 &
srun -n 1 -N 1 --exclusive executable -a2 -b2 -c2 -file fileA2 --file fileB2 ... --file fileZ2 --cores X2 &
....
srun -n 1 -N 1 --exclusive executable -aN -bN -cN -file fileAN --file fileBN ... --file fileZN --cores XN
--Edit: After some tests (as I mentioned in a comment below), if the process of the last srun ends before the rest, it seems to end the whole job, leaving the rest unfinished.
--edited based on the comment by Carles Fenoy
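One way to avoid the early exit described in the edit above is to put every call in the background, including the last one, and then wait for all of them. This is a sketch with stand-ins, not the tested job script: run_step is a made-up function that takes the place of each "srun -n 1 -N 1 --exclusive executable ..." line.

```shell
#!/bin/bash
# run_step is a stand-in for one
# "srun -n 1 -N 1 --exclusive executable ..." call.
run_step() {
    sleep "$1"
    echo "step $2 done"
}

pids=""
run_step 1 A & pids="$pids $!"
run_step 0 B & pids="$pids $!"
run_step 0 C & pids="$pids $!"   # the last call is backgrounded as well

# Wait for every step, not just the one started last, and count failures.
failed=0
for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
done
echo "all steps finished, $failed failed"
```

A plain wait with no arguments is enough if you do not need the exit status of each step.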
Write a bash script that generates multiple xyz.slurm files and submits each of them with sbatch. The following script uses a nested for loop to create 8 files, then substitutes a placeholder string in them and submits them. You may need to modify the script to suit your needs.
#!/usr/bin/env bash
# Path where you want to create the slurm files
slurmpath=~/Desktop/slurms
rm -rf "$slurmpath"
mkdir -p "$slurmpath/sbatchop"
mkdir -p /exports/home/schatterjee/reports
echo "Folder /slurms and /reports created"
declare -a threads=("1" "2" "4" "8")
declare -a chunks=("1000" "32000")
declare -a modes=("server" "client")
# Now loop through the arrays above
for i in "${threads[@]}"
do
    for j in "${chunks[@]}"
    do
        # The following is the content of each slurm file
        cat <<EOF >"$slurmpath/net-$i-$j.slurm"
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --output=$slurmpath/sbatchop/net-$i-$j.out
#SBATCH --wait-all-nodes=1
echo \$SLURM_JOB_NODELIST
cd /exports/home/schatterjee/cs553-pa1
srun ./MyNETBench-TCP placeholder1 $i $j
EOF
        # Now schedule them
        for m in "${modes[@]}"
        do
            # Write a per-mode copy with placeholder1 replaced by the value of m.
            # Editing the template in place would leave no placeholder for the second mode.
            sed -e "s/placeholder1/$m/g" "$slurmpath/net-$i-$j.slurm" > "$slurmpath/net-$m-$i-$j.slurm"
            for value in {1..5}
            do
                sbatch "$slurmpath/net-$m-$i-$j.slurm"
            done
        done
    done
done
You can also try this python wrapper which can execute your command on the files you provide

Resources