linux slurm - separate .out files for tasks run in parallel on 1 node

I am running jobs in parallel on Linux using Slurm by requesting a node and running one task per CPU.
However, the output as specified joins both streams into a single .out file. I tried the %t flag on the expectation it would separate the tasks, but it just logs everything in the output file with _0 appended (e.g. sample_output__XXX_XX_0.out).
Any advice on how best to generate a separate .out log per task would be much appreciated.
#!/bin/bash
#SBATCH --job-name=recon_all_06172021_1829
#SBATCH --output=/path/recon_all_06172021_1829_%A_%a_%t.out
#SBATCH --error=/path/recon_all_06172021_1829_%A_%a.err
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=23:59:00
#! Always keep the following echo commands to monitor CPU, memory usage
echo "SLURM_MEM_PER_CPU: $SLURM_MEM_PER_CPU"
echo "SLURM_MEM_PER_NODE: $SLURM_MEM_PER_NODE"
echo "SLURM_JOB_NUM_NODES: $SLURM_JOB_NUM_NODES"
echo "SLURM_NNODES: $SLURM_NNODES"
echo "SLURM_NTASKS: $SLURM_NTASKS"
echo "SLURM_CPUS_PER_TASK: $SLURM_CPUS_PER_TASK"
echo "SLURM_JOB_CPUS_PER_NODE: $SLURM_JOB_CPUS_PER_NODE"
command 1 &
command 2
wait

You can redirect the standard output from the command itself, for example:
command 1 > file1 2>&1
command 2 > file2 2>&1
Not as neat as using the sbatch filename patterns, but it will separate the output from each command.
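If you would rather keep the filename patterns, note that %t numbers the tasks within a job step, so it is mainly useful with srun; sbatch writes a single output file for the whole script, which is why everything landed in one file ending in _0. A sketch that launches each command as its own job step, so the step id pattern %s distinguishes the files (depending on the Slurm version you may need srun's --exclusive or --exact flag so the first step does not claim the whole allocation):
srun --ntasks=1 --output=recon_%j_step%s.out command 1 &
srun --ntasks=1 --output=recon_%j_step%s.out command 2 &
wait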

Related

Submit lots of jobs in a SHELL script

I am using a simple shell script to read and write lots of files (more than 300) on an HPC system, and I want to submit it using Slurm.
The script looks like this:
#!/bin/bash
#SBATCH -n 1
#SBATCH --ntasks-per-node=40
#SBATCH --exclusive
for in_file in ${in_files}; do
    # do something with ${in_file} and ${out_file}
    echo ${in_file} ${out_file}
done
I may not be able to submit all my tasks at one time because the number of files is larger than the number of nodes I can use. So is there a better way to deal with the large number of files?
Are the files totally independent? It sounds like you could make a bash script that schedules a job for each file and passes the file name to each job, for example as an environment variable.
Something like
#!/bin/bash
for in_file in ${in_files}; do
    # sbatch exports the caller's environment by default (--export=ALL),
    # so ${in_file} is visible inside the job
    export in_file
    sbatch slurmjob.sh
done
Where your slurmjob.sh is something like
#!/bin/bash
#SBATCH -n 1
#SBATCH --ntasks-per-node=1
# do something with ${in_file} and ${out_file}
echo ${in_file} ${out_file} > somefile.txt
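If you prefer not to rely on that default environment propagation, sbatch's --export option can pass the variable explicitly (a sketch using the same loop):
for in_file in ${in_files}; do
    # pass in_file on top of the usual environment
    sbatch --export=ALL,in_file="${in_file}" slurmjob.sh
done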

how to submit slurm job array with different input files

I have a list of text files (~ 200 files) that need to be processed. I was trying to submit a Slurm job array for this task but I cannot find a solution. What I have tried is to submit multiple jobs (~ 200) by looping through the files, one single-task job per file. I am sure that there is a way to create a job array for this problem; could you please advise me?
That is my bash script
#!/bin/bash
input_list=$1
prefix=$2
readarray list < $input_list
a=0
for i in "${list[@]}"
do
file=$i
sbatch -a1-1 -- $PWD/kscript.sh $file $prefix"_"$a
a=`expr $a + 1`
done
I figured out the solution; it is simpler than I thought. Here it is:
#!/bin/bash
# Task name
#SBATCH -J myjob
# Run time limit
#SBATCH --time=4:00:00
# Standard and error output in different files
#SBATCH -o %j_%N.out.log
#SBATCH -e %j_%N.err.log
#SBATCH --ntasks=1
# Execute application code
input_list=$1
prefix=$2
readarray list < $input_list
file=${list[$SLURM_ARRAY_TASK_ID-1]}
input=$PWD"/"$file
output=$prefix"_"$SLURM_ARRAY_TASK_ID
./kscript.sh $input $output
Running the script is as usual:
sbatch --array=1-n
where n is the length/size of the list. For example, if there is a list of 100 files, this becomes sbatch --array=1-100.
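To avoid counting the files by hand, the array size can be taken from the list itself (a sketch; jobscript.sh is a placeholder name for the batch script above):
# one array task per line of input_list
sbatch --array=1-$(wc -l < input_list) jobscript.sh input_list prefix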

set values for job variables from another file

I wanted to set the name and some other variables of a job from another file, but I get an error:
sbatch: error: Unable to open file 10:12:35
file.sh
#!/bin/bash
DATE=`date '+%Y-%m-%d %H:%M:%S'`
name='test__'$DATE
sbatch -J $name -o $name'.out' -e $name'.err' job.sh
job.sh
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-node=2 # number of cores
#SBATCH --output=.out
#SBATCH --error=.err
#module load R
Rscript script.R
script.R
for(i in 1:1e6){print(i)}
You are not quoting the variables, and the space in the requested date format splits the name into two arguments to sbatch, hence it complains about the unexpected parameter.
If I were you, I would avoid the space (as a general rule, because it is more error-prone and always requires quoting):
file.sh:
#!/bin/bash
DATE=$(date '+%Y-%m-%dT%H:%M:%S')
name="test__$DATE"
sbatch -J "$name" -o "${name}.out" -e "${name}.err" job.sh
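If you do want to keep the space in the timestamp, the same script works as long as every expansion stays double-quoted, so that the name reaches sbatch as a single argument (a sketch):
#!/bin/bash
DATE=$(date '+%Y-%m-%d %H:%M:%S')
name="test__$DATE"
# the quotes keep the embedded space inside a single argument
sbatch -J "$name" -o "${name}.out" -e "${name}.err" job.sh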

SLURM job script for multiple nodes

I would like to request two nodes in the same cluster, and both nodes must be allocated before the script begins.
In the Slurm script, I was wondering if there is a way to launch job-A on one node and job-B on the second node, either simultaneously or with a small delay.
Do you have suggestions on how this could be done? This is my script right now.
#!/bin/bash
#SBATCH --job-name="test"
#SBATCH -D .
#SBATCH --output=./logs_%j.out
#SBATCH --error=./logs_%j.err
#SBATCH --nodelist=nodes[19,23]
#SBATCH --time=120:30:00
#SBATCH --partition=AWESOME
#SBATCH --wait-all-nodes=1
#launched on Node 1
ifconfig > node19.txt
#Launched on Node2
ifconfig >> node23.txt
In other words, if I request two nodes, how do I run two different jobs on the two nodes simultaneously? Could we deploy them as job steps, as described in the last part of the srun manual (MULTIPLE PROGRAM CONFIGURATION)? In that context, it isn't clear to me what "-l" does.
I'm assuming that when you say job-A and job-B you are referring to the two commands in the script. I'm also assuming that the setup you show is accepted, but starts the jobs on the wrong nodes and serializes their execution (the requested resources are not entirely clear to me, but if Slurm does not complain, everything is OK). Also be careful with the redirected output: if the first job opens its redirection after the second job does, it will truncate the file and you will lose the second job's output.
For them to be started on the appropriate nodes, run the commands through srun:
#!/bin/bash
#SBATCH --job-name="test"
#SBATCH -D .
#SBATCH --output=./logs_%j.out
#SBATCH --error=./logs_%j.err
#SBATCH --nodelist=nodes[19,23]
#SBATCH --time=120:30:00
#SBATCH --partition=AWESOME
#SBATCH --wait-all-nodes=1
# Launched on node 1
srun --nodes=1 echo 'hello from node 1' > test1.txt &
# Launched on node 2
srun --nodes=1 echo 'hello from node 2' > test2.txt &
wait
That did the job! The files ./com_19.bash and ./com_23.bash act as the executables for each node.
#!/bin/bash
#SBATCH --job-name="test"
#SBATCH -D .
#SBATCH --output=./logs_%j.out
#SBATCH --error=./logs_%j.err
#SBATCH --nodelist=nodes[19,23]
#SBATCH --time=120:30:00
#SBATCH --partition=AWESOME
#SBATCH --wait-all-nodes=1
# Launch on node 1
srun -lN1 -n1 -r 1 ./com_19.bash &
# launch on node 2
srun -lN1 -r 0 ./com_23.bash &
sleep 1
squeue
squeue -s
wait
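As for the MULTIPLE PROGRAM CONFIGURATION mentioned in the question: srun's --multi-prog option takes a configuration file mapping task ranks to commands, and -l simply prefixes every output line with the task number. A sketch (multi.conf is a file name chosen here):
# multi.conf: task rank, then the command for that rank
0 ./com_19.bash
1 ./com_23.bash
srun -l -n2 --multi-prog multi.conf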

Insert system variables to SBATCH

I would like to ask you if it is possible to pass shell variables to the #SBATCH tags.
I would like to do something like this:
SBATCH FILE
#!/bin/bash -l
ARG=64.dat
NODES=4
TASK_PER_NODE=8
NP=$((NODES*TASK_PER_NODE))
#SBATCH -J 'MPI'+'_'+$NODES+'_'+$TASK_PER_NODE
#SBATCH -N $NODES
#SBATCH --ntasks-per-node=$TASK_PER_NODE
It isn't working, which is why I am asking.
Remember that #SBATCH parameter lines are viewed by Bash as comments, so Bash will not try to interpret them at all; Slurm, in turn, does not expand shell variables inside them.
Furthermore, the #SBATCH directives must appear before any other Bash command for Slurm to handle them.
Alternatives include setting the parameters in the command line:
NODES=4
sbatch --nodes=$NODES ... submitscript.sh
or passing the submission script through stdin:
#!/bin/bash -l
ARG=64.dat
NODES=4
TASK_PER_NODE=8
NP=$((NODES*TASK_PER_NODE))
sbatch <<EOT
#!/bin/bash
#SBATCH -J "MPI_${NODES}_${TASK_PER_NODE}"
#SBATCH -N $NODES
#SBATCH --ntasks-per-node=$TASK_PER_NODE
srun ...
EOT
In this latter case, you need to run the submission script itself rather than handing it to sbatch, since it runs sbatch on its own. Note also that string concatenation in Bash is not achieved with the + sign: variables are simply expanded inside a double-quoted string, as in "MPI_${NODES}_${TASK_PER_NODE}".
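Spelled out, the command-line alternative computes everything in a small wrapper and hands it to sbatch as options, so no #SBATCH lines are needed for these values (a sketch; submitscript.sh stands for the actual job script):
#!/bin/bash
NODES=4
TASK_PER_NODE=8
sbatch -J "MPI_${NODES}_${TASK_PER_NODE}" -N "$NODES" --ntasks-per-node="$TASK_PER_NODE" submitscript.sh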
