Use Bash variable within SLURM sbatch script - linux

I'm trying to obtain a value from another file and use this within a SLURM submission script. However, I get an error that the value is non-numerical, in other words, it is not being dereferenced.
Here is the script:
#!/bin/bash
# This reads out the number of procs based on the decomposeParDict
numProcs=`awk '/numberOfSubdomains/ {print $2}' ./meshModel/decomposeParDict`
echo "NumProcs = $numProcs"
#SBATCH --job-name=SnappyHexMesh
#SBATCH --output=./logs/SnappyHexMesh.log
#
#SBATCH --ntasks=`$numProcs`
#SBATCH --time=240:00
#SBATCH --mem-per-cpu=4000
#First run blockMesh
blockMesh
#Now decompose the mesh
decomposePar
#Now run snappy in parallel
mpirun -np $numProcs snappyHexMesh -parallel -overwrite
When I run this as a normal Bash shell script, it prints out the number of procs correctly and makes the correct mpirun call. Thus the awk command parses out the number of procs correctly and the variable is dereferenced as expected.
However, when I submit this to SLURM using:
sbatch myScript.sh
I get the error:
sbatch: error: Invalid numeric value "`$numProcs`" for number of tasks.
Can anyone help with this?

This won't work. What happens when you run
sbatch myscript.sh
is that Slurm parses the script for those special #SBATCH lines, generates a job record, and stores the batch script somewhere. The batch script is executed only later, when the job runs, so no shell has expanded $numProcs at the time the #SBATCH lines are read.
So you need to structure your workflow slightly differently and calculate the number of procs you need before submitting the job. Note that you can use something like
sbatch -n $numProcs myscript.sh
so you don't need to autogenerate the script. Also, mpirun should be able to get the number of procs in your allocation automatically, so there is no need to use "-np".
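As a minimal sketch of that workflow, a small wrapper could read the value first and only then call sbatch. The function names read_num_procs and submit_job are hypothetical, and the sketch assumes the dictionary entry looks like "numberOfSubdomains 4;" (the trailing semicolon is stripped if present).

```shell
#!/bin/bash
# Hypothetical wrapper: compute the task count first, then submit.

# Print the numberOfSubdomains value from the given dictionary file.
# Assumption: the entry is "numberOfSubdomains <N>;" on a single line.
read_num_procs() {
    awk '/numberOfSubdomains/ { gsub(/;/, "", $2); print $2; exit }' "$1"
}

# Submit a job script with --ntasks taken from the dictionary file.
# Refuses to submit when the value read is not a plain integer.
submit_job() {
    local dict=$1 script=$2 procs
    procs=$(read_num_procs "$dict")
    case "$procs" in
        ''|*[!0-9]*)
            echo "could not read numberOfSubdomains from $dict" >&2
            return 1
            ;;
    esac
    sbatch --ntasks="$procs" "$script"
}
```

With these definitions, submit_job ./meshModel/decomposeParDict myScript.sh would replace the plain sbatch myScript.sh call, and the #SBATCH --ntasks line can be dropped from the job script.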

Slurm stops processing #SBATCH directives on the first line of executable code in a script. For users whose #SBATCH directives are not dependent on the code they're trying to run above those directives, just put the #SBATCH lines at the top.
See the other answer for a workaround/solution if, as with OP, your sbatch options are dependent on the commands you've placed above them.
The batch script may contain options preceded with "#SBATCH" before
any executable commands in the script. sbatch will stop processing
further #SBATCH directives once the first non-comment non-whitespace
line has been reached in the script.
From the sbatch docs, my emphasis.
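To check a script against this rule before submitting, the following sketch lists the #SBATCH lines that appear before the first executable line. It only approximates the documented behaviour and is not Slurm's actual parser; the function name effective_directives is hypothetical.

```shell
# Print the #SBATCH lines that precede the first non-comment,
# non-whitespace line of a script (an approximation of the documented rule).
effective_directives() {
    awk '
        /^[ \t]*$/  { next }          # blank line: keep scanning
        /^#SBATCH/  { print; next }   # directive seen before any command
        /^[ \t]*#/  { next }          # other comment or shebang: keep scanning
        { exit }                      # first executable line: stop
    ' "$1"
}
```

Run on the script from the question, this prints nothing, because the numProcs assignment comes before every #SBATCH line.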

Related

Having issue with slurm. error: "no such file or directory"

I'm trying to run a Slurm script using sbatch <script.sh>. However, despite checking my path variable multiple times, I get a "file not found" error. I also get a "cannot import absolute path" error, which I think has to do with my Go environment. I'm not sure what the issue is. My batch script and the error output are below.
#!/bin/bash
#SBATCH --partition production
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=5:00:00
#SBATCH --mem=2GB
#SBATCH --job-name=myTest
#SBATCH --mail-type=END
#SBATCH --mail-user=atd341#nyu.edu
#SBATCH --output=slurm_%j.out
module purge
module load go/1.17
##RUNDIR=${SCRATCH}/run-${SLURM_JOB_ID/.*}
##mkdir -p ${RUNDIR}
DATADIR=${SCRATCH}/inmap_sandbox
cd $SLURM_WORK_DIR
source $DATADIR/setup.sh
go run $DATADIR/
Here is the output:
/var/spool/slurmd/job16296/slurm_script: line 19: /inmap_sandbox/setup.sh: No such file or directory
import "/inmap_sandbox": cannot import absolute path
I have tried checking my path variable and making sure I'm following the correct path. For reference, my directory structure is /scratch/inmap_sandbox, and I'm running the sbatch file from the /scratch directory.
Offhand it appears the ${SCRATCH} variable might not be set inside the environment running the script. Try explicitly setting that to /scratch?
Once you get past that problem, note that if this batch script is running on a compute node that is separate from the frontend node you are using interactively, then they might not both mount the same ${SCRATCH} file system (or possibly mount it in different places).
Consult the system documentation to find out which file systems are shared between the frontend and the compute nodes. You might even need to pass SLURM capability options to request certain shared filesystems. In the absence of documentation, comparing the output of mount on the frontend and from within the batch script might be helpful. More specifically, add the mount command on a line by itself early in your batch script, and compare the output it generates to the output of the same command on the frontend.
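One way to make the first problem visible early is to give SCRATCH a fallback and fail with a clear message when setup.sh is missing. This is only a sketch: the /scratch default comes from the directory layout given in the question, and the function names data_dir and check_setup are hypothetical.

```shell
# Print the data directory, falling back to /scratch when SCRATCH is
# unset or empty (the default is an assumption taken from the question).
data_dir() {
    printf '%s/inmap_sandbox\n' "${SCRATCH:-/scratch}"
}

# Return non-zero with a diagnostic if setup.sh is not where we expect it.
check_setup() {
    local dir
    dir=$(data_dir)
    if [ ! -f "$dir/setup.sh" ]; then
        echo "setup.sh not found under $dir (SCRATCH='${SCRATCH:-}')" >&2
        return 1
    fi
}
```

In the batch script, calling check_setup || exit 1 before the source line would turn the two confusing errors above into a single message that shows the value SCRATCH actually had on the compute node.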

bash variable expansion in slurm's #SBATCH directive

Is it possible to use variable expansion in #SBATCH lines in Slurm? For instance, I want to have the line below:
#SBATCH --array=0-100%{$1-10}
so that by default it uses 10 concurrent jobs unless I manually pass an argument when I call sbatch.
The line above gives me an "Invalid job array specification" error.
No, this isn't possible. But you can override the script's default --array by giving it explicitly on the sbatch command line.
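A small wrapper can supply the default of 10 that the question asks for. This is a sketch only: array_spec is a hypothetical helper, job.sh is a placeholder script name, and the 0-100 range is taken from the question.

```shell
# Print an --array specification for indices 0-100, limiting concurrency
# to the first argument, or to 10 when no argument is given.
array_spec() {
    local limit=${1:-10}
    printf '0-100%%%s\n' "$limit"
}

# Inside a wrapper script one might then submit with (requires Slurm):
#   sbatch --array="$(array_spec "$1")" job.sh
```

Because options given on the sbatch command line take precedence over #SBATCH lines, the script can keep a plain #SBATCH --array=0-100%10 as its documented default.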

Snakemake and sbatch

I have a Snakefile that has a rule which sends 7 different shell commands.
I want to run each of these shell commands in sbatch and I want them to run in different slurm nodes.
Right now when I include sbatch inside the shell command in the Snakemake rule I do not get the desired output file because it takes awhile to run and when sbatch returns the command is still running. I think Snakemake thinks that I don't have the required output file because it thinks that the command "finished executing" before the submitted job completed.
How can I submit each of these commands to its own Slurm node with sbatch from within the Snakefile?
I suspect that what you are doing is:
rule one:
    input:
        ...
    output:
        ...
    shell:
        """
        sbatch [sbatch-options] "some-command-or-script"
        """
What you probably want is:
rule one:
    input:
        ...
    output:
        ...
    shell:
        """
        some-command-or-script
        """
To be executed as
snakemake --cluster "sbatch [sbatch-options]"
In this way every rule will send its jobs to the cluster and snakemake will handle them, including waiting for each submitted job to finish before checking for its output. If you want a rule to execute its jobs locally (not via sbatch), list that rule under the localrules directive (check the documentation for more detail).

How to get SLURM task ID in program

I'm running srun -n 100 python foo.py. Inside the python script how does it find out which task number/id/rank it is? Is there an environment variable set?
Have a look at man srun or man sbatch for a list of environment variables. $SLURM_PROCID might be the one you need.
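As an illustration of what a task can do with that value, the sketch below splits a list of items round-robin across tasks. It is written in shell; a Python script would read the same variables from os.environ. The function name my_share is hypothetical, and the fallbacks (rank 0 of 1 task) are an assumption for running outside srun.

```shell
# Print the items this task should handle, using a round-robin split
# based on the task rank and task count that srun exports.
my_share() {
    local rank=${SLURM_PROCID:-0}     # 0-based rank of this task
    local ntasks=${SLURM_NTASKS:-1}   # total number of tasks
    local i=0 item
    for item in "$@"; do
        if [ $(( i % ntasks )) -eq "$rank" ]; then
            printf '%s\n' "$item"
        fi
        i=$(( i + 1 ))
    done
}
```

Launched under srun -n 100, each task would see a different SLURM_PROCID between 0 and 99 and therefore print a different subset of the arguments.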

Stop slurm sbatch from copying script to compute node

Is there a way to stop sbatch from copying the script to the compute node. For example when I run:
sbatch --mem=300 /shared_between_all_nodes/test.sh
test.sh is copied to /var/lib/slurm-llnl/slurmd/etc/ on the executing compute node. The trouble with this is there are other scripts in /shared_between_all_nodes/ that test.sh needs to use and I would like to avoid hard coding the path.
In sge I could use qsub -b y to stop it from copying the script to the compute node. Is there a similar option or config in slurm?
Using sbatch --wrap is a nice solution for this:
sbatch --wrap /shared_between_all_nodes/test.sh
Quotes are required if the script has parameters:
sbatch --wrap "/shared_between_all_nodes/test.sh param1 param2"
From the sbatch docs (http://slurm.schedmd.com/sbatch.html):
--wrap=
Sbatch will wrap the specified command string in a simple "sh" shell script, and submit that script to the slurm controller. When --wrap is used, a script name and arguments may not be specified on the command line; instead the sbatch-generated wrapper script is used.
The script might be copied there, but the working directory will be the directory in which the sbatch command is launched. So if the command is launched from /shared_between_all_nodes/ it should work.
To be able to launch sbatch from anywhere, use this option (newer Slurm releases name the long form --chdir; the short form -D is the same):
-D, --workdir=<directory>
Set the working directory of the batch script to directory before
it is executed.
For example:
sbatch --mem=300 -D /shared_between_all_nodes /shared_between_all_nodes/test.sh
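Inside test.sh, this means sibling scripts should be located through the job's working directory rather than through "$0", which points at the spooled copy. A minimal sketch, where sibling_path is a hypothetical helper and helper.sh a placeholder name:

```shell
# Build the path of a file that lives in the job's working directory.
# "$0" is deliberately not used: under sbatch it refers to the copy of
# the script in Slurm's spool directory, not to the shared directory.
sibling_path() {
    printf '%s/%s\n' "$(pwd)" "$1"
}

# Inside test.sh one might then call a neighbouring script like this:
#   "$(sibling_path helper.sh)" param1 param2
```

This only works when the job's working directory is the shared directory, which is what launching from that directory or passing -D ensures.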
