SLURM environment variables are empty

I tried to submit a job from the command line using --wrap instead of a submission script, and for some reason none of the SLURM variables are set:
sbatch --job-name NVP --time 01:00:00 --nodes 1 --ntasks 1 --cpus-per-task 2 --wrap "echo "var1:" $SLURM_CPUS_PER_TASK "var2:" $SLURM_JOB_NAME"
In this case both $SLURM_CPUS_PER_TASK and $SLURM_JOB_NAME are empty.
When the exact same command is submitted via a script, the variables show up.
I could not figure out what is wrong with my command-line submission.

The problem is that you are not escaping the variable names, so they get expanded by bash before the job is submitted. Try using single quotes for the wrapped command, like:
sbatch --job-name NVP --time 01:00:00 --nodes 1 --ntasks 1 --cpus-per-task 2 --wrap 'echo "var1:" $SLURM_CPUS_PER_TASK "var2:" $SLURM_JOB_NAME'
or, if you need double quotes (to allow expansion of other variables), escape the SLURM variables with a backslash, like:
sbatch --job-name NVP --time 01:00:00 --nodes 1 --ntasks 1 --cpus-per-task 2 --wrap "echo "var1:" \$SLURM_CPUS_PER_TASK "var2:" \$SLURM_JOB_NAME"
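As a quick check (assuming the default slurm-<jobid>.out output file), the single-quoted version above should produce a line along the lines of the following, with the values coming from --cpus-per-task and --job-name:
var1: 2 var2: NVP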

Related

Slurm: srun inside sbatch is ignored / skipped. Can anyone explain why?

I'm still exploring how to work with the Slurm scheduler and this time I really got stuck. The following batch script somehow doesn't work:
#!/usr/bin/env bash
#SBATCH --job-name=parallel-plink
#SBATCH --mem=400GB
#SBATCH --ntasks=4
cd ~/RS1
for n in {1..4};
do
echo "Starting ${n}"
srun --input none --exclusive --ntasks=1 -c 1 --mem-per-cpu=100G plink --memory 100000 --bfile RS1 --distance triangle bin --parallel ${n} 4 --out dt-output &
done
Since most of the SBATCH options are inside the batch script, the invocation is just: sbatch script.sh
The slurm-20466.out file only contains the four echo outputs: cat slurm-20466.out
Starting 1
Starting 2
Starting 3
Starting 4
I double-checked the command without srun and it works without errors.
I must confess I am also responsible for the Slurm scheduler configuration itself. Let me know if I should try changing anything or if more information is needed.
You start your srun commands in the background so they run in parallel, but you never wait for them to finish.
So the loop runs through very quickly, echoes the "Starting ..." lines, starts the srun commands in the background, and then finishes. At that point your sbatch script is done and terminates successfully, meaning that your job is done. With that, your allocation is revoked and your srun commands are terminated as well. You might be able to see that they started with sacct (see the sketch after the corrected script below).
You need to instruct the batch script to wait for the background processes to finish before it terminates. To do that, simply add a wait command at the end of your script:
#!/usr/bin/env bash
#SBATCH --job-name=parallel-plink
#SBATCH --mem=400GB
#SBATCH --ntasks=4
cd ~/RS1
for n in {1..4};
do
echo "Starting ${n}"
srun --input none --exclusive --ntasks=1 -c 1 --mem-per-cpu=100G plink --memory 100000 --bfile RS1 --distance triangle bin --parallel ${n} 4 --out dt-output &
done
wait
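To confirm that the job steps actually started (and were then killed when the allocation was revoked), a query along these lines can help; the job ID 20466 is taken from the output file name above, and the fields are standard sacct format fields:
sacct -j 20466 --format=JobID,JobName,State,Elapsed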

Running parallel jobs in slurm

I was wondering if I could ask something about running Slurm jobs in parallel. (Please note that I am new to Slurm and Linux and have only started using them 2 days ago...)
As per the instructions from https://hpc.nmsu.edu/discovery/slurm/serial-parallel-jobs/, I have designed the following bash script:
#!/bin/bash
#SBATCH --job-name fmriGLM # in order to use a different job name
#SBATCH --nodes=1
#SBATCH -t 16:00:00 # Time for running job
#SBATCH -o /scratch/connectome/dyhan316/fmri_preprocessing/FINAL_loop_over_all/output_fmri_glm.o%j # %j is replaced by the job ID
#SBATCH -e /scratch/connectome/dyhan316/fmri_preprocessing/FINAL_loop_over_all/error_fmri_glm.e%j
pwd; hostname; date
#SBATCH --ntasks=30
#SBATCH --mem-per-cpu=3000MB
#SBATCH --cpus-per-task=1
for num in {0..29}
do
srun --ntasks=1 python FINAL_ARGPARSE_RUN.py --n_division 30 --start_num ${num} &
done
wait
Then, I ran sbatch as follows: sbatch test_bash
However, when I view the outputs, it is apparent that only one of the sruns in the bash script is being executed... Could anyone tell me where I went wrong and how I can fix it?
**update: when I look at the error file I get the following: srun: Job 43969 step creation temporarily disabled, retrying. I searched the internet and it says that this could be caused by not specifying the memory and hence not having enough memory for the second job.. but I thought that I had already specified the memory with --mem-per-cpu=300MB?
**update: I have tried changing the code as suggested here: Why are my slurm job steps not launching in parallel?, but it still didn't work.
**potentially pertinent information: our node has about 96 cores, which seems odd when compared to tutorials that say one node has something like 4 cores
Thank you!!
Try adding --exclusive to the srun command line:
srun --exclusive --ntasks=1 python FINAL_ARGPARSE_RUN.py --n_division 30 --start_num ${num} &
This will instruct srun to use a sub-allocation and work as you intended.
Note that the --exclusive option has a different meaning in this context than if used with sbatch.
Note also that different versions of Slurm have a distinct canonical way of doing this, but using --exclusive should work across most versions.
Even though you have solved your problem, which turned out to be something else, and you had already specified --mem-per-cpu=300MB in your sbatch script, I would like to add that in my case my Slurm setup doesn't allow --mem-per-cpu in sbatch, only --mem. So the first srun command will still allocate all the memory and block the subsequent steps. The key for me is to specify --mem-per-cpu (or --mem) in the srun command.
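For example, applied to the loop in the question (a sketch; the 3000MB figure simply mirrors the --mem-per-cpu value in the script above, so adjust it to your own memory budget):
srun --exclusive --ntasks=1 --mem-per-cpu=3000MB python FINAL_ARGPARSE_RUN.py --n_division 30 --start_num ${num} &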

slurm job name in a for loop

I would like my job name to be a function of the loop parameters.
#!/bin/bash
#SBATCH -n 4
#SBATCH -p batch576
MAXLEVEL=8
process_id=$!
for Oh in '0.0001' '0.0005'
do
for H in '1.' '0.8'
do
mkdir half$Oh$H
cp half h.py RP.py `pwd`/half$Oh$H/
cd half$Oh$H
srun --mpi=pmi2 -J half${Oh}${H} ./half $Oh $H $MAXLEVEL &
cd ..
done
done
wait $process_id
Instead of test_min, I would like: half0.00011. half0.00010.8 ....
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
658 batch576 test_min XXX R 0:06 1 node1-ib2
Do you have any ideas?
Thank you
If you're submitting this job using sbatch, it will be a single job with multiple job steps. The -J option in srun names the job steps within your job, not the job itself, and by default squeue does not show job step information. Try the --steps parameter for squeue to show the job step names.
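For example (a sketch; 658 is the job ID from the squeue output above):
squeue --steps --jobs=658
If you want the job itself, rather than its steps, to carry the name, pass -J/--job-name to sbatch instead.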

On the semantics of `srun ... >output_file` for parallel tasks

Sorry, this question requires a lot of build-up, but in summary, it's about the conditions under which many parallel instances of srun ... >output_file will or won't clobber the output produced by another process/task.
CASE 0: bash only (no SLURM)
Suppose that prog-0.sh is the following toy script:
#!/bin/bash
hostname >&2
if [[ $JOB_INDEX = 0 ]]
then
date
fi
This script prints some output to stderr, and possibly prints the current date to stdout.
The "driver" script case-0.sh shown below spawns $NJOBS processes, all writing to prog-0-stdout.txt:
#!/bin/bash
for i in $( seq 0 $(( NJOBS - 1 )) )
do
JOB_INDEX=$i ./prog-0.sh >prog-0-stdout.txt &
done
After running
% NJOBS=100 ./case-0.sh 2>prog-0-stderr.txt
...my expectation is that prog-0-stderr.txt will contain 100 lines, and that prog-0-stdout.txt will be empty.
My expectation pans out:
% wc prog-0-std*.txt
100 100 3000 prog-0-stderr.txt
0 0 0 prog-0-stdout.txt
100 100 3000 total
The explanation for these results is that, when NJOBS is sufficiently large, it is likely that for some sufficiently high value of $i the redirection >prog-0-stdout.txt will be evaluated after the "designated job" (the one with JOB_INDEX 0, and the only one that sends output to stdout) has written the date to stdout, and this will therefore clobber whatever output the "designated job" had earlier redirected to prog-0-stdout.txt.
BTW, the value of NJOBS needs to be high enough for the results to be as I've just described. For example, if I use NJOBS=2:
% NJOBS=2 ./case-0.sh 2>prog-0-stderr.txt
...then not only will prog-0-stderr.txt contain only 2 lines (not surprisingly), but prog-0-stdout.txt will contain a date:
% cat prog-0-stdout.txt
Wed Oct 4 15:02:49 EDT 2017
In this case, all the >prog-0-stdout.txt redirections have been evaluated before the designated job prints the date to prog-0-stdout.txt.
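The underlying mechanism is that bash truncates the target file the moment it evaluates the > redirection, before (and regardless of whether) the command writes anything. A minimal illustration, assuming a scratch file f.txt:
echo hello > f.txt
> f.txt        # a redirection with no command: truncates f.txt
wc -c f.txt    # prints: 0 f.txt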
CASE 1: SLURM job arrays
Now, consider a very similar scenario, but using SLURM instead. The script prog-1.sh is identical to prog-0.sh, except that it examines a different variable to decide whether or not to print the date to stdout:
#!/bin/bash
hostname >&2
if [[ $SLURM_ARRAY_TASK_ID = 0 ]]
then
date
fi
And here's the corresponding "driver" script, case-1.sh:
#!/bin/bash
#SBATCH -t 1
#SBATCH -p test
#SBATCH -e prog-1-%02a-stderr.txt
#SBATCH -n 1
#SBATCH -a 0-99
srun ./prog-1.sh >prog-1-stdout.txt
Like case-0.sh, this script redirects the output of its main step to a single file ./prog-1-stdout.txt.
Importantly, this same file will be seen by all the nodes that run ./prog-1.sh for this job.
If I now run
sbatch case-1.sh
...I get 100 files prog-1-00-stderr.txt ... prog-1-99-stderr.txt, containing 1 line each, and an empty prog-1-stdout.txt. I assume that the earlier explanation also explains why prog-1-stdout.txt is empty.
So far so good.
CASE 2: SLURM tasks
Finally, consider one more SLURM-based case, this time using the core script prog-2.sh and the driver script case-2.sh. Again, the only change in prog-2.sh is the variable it examines to decide whether or not to print the date to stdout:
#!/bin/bash
hostname >&2
if [[ $SLURM_PROCID = 1 ]]
then
date
fi
And here is case-2.sh:
#!/bin/bash
#SBATCH -t 1
#SBATCH -p test
#SBATCH -e prog-2-stderr.txt
#SBATCH -N 10
#SBATCH --tasks-per-node=10
srun -l ./prog-2.sh >prog-2-stdout.txt
As before, prog-2-stdout.txt is visible by all the nodes handling the job.
Now, if I run sbatch case-2.sh and wait for the batch job to finish, then prog-2-stderr.txt contains 100 lines (as expected), but, to my surprise, prog-2-stdout.txt is not empty. In fact, it contains a date:
% cat prog-2-stdout.txt
01: Wed Oct 4 15:21:17 EDT 2017
The only explanation I can come up with is analogous to the one I gave earlier for the results I got when I ran
% NJOBS=2 ./case-0.sh 2>prog-0-stderr.txt
If this explanation is correct, my concern is that the fact that case-2.sh worked better than expected (i.e. prog-2-stdout.txt ends up with the right output) is just a coincidence, having to do with the relative timing of concurrent events.
Now, at long last, my question is:
Q: does SLURM guarantee that a prog-2-stdout.txt file that contains the output generated by the designated task (i.e. the one that prints the date to stdout) will not be clobbered when the >prog-2-stdout.txt redirection gets evaluated by one of the non-designated tasks?
You have a misconception about how srun works. In CASE 1 the usage of srun is irrelevant: it is meant to start parallel tasks inside a batch script, and in CASE 1 you only have one task, so
srun ./prog-1.sh >prog-1-stdout.txt is equivalent to:
./prog-1.sh >prog-1-stdout.txt
CASE 2 is different, as you have more than 1 task. In that case, srun -l ./prog-2.sh >prog-2-stdout.txt is only evaluated once, and srun will take care of spawning 10*10 tasks. srun will redirect the output of all the tasks to the master node of the job, and it will be the one writing to prog-2-stdout.txt.
So you can be sure that in this case there will be no clobbering of the output file as it is evaluated only once.
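As an aside, if the goal were one output file per task rather than a single aggregated file, srun's --output option accepts filename patterns, e.g. (a sketch; %t expands to the task ID):
srun -l --output=prog-2-%t-stdout.txt ./prog-2.sh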

parallel but different Slurm srun job step invocations not working

I'd like to run the same program on a large number of different input files. I could just submit each as a separate Slurm submission, but I don't want to swamp the queue by dumping 1000s of jobs on it at once. I've been trying to figure out how to process the same number of files by instead creating an allocation first, then within that allocation looping over all the files with srun, giving each invocation a single core from the allocation. The problem is that no matter what I do, only one job step runs at a time. The simplest test case I could come up with is:
#!/usr/bin/env bash
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
wait
It doesn't matter how many cores I assign to the allocation:
time salloc -n 1 test
time salloc -n 2 test
time salloc -n 4 test
it always takes 4 seconds. Is it not possible to have multiple job steps execute in parallel?
It turned out that the default memory per CPU was not defined, so even single-core jobs were reserving all of the node's RAM.
Setting DefMemPerCPU, or specifying explicit RAM reservations, did the trick.
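For reference, a minimal sketch of the slurm.conf side of that fix (the 1000 MB value and the node/partition names are purely illustrative):
# global default, in MB
DefMemPerCPU=1000
# or per partition
PartitionName=batch Nodes=node[1-4] DefMemPerCPU=1000 State=UP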
Beware that in that scenario, you measure both the running time and the waiting time. Your submission script should look like this:
#!/usr/bin/env bash
time {
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
wait
}
and simply submit with
salloc -n 1 test
salloc -n 2 test
salloc -n 4 test
You should then observe the difference, along with messages such as srun: Job step creation temporarily disabled, retrying when using n < 4.
Since the OP solved the issue but didn't provide the code, I'll share my take on this problem below.
In my case, I encountered the error/warning step creation temporarily disabled, retrying (Requested nodes are busy). This is because the srun command that executed first allocated all the memory, the same cause the OP encountered. To solve this, first (optionally?) specify the total memory allocation for sbatch (if you are using an sbatch script):
#SBATCH --ntasks=4
#SBATCH --mem=[XXXX]MB
And then specify the memory use per srun task:
srun --exclusive --ntasks=1 --mem-per-cpu [XXXX/4]MB sleep 1 &
srun --exclusive --ntasks=1 --mem-per-cpu [XXXX/4]MB sleep 1 &
srun --exclusive --ntasks=1 --mem-per-cpu [XXXX/4]MB sleep 1 &
srun --exclusive --ntasks=1 --mem-per-cpu [XXXX/4]MB sleep 1 &
wait
I didn't specify a CPU count for srun because my sbatch script includes #SBATCH --cpus-per-task=1. For the same reason, I suspect you could use --mem instead of --mem-per-cpu in the srun command, but I haven't tested this configuration.
