The scenario is this: I allocate resources (2 nodes, 64 CPUs) to a job with salloc:
salloc -N 1-2 -n 64 -c 1 -w cluster-node[2-3] -m cyclic -t 5
salloc: Granted job allocation 1720
Then I use srun to create steps within my job:
for i in (seq 70)
srun --exclusive -N 1 -n 1 --jobid=1720 sleep 60 &
end
Because I created more steps than there are CPUs available to my job, some steps are "pending" until a CPU becomes free.
When I use squeue with the -s option to list the steps, I am only able to view the running ones.
squeue -s -O stepid:12,stepname:10,stepstate:9
1720.0 sleep RUNNING
[...]
1720.63 sleep RUNNING
My question is: can steps have states other than RUNNING, as jobs do, and if so, is there a way to view them with squeue (or another command)?
I am not sure Slurm can offer that information. One alternative would be to use GNU Parallel so that job steps are not started at all until a CPU is available. In the current setting, all job steps are started at once and those which do not have a CPU available are left waiting.
So with the same allocation as you use, replace
for i in (seq 70)
srun --exclusive -N 1 -n 1 --jobid=1720 sleep 60 &
end
with
parallel -P $SLURM_NTASKS srun --exclusive -N 1 -n 1 --jobid=1720 sleep 60
Then the output of squeue should list RUNNING and PENDING steps.
N.B. I am not sure the --jobid= option is needed here.
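For completeness, a sketch of the full replacement, assuming GNU Parallel is available on the cluster and you still want 70 one-minute steps; -n0 tells parallel to read one input line per invocation without appending it to the command:
seq 70 | parallel -n0 -P $SLURM_NTASKS srun --exclusive -N 1 -n 1 --jobid=1720 sleep 60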
Related
How can I run a number of python scripts in different nodes in SLURM?
Suppose I select 5 cluster nodes using #SBATCH --nodes=5, and I have 5 python scripts code1.py, code2.py, ..., code5.py. I want to run each of these scripts on a different node simultaneously. How can I achieve this?
Do these five scripts need to run in a single job? Do they really need to run simultaneously? Is there some communication happening between them? Or are they independent from one another?
If they are essentially independent, then you should most likely put them into 5 different jobs with one node each. That way you don't have to find five free nodes at once; the first job can start as soon as there is a single free node. If there are many scripts you want to start like that, it might be interesting to look into job arrays (a minimal sketch follows below).
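As an illustration, a job-array sketch assuming the scripts follow the codeN.py naming pattern from the question; each array task is an independent single-node job, and the array index selects the script:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --array=1-5
python "code${SLURM_ARRAY_TASK_ID}.py"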
If you need to run them in parallel, you will need to use srun in your jobscript to start the scripts. This example shows a job where you have 10 cores per task and each node has one task.
#!/bin/bash
#[...]
#SBATCH -N 5
#SBATCH -n 5
#SBATCH -c 10
#[...]
srun -N 1 -n1 python code1.py &
srun -N 1 -n1 python code2.py &
srun -N 1 -n1 python code3.py &
srun -N 1 -n1 python code4.py &
srun -N 1 -n1 python code5.py &
wait
You need to run the srun calls in the background, as bash would otherwise wait for them to finish before executing the next one.
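Equivalently, the five srun calls can be generated in a loop; this is just a sketch that assumes the codeN.py naming pattern from the question:
#!/bin/bash
#SBATCH -N 5
#SBATCH -n 5
#SBATCH -c 10
for i in $(seq 1 5); do
    srun -N 1 -n 1 python "code${i}.py" &
done
wait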
I would like to have my job name be a function of the parameters of the loop.
#!/bin/bash
#SBATCH -n 4
#SBATCH -p batch576
MAXLEVEL=8
for Oh in '0.0001' '0.0005'
do
for H in '1.' '0.8'
do
mkdir half$Oh$H
cp half h.py RP.py `pwd`/half$Oh$H/
cd half$Oh$H
srun --mpi=pmi2 -J half${Oh}${H} ./half $Oh $H $MAXLEVEL &
cd ..
done
done
wait
Instead of test_min, I would like: half0.00011., half0.00010.8, and so on.
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
658 batch576 test_min XXX R 0:06 1 node1-ib2
Do you have any ideas? Thank you.
If you're submitting this job using sbatch, it will be a single job with multiple job steps. The -J option on srun names the job steps within your job, not the job itself. And by default, squeue does not show job step information; try the --steps parameter of squeue to show the job step names.
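For example, to name the job itself you would set the name at the sbatch level (half_sweep is just a hypothetical name):
#SBATCH -J half_sweep
and to list the step names set by srun -J, a call such as:
squeue --steps -O stepid:12,stepname:14,stepstate:9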
I have the following SLURM job script named gzip2zipslurm.sh:
#!/bin/bash
#SBATCH --mem 70G
#SBATCH --ntasks 4
echo "Task 1"
srun -n1 java -Xmx10g -jar tar2zip-1.0.0-jar-with-dependencies.jar articles.A-B.xml.tar.gz &
echo "Task 2"
srun -n1 java -Xmx10g -jar tar2zip-1.0.0-jar-with-dependencies.jar articles.C-H.xml.tar.gz &
echo "Task 3"
srun -n1 java -Xmx10g -jar tar2zip-1.0.0-jar-with-dependencies.jar articles.I-N.xml.tar.gz &
echo "Task 4"
srun -n1 java -Xmx10g -jar tar2zip-1.0.0-jar-with-dependencies.jar articles.O-Z.xml.tar.gz &
echo "Waiting for job steps to end"
wait
echo "Script complete"
I submit it to SLURM by sbatch gzip2zipslurm.sh.
When I do, the output of the SLURM log file is
Task 1
Task 2
Task 3
Task 4
Waiting for job steps to end
The tar2zip program reads the given tar.gz file and re-packages it as a ZIP file.
The problem: only one CPU (out of 16 available on an idle node) is doing any work. With top I can see that, all in all, 5 srun commands have been started (4 for my tasks and 1 implicit one for the sbatch job, I guess), but there is only one Java process. I can also see it from the files being worked on: only one is being written.
How do I manage that all 4 tasks are actually executed in parallel?
Thanks for any hints!
The issue might be with the memory reservation. In the submission script you set --mem 70G, which is the total memory requested for the whole job.
When srun is used within a submission script, it inherits parameters from sbatch, including --mem 70G, so you actually implicitly run the following:
srun --mem 70G -n1 java -Xmx10g -jar ...
Try explicitly setting the memory to 70GB/4 with:
srun --mem 17G -n1 java -Xmx10g -jar ...
Also, as per the documentation, you should use --exclusive with srun in such a context.
srun --exclusive --mem 17G -n1 java -Xmx10g -jar ...
This option can also be used when initiating more than one job step
within an existing resource allocation, where you want separate
processors to be dedicated to each job step. If sufficient processors
are not available to initiate the job step, it will be deferred. This
can be thought of as providing a mechanism for resource management to
the job within its allocation.
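Putting both suggestions together, a sketch of the corrected submission script; the even 70G/4 split (rounded down to 17G) is an assumption, so adjust the per-step value to what each tar2zip run actually needs:
#!/bin/bash
#SBATCH --mem 70G
#SBATCH --ntasks 4
srun --exclusive --mem 17G -n1 java -Xmx10g -jar tar2zip-1.0.0-jar-with-dependencies.jar articles.A-B.xml.tar.gz &
srun --exclusive --mem 17G -n1 java -Xmx10g -jar tar2zip-1.0.0-jar-with-dependencies.jar articles.C-H.xml.tar.gz &
srun --exclusive --mem 17G -n1 java -Xmx10g -jar tar2zip-1.0.0-jar-with-dependencies.jar articles.I-N.xml.tar.gz &
srun --exclusive --mem 17G -n1 java -Xmx10g -jar tar2zip-1.0.0-jar-with-dependencies.jar articles.O-Z.xml.tar.gz &
wait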
I want to run a script on a cluster ~200 times using srun commands in one sbatch script. Since executing the script takes some time, it would be great to distribute the tasks evenly over the nodes in the cluster. Sadly, I am having issues with that.
Now, I created an example script ("hostname.sh") to test different parameters in the sbatch script:
echo `date +%s` `hostname`
sleep 10
This is my sbatch script:
#!/bin/bash
#SBATCH --ntasks=15
#SBATCH --cpus-per-task=16
for i in `seq 200`; do
srun -n1 -N1 bash hostname.sh &
done
wait
I would expect hostname.sh to be executed 200 times (the for loop), but with only 15 tasks running at the same time (--ntasks=15). Since my biggest node has 56 cores, only three of these tasks should be able to run on that node at the same time (--cpus-per-task=16).
From the output of the script I can see that the first nine tasks are distributed over nine nodes of the cluster, but all the other tasks (191!) are executed on a single node at the same time. The whole sbatch script execution took only about 15 seconds.
I think I misunderstand some of Slurm's parameters, but looking at the official documentation did not help me.
You need to use the --exclusive option of srun in that context:
srun -n1 -N1 --exclusive bash hostname.sh &
From the srun manpage:
By default, a job step has access to every CPU allocated to the job.
To ensure that distinct CPUs are allocated to each job step, use the
--exclusive option.
See also the last-but-one example in said documentation.
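For reference, a sketch of the submission script from the question with that single change applied:
#!/bin/bash
#SBATCH --ntasks=15
#SBATCH --cpus-per-task=16
for i in `seq 200`; do
    srun -n1 -N1 --exclusive bash hostname.sh &
done
wait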
I'd like to run the same program on a large number of different input files. I could just submit each as a separate Slurm submission, but I don't want to swamp the queue by dumping 1000s of jobs on it at once. I've been trying to figure out how to process the same number of files by instead creating an allocation first, then within that allocation looping over all the files with srun, giving each invocation a single core from the allocation. The problem is that no matter what I do, only one job step runs at a time. The simplest test case I could come up with is:
#!/usr/bin/env bash
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
wait
It doesn't matter how many cores I assign to the allocation:
time salloc -n 1 test
time salloc -n 2 test
time salloc -n 4 test
it always takes 4 seconds. Is it not possible to have multiple job steps execute in parallel?
It turned out that the default memory per CPU was not defined, so even single-core job steps were reserving all of the node's RAM.
Setting DefMemPerCPU, or specifying explicit RAM reservations, did the trick.
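Concretely, that means either a cluster-wide default in slurm.conf (an administrator-side setting; the 1000 MB value below is only an illustration):
DefMemPerCPU=1000
or an explicit per-step reservation inside the allocation:
srun --exclusive --ntasks 1 -c 1 --mem-per-cpu=1000 sleep 1 &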
Beware that in that scenario, you measure both the running time and the waiting time. Your submission script should look like this:
#!/usr/bin/env bash
time {
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
wait
}
and simply submit with
salloc -n 1 test
salloc -n 2 test
salloc -n 4 test
You should then observe the difference, along with messages such as srun: Job step creation temporarily disabled, retrying when using n < 4.
Since the OP solved their issue but didn't provide the code, I'll share my take on this problem below.
In my case, I encountered the error/warning step creation temporarily disabled, retrying (Requested nodes are busy). This happens because the srun command that executed first allocated all the memory, the same cause the OP encountered. To solve this, first (optionally?) specify the total memory allocation for sbatch (if you are using an sbatch script):
#SBATCH --ntasks=4
#SBATCH --mem=[XXXX]MB
And then specify the memory use per srun task:
srun --exclusive --ntasks=1 --mem-per-cpu [XXXX/4]MB sleep 1 &
srun --exclusive --ntasks=1 --mem-per-cpu [XXXX/4]MB sleep 1 &
srun --exclusive --ntasks=1 --mem-per-cpu [XXXX/4]MB sleep 1 &
srun --exclusive --ntasks=1 --mem-per-cpu [XXXX/4]MB sleep 1 &
wait
I didn't specify CPU count for srun because in my sbatch script I included #SBATCH --cpus-per-task=1. For the same reason I suspect you could use --mem instead of --mem-per-cpu in the srun command, but I haven't tested this configuration.