How to allocate node by node with Slurm?

My goal:
I would like to launch multiple codes node by node and have each node allocated at 100%, i.e. showing as "alloc" in sinfo:
epic* up infinite 4 alloc lio[1-2]
And what I get instead is nodes showing as "mix":
epic* up infinite 4 mix lio[1-3,5]
My script:
#!/bin/bash
#SBATCH -A pt
#SBATCH -p epic
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH -J concentration
#SBATCH --array=1-4
. /usr/share/Modules/init/bash
module purge
module load openmpi-gcc/4.0.4-pmix_v2
MAXLEVEL=14
Ranf=8000
case $SLURM_ARRAY_TASK_ID in
1) phi='0.01'
;;
2) phi='0.008'
;;
3) phi='0.005'
;;
4) phi='0.001'
;;
esac
mkdir RBnf-P=$phi
cp RBnf `pwd`/RBnf-P=$phi/
cd RBnf-P=$phi
srun --mpi=pmix_v2 -J Ra${phi} ./RBnf $Ranf $MAXLEVEL $phi   # must be $phi (lowercase); $Phi is a different, unset variable
Each computation needs 16 processes, and each node has 32 cores.
I have 4 computations to run.
My question: how can I allocate only 2 nodes, each at 100%?
With this script Slurm will use 4 nodes, so each node is only at 50% of its capacity (4 * 16/32). I would like my codes to run on only 2 nodes at 100% of their capacity (2 * 32/32).
Slurm allocates another node instead of filling a node that is already partially used; that is why I get "mix" nodes, whereas I want only 2 "alloc" nodes.
Any ideas?

I found why I couldn't allocate node by node.
The "OverSubscribe" option was not set in the slurm.conf file.
That is why the nodes showed up as "mix" instead of being 100% allocated.
https://slurm.schedmd.com/cons_res_share.html
Now the four array tasks are automatically packed onto two nodes.
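For reference, a minimal sketch of the kind of slurm.conf partition configuration this refers to; the partition name comes from the question, while the node list and the other values are assumptions (the cons_res_share page linked above has the authoritative details):

# slurm.conf (sketch only, assumed values)
SelectType=select/cons_res                 # schedule individual cores rather than whole nodes
SelectTypeParameters=CR_Core
PartitionName=epic Nodes=lio[1-5] Default=YES State=UP OverSubscribe=YES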

Related

Running a command multiple times on different nodes in SLURM

I want to run three instances of GROMACS mdrun on three different nodes.
I have three temperatures, 200, 220 and 240 K, and I want to run the 200 K simulation on node 1, the 220 K simulation on node 2 and the 240 K simulation on node 3. I need to do all of this in one script because I have a job number limit.
How can I do that in Slurm?
Currently I have:
#!/bin/bash
#SBATCH --nodes=3
#SBATCH --ntasks=3
#SBATCH --ntasks-per-node=1
#SBATCH --time=01:00:00
#SBATCH --job-name=1us
#SBATCH --error=h.err
#SBATCH --output=h.out
#SBATCH --partition=standard
as my sbatch parameters and
for i in 1 2 3
do
    T=$(($Ti + ($i-1)*20))   # $Ti is the starting temperature, 200 K here
    cd T_$T/1000
    gmx_mpi grompp -f heating.mdp -c init_conf.gro -p topol.top -o quench.tpr -maxwarn 1
    gmx_mpi mdrun -s quench.tpr -deffnm heatingLDA -v &
    cd ../../
done
wait
This is how I am running mdrun, but it is not running as fast as I want. Firstly, the mdrun instances do not start simultaneously: the 200 K run starts first, and only 2-3 minutes later does the 220 K run start. Secondly, the speed is much slower than expected.
Could you tell me how I can achieve this?
Thank you in advance.
Best regards,
Ved
You need to add a line to the Slurm script:
#SBATCH --nodelist=${NODENAME}
where ${NODENAME} is the name of one of the nodes (node 1, 2 or 3).
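As an illustration only, and not part of the original answer: within a single 3-node allocation, the same idea can be applied per step with srun's --nodelist (-w) option, taking the node names from the job's own node list. Everything else mirrors the loop from the question.

# Sketch, assuming the 3-node sbatch header shown in the question.
NODES=($(scontrol show hostnames "$SLURM_JOB_NODELIST"))   # the three allocated node names
Ti=200                                                     # starting temperature in K
for i in 1 2 3
do
    T=$((Ti + (i-1)*20))
    cd T_$T/1000
    gmx_mpi grompp -f heating.mdp -c init_conf.gro -p topol.top -o quench.tpr -maxwarn 1
    # pin this mdrun instance to one specific node of the allocation
    srun --nodes=1 --ntasks=1 --nodelist=${NODES[$((i-1))]} gmx_mpi mdrun -s quench.tpr -deffnm heatingLDA -v &
    cd ../../
done
wait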

GPU allocation within an sbatch job

I have access to a large GPU cluster (20+ nodes, 8 GPUs per node) and I want to launch a task several times on n GPUs (1 task per GPU, n > 8) within a single batch job, without booking full nodes with the --exclusive flag.
I managed to pre-allocate the resources (see below), but I am struggling to launch the task several times within the job. Specifically, my log shows no value for the CUDA_VISIBLE_DEVICES variable.
I know how to do this on fully booked nodes with the --nodes and --gres flags; in that situation, I use --nodes=1 --gres=gpu:1 for each srun. However, this solution does not work for the present question: the job hangs indefinitely.
In the MWE below, I have a job asking for 16 GPUs (--ntasks and --gpus-per-task). The job is composed of 28 tasks, which are launched with the srun command.
#!/usr/bin/env bash
#SBATCH --job-name=somename
#SBATCH --partition=gpu
#SBATCH --nodes=1-10
#SBATCH --ntasks=16
#SBATCH --gpus-per-task=1
for i in {1..28}
do
    srun echo $(hostname) $CUDA_VISIBLE_DEVICES &
done
wait
The output of this script should look like this:
nodeA 1
nodeR 2
...
However, this is what I got:
nodeA
nodeR
...
When you write
srun echo $(hostname) $CUDA_VISIBLE_DEVICES &
the expansion of the $CUDA_VISIBLE_DEVICES variable will be performed on the master node of the allocation (where the script is run) rather than on the node targeted by srun. You should escape the $:
srun echo $(hostname) \$CUDA_VISIBLE_DEVICES &
By the way, the --gpus-per-task option appeared in the sbatch manpage in version 19.05. If you use it with an earlier version, I am not sure how it behaves.
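A side note that goes beyond the original answer: echo launched by srun receives the escaped string literally and does not expand it itself, so if you want the actual values printed, one option is to let a shell on the allocated node do the expansion, for example:

for i in {1..28}
do
    # single quotes delay the expansion until the shell runs on the remote node
    srun --ntasks=1 bash -c 'echo "$(hostname) $CUDA_VISIBLE_DEVICES"' &
done
wait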

How do I dynamically assign the number of cores per node?

#!/bin/bash
#SBATCH --job-name=Parallel      # Job name
#SBATCH --output=slurmdiv.out    # Output file name
#SBATCH --error=slurmdiv.err     # Error file name
#SBATCH --partition=hadoop       # Queue
#SBATCH --nodes=1
#SBATCH --time=01:00:00          # Time limit
The above script does not work without the --ntasks-per-node directive. The number of cores per node depends on the queue being used. I would like to use the maximum number of cores per node without having to specify it ahead of time in the Slurm script. I'm using this to run an R script that uses detectCores() and mclapply.
You can try adding the #SBATCH --exclusive parameter to your submission script, so that Slurm allocates a full node for your job without your having to explicitly specify a number of tasks. Then you can use detectCores() in your script.
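A sketch of what the submission script could then look like; the R script name is a placeholder, the rest is taken from the question:

#!/bin/bash
#SBATCH --job-name=Parallel      # Job name
#SBATCH --output=slurmdiv.out    # Output file name
#SBATCH --error=slurmdiv.err     # Error file name
#SBATCH --partition=hadoop       # Queue
#SBATCH --nodes=1
#SBATCH --exclusive              # claim the whole node, whatever its core count
#SBATCH --time=01:00:00          # Time limit
Rscript myscript.R               # placeholder script; it can size mclapply with parallel::detectCores()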

Why does Slurm assign more tasks than I asked for when I sbatch multiple jobs with a .sh file?

I submit some cluster-mode Spark jobs, which run just fine when I submit them one by one with the sbatch specs below.
#!/bin/bash -l
#SBATCH -J Spark
#SBATCH --time=0-05:00:00 # 5 hour
#SBATCH --partition=batch
#SBATCH --qos qos-batch
###SBATCH -N $NODES
###SBATCH --ntasks-per-node=$NTASKS
### -c, --cpus-per-task=<ncpus>
### (multithreading) Request that ncpus be allocated per process
#SBATCH -c 7
#SBATCH --exclusive
#SBATCH --mem=0
#SBATCH --dependency=singleton
If I use a launcher to submit the same job with different node and task counts, the system gets confused and tries to assign tasks according to $SLURM_NTASKS, which gives 16, even though I am asking for, for example, only 1 node and 3 tasks.
#!/bin/bash -l
for n in {1..4}
do
    for t in {3..4}
    do
        echo "Running benchmark with ${n} nodes and ${t} tasks per node"
        sbatch -N ${n} --ntasks-per-node=${t} spark-teragen.sh
        sleep 5
        sbatch -N ${n} --ntasks-per-node=${t} spark-terasort.sh
        sleep 5
        sbatch -N ${n} --ntasks-per-node=${t} spark-teravalidate.sh
        sleep 5
    done
done
How can I fix the error below and prevent Slurm from assigning a number of tasks per node that exceeds the limit?
Error:
srun: Warning: can't honor --ntasks-per-node set to 3 which doesn't match the
requested tasks 16 with the number of requested nodes 1. Ignoring --ntasks-per-node.
srun: error: Unable to create step for job 233838: More processors requested than
permitted

Slurm: select nodes with a specified number of CPUs

I'm using Slurm on a cluster where individual partitions contain dissimilar nodes. Specifically, the nodes have varying numbers of CPUs. My code is a single-core application used for a parameter sweep, so I want to fully use, e.g., a 32-CPU node by sending it 32 jobs.
How can I select nodes (within a named partition) that have a specified number of CPUs?
I can check my partition configuration via
sinfo -e -p <partition_name> -o "%9P %3c %.5D %6t " -t idle,mix
PARTITION CPU NODES STATE
<partition_name> 16 63 mix
<partition_name> 32 164 mix
But if I use a submissions script like
[snip preamble]
#SBATCH --partition <partition_name> # resource to be used
#SBATCH --nodes 1 # Num nodes
#SBATCH -N 1 # Num cores per job
#SBATCH --cores-per-socket=32 # Cores per node
the slurm scheduler says
sbatch: error: Socket, core and/or thread specification can not be satisfied
PS. A minor correction: my code to get the partition info isn't the best. In case anyone looks up this question later, here is a better query (using X, Y for the socket and core counts) that helps identify the problem that damien's excellent answer solved:
sinfo -e -p <partition_name> -o "%9P %3c %.3D %6t %2X %2Y %N" -t idle,mix
To strictly answer your question: With
#SBATCH --cores-per-socket=32
you request 32 cores per socket, that is, per physical CPU. I guess those machines have two CPUs each, so you should request something like
#SBATCH --sockets-per-node=2
#SBATCH --cores-per-socket=16
Another way of requesting the same is to ask for
#SBATCH --nodes 1
#SBATCH --tasks-per-node 32
But please note that, if your cluster allows node sharing, what you do seems more suited for a job array:
#SBATCH --ntasks 1
#SBATCH --array 1-32
IDS=($(seq RUN_ID_FIRST RUN_ID_LAST))   # replace RUN_ID_FIRST/RUN_ID_LAST with the actual first/last run IDs
RUN_ID=${IDS[$SLURM_ARRAY_TASK_ID]}
matlab -nojvm -singleCompThread -r "try myscript(${RUN_ID}); catch me; disp(' *** error'); end; exit" > ./result_${RUN_ID}
This will launch 32 independent jobs, each taking care of running the Matlab script for one value of the parameter sweep.
To answer your additional question: if a 32-process job is scheduled on a 16-CPU node, the node will be overloaded, and depending on the containment solution set up by the administrators, your processes might impact other users' jobs and slow them down.
