Slurm job array

It is fairly easy to submit an array job, e.g.
#!/usr/bin/env bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --array 1-10
module load python
python script.py
This will run script.py 10 times using separate array jobs.
How do I tell slurm that e.g. only 2 nodes are used for all jobs in the array?
I am aware of how to limit the number of concurrently running jobs via something like 1-10%5, but this is not what I am looking for.
Moreover, --nodes seems to be applied to the individual jobs rather than to the array as a whole.
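For reference, the throttling syntax mentioned above limits concurrency only; it does not constrain which or how many nodes the array may use:
#SBATCH --array=1-10%5   # 10 array tasks, at most 5 running at a time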

Related

Slurm: Schedule Job Arrays to Minimal Number of Nodes

I am running Slurm 19.05.2 with cloud nodes only. I specified
SelectType = select/cons_tres
SelectTypeParameters = CR_CORE_MEMORY,CR_CORE_DEFAULT_DIST_BLOCK
to make sure that a node is fully utilised before a second node is allocated.
This seems to work well for jobs with many tasks: if I have 8 nodes with 16 cores each and I submit a job with 8 tasks, each requiring 2 cores, the job is scheduled onto a single node.
For example the script:
#!/bin/bash
#
#SBATCH --job-name=batch
#SBATCH --output=o_batch.%A.%a.txt
#
#SBATCH --ntasks=8
#SBATCH --time=10:00
#SBATCH --cpus-per-task 2
#SBATCH --mem-per-cpu=100
srun hostname
will output
node-010000
node-010000
node-010000
node-010000
node-010000
node-010000
node-010000
node-010000
If I specify a job array with --array=1-8 (and --ntasks=1), every job of the array is scheduled on a different node, even though a single node could satisfy the requirements of all of them:
#!/bin/bash
#
#SBATCH --job-name=array
#SBATCH --output=array.%A.%a.txt
#
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100
#SBATCH --array=1-8
srun hostname
will output
node-010000
node-010001
node-010002
node-010003
node-010004
node-010005
node-010006
node-010007
Is there a way of configuring Slurm to behave the same way for job arrays as it does for tasks?

How do I dynamically assign the number of cores per node?

#!/bin/bash
#SBATCH --job-name=Parallel      # Job name
#SBATCH --output=slurmdiv.out    # Output file name
#SBATCH --error=slurmdiv.err     # Error file name
#SBATCH --partition=hadoop       # Queue
#SBATCH --nodes=1
#SBATCH --time=01:00:00          # Time limit
The above script does not work without the --ntasks-per-node directive. The number of cores per node depends on the queue being used, and I would like to use the maximum number of cores per node without having to hard-code it in the Slurm script ahead of time. I'm using this to run an R script that uses detectCores() and mclapply.
You can try adding the #SBATCH --exclusive parameter to your submission script so that Slurm allocates a full node to your job, without the need to explicitly specify a number of tasks. Then you can use detectCores() in your R script.
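A minimal sketch of what such a submission script could look like (the R script name is a placeholder invented for illustration, not from the original question):
#!/bin/bash
#SBATCH --job-name=Parallel
#SBATCH --partition=hadoop       # queue from the question
#SBATCH --nodes=1
#SBATCH --exclusive              # allocate the whole node, whatever its core count
#SBATCH --time=01:00:00
# with the whole node allocated, the R script can safely use parallel::detectCores() workers
Rscript my_mclapply_script.R     # hypothetical script name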

Slurm can't run more than one sbatch task

I've installed Slurm on a 2-node cluster. Both nodes are compute nodes, one is the controller also. I am able to successfully run srun with multiple jobs at once. I am running GPU jobs and have confirmed I can get multiple jobs running on multiple GPUs with srun, up to the number of GPUs in the systems.
However, when I try running sbatch with the same test file, it will only run one batch job, and it only runs on the compute node which is also the controller. The others fail, with an ExitCode of 1:0 in the sacct summary. If I try forcing it to run on the compute node that's not the controller, it won't run and shows the 1:0 exit code. However, just using srun will run on any compute node.
I've made sure the /etc/slurm/slurm.conf files are correct with the specs of the machines. Here is the sbatch .job file I am using:
#!/bin/bash
#SBATCH --job-name=tf_test1
#SBATCH --output=/storage/test.out
#SBATCH --error=/storage/test.err
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2000
##SBATCH --mem=10gb
#SBATCH --gres=gpu:1
~/anaconda3/bin/python /storage/tf_test.py
Maybe there is some limitation with sbatch I don't know about?
sbatch creates a job allocation and launches what is called the 'batch step'.
If you aren't familiar with what a job step is, I recommend this page: https://slurm.schedmd.com/quickstart.html
The batch step runs the script passed to it from sbatch. The only way to launch additional job steps is to invoke srun inside the batch step. In your case, it would be
srun ~/anaconda3/bin/python /storage/tf_test.py
This will create a job step running tf_test.py for each task in the allocation. Note that while the command is the same as when you run srun directly, it detects that it is inside an allocation via environment variables set by sbatch. You can split the allocation into multiple job steps by running srun with flags like -n [num tasks] instead, e.g.:
#!/bin/bash
#SBATCH --ntasks=2
srun --ntasks=1 something.py
srun --ntasks=1 somethingelse.py
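If the two steps are meant to run at the same time within the allocation, a common pattern (not part of the original answer) is to background the srun calls and wait for them:
srun --ntasks=1 something.py &
srun --ntasks=1 somethingelse.py &
wait   # block until both job steps have finished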
I don't know if you're having any other problems because you didn't post any other error messages or logs.
If using srun on the second node works and using sbatch with the submission script you mention fails without any output written, the most probable reason would be that /storage does not exist, or is not writable by the user, on the second node.
The slurmd logs on the second node should be explicit about this. The default location is /var/log/slurm/slurmd.log, but check the output of scontrol show config | grep Log for definitive information.
Another probable cause that would lead to the same behaviour is that the user is not defined, or has a different UID, on the second node (but then srun would fail too).
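A couple of quick checks along those lines (a sketch only; replace <second-node> with the actual node name):
# is /storage writable by the user on the second node?
srun --nodelist=<second-node> touch /storage/.write_test
# does the user resolve to the same UID on both nodes?
id $USER
srun --nodelist=<second-node> id $USER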
@damienfrancois' answer was closest and maybe even correct. After making sure the /storage location was available on all nodes, things run with sbatch. The biggest issue was that the /storage location is shared via NFS, but it was read-only for the compute nodes. This had to be changed in /etc/exports to look more like:
/storage *(rw,sync,no_root_squash)
Before it was ro...
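As a side note (not part of the original answer), after editing /etc/exports the export table typically has to be reloaded on the NFS server before the clients see the change:
sudo exportfs -ra   # re-export everything listed in /etc/exports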
The job file I have that works is also a bit different. Here is the current .job file:
#!/bin/bash
#SBATCH -N 1 # nodes requested
#SBATCH --job-name=test
#SBATCH --output=/storage/test.out
#SBATCH --error=/storage/test.err
#SBATCH --time=2-00:00
#SBATCH --mem=36000
#SBATCH --qos=normal
#SBATCH --mail-type=ALL
#SBATCH --mail-user=$USER@nothing.com
#SBATCH --gres=gpu
srun ~/anaconda3/bin/python /storage/tf_test.py

SLURM: same script run on different but sequentially numbered files

The cluster I use just switched to SLURM and I'm trying to do something I think is very simple. I have a script I want to run on many files numbered sequentially, like:
python script.py file1.gz
python script.py file2.gz
python script.py file3.gz
I have some pieces, but can't figure out how to put them together. I think I need to use #SBATCH --array=0-29 to match the number of files, and $SLURM_ARRAY_TASK_ID is also involved.
#!/bin/bash -l
#SBATCH --ntasks=1
#SBATCH --time=24:00:00
#SBATCH --mem=4G
#SBATCH --array=0-29 ##my files go from file1 - file30
$SLURM_ARRAY_TASK_ID
I'm not sure how to incorporate SBATCH --array and ARRAY_TASK_ID to get script.py running on all files at once.
You almost have it:
#!/bin/bash -l
#SBATCH --ntasks=1
#SBATCH --time=24:00:00
#SBATCH --mem=4G
#SBATCH --array=1-30 ##my files go from file1 - file30
python script.py "file${SLURM_ARRAY_TASK_ID}.gz"
You'd better number the array elements to match the file numbering you already have, and then call the script with the file name built from that Slurm variable.
This job submission script queues an array of 30 jobs; each job has a time limit of 24 hours, uses up to 4 GiB of memory, and runs a single task on a single processor.
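If the files were not numbered sequentially, a common variant (a sketch, not part of the original answer; filelist.txt is a hypothetical file with one input file name per line) maps the array index to a line of a file list:
#!/bin/bash -l
#SBATCH --ntasks=1
#SBATCH --time=24:00:00
#SBATCH --mem=4G
#SBATCH --array=1-30
# pick the N-th line of the list, where N is the array task id
FILE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" filelist.txt)
python script.py "$FILE"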

Submitting an array of jobs on SLURM

I am trying to submit an array of jobs on SLURM but the sleep command doesn't work as expected. I would like to launch a job every 10 seconds. However, this code waits 10 seconds to launch the whole array of jobs. How should I modify the following bash file?
#!/usr/bin/env bash
# The name to show in queue lists for this job:
#SBATCH -J matlab.sh
# Number of desired cpus:
#SBATCH --cpus=1
#SBATCH --mem=8gb
# The time the job will be running:
#SBATCH --time=167:00:00
# To use GPUs you have to request them:
##SBATCH --gres=gpu:1
# If you need nodes with special features uncomment the desired constraint line:
##SBATCH --constraint=bigmem
#SBATCH --constraint=cal
##SBATCH --constraint=slim
# Set output and error files
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
# MAKE AN ARRAY JOB, SLURM_ARRAYID will take values from 1 to 100
#SARRAY --range=1-60
# To load some software (you can show the list with 'module avail'):
module load matlab
export from=400
export to=1000
export steps=60
mkdir temp_${SLURM_ARRAYID}
cd temp_${SLURM_ARRAYID}
# the program to execute with its parameters:
matlab < ../SS.m > output_temp_${SLURM_ARRAYID}.out
sleep 10
From the documentation:
A maximum number of simultaneously running tasks from the job array may be specified using a "%" separator. For example "--array=0-15%4" will limit the number of simultaneously running tasks from this job array to 4.
So if you want to submit a job array of 60 jobs but run only one job at a time, updating your submission script like this should do the trick:
#SBATCH --array=1-60%1
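For completeness, a sketch of how the relevant part of the script above could look with a standard sbatch job array (assuming plain Slurm job arrays, in which case the variable is SLURM_ARRAY_TASK_ID rather than SLURM_ARRAYID, and --cpus-per-task replaces the --cpus flag from the question):
#!/usr/bin/env bash
#SBATCH -J matlab.sh
#SBATCH --cpus-per-task=1
#SBATCH --mem=8gb
#SBATCH --time=167:00:00
#SBATCH --error=job.%A_%a.err    # %A = array master job id, %a = array index
#SBATCH --output=job.%A_%a.out
#SBATCH --array=1-60%1           # 60 tasks, at most 1 running at a time
module load matlab
mkdir temp_${SLURM_ARRAY_TASK_ID}
cd temp_${SLURM_ARRAY_TASK_ID}
matlab < ../SS.m > output_temp_${SLURM_ARRAY_TASK_ID}.out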
