I have a job script with the following PBS specifications at the beginning:
#PBS -t 0-99
#PBS -l nodes=1:ppn=1
The first line says that this is a job array, with jobs indexed from 0 to 99.
I want each individual indexed job to use only a single node and a single core per node, hence my second PBS line. But I am worried that TORQUE qsub will interpret the second line as saying that the whole job array should run sequentially on a single core.
How does TORQUE qsub interpret the PBS second line?
It interprets it as 100 jobs that should each use 1 execution slot on one node. For more information, please look at the qsub documentation and look for the details on the -t switch.
Related
I often run many jobs on slurm. Some finish faster than others. However, it is always hard to keep track which job is which. Can I give custom job names on slurm? If so what is the command on the batch script? Would that show up when I do squeue --me?
The parameter is --job-name (or -J), for instance:
#SBATCH --job-name=exp1_run2
The squeue output will list exp1_run2 for the corresponding job ID under column NAME.
What is the best way to process lots of files in parallel via Slurm?
I have a lot of files (let's say 10000) in a folder (Each files get 10 secs or so to process). I want to determine sbatch job array size as 1000, naturally. (#SBATCH --array=1-10000%100) But it seems I can't determine more than some numbers(probably 1k). How you handle job array numbers? It seems to me because of my process don't take too much time, I think i should determine one job NOT for one file but for multiple files, right?
Thank you
If the process time is 10 second you should consider packing the tasks in a single job, both because such short jobs take longer to schedule than to run and because there is a limit on the number of jobs in an array.
Your submission script could look like this:
#!/bin/bash
#SBATCH --ntasks=16 # or any other number depending on the size of the cluster and the maximum allowed wall time
#SBATCH --mem-per-cpu=...
#SBATCH --time=... # based on the number of files and number of tasks
find . -name file_pattern -print0 | xargs -I{} -0 -P $SLURM_NTASKS srun -n1 -c1 --exclusive name_of_the_program {}
Make sure to replace all the ... and file_pattern and name_of_the_program with appropriate values.
The script will look for all files matching file_pattern in the submission directory and run the name_of_the_program program on it, limiting the number of concurrent instantes to the number of CPUs (more precisely number of tasks) requested. Note the use of --exclusive here which is specific for this use case and is deprecated with --exact in recent Slurm versions.
Let's say I want to submit a slurm job just assigning the total amount of tasks (--ntasks=someNumber), without specifying the number of nodes and the tasks per node. Is there a way to know within the launched slurm script how many cores are assigned by slurm for each of the reserved nodes? I need to know this info to properly create a machinefile for the program I'm launching, that must be structured like this:
node02:7
node06:14
node09:3
Once the job is launched, the only way I figured out to see what cores have been allocated on the nodes is using the command:
scontrol show jobid -dd
In its output the abovementioned info is stored (together with plenty of other details).
Is there a better way to get this info?
The way the srun documentation illustrates creating a machine file is by running srun hostname. To get the output you want you could run
srun hostname -s | sort | uniq -c | awk '{print $2":"$1}' > $MACHINEFILE
You should check the documentation of your program to see if it accepts a machine file with repetitions rather than a suffix count. If so you can simplify the command as
srun hostname -s > $MACHINEFILE
And of course the first step is actually to make sure you indeed need a machine file in the first place as many parallel programs/libraries have Slurm support and can gather the needed information from the environment variables setup by Slurm upon job start.
Using torque user can specify slot limit when submitting the job array by using the %, e.g.: qsub job.sh -t 1-20%5 will create a job array with 20 jobs, but with only 5 running simultaneously.
Currently I work with PBS Professional, but unfortunately, as far as I can see, option % is not supported. How can I achieve similar behavior as % in torque as simple as possible?
How do the terms "job", "task", and "step" as used in the SLURM docs relate to each other?
AFAICT, a job may consist of multiple tasks, and also it make consist of multiple steps, but, assuming this is true, it's still not clear to me how tasks and steps relate.
It would be helpful to see an example showing the full complexity of jobs/tasks/steps.
A job consists in one or more steps, each consisting in one or more tasks each using one or more CPU.
Jobs are typically created with the sbatch command, steps are created with the srun command, tasks are requested, at the job level with --ntasks or --ntasks-per-node, or at the step level with --ntasks. CPUs are requested per task with --cpus-per-task. Note that jobs submitted with sbatch have one implicit step; the Bash script itself.
Assume the hypothetical job:
#SBATCH --nodes 8
#SBATCH --tasks-per-node 8
# The job requests 64 CPUs, on 8 nodes.
# First step, with a sub-allocation of 8 tasks (one per node) to create a tmp dir.
# No need for more than one task per node, but it has to run on every node
srun --nodes 8 --ntasks 8 mkdir -p /tmp/$USER/$SLURM_JOBID
# Second step with the full allocation (64 tasks) to run an MPI
# program on some data to produce some output.
srun process.mpi <input.dat >output.txt
# Third step with a sub allocation of 48 tasks (because for instance
# that program does not scale as well) to post-process the output and
# extract meaningful information
srun --ntasks 48 --nodes 6 --exclusive postprocess.mpi <output.txt >result.txt &
# Fourth step with a sub-allocation on a single node
# to compress the raw output. This step runs at the same time as
# the previous one thanks to the ampersand `&`
srun --ntasks 12 --nodes 1 --exclusive compress.mpi output.txt &
wait
Four steps were created and so the accounting information for that job will have 5 lines; one per step plus one for the Bash script itself.