The cluster I use just switched to SLURM and I'm trying to do something I think is very simple. I have a script I want to run on many files numbered sequentially, like:
python script.py file1.gz
python script.py file2.gz
python script.py file3.gz
I have some pieces, but can't figure out how to put them together to run. I think I need to use #SBATCH --array=0-29 to call the number of files, and $SLURM_ARRAY_TASK_ID is also involved.
#!/bin/bash -l
#SBATCH --ntasks=1
#SBATCH --time=24:00:00
#SBATCH --mem=4G
#SBATCH --array=0-29 ##my files go from file1 - file30
$SLURM_ARRAY_TASK_ID
I'm not sure how to incorporate SBATCH --array and ARRAY_TASK_ID to get script.py running on all files at once.
You almost have it:
#!/bin/bash -l
#SBATCH --ntasks=1
#SBATCH --time=24:00:00
#SBATCH --mem=4G
#SBATCH --array=1-30 ##my files go from file1 - file30
python script.py "file${SLURM_ARRAY_TASK_ID}.gz"
It is easier to number the array elements the same way your files are numbered, so that the index can be used directly. Then call the script with the file name built from that SLURM variable.
This submission script queues an array of 30 jobs. Each job has a time limit of 24 hours, can use up to 4 GiB of memory, and runs a single task (on one processor).
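If you would rather keep the zero-based range from the original attempt (--array=0-29), the index can be shifted inside the script instead. The following is only a sketch: the variable names task_id and input_file are made up for illustration, and the fallback value of 0 is there just so the script can be dry-run outside SLURM.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=24:00:00
#SBATCH --mem=4G
#SBATCH --array=0-29   # zero-based indices, as in the original attempt

# Outside SLURM the variable is unset; default to 0 so the script can be dry-run.
task_id="${SLURM_ARRAY_TASK_ID:-0}"

# Shift the zero-based index to the one-based file numbering (file1 .. file30).
input_file="file$((task_id + 1)).gz"

# Print the command instead of running it, so the mapping can be checked first.
echo "python script.py ${input_file}"
```

Once the printed command looks right, drop the echo and the quotes around the command so that python is actually run.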
I used to have access to a slurm server where I would submit the following batch job:
#!/bin/bash
#SBATCH --job-name=sdmodel
#SBATCH --output=logs/out/%a
#SBATCH --error=logs/err/%a
#SBATCH --nodes=1
#SBATCH --partition=common,scavenger
#SBATCH -c 10
#SBATCH --mem-per-cpu=12GB
#SBATCH --array=1-236
module load Matlab/R2021a
matlab -nodisplay -r "run('main.m'); exit"
The new server is plain Linux (no SLURM), so the sbatch command does not work. Is there any way to do something similar?
If the "new server" is the frontend for a cluster you want to use that runs a different batch scheduling system (there are several alternatives to SLURM out there), then you'll need to consult the documentation or sysadmin to identify the new batch scheduling system and then read its documentation.
If the new server is just a single interactive time-shared Linux server (as opposed to a batch-scheduled cluster), then you can probably execute your same script unmodified directly from the command line. One benefit of the #SBATCH directive format is that the directives are just comments to bash and are ignored when the shell executes the script interactively.
If your question is actually asking how to run your script in the background and capture the output into files (in a manner similar to execution of your script under SLURM), you could try a command like the following (assuming your script above is named myscript.sh):
$ mkdir -p logs/{out,err}
$ id=$(date +%Y-%m-%d_%H:%M:%S) ; echo "Running $id" ; nohup bash myscript.sh >"logs/out/$id" 2>"logs/err/$id" &
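Note that a single run like this does not reproduce the --array=1-236 part of the original job. If the script relies on SLURM_ARRAY_TASK_ID (an assumption, since main.m is not shown), a plain loop can set that variable by hand and write one log pair per index, mirroring the %a pattern. This is a rough sketch: the stand-in myscript.sh, the temporary directory, and the short range 1-4 are placeholders for illustration only.

```shell
#!/bin/bash
# Work in a scratch directory so the demo does not touch real files.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Stand-in for the real job script (hypothetical content).
cat > myscript.sh <<'EOF'
#!/bin/bash
echo "processing index ${SLURM_ARRAY_TASK_ID}"
EOF

mkdir -p logs/out logs/err

# Emulate the array: one run per index, each with its own log files.
# The original job used 1-236; 1-4 keeps the demo short.
for i in $(seq 1 4); do
  SLURM_ARRAY_TASK_ID="$i" bash myscript.sh > "logs/out/$i" 2> "logs/err/$i"
done
```

The runs execute one after another here, which is the safe default on a shared machine where each run already uses 10 cores.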
It is fairly easy to submit an array job, e.g.
#!/usr/bin/env bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --array 1-10
module load python
python script.py
This will run script.py 10 times using separate array jobs.
How do I tell slurm that e.g. only 2 nodes are used for all jobs in the array?
I am aware of how to limit the number of concurrently running jobs via something like 1-10%5, but this is not what I am looking for.
Moreover, --nodes seems to be applied to the individual jobs.
I was wondering if I could ask something about running slurm jobs in parallel. (Please note that I am new to slurm and linux and only started using them 2 days ago...)
Following the instructions at https://hpc.nmsu.edu/discovery/slurm/serial-parallel-jobs/, I wrote the following bash script:
#!/bin/bash
#SBATCH --job-name fmriGLM # to give each job a different name
#SBATCH --nodes=1
#SBATCH -t 16:00:00 # Time for running job
#SBATCH -o /scratch/connectome/dyhan316/fmri_preprocessing/FINAL_loop_over_all/output_fmri_glm.o%j #%j : the job id [>
#SBATCH -e /scratch/connectome/dyhan316/fmri_preprocessing/FINAL_loop_over_all/error_fmri_glm.e%j
pwd; hostname; date
#SBATCH --ntasks=30
#SBATCH --mem-per-cpu=3000MB
#SBATCH --cpus-per-task=1
for num in {0..29}
do
srun --ntasks=1 python FINAL_ARGPARSE_RUN.py --n_division 30 --start_num ${num} &
done
wait
Then I ran sbatch as follows: sbatch test_bash
However, when I view the outputs, it is apparent that only one of the sruns in the bash script is being executed... Could anyone tell me where I went wrong and how I can fix it?
**update: when I look at the error file I get the following: srun: Job 43969 step creation temporarily disabled, retrying. I searched the internet and it says that this can be caused by not specifying the memory, so that there is not enough memory left for the second job step.. but I thought that I had already specified the memory with --mem-per-cpu=3000MB?
**update: I have tried changing the code as suggested here: Why are my slurm job steps not launching in parallel?, but it still didn't work
**potentially pertinent information: our node has about 96 cores, which seems odd compared to tutorials that say one node has something like 4 cores
Thank you!!
Try adding --exclusive to the srun command line:
srun --exclusive --ntasks=1 python FINAL_ARGPARSE_RUN.py --n_division 30 --start_num ${num} &
This will instruct srun to use a sub-allocation and work as you intended.
Note that the --exclusive option has a different meaning in this context than if used with sbatch.
Note also that different versions of Slurm have a distinct canonical way of doing this, but using --exclusive should work across most versions.
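For reference, here is how the whole loop might look with that change, shrunk to four steps. Treat it as a sketch under assumptions: the bash -c stand-in replaces the real python call, outdir and launcher are names invented for this example, and the fallback branch exists only so the script can be tried outside an allocation. As far as I know, newer Slurm releases (20.11 and later) also offer srun --exact for this purpose, so check man srun on your cluster.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=3000MB

outdir="$(mktemp -d)"   # hypothetical log directory for this demo

# Inside an allocation, launch each command as its own job step.
# Outside SLURM (for a dry run) fall back to running the command directly.
if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
  # Assumption: --exclusive is accepted here; newer releases may prefer --exact.
  launcher=(srun --exclusive --ntasks=1)
else
  launcher=()
fi

for num in 0 1 2 3; do
  # Stand-in for: python FINAL_ARGPARSE_RUN.py --n_division 4 --start_num "$num"
  "${launcher[@]}" bash -c 'echo "chunk $1 done"' _ "$num" > "$outdir/chunk_$num.log" &
done

# Block until every background step has finished.
wait
```

The trailing & puts each step in the background, and the final wait keeps the batch script alive until all of them have returned.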
Even though you have solved your problem, which turned out to be something else, and you had already specified --mem-per-cpu=3000MB in your sbatch script, I would like to add that in my case my Slurm setup doesn't allow --mem-per-cpu in sbatch, only --mem. As a result, the first srun still allocates all the memory and blocks the subsequent steps. The key for me was to specify --mem-per-cpu (or --mem) in the srun command itself.
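One more thing that may be worth checking in the script from the question: sbatch stops reading #SBATCH directives at the first executable command, and there the --ntasks, --mem-per-cpu and --cpus-per-task lines come after pwd; hostname; date. Whether that was the actual cause is not stated in this thread. The helper below is a hypothetical sketch (late_sbatch_lines is a made-up name, not a Slurm tool) that lists directives sitting too late in a script.

```shell
#!/bin/bash
# Print every #SBATCH line that comes after the first executable command.
late_sbatch_lines() {
  awk '
    /^[[:space:]]*$/ { next }            # blank lines do not end the header
    /^#SBATCH/ { if (seen_command) print NR ": " $0; next }
    /^#/ { next }                        # shebang and ordinary comments
    { seen_command = 1 }                 # anything else counts as a command
  ' "$1"
}

# Demo on a shortened copy of the script from the question.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
pwd; hostname; date
#SBATCH --ntasks=30
#SBATCH --mem-per-cpu=3000MB
EOF

late="$(late_sbatch_lines "$demo")"
echo "$late"
rm -f "$demo"
```

If the helper prints anything for your own script, moving those lines above the first command is the usual fix.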
I've installed Slurm on a 2-node cluster. Both nodes are compute nodes, one is the controller also. I am able to successfully run srun with multiple jobs at once. I am running GPU jobs and have confirmed I can get multiple jobs running on multiple GPUs with srun, up to the number of GPUs in the systems.
However, when I try running sbatch with the same test file, it will only run one batch job, and it only runs on the compute node which is also the controller. The others fail, with an ExitCode of 1:0 in the sacct summary. If I try forcing it to run on the compute node that's not the controller, it won't run and shows the 1:0 exit code. However, just using srun will run on any compute node.
I've made sure the /etc/slurm/slurm.conf files are correct with the specs of the machines. Here is the sbatch .job file I am using:
#!/bin/bash
#SBATCH --job-name=tf_test1
#SBATCH --output=/storage/test.out
#SBATCH --error=/storage/test.err
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2000
##SBATCH --mem=10gb
#SBATCH --gres=gpu:1
~/anaconda3/bin/python /storage/tf_test.py
Maybe there is some limitation with sbatch I don't know about?
sbatch creates a job allocation and launches what is called the 'batch step'.
If you aren't familiar with what a job step is, I recommend this page: https://slurm.schedmd.com/quickstart.html
The batch step runs the script passed to it from sbatch. The only way to launch additional job steps is to invoke srun inside the batch step. In your case, it would be
srun ~/anaconda3/bin/python /storage/tf_test.py
This will create a job step running tf_test.py on each task in the allocation. Note that while the command is the same as when you run srun directly, srun detects that it is inside an allocation via environment variables set by sbatch. You can split the allocation into multiple job steps by running srun with flags like -n[num tasks] instead, i.e.
#!/bin/bash
#SBATCH --ntasks=2
srun --ntasks=1 something.py
srun --ntasks=1 somethingelse.py
I don't know if you're having any other problems because you didn't post any other error messages or logs.
If using srun on the second node works and using sbatch with the submission script you mention fails without any output written, the most probable reason would be that /storage does not exist, or is not writable by the user, on the second node.
The slurmd logs on the second node should be explicit about this. The default location is /var/log/slurm/slurmd.log, but check the output of scontrol show config | grep Log for definitive information.
Another probable cause that would lead to the same behaviour is that the user is not defined, or has a different UID, on the second node (but then srun would fail too).
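A quick way to test the first hypothesis is to try creating a file in the directory from every node. The function below is a minimal sketch; check_writable is a name made up for this example, and the messages are arbitrary.

```shell
#!/bin/bash
# Report whether the given directory exists and accepts new files on this host.
check_writable() {
  local dir="$1"
  local probe
  if [ ! -d "$dir" ]; then
    echo "${HOSTNAME:-unknown}: $dir does not exist"
    return 1
  fi
  # Creating (and removing) a temporary file is the actual write test.
  if probe="$(mktemp "$dir/.write_probe.XXXXXX" 2>/dev/null)"; then
    rm -f "$probe"
    echo "${HOSTNAME:-unknown}: $dir is writable"
    return 0
  fi
  echo "${HOSTNAME:-unknown}: $dir is not writable"
  return 1
}

# Run the check only when a directory is passed as an argument.
if [ "$#" -gt 0 ]; then
  check_writable "$1"
fi
```

Saved as, say, check_storage.sh (a hypothetical file name) in a location all nodes can read, it could be run with something like srun -N 2 --ntasks-per-node=1 bash check_storage.sh /storage to get one line per node.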
@damienfrancois's answer was closest and maybe even correct. After making sure the /storage location was available on all nodes, things run with sbatch. The biggest issue was that the /storage location is shared via NFS, but it was read-only for the compute nodes. This had to be changed in /etc/exports to look more like:
/storage *(rw,sync,no_root_squash)
Before it was ro...
The job file I have that works is also a bit different. Here is the current .job file:
#!/bin/bash
#SBATCH -N 1 # nodes requested
#SBATCH --job-name=test
#SBATCH --output=/storage/test.out
#SBATCH --error=/storage/test.err
#SBATCH --time=2-00:00
#SBATCH --mem=36000
#SBATCH --qos=normal
#SBATCH --mail-type=ALL
#SBATCH --mail-user=$USER@nothing.com
#SBATCH --gres=gpu
srun ~/anaconda3/bin/python /storage/tf_test.py
I am new to parallel computing and cannot understand how to use PBS systems. I have successfully installed SLURM and set up processing nodes, but I cannot figure out how to distribute a task between multiple nodes.
There are a lot of simple examples, but they just run simple "Hello World" programs and that's all.
Consider the following example, I've found on the internet.
#!/bin/bash
#SBATCH -N 4
#SBATCH -c 1
#SBATCH --time=0-00:15:00 # 15 minutes
#SBATCH --job-name="just_a_test"
module load python
python --version
A simple script that prints the Python version.
When I run it using sbatch python.slurm, the result is saved only on the first node even if I set the number to 4. But srun -N4 /bin/hostname works fine on the other hand.
But this is not the main question.
I cannot understand how I have to write my parallel algorithm.
Take any example of a parallel algorithm, like array sorting, matrix multiplication or whatever.
These are the steps used, for example, in Hadoop or in a plain multithreaded environment:
Get input from a source.
Divide the input into chunks; the number of chunks should be related to the node count.
Send these chunks to each processing node/thread.
Wait for all threads to complete.
Gather the processed information and show it to the user after merging.
How can I do the same using SLURM or any PBS?
#!/bin/bash
#SBATCH -N 4
#SBATCH -c 1
#SBATCH --time=0-00:15:00 # 15 minutes
#SBATCH --job-name="just_a_test"
# what do I have to write here?
Please explain this or give a good article to read about, because I haven't found any.
Thanks
The most basic way to do this is to use pbsdsh:
pbsdsh hostname
will make the hostname command execute once for each execution slot (core or thread) in your job. I will also point out that you'll need to translate your #SBATCH to their #PBS equivalents.
The more universal way to do this is through MPI implementations.
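Under SLURM itself, srun plays roughly the role that pbsdsh plays under PBS, and the five steps from the question can be sketched in plain bash as a small parallel sort. Everything here is illustrative rather than a definitive implementation: the sample numbers, worker.sh, and the SCRATCH_DIR variable are invented for the example, and the working directory is assumed to sit on a filesystem that all nodes share. Outside an allocation the script falls back to background processes so it can be tried on any machine.

```shell
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --time=0-00:15:00

n_chunks=4
# Assumption: SCRATCH_DIR (hypothetical) points at storage shared by all nodes.
workdir="$(mktemp -d "${SCRATCH_DIR:-/tmp}/psort.XXXXXX")"

# 1. Get input from a source (sample data for the demo).
printf '%s\n' 9 3 7 1 8 2 6 4 5 10 12 11 > "$workdir/input.txt"

# 2. Divide the input into chunks, round-robin, one chunk per task.
awk -v n="$n_chunks" -v dir="$workdir" \
  '{ print > (dir "/chunk_" ((NR - 1) % n) ".txt") }' "$workdir/input.txt"

# Worker: each task sorts the chunk whose number matches its rank.
cat > "$workdir/worker.sh" <<'EOF'
#!/bin/bash
dir="$1"
sort -n -o "$dir/sorted_${SLURM_PROCID}.txt" "$dir/chunk_${SLURM_PROCID}.txt"
EOF

# 3. and 4. Send the chunks to the tasks and wait for all of them.
if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
  # One srun starts n_chunks tasks and returns when all have finished.
  srun --ntasks="$n_chunks" bash "$workdir/worker.sh" "$workdir"
else
  # Dry run without SLURM: emulate the ranks with background processes.
  for rank in $(seq 0 $((n_chunks - 1))); do
    SLURM_PROCID="$rank" bash "$workdir/worker.sh" "$workdir" &
  done
  wait
fi

# 5. Gather the sorted chunks and merge them into the final result.
sort -n -m "$workdir"/sorted_*.txt > "$workdir/result.txt"
cat "$workdir/result.txt"
```

Passing data through files on shared storage is the simplest option for a shell script; once tasks need to exchange data while they run, an MPI implementation is the more natural tool.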