I'm currently using Slurm in my project and am trying to run a very simple hello-world job. I want to redirect stdout and stderr to a specific file in a specific location, so I used the following command: sudo su -c 'sbatch /home/slurm/job.script --error=/home/slurm/job%j.out --output=/home/slurm/job%j.out' slurm. But the options are ignored completely: sbatch just tries (and fails, because it has no permission) to create the output file in the directory where the command was issued.
I'm using a Debian 10 Vagrant box, and my Slurm version is slurm-wlm 18.08.5-2 (output of sinfo -V).
Slurm job file:
#!/bin/sh
#SBATCH --time=1
srun -l /bin/hostname
srun -l /bin/pwd
srun -l echo "hello world"
Slurm conf file:
ClusterName=slurm_cluster # By default ClusterName=linux
ControlMachine=Kitsune
ControlAddr=172.16.0.20
#
SlurmUser=slurm
SlurmdUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/spool/slurm/ctld
SlurmdSpoolDir=/var/spool/slurm/d
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmdPidFile=/var/run/slurm/slurmd.pid
ProctrackType=proctrack/pgid
ReturnToService=0
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#
DebugFlags=NO_CONF_HASH
# LOGGING
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
JobCompType=jobcomp/none
#
# COMPUTE NODES
NodeName=worker1 NodeAddr=172.16.0.21 Port=6818 Procs=1 State=UNKNOWN
#NodeName=worker2 NodeAddr=172.16.0.22 Port=6818 Procs=1 State=UNKNOWN
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
Beware that writing
sbatch /home/slurm/job.script --error=/home/slurm/job%j.out --output=/home/slurm/job%j.out
passes --error and --output to job.script as script arguments rather than to sbatch; everything after the script name is forwarded to the script. Put the options before the script name instead:
sbatch --error=/home/slurm/job%j.out --output=/home/slurm/job%j.out /home/slurm/job.script
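Applied to the command from the question (same paths and sudo su wrapper), the corrected invocation would look like this:
sudo su -c 'sbatch --error=/home/slurm/job%j.out --output=/home/slurm/job%j.out /home/slurm/job.script' slurm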
Suppose I have an HPC cluster with one node (node_1) and I want to submit three jobs' bash scripts and run them on node_1 at the same time.
So far, when I submit a job to node_1, the node is kept busy until that job ends.
How can I run the jobs concurrently?
Do I need to provide any specific argument in the job's bash script?
Thanks
Update
Below is an example of a bash script I am using to submit a job to the HPC:
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=test
#SBATCH --nodelist=node_1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=8000
#SBATCH --output=1.out
#SBATCH --error=1.err
python /my/HPC/folder/script.py
Update
(base) [id#login_node ~]$ scontrol show node=node_1
NodeName=node_1 Arch=x86_64 CoresPerSocket=32
CPUAlloc=0 CPUTot=64 CPULoad=2.94
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=node_1 NodeHostName=node_1 Version=18.08
OS=Linux 4.20.0-1.el7.elrepo.x86_64 #1 SMP Sun Dec 23 20:11:51 EST 2018
RealMemory=128757 AllocMem=0 FreeMem=111815 Sockets=1 Boards=1
State=IDLE ThreadsPerCore=2 TmpDisk=945178 Weight=1 Owner=N/A MCS_label=N/A
Partitions=test
BootTime=2019-12-09T14:09:25 SlurmdStartTime=2020-02-18T03:45:14
CfgTRES=cpu=64,mem=128757M,billing=64
AllocTRES=
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
You need to change the consumable resource type from nodes to cores in Slurm.
Add this to your slurm.conf file:
SelectType=select/cons_res
SelectTypeParameters=CR_Core
SelectType: Controls whether CPU resources are allocated to jobs and job steps in units of whole nodes or as consumable resources (sockets, cores or threads).
SelectTypeParameters: Defines the consumable resource type and controls other aspects of CPU resource allocation by the select plugin.
(Reference: the slurm.conf man page.)
Also, the node description should allow for that:
NodeName=<somename> NodeAddr=<someaddress> CPUs=16 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=12005 State=UNKNOWN
See also this related question on Server Fault.
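As a sketch based on the scontrol show node output above (1 socket, 32 cores per socket, 2 threads per core, 128757 MB of real memory), the node line for node_1 might look like the following; verify the values with slurmd -C on the node before committing them:
NodeName=node_1 NodeAddr=node_1 CPUs=64 Sockets=1 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=128757 State=UNKNOWN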
I have SLURM set up on my desktop computer to do some testing and understand how it works before deploying it to a cluster.
The desktop computer is running Ubuntu 18.10 (Cosmic), the same OS the cluster nodes run. The SLURM version in use is 17.11.9.
I have tested some of SLURM's features, e.g. job arrays and how they deploy tasks.
However, I would like the different tasks sent out to each node or CPU in the cluster to communicate, so that I can collect their results (without disk I/O). For that reason, I have looked into how to manage that with, e.g., message queuing, MPI, or OpenMPI. (Any other implementation strategy, advice, or recommendation is much appreciated.)
I have tested MPI with a simple Python snippet that starts a communication between two processes, using mpi4py to handle the communication.
The snippet runs fine with the mpiexec command, but I cannot get it to work when running it via SLURM with the sbatch command. SLURM is configured with OpenMPI, and ompi_info states that SLURM is supported.
OpenMPI version 3.1.2-6 (from dpkg -l | grep mpi)
SLURM_VERSION 17.11.9
Ubuntu 18.10 (Cosmic)
MPI4PY version 3.0.1. (from pip list)
This is the Python3.6 code snippet:
$cat mpi_test.py
from mpi4py import MPI

if __name__ == '__main__':
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    if rank == 0:
        data = {'param1': 1, 'param2': 2, 'param3': 3}
        destinationNode = 1
        print('Im', rank, 'sending to ', destinationNode)
        comm.send(data, dest=destinationNode, tag=11)
    elif rank != 0:
        sourceNode = 0
        dataRx = comm.recv(source=sourceNode, tag=11)
        print('Im', rank, 'receiving from ', sourceNode)
        for key in dataRx.keys():
            print('Data received: ', str(dataRx[key]))
The python.mpi.sbatch file used in the sbatch call is:
$cat python.mpi.sbatch
#!/bin/bash -l
#SBATCH --job-name=mpiSimpleExample
#SBATCH --nodes=1
#SBATCH --error=slurm-err-%j.err
#SBATCH --export=all
#SBATCH --time=0-00:05:00
#SBATCH --partition=debug
srun -N 1 mpiexec -n 2 python3 mpi_test.py
#mpiexec -n 2 python3 mpi_test.py
exit 0
Running "sbatch python.mpi.sbatch" with this setup yields the following output:
$sbatch python.mpi.sbatch
$cat slurm-err-104.err
----------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
  python3
Either request fewer slots for your application, or make more slots
available for use.
--------------------------------------------------------------------
Modifying python.mpi.sbatch to instead use:
"srun -n 1 mpiexec -n 1 python3 mpi_test.py" yields the error:
$cat slurm-err-105.err
Traceback (most recent call last):
File "mpi_test.py", line 18, in <module>
comm.send(data, dest=destinationNode, tag=11)
File "mpi4py/MPI/Comm.pyx", line 1156, in mpi4py.MPI.Comm.send
File "mpi4py/MPI/msgpickle.pxi", line 174, in mpi4py.MPI.PyMPI_send
mpi4py.MPI.Exception: MPI_ERR_RANK: invalid rank
---------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero
status, thus causing the job to be terminated. The first process to do
so was:
Process name: [[44366,1],0]
Exit code: 1
---------------------------------------------------------------------
This is expected, since only one process is started, so rank 1 (the destination) does not exist.
Running mpirun hostname yields four instances of the machine's hostname, so there should be four slots available on this machine.
I can successfully run the Python 3.6 snippet with up to four processes (after modifying mpi_test.py) using the command "mpiexec -n 4 python3 mpi_test.py".
Any help is much appreciated.
slurm.conf-file:
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=desktop-comp
#ControlAddr=
#BackupController=
#BackupAddr=
#
AuthType=auth/munge
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobCheckpointDir=/var/slurm/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=1
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=5000
#MaxStepCount=40000
#MaxTasksPerNode=128
MpiDefault=openmpi
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
#ProctrackType=proctrack/cgroup
#Prolog=
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/lib/slurm-llnl/slurmd
SwitchType=switch/none
#TaskEpilog=
#TaskPlugin=task/affinity
#TaskPluginParam=Sched
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core
#
#
# JOB PRIORITY
#PriorityFlags=
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
#AccountingStorageUser=
AccountingStoreJobComment=YES
ClusterName=cluster
#DebugFlags=
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
#JobContainerType=job_container/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
#SlurmctldLogFile=
SlurmdDebug=3
#SlurmdLogFile=
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=desktop-comp CPUs=1 State=UNKNOWN
PartitionName=debug Nodes=desktop-comp Default=YES MaxTime=INFINITE State=UP
In your updated question, your slurm.conf contains the line
NodeName=desktop-comp CPUs=1 State=UNKNOWN
This tells Slurm that only one CPU is available on your node. Try running slurmd -C to see what Slurm discovers about your computer, and copy the CPUs, CoresPerSocket, etc. values into your slurm.conf.
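For illustration only (the numbers below are made up, not the actual hardware of desktop-comp, and the exact field names vary slightly between Slurm versions), slurmd -C prints a node definition you can copy almost verbatim:
$ slurmd -C
NodeName=desktop-comp CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=2 RealMemory=15885
UpTime=0-02:37:45
The corresponding slurm.conf node line would then become, for example:
NodeName=desktop-comp CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=2 RealMemory=15885 State=UNKNOWN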
On newly installed and configured compute nodes in our small cluster, I am unable to submit Slurm jobs using a batch script and the sbatch command. After submitting, the requested node changes to the 'drained' state. However, I can run the same command interactively using srun.
Works:
srun -p debug --ntasks=1 --nodes=1 --job-name=test --nodelist=node6 -l echo 'test'
Does not work:
sbatch test.slurm
with test.slurm:
#!/bin/sh
#SBATCH --job-name=test
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --nodelist=node6
#SBATCH --partition=debug
echo 'test'
It gives me:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug up 1:00:00 1 drain node6
and I have to resume the node.
All nodes run Debian 9.8 and use InfiniBand and NIS.
I have made sure that all nodes have the same configuration, package versions, and running daemons, so I don't see what I am missing.
It seems the issue was connected to the presence of NIS. I just needed to add this line to the end of /etc/passwd:
+::::::
and restart slurmd on the node:
/etc/init.d/slurmd restart
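A quick check (a sketch; <username> is a placeholder for whichever account submits the jobs) to confirm that the node now resolves NIS users, followed by resuming the drained node:
getent passwd <username>                      # should print the NIS entry once +:::::: is in place
scontrol update NodeName=node6 State=RESUME   # clear the drained state shown by sinfo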
I submit some cluster-mode Spark jobs, which run just fine when I submit them one by one with the sbatch specs below.
#!/bin/bash -l
#SBATCH -J Spark
#SBATCH --time=0-05:00:00 # 5 hour
#SBATCH --partition=batch
#SBATCH --qos qos-batch
###SBATCH -N $NODES
###SBATCH --ntasks-per-node=$NTASKS
### -c, --cpus-per-task=<ncpus>
### (multithreading) Request that ncpus be allocated per process
#SBATCH -c 7
#SBATCH --exclusive
#SBATCH --mem=0
#SBATCH --dependency=singleton
If I use a launcher to submit the same job with different node and task counts, the system gets confused and tries to assign tasks according to $SLURM_NTASKS, which gives 16, even though I ask for, e.g., only 1 node and 3 tasks per node (a small diagnostic sketch follows the launcher script below).
#!/bin/bash -l
for n in {1..4}
do
for t in {3..4}
do
echo "Running benchmark with ${n} nodes and ${t} tasks per node"
sbatch -N ${n} --ntasks-per-node=${t} spark-teragen.sh
sleep 5
sbatch -N ${n} --ntasks-per-node=${t} spark-terasort.sh
sleep 5
sbatch -N ${n} --ntasks-per-node=${t} spark-teravalidate.sh
sleep 5
done
done
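To see what allocation each submission actually received, a minimal diagnostic sketch that could go at the top of the hypothetical spark-teragen.sh (these are standard Slurm output environment variables) is:
echo "nodes=${SLURM_JOB_NUM_NODES} ntasks=${SLURM_NTASKS} ntasks-per-node=${SLURM_NTASKS_PER_NODE:-unset}"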
How can I fix the error below by preventing Slurm from assigning a number of tasks per node that exceeds the limit?
Error:
srun: Warning: can't honor --ntasks-per-node set to 3 which doesn't match the
requested tasks 16 with the number of requested nodes 1. Ignoring --ntasks-per-node.
srun: error: Unable to create step for job 233838: More processors requested than
permitted