slurm sbatch --output and --error flags ignored

I'm currently using Slurm in my project and am trying to run a very simple hello world job. I want to redirect stdout and stderr to a specific file in a specific location, so I used the following command: sudo su -c 'sbatch /home/slurm/job.script --error=/home/slurm/job%j.out --output=/home/slurm/job%j.out' slurm. But the flags are ignored completely: sbatch just tries (and fails, because it has no permission) to create a file in the directory where the command was issued.
I'm using a Debian 10 Vagrant box, and my Slurm version is slurm-wlm 18.08.5-2 (output of sinfo -V).
slurm job file:
#!/bin/sh
#SBATCH --time=1
srun -l /bin/hostname
srun -l /bin/pwd
srun -l echo "hello world"
slurm conf file:
ClusterName=slurm_cluster # By default ClusterName=linux
ControlMachine=Kitsune
ControlAddr=172.16.0.20
#
SlurmUser=slurm
SlurmdUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/spool/slurm/ctld
SlurmdSpoolDir=/var/spool/slurm/d
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmdPidFile=/var/run/slurm/slurmd.pid
ProctrackType=proctrack/pgid
ReturnToService=0
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#
DebugFlags=NO_CONF_HASH
# LOGGING
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
JobCompType=jobcomp/none
#
# COMPUTE NODES
NodeName=worker1 NodeAddr=172.16.0.21 Port=6818 Procs=1 State=UNKNOWN
#NodeName=worker2 NodeAddr=172.16.0.22 Port=6818 Procs=1 State=UNKNOWN
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP

Beware that writing
sbatch /home/slurm/job.script --error=/home/slurm/job%j.out --output=/home/slurm/job%j.out
makes sbatch treat --error and --output as arguments to job.script rather than as its own options, because everything after the script name is passed to the script. Try
sbatch --error=/home/slurm/job%j.out --output=/home/slurm/job%j.out /home/slurm/job.script
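An alternative is to set the paths inside the job script itself with #SBATCH directives, which sidesteps the argument-ordering issue; a minimal sketch, reusing the script and paths from the question:
#!/bin/sh
#SBATCH --time=1
#SBATCH --output=/home/slurm/job%j.out
#SBATCH --error=/home/slurm/job%j.out
srun -l /bin/hostname
srun -l /bin/pwd
srun -l echo "hello world"
The script can then be submitted with a plain sbatch /home/slurm/job.script, and the %j placeholder is still expanded to the job ID.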

Related

Slurm Requested node configuration is not available

Hello everyone. I'm trying to set up a new HPC cluster. I made an account, added users, and I'm using a partition, but whenever I run a job it gives me an error that the requested node configuration is not available. I checked my slurm.conf but it seems good to me; I need some help.
The error: Batch job submission failed: Requested node configuration is not available
#
# See the slurm.conf man page for more information.
#
SlurmUser=slurm
#SlurmdUser=root
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
SlurmdSpoolDir=/cm/local/apps/slurm/var/spool
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
#ProctrackType=proctrack/pgid
ProctrackType=proctrack/cgroup
#PluginDir=
#FirstJobId=
ReturnToService=2
#MaxJobCount=
#PlugStackConfig=
#PropagatePrioProcess=
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#SrunProlog=
#SrunEpilog=
#TaskProlog=
#TaskEpilog=
TaskPlugin=task/cgroup
#TrackWCKey=no
#TreeWidth=50
#TmpFs=
#UsePAM=
#
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#
# SCHEDULING
#SchedulerAuth=
#SchedulerPort=
#SchedulerRootFilter=
#PriorityType=priority/multifactor
#PriorityDecayHalfLife=14-0
#PriorityUsageResetPeriod=14-0
#PriorityWeightFairshare=100000
#PriorityWeightAge=1000
#PriorityWeightPartition=10000
#PriorityWeightJobSize=1000
#PriorityMaxAge=1-0
#
# LOGGING
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurmctld
SlurmdDebug=3
SlurmdLogFile=/var/log/slurmd
#JobCompType=jobcomp/filetxt
#JobCompLoc=/cm/local/apps/slurm/var/spool/job_comp.log
#
# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
#JobAcctGatherType=jobacct_gather/cgroup
#JobAcctGatherFrequency=30
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageUser=slurm
# AccountingStorageLoc=slurm_acct_db
# AccountingStoragePass=SLURMDBD_USERPASS
# This section of this file was automatically generated by cmd. Do not edit manually!
# BEGIN AUTOGENERATED SECTION -- DO NOT REMOVE
# Server nodes
SlurmctldHost=omics-master
AccountingStorageHost=master
# Nodes
NodeName=omics[01-05] Procs=48 Feature=local
# Partitions
PartitionName=defq Default=YES MinNodes=1 DefaultTime=UNLIMITED MaxTime=UNLIMITED AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 OverSubscribe=NO PreemptMode=OFF AllowAccounts=ALL AllowQos=ALL Nodes=omics[01-05]
ClusterName=omics
# Scheduler
SchedulerType=sched/backfill
# Statesave
StateSaveLocation=/cm/shared/apps/slurm/var/cm/statesave/omics
PrologFlags=Alloc
# Generic resources types
GresTypes=gpu
# Epilog/Prolog section
Prolog=/cm/local/apps/cmd/scripts/prolog
Epilog=/cm/local/apps/cmd/scripts/epilog
# Power saving section (disabled)
# END AUTOGENERATED SECTION -- DO NOT REMOVE
And this is my sinfo output:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
defq* up infinite 5 idle omics[01-05]
And this is my test script:
#!/bin/bash
#SBATCH --nodes=2 # Number of nodes
#SBATCH --ntasks-per-node=4
#SBATCH --ntasks-per-socket=2
#SBATCH --output=std.out
#SBATCH --error=std.err
#SBATCH --mem-per-cpu=1gb
echo "hello from:"
hostname; pwd; date;
sleep 10
echo "going to sleep during 10 seconds"
echo "wake up, exiting
"
and thanks in advance
In the node definition, you do not specify RealMemory so Slurm assumes the default of 1MB (!) per node. Hence the request of 1GB per CPU cannot be fulfilled.
You should run slurmd -C on the compute node; that will give you the line to insert in the slurm.conf file so that Slurm correctly knows the hardware resources it can allocate.
$ slurmd -C | head -1
NodeName=node002 CPUs=16 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=128547
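Applied to this cluster, the omics node definition would then carry at least the memory figure that slurmd -C reports; a sketch only, where the RealMemory value below is a placeholder and must be replaced by the number printed on an actual omics node:
NodeName=omics[01-05] Procs=48 RealMemory=192000 Feature=local
After editing slurm.conf, restart slurmctld (or run scontrol reconfigure) so the new node definition is picked up.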

HPC slurm - how to make an HPC node run multiple jobs' bash scripts at the same time

Let's suppose I have an HPC cluster with one node (node_1) and I want to submit and run three jobs' bash scripts on node_1 at the same time.
So far, when I send a job to node_1, the node stays busy until the job ends.
How can I do this?
Shall I provide any specific argument in the job's bash script?
Thanks
Update
Here below an example of a bash script I am using to send a job to the HPC:
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=test
#SBATCH --nodelist=node_1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=8000
#SBATCH --output=1.out
#SBATCH --error=1.err
python /my/HPC/folder/script.py
Update
(base) [id#login_node ~]$ scontrol show node=node_1
NodeName=node_1 Arch=x86_64 CoresPerSocket=32
CPUAlloc=0 CPUTot=64 CPULoad=2.94
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=node_1 NodeHostName=node_1 Version=18.08
OS=Linux 4.20.0-1.el7.elrepo.x86_64 #1 SMP Sun Dec 23 20:11:51 EST 2018
RealMemory=128757 AllocMem=0 FreeMem=111815 Sockets=1 Boards=1
State=IDLE ThreadsPerCore=2 TmpDisk=945178 Weight=1 Owner=N/A MCS_label=N/A
Partitions=test
BootTime=2019-12-09T14:09:25 SlurmdStartTime=2020-02-18T03:45:14
CfgTRES=cpu=64,mem=128757M,billing=64
AllocTRES=
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
You need to change the consumable resource type from nodes to cores in Slurm.
Add this to your slurm.conf file
SelectType=select/cons_res
SelectTypeParameters=CR_Core
SelectType: Controls whether CPU resources are allocated to jobs and job steps in units of whole nodes or as consumable resources (sockets, cores or threads).
SelectTypeParameters: Defines the consumable resource type and controls other aspects of CPU resource allocation by the select plugin.
Reference
The node definition should also allow for that:
NodeName=<somename> NodeAddr=<someaddress> CPUs=16 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=12005 State=UNKNOWN
See also serverfault
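After adding those two lines, slurm.conf has to be identical on the controller and on node_1, and the daemons must be restarted before the new selection plugin takes effect; a small sketch, assuming the usual systemd unit names:
sudo systemctl restart slurmctld              # on the controller node
sudo systemctl restart slurmd                 # on node_1
scontrol show config | grep -i selecttype     # verify the running configuration
Once CR_Core is active, several 1-CPU jobs can be scheduled on node_1 concurrently, up to the number of cores (and memory) available.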

Using SLURM and MPI(4PY): Cannot allocate requested resources

I have a setup/installation of SLURM on my desktop computer to do some testing and understand how it works before deploying it to a cluster.
The desktop computer is running Ubuntu 18.10 (Cosmic), which is also what all the nodes in the cluster run. The SLURM version used is 17.11.9.
I have tested some of the features of SLURM, e.g. job arrays and its deployment of tasks.
However, I would like to communicate with the different tasks sent out to each node or CPU in the cluster, in order to collect their results (without disk I/O). For that reason, I have looked into how to manage that with e.g. message queuing and MPI or OpenMPI. (Any other implementation strategy, advice or recommendation is much appreciated.)
I have tested MPI with a simple Python snippet, starting a communication between two processes. I am using MPI4PY to handle this communication.
This code snippet runs fine with the mpiexec command, but running it via SLURM and the sbatch command I cannot get it to work. SLURM is configured with OpenMPI, and ompi_info states that SLURM is supported.
OpenMPI version 3.1.2-6 (from dpkg -l | grep mpi)
SLURM_VERSION 17.11.9
Ubuntu 18.10 (Cosmic)
MPI4PY version 3.0.1. (from pip list)
This is the Python3.6 code snippet:
$cat mpi_test.py
from mpi4py import MPI
if __name__ == '__main__':
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    if rank == 0:
        data = {'param1': 1, 'param2': 2, 'param3': 3}
        destinationNode = 1
        print('Im', rank, 'sending to ', destinationNode)
        comm.send(data, dest=destinationNode, tag=11)
    elif rank != 0:
        sourceNode = 0
        dataRx = comm.recv(source=sourceNode, tag=11)
        print('Im', rank, 'receiving from ', sourceNode)
        for keys in dataRx.keys():
            print('Data received: ', str(dataRx[keys]))
The python.mpi.sbatch used at the call with sbatch is:
$cat python.mpi.sbatch
#!/bin/bash -l
#SBATCH --job-name=mpiSimpleExample
#SBATCH --nodes=1
#SBATCH --error=slurm-err-%j.err
#SBATCH --export=all
#SBATCH --time=0-00:05:00
#SBATCH --partition=debug
srun -N 1 mpiexec -n 2 python3 mpi_test.py
#mpiexec -n 2 python3 mpi_test.py
exit 0
Running "sbatch python.mpi.sbatch" with this setup yields the following output:
$sbatch python.mpi.sbatch
$cat slurm-err-104.err
----------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:
  python3
Either request fewer slots for your application, or make more slots
available for use.
--------------------------------------------------------------------
Modifying python.mpi.sbatch to instead use:
"srun -n 1 mpiexec -n 1 python3 mpi_test.py" yields the error:
$cat slurm-err-105.error
Traceback (most recent call last):
  File "mpi_test.py", line 18, in <module>
    comm.send(data, dest=destinationNode, tag=11)
  File "mpi4py/MPI/Comm.pyx", line 1156, in mpi4py.MPI.Comm.send
  File "mpi4py/MPI/msgpickle.pxi", line 174, in mpi4py.MPI.PyMPI_send
mpi4py.MPI.Exception: MPI_ERR_RANK: invalid rank
---------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero
status, thus causing the job to be terminated. The first process to do
so was:
Process name: [[44366,1],0]
Exit code: 1
---------------------------------------------------------------------
Which is expected, since it is only started with one task.
Running mpirun hostname yields four instances of the machine, so there should be four slots available on this machine.
I can run the Python 3.6 script with up to four processes (after modification of mpi_test.py) with the command "mpiexec -n 4 python3 mpi_test.py", with success.
Any help is much appreciated.
slurm.conf-file:
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=desktop-comp
#ControlAddr=
#BackupController=
#BackupAddr=
#
AuthType=auth/munge
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobCheckpointDir=/var/slurm/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=1
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=5000
#MaxStepCount=40000
#MaxTasksPerNode=128
MpiDefault=openmpi
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
#ProctrackType=proctrack/cgroup
#Prolog=
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/lib/slurm-llnl/slurmd
SwitchType=switch/none
#TaskEpilog=
#TaskPlugin=task/affinity
#TaskPluginParam=Sched
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core
#
#
# JOB PRIORITY
#PriorityFlags=
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
#AccountingStorageUser=
AccountingStoreJobComment=YES
ClusterName=cluster
#DebugFlags=
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
#JobContainerType=job_container/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
#SlurmctldLogFile=
SlurmdDebug=3
#SlurmdLogFile=
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=desktop-comp CPUs=1 State=UNKNOWN
PartitionName=debug Nodes=desktop-comp Default=YES MaxTime=INFINITE State=UP
In your updated question you have in your slurm.conf the line
NodeName=desktop-comp CPUs=1 State=UNKNOWN
This tells Slurm that you have only one CPU available on your node. You can try running slurmd -C to see what Slurm discovers about your computer and copy the CPUs, CoresPerSocket, etc. values into your slurm.conf.
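For illustration, on a 4-core desktop the result could look like the lines below; the figures are hypothetical, so take the real ones from slurmd -C on desktop-comp:
$ slurmd -C | head -1
NodeName=desktop-comp CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=15900
and then in slurm.conf:
NodeName=desktop-comp CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=15900 State=UNKNOWN
With more than one CPU defined (and requested, e.g. with --ntasks=2), Open MPI should see enough slots for mpiexec -n 2.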

sbatch sends compute node to 'drained' status

On newly installed and configured compute nodes in our small cluster I am unable to submit slurm jobs using a batch script and the 'sbatch' command. After submitting, the requested node changes to the 'drained' status. However, I can run the same command interactively using 'srun'.
Works:
srun -p debug --ntasks=1 --nodes=1 --job-name=test --nodelist=node6 -l echo 'test'
Does not work:
sbatch test.slurm
with test.slurm:
#!/bin/sh
#SBATCH --job-name=test
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --nodelist=node6
#SBATCH --partition=debug
echo 'test'
It gives me:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug up 1:00:00 1 drain node6
and I have to resume the node.
All nodes run Debian 9.8, use Infiniband and NIS.
I have made sure that all nodes have the same config, version of packages and daemons running. So, I don't see what I am missing.
Seems like the issue was connected to the presence of NIS. I just needed to add this line to the end of /etc/passwd:
+::::::
and restart slurmd on the node:
/etc/init.d/slurmd restart
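If it happens again, the reason Slurm gives for draining the node can be checked before resuming it, which is a quick way to confirm this kind of root cause; a short sketch with standard Slurm commands (node name taken from the question):
sinfo -R                                      # list drained/down nodes with their Reason
scontrol show node node6 | grep -i reason     # same information for a single node
scontrol update NodeName=node6 State=RESUME   # put the node back into service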

SLURM how to qsub a task when another task is finished?

I am currently using a Linux-based HPC system which only uses SLURM to submit jobs, and the HPC only allows a job to run for 12 hours. However, I may need to run 24 jobs continuously for a week to get good results.
Is there a way to run a job again (automatically) when it is finished?
Kind regards
Update:
When the job is finished, a .out file will be created. In other words, the number of .out files will increase by one.
Is it possible to requeue the job when the number of .out files increases?
#!/bin/bash
#!
#! Example SLURM job script for Darwin (Sandy Bridge, ConnectX3)
#! Last updated: Sat Apr 18 13:05:53 BST 2015
#!
#!#############################################################
#!#### Modify the options in this section as appropriate ######
#!#############################################################
#! sbatch directives begin here ###############################
#! Name of the job:
#SBATCH -J Validation
#! Which project should be charged:
#SBATCH -A SOGA
#! How many whole nodes should be allocated?
#SBATCH --nodes=1
#! How many (MPI) tasks will there be in total? (<= nodes*16)
#SBATCH --ntasks=1
#!SBATCH --mem=200
#! How much wallclock time will be required?
#SBATCH --time=12:00:00
#SBATCH --mail-user=zl352
#SBATCH --mail-type=ALL
#! Uncomment this to prevent the job from being requeued (e.g. if
#! interrupted by node failure or system downtime):
##SBATCH --no-requeue
#! Do not change:
#SBATCH -p sandybridge
#! sbatch directives end here (put any additional directives above this line)
#! Notes:
#! Charging is determined by core number*walltime.
#! The --ntasks value refers to the number of tasks to be launched by SLURM only. This
#! usually equates to the number of MPI tasks launched. Reduce this from nodes*16 if
#! demanded by memory requirements, or if OMP_NUM_THREADS>1.
#! Each task is allocated 1 core by default, and each core is allocated 3994MB. If this
#! is insufficient, also specify --cpus-per-task and/or --mem (the latter specifies
#! MB per node).
#! Number of nodes and tasks per node allocated by SLURM (do not change):
numnodes=$SLURM_JOB_NUM_NODES
numtasks=$SLURM_NTASKS
mpi_tasks_per_node=$(echo "$SLURM_TASKS_PER_NODE" | sed -e 's/^\([0-9][0-9]*\).*$/\1/')
#! ############################################################
#! Modify the settings below to specify the application's environment, location
#! and launch method:
#! Optionally modify the environment seen by the application
#! (note that SLURM reproduces the environment at submission irrespective of ~/.bashrc):
. /etc/profile.d/modules.sh # Leave this line (enables the module command)
module purge # Removes all modules still loaded
module load default-impi # REQUIRED - loads the basic environment
#! Insert additional module load commands after this line if needed:
#! Full path to application executable:
application="~/scratch/code7/viv"
#! Run options for the application:
options=" > test.e"
#! Work directory (i.e. where the job will run):
workdir="$SLURM_SUBMIT_DIR" # The value of SLURM_SUBMIT_DIR sets workdir to the directory
# in which sbatch is run.
#! Are you using OpenMP (NB this is unrelated to OpenMPI)? If so increase this
#! safe value to no more than 16:
export OMP_NUM_THREADS=1
#! Number of MPI tasks to be started by the application per node and in total (do not change):
np=$[${numnodes}*${mpi_tasks_per_node}]
#! The following variables define a sensible pinning strategy for Intel MPI tasks -
#! this should be suitable for both pure MPI and hybrid MPI/OpenMP jobs:
export I_MPI_PIN_DOMAIN=omp:compact # Domains are $OMP_NUM_THREADS cores in size
export I_MPI_PIN_ORDER=scatter # Adjacent domains have minimal sharing of caches/sockets
#! Notes:
#! 1. These variables influence Intel MPI only.
#! 2. Domains are non-overlapping sets of cores which map 1-1 to MPI tasks.
#! 3. I_MPI_PIN_PROCESSOR_LIST is ignored if I_MPI_PIN_DOMAIN is set.
#! 4. If MPI tasks perform better when sharing caches/sockets, try I_MPI_PIN_ORDER=compact.
#! Uncomment one choice for CMD below (add mpirun/mpiexec options if necessary):
#! Choose this for a MPI code (possibly using OpenMP) using Intel MPI.
#!CMD="mpirun -ppn $mpi_tasks_per_node -np $np $application $options"
#! Choose this for a pure shared-memory OpenMP parallel program on a single node:
#! (OMP_NUM_THREADS threads will be created):
CMD="$application $options"
#! Choose this for a MPI code (possibly using OpenMP) using OpenMPI:
#!CMD="mpirun -npernode $mpi_tasks_per_node -np $np $application $options"
###############################################################
### You should not have to change anything below this line ####
###############################################################
cd $workdir
echo -e "Changed directory to `pwd`.\n"
JOBID=$SLURM_JOB_ID
echo -e "JobID: $JOBID\n======"
echo "Time: `date`"
echo "Running on master node: `hostname`"
echo "Current directory: `pwd`"
if [ "$SLURM_JOB_NODELIST" ]; then
#! Create a machine file:
export NODEFILE=`generate_pbs_nodefile`
cat $NODEFILE | uniq > machine.file.$JOBID
echo -e "\nNodes allocated:\n================"
echo `cat machine.file.$JOBID | sed -e 's/\..*$//g'`
fi
echo -e "\nnumtasks=$numtasks, numnodes=$numnodes, mpi_tasks_per_node=$mpi_tasks_per_node (OMP_NUM_THREADS=$OMP_NUM_THREADS)"
echo -e "\nExecuting command:\n==================\n$CMD\n"
eval $CMD
If your job is intrinsically restartable, all you need to do is call sbatch at the end of your submission script. Assuming it is called submit.sh:
if ! job_is_done; then
    sbatch submit.sh
fi
The job_is_done part should be replaced by a command that returns 0 when the job is done (i.e. computation finished, process converged, etc.) for instance by 'grepping' in the log file for certain clues.
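For illustration, job_is_done could be a small grep over the application's log; the "converged" marker below is hypothetical and depends on what your program actually prints to test.e:
job_is_done() {
    # assumes the application prints "converged" to test.e when it has finished
    grep -q "converged" test.e
}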
You can also re-queue the job:
job_is_done || scontrol requeue $SLURM_JOB_ID
If your program is not intrinsically restartable, you could use a checkpointing wrapper such as DMTCP to make it restartable.
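Another pattern that works well with a hard 12-hour limit is to submit the whole chain up front with job dependencies, so each restart only begins after the previous job ends; a sketch using sbatch's --dependency flag, where submit.sh and the count of 14 chunks are assumptions:
jobid=$(sbatch --parsable submit.sh)
for i in $(seq 2 14); do
    jobid=$(sbatch --parsable --dependency=afterany:$jobid submit.sh)
done
Each submit.sh invocation should then pick up where the previous one left off, e.g. from its latest checkpoint file.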
