Slurm does not allocate the resources and keeps waiting

I'm trying to use our cluster but I'm having issues. I tried allocating some resources with:
salloc -N 1 --ntasks-per-node=5 bash
but it keeps waiting on:
salloc: Pending job allocation ...
salloc: job ... queued and waiting for resources
Likewise, when I try:
srun -N1 -l echo test
it just sits in the waiting queue.
Am I making a mistake, or is there something wrong with our cluster?

It might help to set a time limit for the Slurm job with the --time option, for instance a limit of 10 minutes like this:
srun --job-name="myJob" --ntasks=4 --nodes=2 --time=00:10:00 --label echo test
Without a time limit, Slurm uses the partition's default time limit. The issue is that this is sometimes set to infinity or to several days, which can delay the start of the job. To check the partitions' time limits, use:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
prod* up infinite 198 ....
gpu* up 4-00:00:00 70 ....
From the Slurm docs:
-t, --time=<time>
Set a limit on the total run time of the job allocation. If the requested time limit exceeds the partition's time limit, the job will be left in a PENDING state (possibly indefinitely). The default time limit is the partition's default time limit. When the time limit is reached, each task in each job step is sent SIGTERM followed by SIGKILL.
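Applied to the original salloc example, the same request with an explicit time limit would look something like this (a sketch; the 10-minute limit is just an illustrative value):
salloc -N 1 --ntasks-per-node=5 --time=00:10:00 bash
If the allocation still sits in PENDING, squeue -u $USER shows the pending reason in the NODELIST(REASON) column (e.g. Resources, Priority or PartitionTimeLimit), which usually indicates what is holding the job back.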

Related

Monitor memory usage of each node in a slurm job

My slurm job uses several nodes, and I want to know the maximum memory usage of each node for a running job. What can I do?
Right now, I can ssh into each node and do free -h -s 30 > memory_usage, but I think there must be a better way to do this.
The Slurm accounting will give you the maximum memory usage over time over all tasks directly. If that information is not sufficient, you can set up profiling following this documentation, and Slurm will give you the full memory usage of each process as a time series for the duration of the job. You can then aggregate per node, find the maximum, etc.
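For a job that is still running, something along these lines should report the peak resident memory per step together with the node it occurred on (a sketch; field availability varies slightly between Slurm versions):
sstat -j <jobid> --format=JobID,MaxRSS,MaxRSSNode
sacct -j <jobid> --format=JobID,MaxRSS,MaxRSSNode,NodeList
The first command queries running job steps, the second queries the accounting database once the job has finished.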

Default job time limit in Slurm

I want to allow the user scheduling a job to specify any job time limit using -t, --time=<time>. However, when the user does not set a time limit, I'd like to impose a default time limit, for example 1 hour. I can't find any setting in slurm.conf to do this.
The default time limit is set per partition. If not specified, the maximum time limit is used:
DefaultTime
Run time limit used for jobs that don't specify a value. If not set then MaxTime will be used. Format is the same as for MaxTime.
Example:
PartitionName=debug Nodes=dev[0-8,18-25] MaxTime=12:00:00 DefaultTime=00:30:00 Default=YES
This will set the maximum wall time for the partition to 12 hours and the default, if not specified by the user, to 30 minutes.
You can't set the default time limit twice, right? If the user does not specify a time limit, the job will simply be terminated when it completes. You can read about -t, --time here. In any case, the default time limit is the partition's default time limit, so you can change that as you like.
Here's an example slurm.conf snippet that sets the time limit per partition:
# slurm.conf file
# for CPU
PartitionName=cpu Nodes=ALL Default=YES MaxTime=INFINITE State=UP
# for GPU
PartitionName=gpu Nodes=ALL MaxTime=INFINITE State=UP
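To get the 1-hour default asked about above, you would presumably also add DefaultTime to the partition line, along these lines (a sketch, not tested against a live slurm.conf):
# CPU partition with a 1-hour default when the user gives no --time
PartitionName=cpu Nodes=ALL Default=YES DefaultTime=01:00:00 MaxTime=INFINITE State=UP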

Access reason why slurm stopped a job

Is there a way to find out why a job was cancelled by Slurm? I would like to distinguish the cases where a resource limit was hit from all other reasons (like a manual cancellation). If a resource limit was hit, I would also like to know which one.
The slurm log file contains that information explicitly. It is also written to the job's output file with something like:
JOB <jobid> CANCELLED AT <time> DUE TO TIME LIMIT
or
Job <jobid> exceeded <mem> memory limit, being killed:
or
JOB <jobid> CANCELLED AT <time> DUE TO NODE FAILURE
etc.
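If the output file is no longer available, the accounting database records the final job state, which also lets you tell these cases apart (a sketch; state names such as TIMEOUT, OUT_OF_MEMORY and CANCELLED are the usual values but depend on your Slurm version):
sacct -j <jobid> --format=JobID,JobName,State,ExitCode,Elapsed,Timelimit
A State of TIMEOUT with Elapsed at the Timelimit points to the time limit, OUT_OF_MEMORY points to the memory limit, and CANCELLED by <uid> indicates a manual cancellation.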

What should I write in qsub for checkpointing automatically?

checkpoint.mat is my checkpoint file and job.m is my MATLAB file.
When my job exceeds 24 hours, it is terminated by the server. I implemented checkpointing in my own MATLAB file.
But what should I write in my qsub file?
Here is what this link, https://wikis.nyu.edu/display/NYUHPC/Tutorial+-+Submitting+a+job+using+qsub, says:
[-c checkpoint_options]
  n          No checkpointing is to be performed.
  s          Checkpointing is to be performed only when the server executing the job is shutdown.
  c          Checkpointing is to be performed at the default minimum time for the server executing the job.
  c=minutes  Checkpointing is to be performed at an interval of minutes, which is the integer number of minutes of CPU time used by the job. This value must be greater than zero.
[-C directive_prefix] [-d path] [-D path] [-e path] [-f] [-h]
But from this, I still cannot figure out how to checkpoint once the job exceeds the maximum allowed time, which is 24 hours in my case. I want the job to be resubmitted every 24 hours, restarting from the checkpointed state each time. I am also not from NYU, so is there a different syntax in the qsub file to specify checkpointing?
This is what I wrote in my PBS script:
.....
#PBS -c c=1440 minutes
....
But it does not work.
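A note on the excerpt above: it gives the interval as c=minutes, with no literal word "minutes" after the number, so the directive would presumably be written like this (a sketch based only on the quoted documentation; whether the batch server actually performs checkpointing depends on site configuration, and this directive alone does not resubmit the job):
#PBS -c c=1440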

Dynamically Submit Qsub Jobs on SGE cluster based on cluster load

I am trying to run qsub jobs on an SGE (Sun Grid Engine) cluster that supports a maximum of 688 jobs. I would like to know if there is any way to find out the total number of jobs that are currently running on the cluster, so I can submit jobs based on the current cluster load.
I plan to do something like: sleep for 1 minute and check again if the number of jobs in the cluster is < 688 and then submit jobs further.
And just to clarify, my question pertains to knowing the total number of jobs submitted to the cluster, not just the jobs I have currently submitted.
Thanks in advance.
You can use qstat to list the jobs of all users; combined with awk and wc, this can be used to find the total number of jobs on the cluster:
qstat -u "*" | awk '{if ($5 == "r" || $5 == "qw") print $0;}' | wc -l
The above command also takes into account jobs that are queued and waiting to be scheduled on a compute node.
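Building on that, the polling approach described in the question could look roughly like this (a sketch; the 688 limit and one-minute sleep come from the question, and the jobs/*.sh scripts are hypothetical):
#!/bin/bash
# Keep the cluster-wide job count below the 688-job limit before each submission.
LIMIT=688
for script in jobs/*.sh; do    # hypothetical directory of job scripts to submit
    # Wait until the total number of running/queued jobs drops below the limit.
    while [ "$(qstat -u '*' | awk '$5 == "r" || $5 == "qw"' | wc -l)" -ge "$LIMIT" ]; do
        sleep 60               # check the cluster load again after one minute
    done
    qsub "$script"
done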
However, the cluster sysadmins may disallow users from seeing jobs that don't belong to them. You can verify whether you can see other users' jobs by running:
qstat -u "*"
If you know for a fact that another user is running a job and yet you can't see it with the above command, it's most likely that the sysadmins have disabled that option.
Afterthought: from my understanding, you're just a regular cluster user, so why bother submitting jobs this way? Why not just submit all the jobs you want; if the cluster can't schedule them, it will simply leave them in the qw state and schedule them whenever SGE deems appropriate.
Depending on how the cluster is configured, using a job array (the -t option for qsub) may get around this limit.
I have similar limits set for the maximum number of jobs a single user can submit. That limit applies to individual qsub invocations, not to a single job array submission with potentially many tasks (array tasks are limited by a separate configuration variable, max_aj_tasks).
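For illustration, a job array that submits 1000 tasks with a single qsub call might look like this (a sketch; job.sh and process_chunk are hypothetical, and SGE exposes the task index via the SGE_TASK_ID environment variable):
qsub -t 1-1000 job.sh
where job.sh selects its input from the task index, for example:
#!/bin/bash
#$ -cwd
./process_chunk "input_${SGE_TASK_ID}.dat"   # hypothetical per-task input file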
