How to set the maximum priority for a Slurm job?

As an administrator, I need to give the maximum priority to a given job.
I have found that submission options like --priority=<value> or --nice[=adjustment] could be useful, but I do not know which values to assign them in order to give the job the highest priority.
Another approach could be to assign a low priority to all jobs by default and increase it for the special ones.
Any idea how I could carry this out?
EDIT: I am using the sched/backfill scheduling policy and the default job priority policy (FIFO).
Thank you.

I found a solution that works without needing to use PriorityType=priority/multifactor (as suggested by Bub Espinja):
$ scontrol update job=<job-id> Priority=<any-integer>
The above command will update the priority of the job and reorder the queue accordingly.
The minimum priority needed to become the next one in line can be found by checking the priority of the next pending job and adding one to it. You can find the priority of a job using the following:
$ scontrol show job=<job-id>
(scontrol update can be used to change many aspects of a job, such as time limit and others.)
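For example, the whole procedure might look roughly like this (the job IDs and the resulting priority value are placeholders):
$ squeue -t PENDING --sort=-p                              # pending jobs, highest priority first
$ scontrol show job=<next-pending-job-id> | grep Priority
$ scontrol update job=<target-job-id> Priority=<that-priority-plus-one>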
EDIT:
I just learned one can do
$ scontrol top <job-id>
to put a job at the top of their queue.

What I have done is use the multifactor priority plug-in with its default configuration, adding this line to slurm.conf:
PriorityType=priority/multifactor
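For context, a slurm.conf excerpt matching this setup might look as follows; the explicit zero weights are only illustrative (they are also the defaults), so with nothing else configured every job gets the same base priority:
PriorityType=priority/multifactor
PriorityWeightAge=0
PriorityWeightFairshare=0
PriorityWeightJobSize=0
PriorityWeightPartition=0
PriorityWeightQOS=0
After changing slurm.conf, the configuration can be reread with:
$ scontrol reconfigure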
Then, since all jobs will have priority 0, I must update the target job's priority, in my case using the API:
#include <slurm/slurm.h>

job_desc_msg_t job_update;
slurm_init_job_desc_msg(&job_update);   /* initialize the update message */
job_update.job_id = target_job_id;      /* the job to modify */
job_update.priority = 4294967295;       /* highest possible priority (32-bit max) */
slurm_update_job(&job_update);
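Assuming the Slurm development headers are installed, the snippet can be compiled and linked against libslurm along these lines (the file name is a placeholder):
$ gcc set_priority.c -o set_priority -lslurm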
EDIT:
From the Slurm FAQ:
The job's priority is an integer that ranges between 0 and 4294967295. The larger the number, the higher the job will be positioned in the queue, and the sooner the job will be scheduled.
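Combining this with the scontrol approach from the answer above, the maximum value can also be assigned directly from the command line (the job ID is a placeholder):
$ scontrol update job=<job-id> Priority=4294967295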

Related

[Slurm] Changing the order of jobs when using the Builtin scheduler type

Basically, I want the system to follow FIFO, but sometimes, as the administrator, I want to change the priority of jobs.
This is why I set the scheduler type to builtin, but in that case I could not change the order of jobs using the scontrol top command.
I also tested the backfill scheduler type with the default_queue_depth=0 option, but it did not work.
If anyone has a good idea, please help me.
I was able to resolve this issue by using the scontrol hold and scontrol release commands. A job's hold state takes precedence over the builtin scheduler type setting.
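A minimal sketch of that workflow (the job ID is a placeholder):
$ scontrol hold <job-id>                 # held jobs are skipped by the scheduler
$ scontrol release <job-id>              # the job becomes eligible again
$ squeue -j <job-id> -o "%i %t %r"       # show job id, state and reason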

Block Resource in Optaplanner Job scheduling

I've managed to use the Job Scheduling example for a project I'm working on. I have an additional constraint I would like to add: some resources should be blocked at certain times. For example, a global renewable resource shouldn't be used between minutes 10 and 20. Is this currently doable and, if not, how can it be done in the score calculation?
Thanks
Use a custom shadow variable listener to predict the starting time of each task.
Then simply add a hard constraint to check that a task doesn't overlap with its blocked periods.
Penalize the amount of overlap to avoid a "score trap".

Why in kubernetes cron job two jobs might be created, or no job might be created?

The k8s Cron Job Limitations documentation mentions that there is no guarantee that a job will be executed exactly once:
A cron job creates a job object about once per execution time of its schedule. We say “about” because there are certain circumstances where two jobs might be created, or no job might be created. We attempt to make these rare, but do not completely prevent them. Therefore, jobs should be idempotent.
Could anyone explain:
Why could this happen?
What are the probabilities/statistics that this could happen?
Will it be fixed in some reasonable future in k8s?
Are there any workarounds to prevent such behavior (if the running job can't be implemented as idempotent)?
Do other cron-related services suffer from the same issue? Maybe it is a core cron problem?
The controller:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/cronjob/cronjob_controller.go
starts with a comment that lays the groundwork for an explanation:
I did not use watch or expectations. Those add a lot of corner cases, and we aren't expecting a large volume of jobs or scheduledJobs. (We are favoring correctness over scalability.)
If we find a single controller thread is too slow because there are a lot of Jobs or CronJobs, we can parallelize by Namespace. If we find the load on the API server is too high, we can use a watch and UndeltaStore.
Just periodically list jobs and SJs, and then reconcile them.
Periodically means every 10 seconds:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/cronjob/cronjob_controller.go#L105
The documentation following the quoted limitations also has some useful color on some of the circumstances under which 2 jobs or no jobs may be launched on a particular schedule:
If startingDeadlineSeconds is set to a large value or left unset (the default) and if concurrencyPolicy is set to AllowConcurrent, the jobs will always run at least once.
Jobs may fail to run if the CronJob controller is not running or broken for a span of time from before the start time of the CronJob to start time plus startingDeadlineSeconds, or if the span covers multiple start times and concurrencyPolicy does not allow concurrency. For example, suppose a cron job is set to start at exactly 08:30:00 and its startingDeadlineSeconds is set to 10, if the CronJob controller happens to be down from 08:29:00 to 08:42:00, the job will not start. Set a longer startingDeadlineSeconds if starting later is better than not starting at all.
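To make those two settings concrete, here is a hypothetical minimal CronJob manifest (the name, schedule and image are placeholders; on older clusters the apiVersion may be batch/v1beta1 instead of batch/v1) that sets both startingDeadlineSeconds and concurrencyPolicy:
$ kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup              # placeholder name
spec:
  schedule: "30 8 * * *"             # start at 08:30 every day
  startingDeadlineSeconds: 600       # still start if the controller was down for < 10 min
  concurrencyPolicy: Forbid          # never run two instances at the same time
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: cleanup
            image: busybox           # placeholder image
            command: ["sh", "-c", "echo cleaning up"]
EOF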
Higher level, solving for only-once in a distributed system is hard:
https://bravenewgeek.com/you-cannot-have-exactly-once-delivery/
Clocks and time synchronization in a distributed system is also hard:
https://8thlight.com/blog/rylan-dirksen/2013/10/04/synchronization-in-a-distributed-system.html
To the questions:
Why could this happen?
For instance, the node hosting the CronJobController fails at the time a job is supposed to run.
What are the probabilities/statistics that this could happen?
Very unlikely for any given run. For a large enough number of runs, very unlikely to escape having to face this issue.
Will it be fixed in some reasonable future in k8s?
There are no idempotency-related issues under the area/batch label in the k8s repo, so one would guess not.
https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Aarea%2Fbatch
Are there any workarounds to prevent such behavior (if the running job can't be implemented as idempotent)?
Think more about the specific definition of idempotent, and about the particular points in the job where there are commits. For instance, jobs can be made to support more-than-once execution if they save state to staging areas, with an election process that determines whose work wins; see the sketch at the end of this answer.
Do other cron-related services suffer from the same issue? Maybe it is a core cron problem?
Yes, it's a core distributed systems problem.
For most users, the k8s documentation gives perhaps a more precise and nuanced answer than is necessary. If your scheduled job is controlling some critical medical procedure, it's really important to plan for failure cases. If it's just doing some system cleanup, missing a scheduled run doesn't much matter. By definition, nearly all users of k8s CronJobs fall into the latter category.
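As a rough illustration of the staging-area workaround mentioned above, a run could write its result to a private location and then publish it atomically, so that when two runs race only one result survives (all paths and the do_work command are hypothetical):
#!/bin/sh
set -eu
STAGE="/data/staging/report.$(date +%F).$$"   # unique per run
FINAL="/data/published/report.$(date +%F)"    # one result per schedule slot

do_work > "$STAGE"                            # hypothetical payload
# Atomic "election": creating the hard link succeeds for exactly one run.
if ln "$STAGE" "$FINAL" 2>/dev/null; then
    echo "published $FINAL"
else
    echo "another run already published $FINAL; discarding duplicate"
fi
rm -f "$STAGE"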

sysV init.d: what does the priority really mean?

The docs for chkconfig are a bit loose on what the priority number actually means, and the docs for init don't even mention priority on my machine.
Say you have the following:
/etc/rc.d/rc3.d/S01foo
/etc/rc.d/rc3.d/S02bar
Which one is run first? The one with the higher priority (bar)? Or is the priority number more of a start-order number, so the lower numbers are started before the higher numbers?
What if they were K01foo and K02bar? Which one would be stopped first: the one with the greater priority number, or is it more of a "stop order"?
After some experimentation I was able to figure it out.
It's more of an 'order from least to greatest' process.
In other words, the lower the priority number, the sooner the job will stop/start.
S01foo will start before S02bar, and K01foo will stop before K02bar.
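A quick way to verify this on a Red Hat-style layout is to list the runlevel directory, since init simply walks the scripts in lexical order:
$ ls -1 /etc/rc.d/rc3.d/S*    # start scripts, lowest number runs first
$ ls -1 /etc/rc.d/rc3.d/K*    # kill scripts, lowest number runs first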
Hopefully this saves someone 15 minutes.

Possible to change a job's priority after creation in kue?

The requirement is simple, after creating a job in kue with a given priority, is it possible to change its priority (like renice in POSIX) before it's scheduled to run?
I had the same need.
It seems that job.priority(level).update(fn) works. Job#save(fn) could also be used as it calls Job#update(fn) if the job has already been saved.
