Preventing cronjobs from overlapping - linux

I have 3 different jobs set up in crontab (call them jobA, jobB, jobC) that run at different intervals and start at different times during the day. For example, jobA runs once per hour at 5 mins past the hour, jobB runs every 30 mins at 9 and 39 mins past the hour, and jobC runs every 15 mins. They are not dependent on each other, but for various reasons they can NOT be running at the same time.
The problem is that sometimes one of the jobs takes a long time to run and another one starts before the first one is done, causing issues.
Is there some way to queue or spool these jobs so that one will not start until the current running one has finished? I tried using this solution but this does not guarantee that the pending jobs will resume in the same order they were supposed to start. A queue would be best, but I cannot find anything about how to do this.

You can't do that using cron. Cron is used to run a specific command at specific time. You can do it by the solution you proposed, but that adds a lot more complexity.
I suggest, writing/coding the requirement in high level language like java and use a mutil-thread program to achieve what you need.
Control-m is another scheduling software, with a lot of other features as well. You would be able to integrate the above use-case in it.

Related

Prevent concurrent cron jobs in pg-boss

I’m considering pg-boss for running and distributing event-based jobs between the instances of the same service. One of my use cases, apart from event-based, is scheduled jobs. Some of them can take a while and continue running until it’s time to trigger the next invocation - e.g. a job is set to run every 5 minutes but it can take e.g. 8 to complete. In such case I need the system to realize that the previous run is still in progress and not trigger the same job while the previous invocation of it is still in progress, using the example of every 5 minutes and a job taking 8 minutes - I’d like sth like the following to happen:
13:00 job triggered
13:05 job still runs, system sees it and doesn’t trigger once more even though it’s time
13:08 job done
13:10 next job run triggered
Is there an elegant way to achieve it with pg-boss without implementing my own locking mechanism?

Linux task schedule to Hour, minute, second

I'm trying to run a shell script at a specific time up to it's seconds (H:M:S) , but so far all programs such as at only go up to a specific minute (not second).
I don't want to use sleep since it's not accurate. For some reason it ended couple of hours earlier than it was supposed to!
Your question doesn't seem to define accuracy, but there is always some jitter in scheduling in electronic devices. You might use quartz to schedule to the second. You could also use at or cron to schedule to the minute and then sleep the appropriate number of second(s).

Repartition of cron

(I apologize in advance for bad formulation of my problem, please consider english is not my first language).
I have several processes (crons) and I want to "optimize" the schedule when to launch them.
For example, cron c1 starts every 3 minutes, cron c1 starts every 7 minutes and cron c3 starts every 18 minutes. Assume they last only a few seconds before stopping.
The unit of time here is 1 minute.
Now, what I want is that these crons are distributed so that we don't have a moment where many of them start and then long time interval with no cron at all. For example, if c1 and c3 both start at time 0, then they will start again together every 18 minutes. It would be better to start cron c1 at time 0 and then c3 at time 1, so that they are never launched together.
So the idea is, given a list of crons with periodicity, to plan a schedule, so that there is as much time between each cron as possible and as few as possible moments when two crons start together.
Are there some well-known algorithms about such problems?
The real-life application of this problem is: ~ 200 crons. Some of them are launched every 5 or ~10 or ~30 minutes and last very short (few seconds), some (~20 - 25) are launched every 2 hours and last a few minutes. So the idea is also that the big crons are not launched at the same time.
I am a mathematician myself and not a computer scientist, so I asked this question on https://math.stackexchange.com/, since I consider this being a "nice" question for mathematicians too.
I think you should consider the ressources used by each of your crons and then schedule your jobs from that.
I don't think there is a particular algorithm for that.

whether to use job scheduler or sleep() function

I am confused whether to use cron job scheduler or use sleep function in the program itself. There are questions on this previously but I seem to have some different requirements form them.
I need some information from the previous run of the program so if I use cron to schedule
job I would have to store that information at some place and re-read it next time(this can make the program less scale-able if the size of this information grows).
I can also use sleep() but that will be using resources.
I will need to re-run the program every 10 mins or so. Which one is better to use.
Is there any other nice way of doing it which I may be missing.
In general you should use cron whenever you can for something like this.
The only problem I could foresee is if your program somehow took longer than 10 minutes to run, cron is going to call the next execution 10 minutes later anyway. This creates a really long race condition basically, where if you did sleep it would only start sleeping after the previous execution ended.
But assuming your program will take less time to run, I say go with cron.

How to define frequency of a job in application by users?

I have an application that has to launch jobs repeatingly. But (yes, that would have been to easy without a but...) I would like users to define their backup frequency in application.
In worst case, they would have to choose between :
weekly,
daily,
every 12 hours,
every 6 hours,
hourly
In best case, they should be able to use crontab expressions (see documentation for example)
How to do this? Do I launch a job every minutes that check for last execution time, frequency and then launches another job if needed? Do I create a sort of queue that will be executed by a masterjob?
Any clues, ideas, opinions, best pratices, experiences are welcome!
EDIT : Solved this problem using Akka scheduler. Ok, this is a technical solution not a design answer but still everything works great.
Each user defined repetition is an actor that send messages every period to a new actor to execute the actual job.
There may be two ways to do this depending on your requirements/architecture:
If you can only use Play:
The user creates the job and the frequency it will run (crontab, whatever).
On saving the job, you calculate the first time it will have to be run. You then add an entry to a table JOBS with the execution time, job id, and any other information required. This is required as Play is stateless and information must be stored in the DB for later retrieval.
You have a job that queries the table for entries whose execution date is less than now. Retrieves the first, runs it, removes it from the table and adds a new entry for next execution. You should keep some execution counter so if a task fails (which means the entry is not removed from DB) it won't block execution of the other tasks by the job trying again and again.
The frequency of this job is set to run every second. That way while there is information in the table, you should execute the request around as often as they are required. As Play won't spawn a new job while the current one is working if you have enough tasks this one job will serve all. If not, it will be killed at some point and restored when required.
Of course, the crons of the users will not be too precise, as you have to account for you own cron delays plus execution delays on all the tasks in queue, which will be run sequentially. Not the best approach, unless you somehow disallow crons which run every second or more often than every minute (to be safe). Doing a check on execution time of the crons to kill them if they are over a certain amount of time would be a good idea.
If you can use more than Play:
The better alternative I believe is to use Quartz (see this) to create a future execution when the user creates the job, and reproram it once the execution is over.
There was a discussion on google-groups about it. As far as I remember you must define a job which start every 6 hours and check which backups must be done. So you must remember when the last backup job was finished and make the control yourself. I'm unsure if Quartz can handle such a requirement.
I looked in the source-code (always a good source ;-)) and found a method every, where I think this should be do what you want. How ever I'm unsure if this is a clever design, because if you have 1000 user you will have then 1000 Jobs. I'm unsure if Play was build to handle such a large number of jobs.
[Update] For cron-expressions you should have a look into JobPlugin.scheduleForCRON()
There are several ways to solve this.
If you don't have a really huge load of jobs, I'd just persist them to a table using the required flexibility. Then check all of them every hour (or the lowest interval you support) and run those eligible. Simple.
Or, if you prefer to use cron syntax anyway, just write (export) jobs to a user crontab using a wrapper which calls back to your running app, or starts the job in a standalone process if that's possible.

Resources