Can I change Cron launch time when it's processing? - cron

I'm new to cron. I've just set up a cron job that imports 8000 products. I scheduled it to run every hour, but I don't want it to run again, and I want to make sure it doesn't start again while it's still going. Therefore, I want to change the schedule to something like once a week. Is this possible while the cron job is still running?
Sorry for my bad formulation. I hope you understand.
Have a pleasant day.

Related

How quickly is CRON triggered?

First Example
Suppose I have a CRON job
30 2 * * * ....
Then this would run every time when it is 2:30 at night (local time).
Now suppose I have the time zone Europe/Germany and it's 2017-10-29 (the day when DST is switched). Then this CRON job would run twice, right?
Second Example
Suppose I have the time zone Europe/Germany and the CRON job
30 11 * * * ....
As Germany never had a DST change at 11:30, this will not interfere. But the user could change the local time. To be super clear: This question is NOT about DST.
For the following test cases, I would like to know if/how often the CRON job gets scheduled:
At 11:29:58.0, the user sets the time to 11:31:00
At 11:29:59.1, the user sets the time to 11:31:00
At 11:29:59.6, the user sets the time to 11:31:00
At 11:30:01.0, the user sets the time to 11:29:59.7 - is CRON executed directly afterwards?
They boil down to How quickly is CRON triggered?, where the 4th one also has the question if CRON stores that it was already executed for that minute.
Another variant of the same question:
At 11:29:59, the NTP service corrects the time to 11:31:00 - will the job be executed that day at all?
The easiest way to answer this with confidence is to take a look at the source for the cron daemon. There are a few versions online like this, or you can use apt-get source cron.
The tick cycle in cron is to repeatedly sleep for a minute, or less if there is a job coming up. Immediately after emerging from the sleep, it checks the time and treats the result as one of these wakeupKind values:
Expected time - run any jobs we were expecting
Small jump forwards (up to 5 minutes) - run the jobs for the intervening minutes
Medium jump forwards (up to 3 hours, so this would include DST starting in spring) - run any wildcard jobs first (because the catch up could take more than a minute), then catch up on the intervening fixed time jobs
Large jump (3 hours or more either way) - start over with the current time
Jump backwards (up to 3 hours, so including the end of DST) - because any fixed time jobs have 'probably' already run, only run any wildcard jobs until the time is caught up again
If in doubt, the source comments these wakeupKind values clearly.
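As a rough illustration of the classification described above, the following Python sketch maps a clock difference (in minutes) to one of the five cases. It is not taken from the cron source: the enum names, the function name and the exact boundary handling (for example, whether exactly 5 minutes still counts as "small") are assumptions. Only the 5-minute and 3-hour thresholds come from the answer.

```python
from enum import Enum


class WakeupKind(Enum):
    # Hypothetical names; the real daemon uses its own identifiers.
    EXPECTED = "expected"
    SMALL_FORWARD = "small jump forwards"
    MEDIUM_FORWARD = "medium jump forwards"
    BACKWARD = "jump backwards"
    LARGE = "large jump"


THREE_HOURS = 3 * 60  # minutes
SMALL_LIMIT = 5       # minutes


def classify_wakeup(diff_minutes: int) -> WakeupKind:
    """Classify the gap between the expected and the actual wake-up time."""
    if abs(diff_minutes) >= THREE_HOURS:
        return WakeupKind.LARGE           # start over with the current time
    if diff_minutes < 0:
        return WakeupKind.BACKWARD        # only wildcard jobs until caught up
    if diff_minutes == 0:
        return WakeupKind.EXPECTED        # run the jobs we were expecting
    if diff_minutes <= SMALL_LIMIT:
        return WakeupKind.SMALL_FORWARD   # run jobs for the skipped minutes
    return WakeupKind.MEDIUM_FORWARD      # wildcard jobs first, then catch up
```

With this sketch, a one-hour DST shift in spring (`classify_wakeup(60)`) lands in the medium case and the autumn shift (`classify_wakeup(-60)`) in the backwards case, matching the list above.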
Edit
To follow up on whether sleep() could be affected by a clock change, it looks like the answer is indirectly there in a couple of the Linux man pages.
Firstly, the notes for the sleep() function confirm that it is implemented by nanosleep()
The notes for nanosleep() say Linux measures the time using the CLOCK_MONOTONIC clock (even though POSIX.1 says it shouldn't)
Scroll down a bit in the docs for clock_settime() to see the explanation of CLOCK_MONOTONIC, which explains it is not affected by jumps in the system time, but it would be affected by incremental NTP style clock sync adjustments.
So in summary, a system admin style clock change will have no effect on the sleep(). But for example if an NTP adjustment came in and said to 'gently' advance the clock, cron would experience a series of slightly short sleep() function calls.
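To observe the difference between the two kinds of clock yourself, a small Python sketch like the one below can help. It measures one sleep with both the monotonic clock and the wall clock; `timed_sleep` is a hypothetical helper name, and the sketch only relies on `time.monotonic()` being documented as unaffected by system clock updates. If someone changes the system time during the sleep, only the wall-clock figure should be off.

```python
import time


def timed_sleep(seconds: float) -> tuple[float, float]:
    """Sleep, then return (monotonic elapsed, wall-clock elapsed) in seconds."""
    start_mono = time.monotonic()   # not affected by setting the system time
    start_wall = time.time()        # follows the system (wall) clock
    time.sleep(seconds)
    return time.monotonic() - start_mono, time.time() - start_wall


if __name__ == "__main__":
    mono, wall = timed_sleep(0.1)
    print(f"monotonic: {mono:.3f}s, wall clock: {wall:.3f}s")
```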
There are many implementations of cron (see here). One of the most commonly used is Vixie cron, and its man page states:
Daylight Saving Time and other time changes
Local time changes of less than three hours, such as those caused by the Daylight Saving Time changes, are handled in a special way. This only applies to jobs that run at a specific time and jobs that run with a granularity greater than one hour. Jobs that run more frequently are scheduled normally.
If time was adjusted one hour forward, those jobs that would have run in the interval that has been skipped will be run immediately. Conversely, if time was adjusted backwards, running the same job twice is avoided.
Time changes of more than 3 hours are considered to be corrections to the clock or the timezone, and the new time is used immediately.
source: man 8 cron
I believe this answers most of your points.
In addition to point five:
At 11:29:59, the NTP service corrects the time to 11:31:00 - will the job be executed that day at all?
First off, if NTP corrects the time by more than a minute, you have a very bad clock! This should not happen too often. Generally, you might see such a step when you first enable NTP, but after that the corrections should be much smaller.
In any case, if the DeltaT is not too high, generally below 128 ms (the default step threshold of ntpd), your system will slew the time. Slewing the time means changing the virtual frequency of the software clock to make the clock go faster or slower until the requested correction is achieved. Slewing the clock by a larger amount may take some time, too. For example, standard Linux adjusts the time at a rate of 0.5 ms per second.
This implies (under the assumption of Vixie cron, and probably many others):
If NTP jumps more than 3 hours, the job is skipped
If NTP jumps less than 3 hours but more than 128 ms, Vixie cron handles the job nicely using the time-jump handling described above.
If NTP corrects the time by less than 128 ms, cron does not notice the time jump due to the slewing.
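To get a feeling for how long slewing takes, the rate quoted above (0.5 ms of correction per second) can be turned into a back-of-the-envelope calculation. The function and constant names below are made up for illustration, and the sketch assumes a constant slew rate, which real NTP daemons do not strictly guarantee.

```python
# 0.5 ms of correction per elapsed second, as quoted in the answer.
SLEW_RATE = 0.0005


def slew_duration_seconds(offset_seconds: float,
                          slew_rate: float = SLEW_RATE) -> float:
    """Estimate how long it takes to slew away a given clock offset."""
    return abs(offset_seconds) / slew_rate


if __name__ == "__main__":
    offset = 0.100  # a 100 ms offset, chosen only as an example
    print(f"{offset * 1000:.0f} ms takes about "
          f"{slew_duration_seconds(offset):.0f} s to slew")
```

At this rate a 100 ms offset needs roughly 200 seconds, so from cron's point of view the correction is spread over several one-minute ticks instead of appearing as a jump.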
Interesting information:
RFC5905: Network Time Protocol Version 4: Protocol and Algorithms Specification
The NTP FAQ and Howto
https://wiki.gentoo.org/wiki/Cron/en
You're actually asking two related questions. The general answer is it depends[1], but I'll answer based on the Debian Linux installation I'm on right now:
How does cron handle DST changes and other 'special' time-related events?
On my Debian Linux system cron handles 'DST and other time-related changes/fixes' (per the man page) so that jobs don't get run twice or skipped due to changes like DST. (See https://debian-handbook.info/browse/stable/sect.task-scheduling-cron-atd.html for more specifics) Related to the 5th point raised in your second question, I would expect these same facilities to deal with NTP-related time jumps but don't know for certain.
How often is cron triggered and how quickly does it pick up my crontab changes?
Again, on my Debian Linux system the cron daemon wakes up once a minute and will detect and use any crontab changes made since the previous check/run one minute ago. Note that there is no guarantee that cron fires at 12:00:00 or 12:00:59 or any specific time in between (only that it fires when the time is 12:00:??), so if you change a crontab at 12:00:17 but cron fired at 12:00:13, your changes will not be picked up until the next run (most likely at 12:01:13, though there might be a slight amount of variance due to the Linux scheduler).
[1] It Depends...
The precise answer absolutely depends both on the platform (Linux/Unix/BSD/OS X/Windows) and the particular implementation of cron (there have been several over the decades with derivatives of Vixie cron being prevalent on Linux and BSD per https://en.wikipedia.org/wiki/Vixie_cron). If you're running something other than Linux, the man page / documentation for your implementation should provide details as to the specifics of how often it runs, picks up modified crontabs, DST handling etc. If you really need to know the specific details, df778899 is right in that you should look at the source code for your implementation as needed... because sometimes software/documentation is buggy.
On macOS:
$> man cron
...
Available options:
-s Enable special handling of situations when the GMT offset of the local timezone changes, such as the switches between the standard time and daylight saving time.
The jobs run during the GMT offset changes time as intuitively expected. If a job falls into a time interval that disappears (for example, during the switch from standard time) to daylight saving time
or is duplicated (for example, during the reverse switch), then it is handled in one of two ways:
The first case is for the jobs that run every at hour of a time interval overlapping with the disappearing or duplicated interval. In other words, if the job had run within one hour before the GMT
offset change (and cron was not restarted nor the crontab(5) changed after that) or would run after the change at the next hour. They work as always, skip the skipped time or run in the added time as
usual.
The second case is for the jobs that run less frequently. They are executed exactly once, they are not skipped nor executed twice (unless cron is restarted or the user's crontab(5) is changed during
such a time interval). If an interval disappears due to the GMT offset change, such jobs are executed at the same absolute point of time as they would be in the old time zone. For example, if exactly
one hour disappears, this point would be during the next hour at the first minute that is specified for them in crontab(5).
-o Disable the special handling of situations when the GMT offset of the local timezone changes, to be compatible with the old (default) behavior. If both options -o and -s are specified, the option
specified last wins.

Linux task schedule to Hour, minute, second

I'm trying to run a shell script at a specific time, down to the second (H:M:S), but so far all programs such as at only go down to a specific minute (not second).
I don't want to use sleep since it's not accurate. For some reason it ended a couple of hours earlier than it was supposed to!
Your question doesn't seem to define accuracy, but there is always some jitter in scheduling in electronic devices. You might use quartz to schedule to the second. You could also use at or cron to schedule to the minute and then sleep the appropriate number of second(s).
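The "schedule to the minute, then sleep the remaining seconds" idea needs the number of seconds until the target time. A minimal Python sketch of that calculation follows. `seconds_until` is a hypothetical helper, and it assumes naive local datetimes, so it ignores DST changes between now and the target.

```python
from datetime import datetime, time as dtime, timedelta


def seconds_until(target: dtime, now: datetime) -> float:
    """Seconds from `now` until the next occurrence of `target` (H:M:S)."""
    candidate = datetime.combine(now.date(), target)
    if candidate <= now:
        # Target already passed today, so aim for tomorrow.
        candidate += timedelta(days=1)
    return (candidate - now).total_seconds()
```

A script started by cron or at in the right minute could then call `time.sleep(seconds_until(dtime(11, 30, 15), datetime.now()))` before doing its work. Some scheduling jitter remains, as noted above.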

cron jobs: Monitor time it takes for jobs to finish

I'm doing a research project that requires I monitor cron jobs on a Ubuntu Linux system. I have collected data about the jobs' tasks and when they are started, I just don't know of a way to monitor how long they take to finish running.
I could calculate the time of finishing the task minus starting it with something like this but that would require doing that on the Shell scripts of each cron job. That's not necessarily difficult by any means but it seems a little silly that cron wouldn't in some way log this, so I'm trying to find an easier way :P
tl;dr Figure out time cron jobs take from start to finish
You could just put time in front of your crontabs, and if you're getting notifications about cron script outputs, it'll get sent to you.
For example, if you had:
0 1,13 * * * /maint/run_webalizer.sh
add time in front
0 1,13 * * * time /maint/run_webalizer.sh
and you'll get some output that looks like (the "real" is the time you want):
real 3m1.255s
user 0m37.890s
sys 0m3.492s
If you don't get cron notifications, you can just redirect the output to a file.
man time. Maybe you can create a wrapper and tell Cron to use it as your "shell" or something like that.
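One way to read the wrapper idea is a small script that runs the real command and reports how long it took. The sketch below is only an assumption of what such a wrapper could look like: `run_timed` is a made-up name, and it wraps a single command instead of replacing cron's shell.

```python
import subprocess
import sys
import time


def run_timed(argv: list[str]) -> tuple[int, float]:
    """Run a command and return (exit code, elapsed seconds)."""
    start = time.monotonic()
    result = subprocess.run(argv)
    return result.returncode, time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    code, elapsed = run_timed(sys.argv[1:])
    # Written to stderr so cron mails or logs it along with the job output.
    print(f"{' '.join(sys.argv[1:])}: {elapsed:.3f}s (exit {code})",
          file=sys.stderr)
    sys.exit(code)
```

Saved under a hypothetical path such as /maint/cron_timer.py, it would be used in the crontab as: 0 1,13 * * * python3 /maint/cron_timer.py /maint/run_webalizer.sh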
Cronitor (https://cronitor.io) is a tool I built exactly for this purpose. It uses http requests to record the start and end of your jobs.
You'll be notified if your job doesn't run on schedule, or if it runs for too long/too short. You can also configure it to send alerts to you via email, sms, but also Slack, Hipchat, Pagerduty and others.
I use the Jenkins CI to do this via its external-monitor-job plugin. Jenkins can track start and end times, track overall execution time over time, save the output of all jobs it tracks, and present success/failure conditions graphically.
https://wiki.jenkins-ci.org/display/JENKINS/Monitoring+external+jobs

whether to use job scheduler or sleep() function

I am unsure whether to use the cron job scheduler or the sleep function in the program itself. There are previous questions on this, but I seem to have somewhat different requirements from them.
I need some information from the previous run of the program, so if I use cron to schedule the job I would have to store that information somewhere and re-read it next time (this can make the program less scalable if the size of this information grows).
I can also use sleep() but that will be using resources.
I will need to re-run the program every 10 mins or so. Which one is better to use?
Is there any other nice way of doing it which I may be missing?
In general you should use cron whenever you can for something like this.
The only problem I can foresee is if your program somehow took longer than 10 minutes to run: cron is going to start the next execution 10 minutes later anyway, so two runs would overlap. This is basically a really long race condition, whereas with sleep the program would only start sleeping after the previous execution ended.
But assuming your program will take less time to run, I say go with cron.
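If overlapping runs are a concern, a common guard is to take an exclusive lock at startup and exit when it is already held. Below is a minimal sketch, assuming a Unix-like system (it uses `fcntl.flock`); the lock file name and the `try_lock` helper are made up for illustration.

```python
import fcntl
import os
import tempfile

# Hypothetical lock location; any path writable by the job would do.
DEFAULT_LOCK = os.path.join(tempfile.gettempdir(), "my_cron_job.lock")


def try_lock(path: str = DEFAULT_LOCK):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle  # keep it open for the whole run; closing releases the lock
```

The job would call `try_lock()` first and simply exit if it returns None, so a run started by cron while the previous one is still going does nothing.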

How to define frequency of a job in application by users?

I have an application that has to launch jobs repeatedly. But (yes, that would have been too easy without a but...) I would like users to define their backup frequency in the application.
In worst case, they would have to choose between :
weekly,
daily,
every 12 hours,
every 6 hours,
hourly
In best case, they should be able to use crontab expressions (see documentation for example)
How to do this? Do I launch a job every minute that checks the last execution time and the frequency, and then launches another job if needed? Do I create a sort of queue that will be executed by a master job?
Any clues, ideas, opinions, best practices, experiences are welcome!
EDIT: Solved this problem using the Akka scheduler. OK, this is a technical solution, not a design answer, but still everything works great.
Each user-defined repetition is an actor that sends a message every period to a new actor, which executes the actual job.
There may be two ways to do this depending on your requirements/architecture:
If you can only use Play:
The user creates the job and the frequency it will run (crontab, whatever).
On saving the job, you calculate the first time it will have to be run. You then add an entry to a table JOBS with the execution time, job id, and any other information required. This is required as Play is stateless and information must be stored in the DB for later retrieval.
You have a job that queries the table for entries whose execution date is less than now. Retrieves the first, runs it, removes it from the table and adds a new entry for next execution. You should keep some execution counter so if a task fails (which means the entry is not removed from DB) it won't block execution of the other tasks by the job trying again and again.
The frequency of this job is set to run every second. That way while there is information in the table, you should execute the request around as often as they are required. As Play won't spawn a new job while the current one is working if you have enough tasks this one job will serve all. If not, it will be killed at some point and restored when required.
Of course, the crons of the users will not be too precise, as you have to account for your own cron delays plus execution delays on all the tasks in the queue, which will be run sequentially. Not the best approach, unless you somehow disallow crons which run every second or more often than every minute (to be safe). Checking the execution time of the crons and killing them if they run longer than a certain amount of time would be a good idea.
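The table-based approach described above can be sketched independently of Play. The following Python example uses an in-memory list in place of the JOBS table. `JobEntry`, `run_due_jobs` and the choice to reschedule relative to the current time are assumptions for illustration, not part of Play or of the answer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class JobEntry:
    job_id: str
    next_run: datetime      # the "execution time" column
    interval: timedelta     # derived from the user's chosen frequency


def run_due_jobs(table: list[JobEntry], now: datetime,
                 execute: Callable[[str], None]) -> list[str]:
    """Run every entry that is due, oldest first, and reschedule it."""
    ran = []
    for entry in sorted(table, key=lambda e: e.next_run):
        if entry.next_run > now:
            break  # remaining entries are all in the future
        execute(entry.job_id)
        # Replaces "remove the entry and add a new one for the next run".
        entry.next_run = now + entry.interval
        ran.append(entry.job_id)
    return ran
```

The polling job would call `run_due_jobs(table, datetime.now(), start_backup)` on each tick, where `start_backup` stands for whatever actually launches the user's job.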
If you can use more than Play:
The better alternative, I believe, is to use Quartz (see this) to create a future execution when the user creates the job, and reprogram it once the execution is over.
There was a discussion on google-groups about it. As far as I remember, you must define a job which starts every 6 hours and checks which backups must be done. So you must remember when the last backup job finished and do the checking yourself. I'm unsure if Quartz can handle such a requirement.
I looked in the source code (always a good source ;-)) and found a method every, which I think should do what you want. However, I'm unsure if this is a clever design, because if you have 1000 users you will then have 1000 jobs. I'm unsure if Play was built to handle such a large number of jobs.
[Update] For cron-expressions you should have a look into JobPlugin.scheduleForCRON()
There are several ways to solve this.
If you don't have a really huge load of jobs, I'd just persist them to a table using the required flexibility. Then check all of them every hour (or the lowest interval you support) and run those eligible. Simple.
Or, if you prefer to use cron syntax anyway, just write (export) jobs to a user crontab using a wrapper which calls back to your running app, or starts the job in a standalone process if that's possible.
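For the crontab-export variant, the fixed frequency choices from the question map directly onto cron expressions. Here is a small sketch of generating the crontab lines. The specific times (midnight, Sunday, minute 0) and the example command path are arbitrary assumptions.

```python
# One possible mapping; the chosen minute, hour and weekday are arbitrary.
CRON_EXPRESSIONS = {
    "weekly": "0 0 * * 0",          # Sundays at 00:00
    "daily": "0 0 * * *",           # every day at 00:00
    "every 12 hours": "0 */12 * * *",
    "every 6 hours": "0 */6 * * *",
    "hourly": "0 * * * *",
}


def crontab_line(frequency: str, command: str) -> str:
    """Build one crontab line for a user-selected frequency."""
    return f"{CRON_EXPRESSIONS[frequency]} {command}"


if __name__ == "__main__":
    # /opt/app/run_backup.sh is a hypothetical wrapper that calls back
    # into the running application.
    print(crontab_line("every 6 hours", "/opt/app/run_backup.sh 42"))
```

Users who are allowed to enter raw crontab expressions would bypass the mapping, in which case the expression should be validated before it is written out.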
