How does cron internally schedule jobs? - linux

How do "modern" cron daemons internally schedule their jobs? Some cronds used to schedule a run every so often via at. So after a crontab is written out, does crond:
Parse the crontab for all future events and then sleep for the intervals?
Poll an aggregated crontab database every minute to determine if the current time matches the schedule pattern?
Other?
Thanks,

A few crickets heard on this question. Good ol' RTFC, with some discrete event simulation papers and Wikipedia:
http://en.wikipedia.org/wiki/Cron#Multi-user_capability
The algorithm used by this cron is as follows:
1) On start-up, look for a file named .crontab in the home directories of all account holders.
2) For each crontab file found, determine the next time in the future that each command is to be run.
3) Place those commands on the Franta-Maly event list with their corresponding time and their "five field" time specifier.
4) Enter main loop:
   a) Examine the task entry at the head of the queue, compute how far in the future it is to be run.
   b) Sleep for that period of time.
   c) On awakening and after verifying the correct time, execute the task at the head of the queue (in background) with the privileges of the user who created it.
   d) Determine the next time in the future to run this command and place it back on the event list at that time.
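The main loop quoted above can be sketched in Java against a simulated clock, so that it runs instantly. This is only an illustration: the class EventListSketch, the Event type, and the "every N minutes" schedule are made up here and stand in for real crontab parsing, and a plain PriorityQueue stands in for the Franta-Maly event list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch of the sleep-until-next-event loop, replayed on a simulated clock. */
class EventListSketch {

    /** A queued command plus the minute at which it should next run (hypothetical type). */
    static final class Event {
        final String command;
        final int everyMinutes;
        final long nextRun;

        Event(String command, int everyMinutes, long nextRun) {
            this.command = command;
            this.everyMinutes = everyMinutes;
            this.nextRun = nextRun;
        }
    }

    /** Replays the loop up to and including untilMinute; returns "minute:command" entries. */
    static List<String> simulate(List<Event> initial, long untilMinute) {
        // Min-heap on next run time; ties are broken by command name for a stable order.
        PriorityQueue<Event> queue = new PriorityQueue<>((a, b) ->
                a.nextRun != b.nextRun
                        ? Long.compare(a.nextRun, b.nextRun)
                        : a.command.compareTo(b.command));
        queue.addAll(initial);

        List<String> log = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().nextRun <= untilMinute) {
            Event head = queue.poll();
            // A real daemon would sleep here until head.nextRun, then run the command.
            long now = head.nextRun;
            log.add(now + ":" + head.command);
            // Compute the next due time and put the command back on the event list.
            queue.add(new Event(head.command, head.everyMinutes, now + head.everyMinutes));
        }
        return log;
    }

    public static void main(String[] args) {
        List<Event> jobs = List.of(new Event("backup", 3, 3), new Event("ping", 2, 2));
        System.out.println(simulate(jobs, 6)); // [2:ping, 3:backup, 4:ping, 6:backup, 6:ping]
    }
}
```

The daemon only ever looks at the head of the queue, so the cost per wake-up does not depend on how many crontab entries exist.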

I wrote a blog post describing it.
Quoting the relevant text from there:
We can have a finite thread-pool which will execute all the tasks by picking them up from a PriorityBlockingQueue (thread-safe heap) prioritized on job.nextExecutionTime().
Meaning that the top element of this heap will always be the one that will fire the soonest.
We will be following the standard threadpool producer-consumer pattern.
We will have one thread which will be running in an infinite loop and submitting new jobs to the thread pool after consuming them from the queue.
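Let's call it QueueConsumerThread:
void goToSleep(Job job, PriorityBlockingQueue<Job> jobQueue) {
    jobQueue.add(job); // not due yet, so put it back
    try {
        Thread.sleep(Math.max(0, job.nextExecutionTime() - getCurrentTime()));
    } catch (InterruptedException e) {
        // Woken early; the main loop re-checks the head of the queue.
    }
}
void executeJob(Job job, PriorityBlockingQueue<Job> jobQueue) {
    threadpool.submit(job); // async call
    if (job.isRecurring()) {
        job = job.copy().setNextExecutionTime(getCurrentTime() + job.getRecurringInterval());
        jobQueue.add(job);
    }
}
@Override
public void run() {
    while (true) {
        Job job;
        try {
            job = jobQueue.take(); // blocks while the queue is empty
        } catch (InterruptedException e) {
            continue; // interrupted while waiting; try again
        }
        if (job.nextExecutionTime() > getCurrentTime()) {
            // Nothing to do yet
            goToSleep(job, jobQueue);
        } else {
            executeJob(job, jobQueue);
        }
    }
}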
There will be one more thread which will be monitoring the crontab file for any new job additions and will push them to the queue.
Let's call it QueueProducerThread:
@Override
public void run() {
    while (true) {
        Job newJob = getNewJobFromCrontabFile(); // blocking call
        jobQueue.add(newJob);
    }
}
However, there is a problem with this:
Imagine that QueueConsumerThread is sleeping and will wake up after an hour.
Meanwhile, a new task arrives which is supposed to run every minute.
This new task will not be able to start executing until an hour later.
To solve this problem, we can have QueueProducerThread forcefully wake QueueConsumerThread from its sleep whenever the new task has to run sooner than the front task in the queue:
@Override
public void run() {
    while (true) {
        Job newJob = getNewJobFromCrontabFile(); // blocking call
        jobQueue.add(newJob);
        if (newJob == jobQueue.peek()) {
            // The new job is the one that will be scheduled next.
            // So wake up the consumer thread so that it does not oversleep.
            consumerThread.interrupt();
        }
    }
}
Note that this might not be how cron is implemented internally.
However, this is the most efficient design that I can think of.
It requires no polling and all threads sleep until they need to do any work.
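As a point of comparison, the JDK's java.util.concurrent.DelayQueue covers the oversleeping case described above without an explicit interrupt: take() blocks until the head element is due, and inserting an element that becomes the new head wakes the waiting thread. The sketch below swaps that class in for the PriorityBlockingQueue-plus-interrupt design; the Job class, the job names, and the delays are invented for the example.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Sketch: a consumer waiting on a far-off job is woken when an earlier job arrives. */
class DelayQueueSketch {

    /** Hypothetical job type; only the due time matters for the queue. */
    static final class Job implements Delayed {
        final String name;
        final long dueNanos;

        Job(String name, long delayMillis) {
            this.name = name;
            this.dueNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(dueNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    /** Queues a slow job, adds a sooner one from another thread, returns the name taken first. */
    static String firstJobTaken() {
        DelayQueue<Job> queue = new DelayQueue<>();
        queue.put(new Job("hourly", 2000)); // stands in for a job that is far in the future
        Thread producer = new Thread(() -> queue.put(new Job("every-minute", 100)));
        producer.start();
        try {
            Job first = queue.take(); // blocks until the earliest job is due
            producer.join();
            return first.name;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstJobTaken()); // every-minute
    }
}
```

The consumer loop then reduces to calling take() and handing the job to the thread pool, at the cost of tying the job type to the Delayed interface.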

Related

Interrupting all tasks in an ExecutorService/ThreadPoolExecutor without shutting it down

Is there a way to interrupt all active tasks in an Executor?
I can't believe I have to keep every previously submitted task stored just to perform such a common operation. And what if I am using CompletableFutures, which, unlike Future, have no control over the computation code? Do I really need to mess with complete() synchronization when I could simply tell the executor to do it for me?
I am looking for something asynchronous like:
// Easier than keeping a collection of tasks for both methods
// in a global variable and working out which ones have
// already run, are still running, or whatever
public void start(MyTask task) {
    executorService.submit(task);
}
public void stop() {
    executorService.cancelAnyTask();          // This, or
    executorService.interruptAnyActiveTask(); // even this
}
EDIT: I want to interrupt (or cancel) the active tasks. I don't care about the queued ones; it really does not matter whether they are discarded or not. I just want to clear the executor's running work at a given moment, even if 1 ms later it starts executing queued tasks again.
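ExecutorService itself has no such method, but one possible approach, sketched here under assumptions, is to subclass ThreadPoolExecutor and use its beforeExecute and afterExecute hooks to remember which worker threads are currently running a task. The class name InterruptibleExecutor and the method interruptActiveTasks are hypothetical. Interruption is cooperative, so a task only stops if it reacts to it. There is also a small race in which an interrupt can land on a worker that has just moved on to its next task.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch: a pool that can interrupt running tasks without being shut down. */
class InterruptibleExecutor extends ThreadPoolExecutor {
    // Worker threads that are inside a task right now.
    private final Set<Thread> activeThreads = ConcurrentHashMap.newKeySet();

    InterruptibleExecutor(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void beforeExecute(Thread worker, Runnable task) {
        super.beforeExecute(worker, task);
        activeThreads.add(worker);
    }

    @Override
    protected void afterExecute(Runnable task, Throwable failure) {
        activeThreads.remove(Thread.currentThread());
        super.afterExecute(task, failure);
    }

    /** Interrupts every worker currently running a task; queued tasks are untouched. */
    void interruptActiveTasks() {
        for (Thread worker : activeThreads) {
            worker.interrupt();
        }
    }

    /** Starts one long task, interrupts it, and reports whether it saw the interrupt. */
    static boolean demo() {
        InterruptibleExecutor executor = new InterruptibleExecutor(2);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean wasInterrupted = new AtomicBoolean(false);
        try {
            executor.execute(() -> {
                started.countDown();
                try {
                    Thread.sleep(10_000);
                } catch (InterruptedException e) {
                    wasInterrupted.set(true);
                }
                finished.countDown();
            });
            started.await();
            executor.interruptActiveTasks();
            boolean done = finished.await(5, TimeUnit.SECONDS);
            return done && wasInterrupted.get() && !executor.isShutdown();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow(); // clean-up for the demo only
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```

Because the hooks sit on the pool rather than on the tasks, the same mechanism applies to work handed over via CompletableFuture.supplyAsync(supplier, executor).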

GCD dispatch_sync priority over previously queued dispatch_async

I have a class that wraps a data model and is accessed/modified by multiple threads. I need to make sure modification to data model is synchronized. I am using a dispatch_queue_create(..., DISPATCH_QUEUE_SERIAL). This is working really well for my needs.
Most of the methods on my class internally call "dispatch_async(queue, ^{...});". There are a few places where I need to return a snapshot result. This is a simplified example of how that looks:
- (NSArray*) getSomeData {
    __block NSArray* result = nil;
    dispatch_sync(queue, ^{
        ... Do Stuff ...
        result = blah.blah;
    });
    return result;
}
Now, let's assume that 5 "async tasks" are queued and one is currently executing. Now a "sync" task is scheduled. When will the "sync task" execute?
What I would like to happen is for the "sync task" to be executed ahead of any pending "async tasks". Is this what happens by default? If not, is there a way to priority-queue the "sync task"?
BTW, I know I can set an overall queue priority, but that is not what this question is about. For me, normal queue priority is just fine. I just want my synchronous tasks to happen before any pending asynchronous tasks.
There's not a generic setting for "perform sync tasks first" or for setting relative priority between enqueued blocks in a single queue. To recap what may be obvious, a serial queue is going to work like a queue: first in, first out. That said, it's pretty easy to conceive of how you might achieve this effect using multiple queues and targeting. For example:
dispatch_queue_t realQueue = dispatch_queue_create(NULL, DISPATCH_QUEUE_SERIAL);
dispatch_queue_t asyncOpsQueue = dispatch_queue_create(NULL, DISPATCH_QUEUE_SERIAL);
dispatch_set_target_queue(asyncOpsQueue, realQueue);
for (NSUInteger i = 0; i < 10; i++)
{
    dispatch_async(asyncOpsQueue, ^{
        NSLog(@"Doing async work block %@", @(i));
        sleep(1);
    });
}
// Then whenever you have high priority sync work to do, stop the async
// queue, do your work, and then restart it.
dispatch_suspend(asyncOpsQueue);
dispatch_sync(realQueue, ^{
    NSLog(@"Doing sync work block");
});
dispatch_resume(asyncOpsQueue);
One thing to know is that an executing block effectively can't be canceled/suspended/terminated (from the outside) once it's begun. So any async enqueued block that's in flight has to run to completion before your sync block can start, but this arrangement of targeting allows you to pause the flow of async blocks and inject your sync block. Note that it also doesn't matter that you're doing a sync block. It could, just as easily, be an async block of high priority, but in that case you would probably want to move the dispatch_resume into the block itself.

.NET - Multiple Timers instances mean Multiple Threads?

I already have a Windows service running with a System.Timers.Timer that does specific work. Now I want several pieces of work to run at the same time, on different threads.
I've been told to create a separate System.Timers.Timer instance for each. Is this correct? Will the work run in parallel this way?
For instance:
System.Timers.Timer tmr1 = new System.Timers.Timer();
tmr1.Elapsed += new ElapsedEventHandler(DoWork1);
tmr1.Interval = 5000;
System.Timers.Timer tmr2 = new System.Timers.Timer();
tmr2.Elapsed += new ElapsedEventHandler(DoWork2);
tmr2.Interval = 5000;
Will tmr1 and tmr2 run on different threads so that DoWork1 and DoWork2 can run at the same time, i.e., concurrently?
Thanks!
It is not incorrect.
Be careful. System.Timers.Timer raises each Elapsed event on a thread-pool thread. You'll get in trouble when your Elapsed event handler takes too long: the handler will be called again on another thread, even though the previous call hasn't completed yet. This tends to produce hard-to-diagnose bugs, which you can avoid by setting the AutoReset property to false. Also be sure to use try/catch in your event handler, because exceptions are swallowed without any diagnostic.
Multiple timers might mean multiple threads. If two timer ticks occur at the same time (i.e. one is running and another fires), those two timer callbacks will execute on separate threads, neither of which will be the main thread.
It's important to note, though, that the timers themselves don't "run" on a thread at all. The only time a thread is involved is when the timer's tick or elapsed event fires.
On another note, I strongly discourage you from using System.Timers.Timer. The timer's elapsed event squashes exceptions, meaning that if an exception escapes your event handler, you'll never know it. It's a bug hider. You should use System.Threading.Timer instead. System.Timers.Timer is just a wrapper around System.Threading.Timer, so you get the same timer functionality without the bug hiding.
See "Swallowing exceptions is hiding bugs" for more info.
Will tmr1 and tmr2 run on different threads so that DoWork1 and DoWork2 can run at the same time, i.e., concurrently?
At the start, yes. However, what guarantees that both DoWork1 and DoWork2 will finish within 5 seconds? Perhaps you know the code inside DoWorkX and assume it will finish within the 5-second interval, but the system may be under load and one of the items may take more than 5 seconds. That breaks the assumption that both DoWorkX calls start cleanly together on subsequent ticks: even though the subsequent start times stay in sync, the new execution can overlap with work that is still running from the last tick.
If you instead disable and re-enable the respective timer inside each DoWorkX, the start times of the two timers will drift out of sync with each other, and the callbacks may even end up scheduled on the same thread one after the other. So, if you are OK with subsequent start times not being in sync, my answer ends here.
If not, this is something you can attempt:
static void Main(string[] args)
{
    var t = new System.Timers.Timer();
    t.Interval = TimeSpan.FromSeconds(5).TotalMilliseconds;
    t.Elapsed += (sender, evtArgs) =>
    {
        var timer = (System.Timers.Timer)sender;
        timer.Enabled = false; //disable till work done
        // attempt concurrent execution
        Task work1 = Task.Factory.StartNew(() => DoWork1());
        Task work2 = Task.Factory.StartNew(() => DoWork2());
        Task.Factory.ContinueWhenAll(new[]{work1, work2},
            _ => timer.Enabled = true); // re-enable the timer for next iteration
    };
    t.Enabled = true;
    Console.ReadLine();
}
Kind of. First, check out the MSDN page for System.Timers.Timer: http://msdn.microsoft.com/en-us/library/system.timers.timer.aspx
The section you need to be concerned with is quoted below:
If the SynchronizingObject property is null, the Elapsed event is
raised on a ThreadPool thread. If processing of the Elapsed event
lasts longer than Interval, the event might be raised again on another
ThreadPool thread. In this situation, the event handler should be
reentrant.
Basically, this means that where the Timer's action gets run is not such that each Timer has its own thread, but rather that by default, it uses the system ThreadPool to run the actions.
If you want things to kick off at the same time and run concurrently, you cannot just attach multiple handlers to the Elapsed event. For example, I tried this in VS2012:
static void testMethod(string[] args)
{
    System.Timers.Timer mytimer = new System.Timers.Timer();
    mytimer.AutoReset = false;
    mytimer.Interval = 3000;
    mytimer.Elapsed += (x, y) => {
        Console.WriteLine("First lambda. Sleeping 3 seconds");
        System.Threading.Thread.Sleep(3000);
        Console.WriteLine("After sleep");
    };
    mytimer.Elapsed += (x, y) => { Console.WriteLine("second lambda"); };
    mytimer.Start();
    Console.WriteLine("Press any key to go to end of method");
    Console.ReadKey();
}
The output was this:
Press any key to go to end of method
First lambda. Sleeping 3 seconds
After sleep
second lambda
So it executes them consecutively not concurrently. So if you want "a bunch of things to happen" upon each timer execution, you have to launch a bunch of tasks (or queue up the ThreadPool with Actions) in your Elapsed handler. It may multi-thread them, or it may not, but in my simple example, it did not.
Try my code yourself, it's quite simple to illustrate what's happening.

Node worker process / cron job advice

I have a database of items that I need to update — or rather just perform upon — every so often. I am using a message queue (Kue) to handle the concurrency of these jobs, but my process which goes about adding the jobs to the queue looks like this:
setInterval(function () {
    feed.find({}, function (error, foundModels) {
        jobs.create('update feeds', {
            feeds: foundModels
        }).save()
    })
}, 6000)
Is polling like this the best way to add the jobs to the queue, do you think? Or should each feed be on its own timer (for example, every job will spawn another job 6 seconds after it's finished)?
I usually do it the way you've done it. In your case, it always pushes jobs at 6 second intervals. This is fine so long as your jobs don't take more than 6 seconds. If your jobs take more than 6 seconds then you'll start to get a backlog and you'll need to increase resources to handle the larger load. It can be a problem if resource usage spikes and you're not around to adjust for the spike and you don't have automated processes in place (you should).
The alternative is to only call your function 6 seconds after the last call returns. You'd do that like so:
function update() {
    feed.find({}, function (error, foundModels) {
        jobs.create('update feeds', {
            feeds: foundModels
        }).save(function() {
            setTimeout(update, 6000);
        });
    });
}
setTimeout(update, 6000);
I made the assumption that your .save method takes a callback like all good asynchronous libraries do. :-)

Not sure how to handle multiple timed threads in a Windows service

This is my first time putting together a Windows service application and I've been doing a bit of reading on different approaches to running multiple tasks in some manner of timed intervals. One being once a day and another every 1 min.
What I concluded was to use a TimerCallback with a System.Threading.Timer.
Non-elegant example
// Timers are kept in fields so that OnStop can dispose them
// and they are not garbage collected after OnStart returns.
private System.Threading.Timer dailyTimer;
private System.Threading.Timer intervalTimer;
private void DailyTask(object state) {
    //do something daily
}
private void IntervalTask(object state) {
    //do something else
}
private void OnStart() {
    TimerCallback dailyTcb = DailyTask;
    TimerCallback intervalTcb = IntervalTask;
    dailyTimer = new System.Threading.Timer(dailyTcb, null, 0, 86400000);
    intervalTimer = new System.Threading.Timer(intervalTcb, null, 0, 60000);
}
private void OnStop() {
    intervalTimer.Dispose();
    dailyTimer.Dispose();
}
Questions
1) Does the timer start the clock following the completion of the task? That would slowly cause it to creep past a day if every run took some amount of time to complete, in which case I assume I would need to subtract how long the run took from 24 hours.
2) Is this a perfectly fine approach for 2 simple tasks?
Thanks
The timer does not wait until after completion of the task to restart the timer. For example, if you set your timer to 20 milliseconds and the callback took more than 20 milliseconds to process, you would get another tick before the first one finished.
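To make the difference concrete, here is a small sketch that only does the arithmetic for both behaviours. It is written in Java rather than C#, it does not use any timer API, and the class and method names are made up for illustration. It compares start times when ticks are scheduled on a fixed period (what the answer above describes) with start times when the wait only begins after the callback completes (the drift the question worries about).

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: start times (in ms) of successive ticks under two scheduling policies. */
class TimerDriftSketch {

    /** Ticks fire every periodMs regardless of how long the callback takes. */
    static List<Long> fixedRateStarts(long periodMs, int ticks) {
        List<Long> starts = new ArrayList<>();
        for (int i = 0; i < ticks; i++) {
            starts.add(i * periodMs);
        }
        return starts;
    }

    /** The next wait only begins once the callback (taking taskMs) has finished. */
    static List<Long> afterCompletionStarts(long periodMs, long taskMs, int ticks) {
        List<Long> starts = new ArrayList<>();
        long start = 0;
        for (int i = 0; i < ticks; i++) {
            starts.add(start);
            start += taskMs + periodMs; // each run pushes all later runs back by taskMs
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(fixedRateStarts(60_000, 4));              // [0, 60000, 120000, 180000]
        System.out.println(afterCompletionStarts(60_000, 5_000, 4)); // [0, 65000, 130000, 195000]
    }
}
```

With a fixed period there is no creep to compensate for, but a callback that runs longer than the period will overlap with the next tick.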
I don't see anything particularly wrong with your idea. However, if that's all your service is doing (i.e. just sitting there, waiting to fire once per interval), you might consider making it a simple console mode program and using scheduled tasks to execute it. Check out Windows Task Scheduler and the schtasks command line tool.

Resources