When a task scheduler (e.g. cron) fires a task (e.g. a cron job), does it do so by "polling" the clock every minimum period (e.g. every second), or does it register a callback that gets "pushed" when the time comes?
If it is push/callback, how does the underlying platform (e.g. Linux) do it? Is there a "hardware interrupt", or another callback mechanism, for time-based events?
So, how does a task scheduler fire a job?
From the man pages:
The cron utility then wakes up every minute, examining all stored crontabs, checking each command to see if it should be run in the current minute. When executing commands, any output is mailed to the owner of the crontab (or to the user named in the MAILTO environment variable in the crontab, if such exists).
The cron on Version 7 Unix used a straightforward algorithm:
1) Read /usr/etc/crontab
2) Determine if any commands must run at the current date and time, and if so, run them as the superuser, root.
3) Sleep for one minute
4) Repeat from step 1.
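This Version 7 polling loop can be sketched in Python (a toy illustration; real crontab fields also support ranges and steps, which this matcher ignores):

```python
import time

def field_matches(field, value):
    """Return True if a single crontab field (e.g. '*', '5', '1,15') matches value."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}

def entry_due(entry, now):
    """entry: (minute, hour, day, month, weekday, command); now: time.struct_time."""
    minute, hour, day, month, weekday, _cmd = entry
    return (field_matches(minute, now.tm_min) and
            field_matches(hour, now.tm_hour) and
            field_matches(day, now.tm_mday) and
            field_matches(month, now.tm_mon) and
            field_matches(weekday, now.tm_wday))

def polling_cron(read_crontab, run):
    """The Version 7 algorithm: re-read, check everything, sleep a minute, repeat."""
    while True:
        now = time.localtime()
        for entry in read_crontab():
            if entry_due(entry, now):
                run(entry[-1])
        time.sleep(60)
```

Note that every iteration re-reads and re-checks every entry whether or not anything is due, which is exactly why this approach was heavy on multi-user systems.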
But this was heavy on the system and used a lot of resources in a multi-user environment, so a new algorithm was devised:
The algorithm used by this cron is as follows:
1) On start-up, look for a file named .crontab in the home directories of all account holders.
2) For each crontab file found, determine the next time in the future that each command must run.
3) Place those commands on the Franta-Maly event list with their corresponding time and their "five field" time specifier.
4) Enter the main loop:
Examine the task entry at the head of the queue, compute how far in the future it must run.
Sleep for that period of time.
On awakening and after verifying the correct time, execute the task at the head of the queue (in background) with the privileges of the user who created it.
Determine the next time in the future to run this command and place it back on the event list at that time value.
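The main loop above maps naturally onto a priority queue keyed on the next run time. A minimal sketch, with the clock and sleep injectable so the loop can be simulated:

```python
import heapq
import time

def event_loop(jobs, next_run, execute, now=time.time, sleep=time.sleep):
    """Event-list cron: keep jobs in a min-heap keyed on their next run time,
    sleep until the head entry is due, run it, then compute its next time
    and push it back onto the list."""
    counter = 0  # tie-breaker so jobs themselves never need to be comparable
    queue = []
    for job in jobs:
        queue.append((next_run(job, now()), counter, job))
        counter += 1
    heapq.heapify(queue)
    while queue:
        when, _, job = queue[0]
        delay = when - now()
        if delay > 0:
            sleep(delay)                 # wake only when something is due
        if now() >= when:                # verify the correct time after awakening
            heapq.heappop(queue)
            execute(job)                 # the real cron runs this in background
            counter += 1
            heapq.heappush(queue, (next_run(job, when), counter, job))
```

Unlike the Version 7 loop, this one sleeps exactly until the next event rather than waking every minute.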
Modern implementations are vixie-cron and anacron; these were later superseded by fcron. I don't have much insight into their implementation details.
It may depend on the implementation. Some do polling (as mentioned above), but some use an interrupt approach (check when the next task must run and set a system alarm).
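The "system alarm" variant can be demonstrated with POSIX timer signals: instead of waking every minute, the process asks the kernel to interrupt it when the next job is due (Unix-only sketch):

```python
import signal
import time

fired = []

def on_alarm(signum, frame):
    # Runs when the kernel delivers SIGALRM - the "push" side of scheduling.
    fired.append(time.monotonic())

signal.signal(signal.SIGALRM, on_alarm)
# Ask the kernel for a one-shot interrupt 100 ms from now; a real scheduler
# would use the delay until its next due job instead of a fixed value.
signal.setitimer(signal.ITIMER_REAL, 0.1)
signal.pause()  # sleep with no polling until a signal arrives
```

Between `setitimer` and the signal delivery the process consumes no CPU at all, which is the practical difference from the polling approach.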
The intent of this question was not about cron, but task scheduling in general, using cron as an example; sorry if this was not clear in the question statement.
I wanted to know how the lowest-level software does time-based scheduling: whether it must poll the hardware clock, or whether there is some sort of hardware interrupt for time-based events.
It turns out there is actually a hardware interrupt. From Wikipedia:
One typical use is to generate interrupts periodically by dividing the
output of a crystal oscillator and having an interrupt handler count
the interrupts in order to keep time. These periodic interrupts are
often used by the OS's task scheduler to reschedule the priorities of
running processes. Some older computers generated periodic interrupts
from the power line frequency because it was controlled by the
utilities to eliminate long-term drift of electric clocks.
http://en.wikipedia.org/wiki/Interrupt
So, although cron does polling (thanks @joshua-nelson), it is possible not to, and the OS itself does not.
Daemons are programs that run as background processes, and cron is a daemon that executes scheduled commands. To find these commands, cron looks into /etc/crontab or the files in /usr/lib/cron/tabs, and if any crontab entries exist there, cron executes them. The cron utility is launched by launchd, the process which replaces init as PID 1.
Related
I would like to know the correct way to visualize background jobs which run on a schedule controlled by a scheduling service.
In my opinion the correct way would be to have an action representing the scheduling itself, followed by a fork node which splits the flow for each of the respective scheduled jobs.
Example: on one schedule, Service X is supposed to collect data from an API every day; on another schedule, Service Y is supposed to aggregate the collected data.
I've tried to research older threads and find any diagram representing a similar activity.
Your current diagram says that:
first the scheduler does something (e.g. identifying the jobs to launch)
then passes control in parallel to all the jobs it wants to launch
no other jobs are scheduled after the scheduler finished its task
the first job that finishes interrupts all the others.
The way it should work would be:
first the scheduler is setup
then the setup launches the real scheduler, which will run in parallel to the scheduled jobs
scheduled jobs can finish (end flow, i.e. a circle with a X inside) without terminating everything
the activity would stop (flow final) only when the scheduler is finished.
Note that the UML specifications do not specify how parallelism is implemented. And neither does your scheduler: whether it is true parallelism using multithreaded CPUs or multiple CPUs, or whether it is time slicing where some interruptions are used to switch between tasks that are executed in reality in small sequential pieces is not relevant for this modeling.
The remaining challenges are:
the scheduler could launch additional jobs. One way of doing it could be to fork back to itself and to a new job.
the scheduler could launch a variable number of jobs in parallel. A better way to represent this is with a «parallel» expansion region, with the input corresponding to task objects, and actions that consume the tasks by executing them.
if the scheduler runs in parallel to the expansion region, you could also imagine that the scheduler provides additional input at any moment (new tasks to be processed).
The idle task (a.k.a. the swapper task) is chosen to run when there are no more runnable tasks in the run queue at the point of task scheduling. But what is the use of this special task? Another question: why can't I find this thread/process (PID 0) in the "ps aux" output from userland?
The reason is historical and programmatic. The idle task is the task that runs if no other task is runnable, like you said. It has the lowest possible priority, which is why it runs only when no other task is runnable.
Programmatic reason: this simplifies process scheduling a lot, because you don't have to care about the special case "what happens if no task is runnable?", since there is always at least one runnable task, the idle task. It also lets you account CPU time per task; without the idle task, which task would be charged for the CPU time nobody needs?
Historical reason: before we had CPUs that could step down or enter power-saving modes, the CPU HAD to run at full speed at all times, so it ran a series of NOP instructions if no tasks were runnable. Today, scheduling the idle task usually steps down the CPU by using the HLT instruction (halt), so power is saved. So the idle task still has a real function these days.
In Windows you can see the idle task in the process list, it's the idle process.
The linux kernel maintains a waitlist of processes which are "blocked" on IO/mutexes etc. If there is no runnable process, the idle process is placed onto the run queue until it is preempted by a task coming out of the wait queue.
The reason it has a task is so that you can measure (approximately) how much time the kernel is wasting due to blocks on IO / locks etc. Additionally it makes the code that much easier for the kernel as the idle task is the same as every task it needs to context switch, instead of a "special case" idle task which could make changing kernel behaviour more difficult.
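That per-task accounting is what makes idle time visible from userland: the aggregate `cpu` line of /proc/stat reports, among other counters, the ticks spent in the idle task (the 4th numeric field). A small parser, shown here against a sample line rather than a live /proc read:

```python
def idle_fraction(stat_cpu_line):
    """Given the aggregate 'cpu ...' line from /proc/stat, return the
    fraction of ticks spent in the idle task (the 4th numeric field)."""
    fields = stat_cpu_line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    times = [int(x) for x in fields[1:]]
    return times[3] / sum(times)

# Field order is: user nice system idle iowait irq softirq steal guest guest_nice.
sample = "cpu 1000 50 400 8000 300 0 250 0 0 0"
print(f"idle: {idle_fraction(sample):.0%}")  # prints "idle: 80%"
```

On a real system you would pass `open("/proc/stat").readline()` and take the difference between two samples, since the counters are cumulative since boot.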
There is actually one idle task per cpu, but it's not held in the main task list, instead it's in the cpu's "struct rq" runqueue struct, as a struct task_struct * .
This gets activated by the scheduler whenever there is nothing better to do (on that CPU) and executes some architecture-specific code to idle the cpu in a low power state.
You can use ps -ef and it will list the processes which are running; the first line will list PID 0, which is the swapper task.
I have an embedded system in which multiple user processes run simultaneously; as they are interdependent, they communicate via POSIX queues. The issue is that one of the processes takes a bit more time to complete a task (I don't know which process, or which section of code), because of which the other processes are delayed in completing their tasks.
How can I figure out which process is taking more time, and in which section of code? The system is a measuring device, so it cannot have any delay or spikes in the timing of processing. I tried changing the data rate of the entire system, but it does not help: the spikes still appear.
Is there any way in Linux to hook a system call when the process is scheduled in the same section of code and has exceeded a certain threshold of scheduling duration?
As per my understanding, schedulers do following items:
Calculate the time slice for the task(this could be algorithm dependent).
Switch tasks - an ideal scheduler would do this in O(1); a good scheduling algorithm provides O(log N) complexity. The criteria for picking the new task are again dependent on the scheduling algorithm.
My question is about pre-emption. For example, a new task is created and it needs to run right away (and it satisfies the conditions, for example it has a higher priority than the currently running task).
How will the scheduler know that a new task with a higher priority is available and needs to run? We need some controlling code in the kernel implementation which detects such a task entering and invokes the scheduler to save the state of the currently running task and schedule the new one. I would like to know more detail about such a software entity.
Additionally I would expect this code to be scheduled to run on CPU to control "scheduler" and make scheduler switch task.
Please advise how this is implemented or may be I have some gaps in my understanding.
Thanks in advance
The best way to understand this is to read a book like "The design of the X Operating System" where X is one of {Unix, Linux, BSD...}. You should find a chapter on Context Switches and a chapter on the Scheduler. You could also look at https://en.wikipedia.org/wiki/Context_switch and https://en.wikipedia.org/wiki/Scheduling_%28computing%29#Linux, but the book is probably better.
Basically, when user code does a system call (such as to create a new process, or to release a semaphore, or ...) or when you get a clock interrupt, or when you get some sort of other interrupt, the running state of the user process is always dumped out to memory so that kernel code can be run without messing up the user process. Once you have done this, the user process that was running isn't much different from any other runnable user process.
As part of the work required to service the system call, or interrupt, or whatever, the system can notice that there is a new runnable process or that some other process that was not runnable before is now runnable, and ask the scheduler to update its notion of the highest priority runnable process. It might also notice that a scheduling quantum has just expired, and ask the scheduler to run a complete reschedule.
Once the kernel code has done its stuff it will probably see that the scheduler has marked the highest priority runnable process, and the kernel code will read that process's state out of memory and return to it without worrying very much about whether it is the process that was running before the system call or whatever or not.
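A toy sketch of that flag-based flow, using made-up names (the real kernel uses a per-task TIF_NEED_RESCHED flag and far more involved run-queue structures):

```python
class ToyScheduler:
    """Illustration only: runnable tasks as (priority, name) pairs,
    lower number = higher priority."""

    def __init__(self):
        self.runnable = []
        self.need_resched = False
        self.current = "idle"

    def wake(self, priority, name):
        # Called from a syscall or interrupt handler when a task becomes runnable.
        self.runnable.append((priority, name))
        self.need_resched = True  # tell the return-to-user path to reschedule

    def return_to_user(self):
        # On the way out of the kernel, switch if a reschedule was requested.
        if self.need_resched and self.runnable:
            self.runnable.sort()
            _, self.current = self.runnable.pop(0)
            self.need_resched = False
        return self.current
```

The point of the flag is that nothing has to "run the scheduler" asynchronously: the check happens on the path the CPU already takes when leaving kernel mode.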
Exception: once upon a time machines worried about the cost of dumping and restoring floating point registers, which kernel mode didn't really need, because it could be written so that it never did floating point. In this case, the save/restore code might be written so that it didn't save the floating point registers unless it had to, and the kernel might check as part of the restore to see if it was switching to a new process, and needed to dump out and restore the floating point registers. For all I know, stuff might still do this, or there might be some more modern state that is only saved and restored when the process really is changing. But this is really just a detail in either case.
I want to have a real-time process take over my computer. :)
I've been playing a bit with this. I created a process which is essentially a while (1) (never blocks nor yields the processor) and used schedtool to run it with SCHED_FIFO policy (also tried chrt). However, the process was letting other processes run as well.
Then someone told me about sched_rt_runtime_us and sched_rt_period_us. So I set the runtime to -1 in order to make the real-time process take over the processor (and also tried making both values the same), but it didn't work either.
I'm on Linux 2.6.27-16-server, in a virtual machine with just one CPU. What am I doing wrong?
Thanks,
EDIT: I don't want a fork bomb. I just want one process to run forever, without letting other processes run.
There's another protection I didn't know about.
If you have just one processor and want a SCHED_FIFO process like this (one that never blocks nor yields the processor voluntarily) to monopolize it, besides giving it a high priority (not really necessary in most cases, but doesn't hurt) you have to:
Set sched_rt_runtime_us to -1 or to the value in sched_rt_period_us
If you have group scheduling configured, set /cgroup/cpu.rt_runtime_us to -1 (in case
you mount the cgroup filesystem on /cgroup)
Apparently, I had group scheduling configured and wasn't bypassing that last protection.
If you have N processors, and want your N processes to monopolize the processor, you just do the same but launch all of them from your shell (the shell shouldn't get stuck until you launch the last one, since it will have processors to run on). If you want to be really sure each process will go to a different processor, set its CPU affinity accordingly.
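The policy switch itself can be done from Python via `os.sched_setscheduler` (a sketch; it needs root or CAP_SYS_NICE, so this version degrades gracefully when unprivileged):

```python
import os

def go_realtime(priority=1):
    """Try to put this process under SCHED_FIFO at the given priority.
    Returns True on success, False if we lack the privilege."""
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except PermissionError:
        return False

if go_realtime():
    # On a single CPU, with the rt throttling and cgroup limits lifted as
    # described above, a busy loop here (`while True: pass`) would now
    # monopolize the processor.
    print("running under SCHED_FIFO")
else:
    print("not privileged; policy unchanged")
```

For the N-processor case, `os.sched_setaffinity(0, {cpu})` before the loop pins each copy to its own CPU.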
Thanks to everyone for the replies.
I'm not sure about schedtool, but if you successfully change the scheduler using sched_setscheduler to SCHED_FIFO, then run a task which does not block, then one core will be entirely allocated to the task. If this is the only core, no SCHED_OTHER tasks will run at all (i.e. anything except a few kernel threads).
I've tried it myself.
So I speculate that either your "non blocking" task was blocking, or your schedtool program failed to change the scheduler (or changed it for the wrong task).
Also, you can make your process SCHED_FIFO with a priority of 1. The process would then run forever and won't be pre-empted.