Linux process scheduling in kernel mode

Here is a description quoted from Wikipedia:
The Linux kernel provides preemptive scheduling under certain conditions. Until kernel version 2.4, only processes were preemptive, i.e., in addition to time quantum expiration, execution of the current process in user mode would be interrupted if higher dynamic-priority processes entered the TASK_RUNNING state. Towards Linux 2.6, the ability to interrupt a task executing kernel code was added, although not all sections of kernel code can be preempted.
It also says this:
Preemption improves latency, increases responsiveness, and makes Linux
more suitable for desktop and real-time applications. Older versions
of the kernel had a so-called big kernel lock for synchronization
across the entire kernel. This was finally removed by Arnd Bergmann in
2011
So does the above statement still hold for the current Linux kernel, i.e., that kernel preemption is conditional? For example, if a process traps into kernel mode by making a system call, will that process not be subject to preemptive scheduling?
Where can I find some up-to-date introductory articles or books about Linux scheduling in both user mode and kernel mode?

Of course kernel preemption is conditional. You would not want the kernel to switch tasks while holding an exclusive lock or while writing to time-sensitive hardware registers in a device driver.
However, the Linux kernel does its best to minimize these conditions in order to make preemption happen as quickly as it can.
Note that this in-kernel preemption is only compiled into the kernel when the compile option CONFIG_PREEMPT is set to yes. There is also CONFIG_PREEMPT_VOLUNTARY, which only switches tasks at points where the kernel explicitly checks for it.
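If you want to see which model your own kernel was built with, a hedged little C sketch like the one below works on distros that install /boot/config-$(uname -r) (otherwise look at /proc/config.gz); it just scans the config file for the CONFIG_PREEMPT options:

/* Hedged sketch: report how the running kernel was configured for
 * preemption by scanning /boot/config-$(uname -r).  Assumes the distro
 * installs that file; otherwise consult /proc/config.gz instead. */
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

int main(void)
{
    struct utsname u;
    char path[512], line[256];

    if (uname(&u) != 0) { perror("uname"); return 1; }
    snprintf(path, sizeof path, "/boot/config-%s", u.release);

    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return 1; }

    /* Print every CONFIG_PREEMPT* line that is set. */
    while (fgets(line, sizeof line, f))
        if (strncmp(line, "CONFIG_PREEMPT", 14) == 0)
            fputs(line, stdout);

    fclose(f);
    return 0;
}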
Kernel preemption comes with a cost. Rapidly switching tasks requires doing a lot of mostly wasted housekeeping work instead of actual work. This slows down the whole system and results in less work being done. That is why these compile options exist. A Linux kernel built for a database or web server should not use preemption at all. A kernel built for HPC is sometimes modified to only switch tasks once a second, or less.
That all changes for real-time tasks. These tasks rely on reacting quickly and within a reliable timeframe. The default Linux kernel is pretty good at this, but there is a patch set called the "-rt patches" that makes it really good. The patch set does all sorts of things like prioritize interrupt handlers and change kernel locks so that locks can be dropped and restarted later.
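To experiment with real-time scheduling from user space (no -rt patches required), here is a minimal sketch in plain C. It assumes Linux headers, needs root or CAP_SYS_NICE, and the priority value 50 is just an arbitrary example:

/* Hedged sketch: ask for the SCHED_FIFO real-time policy from user space.
 * Needs root or CAP_SYS_NICE; priority 50 is an arbitrary example. */
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct sched_param sp = { .sched_priority = 50 };   /* 1..99 for SCHED_FIFO */

    /* Request the real-time SCHED_FIFO policy for this process: it will
     * preempt normal SCHED_OTHER tasks and run until it blocks or yields. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
        fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
        return 1;
    }
    printf("now running under SCHED_FIFO, priority %d\n", sp.sched_priority);
    return 0;
}

A SCHED_FIFO task preempts all normal SCHED_OTHER tasks and keeps its CPU until it blocks, yields, or a higher-priority real-time task becomes runnable, so a runaway busy loop under this policy can make a CPU appear frozen.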

CPU scheduling decisions may take place when a process:
1. Switches from running to waiting state (e.g. I/O request)
2. Switches from running to ready state (e.g. Interrupt)
3. Switches from waiting to ready (e.g. I/O completion)
4. Terminates
Scheduling under 1 and 4 is non-preemptive; all other scheduling is preemptive and has to deal with the possibility that operations (system calls) may be incomplete.
Yes, Linux provides preemptive scheduling under certain conditions, unlike some Unix variants where the kernel runs until completion without preemption. In Linux 2.6, the kernel was made preemptive: a running task can be preempted as long as it is not holding a lock and it is safe to reschedule.
Older versions of the kernel had a so-called big kernel lock for synchronization
across the entire kernel.
This refers to a single coarse-grained lock (the Big Kernel Lock) that serialized execution of kernel code across the whole kernel, so only one CPU could run kernel code at a time.

Related

Would it make kernel-level threads clearly preferable to user-level threads if system calls were as fast as procedure calls?

Some web search results told me that the only deficiency of kernel-level threads is the slow speed of their management (create, switch, terminate, etc.). It seems that if operations on kernel-level threads all go through system calls, the answer to my question would be true. However, I've searched a lot to find out whether kernel-level thread management is done entirely through system calls, but found nothing. And I have always had an instinct that such management should be done by the OS automatically, because only the OS knows which thread would be suitable to run at a specific time. So it seems impossible for programmers to write explicit system calls to manage threads. I'd appreciate any ideas.
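For what it's worth, the basic lifecycle operations on Linux kernel-level threads are system calls under the hood: glibc's pthread_create() ultimately issues clone(), and pthread_join() waits on a futex. A minimal sketch (plain C, link with -pthread) that you can run under strace -f to watch the clone call happen:

/* Minimal sketch: creating a kernel-level thread from user space.
 * pthread_create() is a library wrapper; on Linux/glibc it ends in the
 * clone() system call, visible with `strace -f ./a.out`. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static void *worker(void *arg)
{
    printf("worker running, arg=%s\n", (const char *)arg);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    int err;

    /* One system call (clone) creates the thread; after that, when it runs
     * and on which CPU is decided by the kernel scheduler, not by user code. */
    err = pthread_create(&tid, NULL, worker, "hello");
    if (err != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
        return 1;
    }
    pthread_join(tid, NULL);    /* blocks in the futex() system call */
    return 0;
}

Once the thread exists, the instinct in the question is right: when it runs and on which CPU is decided by the kernel scheduler, not by further explicit calls from the program.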
Some web search results told me that the only deficiency of kernel-level threads is the slow speed of their management (create, switch, terminate, etc.).
It's not that simple. To understand, think about what causes task switches. Here's a (partial) list:
a device told a device driver that an operation completed (some data arrived, etc) causing a thread that was waiting for the operation to unblock and then preempt the currently running thread. For this case you're running kernel code when you find out that a task switch is needed, so kernel task switching is faster.
enough time passed; either causing an "end of time slice" task switch, or causing a sleeping thread to unblock and preempt. For this case you're running kernel code when you find out that a task switch is needed, so kernel task switching is faster.
the thread accessed virtual memory that isn't currently accessible, triggering the kernel's page fault handler which finds out that the current task has to wait while the kernel fetches data from swap space or from a file (if the virtual memory is part of a memory mapped file), or has to wait for the kernel to free up RAM by sending other pages to swap space (if virtual memory was involved in some kind of "copy on write"); causing a task switch because the currently running task can't continue. For this case you're running kernel code when you find out that a task switch is needed, so kernel task switching is faster.
a new process is being created, and its initial thread preempts the currently running thread. For this case you're running kernel code when you find out that a task switch is needed, so kernel task switching is faster.
the currently running thread asked the kernel to do something with a file and the kernel got a "VFS cache miss" that prevents the request from being performed without any task switches. For this case you're running kernel code when you find out that a task switch is needed, so kernel task switching is faster.
the currently running thread releases a mutex or sends some data (e.g. using a pipe or socket); causing a thread that belongs to a different process to unblock and preempt. For this case you're running kernel code when you find out that a task switch is needed, so kernel task switching is faster.
the currently running thread releases a mutex or sends some data (e.g. using a pipe or socket); causing a thread that belongs to the same process to unblock and preempt. For this case you're running user-space code when you find out that a task switch is needed, so in theory user-space task switching is faster, but in practice it can just as easily be an indicator of poor design (using too many threads and/or far too much lock contention).
a new thread is being created for the same process; and the new thread preempts the currently running thread. For this case you're running user-space code when you find out that a task switch is needed, so in theory user-space task switching is faster; but only if the kernel isn't informed (e.g. so that utilities like "top" can properly display details for threads) - if the kernel is informed anyway then it doesn't make much difference where the task switch happens.
For most software (which doesn't use very many threads); doing task switches in the kernel is faster. Of course it's also (hopefully) fairly irrelevant for performance (because time spent switching tasks should be tiny compared to time spent doing other work).
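To get a feel for that cost, here is a rough, hedged micro-benchmark in plain C: two processes bounce a byte across a pair of pipes, so every round trip forces at least two kernel task switches. Absolute numbers vary a lot with the CPU and kernel configuration:

/* Rough sketch: estimate kernel task-switch cost by bouncing one byte
 * between two processes over a pair of pipes.  Each round trip forces at
 * least two context switches; the numbers are only indicative. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ROUNDS 100000

int main(void)
{
    int p2c[2], c2p[2];                 /* parent->child and child->parent */
    char b = 'x';
    struct timespec t0, t1;

    if (pipe(p2c) != 0 || pipe(c2p) != 0) { perror("pipe"); return 1; }

    if (fork() == 0) {                  /* child: echo every byte back */
        for (int i = 0; i < ROUNDS; i++) {
            read(p2c[0], &b, 1);
            write(c2p[1], &b, 1);
        }
        _exit(0);
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ROUNDS; i++) {  /* parent: ping, then wait for pong */
        write(p2c[1], &b, 1);
        read(c2p[0], &b, 1);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("~%.0f ns per round trip (>= 2 task switches)\n", ns / ROUNDS);
    return 0;
}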
And I have always had an instinct that such management should be done by the OS automatically, because only the OS knows which thread would be suitable to run at a specific time.
Yes; but possibly not for the reason you think.
Another problem with user-space threading (besides making most task switches slower) is that it can't support global thread priorities without becoming a severe security disaster. Specifically; a process can't know if its own thread is higher or lower priority than a thread belonging to a different process (unless it has information about all threads for the entire OS, which is information that normal processes shouldn't be trusted to have); so user-space threading leads to wasting CPU time doing unimportant work (for one process) when there's important work to do (for a different process).
Another problem with user-space threading is that (for some CPUs - e.g. most 80x86 CPUs) the CPUs are not independent, and there may be power management decisions involved with scheduling. For example: most 80x86 CPUs have hyper-threading (where a core is shared by 2 logical processors), where a smart scheduler may say "one logical processor in the core is running a high priority/important thread, so the other logical processor in the same core should not run a low priority/unimportant thread because that would make the important work slower"; most 80x86 CPUs have "turbo boost" (with similar "don't let low priority threads ruin the turbo-boost/performance of high priority thread" possibilities); and most CPUs have thermal management (where the scheduler might say "Hey, these threads are all low priority, so let's underclock the CPU so that it cools down and can go faster later (has more thermal headroom) when there's high priority/more important work to do!").
Would it make kernel-level threads clearly preferable to user-level threads if system calls were as fast as procedure calls?
If system calls were as fast as normal procedure calls, then the performance differences between user-space threading and kernel threading would disappear (but all the other problems with user-space threading would remain). However, the reason why system calls are slower than normal procedure calls is that they pass through a kind of "isolation barrier" (that isolates kernel's code and data from malicious user-space code); so to make system calls as fast as normal procedure calls you'd have to get rid of the isolation (effectively turning the kernel into a kind of "global shared library" that can be dynamically linked) but without that isolation you'll have an extreme security disaster. In other words; to have any hope of achieving acceptable security, system calls must be slower than normal procedure calls.
Your basic premise is wrong. System calls are much slower than procedure calls in almost every interesting architecture.
The perceived CPU throughput is based on pipelining, speculative execution and fetching. A syscall stops the pipeline, invalidates speculative execution, halts speculative fetching, acts as a store and instruction barrier, and may flush the write FIFO.
So, the processor slows down to its ‘spec’ speed around the syscall, accelerating back up until the syscall return, whereupon it does about the exact same thing.
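You can see the gap from user space with a rough sketch like the following (plain C, GCC or Clang assumed); it times a trivial procedure call against a trivial system call issued through syscall(2) so the C library cannot shortcut it:

/* Rough sketch: compare a trivial function call with a trivial system
 * call (getpid via syscall(2), so the C library cannot cache or inline it).
 * Expect the syscall to be one to two orders of magnitude slower. */
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#define N 1000000

/* A plain procedure call the compiler must not optimize away. */
__attribute__((noinline)) static long plain_call(long x) { return x + 1; }

static long via_syscall(long x) { (void)x; return syscall(SYS_getpid); }

static double bench(long (*fn)(long))
{
    struct timespec t0, t1;
    volatile long sink = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)
        sink += fn(i);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / N;
}

int main(void)
{
    printf("procedure call: ~%.1f ns/call\n", bench(plain_call));
    printf("system call:    ~%.1f ns/call\n", bench(via_syscall));
    return 0;
}

On typical hardware the procedure call costs a few nanoseconds while the system call costs tens to hundreds of nanoseconds, and the speculative-execution mitigations alluded to in the next paragraph tend to widen that gap.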
Attempts to optimise this area have given rise to lots of papers named after fictional James Bond organizations, and not conciliatory enough apologies from not embarrassed enough CPU product managers. Google Spectre as an example, then follow the associated links.
The other cost of syscalls
A bit over 30 years ago, some smart guys wrote a paper about least privilege. Conceptually, it is a stunner. The basic premise is that whatever your program is doing, it should do it with the least privilege possible.
If your program is inverting arrays, according to the notion of least privilege, it should not be able to disable interrupts. Disabling interrupts can cause a very difficult to diagnose system failure. Simple user code should not have this ability.
The notion of user and kernel modes of execution evolved from early computer systems, and (with the possible exception of the iax32 / 80286 ) are increasingly showing their inadequacy in the connected computer environment. At one point in time you could say "this is a single user system"; but the IoT dweebs have made everything multi-user.
Least privilege insists that all code should execute with the minimum privilege required to complete the task at hand. Thus, nothing should be in the kernel that absolutely doesn't need to be. If you think that is a radical thought, in Ken Thompson's 1977(?) paper on the UNIX kernel he states exactly the same thing.
So no, putting your junk in the kernel just means you have increased the attack surface for no valid reason. Try to think in terms of exposing minimum risk, it leads to better software and better sleep.

What do we mean by "Non-preemptive Kernel"?

I read that the Linux kernel is preemptive, which is different from most Unix kernels. So, what does it really mean for a kernel to be preemptive?
Some analogies or examples would be better than pure theoretical explanation.
ADD 1 -- 11:00 AM 12/7/2018
Preemptive is just one paradigm of multi-tasking. There are others like Cooperative Multi-tasking. A better understanding can be achieved by comparing them.
Prior to Linux kernel version 2.5.4, the Linux kernel was not preemptive, which means a process running in kernel mode could not be moved off the processor until it left kernel mode on its own or started waiting for some input/output operation to complete.
Generally a process in user mode enters kernel mode using system calls. Previously, when the kernel was non-preemptive, a lower-priority process could starve a higher-priority process of the processor by repeatedly making system calls and remaining in kernel mode. Even if the lower-priority process's timeslice expired, it would continue running until it completed its work in the kernel or voluntarily relinquished control. If the higher-priority process waiting to run is a text editor in which the user is typing, or an MP3 player ready to refill its audio buffer, the result is poor interactive performance. In this way, the non-preemptive kernel was a major drawback at that time.
Imagine the simple view of preemptive multi-tasking. We have two user tasks, both of which are running all the time without using any I/O or performing kernel calls. Those two tasks don't have to do anything special to be able to run on a multi-tasking operating system. The kernel, typically based on a timer interrupt, simply decides that it's time for one task to pause to let another one run. The task in question is completely unaware that anything happened.
However, most tasks make occasional requests of the kernel via syscalls. When this happens, the same user context exists, but the CPU is running kernel code on behalf of that task.
Older Linux kernels would never allow preemption of a task while it was busy running kernel code. (Note that I/O operations always voluntarily re-schedule. I'm talking about a case where the kernel code has some CPU-intensive operation like sorting a list.)
If the system allows that task to be preempted while it is running kernel code, then we have what is called a "preemptive kernel." Such a system is far less exposed to unpredictable delays that can be encountered during syscalls, so it might be better suited for embedded or real-time tasks.
For example, if on a particular CPU there are two tasks available, one makes a syscall that takes 5 ms to complete and cannot be preempted, and the other is an MP3 player application that needs to feed the audio pipe every 2 ms, you might hear stuttering audio.
The argument against preemption is that all kernel code that might be called in task context must be able to survive preemption; there's a lot of poor device driver code, for example, that might be better off if it's always able to complete an operation before allowing some other task to run on that processor. (With multi-processor systems the rule rather than the exception these days, all kernel code must be re-entrant, so that argument isn't as relevant today.) Additionally, if the same goal could be met by improving the syscalls with bad latency, perhaps preemption is unnecessary.
A compromise is CONFIG_PREEMPT_VOLUNTARY, which allows a task-switch at certain points inside the kernel, but not everywhere. If there are only a small number of places where kernel code might get bogged down, this is a cheap way of reducing latency while keeping the complexity manageable.
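In user space, the analogue of such an explicit preemption point looks roughly like the sketch below (plain C; sched_yield() loosely stands in for the kernel's cond_resched(), which is what those in-kernel check points amount to):

/* User-space analogy of a voluntary preemption point: a long CPU-bound
 * loop that periodically offers the CPU back to the scheduler, much like
 * kernel code sprinkling cond_resched() into a long-running operation. */
#include <sched.h>
#include <stdio.h>

int main(void)
{
    unsigned long acc = 0;

    for (unsigned long i = 1; i <= 100000000UL; i++) {
        acc += i % 7;                   /* stand-in for real work */

        if ((i & 0xFFFFF) == 0)         /* roughly every million iterations */
            sched_yield();              /* explicit "is it someone else's turn?" */
    }
    printf("done, acc = %lu\n", acc);
    return 0;
}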
Traditional Unix kernels had a single lock, which was held by a thread while kernel code was running. Therefore no other kernel code could interrupt that thread.
This made designing the kernel easier, since you knew that while one thread was using kernel resources, no other thread was. Therefore different threads could not mess up each other's work.
In single processor systems this doesn't cause too many problems.
However in multiprocessor systems, you could have a situation where several threads on different processors or cores all wanted to run kernel code at the same time. This means that depending on the type of workload, you could have lots of processors, but all of them spend most of their time waiting for each other.
In Linux 2.6, the kernel resources were divided up into much smaller units, protected by individual locks, and the kernel code was reviewed to make sure that locks were only held while the corresponding resources were in use. So now different processors only have to wait for each other if they want access to the same resource (for example, a hardware resource).
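A user-space analogy of that change, sketched in plain C with pthreads (compile with -pthread): with use_big_lock set, two threads touching unrelated counters serialize on one global mutex, much like the old big kernel lock; flip it to 0 and each resource gets its own lock, so the threads never contend:

/* User-space analogy: one "big lock" serializing two unrelated resources
 * versus one lock per resource.  With per-resource locks the two threads
 * never contend, mirroring the move away from the big kernel lock. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_a   = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b   = PTHREAD_MUTEX_INITIALIZER;
static long counter_a, counter_b;
static int use_big_lock = 1;        /* flip to 0 for fine-grained locking */

static void *bump(void *arg)
{
    long *counter = arg;
    pthread_mutex_t *lock =
        use_big_lock ? &big_lock : (counter == &counter_a ? &lock_a : &lock_b);

    for (int i = 0; i < 10000000; i++) {
        pthread_mutex_lock(lock);   /* critical section for ONE resource */
        (*counter)++;
        pthread_mutex_unlock(lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump, &counter_a);
    pthread_create(&t2, NULL, bump, &counter_b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("a=%ld b=%ld\n", counter_a, counter_b);
    return 0;
}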
Preemption allows the kernel to give the IMPRESSION of parallelism: you've got only one processor (let's say a decade ago), but you feel like all your processes are running simultaneously. That's because the kernel preempts (i.e., takes execution away from) one process to give the CPU to the next one (maybe according to their priority).
EDIT: Non-preemptive kernels wait for processes to give control back (i.e., during syscalls), so if your process computes a lot of data and doesn't call any kind of yield function, the other processes won't be able to execute their calls. Such systems are said to be cooperative because they rely on the cooperation of the processes to ensure fair sharing of execution time.
EDIT 2: The main goal of preemption is to improve the responsiveness of the system among multiple tasks, which is good for end users, whereas on the other hand, servers want to achieve the highest throughput, so they don't need it (from the Linux kernel configuration):
Preemptible kernel (low-latency desktop)
Voluntary kernel preemption (desktop)
No forced preemption (server)
The Linux kernel is monolithic and gives a small slice of computing time to each running process in turn. It means that the processes (e.g., the programs) do not run concurrently, but each is regularly given a time slice in which to execute its logic. The main problem is that some logic can take long to terminate and prevent the kernel from giving time to the next process. This results in system "lags".
A preemptive kernel has the ability to switch context. It means that it can stop a "hanging" process even if it is not finished, and give the computing time to the next process as expected. The "hanging" process will continue to execute when its turn comes, without any problem.
Practically, it means that the kernel can schedule tasks with low latency, which is particularly interesting for audio recording and editing.
The Ubuntu Studio distribution packages a preemptive kernel as well as a bunch of quality free software devoted to audio and video editing.
It means that the operating system scheduler is free to suspend the execution of the running processes to give the CPU to another process whenever it wants; the normal way to do this is to give each process that is waiting for the CPU a "quantum" of CPU time to run. After it has expired, the scheduler takes back control (and the running process cannot avoid this) and gives another quantum to another process.
This method is often compared with cooperative multitasking, in which processes keep the CPU for as long as they need, without being interrupted; to let other applications run, they have to explicitly call some kind of "yield" function. Naturally, to avoid giving the feeling that the system is stuck, well-behaved applications will yield the CPU often. Still, if there's a bug in an application (e.g., an infinite loop without yield calls), the whole system will hang, since the CPU is completely kept by the faulty program.
Almost all recent desktop OSes use preemptive multitasking which, even if it's more expensive in terms of resources, is in general more stable (it's more difficult for a single faulty app to hang the whole system, since the OS is always in control). On the other hand, when resources are tight and the applications are expected to be well behaved, cooperative multitasking is used. Windows 3 was a cooperative multitasking OS; a more recent example is Rockbox, an open-source PMP firmware replacement.
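Cooperative multitasking is easy to demonstrate in user space. The toy sketch below (plain C using <ucontext.h>; the yield() and task() names are made up for the example) runs two tasks that only make progress because each explicitly yields. Delete a yield() call and the other task starves, which is exactly the failure mode preemption removes:

/* Toy cooperative multitasking in user space with <ucontext.h>: two tasks
 * run only because each one explicitly yields to the other. */
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, ctx[2];
static int current;

static void yield(void)                     /* hand the CPU to the peer task */
{
    int prev = current;
    current = 1 - current;
    swapcontext(&ctx[prev], &ctx[current]);
}

static void task(void)
{
    for (int i = 0; i < 3; i++) {
        printf("task %d, step %d\n", current, i);
        yield();                            /* cooperative: must volunteer */
    }
}

int main(void)
{
    static char stack[2][64 * 1024];

    for (int i = 0; i < 2; i++) {
        getcontext(&ctx[i]);
        ctx[i].uc_stack.ss_sp = stack[i];
        ctx[i].uc_stack.ss_size = sizeof stack[i];
        ctx[i].uc_link = &main_ctx;         /* where to go when task() returns */
        makecontext(&ctx[i], task, 0);
    }
    swapcontext(&main_ctx, &ctx[0]);        /* start task 0; tasks ping-pong */
    return 0;
}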
I think everyone did a good job of explaining this, but I'm just going to add a little more info in the context of Linux IRQs, interrupts, and the kernel scheduler.
The process scheduler is the component of the OS that is responsible for deciding whether the currently running job/process should continue to run and, if not, which process should run next.
A preemptive scheduler is one that allows the running process to be interrupted; the interrupted process changes its state, and another process gets to run.
On the other hand, a non-preemptive scheduler can't take the CPU away from a process (aka cooperative).
FYI, the word "cooperative" can be confusing because its meaning does not clearly indicate what the scheduler actually does.
For example, older Windows versions like 3.1 had cooperative schedulers.
Full credit to the wonderful article here
I think it became preemptive from 2.6. Preemptive means that when a new process is ready to run, the CPU will be allocated to the new process; it doesn't need the running process to be cooperative and give up the CPU.
Saying the Linux kernel is preemptive means that the kernel supports preemption.
For example, suppose there are two processes, P1 (higher priority) and P2 (lower priority), both of which make read system calls and run in kernel mode. Suppose P2 is currently running in kernel mode when P1 becomes ready to run.
If kernel preemption is available, then preemption can happen at the kernel level, i.e., P2 can be preempted and put to sleep, and P1 can run.
If kernel preemption is not available, then since P2 is in kernel mode, the system simply waits until P2 completes (or blocks) before P1 can run.

How does a kernel return from the thread

I am doing some hardcore study on computers etc., so I can get started on my own mini Hello World OS.
I was looking at how kernels work and I was wondering how the kernel makes the current thread return to the kernel (so it can switch to another) even though the kernel isn't running and the thread has no instruction to do so.
Does it use some kind of CPU interrupt that goes back to the kernel after a few nanoseconds?
Does it use some kind of CPU interrupt that goes back to the kernel after a few nanoseconds?
It is during timer interrupts and (blocking) system calls that the kernel decides whether to keep executing the currently active thread or switch to another thread. The timer interrupt handler updates resource usage, such as consumed system and user time, for the currently running process and calls the scheduler_tick() function, which decides whether a process/thread needs to be preempted.
See "Preemption and Context Switching" on page 62 of Linux Kernel Development book.
The kernel, however, must know when to call schedule(). If it called schedule() only when code explicitly did so, user-space programs could run indefinitely. Instead, the kernel provides the need_resched flag to signify whether a reschedule should be performed (see Table 4.1). This flag is set by scheduler_tick() when a process should be preempted, and by try_to_wake_up() when a process that has a higher priority than the currently running process is awakened. The kernel checks the flag, sees that it is set, and calls schedule() to switch to a new process. The flag is a message to the kernel that the scheduler should be invoked as soon as possible because another process deserves to run.
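A user-space analogy of that mechanism, as a hedged sketch in plain C: a periodic timer signal breaks into a busy loop that never volunteers, much as the kernel's timer interrupt breaks into a running thread so scheduler_tick() can set need_resched:

/* User-space analogy of the timer interrupt: a periodic SIGALRM breaks
 * into a CPU-bound loop that never volunteers, much as the kernel's timer
 * interrupt regains control from a running thread. */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t ticks;

static void on_tick(int sig)
{
    (void)sig;
    ticks++;                        /* the "scheduler_tick" of this toy */
}

int main(void)
{
    struct sigaction sa;
    struct itimerval it = {
        .it_interval = { .tv_sec = 0, .tv_usec = 10000 },   /* every 10 ms */
        .it_value    = { .tv_sec = 0, .tv_usec = 10000 },
    };

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigaction(SIGALRM, &sa, NULL);
    setitimer(ITIMER_REAL, &it, NULL);

    while (ticks < 100)             /* busy loop with no yield of any kind */
        ;                           /* ...yet the timer still interrupts it */

    printf("interrupted %d times\n", (int)ticks);
    return 0;
}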
Does it use some kind of CPU interrupt
Yes! Modern preemptive kernels are absolutely dependent upon interrupts from hardware to deliver good I/O performance. Keyboard, mouse, disk, NIC, USB, etc. drivers are all entered from interrupts and can make threads that are waiting on them ready/running when required (e.g., when data is available).
Threads can also change state as a result of making an OS call that changes the caller's own state or that of another thread.
The interrupt from the hardware timer is one of many interrupt sources and is only special in that many system operations have timeouts that are signaled by this interrupt. Other than that, the timer interrupt just causes a reschedule which, in most cases, changes nothing re. the ready/running state of threads. If the machine is grossly CPU-overloaded to the point where there are more ready threads than there are cores, there is a side-effect of the timer interrupt that causes CPU time to be shared amongst the ready threads.
Do not fixate on the timer interrupt—the other driver interrupts are absolutely essential. It is not impossible to build a functional preemptive multithreaded kernel with no timer interrupt at all.

Context switching kernel processes in Linux

Consider the process keventd. It spends all its lifetime in kernel mode.
Now, as far as I know, Linux checks whether a context switch is due while the process is switching from kernel mode to user mode, and as far as I know, keventd will never switch from kernel mode to user mode, so how will the Linux kernel know when to switch it out?
If the kernel were to do as you say, and only check whether a process is due to be switched out on an explicit user-to-kernel-mode transition, then the following loop would lock up a core of your computer:
while (1);
Obviously, this does not happen on normal desktop operating systems. The reason why is preemption, where after a process has run for its time slice the kernel gets an alarm, steps in, and forcibly switches contexts as necessary.
Preemption could in principle work for kernel processes too. However, I'm not sure that's what keventd does - it's more likely that it voluntarily relinquishes its time slice on a regular basis (see sched_yield, a userspace call for the same effect), especially since the kernel can be configured to be non-preemptible. That is a kernel process' prerogative.

Resources