What scheduling algorithms does the Linux kernel use?

What scheduling algorithms does the Linux kernel use?
Where can I get more information about the Linux kernel? (First OS course, student level.)

The Linux kernel has several scheduling algorithms available, both for process scheduling and for I/O scheduling. Download the source from www.kernel.org and run
make menuconfig
You will get a full list of all available options, with built-in help.
One developer whose scheduler work is worth studying is Con Kolivas, who wrote alternatives to the mainline O(1) scheduler (the O(1) scheduler itself was written by Ingo Molnár). Definitely have a look at what he did; it was once a great breakthrough.

Note: As Abdullah Shahin noted, this answer is about the I/O queueing scheduler, not the process scheduler.
If you just want to check which I/O scheduler your Linux system is using and which ones are available, you can run the following command:
cat /sys/block/sda/queue/scheduler
The one between the [] is the one currently in use. The others are available.
To change it:
sudo bash -c 'echo deadline > /sys/block/sda/queue/scheduler'
Be careful to set it back to the default afterwards, unless you know what you are doing.
The default (in newer Ubuntu releases at least) is CFQ (Completely Fair Queuing):
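The bracket convention in that sysfs file is easy to parse programmatically. Here is a minimal Python sketch, assuming the file has the format shown above (space-separated names, the active one in square brackets); the helper names `parse_schedulers` and `read_schedulers` are made up for illustration, and the default device `sda` is just the one from the example.

```python
from pathlib import Path


def parse_schedulers(text):
    """Split a sysfs scheduler line into (active, available).

    The active scheduler is the name wrapped in [brackets]; if no name
    is bracketed, active is None.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_schedulers(device="sda"):
    """Read and parse the scheduler file for one block device.

    Assumption: the device exposes /sys/block/<device>/queue/scheduler,
    as in the command above.
    """
    text = Path(f"/sys/block/{device}/queue/scheduler").read_text()
    return parse_schedulers(text)
```

Calling `read_schedulers("sda")` on a machine that has such a device should return the active scheduler name and the full list; the exact names depend on the kernel version and configuration.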
http://en.wikipedia.org/wiki/CFQ
Interview with the creator (Jens Axboe):
http://kerneltrap.org/node/7637

As others have already mentioned, there are several scheduling algorithms available, according to the intended use.
Check this article if you want to learn more about scheduling in Linux.

I believe the "Completely Fair Scheduler" is in use in the latest kernels. You can find a good amount of information by searching for it on Google.
Link: http://en.wikipedia.org/wiki/Completely_Fair_Scheduler
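To give a feel for the core idea behind CFS, here is a toy Python simulation: always run the task with the smallest virtual runtime, and advance that virtual runtime more slowly for tasks with a higher weight. This is only a sketch under simplifying assumptions: fixed time slices, a heap instead of the kernel's red-black tree, and no sleeping or wake-up handling. The function name `simulate_cfs` and the slice length are invented for illustration.

```python
import heapq

NICE_0_WEIGHT = 1024  # weight the kernel assigns to a nice-0 task


def simulate_cfs(weights, slices, slice_ns=1_000_000):
    """Return the order in which tasks run over `slices` time slices.

    weights: dict mapping task name -> weight (higher = larger CPU share).
    Each step picks the task with the lowest virtual runtime, then
    charges it for the slice, scaled inversely by its weight.
    """
    queue = [(0, name) for name in sorted(weights)]
    heapq.heapify(queue)
    order = []
    for _ in range(slices):
        vruntime, name = heapq.heappop(queue)
        order.append(name)
        vruntime += slice_ns * NICE_0_WEIGHT // weights[name]
        heapq.heappush(queue, (vruntime, name))
    return order
```

With two equal-weight tasks the order simply alternates; giving one task twice the weight lets it run roughly twice as often, which is the "fairness" the name refers to.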

A newer addition to the Linux kernel is EDF (Earliest Deadline First) scheduling, for guaranteed real-time support:
http://lkml.org/lkml/2009/9/22/186
http://www.evidence.eu.com/content/view/313/390/
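The EDF rule itself is simple: at every instant, run the ready job whose absolute deadline is nearest. Below is a toy Python sketch of preemptive EDF on a single CPU with unit time steps. It is not the kernel's implementation; the function name `edf_schedule` and the job tuple layout are assumptions made for this example, and every job is assumed to need at least one time unit.

```python
import heapq


def edf_schedule(jobs):
    """Simulate preemptive EDF on one CPU in unit time steps.

    jobs: list of (name, release, runtime, deadline), all integers,
          with runtime >= 1.
    Returns (timeline, missed): the job run in each time unit (None when
    idle) and the names of jobs that finished after their deadline.
    """
    pending = sorted(jobs, key=lambda job: job[1])  # order by release time
    ready = []  # heap of (deadline, name, remaining_runtime)
    timeline, missed = [], []
    t = 0
    i = 0
    while i < len(pending) or ready:
        # Move every job released by time t into the ready heap.
        while i < len(pending) and pending[i][1] <= t:
            name, _, runtime, deadline = pending[i]
            heapq.heappush(ready, (deadline, name, runtime))
            i += 1
        if not ready:
            timeline.append(None)  # CPU idles until the next release
            t += 1
            continue
        # Run the job with the earliest deadline for one time unit.
        deadline, name, remaining = heapq.heappop(ready)
        timeline.append(name)
        t += 1
        remaining -= 1
        if remaining > 0:
            heapq.heappush(ready, (deadline, name, remaining))
        elif t > deadline:
            missed.append(name)
    return timeline, missed
```

In the example `edf_schedule([("A", 0, 3, 10), ("B", 1, 2, 4)])`, job B arrives later but has the tighter deadline, so it preempts A as soon as it is released.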

I think the Linux kernel actually has a few different schedulers you can choose from at compile-time. To find out more about the Linux kernel, you can download the kernel source code (or browse it online) and look in the Documentation directory. For example, the scheduler subdirectory might be helpful. You can also just look at the code itself, obviously.

Modern GNU/Linux distributions use CFS (Completely Fair Scheduler). You may read more on that in the 4th chapter of this book:
Linux Kernel Development 3rd Edition by Robert Love
You will find many interesting and easy-to-understand explanations there. I enjoyed it a lot.

The Linux kernel mainly allows three different scheduling algorithms:
shortest job first
round-robin scheduling
priority-based preemptive scheduling
The third method is where it differs from older Linux versions such as 2.4.

Related

Is the RT Linux kernel monolithic or a micro-kernel (like QNX)?

I am studying some documents about RT Linux and QNX and am confused about monolithic kernels versus microkernels. Some papers say that RT Linux is monolithic and others say it is a microkernel. Which is right?
I know QNX is a microkernel OS, but I am confused about RT Linux.
Could someone tell me the difference between the two real-time operating systems, and also answer the question below?
Is RT Linux monolithic or a microkernel?
IMHO, there is no actual RT Linux¹. There are only approaches that add RT compatibility features² to the official general-purpose Linux kernel. Examples are RTAI, Xenomai and the PREEMPT_RT patch. Thus, they all use the same kernel, which is definitely a monolithic kernel (and, as far as Linus is concerned, it will pretty surely stay that way).
However, a paper³ by Jae Hwan Koh and Byoung Wook Cho about RTAI and Xenomai performance evaluation puts it like this (which indeed sounds more like a separate kernel approach):
RTAI and Xenomai are interfaces for real-time tasks rather than real-time operating systems. Therefore, an OS is needed to use them; Linux is most widely used. In RTAI and Xenomai, the Linux OS kernel is treated as an idle task, and it only executes when there are no real-time tasks to run. The figure below shows the architectures and versions of the real-time embedded Linux used [here]. RTAI and Xenomai are conceptually homogeneous, and they both use a general-purpose Linux kernel and real-time API. However, there is a remarkable contrast.. [in way they handle certain things].
Another picture that I found⁴ supports this point of view as well, i.e. having one kernel running on top of another as an idle task.
1 Having said that, there used to be an OS (kernel) named RTLinux which worked much like the other approaches mentioned in my answer above, i.e. it runs the entire Linux kernel as a fully preemptive process [1] [2]. RTLinux was later merged into the products of Wind River (VxWorks) and also influenced the work around RTAI. I couldn't find a source about the kernel type.
2 in other words a "real-time extension"
3 "Real-time Performance of Real-time Mechanisms for RTAI and Xenomai in Various Running Conditions", 2013, International Journal of Control and Automation
4 unfortunately I could not determine its source yet.
RT Linux has both a Linux kernel and a real-time kernel. The real-time kernel has higher priority than the Linux kernel. Please refer to the following article for details.
http://www.cs.ru.nl/~hooman/DES/RealtimeLinuxBasics.pdf

How can I execute a task at an exact rate of 4kHz in Linux (with PREEMPT-RT if necessary)

In my embedded C code, I need to run a function at an accurate 4 kHz rate to simulate a waveform. I am running a Linux 3.10 kernel with the PREEMPT-RT patch. The question is very similar to this post:
Linux' hrtimer - microsecond precision?
But my particular question is: does the recent PREEMPT-RT kernel provide some user API or some more convenient way for such purpose?
I have just come up with an alternative solution using the Xenomai framework. I built and installed Xenomai on my Linux system, along with the Xenomai userspace support. It offers a simple API, rt_task_set_periodic, that lets you schedule a periodic task precisely.
Here is the example:
https://github.com/meeusr/xenomai-forge/blob/master/examples/native/trivial-periodic.c
In my opinion, no.
PREEMPT_RT only lets the kernel be interrupted when needed. My personal suggestion is to find a delay routine and trim it with an oscilloscope.
I had a similar issue and found that "sleep" and "usleep" are not very accurate, so I ended up writing my own delay routine.
Hope this helps.
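One reason plain sleep calls drift is that each relative sleep adds its own error on top of the time the task took. A common way around this is to schedule against absolute deadlines, so an overshoot in one period is absorbed by the next. The Python sketch below shows only that bookkeeping; Python itself is not suitable for a real 4 kHz loop, and in C the same pattern is normally written with clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, ...). The function name `run_periodic` and its injectable `clock` and `sleep` parameters are assumptions made for this illustration.

```python
import time

PERIOD_NS = 1_000_000_000 // 4000  # 250_000 ns between ticks at 4 kHz


def run_periodic(task, ticks, period_ns=PERIOD_NS,
                 clock=time.monotonic_ns, sleep=time.sleep):
    """Call task(i) once per period using absolute deadlines.

    The next deadline is always computed from the previous deadline,
    never from "now", so timing errors do not accumulate.
    Returns the number of periods in which the task overran its deadline.
    """
    next_deadline = clock() + period_ns
    overruns = 0
    for i in range(ticks):
        task(i)
        remaining = next_deadline - clock()
        if remaining > 0:
            sleep(remaining / 1e9)  # sleep only for what is left
        else:
            overruns += 1  # task finished after its deadline
        next_deadline += period_ns
    return overruns
```

The overrun count gives a cheap way to see whether the chosen rate is achievable at all on a given system before reaching for an oscilloscope.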

How do I find the cpu the current thread is running on, for Mac and BSD?

I'm looking for a function on Mac OS and BSD that's equivalent to Linux's sched_getcpu(), and Windows' GetCurrentProcessorNumberEx() in order to implement a library for cpu-local storage. It's clearly possible to emulate this with the cpuid or rdtscp instructions, but it's possible to do better with kernel cooperation: https://lkml.org/lkml/2007/1/6/190.
I already know that the thread's current CPU may change by the time I use the information.
There are one or two questions which cover queue tracking for OSX, as well as a dispatch_get_global_queue wiki page which covers the equivalent for BSD. I don't know if you can map a queue to a CPU, but if so, that would seem to be the closest equivalent.

How to "hibernate" a process in Linux by storing its memory to disk and restoring it later?

Is it possible to 'hibernate' a process in linux?
Just like 'hibernate' on a laptop, I would like to write all the memory used by a process to disk and free up the RAM. Could I then later 'resume the process', i.e. read all the data back from disk into RAM and continue with my process?
I used to maintain CryoPID, which is a program that does exactly what you are talking about. It writes the contents of a program's address space, VDSO, file descriptor references and states to a file that can later be reconstructed. CryoPID started when there were no usable hooks in Linux itself and worked entirely from userspace (actually, it still does work, depending on your distro / kernel / security settings).
Problems were (indeed) sockets, pending RT signals, numerous X11 issues, and the glibc caching getpid() implementation, amongst many others. Randomization (especially of the VDSO) turned out to be insurmountable for the few of us working on it after Bernard walked away from it. However, it was fun and became the topic of several master's theses.
If you are just contemplating a program that can save its running state and restart directly into that state, it's far, far easier to just save that information from within the program itself, perhaps when servicing a signal.
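As a rough illustration of that last suggestion, here is a minimal Python sketch in which the program serializes its own state to a file when it receives a signal and reloads it at startup. Everything here is an assumption for the example: the state is a plain JSON-serializable dict, the file name `state.json` is arbitrary, and SIGUSR1 is just one possible trigger. Real programs also have to reopen files and sockets themselves after loading.

```python
import json
import os
import signal

STATE_FILE = "state.json"  # hypothetical location for the saved state


def save_state(state, path=STATE_FILE):
    """Write the state to a temp file, then rename it into place.

    os.replace is atomic, so a crash mid-write never leaves a
    half-written state file behind.
    """
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_state(path=STATE_FILE, default=None):
    """Return the saved state, or a default if nothing was saved yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {} if default is None else default


def install_checkpoint_handler(state, path=STATE_FILE, signum=signal.SIGUSR1):
    """Save `state` whenever the process receives `signum`."""
    def handler(_signum, _frame):
        save_state(state, path)
    signal.signal(signum, handler)
```

A program would call `load_state()` once at startup, keep mutating the returned dict as it works, and call `install_checkpoint_handler(state)` so that `kill -USR1 <pid>` triggers a checkpoint.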
I'd like to put a status update here, as of 2014.
The accepted answer suggests CryoPID as a tool to perform checkpoint/restore, but I found the project to be unmaintained and impossible to compile with recent kernels.
I did find two actively maintained projects that provide application checkpointing.
The first, which I suggest because I had better luck running it, is CRIU, which performs checkpoint/restore mainly in userspace and requires the kernel option CONFIG_CHECKPOINT_RESTORE to be enabled.
Checkpoint/Restore In Userspace, or CRIU (pronounced kree-oo, IPA: /krɪʊ/, Russian: криу), is a software tool for Linux operating system. Using this tool, you can freeze a running application (or part of it) and checkpoint it to a hard drive as a collection of files. You can then use the files to restore and run the application from the point it was frozen at. The distinctive feature of the CRIU project is that it is mainly implemented in user space.
The second is DMTCP; quoting from their main page:
DMTCP (Distributed MultiThreaded Checkpointing) is a tool to transparently checkpoint the state of multiple simultaneous applications, including multi-threaded and distributed applications. It operates directly on the user binary executable, without any Linux kernel modules or other kernel modifications.
There is also a nice Wikipedia page on the topic: Application_checkpointing
The answers mentioning ctrl-z are really talking about stopping the process with a signal, in this case SIGTSTP. You can issue a stop signal with kill:
kill -STOP <pid>
That will suspend execution of the process. It won't immediately free the memory used by it, but as memory is required for other processes the memory used by the stopped process will be gradually swapped out.
When you want to wake it up again, use
kill -CONT <pid>
The more complicated solutions, like CryoPID, are really only needed if you want the stopped process to be able to survive a system shutdown/restart - it doesn't sound like you need that.
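The same stop and continue signals can be sent from a program instead of the shell. A minimal Python sketch, assuming a POSIX system and that you have permission to signal the target process; the helper names `suspend` and `resume` are invented here.

```python
import os
import signal


def suspend(pid):
    """Stop a process, like `kill -STOP <pid>`. SIGSTOP cannot be caught."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Let a stopped process run again, like `kill -CONT <pid>`."""
    os.kill(pid, signal.SIGCONT)
```

As with the shell commands, this only halts execution; the memory is released gradually by swapping, not written out on demand.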
The Linux kernel has now partially implemented checkpoint/restart features: https://ckpt.wiki.kernel.org/, the status is here.
Some useful information is on LWN (Linux Weekly News):
http://lwn.net/Articles/375855/ http://lwn.net/Articles/412749/ ......
So the answer is "YES"
The issue is restoring the streams - files and sockets - that the program has open.
When your whole OS hibernates, local files and such can obviously be restored. Network connections can't, but code that accesses the internet typically does more error checking and survives the error conditions (or ought to).
If you did per-program hibernation (without application support), how would you handle open files? What if another process accesses those files in the interim? etc?
Maintaining state when the program is not loaded is going to be difficult.
Simply suspending the threads and letting it get swapped to disk would have much the same effect?
Or run the program in a virtual machine and let the VM handle suspension.
Short answer is "yes, but not always reliably". Check out CryoPID:
http://cryopid.berlios.de/
Open files will indeed be the most common problem. CryoPID states explicitly:
Open files and offsets are restored. Temporary files that have been unlinked and are not accessible on the filesystem are always saved in the image. Other files that do not exist on resume are not yet restored. Support for saving file contents for such situations is planned.
The same issues will also affect TCP connections, though CryoPID supports tcpcp for connection resuming.
I extended CryoPID, producing a package called Cryopid2, available from SourceForge. It can migrate a process as well as hibernate it (along with any open files and sockets: data in sockets/pipes is sucked into the process on hibernation and spat back into them when the process is restarted).
The reason I have not been active with this project is that I am not a kernel developer. Both this and the original CryoPID need someone on board who can get them running with the latest kernels (e.g. Linux 3.x).
The CryoPID method does work, and is probably the best solution to general-purpose process hibernation/migration in Linux that I have come across.
The short answer is "yes." You might start by looking at this for some ideas: ELF executable reconstruction from a core image (http://vx.netlux.org/lib/vsc03.html)
As others have noted, it's difficult for the OS to provide this functionality, because the application needs to have some error checking builtin to handle broken streams.
However, on a side note, some programming languages and tools that use virtual machines explicitly support this functionality, such as the Self programming language.
This is sort of the ultimate goal of a clustered operating system. Matthew Dillon has put a lot of effort into implementing something like this in his DragonFly BSD project.
Adding another workaround: you can use VirtualBox. Run your applications in a regular virtual machine and simply "save the machine state" whenever you want.
I know this is not an answer, but I thought it could be useful when there are no real options.
If for any reason you don't like VirtualBox, VMware and QEMU are just as good.
Ctrl-Z increases the chances the process's pages will be swapped, but it doesn't free the process's resources completely. The problem with freeing a process's resources completely is that things like file handles, sockets are kernel resources the process gets to use, but doesn't know how to persist on its own. So Ctrl-Z is as good as it gets.
There was some research on checkpoint/restore for Linux back in the 2.2 and 2.4 days, but it never made it past the prototype stage. It is possible (with the caveats described in the other answers) for certain values of possible: if you can write a kernel module to do it, it is possible. But for the common value of possible (can I do it from the shell on a commercial Linux distribution), it is not yet possible.
There's Ctrl+Z in Linux, but I'm not sure it offers the features you specified. I suspect you asked this question because it doesn't.

How to find the process which is consuming the most I/O in Linux?

When I use top, the iowait on the host is really high.
iostat tells me which disk is utilized more, but I want to find out which process is the culprit.
I am trying to find this out on a Red Hat Linux host. Any suggestions?
EDIT: My Linux flavor has neither atop nor ntop, and building a kernel is not an option for me (don't ask me why :), this is not my personal box). Are there any other alternatives?
I usually use atop. There's a really good article at Debian Package A Day about it. It does require kernel patching (although Ubuntu already has the patch applied, I'm not sure about any other distributions.)
Use iotop.
Or you can get it standalone; it's a simple Python script which requires a recent kernel (I can't remember exactly, but at least 2.6.20).
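If neither tool can be installed, the per-process counters in /proc/<pid>/io can be read directly on kernels built with task I/O accounting. The Python sketch below ranks processes by total bytes read from and written to storage. It is a rough sketch with loud caveats: the counters are cumulative since each process started (so this shows totals, not current rates), reading other users' processes generally requires root, and the helper names `parse_proc_io` and `top_io_processes` are invented for this example.

```python
import os


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip().isdigit():
            fields[key.strip()] = int(value)
    return fields


def top_io_processes(limit=5, proc="/proc"):
    """Return up to `limit` (total_bytes, pid) pairs, largest first.

    total_bytes is read_bytes + write_bytes, i.e. traffic that actually
    reached the storage layer since the process started.
    """
    totals = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "io")) as f:
                io = parse_proc_io(f.read())
        except OSError:
            continue  # process exited or permission denied
        total = io.get("read_bytes", 0) + io.get("write_bytes", 0)
        totals.append((total, int(entry)))
    return sorted(totals, reverse=True)[:limit]
```

To approximate what iotop shows, call `top_io_processes()` twice a few seconds apart and compare the totals per PID; the processes whose counters grew the most are the likely culprits.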

Resources