modular kernel vs micro kernel / monolitic kernel

modular kernel vs micro kernel / monolitic kernel - linux

I am C programmer and new to the Linux kernel programming. I could find there are 3 type of kernel monolithic,micro and modular kernel.while googling i could find some website say linux is having monolithic kernel (in Stack overflow) and some other says micro kernel and the rest say hybrid kernel. So i am totally confused while reading the modular concept which say new module for driver can be added without recompiling the kernel, which is against my assumption that Linux uses monolithic kernel. monolithic kernel runs in single address space and as a single processes this is also bit confusing if so

Before you try to understand those differences, you have to understand other concepts first:
1. Modular programming.
Module is a functionally complete part of a program. Module usually has following properties:
Separation of interface and implementation.
Initialization and deinitialization routines. Both are optional. Deinitialization routine is likely to be missing in environment with GC (Garbage Collector).
Modules used by a program compose directed acyclic graph a.k.a. dependency graph (you might have heard about this - cyclic dependencies are not allowed, dependency is initialized before dependent module).
Modular programming is essential when building large systems. Every big kernel is a modular kernel, regardless of whether it is monolithic, hybrid or microkernel.
Sometimes modules can be loaded and unloaded dynamically. Dynamic modules are essential part of any extensible system. Those can be plugins or, if we talk about kernels, drivers that are developed and distributed separately from the kernel.
2. Safe and unsafe languages.
Safe languages very strictly define what can happen in a program. Most importantly they have no concept of malformed program (or meaningless program). Every program is valid and its execution always follows language specification. Whether program does what programmer expects it to do or not, is irrelevant in this context.
Common traits of safe languages:
They use garbage collection.
They have no pointer arithmetics. That means that writing or reading to an arbitrary address is not allowed.
They prevent out of range array access (if there is such concept). Exceptions or similar mechanisms can be used to signal and recover from such failures.
References (or pointers) have only two possible states: null reference and reference to a valid object. This is guaranteed by garbage collector. In fact, GC is the key component here. Some languages go even further and do no allow null references at all.
Every object (memory chunk in use) has type information assigned to it, object can only be accessed through a reference of appropriate type e.g. you cannot access array of integers through reference to a string.
You can add more entries to this list, but basic idea is to guarantee that program can only access valid memory regions using valid operations. Keep in mind that some unsafe languages can share some or even all of those traits.
Examples of safe languages: Python, Java, safe subset of C#.
Unsafe languages define what can and cannot be done in a program, but there is usually little to nothing to stop programmer from doing the wrong thing. Program that violates those rules is called a malformed program. From language point of view such program is meaningless and language does not even try to define its behavior, as it is usually near to impossible to do. In terms of C such program's behavior is undefined.
Examples of unsafe languages: Assembler, C, C++, Pascal.
3. Hardware is unsafe and thus has to be programmed using unsafe language.
Most hardware does nothing to provide you with a safe environment. There were some processors that used to attach type information to every memory cell (see tagged architecture), but modern ones do not do this as it complicates hardware, making it slower, more expensive and less generic.
Still, some features are provided to make it possible to implement safe environments within unsafe environment of hardware, such as memory protection, separate address spaces and separation of execution modes on user mode and kernel mode (a.k.a. supervisor mode).
Kernel is what runs on bare metal and thus much of it has to be written in unsafe languages like C and Assembly. Another reason is performance - safe environments imply huge overhead.
Microkernel and Monolithic kernel
Monolithic kernel and its modules run in a single shared address space. And since everything is usually written in an unsafe language, it is possible for any part of the kernel to access (and damage) memory that belongs to another part of the kernel due to bugs in code. Unsafe nature of this environment makes it impossible to detect or recover from those failures and most importantly predict kernel behavior after such failures.
Microkernel is an attempt to overcome those limitation by moving various parts of the kernel to a separate address spaces, effectively isolating them from each other, but providing safe way to communicate with each other (usually through message passing). Such separation creates safe environment composed from multiple unsafe processes, allowing kernel to recover from failure of some of its subsystems.
At the same time monolithic kernel can be able to run parts of it in a separate address space (FUSE), while nothing stops microkernel from being able to support modules that share address space with the main part of the kernel.
If most of the kernel runs in a single address space, it is considered to be a monolithic kernel. If most of it runs in separate address spaces, such kernel is considered to be a microkernel. If kernel is somewhere inbetween and actively uses both approaches, you have a hybrid kernel.
Hybrid kernel
Concept of hybrid kernel implies combining best of both worlds and was invented by Microsoft to boost sales of Windows NT in 90s. Joke! But this is almost true. Every important part of Windows NT runs in a shared address space, so it is another example of monolithic design.
Many people seem to use this term when describing monolithic kernel that is able to dynamically load modules. This is because in the past monolithic kernels didn't support dynamic module loading and had to be recompiled every time module is added to the kernel. Microkernels are not about dynamic module loading, but about reliability of the kernel, about its ability to recover from failues of its subsystems.
The answer: Linux is a monolithic kernel.
Monolithic kernel can be modular and can dynamically load modules. Microkernel, on the other hand, has to be modular and has to be able to dynamically load modules - the whole idea is about running them in a separate address space.
Microkernel is not the only way of overcoming unsafe nature of monolithic kernel. Another way is to write monolithic kernel in a safe language. One problem with such approach is that safe environment should be either provided by hardware (and will be very limited) or should be implemented in software using unsafe languages. Implementation of such environment will be extremely complex and will most likely have many bugs (think of all bugs found in JVM).
Example of this would be experimental OS Singularity.

Well, considering that I may have a quiz on this tomorrow, I should be able to help you out. However, I am still learning, and while my post may have some technical mistakes, it should be conceptually sound
Basically, as you may understand, there are different type of kernels for an OS.
Monolithic kernels have all their system functionalities and services together in one single giant program, occupying a single address space.
Microkernels on the other hand, have the bare minimum system programs and services on the microkernel. Most of the services that were previously considered as a part of the kernel (during the monolithic kernel version) such as process scheduler, etc. are now in the user space, and are termed as servers. these servers communicate with each other through the micro-kernel, using the inter-process communication, a form of communication that is laid down by the microkernel.
The modular approach builds on this, by making these "servers" dynamically loadable. Thus, one can have a particular "server" (in this type of kernel, called a module) dynamically loaded, without the kernel requiring to re-compile itself.
Linux Kernel is a monolithic kernel, but most flavours of Linux such as Ubuntu, Solaris, use a hybrid kernel, i.e. a mix of the monolithic and modular kernel approach. This is quite common, has different kernel structure has different pros and cons, and a hybrid structure is required to strike a balance

See this prior StackOverflow question for some information about your question. Briefly, it sounds like you're wondering...
... reading the modular concept which say new module for driver can be added without recompiling the kernel, which is against my assumption that Linux uses monolithic kernel. monolithic kernel runs in single address space and as a single process ...
These two concepts ("modular kernel" and "single address space") are not actually contradictory. You can build a new kernel module without recompiling the entire Linux kernel. When you load this new kernel module, it will actually be loaded into the same address space as the running kernel itself. From the link above...
Do not confuse the term modular kernel to be anything but monolithic. Some monolithic kernels can be compiled to be modular (e.g Linux), what matters is that the module is inserted to and runs from the same space that handles core functionality (kernel space).
As you have found, there are several ways to classify kernels and the different types are not necessarily mutually exclusive.

Related

In OS, why loadable kernel modules (LKMs) don't need to invoke message passing in order to communicate?

My question lies in a paragraph, the paragraph are shown as follow, I can't understand the the bold sentence. If it doesn't need to invoke message passing, how does it complete communication between process?
Modules
Perhaps the best current methodology for operating-system design involves
using loadable kernel modules (LKMs). Here, the kernel has a set of core
components and can link in additional services via modules, either at boot time
or during run time. This type of design is common in modern implementations
of UNIX, such as Linux, macOS, and Solaris, as well as Windows.
The idea of the design is for the kernel to provide core services, while
other services are implemented dynamically, as the kernel is running. Linking
services dynamically is preferable to adding new features directly to the kernel,
which would require recompiling the kernel every time a change was made.
Thus, for example, we might build CPU scheduling and memory management
algorithms directly into the kernel and then add support for different file
systems by way of loadable modules.
The overall result resembles a layered system in that each kernel section
has defined, protected interfaces; but it is more flexible than a layered system,
because any module can call any other module. The approach is also similar to
the microkernel approach in that the primary module has only core functions
and knowledge of how to load and communicate with other modules; but it
is more efficient, because modules do not need to invoke message passing in
order to communicate.
Linux uses loadable kernel modules, primarily for supporting device
drivers and file systems. LKMs can be “inserted” into the kernel as the system is started (or booted) or during run time, such as when a USB device is
plugged into a running machine. If the Linux kernel does not have the necessary driver, it can be dynamically loaded. LKMs can be removed from the
kernel during run time as well. For Linux, LKMs allow a dynamic and modular
kernel, while maintaining the performance benefits of a monolithic system. We
cover creating LKMs in Linux in several programming exercises at the end of
this chapter.

In OS, why loadable kernel modules (LKMs) don't need to invoke message passing in order to communicate?
The simple answer is that because they're loaded into kernel space and dynamically linked; the kernel can use "mostly normal" functions calls instead of anything more expensive (message passing, remote procedure calls, ...) to communicate with it.
Note: Typically (especially for *nix systems) a driver will provide a set of function pointers to the kernel (e.g. maybe one for open(), one for read(), one for ioctl(), etc) in some kind of "device context" structure; allowing the kernel to call the driver's functions via. the function pointers (e.g. like "result = deviceContext->open( ..);).
"The approach is also similar to the microkernel approach in that the primary module has only core functions and knowledge of how to load and communicate with other modules; but it is more efficient, because modules do not need to invoke message passing in order to communicate."
This paragraph has the potential to give you a false impression. For extensibility alone, modular monolithic kernels are similar to micro-kernels (and both are a lot more extensible than a "literally monolithic (one piece, like stone)" kernel). For other things (e.g. security) modular monolithic kernels are extremely dissimilar to micro-kernels.
For Linux specifically; you can think of it as almost 30 million lines (growing at a rate of over 1 million lines per year) of potential security vulnerabilities running at the highest privilege level with full access to every scrap of data, with an average of about 150 discovered critical vulnerabilities per year (and who knows how many undiscovered critical vulnerabilities).
One of the main goals of micro-kernels is to place isolation barriers between the "kernel core" and everything else; so that you might end up with several thousand lines of kernel that doesn't grow (and a significant improvement in security). It's those isolation barriers that require less efficient communication (e.g. message passing).
"...but it is more efficient, because modules do not need to invoke message passing in order to communicate."
This could be rephrased more correctly as "...but it is more efficient, because modules do not need to pass through an isolation barrier."
Note that message passing is merely one way to pass through an isolation barrier - there's shared memory, signals, pipes, sockets, remote procedure calls, etc. Nothing says a micro-kernel has to use message passing and you could design a micro-kernel that does not use message passing at all.

How is programming in rtems different than Linux?

I am new to programmming in rtem and was wondering how are the two, rtems and linux, are different in terms of programming. I understand rtems is an real time operating system but if you were to make a hello world app, wouldn’t the program be the same?

Note that your question is quite generic. There are a lot of detail differences.
One of the biggest is the format of your binary: Most RTEMS binaries are statically linked together. You only have one big binary containing your system and application. There is some dynamic loading supported but it's not the case used by most users.
As already mentioned my n.m. in the comments RTEMS has a lot of the POSIX API (at least the embedded sub set). So you can use a lot of the same API like you do on Linux.
A big differences is that RTEMS has a global address space (on most targets). So you don't have a separation between tasks. That makes pointer errors a bit harder to debug.
Also a difference: Most embedded systems are targeted for long running applications. In such applications (regardless whether you are on Linux or on RTEMS or on any other system) you should be careful to clean up your stuff (close files, free memory, ...). In Linux (or other desktop class systems) you have processes and the kernel cleans up all resources after your process exits. Although you can create threads in RTEMS no one cleans up after a thread exits.

The POSIX attribute defaults for threads are not specified in the standard and may vary between RTEMS and Linux.

Why are not all library functions not system calls?

So, from my basic OS class, I understood that kernel is the one who interacts with the hardware. So, if we want to interact with hardware, we need to call system calls. open() is a system call, while strlen() is not a system call. But any instruction or command has to interact with hardware, at least to increase program counter or modify the contents of memory. So, shouldn't all functions make a system call at some point ?

I would strongly suggest reading early papers on UNIX, the how and the why. Ken Thompson was a strong advocate for the kernel consisting only of the things that could not be implemented outside of the kernel.
At that time; outside of the kernel corresponded to outside of the privileged mode of the computer. This is a less interesting concept in modern systems; yet continues to drive architecture and design.
In short; open() is exported by the kernel because it has to be - it access data structures that are private to the kernel, thus is an interface; strlen is not exported by the kernel because it doesn't have to be, it neither requires privilege nor access to other data structures.
Doesn't have to be is a trump card; because nobody wants needless functionality in the kernel.

kernel is the one who interacts with the hardware.
That is a very inaccurate statement.
Even the following program, which may run on some microcontroller with no OS (and no optimizing compiler...), interacts with the hardware:
int array[8192];
void entry_point(void) {
array[100] = 5+3;
}
That hardware in question being "CPU", "memory bus", and "memory".
While system calls are primarily used to access certain hardware (disk, network etc.), system calls are not defined as "calls to access hardware", but rather as just "calls to kernel APIs".
A kernel can export whatever API it wants, including strlen(), but for an OS designer, such as the aforementioned Ken Thompson, the APIs that the kernel should export are ones that facilitate the existence of multiple programs, processes, and/or users.
The main concern here is access to resources such as disk, network, timers, memory, etc., but also include e.g.:
Scheduling / process management APIs (e.g. the fork(), exit(), nice(), and sched_yield() calls)
Multiprocess management (e.g. sigaction(), kill(), wait(), futex() and semop())
Performance & debugging (e.g. ptrace(), prlimit() and getrusage())
Security (e.g. chmod(), chown(), setuid(), seccomp(), and chroot())
Administration (e.g. init_module(), sethostname(), shutdown())

Quick questions about Linux kernel modules

I'm very familiar with Linux (I've been using it for 2 years, no Windows for 1 1/2 years), and I'm finally digging deeper into kernel programming and I'm working a project. So my questions are:
Will a kernel module run faster than a traditional c program.
How can I communicate with a module (is that even possible), for example call a function in it.

1.Will a kernel module run faster than a traditional c program.
It Depends™
Running as a kernel module means you get to play by different rules, you potentially get to avoid some context switches depending on what you are doing. You get access to some powerful tools that can be leveraged to optimize your code, but don't expect your code to run magically faster just by throwing everything in kernelspace.
2.How can I communicate with a module (is that even possible), for example call a function in it.
There are various ways:
You can use the various file system interfaces: procfs, sysfs, debugfs, sysctl, ...
You could register a char device
You can make use of the Netlink interface
You could create new syscalls, although that's heavily discouraged
And you can always come up with your own scheme, or use some less common APIs

Will a kernel module run faster than a traditional c program.
The kernel is already a C program, which is most likely be compiled with same compiler you use. So generic algorithms or some processor intensive computations will be executed with almost same speed.
But most userspace programs (like bash) have to ask kernel to perform some operations on system resources, i.e. print prompt onto monitor. It will require entering the kernel with system call, sending data over tty interfaces and passing to a video-driver, it may introduce some latency. If you'd implemented bash in-kernel, you may directly call video-driver, which is definitely faster.
That approach however, have drawbacks. First of all, bash should be able to print prompt on ssh-session or serial console, and that will complicate logic. Also, if your bash will hang, you cannot just kill, you have to reboot system.
How can I communicate with a module (is that even possible), for example call a function in it.
In addition to excellent list provided by #tux3, I would suggest to start with char devices.

Linux System Calls & Kernel Mode

I understand that system calls exist to provide access to capabilities that are disallowed in user space, such as accessing a HDD using the read() system call. I also understand that these are abstracted by a user-mode layer in the form of library calls such as fread(), to provide compatibility across hardware.
So from the application developers point of view, we have something like;
//library //syscall //k_driver //device_driver
fread() -> read() -> k_read() -> d_read()
My question is; what is stopping me inlining all the instructions in the fread() and read() functions directly into my program? The instructions are the same, so the CPU should behave in the same way? I have not tried it, but I assume that this does not work for some reason I am missing. Otherwise any application could get arbitrary kernel mode operation.
TL;DR: What allows system calls to 'enter' kernel mode that is not copy-able by an application?

System calls do not enter the kernel themselves. More precisely, for example the read function you call is still, as far as your application is concerned, a library call. What read(2) does internally is calling the actual system call using some interruption or the syscall(2) assembly instruction, depending on the CPU architecture and OS.
This is the only way for userland code to have privileged code to be executed, but it is an indirect way. The userland and kernel code execute in different contexts.
That means you cannot add the kernel source code to your userland code and expect it to do anything useful but crash. In particular, the kernel code has access to physical memory addresses required to interact with the hardware. Userland code is limited to access a virtual memory space that has not this capability. Also, the instructions userland code is allowed to execute is a subset of the ones the CPU support. Several I/O, interruption and virtualization related instructions are examples of prohibited code. They are known as privileged instructions and require to be in an lower ring or supervisor mode depending on the CPU architecture.

You could inline them. You can issue system calls directly through syscall(2), but that soon gets messy. Note that the system call overhead (context switches back and forth, in-kernel checks, ...), not to mention the time the system call itself takes, makes your gain by inlining dissapear in the noise (if there is any gain, more code means cache isn't so useful, and performance suffers). Trust the libc/kernel folks to have studied the matter and done the inlining for you behind your back (in the relevant *.h file) if it really is a measurable gain.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string