Why are not all library functions not system calls? - linux

So, from my basic OS class, I understood that kernel is the one who interacts with the hardware. So, if we want to interact with hardware, we need to call system calls. open() is a system call, while strlen() is not a system call. But any instruction or command has to interact with hardware, at least to increase program counter or modify the contents of memory. So, shouldn't all functions make a system call at some point ?

I would strongly suggest reading early papers on UNIX, the how and the why. Ken Thompson was a strong advocate for the kernel consisting only of the things that could not be implemented outside of the kernel.
At that time; outside of the kernel corresponded to outside of the privileged mode of the computer. This is a less interesting concept in modern systems; yet continues to drive architecture and design.
In short; open() is exported by the kernel because it has to be - it access data structures that are private to the kernel, thus is an interface; strlen is not exported by the kernel because it doesn't have to be, it neither requires privilege nor access to other data structures.
Doesn't have to be is a trump card; because nobody wants needless functionality in the kernel.

kernel is the one who interacts with the hardware.
That is a very inaccurate statement.
Even the following program, which may run on some microcontroller with no OS (and no optimizing compiler...), interacts with the hardware:
int array[8192];
void entry_point(void) {
array[100] = 5+3;
}
That hardware in question being "CPU", "memory bus", and "memory".
While system calls are primarily used to access certain hardware (disk, network etc.), system calls are not defined as "calls to access hardware", but rather as just "calls to kernel APIs".
A kernel can export whatever API it wants, including strlen(), but for an OS designer, such as the aforementioned Ken Thompson, the APIs that the kernel should export are ones that facilitate the existence of multiple programs, processes, and/or users.
The main concern here is access to resources such as disk, network, timers, memory, etc., but also include e.g.:
Scheduling / process management APIs (e.g. the fork(), exit(), nice(), and sched_yield() calls)
Multiprocess management (e.g. sigaction(), kill(), wait(), futex() and semop())
Performance & debugging (e.g. ptrace(), prlimit() and getrusage())
Security (e.g. chmod(), chown(), setuid(), seccomp(), and chroot())
Administration (e.g. init_module(), sethostname(), shutdown())

Related

In OS, why loadable kernel modules (LKMs) don't need to invoke message passing in order to communicate?

My question lies in a paragraph, the paragraph are shown as follow, I can't understand the the bold sentence. If it doesn't need to invoke message passing, how does it complete communication between process?
Modules
Perhaps the best current methodology for operating-system design involves
using loadable kernel modules (LKMs). Here, the kernel has a set of core
components and can link in additional services via modules, either at boot time
or during run time. This type of design is common in modern implementations
of UNIX, such as Linux, macOS, and Solaris, as well as Windows.
The idea of the design is for the kernel to provide core services, while
other services are implemented dynamically, as the kernel is running. Linking
services dynamically is preferable to adding new features directly to the kernel,
which would require recompiling the kernel every time a change was made.
Thus, for example, we might build CPU scheduling and memory management
algorithms directly into the kernel and then add support for different file
systems by way of loadable modules.
The overall result resembles a layered system in that each kernel section
has defined, protected interfaces; but it is more flexible than a layered system,
because any module can call any other module. The approach is also similar to
the microkernel approach in that the primary module has only core functions
and knowledge of how to load and communicate with other modules; but it
is more efficient, because modules do not need to invoke message passing in
order to communicate.
Linux uses loadable kernel modules, primarily for supporting device
drivers and file systems. LKMs can be “inserted” into the kernel as the system is started (or booted) or during run time, such as when a USB device is
plugged into a running machine. If the Linux kernel does not have the necessary driver, it can be dynamically loaded. LKMs can be removed from the
kernel during run time as well. For Linux, LKMs allow a dynamic and modular
kernel, while maintaining the performance benefits of a monolithic system. We
cover creating LKMs in Linux in several programming exercises at the end of
this chapter.
In OS, why loadable kernel modules (LKMs) don't need to invoke message passing in order to communicate?
The simple answer is that because they're loaded into kernel space and dynamically linked; the kernel can use "mostly normal" functions calls instead of anything more expensive (message passing, remote procedure calls, ...) to communicate with it.
Note: Typically (especially for *nix systems) a driver will provide a set of function pointers to the kernel (e.g. maybe one for open(), one for read(), one for ioctl(), etc) in some kind of "device context" structure; allowing the kernel to call the driver's functions via. the function pointers (e.g. like "result = deviceContext->open( ..);).
"The approach is also similar to the microkernel approach in that the primary module has only core functions and knowledge of how to load and communicate with other modules; but it is more efficient, because modules do not need to invoke message passing in order to communicate."
This paragraph has the potential to give you a false impression. For extensibility alone, modular monolithic kernels are similar to micro-kernels (and both are a lot more extensible than a "literally monolithic (one piece, like stone)" kernel). For other things (e.g. security) modular monolithic kernels are extremely dissimilar to micro-kernels.
For Linux specifically; you can think of it as almost 30 million lines (growing at a rate of over 1 million lines per year) of potential security vulnerabilities running at the highest privilege level with full access to every scrap of data, with an average of about 150 discovered critical vulnerabilities per year (and who knows how many undiscovered critical vulnerabilities).
One of the main goals of micro-kernels is to place isolation barriers between the "kernel core" and everything else; so that you might end up with several thousand lines of kernel that doesn't grow (and a significant improvement in security). It's those isolation barriers that require less efficient communication (e.g. message passing).
"...but it is more efficient, because modules do not need to invoke message passing in order to communicate."
This could be rephrased more correctly as "...but it is more efficient, because modules do not need to pass through an isolation barrier."
Note that message passing is merely one way to pass through an isolation barrier - there's shared memory, signals, pipes, sockets, remote procedure calls, etc. Nothing says a micro-kernel has to use message passing and you could design a micro-kernel that does not use message passing at all.

Why is `kernal_thread()` not listed as a system call of Linux?

I was wondering why kernal_thread() isn't listed as a system call in http://man7.org/linux/man-pages/man2/syscalls.2.html?
Does a Linux application programmer never have any need to create a kernel thread?
Is the function accessible to a Linux application programmer?
Thanks.
Application programmers often need to create "kernel scheduled threads", aka "OS threads" or "native threads" using the clone syscall from that list.
"Kernel threads", however, are just threads that the kernel uses to run kernel code for its own internal purposes. They are created and used by kernel context code only. Each piece of software is responsible for creating and managing its own threads to do its own job, including userspace applications and the kernel itself.
kernel_thread is a kernel function defined in kernel/fork.c, which is not exposed to userspace. It's part of the internal kernel API and not a syscall.
As you are familiar that their are two address spaces one user and kernel, normal function will run in user space but when you will make use of some function calls that are implemented in kernel space you cannot use them directly so to access them we need system calls.
So now your question is why kernal_thread() is not listed in system calls.
(As answered by "that other guy" )
kernal_thread() function are used by the kernel programmer or usual in device driver for creating thread in kernel space. So their implementation is in kernel space and only used by kernel developer or programer. (Note:- if a interface have been provided for some function for user space that will be concluded as system call, as no interface for that function for user so their is no documentation for that in man pages)
If you want to read about documents about Kernel space function download the kernel source and check the "Documentation" folder or check the source for respective function they have few comments.

modular kernel vs micro kernel / monolitic kernel

I am C programmer and new to the Linux kernel programming. I could find there are 3 type of kernel monolithic,micro and modular kernel.while googling i could find some website say linux is having monolithic kernel (in Stack overflow) and some other says micro kernel and the rest say hybrid kernel. So i am totally confused while reading the modular concept which say new module for driver can be added without recompiling the kernel, which is against my assumption that Linux uses monolithic kernel. monolithic kernel runs in single address space and as a single processes this is also bit confusing if so
Before you try to understand those differences, you have to understand other concepts first:
1. Modular programming.
Module is a functionally complete part of a program. Module usually has following properties:
Separation of interface and implementation.
Initialization and deinitialization routines. Both are optional. Deinitialization routine is likely to be missing in environment with GC (Garbage Collector).
Modules used by a program compose directed acyclic graph a.k.a. dependency graph (you might have heard about this - cyclic dependencies are not allowed, dependency is initialized before dependent module).
Modular programming is essential when building large systems. Every big kernel is a modular kernel, regardless of whether it is monolithic, hybrid or microkernel.
Sometimes modules can be loaded and unloaded dynamically. Dynamic modules are essential part of any extensible system. Those can be plugins or, if we talk about kernels, drivers that are developed and distributed separately from the kernel.
2. Safe and unsafe languages.
Safe languages very strictly define what can happen in a program. Most importantly they have no concept of malformed program (or meaningless program). Every program is valid and its execution always follows language specification. Whether program does what programmer expects it to do or not, is irrelevant in this context.
Common traits of safe languages:
They use garbage collection.
They have no pointer arithmetics. That means that writing or reading to an arbitrary address is not allowed.
They prevent out of range array access (if there is such concept). Exceptions or similar mechanisms can be used to signal and recover from such failures.
References (or pointers) have only two possible states: null reference and reference to a valid object. This is guaranteed by garbage collector. In fact, GC is the key component here. Some languages go even further and do no allow null references at all.
Every object (memory chunk in use) has type information assigned to it, object can only be accessed through a reference of appropriate type e.g. you cannot access array of integers through reference to a string.
You can add more entries to this list, but basic idea is to guarantee that program can only access valid memory regions using valid operations. Keep in mind that some unsafe languages can share some or even all of those traits.
Examples of safe languages: Python, Java, safe subset of C#.
Unsafe languages define what can and cannot be done in a program, but there is usually little to nothing to stop programmer from doing the wrong thing. Program that violates those rules is called a malformed program. From language point of view such program is meaningless and language does not even try to define its behavior, as it is usually near to impossible to do. In terms of C such program's behavior is undefined.
Examples of unsafe languages: Assembler, C, C++, Pascal.
3. Hardware is unsafe and thus has to be programmed using unsafe language.
Most hardware does nothing to provide you with a safe environment. There were some processors that used to attach type information to every memory cell (see tagged architecture), but modern ones do not do this as it complicates hardware, making it slower, more expensive and less generic.
Still, some features are provided to make it possible to implement safe environments within unsafe environment of hardware, such as memory protection, separate address spaces and separation of execution modes on user mode and kernel mode (a.k.a. supervisor mode).
Kernel is what runs on bare metal and thus much of it has to be written in unsafe languages like C and Assembly. Another reason is performance - safe environments imply huge overhead.
Microkernel and Monolithic kernel
Monolithic kernel and its modules run in a single shared address space. And since everything is usually written in an unsafe language, it is possible for any part of the kernel to access (and damage) memory that belongs to another part of the kernel due to bugs in code. Unsafe nature of this environment makes it impossible to detect or recover from those failures and most importantly predict kernel behavior after such failures.
Microkernel is an attempt to overcome those limitation by moving various parts of the kernel to a separate address spaces, effectively isolating them from each other, but providing safe way to communicate with each other (usually through message passing). Such separation creates safe environment composed from multiple unsafe processes, allowing kernel to recover from failure of some of its subsystems.
At the same time monolithic kernel can be able to run parts of it in a separate address space (FUSE), while nothing stops microkernel from being able to support modules that share address space with the main part of the kernel.
If most of the kernel runs in a single address space, it is considered to be a monolithic kernel. If most of it runs in separate address spaces, such kernel is considered to be a microkernel. If kernel is somewhere inbetween and actively uses both approaches, you have a hybrid kernel.
Hybrid kernel
Concept of hybrid kernel implies combining best of both worlds and was invented by Microsoft to boost sales of Windows NT in 90s. Joke! But this is almost true. Every important part of Windows NT runs in a shared address space, so it is another example of monolithic design.
Many people seem to use this term when describing monolithic kernel that is able to dynamically load modules. This is because in the past monolithic kernels didn't support dynamic module loading and had to be recompiled every time module is added to the kernel. Microkernels are not about dynamic module loading, but about reliability of the kernel, about its ability to recover from failues of its subsystems.
The answer: Linux is a monolithic kernel.
Monolithic kernel can be modular and can dynamically load modules. Microkernel, on the other hand, has to be modular and has to be able to dynamically load modules - the whole idea is about running them in a separate address space.
Microkernel is not the only way of overcoming unsafe nature of monolithic kernel. Another way is to write monolithic kernel in a safe language. One problem with such approach is that safe environment should be either provided by hardware (and will be very limited) or should be implemented in software using unsafe languages. Implementation of such environment will be extremely complex and will most likely have many bugs (think of all bugs found in JVM).
Example of this would be experimental OS Singularity.
Well, considering that I may have a quiz on this tomorrow, I should be able to help you out. However, I am still learning, and while my post may have some technical mistakes, it should be conceptually sound
Basically, as you may understand, there are different type of kernels for an OS.
Monolithic kernels have all their system functionalities and services together in one single giant program, occupying a single address space.
Microkernels on the other hand, have the bare minimum system programs and services on the microkernel. Most of the services that were previously considered as a part of the kernel (during the monolithic kernel version) such as process scheduler, etc. are now in the user space, and are termed as servers. these servers communicate with each other through the micro-kernel, using the inter-process communication, a form of communication that is laid down by the microkernel.
The modular approach builds on this, by making these "servers" dynamically loadable. Thus, one can have a particular "server" (in this type of kernel, called a module) dynamically loaded, without the kernel requiring to re-compile itself.
Linux Kernel is a monolithic kernel, but most flavours of Linux such as Ubuntu, Solaris, use a hybrid kernel, i.e. a mix of the monolithic and modular kernel approach. This is quite common, has different kernel structure has different pros and cons, and a hybrid structure is required to strike a balance
See this prior StackOverflow question for some information about your question. Briefly, it sounds like you're wondering...
... reading the modular concept which say new module for driver can be added without recompiling the kernel, which is against my assumption that Linux uses monolithic kernel. monolithic kernel runs in single address space and as a single process ...
These two concepts ("modular kernel" and "single address space") are not actually contradictory. You can build a new kernel module without recompiling the entire Linux kernel. When you load this new kernel module, it will actually be loaded into the same address space as the running kernel itself. From the link above...
Do not confuse the term modular kernel to be anything but monolithic. Some monolithic kernels can be compiled to be modular (e.g Linux), what matters is that the module is inserted to and runs from the same space that handles core functionality (kernel space).
As you have found, there are several ways to classify kernels and the different types are not necessarily mutually exclusive.

Linux System Calls & Kernel Mode

I understand that system calls exist to provide access to capabilities that are disallowed in user space, such as accessing a HDD using the read() system call. I also understand that these are abstracted by a user-mode layer in the form of library calls such as fread(), to provide compatibility across hardware.
So from the application developers point of view, we have something like;
//library //syscall //k_driver //device_driver
fread() -> read() -> k_read() -> d_read()
My question is; what is stopping me inlining all the instructions in the fread() and read() functions directly into my program? The instructions are the same, so the CPU should behave in the same way? I have not tried it, but I assume that this does not work for some reason I am missing. Otherwise any application could get arbitrary kernel mode operation.
TL;DR: What allows system calls to 'enter' kernel mode that is not copy-able by an application?
System calls do not enter the kernel themselves. More precisely, for example the read function you call is still, as far as your application is concerned, a library call. What read(2) does internally is calling the actual system call using some interruption or the syscall(2) assembly instruction, depending on the CPU architecture and OS.
This is the only way for userland code to have privileged code to be executed, but it is an indirect way. The userland and kernel code execute in different contexts.
That means you cannot add the kernel source code to your userland code and expect it to do anything useful but crash. In particular, the kernel code has access to physical memory addresses required to interact with the hardware. Userland code is limited to access a virtual memory space that has not this capability. Also, the instructions userland code is allowed to execute is a subset of the ones the CPU support. Several I/O, interruption and virtualization related instructions are examples of prohibited code. They are known as privileged instructions and require to be in an lower ring or supervisor mode depending on the CPU architecture.
You could inline them. You can issue system calls directly through syscall(2), but that soon gets messy. Note that the system call overhead (context switches back and forth, in-kernel checks, ...), not to mention the time the system call itself takes, makes your gain by inlining dissapear in the noise (if there is any gain, more code means cache isn't so useful, and performance suffers). Trust the libc/kernel folks to have studied the matter and done the inlining for you behind your back (in the relevant *.h file) if it really is a measurable gain.

What are coding conventions for using floating-point in Linux device drivers?

This is related to
this question.
I'm not an expert on Linux device drivers or kernel modules, but I've been reading "Linux Device Drivers" [O'Reilly] by Rubini & Corbet and a number of online sources, but I haven't been able to find anything on this specific issue yet.
When is a kernel or driver module allowed to use floating-point registers?
If so, who is responsible for saving and restoring their contents?
(Assume x86-64 architecture)
If I understand correctly, whenever a KM is running, it is using a hardware context (or hardware thread or register set -- whatever you want to call it) that has been preempted from some application thread. If you write your KM in c, the compiler will correctly insure that the general-purpose registers are properly saved and restored (much as in an application), but that doesn't automatically happen with floating-point registers. For that matter, a lot of KMs can't even assume that the processor has any floating-point capability.
Am I correct in guessing that a KM that wants to use floating-point has to carefully save and restore the floating-point state? Are there standard kernel functions for doing this?
Are the coding conventions for this spelled out anywhere? Are they different for SMP-non SMP drivers? Are they different for older non-preemptive kernels and newer preemptive kernels?
Linus's answer provides this pretty clear quote to use as a guideline:
In other words: the rule is that you really shouldn't use FP in the kernel.
Short answer: Kernel code can use floating point if this use is surrounded by kernel_fpu_begin()/kernel_fpu_end(). These function handle saving and restoring the fpu context. Also, they call preempt_disable()/preempt_enable(), which means no sleeping, page faults etc. in the code between those functions. Google the function names for more information.
If I understand correctly, whenever a
KM is running, it is using a hardware
context (or hardware thread or
register set -- whatever you want to
call it) that has been preempted from
some application thread.
No, a kernel module can run in user context as well (eg. when userspace calls syscalls on a device provided by the KM). It has, however, no relation to the float issue.
If you write your KM in c, the
compiler will correctly insure that
the general-purpose registers are
properly saved and restored (much as
in an application), but that doesn't
automatically happen with
floating-point registers.
That is not because of the compiler, but because of the kernel context-switching code.

Resources