What are coding conventions for using floating-point in Linux device drivers? - linux

This is related to
this question.
I'm not an expert on Linux device drivers or kernel modules, but I've been reading "Linux Device Drivers" [O'Reilly] by Rubini & Corbet and a number of online sources, but I haven't been able to find anything on this specific issue yet.
When is a kernel or driver module allowed to use floating-point registers?
If so, who is responsible for saving and restoring their contents?
(Assume x86-64 architecture)
If I understand correctly, whenever a KM is running, it is using a hardware context (or hardware thread or register set -- whatever you want to call it) that has been preempted from some application thread. If you write your KM in c, the compiler will correctly insure that the general-purpose registers are properly saved and restored (much as in an application), but that doesn't automatically happen with floating-point registers. For that matter, a lot of KMs can't even assume that the processor has any floating-point capability.
Am I correct in guessing that a KM that wants to use floating-point has to carefully save and restore the floating-point state? Are there standard kernel functions for doing this?
Are the coding conventions for this spelled out anywhere? Are they different for SMP-non SMP drivers? Are they different for older non-preemptive kernels and newer preemptive kernels?

Linus's answer provides this pretty clear quote to use as a guideline:
In other words: the rule is that you really shouldn't use FP in the kernel.

Short answer: Kernel code can use floating point if this use is surrounded by kernel_fpu_begin()/kernel_fpu_end(). These function handle saving and restoring the fpu context. Also, they call preempt_disable()/preempt_enable(), which means no sleeping, page faults etc. in the code between those functions. Google the function names for more information.
If I understand correctly, whenever a
KM is running, it is using a hardware
context (or hardware thread or
register set -- whatever you want to
call it) that has been preempted from
some application thread.
No, a kernel module can run in user context as well (eg. when userspace calls syscalls on a device provided by the KM). It has, however, no relation to the float issue.
If you write your KM in c, the
compiler will correctly insure that
the general-purpose registers are
properly saved and restored (much as
in an application), but that doesn't
automatically happen with
floating-point registers.
That is not because of the compiler, but because of the kernel context-switching code.

Related

Why are not all library functions not system calls?

So, from my basic OS class, I understood that kernel is the one who interacts with the hardware. So, if we want to interact with hardware, we need to call system calls. open() is a system call, while strlen() is not a system call. But any instruction or command has to interact with hardware, at least to increase program counter or modify the contents of memory. So, shouldn't all functions make a system call at some point ?
I would strongly suggest reading early papers on UNIX, the how and the why. Ken Thompson was a strong advocate for the kernel consisting only of the things that could not be implemented outside of the kernel.
At that time; outside of the kernel corresponded to outside of the privileged mode of the computer. This is a less interesting concept in modern systems; yet continues to drive architecture and design.
In short; open() is exported by the kernel because it has to be - it access data structures that are private to the kernel, thus is an interface; strlen is not exported by the kernel because it doesn't have to be, it neither requires privilege nor access to other data structures.
Doesn't have to be is a trump card; because nobody wants needless functionality in the kernel.
kernel is the one who interacts with the hardware.
That is a very inaccurate statement.
Even the following program, which may run on some microcontroller with no OS (and no optimizing compiler...), interacts with the hardware:
int array[8192];
void entry_point(void) {
array[100] = 5+3;
}
That hardware in question being "CPU", "memory bus", and "memory".
While system calls are primarily used to access certain hardware (disk, network etc.), system calls are not defined as "calls to access hardware", but rather as just "calls to kernel APIs".
A kernel can export whatever API it wants, including strlen(), but for an OS designer, such as the aforementioned Ken Thompson, the APIs that the kernel should export are ones that facilitate the existence of multiple programs, processes, and/or users.
The main concern here is access to resources such as disk, network, timers, memory, etc., but also include e.g.:
Scheduling / process management APIs (e.g. the fork(), exit(), nice(), and sched_yield() calls)
Multiprocess management (e.g. sigaction(), kill(), wait(), futex() and semop())
Performance & debugging (e.g. ptrace(), prlimit() and getrusage())
Security (e.g. chmod(), chown(), setuid(), seccomp(), and chroot())
Administration (e.g. init_module(), sethostname(), shutdown())

"Switching from user mode to kernel mode" is an incorrect concept

Im studying for the first time "Operating System". In my book i found this sentence about "User Mode" and "Kernel Mode":
"Switch from user to kernel mode" instruction is executed only in kernel
mode
I think that is a incorrect sentence as in practice there is no "switch of kernel". In fact, when a user process need to do a privileged instruction it simply ask the kernel to do something for itself. Is it correct ?
In fact, when a user process need to do a privileged instruction it simply ask the kernel to do something for itself.
But how does that happen? Details are processor (i.e. instruction set architecture) and OS specific (explained in ABI specifications relevant to your system, e.g. here), but that usually involves some machine code instruction like SYSENTER or SYSCALL (or SVC on mainframes) capable of atomically changing the CPU mode (that is switching it in a controlled manner to kernel mode). The actual parameters of the system call (including even the syscall number) are often passed in registers (but details are ABI specific).
So I feel the concept of switching from user-mode to kernel-mode is relevant, and meaningful (so "correct").
BTW, user-mode code is forbidden (by the hardware) to execute privileged machine instructions, such as those interacting with IO hardware devices (read about protection rings). If you try, you get some hardware exception (a bit similar to interrupts). Hence your code (even if it is malicious) has to make system calls, which the kernel controls (it has lots of code related to permission checking), for e.g. all IO.
Read also Operating Systems: Three Easy Pieces - freely downloadable. See also http://osdev.org/. Read system call wikipage & syscalls(2), and the Assembler HowTo.
In real life, things are much more complex. Read about System Management Mode and about the (scary) Intel Management Engine.

modular kernel vs micro kernel / monolitic kernel

I am C programmer and new to the Linux kernel programming. I could find there are 3 type of kernel monolithic,micro and modular kernel.while googling i could find some website say linux is having monolithic kernel (in Stack overflow) and some other says micro kernel and the rest say hybrid kernel. So i am totally confused while reading the modular concept which say new module for driver can be added without recompiling the kernel, which is against my assumption that Linux uses monolithic kernel. monolithic kernel runs in single address space and as a single processes this is also bit confusing if so
Before you try to understand those differences, you have to understand other concepts first:
1. Modular programming.
Module is a functionally complete part of a program. Module usually has following properties:
Separation of interface and implementation.
Initialization and deinitialization routines. Both are optional. Deinitialization routine is likely to be missing in environment with GC (Garbage Collector).
Modules used by a program compose directed acyclic graph a.k.a. dependency graph (you might have heard about this - cyclic dependencies are not allowed, dependency is initialized before dependent module).
Modular programming is essential when building large systems. Every big kernel is a modular kernel, regardless of whether it is monolithic, hybrid or microkernel.
Sometimes modules can be loaded and unloaded dynamically. Dynamic modules are essential part of any extensible system. Those can be plugins or, if we talk about kernels, drivers that are developed and distributed separately from the kernel.
2. Safe and unsafe languages.
Safe languages very strictly define what can happen in a program. Most importantly they have no concept of malformed program (or meaningless program). Every program is valid and its execution always follows language specification. Whether program does what programmer expects it to do or not, is irrelevant in this context.
Common traits of safe languages:
They use garbage collection.
They have no pointer arithmetics. That means that writing or reading to an arbitrary address is not allowed.
They prevent out of range array access (if there is such concept). Exceptions or similar mechanisms can be used to signal and recover from such failures.
References (or pointers) have only two possible states: null reference and reference to a valid object. This is guaranteed by garbage collector. In fact, GC is the key component here. Some languages go even further and do no allow null references at all.
Every object (memory chunk in use) has type information assigned to it, object can only be accessed through a reference of appropriate type e.g. you cannot access array of integers through reference to a string.
You can add more entries to this list, but basic idea is to guarantee that program can only access valid memory regions using valid operations. Keep in mind that some unsafe languages can share some or even all of those traits.
Examples of safe languages: Python, Java, safe subset of C#.
Unsafe languages define what can and cannot be done in a program, but there is usually little to nothing to stop programmer from doing the wrong thing. Program that violates those rules is called a malformed program. From language point of view such program is meaningless and language does not even try to define its behavior, as it is usually near to impossible to do. In terms of C such program's behavior is undefined.
Examples of unsafe languages: Assembler, C, C++, Pascal.
3. Hardware is unsafe and thus has to be programmed using unsafe language.
Most hardware does nothing to provide you with a safe environment. There were some processors that used to attach type information to every memory cell (see tagged architecture), but modern ones do not do this as it complicates hardware, making it slower, more expensive and less generic.
Still, some features are provided to make it possible to implement safe environments within unsafe environment of hardware, such as memory protection, separate address spaces and separation of execution modes on user mode and kernel mode (a.k.a. supervisor mode).
Kernel is what runs on bare metal and thus much of it has to be written in unsafe languages like C and Assembly. Another reason is performance - safe environments imply huge overhead.
Microkernel and Monolithic kernel
Monolithic kernel and its modules run in a single shared address space. And since everything is usually written in an unsafe language, it is possible for any part of the kernel to access (and damage) memory that belongs to another part of the kernel due to bugs in code. Unsafe nature of this environment makes it impossible to detect or recover from those failures and most importantly predict kernel behavior after such failures.
Microkernel is an attempt to overcome those limitation by moving various parts of the kernel to a separate address spaces, effectively isolating them from each other, but providing safe way to communicate with each other (usually through message passing). Such separation creates safe environment composed from multiple unsafe processes, allowing kernel to recover from failure of some of its subsystems.
At the same time monolithic kernel can be able to run parts of it in a separate address space (FUSE), while nothing stops microkernel from being able to support modules that share address space with the main part of the kernel.
If most of the kernel runs in a single address space, it is considered to be a monolithic kernel. If most of it runs in separate address spaces, such kernel is considered to be a microkernel. If kernel is somewhere inbetween and actively uses both approaches, you have a hybrid kernel.
Hybrid kernel
Concept of hybrid kernel implies combining best of both worlds and was invented by Microsoft to boost sales of Windows NT in 90s. Joke! But this is almost true. Every important part of Windows NT runs in a shared address space, so it is another example of monolithic design.
Many people seem to use this term when describing monolithic kernel that is able to dynamically load modules. This is because in the past monolithic kernels didn't support dynamic module loading and had to be recompiled every time module is added to the kernel. Microkernels are not about dynamic module loading, but about reliability of the kernel, about its ability to recover from failues of its subsystems.
The answer: Linux is a monolithic kernel.
Monolithic kernel can be modular and can dynamically load modules. Microkernel, on the other hand, has to be modular and has to be able to dynamically load modules - the whole idea is about running them in a separate address space.
Microkernel is not the only way of overcoming unsafe nature of monolithic kernel. Another way is to write monolithic kernel in a safe language. One problem with such approach is that safe environment should be either provided by hardware (and will be very limited) or should be implemented in software using unsafe languages. Implementation of such environment will be extremely complex and will most likely have many bugs (think of all bugs found in JVM).
Example of this would be experimental OS Singularity.
Well, considering that I may have a quiz on this tomorrow, I should be able to help you out. However, I am still learning, and while my post may have some technical mistakes, it should be conceptually sound
Basically, as you may understand, there are different type of kernels for an OS.
Monolithic kernels have all their system functionalities and services together in one single giant program, occupying a single address space.
Microkernels on the other hand, have the bare minimum system programs and services on the microkernel. Most of the services that were previously considered as a part of the kernel (during the monolithic kernel version) such as process scheduler, etc. are now in the user space, and are termed as servers. these servers communicate with each other through the micro-kernel, using the inter-process communication, a form of communication that is laid down by the microkernel.
The modular approach builds on this, by making these "servers" dynamically loadable. Thus, one can have a particular "server" (in this type of kernel, called a module) dynamically loaded, without the kernel requiring to re-compile itself.
Linux Kernel is a monolithic kernel, but most flavours of Linux such as Ubuntu, Solaris, use a hybrid kernel, i.e. a mix of the monolithic and modular kernel approach. This is quite common, has different kernel structure has different pros and cons, and a hybrid structure is required to strike a balance
See this prior StackOverflow question for some information about your question. Briefly, it sounds like you're wondering...
... reading the modular concept which say new module for driver can be added without recompiling the kernel, which is against my assumption that Linux uses monolithic kernel. monolithic kernel runs in single address space and as a single process ...
These two concepts ("modular kernel" and "single address space") are not actually contradictory. You can build a new kernel module without recompiling the entire Linux kernel. When you load this new kernel module, it will actually be loaded into the same address space as the running kernel itself. From the link above...
Do not confuse the term modular kernel to be anything but monolithic. Some monolithic kernels can be compiled to be modular (e.g Linux), what matters is that the module is inserted to and runs from the same space that handles core functionality (kernel space).
As you have found, there are several ways to classify kernels and the different types are not necessarily mutually exclusive.

Linux System Calls & Kernel Mode

I understand that system calls exist to provide access to capabilities that are disallowed in user space, such as accessing a HDD using the read() system call. I also understand that these are abstracted by a user-mode layer in the form of library calls such as fread(), to provide compatibility across hardware.
So from the application developers point of view, we have something like;
//library //syscall //k_driver //device_driver
fread() -> read() -> k_read() -> d_read()
My question is; what is stopping me inlining all the instructions in the fread() and read() functions directly into my program? The instructions are the same, so the CPU should behave in the same way? I have not tried it, but I assume that this does not work for some reason I am missing. Otherwise any application could get arbitrary kernel mode operation.
TL;DR: What allows system calls to 'enter' kernel mode that is not copy-able by an application?
System calls do not enter the kernel themselves. More precisely, for example the read function you call is still, as far as your application is concerned, a library call. What read(2) does internally is calling the actual system call using some interruption or the syscall(2) assembly instruction, depending on the CPU architecture and OS.
This is the only way for userland code to have privileged code to be executed, but it is an indirect way. The userland and kernel code execute in different contexts.
That means you cannot add the kernel source code to your userland code and expect it to do anything useful but crash. In particular, the kernel code has access to physical memory addresses required to interact with the hardware. Userland code is limited to access a virtual memory space that has not this capability. Also, the instructions userland code is allowed to execute is a subset of the ones the CPU support. Several I/O, interruption and virtualization related instructions are examples of prohibited code. They are known as privileged instructions and require to be in an lower ring or supervisor mode depending on the CPU architecture.
You could inline them. You can issue system calls directly through syscall(2), but that soon gets messy. Note that the system call overhead (context switches back and forth, in-kernel checks, ...), not to mention the time the system call itself takes, makes your gain by inlining dissapear in the noise (if there is any gain, more code means cache isn't so useful, and performance suffers). Trust the libc/kernel folks to have studied the matter and done the inlining for you behind your back (in the relevant *.h file) if it really is a measurable gain.

What register state is saved on a context switch in Linux?

Where in Linux would you look to find out what registers are saved on a context switch? I'm wondering, for example, if it is safe to use FP or vector registers in kernel-mode driver code (mostly interested in x86-64 and ARM, but I'm hoping for an architecture-independent answer).
Since no one seems to have answered this, let me venture.
Take a look at the _math_restore_cpu and __unlazy_fpu methods.
You can find them here:
http://www.cs.fsu.edu/~baker/devices/lxr/http/ident?i=math_state_restore
http://www.cs.fsu.edu/~baker/devices/lxr/http/ident?i=__unlazy_fpu
The x86 like processors have separate instructions for saving (fnsave) and restore (frstor) FPU state and so it looks like the OS is burdened with saving/restoring them.
I presume unless the FPU unit has been used by the usermode process, linux context switch will not save it for you.
So you need to do it yourself (in your driver) to be sure. You can use kernel_fpu_begin/end to do it in your driver, but is generally not a good idea.
http://www.cs.fsu.edu/~baker/devices/lxr/http/ident?i=kernel_fpu_begin
http://www.cs.fsu.edu/~baker/devices/lxr/http/ident?i=kernel_fpu_end
Why it is not a good idea? From Linus himself: http://lkml.indiana.edu/hypermail/linux/kernel/0405.3/1620.html
Quoted:
You can do it "safely" on x86 using
kernel_fpu_begin(); ...
kernel_fpu_end();
and make sure that all the FP stuff
is in between those two things, and
that you don't do anything that
might fault or sleep.
The kernel_fpu_xxx() macros make sure
that preemption is turned off etc, so
the above should always be safe.
Even then, of course, using FP in the
kernel assumes that you actually
have an FPU, of course. The in-kernel FP emulation package is
not supposed to work with kernel FP instructions.
Oh, and since the kernel doesn't link
with libc, you can't use anything
even remotely fancy. It all has to be
stuff that gcc can do in-line,
without any function calls.
In other words: the rule is that you
really shouldn't use FP in the
kernel. There are ways to do it, but
they tend to be for some real
special cases, notably for doing
MMX/XMM work. Ie the only "proper" FPU
user is actually the RAID checksumming
MMX stuff.
Linus
In any case, do you really want to rely on Intel's floating point unit? http://en.wikipedia.org/wiki/Pentium_FDIV_bug (just kidding :-)).

Resources