Why can't we use C standard library functions in kernel development? - linux

I just got started with learning kernel development and had a small doubt. Why can't we use c functions in kernel development after linking it with the c library? Why is it that the kernel is never linked with a c library but has its own implementation of some standard c functions like printk() instead of printf(). IF the kernel is written in c and compiled with the help of a c compiler then why can't we use the standard function from the c library?

Because the GNU C Library which you are familiar with is implemented for user mode, not kernel mode. The kernel cannot access a userspace API (which might invoke a syscall to the Linux kernel).
From the KernelNewbies FAQ
Can I use library functions in the kernel ?
System libraries (such as glibc, libreadline, libproplist, whatever) that are typically available to userspace programmers are unavailable to kernel programmers. When a process is being loaded the loader will automatically load any dependent libraries into the address space of the process. None of this mechanism is available to kernel programmers: forget about ISO C libraries, the only things available is what is already implemented (and exported) in the kernel and what you can implement yourself.
Note that it is possible to "convert" libraries to work in the kernel; however, they won't fit well, the process is tedious and error-prone, and there might be significant problems with stack handling (the kernel is limited to a small amount of stack space, while userspace programs don't have this limitation) causing random memory corruption.
Many of the commonly requested functions have already been implemented in the kernel, sometimes in "lightweight" versions that aren't as featureful as their userland counterparts. Be sure to grep the headers for any functions you might be able to use before writing your own version from scratch. Some of the most commonly used ones are in include/linux/string.h.
Whenever you feel you need a library function, you should consider your design, and ask yourself if you could move some or all the code into user-space instead.
If you need to use functions from standard library, you have to re-implement that functionality because of a simple reason - there is no standard C library.
C library is basically implemented on the top of the Linux kernel (or other operating system's kernel).
For instance, C library's mkdir(3) function is basically nothing more than a wrapper for Linux kernel's system call mkdir(2).
http://linux.die.net/man/3/mkdir
http://linux.die.net/man/2/mkdir

Related

ioctl vs kernel modules in Linux

I know that kernel modules are used to write device drivers. You can add new system calls to the Linux kernel and use it to communicate with other devices.
I also read that ioctl is a system call used in linux to implement system calls which are not available in the kernel by default.
My question is, why wouldn't you just write a new kernel module for your device instead of using ioctl? why would ioctl b useful where kernel modules exist?
You will need to write a kernel driver in either case, but you can choose between adding a new syscall and adding a ioctl.
Let's say you want to add a feature to get the tuner settings for a video capturing device.
If you implement it as a syscall:
You can't just load a module, you need to change the kernel itself
Hundreds of drivers could each add dozens of syscalls each, kludging up the table with thousands of global functions that must be kept forever.
For the driver to have any reach, you will need to convince kernel maintainers that this burden is worthwhile.
You will need to upstream the definition into glibc, and people must upgrade before they can write programs for it
If you implement it as an ioctl:
You can build your module for an existing kernel and let users load it, without having to get kernel maintainers involved
All functions are simple per-driver constants in the applicable header file, where they can easily be added or removed
Everyone can start programming with it just by including the header
Since an ioctl is much easier, more flexible, and exactly meant for all these driver specific function calls, this is generally the preferred method.
I also read that ioctl is a system call used in linux to implement system calls which are not available in the kernel by default.
This is incorrect.
System calls are (for Linux) listed in syscalls(2) (there are hundreds of them between user space and kernel land) and ioctl(2) is one of them. Read also wikipage on ioctl and on Unix philosophy and Linux Assembler HowTo
In practice, ioctl is mostly used on device files, and used for things which are not a read(2) or a write(2) of bytes.
For example, a sound is made by writing bytes to /dev/audio, but to change the volume you'll use some ioctl. See also fcntl(2) playing a similar role.
Input/output could also happen (somehow indirectly ...) thru mmap(2) and related virtual address space system calls.
For much more, read Advanced Linux Programming and Operating Systems: Three Easy Pieces. Look into Osdev for more hints about coding your own OS.
A kernel module could implement new devices, or new ioctl, etc... See kernelnewbies for more. I tend to believe it might sometimes add a few new syscalls (but this was false in older linux kernels like 3.x ones)
Linux is mostly open source. Please download then look inside source code. See also Linux From Scratch.
IIRC, Linux kernel 1.0 did not have any kernel modules. But that was around 1995.

modular kernel vs micro kernel / monolitic kernel

I am C programmer and new to the Linux kernel programming. I could find there are 3 type of kernel monolithic,micro and modular kernel.while googling i could find some website say linux is having monolithic kernel (in Stack overflow) and some other says micro kernel and the rest say hybrid kernel. So i am totally confused while reading the modular concept which say new module for driver can be added without recompiling the kernel, which is against my assumption that Linux uses monolithic kernel. monolithic kernel runs in single address space and as a single processes this is also bit confusing if so
Before you try to understand those differences, you have to understand other concepts first:
1. Modular programming.
Module is a functionally complete part of a program. Module usually has following properties:
Separation of interface and implementation.
Initialization and deinitialization routines. Both are optional. Deinitialization routine is likely to be missing in environment with GC (Garbage Collector).
Modules used by a program compose directed acyclic graph a.k.a. dependency graph (you might have heard about this - cyclic dependencies are not allowed, dependency is initialized before dependent module).
Modular programming is essential when building large systems. Every big kernel is a modular kernel, regardless of whether it is monolithic, hybrid or microkernel.
Sometimes modules can be loaded and unloaded dynamically. Dynamic modules are essential part of any extensible system. Those can be plugins or, if we talk about kernels, drivers that are developed and distributed separately from the kernel.
2. Safe and unsafe languages.
Safe languages very strictly define what can happen in a program. Most importantly they have no concept of malformed program (or meaningless program). Every program is valid and its execution always follows language specification. Whether program does what programmer expects it to do or not, is irrelevant in this context.
Common traits of safe languages:
They use garbage collection.
They have no pointer arithmetics. That means that writing or reading to an arbitrary address is not allowed.
They prevent out of range array access (if there is such concept). Exceptions or similar mechanisms can be used to signal and recover from such failures.
References (or pointers) have only two possible states: null reference and reference to a valid object. This is guaranteed by garbage collector. In fact, GC is the key component here. Some languages go even further and do no allow null references at all.
Every object (memory chunk in use) has type information assigned to it, object can only be accessed through a reference of appropriate type e.g. you cannot access array of integers through reference to a string.
You can add more entries to this list, but basic idea is to guarantee that program can only access valid memory regions using valid operations. Keep in mind that some unsafe languages can share some or even all of those traits.
Examples of safe languages: Python, Java, safe subset of C#.
Unsafe languages define what can and cannot be done in a program, but there is usually little to nothing to stop programmer from doing the wrong thing. Program that violates those rules is called a malformed program. From language point of view such program is meaningless and language does not even try to define its behavior, as it is usually near to impossible to do. In terms of C such program's behavior is undefined.
Examples of unsafe languages: Assembler, C, C++, Pascal.
3. Hardware is unsafe and thus has to be programmed using unsafe language.
Most hardware does nothing to provide you with a safe environment. There were some processors that used to attach type information to every memory cell (see tagged architecture), but modern ones do not do this as it complicates hardware, making it slower, more expensive and less generic.
Still, some features are provided to make it possible to implement safe environments within unsafe environment of hardware, such as memory protection, separate address spaces and separation of execution modes on user mode and kernel mode (a.k.a. supervisor mode).
Kernel is what runs on bare metal and thus much of it has to be written in unsafe languages like C and Assembly. Another reason is performance - safe environments imply huge overhead.
Microkernel and Monolithic kernel
Monolithic kernel and its modules run in a single shared address space. And since everything is usually written in an unsafe language, it is possible for any part of the kernel to access (and damage) memory that belongs to another part of the kernel due to bugs in code. Unsafe nature of this environment makes it impossible to detect or recover from those failures and most importantly predict kernel behavior after such failures.
Microkernel is an attempt to overcome those limitation by moving various parts of the kernel to a separate address spaces, effectively isolating them from each other, but providing safe way to communicate with each other (usually through message passing). Such separation creates safe environment composed from multiple unsafe processes, allowing kernel to recover from failure of some of its subsystems.
At the same time monolithic kernel can be able to run parts of it in a separate address space (FUSE), while nothing stops microkernel from being able to support modules that share address space with the main part of the kernel.
If most of the kernel runs in a single address space, it is considered to be a monolithic kernel. If most of it runs in separate address spaces, such kernel is considered to be a microkernel. If kernel is somewhere inbetween and actively uses both approaches, you have a hybrid kernel.
Hybrid kernel
Concept of hybrid kernel implies combining best of both worlds and was invented by Microsoft to boost sales of Windows NT in 90s. Joke! But this is almost true. Every important part of Windows NT runs in a shared address space, so it is another example of monolithic design.
Many people seem to use this term when describing monolithic kernel that is able to dynamically load modules. This is because in the past monolithic kernels didn't support dynamic module loading and had to be recompiled every time module is added to the kernel. Microkernels are not about dynamic module loading, but about reliability of the kernel, about its ability to recover from failues of its subsystems.
The answer: Linux is a monolithic kernel.
Monolithic kernel can be modular and can dynamically load modules. Microkernel, on the other hand, has to be modular and has to be able to dynamically load modules - the whole idea is about running them in a separate address space.
Microkernel is not the only way of overcoming unsafe nature of monolithic kernel. Another way is to write monolithic kernel in a safe language. One problem with such approach is that safe environment should be either provided by hardware (and will be very limited) or should be implemented in software using unsafe languages. Implementation of such environment will be extremely complex and will most likely have many bugs (think of all bugs found in JVM).
Example of this would be experimental OS Singularity.
Well, considering that I may have a quiz on this tomorrow, I should be able to help you out. However, I am still learning, and while my post may have some technical mistakes, it should be conceptually sound
Basically, as you may understand, there are different type of kernels for an OS.
Monolithic kernels have all their system functionalities and services together in one single giant program, occupying a single address space.
Microkernels on the other hand, have the bare minimum system programs and services on the microkernel. Most of the services that were previously considered as a part of the kernel (during the monolithic kernel version) such as process scheduler, etc. are now in the user space, and are termed as servers. these servers communicate with each other through the micro-kernel, using the inter-process communication, a form of communication that is laid down by the microkernel.
The modular approach builds on this, by making these "servers" dynamically loadable. Thus, one can have a particular "server" (in this type of kernel, called a module) dynamically loaded, without the kernel requiring to re-compile itself.
Linux Kernel is a monolithic kernel, but most flavours of Linux such as Ubuntu, Solaris, use a hybrid kernel, i.e. a mix of the monolithic and modular kernel approach. This is quite common, has different kernel structure has different pros and cons, and a hybrid structure is required to strike a balance
See this prior StackOverflow question for some information about your question. Briefly, it sounds like you're wondering...
... reading the modular concept which say new module for driver can be added without recompiling the kernel, which is against my assumption that Linux uses monolithic kernel. monolithic kernel runs in single address space and as a single process ...
These two concepts ("modular kernel" and "single address space") are not actually contradictory. You can build a new kernel module without recompiling the entire Linux kernel. When you load this new kernel module, it will actually be loaded into the same address space as the running kernel itself. From the link above...
Do not confuse the term modular kernel to be anything but monolithic. Some monolithic kernels can be compiled to be modular (e.g Linux), what matters is that the module is inserted to and runs from the same space that handles core functionality (kernel space).
As you have found, there are several ways to classify kernels and the different types are not necessarily mutually exclusive.

Linux kernel assembly and logic

My question is somewhat weird but I will do my best to explain.
Looking at the languages the linux kernel has, I got C and assembly even though I read a text that said [quote] Second iteration of Unix is written completely in C [/quote]
I thought that was misleading but when I said that kernel has assembly code I got 2 questions of the start
What assembly files are in the kernel and what's their use?
Assembly is architecture dependant so how can linux be installed on more than one CPU architecture
And if linux kernel is truly written completely in C than how can it get GCC needed for compiling?
I did a complete find / -name *.s
and just got one assembly file (asm-offset.s) somewhere in the /usr/src/linux-headers-`uname -r/
Somehow I don't think that is helping with the GCC working, so how can linux work without assembly or if it uses assembly where is it and how can it be stable when it depends on the arch.
Thanks in advance
1. Why assembly is used?
Because there are certain things then can be done only in assembly and because assembly results in a faster code. For eg, "you can get access to unusual programming modes of your processor (e.g. 16 bit mode to interface startup, firmware, or legacy code on Intel PCs)".
Read here for more reasons.
2. What assembly file are used?
From: https://www.kernel.org/doc/Documentation/arm/README
"The initial entry into the kernel is via head.S, which uses machine
independent code. The machine is selected by the value of 'r1' on
entry, which must be kept unique."
From https://www.ibm.com/developerworks/library/l-linuxboot/
"When the bzImage (for an i386 image) is invoked, you begin at ./arch/i386/boot/head.S in the start assembly routine (see Figure 3 for the major flow). This routine does some basic hardware setup and invokes the startup_32 routine in ./arch/i386/boot/compressed/head.S. This routine sets up a basic environment (stack, etc.) and clears the Block Started by Symbol (BSS). The kernel is then decompressed through a call to a C function called decompress_kernel (located in ./arch/i386/boot/compressed/misc.c). When the kernel is decompressed into memory, it is called. This is yet another startup_32 function, but this function is in ./arch/i386/kernel/head.S."
Apart from these assembly files, lot of linux kernel code has usage of inline assembly.
3. Architecture dependence?
And you are right about it being architecture dependent, that's why the linux kernel code is ported to different architecture.
Linux porting guide
List of supported arch
Things written mainly in assembly in Linux:
Boot code: boots up the machine and sets it up in a state in which it can start executing C code (e.g: on some processors you may need to manually initialize caches and TLBs, on x86 you have to switch to protected mode, ...)
Interrupts/Exceptions/Traps entry points/returns: there you need to do very processor-specific things, e.g: saving registers and reenabling interrupts, and eventually restoring registers and properly returning to user mode. Some exceptions may be handled entirely in assembly.
Instruction emulation: some CPU models may not support certain instructions, may not support unaligned data access, or may not have an FPU. An option is using emulation when getting the corresponding exception.
VDSO: the VDSO is a virtual library that the kernel maps into userspace. It allows e.g: selecting the optimal syscall sequence for the current CPU (on x86 use sysenter/syscall instead of int 0x80 if available), and implementing certain system calls without requiring a context switch (e.g: gettimeofday()).
Atomic operations and locks: Maybe in a future some of these could be written using C11 support for atomic operations.
Copying memory from/to user mode: Besides using an optimized copy, these check for out-of-bounds access.
Optimized routines: the kernel has optimized version of some routines, e.g: crypto routines, memset, clear_page, csum_copy (checksum and copy to another place IP data in one pass), ...
Support for suspend/resume and other ACPI/EFI/firmware thingies
BPF JIT: newer kernels include a JIT compiler for BPF expressions (used for example by tcpdump, secmode mode 2, ...)
...
To support different architectures, Linux has assembly code (re-)written for each architecture it supports (and sometimes, there are several implementations of some code for different platforms using the same CPU architecture). Just look at all the subdirectories under arch/
Assembly is needed for a couple of reasons.
There are many instructions that are needed for the operation of an operating system that have no C equivalent, at least on most processors. A good example on Intel x86/64 processors is the iret instruciton, which returns from hardware/software interrupts. These interrupts are key to handling hardware events (like a keyboard press) and system calls from programs on older processors.
A computer does not start up in a state that is immediately ready for execution of C code. For an Intel example, when execution gets to the startup routine the processor may not be in 32-bit mode (or 64-bit mode), and the stack required by C also may not be ready. There are some other features present in some processors (like paging) which need to be turned on from assembly as well.
However, most of the Linux kernel is written in C, which interfaces with some platform specific C/assembly code through standardized interfaces. By separating the parts in this way, most of the logic of the Linux kernel can be shared between platforms. The build system simply compiles the platform independent and dependent parts together for specific platforms, which results in different executable kernel files for different platforms (and kernel configurations for that matter).
Assembly code in the kernel is generally used for low-level hardware interaction that can't be done directly from C. They're like a platform- specific foundation that's used by higher-level parts of the kernel that are written in C.
The kernel source tree contains assembly code for a variety of systems. When you compile a kernel for a particular type of system (such as an x86 PC), only the appropriate assembly code for that platform is included in the build process.
Linux is not the second version of Unix (or Unix in general). It is Unix compatible, but Unix and Linux have separate histories and, in terms of code base (of their kernels), are completely separate. Linus Torvald's idea was to write an open source Unix.
Some of the lower level things like some of the architecture dependent parts of memory management are done in assembly. The old (but still available) Linux kernel API for x86, int 0x80, is implemented in assembly. There are probably other places in the kernel that are implemented in assembly, but I don't know any others.
When you compile the kernel, you select an architecture to target. Depending on the target, the right assembly files for that architecture are included in the build.
The reason you don't find anything is because you're searching the headers, not the sources. Download a tar ball from kernel.org and search that.

Allocating a data page in linux with NX bit turned off

I would like to generate some machine code in my program and then run it. One way to do it would be to write out a .so file and then load it in the program but that seems too expensive.
IS there a way in linux for me to write out the code in my data pages and then set my function ointer there and just call it? I've seen something similar on windows where you can allocate a page with the NX protection turned off for that page, but I can't find a similar OS call for linux.
The mmap(2) (with munmap(2)) and mprotect(2) syscalls are the elementary operations to do that. Recall that syscalls are elementary operations from the point of view of an application. You want PROT_EXEC
You could just strace any dynamically linked executable to get a clue about how you might call them, since the dynamic linker ld.so is using them.
Generating a shared object might be less expensive than you imagine. Actually, generating C code, running the compiler, then dlopen-ing the resulting shared object has some sense, even when you work interactively. My MELT domain specific language (to extend GCC) is doing this. Recall that you can do a big lot of dlopen-s without issues.
If you want to generate machine code in memory, you could use GNU lightning (quick generation of slow machine code), libjit from dotgnu (generate less bad machine code), LuaJit, asmjit (x86 or amd64 specific), LLVM (slowly generate optimized machine code). BTW, the SBCL Common Lisp implementation is dynamically compiling to memory and produces good machine code at runtime (and there is also all the JIT for JVMs doing that).

Learning x86 assembly on Mac/BSD: Kernel built-in functions? How to know arguments / order?

I have been playing around with yasm in an attempt to grasp a basic understanding of x86 assembly. From my tests, it seems you call functions from the kernel by setting the EAX register with the number of the function you want. Then, you push the function arguments onto the stack and issue a syscall (0x80) to execute the instruction. This is Mac OS X / BSD style, I know Linux uses registers to hold arguments instead of using the stack. Does this sound right? Is this the basic idea?
I am a little confused because where are the functions documented? How would I know what arguments, and in what order, to push them onto the stack? Should I look in syscall.h for the answers? It seems there would be a specific reference for supported kernel calls other than C headers.
Also, do standard C functions like printf() rely on the kernel's built-in functions for say, writing to stdout? In other words, does the C compiler know what the kernel functions are and is it trying to "figure out" how to take C code and translate it to kernel functions (which the assembler then translates to machine code)?
C code -> C compiler -> kernel calls / asm -> assembler -> machine binary
I'm sure these are really basic questions, but my understanding of everything that happens after the C compiler is rather muddy.
System Call Documentation
Make sure you have the XCode Developer Tools installed for the UNIX manpages for Mac OS X and then run man 2 intro on the commandline. For a list of system calls, you can use syscall.h (which is useful for the system call numbers) or you can run man 2 syscalls. Then to look up each specific system call, you can run man 2 syscall_name i.e. for read, you can run man 2 read.
UNIX manpages are a historically significant documentation reference for UNIX systems. Pretty much any low-level POSIX function or system call will be documented using them, as well as most commands. Section 2 covers just system calls, and so when you run man 2 pagename, you're asking for the manpage in the system calls section. Section 3 also deals with library functions, so you can run man 3 sprintf the next time you want to read about sprintf.
How C Libraries relate to System Calls
As for how C libraries implement their functionality, usually they build everything on top of system calls, especially in UNIX-like operating systems. malloc internally uses mmap() or brk() on a lot of platforms to get a hold of the actual memory for your process and I/O functions will often use buffers with read, write calls. If there's some other mechanism or library providing the needed functionality, they may also choose to use those instead (i.e. some C libraries for DOS may make use of direct BIOS interrupts instead of calling only DOS interrupts, whereas C libraries for Windows might use Win32 API calls).
Often only a subset of the library functions will need system calls or underlying mechanisms to be implemented though, since the remainder can be written in terms of that subset.
To actually know what's going on with your specific implementation, you should investigate what's happening in a debugger (just keep stepping into all the function calls) or browse the source code of the C library you're using.
How your C code using C libraries relates to machine code
In your question you also suggested:
C code -> C compiler -> kernel calls / asm -> assembler -> machine binary
This is combining two very different concepts. Functions and function calls are supported at the machine code and assembly level, so your C code has a very direct mapping to machine code:
C code -> C compiler -> Assembler -> Linker -> Machine Binary
That is, the compiler translates your function calls in C to function calls in Assembly and system calls in C to system calls in Assembly.
However on most platforms, that machine code contains references to shared libraries and functions in those libraries, so your machine code might have a function that calls other functions from a shared library. The OS then loads that shared library's machine code (if it hasn't been loaded yet for something else) and then runs the machine code for the library function. Then if that library function calls system calls via interrupts, the kernel receives the system call request and does low-level operations directly with the hardware or the BIOS.
So in a protected mode OS, your machine code can be seen as doing the following:
<----------+
|
Function call to -> Other function calls --+
or -> System calls to -> Direct hardware access (inside kernel)
or -> BIOS calls (inside kernel)
You can, of course, call system calls directly in your program as well, skipping the need for any libraries, but unless you're writing your own library, there's usually very little need to do this. If you want even lower-level access, you have to write kernel-level code such as drivers or kernel subsystems.
The recommended way is not doing INT 0x80 by yourself, but to use the wrapper functions from the stdlib. These are, of course, available for assembly as well.
Concerning printf, this works this way:
printf internally calls fprintf(stdout, ...), which in turn uses the FILE * stdout to write to the file descriptor 1 and does write(1, ...). This calls a small wrapper function to set the proper registers to the arguments and perform the kernel call.

Resources