Linux System Calls & Kernel Mode - security

I understand that system calls exist to provide access to capabilities that are disallowed in user space, such as accessing a HDD using the read() system call. I also understand that these are abstracted by a user-mode layer in the form of library calls such as fread(), to provide compatibility across hardware.
So from the application developers point of view, we have something like;
//library //syscall //k_driver //device_driver
fread() -> read() -> k_read() -> d_read()
My question is; what is stopping me inlining all the instructions in the fread() and read() functions directly into my program? The instructions are the same, so the CPU should behave in the same way? I have not tried it, but I assume that this does not work for some reason I am missing. Otherwise any application could get arbitrary kernel mode operation.
TL;DR: What allows system calls to 'enter' kernel mode that is not copy-able by an application?

System calls do not enter the kernel themselves. More precisely, for example the read function you call is still, as far as your application is concerned, a library call. What read(2) does internally is calling the actual system call using some interruption or the syscall(2) assembly instruction, depending on the CPU architecture and OS.
This is the only way for userland code to have privileged code to be executed, but it is an indirect way. The userland and kernel code execute in different contexts.
That means you cannot add the kernel source code to your userland code and expect it to do anything useful but crash. In particular, the kernel code has access to physical memory addresses required to interact with the hardware. Userland code is limited to access a virtual memory space that has not this capability. Also, the instructions userland code is allowed to execute is a subset of the ones the CPU support. Several I/O, interruption and virtualization related instructions are examples of prohibited code. They are known as privileged instructions and require to be in an lower ring or supervisor mode depending on the CPU architecture.

You could inline them. You can issue system calls directly through syscall(2), but that soon gets messy. Note that the system call overhead (context switches back and forth, in-kernel checks, ...), not to mention the time the system call itself takes, makes your gain by inlining dissapear in the noise (if there is any gain, more code means cache isn't so useful, and performance suffers). Trust the libc/kernel folks to have studied the matter and done the inlining for you behind your back (in the relevant *.h file) if it really is a measurable gain.

Related

Why are not all library functions not system calls?

So, from my basic OS class, I understood that kernel is the one who interacts with the hardware. So, if we want to interact with hardware, we need to call system calls. open() is a system call, while strlen() is not a system call. But any instruction or command has to interact with hardware, at least to increase program counter or modify the contents of memory. So, shouldn't all functions make a system call at some point ?
I would strongly suggest reading early papers on UNIX, the how and the why. Ken Thompson was a strong advocate for the kernel consisting only of the things that could not be implemented outside of the kernel.
At that time; outside of the kernel corresponded to outside of the privileged mode of the computer. This is a less interesting concept in modern systems; yet continues to drive architecture and design.
In short; open() is exported by the kernel because it has to be - it access data structures that are private to the kernel, thus is an interface; strlen is not exported by the kernel because it doesn't have to be, it neither requires privilege nor access to other data structures.
Doesn't have to be is a trump card; because nobody wants needless functionality in the kernel.
kernel is the one who interacts with the hardware.
That is a very inaccurate statement.
Even the following program, which may run on some microcontroller with no OS (and no optimizing compiler...), interacts with the hardware:
int array[8192];
void entry_point(void) {
array[100] = 5+3;
}
That hardware in question being "CPU", "memory bus", and "memory".
While system calls are primarily used to access certain hardware (disk, network etc.), system calls are not defined as "calls to access hardware", but rather as just "calls to kernel APIs".
A kernel can export whatever API it wants, including strlen(), but for an OS designer, such as the aforementioned Ken Thompson, the APIs that the kernel should export are ones that facilitate the existence of multiple programs, processes, and/or users.
The main concern here is access to resources such as disk, network, timers, memory, etc., but also include e.g.:
Scheduling / process management APIs (e.g. the fork(), exit(), nice(), and sched_yield() calls)
Multiprocess management (e.g. sigaction(), kill(), wait(), futex() and semop())
Performance & debugging (e.g. ptrace(), prlimit() and getrusage())
Security (e.g. chmod(), chown(), setuid(), seccomp(), and chroot())
Administration (e.g. init_module(), sethostname(), shutdown())

"Switching from user mode to kernel mode" is an incorrect concept

Im studying for the first time "Operating System". In my book i found this sentence about "User Mode" and "Kernel Mode":
"Switch from user to kernel mode" instruction is executed only in kernel
mode
I think that is a incorrect sentence as in practice there is no "switch of kernel". In fact, when a user process need to do a privileged instruction it simply ask the kernel to do something for itself. Is it correct ?
In fact, when a user process need to do a privileged instruction it simply ask the kernel to do something for itself.
But how does that happen? Details are processor (i.e. instruction set architecture) and OS specific (explained in ABI specifications relevant to your system, e.g. here), but that usually involves some machine code instruction like SYSENTER or SYSCALL (or SVC on mainframes) capable of atomically changing the CPU mode (that is switching it in a controlled manner to kernel mode). The actual parameters of the system call (including even the syscall number) are often passed in registers (but details are ABI specific).
So I feel the concept of switching from user-mode to kernel-mode is relevant, and meaningful (so "correct").
BTW, user-mode code is forbidden (by the hardware) to execute privileged machine instructions, such as those interacting with IO hardware devices (read about protection rings). If you try, you get some hardware exception (a bit similar to interrupts). Hence your code (even if it is malicious) has to make system calls, which the kernel controls (it has lots of code related to permission checking), for e.g. all IO.
Read also Operating Systems: Three Easy Pieces - freely downloadable. See also http://osdev.org/. Read system call wikipage & syscalls(2), and the Assembler HowTo.
In real life, things are much more complex. Read about System Management Mode and about the (scary) Intel Management Engine.

How does instructions like I/O work in user mode?

I am curious because I am reading this OS book which mentions
"User programs always run in user mode, which permits only a subset of the instructions [...]. Generally, all instructions involving I/O and memory protection are disallowed in user mode.
To obtain services from the OS, a user program must make a system call, which traps into the kernel and invokes the OS."
If I/O is not allowed generally in user mode, but let's say I have a program in C++ or Java which asks for input, or let's say something else like a search bar in any program. Whenever I select the search bar (meaning I will write something) then a TRAP instruction is called to invoke the OS (since the OS runs in kernel) to be able to have access to I/O, that is, the keyboard? I am not sure if I follow correctly or what am I getting wrong.
The I/O is not allowed in user mode, but you use Input for applications in the OS, or even with the OS itself there are keyboard commands. If you can use keyboard commands that means the OS is ready for I/O at any time. Then the original statement about I/O instructions being disallowed in user mode.
I am sorry for my ignorance but I am just a little bit confused with these terms and difference between user and kernel. I know the OS runs in kernel mode, and the applications run in the OS, so in the end the applications do have access to I/O.
Don't application need to have the OS deal with there I/O for them?
Meaning, only the OS has the authority to do those type of things...
Am I wrong?
Toss your book in the garbage or use it to line a cat box.
Your apparent paradox is that you think you have to be in kernel mode to do I/O but your book says:
"User programs always run in user mode, which permits only a subset of the instructions"
The resolution to your paradox is that your book is spouting nonsense.
Use programs do not always run in user mode. They frequently run in kernel mode. One of the basic functions of an operating system is to provide a set of kernel mode system services that provide controlled access to kernel mode.
In other words, your instincts here are better than your confusing book's text.
It is important to understand what "user-mode" and "kernel-mode" mean. A process is mapped to memory regions which are user-priviliged, depending on the memory layout. Kernel-mode is basically routines that are in supervisor priviliged memory regions, which are invoked by your program to do the desired work (I/O).

Quick questions about Linux kernel modules

I'm very familiar with Linux (I've been using it for 2 years, no Windows for 1 1/2 years), and I'm finally digging deeper into kernel programming and I'm working a project. So my questions are:
Will a kernel module run faster than a traditional c program.
How can I communicate with a module (is that even possible), for example call a function in it.
1.Will a kernel module run faster than a traditional c program.
It Depends™
Running as a kernel module means you get to play by different rules, you potentially get to avoid some context switches depending on what you are doing. You get access to some powerful tools that can be leveraged to optimize your code, but don't expect your code to run magically faster just by throwing everything in kernelspace.
2.How can I communicate with a module (is that even possible), for example call a function in it.
There are various ways:
You can use the various file system interfaces: procfs, sysfs, debugfs, sysctl, ...
You could register a char device
You can make use of the Netlink interface
You could create new syscalls, although that's heavily discouraged
And you can always come up with your own scheme, or use some less common APIs
Will a kernel module run faster than a traditional c program.
The kernel is already a C program, which is most likely be compiled with same compiler you use. So generic algorithms or some processor intensive computations will be executed with almost same speed.
But most userspace programs (like bash) have to ask kernel to perform some operations on system resources, i.e. print prompt onto monitor. It will require entering the kernel with system call, sending data over tty interfaces and passing to a video-driver, it may introduce some latency. If you'd implemented bash in-kernel, you may directly call video-driver, which is definitely faster.
That approach however, have drawbacks. First of all, bash should be able to print prompt on ssh-session or serial console, and that will complicate logic. Also, if your bash will hang, you cannot just kill, you have to reboot system.
How can I communicate with a module (is that even possible), for example call a function in it.
In addition to excellent list provided by #tux3, I would suggest to start with char devices.

Can a system call cause a system panic in linux?

Please don't consider syscalls due to calls to panic() etc., which are actually supposed to panic the system. I am more interested in general purpose system calls such as Socket, read, write etc. If such syscalls do cause a panic, then is this a kernel bug? My understanding is that it should be a kernel bug. If passed with wrong arguments then system-call should just abort not panic the complete system.
Strangely enough, this is not 100% correct.
Yes, input to system calls by a non privileged user should not cause a panic unless there is a bug in the kernel or a hardware malfunction (such as broken RAM chips).
However, this not true for a privileged user (such as root). Consider the write(2) system call, when applied to /dev/mem by a privileged user (root being the obvious example) - there is nothing stopping you from overwriting kernel memory with it.
Unix is like that - it gives you the full length of the rope to hang yourself easily, if this is what you wish to do :-)
Of course, the kernel must check the syscall parameters, user permissions, resources availability and handle issues like concurrency in order to avoid crash at all costs. The bottomline is that a simple user (even root, ideally - but as mentionned by gby this is difficult since root can have direct access to the physical address space) should never be able to crash the system, no matter how hard she tries.

Resources