Can a system call cause a system panic in linux? - linux

Please don't consider syscalls due to calls to panic() etc., which are actually supposed to panic the system. I am more interested in general purpose system calls such as Socket, read, write etc. If such syscalls do cause a panic, then is this a kernel bug? My understanding is that it should be a kernel bug. If passed with wrong arguments then system-call should just abort not panic the complete system.

Strangely enough, this is not 100% correct.
Yes, input to system calls by a non privileged user should not cause a panic unless there is a bug in the kernel or a hardware malfunction (such as broken RAM chips).
However, this not true for a privileged user (such as root). Consider the write(2) system call, when applied to /dev/mem by a privileged user (root being the obvious example) - there is nothing stopping you from overwriting kernel memory with it.
Unix is like that - it gives you the full length of the rope to hang yourself easily, if this is what you wish to do :-)

Of course, the kernel must check the syscall parameters, user permissions, resources availability and handle issues like concurrency in order to avoid crash at all costs. The bottomline is that a simple user (even root, ideally - but as mentionned by gby this is difficult since root can have direct access to the physical address space) should never be able to crash the system, no matter how hard she tries.

Related

How can a program change a directory without using chdir()?

I can find a lot of documentation on using chdir() to change a directory in a program (a command shell, for instance). I was wondering if it is possible to somehow do the same thing without the use of chdir(). Yet, I can't find any documentation or examples of code where a person is changing directories without using chdir() to some capacity. Is this possible?
In Linux, chdir() is a syscall. That means it's not something a program does in its own memory, but it's a request for the OS kernel to do something on the program's behalf.
Granted, it's one of two syscalls that can change directories -- the other one is fchdir(). Theoretically you could use the other one, though whether that's what your professor actually wants is very much open to interpretation.
In terms of why chdir() and fchdir() can't be reimplemented by an application but need to be leveraged: The current working directory is among the process state maintained by the kernel on a program's behalf; the program itself can't access kernel memory without asking the kernel to operate on its behalf.
Things are syscalls because they need to be syscalls -- if something could be done in-process, it would be done that way (crossing the boundary between userspace and kernelspace involves a context-switch penalty; it's not without performance impact). In this case, letting the kernel do accurate bookkeeping as to what a process's working directory is ensures that the working directory is maintained when a new executable is loaded (with execve()), and helps to ensure the integrity of the kernel's records (making sure a program can't pretend to have its current working directory be a directory it doesn't actually have access to).

"Switching from user mode to kernel mode" is an incorrect concept

Im studying for the first time "Operating System". In my book i found this sentence about "User Mode" and "Kernel Mode":
"Switch from user to kernel mode" instruction is executed only in kernel
mode
I think that is a incorrect sentence as in practice there is no "switch of kernel". In fact, when a user process need to do a privileged instruction it simply ask the kernel to do something for itself. Is it correct ?
In fact, when a user process need to do a privileged instruction it simply ask the kernel to do something for itself.
But how does that happen? Details are processor (i.e. instruction set architecture) and OS specific (explained in ABI specifications relevant to your system, e.g. here), but that usually involves some machine code instruction like SYSENTER or SYSCALL (or SVC on mainframes) capable of atomically changing the CPU mode (that is switching it in a controlled manner to kernel mode). The actual parameters of the system call (including even the syscall number) are often passed in registers (but details are ABI specific).
So I feel the concept of switching from user-mode to kernel-mode is relevant, and meaningful (so "correct").
BTW, user-mode code is forbidden (by the hardware) to execute privileged machine instructions, such as those interacting with IO hardware devices (read about protection rings). If you try, you get some hardware exception (a bit similar to interrupts). Hence your code (even if it is malicious) has to make system calls, which the kernel controls (it has lots of code related to permission checking), for e.g. all IO.
Read also Operating Systems: Three Easy Pieces - freely downloadable. See also http://osdev.org/. Read system call wikipage & syscalls(2), and the Assembler HowTo.
In real life, things are much more complex. Read about System Management Mode and about the (scary) Intel Management Engine.

Why Kernel Can't Handle Crash Gracefully

For user mode application, incorrect page access doesn't create a lot of trouble other than the application crash and the application crash can be done gracefully by exception handling. Why can't we do the same for kernel crash. So when a kernel module tries to access some invalid address, there is a page fault and the kernel crash. Why it can't be handled gracefully like unloading the faulty module.
More specifically I'm interested to know if it is completely impossible or possible. I am not inclined to know the difficulties it might pose in using the system. I understand a driver crash will result unusable device and I'm okay with that. The only thing is whether it is possible to gracefully unload a faulty driver.
As other answer explains very well why it's not feasible to recover from the kernel crashes, I'll try to tell something else.
There is a lot of research in this area, most notably from prof. Andy Tanenbaum with his MINIX. While the kernel crash is still fatal for the MINIX, MINIX kernel is very simple (micro-kernel) narrowing the space for bugs and inside it most other stuff (including drivers) is running as a user-mode process. So, in case of network driver failure, as they are running in the separate address space, all kernel needs to do is to attempt to restart the driver.
Of course, there are areas where you can't recover (or still can't recover), like in case of the file system crash (see the recent discussion here).
There are several good papers on this topic such as http://pages.cs.wisc.edu/~swami/papers/thesis.pdf and I would highly recommend watching Tanenbaum's videos such this one (title is "MINIX 3: A Reliable and Secure Operating System" in case it ever goes offline).
I think this addresses your comment:
We should be able to unload the faulty module. Why can't we? That is my question. Is it a design choice for security or its not possible at all. If it is a design choice, what factors forced us to make that choice
You could live without screen if graphics driver module crashes. However, we can't unload the faulty module and continue because if it crashed and it runs in the same address space as kernel, you don't know if it poisoned the kernel memory - security is the prime factor here.
That's kind of like saying "if you wrap all your Java code in a try/catch block, you've eliminated all bugs!"
There are a number of "errors" that are caught, e.g. kalloc returns NULL if it's out of memory, USB code returns errors if there's no USB, etc. But there's no try/catch for the entire operating system, because not all bugs can be repaired.
As pointed out, what happens if your filesystem module crashes? Keep running without files? How about your ethernet driver? Now your box is cut off from the internet and you can't even ssh into it anymore, but doesn't even have the decency to reboot.
So even though it may be possible for the kernel to not "crash" when a module crashes, the state of the kernel could be arbitrarily broken. The kernel could stay alive without a screen, filesystem or internet connection, but what kind of existence is that?
The kernel modules and the kernel itself share the same address space. There is simply no protection if a modules starts to misbehave and overwrite memory from another subsystem.
So, when a driver crashes, it may or may not stay local to that driver. If you are lucky, you still have a somewhat functional kernel and can continue to work.
That doesn't happen with userspace because the address space for each process is separate and so it is possible to catch erroneous memory access and stop the process (this is a SEGFAULT).

Linux System Calls & Kernel Mode

I understand that system calls exist to provide access to capabilities that are disallowed in user space, such as accessing a HDD using the read() system call. I also understand that these are abstracted by a user-mode layer in the form of library calls such as fread(), to provide compatibility across hardware.
So from the application developers point of view, we have something like;
//library //syscall //k_driver //device_driver
fread() -> read() -> k_read() -> d_read()
My question is; what is stopping me inlining all the instructions in the fread() and read() functions directly into my program? The instructions are the same, so the CPU should behave in the same way? I have not tried it, but I assume that this does not work for some reason I am missing. Otherwise any application could get arbitrary kernel mode operation.
TL;DR: What allows system calls to 'enter' kernel mode that is not copy-able by an application?
System calls do not enter the kernel themselves. More precisely, for example the read function you call is still, as far as your application is concerned, a library call. What read(2) does internally is calling the actual system call using some interruption or the syscall(2) assembly instruction, depending on the CPU architecture and OS.
This is the only way for userland code to have privileged code to be executed, but it is an indirect way. The userland and kernel code execute in different contexts.
That means you cannot add the kernel source code to your userland code and expect it to do anything useful but crash. In particular, the kernel code has access to physical memory addresses required to interact with the hardware. Userland code is limited to access a virtual memory space that has not this capability. Also, the instructions userland code is allowed to execute is a subset of the ones the CPU support. Several I/O, interruption and virtualization related instructions are examples of prohibited code. They are known as privileged instructions and require to be in an lower ring or supervisor mode depending on the CPU architecture.
You could inline them. You can issue system calls directly through syscall(2), but that soon gets messy. Note that the system call overhead (context switches back and forth, in-kernel checks, ...), not to mention the time the system call itself takes, makes your gain by inlining dissapear in the noise (if there is any gain, more code means cache isn't so useful, and performance suffers). Trust the libc/kernel folks to have studied the matter and done the inlining for you behind your back (in the relevant *.h file) if it really is a measurable gain.

Can protection mode be turned off with inline assembly?

If a user didn't have root privileges, could that user still write a user space program with inline assembly to turn off protection mode on the computer to overwrite memory in other segments assuming the OS is linux?
Not unless the user knows of a security vulnerability to get root permissions. Mechanisms like /dev/mem allow root to read and write all userspace memory, and kernel module loading allows root access to kernel memory and the rest of the system's IO space.
Assuming the system is working as intended, no.
In reality, there are undoubtedly a few holes somewhere that would allow it -- given a code base that size, bugs are inevitable, and a few could probably be exploited to get into ring 0.
That said, I'm only guessing based on statistics -- I can't point to a specific vulnerability.

Resources