Protecting privileged instructions from use in user mode on Linux


I read in a tutorial that about 15 instructions in the x86 architecture are not allowed in user mode.
I know there is something called the code segment register, which keeps track of the current privilege level.
My questions are:
a) Does the CPU have to check the current privilege level before executing every instruction?
b) What actually happens if a user program contains an instruction that cannot be used in user mode? How does the CPU know about this before executing it?

The CPU does check CPL, RPL and the like before executing certain instructions or certain parts of them (a number of instructions have very complex logic, and the set of checks to perform depends on a number of conditions).
If an instruction is not allowed to execute, the CPU generates an exception, which is then dispatched to its handler. Exception handlers are similar in nature to interrupt handlers and are defined by the OS. When the OS has to handle an exception that it cannot correct, it terminates the program that caused the exception.
An example of a "correctable" exception is a page fault on virtual memory that has been swapped out to disk. The OS loads the code or data that the application is trying to use back from disk into memory, and the program continues.

The Current Privilege Level (CPL) is stored in the low two bits of the CS register on x86.
Certain instructions are restricted in user code. For example, users can be prevented from issuing certain software interrupts. In x86 assembly, int X raises a software interrupt, where X is an index into the IDT (Interrupt Descriptor Table); this is one way a system call can be made. The index selects the descriptor for the handler, and each IDT entry also stores a field called DPL.
These are the steps followed by an int instruction:
• Fetch the n’th descriptor from the IDT, where n is the argument of int.
• Check that CPL in %cs is <= DPL, where DPL is the privilege level in the descriptor.
• If yes, the user code has enough privilege to use this entry; the current execution context (registers etc.) is saved, because we now switch to kernel mode.
• If not, the user code does not have enough privilege, and the CPU raises a general protection fault (vector 13) instead.
That is how the check is done for one category of restricted instruction; I am not aware of how it is done for other instructions.
Accesses to other segments through the GDT (Global Descriptor Table) are checked in a similar way.
PS: This is valid only on x86-based systems.
Please post a link to where you saw the list of reserved instructions.

Related

How can this client-server communication method save two kernel crossings?

In an OS book, when it talks about client-server communication, it says:
Client-server communication is a common pattern in many systems, and so one can ask: how can we improve its performance? One step is to recognize that both the client and the server issue a write immediately followed by a read, to wait for the other side to reply; at the cost of adding a system call, these can be combined to eliminate two kernel crossings per round trip.
I wonder how "issue a write immediately followed by a read" can save 2 kernel crossings per round trip.
A write issues a system call into the kernel, causing a kernel crossing from user mode to kernel mode. When the write finishes, the OS returns to user code, crossing from kernel mode back to user mode.
Then read is called, which causes another crossing from user mode to kernel mode, and it then returns to user code, from kernel mode to user mode.
So which kernel crossings are saved? Does it mean that when the write finishes, it does not return to user code and user mode, but instead runs read directly in kernel mode?

As far as I understand the OS book, it describes a potential optimization. The OS could offer a syscall that does the write and the read at once. It could be a hypothetical syscall like int write_read(int fd, char *write_buf, size_t write_len, char *read_buf, size_t *read_len). The return from write and the re-entry for read would then disappear, which is where the two saved crossings come from. But there is no such call in the Linux kernel.
Modern kernels do not use interrupts for syscalls, so the optimization would not help much. Moreover, modern performance-critical applications usually use some kind of asynchronous, non-blocking handling, so the proposed optimization would be useless for them anyway. A further problem with the optimization is error reporting: if something failed, the caller could not easily tell whether the read or the write failed.

How linux kernel knows the address passed as argument in syscall is invalid?

I am currently reading the System Calls chapter of Understanding the Linux Kernel, and I cannot understand how the Linux kernel knows that an address argument passed via syscall() is invalid.
The book says that address checking is delayed until the address is used, and that when Linux uses an invalid address, a page fault is generated.
It further says that a page fault can happen in three cases in kernel mode:
• The kernel attempts to address a page belonging to the process
address space, but either the corresponding page frame does not exist,
or the kernel is trying to write a read-only page.
• Some kernel function includes a programming bug that causes the
exception to be raised when that program is executed; alternatively,
the exception might be caused by a transient hardware error.
• A system call service routine attempts to read or write into a
memory area whose address has been passed as a system call parameter,
but that address does not belong to the process address space.
These cases must be distinguished by the page fault handler, since the actions to be taken are quite different. The page fault handler can easily recognize the first case by determining whether the faulty linear address is included in one of the memory regions owned by the process.
But how does the kernel distinguish between the remaining two cases? It is explained in the textbook, but I could not follow the explanation. Please help and explain.

The page fault handler __do_page_fault includes this piece of code:
    if (!(error_code & X86_PF_USER) &&
        !search_exception_tables(regs->ip)) {
            bad_area_nosemaphore(regs, error_code, address, NULL);
            return;
    }
The condition !(error_code & X86_PF_USER) is true when the page fault occurred while the CPU was executing in kernel mode rather than user mode. The condition !search_exception_tables(regs->ip) is true when the faulting instruction is not one of the instructions that access a linear address passed to the system call. Note that regs->ip holds the instruction pointer of the instruction that caused the page fault. When both conditions are true, there is either a bug in some kernel function or a hardware error (the second case).
regs contains a snapshot of the architectural registers at the time of the page fault. On x86, this includes the CS segment register. The privilege level in the low two bits of that saved register can be used to determine whether the fault occurred in user mode or kernel mode.
search_exception_tables performs a binary search on sorted arrays of instruction addresses that are built when the kernel is compiled. These are basically the instructions that access an address passed to a system call.
In the other two cases you listed, the faulting instruction is one of those registered in the exception tables, so search_exception_tables(regs->ip) succeeds and the combined condition is false. The handler then checks whether the address lies in one of the process's memory regions, which separates the first case from the third.

"Switching from user mode to kernel mode" is an incorrect concept

I'm studying operating systems for the first time. In my book I found this sentence about "User Mode" and "Kernel Mode":
"Switch from user to kernel mode" instruction is executed only in kernel mode
I think that sentence is incorrect, because in practice there is no "switch to the kernel". In fact, when a user process needs to execute a privileged instruction, it simply asks the kernel to do it on its behalf. Is that correct?

In fact, when a user process need to do a privileged instruction it simply ask the kernel to do something for itself.
But how does that happen? The details are specific to the processor (i.e. the instruction set architecture) and the OS, and are explained in the ABI specification relevant to your system. It usually involves some machine code instruction such as SYSENTER or SYSCALL (or SVC on mainframes) that atomically changes the CPU mode, that is, switches it to kernel mode in a controlled manner. The actual parameters of the system call (including the syscall number itself) are often passed in registers (but the details are ABI specific).
So I feel the concept of switching from user-mode to kernel-mode is relevant, and meaningful (so "correct").
BTW, user-mode code is forbidden (by the hardware) to execute privileged machine instructions, such as those interacting with IO hardware devices (read about protection rings). If you try, you get some hardware exception (a bit similar to interrupts). Hence your code (even if it is malicious) has to make system calls, which the kernel controls (it has lots of code related to permission checking), for e.g. all IO.
Read also Operating Systems: Three Easy Pieces - freely downloadable. See also http://osdev.org/. Read system call wikipage & syscalls(2), and the Assembler HowTo.
In real life, things are much more complex. Read about System Management Mode and about the (scary) Intel Management Engine.

What happens after segmentation fault in linux kernel?

While I was thinking about implementing networked paging (requesting the faulting page from a remote node), I came up with this question.
First, let's consider the following steps:
1) A user-space program tries to access memory at address X.
2) The MMU walks the page table to find the physical address of X.
3) While walking the page table, it notices that the page table entry is invalid.
4) The CPU traps, and the trap is caught by the Linux trap vector. (This is the ARM case, but I think x86 is the same, right?)
5) At this point, I can retrieve the proper data from the remote node, copy it to some physical address and map it in the page table.
6) Here is the question: after this point, would the program that faulted at X safely read the data? And does that mean the MMU or CPU somehow remembers the faulting page table entry, returns to that entry and resumes the page table walk?
If any of the steps are not right, please enlighten me.

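The data abort handler simply sets the pc back to the value it had before the data abort handling started, so the faulting instruction is executed again. This time the right data is in place, so the data abort does not happen again. Nothing needs to remember the page table entry: the MMU just repeats the walk from the start.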

The solution is tricky and non-portable.
You can get the values of the CPU registers at the time the segmentation fault occurred from a signal handler (link: http://man7.org/linux/man-pages/man2/sigaction.2.html). You need to analyse these to decide whether you can fix the situation. First you need to check that the instruction pointer is valid. Then you need to check that the faulty address lies in a valid range. Then you need to map memory for the non-existent pages with the mmap() system call. Then you need to copy the required data to these pages. After the signal handler returns, the process resumes from where the segmentation fault occurred.

Would executable files be Machine Code - made for the hardware?

This is from Wikipedia:
"In computing, an executable file causes a computer "to perform indicated tasks according to encoded instructions," (machine code?)
"Modern operating systems retain control over the computer's resources, requiring that individual programs make system calls to access privileged resources. Since each operating system family features its own system call architecture, executable files are generally tied to specific operating systems."
Here is my perspective.
Executables cannot be machine code, as they need to talk to the OS for hardware services (system calls). Hence an executable is not yet "machine code". Perhaps some part of the code is actual machine code and some parts are just meant to call the machine code embedded in the operating system? Overall it contains some chunks of machine code and some chunks of code that call the operating system.
Edited after Damon's answer:
In the end the OS is a set of machine code. Basically the OS copies the user's machine code (created by a C compiler) into memory, and if an instruction is a system call, control transfers to the OS memory region to handle it. Now the question is: what machine code generated from C can do this part, such as asking to transfer control to the OS? I suppose it is system calls at a higher level of abstraction, but how does it work under the hood?
I get the feeling it is similar to a chicken-and-egg problem: C creates the OS and C uses the OS. I can't find out exactly how the process goes.
Can anyone solve the puzzle for me?

One thing does not exclude the other. Executables are (unless they are some form of bytecode running in a virtual machine) machine code. However, there are different kinds of instructions, some of which are not usable at certain privilege levels.
That is where the operating system comes in: it is "machine code" that runs at the highest privilege level, working as arbiter for the "important" parts and tasks, such as deciding who gets CPU time and what value goes into some hardware register.
(originally comment, made an answer by request)
EDIT: About your extended question, this works approximately as follows. When the computer is turned on, the processor runs at its highest privilege level. In this "mode", the BIOS, the boot loader, and the operating system can do whatever they want. This sounds great, but you don't want just any code to be able to do whatever it wants.
For example, the code can tell the MMU which memory pages are allowed to be read or written to, and which ones are not. Or, it can define what address is called if "something special" such as a trap or interrupt happens. Or, it can directly write to some special memory addresses that map ports of some devices (disk, network, whatever).
Eventually, the OS switches to "unprivileged" mode and calls some non-OS code. When a trap or interrupt happens, execution is interrupted and continues elsewhere (as specified by the OS previously), and the privilege level is upped again. Once the interrupt has been dealt with, privilege is taken away, and user code is called again.
If a user program needs the OS to do something "OS like", it sets up parameters according to an agreed scheme (for example in some particular registers) and executes a trap instruction.
This is, for example, how things like multithreading or virtual memory are implemented. At regular intervals, a timer fires off an interrupt, which stops execution of "normal" code and calls some code in the kernel (in privileged mode). That code then decides which user process control should be returned to, according to some kind of priority scheme. Those are the "CPU time slices" that are handed out.
If some process reads from or writes to a page that it isn't allowed, a trap is generated by the MMU. The OS then looks at what happened and where, and decides whether to load some data from disk into some memory region (and possibly purge something else) and change the process' mappings, or whether to kill the process with a "segmentation fault" error.
Of course, in reality it is a million times more complicated, but in principle that's about how it works.
It does not really matter whether the OS or the programs were originally written in C or in assembly. To the processor, it's just a sequence of machine instructions. Even a Python or Perl script is "just machine instructions" in the end, only with a detour via the interpreter.
