In an OS book, when it talks about client-server communication, it says:
Client-server communication is a common pattern in many systems, and so one can ask: how can we improve its performance? One step is to recognize that both the client and the server issue a write immediately followed by a read, to wait for the other side to reply; at the cost of adding a system call, these can be combined to eliminate two kernel crossings per round trip.
I wonder how "issue a write immediately followed by a read" can save 2 kernel crossings per round trip.
A write issues a system call into the kernel, causing a kernel crossing from user mode to kernel mode. When the write finishes, the OS returns to user code, crossing from kernel mode back to user mode.
Then read is called, causing another kernel crossing from user mode to kernel mode, and when it returns to user code, another crossing from kernel mode back to user mode.
So what is the saved kernel crossing? Does it mean that when the write finishes, it does not return to user code and user mode, but instead directly runs the read in kernel mode?
As far as I understand the OS book, it describes a potential optimization: the OS could offer a syscall that does the write and the read at once. It could be a hypothetical syscall like int write_read(int fd, char *write_buf, size_t write_len, char *read_buf, size_t *read_len). But there is no such call in the Linux kernel.
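To make the bookkeeping concrete, here is a minimal sketch assuming a POSIX-style connected file descriptor; round_trip() is a made-up helper, and write_read() is the book's hypothetical call (it does not exist in Linux), shown only to illustrate where the two crossings would be saved.

```c
#include <unistd.h>

/* Today: one client round trip costs two blocking system calls,
 * i.e. four user/kernel crossings (enter + return for each call). */
ssize_t round_trip(int fd, const char *req, size_t req_len,
                   char *rsp, size_t rsp_len)
{
    if (write(fd, req, req_len) < 0)   /* crossing 1 (enter) + 2 (return) */
        return -1;
    return read(fd, rsp, rsp_len);     /* crossing 3 (enter) + 4 (return) */
}

/* The book's hypothetical combined call: the kernel would perform the
 * write and then block in the read without returning to user mode in
 * between, so a round trip needs only one enter and one return.
 *
 * int write_read(int fd, char *write_buf, size_t write_len,
 *                char *read_buf, size_t *read_len);
 */
```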
Modern kernels do not use interrupts for syscalls, so the optimization would not help much. Moreover, performance-critical applications usually use some kind of asynchronous, non-blocking handling, so the proposed optimization would be useless for them anyway. A further problem with this optimization would be error reporting: if something failed, the caller could not easily tell whether the read failed or the write failed.
I'm studying "Operating System" for the first time. In my book I found this sentence about "User Mode" and "Kernel Mode":
"Switch from user to kernel mode" instruction is executed only in kernel
mode
I think that is an incorrect sentence, as in practice there is no "switch of kernel". In fact, when a user process needs to do a privileged instruction, it simply asks the kernel to do something for it. Is that correct?
In fact, when a user process needs to do a privileged instruction, it simply asks the kernel to do something for it.
But how does that happen? Details are processor (i.e. instruction set architecture) and OS specific (explained in ABI specifications relevant to your system, e.g. here), but that usually involves some machine code instruction like SYSENTER or SYSCALL (or SVC on mainframes) capable of atomically changing the CPU mode (that is switching it in a controlled manner to kernel mode). The actual parameters of the system call (including even the syscall number) are often passed in registers (but details are ABI specific).
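As a minimal sketch, assuming x86-64 Linux and its syscall ABI (call number in rax, arguments in rdi/rsi/rdx, rcx and r11 clobbered): raw_write() is a made-up illustrative helper that invokes write(2) with the SYSCALL instruction directly instead of going through the C library.

```c
#include <stddef.h>

/* Illustrative only: call write(2) (syscall number 1 on x86-64 Linux)
 * straight from user code. SYSCALL is the instruction that switches
 * the CPU to kernel mode in a controlled manner. */
static long raw_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ __volatile__ ("syscall"
                          : "=a"(ret)                        /* return value in rax */
                          : "a"(1), "D"((long)fd), "S"(buf), "d"(len)
                          : "rcx", "r11", "memory");         /* clobbered by SYSCALL */
    return ret;                                              /* negative means -errno */
}

int main(void)
{
    raw_write(1, "hello\n", 6);
    return 0;
}
```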
So I feel the concept of switching from user-mode to kernel-mode is relevant, and meaningful (so "correct").
BTW, user-mode code is forbidden (by the hardware) to execute privileged machine instructions, such as those interacting with IO hardware devices (read about protection rings). If you try, you get some hardware exception (a bit similar to interrupts). Hence your code (even if it is malicious) has to make system calls, which the kernel controls (it has lots of code related to permission checking), for e.g. all IO.
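As a small illustration (a sketch, assuming x86-64 Linux): HLT is a privileged instruction, so executing it from user mode raises a general-protection fault in hardware, which the kernel delivers to the process as a signal (typically SIGSEGV) instead of letting the instruction run.

```c
/* This program compiles fine, but the hardware refuses to execute a
 * privileged instruction in user mode: the process is killed by a
 * signal rather than halting the CPU. */
int main(void)
{
    __asm__ volatile ("hlt");   /* #GP fault: HLT requires ring 0 */
    return 0;                   /* never reached */
}
```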
Read also Operating Systems: Three Easy Pieces - freely downloadable. See also http://osdev.org/. Read system call wikipage & syscalls(2), and the Assembler HowTo.
In real life, things are much more complex. Read about System Management Mode and about the (scary) Intel Management Engine.
I am curious because I am reading this OS book which mentions
"User programs always run in user mode, which permits only a subset of the instructions [...]. Generally, all instructions involving I/O and memory protection are disallowed in user mode.
To obtain services from the OS, a user program must make a system call, which traps into the kernel and invokes the OS."
If I/O is generally not allowed in user mode, then say I have a program in C++ or Java which asks for input, or something else like a search bar in any program. Whenever I select the search bar (meaning I will type something), is a TRAP instruction executed to invoke the OS (since the OS runs in kernel mode) so it can access the I/O, that is, the keyboard? I am not sure if I follow correctly or what I am getting wrong.
I/O is not allowed in user mode, yet you use input in applications on the OS, and even the OS itself has keyboard commands. If you can use keyboard commands, that means the OS is ready for I/O at any time. How does that square with the original statement about I/O instructions being disallowed in user mode?
I am sorry for my ignorance, but I am just a little bit confused by these terms and the difference between user mode and kernel mode. I know the OS runs in kernel mode, and the applications run on the OS, so in the end the applications do have access to I/O.
Don't applications need to have the OS deal with their I/O for them?
Meaning, only the OS has the authority to do those type of things...
Am I wrong?
Toss your book in the garbage or use it to line a cat box.
Your apparent paradox is that you think you have to be in kernel mode to do I/O but your book says:
"User programs always run in user mode, which permits only a subset of the instructions"
The resolution to your paradox is that your book is spouting nonsense.
User programs do not always run in user mode. They frequently run in kernel mode. One of the basic functions of an operating system is to provide a set of kernel-mode system services that give controlled access to kernel mode.
In other words, your instincts here are better than your confusing book's text.
It is important to understand what "user mode" and "kernel mode" mean. A process is mapped to memory regions which are user-privileged, depending on the memory layout. Kernel mode is basically routines that live in supervisor-privileged memory regions and are invoked by your program to do the desired work (I/O).
I understand that system calls exist to provide access to capabilities that are disallowed in user space, such as accessing a HDD using the read() system call. I also understand that these are abstracted by a user-mode layer in the form of library calls such as fread(), to provide compatibility across hardware.
So from the application developer's point of view, we have something like:
//library //syscall //k_driver //device_driver
fread() -> read() -> k_read() -> d_read()
My question is: what is stopping me from inlining all the instructions in the fread() and read() functions directly into my program? The instructions are the same, so the CPU should behave in the same way? I have not tried it, but I assume this does not work for some reason I am missing. Otherwise any application could perform arbitrary kernel-mode operations.
TL;DR: What allows system calls to 'enter' kernel mode that is not copy-able by an application?
The system call functions you call do not contain the kernel code themselves. More precisely, the read function you call is, as far as your application is concerned, just a library call. What read(2) does internally is invoke the actual system call, using a software interrupt or an instruction such as SYSCALL or SYSENTER, depending on the CPU architecture and OS.
This is the only way for userland code to get privileged code executed, and it is an indirect one: the userland code and the kernel code execute in different contexts.
That means you cannot paste the kernel source code into your userland code and expect it to do anything useful other than crash. In particular, kernel code has access to the physical memory addresses required to interact with the hardware. Userland code is limited to a virtual memory space that does not have this capability. Also, the instructions userland code is allowed to execute are a subset of the ones the CPU supports. Several I/O-, interrupt- and virtualization-related instructions are examples of prohibited code; they are known as privileged instructions and require being in a lower ring or supervisor mode, depending on the CPU architecture.
You could inline them. You can issue system calls directly through syscall(2), but that soon gets messy. Note that the system call overhead (context switches back and forth, in-kernel checks, ...), not to mention the time the system call itself takes, makes any gain from inlining disappear in the noise (if there is any gain at all; more code means the cache isn't as useful, and performance suffers). Trust the libc/kernel folks to have studied the matter and done the inlining for you behind your back (in the relevant *.h file) if it really is a measurable gain.
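For completeness, a minimal sketch of what "issuing system calls directly through syscall(2)" looks like on Linux; skipping the fread()/read() wrappers changes nothing fundamental, because the privileged work still only happens after the trap into the kernel.

```c
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    char buf[64];

    /* Everything up to here is ordinary user-mode code that any
     * program could "inline"; the kernel-mode part begins only when
     * syscall() traps into the kernel. */
    long n = syscall(SYS_read, 0, buf, sizeof buf);   /* read from stdin */
    if (n > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    return 0;
}
```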
I have recently started reading Linux Kernel Development by Robert Love and I am Love-ing it!
Please read the below excerpt from the book to better understand my questions:
A number identifies interrupts and the kernel uses this number to execute a specific interrupt handler to process and respond to the interrupt. For example, as you type, the keyboard controller issues an interrupt to let the system know that there is new data in the keyboard buffer. The kernel notes the interrupt number of the incoming interrupt and executes the correct interrupt handler. The interrupt handler processes the keyboard data and lets the keyboard controller know it is ready for more data...
Now I have a dual boot on my machine, and sometimes (in fact, often) when I type something on Windows, I find myself doing it in what I call Night Crawler mode: I am typing and I don't see anything on the screen, and after a while the entire text appears in one flash, probably because the buffer just spits everything out.
Now I don't see this happening on Linux. Is it because of the interrupt context present in Linux and the absence of it in Windows?
BTW, I am still not sure if there is an interrupt context in Windows; Google didn't give me any relevant results for that.
All OSes have an interrupt context; it's a feature/constraint of the CPU architecture -- basically, this is "just the way things work" with computer hardware. Different OSes (and drivers within each OS) make different choices about what work, and how much of it, to do in the interrupt before returning, though. That may be related to your Windows experience, or it may not. There is a lot of code involved in getting a key press translated into screen output, and interrupt handling is only a tiny part.
A number identifies interrupts and the kernel uses this number to execute a specific interrupt handler to process and respond to the interrupt. For example, as you type, the keyboard controller issues an interrupt to let the system know that there is new data in the keyboard buffer. The kernel notes the interrupt number of the incoming interrupt and executes the correct interrupt handler. The interrupt handler processes the keyboard data and lets the keyboard controller know it is ready for more data
This is a pretty poor description. Things might be different now with USB keyboards, but this seems to discuss what would happen with an old PS/2 connection, where an "8042"-compatible chipset on your motherboard signals on an IRQ line to the CPU, which then executes whatever code is at the address stored in location 9 in the interrupt table (traditionally an array of pointers starting at address 0 in physical memory, though from memory you could change the address, and last time I played with this stuff PCs still had <1MB RAM and used different memory layout modes).
That dispatch process has nothing to do with the kernel... it's the way the hardware works. (The keyboard controller could be asked not to generate interrupts, allowing OS/driver software to "poll" it regularly to see if there happened to be new event data available, but it'd be pretty crazy to use that really).
Still, the code address from the interrupt table will point into the kernel or keyboard driver, and the kernel/driver code will read the keyboard event data from the keyboard controller's I/O port. For these hardware interrupt handlers, a primary goal is to get the data from the device and store it into a buffer as quickly as possible - both to ensure a quick return from the interrupt to whatever processing was happening, and because the keyboard controller can only handle one event at a time - it needs to be read off into the buffer before the next event.
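A toy sketch of that classic PS/2 flow (illustrative pseudo-driver code, not the real Linux handler; inb(), keyboard_irq_handler() and the ring buffer are made up, though 0x60 is the traditional 8042 data port):

```c
#include <stdint.h>

#define KBD_DATA_PORT 0x60          /* traditional 8042 data port */

static uint8_t kbd_buf[128];        /* ring buffer filled in interrupt context */
static unsigned kbd_head;           /* a consumer elsewhere drains this later */

static inline uint8_t inb(uint16_t port)
{
    uint8_t v;
    __asm__ volatile ("inb %1, %0" : "=a"(v) : "Nd"(port));
    return v;
}

/* Runs in interrupt context: grab the pending scancode, buffer it,
 * acknowledge, return - translating scancodes to characters happens
 * later, outside the interrupt. */
void keyboard_irq_handler(void)
{
    uint8_t scancode = inb(KBD_DATA_PORT);
    kbd_buf[kbd_head++ & 127] = scancode;
    /* ...acknowledge the interrupt controller, then return quickly */
}
```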
It's then up to the OS/driver either to provide some kind of input-availability signal to application software, or to wait for the application software to attempt to read more keyboard input, but it can do so in a "whenever you're ready" fashion. Either way, once an application has time to read and start responding to the input, things can happen that mean it takes an unexpectedly long amount of time: it could be that the extra keystroke triggers some complex repagination algorithm that takes a long time to run, or that the keystroke results in the program executing code that has been swapped out to disk (check Wikipedia for "virtual memory"), in which case it could be only after the hard disk has read part of the program into memory that the program can continue to run. There are thousands of such edge cases involving window movement, graphics clipping algorithms, etc. that could account for the keyboard-handling code taking a long time to complete, and if other keystrokes have happened meanwhile they'll be read by the keyboard driver into that buffer, then only "perceived" by the application after the slow/blocking processing completes. It may well be that the processing consequent to all the keystrokes then in the buffer completes much more quickly: for example, if part of the program was swapped in from disk, that part may be ready to process the remaining keystrokes.
Why would Linux do better at this than Windows? Mainly because the Operating System, drivers and applications tend to be "leaner and meaner"... less bloated software (like C++ vs C# .NET), less wasted memory, so less swapping and delays.
This is related to this question.
I'm not an expert on Linux device drivers or kernel modules, but I've been reading "Linux Device Drivers" [O'Reilly] by Rubini & Corbet and a number of online sources, but I haven't been able to find anything on this specific issue yet.
When is a kernel or driver module allowed to use floating-point registers?
If so, who is responsible for saving and restoring their contents?
(Assume x86-64 architecture)
If I understand correctly, whenever a KM is running, it is using a hardware context (or hardware thread or register set -- whatever you want to call it) that has been preempted from some application thread. If you write your KM in C, the compiler will correctly ensure that the general-purpose registers are properly saved and restored (much as in an application), but that doesn't automatically happen with floating-point registers. For that matter, a lot of KMs can't even assume that the processor has any floating-point capability.
Am I correct in guessing that a KM that wants to use floating-point has to carefully save and restore the floating-point state? Are there standard kernel functions for doing this?
Are the coding conventions for this spelled out anywhere? Are they different for SMP-non SMP drivers? Are they different for older non-preemptive kernels and newer preemptive kernels?
Linus's answer provides this pretty clear quote to use as a guideline:
In other words: the rule is that you really shouldn't use FP in the kernel.
Short answer: Kernel code can use floating point if the use is surrounded by kernel_fpu_begin()/kernel_fpu_end(). These functions handle saving and restoring the FPU context. They also call preempt_disable()/preempt_enable(), which means no sleeping, page faults, etc. in the code between those calls. Google the function names for more information.
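A minimal sketch of that bracketing (assumptions: x86, a reasonably recent kernel where the declarations live in <asm/fpu/api.h>, and a module Makefile that permits FP/SSE code generation, which the normal kernel build does not; scale_samples() is a made-up example function):

```c
#include <linux/kernel.h>
#include <asm/fpu/api.h>      /* kernel_fpu_begin() / kernel_fpu_end() */

/* Keep the protected region short: preemption is disabled inside it,
 * so no sleeping, no copy_{to,from}_user(), no page faults. */
static void scale_samples(float *samples, int n, float gain)
{
    int i;

    kernel_fpu_begin();       /* make FPU/SSE use safe, disables preemption */
    for (i = 0; i < n; i++)
        samples[i] *= gain;   /* FP use is safe only inside this region */
    kernel_fpu_end();         /* hand FPU state back, re-enables preemption */
}
```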
If I understand correctly, whenever a KM is running, it is using a hardware context (or hardware thread or register set -- whatever you want to call it) that has been preempted from some application thread.
No, a kernel module can run in user context as well (e.g. when userspace calls syscalls on a device provided by the KM). That has, however, no relation to the floating-point issue.
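For example (a hedged sketch, not taken from the book; demo_read/demo_fops are made-up names): the read() handler of a character device provided by a module runs in the kernel but in the calling process's context, so it is even allowed to sleep.

```c
#include <linux/fs.h>
#include <linux/module.h>
#include <linux/uaccess.h>

/* Invoked when a user process calls read() on our device node; we are
 * in kernel mode, but in that process's ("user") context. */
static ssize_t demo_read(struct file *f, char __user *buf,
                         size_t len, loff_t *off)
{
    static const char msg[] = "hello from the module\n";
    size_t n = len < sizeof(msg) ? len : sizeof(msg);

    if (copy_to_user(buf, msg, n))    /* may fault and sleep: fine here */
        return -EFAULT;
    return n;
}

static const struct file_operations demo_fops = {
    .owner = THIS_MODULE,
    .read  = demo_read,
};
```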
If you write your KM in C, the compiler will correctly ensure that the general-purpose registers are properly saved and restored (much as in an application), but that doesn't automatically happen with floating-point registers.
That is not because of the compiler, but because of the kernel context-switching code.