Location of interrupt handling code in Linux kernel for x86 architecture

I am doing some research trying to find the code in the Linux kernel that implements interrupt handling; in particular, I am trying to find the code responsible for handling the system timer.
According to http://www.linux-tutorial.info/modules.php?name=MContent&pageid=86
The kernel treats interrupts very similarly to the way it treats exceptions: all the general purpose registers are pushed onto the system stack and a common interrupt handler is called. The current interrupt priority is saved and the new priority is loaded. This prevents interrupts at lower priority levels from interrupting the kernel while it handles this interrupt. Then the real interrupt handler is called.
I am looking for the code that pushes all of the general purpose registers on the stack, and the common interrupt handling code.
At the very least, pushing the general purpose registers onto the stack is architecture dependent, so I'm looking for the code associated with the x86 architecture. At the moment I'm looking at version 3.0.4 of the kernel source, but any version is probably fine. I've started looking in kernel/irq/handle.c, but I don't see anything that looks like saving the registers; it just looks like it is calling the registered interrupt handler.

The 32-bit versions are in arch/x86/kernel/entry_32.S, the 64-bit versions in arch/x86/kernel/entry_64.S (that's the layout in 3.0.4; much older trees kept them under arch/i386). Search for the various ENTRY macros that mark kernel entry points.
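On 32-bit, the actual register pushing is done by the SAVE_ALL assembly macro, invoked from those entry stubs before any C code runs. The stack frame it builds is described by struct pt_regs; this is roughly the 32-bit layout from arch/x86/include/asm/ptrace.h in that era (field names have changed across versions, so treat it as a sketch):

struct pt_regs {
	unsigned long bx;	/* general purpose registers, pushed by SAVE_ALL */
	unsigned long cx;
	unsigned long dx;
	unsigned long si;
	unsigned long di;
	unsigned long bp;
	unsigned long ax;
	unsigned long ds;	/* segment registers */
	unsigned long es;
	unsigned long fs;
	unsigned long gs;
	unsigned long orig_ax;	/* syscall number on the syscall path */
	unsigned long ip;	/* from here down: pushed by the CPU itself */
	unsigned long cs;
	unsigned long flags;
	unsigned long sp;	/* only present on a privilege-level change */
	unsigned long ss;
};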

I am looking for the code that pushes all of the general purpose registers on the stack
Hardware stores the current state (which includes registers) before executing an interrupt handler. Code is not involved. And when the interrupt exits, the hardware reads the state back from where it was stored.
Now, code inside the interrupt handler may read and write the saved copies of registers, causing different values to be restored as the interrupt exits. That's how a context switch works.
On x86, the hardware only saves those registers that change before the interrupt handler starts running. On most embedded architectures, the hardware saves all registers. The reason for the difference is that x86 has a huge number of registers, and saving and restoring any not modified by the interrupt handler would be a waste. So the interrupt handler is responsible for saving and restoring any registers it voluntarily uses.
See the Intel® 64 and IA-32 Architectures Software Developer’s Manual, starting on page 6-15.
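To make the x86 case concrete: on an interrupt from user mode, the CPU itself pushes only a small frame before the kernel's entry code takes over; everything else is saved by software. A hypothetical struct (not a real kernel type) just to show that hardware-pushed frame:

/* What the x86 CPU pushes on its own when an interrupt arrives from
 * user mode (lowest address first); some exceptions push an error
 * code as well. All general-purpose registers are left for the
 * kernel's entry code to save. */
struct hw_interrupt_frame {
	unsigned long ip;	/* return EIP/RIP */
	unsigned long cs;	/* code segment of the interrupted code */
	unsigned long flags;	/* EFLAGS/RFLAGS */
	unsigned long sp;	/* old stack pointer (privilege change only) */
	unsigned long ss;	/* old stack segment (privilege change only) */
};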

Related

How is hardware context switching used/unused in Linux?

Old x86 Intel architecture provided context switching (in the form of the TSS) at the hardware level. But I have read that Linux has long "abandoned" hardware context switching because it was less optimized, less flexible, and not available on all architectures.
What confuses me is how software (Linux) can control hardware operations (saving/restoring context). Linux can choose not to use the context set up by the hardware, but the hardware context switch would nevertheless happen (making the "optimization" argument irrelevant).
Also, if Linux is not using hardware context switching, how can the value of %eip (pointing to the next instruction in the user program) be saved and the kernel stack pointer restored by the kernel (and vice versa)?
I think the kernel would need some support from the hardware to save the user program's %eip and switch %esp (user to kernel stack) even before the interrupt service routine starts.
If this support is indeed provided by the hardware, then how is Linux not using hardware context switches?
Terribly confused!!!

What is the relation between a kernel control path and a kernel thread?

Understanding The Linux Kernel says:
A kernel control path denotes the sequence of instructions executed by the kernel to
handle a system call, an exception, or an interrupt.
and
Besides user processes, Unix systems include a few privileged processes called kernel threads with the following characteristics:
• They run in Kernel Mode in the kernel address space.
• They do not interact with users, and thus do not require terminal devices.
• They are usually created during system startup and remain alive until the system is shut down.
What is the relation between the two concepts: a kernel control path and a kernel thread?
Is a kernel control path a kernel thread?
Is a kernel thread a kernel control path?
If I am correct, a kernel thread is represented as a task_struct object.
So is a kernel control path?
If not, what kinds of kernel control paths can be, and what kinds can't?
If I am correct, a kernel thread can be scheduled together with processes.
Can a kernel control path? If not, what kinds of kernel control paths can be, and what kinds can't?
Keep in mind there is no standard terminology. Using your definitions:
Is a kernel control path a kernel thread?
No, not under your definition.
Is a kernel thread a kernel control path?
No.
If I am correct, a kernel thread is represented as a task_struct object.
Probably.
So is [it] a kernel control path?
Not under your definition.
If not, what kinds of kernel control paths can be and what kinds can't be?
You defined it as:
A kernel control path denotes the sequence of instructions executed by the kernel to handle a system call, an exception, or an interrupt.
A kernel control path is the sequence of instructions executed by a kernel to handle a system call, an interrupt or an exception.
The kernel is the core of an operating system, and it controls virtually everything that occurs on a computer. An interrupt is a signal to the kernel that an event has occurred. Hardware interrupts are initiated by hardware devices, including the keyboard, the mouse, a printer or a disk drive. Interrupt signals initiated by programs are called software interrupts or exceptions.
In the most simple situation, the CPU executes a kernel control path sequentially, that is, beginning with the first instruction and ending with the last instruction.
source: http://www.linfo.org/kernel_control_path.html
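To make the kernel-thread half of the comparison concrete, here is a minimal sketch of a module that starts one with kthread_run() (all names here are made up for illustration). The thread gets its own task_struct and is scheduled like any other task; a kernel control path, by contrast, is not an object at all, just the instruction sequence the kernel executes for a given syscall, interrupt, or exception:

#include <linux/kthread.h>
#include <linux/module.h>
#include <linux/delay.h>
#include <linux/err.h>

static struct task_struct *worker;

/* Runs in kernel mode, in the kernel address space, with no
 * controlling terminal -- exactly the characteristics quoted above. */
static int worker_fn(void *data)
{
	while (!kthread_should_stop())
		msleep(1000);	/* schedulable like any other task */
	return 0;
}

static int __init demo_init(void)
{
	worker = kthread_run(worker_fn, NULL, "demo_worker");
	return IS_ERR(worker) ? PTR_ERR(worker) : 0;
}

static void __exit demo_exit(void)
{
	kthread_stop(worker);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");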

Performance of read() and write() to/from Linux SKBs

Consider a standard Linux system with a userland application and the kernel network stack. I've read that moving frames between user space and kernel space (and vice versa) can be expensive in terms of CPU cycles.
My questions are:
Why? And does moving the frame in one direction (i.e., from user to kernel) have a higher impact than the other?
Also, how do things differ with TAP-based interfaces? The frame will still be crossing between user and kernel space; do the same concerns apply, or is there some form of zero-copy in play?
Addressing questions in-line:
Why? And does moving the frame in one direction (i.e., from user to kernel) have a higher impact than the other?
Moving to/from user/kernel spaces is expensive because the OS has to:
Validate the pointers for the copy operation.
Transfer the actual data.
Incur the usual costs involved in transitioning between user/kernel mode.
There are some exceptions to this, such as if your driver implements a strategy such as "page flipping", which effectively remaps a chunk/page of memory so that it is accessible to a userspace application. This is "close enough" to a zero copy operation.
With respect to copy_to_user/copy_from_user performance, the performance of the two functions is apparently comparable.
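For reference, this is the classic shape of the code those costs come from: a hypothetical character-device read handler (names made up, offset handling omitted) in which every byte delivered to the application crosses the boundary through copy_to_user():

#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/uaccess.h>

/* Hypothetical read handler: copy_to_user() both validates the user
 * pointer and performs the actual transfer. */
static ssize_t demo_read(struct file *file, char __user *buf,
			 size_t len, loff_t *off)
{
	static const char msg[] = "hello from kernel space\n";
	size_t n = min(len, sizeof(msg));

	/* copy_to_user() returns the number of bytes it could NOT copy */
	if (copy_to_user(buf, msg, n))
		return -EFAULT;
	return n;
}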
Also, how do things differ with TAP-based interfaces? The frame will still be crossing between user and kernel space; do the same concerns apply, or is there some form of zero-copy in play?
With TUN/TAP based interfaces, the same considerations apply, unless you're utilizing some sort of DMA, page-flipping, or similar logic.
Context Switch
Moving data between user space and kernel space involves a context switch (more precisely, a mode switch), which is usually caused by a system call (traditionally invoked via the int 0x80 interrupt on x86).
An interrupt happens, entering kernel space;
when the interrupt happens, the OS stores the registers' values on the kernel stack of the current thread: ds, es, fs, eax and so on (cr3 is not saved on the stack; it is switched along with the address space);
then it jumps to the IRQ handler, like a function call;
through some common IRQ execution path, the kernel chooses the next thread to run by some scheduling algorithm;
the runtime state (all the saved registers) of that next thread is loaded;
back to user space.
As we can see, a lot of work is done when moving a frame into/out of the kernel, much more than for a simple function call (which just adjusts ebp, esp, and eip). That is why this behavior is relatively time-consuming.
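You can get a feel for the gap from user space by timing a trivial system call against a trivial function call; a rough sketch (numbers vary widely with CPU, kernel version, and mitigations):

#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* syscall(SYS_getpid) is used directly so glibc's cached getpid()
 * can't short-circuit the kernel entry. */
static long dummy(long x) { return x + 1; }

static double ns_between(struct timespec a, struct timespec b)
{
	return (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
}

int main(void)
{
	enum { N = 1000000 };
	struct timespec t0, t1;
	volatile long acc = 0;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < N; i++)
		acc += syscall(SYS_getpid);	/* user -> kernel -> user */
	clock_gettime(CLOCK_MONOTONIC, &t1);
	printf("syscall:       %.1f ns/iter\n", ns_between(t0, t1) / N);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < N; i++)
		acc += dummy(i);		/* stays in user mode */
	clock_gettime(CLOCK_MONOTONIC, &t1);
	printf("function call: %.1f ns/iter\n", ns_between(t0, t1) / N);

	(void)acc;
	return 0;
}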
Virtual Devices
As a virtual network device, writing to a TAP device is no different from writing to any other /dev/xxx.
If you write to TAP, the OS is interrupted as described above; it then copies your arguments into the kernel and blocks your current thread (with blocking I/O). The kernel driver thread is notified in some way (e.g. via a message queue) to receive the arguments and consume them.
In Android, there exist some zero-copy system calls, and in my demo implementation this is done through address translation between user and kernel. Because the kernel and the user thread do not share the same address space, and the user thread's data may change, we usually copy data into the kernel. So if we meet the following conditions, we can avoid the copy:
the system call must block, i.e. the data won't change while the kernel uses it;
addresses must be translated via the page tables, i.e. the kernel can refer to the right data.
Code
The following is code from my demo OS, which relates to this question if you are interested in the details:
interrupt handle procedure: do_irq.S, irq_handle.c
system call: syscall.c, ide.c
address translation: MM_util.c

Interrupt and system call dispatching in Linux

Are hardware interrupts and system calls/exceptions dispatched by the same dispatcher procedure in Linux? If you look at the Linux source, you will notice that the hardware interrupt stubs (on the x86 arch) contain nothing more than a PUSH of the interrupt vector number onto the stack and a JMP to common_interrupt.
My question:
Is every interrupt in Linux (exceptions, including system calls, and hardware interrupts) dispatched the same way until it reaches some point where it branches based on its type?
Sorry for my English.
Are hardware interrupts and system calls/exceptions dispatched by the same dispatcher procedure in Linux?
No. Exceptions, system calls and hardware interrupts are dispatched in different ways. If you look in arch/x86/entry/entry_64.S, you will find all of them there. First is the idtentry macro:
.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
ENTRY(\sym)
...
...
...
END(\sym)
.endm
which provides the preparation for exception handling (saving registers, calling the exception handler, and so on). Exception handlers are also defined with the idtentry macro:
idtentry divide_error do_divide_error has_error_code=0
idtentry overflow do_overflow has_error_code=0
idtentry bounds do_bounds has_error_code=0
Most of the exception handlers are in arch/x86/kernel/traps.c.
The entry point for hardware interrupts is irq_entries_start, and system call handling starts at entry_SYSCALL_64.
My question: Is every interrupt in Linux (exceptions, including system calls, and hardware interrupts) dispatched the same way until it reaches some point where it branches based on its type?
No. They are similar, but not the same. For example, the system call preparation routine (entry_SYSCALL_64) checks the type of system call (64-bit or 32-bit emulation) and always starts from the same register state (defined by the ABI), whereas an exception handler first of all checks the type of exception, to select the correct stack from the IST, and so on.
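Whichever entry path is taken, hardware interrupts eventually funnel through the common dispatch code into handlers registered with request_irq(). A minimal sketch, with a made-up IRQ number and names:

#include <linux/interrupt.h>
#include <linux/module.h>

#define DEMO_IRQ 19	/* hypothetical IRQ line, for illustration only */

static int demo_cookie;	/* dev_id cookie so a shared IRQ can be freed */

/* Called at the end of the common dispatch path:
 * irq_entries_start -> do_IRQ -> ... -> handle_irq_event -> demo_isr */
static irqreturn_t demo_isr(int irq, void *dev_id)
{
	/* acknowledge/handle the device here */
	return IRQ_HANDLED;
}

static int __init demo_init(void)
{
	return request_irq(DEMO_IRQ, demo_isr, IRQF_SHARED,
			   "demo", &demo_cookie);
}

static void __exit demo_exit(void)
{
	free_irq(DEMO_IRQ, &demo_cookie);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");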
You can find more information in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.

What happens when the sysenter instruction is used in Linux?

I am studying how the CPU changes from user mode to kernel mode in Linux. I came across two different methods: interrupts and the sysenter instruction.
I could not understand how sysenter works. Could someone please explain what exactly happens in the CPU when the sysenter instruction is run?
The problem that a program faces when it wants to get into the kernel (aka "making syscalls") is that user programs cannot access anything kernel-related, yet the program has to somehow switch the CPU into "kernel mode".
On an interrupt, this is done by the hardware.
It also happens automatically when a (CPU, not C++) exception occurs, like accessing memory that doesn't exist, a division by zero, or invoking a privileged instruction in user code. Or trying to execute an unimplemented instruction. This last one is actually a decent way to implement a "call the kernel" interface: the CPU runs into an instruction it doesn't know, so it raises an exception which drops the CPU into kernel mode and into the kernel. The kernel code can then check whether the "correct" unimplemented instruction was used and perform the syscall stuff if it was, or just kill the process if it was any other unimplemented instruction.
Of course, doing something like this isn't, well, "clean". It's more like a dirty hack, abusing what should be an error to implement a perfectly valid control flow change. Hence, CPUs do tend to have actual instructions to do essentially the same thing, just in a more "defined" way. The main purpose of anything like a "sysenter" instruction is still the same: it changes the CPU into "kernel mode", saves the position where the "sysenter" was called, and continues execution somewhere in the kernel.
As for the difference between a "software interrupt" and sysenter: sysenter is specifically optimized for this use case. For example, it doesn't fetch the kernel address to call from a table in memory like a (software) interrupt does; instead it takes the address from a special register (an MSR), which saves the memory lookup. It may also have additional internal optimizations, based on the fact that software interrupts are handled more like hardware interrupts, which sysenter doesn't actually need. I don't know the precise implementation details on the CPUs; you would probably have to read the Intel manuals to really get into them.
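For comparison, this is what the legacy software-interrupt entry looks like when invoked by hand from 32-bit user code (a sketch; compile with -m32). Real programs go through libc or the kernel's vDSO, which picks sysenter/syscall when available:

#include <stdio.h>

/* write(2) via the legacy int $0x80 gate (i386 ABI): eax carries the
 * syscall number (__NR_write is 4), ebx/ecx/edx the arguments.
 * sysenter does the same job but takes the kernel entry address from
 * an MSR instead of the IDT. */
static long sys_write_int80(int fd, const void *buf, unsigned long len)
{
	long ret;
	asm volatile ("int $0x80"
		      : "=a" (ret)
		      : "0" (4L), "b" (fd), "c" (buf), "d" (len)
		      : "memory");
	return ret;
}

int main(void)
{
	static const char msg[] = "entered the kernel and came back\n";
	sys_write_int80(1, msg, sizeof(msg) - 1);
	return 0;
}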
