Understanding The Linux Kernel says:
A kernel control path denotes the sequence of instructions executed by the kernel to
handle a system call, an exception, or an interrupt.
and
Besides user processes, Unix systems include a few privileged processes called kernel
threads with the following characteristics:
• They run in Kernel Mode in the kernel address space.
• They do not interact with users, and thus do not require terminal
devices.
• They are usually created during system startup and remain alive
until the system is shut down.
What is the relation between the two concepts, a kernel control
path and a kernel thread?
Is a kernel control path a kernel thread?
Is a kernel thread a kernel control path?
If I am correct, a kernel thread is represented as a task_struct
object.
So is a kernel control path?
If not, what kinds of kernel control paths can be and what kinds
can't be?
If I am correct, a kernel thread can be scheduled together with processes.
Can a kernel control path? If not, what kinds of kernel control paths can be and what kinds can't be?
Keep in mind there is no standard terminology. Using your definitions:
Is a kernel control path a kernel thread?
No, not under your definition.
Is a kernel thread a kernel control path?
No.
If I am correct, a kernel thread is represented as a task_struct object.
Probably.
So is [it] a kernel control path?
Not under your definition.
If not, what kinds of kernel control paths can be and what kinds can't be?
You defined it as:
A kernel control path denotes the sequence of instructions executed by the kernel to handle a system call, an exception, or an interrupt.
A kernel control path is the sequence of instructions executed by a kernel to handle a system call, an interrupt or an exception.
The kernel is the core of an operating system, and it controls virtually everything that occurs on a computer. An interrupt is a signal to the kernel that an event has occurred. Hardware interrupts are initiated by hardware devices, including the keyboard, the mouse, a printer or a disk drive. Interrupt signals initiated by programs are called software interrupts or exceptions.
In the most simple situation, the CPU executes a kernel control path sequentially, that is, beginning with the first instruction and ending with the last instruction.
source: http://www.linfo.org/kernel_control_path.html
The following quote is from the "Understanding the Linux Kernel 3rd Edition" book:
When a User Mode process attempts to access an I/O port by means of an
in or out instruction, the CPU may need to access an I/O Permission
Bitmap stored in the TSS to verify whether the process is allowed to
address the port.
More precisely, when a process executes an in or out I/O instruction
in User Mode, the control unit performs the following operations:
1. It checks the 2-bit IOPL field in the eflags register. If it is set to 3, the control unit executes the I/O instructions. Otherwise, it performs the next check.
2. It accesses the tr register to determine the current TSS, and thus the proper I/O Permission Bitmap.
3. It checks the bit of the I/O Permission Bitmap corresponding to the I/O port specified in the I/O instruction. If it is cleared, the instruction is executed; otherwise, the control unit raises a “General protection” exception.
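The three checks quoted above can be modelled in a few lines of user-space C. This is only a sketch of the decision logic, not kernel or hardware code: `struct io_perm_state` and `user_port_access_allowed` are hypothetical names, and the model covers a single-byte access only (real hardware checks one bit per byte accessed and also honours the bitmap's limit inside the TSS).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IO_PORTS        65536u            /* x86 has a 16-bit I/O port space */
#define IO_BITMAP_BYTES (IO_PORTS / 8)    /* one permission bit per port */

/* Hypothetical model of the state the control unit consults. */
struct io_perm_state {
    unsigned       iopl;       /* 2-bit IOPL field taken from eflags */
    const uint8_t *io_bitmap;  /* I/O Permission Bitmap of the current TSS */
};

/* Returns true if a User Mode in/out on `port` would be executed,
 * false if it would raise a "General protection" exception. */
static bool user_port_access_allowed(const struct io_perm_state *s,
                                     uint16_t port)
{
    assert(s != NULL && s->io_bitmap != NULL);

    /* Step 1: IOPL == 3 lets User Mode code use every port. */
    if ((s->iopl & 3u) == 3u)
        return true;

    /* Steps 2 and 3: find the port's bit; a cleared bit means "allowed". */
    uint8_t byte = s->io_bitmap[port / 8];
    return (byte & (1u << (port % 8))) == 0;
}
```

Note the inverted sense of the bitmap: a set bit denies access, so a bitmap filled with 0xFF denies every port.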
The following quote is also from the same book:
Although Linux doesn’t use hardware context switches, it is
nonetheless forced to set up a TSS for each distinct CPU in the
system.
Now if Linux has only one TSS structure per CPU for all processes (instead of each process having its own TSS structure), and we know that each process must have its own I/O Permission Bitmap, does that mean that when Linux switches execution to another process, it overwrites the I/O Permission Bitmap in the one TSS structure the CPU uses with the I/O Permission Bitmap of the process about to run (which Linux presumably stores somewhere in kernel memory)?
Yes. From the same section of the book, it says:
The tss_struct structure describes the format of the TSS. As already
mentioned in Chapter 2, the init_tss array stores one TSS for each CPU
on the system. At each process switch, the kernel updates some fields
of the TSS so that the corresponding CPU’s control unit may safely
retrieve the information it needs. Thus, the TSS reflects the
privilege of the current process on the CPU, but there is no need to
maintain TSSs for processes when they’re not running.
In later versions of the kernel, init_tss was renamed to cpu_tss. The TSS structure of each processor is initialized in cpu_init, which is executed once per processor when booting the system.
When switching from one task to another, __switch_to_xtra is called, which calls switch_to_bitmap, which simply copies the IO bitmap of the next task into the TSS structure of the processor on which it is scheduled to run next.
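As a rough user-space model of that copy (not the kernel's actual code), the idea looks like this. The names `struct task_model`, `struct tss_model`, and `load_io_bitmap` are hypothetical, and the sketch ignores optimizations a real kernel would apply, such as skipping the copy for tasks that have no bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IO_BITMAP_BYTES (65536u / 8)   /* one bit per 16-bit port number */

/* Hypothetical per-task state: each task keeps its own bitmap in memory. */
struct task_model {
    uint8_t io_bitmap[IO_BITMAP_BYTES];
};

/* Hypothetical per-CPU TSS: the only bitmap the hardware ever looks at. */
struct tss_model {
    uint8_t io_bitmap[IO_BITMAP_BYTES];
};

/* On a task switch, make the CPU's TSS reflect the incoming task. */
static void load_io_bitmap(struct tss_model *cpu_tss,
                           const struct task_model *next)
{
    assert(cpu_tss != NULL && next != NULL);
    memcpy(cpu_tss->io_bitmap, next->io_bitmap, IO_BITMAP_BYTES);
}
```

After `load_io_bitmap` returns, the per-CPU TSS holds exactly the permissions of the task that is about to run, which is why no per-task TSS is needed.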
Related: How do Intel CPUs that use the ring bus topology decode and handle port I/O operations.
I want to know how privilege separation is enforced by the kernel and the part of kernel that is responsible for this task.
For example, assume there are two processes running -- one at ring 0 and another at ring 3. How does the kernel keep track of the ring number of each process?
Edit: I know about ring numbers. My question is about the part of kernel (module or something) which performs checks on the processes to find out their privilege level. I believe there might be a component of kernel which would check the ring number of a process.
There is no concept of a ring number of a process.
The kernel is mapped in one area of memory, userspace is mapped in another. On boot the kernel specifies an address where the cpu has to jump to when the syscall instruction is executed. So someone does syscall, the cpu switches to ring0 and jumps to the address as instructed by the kernel. It is now executing kernel code. Then, on return, the cpu switches back to ring3 and resumes execution.
Similar story for other ways of entering the kernel like exceptions.
So, how does the Linux kernel enforce separation? It sets things up so that userspace executes in ring3. Anything that triggers the CPU to switch to ring0 also makes it jump to an address configured by the kernel on boot. No code other than kernel code executes in ring0.
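On x86 the privilege level is a property of the CPU at a given moment, not of a process: it is held in the low two bits of the code segment selector (CS). A minimal sketch of reading it from a saved CS value follows. The helper names are hypothetical, and the selector values 0x10 and 0x33 in the example are the kernel and user code selectors I believe x86-64 Linux uses; treat them as assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_RPL_MASK 0x3u   /* low two bits of a selector hold the ring */

/* Ring encoded in a code segment selector (0 = kernel, 3 = user). */
static unsigned privilege_level(uint16_t cs)
{
    return cs & SEGMENT_RPL_MASK;
}

/* True if the CS value saved on kernel entry belongs to ring 3,
 * i.e. the interrupted code was running in user mode. */
static bool entered_from_user_mode(uint16_t saved_cs)
{
    return privilege_level(saved_cs) == 3;
}

/* Usage example with assumed x86-64 Linux selector values. */
void privilege_level_examples(void)
{
    assert(privilege_level(0x10) == 0);    /* assumed kernel CS */
    assert(entered_from_user_mode(0x33));  /* assumed user CS */
}
```

This is why the kernel never needs a per-process ring number: on entry it only has to look at the saved CS to know where it came from.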
This is for NetBSD on MIPS processor, but answer for Linux is also welcome.
I see that an interrupt occurred while receiving a network packet.
The handler for this hardware interrupt took a TLB miss on a store operation and the kernel crashed.
When I examine the core dump, gdb points to an LWP of a process (let's say ProcA).
I assume that this hardware interrupt preempted ProcA and started executing on ProcA's kernel stack.
Though in the stack frame I don't see anything from ProcA, what I don't understand is why gdb still points to ProcA.
I am doing some research trying to find the code in the Linux kernel that implements interrupt handling; in particular, I am trying to find the code responsible for handling the system timer.
According to http://www.linux-tutorial.info/modules.php?name=MContent&pageid=86
The kernel treats interrupts very similarly to the way it treats exceptions: all the general purpose registers are pushed onto the system stack and a common interrupt handler is called. The current interrupt priority is saved and the new priority is loaded. This prevents interrupts at lower priority levels from interrupting the kernel while it handles this interrupt. Then the real interrupt handler is called.
I am looking for the code that pushes all of the general purpose registers on the stack, and the common interrupt handling code.
At least pushing the general purpose registers onto the stack is architecture dependent, so I'm looking for the code that is associated with the x86 architecture. At the moment I'm looking at version 3.0.4 of the kernel source, but any version is probably fine. I've started looking in kernel/irq/handle.c, but I don't see anything that looks like saving the registers; it just looks like it is calling the registered interrupt handler.
The 32-bit versions are in arch/x86/kernel/entry_32.S (arch/i386/kernel/ in kernels before 2.6.24), the 64-bit versions in entry_64.S. Search for the various ENTRY macros that mark kernel entry points.
I am looking for the code that pushes all of the general purpose registers on the stack
Hardware stores the current state (which includes registers) before executing an interrupt handler. Code is not involved. And when the interrupt exits, the hardware reads the state back from where it was stored.
Now, code inside the interrupt handler may read and write the saved copies of registers, causing different values to be restored as the interrupt exits. That's how a context switch works.
On x86, the hardware saves only the registers it must overwrite to enter the interrupt handler: the instruction pointer, the code segment, and the flags, plus the stack pointer and stack segment when the privilege level changes. On some embedded architectures, the hardware saves more of the register set. The reason for the difference is that saving and restoring registers the interrupt handler never modifies would be a waste. So on x86 the interrupt handler is responsible for saving and restoring any other registers it uses.
See the Intel® 64 and IA-32 Architectures Software Developer’s Manual, starting on page 6-15.
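To make the split between hardware and software concrete, here is a sketch of the frame the CPU itself pushes in 64-bit mode when it delivers an interrupt that carries no error code, listed from the lowest stack address upward. The struct name is hypothetical and the layout reflects my reading of the Intel manual; the general purpose registers are not part of it, so the entry code has to push them itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; fields in increasing address order on the stack. */
struct hw_interrupt_frame {
    uint64_t rip;     /* return address, pushed last */
    uint64_t cs;      /* code segment of the interrupted code */
    uint64_t rflags;  /* flags at the time of the interrupt */
    uint64_t rsp;     /* stack pointer of the interrupted code */
    uint64_t ss;      /* stack segment, pushed first */
};

/* Five 8-byte slots and nothing else: no rax, rbx, ... in here. */
static_assert(sizeof(struct hw_interrupt_frame) == 40,
              "hardware frame is five quadwords");
static_assert(offsetof(struct hw_interrupt_frame, rip) == 0,
              "rip sits at the lowest address");
```

Everything beyond these five slots that appears in a saved register dump was stored by instructions in the entry path, which is the code to look for in the entry_*.S files.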
In Linux, what are the options for handling device interrupts in user space code rather than in kernel space?
Experience shows that it is possible to write good and stable user-space drivers for almost any PCI adapter. It just requires some sophistication and a small proxying layer in the kernel. UIO is a step in that direction, but if you want to handle interrupts correctly in user space then UIO might not be enough, for example if the device doesn't support the PCI spec's interrupt disable bit, which UIO relies on.
Note that process wakeup latencies are a few microseconds, so if your implementation requires very low latency then user space might be a drag on it.
If I were to implement a user-space driver, I would reduce the kernel ISR to just a "disable & ack & wake up user space" operation, handle the interrupt inside the woken-up process, and then re-enable the interrupt (of course, by writing to mapped PCI memory from the user-space process).
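The user-space half of that scheme could look roughly like the sketch below, assuming a UIO-style device file: a blocking read() returns a 32-bit interrupt count once the kernel ISR has fired, and writing a 32-bit 1 asks the kernel side to re-enable the interrupt (which only works if the kernel driver implements that control). The function names are hypothetical, and the re-enable path differs from the mapped-memory write described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Block until the kernel side reports an interrupt on `fd`.
 * On success returns 0 and stores the running interrupt count. */
static int uio_wait_for_irq(int fd, uint32_t *irq_count)
{
    assert(irq_count != NULL);
    ssize_t n = read(fd, irq_count, sizeof *irq_count);
    return n == (ssize_t)sizeof *irq_count ? 0 : -1;
}

/* Ask the kernel side to re-enable the (previously masked) interrupt. */
static int uio_enable_irq(int fd)
{
    uint32_t enable = 1;
    ssize_t n = write(fd, &enable, sizeof enable);
    return n == (ssize_t)sizeof enable ? 0 : -1;
}
```

A driver loop would open the device file (for example a path such as /dev/uio0, which is an assumption here), then repeat: wait, service the device through its mapped registers, re-enable.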
There is the Userspace I/O system (UIO), but the handling itself should still be done in kernel space. OTOH, if you just need to notice the interrupt, you don't need the kernel part.
You may like to take a look at CHAPTER 10: Interrupt Handling from Linux Device Drivers, Third Edition book.
Userland code has to be triggered indirectly.
The kernel ISR indicates the interrupt by writing to a file, setting a register, or sending a signal. The user-space application polls for this and then runs the appropriate code.
Edge cases: more or fewer interrupts than expected (a timeout, or too many interrupts per time interval).
The Linux file abstraction is used to connect kernel and user space. This is done with character devices and ioctl() calls. Some may prefer sysfs entries for this purpose.
This can look odd because event-triggered device notifications (interrupts) are hooked up to 'time-triggered' polling, but it is actually asynchronous blocking (read/select). Even so, it raises some questions about performance.
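A minimal sketch of that blocking wait, with a timeout to cover the "fewer interrupts than expected" edge case, might look like this. It assumes the kernel side exposes a file descriptor that becomes readable when an interrupt has occurred and delivers a 32-bit event count on read; the function name and that payload format are assumptions for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Wait up to `timeout_ms` for an event on `fd`.
 * Returns 1 and fills *count if an event arrived,
 * 0 if the wait timed out, and -1 on error. */
static int wait_and_read_event(int fd, int timeout_ms, uint32_t *count)
{
    assert(fd >= 0 && count != NULL);

    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);   /* sleeps; does not spin */
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;                         /* no interrupt in time */
    if (!(pfd.revents & POLLIN))
        return -1;                        /* hang-up or error condition */

    ssize_t n = read(fd, count, sizeof *count);
    return n == (ssize_t)sizeof *count ? 1 : -1;
}
```

The process sleeps inside poll() until the kernel wakes it, so despite the name this is not busy polling.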
So interrupts cannot be directly handled outside the kernel.
For example, shared memory can live in user space, and with suitable I/O permission settings device addresses can be mapped, so UIO works for that, but not for direct interrupt handling.
I have found only one 'minority report', in the vfio documentation (http://lxr.free-electrons.com/source/Documentation/vfio.txt):
https://stackoverflow.com/a/21197797/5349798
Similar questions:
Running user thread in context of an interrupt in linux
Is it possible in linux to register a interrupt handler from any user-space program?
Linux Kernel: invoke call back function in user space from kernel space
Linux Interrupt vs. Polling
Linux user space PCI driver
How do I inform a user space application that the driver has received an interrupt in linux?