Is there both a kernel stack and a user-space stack for each user-space process? If both stacks exist, there should be two stack pointers for each user-space process, right?
In Linux, each task (userspace or kernel thread) has a kernel stack of either 8 KB or 4 KB, depending on kernel configuration. There are indeed separate stack pointers; however, only one is present in the CPU at any given time. If userspace code is running, the kernel stack pointer to be used on exceptions or interrupts is specified by the task-state segment (on x86). If kernel code is running, the user stack pointer is saved in the context structure located on the kernel stack.
Each user-space thread (not just each process) has its own user-space stack and its own kernel-space stack. The kernel also keeps one stack per CPU, and the ISR has its own separate stack too.
The following image shows where the sections of a process are laid out in the process's virtual address space (in Linux):
You can see that there is only one stack section (since this process only has one thread, I assume).
But what if this process has another thread? Where will the stack for this second thread be located? Will it be located immediately below the first stack?
Stack space for a new thread is created by the parent thread with mmap(MAP_ANONYMOUS|MAP_STACK). So they're in the "memory map segment", as your diagram labels it. It can end up anywhere that a large malloc() could go. (glibc malloc(3) uses mmap(MAP_ANONYMOUS) for large allocations.)
(MAP_STACK is currently a no-op, and exists in case some future architecture needs special handling).
You pass a pointer to the new thread's stack space to the clone(2) system call which actually creates the thread. (Try using strace -f on a multi-threaded process sometime). See also this blog post about creating a thread using raw Linux syscalls.
See this answer on a related question for some more details about mmapping stacks. For example, MAP_GROWSDOWN doesn't prevent another mmap() from picking the address right below the thread stack, so you can't depend on it to dynamically grow a small stack the way you can for the main thread's stack (where the kernel reserves the address space even though it's not mapped yet).
So even though mmap(MAP_GROWSDOWN) was designed for allocating stacks, it's so bad that Ulrich Drepper proposed removing it in 2.6.29.
Also, note that your memory-map diagram is for a 32-bit kernel. A 64-bit kernel doesn't have to reserve any user virtual-address space for mapping kernel memory, so a 32-bit process running on an amd64 kernel can use the full 4GB of virtual address space. (The exceptions are the low 64k by default (sysctl vm.mmap_min_addr = 65536), so that a NULL-pointer dereference actually faults, and the top page, whose addresses are reserved for use as error codes rather than valid pointers.)
Related:
See Relation between stack limit and threads for more about stack-size for pthreads. getrlimit(RLIMIT_STACK) is the main thread's stack size. Linux pthreads uses RLIMIT_STACK as the stack size for new threads, too.
I know that there are two types of stack in Linux: a user stack for each user thread and a kernel stack for kernel threads (but 1 process). Interrupts, or more precisely interrupt handlers, are the bridge between these two modes, kernel (0) and user (3). The interrupt vector table lets the processor load the right instruction address into the PC register, but how is the stack pointer register changed when it switches to kernel mode? Does the handler routine indicate where the kernel stack is just before its first instruction? Or does the processor use two stack pointer registers (I really doubt it)?
How does the "return from interrupt" know where to return? Is the PCB saved on the kernel stack or elsewhere?
Please don't hesitate to correct anything I've said that is not true.
Thanks a lot for your help.
The kernel-mode stack in the Linux kernel is stored in task_struct->stack. Where it comes from and how it is allocated is entirely up to the platform. Some platforms might not store it this way, but then you can use task_stack_page() to find the stack.
In addition, on entry to the interrupt handler the PC is saved on the kernel stack. On return from the interrupt, this PC is loaded back from the kernel stack.
In a multitasking system, when hardware raises an interrupt on a particular CPU, that CPU can be in either of the following states (unless it is already servicing an ISR):
User mode process is executing on CPU
Kernel mode process is executing on CPU
I would like to know which stack the interrupt handler uses in these two situations, and why.
All interrupts are handled by the kernel, by the interrupt handler written for that particular interrupt. Interrupt handlers have an IRQ stack; how the interrupt handler's stacks are set up is a configuration option. The kernel stack might not always be large enough for both the kernel's work and the space required by IRQ processing routines. Hence two additional stacks come into the picture:
Hardware IRQ Stack.
Software IRQ Stack.
In contrast to the regular kernel stack, which is allocated per process, the two additional stacks are allocated per CPU. Whenever a hardware interrupt occurs (or a softIRQ is processed), the kernel needs to switch to the appropriate stack.
Historically, interrupt handlers did not receive their own stacks. Instead, they shared the stack of the running process that they interrupted. The kernel stack is two pages in size; typically, that is 8KB on 32-bit architectures and 16KB on 64-bit architectures. Because interrupt handlers share the stack in this setup, they must be exceptionally frugal with the data they allocate there. Of course, the kernel stack is limited to begin with, so all kernel code should be cautious.
Interrupts are only handled by the kernel. So it is some kernel stack which is used (in both cases).
Interrupts do not directly affect user processes.
Processes may get signals, but these are not interrupts. See signal(7)...
Historically, interrupt handlers did not receive their own stacks.
Instead, they would share the stack of the process that they interrupted.
Note that a process is always running. When nothing else is schedulable, the idle task runs.
The kernel stack is two pages in size:
8KB on 32-bit architectures.
16KB on 64-bit architectures.
Because of sharing the stack, interrupt handlers must be exceptionally frugal with what data they allocate there.
Early in the 2.6 kernel process, an option was added to reduce the stack size from two pages to one, providing only a 4KB stack on 32-bit systems, because previously every process on the system needed two pages of contiguous, nonswappable kernel memory. To go with the smaller stack, interrupt handlers were given their own stack, one stack per processor, one page in size. This stack is referred to as the interrupt stack.
Although the total size of the interrupt stack is half that of the original shared stack, the average stack space available is greater because interrupt handlers get the full page of memory to themselves.
Your interrupt handler should not care what stack setup is in use or what the size of the kernel stack is. Always use an absolute minimum amount of stack space.
https://notes.shichao.io/lkd/ch7/#stacks-of-an-interrupt-handler
As I understand it, before 2.6.32 the Linux kernel used the interrupted thread's kernel stack as the ISR stack, and after 2.6.32 it uses a separate stack. Please correct me if I am wrong.
Could you tell me when the ISR stack is set up/created, and when it is destroyed, if it ever is? Or could you point me to the source file name and line number? Thanks in advance.
Updated at Oct 17 2014:
There are several kinds of stack in Linux. Below are the three major ones (not all) that I know of.
User-space process stack: each user-space task has its own stack, created by mmap() when the task is created.
Kernel stack for a user-space task: one for each user-space task, created within do_fork()->copy_process()->dup_task_struct()->alloc_thread_info() and used for system calls.
Stack for hardware interrupts (top half): one for each CPU (after 2.6), defined in arch/x86/kernel/irq_32.c: DEFINE_PER_CPU(struct irq_stack *, hardirq_stack); do_IRQ() -> handle_irq() -> execute_on_irq_stack() switches to the interrupt stack.
Please let me know if these are correct or not.
Interrupt handlers have an IRQ stack. Two kinds of stack come into the picture for interrupt handlers:
Hardware IRQ Stack.
Software IRQ Stack.
In contrast to the regular kernel stack, which is allocated per process, the two additional stacks are allocated per CPU. Whenever a hardware interrupt occurs (or a softIRQ is processed), the kernel needs to switch to the appropriate stack. Historically, interrupt handlers did not receive their own stacks. Instead, they shared the stack of the running process that they interrupted. The kernel stack is two pages in size; typically, that is 8KB on 32-bit architectures and 16KB on 64-bit architectures. Because interrupt handlers share the stack in this setup, they must be exceptionally frugal with the data they allocate there. Of course, the kernel stack is limited to begin with, so all kernel code should be cautious.
Pointers to the additional stacks are provided in the following arrays:
arch/x86/kernel/irq_32.c
static union irq_ctx *hardirq_ctx[NR_CPUS] __read_mostly;
static union irq_ctx *softirq_ctx[NR_CPUS] __read_mostly;
This is my assignment question. I now understand the difference between user mode and kernel mode (which I think is the same as system mode).
But my question is: how does a process work in Linux? Does the system have both a user-mode stack and a system-mode stack for each process it runs?
I believe that this question has already been answered here:
What is the difference between kernel stack and user stack?
kernel stack and user space stack
That is, a userspace process has only one stack, a pointer to which is defined in the second element of the task_struct in include/linux/sched.h (about line 1045 in 3.12).
There is a possibility of some confusion with the per-thread kernel stack, as noted in the above posts. In a sense, a process can have one or more stacks, userspace and kernel space, depending on the number of threads it has at any point in time. The connection between the per-thread kernel stack, the thread and the process task_struct is described in this lecture by Junfeng Yang.