I'm looking for a good description of stacks within the Linux kernel, but I'm finding it surprisingly difficult to find anything useful.
I know that stacks are limited to 4k for most systems, and 8k for others. I'm assuming that each kernel thread / bottom half has its own stack. I've also heard that if an interrupt goes off, it uses the current thread's stack, but I can't find any documentation on any of this. What I'm looking for is how the stacks are allocated and whether there are any good debugging routines for them (I suspect a stack overflow is behind a particular problem, and I'd like to know whether it's possible to compile the kernel to police stack sizes, etc.).
The reason that documentation is scarce is that it's an area that's quite architecture-dependent. The code is really the best documentation - for example, the THREAD_SIZE macro defines the (architecture-dependent) per-thread kernel stack size.
The stacks are allocated in alloc_thread_stack_node(). The stack pointer in the struct task_struct is updated in dup_task_struct(), which is called as part of cloning a thread.
The kernel does check for kernel stack overflows by placing a canary value, STACK_END_MAGIC, at the end of the stack. In the page fault handler, if a fault in kernel space occurs, this canary is checked. See for example the x86 fault handler, which prints the message "Thread overran stack, or stack corrupted" after the Oops message if the stack canary has been clobbered.
Of course this won't trigger on all stack overruns, only the ones that clobber the stack canary. However, you should always be able to tell from the Oops output if you've suffered a stack overrun - that's the case if the stack pointer is below task->stack.
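To make the canary idea concrete, here is a small user-space toy model in Python. It is only a sketch: the real check lives in the kernel's fault path and is written in C. The 8192-byte THREAD_SIZE, the placement of the canary at offset 0, and the helper names new_stack, push and canary_intact are assumptions made up for this illustration; only the STACK_END_MAGIC value is taken from the kernel headers.

```python
import struct

STACK_END_MAGIC = 0x57AC6E9D  # value from include/uapi/linux/magic.h
THREAD_SIZE = 8192            # assumed stack size for this toy model


def new_stack():
    """Return a zeroed toy stack with the canary at its lowest address."""
    stack = bytearray(THREAD_SIZE)
    struct.pack_into("<I", stack, 0, STACK_END_MAGIC)
    return stack


def canary_intact(stack):
    """Mimic the kernel's check: is the magic word still at the stack end?"""
    (value,) = struct.unpack_from("<I", stack, 0)
    return value == STACK_END_MAGIC


def push(stack, sp, data):
    """Toy push: the stack grows down, so write below sp and return the new sp."""
    new_sp = sp - len(data)
    if new_sp < 0:
        raise IndexError("ran off the end of the toy stack")
    stack[new_sp:sp] = data
    return new_sp


stack = new_stack()
sp = push(stack, THREAD_SIZE, b"\xAA" * 100)  # ordinary use, far from the end
ok_before = canary_intact(stack)
sp = push(stack, sp, b"\xAA" * (sp - 2))      # overrun that reaches the canary
ok_after = canary_intact(stack)
print(ok_before, ok_after)
```

The second push overwrites part of the magic word, so the check fails afterwards. A real overrun that jumps past the canary without writing to it would go unnoticed by this kind of check.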
You can determine the process stack size with the ulimit command. I get 8192 KiB on my system:
$ ulimit -s
8192
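The same limit can be read from inside a program. A minimal sketch using Python's resource module, which wraps getrlimit(2); the describe helper is just a name made up for this example, and the values you see will depend on your system:

```python
import resource

# Soft and hard limits are reported in bytes.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)


def describe(limit):
    """Format a limit in KiB, the unit `ulimit -s` prints."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} KiB"


print("soft:", describe(soft))
print("hard:", describe(hard))
```

The soft limit is the one `ulimit -s` shows by default; the hard limit is the ceiling an unprivileged process can raise the soft limit to.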
For processes, you can control the stack size via the ulimit command (-s option). For threads, the default stack size varies a lot, but you can control it via a call to pthread_attr_setstacksize() (assuming you are using pthreads).
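As a rough illustration of choosing a per-thread stack size, here is a Python sketch. It uses threading.stack_size(), which, as far as I know, CPython on Linux applies through the pthreads attribute mechanism; in C you would call pthread_attr_setstacksize() directly. The helpers run_with_stack and depth, and the 1 MiB figure, are assumptions for this example.

```python
import threading


def run_with_stack(size_bytes, func):
    """Run func in a new thread created with the requested stack size."""
    # The setting applies to threads created after this call.
    previous = threading.stack_size(size_bytes)
    try:
        result = []
        t = threading.Thread(target=lambda: result.append(func()))
        t.start()
        t.join()
        return result[0]
    finally:
        threading.stack_size(previous)  # restore the earlier setting


def depth(n):
    """Recurse n levels so the thread actually uses some of its stack."""
    return 0 if n == 0 else 1 + depth(n - 1)


print(run_with_stack(1024 * 1024, lambda: depth(200)))
```

Note that the size has to be set before the thread is created; an existing thread's stack does not get resized.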
As for the interrupt using the userland stack, I somewhat doubt it, as accessing userland memory is a kind of a hassle from the kernel, especially from an interrupt routine. But I don't know for sure.
The following image shows where the sections of a process are laid out in the process's virtual address space (in Linux):
You can see that there is only one stack section (I assume because this process only has one thread).
But what if this process has another thread? Where will the stack for this second thread be located? Will it be located immediately below the first stack?
Stack space for a new thread is created by the parent thread with mmap(MAP_ANONYMOUS|MAP_STACK). So they're in the "memory map segment", as your diagram labels it. It can end up anywhere that a large malloc() could go. (glibc malloc(3) uses mmap(MAP_ANONYMOUS) for large allocations.)
(MAP_STACK is currently a no-op, and exists in case some future architecture needs special handling).
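For a feel of what that allocation looks like, here is a sketch that makes the same kind of anonymous mapping from Python. It only allocates and touches the region; Python cannot hand it to clone(2) as a thread stack, so treat it purely as an illustration of the mmap() call. The 64 KiB STACK_SIZE is an arbitrary value chosen for this example.

```python
import mmap

STACK_SIZE = 64 * 1024  # assumed size for illustration only

# MAP_STACK is only exposed by newer Python versions; fall back to 0 (no flag).
flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | getattr(mmap, "MAP_STACK", 0)

# fileno -1 means "not backed by a file", i.e. an anonymous mapping.
region = mmap.mmap(-1, STACK_SIZE, flags=flags,
                   prot=mmap.PROT_READ | mmap.PROT_WRITE)

# Stacks grow down on x86, so a new thread would start at the high end.
region[STACK_SIZE - 1] = 0x2A  # touching a page is what makes it resident
top_byte = region[STACK_SIZE - 1]
length = len(region)
region.close()
print(length, top_byte)
```

The kernel picks the address, which is why such a stack can land anywhere in the memory map segment rather than directly below the main stack.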
You pass a pointer to the new thread's stack space to the clone(2) system call which actually creates the thread. (Try using strace -f on a multi-threaded process sometime). See also this blog post about creating a thread using raw Linux syscalls.
See this answer on a related question for some more details about mmapping stacks. For example, MAP_GROWSDOWN doesn't prevent another mmap() from picking the address right below the thread stack, so you can't depend on it to dynamically grow a small stack the way you can for the main thread's stack (where the kernel reserves the address space even though it's not mapped yet).
So even though mmap(MAP_GROWSDOWN) was designed for allocating stacks, it's so bad that Ulrich Drepper proposed removing it in 2.6.29.
Also, note that your memory-map diagram is for a 32-bit kernel. A 64-bit kernel doesn't have to reserve any user virtual-address space for mapping kernel memory, so a 32-bit process running on an amd64 kernel can use the full 4GB of virtual address space. (The exceptions are the low 64k, which is unmappable by default (sysctl vm.mmap_min_addr = 65536) so that a NULL-pointer dereference actually faults, and the top page, which is reserved so that error-code values are never valid pointers.)
Related:
See Relation between stack limit and threads for more about stack-size for pthreads. getrlimit(RLIMIT_STACK) is the main thread's stack size. Linux pthreads uses RLIMIT_STACK as the stack size for new threads, too.
As far as I know, before 2.6.32 the Linux kernel used the thread's kernel stack as the ISR stack, and since 2.6.32 it uses a separate stack. Please correct me if I'm wrong.
Could you tell me when the ISR stack is set up/created, and when it is destroyed, if it ever is? Or could you point me to the source file name and line number? Thanks in advance.
Updated at Oct 17 2014:
There are several kinds of stack in Linux. Below are the 3 major ones (not all) that I know of.
User space process stack: each user space task has its own stack, which is created by mmap() when the task is created.
Kernel stack for a user space task: one for each user space task, created within do_fork()->copy_process()->dup_task_struct()->alloc_thread_info() and used for system calls.
Stack for hardware interrupts (top half): one for each CPU (after 2.6), defined in arch/x86/kernel/irq_32.c: DEFINE_PER_CPU(struct irq_stack *, hardirq_stack); do_IRQ() -> handle_irq() -> execute_on_irq_stack() switches to the interrupt stack.
Please let me know if these are correct or not.
For interrupt handlers there is an IRQ stack. Two kinds of stack come into the picture for interrupt handlers:
Hardware IRQ Stack.
Software IRQ Stack.
In contrast to the regular kernel stack, which is allocated per process, the two additional stacks are allocated per CPU. Whenever a hardware interrupt occurs (or a softIRQ is processed), the kernel needs to switch to the appropriate stack. Historically, interrupt handlers did not receive their own stacks. Instead, they shared the kernel stack of the process they interrupted. That kernel stack was two pages in size; typically, that is 8KB on 32-bit architectures (4KB pages) and 16KB on 64-bit architectures with 8KB pages. (Current x86-64 kernels also use 16KB, which is four 4KB pages.) Because interrupt handlers shared the stack in that setup, they had to be exceptionally frugal with the data they allocated there. Of course, the kernel stack is limited to begin with, so all kernel code should be cautious.
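The sizes follow from simple arithmetic. As I read the x86 headers, THREAD_SIZE is defined as PAGE_SIZE << THREAD_SIZE_ORDER, with an order of 1 on 32-bit x86 and 2 on x86-64 (ignoring the extra KASAN order). A small Python sketch of that calculation, where thread_size is just a helper name for this example:

```python
def thread_size(page_size, order):
    """Compute PAGE_SIZE << THREAD_SIZE_ORDER, the x86 definition of THREAD_SIZE."""
    return page_size << order


PAGE_SIZE = 4096  # x86 base page size

print(thread_size(PAGE_SIZE, 1))  # order 1: two pages
print(thread_size(PAGE_SIZE, 2))  # order 2: four pages
```

With 4 KiB pages this gives 8192 and 16384 bytes respectively. Other architectures and kernel versions use different orders, so check the headers for the kernel you are actually running.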
Pointers to the additional stacks are provided in the following arrays:
arch/x86/kernel/irq_32.c
static union irq_ctx *hardirq_ctx[NR_CPUS] __read_mostly;
static union irq_ctx *softirq_ctx[NR_CPUS] __read_mostly;
I'm working with pthreads in the Android NDK, in code that processes video data over the network.
I ran into the error 'stack corruption detected : aborted'. So I set -fstack-check in Application.mk, and now I get 'FATAL SIGNAL 11 ...' instead.
My conclusion is that this problem is about stack size.
When I use Windows threads, the stack size is set to 1kb by default and increases automatically,
but I don't know about pthreads.
Do pthreads increase their stack size automatically?
P.S. I attached this thread to the JavaVM.
On stock Android, pthread stacks are allocated at 1MB by default. (Because of the way the system works, only the parts of the stack that are actually touched get physical pages, so for many threads there's actually only a few KB in use.)
The "stack corruption" message indicates that something has trashed the stack, not that you've run off the end. One way to encounter this is to write off the end of a stack-allocated array. It might be useful to decode the stack trace you get in the log file and see what method it's in when it fails.
I recently ran into a bug with the "linux stack" and the "linux stack size". I came across a blog directing me to try
ulimit -a
to see what the limit for my box was, and it was set to 8192kb which seems to be the default.
What is the "linux stack"? How does it work, what does it store, what does it do?
The short answer is:
When programs on your Linux box run, they add and remove data from the stack on a regular basis as they execute. The stack size refers to how much space is allocated in memory for the stack. If you increase the stack size, the program can nest function calls more deeply and keep more local data. Each time a function is called, data can be added to the stack (stacked on top of the previous routine's data).
Unless the program is very complex or designed for a special purpose, a stack size of 8192kb is normally fine. Some programs, such as graphics processing programs, require you to increase the stack size before they will work, as they may store a lot of data on the stack.
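Feel free to increase the stack size for those applications; it's not a problem. To do so, use the following (in bash the value is given in units of 1024 bytes, the same unit as the 8192 reported above):
ulimit -s kbytes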
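A program can also adjust its own soft limit instead of relying on the shell. Below is a sketch using Python's resource module, which wraps setrlimit(2). The function name set_stack_limit_kib is made up for this example, and an unprivileged process can only raise the soft limit as far as the hard limit, which is why the value is capped.

```python
import resource


def set_stack_limit_kib(kib):
    """Set the soft RLIMIT_STACK to `kib` KiB, capped at the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    wanted = kib * 1024  # setrlimit takes bytes
    if hard != resource.RLIM_INFINITY and wanted > hard:
        wanted = hard
    resource.setrlimit(resource.RLIMIT_STACK, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_STACK)[0]
```

Calling set_stack_limit_kib(16384) would ask for a 16 MiB soft limit. Child processes started afterwards inherit the new limit, much as commands run after `ulimit -s` in a shell do.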
BTW, What is a StackOverflowError?
Is the kernel stack shared by all processes, or is there a separate kernel stack for each process? If it is separate for each process, where is the stack pointer stored? In task_struct?
There is just one common kernel memory. In it each process has its own task_struct + kernel stack (by default 8K).
In a context switch the old stack pointer is saved somewhere and the actual stack pointer is made to point to the top of the stack (or bottom depending on the hardware architecture) of the new process which is going to run.
This old article says that each process has its own kernel stack. See the comments for why that seems to be a very good design.
I tried reading the current source to make sure, but since the kernel stack is "implicit", it's not visible in the task_struct. This is mentioned in the article.
This answer was edited to incorporate wisdom from comments. Thanks.
Robert Love's book "Linux Kernel Development" has a good explanation of the process kernel stack.
And yes, each process has its own kernel stack and, if I'm not wrong, its pointer is stored in the thread_info structure, though I'm not really sure about that. The thread_info structure itself is stored at the beginning or the end of the process kernel stack, depending on the CPU architecture.
Cheers.
Carlos Maiolino
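On kernels of that era, keeping a structure at the end of a size-aligned stack meant it could be found from the stack pointer alone, by masking off the low bits. A sketch of that arithmetic in Python, assuming an 8 KiB stack as on 32-bit x86; the stack pointer value and the name stack_base are invented for this example:

```python
THREAD_SIZE = 8192  # assumed: two 4 KiB pages, as on 32-bit x86


def stack_base(sp):
    """Round a stack pointer down to the start of its THREAD_SIZE-aligned stack."""
    return sp & ~(THREAD_SIZE - 1)


# A hypothetical stack pointer somewhere inside a kernel stack.
sp = 0xC12345F0
print(hex(stack_base(sp)))
```

This only works because each stack is allocated aligned to its own size, so every address inside one stack shares the same upper bits.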
I think each process has its own kernel-mode stack. A driver executes in kernel mode, and a process will sometimes block while executing a driver routine. The operating system can then schedule another process to run, and that process can call the driver routine again. If the kernel stack were shared, two processes would be using it at once and things would get mixed up. I was puzzled by this question for a long time. At first I thought the kernel stack was shared, because some books say that. After reading Linux Kernel Development and looking at some driver code, I came to think the kernel stack is not shared.