Stack memory management in Linux

I have a few questions related to limitations on the stack size for Linux. I'm most interested in x86_64 systems, but if there are platform differences I'd like to understand them as well. My questions are:
1) How does Linux dynamically increase the size of the stack?
I've written a test program with a recursive function (to use stack space) where I can specify the number of iterations as a command-line parameter. The program pauses waiting for user input after finishing the recursion, which allows me to get information about the running process. If I run with a small number of iterations and then use pmap to view the stack size, it is 132K:
00007fff0aa3c000 132K rw--- [ stack ]
Whereas if I run with a larger number of iterations, the size can grow much larger, I believe up to 8192 KB by default. For example, here's the output from running with more iterations:
00007fff3ed75000 8040K rw--- [ stack ]
But if I use strace to trace the system calls when running the application I don't see any associated with growing the stack. So I'm wondering what the kernel is doing to manage the stack space for a process.
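For concreteness, a minimal sketch of the kind of test program described above (the original isn't shown, so all names here are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static volatile char *sink;        /* keeps each frame live */

static void recurse(unsigned long depth)
{
    char frame[1024];              /* consume ~1 KB of stack per call */
    frame[0] = (char)depth;        /* touch the page so it's really used */
    if (depth > 0)
        recurse(depth - 1);
    sink = frame;                  /* defeats tail-call optimization */
}

int main(int argc, char **argv)
{
    unsigned long depth = (argc > 1) ? strtoul(argv[1], NULL, 10) : 1;
    recurse(depth);
    printf("PID %d: done, press Enter to exit\n", (int)getpid());
    getchar();                     /* pause so pmap/strace can inspect us */
    return 0;
}

While it waits at getchar(), running pmap against its PID shows the [stack] mapping sized according to the recursion depth.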
2) Does Linux protect the stack region in any way when setting ulimit -s unlimited?
If I use the command ulimit -s unlimited, then I'm able to run many more iterations of my recursive function and the stack grows much larger. For example, here's output from pmap:
00007ffda43a3000 8031260K rw--- [ stack ]
Since I don't want to cause my machine to crash/hang/lock up, I haven't tested with infinite recursion yet. But I'm wondering: if I did, is there anything that would cause the kernel to detect the stack overflow? Or is ulimit the only protection, so that turning it off allows the stack to grow unbounded?
3) How are stack guard pages handled?
This isn't directly related to anything I've experimented with, but I'm also wondering how Linux manages stack guard pages in conjunction with allowing the stack to grow dynamically.

For each running process, Linux keeps a list of the regions of virtual memory addresses the process may use. If an address reference generates a page fault, Linux checks that list to see whether the virtual address is legal (within the range of one of the regions). If the address is not claimed by any region, the application gets a SIGSEGV; otherwise the kernel allocates another page of physical memory, maps it in, and updates the translation caches. If the faulting address just misses a region, and that region is a stack (which grows up or down, depending on the machine architecture), then Linux allocates another VM page, maps it into the region, and thus grows the stack. That is why strace shows no system calls: the growth happens entirely inside the kernel's page-fault handler.
The kernel does not otherwise protect the stack. If a stack access causes a page fault because no physical page is yet attached to the stack region, the process's rlimit is checked to see whether adding another page is permitted.
Stack guard pages are used by some malloc(3) debugging libraries. What these do is expand each memory request by two VM pages: one page before the new allocation and one page after it. The extra pages are marked as no-access-at-all, so if the application walks off the end of a region, or steps before its beginning, it gets an access violation.
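As an illustration of that technique (a sketch, not the implementation of any particular debug allocator), the no-access pages can be set up with mmap(2) and mprotect(2):

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t want = (size_t)page;        /* the "real" allocation (one page here) */

    /* Reserve the allocation plus one guard page on each side. */
    char *base = mmap(NULL, want + 2 * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* Revoke all access to the first and last page: any walk off
     * either end of the buffer now faults with SIGSEGV. */
    mprotect(base, page, PROT_NONE);
    mprotect(base + page + want, page, PROT_NONE);

    char *buf = base + page;           /* this is what the caller would get */
    buf[0] = 'x';                      /* fine */
    /* buf[-1] = 'x'; or buf[want] = 'x'; would crash immediately */
    printf("guarded buffer at %p\n", (void *)buf);
    return 0;
}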
The above has been shamelessly over-simplified but still should give the gist.

Related

Where are the stacks for the other threads located in a process virtual address space?

The following image shows where the sections of a process are laid out in the process's virtual address space (in Linux):
[diagram not reproduced: the usual layout, with kernel space at the top, then the stack (growing down), the memory map segment, the heap (growing up), BSS, data, and text]
You can see that there is only one stack section (since this process only has one thread, I assume).
But what if this process has another thread: where will the stack for this second thread be located? Will it be located immediately below the first stack?
Stack space for a new thread is created by the parent thread with mmap(MAP_ANONYMOUS|MAP_STACK). So they're in the "memory map segment", as your diagram labels it. It can end up anywhere that a large malloc() could go. (glibc malloc(3) uses mmap(MAP_ANONYMOUS) for large allocations.)
(MAP_STACK is currently a no-op, and exists in case some future architecture needs special handling).
You pass a pointer to the new thread's stack space to the clone(2) system call which actually creates the thread. (Try using strace -f on a multi-threaded process sometime). See also this blog post about creating a thread using raw Linux syscalls.
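A bare-bones sketch of that sequence on Linux/glibc (minimal flags and error handling; a real pthread implementation does far more):

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)   /* 1 MiB; pthreads would size this from RLIMIT_STACK */

static int child_fn(void *arg)
{
    (void)arg;
    char msg[] = "child: running on the mmap'ed stack\n";
    write(1, msg, sizeof msg - 1);
    return 0;
}

int main(void)
{
    /* Allocate the new thread's stack the same way glibc's pthreads does. */
    void *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED) { perror("mmap"); exit(1); }

    /* clone() wants the *top* of the stack on x86-64 (stacks grow down). */
    pid_t pid = clone(child_fn, (char *)stack + STACK_SIZE,
                      CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD, NULL);
    if (pid == -1) { perror("clone"); exit(1); }

    waitpid(pid, NULL, 0);          /* the SIGCHLD flag makes the child waitable */
    munmap(stack, STACK_SIZE);
    return 0;
}

Run it under strace -f and you'll see exactly the mmap-then-clone pair described above.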
See this answer on a related question for some more details about mmaping stacks. e.g. MAP_GROWSDOWN doesn't prevent another mmap() from picking the address right below the thread stack, so you can't depend on it to dynamically grow a small stack the way you can for the main thread's stack (where the kernel reserves the address space even though it's not mapped yet).
So even though mmap(MAP_GROWSDOWN) was designed for allocating stacks, it's so bad that Ulrich Drepper proposed removing it in 2.6.29.
Also, note that your memory-map diagram is for a 32-bit kernel. A 64-bit kernel doesn't have to reserve any user virtual-address space for mapping kernel memory, so a 32-bit process running on an amd64 kernel can use the full 4 GB of virtual address space. (Except for the low 64k by default (sysctl vm.mmap_min_addr = 65536), so NULL-pointer dereference does actually fault. And the top page is also reserved for error codes, not valid pointers.)
Related:
See Relation between stack limit and threads for more about stack size for pthreads. getrlimit(RLIMIT_STACK) gives the main thread's stack size; Linux pthreads uses RLIMIT_STACK as the default stack size for new threads, too.
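To see both numbers on a given system, a quick sketch (glibc assumed; on other libcs the pthread default may be unrelated to RLIMIT_STACK; compile with -pthread):

#include <pthread.h>
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    getrlimit(RLIMIT_STACK, &rl);       /* the main thread's stack limit */
    printf("RLIMIT_STACK soft limit: %ld bytes\n",
           (long)rl.rlim_cur);          /* prints -1 if unlimited */

    /* The stack size glibc would give a new pthread by default. */
    pthread_attr_t attr;
    size_t stacksize;
    pthread_attr_init(&attr);
    pthread_attr_getstacksize(&attr, &stacksize);
    printf("default pthread stack:   %zu bytes\n", stacksize);
    pthread_attr_destroy(&attr);
    return 0;
}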

How much stack space is typically reserved for a thread? (POSIX / OSX)

The answer probably differs depending on the OS, but I'm curious how much stack space a thread normally preallocates. For example, if I use:
push rax
that will put a value on the stack and adjust rsp (decrementing it, since the stack grows downward). But what if I never use a push op? I imagine some space still gets allocated, but how much? Also, is this a fixed amount or does it grow dynamically with the amount of stuff pushed?
POSIX does not define any standards regarding stack size; it is entirely implementation-dependent. Since you tagged this OSX, the default allocations there are:
Main thread: 8 MB
Secondary thread: 512 KB
Naturally, these can be configured to suit your needs. The allocation is dynamic:
The minimum allowed stack size for secondary threads is 16 KB and the stack size must be a multiple of 4 KB. The space for this memory is set aside in your process space at thread creation time, but the actual pages associated with that memory are not created until they are needed.
There is too much detail to include here; I suggest you read:
Thread Management (Mac Developer Library)
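For example, a secondary thread with a non-default stack size can be requested like this (portable pthreads sketch; omit the attribute and you get the 512 KB default mentioned above; compile with -pthread):

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    (void)arg;
    printf("worker running on a custom-sized stack\n");
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    /* Must be >= PTHREAD_STACK_MIN and, on OS X, a multiple of 4 KB. */
    pthread_attr_setstacksize(&attr, 1024 * 1024);   /* 1 MB */
    pthread_create(&tid, &attr, worker, NULL);
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}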

What is the Linux Stack?

I recently ran into a bug with the "linux stack" and the "linux stack size". I came across a blog directing me to try
ulimit -a
to see what the limit for my box was, and it was set to 8192 KB, which seems to be the default.
What is the "linux stack"? How does it work, what does it store, what does it do?
The short answer is:
When programs on your Linux box run, they add and remove data from the stack on a regular basis as they execute. The stack size refers to how much space is allocated in memory for the stack. If you increase the stack size, the program can make a deeper chain of routine calls: each time a function is called, its data is added to the stack (stacked on top of the previous routine's data).
Unless the program is very complex or designed for a special purpose, a stack size of 8192 KB is normally fine. Some programs, such as graphics-processing programs, may store a lot of data on the stack and require you to increase its size to function.
Feel free to increase the stack size for those applications; it's not a problem. To do so, use
ulimit -s <size-in-KB>
(note that the -s argument is in kilobytes, not bytes).
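The same limit can also be raised from inside a program with setrlimit(2); a sketch (the 64 MB figure is arbitrary):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    getrlimit(RLIMIT_STACK, &rl);
    printf("old soft limit: %ld bytes\n", (long)rl.rlim_cur);

    /* Raise the soft limit to 64 MB (it must stay <= the hard limit).
     * The kernel consults this limit when deciding whether the
     * stack may grow further on a page fault. */
    rl.rlim_cur = 64L * 1024 * 1024;
    if (setrlimit(RLIMIT_STACK, &rl) != 0)
        perror("setrlimit");
    return 0;
}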
Related: What is a StackOverflowError?

How does the amount of memory for a process get determined?

From my understanding, when a process is under execution it has some amount of memory at its disposal. As the stack increases in size, it builds from one end of the process's address space (disregarding global variables that come before the stack), while the heap builds from the other end. If you keep adding to the stack or heap, eventually all the memory will be used up for this process.
How does the amount of memory the process is given get determined? I can only imagine it depends on a bunch of different variables, but an as-general-as-possible response would be great. If things have to get specific, I'm interested in linux processes written in C++.
On most platforms you will encounter, Linux runs with virtual memory enabled. This means that each process has its own virtual address space, the size of which is determined only by the hardware and the way the kernel has configured it.
For example, on the x86 architecture with a "3/1" split configuration, every userspace process has 3GB of address space available to it, within which the heap and stack are allocated. This is regardless of how much physical memory is available in the system. On the x86-64 architecture, 128TB of address space is typically available to each userspace process.
Physical memory is separately allocated to back that virtual memory. The amount of this available to a process depends upon the configuration of the system, but in general it's simply supplied "on demand", limited mostly by how much physical memory and swap space exists, and how much is currently in use for other purposes.
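A quick way to convince yourself of the on-demand behaviour: map far more anonymous memory than the machine has RAM. Subject to the kernel's overcommit policy, the mapping succeeds because physical pages are only committed when touched (a sketch; 64-bit system assumed):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t size = 64UL * 1024 * 1024 * 1024;   /* 64 GB of address space */
    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    p[0] = 1;   /* only now does the kernel allocate a physical page */
    printf("mapped %zu bytes at %p without that much RAM\n", size, (void *)p);
    return 0;
}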
The stack does not magically grow. Its size is static, and the size is determined at link time. So when you take up enough space on the stack, it overflows (stack overflow ;)
On the other hand, the heap area "magically" grows, meaning that whenever more memory is needed for the heap, the program asks the operating system for more.
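You can watch that request happen by checking the program break with sbrk(0) before and after some allocations (a sketch; glibc only moves the break when the arena actually needs more space, and it uses mmap instead for allocations above its ~128 KB threshold):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *before = sbrk(0);                /* current end of the heap */

    void *blocks[16];
    for (int i = 0; i < 16; i++)
        blocks[i] = malloc(64 * 1024);     /* below glibc's mmap threshold */

    void *after = sbrk(0);
    printf("break moved %td bytes (from %p to %p)\n",
           (char *)after - (char *)before, before, after);

    for (int i = 0; i < 16; i++)
        free(blocks[i]);
    return 0;
}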
EDIT: As Mat pointed out below, the stack actually can increase during runtime on modern operating systems.

Linux Stack Sizes

I'm looking for a good description of stacks within the linux kernel, but I'm finding it surprisingly difficult to find anything useful.
I know that stacks are limited to 4K for most systems, and 8K for others. I'm assuming that each kernel thread / bottom half has its own stack. I've also heard that if an interrupt goes off, it uses the current thread's stack, but I can't find any documentation on any of this. What I'm looking for is how the stacks are allocated, and whether there are any good debugging routines for them (I suspect a stack overflow in a particular problem, and I'd like to know if it's possible to compile the kernel to police stack sizes, etc.).
The reason that documentation is scarce is that it's an area that's quite architecture-dependent. The code is really the best documentation - for example, the THREAD_SIZE macro defines the (architecture-dependent) per-thread kernel stack size.
The stacks are allocated in alloc_thread_stack_node(). The stack pointer in the struct task_struct is updated in dup_task_struct(), which is called as part of cloning a thread.
The kernel does check for kernel stack overflows, by placing a canary value STACK_END_MAGIC at the end of the stack. In the page fault handler, if a fault in kernel space occurs this canary is checked - see for example the x86 fault handler which prints the message Thread overran stack, or stack corrupted after the Oops message if the stack canary has been clobbered.
Of course this won't trigger on all stack overruns, only the ones that clobber the stack canary. However, you should always be able to tell from the Oops output if you've suffered a stack overrun - that's the case if the stack pointer is below task->stack.
You can determine the process stack size with the ulimit command. I get 8192 KiB on my system:
$ ulimit -s
8192
You can control the stack size of processes via the ulimit command (-s option). For threads, the default stack size varies a lot, but you can control it via a call to pthread_attr_setstacksize() (assuming you are using pthreads).
As for interrupts using the userland stack, I somewhat doubt it, as accessing userland memory is kind of a hassle from the kernel, especially from an interrupt routine. But I don't know for sure.
