Total stack sizes of threads in one process - linux

I use pthread_attr_getstacksize() to get the default stack size of a thread, which is 8 MB on my machine.
But when I create 8 threads and give each of them a very large stack, say hundreds of MB, the program crashes.
So I guess the following must hold:
("Number of threads" * "stack size per thread") < a constant value (e.g. virtual memory size)
Is that right?

The short answer is "Yes".
The longer answer is that all of your threads share one virtual address space, and the userspace-usable part of this space must therefore be large enough to contain all thread stacks (as well as the code, static data, heap, libraries and any miscellaneous mappings).
Multi-hundred-megabyte stacks are a good indication that You're Doing It Wrong, as they say in the classics.

Related

Increase stack size

I'm doing computations with huge arrays, and for some of these computations I need an increased stack size. Is there any downside to setting the stack size to unlimited (ulimit -s unlimited) in my ~/.bashrc?
The program is written in Fortran (F77 & F90) and parallelized with MPI. Some of my arrays have more than 2E7 entries, and when I use a small number of cores with MPI it crashes with a segmentation fault.
The array size stays the same through the whole computation, so I set the dimensions to fixed values:
real :: p(200,200,400)
integer :: ib,ie,jb,je,kb,ke
...
ib=1;ie=199
jb=2;je=198
kb=2;ke=398
call SOLVE_POI_EQ(rank,p(ib:ie,jb:je,kb:ke),R)
Setting the stack size to unlimited likely won't help you. You are allocating a 64 MB chunk on the stack, and you likely fill it not from the top but from the bottom.
This is important, because the OS grows the stack as you go. Whenever it detects a page fault right below the stack segment, it assumes that you need more space and silently inserts a new page. The size of this trigger region within your address space is limited, though, and I doubt that it's larger than 64 MB. Since your index variables are likely placed below your array on the stack, accessing them already makes the 64 MB jump that kills your process.
Just make your array allocatable, add the corresponding allocate() statement, and you should be fine.
Stack size is never really unlimited, so you would still have some failures. And your code still won't be portable to Linux systems with smaller (or normal-sized) stacks.
BTW, you should explain what kind of programs you are running and show some source code.
If coding in C++, using standard containers should help a lot (regarding actual stack consumption). For example, a local (stack allocated) std::vector<int> v(10000); (instead of int v[10000];) has its data allocated on the heap (and deallocated by the destructor when you exit from the block defining it).
It would be much better to improve your programs to avoid excessive stack consumption. Needing a lot of stack space is really a bug that you should try to correct. A typical rule of thumb is to keep call frames smaller than a few kilobytes (so allocate any larger data on the heap).
You might also consider using the Boehm conservative garbage collector: you would use GC_MALLOC instead of malloc (and you would heap-allocate large data structures using GC_MALLOC), but you would not have to bother freeing your (GC-heap allocated) data.

How does an OS implement or maintain a stack for each thread?

There have been various questions on SO on whether or not threads get their own stack. However, I fail to understand how OSs generally implement one stack per thread. In OS books the memory layout of a program is shown thus:
Note that it can be considered as a contiguous block of (virtual) memory. I would imagine some part of the virtual memory space is divided among the stacks for the threads. Which brings me to the second part of this question: a popular technical interview question involves trying to implement 3 stacks using a single array. Is this problem directly related to the implementation of thread stacks?
I summarize my questions thus:
How does a modern-day OS, say Linux, divide the memory space for stacks of different threads?
Is the "3 stacks using 1 array" directly related to or an answer for the above question?
PS: Perhaps images to explain how the memory is divided for different thread stacks would be best to explain.
The picture shown above is totally obsolete on both Windows and Linux. It doesn't really matter at what addresses the individual allocations are located. Virtual address space is big on 32 bit and vast on 64 bit. The OS just needs to carve out some chunk of it somewhere and hand it out.
Each stack is an independent virtual memory allocation that can be placed at an arbitrary location. It is important to note that stacks are generally finite in size. The OS reserves a certain maximum size (such as 1 MB or 8 MB), and the stack cannot exceed that size. The (obsolete) picture above suggests otherwise. The stack does grow down, but when the fixed space is exhausted a stack overflow is triggered. This is not a concern in practice. In fact, exceeding a reasonable stack size is considered to be a bug.
Binary images (above: text, initialized data and bss) are also just placed anywhere. They are fixed in size as well.
The heap consists of multiple segments. It can grow arbitrarily by just adding more segments. The heap is managed by user-mode libraries. The kernel doesn't know about it. All the kernel does is provide slabs of virtual memory at locations chosen at will.
1) A thread's stack is just a contiguous block in virtual memory. Its maximum size is fixed. It may look like this:
2) I don't think it is directly related to this problem, because a thread's stack size limit is known when the thread is created, but nothing is known about the sizes of the 3 stacks in the "3 stacks using 1 array" problem.

How much stack space is typically reserved for a thread? (POSIX / OSX)

The answer probably differs depending on the OS, but I'm curious how much stack space does a thread normally preallocate. For example, if I use:
push rax
that will put a value on the stack and decrement rsp. But what if I never use a push op? I imagine some space still gets allocated, but how much? Also, is this a fixed amount, or does it grow dynamically with the amount of stuff pushed?
POSIX does not define any standard stack size; it is entirely implementation dependent. Since you tagged this OSX, the default allocations there are:
Main thread (8MB)
Secondary Thread (512kB)
Naturally, these can be configured to suit your needs. The allocation is dynamic:
The minimum allowed stack size for secondary threads is 16 KB and the
stack size must be a multiple of 4 KB. The space for this memory is
set aside in your process space at thread creation time, but the
actual pages associated with that memory are not created until they
are needed.
There is too much detail to include here. I suggest you read :
Thread Management (Mac Developer Library)

Why has a (C-)stack a maximum of 2mb?

This question is about stack overflows, so where better to ask it than here.
If we consider how memory is used for a program (a.out) in unix, it is something like this:
| etext | stack, 2mb | heap ->>>
And I have wondered for a few years now why there is a restriction of 2MB for the stack. Consider that we have 64 bits for a memory address, then why not allocate like this:
| MIN_ADDR MAX_ADDR|
| heap ->>>> <<<- stack | etext |
MAX_ADDR will be somewhere near 2^64 and MIN_ADDR somewhere near 2^0, so there are many bytes in between which the program can use, but are not necessarily accounted for by the kernel (by actually assigning pages for them). The heap and stack will probably never reach each other, and hence the 2MB limit is not needed ( and would instead have a ~1.8446744e+19 bytes limit). If we are scared that they will reach each other, then set the limit to 2^63 or some bizarre and enormous number.
Furthermore, the heap grows from low to high, so our kernel can still resize blocks of memory (allocated with for example malloc) without necessarily needing to shift the content.
Moreover, a stack frame is always static in size in some way. So we never need to resize there, if we do, that would be awkward anyway, since we also need to change the whole pointer structure used by return and created by call.
I read this as an answer on another stackoverflow question:
"My intuition is the following. The stack is not as easy to manage as the heap. The stack need to be stored in continuous memory locations. This means that you cannot randomly allocate the stack as needed, but you need to at least reserve virtual addresses for that purpose. The larger the size of the reserved virtual address space, the fewer threads you can create."
Source: Why is the page size of Linux (x86) 4 KB, how is that calculated
But we have loads of memory addresses! So this makes no sense. So why 2MB?
The reason I ask is that allocating memory on the stack is quite safe with respect to dangling pointers and memory leaks:
e.g. I prefer
int foo[5];
instead of
int *foo = malloc(5*sizeof(int));
since it will deallocate by itself. Also, allocation on the stack is faster than allocation performed by malloc. However, if I allocate an image (i.e. a jpeg or png) on the stack, I am in danger of overflowing the stack.
Another point on this matter, why not also allow this:
int *huge_list_of_data = malloc(1000*sizeof(char), 10 000 000 000*sizeof(char))
where we allocate a list object, which has initially the size of 1KB, but we ask the kernel to allocate it such that the page it is put on is not used for anything else, and that we want to have 10GB of pages behind it, which can be (partially) swapped in when necessary.
This way we don't need 10GB of memory, we only need 10GB of memory addresses.
So why no:
void *malloc( unsigned long, unsigned long );
?
In essence: WHY NOT USE THE PAGING SYSTEM OF UNIX TO SOLVE OUR MEMORY ALLOCATION PROBLEMS?
Thank you for reading.

Maximum Thread Stack Size .NET?

What is the maximum stack size allowed for a thread in C#.NET 2.0? Also, does this value depend on the version of the CLR and/or the bitness (32 or 64) of the underlying OS?
I have looked at the following resources msdn1 and msdn2
public Thread(
ThreadStart start,
int maxStackSize
)
The only information I can see is that the default size is 1 megabyte and that, in the above constructor, if maxStackSize is '0' the default maximum stack size specified in the header for the executable will be used. What is the maximum value we can set in the header? Also, is it advisable to do so? Thanks.
For the record, this fits Raymond Chen's category of "if you need to know then you are doing something wrong".
The default stack size for threads running 64-bit code is 4 megabytes, 1 megabyte for 32-bit code. While the Thread constructor lets you pass an integer value up to int.MaxValue, you'll never get that on a 32-bit machine. The stack must fit in an available hole in the virtual memory address space, which usually tops out at ~600 MB early in the process lifetime and rapidly gets smaller as you allocate memory and fragment the address space.
Allocating more than the default is quite unnecessary. You might contemplate doing this when you have a heavily recursive method that blows the stack. Don't; fix the algorithm, or you'll blow it anyway when the job gets bigger.
The smallest stack that .NET lets you choose is 250 KB. It silently rounds it up if you pass a value that's smaller. Necessary because both the jitter and the garbage collector need stack space to get their job done. Again, doing so should be quite unnecessary. If you contemplate doing so because you have a lot of threads and consume all virtual memory with their stacks then you have too many threads. A StackOverflowException is one of the nastiest runtime exceptions you can get. Process death is immediate and untrappable.
The stack size for the main thread is determined by an option in the EXE header. The compiler doesn't have an option to change it, you have to use editbin.exe /stack to patch the .exe header.
I am unaware of what the maximum is, but MSDN speaks to whether you should do it or not:
Avoid using this constructor overload. The default stack size used by the Thread(ThreadStart) constructor overload is the recommended stack size for threads. If a thread has memory problems, the most likely cause is programming error, such as infinite recursion.
I have never had a StackOverflow occur in C# which was not due to infinite recursion. If there truly was a case where recursion went to that depth, I would consider replacing it with iteration.

Resources