How to interpret the Thread Chart in ASMProfiler? - multithreading

I am starting to use asmprofiler for some small programs I am doing as a hobby, now when I am watching the results I see a 'Thread Chart' tab and it shows each thread stack size and stack height vs (time?).
The problem is I don't understand what a thread's stack size and height mean and why this graph could be useful when profiling?

As I read the source code for this program:
Stack height is the number of function call stack frames present on the stack.
Stack size if the size in bytes of the stack.
You might use these graphs if you were:
debugging stack overflows, or
trying to gain understanding of a recursive algorithms performance, or
trying to optimise the reserved stack size for your threads, or
many other reasons that I have not thought of!

Related

Does a stack overflow always lead to a segmentation fault error?

In Linux, when a running program attempts to use more stack space than the limit (a stack overflow), that usually results in a "segmentation fault" error and the execution is aborted.
Is it guaranteed that exceeding the stack space limit will always cause a segmentation fault error? Or could it happen that the program continues to run, possibly with some erroneous behavior due to data having been corrupted?
Another way of putting this: if a program misbehaves by producing wrong results but without a crash, can the cause still be a stack overflow?
Edit: to clarify, this question is not about "stack buffer overflow", it is about stack overflow, when the stack space used by the program exceeds the stack size limit (the limit that is in Linux given by ulimit -s).
A stack overflow turning into an access violation requires memory management hardware of some sort. Without hardware-assisted memory protection, an overgrown stack will collide with some other memory allocation, causing mutual corruption.
On demand-paged virtual memory operating systems, the upper limit of the stack is protected by a guard page: a page of virtual memory which is reserved (will not be allocated to anything) and marked "not present" so that accessing it generates a violation. A guard page is only so many bytes wide; a stack pointer can still accidentally increment over the guard page and land in some unrelated writable memory (such as a mapping belonging to a heap allocation) where havoc will be wreaked without necessarily triggering any memory access violation.
In the C language we can easily cause large stack increments by declaring large, uninitialized non-static block-scoped arrays, like char array[8192]; // (twice as large as a 4096 byte guard page). Using features like alloca or C99 variable-length arrays, we can do this dynamically: we can write a program which reads an integer value as a run-time input, and increments the stack by that much.
I debugged a problem many years ago whereby third-party code had debug logging macros, inside whose expansions there was a temporay array like char print_buf[8192] that was used for formatting messages. This was used in a multi-threaded application with many threads, whose stacks were reduced to just 64 kilobytes in size. Thanks to this print_buf, a thread's overflown stack leaped right past the guard page, and landed in another thread's stack, corrupting its local variables, causing the proverbial "hilarity to ensue".

what exactly is program stack's growth direction?

I'm reading Professional Assembly Language by Richard Blum,and I am confusing about a Inconsistency in the book and I wondering what exactly is program stack's growth direction?
This is the picture from page 312, which is suggesting that program stack grows up.
But when I reached page 322,I see another version, which suggesting that program stack grows down.
and this
The book is not inconsistent; each drawing shows higher addresses at the top.
The first drawing illustrates a stack that grows downward. The caller pushes parameters onto the stack, then calls the new function. The act of calling pushes the return address onto the stack. The callee then pushes the current value of the base pointer onto the stack, copies the stack pointer into the base pointer, and decrements the stack pointer to make room for the callee's local variables.
Some background:
For different processors the meaning of the stack pointer and the direction of the stack may differ. For TMS Piccolo controllers stack is growing up so that "PUSH" will increment the stack pointer. The stack pointer may point to the value last pushed or to the location where the next value to be pushed is written to. The ARM processor allows all 4 possible combinations for the stack so there must be a convention on how to use the stack pointer.
On x86 processors:
On x86 processors stack ALWAYS grows downwards so a "PUSH" instruction will decrement the stack pointer; the stack pointer always points to the last value pushed.
The first picture shows you that the addresses after the stack pointer (address > stack pointer) already contain values. If you store more values to the stack they are stored to locations below the stack pointer (the next value will be stored to address -16(%ebp)). This means that the picture from page 312 also shows a down-growing stack.
-- Edit --
If a processor has a "PUSH" instruction the direction of stack growth is given by the CPU. For CPUs that do not have a "PUSH" instruction (like PowerPC or ARM without ARM-THUMB code) the Operating System has to define the direction of stack growth.
Stack growth direction varies with OS, CPU architecture, and probably a number of other things.
The most common layout has the stack start at the top of memory and grow down, while the heap starts at the bottom and grows up. Sometimes it's the other way around, eg. MacOS prior to OSX put the stack just above the code area, growing up, while the heap started at the top of memory and grew down.
Even more compelling definition of stack growth direction is (if the processor has it) interrupt stack. Some architectures (like PowerPC) don't really have a HW stack at all. Then the system designer can decide which way to implement the stack: Pre-incrementing, post-incrementing. pre-decrementing or post-decrementing.
In PPC the calls use link register, and next call overwrites it, if the return address is not programmatically saved.
PPC interrupts use 2 special registers - the "return address" and machine status. That's because instructions can be "restarted" after interrupt - a way to handle interrupts in pipelined architecture.
pre-increment: stack pointer is incremented before store in push - stack pointer points to the last used item. Seen in few more strange 8-bit architectures (some forth-processors and the like).
post-incrementing: store is done before stack pointer incrementing - stack pointer points to the first free stack element.
pre- and post decrementins: similar to the above, but stack grows downwards (more common).
Most common is post-decrementing.

pthread stack size increase automatically?

I'm working on pthread in android NDK that process video data through network.
I meet problem that is 'stack corruption detected : aborted'. So I set -fstack-check in application.mk, and FATAL SIGNAL 11 blabla.. again.
My conclusion in this problem is about stack size.
When I use window thread, its stack size sets 1kb as default, increase automatically.
but, I don't know about pthread.
Is pthread increase stack size automatically?
p.s. I attached this thread to JavaVM.
On stock Android, pthread stacks are allocated at 1MB by default. (Because of the way the system works, only the parts of the stack that are actually touched get physical pages, so for many threads there's actually only a few KB in use.)
The "stack corruption" message indicates that something has trashed the stack, not that you've run off the end. One way to encounter this is to write off the end of a stack-allocated array. It might be useful to decode the stack trace you get in the log file and see what method it's in when it fails.

What is the Linux Stack?

I recently ran into a bug with the "linux stack" and the "linux stack size". I came across a blog directing me to try
ulimit -a
to see what the limit for my box was, and it was set to 8192kb which seems to be the default.
What is the "linux stack"? How does it work, what does it store, what does it do?
The short answer is:
When programs on your linux box run, they add and remove data from the stack on a regular basis as the programs function. The stack size, referes to how much space is allocated in memory for the stack. If you increase the stack size, that allows the program to increase the number of routines that can be called. Each time a function is called, data can be added to the stack (stacked on top of the last routines data.)
Unless the program is a very complex, or designed for a special purpose, a stack size of 8192kb is normally fine. Some programs like graphics processing programs require you to increase the size of the stack to function. As they may store a lot of data on the stack.
Feel free to increase the stack size for those applications, its not a problem. To do so, use
ulimit -s bytes
BTW, What is a StackOverflowError?

Linux Stack Sizes

I'm looking for a good description of stacks within the linux kernel, but I'm finding it surprisingly difficult to find anything useful.
I know that stacks are limited to 4k for most systems, and 8k for others. I'm assuming that each kernel thread / bottom half has its own stack. I've also heard that if an interrupt goes off, it uses the current thread's stack, but I can't find any documentation on any of this. What I'm looking for is how the stacks are allocated, if there's any good debugging routines for them (I'm suspecting a stack overflow for a particular problem, and I'd like to know if its possible to compile the kernel to police stack sizes, etc).
The reason that documentation is scarce is that it's an area that's quite architecture-dependent. The code is really the best documentation - for example, the THREAD_SIZE macro defines the (architecture-dependent) per-thread kernel stack size.
The stacks are allocated in alloc_thread_stack_node(). The stack pointer in the struct task_struct is updated in dup_task_struct(), which is called as part of cloning a thread.
The kernel does check for kernel stack overflows, by placing a canary value STACK_END_MAGIC at the end of the stack. In the page fault handler, if a fault in kernel space occurs this canary is checked - see for example the x86 fault handler which prints the message Thread overran stack, or stack corrupted after the Oops message if the stack canary has been clobbered.
Of course this won't trigger on all stack overruns, only the ones that clobber the stack canary. However, you should always be able to tell from the Oops output if you've suffered a stack overrun - that's the case if the stack pointer is below task->stack.
You can determine the process stack size with the ulimit command. I get 8192 KiB on my system:
$ ulimit -s
8192
For processes, you can control the stack size of processes via ulimit command (-s option). For threads, the default stack size varies a lot, but you can control it via a call to pthread_attr_setstacksize() (assuming you are using pthreads).
As for the interrupt using the userland stack, I somewhat doubt it, as accessing userland memory is a kind of a hassle from the kernel, especially from an interrupt routine. But I don't know for sure.

Resources