I'm reading Professional Assembly Language by Richard Blum,and I am confusing about a Inconsistency in the book and I wondering what exactly is program stack's growth direction?
This is the picture from page 312, which is suggesting that program stack grows up.
But when I reached page 322,I see another version, which suggesting that program stack grows down.
and this
The book is not inconsistent; each drawing shows higher addresses at the top.
The first drawing illustrates a stack that grows downward. The caller pushes parameters onto the stack, then calls the new function. The act of calling pushes the return address onto the stack. The callee then pushes the current value of the base pointer onto the stack, copies the stack pointer into the base pointer, and decrements the stack pointer to make room for the callee's local variables.
Some background:
For different processors the meaning of the stack pointer and the direction of the stack may differ. For TMS Piccolo controllers stack is growing up so that "PUSH" will increment the stack pointer. The stack pointer may point to the value last pushed or to the location where the next value to be pushed is written to. The ARM processor allows all 4 possible combinations for the stack so there must be a convention on how to use the stack pointer.
On x86 processors:
On x86 processors stack ALWAYS grows downwards so a "PUSH" instruction will decrement the stack pointer; the stack pointer always points to the last value pushed.
The first picture shows you that the addresses after the stack pointer (address > stack pointer) already contain values. If you store more values to the stack they are stored to locations below the stack pointer (the next value will be stored to address -16(%ebp)). This means that the picture from page 312 also shows a down-growing stack.
-- Edit --
If a processor has a "PUSH" instruction the direction of stack growth is given by the CPU. For CPUs that do not have a "PUSH" instruction (like PowerPC or ARM without ARM-THUMB code) the Operating System has to define the direction of stack growth.
Stack growth direction varies with OS, CPU architecture, and probably a number of other things.
The most common layout has the stack start at the top of memory and grow down, while the heap starts at the bottom and grows up. Sometimes it's the other way around, eg. MacOS prior to OSX put the stack just above the code area, growing up, while the heap started at the top of memory and grew down.
Even more compelling definition of stack growth direction is (if the processor has it) interrupt stack. Some architectures (like PowerPC) don't really have a HW stack at all. Then the system designer can decide which way to implement the stack: Pre-incrementing, post-incrementing. pre-decrementing or post-decrementing.
In PPC the calls use link register, and next call overwrites it, if the return address is not programmatically saved.
PPC interrupts use 2 special registers - the "return address" and machine status. That's because instructions can be "restarted" after interrupt - a way to handle interrupts in pipelined architecture.
pre-increment: stack pointer is incremented before store in push - stack pointer points to the last used item. Seen in few more strange 8-bit architectures (some forth-processors and the like).
post-incrementing: store is done before stack pointer incrementing - stack pointer points to the first free stack element.
pre- and post decrementins: similar to the above, but stack grows downwards (more common).
Most common is post-decrementing.
Related
I am starting to use asmprofiler for some small programs I am doing as a hobby, now when I am watching the results I see a 'Thread Chart' tab and it shows each thread stack size and stack height vs (time?).
The problem is I don't understand what a thread's stack size and height mean and why this graph could be useful when profiling?
As I read the source code for this program:
Stack height is the number of function call stack frames present on the stack.
Stack size if the size in bytes of the stack.
You might use these graphs if you were:
debugging stack overflows, or
trying to gain understanding of a recursive algorithms performance, or
trying to optimise the reserved stack size for your threads, or
many other reasons that I have not thought of!
The answer probably differs depending on the OS, but I'm curious how much stack space does a thread normally preallocate. For example, if I use:
push rax
that will put a value on the stack and increment the rsp. But what if I never use a push op? I imagine some space still gets allocated, but how much? Also, is this a fixed amount or is does it grow dynamically with the amount of stuff pushed?
POSIX does not define any standards regarding stack size, it is entirely implementation dependent. Since you tagged this OSX, the default allocations there are :
Main thread (8MB)
Secondary Thread (512kB)
Naturally, these can be configured to suit your needs. The allocation is dynamic :
The minimum allowed stack size for secondary threads is 16 KB and the
stack size must be a multiple of 4 KB. The space for this memory is
set aside in your process space at thread creation time, but the
actual pages associated with that memory are not created until they
are needed.
There is too much detail to include here. I suggest you read :
Thread Management (Mac Developer Library)
I have defined an upward stack in xv6 (which had a downward stack) and want to know how I put a guard page between the stack and the heap. Is there any specific system call I can make use of? Also how can I maintain that one page address space to always lie between stack and heap?
so you know exactly from where your stack start growing up? In that case, why don't you just leave one page and just start from the next page onwards. And you might need to allocate and poison the memory with some data so that it could be detected. Like the way some of these memory overrun detection tools work. or you might need to set some custom flag to that page so that while trying to access them, you can check the flag and fault if found inappropriate.
Did I get your question correctly, btw?
I'm looking for a good description of stacks within the linux kernel, but I'm finding it surprisingly difficult to find anything useful.
I know that stacks are limited to 4k for most systems, and 8k for others. I'm assuming that each kernel thread / bottom half has its own stack. I've also heard that if an interrupt goes off, it uses the current thread's stack, but I can't find any documentation on any of this. What I'm looking for is how the stacks are allocated, if there's any good debugging routines for them (I'm suspecting a stack overflow for a particular problem, and I'd like to know if its possible to compile the kernel to police stack sizes, etc).
The reason that documentation is scarce is that it's an area that's quite architecture-dependent. The code is really the best documentation - for example, the THREAD_SIZE macro defines the (architecture-dependent) per-thread kernel stack size.
The stacks are allocated in alloc_thread_stack_node(). The stack pointer in the struct task_struct is updated in dup_task_struct(), which is called as part of cloning a thread.
The kernel does check for kernel stack overflows, by placing a canary value STACK_END_MAGIC at the end of the stack. In the page fault handler, if a fault in kernel space occurs this canary is checked - see for example the x86 fault handler which prints the message Thread overran stack, or stack corrupted after the Oops message if the stack canary has been clobbered.
Of course this won't trigger on all stack overruns, only the ones that clobber the stack canary. However, you should always be able to tell from the Oops output if you've suffered a stack overrun - that's the case if the stack pointer is below task->stack.
You can determine the process stack size with the ulimit command. I get 8192 KiB on my system:
$ ulimit -s
8192
For processes, you can control the stack size of processes via ulimit command (-s option). For threads, the default stack size varies a lot, but you can control it via a call to pthread_attr_setstacksize() (assuming you are using pthreads).
As for the interrupt using the userland stack, I somewhat doubt it, as accessing userland memory is a kind of a hassle from the kernel, especially from an interrupt routine. But I don't know for sure.
When executed, program will start running from virtual address 0x80482c0. This address doesn't point to our main() procedure, but to a procedure named _start which is created by the linker.
My Google research so far just led me to some (vague) historical speculations like this:
There is folklore that 0x08048000 once was STACK_TOP (that is, the stack grew downwards from near 0x08048000 towards 0) on a port of *NIX to i386 that was promulgated by a group from Santa Cruz, California. This was when 128MB of RAM was expensive, and 4GB of RAM was unthinkable.
Can anyone confirm/deny this?
As Mads pointed out, in order to catch most accesses through null pointers, Unix-like systems tend to make the page at address zero "unmapped". Thus, accesses immediately trigger a CPU exception, in other words a segfault. This is quite better than letting the application go rogue. The exception vector table, however, can be at any address, at least on x86 processors (there is a special register for that, loaded with the lidt opcode).
The starting point address is part of a set of conventions which describe how memory is laid out. The linker, when it produces an executable binary, must know these conventions, so they are not likely to change. Basically, for Linux, the memory layout conventions are inherited from the very first versions of Linux, in the early 90's. A process must have access to several areas:
The code must be in a range which includes the starting point.
There must be a stack.
There must be a heap, with a limit which is increased with the brk() and sbrk() system calls.
There must be some room for mmap() system calls, including shared library loading.
Nowadays, the heap, where malloc() goes, is backed by mmap() calls which obtain chunks of memory at whatever address the kernel sees fit. But in older times, Linux was like previous Unix-like systems, and its heap required a big area in one uninterrupted chunk, which could grow towards increasing addresses. So whatever was the convention, it had to stuff code and stack towards low addresses, and give every chunk of the address space after a given point to the heap.
But there is also the stack, which is usually quite small but could grow quite dramatically in some occasions. The stack grows down, and when the stack is full, we really want the process to predictably crash rather than overwriting some data. So there had to be a wide area for the stack, with, at the low end of that area, an unmapped page. And lo! There is an unmapped page at address zero, to catch null pointer dereferences. Hence it was defined that the stack would get the first 128 MB of address space, except for the first page. This means that the code had to go after those 128 MB, at an address similar to 0x080xxxxx.
As Michael points out, "losing" 128 MB of address space was no big deal because the address space was very large with regards to what could be actually used. At that time, the Linux kernel was limiting the address space for a single process to 1 GB, over a maximum of 4 GB allowed by the hardware, and that was not considered to be a big issue.
Why not start at address 0x0? There's at least two reasons for this:
Because address zero is famously known as a NULL pointer, and used by programming languages to sane check pointers. You can't use an address value for that, if you're going to execute code there.
The actual contents at address 0 is often (but not always) the exception vector table, and is hence not accessible in non-privileged modes. Consult the documentation of your specific architecture.
As for the entrypoint _start vs main:
If you link against the C runtime (the C standard libraries), the library wraps the function named main, so it can initialize the environment before main is called. On Linux, these are the argc and argv parameters to the application, the env variables, and probably some synchronization primitives and locks. It also makes sure that returning from main passes on the status code, and calls the _exit function, which terminates the process.