Assembly - Thread Safe Local Variables - multithreading

I'm trying to have thread-safe local variables in an assembly program.
I've searched on the net, but I haven't found anything simple.
I'm currently using GCC assembler, as the program is a mix of C code and assembly, but the final program will contain code for multiple-platforms / calling conventions.
For now, I've declared my variables using the .lcomm pseudo-op.
As I understand it, those variables will be placed in the .bss section.
So I guess they will be shared by all threads.
Is there a way to have a kind of TLS variables directly in assembly, or should I use platform-specific implementations, like pthread or __declspec on Windows?
Hope it's clear enough. Don't hesitate to ask if more information is needed.
Thanks to everyone,
EDIT
Here's the code in question:
.lcomm stack0, 8
.lcomm stack1, 8
.globl _XSRuntime_CallMethod
_XSRuntime_CallMethod:
pushq %rbp
movq %rsp, %rbp
xor %rax, %rax
popq stack0( %rip )
popq stack1( %rip )
callq *%rdi
pushq stack1( %rip )
pushq stack0( %rip )
leave
ret
Basically, it's used to redirect a call to a C function.
The C prototype is:
extern uint64_t XSRuntime_CallMethod( void ( *m )( void * self, ... ), ... );
It takes a function pointer as first argument, hence the callq *%rdi, as I'm testing this with system V ABI.
The assembly code is very simple, and I'd like to keep it that way, so it can be easily maintainable.
The question is: how to make the stack0 and stack1 variables thread safe.

Not that familiar with assembler so:
.lcomm stack0, 8
.lcomm stack1, 8
.globl _XSRuntime_CallMethod
_XSRuntime_CallMethod:
pushq %rbp // save BP
movq %rsp, %rbp // load BP with SP
xor %rax, %rax // clear AX
popq stack0( %rip ) // pop return address into STACK0
popq stack1( %rip ) // pop flags into stack1
callq *%rdi // call the indirect procedure, so putting flags/return to XSRuntime_CallMethod onto stack
pushq stack1( %rip ) // put caller flags onto stack
pushq stack0( %rip ) // put caller return onto stack
leave // clean passed parameters from stack
ret // and back to caller
Is this how it works, yes??
If so, would it not be easier to just jump to the indirect procedure, rather than calling it? You then don't need any extra variables to hold the caller flags/return & the indirected procedure returns directly to the caller.
Just a suggestion - while since I did assembler.
If you have to store the caller address somewhere, dec the SP, (enter?), and use the stack frame. Anything else is likely to be thread-unsafe at some point.
Rgds,
Martin
Well, with a TLS maybee not thread-unsafe, but what about any recursive calls? You end up with another stack in the TLS to cover this, so you may as well use the 'SP' stack
Martin

How do you think the compiler implements thread-local variables? Try compiling such a program with -S or /FAs and you'll see. Hint: it has to rely on OS-specific APIs or other details to get access to TLS storage. Sometimes the preparation steps are hidden in the CRT but there's no single way to do it.
For example, here's how recent MSVC does it:
_TLS SEGMENT
?number##3HA DD 01H DUP (?) ; number
_TLS ENDS
EXTRN __tls_array:DWORD
EXTRN __tls_index:DWORD
_TEXT SEGMENT
[...]
mov eax, DWORD PTR __tls_index
mov ecx, DWORD PTR fs:__tls_array
mov edx, DWORD PTR [ecx+eax*4]
mov eax, DWORD PTR ?number##3HA[edx]
As you can see, it uses special variables which are initialized by the CRT.
On recent Linux, GCC can use TLS-specific relocations:
.globl number
.section .tbss,"awT",#nobits
number:
.zero 4
.text
[...]
movl %gs:number#NTPOFF, %eax
If you want portability, it's best not to rely on such OS-specific details but use a generic API such as pthread or use stack-based approach proposed by Martin. But I guess if you wanted portability you wouldn't use assembler :)

?? 'Classic' local variables, ie. parameters/variables/results accessed by stack offsets, are inherently thread-safe.
If you need a 'TLS' that is platform-agnostic, pass some suitable struct/class instance into all threads, either as the creation parameter, in a thread field before resuming all the threads, first message to the threads input queue or whatever...
Rgds,
Martin

As previously mentioned, local variables (stack-based) are inherently thread-safe because each thread has its own stack.
A thread-safe variable which is reachable by all threads (not stack-based) is probably best implemented using a spin-lock (or the equivalent in the Windows NT engine, a critical section). Such a variable must be locked before access, accessed and then unlocked. One variant could be that reads are free, but writes must be framed by locking/unlocking.
AFAIK only compilers themselves don't implement thread-safe variables. Instead they provide lib functions which access the required OS functionality.

You should probably be using(placing calls to) the TlsAlloc & TlsFree (or their other OS equivalents) to do something like this. the returned indices can be stored in global set once, read-only vars for easy use.
Depending on what the vars hold and what the code using them does, you might be able to get away with atomic ops, but that might yield problems of its own.

Related

.struct directive of GNU Assembler - How to instantiate a class instance?

I'm trying to create an object-oriented class using GNU assembly for educational purposes. I have many questions regarding the use of the .struct directive:
It is said that this directive switch the code to the absolute section. Why is it named .struct then? Does it have anything to do with the struct of C?
What is the difference between using:
.set Object.data, 8
.set Object.func_pointer, 16
and using
.struct 0
Object:
Object.data:
.struct Object + 8
Object.func_pointer:
.struct Object.data + 8
Object_size = . - Object
Is this actually the way to use the .struct directive or it it a miss-use? To be honest I don't even know what I'm doing with the .struct directive. It looks useless to me?
The full code I provided below will allocate one instance of the Object class manually. If I want to instantiate the class whenever I want using a constructor function, do I need to move my object onto the stack and how would that be done? using .bss I can name my instance (for example test_object) but I'm not sure if I can name a stack address.
Full code so you can understand what I'm talking about: (GNU as, Linux x64)
.data
msg: .asciz "Hello world!\n"
.set msglen, . - msg
# CLASS struct
# This class struct only hold pointers to functions available for the class
.struct 0
Object:
# Class variables
Object.data:
.struct Object + 8
# Class pointers
Object.func_pointer:
.struct Object.data + 8
Object_size = . - Object
# Allocate Object_size bytes in the bss section to store the object instance
.bss
test_object:
.space Object_size
#----------------------------------------------------------------------------
# Code starts here
.text
#----------------------------------------------------------------------------
# Simple hello world function used as the target
hellofunc:
push %rbp
mov %rsp, %rbp
mov $1, %rax
mov $1, %rdi
mov $msg, %rsi
mov $msglen, %rdx
syscall
leave
ret
# Initiate the Object class by:
# Allocate a chunk of Object_size bytes on the stack
# Load value given in %rdi into Object.data
# Load the address of hellofunc into Object.func_pointer
initiate:
# Prologue
push %rbp
mov %rsp, %rbp
# Make %rdx point at the current object instance
mov $test_object, %rdx
# Load hellofunc address onto test_object.func_pointer
mov $hellofunc, %rax
mov %rax, Object.func_pointer(%rdx)
# Load data from %rdi onto test_object.data
mov %rdi, Object.data(%rdx)
# Test call to see if test_object.func_pointer is properly loaded
call *Object.func_pointer(%rdx)
# Epilogue
leave
ret
#---------------------------------------------------------------------------
# Main function
.globl _start
#---------------------------------------------------------------------------
_start:
mov $0xDEADBEEF, %rdi
call initiate
# return(0);
mov $60, %rax
mov $0, %rdi
syscall
I'm pretty sure all you're ever going to get out of any assembly language struct syntax is symbolic names for offsets, and a way to define them using the same .space directives that you'd normally use to reserve space for one static instance of a struct in the BSS. That's what NASM gives you, for example, and GAS's is even more rudimentary than that.
So yes, it's like a struct foo {int x; char c;}; definition in C: you use syntax that looks like declaring (reserving space for) global variables, but instead you're just defining the layout of a struct. (Except in GAS, you can only use .space and similar directives that can't take a value, no distinguishing one int from an array of 4 chars for example.)
But in GAS, it's so primitive that yes, it's just one way to
separately set a bunch of assemble-time constants with names like Object.func_pointer. You could do exactly the same thing with Object.func_pointer = 8, as well.
What is the difference between using: (.set) and (.struct)
Nothing. That's all it's doing for you. There isn't anything like a concept of a struct object, it's just one part of the tools that you can use to get symbolic names for offsets. They chose to call it .struct because that's one use-case.
However, you're repeating yourself unnecessarily by using .struct repeatedly (the example in the GAS manual does this, too; I don't know why). AFAIK you do have to repeat the struct name in each label (because GAS .struct syntax isn't very sophisticated), but you can use .space instead of having to name the previous field.
.struct 0
Object:
# .space 4
Object.obdata:
.space 8
Object.func_pointer:
.space 8
Object_size = . - Object
.text
mov $Object.obdata, %eax
mov $Object.func_pointer, %ecx
gcc -c foo.s && objdump -drwC -Mintel foo.o
...
0: b8 00 00 00 00 mov eax,0x0
5: b9 08 00 00 00 mov ecx,0x8
The names you choose to define after switching to the absolute section are 100% up to you, and not associated with any overall name for the struct.
NASM's struct support is a bit more sophisticated: you can define a layout in one spot, and use those member names when statically initializing an instance of it (e.g. in the .data section). Nasm - access struct elements by value and by address has an example. Note that it involves endstruc and iend directives, because you are actually defining a struct with members, not just separately setting a bunch of assemble-time constants with names like Object.func_pointer
Anything else like "constructors" you'll have to do yourself. e.g. writing the members into memory somewhere that you've decided is now an instance of a struct object.
This is assembly language, there is no compiler, any instructions you want in the final machine code you're going to have to write explicitly. (Unless you were using MASM, which magically adds instructions to your code in a few cases. GAS certainly doesn't; it was primarily designed to compiler compiler-generated asm. Deciding to emit instructions for a "constructor" at a certain point is something a compiler does internally, not via any special asm syntax. Same for hand-written asm.)

Understanding assembly language _start label in a C program

I had written a simple c program and was trying to do use GDB to debug the program. I understand the use of following in main function:
On entry
push %ebp
mov %esp,%ebp
On exit
leave
ret
Then I tried gdb on _start and I got the following
xor %ebp,%ebp
pop %esi
mov %esp,%ecx
and $0xfffffff0,%esp
push %eax
push %esp
push %edx
push $0x80484d0
push $0x8048470
push %ecx
push %esi
push $0x8048414
call 0x8048328 <__libc_start_main#plt>
hlt
nop
nop
nop
nop
I am unable to understand these lines, and the logic behind this.
Can someone provide any guidance to help explain the code of _start?
Here is the well commented assembly source of the code you posted.
Summarized, it does the following things:
establish a sentinel stack frame with ebp = 0 so code that walks the stack can find its end easily
Pop the number of command line arguments into esi so we can pass them to __libc_start_main
Align the stack pointer to a multiple of 16 bits in order to comply with the ABI. This is not guaranteed to be the case in some versions of Linux so it has to be done manually just in case.
The addresses of __libc_csu_fini, __libc_csu_init, the argument vector, the number of arguments and the address of main are pushed as arguments to __libc_start_main
__libc_start_main is called. This function (source code here) sets up some glibc-internal variables and eventually calls main. It never returns.
If for any reason __libc_start_main should return, a hlt instruction is placed afterwards. This instruction is not allowed in user code and should cause the program to crash (hopefully).
The final series of nop instructions is padding inserted by the assembler so the next function starts at a multiple of 16 bytes for better performance. It is never reached in normal execution.
for gnu tools the _start label is the entry point of the program. for the C language to work you need to have a stack you need to have some memory/variables zeroed and some set to the values you chose:
int x = 5;
int y;
int fun ( void )
{
static int z;
}
all three of these variables x,y,z are essentially global, one is a local global. since we wrote it that way we assume that when our program starts x contains the value 5 and it is assumed that y is zero. in order for those things to happen, some bootstrap code is required and that is what happens (and more) between _start and main().
Other toolchains may choose to use a different label to define the entry/start point, but gnu tools use _start. there may be other things your tools require before main() is called C++ for example requires more than C.

Are return-to-libc attacks possible now?

I'v read that to make a successful return to lib-c attack, the attacker should store the address of the command (for example "bin/sh") on the stack exactly after the return address to 'system' function (for example).
So as the 'system()' function reads that address as its 'parameter' and executes that command. But now after disassembling a program which calls system() I noticed that it doesn't use the stack to get the address of that string ("bin/sh"). Instead the address is stored in EDI or RDI registers. As long as the attacker can't access the registers how is it possible to perform such an attack?
It might actually be easy for an attacker to have the correct value stored in RDI without having direct access to the registers.
Take a typical vulnerable C function that might lead to a return-to-libc attack:
void f(const char* str)
{
char buf[BUF_LEN];
strcpy(buf, str);
}
On my machine, the following assembly is generated for the call to strcpy:
movq %rdi, -24(%rbp)
movq -24(%rbp), %rdx
leaq -16(%rbp), %rax
movq %rdx, %rsi
movq %rax, %rdi
call strcpy
As you can see, the destination buffer of strcpy, which is the vulnerable buffer, is stored in RDI. This means that if we manage to overwrite the return address with the address of system, it will be called with a pointer to the vulnerable buffer as its argument.
A small disclaimer: this particular example is just to illustrate that having the correct value in a register can be accomplished by other ways than having direct access to the register. The example itself is actually very difficult to exploit since strcpy will stop after the first null byte.

Linux Stack - main() why ebp is not pushed onto the stack

I'm viewing the stack at the beginning of main, but the ebp of main is missing.
I declared a variable to check where will it's located on the stack, it turns out that there is zeros between this variable and the return address to n __libc_start_main !
System I'm using
I'm using fedora Linux 3.1.2-1.fc16.i686
ASLR is disabled.
Debugging with GDB.
Here's the code :
void main(){
char ret ='a';
}
Register information:
(gdb)
eax 0x1 1
ecx 0xbffff5f4 -1073744396
edx 0xbffff584 -1073744508
ebx 0x2dbff4 2998260
esp 0xbffff554 0xbffff554
**ebp 0xbffff558 0xbffff558**
esi 0x0 0
edi 0x0 0
eip 0x804839a 0x804839a <main+6>
stack
(gdb) x/8xw $esp
0xbffff554: 0x00000000(local var) 0x00000000(missing ebp!) 0x0014d6b3(return to libc_start) 0x00000001
0xbffff564: 0xbffff5f4 0xbffff5fc 0x00131fc4 0x0000082d
The only thing that I can think of is that the function prologue of the libc_start_main is not pushing the main's ebp for some reason !
Edit 1:
-compiled without Opatmization just (gcc -ggdb file file.c)
Assembly of main ( gcc version 4.6.2 20111027 )
push %ebp
mov %esp,%ebp
sub $0x10,%esp
movb $0x61,-0x1(%ebp)
leave
ret
A break point at the local variable to view the stack shows the same thing the variable followed by zeros then the return to libc_start
There's no requirement for a particular calling convention to be used in the assembly code generated by a compiler. That's why it's called a convention rather than a requirement :-)
In any case, you need to keep in mind that the 'normal' x86 calling convention for C requires the function itself to handle set-up and tear-down of the stack frame. In other words, this is the responsibility of main rather than the startup code (the code that generally runs before your main to set up the C runtime environment such as stack setup, creation of argc/argv, any library pre-initialisation and so on).
Additionally, the ebp pushed on to the stack is the previous value of ebp before the current stack frame is built.
Part of that build process for the current stack frame is the saving of the current ebp and then loading a new value into the ebp register to easily access passed parameters and locals.
You can see that by compiling your code snippet with gcc -S:
main:
pushl %ebp ; Push PREVIOUS ebp.
movl %esp, %ebp ; Load ebp for variable access.
subl $16, %esp ; Allocate space on stack.
movb $97, -1(%ebp) ; Store 'a' into variable.
leave ; Tear down frame and return.
ret
The first three lines and the last two are mirror images of each other, the set-up and tear-down code. There's a good chance in this case that the startup code had ebp set to zero, possibly because it didn't care - it doesn't have to worry about calling conventions other than to ensure argc and argv are there.
If you compile without optimization, you'll almost certainly find that ebp/rbp is in fact pushed pushed onto the stack and then set up based on esp/rsp. It is, however, done by main itself and not by libc as you appear to suggest.
Here is the assembly code produced by gcc 4.4.5:
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
movb $97, -1(%rbp)
leave
ret
.cfi_endproc
If you compile with optimization options, you might find that the entire body of main is optimized away (gcc -O3):
main:
.LFB0:
.cfi_startproc
rep
ret
.cfi_endproc
Instead of guessing, why not look at the disassembly (e.g. in gdb) to see what happens in your particular case?
Also, even in the unoptimized case you have to actually execute the function prologue for the registers to be set up the way you expect.
Finally, you should not be surprised when you see apparent gaps between data on the stack, as the stack is subject to alignment:
-mpreferred-stack-boundary=num
Attempt to keep the stack boundary aligned to a 2 raised to num
byte boundary. If -mpreferred-stack-boundary is not specified, the
default is 4 (16 bytes or 128 bits).
If you are compiling for x86_64 then ebp/rbp is callee saved. That means that main() should save it if it needs to use it. If not then there is no requirement for the old register value to be saved by the either callee or the caller.
See section 3.2 of the AMD64 ABI for more information if you are interested.

Help with understanding a very basic main() disassembly in GDB

Heyo,
I have written this very basic main function to experiment with disassembly and also to see and hopefully understand what is going on at the lower level:
int main() {
return 6;
}
Using gdb to disas main produces this:
0x08048374 <main+0>: lea 0x4(%esp),%ecx
0x08048378 <main+4>: and $0xfffffff0,%esp
0x0804837b <main+7>: pushl -0x4(%ecx)
0x0804837e <main+10>: push %ebp
0x0804837f <main+11>: mov %esp,%ebp
0x08048381 <main+13>: push %ecx
0x08048382 <main+14>: mov $0x6,%eax
0x08048387 <main+19>: pop %ecx
0x08048388 <main+20>: pop %ebp
0x08048389 <main+21>: lea -0x4(%ecx),%esp
0x0804838c <main+24>: ret
Here is my best guess as to what I think is going on and what I need help with line-by-line:
lea 0x4(%esp),%ecx
Load the address of esp + 4 into ecx. Why do we add 4 to esp?
I read somewhere that this is the address of the command line arguments. But when I did x/d $ecx I get the value of argc. Where are the actual command line argument values stored?
and $0xfffffff0,%esp
Align stack
pushl -0x4(%ecx)
Push the address of where esp was originally onto the stack. What is the purpose of this?
push %ebp
Push the base pointer onto the stack
mov %esp,%ebp
Move the current stack pointer into the base pointer
push %ecx
Push the address of original esp + 4 on to stack. Why?
mov $0x6,%eax
I wanted to return 6 here so i'm guessing the return value is stored in eax?
pop %ecx
Restore ecx to value that is on the stack. Why would we want ecx to be esp + 4 when we return?
pop %ebp
Restore ebp to value that is on the stack
lea -0x4(%ecx),%esp
Restore esp to it's original value
ret
I am a n00b when it comes to assembly so any help would be great! Also if you see any false statements about what I think is going on please correct me.
Thanks a bunch! :]
Stack frames
The code at the beginning of the function body:
push %ebp
mov %esp, %ebp
is to create the so-called stack frame, which is a "solid ground" for referencing parameters and objects local to the procedure. The %ebp register is used (as its name indicates) as a base pointer, which points to the base (or bottom) of the local stack inside the procedure.
After entering the procedure, the stack pointer register (%esp) points to the return address stored on the stack by the call instruction (it is the address of the instruction just after the call). If you'd just invoke ret now, this address would be popped from the stack into the %eip (instruction pointer) and the code would execute further from that address (of the next instruction after the call). But we don't return yet, do we? ;-)
You then push %ebp register to save its previous value somewhere and not lose it, because you'll use it for something shortly. (BTW, it usually contains the base pointer of the caller function, and when you peek that value, you'll find a previously stored %ebp, which would be again a base pointer of the function one level higher, so you can trace the call stack that way.) When you save the %ebp, you can then store the current %esp (stack pointer) there, so that %ebp will point to the same address: the base of the current local stack. The %esp will move back and forth inside the procedure when you'll be pushing and popping values on the stack or reserving & freeing local variables. But %ebp will stay fixed, still pointing to the base of the local stack frame.
Accessing parameters
Parameters passed to the procedure by the caller are "burried just uner the ground" (that is, they have positive offsets relative to the base, because stack grows down). You have in %ebp the address of the base of the local stack, where lies the previous value of the %ebp. Below it (that is, at 4(%ebp) lies the return address. So the first parameter will be at 8(%ebp), the second at 12(%ebp) and so on.
Local variables
And local variables could be allocated on the stack above the base (that is, they'd have negative offsets relative to the base). Just subtract N to the %esp and you've just allocated N bytes on the stack for local variables, by moving the top of the stack above (or, precisely, below) this region :-) You can refer to this area by negative offsets relative to %ebp, i.e. -4(%ebp) is the first word, -8(%ebp) is second etc. Remember that (%ebp) points to the base of the local stack, where the previous %ebp value has been saved. So remember to restore the stack to the previous position before you try to restore the %ebp through pop %ebp at the end of the procedure. You can do it two ways:
1. You can free only the local variables by adding back the N to the %esp (stack pointer), that is, moving the top of the stack as if these local variables had never been there. (Well, their values will stay on the stack, but they'll be considered "freed" and could be overwritten by subsequent pushes, so it's no longer safe to refer them. They're dead bodies ;-J )
2. You can flush the stack down to the ground and free all local space by simply restoring the %esp from the %ebp which has been fixed earlier to the base of the stack. It'll restore the stack pointer to the state it has just after entering the procedure and saving the %esp into %ebp. It's like loading the previously saved game when you've messed something ;-)
Turning off frame pointers
It's possible to have a less messy assembly from gcc -S by adding a switch -fomit-frame-pointer. It tells GCC to not assemble any code for setting/resetting the stack frame until it's really needed for something. Just remember that it can confuse debuggers, because they usually depend on the stack frame being there to be able to track up the call stack. But it won't break anything if you don't need to debug this binary. It's perfectly fine for release targets and it saves some spacetime.
Call Frame Information
Sometimes you can meet some strange assembler directives starting from .cfi interleaved with the function header. This is a so-called Call Frame Information. It's used by debuggers to track the function calls. But it's also used for exception handling in high-level languages, which needs stack unwinding and other call-stack-based manipulations. You can turn it off too in your assembly, by adding a switch -fno-dwarf2-cfi-asm. This tells the GCC to use plain old labels instead of those strange .cfi directives, and it adds a special data structures at the end of your assembly, refering to those labels. This doesn't turn off the CFI, just changes the format to more "transparent" one: the CFI tables are then visible to the programmer.
You did pretty good with your interpretation. When a function is called, the return address is automatically pushed to the stack, which is why argc, the first argument, has been pushed back to 4(%esp). argv would start at 8(%esp), with a pointer for each argument, followed by a null pointer. This function pushes the old value of %esp to the stack so that it can contain the original, unaligned value upon returned. The value of %ecx at return doesn't matter, which is why it is used as temporary storage for the %esp reference. Other than that, you are correct with everything.
Regarding your first question (where are stored the command line arguments), arguments to functions are right before ebp. I must say, your "real" main begins at < main + 10 >, where it pushes ebp and moves esp to ebp. I think that gcc messes everything up with all that leas just to replace the usual operations (addictions and subtractions) on esp before and after functions call. Usually a routine looks like this (simple function I did as an example):
0x080483b4 <+0>: push %ebp
0x080483b5 <+1>: mov %esp,%ebp
0x080483b7 <+3>: sub $0x10,%esp # room for local variables
0x080483ba <+6>: mov 0xc(%ebp),%eax # get arg2
0x080483bd <+9>: mov 0x8(%ebp),%edx # and arg1
0x080483c0 <+12>: lea (%edx,%eax,1),%eax # just add them
0x080483c3 <+15>: mov %eax,-0x4(%ebp) # store in local var
0x080483c6 <+18>: mov -0x4(%ebp),%eax # and return the sum
0x080483c9 <+21>: leave
0x080483ca <+22>: ret
Perhaps you've enabled some optimizations, which could make the code trickier.
Finally yes, the return value is stored in eax. Your interpretation is quite correct anyway.
The only thing I think that's outstanding from your original questions is why the following statements exist in your code:
0x08048381 <main+13>: push %ecx
0x08048382 <main+14>: mov $0x6,%eax
0x08048387 <main+19>: pop %ecx
The push and pop of %ecx at <main+13> and <main+19> don't seem to make much sense - and they don't really do anything in this example, but consider the case where your code invokes function calls.
There's no way for the system to guarantee that the calls to other functions - which will set up their own stack activation frames - won't reset register values. In fact they probably will. The code therefore sets up a saved register section on the stack where any registers used by the code (other than %esp and %ebp which are already saved though the regular stack setup) are stored in the stack before possibly handing control over to function calls in the "meat" of the current code block.
When these potential calls return, the system then pops the values off the stack to restore the pre-call register values. If you were writing assembler directly rather than compiling, you'd be responsible for storing and retrieving these register values, yourself.
In the case of your example code, however, there are no function calls - only a single instruction at <main+14> where you're setting the return value, but the compiler can't know that, and preserves its registers as usual.
It would be interesting to see what would happen here if you added C statements which pushed other values onto the stack after <main+14>. If I'm right about this being a saved register section of the stack, you'd expect the compiler to insert automatic pop statements prior to <main+19> in order to clear these values.

Resources