I am trying to understand the working of a simple asm code, my task is to build the stack or the list of values pointed by rsp throughout execution.
In gdb, after setting a breakpoint at main, I use x/10xg $rsp to display 10 memory words starting at rsp. But since the results are shown in 2x32-bit form, rather than 1x64, I am unable to understand what values rsp is taking.
My goal here is to make the entire stack of the program to see what goes where and to understand the order of execution of the program.
What I am confused about is:
-Why doesn't x have a specifier to show results in 1x64-bit form?
-How do I achieve my goal of making the stack of the program?
Here's my asm :
0x0000555555555170 <+0>: endbr64
0x0000555555555174 <+4>: push rbp
0x0000555555555175 <+5>: mov rbp,rsp
=> 0x0000555555555178 <+8>: mov eax,0x0
0x000055555555517d <+13>: call 0x55555555515c <func>
0x0000555555555182 <+18>: pop rbp
0x0000555555555183 <+19>: ret
and the output of x/10xg $rsp when ip is at line 4 is :
0x7fffffffdd30: 0x0000000000000000 0x00007ffff7dd80b3
0x7fffffffdd40: 0x00007ffff7ffc620 0x00007fffffffde28
0x7fffffffdd50: 0x0000000100000000 0x0000555555555170
0x7fffffffdd60: 0x0000555555555190 0x1706e24ed60a5880
0x7fffffffdd70: 0x0000555555555040 0x00007fffffffde20
Shouldn't the value of rsp be the address of the next instruction, which is 0x0000555555555178?
I can see something similar to the memory addresses of the code at higher addresses on the stack, but since it is split into 2 x 32-bit form I am unable to easily understand the values on the stack.
Also, am I approaching it the correct way? I am really confused here, sorry if my question sounds stupid.
gdb version:
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
This is what i am trying to achieve.
TL;DR
How can I modify the stack while using ret or achieving similar effect while using something else?
Hello world,
I am trying to make a compiler for my language,
currently everything is inlined and it makes
the compilation slow for some steps so today I decided
to try to optimise it using functions, though
it keeps segfaulting, then I realised
This seems to not work:
;; main.s
BITS 64
segment .text
global _start
exit:
mov rax, 60 ;; Linux syscall number for exit
pop rdi ;; Exit code
syscall
ret
write:
mov rax, 1 ;; Linux syscall number for write
mov rdi, 1 ;; File descriptor (1 = stdout)
pop rsi ;; Pointer to string
pop rdx ;; String length
syscall
ret
_start:
mov rax, msg_len
push rax
mov rax, msg
push rax
call write
mov rax, 0
push rax
call exit
segment .data
msg: db "Hello, world!", 10
msg_len: equ $-msg
My output for this is.... questionable:
$ nasm -felf64 main.s
$ ld -o main main.o
$ ./main
PHello, world!
# # #$# #+ #2 #main.sexitwritemsgmsg_len__bss_start_edata_end.symtab.strtab.shstrtab.text.data9! # !77!'Segmentation fault
$? (exit code) is 139 (segfault)
While all inlined all works:
;; main1.s
BITS 64
segment .text
global _start
_start:
mov rax, msg_len
push rax
mov rax, msg
push rax
mov rax, 1 ;; Linux syscall number for write
mov rdi, 1 ;; File descriptor (1 = stdout)
pop rsi ;; Pointer to string
pop rdx ;; String length
syscall
mov rax, 0
push rax
mov rax, 60 ;; Linux syscall number for exit
pop rdi ;; Exit code
syscall
segment .data
msg: db "Hello, world!", 10
msg_len: equ $-msg
My output is completely normal:
$ nasm -felf64 main1.s
$ ld -o main1 main1.o
$ ./main1
Hello, world!
$? (exit code) is 0 (as specified in assembly, meaning success)
So now I'm here confused as I am a newbie
at assembly what to do, even though I found related
solutions like
NASM push before ret
I am still confused how to take that in...
Is there a way I can do it or am I stuck with inlining? Should I maybe switch assemblers altogether, from nasm to something else?
Thanks in advance
tl;dr
Remember that call is technically a push rip, and ret is technically a pop rip, so you pretty much messed up your stack in your example because you inadvertently pop it in the wrong spot.
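To make the mismatch concrete, here is a minimal C sketch that models the stack from the question as an array and replays what the broken write routine does. The addresses MSG_PTR and RET_ADDR are made-up placeholders, and the Stack, Outcome and simulate_broken_write names are hypothetical helpers for this illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values standing in for the real ones; only their roles matter. */
enum {
    MSG_LEN  = 14,        /* length of "Hello, world!" plus the newline */
    MSG_PTR  = 0x402000,  /* assumed address of msg */
    RET_ADDR = 0x401030   /* assumed address of the instruction after `call write` */
};

/* Toy stack: slot[top - 1] is what rsp points at. */
typedef struct {
    uint64_t slot[8];
    int top;
} Stack;

static void push(Stack *s, uint64_t v) {
    assert(s->top < 8);
    s->slot[s->top++] = v;
}

static uint64_t pop(Stack *s) {
    assert(s->top > 0);
    return s->slot[--s->top];
}

/* What the registers end up holding in the broken `write`. */
typedef struct {
    uint64_t rsi;         /* buffer pointer passed to the syscall */
    uint64_t rdx;         /* length passed to the syscall */
    uint64_t ret_target;  /* where `ret` jumps */
} Outcome;

static Outcome simulate_broken_write(void) {
    Stack s = { {0}, 0 };
    Outcome o;

    push(&s, MSG_LEN);       /* _start: push msg_len */
    push(&s, MSG_PTR);       /* _start: push msg */
    push(&s, RET_ADDR);      /* call write: pushes the return address */

    o.rsi = pop(&s);         /* pop rsi: gets the return address, not msg */
    o.rdx = pop(&s);         /* pop rdx: gets msg's address, not its length */
    o.ret_target = pop(&s);  /* ret: pops msg_len and jumps to it */
    return o;
}
```

In this model the syscall is asked to write a very large number of bytes starting at the code after the call, and ret then jumps to address 14, which would be consistent with the garbage output followed by the segfault.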
More of an answer
Although you should probably properly learn how calling conventions work, I'm going to attempt an answer to briefly "soften" the idea, and for the fun of learning.
Abstractly speaking, in order to have functions, you must have something called stack frames, or else you'd have a pretty hard time managing local variables and getting ret to work. On x86_64, a stack frame is pretty much composed of a few things, in order.
- The function arguments, if there are any [0].
  - If some arguments were passed in registers, this may be omitted.
- The return address.
  - The call instruction will push this onto the stack.
  - It's on you to make sure the ret instruction will pop this off the stack.
- Optionally, a frame pointer.
  - If your stack grows by a dynamic amount, this can keep track of the start of the frame.
  - Otherwise, if you know the stack size ahead of time, it's optional.
- And then your local state on the stack.
As long as execution stays within your little assembly space, you are technically free to pass arguments however you want [1] as long as you are aware of how instructions like call and ret manipulate the stack. The simplest way, in my opinion, is to make it sort of stack-based, so that your compiler would not need to worry about register allocation as much [2].
To keep things simple, I'd suggest using something like the x86 convention but applied to x86_64, as you seem to be using 64-bit code. That is to say, the caller function would push all of its arguments onto the stack (usually in reverse order), and then call the callee function. For example, for a 3-argument function, your stack would end up looking something like this (beware that the top of the stack is actually on the bottom).
+----------------+
| argument 2 |
+----------------+
| argument 1 |
+----------------+
| argument 0 |
+----------------+
| return address |
+----------------+
| local state |
| ... |
+----------------+
Also, I noticed that you never really made use of the rsp register. Depending on the design of your compiler, you technically could get away with this. Stack machines like the JVM rely solely on pushes and pops, anyway, I believe. As long as your pushes and pops match (especially call and ret, which act as a special push and pop), you should be fine.
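As a rough illustration of the stack-based convention described above, the following C sketch models a callee that reads its arguments from above the return address instead of popping them, so that ret still finds the right value. The Stack type and the peek and simulate_call helpers are hypothetical names for this toy model, not part of any real toolchain.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t slot[8];
    int top;              /* slot[top - 1] plays the role of [rsp] */
} Stack;

static void push(Stack *s, uint64_t v) {
    assert(s->top < 8);
    s->slot[s->top++] = v;
}

static uint64_t pop(Stack *s) {
    assert(s->top > 0);
    return s->slot[--s->top];
}

/* Read the slot n entries above the top, like mov reg, [rsp + 8*n]. */
static uint64_t peek(const Stack *s, int n) {
    assert(n >= 0 && n < s->top);
    return s->slot[s->top - 1 - n];
}

typedef struct {
    uint64_t arg0, arg1, ret_target;
    int slots_left;
} CallResult;

static CallResult simulate_call(uint64_t arg0, uint64_t arg1, uint64_t ret_addr) {
    Stack s = { {0}, 0 };
    CallResult r;

    push(&s, arg1);          /* caller: push arguments in reverse order */
    push(&s, arg0);
    push(&s, ret_addr);      /* call: pushes the return address */

    r.arg0 = peek(&s, 1);    /* callee: [rsp + 8], stack left untouched */
    r.arg1 = peek(&s, 2);    /* callee: [rsp + 16] */
    r.ret_target = pop(&s);  /* ret: pops the real return address */

    pop(&s);                 /* caller cleanup, like add rsp, 16 */
    pop(&s);
    r.slots_left = s.top;
    return r;
}
```

Here the caller removes the arguments after the call returns; letting the callee do it instead is an equally valid design as long as every push is matched by exactly one pop.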
[0] Windows actually allocates at least an extra 32 bytes here for argument spilling, but you can probably ignore that in this case.
[1] There are specific calling conventions that dictate how parameters are passed from caller to callee and back. Beyond your programming exercise, I highly recommend reading about how they work, so that your compiler can output code that can easily be called by and easily call functions that weren't emitted by your compiler, or go the Forth way as Nate mentioned.
[2] goto [1]
Question number 1:
Having this nasm:
section .data
dat db "write out this:%x", 0xa, 0x0
section .text
global main
extern printf
main:
push rbp
mov rbp, rsp
mov rdi, dat
mov esi, 0xdeedbeef
call printf
leave
ret
gives errno 24 (too many open file descriptors).
But if I change it to exit via int 80h instead of
leave
ret
it terminates without error. How is that?
Also, question number 2:
If I do not set up the frame the conventional way with:
push rbp
mov rbp, rsp
and only do mov rbp, rsp without pushing rbp first, then I get "command terminated", even though no function was called before and so there should be no need to save the base pointer. So why is the push needed (in the eyes of the compiler), and why does the program terminate without it?
Question 1
You're mistaken about this having anything to do with file descriptors. That isn't what's being reported.
As you explained in comments, 24 is the number shown when you echo $? after running the program. This is the exit code of the program; normally, the value returned from the main function or passed to exit(). It can be whatever you want and normally does not correspond to an errno value.
So why does your program give an exit code of 24? If main returns, then the exit code is its return value. A function's return value is expected to be left in the rax register when it returns (assuming it's of integer or pointer type). But you never touch the rax register, so it still contains the value that was left there when printf returns. Now printf returns the number of characters it successfully printed, which for the string you chose is... 24 (count 'em).
It is just a coincidence that 24 also happens to be the errno code for "too many open file descriptors" - that is completely unrelated.
If you want to exit with an exit code of 0 to signal success, then you should xor rax, rax just before your ret.
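The character count can be checked directly from C. This small sketch calls printf with the same format string and argument as the question and hands back its return value; print_and_count is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/* Prints the same text as the question's program and returns printf's result,
   which is the number of characters written. */
static int print_and_count(void) {
    int n = printf("write out this:%x\n", 0xdeedbeefu);
    assert(n >= 0);  /* a negative value would mean an output error */
    return n;
}
```

A C main that ends with return print_and_count(); should therefore show 24 in $?, just like the assembly version that leaves printf's result in rax.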
You didn't show the exact code you used when changing it to int 0x80 to invoke the _exit syscall yourself. But in that case, the exit code would be whatever is in the ebx register when you make the system call. Maybe your code puts zero in ebx, or maybe you are lucky and it happens to already contain zero.
(Side note: int 0x80 is the interface for 32-bit system calls, and is not appropriate in a 64-bit program which is what you seem to be writing, though it may work in a few cases. The 64-bit system call interface uses the syscall instruction and is explained in A.2.1 of the ABI.)
Question 2
You have to align the stack.
When calling a C function from assembly (as printf in this case), you're required by the x86-64 ABI Section 3.2.2 to align the stack to a 16-byte boundary. The stack is aligned appropriately before the call to main, and the return address pushed by call subtracts 8 bytes.
So if you don't touch the stack at all in your code, it will not be correctly aligned when you call printf, and this can cause it to crash. (The assembler will not help you do this; that's not its job.) But when you push rbp, that subtracts a further 8 bytes and gets the stack aligned properly. So either leave that code in, or align the stack yourself.
And in either case, keep in mind that if you change your code to push more stuff on the stack, you'll have to adjust it accordingly before you make any function calls.
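The alignment bookkeeping is plain arithmetic, which the following C sketch spells out. The starting rsp value used in the note below is an arbitrary 16-byte-aligned placeholder, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that rsp is past a 16-byte boundary (0 means aligned). */
static unsigned misalignment(uint64_t rsp) {
    return (unsigned)(rsp % 16);
}

/* rsp inside main just before `call printf`, given rsp just before `call main`.
   pushes is the number of 8-byte pushes main did before calling printf. */
static uint64_t rsp_at_printf_call(uint64_t rsp_before_call_main, int pushes) {
    assert(pushes >= 0);
    uint64_t rsp = rsp_before_call_main - 8;  /* call main pushes the return address */
    rsp -= 8 * (uint64_t)pushes;              /* each push subtracts another 8 */
    return rsp;
}
```

Starting from an aligned value such as 0x7fffffffe000, zero pushes leave a misalignment of 8, one push (the push rbp) gives 0, and two pushes give 8 again, which is why an odd number of 8-byte pushes is needed before the call.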
I am working on a toy compiler. I used to allocate all memory with malloc, but since I never call free, I think it will be sufficient (and faster) to allocate a GB or so on the stack and then slowly use that buffer.
But... now I am segfaulting before anything interesting happens. It happens on about 30% of my test cases (all test cases are the same in this section tho). Pasted from GDB:
(gdb) disas
Dump of assembler code for function main:
0x0000000000400bf1 <+0>: push rbp
0x0000000000400bf2 <+1>: mov rbp,rsp
0x0000000000400bf5 <+4>: mov QWORD PTR [rip+0x2014a4],rsp # 0x6020a0
0x0000000000400bfc <+11>: sub rsp,0x7735940
0x0000000000400c03 <+18>: sub rsp,0x7735940
0x0000000000400c0a <+25>: sub rsp,0x7735940
0x0000000000400c11 <+32>: sub rsp,0x7735940
=> 0x0000000000400c18 <+39>: call 0x400fec <new_Main>
0x0000000000400c1d <+44>: mov r15,rax
0x0000000000400c20 <+47>: mov rax,r15
0x0000000000400c23 <+50>: add rax,0x20
0x0000000000400c27 <+54>: mov rax,QWORD PTR [rax]
0x0000000000400c2a <+57>: add rax,0x48
0x0000000000400c2e <+61>: mov rax,QWORD PTR [rax]
0x0000000000400c31 <+64>: call rax
0x0000000000400c33 <+66>: mov rax,0x0
0x0000000000400c3a <+73>: mov rsp,rbp
0x0000000000400c3d <+76>: pop rbp
0x0000000000400c3e <+77>: ret
I originally did one big "sub rsp, 0x..." and I thought breaking it up a bit would help (it didn't -- the program crashes at call either way). The total should be 500MB in this case.
What really confuses me is why it fails on "call <>" instead of one of the subs. And why it only fails some of the time rather than always or never.
Disclosure: this is a school project, but asking for help with general issues regarding x86 is not against any rules.
Update: based on @swift's comment, I set ulimit -s unlimited... and it now segfaults randomly? It seems random. It's not coming close to using the whole 500 MB buffer tho. It only allocates about 400 bytes total.
Subtracting something from RSP won’t cause any issues since nothing uses it. It’s just a register with a value, it doesn’t allocate anything. But when you use CALL then memory pointed by RSP is accessed and issues may happen. The stack usually isn’t very big so to your question “is there any reason you can’t take a GB of memory from the stack” the answer is “because the stack doesn’t have that much space to be used.”
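How much stack space the process is actually allowed can be queried at run time. This is a minimal sketch for POSIX systems using getrlimit with RLIMIT_STACK; stack_soft_limit is a hypothetical helper name, and the value it reports depends on the shell's ulimit -s setting.

```c
#include <assert.h>
#include <sys/resource.h>

/* Returns the soft stack limit in bytes, or RLIM_INFINITY if it is unlimited. */
static rlim_t stack_soft_limit(void) {
    struct rlimit lim;
    int rc = getrlimit(RLIMIT_STACK, &lim);
    assert(rc == 0);  /* not expected to fail with these arguments */
    (void)rc;         /* keeps the variable used when asserts are compiled out */
    return lim.rlim_cur;
}
```

Comparing the result with the roughly 500 MB that the sub instructions reserve shows whether the call instruction's write to [rsp] can possibly land in mapped stack memory.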
As for speed, allocating one big buffer on the stack isn’t really faster. Allocating and releasing a single big block of memory isn’t slower on the heap. What is slower on the heap than on the stack is making lots of allocations and releases. So there’s not much point in doing it on the stack in this case.
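One way to get the "allocate once, never free" behaviour from the heap is a bump allocator over a single malloc'd block. The sketch below is one possible shape for it under that assumption; the Arena type and the arena_* functions are hypothetical names, not an existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical arena: one big heap block handed out in pieces. */
typedef struct {
    unsigned char *base;
    size_t used;
    size_t cap;
} Arena;

/* Returns 1 on success, 0 if malloc failed. */
static int arena_init(Arena *a, size_t cap) {
    a->base = malloc(cap);
    a->used = 0;
    a->cap = a->base ? cap : 0;
    return a->base != NULL;
}

/* Returns n bytes at an offset that is a multiple of 16,
   or NULL when the arena is exhausted. */
static void *arena_alloc(Arena *a, size_t n) {
    assert(a != NULL && a->base != NULL);
    size_t start = (a->used + 15) & ~(size_t)15;  /* round up to 16 */
    if (start > a->cap || n > a->cap - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Releases the whole block at once. */
static void arena_destroy(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->used = a->cap = 0;
}
```

Each allocation is just an addition and a bounds check, so it costs about the same as carving space out of a stack buffer while avoiding the stack size limit.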
I had written a simple C program and was trying to use GDB to debug the program. I understand the use of the following in the main function:
On entry
push %ebp
mov %esp,%ebp
On exit
leave
ret
Then I tried gdb on _start and I got the following
xor %ebp,%ebp
pop %esi
mov %esp,%ecx
and $0xfffffff0,%esp
push %eax
push %esp
push %edx
push $0x80484d0
push $0x8048470
push %ecx
push %esi
push $0x8048414
call 0x8048328 <__libc_start_main@plt>
hlt
nop
nop
nop
nop
I am unable to understand these lines, and the logic behind this.
Can someone provide any guidance to help explain the code of _start?
Here is the well commented assembly source of the code you posted.
Summarized, it does the following things:
Establish a sentinel stack frame with ebp = 0 so code that walks the stack can find its end easily.
Pop the number of command line arguments into esi so we can pass it to __libc_start_main.
Align the stack pointer to a multiple of 16 bytes in order to comply with the ABI. This is not guaranteed to be the case in some versions of Linux so it has to be done manually just in case.
The addresses of __libc_csu_fini, __libc_csu_init, the argument vector, the number of arguments and the address of main are pushed as arguments to __libc_start_main
__libc_start_main is called. This function (source code here) sets up some glibc-internal variables and eventually calls main. It never returns.
If for any reason __libc_start_main should return, a hlt instruction is placed afterwards. This instruction is not allowed in user code and should cause the program to crash (hopefully).
The final series of nop instructions is padding inserted by the assembler so the next function starts at a multiple of 16 bytes for better performance. It is never reached in normal execution.
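The alignment step (and $0xfffffff0,%esp) is a bit mask that clears the low four bits of the stack pointer. A minimal C sketch of the same operation, with align_down_16 as a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Same effect as `and $0xfffffff0,%esp`: clear the low four bits,
   rounding the value down to a multiple of 16. */
static uint32_t align_down_16(uint32_t esp) {
    uint32_t aligned = esp & 0xfffffff0u;
    assert(aligned % 16 == 0 && aligned <= esp);
    return aligned;
}
```

For example, 0xbffff55c becomes 0xbffff550, while a value that is already a multiple of 16 is left unchanged. Rounding down is the safe direction because the stack grows toward lower addresses.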
For GNU tools, the _start label is the entry point of the program. For the C language to work, you need to have a stack, you need some memory/variables zeroed, and some set to the values you chose:
int x = 5;
int y;
int fun ( void )
{
    static int z; /* zero-initialized before main() runs, like y */
    return z;
}
All three of these variables, x, y and z, are essentially global; one is a local global. Since we wrote it that way, we assume that when our program starts x contains the value 5, and it is assumed that y is zero. In order for those things to happen, some bootstrap code is required, and that is what happens (and more) between _start and main().
Other toolchains may choose to use a different label to define the entry/start point, but GNU tools use _start. There may be other things your tools require before main() is called; C++, for example, requires more than C.
I'm viewing the stack at the beginning of main, but the ebp of main is missing.
I declared a variable to check where it would be located on the stack; it turns out that there are zeros between this variable and the return address into __libc_start_main!
System I'm using
I'm using fedora Linux 3.1.2-1.fc16.i686
ASLR is disabled.
Debugging with GDB.
Here's the code :
void main(){
char ret ='a';
}
Register information:
(gdb)
eax 0x1 1
ecx 0xbffff5f4 -1073744396
edx 0xbffff584 -1073744508
ebx 0x2dbff4 2998260
esp 0xbffff554 0xbffff554
**ebp 0xbffff558 0xbffff558**
esi 0x0 0
edi 0x0 0
eip 0x804839a 0x804839a <main+6>
stack
(gdb) x/8xw $esp
0xbffff554: 0x00000000(local var) 0x00000000(missing ebp!) 0x0014d6b3(return to libc_start) 0x00000001
0xbffff564: 0xbffff5f4 0xbffff5fc 0x00131fc4 0x0000082d
The only thing that I can think of is that the function prologue of the libc_start_main is not pushing the main's ebp for some reason !
Edit 1:
-compiled without optimization, just (gcc -ggdb file file.c)
Assembly of main ( gcc version 4.6.2 20111027 )
push %ebp
mov %esp,%ebp
sub $0x10,%esp
movb $0x61,-0x1(%ebp)
leave
ret
A breakpoint at the local variable to view the stack shows the same thing: the variable, followed by zeros, then the return to libc_start.
There's no requirement for a particular calling convention to be used in the assembly code generated by a compiler. That's why it's called a convention rather than a requirement :-)
In any case, you need to keep in mind that the 'normal' x86 calling convention for C requires the function itself to handle set-up and tear-down of the stack frame. In other words, this is the responsibility of main rather than the startup code (the code that generally runs before your main to set up the C runtime environment such as stack setup, creation of argc/argv, any library pre-initialisation and so on).
Additionally, the ebp pushed on to the stack is the previous value of ebp before the current stack frame is built.
Part of that build process for the current stack frame is the saving of the current ebp and then loading a new value into the ebp register to easily access passed parameters and locals.
You can see that by compiling your code snippet with gcc -S:
main:
pushl %ebp # Push PREVIOUS ebp.
movl %esp, %ebp # Load ebp for variable access.
subl $16, %esp # Allocate space on stack.
movb $97, -1(%ebp) # Store 'a' into variable.
leave # Tear down frame.
ret # Return to caller.
The first three lines and the last two are mirror images of each other, the set-up and tear-down code. There's a good chance in this case that the startup code had ebp set to zero, possibly because it didn't care - it doesn't have to worry about calling conventions other than to ensure argc and argv are there.
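To relate this to the stack dump in the question, the following C sketch computes where each piece of main's frame lands after the prologue. It only models the address arithmetic of the 32-bit prologue shown above; FrameLayout and layout_after_prologue are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t ebp;             /* value of ebp after `mov %esp,%ebp` */
    uint32_t saved_ebp_addr;  /* where `push %ebp` stored the caller's ebp */
    uint32_t ret_addr_addr;   /* where the return address sits */
    uint32_t local_addr;      /* address of the char at -1(%ebp) */
} FrameLayout;

/* entry_esp is esp on entry to main, i.e. pointing at the return address. */
static FrameLayout layout_after_prologue(uint32_t entry_esp) {
    FrameLayout f;
    assert(entry_esp >= 4);
    f.ret_addr_addr  = entry_esp;      /* pushed by the call into main */
    f.saved_ebp_addr = entry_esp - 4;  /* push %ebp */
    f.ebp            = entry_esp - 4;  /* mov %esp,%ebp */
    f.local_addr     = f.ebp - 1;      /* movb $0x61,-0x1(%ebp) */
    return f;
}
```

With the question's ebp of 0xbffff558, the entry esp would have been 0xbffff55c, which is where the dump shows the return address 0x0014d6b3. The word at 0xbffff558, labelled "missing ebp!" in the question, is then the slot holding the previous ebp, and its value of 0 would be consistent with startup code having zeroed ebp.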
If you compile without optimization, you'll almost certainly find that ebp/rbp is in fact pushed onto the stack and then set up based on esp/rsp. It is, however, done by main itself and not by libc as you appear to suggest.
Here is the assembly code produced by gcc 4.4.5:
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
movb $97, -1(%rbp)
leave
ret
.cfi_endproc
If you compile with optimization options, you might find that the entire body of main is optimized away (gcc -O3):
main:
.LFB0:
.cfi_startproc
rep
ret
.cfi_endproc
Instead of guessing, why not look at the disassembly (e.g. in gdb) to see what happens in your particular case?
Also, even in the unoptimized case you have to actually execute the function prologue for the registers to be set up the way you expect.
Finally, you should not be surprised when you see apparent gaps between data on the stack, as the stack is subject to alignment:
-mpreferred-stack-boundary=num
Attempt to keep the stack boundary aligned to a 2 raised to num
byte boundary. If -mpreferred-stack-boundary is not specified, the
default is 4 (16 bytes or 128 bits).
If you are compiling for x86_64 then ebp/rbp is callee saved. That means that main() should save it if it needs to use it. If not then there is no requirement for the old register value to be saved by either the callee or the caller.
See section 3.2 of the AMD64 ABI for more information if you are interested.