I found the sys_open code in the vmlinux binary:
c1143c20: 55 push ebp
c1143c21: 89 e5 mov ebp,esp
c1143c23: 83 ec 10 sub esp,0x10
c1143c26: 89 5d f4 mov DWORD PTR [ebp-0xc],ebx
c1143c29: 89 75 f8 mov DWORD PTR [ebp-0x8],esi
c1143c2c: 89 7d fc mov DWORD PTR [ebp-0x4],edi
**c1143c2f: e8 74 bb 46 00 call 0xc15af7a8**
c1143c34: b8 9c ff ff ff mov eax,0xffffff9c
c1143c39: 8b 7d 08 mov edi,DWORD PTR [ebp+0x8]
c1143c3c: 8b 75 0c mov esi,DWORD PTR [ebp+0xc]
c1143c3f: 8b 5d 10 mov ebx,DWORD PTR [ebp+0x10]
c1143c42: 89 fa mov edx,edi
c1143c44: 89 f1 mov ecx,esi
c1143c46: 89 1c 24 mov DWORD PTR [esp],ebx
c1143c49: e8 e2 fd ff ff call 0xc1143a30 // same as above here
c1143c4e: 8b 5d f4 mov ebx,DWORD PTR [ebp-0xc]
c1143c51: 8b 75 f8 mov esi,DWORD PTR [ebp-0x8]
c1143c54: 8b 7d fc mov edi,DWORD PTR [ebp-0x4]
c1143c57: 89 ec mov esp,ebp
c1143c59: 5d pop ebp
c1143c5a: c3 ret
c1143c5b: 90 nop
and this is the same function dumped from virtual memory:
.data:0x00000000 55 push ebp
.data:0x00000001 89e5 mov ebp,esp
.data:0x00000003 83ec10 sub esp,0x10
.data:0x00000006 895df4 mov DWORD PTR [ebp-0xc],ebx
.data:0x00000009 8975f8 mov DWORD PTR [ebp-0x8],esi
.data:0x0000000c 897dfc mov DWORD PTR [ebp-0x4],edi
**.data:0x0000000f 3e8d742600 lea esi,ds:[esi+eiz*1+0x0] **
**.data:0x00000014 b89cffffff mov eax,0xffffff9c**
.data:0x00000019 8b7d08 mov edi,DWORD PTR [ebp+0x8]
.data:0x0000001c 8b750c mov esi,DWORD PTR [ebp+0xc]
.data:0x0000001f 8b5d10 mov ebx,DWORD PTR [ebp+0x10]
.data:0x00000022 89fa mov edx,edi
.data:0x00000024 89f1 mov ecx,esi
.data:0x00000026 891c24 mov DWORD PTR [esp],ebx
.data:0x00000029 e8e2fdffff call func_fffffe10 // same
.data:0x0000002e 8b5df4 mov ebx,DWORD PTR [ebp-0xc]
.data:0x00000031 8b75f8 mov esi,DWORD PTR [ebp-0x8]
.data:0x00000034 8b7dfc mov edi,DWORD PTR [ebp-0x4]
.data:0x00000037 89ec mov esp,ebp
.data:0x00000039 5d pop ebp
.data:0x0000003a c3 ret
I don't understand why e8 74 bb 46 00 becomes 3e 8d 74 26 00 when loaded in memory. The code at address 0xc15af7a8 is a simple ret.
c15af7a8: c3 ret
0xc15af7a8 is called 26500 times in the vmlinux file. Why do we call a simple ret instruction?
My kernel is 3.2.0-23, with a default configuration. (no KASLR)
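The addresses in the two listings can be checked by hand: an e8 call stores a signed 32-bit displacement relative to the address of the next instruction. The sketch below (the function name is made up for illustration) decodes it. With the bytes from the listings it gives 0xc15af7a8 for the first call and 0xc1143a30 for the second; the same second call decoded at offset 0x29 gives 0xfffffe10, which is why the memory dump shows func_fffffe10.

```c
#include <stdint.h>

/* Target of a 5-byte x86 `call rel32` (opcode 0xE8).
 * insn_addr is the address of the 0xE8 byte, insn holds the 5 bytes.
 * The displacement is little-endian, signed, and relative to the
 * address of the next instruction (insn_addr + 5). */
uint32_t call_rel32_target(uint32_t insn_addr, const uint8_t insn[5])
{
    uint32_t disp = (uint32_t)insn[1]
                  | ((uint32_t)insn[2] << 8)
                  | ((uint32_t)insn[3] << 16)
                  | ((uint32_t)insn[4] << 24);
    /* Unsigned arithmetic wraps modulo 2^32, like a 32-bit address. */
    return insn_addr + 5u + disp;
}
```

The five replacement bytes 3e 8d 74 26 00 are the same length as the call and, as the second listing shows, disassemble to an lea that loads esi with its own value, so that slot no longer transfers control anywhere.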
The seemingly useless ret is a stub: the call to it is replaced by the right code once the memory has been mapped.
The code of system calls may be located at different places depending on some non-deterministic choices, and once the memory address is known, the stub is replaced.
Short Story
I am writing a simple program in assembly to simulate a buffer overflow. The buffer is simply 512 bytes allocated on the stack, and the read() syscall is then called to read up to 4096 bytes from the stdin fd into it.
The buffer overflow works perfectly when I run the payload outside GDB. But when I run it inside GDB, the read() syscall returns EFAULT.
In this case, the buffer overflow is supposed to replace the return address and make %rip reach secret_func.
Question
Why does the buffer overflow not work inside GDB in this case?
Resources
Code test.S
.section .rodata
str1:
.ascii "Enter the input: "
str2:
.ascii "\nYou find a secret function!\n"
str_end:
.section .text
.global _start
_start:
xorl %ebp, %ebp
andq $-16, %rsp
callq main
_exit:
movl %eax, %edi
movl $60, %eax
syscall
main:
subq $512, %rsp
movl $1, %eax
movl $1, %edi
leaq str1(%rip), %rsi
movl $(str2 - str1), %edx
syscall
xorl %eax, %eax
xorl %edi, %edi
movq %rsp, %rsi
movl $4096, %edx # Intentional to create buffer overflow
syscall
addq $512, %rsp
xorl %eax, %eax
retq
# We reach this function via buffer overflow (replace return address)
secret_func:
movl $1, %eax
movl $1, %edi
leaq str2(%rip), %rsi
movl $(str_end - str2), %edx
syscall
xorl %eax, %eax
jmp _exit
objdump of compiled ELF
Disassembly of section .text:
0000000000401000 <_start>:
401000: 31 ed xor %ebp,%ebp
401002: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
401006: e8 09 00 00 00 call 401014 <main>
000000000040100b <_exit>:
40100b: 89 c7 mov %eax,%edi
40100d: b8 3c 00 00 00 mov $0x3c,%eax
401012: 0f 05 syscall
0000000000401014 <main>:
401014: 48 81 ec 00 02 00 00 sub $0x200,%rsp
40101b: b8 01 00 00 00 mov $0x1,%eax
401020: bf 01 00 00 00 mov $0x1,%edi
401025: 48 8d 35 d4 0f 00 00 lea 0xfd4(%rip),%rsi # 402000 <str1>
40102c: ba 11 00 00 00 mov $0x11,%edx
401031: 0f 05 syscall
401033: 31 c0 xor %eax,%eax
401035: 31 ff xor %edi,%edi
401037: 48 89 e6 mov %rsp,%rsi
40103a: ba 00 10 00 00 mov $0x1000,%edx
40103f: 0f 05 syscall
401041: 48 81 c4 00 02 00 00 add $0x200,%rsp
401048: 31 c0 xor %eax,%eax
40104a: c3 ret
000000000040104b <secret_func>:
40104b: b8 01 00 00 00 mov $0x1,%eax
401050: bf 01 00 00 00 mov $0x1,%edi
401055: 48 8d 35 b5 0f 00 00 lea 0xfb5(%rip),%rsi # 402011 <str2>
40105c: ba 1d 00 00 00 mov $0x1d,%edx
401061: 0f 05 syscall
401063: 31 c0 xor %eax,%eax
401065: eb a4 jmp 40100b <_exit>
Reproduction Steps
Compile and run without GDB (working fine)
In this case, we calculate the offset of return address and replace it with secret_func address.
ammarfaizi2@integral:/tmp$ gcc -O3 -no-pie -static -nostartfiles -ffreestanding test.S -o test
ammarfaizi2@integral:/tmp$ perl -e 'print "A"x512,"\x4b\x10\x40","\x00"x5' > payload
ammarfaizi2@integral:/tmp$ ./test < payload
Enter the input:
You find a secret function!
ammarfaizi2@integral:/tmp$
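For reference, the perl one-liner can be written out as a small C helper. This is only a sketch of the same layout; the names build_payload, FILLER_LEN and PAYLOAD_LEN are made up here. main reserves 512 bytes with subq $512, %rsp and reads into %rsp, so the saved return address sits directly after the 512 filler bytes, and the address of secret_func (0x40104b in the objdump above) is stored there as 8 little-endian bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { FILLER_LEN = 512, PAYLOAD_LEN = FILLER_LEN + 8 };

/* Fill `out` (at least PAYLOAD_LEN bytes) with 512 filler bytes followed
 * by `ret_addr` as an 8-byte little-endian value. Returns the length. */
size_t build_payload(uint8_t *out, uint64_t ret_addr)
{
    memset(out, 'A', FILLER_LEN);
    for (int i = 0; i < 8; i++)
        out[FILLER_LEN + i] = (uint8_t)(ret_addr >> (8 * i));
    return PAYLOAD_LEN;
}
```

Calling build_payload(buf, 0x40104b) and writing the 520 bytes to a file should produce the same payload as the perl command.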
Compile and run inside the GDB (read() returns -14 (-EFAULT))
Stepping over the read() syscall shows that it returns -14. It does not read from stdin at all.
gef➤ b main
Breakpoint 1 at 0x401014
gef➤ r < input
[... GEF output elided ...]
gef➤ si 11
[... GEF output elided ...]
gef➤ x/5i $rip
=> 0x401041 <main+45>: add $0x200,%rsp
0x401048 <main+52>: xor %eax,%eax
0x40104a <main+54>: ret
0x40104b <secret_func>: mov $0x1,%eax
0x401050 <secret_func+5>: mov $0x1,%edi
gef➤ p/d $rax
$2 = -14
gef➤ shell errno 14
EFAULT 14 Bad address
gef➤
GDB and Linux Version
ammarfaizi2@integral:/tmp$ gdb --version
GNU gdb (Ubuntu 10.1-2ubuntu2) 10.1.90.20210411-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
ammarfaizi2@integral:/tmp$ uname -r
5.13.0-rc2-fresh-tea-00005-g8ac91e6c6033
ammarfaizi2@integral:/tmp$
For an assignment, I wrote the following assembly code shell_exec.asm that should execute a shell in Linux:
section .data ; declare stuff
arg0 db "/bin/sh",0 ; 1st arg
align 4
argv dd arg0, 0 ; 2nd arg
envp dd 0 ; 3rd arg
section .text
global _start
_start:
mov eax, 0x0b ; execve
mov ebx, arg0 ; 1st arg
mov ecx, argv ; 2nd arg
mov edx, envp ; 3rd arg
int 0x80 ; kernel
I used nasm -f elf32 shell_exec.asm for compilation and ld -m elf_i386 -o shell_exec shell_exec.o for linking. Everything works so far and if I run ./shell_exec the shell spawns the way I want.
Now I wanted to extract the shell code (like \12\34\ab\cd\ef...) from this program. I used objdump -D -z shell_exec to show all parts of the code including the section .data and all zeroes. The output is as follows:
shell_exec: file format elf32-i386
Disassembly of section .text:
08049000 <_start>:
8049000: b8 0b 00 00 00 mov $0xb,%eax
8049005: bb 00 a0 04 08 mov $0x804a000,%ebx
804900a: b9 08 a0 04 08 mov $0x804a008,%ecx
804900f: ba 10 a0 04 08 mov $0x804a010,%edx
8049014: cd 80 int $0x80
Disassembly of section .data:
0804a000 <arg0>:
804a000: 2f das
804a001: 62 69 6e bound %ebp,0x6e(%ecx)
804a004: 2f das
804a005: 73 68 jae 804a06f <__bss_start+0x5b>
804a007: 00 add %al,(%eax)
0804a008 <argv>:
804a008: 00 a0 04 08 00 00 add %ah,0x804(%eax)
804a00e: 00 00 add %al,(%eax)
0804a010 <envp>:
804a010: 00 00 add %al,(%eax)
804a012: 00 00 add %al,(%eax)
If I only have a section .text within my assembly code, I can usually just copy all the listed byte values and use them as my shellcode. But in what order do the bytes go when I have two sections, namely .data and .text?
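As a side note on the copying step itself, turning raw bytes into the escaped form is mechanical. The helper below is a hypothetical sketch (the name bytes_to_escapes is not from any tool) that prints each byte as \xNN. It does not settle the ordering question: the .text bytes above embed the absolute addresses 0x804a000, 0x804a008 and 0x804a010, which only refer to arg0, argv and envp inside this particular executable.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write `len` bytes as "\xNN" escapes into `out`.
 * `out` must have room for 4 * len + 1 characters. */
void bytes_to_escapes(const uint8_t *bytes, size_t len, char *out)
{
    for (size_t i = 0; i < len; i++)
        sprintf(out + 4 * i, "\\x%02x", bytes[i]);
    out[4 * len] = '\0';   /* also terminates the string when len == 0 */
}
```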
Edit 1
So, my second attempt is to do the assembly code like this:
section .text
global _start
_start:
mov ebp, esp
xor eax, eax
push eax ; -4
push "/sh " ; -8
push "/bin" ; -12
xor eax, eax
push eax
lea ebx, [ebp-12]
push ebx ; 1st arg
mov ecx, esp ; 2nd arg
lea edx, [ebp-4] ; 3rd arg
mov eax, 0x0b ; execve
int 0x80 ; kernel
This avoids using multiple sections, but sadly leads to a segmentation fault.
I'm using a 64-bit Ubuntu 18.04.3 LTS VM and I'm trying to write a simple x64 assembly code that will print "Owned!!!".
Because I don't want any 0x00 or 0x0a bytes and I want the code to be position independent (because I'm learning how to write shellcodes), I wrote it this way:
;hello4.asm attempts to make the code position independent
section .text
global _start
_start:
;clear out the registers we are going to need
xor rax, rax
xor rbx, rbx
xor rcx, rcx
xor rdx, rdx
;write(int fd, char *msg, unsigned int len)
mov al, 4
mov bl, 1
;Owned!!! = 4f,77,6e,65,64,21,21,21
;push !,!,!,d
push 0x21212164
;push e,n,w,O
push 0x656e774f
mov rcx, rsp
mov dl, 8
int 0x80
;exit(int ret)
mov al,1
xor rbx, rbx
int 0x80
This is the output that I'm getting:
user@PC:~/Desktop/exploitsclass/hello_shellcode$ nasm -f elf64 hello4.asm
user@PC:~/Desktop/exploitsclass/hello_shellcode$ ld hello4.o -o hello4
user@PC:~/Desktop/exploitsclass/hello_shellcode$ objdump -d hello4 -M intel
hello4: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: 48 31 c0 xor rax,rax
400083: 48 31 db xor rbx,rbx
400086: 48 31 c9 xor rcx,rcx
400089: 48 31 d2 xor rdx,rdx
40008c: b0 04 mov al,0x4
40008e: b3 01 mov bl,0x1
400090: 68 64 21 21 21 push 0x21212164
400095: 68 4f 77 6e 65 push 0x656e774f
40009a: 48 89 e1 mov rcx,rsp
40009d: b2 08 mov dl,0x8
40009f: cd 80 int 0x80
4000a1: b0 01 mov al,0x1
4000a3: 48 31 db xor rbx,rbx
4000a6: cd 80 int 0x80
user@PC:~/Desktop/exploitsclass/hello_shellcode$ ./hello4
Segmentation fault (core dumped)
How do I fix this?
UPDATE:
I've understood that int 0x80 is intended for 32-bit programs and I should use syscall instead and that syscall has different ids for each system call.
The new code is:
;hello4.asm attempts to make the code position independent
section .text
global _start
_start:
;clear out the registers we are going to need
xor rax, rax
xor rsi, rsi
xor rdi, rdi
xor rdx, rdx
;write(int fd, char *msg, unsigned int len)
mov al, 1
add di, 1
;Owned!!! = 4f,77,6e,65,64,21,21,21
;push !,!,!,d
push 0x21212164
;push e,n,w,O
push 0x656e774f
mov rsi, rsp
mov dl, 8
syscall
;exit(int ret)
mov al, 60
xor rdi, rdi
syscall
The output is Owne% instead of Owned!!! now.
It still needs to be fixed.
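The truncated output is consistent with how push behaves in 64-bit mode: push imm32 sign-extends the immediate and stores a full 8-byte slot, so the two halves of the string are not adjacent on the stack. The following is a small model of that behaviour, not real stack code; push_imm32 and the byte-array "stack" are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of `push imm32` in 64-bit mode: the immediate is sign-extended to
 * 64 bits and stored little-endian in an 8-byte slot below the old top.
 * `stack` is a plain byte array and `*sp` an index playing the role of rsp. */
void push_imm32(uint8_t *stack, size_t *sp, int32_t imm)
{
    uint64_t value = (uint64_t)(int64_t)imm;
    *sp -= 8;
    for (int i = 0; i < 8; i++)
        stack[*sp + i] = (uint8_t)(value >> (8 * i));
}
```

After pushing 0x21212164 and then 0x656e774f, the 16 bytes starting at the new top are "Owne", four zero bytes, "d!!!", four zero bytes. Writing 8 bytes from there gives "Owne" plus four NULs, which matches the output above (the trailing % is probably added by the shell because no newline was printed).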
With the help of @CertainLach I've written the correct code:
;hello4.asm attempts to make the code position independent
section .text
global _start
_start:
;clear out the registers we are going to need
xor rax, rax
xor rsi, rsi
xor rdi, rdi
xor rdx, rdx
;write(int fd, char *msg, unsigned int len)
mov al, 1
add di, 1
;Owned!!! = 4f,77,6e,65,64,21,21,21
mov rsi, 0x21212164656e774f
push rsi
mov rsi, rsp
mov dl, 8
syscall
;exit(int ret)
mov al, 60
xor rdi, rdi
syscall
This code contains no null bytes or 0x0a bytes and it's position-independent, as the following disassembly shows:
user@PC:~/Desktop/exploitsclass/hello_shellcode$ objdump -d hello4 -M intel
hello4: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: 48 31 c0 xor rax,rax
400083: 48 31 f6 xor rsi,rsi
400086: 48 31 ff xor rdi,rdi
400089: 48 31 d2 xor rdx,rdx
40008c: b0 01 mov al,0x1
40008e: 66 83 c7 01 add di,0x1
400092: 48 be 4f 77 6e 65 64 movabs rsi,0x21212164656e774f
400099: 21 21 21
40009c: 56 push rsi
40009d: 48 89 e6 mov rsi,rsp
4000a0: b2 08 mov dl,0x8
4000a2: 0f 05 syscall
4000a4: b0 3c mov al,0x3c
4000a6: 48 31 ff xor rdi,rdi
4000a9: 0f 05 syscall
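Checking a listing for forbidden bytes by eye is error-prone, so here is a minimal sketch of an automatic check. The function name first_bad_byte is made up, and the set of bad bytes (0x00 and 0x0a) is simply the one stated in the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first 0x00 or 0x0a byte in code[0..len),
 * or -1 if the buffer contains neither. */
long first_bad_byte(const uint8_t *code, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0x00 || code[i] == 0x0a)
            return (long)i;
    return -1;
}
```

Fed the 43 bytes from the disassembly above it returns -1, whereas an encoding such as b8 0b 00 00 00 (mov eax, 0xb) is flagged at index 2.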
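This variant also prints the text and is one byte shorter, but it uses 16 bytes of stack and writes 16 bytes: in 64-bit mode each push stores 8 bytes, so the output contains NUL padding bytes that a terminal typically does not display:
user@PC:~/Desktop/exploitsclass/hello_shellcode$ cat hello4.asm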
;hello4.asm attempts to make the code position independent
section .text
global _start
_start:
;clear out the registers we are going to need
xor rax, rax
xor rsi, rsi
xor rdi, rdi
xor rdx, rdx
;write(int fd, char *msg, unsigned int len)
mov al, 1
add di, 1
;Owned!!! = 4f,77,6e,65,64,21,21,21
;push !,!,!,d
push 0x21212164
;push e,n,w,O
push 0x656e774f
mov rsi, rsp
mov dl, 16
syscall
;exit(int ret)
mov al, 60
xor rdi, rdi
syscall
user@PC:~/Desktop/exploitsclass/hello_shellcode$ objdump -d hello4 -M intel
hello4: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: 48 31 c0 xor rax,rax
400083: 48 31 f6 xor rsi,rsi
400086: 48 31 ff xor rdi,rdi
400089: 48 31 d2 xor rdx,rdx
40008c: b0 01 mov al,0x1
40008e: 66 83 c7 01 add di,0x1
400092: 68 64 21 21 21 push 0x21212164
400097: 68 4f 77 6e 65 push 0x656e774f
40009c: 48 89 e6 mov rsi,rsp
40009f: b2 10 mov dl,0x10
4000a1: 0f 05 syscall
4000a3: b0 3c mov al,0x3c
4000a5: 48 31 ff xor rdi,rdi
4000a8: 0f 05 syscall
Thank you so much!
I can't reply to your comment, so I'll answer here: you can't just change int 0x80 to syscall to make it work, because the system call numbers differ. For example, sys_write, which you use here, has id 4 with int 0x80 and id 1 with syscall.
Here you can see numbers for syscall
And here for int 0x80
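To make the difference concrete, here are the numbers for a few common calls side by side, as a small lookup table. The struct and function names are made up for illustration; the numbers are the Linux i386 (int 0x80) and x86-64 (syscall) values for these four calls only.

```c
#include <stddef.h>
#include <string.h>

/* Linux system call numbers for the two x86 entry mechanisms. */
struct sysnum {
    const char *name;
    int i386;     /* number placed in eax for int 0x80 */
    int x86_64;   /* number placed in rax for syscall  */
};

static const struct sysnum SYSNUMS[] = {
    { "exit",    1, 60 },
    { "read",    3,  0 },
    { "write",   4,  1 },
    { "execve", 11, 59 },
};

/* Look a call up by name; returns NULL if it is not in the small table. */
const struct sysnum *find_sysnum(const char *name)
{
    for (size_t i = 0; i < sizeof SYSNUMS / sizeof SYSNUMS[0]; i++)
        if (strcmp(SYSNUMS[i].name, name) == 0)
            return &SYSNUMS[i];
    return NULL;
}
```

This is why the updated code in the question uses mov al, 1 for write and mov al, 60 for exit instead of 4 and 1.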
I am currently practicing reading assembly by disassembling C programs and trying to understand what they do.
I am stuck with a trivial one: a simple hello world program.
#include <stdio.h>
#include <stdlib.h>
int main() {
printf("Hello, world!");
return(0);
}
When I disassemble the main:
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000400526 <+0>: push rbp
0x0000000000400527 <+1>: mov rbp,rsp
0x000000000040052a <+4>: mov edi,0x4005c4
0x000000000040052f <+9>: mov eax,0x0
0x0000000000400534 <+14>: call 0x400400 <printf@plt>
0x0000000000400539 <+19>: mov eax,0x0
0x000000000040053e <+24>: pop rbp
0x000000000040053f <+25>: ret
I understand the first two lines: the base pointer is saved on the stack (by push rbp, which causes the value of the stack pointer to be decreased by 8, because it has "grown") and the value of the stack pointer is saved in the base pointer (so that parameters and local variable can be easily reached through positive and negative offsets, respectively, while the stack can keep "growing").
The third line presents the first issue: why is 0x4005c4 (the address of the "Hello, World!" string) moved into the edi register instead of being pushed onto the stack? Shouldn't the printf function take the address of that string as a parameter? As far as I know, functions take parameters from the stack (but here, it looks like the parameter is put in a register: edi).
In another post here on Stack Overflow I read that "printf@plt" is like a stub function that calls the real printf function. I tried to disassemble that function, but it gets even more confusing:
(gdb) disassemble printf
Dump of assembler code for function __printf:
0x00007ffff7a637b0 <+0>: sub rsp,0xd8
0x00007ffff7a637b7 <+7>: test al,al
0x00007ffff7a637b9 <+9>: mov QWORD PTR [rsp+0x28],rsi
0x00007ffff7a637be <+14>: mov QWORD PTR [rsp+0x30],rdx
0x00007ffff7a637c3 <+19>: mov QWORD PTR [rsp+0x38],rcx
0x00007ffff7a637c8 <+24>: mov QWORD PTR [rsp+0x40],r8
0x00007ffff7a637cd <+29>: mov QWORD PTR [rsp+0x48],r9
0x00007ffff7a637d2 <+34>: je 0x7ffff7a6380b <__printf+91>
0x00007ffff7a637d4 <+36>: movaps XMMWORD PTR [rsp+0x50],xmm0
0x00007ffff7a637d9 <+41>: movaps XMMWORD PTR [rsp+0x60],xmm1
0x00007ffff7a637de <+46>: movaps XMMWORD PTR [rsp+0x70],xmm2
0x00007ffff7a637e3 <+51>: movaps XMMWORD PTR [rsp+0x80],xmm3
0x00007ffff7a637eb <+59>: movaps XMMWORD PTR [rsp+0x90],xmm4
0x00007ffff7a637f3 <+67>: movaps XMMWORD PTR [rsp+0xa0],xmm5
0x00007ffff7a637fb <+75>: movaps XMMWORD PTR [rsp+0xb0],xmm6
0x00007ffff7a63803 <+83>: movaps XMMWORD PTR [rsp+0xc0],xmm7
0x00007ffff7a6380b <+91>: lea rax,[rsp+0xe0]
0x00007ffff7a63813 <+99>: mov rsi,rdi
0x00007ffff7a63816 <+102>: lea rdx,[rsp+0x8]
0x00007ffff7a6381b <+107>: mov QWORD PTR [rsp+0x10],rax
0x00007ffff7a63820 <+112>: lea rax,[rsp+0x20]
0x00007ffff7a63825 <+117>: mov DWORD PTR [rsp+0x8],0x8
0x00007ffff7a6382d <+125>: mov DWORD PTR [rsp+0xc],0x30
0x00007ffff7a63835 <+133>: mov QWORD PTR [rsp+0x18],rax
0x00007ffff7a6383a <+138>: mov rax,QWORD PTR [rip+0x36d70f] # 0x7ffff7dd0f50
0x00007ffff7a63841 <+145>: mov rdi,QWORD PTR [rax]
0x00007ffff7a63844 <+148>: call 0x7ffff7a5b130 <_IO_vfprintf_internal>
0x00007ffff7a63849 <+153>: add rsp,0xd8
0x00007ffff7a63850 <+160>: ret
End of assembler dump.
The two mov operations on eax (mov eax, 0x0) bother me a little as well, since I don't get their role here (but I am more concerned with what I have just described).
Thank you in advance.
gcc is targeting the x86-64 System V ABI, used by all x86-64 systems other than Windows (for various historical reasons). Its calling convention passes the first few args in registers before falling back to the stack. (See also the Wikipedia basic summary of this calling convention.)
And yes, this is different from the crusty old 32-bit calling conventions that use the stack for everything. This is a Good Thing. See also the x86 tag wiki for more links to ABI docs, and tons of other stuff.
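As a rough illustration of "the first few args in registers": for integer and pointer arguments, the x86-64 System V convention uses six registers in a fixed order and then the stack. The helper name below is made up, and the sketch ignores floating-point and aggregate arguments.

```c
/* Location of the n-th (0-based) integer or pointer argument under the
 * x86-64 System V calling convention: six registers, then the stack. */
const char *sysv_int_arg_location(unsigned n)
{
    static const char *const regs[] = {
        "rdi", "rsi", "rdx", "rcx", "r8", "r9"
    };
    return n < 6 ? regs[n] : "stack";
}
```

In the hello-world disassembly the format string is argument 0, which is why its address goes into edi (the low half of rdi).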
0x0000000000400526: push rbp
0x0000000000400527: mov rbp,rsp # stack-frame boilerplate
0x000000000040052a: mov edi,0x4005c4 # first arg
0x000000000040052f: mov eax,0x0 # 0 FP args in vector registers
0x0000000000400534: call 0x400400 <printf@plt>
0x0000000000400539: mov eax,0x0 # return 0. If you'd compiled with optimization, this and the previous mov would be xor eax,eax
0x000000000040053e: pop rbp # clean up stack frame
0x000000000040053f: ret
Pointers to static data fit into 32 bits, which is why it can use mov edi, imm32 instead of movabs rdi, imm64.
Floating-point args are passed in SSE registers (xmm0-xmm7), even to var-args functions. al indicates how many FP args are in vector registers. (Note that C's type promotion rules mean that float args to variadic functions are always promoted to double, which is why printf doesn't have any format specifiers for float, only double and long double).
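The promotion rule is easy to see from C. In this sketch (sum_fp_args is a made-up name) the callee always reads double, even when the caller passes float, because the default argument promotions convert float to double for variadic calls.

```c
#include <stdarg.h>

/* Sum `count` floating-point variadic arguments. Callers may pass float
 * or double; float is promoted to double at the call site, so va_arg
 * must always ask for double. */
double sum_fp_args(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}
```

When a call such as sum_fp_args(2, 1.5f, 2.5f) is compiled for x86-64 System V, the compiler is expected to set al before the call to tell the callee how many vector registers carry arguments, which is the role of mov eax, 0x0 in the printf call above.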
printf@plt is like a stub function that calls the real printf function.
Yes, that's right. The Procedure Linkage Table entry is an indirect jmp through a slot in the Global Offset Table. With lazy binding, that slot initially leads to a dynamic linker routine that resolves the symbol and updates the slot, so that later calls jump directly to the address where libc's printf definition is mapped. printf is a weak alias for __printf, which is why gdb chooses the __printf label for that address, after you asked for disassembly of printf.
Dump of assembler code for function __printf:
0x00007ffff7a637b0 <+0>: sub rsp,0xd8 # reserve space
0x00007ffff7a637b7 <+7>: test al,al # check if there were any FP args
0x00007ffff7a637b9 <+9>: mov QWORD PTR [rsp+0x28],rsi # store the integer arg-passing registers to local scratch space
0x00007ffff7a637be <+14>: mov QWORD PTR [rsp+0x30],rdx
0x00007ffff7a637c3 <+19>: mov QWORD PTR [rsp+0x38],rcx
0x00007ffff7a637c8 <+24>: mov QWORD PTR [rsp+0x40],r8
0x00007ffff7a637cd <+29>: mov QWORD PTR [rsp+0x48],r9
0x00007ffff7a637d2 <+34>: je 0x7ffff7a6380b <__printf+91> # skip storing the FP arg-passing regs if there were no FP args
0x00007ffff7a637d4 <+36>: movaps XMMWORD PTR [rsp+0x50],xmm0
0x00007ffff7a637d9 <+41>: movaps XMMWORD PTR [rsp+0x60],xmm1
0x00007ffff7a637de <+46>: movaps XMMWORD PTR [rsp+0x70],xmm2
0x00007ffff7a637e3 <+51>: movaps XMMWORD PTR [rsp+0x80],xmm3
0x00007ffff7a637eb <+59>: movaps XMMWORD PTR [rsp+0x90],xmm4
0x00007ffff7a637f3 <+67>: movaps XMMWORD PTR [rsp+0xa0],xmm5
0x00007ffff7a637fb <+75>: movaps XMMWORD PTR [rsp+0xb0],xmm6
0x00007ffff7a63803 <+83>: movaps XMMWORD PTR [rsp+0xc0],xmm7
branch_target_from_test_je:
0x00007ffff7a6380b <+91>: lea rax,[rsp+0xe0] # some more stuff
So printf's implementation keeps the var-args handling simple by storing all the arg-passing registers (except the first one holding the format string) in order to local arrays. It can walk a pointer through them instead of needing switch-like code to extract the right integer or FP arg. It still needs to keep track of the first 5 integer and first 8 FP args, because they aren't contiguous with the rest of the args pushed by the caller onto the stack.
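The same structure is visible at the C level: a variadic front end sets up a va_list and hands it to a v* function that does the real work, much as __printf above ends by calling _IO_vfprintf_internal. A minimal sketch, with format_into as a made-up name:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* printf-style wrapper: captures its variadic arguments in a va_list and
 * forwards them to vsnprintf, which walks the list. Returns vsnprintf's
 * result (the length the full output would have). */
int format_into(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

On x86-64 System V, va_start is where the register-save area described above gets recorded; the stores of 0x8 and 0x30 in the __printf listing look like the initial offsets into that area (one integer register already consumed by the format string, no vector registers consumed yet).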
The Windows 64-bit calling convention's shadow space simplifies this by providing space for a function to dump its register args to the stack contiguous with the args already on the stack, but that's not worth wasting 32 bytes of stack on every call, IMO. (See my answer and comments on other answers on Why does Windows64 use a different calling convention from all other OSes on x86-64?)
there is nothing trivial about printf. it is not the best first choice for what you are trying to do, but it turned out to be not overly complicated.
Something simpler:
extern unsigned int more_fun ( unsigned int );
unsigned int fun ( unsigned int x )
{
return(more_fun(x)+7);
}
0000000000000000 <fun>:
0: 48 83 ec 08 sub $0x8,%rsp
4: e8 00 00 00 00 callq 9 <fun+0x9>
9: 48 83 c4 08 add $0x8,%rsp
d: 83 c0 07 add $0x7,%eax
10: c3 retq
and the stack is used. eax used for the return.
now use a pointer
extern unsigned int more_fun ( unsigned int * );
unsigned int fun ( unsigned int x )
{
return(more_fun(&x)+7);
}
0000000000000000 <fun>:
0: 48 83 ec 18 sub $0x18,%rsp
4: 89 7c 24 0c mov %edi,0xc(%rsp)
8: 48 8d 7c 24 0c lea 0xc(%rsp),%rdi
d: e8 00 00 00 00 callq 12 <fun+0x12>
12: 48 83 c4 18 add $0x18,%rsp
16: 83 c0 07 add $0x7,%eax
19: c3 retq
and there you go edi used as in your case.
two pointers
extern unsigned int more_fun ( unsigned int *, unsigned int * );
unsigned int fun ( unsigned int x, unsigned int y )
{
return(more_fun(&x,&y)+7);
}
0000000000000000 <fun>:
0: 48 83 ec 18 sub $0x18,%rsp
4: 89 7c 24 0c mov %edi,0xc(%rsp)
8: 89 74 24 08 mov %esi,0x8(%rsp)
c: 48 8d 7c 24 0c lea 0xc(%rsp),%rdi
11: 48 8d 74 24 08 lea 0x8(%rsp),%rsi
16: e8 00 00 00 00 callq 1b <fun+0x1b>
1b: 48 83 c4 18 add $0x18,%rsp
1f: 83 c0 07 add $0x7,%eax
22: c3 retq
now edi and esi are used. all looking like it is the calling convention to me...
a string
extern unsigned int more_fun ( const char * );
unsigned int fun ( void )
{
return(more_fun("Hello World")+7);
}
0000000000000000 <fun>:
0: 48 83 ec 08 sub $0x8,%rsp
4: bf 00 00 00 00 mov $0x0,%edi
9: e8 00 00 00 00 callq e <fun+0xe>
e: 48 83 c4 08 add $0x8,%rsp
12: 83 c0 07 add $0x7,%eax
15: c3 retq
eax is not prepped as in printf, so perhaps eax has something to do with the number of parameters that follow, try putting more parameters on your printf and see if eax going in changes.
if I add -m32 on my command line then edi is not used.
00000000 <fun>:
0: 83 ec 18 sub $0x18,%esp
3: 68 00 00 00 00 push $0x0
8: e8 fc ff ff ff call 9 <fun+0x9>
d: 83 c4 1c add $0x1c,%esp
10: 83 c0 07 add $0x7,%eax
13: c3 ret
I suspect the push is a placeholder for the linker to push the address to the string when the linker patches up the binary, this was just an object. So my guess is when you have a 64 bit pointer, the first one or two go into registers then the stack is used after it runs out of registers.
Obviously the compiler works so this is conforming to the compilers calling convention.
extern unsigned int more_fun ( unsigned int );
unsigned int fun ( unsigned int x )
{
return(more_fun(x+5)+7);
}
0000000000000000 <fun>:
0: 48 83 ec 08 sub $0x8,%rsp
4: 83 c7 05 add $0x5,%edi
7: e8 00 00 00 00 callq c <fun+0xc>
c: 48 83 c4 08 add $0x8,%rsp
10: 83 c0 07 add $0x7,%eax
13: c3 retq
correction based on Peter's comment. Yeah it does appear that registers are being used here.
And since he mentioned 6 parameters, let's try 7.
extern unsigned int more_fun
(
unsigned int,
unsigned int,
unsigned int,
unsigned int,
unsigned int,
unsigned int,
unsigned int
);
unsigned int fun (
unsigned int a,
unsigned int b,
unsigned int c,
unsigned int d,
unsigned int e,
unsigned int f,
unsigned int g
)
{
return(more_fun(a+1,b+2,c+3,d+4,e+5,f+6,g+7)+17);
}
0000000000000000 <fun>:
0: 48 83 ec 10 sub $0x10,%rsp
4: 83 c1 04 add $0x4,%ecx
7: 83 c2 03 add $0x3,%edx
a: 8b 44 24 18 mov 0x18(%rsp),%eax
e: 83 c6 02 add $0x2,%esi
11: 83 c7 01 add $0x1,%edi
14: 41 83 c1 06 add $0x6,%r9d
18: 41 83 c0 05 add $0x5,%r8d
1c: 83 c0 07 add $0x7,%eax
1f: 50 push %rax
20: e8 00 00 00 00 callq 25 <fun+0x25>
25: 48 83 c4 18 add $0x18,%rsp
29: 83 c0 11 add $0x11,%eax
2c: c3 retq
and sure enough that 7th parameter was pulled from the stack modified and put back on the stack before the call. The other 6 in registers.
I'm reading http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html and trying to verify things by hand.
The disassembly of _start is given as follows:
080482e0 <_start>:
80482e0: 31 ed xor %ebp,%ebp
80482e2: 5e pop %esi
80482e3: 89 e1 mov %esp,%ecx
80482e5: 83 e4 f0 and $0xfffffff0,%esp
80482e8: 50 push %eax
80482e9: 54 push %esp
80482ea: 52 push %edx
80482eb: 68 00 84 04 08 push $0x8048400
80482f0: 68 a0 83 04 08 push $0x80483a0
80482f5: 51 push %ecx
80482f6: 56 push %esi
80482f7: 68 94 83 04 08 push $0x8048394
80482fc: e8 c3 ff ff ff call 80482c4 <__libc_start_main@plt>
8048301: f4 hlt
However my own disassembly is as follows:
0x00000000004003c0 <+0>: xor ebp,ebp
0x00000000004003c2 <+2>: mov r9,rdx
0x00000000004003c5 <+5>: pop rsi
0x00000000004003c6 <+6>: mov rdx,rsp
0x00000000004003c9 <+9>: and rsp,0xfffffffffffffff0
0x00000000004003cd <+13>: push rax
0x00000000004003ce <+14>: push rsp
0x00000000004003cf <+15>: mov r8,0x400650
0x00000000004003d6 <+22>: mov rcx,0x4005c0
0x00000000004003dd <+29>: mov rdi,0x40051c
0x00000000004003e4 <+36>: call 0x4003b0 <__libc_start_main@plt>
0x00000000004003e9 <+41>: hlt
0x00000000004003ea <+42>: nop
0x00000000004003eb <+43>: nop
So my question is simply what happened to the arguments for __libc_start_main that are pushed on the stack in the first disassembly?
My file is "ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), not stripped." i.e. dynamically linked as the file in http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html is as well.
Is this because my system is 64-bit and the system used in the link is 32-bit? Has the definition of __libc_start_main changed?