When compiling below code:
global main
extern printf, scanf
section .data
msg: db "Enter a number: ",10,0
format:db "%d",0
section .bss
number resb 4
section .text
main:
mov rdi, msg
mov al, 0
call printf
mov rsi, number
mov rdi, format
mov al, 0
call scanf
mov rdi,format
mov rsi,[number]
inc rsi
mov rax,0
call printf
ret
using:
nasm -f elf64 example.asm -o example.o
gcc -no-pie -m64 example.o -o example
and then run
./example
it runs, print: enter a number:
but then crashes and prints:
Segmentation fault (core dumped)
So printf works fine but scanf not.
What am I doing wrong with scanf so?
Use sub rsp, 8 / add rsp, 8 at the start/end of your function to re-align the stack to 16 bytes before your function does a call.
Or better push/pop a dummy register, e.g. push rdx / pop rcx, or a call-preserved register like RBP you actually wanted to save anyway. You need the total change to RSP to be an odd multiple of 8 counting all pushes and sub rsp, from function entry to any call.
i.e. 8 + 16*n bytes for whole number n.
On function entry, RSP is 8 bytes away from 16-byte alignment because the call pushed an 8-byte return address. See Printing floating point numbers from x86-64 seems to require %rbp to be saved,
main and stack alignment, and Calling printf in x86_64 using GNU assembler. This is an ABI requirement which you used to be able to get away with violating when there weren't any FP args for printf. But not any more.
See also Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?
To put it another way, RSP % 16 == 8 on function entry, and you need to ensure RSP % 16 == 0 before you call a function. How you do this doesn't matter. (Not all functions will actually crash if you don't, but the ABI does require/guarantee it.)
gcc's code-gen for glibc scanf now depends on 16-byte stack alignment
even when AL == 0.
It seems to have auto-vectorized copying 16 bytes somewhere in __GI__IO_vfscanf, which regular scanf calls after spilling its register args to the stack1. (The many similar ways to call scanf share one big implementation as a back end to the various libc entry points like scanf, fscanf, etc.)
I downloaded Ubuntu 18.04's libc6 binary package: https://packages.ubuntu.com/bionic/amd64/libc6/download and extracted the files (with 7z x blah.deb and tar xf data.tar, because 7z knows how to extract a lot of file formats).
I can repro your bug with LD_LIBRARY_PATH=/tmp/bionic-libc/lib/x86_64-linux-gnu ./bad-printf, and also it turns out with the system glibc 2.27-3 on my Arch Linux desktop.
With GDB, I ran it on your program and did set env LD_LIBRARY_PATH /tmp/bionic-libc/lib/x86_64-linux-gnu then run. With layout reg, the disassembly window looks like this at the point where it received SIGSEGV:
│0x7ffff786b49a <_IO_vfscanf+602> cmp r12b,0x25 │
│0x7ffff786b49e <_IO_vfscanf+606> jne 0x7ffff786b3ff <_IO_vfscanf+447> │
│0x7ffff786b4a4 <_IO_vfscanf+612> mov rax,QWORD PTR [rbp-0x460] │
│0x7ffff786b4ab <_IO_vfscanf+619> add rax,QWORD PTR [rbp-0x458] │
│0x7ffff786b4b2 <_IO_vfscanf+626> movq xmm0,QWORD PTR [rbp-0x460] │
│0x7ffff786b4ba <_IO_vfscanf+634> mov DWORD PTR [rbp-0x678],0x0 │
│0x7ffff786b4c4 <_IO_vfscanf+644> mov QWORD PTR [rbp-0x608],rax │
│0x7ffff786b4cb <_IO_vfscanf+651> movzx eax,BYTE PTR [rbx+0x1] │
│0x7ffff786b4cf <_IO_vfscanf+655> movhps xmm0,QWORD PTR [rbp-0x608] │
>│0x7ffff786b4d6 <_IO_vfscanf+662> movaps XMMWORD PTR [rbp-0x470],xmm0 │
So it copied two 8-byte objects to the stack with movq + movhps to load and movaps to store. But with the stack misaligned, movaps [rbp-0x470],xmm0 faults.
I didn't grab a debug build to find out exactly which part of the C source turned into this, but the function is written in C and compiled by GCC with optimization enabled. GCC has always been allowed to do this, but only recently did it get smart enough to take better advantage of SSE2 this way.
Footnote 1: printf / scanf with AL != 0 has always required 16-byte alignment because gcc's code-gen for variadic functions uses test al,al / je to spill the full 16-byte XMM regs xmm0..7 with aligned stores in that case. __m128i can be an argument to a variadic function, not just double, and gcc doesn't check whether the function ever actually reads any 16-byte FP args.
Related
I found this "Hello" (shellcode) assembly program:
SECTION .data
SECTION .text
global main
main:
mov rax, 1
mov rsi, 0x6f6c6c6548 ; "Hello" is stored in reverse order "olleH"
push rsi
mov rsi, rsp
mov rdx, 5
syscall
mov rax, 60
syscall
And I found that mov rdi, 1 is missing. In other "hello world" programs that instruction appears so I would like to understand why this happens.
I was going to say it's an intentional trick or hack to save code bytes, using argc as the file descriptor. (1 if you run it from the shell without extra command line args). main(int argc, char**argv) gets its args in EDI and RSI respectively, in the x86-64 SysV calling convention used on Linux.
But given the other choices, like mov rax, 1 instead of mov eax, edi, it's probably just a bug that got overlooked because the code happened to work.
It would not work in real shellcode for a code-injection attack, where execution would probably reach this code with garbage other than 0, 1, or 2 in EDI. The shellcode test program on the tutorial you linked calls a const char[] of machine code as the only thing in main, which will normally compile to asm that doesn't touch RDI.
This code wouldn't work for code-injection attacks based on strcpy or other C-string overflows either, since the machine code contains 00 bytes as part of mov eax, 1, mov edx, 5, and the end of that character string.
Also, modern linkers don't link .rodata into an executable segment, and -zexecstack only affects the actual stack, not all readable memory. So that shellcode test won't work, although I expect it did when written. See How to get c code to execute hex machine code? for working ways, like using a local array and compiling with -zexecstack.
That tutorial is overall not great, probably something this guy wrote while learning. (But not as bad as I expected based on this bug and the use of Kali; it's at least decently written, just missing some tricks.)
Since you're using NASM, you don't need to manually waste time looking up ASCII codes and getting the byte order correct. Unlike some assemblers, mov rsi, "Hello" / push rsi results in those bytes being in memory in source order.
You also don't need an empty .data section, especially when making shellcode which is just a self-contained snippet of machine code which can't reference anything outside itself.
Writing a 32-bit register implicitly zero-extends to 64-bit. NASM optimizes mov rax,1 into mov eax,1 for you (as you can see in the objdump -d AT&D disassembly; objdump -drwC -Mintel to use Intel-syntax disassembly similar to NASM.)
The following should work:
global main
main:
mov rax, `Hello\n ` ; non-zero padding to fill 8 bytes
push rax
mov rsi, rsp
push 1 ; push imm8
pop rax ; __NR_write
mov edi, eax ; STDOUT_FD is also 1
lea edx, [rax-1 + 6] ; EDX = 6; using 3 bytes with no zeros
syscall
mov al, 60 ; assuming write success, RAX = 5, zero outside the low byte
;lea eax, [rdi-1 + 60] ; the safe way that works even with ./hello >&- to return -EBADF
syscall
This is fewer bytes of machine code than the original, and avoids \x00 bytes which strcpy would stop on. I changed the string to end with a newline, using NASM backticks to support C-style escape sequences like \n as 0x0a byte.
Running normally (I linked it into a static executable without CRT, despite it being called main instead of _start. ld foo.o -o foo):
$ strace ./foo > /dev/null
execve("./foo", ["./foo"], 0x7ffecdc70a20 /* 54 vars */) = 0
write(1, "Hello\n", 6) = 6
exit(1) = ?
Running with stdout closed to break the mov al, 60 __NR_exit hack:
$ strace ./foo >&-
execve("./foo", ["./foo"], 0x7ffe3d24a240 /* 54 vars */) = 0
write(1, "Hello\n", 6) = -1 EBADF (Bad file descriptor)
syscall_0xffffffffffffff3c(0x1, 0x7ffd0b37a988, 0x6, 0, 0, 0) = -1 ENOSYS (Function not implemented)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffffffffffffffda} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
To still exit cleanly, use lea eax, [rdi-1 + 60] (3 bytes) instead of mov al, 60 (2 bytes) to set RAX according to the unmodified EDI, instead of depending on the upper bytes of RAX being zero which they aren't after an error return.
See also https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code
Background: I am a beginner trying to understand how to golf assembly, in particular to solve an online challenge.
EDIT: clarification: I want to print the value at the memory address of RDX. So “SUPER SECRET!”
Create some shellcode that can output the value of register RDX in <= 11 bytes. Null bytes are not allowed.
The program is compiled with the c standard library, so I have access to the puts / printf statement. It’s running on x86 amd64.
$rax : 0x0000000000010000 → 0x0000000ac343db31
$rdx : 0x0000555555559480 → "SUPER SECRET!"
gef➤ info address puts
Symbol "puts" is at 0x7ffff7e3c5a0 in a file compiled without debugging.
gef➤ info address printf
Symbol "printf" is at 0x7ffff7e19e10 in a file compiled without debugging.
Here is my attempt (intel syntax)
xor ebx, ebx ; zero the ebx register
inc ebx ; set the ebx register to 1 (STDOUT
xchg ecx, edx ; set the ECX register to RDX
mov edx, 0xff ; set the length to 255
mov eax, 0x4 ; set the syscall to print
int 0x80 ; interrupt
hexdump of my code
My attempt is 17 bytes and includes null bytes, which aren't allowed. What other ways can I lower the byte count? Is there a way to call puts / printf while still saving bytes?
FULL DETAILS:
I am not quite sure what is useful information and what isn't.
File details:
ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=5810a6deb6546900ba259a5fef69e1415501b0e6, not stripped
Source code:
void main() {
char* flag = get_flag(); // I don't get access to the function details
char* shellcode = (char*) mmap((void*) 0x1337,12, 0, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
mprotect(shellcode, 12, PROT_READ | PROT_WRITE | PROT_EXEC);
fgets(shellcode, 12, stdin);
((void (*)(char*))shellcode)(flag);
}
Disassembly of main:
gef➤ disass main
Dump of assembler code for function main:
0x00005555555551de <+0>: push rbp
0x00005555555551df <+1>: mov rbp,rsp
=> 0x00005555555551e2 <+4>: sub rsp,0x10
0x00005555555551e6 <+8>: mov eax,0x0
0x00005555555551eb <+13>: call 0x555555555185 <get_flag>
0x00005555555551f0 <+18>: mov QWORD PTR [rbp-0x8],rax
0x00005555555551f4 <+22>: mov r9d,0x0
0x00005555555551fa <+28>: mov r8d,0xffffffff
0x0000555555555200 <+34>: mov ecx,0x22
0x0000555555555205 <+39>: mov edx,0x0
0x000055555555520a <+44>: mov esi,0xc
0x000055555555520f <+49>: mov edi,0x1337
0x0000555555555214 <+54>: call 0x555555555030 <mmap#plt>
0x0000555555555219 <+59>: mov QWORD PTR [rbp-0x10],rax
0x000055555555521d <+63>: mov rax,QWORD PTR [rbp-0x10]
0x0000555555555221 <+67>: mov edx,0x7
0x0000555555555226 <+72>: mov esi,0xc
0x000055555555522b <+77>: mov rdi,rax
0x000055555555522e <+80>: call 0x555555555060 <mprotect#plt>
0x0000555555555233 <+85>: mov rdx,QWORD PTR [rip+0x2e26] # 0x555555558060 <stdin##GLIBC_2.2.5>
0x000055555555523a <+92>: mov rax,QWORD PTR [rbp-0x10]
0x000055555555523e <+96>: mov esi,0xc
0x0000555555555243 <+101>: mov rdi,rax
0x0000555555555246 <+104>: call 0x555555555040 <fgets#plt>
0x000055555555524b <+109>: mov rax,QWORD PTR [rbp-0x10]
0x000055555555524f <+113>: mov rdx,QWORD PTR [rbp-0x8]
0x0000555555555253 <+117>: mov rdi,rdx
0x0000555555555256 <+120>: call rax
0x0000555555555258 <+122>: nop
0x0000555555555259 <+123>: leave
0x000055555555525a <+124>: ret
Register state right before shellcode is executed:
$rax : 0x0000000000010000 → "EXPLOIT\n"
$rbx : 0x0000555555555260 → <__libc_csu_init+0> push r15
$rcx : 0x000055555555a4e8 → 0x0000000000000000
$rdx : 0x0000555555559480 → "SUPER SECRET!"
$rsp : 0x00007fffffffd940 → 0x0000000000010000 → "EXPLOIT\n"
$rbp : 0x00007fffffffd950 → 0x0000000000000000
$rsi : 0x4f4c5058
$rdi : 0x00007ffff7fa34d0 → 0x0000000000000000
$rip : 0x0000555555555253 → <main+117> mov rdi, rdx
$r8 : 0x0000000000010000 → "EXPLOIT\n"
$r9 : 0x7c
$r10 : 0x000055555555448f → "mprotect"
$r11 : 0x246
$r12 : 0x00005555555550a0 → <_start+0> xor ebp, ebp
$r13 : 0x00007fffffffda40 → 0x0000000000000001
$r14 : 0x0
$r15 : 0x0
(This register state is a snapshot at the assembly line below)
●→ 0x555555555253 <main+117> mov rdi, rdx
0x555555555256 <main+120> call rax
Since I already spilled the beans and "spoiled" the answer to the online challenge in comments, I might as well write it up. 2 key tricks:
Create 0x7ffff7e3c5a0 (&puts) in a register with lea reg, [reg + disp32], using the known value of RDI which is within the +-2^31 range of a disp32. (Or use RBP as a starting point, but not RSP: that would need a SIB byte in the addressing mode).
This is a generalization of the code-golf trick of lea edi, [rax+1] trick to create small constants from other small constants (especially 0) in 3 bytes, with code that runs less slowly than push imm8 / pop reg.
The disp32 is large enough to not have any zero bytes; you have a couple registers to choose from in case one had been too close.
Copy a 64-bit register in 2 bytes with push reg / pop reg, instead of 3-byte mov rdi, rdx (REX + opcode + modrm). No savings if either push needs a REX prefix (for R8..R15), and actually costs bytes if both are "non-legacy" registers.
See other answers on Tips for golfing in x86/x64 machine code on codegolf.SE for more.
bits 64
lea rsi, [rdi - 0x166f30]
;; add rbp, imm32 ; alternative, but that would mess up a call-preserved register so we might crash on return.
push rdx
pop rdi ; copy RDX to first arg, x86-64 SysV calling convention
jmp rsi ; tailcall puts
This is exactly 11 bytes, and I don't see a way for it to be smaller. add r64, imm32 is also 7 bytes, same as LEA. (Or 6 bytes if the register is RAX, but even the xchg rax, rdi short form would cost 2 bytes to get it there, and the RAX value is still the fgets return value, which is the small mmap buffer address.)
The puts function pointer doesn't fit in 32 bits, so we need a REX prefix on any instruction that puts it into a register. Otherwise we could just mov reg, imm32 (5 bytes) with the absolute address, not deriving it from another register.
$ nasm -fbin -o exploit.bin -l /dev/stdout exploit.asm
1 bits 64
2 00000000 488DB7D090E9FF lea rsi, [rdi - 0x166f30]
3 ;; add rbp, imm32 ; we can avoid messing up any call-preserved registers
4 00000007 52 push rdx
5 00000008 5F pop rdi ; copy to first arg
6 00000009 FFE6 jmp rsi ; tailcall
$ ll exploit.bin
-rw-r--r-- 1 peter peter 11 Apr 24 04:09 exploit.bin
$ ./a.out < exploit.bin # would work if the addresses in my build matched yours
My build of your incomplete .c uses different addresses on my machine, but it does reach this code (at address 0x10000, mmap_min_addr which mmap picks after the amusing choice of 0x1337 as a hint address, which isn't even page aligned but doesn't result in EIVAL on current Linux.)
Since we only tailcall puts with correct stack alignment and don't modify any call-preserved registers, this should successfully return to main.
Note that 0 bytes (ASCII NUL, not NULL) would actually work in shellcode for this test program, if not for the requirement that forbids it.
The input is read using fgets (apparently to simulate a gets() overflow).
fgets actually can read a 0 aka '\0'; the only critical character is 0xa aka '\n' newline. See Is it possible to read null characters correctly using fgets or gets_s?
Often buffer overflows exploit a strcpy or something else that stops on a 0 byte, but fgets only stops on EOF or newline. (Or the buffer size, a feature gets is missing, hence its deprecation and removal from even the ISO C standard library! It's literally impossible to use safely unless you control the input data). So yes, it's totally normal to forbid zero bytes.
BTW, your int 0x80 attempt is not viable: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? - you can't use the 32-bit ABI to pass 64-bit pointers to write, and the string you want to output is not in the low 32 bits of virtual address space.
Of course, with the 64-bit syscall ABI, you're fine if you can hardcode the length.
push rdx
pop rsi
shr eax, 16 ; fun 3-byte way to turn 0x10000` into `1`, __NR_write 64-bit, instead of just push 1 / pop
mov edi, eax ; STDOUT_FD = __NR_write
lea edx, [rax + 13 - 1] ; 3 bytes. RDX = 13 = string length
; or mov dl, 0xff ; 2 bytes leaving garbage in rest of RDX
syscall
But this is 12 bytes, as well as hard-coding the length of the string (which was supposed to be part of the secret?).
mov dl, 0xff could make sure the length was at least 255, and actually much more in this case, if you don't mind getting reams of garbage after the string you want, until write hits an unmapped page and returns early. That would save a byte, making this 11.
(Fun fact, Linux write does not return an error when it's successfully written some bytes; instead it returns how many it did write. If you try again with buf + write_len, you would get a -EFAULT return value for passing a bad pointer to write.)
When compiling below code:
global main
extern printf, scanf
section .data
msg: db "Enter a number: ",10,0
format:db "%d",0
section .bss
number resb 4
section .text
main:
mov rdi, msg
mov al, 0
call printf
mov rsi, number
mov rdi, format
mov al, 0
call scanf
mov rdi,format
mov rsi,[number]
inc rsi
mov rax,0
call printf
ret
using:
nasm -f elf64 example.asm -o example.o
gcc -no-pie -m64 example.o -o example
and then run
./example
it runs, print: enter a number:
but then crashes and prints:
Segmentation fault (core dumped)
So printf works fine but scanf not.
What am I doing wrong with scanf so?
Use sub rsp, 8 / add rsp, 8 at the start/end of your function to re-align the stack to 16 bytes before your function does a call.
Or better push/pop a dummy register, e.g. push rdx / pop rcx, or a call-preserved register like RBP you actually wanted to save anyway. You need the total change to RSP to be an odd multiple of 8 counting all pushes and sub rsp, from function entry to any call.
i.e. 8 + 16*n bytes for whole number n.
On function entry, RSP is 8 bytes away from 16-byte alignment because the call pushed an 8-byte return address. See Printing floating point numbers from x86-64 seems to require %rbp to be saved,
main and stack alignment, and Calling printf in x86_64 using GNU assembler. This is an ABI requirement which you used to be able to get away with violating when there weren't any FP args for printf. But not any more.
See also Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?
To put it another way, RSP % 16 == 8 on function entry, and you need to ensure RSP % 16 == 0 before you call a function. How you do this doesn't matter. (Not all functions will actually crash if you don't, but the ABI does require/guarantee it.)
gcc's code-gen for glibc scanf now depends on 16-byte stack alignment
even when AL == 0.
It seems to have auto-vectorized copying 16 bytes somewhere in __GI__IO_vfscanf, which regular scanf calls after spilling its register args to the stack1. (The many similar ways to call scanf share one big implementation as a back end to the various libc entry points like scanf, fscanf, etc.)
I downloaded Ubuntu 18.04's libc6 binary package: https://packages.ubuntu.com/bionic/amd64/libc6/download and extracted the files (with 7z x blah.deb and tar xf data.tar, because 7z knows how to extract a lot of file formats).
I can repro your bug with LD_LIBRARY_PATH=/tmp/bionic-libc/lib/x86_64-linux-gnu ./bad-printf, and also it turns out with the system glibc 2.27-3 on my Arch Linux desktop.
With GDB, I ran it on your program and did set env LD_LIBRARY_PATH /tmp/bionic-libc/lib/x86_64-linux-gnu then run. With layout reg, the disassembly window looks like this at the point where it received SIGSEGV:
│0x7ffff786b49a <_IO_vfscanf+602> cmp r12b,0x25 │
│0x7ffff786b49e <_IO_vfscanf+606> jne 0x7ffff786b3ff <_IO_vfscanf+447> │
│0x7ffff786b4a4 <_IO_vfscanf+612> mov rax,QWORD PTR [rbp-0x460] │
│0x7ffff786b4ab <_IO_vfscanf+619> add rax,QWORD PTR [rbp-0x458] │
│0x7ffff786b4b2 <_IO_vfscanf+626> movq xmm0,QWORD PTR [rbp-0x460] │
│0x7ffff786b4ba <_IO_vfscanf+634> mov DWORD PTR [rbp-0x678],0x0 │
│0x7ffff786b4c4 <_IO_vfscanf+644> mov QWORD PTR [rbp-0x608],rax │
│0x7ffff786b4cb <_IO_vfscanf+651> movzx eax,BYTE PTR [rbx+0x1] │
│0x7ffff786b4cf <_IO_vfscanf+655> movhps xmm0,QWORD PTR [rbp-0x608] │
>│0x7ffff786b4d6 <_IO_vfscanf+662> movaps XMMWORD PTR [rbp-0x470],xmm0 │
So it copied two 8-byte objects to the stack with movq + movhps to load and movaps to store. But with the stack misaligned, movaps [rbp-0x470],xmm0 faults.
I didn't grab a debug build to find out exactly which part of the C source turned into this, but the function is written in C and compiled by GCC with optimization enabled. GCC has always been allowed to do this, but only recently did it get smart enough to take better advantage of SSE2 this way.
Footnote 1: printf / scanf with AL != 0 has always required 16-byte alignment because gcc's code-gen for variadic functions uses test al,al / je to spill the full 16-byte XMM regs xmm0..7 with aligned stores in that case. __m128i can be an argument to a variadic function, not just double, and gcc doesn't check whether the function ever actually reads any 16-byte FP args.
I am trying to learn the assembly language from a Linux Ubuntu 16.04 x64.
For now I have the following problem:
- scan an integer n and print the numbers from 1 to n.
For n = 5 I should have 1 2 3 4 5.
I tried to do it with scanf and printf but after I input the number, it exits.
The code is:
;nasm -felf64 code.asm && gcc code.o && ./a.out
SECTION .data
message1: db "Enter the number: ",0
message1Len: equ $-message1
message2: db "The numbers are:", 0
formatin: db "%d",0
formatout: db "%d",10,0 ; newline, nul
integer: times 4 db 0 ; 32-bits integer = 4 bytes
SECTION .text
global main
extern scanf
extern printf
main:
mov eax, 4
mov ebx, 1
mov ecx, message1
mov edx, message1Len
int 80h
mov rdi, formatin
mov rsi, integer
mov al, 0
call scanf
int 80h
mov rax, integer
loop:
push rax
push formatout
call printf
add esp, 8
dec rax
jnz loop
mov rax,0
ret
I am aware that in this loop I would have the inverse output (5 4 3 2 1 0), but I did not know how to set the condition.
The command I'm using is the following:
nasm -felf64 code.asm && gcc code.o && ./a.out
Can you please help me find where I'm going wrong?
There are several problems:
1. The parameters to printf, as discussed in the comments. In x86-64, the first few parameters are passed in registers.
2. printf does not preserve the value of eax.
3. The stack is misaligned.
4. rbx is used without saving the caller's value.
5. The address of integer is being loaded instead of its value.
6. Since printf is a varargs function, eax needs to be set to 0 before the call.
7. Spurious int 80h after the call to scanf.
I'll repeat the entire function in order to show the necessary changes in context.
main:
push rbx ; This fixes problems 3 and 4.
mov eax, 4
mov ebx, 1
mov ecx, message1
mov edx, message1Len
int 80h
mov rdi, formatin
mov rsi, integer
mov al, 0
call scanf
mov ebx, [integer] ; fix problems 2 and 5
loop:
mov rdi, formatout ; fix problem 1
mov esi, ebx
xor eax, eax ; fix problem 6
call printf
dec ebx
jnz loop
pop rbx ; restore caller's value
mov rax,0
ret
P.S. To make it count up instead of down, change the loop like this:
mov ebx, 1
loop:
<call printf>
inc ebx
cmp ebx, [integer]
jle loop
You are calling scanf correctly, using the x86-64 System V calling convention. It leaves its return value in eax. After successful conversion of one operand (%d), it returns with eax = 1.
... correct setup for scanf, including zeroing AL.
call scanf ; correct
int 80h ; insane: system call with eax = scanf return value
Then you run int 80h, which makes a 32-bit legacy-ABI system call using eax=1 as the code to determine which system call. (see What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?).
eax=1 / int 80h is sys_exit on Linux. (unistd_32.h has __NR_exit = 1). Use a debugger; that would have shown you which instruction was making your program exit.
Your title (before I corrected it) said you got a segmentation fault, but I tested on my x86-64 desktop and that's not the case. It exits cleanly using an int 80h exit system call. (But in code that does segfault, use a debugger to find out which instruction.) strace decodes int 0x80 system calls incorrectly in 64-bit processes, using the 64-bit syscall call numbers from unistd_64.h, not the 32-bit unistd_32.h call numbers.
Your code was close to working: you use the int 0x80 32-bit ABI correctly for sys_write, and only pass it 32-bit args. (The pointer args fit in 32 bits because static code/data is always placed in the low 2GiB of virtual address space in the default code model on x86-64. Exactly for this reason, so you can use compact instructions like mov edi, formatin to put addresses in registers, or use them as immediates or rel32 signed displacements.)
OTOH I think you were doing that for the wrong reason. And as #prl points out, you forgot to maintain 16-byte stack alignment.
Also, mixing system calls with C stdio functions is usually a bad idea. Stdio uses internal buffers instead of always making a system call on every function call, so things can appear out of order, or a read can be waiting for user input when there's already data in the stdio buffer for stdin.
Your loop is broken in several ways, too. You seem to be trying to call printf with the 32-bit calling convention (args on the stack).
Even in 32-bit code, this is broken, because printf's return vale is in eax. So your loop is infinite, because printf returns the number of characters printed. That's at least two from the %d\n format string, so dec rax / jnz will always jump.
In the x86-64 SysV ABI, you need to zero al before calling printf (with xor eax,eax), if you didn't pass any FP args in XMM registers. You also have to pass args in rdi, rsi, ..., like for scanf.
You also add rsp, 8 after pushing two 8-byte values, so the stack grows forever. (But you never return, so the eventual segfault will be on stack overflow, not on trying to return with rsp not pointing to the return address.)
Decide whether you're making 32-bit or 64-bit code, and only copy/paste from examples for the mode and OS you're targeting. (Note that 64-bit code can and often does use mostly 32-bit registers, though.)
See also Assembling 32-bit binaries on a 64-bit system (GNU toolchain) (which does include a NASM section with a handy asm-link script that assembles and links into a static binary). But since you're writing main instead of _start and are using libc functions, you should just link with gcc -m32 (if you decide to use 32-bit code instead of replacing the 32-bit parts of your program with 64-bit function-calling and system-call conventions).
See What are the calling conventions for UNIX & Linux system calls on i386 and x86-64.
I have a function which prints text and a floating point number. Here is a version which does not use main
extern printf
extern _exit
section .data
hello: db 'Hello world! %f',10,0
pi: dq 3.14159
section .text
global _start
_start:
xor eax, eax
lea rdi, [rel hello]
movsd xmm0, [rel pi]
mov eax, 1
call printf
mov rax, 0
jmp _exit
I assemble and link this like this
nasm -felf64 hello.asm
ld hello.o -dynamic-linker /lib64/ld-linux-x86-64.so.2 -lc -melf_x86_64
This runs fine. However, now I want to do this using main.
global main
extern printf
section .data
hello: db 'Hello world! %f',10,0
pi: dq 3.14159
section .text
main:
sub rsp, 8
xor eax, eax
lea rdi, [rel hello]
movsd xmm0, [rel pi]
mov eax, 1
call printf
mov rax, 0
add rsp, 8
ret
I assembly and link like this
nasm -felf64 hello_main.asm
gcc hello_main.o
This runs fine as well. However, I had to subtract eight bytes from the stack pointer before calling printf and then add eight bytes to the stack pointer after otherwise I get a segmentation fault.
Looking at the stack pointer I see that without using main it's 16-byte aligned but with main it's only eight byte aligned. The fact that eight bytes has to be subtracted and added says that it's always 8-byte aligned and never 16-byte aligned (unless I misunderstand something). Why is this? I thought with x86_64 code we could assume that the stack is 16-byte aligned (at least for standard library function calls which I would think includes main).
According to the ABI, the stack pointer + 8 should be kept 16 byte aligned upon entry to functions. The reason you have to subtract 8 is that call itself places 8 bytes of return address on the stack, thereby violating this constraint. Basically you have to make sure the total stack pointer movement is a multiple of 16, including the return address. Thus the stack pointer needs to be moved by multiple of 16 + 8 to leave room for the return address.
As for _start, I don't think you can rely on it working without manual alignment either. It just so happens that in your case it works due to the things already on the stack.