Scan an integer and print the interval (1, integer) in NASM - linux

I am trying to learn assembly language on Ubuntu 16.04 x64.
For now I have the following problem:
- scan an integer n and print the numbers from 1 to n.
For n = 5 I should have 1 2 3 4 5.
I tried to do it with scanf and printf but after I input the number, it exits.
The code is:
;nasm -felf64 code.asm && gcc code.o && ./a.out
SECTION .data
message1: db "Enter the number: ",0
message1Len: equ $-message1
message2: db "The numbers are:", 0
formatin: db "%d",0
formatout: db "%d",10,0 ; newline, nul
integer: times 4 db 0 ; 32-bits integer = 4 bytes
SECTION .text
global main
extern scanf
extern printf
main:
mov eax, 4
mov ebx, 1
mov ecx, message1
mov edx, message1Len
int 80h
mov rdi, formatin
mov rsi, integer
mov al, 0
call scanf
int 80h
mov rax, integer
loop:
push rax
push formatout
call printf
add esp, 8
dec rax
jnz loop
mov rax,0
ret
I am aware that in this loop I would have the inverse output (5 4 3 2 1 0), but I did not know how to set the condition.
The command I'm using is the following:
nasm -felf64 code.asm && gcc code.o && ./a.out
Can you please help me find where I'm going wrong?

There are several problems:
1. The parameters to printf, as discussed in the comments. In x86-64, the first few parameters are passed in registers.
2. printf does not preserve the value of eax.
3. The stack is misaligned.
4. rbx is used without saving the caller's value.
5. The address of integer is being loaded instead of its value.
6. Since printf is a varargs function, eax needs to be set to 0 before the call.
7. Spurious int 80h after the call to scanf.
I'll repeat the entire function in order to show the necessary changes in context.
main:
push rbx ; This fixes problems 3 and 4.
mov eax, 4
mov ebx, 1
mov ecx, message1
mov edx, message1Len
int 80h
mov rdi, formatin
mov rsi, integer
mov al, 0
call scanf
mov ebx, [integer] ; fix problems 2 and 5
loop:
mov rdi, formatout ; fix problem 1
mov esi, ebx
xor eax, eax ; fix problem 6
call printf
dec ebx
jnz loop
pop rbx ; restore caller's value
mov rax,0
ret
P.S. To make it count up instead of down, change the loop like this:
mov ebx, 1
loop:
<call printf>
inc ebx
cmp ebx, [integer]
jle loop
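Putting the fixes listed above together, a complete count-up version could look like the sketch below. It is not the answerer's exact code. It assumes a gcc that may link position-independent executables by default, hence the RIP-relative lea and the call ... wrt ..plt forms. The names prompt and .print_loop are made up for the example, and the prompt is printed with printf instead of int 80h, so on a terminal it relies on the C library flushing stdout before scanf reads.

```nasm
; Assumed build: nasm -felf64 count.asm && gcc count.o -o count && ./count
default rel                     ; make [label] operands RIP-relative
global main
extern scanf
extern printf

section .data
prompt:    db "Enter the number: ", 0   ; hypothetical name, replaces message1
formatin:  db "%d", 0
formatout: db "%d", 10, 0               ; newline, nul
integer:   dd 0                         ; 32-bit integer read by scanf

section .text
main:
    push rbx                    ; save callee-saved rbx; also realigns rsp to 16 bytes

    lea  rdi, [prompt]          ; 1st arg: format string
    xor  eax, eax               ; AL = 0: no floating-point args in XMM registers
    call printf wrt ..plt

    lea  rdi, [formatin]        ; 1st arg: "%d"
    lea  rsi, [integer]         ; 2nd arg: address to store the result
    xor  eax, eax
    call scanf wrt ..plt
    cmp  eax, 1                 ; scanf returns the number of converted items
    jne  .done                  ; bail out on bad input

    mov  ebx, 1                 ; counter lives in rbx, which printf preserves
    cmp  ebx, [integer]
    jg   .done                  ; nothing to print when n < 1
.print_loop:
    lea  rdi, [formatout]
    mov  esi, ebx               ; 2nd arg: the value, not its address
    xor  eax, eax
    call printf wrt ..plt
    inc  ebx
    cmp  ebx, [integer]
    jle  .print_loop

.done:
    pop  rbx                    ; restore caller's value
    xor  eax, eax               ; return 0 from main
    ret
```

The extra cmp/jg before the loop is an addition of mine so that an input of 0 or a negative number prints nothing, instead of printing 1 once.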

You are calling scanf correctly, using the x86-64 System V calling convention. It leaves its return value in eax. After successful conversion of one operand (%d), it returns with eax = 1.
... correct setup for scanf, including zeroing AL.
call scanf ; correct
int 80h ; insane: system call with eax = scanf return value
Then you run int 80h, which makes a 32-bit legacy-ABI system call using eax=1 as the code to determine which system call. (see What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?).
eax=1 / int 80h is sys_exit on Linux. (unistd_32.h has __NR_exit = 1). Use a debugger; that would have shown you which instruction was making your program exit.
Your title (before I corrected it) said you got a segmentation fault, but I tested on my x86-64 desktop and that's not the case. It exits cleanly using an int 80h exit system call. (But in code that does segfault, use a debugger to find out which instruction.) strace decodes int 0x80 system calls incorrectly in 64-bit processes, using the 64-bit syscall call numbers from unistd_64.h, not the 32-bit unistd_32.h call numbers.
Your code was close to working: you use the int 0x80 32-bit ABI correctly for sys_write, and only pass it 32-bit args. (The pointer args fit in 32 bits because static code/data is always placed in the low 2GiB of virtual address space in the default code model on x86-64. Exactly for this reason, so you can use compact instructions like mov edi, formatin to put addresses in registers, or use them as immediates or rel32 signed displacements.)
OTOH I think you were doing that for the wrong reason. And as @prl points out, you forgot to maintain 16-byte stack alignment.
Also, mixing system calls with C stdio functions is usually a bad idea. Stdio uses internal buffers instead of always making a system call on every function call, so things can appear out of order, or a read can be waiting for user input when there's already data in the stdio buffer for stdin.
Your loop is broken in several ways, too. You seem to be trying to call printf with the 32-bit calling convention (args on the stack).
Even in 32-bit code, this is broken, because printf's return value is in eax. So your loop is infinite, because printf returns the number of characters printed. That's at least two from the %d\n format string, so dec rax / jnz will always jump.
In the x86-64 SysV ABI, you need to zero al before calling printf (with xor eax,eax), if you didn't pass any FP args in XMM registers. You also have to pass args in rdi, rsi, ..., like for scanf.
You also add rsp, 8 after pushing two 8-byte values, so the stack grows forever. (But you never return, so the eventual segfault will be on stack overflow, not on trying to return with rsp not pointing to the return address.)
Decide whether you're making 32-bit or 64-bit code, and only copy/paste from examples for the mode and OS you're targeting. (Note that 64-bit code can and often does use mostly 32-bit registers, though.)
See also Assembling 32-bit binaries on a 64-bit system (GNU toolchain) (which does include a NASM section with a handy asm-link script that assembles and links into a static binary). But since you're writing main instead of _start and are using libc functions, you should just link with gcc -m32 (if you decide to use 32-bit code instead of replacing the 32-bit parts of your program with 64-bit function-calling and system-call conventions).
See What are the calling conventions for UNIX & Linux system calls on i386 and x86-64.

Related

Why is the RDI register missing in this "Hello world" assembly program?

I found this "Hello" (shellcode) assembly program:
SECTION .data
SECTION .text
global main
main:
mov rax, 1
mov rsi, 0x6f6c6c6548 ; "Hello" is stored in reverse order "olleH"
push rsi
mov rsi, rsp
mov rdx, 5
syscall
mov rax, 60
syscall
And I found that mov rdi, 1 is missing. In other "hello world" programs that instruction appears so I would like to understand why this happens.
I was going to say it's an intentional trick or hack to save code bytes, using argc as the file descriptor. (1 if you run it from the shell without extra command line args). main(int argc, char**argv) gets its args in EDI and RSI respectively, in the x86-64 SysV calling convention used on Linux.
But given the other choices, like mov rax, 1 instead of mov eax, edi, it's probably just a bug that got overlooked because the code happened to work.
It would not work in real shellcode for a code-injection attack, where execution would probably reach this code with garbage other than 0, 1, or 2 in EDI. The shellcode test program on the tutorial you linked calls a const char[] of machine code as the only thing in main, which will normally compile to asm that doesn't touch RDI.
This code wouldn't work for code-injection attacks based on strcpy or other C-string overflows either, since the machine code contains 00 bytes as part of mov eax, 1, mov edx, 5, and the end of that character string.
Also, modern linkers don't link .rodata into an executable segment, and -zexecstack only affects the actual stack, not all readable memory. So that shellcode test won't work, although I expect it did when written. See How to get c code to execute hex machine code? for working ways, like using a local array and compiling with -zexecstack.
That tutorial is overall not great, probably something this guy wrote while learning. (But not as bad as I expected based on this bug and the use of Kali; it's at least decently written, just missing some tricks.)
Since you're using NASM, you don't need to manually waste time looking up ASCII codes and getting the byte order correct. Unlike some assemblers, mov rsi, "Hello" / push rsi results in those bytes being in memory in source order.
You also don't need an empty .data section, especially when making shellcode which is just a self-contained snippet of machine code which can't reference anything outside itself.
Writing a 32-bit register implicitly zero-extends to 64-bit. NASM optimizes mov rax,1 into mov eax,1 for you (as you can see in the objdump -d AT&T disassembly; use objdump -drwC -Mintel for Intel-syntax disassembly similar to NASM).
The following should work:
global main
main:
mov rax, `Hello\n  ` ; non-zero padding (two spaces) to fill 8 bytes
push rax
mov rsi, rsp
push 1 ; push imm8
pop rax ; __NR_write
mov edi, eax ; STDOUT_FD is also 1
lea edx, [rax-1 + 6] ; EDX = 6; using 3 bytes with no zeros
syscall
mov al, 60 ; assuming write success, RAX = 6, zero outside the low byte
;lea eax, [rdi-1 + 60] ; the safe way that works even with ./hello >&- to return -EBADF
syscall
This is fewer bytes of machine code than the original, and avoids \x00 bytes which strcpy would stop on. I changed the string to end with a newline, using NASM backticks to support C-style escape sequences like \n as 0x0a byte.
Running normally (I linked it into a static executable without CRT, despite it being called main instead of _start. ld foo.o -o foo):
$ strace ./foo > /dev/null
execve("./foo", ["./foo"], 0x7ffecdc70a20 /* 54 vars */) = 0
write(1, "Hello\n", 6) = 6
exit(1) = ?
Running with stdout closed to break the mov al, 60 __NR_exit hack:
$ strace ./foo >&-
execve("./foo", ["./foo"], 0x7ffe3d24a240 /* 54 vars */) = 0
write(1, "Hello\n", 6) = -1 EBADF (Bad file descriptor)
syscall_0xffffffffffffff3c(0x1, 0x7ffd0b37a988, 0x6, 0, 0, 0) = -1 ENOSYS (Function not implemented)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffffffffffffffda} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
To still exit cleanly, use lea eax, [rdi-1 + 60] (3 bytes) instead of mov al, 60 (2 bytes) to set RAX according to the unmodified EDI, instead of depending on the upper bytes of RAX being zero which they aren't after an error return.
See also https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code

I'm getting a segmentation fault in my assembly program [duplicate]

The tutorial I am following is for x86 and was written using 32-bit assembly; I'm trying to follow along while learning x64 assembly in the process. This has been going very well up until this lesson, where I have the following simple program that tries to modify a single character in a string. It assembles fine but segfaults when run.
section .text
global _start ; Declare global entry point for ld
_start:
jmp short message ; Jump to where our message is so we can do a call to push the address onto the stack
code:
xor rax, rax ; Clean up the registers
xor rbx, rbx
xor rcx, rcx
xor rdx, rdx
; Try to change the N to a space
pop rsi ; Get address from stack
mov al, 0x20 ; Load 0x20 into RAX
mov [rsi], al; Why segfault?
xor rax, rax; Clear again
; write(rdi, rsi, rdx) = write(file_descriptor, buffer, length)
mov al, 0x01 ; write the command for 64bit Syscall Write (0x01) into the lower 8 bits of RAX
mov rdi, rax ; First Parameter, RDI = 0x01 which is STDOUT, we move rax to ensure the upper 56 bits of RDI are zero
;pop rsi ; Second Parameter, RSI = Popped address of message from stack
mov dl, 25 ; Third Parameter, RDX = Length of message
syscall ; Call Write
; exit(rdi) = exit(return value)
xor rax, rax ; write returns # of bytes written in rax, need to clean it up again
add rax, 0x3C ; 64bit syscall exit is 0x3C
xor rdi, rdi ; Return value is in rdi (First parameter), zero it to return 0
syscall ; Call Exit
message:
call code ; Pushes the address of the string onto the stack
db 'AAAABBBNAAAAAAAABBBBBBBB',0x0A
The culprit is this line:
mov [rsi], al; Why segfault?
If I comment it out, the program runs fine and outputs the message 'AAAABBBNAAAAAAAABBBBBBBB'. Why can't I modify the string?
The author's code is the following:
global _start
_start:
jmp short ender
starter:
pop ebx ;get the address of the string
xor eax, eax
mov al, 0x20
mov [ebx+7], al ;put a NULL where the N is in the string
mov al, 4 ;syscall write
mov bl, 1 ;stdout is 1
pop ecx ;get the address of the string from the stack
mov dl, 25 ;length of the string
int 0x80
xor eax, eax
mov al, 1 ;exit the shellcode
xor ebx,ebx
int 0x80
ender:
call starter
db 'AAAABBBNAAAAAAAABBBBBBBB',0x0A
And I've compiled that using:
nasm -f elf <infile> -o <outfile>
ld -m elf_i386 <infile> -o <outfile>
But even that causes a segfault. Images on the page show it working properly and changing the N into a space, but I seem to be stuck in segfault land :( Google isn't really being helpful in this case, so I turn to you, Stack Overflow. Any pointers (no pun intended!) would be appreciated.
I would assume it's because you're trying to write to data that is in the .text section. Usually you're not allowed to write to the code segment, for security reasons. Modifiable data should be in the .data section. (Or .bss if zero-initialized.)
For actual shellcode, where you don't want to use a separate section, see Segfault when writing to string allocated by db [assembly] for alternate workarounds.
Also I would never suggest using the side effects of call pushing the address after it to the stack to get a pointer to data following it, except for shellcode.
This is a common trick in shellcode (which must be position-independent); 32-bit mode needs a call to get EIP somehow. The call must have a backwards displacement to avoid 00 bytes in the machine code, so putting the call somewhere that creates a "return" address you specifically want saves an add or lea.
Even in 64-bit code where RIP-relative addressing is possible, jmp / call / pop is about as compact as jumping over the string for a RIP-relative LEA with a negative displacement.
Outside of the shellcode / constrained-machine-code use case, it's a terrible idea and you should just lea reg, [rel buf] like a normal person with the data in .data and the code in .text. (Or read-only data in .rodata.) This way you're not trying to execute code next to data, or put data next to code.
(Code-injection vulnerabilities that allow shellcode already imply the existence of a page with write and exec permission, but normal processes from modern toolchains don't have any W+X pages unless you do something to make that happen. W^X is a good security feature for this reason, so normal toolchain security features / defaults must be defeated to test shellcode.)
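For the ordinary, non-shellcode case, the advice above can be sketched as a 64-bit version of the question's program with the string moved to .data. This is only an illustration: the label msglen is made up for the example, and it assumes a static link with ld and no C runtime.

```nasm
; Assumed build: nasm -felf64 modify.asm && ld modify.o -o modify && ./modify
default rel                     ; make [label] operands RIP-relative
global _start

section .data                   ; writable, unlike .text
message: db 'AAAABBBNAAAAAAAABBBBBBBB', 0x0A
msglen:  equ $ - message        ; 25 bytes, computed by the assembler

section .text
_start:
    lea  rsi, [message]         ; address of the string, no jmp/call/pop needed
    mov  byte [rsi + 7], 0x20   ; overwrite the 'N' (offset 7) with a space

    ; write(1, message, msglen)
    mov  eax, 1                 ; __NR_write in the 64-bit syscall ABI
    mov  edi, 1                 ; fd 1 = stdout
    mov  edx, msglen            ; rsi already holds the buffer address
    syscall

    ; exit(0)
    mov  eax, 60                ; __NR_exit
    xor  edi, edi
    syscall
```

Writing 32-bit registers (eax, edi, edx) zero-extends into the full 64-bit registers, so there is no need to clear them with xor first.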

System call causes Segmentation Fault

I am writing a simple program in assembly that should call setreuid(0,0) and then call exit(). Here is my code:
section .text ; start code section of assembly
global _start
_start:
xor eax, eax ; setruid call
mov al, 0x46 ; get ready for setreuid system call
xor ebx, ebx ; arg 1 (0)
xor ecx, ecx ; arg 2 (0)
int 0x80 ; interrupt for setreuid
mov al, 0x01 ; prepare for exit call
int 0x80 ; interrupt for exit <---- 0x0804806c
When I run this through gdb it gets to 0x0804806c and then it crashes with the message:
0x0804806e in ?? ()
Execution is not within a known function
I am new to assembly so sorry if it's a noob mistake.
Update
I have copied and pasted exactly what I posted here into exit.asm. Then I assembled and linked exit.asm using the following commands:
nasm -f elf exit.asm # elf file format for 32-bit linux
ld -o exit exit.o # link
This produces the program exit. When I run it, I get the following:
****#debian:~/shellcode$ ./exit
Segmentation fault
****#debian:~/shellcode$
What's happening is that setreuid is failing. You're probably running as a regular user, who isn't allowed to set the user ID of the process. The C library wrapper returns 0 on success and -1 on error, but the raw system call returns a negative errno value in eax. For EPERM that value also happens to be -1, which in binary is all bits set.
By setting al to 0x01, you're only setting the least significant byte to 1. The high bits are still all set, so you're not actually passing 1 in eax. You're effectively passing 0xFFFFFF01. That's not a valid system call number, let alone exit. The second int 0x80 therefore returns an error instead of exiting, and execution continues into whatever bytes follow it, which were never meant to be executed, so the process crashes.
Another thing: you should explicitly zero ebx (for example with mov ebx, 0) for the exit call. It just so happens that you xor'ed ebx previously, but that's a potential bug waiting to happen.
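A minimal sketch of the fix, assuming the same 32-bit int 0x80 ABI as the question: reload the whole of eax before the second call instead of only al. The xor/inc pair is just one zero-free way to do it, and the -m elf_i386 flag in the build line is my assumption for a 64-bit host.

```nasm
; Assumed build: nasm -f elf exit.asm && ld -m elf_i386 -o exit exit.o
section .text
global _start
_start:
    xor eax, eax        ; clear all 32 bits of eax
    mov al, 0x46        ; 0x46 = 70 = setreuid in the 32-bit ABI
    xor ebx, ebx        ; arg 1: ruid = 0
    xor ecx, ecx        ; arg 2: euid = 0
    int 0x80            ; eax = 0 on success, negative errno on failure

    xor eax, eax        ; discard the return value, upper bytes included
    inc eax             ; eax = 1 = exit
    xor ebx, ebx        ; exit status 0, set explicitly for this call
    int 0x80
```

With this change the program exits cleanly whether or not setreuid succeeded, because the syscall number no longer depends on the previous return value.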

Using interrupt 0x80 on 64-bit Linux [duplicate]

This question already has an answer here:
What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?
I have a simple 64-bit assembly program which is intended to print an 'O' and 'K' followed by a newline.
However, the 'K' is never printed. One of the goals of the program is to print the value in the low bits of the rax register as an ASCII letter. The program is specifically for 64-bit Linux, written for educational purposes, so there is no need to use C-style system calls.
I suspect that the problem either lies with mov QWORD [rsp], rax or mov rcx, rsp.
Currently, the program only outputs 'O' followed by a newline.
How can one change the program to make it use the value in rax and then print a 'K' so that the complete output is 'OK' followed by a newline?
bits 64
section .data
o: db "O" ; 'O'
nl: dq 10 ; newline
section .text
;--- function main ---
global main ; make label available to the linker
global _start ; make label available to the linker
_start: ; starting point of the program
main: ; name of the function
;--- call interrupt 0x80 ---
mov rax, 4 ; function call: 4
mov rbx, 1 ; parameter #1 is 1
mov rcx, o ; parameter #2 is &o
mov rdx, 1 ; parameter #3 is length of string
int 0x80 ; perform the call
;--- rax = 'K' ---
mov rax, 75 ; rax = 75
;--- call interrupt 0x80 ---
sub rsp, 8 ; make some space for storing rax on the stack
mov QWORD [rsp], rax ; move rax to a memory location on the stack
mov rax, 4 ; function call: 4
mov rbx, 1 ; parameter #1 is 1
mov rcx, rsp ; parameter #2 is rsp
mov rdx, 1 ; parameter #3 is length of string
int 0x80 ; perform the call
add rsp, 8 ; move the stack pointer back
;--- call interrupt 0x80 ---
mov rax, 4 ; function call: 4
mov rbx, 1 ; parameter #1 is 1
mov rcx, nl ; parameter #2 is nl
mov rdx, 1 ; parameter #3 is length of string
int 0x80 ; perform the call
;--- exit program ---
mov rax, 1 ; function call: 1
xor rbx, rbx ; return code 0
int 0x80 ; exit program
Update: Note that this is a 64-bit x86 Assembly program that uses int 80h, and is very different from a 32-bit x86 Assembly program that uses int 80h.
You are writing a 64-bit program but using the "int 0x80" instruction. "int 0x80", however, only works correctly in 32-bit programs.
The address of the stack is in a range that cannot be accessed by 32-bit programs. Therefore it is quite probable that "int 0x80"-style system calls do not allow accessing this memory area.
To solve this problem there are two possibilities:
Compile as 32-bit application (use 32-bit registers like EAX instead of 64-bit registers like RAX). When you link without using any shared libraries 32-bit programs will work perfectly on 64-bit Linux.
Use "syscall"-style system calls instead of "int 0x80"-style system calls. The use of these differs a lot from "int 0x80"-style ones!
32-bit code:
mov eax,4 ; In "int 0x80" style 4 means: write
mov ebx,1 ; ... and the first arg. is stored in ebx
mov ecx,esp ; ... and the second arg. is stored in ecx
mov edx,1 ; ... and the third arg. is stored in edx
int 0x80
64-bit code:
mov rax,1 ; In "syscall" style 1 means: write
mov rdi,1 ; ... and the first arg. is stored in rdi (not rbx)
mov rsi,rsp ; ... and the second arg. is stored in rsi (not rcx)
mov rdx,1 ; ... and the third arg. is stored in rdx
syscall
--- Edit ---
Background information:
"int 0x80" is intended for 32-bit programs. When called from a 64-bit program, it behaves the same way as if it had been called from a 32-bit program (using the 32-bit calling convention).
This also means that the parameters for "int 0x80" will be passed in 32-bit registers and the upper 32 bits of the 64-bit registers are ignored.
(I just tested that on Ubuntu 16.10, 64 bit.)
This however means that you can only access memory below 2^32 (or even below 2^31) when using "int 0x80" because you cannot pass an address above 2^32 in a 32-bit register.
If the data to be written is located at an address below 2^31 you may use "int 0x80" to write the data. If it is located above 2^32 you can't. The stack (RSP) is very likely located above 2^32 so you cannot write data on the stack using "int 0x80".
Because it is very likely that your program will use memory above 2^32 I have written: "int 0x80 does not work with 64-bit programs."
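Applying the "syscall"-style convention described above to the whole program from the question might look like the sketch below. It is one possible rewrite, not the only one. It assumes a static link with ld, and it shrinks nl from dq to db since only one byte is written.

```nasm
; Assumed build: nasm -felf64 ok.asm && ld ok.o -o ok && ./ok
default rel                     ; make [label] operands RIP-relative
global _start

section .data
o:  db "O"
nl: db 10                       ; newline

section .text
_start:
    ; write(1, &o, 1)
    mov  eax, 1                 ; 1 = write in the 64-bit syscall ABI
    mov  edi, 1                 ; fd 1 = stdout
    lea  rsi, [o]
    mov  edx, 1
    syscall

    ; put 'K' on the stack and print it from there
    mov  eax, 75                ; rax = 'K'
    push rax                    ; low byte of rax is now at [rsp]
    mov  eax, 1
    mov  edi, 1
    mov  rsi, rsp               ; a 64-bit stack address is fine with syscall
    mov  edx, 1
    syscall
    add  rsp, 8                 ; release the stack slot

    ; write(1, &nl, 1)
    mov  eax, 1
    mov  edi, 1
    lea  rsi, [nl]
    mov  edx, 1
    syscall

    ; exit(0)
    mov  eax, 60                ; 60 = exit in the 64-bit syscall ABI
    xor  edi, edi
    syscall
```

The syscall instruction overwrites rax (return value), rcx and r11, which is why eax is reloaded before every call.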

Segmentation Fault on simple ASM code

I tried to build an example NASM program on 64-bit Ubuntu and run it after assembling and linking it into an ELF executable. It fails with the error below when I execute it:
nasm -f elf64 -o firstasm.o firstasm.asm
ld -o firstasm firstasm.o
firstasm
Segmentation fault (core dumped)
My NASM code is below; it tries to perform a simple write() and exit():
section .data ;Data segment
msg db "This line is test", 0x0a
section .text ;text segment
global _start ;Default entry point for ELF linking
_start:
; SYSCALL : write (1,msg,14)
xor rax,rax
xor rbx,rbx
xor rcx,rcx
xor rdx,rdx
mov rax,64 ; make a syscall write 4
mov rbx,1 ; put 1 into rbx and also stdout is 1
mov rcx,msg ;put address of string in rcx
mov rdx,19 ; put length of string into rdx
int 0x80 ; call kernel to made syscall
; SYSCALL : exit(0)
xor rax,rax
xor rbx,rbx
mov rax,93 ; make a syscall exit 93
mov rbx, 0 ; store 0 argument into rbx, success to exit
int 0x80
Can someone point out what is wrong with my NASM code and suggest how to fix the "Segmentation fault (core dumped)"? Thanks to anyone who can help.
Uh, where are you getting the system call numbers? Are you pulling them out of the air?
64bit sys_exit = 60
32bit sys_exit = 1
64bit sys_write = 1
32bit sys_write = 4
Linux 64-bit System Call List
Linux 32-bit System Call List
Linux System Call Table for x86_64
The above link will show what registers are used for what.
The 32-bit system call mechanism, int 0x80, does not use the 64-bit registers, and its parameter registers are different. The 64-bit system call instruction is syscall.
32 bit sys_exit:
mov ebx, ERR_CODE
mov eax, sys_exit ; 1
int 80h
64 bit sys_exit:
mov rdi, ERR_CODE
mov rax, sys_exit ; 60
syscall
see the difference?
if you want to create an inc file of the system call names and numbers for YOUR system (maybe they are different for some reason)
grep __NR /usr/include/asm/unistd_64.h | grep define | sed -e 's/\#/\%/' -e 's/__NR_/sys_/' > unistd_64.inc
of course, adjust the path to unistd_64.h for your system. It will be the same for 32 bit but the file is called unistd_32.h I believe.
Now that I showed you the difference between the exit sys call, and with the provided links, you can fix your write system call to be correct.
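For reference, here is one way the corrected program could end up, as a sketch under the same build commands as the question. The msglen label is an addition of mine so that the assembler computes the length instead of a hand-counted constant.

```nasm
; Assumed build: nasm -f elf64 -o firstasm.o firstasm.asm && ld -o firstasm firstasm.o
default rel                     ; make [label] operands RIP-relative

section .data
msg:    db "This line is test", 0x0a
msglen: equ $ - msg             ; 18 bytes, computed by the assembler

section .text
global _start
_start:
    ; write(1, msg, msglen)
    mov  eax, 1                 ; 64-bit sys_write
    mov  edi, 1                 ; fd 1 = stdout
    lea  rsi, [msg]             ; address of the string
    mov  edx, msglen
    syscall

    ; exit(0)
    mov  eax, 60                ; 64-bit sys_exit
    xor  edi, edi               ; status 0
    syscall
```

Run it as ./firstasm from the directory that contains it, unless that directory is on your PATH.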

Resources