Why is 64-bit NASM insisting on the RSI register? Why can't I put "hello world" into the RCX register and use SYSCALL?

I have this x86 assembly code for a "hello world" program.
global _start
section .text
_start:
mov eax, 1 ; system call for write
mov ebx, 1 ; file handle 1 is stdout
mov ecx, message ; address of string to output
mov edx, message_len ; length of the string
syscall ; invoke operating system to do the write
mov eax, 60 ; system call for exit
mov ebx, 0 ; exit code 0
syscall ; invoke operating system to exit
section .data
message: db "Hello, World!!!!", 10 ; newline at the end
message_len equ $-message ; length of the string
This doesn't compile with nasm -felf64 hello.asm && ld hello.o && ./a.out on a 64-bit Linux machine.
But if I change the third line mov ecx, message to mov rsi, message it works!
My question is why is 64-bit NASM insisting on the RSI register? Because I have seen people compiling with ECX on 32-bit Arch Linux.

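32-bit x86 Linux and x86-64 Linux do not use the same system call convention.
With the 32-bit int 0x80 interface, EAX holds the system call number, EBX the first argument (here the file descriptor), ECX the second (the buffer) and EDX the third (the length).
With the 64-bit syscall instruction, RAX holds the system call number and the arguments go in RDI, RSI, RDX, R10, R8 and R9, in that order. The fourth argument is in R10 rather than RCX (which an ordinary function call would use) because the syscall instruction itself overwrites RCX and R11.
That's why your code works as a 32-bit program but needs to be adjusted for x64: the file descriptor belongs in RDI (not EBX) and the buffer address in RSI (not ECX). NASM is not insisting on anything; both versions assemble, but the kernel only reads the registers its convention names. Changing only ECX to RSI most likely appears to work because RDI happens to be 0 when a statically linked program starts, so the write goes to file descriptor 0, which is usually the same terminal.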
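For reference, here is a minimal sketch of the question's program with every argument moved to its x86-64 register. It assumes the same build command as in the question (nasm -felf64 hello.asm && ld hello.o && ./a.out); the labels are the question's own.

```nasm
; Sketch: the question's program using the x86-64 syscall convention.
global _start

section .text
_start:
    mov eax, 1              ; system call 1 is write on x86-64
    mov edi, 1              ; 1st argument: file descriptor 1 (stdout)
    mov rsi, message        ; 2nd argument: address of the string
    mov edx, message_len    ; 3rd argument: length in bytes
    syscall                 ; overwrites rcx and r11, result comes back in rax

    mov eax, 60             ; system call 60 is exit
    xor edi, edi            ; 1st argument: exit code 0
    syscall

section .data
message:     db "Hello, World!!!!", 10   ; newline at the end
message_len  equ $ - message             ; length of the string
```

Writing to the 32-bit register names (eax, edi, edx) is enough for small values, because a 32-bit write zeroes the upper half of the 64-bit register.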

Related

Why does NASM system call number perform 2 different operations despite specifying the same call number

I have the following 'hello world' code written in NASM x86_64 assembly:
section .data
msg db "Hello World", 0xa
msg_L equ $-msg
section .text
global _start
_start:
mov eax, 4 ; sys_write call
mov ebx, 1 ; stdout
mov ecx, msg
mov edx, msg_L
int 0x80 ; call kernel
mov eax, 1 ; sys_exit call
int 0x80 ; call kernel
In the first 'function' under the _start: section, mov ebx, 1 is used to specify the standard output for printing. Later, after the first kernel call, mov eax, 1 is used to specify the sys_exit system call. I don't understand how specifying the same system call number yields 2 different results when the kernel is called. This NASM tutorial specifies 1 as the system call number for sys_exit, yet the program does not exit after the first use of that number, and uses it for stdout instead. Can someone explain to me why this is?
You are not specifying the same system call number.
eax, not ebx, is used to specify system call numbers.
mov ebx, 1 sets the value of ebx and doesn't set the value of eax.
For the first call, mov eax, 4 selects sys_write, and mov ebx, 1 is only that call's first argument (file descriptor 1, standard output).
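To make the two roles explicit, here is the question's program again with a comment on each register. This is only a sketch: it assumes a 32-bit build (for example nasm -f elf32 hello32.asm && ld -m elf_i386 hello32.o, where hello32 is a made-up file name), and it also sets an exit status in ebx, which the original leaves undefined.

```nasm
; 32-bit int 0x80 convention: eax selects the call, ebx/ecx/edx carry arguments.
global _start

section .data
msg:    db "Hello World", 0xa
msg_L   equ $ - msg

section .text
_start:
    mov eax, 4      ; eax = 4 selects sys_write
    mov ebx, 1      ; ebx = 1 is write's 1st argument: fd 1 (stdout)
    mov ecx, msg    ; 2nd argument: buffer address
    mov edx, msg_L  ; 3rd argument: byte count
    int 0x80

    mov eax, 1      ; eax = 1 selects sys_exit
    xor ebx, ebx    ; ebx = 0 is exit's 1st argument: the exit status
    int 0x80
```

The same value 1 means "stdout" when it sits in ebx during a write and "sys_exit" when it sits in eax.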

Using interrupt 0x80 on 64-bit Linux

I have a simple 64-bit assembly program which is intended to print an 'O' and 'K' followed by a newline.
However, the 'K' is never printed. One of the goals of the program is to print the value in the lower bits of the rax register as an ASCII letter. The program is specifically for 64-bit Linux, written for educational purposes, so there is no need to use C-style system calls.
I suspect that the problem either lies with mov QWORD [rsp], rax or mov rcx, rsp.
Currently, the program only outputs 'O' followed by a newline.
How can one change the program to make it use the value in rax and then print a 'K' so that the complete output is 'OK' followed by a newline?
bits 64
section .data
o: db "O" ; 'O'
nl: dq 10 ; newline
section .text
;--- function main ---
global main ; make label available to the linker
global _start ; make label available to the linker
_start: ; starting point of the program
main: ; name of the function
;--- call interrupt 0x80 ---
mov rax, 4 ; function call: 4
mov rbx, 1 ; parameter #1 is 1
mov rcx, o ; parameter #2 is &o
mov rdx, 1 ; parameter #3 is length of string
int 0x80 ; perform the call
;--- rax = 'K' ---
mov rax, 75 ; rax = 75
;--- call interrupt 0x80 ---
sub rsp, 8 ; make some space for storing rax on the stack
mov QWORD [rsp], rax ; move rax to a memory location on the stack
mov rax, 4 ; function call: 4
mov rbx, 1 ; parameter #1 is 1
mov rcx, rsp ; parameter #2 is rsp
mov rdx, 1 ; parameter #3 is length of string
int 0x80 ; perform the call
add rsp, 8 ; move the stack pointer back
;--- call interrupt 0x80 ---
mov rax, 4 ; function call: 4
mov rbx, 1 ; parameter #1 is 1
mov rcx, nl ; parameter #2 is nl
mov rdx, 1 ; parameter #3 is length of string
int 0x80 ; perform the call
;--- exit program ---
mov rax, 1 ; function call: 1
xor rbx, rbx ; return code 0
int 0x80 ; exit program
Update: Note that this is a 64-bit x86 Assembly program that uses int 80h, and is very different from a 32-bit x86 Assembly program that uses int 80h.
You are writing a 64-bit program but using the "int 0x80" instruction. "int 0x80", however, only works reliably in 32-bit programs.
The stack of a 64-bit process normally lies in an address range that 32-bit programs cannot access, so it is very likely that "int 0x80"-style system calls cannot reach this memory area either.
To solve this problem there are two possibilities:
Build it as a 32-bit application (use 32-bit registers like EAX instead of 64-bit registers like RAX). As long as you link without any shared libraries, 32-bit programs will work perfectly on 64-bit Linux.
Use "syscall"-style system calls instead of "int 0x80"-style system calls. The use of these differs a lot from "int 0x80"-style ones!
32-bit code:
mov eax,4 ; In "int 0x80" style 4 means: write
mov ebx,1 ; ... and the first arg. is stored in ebx
mov ecx,esp ; ... and the second arg. is stored in ecx
mov edx,1 ; ... and the third arg. is stored in edx
int 0x80
64-bit code:
mov rax,1 ; In "syscall" style 1 means: write
mov rdi,1 ; ... and the first arg. is stored in rdi (not rbx)
mov rsi,rsp ; ... and the second arg. is stored in rsi (not rcx)
mov rdx,1 ; ... and the third arg. is stored in rdx
syscall
--- Edit ---
Background information:
"int 0x80" is intended for 32-bit programs. When called from a 64-bit program, it behaves the same way it would if it had been called from a 32-bit program (using the 32-bit calling convention).
This also means that the parameters for "int 0x80" are passed in 32-bit registers and the upper 32 bits of the 64-bit registers are ignored.
(I just tested that on Ubuntu 16.10, 64 bit.)
This however means that you can only access memory below 2^32 (or even below 2^31) when using "int 0x80" because you cannot pass an address above 2^32 in a 32-bit register.
If the data to be written is located at an address below 2^31 you may use "int 0x80" to write the data. If it is located above 2^32 you can't. The stack (RSP) is very likely located above 2^32 so you cannot write data on the stack using "int 0x80".
Because it is very likely that your program will use memory above 2^32 I have written: "int 0x80 does not work with 64-bit programs."
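Putting the second option together, a complete "syscall"-style version of the question's program might look like the sketch below. It is an assumption-laden rewrite rather than the asker's final code: nl is declared with db instead of dq, and only _start is exported, since the program is linked with ld directly.

```nasm
; Sketch: print 'O', then the low byte of rax ('K'), then a newline,
; using the 64-bit syscall interface so a stack address can be passed.
global _start

section .data
o:   db "O"
nl:  db 10

section .text
_start:
    mov eax, 1          ; write
    mov edi, 1          ; fd 1 (stdout)
    mov rsi, o          ; address of 'O'
    mov edx, 1          ; one byte
    syscall

    mov rax, 75         ; rax = 'K'
    push rax            ; little-endian: the low byte of rax is at [rsp]
    mov eax, 1          ; write
    mov edi, 1          ; fd 1 (stdout)
    mov rsi, rsp        ; the full 64-bit stack address is passed intact
    mov edx, 1          ; one byte
    syscall
    add rsp, 8          ; move the stack pointer back

    mov eax, 1          ; write
    mov edi, 1          ; fd 1 (stdout)
    mov rsi, nl         ; address of the newline
    mov edx, 1          ; one byte
    syscall

    mov eax, 60         ; exit
    xor edi, edi        ; return code 0
    syscall
```

rax has to be reloaded before every call because the kernel returns its result there.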

Segmentation Fault on simple ASM code

I tried to assemble and link a NASM example into an ELF executable on 64-bit Ubuntu. When I run it, I get the error message below:
nasm -f elf64 -o firstasm.o firstasm.asm
ld -o firstasm firstasm.o
firstasm
Segmentation fault (core dumped)
My NASM code is below; it tries to perform a simple write() followed by exit():
section .data ;Data segment
msg db "This line is test", 0x0a
section .text ;text segment
global _start ;Default entry point for ELF linking
_start:
; SYSCALL : write (1,msg,14)
xor rax,rax
xor rbx,rbx
xor rcx,rcx
xor rdx,rdx
mov rax,64 ; make a syscall write 4
mov rbx,1 ; put 1 into rbx and also stdout is 1
mov rcx,msg ;put address of string in rcx
mov rdx,19 ; put length of string into rdx
int 0x80 ; call kernel to made syscall
; SYSCALL : exit(0)
xor rax,rax
xor rbx,rbx
mov rax,93 ; make a syscall exit 93
mov rbx, 0 ; store 0 argument into rbx, success to exit
int 0x80
Can someone point out what is wrong with my NASM code and suggest how to fix the "Segmentation fault (core dumped)"? Thanks to anyone who can help.
Uh, where are you getting the system call numbers? Are you pulling them out of the air?
64bit sys_exit = 60
32bit sys_exit = 1
64bit sys_write = 1
32bit sys_write = 4
Linux 64-bit System Call List
Linux 32-bit System Call List
Linux System Call Table for x86_64
The above link will show what registers are used for what.
The 32-bit system call mechanism, int 0x80, does not use the 64-bit registers, and its parameter registers are different. The 64-bit mechanism is the syscall instruction.
32 bit sys_exit:
mov ebx, ERR_CODE
mov eax, sys_exit ; 1
int 80h
64 bit sys_exit:
mov rdi, ERR_CODE
mov rax, sys_exit ; 60
syscall
see the difference?
If you want to create an include file of the system call names and numbers for YOUR system (maybe they are different for some reason):
grep __NR /usr/include/asm/unistd_64.h | grep define | sed -e 's/\#/\%/' -e 's/__NR_/sys_/' > unistd_64.inc
Of course, adjust the path to unistd_64.h for your system. The 32-bit numbers can be extracted the same way, but I believe that file is called unistd_32.h.
Now that I showed you the difference between the exit sys call, and with the provided links, you can fix your write system call to be correct.
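As a rough sketch of where that leads, here is the question's program with the write call fixed in the same way. The two %define lines stand in for the generated unistd_64.inc so that the example assembles on its own; with the include file in place you would use the commented-out %include instead. The msg_len label is my addition, replacing the hard-coded length.

```nasm
; Sketch: write() and exit() with 64-bit numbers, registers and syscall.
; %include "unistd_64.inc"    ; would supply the two definitions below
%define sys_write 1
%define sys_exit  60

section .data                 ; data segment
msg:     db "This line is test", 0x0a
msg_len  equ $ - msg          ; let the assembler count the bytes

section .text                 ; text segment
global _start                 ; default entry point for ELF linking
_start:
    ; write(1, msg, msg_len)
    mov eax, sys_write
    mov edi, 1                ; stdout
    mov rsi, msg              ; address of the string
    mov edx, msg_len          ; length of the string
    syscall

    ; exit(0)
    mov eax, sys_exit
    xor edi, edi              ; exit status 0
    syscall
```

The xor instructions that cleared the registers in the original are not needed, since each register is loaded in full before the call.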

Linux x86-64 Hello World and register usage for parameters

I found this page which has a Hello World example for x86-64 on Linux:
http://blog.markloiseau.com/2012/05/64-bit-hello-world-in-linux-assembly-nasm/
; 64-bit "Hello World!" in Linux NASM
global _start ; global entry point export for ld
section .text
_start:
; sys_write(stdout, message, length)
mov rax, 1 ; sys_write
mov rdi, 1 ; stdout
mov rsi, message ; message address
mov rdx, length ; message string length
syscall
; sys_exit(return_code)
mov rax, 60 ; sys_exit
mov rdi, 0 ; return 0 (success)
syscall
section .data
message: db 'Hello, world!',0x0A ; message and newline
length: equ $-message ; NASM definition pseudo-instruction
The Author says:
An integer value representing the system_write call is placed in the
first register, followed by its arguments. When the system call and
its arguments are all in their proper registers, the system is called
and the message is displayed.
What does he mean by "proper" registers/What would be an im"proper" register?
What happens if I have a function with more arguments than I have registers?
Does rax always point to the function call (this would always be a system call?)? Is that its only purpose?
By "the proper registers", the author means the registers specified by the x86-64 ABI, in the Linux Kernel Calling Conventions section. The system call number goes in rax, and arguments go in rdi, rsi, rdx, r10, r8 and r9, in that order.
This calling convention (especially the use of syscall!) is only used for system calls, which take at most six arguments. Ordinary function calls use a different but similar convention: the first six integer or pointer arguments go in rdi, rsi, rdx, rcx, r8 and r9, floating-point arguments go in the xmm registers, and anything beyond that is passed on the stack.
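A system call that uses all six argument registers shows the ordering in practice. The sketch below calls mmap to get one anonymous page; the numeric constants are the x86-64 Linux values as far as I know them (mmap is call 9, PROT_READ|PROT_WRITE is 3, MAP_PRIVATE|MAP_ANONYMOUS is 0x22), so check them against your headers.

```nasm
; Sketch: a six-argument system call, mmap(NULL, 4096, prot, flags, -1, 0).
global _start

section .text
_start:
    mov eax, 9              ; system call number for mmap
    xor edi, edi            ; 1st: addr = NULL, let the kernel choose
    mov esi, 4096           ; 2nd: length in bytes
    mov edx, 3              ; 3rd: PROT_READ | PROT_WRITE
    mov r10d, 0x22          ; 4th: MAP_PRIVATE | MAP_ANONYMOUS (r10, not rcx)
    mov r8, -1              ; 5th: fd = -1 for an anonymous mapping
    xor r9d, r9d            ; 6th: offset 0
    syscall

    cmp rax, -4095          ; results in -4095..-1 are negated errno values
    jae .fail
    mov byte [rax], 'A'     ; the returned address is usable memory

    mov eax, 60             ; exit
    xor edi, edi            ; status 0
    syscall

.fail:
    mov eax, 60             ; exit
    mov edi, 1              ; status 1
    syscall
```

Running it and checking the exit status with echo $? is a quick way to see whether the mapping succeeded.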

NASM x86_64 having trouble writing command line arguments, returning -14 in rax

I am using elf64 compilation and trying to take a parameter and write it out to the console.
I am calling the function as ./test wooop
After stepping through with gdb there seems to be no problem, everything is set up ok:
rax: 0x4
rbx: 0x1
rcx: pointing to string, x/6cb $rcx gives 'w' 'o' 'o' 'o' 'p' 0x0
rdx: 0x5 <---correctly determining length
after the int 80h rax contains -14 and nothing is printed to the console.
If I define a string in .data, it just works. gdb shows the value of $rcx in the same way.
Any ideas? here is my full source
%define LF 0Ah
%define stdout 1
%define sys_exit 1
%define sys_write 4
global _start
section .data
usagemsg: db "test {string}",LF,0
testmsg: db "wooop",0
section .text
_start:
pop rcx ;this is argc
cmp rcx, 2 ;one argument
jne usage
pop rcx
pop rcx ; argument now in rcx
test rcx,rcx
jz usage
;mov rcx, testmsg ;<-----uncomment this to print ok!
call print
jmp exit
usage:
mov rcx, usagemsg
call print
jmp exit
calclen:
push rdi
mov rdi, rcx
push rcx
xor rcx,rcx
not rcx
xor al,al
cld
repne scasb
not rcx
lea rdx, [rcx-1]
pop rcx
pop rdi
ret
print:
push rax
push rbx
push rdx
call calclen
mov rax, sys_write
mov rbx, stdout
int 80h
pop rdx
pop rbx
pop rax
ret
exit:
mov rax, sys_exit
mov rbx, 0
int 80h
Thanks
EDIT: After changing how I make my syscalls as below it works fine. Thanks all for your help!
sys_write is now 1
sys_exit is now 60
stdout now goes in rdi, not rbx
the string to write is now set in rsi, not rcx
int 80h is replaced by syscall
I'm still running 32-bit hardware, so this is a wild asmed guess! As you probably know, 64-bit system call numbers are completely different, and "syscall" is used instead of int 80h. However int 80h and 32-bit system call numbers can still be used, with 64-bit registers truncated to 32-bit. Your tests indicate that this works with addresses in .data, but with a "stack address", it returns -14 (-EFAULT - bad address). The only thing I can think of is that truncating rcx to ecx results in a "bad address" if it's on the stack. I don't know where the stack is in 64-bit code. Does this make sense?
I'd try it with "proper" 64-bit system call numbers and registers and "syscall", and see if that helps.
Best,
Frank
As you said, you're using ELF64 as the target of the compilation. This is, unfortunately, your first mistake. The "old" system call interface on Linux, i.e. int 80h, works reliably only in 32-bit tasks; in a 64-bit task it still follows the 32-bit convention and truncates pointers to 32 bits. You could simply assemble your source as ELF32, but then you would lose all the advantages of running tasks in 64-bit mode, namely the extra registers and 64-bit operations.
In order to make system calls in 64-bit tasks, the "new" system call interface must be used. The system call itself is made with the syscall instruction. The kernel destroys registers rcx and r11 and returns its result in rax. The number of the system call is specified in the register rax, while the arguments of the call are passed in rdi, rsi, rdx, r10, r8 and r9. Keep in mind that the system call numbers are different from the ones in 32-bit mode. You can find them in unistd_64.h, which is usually in /usr/include/asm or wherever your distribution stores it.
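Here is a cut-down sketch of the question's task with the 64-bit interface: it prints the first command line argument and exits. It is not the asker's final code. The usage message is left out, arguments are read from the stack instead of popped, and the length is counted with a plain loop rather than repne scasb.

```nasm
; Sketch: print argv[1] using syscall, so the 64-bit pointer is not truncated.
; Assumed usage: nasm -felf64 test.asm && ld -o test test.o && ./test wooop
global _start

section .text
_start:
    cmp qword [rsp], 2      ; [rsp] holds argc; expect exactly one argument
    jne .exit
    mov rsi, [rsp + 16]     ; [rsp+8] is argv[0], [rsp+16] is argv[1]

    xor edx, edx            ; rdx = length counter
.count:
    cmp byte [rsi + rdx], 0 ; stop at the terminating zero byte
    je .write
    inc rdx
    jmp .count

.write:
    mov eax, 1              ; sys_write is 1 in the 64-bit table
    mov edi, 1              ; stdout goes in rdi, not rbx
    syscall                 ; rsi = string, rdx = length, both already set

.exit:
    mov eax, 60             ; sys_exit is 60 in the 64-bit table
    xor edi, edi            ; status 0
    syscall
```

The argument strings live on the stack near the top of the address space, which is exactly the kind of address that int 80h cannot receive intact.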
