Confused about 64-bit registers - ASM - linux

I'm currently learning assembly with NASM (Intel syntax) on 64-bit Ubuntu.
I found two websites that list the syscall numbers:
This one for 32-bit registers (eax, ebx, ...): https://syscalls.kernelgrok.com
This one for 64-bit registers (rax, rbx, ...): https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64
The thing is that my code doesn't work when I use the 64-bit syscall numbers, but it works when I just replace the 'e' in the 32-bit register names with an 'r'. For instance, in sys_write I use rbx to store the fd instead of rdi, and it works.
I'm quite lost right now. This code doesn't work:
message db 'Hello, World', 10
section .text
global _start
_start: mov rax,4
mov rdi, 1
mov rsi, message
mov rdx, 13
syscall
mov rax, 1
mov rdi, 0
syscall

Run strace ./my_program: you make a bogus stat system call, then a write to fd 0 that succeeds, then fall off the end of your code and segfault.
$ strace ./foo
execve("./foo", ["./foo"], 0x7ffe6b91aa00 /* 51 vars */) = 0
stat(0x1, 0x401000) = -1 EFAULT (Bad address)
write(0, "Hello, World\n", 13Hello, World
) = 13
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xd} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
It's not register names that are your problem, it's call numbers. You're using 32-bit call numbers but calling the 64-bit syscall ABI.
Call numbers and calling convention both differ. With syscall, rax = 4 is stat and rax = 1 is write, which is exactly what strace shows; you want rax = 1 for write and rax = 60 for exit.
int 0x80 system calls only ever look at the low 32 bits of registers, which is why you shouldn't use them in 64-bit code.
If the code you posted in a comment works with mov rcx, message, it would work just as well with mov ecx, message, and so on. See What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?.
Note that writing a 32-bit register zero-extends into the full 64-bit register so you should always use mov edi, 1 instead of mov rdi, 1. (Although NASM will do this optimization for you to save code-size; they're so equivalent that some assemblers will silently do it for you.)
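The mix-up in the strace output can be read off a small lookup table. The sketch below is in C rather than NASM so that it can be compiled and checked on its own. The table only covers calls mentioned in these threads, and the names `abi_number` and `abi_name` are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* The two Linux x86 system call ABIs discussed above. */
enum abi { ABI_I386_INT80, ABI_X86_64_SYSCALL };

struct call_entry {
    const char *name;
    int i386_nr;    /* value for eax with int 0x80 */
    int x86_64_nr;  /* value for rax with syscall  */
};

/* Illustrative subset of the two call tables. */
static const struct call_entry calls[] = {
    { "read",    3,  0 },
    { "write",   4,  1 },
    { "open",    5,  2 },
    { "close",   6,  3 },
    { "stat",  106,  4 },
    { "exit",    1, 60 },
    { "creat",   8, 85 },
};

static const size_t n_calls = sizeof calls / sizeof calls[0];

static int entry_number(const struct call_entry *e, enum abi which)
{
    return which == ABI_I386_INT80 ? e->i386_nr : e->x86_64_nr;
}

/* Number of the named call under the given ABI, or -1 if not listed. */
int abi_number(enum abi which, const char *name)
{
    for (size_t i = 0; i < n_calls; i++)
        if (strcmp(calls[i].name, name) == 0)
            return entry_number(&calls[i], which);
    return -1;
}

/* Name of the call with this number under the given ABI, or NULL. */
const char *abi_name(enum abi which, int nr)
{
    for (size_t i = 0; i < n_calls; i++)
        if (entry_number(&calls[i], which) == nr)
            return calls[i].name;
    return NULL;
}
```

With this table, `abi_name(ABI_X86_64_SYSCALL, 4)` is "stat" and `abi_name(ABI_X86_64_SYSCALL, 1)` is "write": the intended write ran as stat, and the intended exit ran as write, so the program never exited.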

Related

Segmentation fault (core dumped) when I run my assembly code [duplicate]

I've been looking at a tutorial for assembly, and I'm trying to get a hello world program to run. I am using Bash on Ubuntu on Windows.
Here is the assembly:
section .text
global _start ;must be declared for linker (ld)
_start: ;tells linker entry point
mov edx,len ;message length
mov ecx,msg ;message to write
mov ebx,1 ;file descriptor (stdout)
mov eax,4 ;system call number (sys_write)
int 0x80 ;call kernel
mov eax,1 ;system call number (sys_exit)
int 0x80 ;call kernel
section .data
msg db 'Hello, world!', 0xa ;string to be printed
len equ $ - msg ;length of the string
I am using these commands to create the executable:
nasm -f elf64 hello.asm -o hello.o
ld -o hello hello.o -m elf_x86_64
And I run it using:
./hello
The program then seems to run without a segmentation fault or error, but it produces no output.
I can't figure out why the code won't produce an output, but I wonder if using Bash on Ubuntu on Windows has anything to do with it? Why doesn't it produce output and how can I fix it?
Related: WSL2 does allow 32-bit user-space programs, WSL1 doesn't. See Does WSL 2 really support 32 bit program? re: making sure you're actually using WSL2. The rest of this answer was written before WSL2 existed.
The issue is with Ubuntu for Windows (Windows Subsystem for Linux version 1). It only supports the 64-bit syscall interface and not the 32-bit x86 int 0x80 system call mechanism.
Besides not being able to use int 0x80 (32-bit compatibility) in 64-bit binaries, Ubuntu on Windows (WSL1) doesn't support running 32-bit executables either. (Same as if you'd built a real Linux kernel without CONFIG_IA32_EMULATION, like some Gentoo users do.)
You need to convert from using int 0x80 to syscall. It's not difficult. A different set of registers is used for syscall, and the system call numbers are different from their 32-bit counterparts. Ryan Chapman's blog has information on the syscall interface, the system calls, and their parameters. sys_read, sys_write and sys_exit are defined this way:
%rax  System call  %rdi             %rsi             %rdx          %r10  %r8  %r9
----------------------------------------------------------------------------------
0     sys_read     unsigned int fd  char *buf        size_t count
1     sys_write    unsigned int fd  const char *buf  size_t count
60    sys_exit     int error_code
Using syscall also clobbers RCX and the R11 registers. They are considered volatile. Don't rely on them being the same value after the syscall.
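The register assignment and the clobbers can also be written down as a GCC/Clang inline-assembly wrapper. This is only a sketch in C and not part of the original answer: `raw_syscall3` is a made-up name, and the fallback branch exists only so the file still compiles on targets other than x86-64 Linux.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issue a three-argument system call and return the raw kernel result
 * (a negative errno value on failure, as the kernel itself returns it). */
long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                 /* result comes back in rax          */
        : "a"(nr),                  /* call number in rax                */
          "D"(a1),                  /* first argument in rdi             */
          "S"(a2),                  /* second argument in rsi            */
          "d"(a3)                   /* third argument in rdx             */
        : "rcx", "r11", "memory");  /* syscall overwrites rcx and r11    */
    return ret;
#else
    /* Assumption for portability only: go through libc and convert its
     * -1/errno convention back into the kernel's negative-errno form. */
    long ret = syscall(nr, a1, a2, a3);
    return ret == -1 ? -(long)errno : ret;
#endif
}
```

On x86-64, `raw_syscall3(1, 1, (long)msg, len)` performs the same write as the NASM listing that follows. The clobber list is what tells the compiler not to keep live values in RCX or R11 across the instruction.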
Your code could be modified to be:
section .text
global _start ;must be declared for linker (ld)
_start: ;tells linker entry point
mov edx,len ;message length
mov rsi,msg ;message to write
mov edi,1 ;file descriptor (stdout)
mov eax,edi ;system call number (sys_write)
syscall ;call kernel
xor edi, edi ;Return value = 0
mov eax,60 ;system call number (sys_exit)
syscall ;call kernel
section .data
msg db 'Hello, world!', 0xa ;string to be printed
len equ $ - msg ;length of the string
Note: in 64-bit code if the destination register of an instruction is 32-bit (like EAX, EBX, EDI, ESI etc) the processor zero extends the result into the upper 32-bits of the 64-bit register. mov edi,1 has the same effect as mov rdi,1.
This answer isn't a primer on writing 64-bit code, only about using the syscall interface. If you are interested in the nuances of writing code that calls the C library, and conforms to the 64-bit System V ABI there are reasonable tutorials to get you started like Ray Toal's NASM tutorial. He discusses stack alignment, the red zone, register usage, and a basic overview of the 64-bit System V calling convention.
As already pointed out in comments by Ross Ridge, don't use the 32-bit mechanism for calling kernel functions when you build a 64-bit executable.
Either build for 32-bit or "translate" the code to use 64-bit syscalls.
Here is what that could look like:
section .text
global _start ;must be declared for linker (ld)
_start: ;tells linker entry point
mov rdx,len ;message length
mov rsi,msg ;message to write
mov rdi,1 ;file descriptor (stdout)
mov rax,1 ;system call number (sys_write)
syscall ;call kernel
mov rax,60 ;system call number (sys_exit)
mov rdi,0 ;add this to output error code 0(to indicate program terminated without errors)
syscall ;call kernel
section .data
msg db 'Hello, world!', 0xa ;string to be printed
len equ $ - msg ;length of the string

NASM: Two subsequent file writes not working

I'm trying to run this code to create a BMP file. I write the header ("headline") first, then I want to write the content. Each write works on its own, but not both together.
I'm using hexedit to check the file, if it matters.
If I run the code with only the header-writing part, it works.
If I run the code with only the content-writing part, it works.
When I run both of them, it doesn't.
Any ideas?
Here's the code:
section .text
global _start
_start:
;#######################################################################
;### main ##############################################################
;#######################################################################
; open file
mov eax,8 ;system call number - open/create file
mov ebx,msg ;file name
mov ecx,111111111b ;file mode
int 0x80 ;call kernel
; save file descriptor to r8d
mov r8d, eax
; write headline to file
mov eax, 4 ;write 54 bytes to file
mov ebx, r8d ;load file desc
mov ecx, bmpheadline ;load address of memory to write
mov edx, 54 ;load number of bytes
int 0x80 ;call kernel
; write content to file
mov eax, 4 ;number of syscall - write
mov ebx, r8d ;load file desc
;add ebx, 54 ;add 54 bytes to location of file location
mov ecx, empty_space ;load address of buffer
mov edx, 40054 ;load number of bytes
int 0x80 ;call kernel
; close file
mov eax, 6 ;load syscall number - close
mov ebx, r8d ;load file desc
int 0x80 ;call kernel
; exit program
mov eax,1 ;syscall number - exit
int 0x80 ;call kernel
section .data
msg db 'filename.bmp',0x00 ;name of out file, 0x00 = end of string
bmpheadline db 0x42,0x4D,0xB6,0xDA,0x01,0x00,0x00,0x00,0x00,0x00,0x7A,0x00,0x00,0x00,0x6C,0x00,0x00,0x00,0xC9,0x00,0x00,0x00,0xC9,0x00,0x00,0x00,0x01,0x00,0x18,0x00,0x00,0x00,0x00,0x00,0x3C,0xDA,0x01,0x00,0x13,0x0B,0x00,0x00,0x13,0x0B,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x42,0x47,0x52,0x73,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00
section .bss
empty_space: resb 40054
There are two significant problems with your code. First, R8D (R8) is not preserved across int 0x80. Second, the add ebx, 54 in your original question is incorrect: you don't need to change the file descriptor.
SYSCALL preferred for 64-bit code
int 0x80 is an IA32 compatibility feature in the Linux kernel. This feature is generally turned on in most 64-bit Linux kernels but it can be turned off. You can't use 64-bit pointers with int 0x80. This prevents using stack based addresses as parameters to int 0x80. For these reasons it is preferred that you use SYSCALL for 64-bit programs rather than int 0x80.
More on using SYSCALL in Linux can be found in Ryan Chapman's blog. Note that the system call numbers used with SYSCALL are different from int 0x80. The registers used to pass parameters are different, and the only registers not preserved across a SYSCALL are RCX, R11, and RAX (RAX being the return value). The system calling convention is thoroughly described in the current 64-bit Linux System V ABI. In particular:
User-level applications use as integer registers for passing the sequence
%rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi,
%rsi, %rdx, %r10, %r8 and %r9.
A system-call is done via the syscall instruction. The kernel destroys
registers %rcx and %r11.
The number of the syscall has to be passed in register %rax.
System-calls are limited to six arguments, no argument is passed directly on
the stack.
Returning from the syscall, register %rax contains the result of the
system-call. A value in the range between -4095 and -1 indicates an error,
it is -errno.
Only values of class INTEGER or class MEMORY are passed to the kernel
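The return-value rule quoted above (a result between -4095 and -1 is -errno) can be sketched as two small helpers. This is illustrative C, not text from the ABI, and the names `raw_result_is_error` and `raw_result_errno` are made up.

```c
#include <stdbool.h>

/* True if a raw value returned in rax by a system call denotes an error,
 * i.e. it lies in the range -4095..-1 named by the ABI. */
bool raw_result_is_error(long rax)
{
    return rax >= -4095L && rax <= -1L;
}

/* The errno value encoded in a raw result, or 0 if it is not an error. */
int raw_result_errno(long rax)
{
    return raw_result_is_error(rax) ? (int)(-rax) : 0;
}
```

For example, a raw result of -14 decodes to errno 14, which is EFAULT on Linux, while 13 (say, the byte count from a successful write) is not an error. In assembly the same check is a single unsigned comparison of rax against -4095.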
If you want your 64-bit code to work with INT 0x80
INT 0x80 has some quirks in 64-bit code. It adheres to the 32-bit calling convention of preserving RBX, RCX, RDX, RSI, RDI, and RBP. For the other 64-bit registers the 64-bit C calling convention applies. From the ABI:
A.2.1 Calling Conventions
... applications that like to call system calls should use the functions from the C library. The interface between the C library and the Linux kernel is the same as for the user-level applications
See Figure 3.4: Register Usage in the 64-bit Linux ABI linked to above. R12, R13, R14, and R15 will also be preserved.
This means that RAX, R8, R9, R10, and R11 will not be preserved. Change your code from using R8D to one of the registers that are saved. R12D for example.
Why does your code fail?
Since R8D is not preserved across int 0x80 it is being potentially overwritten by the SYS_WRITE system calls. The first write works, the second one doesn't because R8D was likely trashed by the first SYS_WRITE, and R8D likely became an invalid file descriptor. Using one of the registers that will be preserved should solve this issue. If you run out of registers you can always allocate space on the stack for temporary storage.
You add 54 to the file descriptor without explanation; I have absolutely no clue why you are doing that.
I suspect that you misunderstand file descriptors and believe that you need to add the total amount of data written so far to the descriptor. This is not so. The descriptor does not change from the time you open/create to the time that you close the file handle. It's a really good idea to verify that your comments are synced with your code. When you are writing detailed comments, lines with no comments become immediately suspect (the add instruction, for instance.)
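To make the point about the descriptor concrete, here is a minimal sketch in C (not the poster's code) that writes a header and then a body through the same, unchanged descriptor. The kernel keeps track of the file offset, so the second write lands directly after the first. The function names are made up, and interrupted writes (EINTR) are simply treated as errors here.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes to fd, retrying after short writes.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Create (or truncate) path, then write header followed by body.
 * The same fd is passed to both writes; nothing is added to it. */
int write_header_then_body(const char *path,
                           const void *header, size_t header_len,
                           const void *body, size_t body_len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int rc = 0;
    if (write_all(fd, header, header_len) != 0 ||
        write_all(fd, body, body_len) != 0)
        rc = -1;

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Called with a 54-byte header and a 40000-byte body, this produces a 40054-byte file.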
You appear to have some issues from the very beginning. For example, your comments say "open file" and "sys_write" but your code doesn't match. What your code currently does is attempt to call sys_creat. What you are calling the file descriptor is actually the permissions mode. ebx should contain the address of the string representing the path... The comments seem to indicate it should be stdout, but it's clearly not. :)
You also don't state whether this is for 64 bit or 32 bit Linux. Your code seems to mix the two, using r8d and using int 0x80.
(Posted solution on behalf of the OP).
Here is the source code of solution, 64 bit version:
section .text
global _start ;must be declared for linker (ld)
_start: ;tell linker entry point
;#######################################################################
;### This program creates empty bmp file - 64 bit version ##############
;#######################################################################
;### main ##############################################################
;#######################################################################
; open file
mov rax,85 ;system call number - open/create file
mov rdi,msg ;file name
;flags
mov rsi,111111111b ;mode
syscall ;call kernel
; save file descriptor
mov r8, rax
; write headline to file
mov rax, 1 ;write to file
mov rdi, r8 ;load file desc
mov rsi, bmpheadline ;load address of memory to write
mov rdx, 54 ;load number of bytes
syscall ;call kernel
; write content to file
mov rax, 1 ;write to file
mov rdi, r8 ;load file desc
mov rsi, empty_space ;load address of memory to write
mov rdx, 40000 ;load number of bytes
syscall ;call kernel
; close file
mov rax, 3 ;load syscall number - close
mov rdi, r8 ;load file desc
syscall ;call kernel
; exit program
xor edi, edi ;exit status 0 (rdi still held the file descriptor)
mov rax,60 ;system call number (sys_exit)
syscall ;call kernel
section .data
msg db 'filename.bmp',0x00 ;name of out file, 0x00 = end of string
len equ $ - msg ;length of our dear string
bmpheadline db 0x42,0x4D,0xB6,0xDA,0x01,0x00,0x00,0x00,0x00,0x00,0x7A,0x00,0x00,0x00,0x6C,0x00,0x00,0x00,0xC9,0x00,0x00,0x00,0xC9,0x00,0x00,0x00,0x01,0x00,0x18,0x00,0x00,0x00,0x00,0x00,0x3C,0xDA,0x01,0x00,0x13,0x0B,0x00,0x00,0x13,0x0B,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x42,0x47,0x52,0x73,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00
section .bss
empty_space: resb 40000
Makefile:
all: a.out
a.out: main.o
	ld main.o
main.o: main64.asm
	nasm -f elf64 main64.asm -o main.o

Linux Assembly x86_64 create a file using command line parameters

I'm trying to teach myself assembly. I've found a good website; however, everything is written for 32-bit x86 and I use a 64-bit machine.
I know what the problem is, but I don't know how to fix it. If I run the program with strace, here are the results:
execve("./file", ["./file", "hello"], [/* 94 vars */]) = 0
creat(NULL, 0) = -1 EINVAL (Invalid argument)
write(0, NULL, 0 <unfinished ...>
+++ exited with 234 +++
So I know that when I call creat, the file name "hello" is not being passed, and as a result I don't get a file descriptor.
Here is the code in question:
section .text
global _start
_start:
pop rbx ; argc
pop rbx ; prog name
pop rbx ; the file name
mov eax,85 ; syscall number for creat()
mov ecx,00644Q ; rw,r,r
int 80h ; call the kernel
I know that I can use the syscall instruction; however, I want to use the interrupt.
Any ideas or suggestions would be helpful. Also, I'm using nasm as the assembler.
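You attempted to use the 32-bit mechanism. int 80h only looks at the low 32 bits of its argument registers and expects the file name in ebx, so a 64-bit stack address such as the argv[1] pointer cannot be passed through it. If you have a 32-bit tutorial, you can of course create 32-bit programs, and those will work as-is in compatibility mode.
If you want to write 64-bit code, however, you will need to use the 64-bit conventions and interfaces. Here, that means the syscall instruction with the appropriate registers: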
global _start
_start:
mov eax,85 ; syscall number for creat()
mov rdi,[rsp+16] ; argv[1], the file name
mov esi,00644Q ; rw,r,r
syscall ; call the kernel
xor edi, edi ; exit code 0
mov eax, 60 ; syscall number for exit()
syscall
See also the x86-64 System V ABI on Wikipedia or the ABI PDF for more details.

Why doesn't the 'syscall' instruction work under Linux?

I have a very basic assembly program that runs in Linux userland:
section .text
global _start
_start:
mov edx, 14
mov ecx, msg
mov ebx, 1
mov eax, 4
syscall
mov eax, 1
syscall
section .data
msg db "Hello, World!", 0xA
However, this doesn't work as written; it only works if I replace each syscall with int 0x80. Don't these do the same thing? I know that syscall was designed to be lower-latency, but other than that, I didn't think there was a difference. Why doesn't it work?
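On Linux, syscall is the native x86-64 system call instruction, and it uses a different convention from int 0x80. The call number goes in rax (writing eax is fine, since that zero-extends into rax), the numbers come from the x86-64 table (write is 1, exit is 60), and the arguments go in rdi, rsi, rdx, r10, r8 and r9 instead of ebx, ecx and edx.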
See this website for more information.
The syscall instruction doesn't push a return address or switch stacks by itself: the CPU saves the return RIP in RCX and RFLAGS in R11, and leaves RSP untouched.
That is why RCX and R11 are clobbered by a system call, and why the kernel interface passes the fourth argument in R10 rather than RCX; all the other parameters keep their usual registers.
