Linux x86-64 Hello World and register usage for parameters

I found this page which has a Hello World example for x86-64 on Linux:
http://blog.markloiseau.com/2012/05/64-bit-hello-world-in-linux-assembly-nasm/
; 64-bit "Hello World!" in Linux NASM
global _start ; global entry point export for ld
section .text
_start:
; sys_write(stdout, message, length)
mov rax, 1 ; sys_write
mov rdi, 1 ; stdout
mov rsi, message ; message address
mov rdx, length ; message string length
syscall
; sys_exit(return_code)
mov rax, 60 ; sys_exit
mov rdi, 0 ; return 0 (success)
syscall
section .data
message: db 'Hello, world!',0x0A ; message and newline
length: equ $-message ; NASM definition pseudo-instruction
The author says:
An integer value representing the system_write call is placed in the
first register, followed by its arguments. When the system call and
its arguments are all in their proper registers, the system is called
and the message is displayed.
What does he mean by "proper" registers? What would be an "improper" register?
What happens if I have a function with more arguments than I have registers?
Does rax always hold the function being called (and would that always be a system call)? Is that its only purpose?

By "the proper registers", the author means the registers specified by the x86-64 ABI, in the Linux Kernel Calling Conventions section. The system call number goes in rax, and arguments go in rdi, rsi, rdx, r10, r8 and r9, in that order.
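This calling convention (especially the use of syscall!) is only used for system calls, which take at most six arguments. The kernel returns its result in rax, and the syscall instruction itself overwrites rcx and r11, which is why the fourth argument travels in r10 instead of rcx. Ordinary functions use a similar but different convention: the first six integer or pointer arguments go in rdi, rsi, rdx, rcx, r8 and r9, and any further arguments are passed on the stack. So rax is not reserved for system call numbers; it is a general-purpose register that also carries return values.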
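To see all six argument registers in use, here is a minimal sketch of a system call that takes six arguments, mmap. The file name mmap6.asm and the choice of mmap are assumptions made for illustration; the constants are the usual x86-64 Linux values.

```nasm
; Sketch: a six-argument system call on x86-64 Linux.
; Assumed file name: mmap6.asm
; Build: nasm -f elf64 mmap6.asm && ld mmap6.o -o mmap6
global _start

section .text
_start:
    ; mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
    mov     rax, 9          ; system call number: mmap
    xor     edi, edi        ; arg 1 (rdi): addr = NULL, let the kernel choose
    mov     rsi, 4096       ; arg 2 (rsi): length in bytes
    mov     rdx, 3          ; arg 3 (rdx): PROT_READ (1) | PROT_WRITE (2)
    mov     r10, 0x22       ; arg 4 (r10, not rcx): MAP_PRIVATE (0x02) | MAP_ANONYMOUS (0x20)
    mov     r8, -1          ; arg 5 (r8): fd = -1, no backing file
    xor     r9d, r9d        ; arg 6 (r9): offset = 0
    syscall                 ; result in rax; rcx and r11 are overwritten

    ; Errors come back in rax as small negative values (-4095..-1).
    xor     edi, edi        ; exit status 0 = success
    cmp     rax, -4096
    jbe     .exit           ; unsigned rax <= -4096 means a valid address
    mov     edi, 1          ; exit status 1 = mmap failed
.exit:
    mov     rax, 60         ; system call number: exit
    syscall
```

Running ./mmap6 followed by echo $? should report 0 if the mapping succeeded.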

Related

Why does NASM system call number perform 2 different operations despite specifying the same call number

I have the following 'hello world' code written in NASM x86_64 assembly:
section .data
msg db "Hello World", 0xa
msg_L equ $-msg
section .text
global _start
_start:
mov eax, 4 ; sys_write call
mov ebx, 1 ; stdout
mov ecx, msg
mov edx, msg_L
int 0x80 ; call kernel
mov eax, 1 ; sys_exit call
int 0x80 ; call kernel
In the first 'function' under the _start: section, mov ebx, 1 is used to specify the standard output for printing. Later, after the first kernel call, mov eax, 1 is used to specify the sys_exit system call. I don't understand how specifying the same system call number yields 2 different results when the kernel is called. This NASM tutorial specifies 1 as the system call number for sys_exit, yet the program does not exit after the first use of that number, and uses it for stdout instead. Can someone explain to me why this is?

You are not specifying the same system call number.
eax, not ebx, is used to specify system call numbers.
mov ebx, 1 sets the value of ebx and doesn't set the value of eax.
The first kernel call runs with eax = 4 (sys_write); the 1 in ebx is only its first argument, the file descriptor for standard output. The second kernel call runs with eax = 1 (sys_exit).
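The roles of the two registers can be spelled out in an annotated version of the program. This is a sketch for a 32-bit build; the file name hello32.asm is an assumption. One change from the question's code: the original leaves 1 in ebx when it calls sys_exit, so it exits with status 1, while this version clears ebx first.

```nasm
; Sketch: 32-bit int 0x80 convention (eax = call number, ebx/ecx/edx = arguments).
; Assumed file name: hello32.asm
; Build: nasm -f elf32 hello32.asm && ld -m elf_i386 hello32.o -o hello32
section .data
msg:    db "Hello World", 0xa
msg_L:  equ $ - msg

section .text
global _start
_start:
    mov     eax, 4          ; system call number: sys_write
    mov     ebx, 1          ; argument 1: file descriptor 1 (stdout)
    mov     ecx, msg        ; argument 2: address of the buffer
    mov     edx, msg_L      ; argument 3: number of bytes
    int     0x80            ; call kernel

    mov     eax, 1          ; system call number: sys_exit
    xor     ebx, ebx        ; argument 1: exit status 0
    int     0x80            ; call kernel
```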

Why is 64-bit NASM insisting on the RSI register ? Why can't I put "hello world" into RCX register and use SYSCALL?

I have this x86 assembly code for a "hello world" program.
global _start
section .text
_start:
mov eax, 1 ; system call for write
mov ebx, 1 ; file handle 1 is stdout
mov ecx, message ; address of string to output
mov edx, message_len ; length of the string
syscall ; invoke operating system to do the write
mov eax, 60 ; system call for exit
mov ebx, 0 ; exit code 0
syscall ; invoke operating system to exit
section .data
message: db "Hello, World!!!!", 10 ; newline at the end
message_len equ $-message ; length of the string
This doesn't compile with nasm -felf64 hello.asm && ld hello.o && ./a.out on a 64-bit Linux machine.
But if I change the third line mov ecx, message to mov rsi, message it works!
My question is why is 64-bit NASM insisting on the RSI register? Because I have seen people compiling with ECX on 32-bit Arch Linux.

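32-bit x86 does not use the same system call convention as x86-64.
With the 32-bit int 0x80 interface, EAX holds the system call number, EBX the first argument (here the file descriptor), ECX the second (the buffer) and EDX the third (the length).
With the 64-bit syscall interface, RAX holds the system call number and the arguments go in RDI, RSI, RDX, R10, R8 and R9, in that order. The numbers themselves also differ: write is 4 and exit is 1 for int 0x80, but write is 1 and exit is 60 for syscall.
That's why your registers work on 32-bit x86 but need to be adjusted for x86-64: syscall takes the buffer address from RSI, not ECX, and the file descriptor should likewise move from EBX to RDI.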

I'm getting a segmentation fault in my assembly program [duplicate]

The tutorial I am following is for x86 and was written using 32-bit assembly; I'm trying to follow along while learning x64 assembly in the process. This has been going very well up until this lesson, where I have the following simple program, which tries to modify a single character in a string. It assembles fine but segfaults when run.
section .text
global _start ; Declare global entry point for ld
_start:
jmp short message ; Jump to where our message is so we can do a call to push its address onto the stack
code:
xor rax, rax ; Clean up the registers
xor rbx, rbx
xor rcx, rcx
xor rdx, rdx
; Try to change the N to a space
pop rsi ; Get address from stack
mov al, 0x20 ; Load 0x20 (space) into AL, the low byte of RAX
mov [rsi], al; Why segfault?
xor rax, rax; Clear again
; write(rdi, rsi, rdx) = write(file_descriptor, buffer, length)
mov al, 0x01 ; write the command for 64bit Syscall Write (0x01) into the lower 8 bits of RAX
mov rdi, rax ; First Parameter, RDI = 0x01 which is STDOUT, we move rax to ensure the upper 56 bits of RDI are zero
;pop rsi ; Second Parameter, RSI = Popped address of message from stack
mov dl, 25 ; Third Parameter, RDX = Length of message
syscall ; Call Write
; exit(rdi) = exit(return value)
xor rax, rax ; write returns # of bytes written in rax, need to clean it up again
add rax, 0x3C ; 64bit syscall exit is 0x3C
xor rdi, rdi ; Return value is in rdi (First parameter), zero it to return 0
syscall ; Call Exit
message:
call code ; Pushes the address of the string onto the stack
db 'AAAABBBNAAAAAAAABBBBBBBB',0x0A
The culprit is this line:
mov [rsi], al; Why segfault?
If I comment it out, the program runs fine and outputs the message 'AAAABBBNAAAAAAAABBBBBBBB'. Why can't I modify the string?
The author's code is the following:
global _start
_start:
jmp short ender
starter:
pop ebx ;get the address of the string
xor eax, eax
mov al, 0x20
mov [ebx+7], al ;put a space (0x20) where the N is in the string
mov al, 4 ;syscall write
mov bl, 1 ;stdout is 1
pop ecx ;get the address of the string from the stack
mov dl, 25 ;length of the string
int 0x80
xor eax, eax
mov al, 1 ;exit the shellcode
xor ebx,ebx
int 0x80
ender:
call starter
db 'AAAABBBNAAAAAAAABBBBBBBB',0x0A
And I've compiled that using:
nasm -f elf <infile> -o <outfile>
ld -m elf_i386 <infile> -o <outfile>
But even that causes a segfault. Images on the page show it working properly and changing the N into a space, but I seem to be stuck in segfault land :( Google isn't really being helpful in this case, so I turn to you, Stack Overflow. Any pointers (no pun intended!) would be appreciated.

I would assume it's because you're trying to write to data that is in the .text section. Usually you're not allowed to write to the code segment, for security reasons: .text is mapped read-only and executable. Modifiable data should be in the .data section. (Or .bss if zero-initialized.)
For actual shellcode, where you don't want to use a separate section, see Segfault when writing to string allocated by db [assembly] for alternate workarounds.
Also I would never suggest using the side effects of call pushing the address after it to the stack to get a pointer to data following it, except for shellcode.
This is a common trick in shellcode (which must be position-independent); 32-bit mode needs a call to get EIP somehow. The call must have a backwards displacement to avoid 00 bytes in the machine code, so putting the call somewhere that creates a "return" address you specifically want saves an add or lea.
Even in 64-bit code where RIP-relative addressing is possible, jmp / call / pop is about as compact as jumping over the string for a RIP-relative LEA with a negative displacement.
Outside of the shellcode / constrained-machine-code use case, it's a terrible idea and you should just lea reg, [rel buf] like a normal person with the data in .data and the code in .text. (Or read-only data in .rodata.) This way you're not trying to execute code next to data, or put data next to code.
(Code-injection vulnerabilities that allow shellcode already imply the existence of a page with write and exec permission, but normal processes from modern toolchains don't have any W+X pages unless you do something to make that happen. W^X is a good security feature for this reason, so normal toolchain security features / defaults must be defeated to test shellcode.)
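For the non-shellcode case, a minimal 64-bit sketch of the same exercise with the string in .data might look like this. The file name modify.asm and the label message_len are assumptions; the string is the one from the question, and the 'N' sits at offset 7.

```nasm
; Sketch: modify a string that lives in writable .data, then print it.
; Assumed file name: modify.asm
; Build: nasm -f elf64 modify.asm && ld modify.o -o modify
global _start

section .data
message:     db 'AAAABBBNAAAAAAAABBBBBBBB', 0x0A
message_len: equ $ - message        ; 25 bytes including the newline

section .text
_start:
    lea     rsi, [rel message]      ; rsi = address of the string (also arg 2 of write)
    mov     byte [rsi+7], 0x20      ; overwrite the 'N' at offset 7 with a space

    mov     rax, 1                  ; system call number: write
    mov     rdi, 1                  ; arg 1: stdout
    mov     rdx, message_len        ; arg 3: length
    syscall

    mov     rax, 60                 ; system call number: exit
    xor     edi, edi                ; arg 1: exit status 0
    syscall
```

Because .data is mapped writable, the store to [rsi+7] no longer faults.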

Using interrupt 0x80 on 64-bit Linux [duplicate]

I have a simple 64-bit assembly program which is intended to print an 'O' and 'K' followed by a newline.
However, the 'K' is never printed. One of the goals of the program is to print the value in the lower bits of the rax register as an ASCII letter. The program is specifically for 64-bit Linux, written for educational purposes, so there is no need to use C-style system calls.
I suspect that the problem either lies with mov QWORD [rsp], rax or mov rcx, rsp.
Currently, the program only outputs 'O' followed by a newline.
How can one change the program to make it use the value in rax and then print a 'K' so that the complete output is 'OK' followed by a newline?
bits 64
section .data
o: db "O" ; 'O'
nl: dq 10 ; newline
section .text
;--- function main ---
global main ; make label available to the linker
global _start ; make label available to the linker
_start: ; starting point of the program
main: ; name of the function
;--- call interrupt 0x80 ---
mov rax, 4 ; function call: 4
mov rbx, 1 ; parameter #1 is 1
mov rcx, o ; parameter #2 is &o
mov rdx, 1 ; parameter #3 is length of string
int 0x80 ; perform the call
;--- rax = 'K' ---
mov rax, 75 ; rax = 75
;--- call interrupt 0x80 ---
sub rsp, 8 ; make some space for storing rax on the stack
mov QWORD [rsp], rax ; move rax to a memory location on the stack
mov rax, 4 ; function call: 4
mov rbx, 1 ; parameter #1 is 1
mov rcx, rsp ; parameter #2 is rsp
mov rdx, 1 ; parameter #3 is length of string
int 0x80 ; perform the call
add rsp, 8 ; move the stack pointer back
;--- call interrupt 0x80 ---
mov rax, 4 ; function call: 4
mov rbx, 1 ; parameter #1 is 1
mov rcx, nl ; parameter #2 is nl
mov rdx, 1 ; parameter #3 is length of string
int 0x80 ; perform the call
;--- exit program ---
mov rax, 1 ; function call: 1
xor rbx, rbx ; return code 0
int 0x80 ; exit program
Update: Note that this is a 64-bit x86 Assembly program that uses int 80h, and is very different from a 32-bit x86 Assembly program that uses int 80h.

You wrote a 64-bit program, but you use the "int 0x80" instruction. "int 0x80", however, only works correctly in 32-bit programs.
The stack of a 64-bit process lies at an address that does not fit in 32 bits, so "int 0x80"-style system calls cannot be given a pointer to it.
To solve this problem there are two possibilities:
1. Assemble and link as a 32-bit application (use 32-bit registers like EAX instead of 64-bit registers like RAX). As long as you link without any shared libraries, 32-bit programs will work perfectly on 64-bit Linux.
2. Use "syscall"-style system calls instead of "int 0x80"-style system calls. The use of these differs a lot from "int 0x80"-style ones!
32-bit code:
mov eax,4 ; In "int 0x80" style 4 means: write
mov ebx,1 ; ... and the first arg. is stored in ebx
mov ecx,esp ; ... and the second arg. is stored in ecx
mov edx,1 ; ... and the third arg. is stored in edx
int 0x80
64-bit code:
mov rax,1 ; In "syscall" style 1 means: write
mov rdi,1 ; ... and the first arg. is stored in rdi (not rbx)
mov rsi,rsp ; ... and the second arg. is stored in rsi (not rcx)
mov rdx,1 ; ... and the third arg. is stored in rdx
syscall
--- Edit ---
Background information:
"int 0x80" is intended for 32-bit programs. When called from a 64-bit program it behaves the same way as if it had been called from a 32-bit program (using the 32-bit calling convention).
This also means that the parameters for "int 0x80" will be passed in 32-bit registers and the upper 32 bits of the 64-bit registers are ignored.
(I just tested that on Ubuntu 16.10, 64 bit.)
This however means that you can only access memory below 2^32 (or even below 2^31) when using "int 0x80" because you cannot pass an address above 2^32 in a 32-bit register.
If the data to be written is located at an address below 2^31 you may use "int 0x80" to write the data. If it is located above 2^32 you can't. The stack (RSP) is very likely located above 2^32 so you cannot write data on the stack using "int 0x80".
Because it is very likely that your program will use memory above 2^32 I have written: "int 0x80 does not work with 64-bit programs."
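Applied to the whole program from the question, the second possibility could look roughly like the following sketch. The file name ok.asm is an assumption, and nl is declared with db here because only one byte is written.

```nasm
bits 64
; Sketch: the question's program rewritten with syscall instead of int 0x80.
; Assumed file name: ok.asm
; Build: nasm -f elf64 ok.asm && ld ok.o -o ok
section .data
o:      db "O"                  ; 'O'
nl:     db 10                   ; newline

section .text
global _start
_start:
    ; write(1, &o, 1)
    mov     rax, 1              ; system call number: write
    mov     rdi, 1              ; arg 1: stdout
    lea     rsi, [rel o]        ; arg 2: address of 'O'
    mov     rdx, 1              ; arg 3: length
    syscall

    ; put 'K' on the stack and write it from there
    mov     rax, 75             ; rax = 'K'
    sub     rsp, 8              ; make room on the stack
    mov     [rsp], rax          ; low byte at [rsp] is 'K' (little-endian)
    mov     rax, 1              ; write again (rax was overwritten above)
    mov     rdi, 1
    mov     rsi, rsp            ; a 64-bit stack address is fine for syscall
    mov     rdx, 1
    syscall
    add     rsp, 8              ; restore the stack pointer

    ; write(1, &nl, 1)
    mov     rax, 1
    mov     rdi, 1
    lea     rsi, [rel nl]
    mov     rdx, 1
    syscall

    ; exit(0)
    mov     rax, 60             ; system call number: exit
    xor     edi, edi            ; exit status 0
    syscall
```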

lost in assembly NASM ELF64 world

As part of my Computer Architecture class I need to get comfortable with assembly, or at least comfortable enough. I'm trying to read the input from the user and then reprint it (for the time being). This is how I tried to lay it out in pseudocode:
Declare msg variable (this will be printed on screen)
Declare length variable (to be used by the sys_write function) with long enough value
Pop the stack once to get the program name
Pop the stack again to get the first argument
Move the current value of the stack into the msg variable
Move msg to ECX (sys_write argument)
Mov length to EDX (sys_write argument)
Call sys_write using standard output
Kernel call
Call sys_exit and leave
This is my code so far
section .data
msg: db 'placeholder text',0xa;
length: dw 0x123;
section .text
global _start
_start:
pop rbx;
pop rbx;
; this is not working when I leave it in I get this error:
; invalid combination of opcode and operands
;mov msg, rbx;
mov ecx, msg;
mov edx, length;
mov eax, 4;
mov ebx, 1;
int 0x80;
mov ebx, 0;
mov eax, 1;
int 0x80;
When I leave it out (not moving the argument into msg), I get this output
placeholder text
#.shstrtab.text.data
�#�$�`��
We have really only just begun with NASM, so ANY help will be greatly appreciated. I've been looking at http://www.cin.ufpe.br/~if817/arquivos/asmtut/index.html#stack and http://syscalls.kernelgrok.com/, adapting the examples and the register names to the best of my understanding to match http://www.nasm.us/doc/nasmdo11.html
I'm running Ubuntu 12.04, 64-bit, compiling (not even sure if this is the right word) with NASM as ELF64. I'm sorry to ask such a silly question, but I have been unable to find an easy enough tutorial for NASM that uses 64 bits.

When the program is called the stack should look like this:
+----------------+
| ... | <--- rsp + 24
+----------------+
| argument 2 | <--- rsp + 16
+----------------+
| argument 1 | <--- rsp + 8
+----------------+
| argument count | <--- rsp
+----------------+
The first argument is the name of your program and the second is the user input (if the user typed anything as an argument). So the count of the arguments is at least 1.
The arguments for system calls in 64-bit mode are stored in the following registers:
rax (system call number)
rdi (1st argument)
rsi (2nd argument)
rdx (3rd argument)
r10 (4th argument; rcx is used for function calls, but syscall overwrites it)
r8 (5th argument)
r9 (6th argument)
The system call is invoked with syscall. The numbers of all the system calls can be found here (yes, they are different from the numbers in 32-bit mode).
This is the program which should do your stuff:
section .data
msg: db 'Requesting 1 argument!', 10 ; message + newline
section .text
global _start
_start:
cmp qword [rsp], 2 ; check if argument count is 2
jne fail ; if not jump to the fail label
mov rax, 1 ; sys_write
mov rdi, 1 ; stdout
mov rsi, [rsp+16] ; get the address of the argument
mov rdx, 1 ; one character (length 1)
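print_char: ; not named "loop", which is an instruction mnemonic
cmp byte [rsi], 0 ; check if current character is 0
je exit ; if 0 then jump to the exit label
mov rax, 1 ; sys_write again: syscall overwrites rax with its return value
syscall
inc rsi ; advance to the next character
jmp print_char ; repeat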
fail:
mov rax, 1 ; sys_write
mov rdi, 1 ; stdout
lea rsi, [rel msg] ; load the address of the label msg into rsi
mov rdx, 23 ; length = 23
syscall
exit:
mov rax, 60 ; sys_exit
mov rdi, 0 ; with code 0
syscall
Since the code isn't perfect in many ways you may want to modify it.
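One possible modification, given here only as a sketch: the program above issues one write per character, whereas the variant below first measures the argument and then prints it with a single write. The file name echo1.asm and the local labels are assumptions, and no error message is printed when the argument is missing.

```nasm
; Sketch: print argv[1] with a single write.
; Assumed file name: echo1.asm
; Build: nasm -f elf64 echo1.asm && ld echo1.o -o echo1
global _start

section .text
_start:
    cmp     qword [rsp], 2      ; argument count must be 2 (program name + 1 argument)
    jne     .done
    mov     rsi, [rsp+16]       ; rsi = address of the first user argument
    xor     edx, edx            ; rdx = length counted so far
.count:
    cmp     byte [rsi+rdx], 0   ; stop at the terminating 0 byte
    je      .write
    inc     rdx
    jmp     .count
.write:
    mov     rax, 1              ; system call number: write
    mov     rdi, 1              ; arg 1: stdout (rsi and rdx are already set)
    syscall
.done:
    mov     rax, 60             ; system call number: exit
    xor     edi, edi            ; exit status 0
    syscall
```

No newline is appended, so the shell prompt follows the argument directly.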

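You've followed the instructions quite literally, and this is the expected output.
The value you pop from the stack is just a binary number: a pointer to one of the command-line argument strings (after the argument count, the stack holds the addresses of the program name and of each argument), not the text itself.
To make sense of it, you would either have to print the string it points to, or convert the pointer to an ASCII string, e.g. "0x12313132".
The garbage after the placeholder text has a separate cause: mov edx, length loads the address of length, not the value stored there, so write is asked to output far more bytes than the message contains.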

My OS is Ubuntu 64-bit. Compiling your code produced the error:
nasm print3.asm
print3.asm:12: error: instruction not supported in 16-bit mode
print3.asm:13: error: instruction not supported in 16-bit mode
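The errors point exactly at the two "pop rbx" lines. Without a -f option, nasm produces a flat binary and assembles in 16-bit mode, where 64-bit registers do not exist.
Adding "BITS 64" to the top of the asm file solved the problem (assembling with nasm -f elf64 also selects 64-bit mode, and produces an object file that ld can link):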
BITS 64
section .data
msg: db 'placeholder text',0xa;
length: dw 0x123;
...
