Assembly LINUX, x86-64 - Calling convention

Assembly LINUX, x86-64 - Calling convention - linux

I am currently learning the basics of assembly language for LINUX OS, x84-64 processors, GCC compilers and I came across an example which tries to translate into assembly the following C function:
unsigned long fact(unsigned n)
{
if (n<=1)
return 1;
else
return n*fact(n-1);
}
This is the solution proposed:
.intel_syntax noprefix
.text
.global fact
.type fact, #function
fact: PUSH RBP
MOV RBP, RSP
CMP EDI, 1
JBE retour_un
PUSH RDI
DEC RDI
SUB RSP, 8
CALL fact
ADD RSP, 8
POP RDI
MUL RDI
JMP retour
retour_un: MOV RAX, 1
retour: POP RBP
RET
However, I don't exactly how this works. From what I have read the CALL instruction places on the stack the value of RIP and then jumps to wherever its argument shows. In this case, I understand up to the point where CALL fact is used. If this instruction is implemented everytime then everytime the program will start form the beginning until RDI==1 in which case it will jump to retour_un and everything that there is in between will never be executed. Could someone explain to me where I am mistaken and how this segment of assembly code actually works?

Related

Scan an integer and print the interval (1, integer) in NASM

I am trying to learn the assembly language from a Linux Ubuntu 16.04 x64.
For now I have the following problem:
- scan an integer n and print the numbers from 1 to n.
For n = 5 I should have 1 2 3 4 5.
I tried to do it with scanf and printf but after I input the number, it exits.
The code is:
;nasm -felf64 code.asm && gcc code.o && ./a.out
SECTION .data
message1: db "Enter the number: ",0
message1Len: equ $-message1
message2: db "The numbers are:", 0
formatin: db "%d",0
formatout: db "%d",10,0 ; newline, nul
integer: times 4 db 0 ; 32-bits integer = 4 bytes
SECTION .text
global main
extern scanf
extern printf
main:
mov eax, 4
mov ebx, 1
mov ecx, message1
mov edx, message1Len
int 80h
mov rdi, formatin
mov rsi, integer
mov al, 0
call scanf
int 80h
mov rax, integer
loop:
push rax
push formatout
call printf
add esp, 8
dec rax
jnz loop
mov rax,0
ret
I am aware that in this loop I would have the inverse output (5 4 3 2 1 0), but I did not know how to set the condition.
The command I'm using is the following:
nasm -felf64 code.asm && gcc code.o && ./a.out
Can you please help me find where I'm going wrong?

There are several problems:
1. The parameters to printf, as discussed in the comments. In x86-64, the first few parameters are passed in registers.
2. printf does not preserve the value of eax.
3. The stack is misaligned.
4. rbx is used without saving the caller's value.
5. The address of integer is being loaded instead of its value.
6. Since printf is a varargs function, eax needs to be set to 0 before the call.
7. Spurious int 80h after the call to scanf.
I'll repeat the entire function in order to show the necessary changes in context.
main:
push rbx ; This fixes problems 3 and 4.
mov eax, 4
mov ebx, 1
mov ecx, message1
mov edx, message1Len
int 80h
mov rdi, formatin
mov rsi, integer
mov al, 0
call scanf
mov ebx, [integer] ; fix problems 2 and 5
loop:
mov rdi, formatout ; fix problem 1
mov esi, ebx
xor eax, eax ; fix problem 6
call printf
dec ebx
jnz loop
pop rbx ; restore caller's value
mov rax,0
ret
P.S. To make it count up instead of down, change the loop like this:
mov ebx, 1
loop:
<call printf>
inc ebx
cmp ebx, [integer]
jle loop

You are calling scanf correctly, using the x86-64 System V calling convention. It leaves its return value in eax. After successful conversion of one operand (%d), it returns with eax = 1.
... correct setup for scanf, including zeroing AL.
call scanf ; correct
int 80h ; insane: system call with eax = scanf return value
Then you run int 80h, which makes a 32-bit legacy-ABI system call using eax=1 as the code to determine which system call. (see What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?).
eax=1 / int 80h is sys_exit on Linux. (unistd_32.h has __NR_exit = 1). Use a debugger; that would have shown you which instruction was making your program exit.
Your title (before I corrected it) said you got a segmentation fault, but I tested on my x86-64 desktop and that's not the case. It exits cleanly using an int 80h exit system call. (But in code that does segfault, use a debugger to find out which instruction.) strace decodes int 0x80 system calls incorrectly in 64-bit processes, using the 64-bit syscall call numbers from unistd_64.h, not the 32-bit unistd_32.h call numbers.
Your code was close to working: you use the int 0x80 32-bit ABI correctly for sys_write, and only pass it 32-bit args. (The pointer args fit in 32 bits because static code/data is always placed in the low 2GiB of virtual address space in the default code model on x86-64. Exactly for this reason, so you can use compact instructions like mov edi, formatin to put addresses in registers, or use them as immediates or rel32 signed displacements.)
OTOH I think you were doing that for the wrong reason. And as #prl points out, you forgot to maintain 16-byte stack alignment.
Also, mixing system calls with C stdio functions is usually a bad idea. Stdio uses internal buffers instead of always making a system call on every function call, so things can appear out of order, or a read can be waiting for user input when there's already data in the stdio buffer for stdin.
Your loop is broken in several ways, too. You seem to be trying to call printf with the 32-bit calling convention (args on the stack).
Even in 32-bit code, this is broken, because printf's return vale is in eax. So your loop is infinite, because printf returns the number of characters printed. That's at least two from the %d\n format string, so dec rax / jnz will always jump.
In the x86-64 SysV ABI, you need to zero al before calling printf (with xor eax,eax), if you didn't pass any FP args in XMM registers. You also have to pass args in rdi, rsi, ..., like for scanf.
You also add rsp, 8 after pushing two 8-byte values, so the stack grows forever. (But you never return, so the eventual segfault will be on stack overflow, not on trying to return with rsp not pointing to the return address.)
Decide whether you're making 32-bit or 64-bit code, and only copy/paste from examples for the mode and OS you're targeting. (Note that 64-bit code can and often does use mostly 32-bit registers, though.)
See also Assembling 32-bit binaries on a 64-bit system (GNU toolchain) (which does include a NASM section with a handy asm-link script that assembles and links into a static binary). But since you're writing main instead of _start and are using libc functions, you should just link with gcc -m32 (if you decide to use 32-bit code instead of replacing the 32-bit parts of your program with 64-bit function-calling and system-call conventions).
See What are the calling conventions for UNIX & Linux system calls on i386 and x86-64.

push/pop segmentation fault in simple multiplication function

my teacher is doing a crash course in assembly with us, and I have no experience in it whatsoever. I am supposed to write a simple function that takes four variables and calculates (x+y)-(z+a) and then prints out the answer. I know it's a simple problem, but after hours of research I am getting no where, any push in the right direction would be very helpful! I do need to use the stack, as I have more things to add to the program once I get past this point, and will have a lot of variables to store. I am compiling using nasm and gcc, in linux. (x86 64)
(side question, my '3' isn't showing up in register r10, but I am in linux so this should be the correct register... any ideas?)
Here is my code so far:
global main
extern printf
segment .data
mulsub_str db "(%ld * %ld) - (%ld * %ld) = %ld",10,0
data dq 1, 2, 3, 4
segment .text
main:
call multiplyandsubtract
pop r9
mov rdi, mulsub_str
mov rsi, [data]
mov rdx, [data+8]
mov r10, [data+16]
mov r8, [data+24]
mov rax, 0
call printf
ret
multiplyandsubtract:
;;multiplies first function
mov rax, [data]
mov rdi, [data+8]
mul rdi
mov rbx, rdi
push rbx
;;multiplies second function
mov rax, [data+16]
mov rsi, [data+24]
mul rsi
mov rbx, rsi
push rbx
;;subtracts function 2 from function 1
pop rsi
pop rdi
sub rdi, rsi
push rdi
ret

push in the right direction
Nice pun!
Your problem is that you apparently don't seem to know that ret is using the stack for the return address. As such push rdi; ret will just go to the address in rdi and not return to your caller. Since that is unlikely to be a valid code address, you get a nice segfault.
To return values from functions just leave the result in a register, standard calling conventions normally use rax. Here is a possible version:
global main
extern printf
segment .data
mulsub_str db "(%ld * %ld) - (%ld * %ld) = %ld",10,0
data dq 1, 2, 3, 4
segment .text
main:
sub rsp, 8
call multiplyandsubtract
mov r9, rax
mov rdi, mulsub_str
mov rsi, [data]
mov rdx, [data+8]
mov r10, [data+16]
mov r8, [data+24]
mov rax, 0
call printf
add rsp, 8
ret
multiplyandsubtract:
;;multiplies first function
mov rax, [data]
mov rdi, [data+8]
mul rdi
mov rbx, rdi
push rbx
;;multiplies second function
mov rax, [data+16]
mov rsi, [data+24]
mul rsi
mov rbx, rsi
push rbx
;;subtracts function 2 from function 1
pop rsi
pop rdi
sub rdi, rsi
mov rax, rdi
ret
PS: notice I have also fixed the stack alignment as per the ABI. printf is known to be picky about that too.

To return more than 64b from subroutine (rax is not enough), you can optionally drop the whole standard ABI convention (or actually follow it, there's surely a well defined way how to return more than 64b from subroutines), and use other registers until you ran out of them.
And once you ran out of spare return registers (or when you desperately want to use stack memory), you can follow the way C++ compilers do:
SUB rsp,<return_data_size + alignment>
CALL subroutine
...
MOV al,[rsp + <offset>] ; to access some value from returned data
; <offset> = 0 to return_data_size-1, as defined by you when defining
; the memory layout for returned data structure
...
ADD rsp,<return_data_size + alignment> ; restore stack pointer
subroutine:
MOV al,<result_value_1>
MOV [rsp + 8 + <offset>],al ; store it into allocated stack space
; the +8 is there to jump beyond return address, which was pushed
; at stack by "CALL" instruction. If you will push more registers/data
; at the stack inside the subroutine, you will have either to recalculate
; all offsets in following code, or use 32b C-like function prologue:
PUSH rbp
MOV rbp,rsp
MOV [rbp + 16 + <offset>],al ; now all offsets are constant relative to rbp
... other code ...
; epilogue code restoring stack
MOV rsp,rbp ; optional, when you did use RSP and didn't restore it yet
POP rbp
RET
So during executing the instructions of subroutine, the stack memory layout is like this:
rsp -> current_top_of_stack (some temporary push/pop as needed)
+x ...
rbp -> original rbp value (if prologue/epilogue code was used)
+8 return address to caller
+16 allocated space for returning values
+16+return_data_size
... padding to have rsp correctly aligned by ABI requirements ...
+16+return_data_size+alignment
... other caller stack data or it's own stack frame/return address ...
I'm not going to check how ABI defines it, because I'm too lazy, plus I hope this answer is understandable for you to explain the principle, so you will recognize which way the ABI works and adjust...
Then again, I would highly recommend to use rather many shorter simpler subroutines returning only single value (in rax/eax/ax/al), whenever possible, try to follow the SRP (Single Responsibility Principle). The above way will force you to define some return-data-structure, which may be too much hassle, if it's just some temporary thing and can be split into single-value subroutines instead (if performance is endangered, then probably inlining the whole subroutine will outperform even the logic of grouped returned values and single CALL).

I'm getting a segmentation fault in my assembly program [duplicate]

The tutorial I am following is for x86 and was written using 32-bit assembly, I'm trying to follow along while learning x64 assembly in the process. This has been going very well up until this lesson where I have the following simple program which simply tries to modify a single character in a string; it compiles fine but segfaults when ran.
section .text
global _start ; Declare global entry oint for ld
_start:
jmp short message ; Jump to where or message is at so we can do a call to push the address onto the stack
code:
xor rax, rax ; Clean up the registers
xor rbx, rbx
xor rcx, rcx
xor rdx, rdx
; Try to change the N to a space
pop rsi ; Get address from stack
mov al, 0x20 ; Load 0x20 into RAX
mov [rsi], al; Why segfault?
xor rax, rax; Clear again
; write(rdi, rsi, rdx) = write(file_descriptor, buffer, length)
mov al, 0x01 ; write the command for 64bit Syscall Write (0x01) into the lower 8 bits of RAX
mov rdi, rax ; First Paramter, RDI = 0x01 which is STDOUT, we move rax to ensure the upper 56 bits of RDI are zero
;pop rsi ; Second Parameter, RSI = Popped address of message from stack
mov dl, 25 ; Third Parameter, RDX = Length of message
syscall ; Call Write
; exit(rdi) = exit(return value)
xor rax, rax ; write returns # of bytes written in rax, need to clean it up again
add rax, 0x3C ; 64bit syscall exit is 0x3C
xor rdi, rdi ; Return value is in rdi (First parameter), zero it to return 0
syscall ; Call Exit
message:
call code ; Pushes the address of the string onto the stack
db 'AAAABBBNAAAAAAAABBBBBBBB',0x0A
This culprit is this line:
mov [rsi], al; Why segfault?
If I comment it out, then the program runs fine, outputting the message 'AAAABBBNAAAAAAAABBBBBBBB', why can't I modify the string?
The authors code is the following:
global _start
_start:
jmp short ender
starter:
pop ebx ;get the address of the string
xor eax, eax
mov al, 0x20
mov [ebx+7], al ;put a NULL where the N is in the string
mov al, 4 ;syscall write
mov bl, 1 ;stdout is 1
pop ecx ;get the address of the string from the stack
mov dl, 25 ;length of the string
int 0x80
xor eax, eax
mov al, 1 ;exit the shellcode
xor ebx,ebx
int 0x80
ender:
call starter
db 'AAAABBBNAAAAAAAABBBBBBBB'0x0A
And I've compiled that using:
nasm -f elf <infile> -o <outfile>
ld -m elf_i386 <infile> -o <outfile>
But even that causes a segfault, images on the page show it working properly and changing the N into a space, however I seem to be stuck in segfault land :( Google isn't really being helpful in this case, and so I turn to you stackoverflow, any pointers (no pun intended!) would be appreciated

I would assume it's because you're trying to access data that is in the .text section. Usually you're not allowed to write to code segment for security. Modifiable data should be in the .data section. (Or .bss if zero-initialized.)
For actual shellcode, where you don't want to use a separate section, see Segfault when writing to string allocated by db [assembly] for alternate workarounds.
Also I would never suggest using the side effects of call pushing the address after it to the stack to get a pointer to data following it, except for shellcode.
This is a common trick in shellcode (which must be position-independent); 32-bit mode needs a call to get EIP somehow. The call must have a backwards displacement to avoid 00 bytes in the machine code, so putting the call somewhere that creates a "return" address you specifically want saves an add or lea.
Even in 64-bit code where RIP-relative addressing is possible, jmp / call / pop is about as compact as jumping over the string for a RIP-relative LEA with a negative displacement.
Outside of the shellcode / constrained-machine-code use case, it's a terrible idea and you should just lea reg, [rel buf] like a normal person with the data in .data and the code in .text. (Or read-only data in .rodata.) This way you're not trying execute code next to data, or put data next to code.
(Code-injection vulnerabilities that allow shellcode already imply the existence of a page with write and exec permission, but normal processes from modern toolchains don't have any W+X pages unless you do something to make that happen. W^X is a good security feature for this reason, so normal toolchain security features / defaults must be defeated to test shellcode.)

Pointers in assembly language

I am trying to understand how to use pointer in assembly. By reading some tutorials around internel,I think had undertantood some concepts. But when I'II go to try it,it did work. Below some attempts to translate C to ASM.
C
const char *s = "foo";
unsigned z = *(unsigned*)s;
if(!(z & 0xFF))
do_something();
if(!(z & 0xFFFF))
do_b_something();
(here's not full code,but it's a word-check,thefore,there is more two stmts which checks 0xFF0000,0xF000000 respectivily.
ASM:
mov ebp,str
mov eax,ebp
mov eax,[eax]
and eax,0xFF
cmp eax,0
je etc
mov eax,[eax]
and eax,0xFFFF
cmp eax,0
je etc
It returns a seg fault.
And the try:
mov eax,dword ptr [eax]
that's generated by gcc compiler and you can see it in some other assemblies code,returns
invalid symbol
on FASM assembler. It isn't really supported by the FASM or am I missing something?

I think this is what you are attempting to do:
mov ebp,str
mov eax,ebp
mov ebx,[eax]
test ebx,0xFF
jz low_byte_empty
do_something:
; some code here...
low_byte_empty:
test ebx,0xFFFF
jz low_word_empty
do_b_something:
; some code here.
low_word_empty:
Explanation:
First, as JasonD already mentions in his answer, you are loading a pointer to eax, then doing a logical and to it, then you are using the result still in eax to address memory (some memory offset in the range 0x0 ... 0xFF).
So what goes wrong in your code: you can't keep in the same register both a pointer to a memory address and a value at the same time. So I chose to load the value from [eax] to ebx, you can also use some other 32-bit general register (ecx, edx, esi, edi) according to your needs.
Then, you don't need to use cmp to check if a register is empty, because all cmp does is that it does the subtraction and sets the flags. But ZF (zero flag) is already set by and, so cmp is absolutely unnecessary here. Then, as cmp is not needed here and we do not need the result either, we only want to update the flags, it's better to use test. test does exactly the same logical AND as and does, the only difference being that test does not store the result, it only updates the flags.

It's not at all clear what you're trying to do in the original code - doesn't look right.
However this:
mov eax,[eax]
and eax,0xFF
cmp eax,0
je etc
mov eax,[eax]
Isn't going to work. You're overwriting the contents of EAX with the value stored at the address in EAX, manipulating that value, and then trying to reload it after the branch without restoring the original pointer.

Following variant is simpler, smaller, faster and uses only one register.
mov eax, str
mov eax,[eax]
test al, al
jz low_byte_empty
do_something_byte:
; some code here...
low_byte_empty:
test ah, ah
jz low_word_empty
do_something_word:
; some code here
low_word_empty:

NASM x86_64 having trouble writing command line arguments, returning -14 in rax

I am using elf64 compilation and trying to take a parameter and write it out to the console.
I am calling the function as ./test wooop
After stepping through with gdb there seems to be no problem, everything is set up ok:
rax: 0x4
rbx: 0x1
rcx: pointing to string, x/6cb $rcx gives 'w' 'o' 'o' 'o' 'p' 0x0
rdx: 0x5 <---correctly determining length
after the int 80h rax contains -14 and nothing is printed to the console.
If I define a string in .data, it just works. gdb shows the value of $rcx in the same way.
Any ideas? here is my full source
%define LF 0Ah
%define stdout 1
%define sys_exit 1
%define sys_write 4
global _start
section .data
usagemsg: db "test {string}",LF,0
testmsg: db "wooop",0
section .text
_start:
pop rcx ;this is argc
cmp rcx, 2 ;one argument
jne usage
pop rcx
pop rcx ; argument now in rcx
test rcx,rcx
jz usage
;mov rcx, testmsg ;<-----uncomment this to print ok!
call print
jmp exit
usage:
mov rcx, usagemsg
call print
jmp exit
calclen:
push rdi
mov rdi, rcx
push rcx
xor rcx,rcx
not rcx
xor al,al
cld
repne scasb
not rcx
lea rdx, [rcx-1]
pop rcx
pop rdi
ret
print:
push rax
push rbx
push rdx
call calclen
mov rax, sys_write
mov rbx, stdout
int 80h
pop rdx
pop rbx
pop rax
ret
exit:
mov rax, sys_exit
mov rbx, 0
int 80h
Thanks
EDIT: After changing how I make my syscalls as below it works fine. Thanks all for your help!
sys_write is now 1
sys_exit is now 60
stdout now goes in rdi, not rbx
the string to write is now set in rsi, not rcx
int 80h is replaced by syscall

I'm still running 32-bit hardware, so this is a wild asmed guess! As you probably know, 64-bit system call numbers are completely different, and "syscall" is used instead of int 80h. However int 80h and 32-bit system call numbers can still be used, with 64-bit registers truncated to 32-bit. Your tests indicate that this works with addresses in .data, but with a "stack address", it returns -14 (-EFAULT - bad address). The only thing I can think of is that truncating rcx to ecx results in a "bad address" if it's on the stack. I don't know where the stack is in 64-bit code. Does this make sense?
I'd try it with "proper" 64-bit system call numbers and registers and "syscall", and see if that helps.
Best,
Frank

As you said, you're using ELF64 as the target of the compilation. This is, unfortunately, your first mistake. Using the "old" system call interface on Linux, e.g. int 80h is possible only when running 32-bit tasks. Obviously, you could simply assemble your source as ELF32, but then you're going to lose all the advantages if running tasks in 64-bit mode, namely the extra registers and 64-bit operations.
In order to make system calls in 64-bit tasks, the "new" system call interface must be used. The system call itself is done with the syscall instruction. The kernel destroys registers rcx and r11. The number of the system is specified in the register rax, while the arguments of the call are passed in rdi, rsi, rdx, r10, r8 and r9. Keep in mind that the numbers of the syscalls are different than the ones in 32-bit mode. You can find them in unistd_64.h, which is usually in /usr/include/asm or wherever your distribution stores it.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string