assembly subroutines get called twice without even being called from main - linux

I'm trying to define some subroutines that have calls to printf in them.
A very trivial example is as follows:
extern printf
LINUX equ 80H
EXIT equ 60
section .data
intfmt: db "%ld", 10, 0
segment .text
global main
main:
call os_return ; return to operating system
os_return:
mov rax, EXIT ; Linux system call 60 i.e. exit ()
mov rdi, 0 ; Error code 0 i.e. no errors
int LINUX ; Interrupt Linux kernel
test:
push rdi
push rsi
mov rsi, 10
mov rdi, intfmt
xor rax, rax
call printf
pop rdi
pop rsi
ret
Here test just has a call to printf that outputs the number 10 to the screen. I would not expect this to get called as I have no call to it.
However when compiling and running:
nasm -f elf64 test.asm
gcc -m64 -o test test.o
I get the output:
10
10
I'm totally baffled and wondered if someone could explain why this is happening?

int 80H invokes the 32-bit system call interface, which a) uses the 32-bit system call numbers and b) is intended for use by 32-bit code, not 64-bit code. Your code is actually performing a umask system call with random parameters.
For a 64-bit system call, use the syscall instruction instead:
...
os_return:
mov rax, EXIT ; Linux system call 60 i.e. exit ()
mov rdi, 0 ; Error code 0 i.e. no errors
syscall ; enter the kernel via the 64-bit system call ABI
...
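To see what the syscall instruction expects, here is a minimal C sketch using GCC/Clang inline assembly (x86-64 Linux only; the helper name raw_syscall3 is hypothetical). It puts the call number in RAX and the first three arguments in RDI, RSI and RDX, which is the 64-bit convention the corrected os_return uses. The kernel returns the result (or a negative errno) in RAX and overwrites RCX and R11.

```c
#include <sys/syscall.h> /* SYS_* call numbers for the target architecture */
#include <unistd.h>      /* getpid(), used here only for comparison */

/* Hypothetical helper: raw 64-bit Linux system call with up to three
   arguments (x86-64 only). RAX = call number, RDI/RSI/RDX = arguments.
   syscall stores the return RIP in RCX and RFLAGS in R11, so both are
   declared as clobbered. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;
}
```

For example, raw_syscall3(SYS_getpid, 0, 0, 0) returns the same value as getpid(), and on x86-64 SYS_exit from <sys/syscall.h> is 60, the number the answer loads into RAX.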

I would say that your call to exit is failing, so when it returns, execution falls through to the test function, which prints the first 10.
Then, when test executes ret, you go back to the instruction just after the call os_return, which is os_return itself. The call to exit fails again and falls through to the test function again. But this time the ret returns from the main function and the program ends.
As for why the exit call is failing, I cannot tell, as I don't have a 64-bit system available. But you could disassemble the exit function from libc and see how it is done there. My guess is that the int LINUX interface is 32-bit only, as it exists only for historic compatibility, and 64-bit Linux is not so old.

Related

Confused about 64-bit registers - ASM

I'm currently learning assembly. I'm using Intel syntax on 64-bit Ubuntu, with nasm.
So I found two websites that list the syscall numbers:
This one for 32 bit registers (eax, ebx, ...): https://syscalls.kernelgrok.com
This one for 64 bits registers (rax, rbx, ...): https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64
The thing is that my code doesn't work when I use the 64-bit syscall numbers, but it works when I keep the 32-bit numbers and just replace the 'e' in the 32-bit register names with an 'r'. For instance, in sys_write I use rbx to store the fd instead of rdi, and it works.
I'm quite lost right now. This code doesn't work:
message db 'Hello, World', 10
section .text
global _start
_start: mov rax,4
mov rdi, 1
mov rsi, message
mov rdx, 13
syscall
mov rax, 1
mov rdi, 0
syscall
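Run strace ./my_program: you make a bogus stat system call (4 is stat in the 64-bit table), then your intended exit (call number 1) is actually a write to fd 0, which succeeds, and then execution falls off the end of your code and segfaults.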
$ strace ./foo
execve("./foo", ["./foo"], 0x7ffe6b91aa00 /* 51 vars */) = 0
stat(0x1, 0x401000) = -1 EFAULT (Bad address)
write(0, "Hello, World\n", 13Hello, World
) = 13
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xd} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
It's not register names that are your problem, it's call numbers. You're using 32-bit call numbers but calling the 64-bit syscall ABI.
Call numbers and calling convention both differ.
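To make that concrete, here is the question's hello-world reduced to its write call, as a C sketch with GCC/Clang inline assembly (x86-64 Linux only; raw_syscall3 and write_hello are hypothetical names). In the 64-bit ABI, write is call number 1 with the fd in RDI, the pointer in RSI and the length in RDX, and exit is 60. The question's 4 and 1 are the 32-bit numbers, which in the 64-bit table mean stat and write, exactly what strace showed.

```c
#include <sys/syscall.h> /* SYS_write etc. for the target architecture */
#include <unistd.h>

/* Hypothetical helper (x86-64 Linux only): RAX = call number,
   RDI/RSI/RDX = arguments; syscall clobbers RCX and R11. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;
}

/* The question's write with the 64-bit call number (SYS_write == 1 on
   x86-64) and 64-bit argument registers. Returns bytes written or -errno. */
static long write_hello(int fd)
{
    static const char message[] = "Hello, World\n";
    return raw_syscall3(SYS_write, fd, (long)message, 13);
}
```

Writing into a pipe instead of stdout makes the result easy to check: write_hello returns 13 and the 13 bytes can be read back from the other end.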
int 0x80 system calls only ever look at the low 32 bits of registers which is why you shouldn't use them in 64-bit code.
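The code you posted in a comment, which uses mov rcx, message with int 0x80, only works because message sits at a low address (below 4 GiB) in a non-PIE static executable; mov ecx, message would work just as well. A pointer to the stack, which is above 4 GiB, would not survive that truncation. See What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?.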
Note that writing a 32-bit register zero-extends into the full 64-bit register so you should always use mov edi, 1 instead of mov rdi, 1. (Although NASM will do this optimization for you to save code-size; they're so equivalent that some assemblers will silently do it for you.)
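The zero-extension rule is easy to check. This C sketch uses GCC/Clang inline assembly (AT&T syntax, x86-64 only; the function names are hypothetical): it fills RAX with ones and then writes only EAX, or only AX for contrast. The 32-bit write clears bits 63..32, while the 16-bit write leaves them alone.

```c
#include <stdint.h>

/* Fill RAX with all ones, then write only EAX. Writing a 32-bit register
   zero-extends into the full 64-bit register, so the result is exactly 1. */
static uint64_t write_eax_after_all_ones(void)
{
    uint64_t r;
    __asm__ ("mov $-1, %%rax\n\t"
             "mov $1, %%eax"
             : "=a"(r));
    return r;
}

/* Contrast: writing the 16-bit AX does NOT zero-extend; bits 63..16 keep
   their old value, giving 0xFFFFFFFFFFFF0001. */
static uint64_t write_ax_after_all_ones(void)
{
    uint64_t r;
    __asm__ ("mov $-1, %%rax\n\t"
             "mov $1, %%ax"
             : "=a"(r));
    return r;
}
```

This is why mov edi, 1 is a safe, shorter replacement for mov rdi, 1.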

calling printf from assembly language on 64bit and 32bit architecture using nasm

I want to call the printf function from assembly language in Linux.
I want to know the method for both 64-bit and 32-bit assembly language programs.
1) Please tell me, for two cases, how I should pass a 32-bit argument and a 64-bit argument to printf along with a string.
2) For the x86 32-bit architecture, I want to do the same thing as in point 1.
Please show me the code, and let me know whether I need to adjust the stack in both cases or whether I just need to pass the arguments in registers.
Thanks a lot
There are 2 ways to print a string with assembly language in Linux.
1) Use syscall for x64, or int 0x80 for x86. It's not printf, it's kernel routines. You can find more here (x86) and here (x64).
2) Use printf from glibc. I assume you are familiar with the structure of a NASM program, so here is a nice x86 example from acm.mipt.ru:
global main
;Declare used libc functions
extern exit
extern puts
extern scanf
extern printf
section .text
main:
;Arguments are passed in reverse order via the stack (for x86)
;For x64 the first six integer/pointer arguments are passed in order
; via RDI, RSI, RDX, RCX, R8, R9 and the rest are passed via the stack;
; for variadic functions such as printf, AL = number of vector registers used
;The result comes back in EAX/RAX
push dword msg
call puts
;The caller removes its own stack arguments (cdecl) with
; add esp, 4 * (number of arguments); otherwise ESP is wrong when main executes ret
add esp, 4
push dword a
push dword b
push dword msg1
call scanf
add esp, 12
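;For x64 this scanf call would look like this (RSP must be 16-byte
; aligned at the call, and AL = 0 vector registers because scanf is variadic):
; mov rdi, msg1
; mov rsi, b
; mov rdx, a
; xor eax, eax
; call scanf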
mov eax, dword [a]
add eax, dword [b]
push eax
push dword msg2
call printf
add esp, 8
push dword 0
call exit
add esp, 4
ret
section .data
msg : db "An example of interfacing with GLIBC.",0xA,0
msg1 : db "%d%d",0
msg2 : db "%d", 0xA, 0
section .bss
a resd 1
b resd 1
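You can assemble it with nasm -f elf32 -o foo.o foo.asm and link it with gcc -m32 -o foo foo.o for x86. For x64, first rewrite the calls to pass arguments in registers as shown in the comments (the push and add esp sequences are 32-bit only), then replace elf32 with elf64 and -m32 with -m64. On distributions where gcc builds PIE executables by default, you may also need -no-pie when linking code that uses absolute addresses like this. Note that you need gcc-multilib to build x86 programs on an x64 system using gcc.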
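For question 1, the key point is that each conversion in the format string must match the width of its argument: %d (PRId32) for a 32-bit value and %ld or %lld (PRId64) for a 64-bit one. In a 64-bit program each integer argument still occupies one register (a 32-bit value in ESI, a 64-bit one in RSI, when it is the first argument after the format string); in a 32-bit program a 64-bit argument takes two dword pushes, high half first. A small C sketch of the width matching (the helper format_widths is hypothetical):

```c
#include <inttypes.h> /* PRId32 / PRId64, INT64_C */
#include <stdio.h>

/* Hypothetical helper: format one 32-bit and one 64-bit integer with
   conversions that match their widths. PRId32 / PRId64 expand to the
   right length modifiers for the target (e.g. "d" and "ld" on x86-64
   Linux, "d" and "lld" on i386). Returns snprintf's character count. */
static int format_widths(char *buf, size_t size, int32_t small, int64_t big)
{
    return snprintf(buf, size, "%" PRId32 " %" PRId64, small, big);
}
```

Calling format_widths(buf, sizeof buf, 42, INT64_C(1099511627776)) produces "42 1099511627776"; using %d for the 64-bit value instead would read the wrong number of bytes.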

Linux Assembly x86_64 create a file using command line parameters

I'm trying to teach myself assembly. I've found a good website; however, everything is written for x86 and I use a 64-bit machine.
I know what the problem is, but I don't know how to fix it. If I run the program with strace, here are the results:
execve("./file", ["./file", "hello"], [/* 94 vars */]) = 0
creat(NULL, 0) = -1 EINVAL (Invalid argument)
write(0, NULL, 0 <unfinished ...>
+++ exited with 234 +++
So I know that when I call creat, the file name "hello" is not being passed, and as a result I don't get a file descriptor.
Here is the code in question:
section .text
global _start
_start:
pop rbx ; argc
pop rbx ; prog name
pop rbx ; the file name
mov eax,85 ; syscall number for creat()
mov ecx,00644Q ; rw,r,r
int 80h ; call the kernel
I know that I can use the syscall instruction; however, I want to use the interrupt.
Any ideas or suggestions would be helpful. Also, I'm using nasm as my assembler.
You attempted to use the 32 bit mechanism. If you have a 32 bit tutorial, you can of course create 32 bit programs and those will work as-is in compatibility mode.
If you want to write 64 bit code however, you will need to use the 64 bit conventions and interfaces. Here, that means the syscall instruction with the appropriate registers:
global _start
_start:
mov eax,85 ; syscall number for creat()
mov rdi,[rsp+16] ; argv[1], the file name
mov esi,00644Q ; rw,r,r
syscall ; call the kernel
xor edi, edi ; exit code 0
mov eax, 60 ; syscall number for exit()
syscall
See also the x86-64 System V ABI on Wikipedia, or the ABI PDF itself, for more details.
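The answer's creat call can be tried from C with GCC/Clang inline assembly (x86-64 Linux only; raw_syscall3 and creat_raw are hypothetical names). It uses the same registers as the corrected code: RAX = 85, which is creat in the 64-bit table, RDI = the file name pointer, and RSI = the mode. Like the raw instruction, it reports failure as a negative errno in RAX rather than setting errno.

```c
#include <sys/syscall.h> /* SYS_creat (85 on x86-64) */
#include <unistd.h>

/* Hypothetical helper (x86-64 Linux only): RAX = call number,
   RDI/RSI/RDX = arguments; syscall clobbers RCX and R11. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;
}

/* creat(path, mode) through the 64-bit ABI: returns a new file
   descriptor, or a negative errno (e.g. -EFAULT for a bad pointer). */
static long creat_raw(const char *path, long mode)
{
    return raw_syscall3(SYS_creat, (long)path, mode, 0);
}
```

In the assembly version, argv[1] comes from [rsp+16] at _start because the stack there holds argc, then argv[0], then argv[1], each 8 bytes wide.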

Nasm segmentation fault on RET in _start

section .text
global _start
_start:
nop
main:
mov eax, 1
mov ebx, 2
xor eax, eax
ret
I compile with these commands:
nasm -f elf main.asm
ld -melf_i386 -o main main.o
When I run the code, Linux throws a segmentation fault error
(I am using Linux Mint Nadia 64-bit). Why is this error produced?
Because ret is NOT the proper way to exit a program in Linux, Windows, or Mac!!!!
_start is not a function, there is no return address on the stack because there is no user-space caller to return to. Execution in user-space started here (in a static executable), at the process entry point. (Or with dynamic linking, it jumped here after the dynamic linker finished, but same result).
On Linux / OS X, the stack pointer is pointing at argc on entry to _start (see the i386 or x86-64 System V ABI doc for more details on the process startup environment); the kernel puts command line args into user-space stack memory before starting user-space. (So if you do try to ret, EIP/RIP = argc = a small integer, not a valid address. If your debugger shows a fault at address 0x00000001 or something, that's why.)
For Windows it is ExitProcess, and for Linux it is the exit system call:
int 80H with sys_exit for x86, syscall with call number 60 for 64-bit, or a call to exit from the C library if you are linking to it.
32-bit Linux (i386)
%define SYS_exit 1 ; call number __NR_exit from <asm/unistd_32.h>
mov eax, SYS_exit ; use the NASM macro we defined earlier
xor ebx, ebx ; ebx = 0 exit status
int 80H ; _exit(0)
64-bit Linux (amd64)
mov rax, 60 ; SYS_exit aka __NR_exit from asm/unistd_64.h
xor rdi, rdi ; edi = 0 first arg to 64-bit system calls
syscall ; _exit(0)
(In GAS you can actually #include <sys/syscall.h> or <asm/unistd.h> to get the right numbers for the mode you're assembling a .S for, but NASM can't easily use the C preprocessor.
See Polyglot include file for nasm/yasm and C for hints.)
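To check the 64-bit exit sequence without ending the current process, this C sketch (GCC/Clang inline assembly, x86-64 Linux only; the helper names are hypothetical) forks a child that exits through the raw call with RAX = 60 and RDI = the status, then reads the status back with waitpid. Call 60 ends only the calling thread; glibc's _exit uses exit_group (231) so that every thread ends, which makes no difference in a single-threaded program like this one.

```c
#include <sys/syscall.h> /* SYS_exit (60 on x86-64) */
#include <sys/wait.h>    /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>

/* Hypothetical helper: one-argument raw syscall (x86-64 only). */
static long raw_syscall1(long nr, long a1)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1)
                      : "rcx", "r11", "memory");
    return ret;
}

/* Fork a child that leaves via mov eax, 60 / mov edi, status / syscall,
   and return the exit status the parent observes (-1 on error). */
static int exit_status_of_raw_exit(int status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        raw_syscall1(SYS_exit, status);
        _exit(127); /* only reached if the raw exit call failed */
    }
    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

exit_status_of_raw_exit(42) should report 42, just as running the assembly version and then echo $? would.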
32-bit Windows (x86)
push 0
call ExitProcess
Or Windows/Linux linking against the C Library
; pass an int exit_status as appropriate for the calling convention
; push 0 / xor edi,edi / xor ecx,ecx
call exit
(Or for 32-bit x86 Windows, call _exit, because C names get prepended with an underscore, unlike in x86-64 Windows. The POSIX _exit function would be call __exit, if Windows had one.)
Windows x64's calling convention includes shadow space which the caller has to reserve, but exit isn't going to return so it's ok to let it step on that space above its return address. Also, 16-byte stack alignment is required by the calling convention before call exit except for 32-bit Windows, but often won't actually crash for a simple function like exit().
call exit (unlike a raw exit system call or libc _exit) will flush stdio buffers first. If you used printf from _start, use exit to make sure all output is printed before you exit, even if stdout is redirected to a file (making stdout full-buffered, not line-buffered).
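The stdio-flushing difference can be demonstrated with a short C sketch (the helper name bytes_seen_after is hypothetical). A forked child writes "hello" with printf into a pipe, without a newline so it stays in the stdio buffer, and then leaves either through exit or through _exit. Only the exit path delivers the text.

```c
#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <sys/wait.h> /* waitpid */
#include <unistd.h>   /* pipe, fork, dup2, read, _exit */

/* Returns how many bytes the parent received from the child, or -1. */
static long bytes_seen_after(int use_libc_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    fflush(stdout); /* so the child cannot re-flush the parent's pending output */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                  /* child: stdout now points at the pipe */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        printf("hello");             /* sits in the stdio buffer for now */
        if (use_libc_exit)
            exit(0);                 /* flushes stdio buffers, then exits */
        _exit(0);                    /* raw exit: the buffered text is lost */
    }
    close(fds[1]);                   /* parent: read until EOF */
    char buf[64];
    long total = 0;
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

bytes_seen_after(1) sees all 5 bytes, while bytes_seen_after(0) sees none, which is the same loss you get from a raw exit system call after printf.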
It's generally recommended that if you use libc functions, you write a main function and link with gcc so it's called by the normal CRT start functions which you can ret to.
See also
Syscall implementation of exit()
How come _exit(0) (exiting by syscall) prevents me from receiving any stdout content?
Defining main as something that _start falls through into doesn't make it special, it's just confusing to use a main label if it's not like a C main function called by a _start that's prepared to exit after main returns.

How to avoid stdin input that does not fit in buffer be sent to the shell in Linux 64-bit Intel (x86-64) assembly

Edit: Title changed, as @Gunner pointed out that this is not a buffer overflow.
When reading user input from stdin with NR_read in Linux 64-bit Intel assembly, how can I prevent input that does not fit in the input buffer from being sent to the Linux shell, e.g. bash? For example, in this example program I have defined an input buffer of 255 bytes (the size of the buffer can be anything >= 1). The rest of an input longer than 255 bytes is sent to bash (if running from bash), and this is obviously a serious vulnerability. How should input be read in Linux 64-bit assembly to avoid this vulnerability?
Here's my code:
[bits 64]
section .text
global _start
; can be compiled eg. with nasm or yasm.
; nasm:
; nasm -f elf64 read_stdin_64.asm; ld read_stdin_64.o -o read_stdin_64
; yasm:
; yasm -f elf64 -m amd64 read_stdin_64.asm -o read_stdin_64.o; ld read_stdin_64.o -o read_stdin_64
NR_read equ 0
NR_exit equ 60
STDIN equ 0 ; file descriptor 0 is stdin (1 is stdout)
; input:
; rax number of syscall
; rdi parameter 1
; rsi parameter 2
; rdx parameter 3
; r10 parameter 4
; r8 parameter 5
; r9 parameter 6
;
; output:
; rax syscall's output
do_syscall:
push rcx
push r11
syscall ; 64-bit syscall, overwrites rcx and r11
pop r11 ; syscall's return value in rax
pop rcx
ret
read_stdin:
push rdi
push rsi
push rdx
mov rdi,STDIN ; file handle to read. STDIN = 0.
lea rsi,[input_buffer]
mov rdx,input_buffer_length ; length of string
mov rax,NR_read ; number of syscall (0)
call do_syscall
sub rax,1 ; get the number of writable characters.
pop rdx
pop rsi
pop rdi
ret
_start: ; linker entry point
call read_stdin
end_program:
xor rdi,rdi
mov rax,NR_exit ; number of syscall (60)
syscall
section .data
input_buffer times 255 db 0
input_buffer_length equ $-input_buffer
It is not a buffer overflow as others have stated. I wrote a tutorial on reading from the terminal in Linux which also shows how to deal with this issue. It uses 32-bit Int 0x80, but you can easily change it to fit your needs.
http://www.dreamincode.net/forums/topic/286248-nasm-linux-terminal-inputoutput-wint-80h/
The read syscall already has that protection built in: it never stores more than the length you pass in rdx. (Going through Linux's vDSO, the virtual dynamic shared object, only matters for 32-bit code, where its __kernel_vsyscall entry picks between sysenter, syscall and int 0x80 for you; 64-bit code on x86-64 can always use the syscall instruction directly.)
You could read the input until newline character is found.
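Following that suggestion, here is a C sketch of the same idea (the helper read_line_discard_rest is hypothetical). It does one read() into the buffer like the question's code, and if the buffer filled up without a newline, it keeps consuming bytes up to and including the next newline, so nothing is left in the terminal's input queue for the shell. The excess is read one byte at a time, which is slow but never consumes input beyond the end of the current line; it assumes line-oriented input such as a terminal in canonical mode. In NASM this becomes one extra loop around the read system call (number 0) with rdx = 1.

```c
#include <string.h> /* memchr */
#include <unistd.h> /* read, ssize_t */

/* Returns the number of bytes kept in buf, 0 at EOF, or -1 on error. */
static ssize_t read_line_discard_rest(int fd, char *buf, size_t cap)
{
    ssize_t n = read(fd, buf, cap);
    if (n <= 0)
        return n;
    /* Buffer full and no newline seen: the rest of the line is still
       pending, so drain it here instead of leaving it for the shell. */
    if ((size_t)n == cap && memchr(buf, '\n', cap) == NULL) {
        char c;
        ssize_t r;
        while ((r = read(fd, &c, 1)) == 1 && c != '\n')
            ;
        if (r < 0)
            return -1;
    }
    return n;
}
```

With a 4-byte buffer and the input "abcdefg\nrest", the call keeps "abcd", discards "efg\n", and leaves "rest" for the next read.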
