Linux Assembly x86_64 create a file using command line parameters - linux

I'm trying to teach myself assembly. I've found a good website; however, everything is written for x86 and I use a 64-bit machine.
I know what the problem is, but I don't know how to fix it. If I run the program with strace, then here is the results:
execve("./file", ["./file", "hello"], [/* 94 vars */]) = 0
creat(NULL, 0) = -1 EINVAL (Invalid argument)
write(0, NULL, 0 <unfinished ...>
+++ exited with 234 +++
So, I know that when I call creat, that the file name "hello" is not being passed and as a result I don't have a file descriptor.
Here is the code in question:
section .text
global _start
_start:
pop rbx ; argc
pop rbx ; prog name
pop rbx ; the file name
mov eax,85 ; syscall number for creat()
mov ecx,00644Q ; rw,r,r
int 80h ; call the kernel
I know that I can use the syscall command; however, I want to use interrupt.
Any ideas or suggestions would be helpful. Also, I'm using nasm an assembler.

You attempted to use the 32 bit mechanism. If you have a 32 bit tutorial, you can of course create 32 bit programs and those will work as-is in compatibility mode.
If you want to write 64 bit code however, you will need to use the 64 bit conventions and interfaces. Here, that means the syscall instruction with the appropriate registers:
global _start
_start:
mov eax,85 ; syscall number for creat()
mov rdi,[rsp+16] ; argv[1], the file name
mov esi,00644Q ; rw,r,r
syscall ; call the kernel
xor edi, edi ; exit code 0
mov eax, 60 ; syscall number for exit()
syscall
See also the x86-64 sysv abi on wikipedia or the abi pdf for more details.

Related

Confused about 64-bit registers - ASM

I'm currently learning assembly, I'm using Intel syntax on a 64bit ubuntu, using nasm.
So I found two websites that reference the syscalls numbers:
This one for 32 bit registers (eax, ebx, ...): https://syscalls.kernelgrok.com
This one for 64 bits registers (rax, rbx, ...): https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64
The thing is that my code doesn't work when I'm using the 64 bits syscall numbers, but it works when I replace the 'e' from the 32 bit registers by a 'r', so for instance in sys_write I use rbx to store the fd instead of rdi as and it works.
I'm quite lost right now. This code doesn't work:
message db 'Hello, World', 10
section .text
global _start
_start: mov rax,4
mov rdi, 1
mov rsi, message
mov rdx, 13
syscall
mov rax, 1
mov rdi, 0
syscall
Run strace ./my_program - you make a bogus stat system call, then write which succeeds, then fall off the end and segfault.
$ strace ./foo
execve("./foo", ["./foo"], 0x7ffe6b91aa00 /* 51 vars */) = 0
stat(0x1, 0x401000) = -1 EFAULT (Bad address)
write(0, "Hello, World\n", 13Hello, World
) = 13
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xd} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
It's not register names that are your problem, it's call numbers. You're using 32-bit call numbers but calling the 64-bit syscall ABI.
Call numbers and calling convention both differ.
int 0x80 system calls only ever look at the low 32 bits of registers which is why you shouldn't use them in 64-bit code.
The code you posted in a comment with mov rcx, message would work fine with mov ecx, message and so on, if it works with mov rcx, message. See What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?.
Note that writing a 32-bit register zero-extends into the full 64-bit register so you should always use mov edi, 1 instead of mov rdi, 1. (Although NASM will do this optimization for you to save code-size; they're so equivalent that some assemblers will silently do it for you.)

NASM: Two subsequent file writes not working

Trying to run this code so I could create bmp file - I write headline, then I want to write content to file - everything works separately but not together.
I'm using hexedit for checking file if it matters.
If I run the code with headline writing part it works.
If I run the code with content writing part it works.
When I run both of them it doesn't.
Any ideas?
Here's the code:
section .text
global _start
_start:
;#######################################################################
;### main ##############################################################
;#######################################################################
; open file
mov eax,8 ;system call number - open/create file
mov ebx,msg ;file name
mov ecx,111111111b ;file mode
int 0x80 ;call kernel
; save file descriptor to r8d
mov r8d, eax
; write headline to file
mov eax, 4 ;write 54 bytes to file
mov ebx, r8d ;load file desc
mov ecx, bmpheadline ;load adress of memory to write
mov edx, 54 ;load number of bytes
int 0x80 ;call kernel
; write content to file
mov eax, 4 ;number of syscall - write
mov ebx, r8d ;load file desc
;add ebx, 54 ;add 54 bytes to location of file location
mov ecx, empty_space ;load adress of buffer
mov edx, 40054 ;load number of bytes
int 0x80 ;call kernel
; close file
mov eax, 6 ;load syscall number - close
mov ebx, r8d ;load file desc
int 0x80 ;call kernel
; exit program
mov eax,1 ;syscall number - exit
int 0x80 ;call kernel
section .data
msg db 'filename.bmp',0x00 ;name of out file, 0x00 = end of string
bmpheadline db 0x42,0x4D,0xB6,0xDA,0x01,0x00,0x00,0x00,0x00,0x00,0x7A,0x00,0x00,0x00,0x6C,0x00,0x00,0x00,0xC9,0x00,0x00,0x00,0xC9,0x00,0x00,0x00,0x01,0x00,0x18,0x00,0x00,0x00,0x00,0x00,0x3C,0xDA,0x01,0x00,0x13,0x0B,0x00,0x00,0x13,0x0B,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x42,0x47,0x52,0x73,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00
section .bss
empty_space: resb 40054
There are 2 significant problems with your code. R8D (R8) is not preserved across int 0x80. Secondly, the add ebx, 54 in your original question is incorrect. You don't need to change the file descriptor.
SYSCALL preferred for 64-bit code
int 0x80 is an IA32 compatibility feature in the Linux kernel. This feature is generally turned on in most 64-bit Linux kernels but it can be turned off. You can't use 64-bit pointers with int 0x80. This prevents using stack based addresses as parameters to int 0x80. For these reasons it is preferred that you use SYSCALL for 64-bit programs rather than int 0x80.
More on using SYSCALL in Linux can be found in Ryan Chapman's Blog . Note that the system call numbers used with SYSCALL are different from int 0x80. The registers used to pass parameters are different, and the only registers not preserved across a SYSCALL are RCX, R11, and RAX (RAX being the return value). The system calling convention is thoroughly described in the current 64-bit Linux System V ABI. In particular:
User-level applications use as integer registers for passing the sequence
%rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi,
%rsi, %rdx, %r10, %r8 and %r9.
A system-call is done via the syscall instruction. The kernel destroys
registers %rcx and %r11.
The number of the syscall has to be passed in register %rax.
System-calls are limited to six arguments, no argument is passed directly on
the stack.
Returning from the syscall, register %rax contains the result of the
system-call. A value in the range between -4095 and -1 indicates an error,
it is -errno.
Only values of class INTEGER or class MEMORY are passed to the kernel
If you want your 64-bit code to work with INT 0x80
INT 0x80 has some quirks in 64-bit code. It adheres to the 32-bit calling convention of preserving RBX, RCX, RDX, RSI, RDI, and RBP. For the other 64-bit registers the 64-bit C calling convention applies. From the ABI:
A.2.1 Calling Conventions
... applications that like to call system calls should use the functions from the C library. The interface between the C library and the Linux kernel is the same as for the user-level applications
See Figure 3.4: Register Usage in the 64-bit Linux ABI linked to above. R12, R13, R14, and R15 will also be preserved.
This means that RAX, R8, R9, R10, and R11 will not be preserved. Change your code from using R8D to one of the registers that are saved. R12D for example.
Why does your code fail?
Since R8D is not preserved across int 0x80 it is being potentially overwritten by the SYS_WRITE system calls. The first write works, the second one doesn't because R8D was likely trashed by the first SYS_WRITE, and R8D likely became an invalid file descriptor. Using one of the registers that will be preserved should solve this issue. If you run out of registers you can always allocate space on the stack for temporary storage.
You add 54 to the file descriptor without explanation; I have absolutely no clue why you are doing that.
I suspect that you misunderstand file descriptors and believe that you need to add the total amount of data written so far to the descriptor. This is not so. The descriptor does not change from the time you open/create to the time that you close the file handle. It's a really good idea to verify that your comments are synced with your code. When you are writing detailed comments, lines with no comments become immediately suspect (the add instruction, for instance.)
You appear to have some issues from the very beginning. For example, your comments say "open file" and "sys_write" but your code doesn't match. What your code currently does is attempt to call sys_creat. What you are calling the file descriptor is actually the permissions mode. ebx should contain the address of the string representing the path... The comments seem to indicate it should be stdout, but it's clearly not. :)
You also don't state whether this is for 64 bit or 32 bit Linux. Your code seems to mix the two, using r8d and using int 0x80.
(Posted solution on behalf of the OP).
Here is the source code of solution, 64 bit version:
section .text
global _start ;must be declared for linker (ld)
_start: ;tell linker entry point
;#######################################################################
;### This program creates empty bmp file - 64 bit version ##############
;#######################################################################
;### main ##############################################################
;#######################################################################
; open file
mov rax,85 ;system call number - open/create file
mov rdi,msg ;file name
;flags
mov rsi,111111111b ;mode
syscall ;call kernel
; save file descriptor
mov r8, rax
; write headline to file
mov rax, 1 ;write to file
mov rdi, r8 ;load file desc
mov rsi, bmpheadline ;load adress of memory to write
mov rdx, 54 ;load number of bytes
syscall ;call kernel
; write content to file
mov rax, 1 ;write to file
mov rdi, r8 ;load file desc
mov rsi, empty_space ;load adress of memory to write
mov rdx, 40000 ;load number of bytes
syscall ;call kernel
; close file
mov rax, 3 ;load syscall number - close
mov rdi, r8 ;load file desc
syscall ;call kernel
; exit program
mov rax,60 ;system call number (sys_exit)
syscall ;call kernel
section .data
msg db 'filename.bmp',0x00 ;name of out file, 0x00 = end of string
len equ $ - msg ;length of our dear string
bmpheadline db 0x42,0x4D,0xB6,0xDA,0x01,0x00,0x00,0x00,0x00,0x00,0x7A,0x00,0x00,0x00,0x6C,0x00,0x00,0x00,0xC9,0x00,0x00,0x00,0xC9,0x00,0x00,0x00,0x01,0x00,0x18,0x00,0x00,0x00,0x00,0x00,0x3C,0xDA,0x01,0x00,0x13,0x0B,0x00,0x00,0x13,0x0B,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x42,0x47,0x52,0x73,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00
section .bss
empty_space: resb 40000
Makefile:
all: a.out
a.out: main.o
ld main.o
main.o: main64.asm
nasm -f elf64 main64.asm -o main.o

Why doesn't the 'syscall' instruction work under Linux?

I have a very basic assembly program that runs in Linux userland:
section .text
global _start
_start:
mov edx, 14
mov ecx, msg
mov ebx, 1
mov eax, 4
syscall
mov eax, 1
syscall
section .data
msg db "Hello, World!", 0xA
However, this doesn't work as it is, but only if I replace the syscalls with int 0x80. Don't these do the same thing? I know that syscall was designed to be lower-latency, but other than that, I didn't think there was a difference. Why doesn't it work?
syscall works only in x86-64 operating systems and you should put the system call number in rax register instead of eax.
See this website for more information.
The syscall instruction doesn't store "return RIP" or "return RSP" anywhere, so these are typically stored in registers in previous instructions before the syscall instruction is used.
I suspect that on Linux RCX and RDX are used for this purpose; and that all the other parameters end up in different registers because of this.

Assembly execve failure -14

Program writes executable placed in it's second segment on disk, decrypts it(into /tmp/decbd), and executes(as it was planned)
file decbd appears on disk, and can be executed via shell, last execve call return eax=-14, and after end of the program, execution flows on data and gets segfault.
http://pastebin.com/KywXTB0X
In second segment after compilation using hexdump and dd I manually placed echo binary encrypted via openssl, and when I stopped execution right before last int 0x80 command, I've already been able to run my "echo" in decbd, using another terminal.
You should have narrowed it down to a minimal example. See MCVE.
You should comment your code if you want other people to help.
You should learn to use the debugger and/or other tools.
For point #1, you could have gone down to:
section .text
global _start ;must be declared for linker (ld)
_start:
mov eax,11 ; execve syscall
mov ebx,program ; name of program
mov ecx,[esp+4] ; pointer to argument array
mov ebp,[esp] ; number of arguments
lea edx,[esp+4*ebp+2] ; pointer to environ array
int 0x80
section .data
program db '/bin/echo',0
For point #3, using the debugger you could have seen that:
ebx is okay
ebp is okay
ecx is wrong
edx is wrong
It's an easy fix. ecx should be loaded with the address, not the value and edx should be skipping 2 pointers which are 4 bytes each, so the offset should be 8 not 2. The fixed code could look like this:
section .text
global _start ;must be declared for linker (ld)
_start:
mov eax,11 ; execve syscall
mov ebx,program ; name of program
lea ecx,[esp+4] ; pointer to argument array
mov ebp,[esp] ; number of arguments
lea edx,[esp+4*ebp+8] ; pointer to environ array (skip argc and NULL)
int 0x80
section .data
program db '/bin/echo',0
man execve says this in the "ERRORS" section with regard to return code -14 (-EFAULT):
EFAULT filename points outside your accessible address space.
You passed a bad pointer to execve().

How to avoid stdin input that does not fit in buffer be sent to the shell in Linux 64-bit Intel (x86-64) assembly

Edit: Title changed, as #Gunner pointed out that this is not a buffer overflow.
In reading user input from stdin with NR_read in Linux 64-bit Intel assembly, I wonder how can I avoid that the input that does not fit in the input buffer being sent to Linux shell eg. bash? For example in this example program I have defined an input buffer of 255 bytes (the size of the buffer can be whatever >= 1). The rest of an input longer than 255 bytes is sent to bash (if running from bash) and and this is obviously a serious vulnerability. How should input be read in Linux 64-bit assembly to avoid this vulnerability?
Here's my code:
[bits 64]
section .text
global _start
; can be compiled eg. with nasm or yasm.
; nasm:
; nasm -f elf64 read_stdin_64.asm; ld read_stdin_64.o -o read_stdin_64
; yasm:
; yasm -f elf64 -m amd64 read_stdin_64.asm -o read_stdin_64.o; ld read_stdin_64.o -o read_stdin_64
NR_read equ 0
NR_exit equ 60
STDIN equ 1
; input:
; rax number of syscall
; rdi parameter 1
; rsi parameter 2
; rdx parameter 3
; r10 parameter 4
; r8 parameter 5
; r9 parameter 6
;
; output:
; rax syscall's output
#do_syscall:
push rcx
push r11
syscall ; 64-bit syscall, overwrites rcx and r11
pop r11 ; syscall's return value in rax
pop rcx
ret
#read_stdin:
push rdi
push rsi
push rdx
mov rdi,STDIN ; file handle to read. STDIN = 1.
lea rsi,[input_buffer]
mov rdx,input_buffer_length ; length of string
mov rax,NR_read ; number of syscall (0)
call #do_syscall
sub rax,1 ; get the number of writable characters.
pop rdx
pop rsi
pop rdi
ret
_start: ; linker entry point
call #read_stdin
#end_program:
xor rdi,rdi
mov rax,NR_exit ; number of syscall (60)
syscall
section .data
input_buffer times 255 db 0
input_buffer_length equ $-input_buffer
It is not a buffer overflow as others have stated. I wrote a tutorial on reading from the terminal in Linux which also shows how to deal with this issue. It uses 32-bit Int 0x80, but you can easily change it to fit your needs.
http://www.dreamincode.net/forums/topic/286248-nasm-linux-terminal-inputoutput-wint-80h/
The read syscall already has that protection built in. One other thing though: You shouldn't be explicitly using syscall. What if your code is taken to an x86-64 machine (which uses sysenter)? You should be using Linux's VDSO (virtual dynamic shared object), which contains code to do syscalls on all architectures, regardless as to wheather they support syscall, sysenter, or only int.
You could read the input until newline character is found.

Resources