How do I run a binary in kernel mode? CPL = 0 - linux

I have a binary that does a tlb flush by
mov eax, cr3 mov cr3, eax
But that needs CPL 0, how do I set that?
I’m on Linux.
I tried finding an instruction to change CPL or a way to run it like su but came up empty.

Related

Confused about 64-bit registers - ASM

I'm currently learning assembly, I'm using Intel syntax on a 64bit ubuntu, using nasm.
So I found two websites that reference the syscalls numbers:
This one for 32 bit registers (eax, ebx, ...): https://syscalls.kernelgrok.com
This one for 64 bits registers (rax, rbx, ...): https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64
The thing is that my code doesn't work when I'm using the 64 bits syscall numbers, but it works when I replace the 'e' from the 32 bit registers by a 'r', so for instance in sys_write I use rbx to store the fd instead of rdi as and it works.
I'm quite lost right now. This code doesn't work:
message db 'Hello, World', 10
section .text
global _start
_start: mov rax,4
mov rdi, 1
mov rsi, message
mov rdx, 13
syscall
mov rax, 1
mov rdi, 0
syscall
Run strace ./my_program - you make a bogus stat system call, then write which succeeds, then fall off the end and segfault.
$ strace ./foo
execve("./foo", ["./foo"], 0x7ffe6b91aa00 /* 51 vars */) = 0
stat(0x1, 0x401000) = -1 EFAULT (Bad address)
write(0, "Hello, World\n", 13Hello, World
) = 13
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xd} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
It's not register names that are your problem, it's call numbers. You're using 32-bit call numbers but calling the 64-bit syscall ABI.
Call numbers and calling convention both differ.
int 0x80 system calls only ever look at the low 32 bits of registers which is why you shouldn't use them in 64-bit code.
The code you posted in a comment with mov rcx, message would work fine with mov ecx, message and so on, if it works with mov rcx, message. See What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?.
Note that writing a 32-bit register zero-extends into the full 64-bit register so you should always use mov edi, 1 instead of mov rdi, 1. (Although NASM will do this optimization for you to save code-size; they're so equivalent that some assemblers will silently do it for you.)

NASM: Two subsequent file writes not working

Trying to run this code so I could create bmp file - I write headline, then I want to write content to file - everything works separately but not together.
I'm using hexedit for checking file if it matters.
If I run the code with headline writing part it works.
If I run the code with content writing part it works.
When I run both of them it doesn't.
Any ideas?
Here's the code:
section .text
global _start
_start:
;#######################################################################
;### main ##############################################################
;#######################################################################
; open file
mov eax,8 ;system call number - open/create file
mov ebx,msg ;file name
mov ecx,111111111b ;file mode
int 0x80 ;call kernel
; save file descriptor to r8d
mov r8d, eax
; write headline to file
mov eax, 4 ;write 54 bytes to file
mov ebx, r8d ;load file desc
mov ecx, bmpheadline ;load adress of memory to write
mov edx, 54 ;load number of bytes
int 0x80 ;call kernel
; write content to file
mov eax, 4 ;number of syscall - write
mov ebx, r8d ;load file desc
;add ebx, 54 ;add 54 bytes to location of file location
mov ecx, empty_space ;load adress of buffer
mov edx, 40054 ;load number of bytes
int 0x80 ;call kernel
; close file
mov eax, 6 ;load syscall number - close
mov ebx, r8d ;load file desc
int 0x80 ;call kernel
; exit program
mov eax,1 ;syscall number - exit
int 0x80 ;call kernel
section .data
msg db 'filename.bmp',0x00 ;name of out file, 0x00 = end of string
bmpheadline db 0x42,0x4D,0xB6,0xDA,0x01,0x00,0x00,0x00,0x00,0x00,0x7A,0x00,0x00,0x00,0x6C,0x00,0x00,0x00,0xC9,0x00,0x00,0x00,0xC9,0x00,0x00,0x00,0x01,0x00,0x18,0x00,0x00,0x00,0x00,0x00,0x3C,0xDA,0x01,0x00,0x13,0x0B,0x00,0x00,0x13,0x0B,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x42,0x47,0x52,0x73,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00
section .bss
empty_space: resb 40054
There are 2 significant problems with your code. R8D (R8) is not preserved across int 0x80. Secondly, the add ebx, 54 in your original question is incorrect. You don't need to change the file descriptor.
SYSCALL preferred for 64-bit code
int 0x80 is an IA32 compatibility feature in the Linux kernel. This feature is generally turned on in most 64-bit Linux kernels but it can be turned off. You can't use 64-bit pointers with int 0x80. This prevents using stack based addresses as parameters to int 0x80. For these reasons it is preferred that you use SYSCALL for 64-bit programs rather than int 0x80.
More on using SYSCALL in Linux can be found in Ryan Chapman's Blog . Note that the system call numbers used with SYSCALL are different from int 0x80. The registers used to pass parameters are different, and the only registers not preserved across a SYSCALL are RCX, R11, and RAX (RAX being the return value). The system calling convention is thoroughly described in the current 64-bit Linux System V ABI. In particular:
User-level applications use as integer registers for passing the sequence
%rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi,
%rsi, %rdx, %r10, %r8 and %r9.
A system-call is done via the syscall instruction. The kernel destroys
registers %rcx and %r11.
The number of the syscall has to be passed in register %rax.
System-calls are limited to six arguments, no argument is passed directly on
the stack.
Returning from the syscall, register %rax contains the result of the
system-call. A value in the range between -4095 and -1 indicates an error,
it is -errno.
Only values of class INTEGER or class MEMORY are passed to the kernel
If you want your 64-bit code to work with INT 0x80
INT 0x80 has some quirks in 64-bit code. It adheres to the 32-bit calling convention of preserving RBX, RCX, RDX, RSI, RDI, and RBP. For the other 64-bit registers the 64-bit C calling convention applies. From the ABI:
A.2.1 Calling Conventions
... applications that like to call system calls should use the functions from the C library. The interface between the C library and the Linux kernel is the same as for the user-level applications
See Figure 3.4: Register Usage in the 64-bit Linux ABI linked to above. R12, R13, R14, and R15 will also be preserved.
This means that RAX, R8, R9, R10, and R11 will not be preserved. Change your code from using R8D to one of the registers that are saved. R12D for example.
Why does your code fail?
Since R8D is not preserved across int 0x80 it is being potentially overwritten by the SYS_WRITE system calls. The first write works, the second one doesn't because R8D was likely trashed by the first SYS_WRITE, and R8D likely became an invalid file descriptor. Using one of the registers that will be preserved should solve this issue. If you run out of registers you can always allocate space on the stack for temporary storage.
You add 54 to the file descriptor without explanation; I have absolutely no clue why you are doing that.
I suspect that you misunderstand file descriptors and believe that you need to add the total amount of data written so far to the descriptor. This is not so. The descriptor does not change from the time you open/create to the time that you close the file handle. It's a really good idea to verify that your comments are synced with your code. When you are writing detailed comments, lines with no comments become immediately suspect (the add instruction, for instance.)
You appear to have some issues from the very beginning. For example, your comments say "open file" and "sys_write" but your code doesn't match. What your code currently does is attempt to call sys_creat. What you are calling the file descriptor is actually the permissions mode. ebx should contain the address of the string representing the path... The comments seem to indicate it should be stdout, but it's clearly not. :)
You also don't state whether this is for 64 bit or 32 bit Linux. Your code seems to mix the two, using r8d and using int 0x80.
(Posted solution on behalf of the OP).
Here is the source code of solution, 64 bit version:
section .text
global _start ;must be declared for linker (ld)
_start: ;tell linker entry point
;#######################################################################
;### This program creates empty bmp file - 64 bit version ##############
;#######################################################################
;### main ##############################################################
;#######################################################################
; open file
mov rax,85 ;system call number - open/create file
mov rdi,msg ;file name
;flags
mov rsi,111111111b ;mode
syscall ;call kernel
; save file descriptor
mov r8, rax
; write headline to file
mov rax, 1 ;write to file
mov rdi, r8 ;load file desc
mov rsi, bmpheadline ;load adress of memory to write
mov rdx, 54 ;load number of bytes
syscall ;call kernel
; write content to file
mov rax, 1 ;write to file
mov rdi, r8 ;load file desc
mov rsi, empty_space ;load adress of memory to write
mov rdx, 40000 ;load number of bytes
syscall ;call kernel
; close file
mov rax, 3 ;load syscall number - close
mov rdi, r8 ;load file desc
syscall ;call kernel
; exit program
mov rax,60 ;system call number (sys_exit)
syscall ;call kernel
section .data
msg db 'filename.bmp',0x00 ;name of out file, 0x00 = end of string
len equ $ - msg ;length of our dear string
bmpheadline db 0x42,0x4D,0xB6,0xDA,0x01,0x00,0x00,0x00,0x00,0x00,0x7A,0x00,0x00,0x00,0x6C,0x00,0x00,0x00,0xC9,0x00,0x00,0x00,0xC9,0x00,0x00,0x00,0x01,0x00,0x18,0x00,0x00,0x00,0x00,0x00,0x3C,0xDA,0x01,0x00,0x13,0x0B,0x00,0x00,0x13,0x0B,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x42,0x47,0x52,0x73,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00
section .bss
empty_space: resb 40000
Makefile:
all: a.out
a.out: main.o
ld main.o
main.o: main64.asm
nasm -f elf64 main64.asm -o main.o

Why doesn't the 'syscall' instruction work under Linux?

I have a very basic assembly program that runs in Linux userland:
section .text
global _start
_start:
mov edx, 14
mov ecx, msg
mov ebx, 1
mov eax, 4
syscall
mov eax, 1
syscall
section .data
msg db "Hello, World!", 0xA
However, this doesn't work as it is, but only if I replace the syscalls with int 0x80. Don't these do the same thing? I know that syscall was designed to be lower-latency, but other than that, I didn't think there was a difference. Why doesn't it work?
syscall works only in x86-64 operating systems and you should put the system call number in rax register instead of eax.
See this website for more information.
The syscall instruction doesn't store "return RIP" or "return RSP" anywhere, so these are typically stored in registers in previous instructions before the syscall instruction is used.
I suspect that on Linux RCX and RDX are used for this purpose; and that all the other parameters end up in different registers because of this.

Nasm segmentation fault on RET in _start

section .text
global _start
_start:
nop
main:
mov eax, 1
mov ebx, 2
xor eax, eax
ret
I compile with these commands:
nasm -f elf main.asm
ld -melf_i386 -o main main.o
When I run the code, Linux throw a segmentation fault error
(I am using Linux Mint Nadia 64 bits). Why this error is produced?
Because ret is NOT the proper way to exit a program in Linux, Windows, or Mac!!!!
_start is not a function, there is no return address on the stack because there is no user-space caller to return to. Execution in user-space started here (in a static executable), at the process entry point. (Or with dynamic linking, it jumped here after the dynamic linker finished, but same result).
On Linux / OS X, the stack pointer is pointing at argc on entry to _start (see the i386 or x86-64 System V ABI doc for more details on the process startup environment); the kernel puts command line args into user-space stack memory before starting user-space. (So if you do try to ret, EIP/RIP = argc = a small integer, not a valid address. If your debugger shows a fault at address 0x00000001 or something, that's why.)
For Windows it is ExitProcess and Linux is is system call -
int 80H using sys_exit, for x86 or using syscall using 60 for 64-bit or a call to exit from the C Library if you are linking to it.
32-bit Linux (i386)
%define SYS_exit 1 ; call number __NR_exit from <asm/unistd_32.h>
mov eax, SYS_exit ; use the NASM macro we defined earlier
xor ebx, ebx ; ebx = 0 exit status
int 80H ; _exit(0)
64-bit Linux (amd64)
mov rax, 60 ; SYS_exit aka __NR_exit from asm/unistd_64.h
xor rdi, rdi ; edi = 0 first arg to 64-bit system calls
syscall ; _exit(0)
(In GAS you can actually #include <sys/syscall.h> or <asm/unistd.h> to get the right numbers for the mode you're assembling a .S for, but NASM can't easily use the C preprocessor.
See Polygot include file for nasm/yasm and C for hints.)
32-bit Windows (x86)
push 0
call ExitProcess
Or Windows/Linux linking against the C Library
; pass an int exit_status as appropriate for the calling convention
; push 0 / xor edi,edi / xor ecx,ecx
call exit
(Or for 32-bit x86 Windows, call _exit, because C names get prepended with an underscore, unlike in x86-64 Windows. The POSIX _exit function would be call __exit, if Windows had one.)
Windows x64's calling convention includes shadow space which the caller has to reserve, but exit isn't going to return so it's ok to let it step on that space above its return address. Also, 16-byte stack alignment is required by the calling convention before call exit except for 32-bit Windows, but often won't actually crash for a simple function like exit().
call exit (unlike a raw exit system call or libc _exit) will flush stdio buffers first. If you used printf from _start, use exit to make sure all output is printed before you exit, even if stdout is redirected to a file (making stdout full-buffered, not line-buffered).
It's generally recommended that if you use libc functions, you write a main function and link with gcc so it's called by the normal CRT start functions which you can ret to.
See also
Syscall implementation of exit()
How come _exit(0) (exiting by syscall) prevents me from receiving any stdout content?
Defining main as something that _start falls through into doesn't make it special, it's just confusing to use a main label if it's not like a C main function called by a _start that's prepared to exit after main returns.

How to avoid stdin input that does not fit in buffer be sent to the shell in Linux 64-bit Intel (x86-64) assembly

Edit: Title changed, as #Gunner pointed out that this is not a buffer overflow.
In reading user input from stdin with NR_read in Linux 64-bit Intel assembly, I wonder how can I avoid that the input that does not fit in the input buffer being sent to Linux shell eg. bash? For example in this example program I have defined an input buffer of 255 bytes (the size of the buffer can be whatever >= 1). The rest of an input longer than 255 bytes is sent to bash (if running from bash) and and this is obviously a serious vulnerability. How should input be read in Linux 64-bit assembly to avoid this vulnerability?
Here's my code:
[bits 64]
section .text
global _start
; can be compiled eg. with nasm or yasm.
; nasm:
; nasm -f elf64 read_stdin_64.asm; ld read_stdin_64.o -o read_stdin_64
; yasm:
; yasm -f elf64 -m amd64 read_stdin_64.asm -o read_stdin_64.o; ld read_stdin_64.o -o read_stdin_64
NR_read equ 0
NR_exit equ 60
STDIN equ 1
; input:
; rax number of syscall
; rdi parameter 1
; rsi parameter 2
; rdx parameter 3
; r10 parameter 4
; r8 parameter 5
; r9 parameter 6
;
; output:
; rax syscall's output
#do_syscall:
push rcx
push r11
syscall ; 64-bit syscall, overwrites rcx and r11
pop r11 ; syscall's return value in rax
pop rcx
ret
#read_stdin:
push rdi
push rsi
push rdx
mov rdi,STDIN ; file handle to read. STDIN = 1.
lea rsi,[input_buffer]
mov rdx,input_buffer_length ; length of string
mov rax,NR_read ; number of syscall (0)
call #do_syscall
sub rax,1 ; get the number of writable characters.
pop rdx
pop rsi
pop rdi
ret
_start: ; linker entry point
call #read_stdin
#end_program:
xor rdi,rdi
mov rax,NR_exit ; number of syscall (60)
syscall
section .data
input_buffer times 255 db 0
input_buffer_length equ $-input_buffer
It is not a buffer overflow as others have stated. I wrote a tutorial on reading from the terminal in Linux which also shows how to deal with this issue. It uses 32-bit Int 0x80, but you can easily change it to fit your needs.
http://www.dreamincode.net/forums/topic/286248-nasm-linux-terminal-inputoutput-wint-80h/
The read syscall already has that protection built in. One other thing though: You shouldn't be explicitly using syscall. What if your code is taken to an x86-64 machine (which uses sysenter)? You should be using Linux's VDSO (virtual dynamic shared object), which contains code to do syscalls on all architectures, regardless as to wheather they support syscall, sysenter, or only int.
You could read the input until newline character is found.

Resources