How are system calls interpreted in x86 assembly on Linux?

I am confused about why/how a value gets printed in x86 assembly in a Linux environment.
For example if I wish to print a value I would do this:
mov eax, 4
mov ebx, 1
mov ecx, msg
mov edx, msgLength
int 80h
Now I understand the numerical value 4 will make the system call to sys_write after the interrupt. But my question is, what is the significance of the 4? Is it loading the address of the decimal value 4 into eax? Or is it loading the value 4 into the eax register?
I am confused because I read that I can transfer the value at an address into a register using the following instruction:
mov eax, [msg]
eax will now contain the bytes at the address of msg, but I would guess this format is not acceptable:
mov eax, [4]
So what is really happening when I move 4 into eax to print something?

Simply the value (number) 4 is loaded into eax; there is no magic. The operating system looks at the value in eax to figure out which function you want. The system call number is a code that identifies which of the available kernel functions to run.

The Linux kernel keeps its system call routines in an array of function pointers (usually called sys_call_table), and the value in eax is the index into that array, so it tells the kernel which system call to run. The other registers (ebx, ecx, edx, and so on) hold the arguments for that system call routine.
int 80h raises a software interrupt that switches the CPU from user mode to kernel mode, which is needed because the actual system call routine is a kernel-space function.
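The table-lookup idea can be sketched in C. This is only a toy user-space model, not the kernel's code: the names toy_sys_call_table and toy_dispatch and the two stub routines are made up for illustration, and only the numbers 1 (exit) and 4 (write) follow the i386 numbering used above.

```c
#include <errno.h>
#include <stddef.h>

/* Toy model: every "system call" takes three register-sized arguments. */
typedef long (*toy_syscall_fn)(long ebx, long ecx, long edx);

/* Hypothetical stub: a real exit never returns; this one echoes the status. */
static long toy_exit(long status, long unused1, long unused2) {
    (void)unused1; (void)unused2;
    return status;
}

/* Hypothetical stub: pretend every byte was written. */
static long toy_write(long fd, long buf, long count) {
    (void)fd; (void)buf;
    return count;
}

/* Indexed by call number; unlisted slots are NULL. */
static const toy_syscall_fn toy_sys_call_table[] = {
    [1] = toy_exit,
    [4] = toy_write,
};

/* What "look at eax and pick the routine" amounts to. */
long toy_dispatch(long eax, long ebx, long ecx, long edx) {
    size_t n = sizeof toy_sys_call_table / sizeof toy_sys_call_table[0];
    if (eax < 0 || (size_t)eax >= n || toy_sys_call_table[eax] == NULL)
        return -ENOSYS;   /* unknown call number */
    return toy_sys_call_table[eax](ebx, ecx, edx);
}
```

Calling toy_dispatch(4, 1, 0, 13) selects the write stub purely because the first argument is the plain value 4, which is the role eax plays in the original snippet.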

Related

Is it possible for a program to read itself?

Theoretical question. But let's say I have written an assembly program. I have "labelx:" I want the program to read at this memory address and only this size and print to stdout.
Would it be something like
jmp labelx
And then would I use the write syscall, making sure to read from the instruction after labelx:
mov rsi,rip
mov rdi,0x01
mov rdx,?
mov rax,0x01
syscall
to then output to stdout.
However, how would I obtain the size to read? Especially if there is a label, or more code, after the code I want to read. Would I have to count the lines manually?
mov rdx,rip+(bytes*lines)
And then syscall, with the registers populated so that write copies from the buffer in rsi to the file descriptor in rdi, which is stdout.
Is this even possible? Would I have to use the read syscall first, since the write system call requires rsi to point to an allocated memory buffer? However, I assumed .text is already allocated memory and is read-only. Would I have to allocate a buffer on the stack, on the heap, or statically before the write, if it's even possible in the first place?
I'm using NASM syntax, by the way, and I'm pretty new to assembly.
Yes, the .text section is just bytes in memory, no different from section .rodata where you might normally put msg: db "hello", 10. x86 is a Von Neumann architecture (not Harvard), so there's no distinction between code pointers and data pointers, other than what you choose to do with them. Use objdump -drwC -Mintel on a linked executable to see the machine-code bytes, or GDB's x command in a running process, to see bytes anywhere.
You can get the assembler to calculate the size by putting labels at the start/end of the part you want, and using mov edx, prog_end - prog_start in the code at the point where you want that size in RDX.
See How does $ work in NASM, exactly? for more about subtracting two labels (in the same section) to get a size. (Where $ is an implicit label at the start of the current line, although $ isn't likely what you want here.)
To get the current address into a register, you need a RIP-relative LEA, not mov, because RIP isn't a general-purpose register and there's no special form of mov that reads it.
here:
lea rsi, [rel here] ; with DEFAULT REL you could just use [here]
mov edi, 1 ; stdout fileno
mov edx, .end - here ; assemble-time constant size calculation
mov eax, 1 ; __NR_write
syscall
.end:
This is fully position-independent, unlike if you used mov esi, here. (How to load address of function or label into register)
The LEA could use lea rsi, [rel $] to assemble to the same machine-code bytes, but you want a label there so you can subtract them.
I optimized your MOV instructions to use 32-bit operand-size, implicitly zero-extending into the full 64-bit RDX and RAX. (And RDI, but write(int fd, void *buf, size_t len) only looks at EDI anyway for the file descriptor).
Note that you can write any bytes of any section; there's nothing special about having a block of code write itself. In the above example, put the start/end labels anywhere. (e.g. foo: and .end:, and mov edx, foo.end - foo taking advantage of how NASM local labels work, by appending to the previous non-local label, so you can reference them from somewhere else. Or just give them both non-dot names.)
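For comparison, the same "code is just bytes" point can be sketched in C. This is a rough illustration under assumptions: sample_function and write_code_bytes are hypothetical names, converting a function pointer to a data pointer is not guaranteed by ISO C (it works on x86-64 Linux with GCC or Clang), and C has no portable equivalent of subtracting two labels, so the caller has to supply the length.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical function whose machine code we want to look at. */
int sample_function(int x) {
    return x + 1;
}

/* Pass the first `len` bytes of a function's machine code straight to
 * write(2), with no intermediate buffer. Assumes the .text page is readable
 * and that `len` stays inside mapped memory. */
ssize_t write_code_bytes(int fd, int (*fn)(int), size_t len) {
    const void *start = (const void *)(uintptr_t)fn;
    return write(fd, start, len);
}
```

Something like write_code_bytes(STDOUT_FILENO, sample_function, 4) piped into a hex dumper should show the same leading bytes that objdump reports for sample_function.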

Incrementing one to a variable in IA32 Linux Assembly

I'm trying to add 1 to a variable in IA32 assembly on Linux:
section .data
num: dd 0x1
section .text
global _start
_start:
add dword [num], 1
mov edx, 1
mov ecx, [num]
mov ebx,1
mov eax,4
int 0x80
mov eax,1
int 0x80
I'm not sure if this is possible.
Elsewhere I saw the following code:
mov eax, num
inc eax
mov num, eax
Is it possible to increment a variable in memory without moving it to a register?
If so, do I have any advantage moving the value to a register?
Is it possible to increment a variable in memory without moving it to a register?
Certainly: inc dword [num].
Like practically all x86 instructions, inc can take either a register or memory operand. See the instruction description at http://felixcloutier.com/x86/inc; the form inc r/m32 indicates that you can give an operand which is either a 32-bit register or 32-bit memory operand (effective address).
If you're interested in micro-optimizations, it turns out that add dword [num], 1 may still be somewhat faster, though one byte larger, on certain CPUs. The specifics are pretty complicated and you can find a very extensive discussion at INC instruction vs ADD 1: Does it matter?. This is partly related to the slight difference in effect between the two, which is that add will set or clear the carry flag according to whether a carry occurs, while inc always leaves the carry flag unchanged.
If so, do I have any advantage moving the value to a register?
No. That would make your code larger and probably slower.

Linux 64-abi, calling convention

I'm reading the Intel manual about calling conventions and which register has which purpose. Here is what is specified in Figure 3.4: Register Usage:
%rax temporary register; with variable arguments
passes information about the number of vector
registers used; 1st return register
But in the Linux API we use rax to pass the function number. Is that consistent with what is specified in the Intel manual? Actually, I expected that (according to the manual) we would pass the function number in rdi (which is used for the 1st argument), and so forth.
Can I use rax to pass the first function argument in my hand-written functions? E.g.
mov rax, [array_length_ptr]
mov rdi, array_start_ptr
callq _array_sum
That quote is talking about the function-calling convention, which is standardized by the x86-64 System V ABI document (not by an Intel manual).
You're thinking of Linux's system-call calling convention, which is described in an appendix to the ABI doc, but that part isn't normative. Anyway, the system call ABI puts the call number in rax because it's not an arg to the system call. Alternatively, you can think of it like a 0th arg, the same way that variadic function calls pass the number of FP register-args in al. (Fun fact: that makes it possible for the caller to pass even the first FP arg on the stack if they want to.)
More importantly, putting the call number in RAX makes a better ABI, and it is also traditional: it's what the i386 system-call ABI does, too. (The i386 System V function-call ABI is totally different, using stack args exclusively.)
This means system-call wrapper functions can just set eax and run syscall instead of needing to do something like
libc_write_wrapper_for_your_imagined_syscall_convention:
; copy all args to the next slot over
mov r10, rdx ; size_t count
mov rdx, rsi ; void *buf
mov esi, edi ; int fd
mov edi, 1 ; SYS_write
syscall
cmp rax, -4095
jae set_errno
ret
when the real wrapper can be as simple as
actual_libc_write_wrapper: ; glibc's actual code I think also checks for pthread cancellation points or something...
mov eax, 1 ; SYS_write
syscall
cmp rax, -4095
jae set_errno
ret
Note the use of r10 instead of rcx because syscall clobbers rcx and r11 with the saved RIP and RFLAGS, so it doesn't have to write any memory with return information, and doesn't force user-space to put it somewhere the kernel can read it (like 32-bit sysenter does).
So the system-calling convention couldn't be the same as the function-calling convention. (Or the function-calling convention would have had to choose different registers.)
For system calls with 4 or more args (or a generic wrapper that works for any system call) you do need a mov r10, rcx, but that's all. (Unlike the 32-bit convention where a wrapper has to load args from the stack, and save/restore ebx because the kernel's poorly-chosen ABI uses it for the first arg.)
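The register assignment described above can be sketched as a C wrapper with GNU inline assembly. This is a minimal sketch, not glibc's code: raw_write and checked_write are hypothetical names, the inline-asm path assumes GCC or Clang targeting x86-64 Linux, and other targets fall back to libc's generic syscall() function.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns the raw kernel result: a byte count, or -errno in [-4095, -1]. */
long raw_write(int fd, const void *buf, size_t count) {
#if defined(__x86_64__) && defined(__GNUC__)
    long ret;
    /* Call number in rax; args in rdi, rsi, rdx. The syscall instruction
     * itself overwrites rcx and r11, so they are listed as clobbers. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"((long)SYS_write), "D"((long)fd), "S"(buf), "d"(count)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Fallback for other targets: convert libc's -1/errno back to raw form. */
    long ret = syscall(SYS_write, fd, buf, count);
    return ret < 0 ? -(long)errno : ret;
#endif
}

/* Same check as `cmp rax, -4095` / `jae set_errno` in the listings above. */
long checked_write(int fd, const void *buf, size_t count) {
    long ret = raw_write(fd, buf, count);
    if ((unsigned long)ret >= (unsigned long)-4095L) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```

Because write takes only three arguments, nothing needs to be moved into r10 here; a wrapper for a system call with four or more arguments would additionally bind the fourth argument to r10 rather than rcx.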
Can I use rax to pass the first function argument in my hand-written functions?
Yes, do whatever you want for private helper functions that you don't need to call from C.
Choose arg registers to make things easier for the callers (or for the most important caller), or to suit any instructions with fixed register choices (like div) that your function will use.
Note which registers are clobbered and which are preserved with a comment. Only bother to save/restore registers that your caller actually needs saving / restoring, and choose which tmp regs you use to minimize push/pop. Avoid a push/pop save/reload of registers that are part of a critical latency path in your caller, if your function is short.

How to limit the address space of 32bit application on 64bit Linux to 3GB?

Is it possible to make the 64-bit Linux loader limit the address space of a loaded 32-bit program to some upper bound?
Or to reserve holes in the address space that the kernel will not allocate?
I mean for a specific executable, not globally for all processes and not through kernel configuration. Some code or ELF executable flags would be an appropriate solution.
The limit should also be enforced for all loaded shared libraries.
Clarification:
The problem I want to fix is that my code uses numbers above 0xc0000000 as handle values, and I want to clearly distinguish handle values from memory addresses, even when the memory addresses are allocated and returned by some third-party library function.
Because the address space of a 32-bit program on 64-bit Linux extends very close to the 4 GB limit, there is not enough address space left for the handle values.
On the other hand, 3 GB or even less is more than enough for all my needs.
OK, I found the answer to this question elsewhere.
The solution is to change the "personality" of your program to PER_LINUX32_3GB, using the Linux system call sys_personality.
But there is a problem. After switching to PER_LINUX32_3GB, the Linux kernel will not allocate space in the upper 1 GB, but space that is already allocated there, for example the application stack, remains.
The solution is to "restart" your program through the sys_execve system call.
Here is the code where I packed everything in one:
proc ___SwitchLinuxTo3GB
begin
cmp esp, $c0000000
jb .finish ; the system is native 32bit
; check the current personality.
mov eax, sys_personality
mov ebx, -1
int $80
; and exit if it is what intended
test eax, ADDR_LIMIT_3GB
jnz .finish ; everything is OK.
; set the needed personality
mov eax, sys_personality
mov ebx, PER_LINUX32_3GB
int $80
; and restart the process
mov eax, [esp+4] ; argument count
mov ebx, [esp+8] ; the filename of the executable.
lea ecx, [esp+8] ; the arguments list.
lea edx, [ecx+4*eax+4] ; the environment list.
mov eax, sys_execve
int $80
; if something went wrong, execution comes here and stops!
int3
.finish:
return
endp
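The same query, set, and re-exec sequence can be sketched in C with glibc's personality() wrapper. Treat it as a sketch under assumptions: has_3gb_limit and switch_to_3gb are hypothetical names, re-executing through /proc/self/exe is a substitution for the argv[0] path that the assembly version passes to execve, and the native 32-bit check on esp is left out.

```c
#include <sys/personality.h>
#include <unistd.h>

/* Returns 1 if ADDR_LIMIT_3GB is set for this process, 0 if it is not,
 * and -1 if the query itself failed. */
int has_3gb_limit(void) {
    int current = personality(0xffffffff);  /* 0xffffffff: query, change nothing */
    if (current == -1)
        return -1;
    return (current & ADDR_LIMIT_3GB) ? 1 : 0;
}

/* Set PER_LINUX32_3GB and restart the program so that mappings which already
 * exist (such as the stack) are recreated below 3 GB. Returns 0 if the limit
 * was already in place, -1 on failure; does not return if the exec succeeds. */
int switch_to_3gb(char *const argv[]) {
    if (has_3gb_limit() == 1)
        return 0;
    if (personality(PER_LINUX32_3GB) == -1)
        return -1;
    execv("/proc/self/exe", argv);   /* assumption: /proc is mounted */
    return -1;                       /* only reached if execv failed */
}
```

A program would call switch_to_3gb(argv) first thing in main; after the re-exec the flag is already set, so the second run falls through and continues normally.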

Shellcode with restrictions

For a task I need to create simple shellcode, but it must not contain the byte \x80.
Note: to make a system call on Linux, such as write or exit, you need (among other things) the instruction int 0x80, which assembles to machine code containing \x80.
Nevertheless, I need to make system calls, so my idea is to use a variable for the interrupt vector number. For example, store 0x40 and multiply it by 2, so that the shellcode contains \x40 but not \x80.
The problem is that int does not accept a variable as its operand. I tried this as a test:
section .data
nr db 0x80
section .text
global _start
_start:
xor eax, eax
inc eax
xor ebx, ebx
mov ebx, 0x1
int [nr]
And get
error: invalid combination of opcode and operands
How could I get my idea working? Or do you have a different solution for the problem?
PS. sysenter and syscall are not working -> Illegal instruction
I am using NASM on a 32-bit x86 machine.
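Maybe something like this, but never use it in serious code! The idea is to assemble int 0x7f and increment its immediate byte at run time, so that it becomes int 0x80 before it executes: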
format ELF executable
use32
entry start
segment executable writeable
start:
;<some code>
inc byte [ here + 1 ] ;<or some other math>
jmp here
here:
int 0x7f
segment readable writeable
(this is fasm-code)
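The byte-level effect of that patch can be modelled in C on a plain array. This only illustrates the encoding (int imm8 assembles to the two bytes CD imm8); it does not execute anything, and contains_byte and patch_int_vector are hypothetical helper names.

```c
#include <stddef.h>

/* Returns 1 if `byte` occurs anywhere in buf[0..len), else 0. */
int contains_byte(const unsigned char *buf, size_t len, unsigned char byte) {
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == byte)
            return 1;
    }
    return 0;
}

/* Model of `inc byte [here + 1]`: bump the immediate operand that follows
 * the 0xCD opcode byte of an `int imm8` instruction. */
void patch_int_vector(unsigned char *insn) {
    insn[1] += 1;
}
```

Starting from the bytes CD 7F, contains_byte reports no \x80 in the stored form, and after patch_int_vector the pair reads CD 80, which is what the CPU then fetches as int 0x80.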

Resources