Is it possible for a program to read itself? - linux

Theoretical question. But let's say I have written an assembly program. I have "labelx:" I want the program to read at this memory address and only this size and print to stdout.
Would it be something like
jmp labelx
And then would i then use the Write syscall , making sure to read from the next instruction from labelx:
mov rsi,rip
mov rdi,0x01
mov rdx,?
mov rax,0x01
syscall
to then output to stdout.
However how would I obtain the size to read itself? Especially if there is a
label after the code i want to read or code after. Would I have to manually
count the lines?
mov rdx,rip+(bytes*lines)
And then syscall with populated registers for the syscall to write to from rsi to rdi. Being stdout.
Is this Even possible? Would i have to use the read syscall first, as the write system call requires rsi to be allocated memory buffer. However I assumed .text is already allocated memory and is read only. Would I have to allocate onto the stack or heap or a static buffer first before write, if it's even possible in the first place?
I'm using NASM syntax btw. And pretty new to assembly. And just a question.

Yes, the .text section is just bytes in memory, no different from section .rodata where you might normally put msg: db "hello", 10. x86 is a Von Neumann architecture (not Harvard), so there's no distinction between code pointers and data pointers, other than what you choose to do with them. Use objdump -drwC -Mintel on a linked executable to see the machine-code bytes, or GDB's x command in a running process, to see bytes anywhere.
You can get the assembler to calculate the size by putting labels at the start/end of the part you want, and using mov edx, prog_end - prog_start in the code at the point where you want that size in RDX.
See How does $ work in NASM, exactly? for more about subtracting two labels (in the same section) to get a size. (Where $ is an implicit label at the start of the current line, although $ isn't likely what you want here.)
To get the current address into a register, you need a RIP-relative LEA, not mov, because RIP isn't a general-purpose register and there's no special form of mov that reads it.
here:
lea rsi, [rel here] ; with DEFAULT REL you could just use [here]
mov edi, 1 ; stdout fileno
mov edx, .end - here ; assemble-time constant size calculation
mov eax, 1 ; __NR_write
syscall
.end:
This is fully position-independent, unlike if you used mov esi, here. (How to load address of function or label into register)
The LEA could use lea rsi, [rel $] to assemble to the same machine-code bytes, but you want a label there so you can subtract them.
I optimized your MOV instructions to use 32-bit operand-size, implicitly zero-extending into the full 64-bit RDX and RAX. (And RDI, but write(int fd, void *buf, size_t len) only looks at EDI anyway for the file descriptor).
Note that you can write any bytes of any section; there's nothing special about having a block of code write itself. In the above example, put the start/end labels anywhere. (e.g. foo: and .end:, and mov edx, foo.end - foo taking advantage of how NASM local labels work, by appending to the previous non-local label, so you can reference them from somewhere else. Or just give them both non-dot names.)

Related

Read file from a specific position in x86

Is it possible to start reading a file from a specific line or byte. Currently I use this code to read 4 bytes of a file:
section .data
filename db "file.txt", 0
section .bss
read_data resb 4
section .text
global _start
_start:
mov rax, SYS_OPEN
mov rdi, filename
mov rsi, O_RDONLY
mov rdx, 0
syscall
push rax
mov rdi, rax
mov rax, SYS_READ
mov rsi, read_data
mov rdx, 4
syscall
mov rax, SYS_CLOSE
pop rdi
syscall
This code always reads the first 4 bytes, but I want to start reading from other parts of the file, like the middle for example. What do I need to add or change?
A freshly-opened file descriptor starts at position = 0. If you keep reading from the same fd in a loop, you'll get successive chunks. (Use a larger buffer like 8kiB and loop over dwords in user-space, though, using the value that read returned as an upper limit! A system call is very expensive in CPU time.)
Is it possible to start reading a file from a specific line or byte.
Byte: yes
Line: no. In Unix/Linux, the kernel doesn't have an index of line-start byte offsets or any other line-oriented API. The line handling in stdio fgets for example is purely done in user-space. There have been some historical OSes with record-based files, but Unix files are flat arrays of bytes. (They can have holes, unwritten extents, and extended attributes... But the kernel APIs for the main file contents only operate with by byte offsets).
If you want to do lines, read a big block and loop forward until you've seen some number of newlines. If you're not there yet, read another block; repeat until you find the start and end of the line number you want, or you hit EOF. x86-64 can efficiently search 16 bytes at a time with pcmpeqb / pmovmskb / popcnt (popcnt requires SSE4.2 or the specific popcnt feature bit).
Or with just SSE2, or when optimizing for large blocks, with pcmpeqb / psadbw (against all-zero) to hsum bytes to qwords / paddd. Then check how many lines you went every so often with some scalar code. Or keep it simple and branch on finding the first newline in a SIMD vector.
Obviously the slow and simple option is a byte-at-a-time loop that counts '\n' characters - if you know how to do strchr with SSE2 it should be straightforward to vectorize that search using the above suggestions.
But if you only want some specific byte positions, you have two main options:
seek with lseek(2) before read(2) (see #Nicolae Natea's answer)
Use POSIX/Linux pread(2) to read from a specified offset, without moving the fd's file offset for future read calls. The Linux system call name is pread64 (__NR_pread64 equ 17 from asm/unistd_64.h)
ssize_t pread(int fd, void *buf, size_t count, off_t offset); The only difference from read is the offset arg, the 4th arg thus passed in R10 (not RCX like the user-space function calling convention). off_t is a 64-bit type simply passed in a single register in 64-bit code.
Other than the pread64 name in the .h, there's nothing special about the asm interface compared to the C interface, it follows the standard system-calling convention. (It exists since Linux 2.1.60 ; before that glibc's wrapper emulated it with lseek.)
There are other things you can do like mmap, or a preadv system call, but pread is most exactly what you're looking for if you have a known position you want to read from.
Before performing the read you should perform a lseek, so that the file position is updated.
so something along the lines:
mov rdi, rax ; fd
mov rax, SYS_LSEEK
mov rsi, <whatever offset you want>
mov rdx, 0 ; keep 0 if the offset should be from the begining of the file
syscall
note: RDI will still hold the same fd value after a syscall so you don't need extra save/restore for the fd across lseek / read / close.
Tip:
It might be easier to write the code in c and compile it with gcc -g -S -fverbose-asm -Og -c main.c and then look at main.s. (How to remove "noise" from GCC/clang assembly output?). But that will only show the compiler making calls to libc wrapper functions, unless you use inline system call macros like MUSL libc provides.

Can't understand assembly mov instruction between register and a variable

I am using NASM assembler on linux 64 bit.
There is something with variables and registers I can't understand.
I create a variable named "msg":
msg db "hello, world"
Now when I want to write to the stdout I move the msg to rsi register, however I don't understand the mov instruction bitwise ... the rsi register consists of 64 bit , while the msg variable has 12 symbols which is 8 bits each , which means the msg variable has a size of 12 * 8 bits , which is greater than 64 bits obviously.
So how is this even possible to make an instruction like:
mov rsi, msg , without overflowing the memory allocated for rsi.
Or does the rsi register contain the memory location of the first symbol of the string and after writing 1 symbol it changes to the memory location of the next symbol?
Sorry if I wrote complete nonsense, I'm new to assembly and i just can't get the grasp of it for a while.
In NASM syntax (unlike MASM syntax) mov rsi, symbol puts the address of the symbol into RSI. (Inefficiently with a 64-bit absolute immediate; use a RIP-relative LEA or mov esi, symbol instead. How to load address of function or label into register in GNU Assembler)
mov rsi, [symbol] would load 8 bytes starting at symbol. It's up to you to choose a useful place to load 8 bytes from when you write an instruction like that.
mov rsi, msg ; rsi = address of msg. Use lea rsi, [rel msg] instead
movzx eax, byte [rsi+1] ; rax = 'e' (upper 7 bytes zeroed)
mov edx, [msg+6] ; rdx = ' wor' (upper 4 bytes zeroed)
Note that you can use mov esi, msg because symbol addresses always fit in 32 bits (in the default "small" code model, where all static code/data goes in the low 2GB of virtual address space). NASM makes this optimization for you with assemble-time constants (like mov rax, 1), but probably it can't with link-time constants. Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?
and after writing 1 symbol it changes to the memory location of the next symbol?
No, if you want that you have to inc rsi. There is no magic. Pointers are just integers that you manipulate like any other integers, and strings are just bytes in memory.
Accessing registers doesn't magically modify them.
There are instructions like lodsb and pop that load from memory and increment a pointer (rsi or rsp respectively), but x86 doesn't have any pre/post-increment/decrement addressing modes, so you can't get that behaviour with mov even if you want it. Use add/sub or inc/dec.
Disclaimer: I'm not familiar with the flavor of assembly that you're dealing with, so the following is more general. The particular flavor may have more features than what I'm used to. In general, assembly deals with single byte/word entities where the size depends on the processor. I've done quite a bit of work on 8 and 16-bit processors, so that is where my answer is coming from.
General statements about Assembly:
Assembly is just like a high level language, except you have to handle a lot more of the details. So if you're used to some operation in say C, you can start there and then break the operation down even further.
For instance, if you have declared two variables that you want to add, that's pretty easy in C:
x = a + b;
In assembly, you have to break that down further:
mov R1, a * get value from a into register R1
mov R2, b * get value from b into register R2
add R1,R2 * perform the addition (typically goes into a particular location I'll call it the accumulator
mov x, acc * store the result of the addition from the accumulator into x
Depending on the flavor of assembly and the processor, you may be able to directly refer to variables in the addition instruction, but like I said I would have to look at the specific flavor you're working with.
Comments on your specific question:
If you have a string of characters, then you would have to move each one individually using a loop of some sort. I would set up a register to contain the starting address of your string, and then increment that register after each character is moved. It acts like a pointer in C. You will need to have some sort of indication for the termination of the string or another value that tells the size of the string, so you know when to stop.

How to limit the address space of 32bit application on 64bit Linux to 3GB?

Is it possible to make 64bit Linux loader to limit the address space of the loaded 32bit program to some upper limit?
Or to set some holes in the address space that to not be allocated by the kernel?
I mean for specific executable, not globally for all processes, neither through kernel configuration. Some code or ELF executable flags are examples of appropriate solution.
The limit should be forced for all loaded shared libraries as well.
Clarification:
The problem I want to fix is that my code uses the numbers above 0xc0000000 as a handle values and I want to clearly distinct between handle values and memory addresses, even when the memory addresses are allocated and returned by some third party library function.
As long as the address space in 64bit Linux is very close to 4G limit, there is no enough addressing space left for the handle values.
On the other hand 3GB or even less is far enough for all my needs.
OK, I found the answer of this question elsewhere.
The solution is to change the "personality" of your program to PER_LINUX32_3GB, using the Linux system call sys_personality.
But there is a problem. After switching to PER_LINUX32_3GB Linux kernel will not allocate space in the upper 1GB, but the already allocated space, for example the application stack, remains there.
The solution is to "restart" your program through sys_execve system call.
Here is the code where I packed everything in one:
proc ___SwitchLinuxTo3GB
begin
cmp esp, $c0000000
jb .finish ; the system is native 32bit
; check the current personality.
mov eax, sys_personality
mov ebx, -1
int $80
; and exit if it is what intended
test eax, ADDR_LIMIT_3GB
jnz .finish ; everything is OK.
; set the needed personality
mov eax, sys_personality
mov ebx, PER_LINUX32_3GB
int $80
; and restart the process
mov eax, [esp+4] ; argument count
mov ebx, [esp+8] ; the filename of the executable.
lea ecx, [esp+8] ; the arguments list.
lea edx, [ecx+4*eax+4] ; the environment list.
mov eax, sys_execve
int $80
; if something gone wrong, it comes here and stops!
int3
.finish:
return
endp

Shellcode with restrictions

For a task I need to create simple shellcode, but it is not allowed that it contains \x80.
Notice: To make a system call on linux, like write or exit, you need among others this line: int 0x80, which in the end will produce shellcode including \x80.
Nevertheless I need to make system calls, so my idea now is to use a variable for the interrupt vector number. For example 0x40 and then multiply it with 2, so in the end there will be a \x40 but not a \x80 in the shellcode.
The problem is that the int is not taking a variable as an argument, I tried this for a test:
section .data
nr db 0x80
section .text
global _start
_start:
xor eax, eax
inc eax
xor ebx, ebx
mov ebx, 0x1
int [nr]
And get
error: invalid combination of opcode and operands
How could I get my idea working? Or do you have a different solution for the problem?
PS. sysenter and syscall are not working -> Illegal instruction
I am using nasm on a x86-32bit machine.
maybe something like this, but never use it in serious code!
format ELF executable
use32
entry start
segment executable writeable
start:
;<some code>
inc byte [ here + 1 ] ;<or some other math>
jmp here
here:
int 0x7f
segment readable writeable
(this is fasm-code)

How are system calls interpreted in x86 assembly linux

I am confused towards why/how a value gets printed in x86 assembly in a Linux environment.
For example if I wish to print a value I would do this:
mov eax, 4
mov ebx, 1
mov ecx, msg
mov edx msgLength
int 80h
Now I understand the numerical value 4 will make the system call to sys_write after the interrupt. But my question is, what is the significance of the 4? Is it loading the address of the decimal value 4 into eax? Or is it loading the value 4 into the eax register?
I am confused after reading I can transfer the value at an address to a register using the following instruction:
mov eax, [msg]
eax will now contain the bytes at the address of msg, but I would guess this format is not acceptable:
mov eax, [4]
So what is really happening when I move 4 into eax to print something?
Simply the value (number) 4 is loaded into eax, no magic there. The operating system will look at the value in eax to figure out what function you want. System call number is a code that identifies the various available kernel functions you can use.
Linux kernel maintains all the system call routines as an array of function pointers (can be called as sys_call table) and the value in the eax gives the index to that array (which system call to choose) by the kernel. Other registers like ebx, ecx, edx contains the appropriate parameters for that system call routine.
And the int 80h is for software interrupt to the cpu from user mode to kernel mode because actual system call routine is kernel space function.

Resources