64-bit PE injection - Suspend, inject, resume (x64 equivalent to changing EAX?)

All,
I'm trying to figure this out by endlessly debugging applications, but I can't seem to find my answer.
In my 32-bit PE injection I eventually change EAX to the new entry point of the injected PE, then resume the thread. I've read that the kernel runs a call EAX at the end to get to the entry point (I did not see this when debugging applications, so I have no idea whether that is really the case).
However, I can't find out whether this is possible on x64 (I've tried just about every register :)).
So, two questions:
Does the kernel actually call EAX? I can't see that call when debugging.
Can the same method of changing a register to point at the new entry point be used on x64, or do I need to rely on e.g. CreateRemoteThread?
P.S.: I'm a security researcher :)

On x64 the RCX register holds the application-defined entry point of the thread; on x86 it is EAX. And it is not the kernel that calls this address, but user-mode code in kernel32.dll.
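A minimal sketch of the suspend/modify/resume step, using the Win32 GetThreadContext and SetThreadContext functions. It assumes the injected image is already mapped in the target and that new_entry is its entry point there; set_suspended_thread_entry is a hypothetical helper name, and the whole block is guarded so it compiles away on non-Windows systems.

```c
#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: make a thread that was created suspended (e.g. the
   main thread from CreateProcess with CREATE_SUSPENDED) start at new_entry.
   Assumes new_entry is already a valid address inside the target process
   and that injector and target have the same bitness. */
static BOOL set_suspended_thread_entry(HANDLE thread, ULONG_PTR new_entry)
{
    CONTEXT ctx;
    ZeroMemory(&ctx, sizeof ctx);
    ctx.ContextFlags = CONTEXT_INTEGER;   /* RCX/EAX are integer registers */

    if (!GetThreadContext(thread, &ctx))
        return FALSE;

#ifdef _WIN64
    ctx.Rcx = (DWORD64)new_entry;         /* x64: start address is in RCX */
#else
    ctx.Eax = (DWORD)new_entry;           /* x86: start address is in EAX */
#endif

    return SetThreadContext(thread, &ctx);
}
#endif /* _WIN32 */
```

A typical order would be: CreateProcess with CREATE_SUSPENDED, write the image into the target (VirtualAllocEx, WriteProcessMemory), call the helper on the returned thread handle, then ResumeThread. For a 32-bit target driven from a 64-bit injector, Wow64GetThreadContext and Wow64SetThreadContext would be needed instead.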

Related

Injecting code into Windows process running under Wine in Linux

I'm trying to inject code into Windows process (a game) running under Wine in Linux (for learning purposes, of course). Not a pre-built DLL, but native code.
On Windows, we can use procedures like VirtualAllocEx and WriteProcessMemory to write code to the virtual memory of a foreign process and then create a thread in it via CreateRemoteThread, thus executing our code in the context of the foreign process.
This is a popular approach, which is often used by all sorts of cheats and trainers for games. The popular game hacking program Cheat Engine has its own API that does this trick.
To do the same on Linux, you need many more, and much more complex, manipulations: attach to the process using ptrace, swap registers and make system calls to allocate memory, write the code there, then create a thread using the clone system call that will execute our code. It is even better and safer if the thread created by clone creates another thread using pthread_create, and that last thread executes our code.
In order for a thread created with clone to jump to the desired code, the address of this code is written to the end of the memory allocated for its stack, and the clone system call itself is performed using the syscall instruction, followed by the ret instruction. This combination of instructions is easily found in libc.so.
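The "swap registers" step of the ptrace approach can be sketched in C for Linux on x86_64. This is not the question author's program and does not cover the clone/syscall;ret part; it assumes the tracee is already stopped under ptrace, and spawn_stopped_tracee and redirect_tracee are hypothetical names. A forked child that calls PTRACE_TRACEME stands in for a process attached with PTRACE_ATTACH.

```c
#if !defined(__linux__) || !defined(__x86_64__)
#error "This sketch assumes Linux on x86_64"
#endif

#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Demo stand-in for PTRACE_ATTACH: fork a child that asks to be traced and
   stops itself. Returns the stopped child's pid, or -1 on failure. */
static pid_t spawn_stopped_tracee(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);
        _exit(0);
    }
    int status;
    if (pid < 0 || waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
        return -1;
    return pid;
}

/* Save the tracee's registers into *saved, then install a modified copy
   whose instruction pointer is new_rip and whose first argument register
   (RDI in the System V x86_64 ABI) is arg0. Returns 0 or -1. */
static int redirect_tracee(pid_t pid, unsigned long long new_rip,
                           unsigned long long arg0,
                           struct user_regs_struct *saved)
{
    if (ptrace(PTRACE_GETREGS, pid, NULL, saved) == -1)
        return -1;
    struct user_regs_struct regs = *saved;
    regs.rip = new_rip;
    regs.rdi = arg0;
    return ptrace(PTRACE_SETREGS, pid, NULL, &regs) == -1 ? -1 : 0;
}
```

A real injector would PTRACE_CONT after the redirect, wait for the injected code to stop (for example on an int3), and then put the saved registers back with PTRACE_SETREGS so the target resumes where it was.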
I found an example in C on Github on how to do it correctly and implemented my own program. I won't include its source code here because it's quite large. Let me just say that it works as expected with native Linux processes. Moreover, I can also inject code that calls Linux routines (for example, puts from libc.so) into the Wine process and see the correct result:
lea rdi, [text]
call <libc.so code vaddr + `puts` offset>
ret
text:
db 'puts called successfully!', 0
But when I try to call any Windows procedure in the thread created by pthread_create (be it any procedure from the game code or, for example, MessageBoxA from user32.dll), the thread hangs:
sub rsp, 32
xor rcx, rcx
lea rdx, [text]
lea r8, [caption]
mov r9, 0
call <user32.dll code vaddr + `MessageBoxA` offset>
add rsp, 32
ret
text:
db 'MessageBoxA called successfully!', 0
caption:
db 'Yaaay!', 0
But why is this happening? After all, Wine is not an emulator, but just an API and system call compatibility layer. Doesn't this mean that the code itself runs natively on the processor, just like the code of any Linux programs, and I should be able to interoperate between them?
I suspect that the problem may be, for example, in the stack incompatibility. I once heard that Wine uses some kind of trick to convert the stack between Windows and Linux, but I don't know what exactly that trick is, and I'm not skilled enough to find and understand it in the Wine source code.
Could you explain exactly why my idea of calling Windows procedures from a thread created by pthread_create does not work, and how I can make it work?

Checking if an address is writable in x86 assembly

Using only 32-bit x86 assembly, is it possible to check if an address is writable, without interacting with the operating system, and without risking a segfault?
The program will run on a Linux system in ring 3. I cannot use the "verw" instruction.
To give an example, I might want to check if the address 0x0804a000 is writable. But if I e.g. do "mov eax, 0x0804a000; mov [eax], eax" then, if the address is not writable, the program will segfault. The only other way I know is to e.g. call sys_read into the address and see if it fails, but this interacts with the operating system.
Is there a way to check if an address is writable given the constraints? If so, how?
If you have kernel privileges, you could probably find that information in the page tables the MMU uses.
But if you don't, you simply do not have access to it and must use OS facilities.
If you mean not calling an OS function, then it is possible at least on Windows by using Structured Exception Handling. It is still OS specific of course, because you need to access the Windows TIB at the FS segment.
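On Linux, the sys_read idea from the question can be made non-destructive by letting the kernel do both the read and the write through a pipe: a bad address makes the system call fail with EFAULT instead of raising SIGSEGV. This is a C sketch rather than 32-bit assembly, it still interacts with the OS (which, per the answers above, cannot be avoided in ring 3), and byte_is_writable is a hypothetical name. It is not atomic: another thread writing the same byte between the two calls could have its write undone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Returns true if the byte at addr is writable by this process, without
   risking SIGSEGV: the kernel reports a bad address as EFAULT instead. */
static bool byte_is_writable(void *addr)
{
    int fds[2];
    if (pipe(fds) != 0)
        return false;                 /* could not probe; report "no" */

    bool writable = false;
    /* Copy the current byte into the pipe; fails if addr is unreadable. */
    if (write(fds[1], addr, 1) == 1) {
        /* Copy the same byte back; the kernel stores to addr and fails
           with EFAULT if the page is not writable. */
        writable = (read(fds[0], addr, 1) == 1);
    }
    close(fds[0]);
    close(fds[1]);
    return writable;
}
```

Because the byte written back is the byte that was read, a successful probe leaves memory unchanged.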

Change page permission using gdb

I was playing around with gdb and I'd like to remove the executable permission from a particular page. How could I go about doing that? I don't need to be able to do it from within gdb; I'd just like to change the permission somehow (anything short of modifying the source code of the binary will do).
[EDIT]
I'm looking for a solution that works for binaries that are not linked against libc.
Use mprotect(): http://linux.die.net/man/2/mprotect
You can call it from within gdb, something like call mprotect(addr, len, 3) where 3 is the numeric value of PROT_READ|PROT_WRITE (at least on my system).
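One detail worth handling: mprotect() rejects a start address that is not page-aligned (EINVAL), so an arbitrary address inside the page has to be rounded down first. A minimal C sketch, assuming Linux, where PROT_READ | PROT_WRITE is 3; make_non_executable is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Remove execute permission from the page(s) covering [addr, addr + len),
   leaving them readable and writable (PROT_READ | PROT_WRITE, 3 on Linux).
   mprotect() rejects unaligned start addresses, so round down first. */
static int make_non_executable(void *addr, size_t len)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)addr & ~(page - 1);
    size_t span = (size_t)((uintptr_t)addr + len - start);
    return mprotect((void *)start, span, PROT_READ | PROT_WRITE);
}
```

From gdb, the same rounding applies: pass the page-aligned address, not the address of the instruction you care about.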
Even if the binary is not using libc, it must execute at least one system call to do anything useful.
What you need to do then is to arrange for correct arguments to be on the stack or in registers (details differ between processors and OSes, and you haven't told us which one you are running on), and then jump (using the GDB jump command) to the syscall instruction.
E.g. on Linux/x86_64, you would put 10 (__NR_mprotect) into %rax, addr into %rdi, len into %rsi, and 3 into %rdx. See documentation here.
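The same register assignment, written as GCC/Clang inline assembly for Linux/x86_64, shows what the jump to the syscall instruction ends up executing (a sketch; raw_mprotect is a hypothetical name). The syscall instruction clobbers RCX and R11, and the kernel returns a negative errno value in RAX rather than setting errno.

```c
#if !defined(__linux__) || !defined(__x86_64__)
#error "This sketch assumes Linux on x86_64"
#endif

#include <stddef.h>

/* mprotect issued directly with the syscall instruction:
   rax = 10 (__NR_mprotect), rdi = addr, rsi = len, rdx = prot.
   Returns 0 on success or a negative errno value (the kernel does not set
   errno; the libc wrapper does that). */
static long raw_mprotect(void *addr, size_t len, unsigned long prot)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(10L), "D"(addr), "S"(len), "d"(prot)
                      : "rcx", "r11", "memory");
    return ret;
}
```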

Does eax always, and only, store the system call?

[I'm confused about the CPU registers and I haven't found any truly clear and coherent explanation of them across the whole internet. If anyone has a link to something useful I'd really appreciate it if you'd post it in a comment or answer.]
The primary reason I'm here now is because I have been looking at sample NASM programs in a [thus far vain] attempt to learn the language. The program always ends by placing a system call code in eax and then calling int 0x80 (which I would love if someone could explain as well). However, from what I understand, eax is a 32 bit register - why do you need 32 bits to store system calls (I'm sure there aren't 2^32 worth). Also, sometimes I see other values and strings moved into eax during the program itself. Does that mean eax only has a special use when you finally want to perform a system call but for the rest of the time you can do with it as you please?
All bits of eax are used because that's how the system call interface is implemented. It's true there aren't 2^32 system calls, not even 2^16. But that's how it is. It allows for easy extension of the set of the system calls. You don't need to think hard about it, just accept it as a fact and live on.
eax is a general purpose register and you can do with it anything you please. The fact that it's used to contain the system call ID is just an established convention and nothing else. eax is not anyhow forbidden for other uses.
The program always ends by placing a system call code in eax and then calling int 0x80 (which I would love if someone could explain as well).
This is because you're only looking at old 32-bit examples for Linux, and that is what the Linux developers felt like doing. There's no reason why they couldn't have used a different register, and not much reason they couldn't have used half a register (e.g. ax instead of eax, or bx or ..). In a similar way, there's no reason they couldn't have used a call gate or a different interrupt number. Of course once Linux developers made their decision ("kernel will expect function number in EAX and use int 0x80") everything that calls their kernel has to comply with their decision; and they can't easily change their decision without breaking all existing software (but can, and did, support alternatives - e.g. adding support for sysenter and syscall when those instructions got invented, while ensuring that int 0x80 still works the same).
However, from what I understand, eax is a 32 bit register - why do you need 32 bits to store system calls (I'm sure there aren't 2^32 worth)
They didn't "need" 32-bits; but you can expect that the function number will (after a "is the value too big" sanity check) end up being used inside a call [table+eax*4] instruction to call the selected function, and because that uses 32-bit addressing it needs to use a 32-bit register. Using half (or a quarter) of a register would've involved zero extension (e.g. an extra and eax,0x0000FFFF or movzx eax,ax instruction) to convert the 16-bit value into a 32-bit value. It's also typically faster to use all 32 bits for other reasons (e.g. a mov ax,123 that sets the lowest 16 bits of EAX and leaves the highest 16 bits unchanged will depend on the previous value of the highest 16 bits, and that can cause a "dependency stall" in the CPU if it needs to wait until the previous value of EAX is known).
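The range check followed by call [table+eax*4] can be modeled in C as a toy (these handlers are not kernel code; dispatch and call_table are hypothetical names). Because nr is a full-width unsigned value, no zero-extension step is needed before indexing, and negative numbers wrap around to huge values that the single bounds check rejects.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*handler_fn)(long a, long b);

/* Toy handlers standing in for kernel system call implementations. */
static long toy_add(long a, long b) { return a + b; }
static long toy_sub(long a, long b) { return a - b; }
static long toy_mul(long a, long b) { return a * b; }

static const handler_fn call_table[] = { toy_add, toy_sub, toy_mul };
#define NR_CALLS (sizeof call_table / sizeof call_table[0])

/* "Is the value too big?" check, then an indexed indirect call: roughly the
   C analogue of  cmp eax, NR_CALLS / jae bad / call [table+eax*4]. */
static long dispatch(unsigned long nr, long a, long b)
{
    if (nr >= NR_CALLS)
        return -ENOSYS;               /* what Linux returns for unknown numbers */
    return call_table[nr](a, b);
}
```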
Does that mean eax only has a special use when you finally want to perform a system call but for the rest of the time you can do with it as you please?
It means that when you call someone else's code, you have to comply with someone else's calling conventions, regardless of what they are. This can mean using other registers (ebx, ecx, etc) for whatever purpose they decided, and can mean using a specific stack layout (e.g. pushing things onto stack in a specific order).
Note that there are various instructions that do expect specific registers to be used in a specific way - mul, div, stosd, movsd, loop, in, out, enter, leave, etc; and there are "rare special cases" for every general purpose register. Despite this; they are still "general purpose registers" because they are not "specific purpose registers" (like eip or flags, which can only be used for one specific purpose and can never be used for anything else).
eax is a general purpose register; you can put whatever you want in it. int 0x80 is the software interrupt Linux uses for system calls: its handler looks at the value in eax and calls the corresponding kernel routine.

Can we modify the int 0x80 routine?

How does Linux 2.6 differ from 2.4?
Can we modify the source kernel?
Can we modify the int 0x80 service routine?
UPDATE:
1. The 0x80 handler is essentially the same between 2.4 and 2.6, although in 2.6 the function called from the handler is also called by the handler for the x86-64 'syscall' instruction.
2. the 0x80 handler can be modified like the rest of the kernel.
3. You won't break anything by modifying it, unless you remove backwards compatibility. E.g., you can add your own trace or backdoor if you feel so inclined. The other post that says you will break your libs and toolchain if you modify the handler is incorrect. If you break the dispatch algorithm, or modify the dispatch table incorrectly, then you will break things.
3a. As I originally posted, the best way to extend the 0x80 service is to extend the system call handler.
As the kernel source says:
What: The kernel syscall interface
Description:
This interface matches much of the POSIX interface and is based
on it and other Unix based interfaces. It will only be added to
over time, and not have things removed from it.
Note that this interface is different for every architecture
that Linux supports. Please see the architecture-specific
documentation for details on the syscall numbers that are to be
mapped to each syscall.
The system call table entries for i386 are in:
arch/i386/kernel/syscall_table.S
Note that the table is a sequence of pointers, so if you want to maintain a degree of forward compatibility with the kernel maintainers, you'd need to pad the table before placement of your pointer.
The syscall vector number is defined in irq_vectors.h
Then traps.c sets the address of the system_call function via set_system_gate, which places the entry into the interrupt descriptor table. The system_call function itself is in entry.S, and calls the requested pointer from the system call table.
There are a few housekeeping details, which you can see reading the code, but direct modification of the 0x80 interrupt handler is accomplished in entry.S inside the system_call function. In a more sane fashion, you can modify the system call table, inserting your own function without modifying the dispatch mechanism.
In fact, having read the 2.6 source, it says directly that int 0x80 and x86-64 syscall use the same code, so far. So you can make portable changes for x86-32 and x86-64.
END Update
The INT 0x80 method invokes the system call table handler. This matches register arguments to a call table, invoking kernel functions based on the contents of the EAX register. You can easily extend the system call table to add custom kernel API functions.
This may even work with the new syscall code on x86-64, as it uses the system call table, too.
If you alter the current system call table in any manner other than to extend it, you will break all dependent libraries and code, including libc, init, etc.
Here's the current Linux system call table: http://asm.sourceforge.net/syscall.html
It's an architectural overhaul. Everything has changed internally. SMP support is complete, the process scheduler is vastly improved, memory management got an overhaul, and many, many other things.
Yes. It's open-source software. If you do not have a copy of the source, you can get it from your vendor or from kernel.org.
Yes, but it's not advisable because it will break libc, it will break your baselayout, and it will break your toolchain if you change the sequence of existing syscalls, and nearly everything you might think you want to do should be done in userspace when at all possible.
