Detouring and GCC inline assembly (Linux)

I'm programming extensions for a game which offers an API for (us) modders. This API offers a wide variety of things, but it has one limitation: it covers the 'engine' only, which means that the modifications (mods) that have been released on top of the engine do not offer any (mod-specific) API of their own. I have created a 'signature scanner' (note: my plugin is loaded as a shared library, compiled with -shared and -fPIC) which finds the functions of interest (which is easy since I'm on Linux). To explain, I'll take a specific case: I have found the address of a function of interest, and its function header is very simple: int * InstallRules(void);. It takes nothing (void) and returns an integer pointer (to an object of interest to me). Now, what I want to do is create a detour (remember that I have the start address of the function) to my own function, which I would like to behave something like this:
void MyInstallRules(void)
{
if(PreHook() == block) // <-- First a 'pre' hook which can block the function
return;
int * val = InstallRules(); // <-- Call original function
PostHook(val); // <-- Call post hook, if interest of original functions return value
}
Now here's the deal: I have no experience whatsoever with function hooking, and I only have a thin knowledge of inline assembly (AT&T only). The pre-made detour packages on the Internet are either Windows-only or use a whole other method (i.e. they preload a DLL to override the original one). So basically: what should I do to get on track? Should I read about calling conventions (cdecl in this case) and learn about inline assembly, or something else? The best would probably be an already functional wrapper class for Linux detouring. In the end, I would like something as simple as this:
void * addressToFunction = SigScanner.FindBySig("Signature_ASfs&43"); // I've already done this part
void * original = PatchFunc(addressToFunction, addressToNewFunction); // This replaces the original function with a hook to mine, but returns a pointer to the original function (relocated, of course)
// I might wait for my hook to be called or whatever
// ....
// And then unpatch the patched function (optional)
UnpatchFunc(addressToFunction, addressToNewFunction);
I understand that I won't be able to get a completely satisfying answer here, but I would more than appreciate some help with the directions to take, because I am on thin ice here... I have read about detouring but there is barely any documentation at all (specifically for linux), and I guess I want to implement what's known as a 'trampoline' but I can't seem to find a way how to acquire this knowledge.
NOTE: I'm also interested in _thiscall, but from what I've read that isn't so hard to call with GNU calling convention(?)

Is this project to develop a "framework" that will allow others to hook different functions in different binaries? Or is it just that you need to hook this specific program that you have?
First, let's suppose you want the second thing: you just have a function in a binary that you want to hook, programmatically and reliably. The main problem with doing this universally is that doing it reliably is a very tough game, but if you are willing to make some compromises, then it's definitely doable. Also, let's assume this is an x86 target.
If you want to hook a function, there are several options how to do it. What Detours does is inline patching. They have a nice overview of how it works in a Research PDF document. The basic idea is that you have a function, e.g.
00E32BCE /$ 8BFF MOV EDI,EDI
00E32BD0 |. 55 PUSH EBP
00E32BD1 |. 8BEC MOV EBP,ESP
00E32BD3 |. 83EC 10 SUB ESP,10
00E32BD6 |. A1 9849E300 MOV EAX,DWORD PTR DS:[E34998]
...
...
Now you replace the beginning of the function with a CALL or JMP to your function and save the original bytes that you overwrote with the patch somewhere:
00E32BCE /$ E9 XXXXXXXX JMP MyHook
00E32BD3 |. 83EC 10 SUB ESP,10
00E32BD6 |. A1 9849E300 MOV EAX,DWORD PTR DS:[E34998]
(Note that I overwrote 5 bytes.) Now your function gets called with the same parameters and same calling convention as the original function. If your function wants to call the original one (but it doesn't have to), you create a "trampoline", that 1) runs the original instructions that were overwritten 2) jmps to the rest of the original function:
Trampoline:
MOV EDI,EDI
PUSH EBP
MOV EBP,ESP
JMP 00E32BD3
And that's it, you just need to construct the trampoline function at runtime by emitting processor instructions. The hard part of this process is to get it working reliably, for any function, for any calling convention and for different OS/platforms. One of the issues is that the 5 bytes that you want to overwrite may end in the middle of an instruction. To detect "ends of instructions" you would basically need to include a disassembler, because there can be any instruction at the beginning of the function. Another is a function that is itself shorter than 5 bytes (a function that always returns 0 can be written as XOR EAX,EAX; RETN which is just 3 bytes).
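The byte-level work described above can be sketched in C as follows. This is only an illustration under stated assumptions: the helper names encode_jmp_rel32 and build_trampoline are made up for this example, addresses are treated as 32-bit values (x86), and the saved bytes are assumed to end on an instruction boundary and to contain no position-dependent instructions such as relative calls or jumps.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the 5-byte x86 "JMP rel32" (opcode 0xE9) that
 * jumps from patch_addr to target. The displacement is relative to the
 * address of the instruction following the jump, i.e. patch_addr + 5. */
static void encode_jmp_rel32(uint8_t out[5], uint32_t patch_addr, uint32_t target)
{
    uint32_t rel = target - (patch_addr + 5u); /* wraps modulo 2^32, like the CPU */

    out[0] = 0xE9;
    out[1] = (uint8_t)(rel & 0xFF);            /* displacement is little-endian */
    out[2] = (uint8_t)((rel >> 8) & 0xFF);
    out[3] = (uint8_t)((rel >> 16) & 0xFF);
    out[4] = (uint8_t)((rel >> 24) & 0xFF);
}

/* Hypothetical helper: fill a trampoline buffer with the bytes that the patch
 * overwrote, followed by a JMP back to the first untouched instruction of the
 * original function. tramp_addr is the address the buffer will run from.
 * Returns the number of bytes written (saved_len + 5). */
static size_t build_trampoline(uint8_t *tramp, const uint8_t *saved, size_t saved_len,
                               uint32_t tramp_addr, uint32_t func_addr)
{
    memcpy(tramp, saved, saved_len);
    encode_jmp_rel32(tramp + saved_len,
                     tramp_addr + (uint32_t)saved_len,
                     func_addr + (uint32_t)saved_len);
    return saved_len + 5;
}
```

With the calc.exe example, the saved bytes would be 8B FF 55 8B EC and the trampoline would end in a jump to 00E32BD3. The buffer holding the trampoline must live in executable memory.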
Compilers can produce a prolog that is friendly to this kind of patching. See that MOV EDI, EDI? If you wonder, "why the hell do they move edi to edi? that doesn't do anything!?" you are absolutely correct: it is a 2-byte no-op that Microsoft's toolchain emits for hot-patchable functions, and together with PUSH EBP and MOV EBP,ESP it makes the first 5 bytes end exactly on an instruction boundary. Note that the disassembly example is not something I made up, it's calc.exe on Windows Vista. Do not count on this on Linux, though: GCC does not emit such a prolog by default, so there you have to check where the instructions end.
The rest of the hook implementation is just technical details, but they can bring you many hours of pain, because that's the hardest part. Also the behaviour you described in your question:
void MyInstallRules(void)
{
if(PreHook() == block) // <-- First a 'pre' hook which can block the function
return;
int * val = InstallRules(); // <-- Call original function
PostHook(val); // <-- Call post hook, if interest of original functions return value
}
seems worse than what I described (and what Detours does), for example you might want to "not call the original" but return some different value. Or call the original function twice. Instead, let your hook handler decide whether and where it will call the original function. Also then you don't need two handler functions for a hook.
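A single hook handler that decides for itself whether to call the original might look roughly like the sketch below. The names original_InstallRules, block_original, last_result and fake_InstallRules are assumptions made for this illustration; in a real hook the pointer would be set to the trampoline address by the patching code, and the stand-in function would not exist.

```c
#include <stddef.h>

typedef int *(*InstallRules_fn)(void);

/* Assumed to be set by the patching code to the trampoline address. */
static InstallRules_fn original_InstallRules = NULL;

static int  block_original = 0;   /* hypothetical policy: skip the original? */
static int *last_result    = NULL;

/* The hook has the same signature as the function it replaces, so callers of
 * InstallRules() get a sensible return value either way. */
static int *MyInstallRules(void)
{
    if (block_original)
        return NULL;                     /* "pre" decision: do not call the original */

    int *val = original_InstallRules();  /* goes through the trampoline */
    last_result = val;                   /* "post" processing of the return value */
    return val;
}

/* Stand-in for the real function, only so the sketch can run on its own. */
static int  rules_storage = 42;
static int *fake_InstallRules(void) { return &rules_storage; }
```

To try it without any patching, point original_InstallRules at fake_InstallRules and call MyInstallRules() directly.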
If you don't have enough knowledge about the technologies you need for this (mostly assembly), or don't know how to do the hooking, I suggest you study what Detours does. Hook your own binary and take a debugger (OllyDbg for example) to see at assembly level what it exactly did, what instructions were placed and where. Also this tutorial might come in handy.
Anyway, if your task is to hook some functions in a specific program, then this is doable and if you have any trouble, just ask here again. Basically you can make a lot of assumptions (like the function prologs or used conventions) that will make your task much easier.
If you want to create a reliable hooking framework, then that is a completely different story and you should first begin by creating simple hooks for some simple apps.
Also note that this technique is not OS specific, it's the same on all x86 platforms, it will work on both Linux and Windows. What is OS specific is that you will probably have to change memory protection of the code ("unlock" it, so you can write to it), which is done with mprotect on Linux and with VirtualProtect on Windows. The calling conventions are also different, but that's something you can solve by using the correct syntax in your compiler.
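On Linux, the "unlock" step could be sketched like this. The helper names protect_range and patch_bytes are invented for this example. mprotect works on whole pages, so the start address is rounded down to a page boundary first. Some hardened systems refuse mappings that are writable and executable at the same time, which this sketch does not handle.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: apply protection flags to every page that overlaps
 * [addr, addr + len). Returns 0 on success, -1 on failure (errno is set by
 * sysconf or mprotect). */
static int protect_range(void *addr, size_t len, int prot)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    uintptr_t start = (uintptr_t)addr & ~((uintptr_t)page - 1); /* round down */
    uintptr_t end   = (uintptr_t)addr + len;

    return mprotect((void *)start, (size_t)(end - start), prot);
}

/* Hypothetical helper: overwrite len bytes of code at dst with src, then put
 * the pages back to read + execute. */
static int patch_bytes(void *dst, const void *src, size_t len)
{
    if (protect_range(dst, len, PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
        return -1;
    memcpy(dst, src, len);
    return protect_range(dst, len, PROT_READ | PROT_EXEC);
}
```

The original bytes should be copied out before calling patch_bytes, so that they are available for the trampoline and for unpatching later.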
Another trouble is "DLL injection" (on Linux it will probably be called "shared library injection" but the term DLL injection is widely known). You need to put your code (that performs the hook) into the program. My suggestion is that if it's possible, just use the LD_PRELOAD environment variable, in which you can specify a library that will be loaded into the program just before it's run. This has been described on SO many times, like here: What is the LD_PRELOAD trick?. If you must do this at runtime, I'm afraid you will need to resort to gdb or ptrace, which in my opinion is quite hard (at least the ptrace thing) to do. However you can read for example this article on codeproject or this ptrace tutorial.
I also found some nice resources:
SourceHook project, but it seems it's only for virtual functions in C++, but you can always take a look at its source code
this forum thread giving a simple 10-line function to do this "inline hook" that I described
this a little more complex code in a forum
here on SO is some example
Also one other point: This "inline patching" is not the only way to do this. There are even simpler ways, e.g. if the function is virtual or if it's a library exported function, you can skip all the assembly/disassembly/JMP thing and simply replace the pointer to that function (either in the table of virtual functions or in the exported symbols table).
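The pointer-replacement approach can be illustrated with a plain table of function pointers. This is a simplified model, not a real C++ vtable or ELF symbol table: swap_slot, original_impl, hooked_impl and saved_impl are names made up for the sketch, and in a real process the table may sit in read-only memory that has to be made writable first.

```c
#include <stddef.h>

typedef int (*slot_fn)(int);

/* Replace one entry of a function-pointer table and hand back the previous
 * entry, so the hook can still call the original. */
static slot_fn swap_slot(slot_fn *table, size_t index, slot_fn replacement)
{
    slot_fn previous = table[index];
    table[index] = replacement;
    return previous;
}

/* Illustrative "original" and "hook" implementations. */
static int original_impl(int x) { return x + 1; }

static slot_fn saved_impl = NULL;  /* filled in when the hook is installed */

static int hooked_impl(int x)
{
    return saved_impl(x) * 2;      /* call the original, then change the result */
}
```

Installing the hook is then a single assignment, saved_impl = swap_slot(table, 0, hooked_impl);, and removing it is the same call with saved_impl as the replacement. No instructions are rewritten and no trampoline is needed.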

Related

Clarion 6.3 DLL, *CSTRING parameter exported function - adds an invisible parameter?

I need to negotiate a function call, from my Delphi app, into provided DLL made in Clarion 6.3.
I need to pass one or two string parameters (either one function with two params or two single-param functions).
We quickly settled on using 1-byte 0-ended strings (char* in C terms, CSTRING in Clarion terms, PAnsiChar in Delphi terms), and that is where things got a bit unpredictable and hard to understand.
The working solution we got was passing untyped pointers disguised as 32-bit integers, which Clarion-made DLL then uses to traverse memory with something Clarion programmer called "pick" or maybe "peek". There are also forum articles on interop between Clarion and Visual Basic which address passing strings from VB into Clarion and glancing upon which from behind my shoulder the Clarion developer said something like "i don't need copy of it, i already know it, it is typical".
This however puts more burden on us long-term, as low-level untyped code is much "richer" in boilerplate and error-prone. Typed code would feel like a better solution.
What I seek here is less of "That is the pattern to copy-paste and make things work without thinking" - we already have it - and more of an understanding of what is going on under the hood, how I can rely on it, and what I should expect from Clarion DLLs, to avoid eventually getting stuck with a "works by chance" solution.
As I was glancing into the Clarion 6.3 help from behind his shoulder, the help was not helpful on low-level details. It was all about calling DLLs from Clarion, but not about being called. I also don't have Clarion on my machine, nor do I want to, ahem, borrow it. And as I've been told, sources of the Clarion 6.3 runtime are not available to developers either.
Articles like interop between Clarion and VB or between Clarion and C# are not helpful, because they fuse idiosyncrasies of both languages and give yet less information about "bare metal" level.
Google Books pointed to "Clarion Tips & Techniques - David Harms" - and it seems to have interesting insights for seasoned Clarion users, but I am a Clarion zero. At least I was not able to figure out low-level interop-enabling details from it.
Is there maybe a way to make Clarion 6.3 save 'listing files' for the DLLs it makes, a standard *.H header file maybe?
So, to repeat: what works as expected is a function that takes pointers on the Delphi side, procedure ...(const param1, param2: PAnsiChar); stdcall;, which should translate to C as stdcall void ...(char* p1, char* p2) and which allegedly looks in Clarion something like (LONG, LONG), LONG, pascal, RAW.
This function takes two 32-bit parameters from stack in reverse order, uses them, and exits, passing return value (actually, unused garbage) in EAX register and clearing parameters from stack. Almost exactly stdcall, except that it seems to preserve EBX register for some obscure reason.
Clarion function entry:
04E5D38C 83EC04 sub esp,$04 ' allocate local vars
04E5D38F 53 push ebx ' ????????
04E5D390 8B44240C mov eax,[esp+$0c]
04E5D394 BBB4DDEB04 mov ebx,$04ebddb4
04E5D399 B907010000 mov ecx,$00000107
04E5D39E E889A1FBFF call $04e1752c ' clear off local vars before use
And its exit
00B8D500 8B442406 mov eax,[esp+$06] ' pick return value
00B8D504 5B pop ebx ' ????
00B8D505 83C41C add esp,$1c ' remove local vars
00B8D508 C20800 ret $0008 ' remove two 32-bits params from stack
Except for the manipulation with EBX, which I cannot explain, and the garbage return value, it works as expected. But it requires untyped low-level operations in the Clarion sources.
Now the function that allegedly only takes one string parameter: on the Delphi side it is procedure ...(const param1: PAnsiChar); stdcall;, which should translate to C as stdcall void ...(char* p1) and which allegedly looks in Clarion something like (*CSTRING), LONG, pascal, RAW.
Clarion function entry:
00B8D47C 83EC1C sub esp,$1c ' allocate local vars
00B8D47F 53 push ebx ' ????????
00B8D480 8D44240A lea eax,[esp+$0a]
00B8D484 BB16000000 mov ebx,$00000016
00B8D489 B990FEBD00 mov ecx,$00bdfe90
00B8D48E BA15000000 mov edx,$00000015
00B8D493 E82002FBFF call $00b3d6b8 ' clear off local vars before use
And its exit
04E5D492 8B442404 mov eax,[esp+$04] ' pick return value
04E5D496 5B pop ebx ' ????
04E5D497 83C404 add esp,$04 ' remove local vars
04E5D49A C20800 ret $0008 ' remove TWO 32-bits params from stack
What strikes me here is that somehow TWO parameters are expected by the function, and only the second one is used (I did not see any reference to the first parameter in the x86 asm code). The function seems to work fine if it is called as procedure ...(const garbage: integer; const param1: PAnsiChar); stdcall; which should translate to C as stdcall void ...(int garbage, char* p1).
This "invisible" parameter would look much like a Self/This pointer in method functions of object-oriented languages, but the Clarion programmer told me with certainty that there were no objects involved. What is more, his 'double-int' function does not seem to expect an invisible parameter either.
The aforementioned 'Tips' book describes the &CSTRING and &STRING Clarion types as actually being two parameters under the hood, a pointer to the buffer and the buffer length. However, it gives no information on how specifically they are passed on the stack. But I was told Clarion refused to make a DLL with an exported &CSTRING-parametrized function.
I could suppose the invisible parameter is where Clarion wants to store the function's return value (if there had been any assignment to it in the Clarion sources), crossing the stdcall/PASCAL convention, but the assembler epilogue code shows clear use of the EAX register for that, and again the 'double-LONG' function does not use it.
And so, while I made "works on my machine" quality code that successfully calls that Clarion function by deliberately inserting a garbage parameter, I feel rather uneasy, because I cannot understand what Clarion is doing there and why, and hence what it could suddenly start doing in the future after any seemingly unrelated changes.
What is that invisible parameter? Why can it happen there? What to expect from it?
If you are consuming a DLL from Clarion you can prototype with RAW - but procedures in a Clarion DLL cannot do this.
So in the Clarion DLL they can prototype as
Whatever Procedure(*Cstring parm1, *Cstring parm2),C,name('whatever')
And, as you note, from your side you should see this as 4 parameters, length, pointer, length, pointer. (knowing explicit max lengths is not a bad thing from a safety point of view anyway.)
the alternative is
Whatever Procedure(Long parm1, Long parm2),C,name('whatever')
Then from your side it's just 2 addresses.
But there's a bit more code on his side turning those incoming addresses into memory pointers. (yes, he can use PEEK and POKE but that's a bit of overkill)
(From memory he could just declare local variables as
parm1String &cstring,over(parm1)
parm2String &cstring,over(parm2)
but it's been decades since I did this, so I'm not 100% that syntax is legal.)

how can I call a system call in freebsd?

I created a syscall the same as /usr/share/examples/kld/syscall/module/syscall.c with a small change to the message.
I used kldload and the module loaded. Now I want to call the syscall.
What is this syscall's number, so that I can call it?
Or what is the way to call this syscall?
I suggest you take a look at Designing BSD rootkits, that's how I learned kernel programming on FreeBSD, there's even a section that talks all about making your own syscalls.
Well, if you check /usr/share/examples/kld/syscall directory you will see it contains a test program..... but hey, let's assume the program is not there.
Let's take a look at part of the module itself:
/*
* The offset in sysent where the syscall is allocated.
*/
static int offset = NO_SYSCALL;
[..]
case MOD_LOAD :
printf("syscall loaded at %d\n", offset);
break;
The module prints syscall number on load, so the job now is to learn how to call it... a 'freebsd call syscall' search on google...
Reveals: http://www.freebsd.cz/doc/en/books/developers-handbook/x86-system-calls.html (although arguably not something to use on amd64) and.. https://www.freebsd.org/cgi/man.cgi?query=syscall&sektion=2 - a manual page for a function which allows you to call arbitrary syscalls.
I strongly suggest you do some digging on your own. If you don't, there is absolutely no way you will be able to write any kernel code.
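For reference, calling a system call by number through syscall(2) can be sketched as below. The wrapper name call_by_number is made up for this example, and it assumes a system call that takes no arguments, like the one in the example module. The number to pass is whatever the module printed in its "syscall loaded at N" message.

```c
#define _DEFAULT_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: invoke the argument-less system call with the given
 * number. syscall(2) returns -1 and sets errno on failure; the result is
 * widened to long here because the return type differs between systems. */
static long call_by_number(int number)
{
    return syscall(number);
}
```

A quick sanity check that works without the module is call_by_number(SYS_getpid), which should return the same value as getpid().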

Assembly: Why does jumping to a label that returns via ret cause a segmentation fault?

Linux Assembly Tutorial states:
there is one very important thing to remember: If you are planning to return from a procedure (with the RET instruction), don't jump to it! As in "never!" Doing that will cause a segmentation fault on Linux (which is OK – all your program does is terminate), but in DOS it may blow up in your face with various degrees of terribleness.
But I cannot understand why it causes a segmentation fault. It sounds just like returning from a function.
I have a situation where I need to implement the logic "If X happens, call procedure A. Otherwise, call procedure B." Is there any other way than jumping around like a kangaroo weaving spaghetti code?
Because CALL pushes the current instruction address onto the stack, and RET pulls it off in order to get back to the call-site. JMP (and related instructions) don't push anything onto the stack.
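The "if X happens, call procedure A, otherwise call procedure B" logic from the question does not need a jump into either procedure. As a sketch in C (the names proc_a, proc_b and dispatch are invented here), the branch only selects which procedure gets called, and the call itself still pairs a CALL with the callee's RET unless the compiler inlines or otherwise optimizes it:

```c
typedef void (*proc_fn)(int *);

/* Two illustrative procedures; each one returns normally to its caller. */
static void proc_a(int *out) { *out = 1; }
static void proc_b(int *out) { *out = 2; }

/* The condition only picks the target. Control reaches the procedure through
 * a call, so there is a return address for it to come back to. */
static int dispatch(int x_happened)
{
    int result = 0;
    proc_fn chosen = x_happened ? proc_a : proc_b;

    chosen(&result);
    return result;
}
```

In hand-written assembly the equivalent shape is a conditional jump around two CALL instructions, with both paths meeting again after the calls.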
I think that this advice may have to do with the pipeline, but I'm not sure.
I believe that the question you are asking is:
... subroutine entrypoint ...
... various instructions in a routine ...
jmp label
... more instructions in a routine ...
label:
ret
What's the problem, if any, with this? First, I'm not sure that this is a problem at all. But if it is, it's the pipeline. On some processors (those with branch delay slots, such as MIPS or SPARC, but not x86), one or more instructions after the jmp will be executed before control moves to the label.
Mostly, I fear that you've misunderstood what you've read, or I've misunderstood what you've written. jmp-ing from one point in your subroutine to the ret instruction should be fine. jmp-ing instead of executing ret is, as other people pointed out, a dumb idea.

Good references for the syscalls

I need some reference but a good one, possibly with some nice examples. I need it because I am starting to write code in assembly using the NASM assembler. I have this reference:
http://bluemaster.iu.hio.no/edu/dark/lin-asm/syscalls.html
which is quite nice and useful, but it's got a lot of limitations because it doesn't explain the fields in the other registers. For example, if I am using the write syscall, I know I should put 1 in the EAX register, and the ECX is probably a pointer to the string, but what about EBX and EDX? I would like that to be explained too, e.g. that EBX determines the input (0 for stdin, 1 for something else etc.) and EDX is the length of the string to be entered, and so on. I hope you understand what I want; I couldn't find any such materials, so that's why I am writing here.
Thanks in advance.
The standard programming language in Linux is C. Because of that, the best descriptions of the system calls will show them as C functions to be called. Given their description as a C function and a knowledge of how to map them to the actual system call in assembly, you will be able to use any system call you want easily.
First, you need a reference for all the system calls as they would appear to a C programmer. The best one I know of is the Linux man-pages project, in particular the system calls section.
Let's take the write system call as an example, since it is the one in your question. As you can see, the first parameter is a signed integer, which is usually a file descriptor returned by the open syscall. These file descriptors could also have been inherited from your parent process, as usually happens for the first three file descriptors (0=stdin, 1=stdout, 2=stderr). The second parameter is a pointer to a buffer, and the third parameter is the buffer's size (as an unsigned integer). Finally, the function returns a signed integer, which is the number of bytes written, or a negative number for an error.
Now, how to map this to the actual system call? There are many ways to do a system call on 32-bit x86 (which is probably what you are using, based on your register names); be careful that it is completely different on 64-bit x86 (be sure you are assembling in 32-bit mode and linking a 32-bit executable; see this question for an example of how things can go wrong otherwise). The oldest, simplest and slowest of them in the 32-bit x86 is the int $0x80 method.
For the int $0x80 method, you put the system call number in %eax, and the parameters in %ebx, %ecx, %edx, %esi, %edi, and %ebp, in that order. Then you call int $0x80, and the return value from the system call is on %eax. Note that this return value is different from what the reference says; the reference shows how the C library will return it, but the system call returns -errno on error (for instance -EINVAL). The C library will move this to errno and return -1 in that case. See syscalls(2) and intro(2) for more detail.
So, in the write example, you would put the write system call number in %eax, the first parameter (file descriptor number) in %ebx, the second parameter (pointer to the string) in %ecx, and the third parameter (length of the string) in %edx. The system call will return in %eax either the number of bytes written, or the error number negated (if the return value is between -1 and -4095, it is a negated error number).
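The difference between the raw return value and what the C library reports can be sketched in C as follows. The function name raw_to_libc is made up for this illustration; it only mirrors the convention described above and is not the C library's actual code.

```c
#include <errno.h>

/* Hypothetical helper: convert a raw kernel return value (as left in %eax)
 * into the C library convention. Values from -1 to -4095 are negated error
 * numbers; anything else is a normal result such as a byte count. */
static long raw_to_libc(long raw)
{
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;   /* e.g. -EINVAL becomes errno == EINVAL */
        return -1;
    }
    return raw;
}
```

Assembly code that calls int $0x80 directly has to do the range check itself, because nothing sets errno for it.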
Finally, how do you find the system call numbers? They can be found at /usr/include/linux/unistd.h. On my system, this just includes /usr/include/asm/unistd.h, which finally includes /usr/include/asm/unistd_32.h, so the numbers are there (for write, you can see __NR_write is 4). The same goes for the error numbers, which come from /usr/include/linux/errno.h (on my system, after chasing the inclusion chain I find the first ones at /usr/include/asm-generic/errno-base.h and the rest at /usr/include/asm-generic/errno.h). For the system calls which use other constants or structures, their documentation tells which headers you should look at to find the corresponding definitions.
Now, as I said, int $0x80 is the oldest and slowest method. Newer processors have special system call instructions which are faster. To use them, the kernel makes available a virtual dynamic shared object (the vDSO; it is like a shared library, but in memory only) with a function you can call to do a system call using the best method available for your hardware. It also makes available special functions to get the current time without even having to do a system call, and a few other things. Of course, it is a bit harder to use if you are not using a dynamic linker.
There is also another older method, the vsyscall, which is similar to the vDSO but uses a single page at a fixed address. This method is deprecated, will result in warnings on the system log if you are using recent kernels, can be disabled on boot on even more recent kernels, and might be removed in the future. Do not use it.
If you download that web page (like it suggests in the second paragraph) and download the kernel sources, you can click the links in the "Source" column, and go directly to the source file that implements the system calls. You can read their C signatures to see what each parameter is used for.
If you're just looking for a quick reference, each of those system calls has a C library interface with the same name minus the sys_. So, for example, you could check out man 2 lseek to get the information about the parameters for sys_lseek:
off_t lseek(int fd, off_t offset, int whence);
where, as you can see, the parameters match the ones from your HTML table:
%ebx            %ecx     %edx
unsigned int    off_t    unsigned int

Simple polymorphic engine

I have to program a simple polymorphic engine. I use linux (32-bit) and i can code in assembly and c. I don't know how to start.
Can you give me a schema for constructing such an engine? My idea is to make a program that:
read the code section of a file
encrypts it in a buffer,
make space at the beginning (is it possible?) to add the decrypt routine
write the new buffer inside the code section of the program.
Is that right? Does it reflect the operation of such an engine?
The basic schema is quite different from what you've described. Usually only the virus body is encrypted, and not the whole code section. Consider a simple virus that either extends the code section or creates a new one for its body. Now, to make it polymorphic, you have to add encryption and make the decryptor code non-constant, e.g.:
1) insert nops randomly (nop, add reg, 0, push reg; pop reg, etc)
2) change the program flow with ( jmp next, clc; jc next, etc)
3) use instructions with the same arithmetic effect (add eax, 3 -> add eax, 9; sub eax, 6)
Polymorphic means that it could have a fixed number of encodings, so the simplest way to create one is to break the decryptor code into several blocks, and provide several encodings with the same length for each.
EDIT: Yes, it's a part of the virus body. In order to use it you put all these "bricks" in the virus body, and when another file is infected, you create a random version of the decryptor for it.

Resources