Good references for the syscalls - linux

I need some reference but a good one, possibly with some nice examples. I need it because I am starting to write code in assembly using the NASM assembler. I have this reference:
http://bluemaster.iu.hio.no/edu/dark/lin-asm/syscalls.html
which is quite nice and useful, but it's got a lot of limitations because it doesn't explain the fields in the other registers. For example, if I am using the write syscall, I know I should put 1 in the EAX register, and the ECX is probably a pointer to the string, but what about EBX and EDX? I would like that to be explained too, that EBX determines the input (0 for stdin, 1 for something else etc.) and EDX is the length of the string to be entered, etc. etc. I hope you understood me what I want, I couldn't find any such materials so that's why I am writing here.
Thanks in advance.

The standard programming language in Linux is C. Because of that, the best descriptions of the system calls will show them as C functions to be called. Given their description as a C function and a knowledge of how to map them to the actual system call in assembly, you will be able to use any system call you want easily.
First, you need a reference for all the system calls as they would appear to a C programmer. The best one I know of is the Linux man-pages project, in particular the system calls section.
Let's take the write system call as an example, since it is the one in your question. As you can see, the first parameter is a signed integer, which is usually a file descriptor returned by the open syscall. These file descriptors could also have been inherited from your parent process, as usually happens for the first three file descriptors (0=stdin, 1=stdout, 2=stderr). The second parameter is a pointer to a buffer, and the third parameter is the buffer's size (as an unsigned integer). Finally, the function returns a signed integer, which is the number of bytes written, or a negative number for an error.
Now, how to map this to the actual system call? There are many ways to do a system call on 32-bit x86 (which is probably what you are using, based on your register names); be careful that it is completely different on 64-bit x86 (be sure you are assembling in 32-bit mode and linking a 32-bit executable; see this question for an example of how things can go wrong otherwise). The oldest, simplest and slowest of them in the 32-bit x86 is the int $0x80 method.
For the int $0x80 method, you put the system call number in %eax, and the parameters in %ebx, %ecx, %edx, %esi, %edi, and %ebp, in that order. Then you call int $0x80, and the return value from the system call is on %eax. Note that this return value is different from what the reference says; the reference shows how the C library will return it, but the system call returns -errno on error (for instance -EINVAL). The C library will move this to errno and return -1 in that case. See syscalls(2) and intro(2) for more detail.
So, in the write example, you would put the write system call number in %eax, the first parameter (file descriptor number) in %ebx, the second parameter (pointer to the string) in %ecx, and the third parameter (length of the string) in %edx. The system call will return in %eax either the number of bytes written, or the error number negated (if the return value is between -1 and -4095, it is a negated error number).
Finally, how do you find the system call numbers? They can be found at /usr/include/linux/unistd.h. On my system, this just includes /usr/include/asm/unistd.h, which finally includes /usr/include/asm/unistd_32.h, so the numbers are there (for write, you can see __NR_write is 4). The same goes for the error numbers, which come from /usr/include/linux/errno.h (on my system, after chasing the inclusion chain I find the first ones at /usr/include/asm-generic/errno-base.h and the rest at /usr/include/asm-generic/errno.h). For the system calls which use other constants or structures, their documentation tells which headers you should look at to find the corresponding definitions.
Now, as I said, int $0x80 is the oldest and slowest method. Newer processors have special system call instructions which are faster. To use them, the kernel makes available a virtual dynamic shared object (the vDSO; it is like a shared library, but in memory only) with a function you can call to do a system call using the best method available for your hardware. It also makes available special functions to get the current time without even having to do a system call, and a few other things. Of course, it is a bit harder to use if you are not using a dynamic linker.
There is also another older method, the vsyscall, which is similar to the vDSO but uses a single page at a fixed address. This method is deprecated, will result in warnings on the system log if you are using recent kernels, can be disabled on boot on even more recent kernels, and might be removed in the future. Do not use it.

If you download that web page (like it suggests in the second paragraph) and download the kernel sources, you can click the links in the "Source" column, and go directly to the source file that implements the system calls. You can read their C signatures to see what each parameter is used for.
If you're just looking for a quick reference, each of those system calls has a C library interface with the same name minus the sys_. So, for example, you could check out man 2 lseek to get the information about the parameters forsys_lseek:
off_t lseek(int fd, off_t offset, int whence);
where, as you can see, the parameters match the ones from your HTML table:
%ebx %ecx %edx
unsigned int off_t unsigned int

Related

Global Offset Table: "Pointers to Pointers"? Is this handled by the loader?

This question is about Linux (Ubuntu) executables.
I'll detail things as I understand them to make it clearer if anything's off (so please correct me where applicable):
The GOT acts an extra level of indirection to enable accessing data from a text section which needs to be position-independent, for instance because the text section might be readonly and the actual addresses of the data may be unknown at (static) linking time.
The GOT then holds addresses to the actual data's location, which is known at loading time, and so the dynamic linker (which is invoked by the loader) is able to modify the appropriate GOT entries and make them point to the actual data.
The main thing that confuses me – not the only one at the moment, mind you :-) – is that this means the addresses in the text section now point to a value "of a different type":
If there was no GOT, I'd have expected this address (for instance in a RIP-relative addressing mode) to point to the actual value I'm after. With a GOT, though, I expect it to point to the appropriate GOT entry, which in turn holds the address to the value I'm after. In this case, there's an extra "dereferencing" required here.
Am I somehow misunderstanding this? If I use RIP-relative addressing, shouldn't the computed address (RIP+offset) be the actual address used in the instruction? So (in AT&T syntax):
mov $fun_data(%rip), %rax
To my understanding, without GOT, this should be "rax = *(rip + (fun_data - rip))", or in short: rax = *fun_data.
With GOT, however, I expect this to be equivalent to rax = **fun_data, since *fun_data is just the GOT entry to the real fun_data.
Am I wrong about this, or is it just that the loader somehow knows to access the real data if the pointer is into the GOT? (In other words: that in a PIE, I suppose, some pointers effectively become pointers-to-pointers?)
Am I wrong about this
No.
or is it just that the loader somehow knows to access the real data if the pointer is into the GOT?
The compiler knows that double dereference is required.
Compile this source with and without -fPIC and observe for yourself:
extern int dddd;
int fn() { return dddd; }
Without -fPIC, you get (expected):
movl dddd(%rip), %eax
With -fPIC you get "double dereference":
movq dddd#GOTPCREL(%rip), %rax # move pointer to dddd into RAX
movl (%rax), %eax # dereference it

save the number of bytes read from file [duplicate]

When I try to research about return values of system calls of the kernel, I find tables that describe them and what do I need to put in the different registers to let them work. However, I don't find any documentation where it states what is that return value I get from the system call. I'm just finding in different places that what I receive will be in the EAX register.
TutorialsPoint:
The result is usually returned in the EAX register.
Assembly Language Step-By-Step: Programming with Linux book by Jeff Duntemann states many times in his programs:
Look at sys_read's return value in EAX
Copy sys_read return value for safe keeping
Any of the websites I have don't explain about this return value. Is there any Internet source? Or can someone explain me about this values?
See also this excellent LWN article about system calls which assumes C knowledge.
Also: The Definitive Guide to Linux System Calls (on x86), and related: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?
C is the language of Unix systems programming, so all the documentation is in terms of C. And then there's documentation for the minor differences between the C interface and the asm on any given platform, usually in the Notes section of man pages.
sys_read means the raw system call (as opposed to the libc wrapper function). The kernel implementation of the read system call is a kernel function called sys_read(). You can't call it with a call instruction, because it's in the kernel, not a library. But people still talk about "calling sys_read" to distinguish it from the libc function call. However, it's ok to say read even when you mean the raw system call (especially when the libc wrapper doesn't do anything special), like I do in this answer.
Also note that syscall.h defines constants like SYS_read with the actual system call number, or asm/unistd.h for the Linux __NR_read names for the same constants. (The value you put in EAX before an int 0x80 or syscall instruction).
Linux system call return values (in EAX/RAX on x86) are either "normal" success, or a -errno code for error. e.g. -EFAULT if you pass an invalid pointer. This behaviour is documented in the syscalls(2) man page.
-1 to -4095 means error, anything else means success. See AOSP non-obvious syscall() implementation for more details on this -4095UL .. -1UL range, which is portable across architectures on Linux, and applies to every system call. (In the future, a different architecture could use a different value for MAX_ERRNO, but the value for existing arches like x86-64 is guaranteed to stay the same as part of Linus's don't-break-userspace policy of keeping kernel ABIs stable.)
For example, glibc's generic syscall(2) wrapper function uses this sequence: cmp rax, -4095 / jae SYSCALL_ERROR_LABEL, which is guaranteed to be future-proof for all Linux system calls.
You can use that wrapper function to make any system call, like syscall( __NR_mmap, ... ). (Or use an inline-asm wrapper header like https://github.com/linux-on-ibm-z/linux-syscall-support/blob/master/linux_syscall_support.h that has safe inline-asm for multiple ISAs, avoiding problems like missing "memory" clobbers that some other inline-asm wrappers have.)
Interesting cases include getpriority where the kernel ABI maps the -20..19 return-value range to 1..40, and libc decodes it. More details in a related answer about decoding syscall error return values.
For mmap, if you wanted you could also detect error just by checking that the return value isn't page-aligned (e.g. any non-zero bits in the low 11, for a 4k page size), if that would be more efficient than checking p > -4096ULL.
To find the actual numeric values of constants for a specific platform, you need to find the C header file where they're #defined. See my answer on a question about that for details. e.g. in asm-generic/errno-base.h / asm-generic/errno.h.
The meanings of return values for each sys call are documented in the section 2 man pages, like read(2). (sys_read is the raw system call that the glibc read() function is a very thin wrapper for.) Most man pages have a whole section for the return value. e.g.
RETURN VALUE
On success, the number of bytes read is returned (zero indicates
end of file), and the file position is advanced by this number. It
is not an error if this number is smaller than the number of bytes
requested; this may happen for example because fewer bytes are
actually available right now (maybe because we were close to end-of-
file, or because we are reading from a pipe, or from a terminal), or
because read() was interrupted by a signal. See also NOTES.
On error, -1 is returned, and errno is set appropriately. In this
case, it is left unspecified whether the file position (if any)
changes.
Note that the last paragraph describes how the glibc wrapper decodes the value and sets errno to -EAX if the raw system call's return value is negative, so errno=EFAULT and return -1 if the raw system call returned -EFAULT.
And there's a whole section listing all the possible error codes that read() is allowed to return, and what they mean specifically for read(). (POSIX standardizes most of this behaviour.)

The difference between `entry_SYSCALL64_slow_path` and `entry_SYSCALL64_fast_path`

We know that system call will call the function entry_SYSCALL_64 in entry_64.S. When I read the source code, I find there are two different types of call after the prepartion of registers, one is entry_SYSCALL64_slow_path and the other is entry_SYSCALL64_fast_path. Can you tell the difference between the two functions?
Upon entry in entry_SYSCALL_64 Linux will:
Swap gs to get per-cpu parameters.
Set the stack from the parameters of above.
Disable the IRQs.
Create a partial pt_regs structure on the stack. This saves the caller context.
If the current task has _TIF_WORK_SYSCALL_ENTRY or _TIF_ALLWORK_MASK set, it goes to the slow path.
Enter the fast path otherwise.
_TIF_WORK_SYSCALL_ENTRY is defined here with a comment stating:
/*
* work to do in syscall_trace_enter(). Also includes TIF_NOHZ for
* enter_from_user_mode()
*/
_TIF_ALLWORK_MASK does not seems to be defined for x86, a definition for MIPS is here with a comment stating:
/* work to do on any return to u-space */
Fast path
Linux will:
Enable the IRQs.
Check if the syscall number is out of range (note the pt_regs struct was already created with ENOSYS for the value of rax).
Dispatch to the system call with an indirect jump.
Save the return value (rax) of the syscall into rax in the pt_regs on the stack.
Check again if _TIF_ALLWORK_MASK is set for the current task, if it is it will jump to the slow return path.
Restore the caller context and issue a sysret.
Slow return path
Save the registers not saved before in pt_regs (rbx, rbp, r12-r15).
Call syscall_return_slowpath, defined here.
Note that point 2 will end up calling trace_sys_exit.
Slow path
Save the registers not saved before in pt_regs (see above)
Call do_syscall_64, defined here.
Point 2 will call syscall_trace_enter.
So the slow vs fast path has to do with ptrace. I haven't dug into the code but I suppose the whole machinery is skipped if ptrace is not needed for the caller.
This is indeed an important optimization.

Forcing MSVC to generate FIST instructions with the /QIfist option

I'm using the /QIfist compiler switch regularly, which causes the compiler to generate FISTP instructions to round floating point values to integers, instead of calling the _ftol helper function.
How can I make it use FIST(P) DWORD, instead of QWORD?
FIST QWORD requires the CPU to store the result on stack, then read stack into register and finally store to destination memory, while FIST DWORD just stores directly into destination memory.
FIST QWORD requires the CPU to store the result on stack, then read stack into register and finally store to destination memory, while FIST DWORD just stores directly into destination memory.
I don't understand what you are trying to say here.
The FIST and FISTP instructions differ from each other in exactly two ways:
FISTP pops the top value off of the floating point stack, while FIST does not. This is the obvious difference, and is reflected in the opcode naming: FISTP has that P suffix, which means "pop", just like ADDP, etc.
FISTP has an additional encoding that works with 64-bit (QWORD) operands. That means you can use FISTP to convert a floating point value to a 64-bit integer. FIST, on the other hand, maxes out at 32-bit (DWORD) operands.
(I don't think there's a technical reason for this. I certainly can't imagine it is related to the popping behavior. I assume that when the Intel engineers added support for 64-bit operands some time later, they figured there was no reason for a non-popping version. They were probably running out of opcode encodings.)
There are lots of online references for the x86 instruction set. For example, this site is the top hit for most Google searches. Or you can look in Intel's manuals (FIST/FISTP are on p. 365).
Where the two instructions read the value from, and where they store it to, is exactly the same. Both read the value from the top of the floating point stack, and both store the result to memory.
There would be absolutely no advantage to the compiler using FIST instead of FISTP. Remember that you always have to pop all values off of the floating point stack when exiting from a function, so if FIST is used, you'd have to follow it by an additional FSTP instruction. That might not be any slower, but it would needlessly inflate the code.
Besides, there's another reason that the compiler prefers FISTP: the support for 64-bit operands. It allows the code generator to be identical, regardless of what size integer you're rounding to.
The only time you might prefer to use FIST is if you're hand-writing assembly code and want to re-use the floating point value on the stack after rounding it. The compiler doesn't need to do that.
So anyway, all of that to say that the answer to your question is no. The compiler can't be made to generate FIST instructions automatically. If you're still insistent, you can write inline assembly that uses whatever instructions you want:
int32 RoundToNearestEven(float value)
{
int32 result;
__asm
{
fld DWORD PTR value
fist DWORD PTR result
// do something with the value on the floating point stack...
//
// ... but be sure to pop it off before returning
fstp st(0)
}
return result;
}

Detouring and GCC inline assembly (Linux)

I'm programming extensions for a game which offers an API for (us) modders. This API offers a wide variety of things, but it has one limitation. The API is for the 'engine' only, which means that all modifications (mods) that has been released based on the engine, does not offer/have any sort of (mod specific) API. I have created a 'signature scanner' (note: my plugin is loaded as a shared library, compiled with -share & -fPIC) which finds the functions of interest (which is easy since I'm on linux). So to explain, I'll take a specific case: I have found the address to a function of interest, its function header is very simpleint * InstallRules(void);. It takes a nothing (void) and returns an integer pointer (to an object of my interest). Now, what I want to do, is to create a detour (and remember that I have the start address of the function), to my own function, which I would like to behave something like this:
void MyInstallRules(void)
{
if(PreHook() == block) // <-- First a 'pre' hook which can block the function
return;
int * val = InstallRules(); // <-- Call original function
PostHook(val); // <-- Call post hook, if interest of original functions return value
}
Now here's the deal; I have no experience what so ever about function hooking, and I only have a thin knowledge of inline assembly (AT&T only). The pre-made detour packages on the Internet is only for windows or is using a whole other method (i.e preloads a dll to override the orignal one). So basically; what should I do to get on track? Should I read about call conventions (cdecl in this case) and learn about inline assembly, or what to do? The best would probably be a already functional wrapper class for linux detouring. In the end, I would like something as simple as this:
void * addressToFunction = SigScanner.FindBySig("Signature_ASfs&43"); // I've already done this part
void * original = PatchFunc(addressToFunction, addressToNewFunction); // This replaces the original function with a hook to mine, but returns a pointer to the original function (relocated ofcourse)
// I might wait for my hook to be called or whatever
// ....
// And then unpatch the patched function (optional)
UnpatchFunc(addressToFunction, addressToNewFunction);
I understand that I won't be able to get a completely satisfying answer here, but I would more than appreciate some help with the directions to take, because I am on thin ice here... I have read about detouring but there is barely any documentation at all (specifically for linux), and I guess I want to implement what's known as a 'trampoline' but I can't seem to find a way how to acquire this knowledge.
NOTE: I'm also interested in _thiscall, but from what I've read that isn't so hard to call with GNU calling convention(?)
Is this project to develop a "framework" that will allow others to hook different functions in different binaries? Or is it just that you need to hook this specific program that you have?
First, let's suppose you want the second thing, you just have a function in a binary that you want to hook, programmatically and reliably. The main problem with doing this universally is that doing this reliably is a very tough game, but if you are willing to make some compromises, then it's definitely doable. Also let's assume this is x86 thing.
If you want to hook a function, there are several options how to do it. What Detours does is inline patching. They have a nice overview of how it works in a Research PDF document. The basic idea is that you have a function, e.g.
00E32BCE /$ 8BFF MOV EDI,EDI
00E32BD0 |. 55 PUSH EBP
00E32BD1 |. 8BEC MOV EBP,ESP
00E32BD3 |. 83EC 10 SUB ESP,10
00E32BD6 |. A1 9849E300 MOV EAX,DWORD PTR DS:[E34998]
...
...
Now you replace the beginning of the function with a CALL or JMP to your function and save the original bytes that you overwrote with the patch somewhere:
00E32BCE /$ E9 XXXXXXXX JMP MyHook
00E32BD3 |. 83EC 10 SUB ESP,10
00E32BD6 |. A1 9849E300 MOV EAX,DWORD PTR DS:[E34998]
(Note that I overwrote 5 bytes.) Now your function gets called with the same parameters and same calling convention as the original function. If your function wants to call the original one (but it doesn't have to), you create a "trampoline", that 1) runs the original instructions that were overwritten 2) jmps to the rest of the original function:
Trampoline:
MOV EDI,EDI
PUSH EBP
MOV EBP,ESP
JMP 00E32BD3
And that's it, you just need to construct the trampoline function in runtime by emitting processor instructions. The hard part of this process is to get it working reliably, for any function, for any calling convention and for different OS/platforms. One of the issues is that if the 5 bytes that you want to overwrite ends in a middle of an instruction. To detect "ends of instructions" you would basically need to include a disassembler, because there can be any instruction at the beginning of the function. Or when the function is itself shorter than 5 bytes (a function that always returns 0 can be written as XOR EAX,EAX; RETN which is just 3 bytes).
Most current compilers/assemblers produce a 5-byte long function prolog, exactly for this purpose, hooking. See that MOV EDI, EDI? If you wonder, "why the hell do they move edi to edi? that doesn't do anything!?" you are absolutely correct, but this is the purpose of the prolog, to be exactly 5-bytes long (not ending in a middle of an instruction). Note that the disassembly example is not something I made up, it's calc.exe on Windows Vista.
The rest of the hook implementation is just technical details, but they can bring you many hours of pain, because that's the hardest part. Also the behaviour you described in your question:
void MyInstallRules(void)
{
if(PreHook() == block) // <-- First a 'pre' hook which can block the function
return;
int * val = InstallRules(); // <-- Call original function
PostHook(val); // <-- Call post hook, if interest of original functions return value
}
seems worse than what I described (and what Detours does), for example you might want to "not call the original" but return some different value. Or call the original function twice. Instead, let your hook handler decide whether and where it will call the original function. Also then you don't need two handler functions for a hook.
If you don't have enough knowledge about the technologies you need for this (mostly assembly), or don't know how to do the hooking, I suggest you study what Detours does. Hook your own binary and take a debugger (OllyDbg for example) to see at assembly level what it exactly did, what instructions were placed and where. Also this tutorial might come in handy.
Anyway, if your task is to hook some functions in a specific program, then this is doable and if you have any trouble, just ask here again. Basically you can do a lot of assumptions (like the function prologs or used conventions) that will make your task much easier.
If you want to create some reliable hooking framework, then still is a completely different story and you should first begin by creating simple hooks for some simple apps.
Also note that this technique is not OS specific, it's the same on all x86 platforms, it will work on both Linux and Windows. What is OS specific is that you will probably have to change memory protection of the code ("unlock" it, so you can write to it), which is done with mprotect on Linux and with VirtualProtect on Windows. Also the calling conventions are different, that that's what you can solve by using the correct syntax in your compiler.
Another trouble is "DLL injection" (on Linux it will probably be called "shared library injection" but the term DLL injection is widely known). You need to put your code (that performs the hook) into the program. My suggestion is that if it's possible, just use LD_PRELOAD environment variable, in which you can specify a library that will be loaded into the program just before it's run. This has been described in SO many times, like here: What is the LD_PRELOAD trick?. If you must do this in runtime, I'm afraid you will need to get with gdb or ptrace, which in my opinion is quite hard (at least the ptrace thing) to do. However you can read for example this article on codeproject or this ptrace tutorial.
I also found some nice resources:
SourceHook project, but it seems it's only for virtual functions in C++, but you can always take a look at its source code
this forum thread giving a simple 10-line function to do this "inline hook" that I described
this a little more complex code in a forum
here on SO is some example
Also one other point: This "inline patching" is not the only way to do this. There are even simpler ways, e.g. if the function is virtual or if it's a library exported function, you can skip all the assembly/disassembly/JMP thing and simply replace the pointer to that function (either in the table of virtual functions or in the exported symbols table).

Resources