How to find out what functions to intercept with LD_PRELOAD? - linux

I am trying to intercept all dynamically loaded functions that call the openat syscall, using a library comm.so loaded through the LD_PRELOAD mechanism.
Consider the following use of the /sbin/depmod command:
#strace -f /sbin/depmod 3.10.0-693.17.1.el7.x86_64
(...)
openat(AT_FDCWD, "/lib/modules/3.10.0-693.17.1.el7.x86_64", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
I want to intercept the function that calls this openat syscall.
How can I find out which function that is? Intercepting openat (which may be an alias) or any other similar function does not work: nothing gets intercepted.
I tried using this command to find which dynamically loaded functions my command uses:
#readelf -p .dynstr /sbin/depmod
This prints some .so libraries, so I ran readelf on them recursively. At the end of the recursion, I have the following list of functions whose names contain both "open" and "at":
openat
openat64
open_by_handle_at
__openat64_2
None of these work - they don't intercept the call returning the file descriptor 3.
OK, so how do I find out which other function I need to intercept? Do I have to go through all of the functions shown by the readelf command, recursively, one by one (there are a lot of them)?

The openat system call (or any other one; see syscalls(2) for a list) can be made without going through the openat function of the standard library, and it can be made by ld-linux(8) itself (which is what handles LD_PRELOAD). On my Debian/Sid system, the dynamic linker /lib/ld-linux.so.2 appears to use the openat system call (try, for example, strace /bin/true), and of course it uses its own open or openat function, not the one in libc.so.
Any system call can, in principle, be made by direct machine code (e.g. the appropriate SYSCALL or SYSENTER instruction) or through the indirect syscall(2) function; in both cases the openat C function is never used. See the Linux Assembly HOWTO and the Linux x86 ABI specification for more.
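As a small illustration of the indirect case, the sketch below issues openat through syscall(2) with the same flags as in the depmod trace. Because libc's openat() wrapper is never called, an LD_PRELOAD interposer for openat would not see this call, yet strace would. The function name open_dir_raw is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: opens a directory by invoking the openat system call
   directly via syscall(2), bypassing libc's openat() wrapper entirely, so a
   preloaded openat() is not involved. */
int open_dir_raw(const char *path)
{
    return (int)syscall(SYS_openat, AT_FDCWD, path,
                        O_RDONLY | O_NONBLOCK | O_DIRECTORY | O_CLOEXEC);
}
```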
If you want to intercept all of them (including those made by ld-linux, which would be an unusual thing to want), you need to use ptrace(2) with PTRACE_SYSCALL, the way strace(1) does. At each stop you can read the program counter and walk the call stack.
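A minimal sketch of the ptrace(2) approach, assuming x86-64 Linux (the syscall number is read from orig_rax in struct user_regs_struct). It forks, has the child request tracing with PTRACE_TRACEME, and resumes it with PTRACE_SYSCALL so that it stops at every syscall entry and exit. The helper name count_openat is hypothetical; a real tool would also decode arguments, follow children (as strace -f does) and inspect the stack at each stop.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

#if !defined(__x86_64__)
#error "this sketch reads orig_rax, so it assumes x86-64"
#endif

/* Runs argv under ptrace and counts how many times the tracee enters the
   openat system call, whoever issued it (libc, ld.so, a raw SYSCALL...).
   Returns the count, or -1 on error. */
long count_openat(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(126);
        raise(SIGSTOP);            /* let the tracer set options first */
        execvp(argv[0], argv);
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    /* Syscall stops then report SIGTRAP | 0x80, unlike genuine SIGTRAPs. */
    ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

    long count = 0;
    int entering = 1, sig = 0;
    for (;;) {
        if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)sig) < 0)
            return -1;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        sig = 0;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            /* Entry and exit stops alternate; inspect only entries. */
            if (entering) {
                struct user_regs_struct regs;
                if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == 0 &&
                    regs.orig_rax == SYS_openat)
                    count++;
            }
            entering = !entering;
        } else if (WSTOPSIG(status) != SIGTRAP) {
            sig = WSTOPSIG(status);   /* forward real signals to the tracee */
        }                             /* the post-execve SIGTRAP is dropped */
    }
    return count;
}
```

For example, count_openat((char *[]){"true", NULL}) counts the openat calls made by the dynamic loader as well as by libc, which is exactly what an LD_PRELOAD library cannot observe.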
If you care about following files and file descriptors, also consider the inotify(7) facilities.
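A sketch of the inotify(7) route, assuming you only need to know that an entry of a watched directory was opened. Note that inotify reports the file name but not which process or function opened it, so it complements rather than replaces the other approaches. The names watch_dir_for_opens and drain_open_events are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Returns a non-blocking inotify descriptor that reports opens of entries
   inside `dir`, or -1 on error. */
int watch_dir_for_opens(const char *dir)
{
    int ifd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (ifd < 0)
        return -1;
    if (inotify_add_watch(ifd, dir, IN_OPEN) < 0) {
        close(ifd);
        return -1;
    }
    return ifd;
}

/* Reads all queued events, prints each IN_OPEN one, and returns 1 if one of
   them names `want` (0 otherwise). */
int drain_open_events(int ifd, const char *want)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int seen = 0;
    ssize_t len;
    while ((len = read(ifd, buf, sizeof buf)) > 0) {
        for (char *p = buf; p < buf + len;) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            if ((ev->mask & IN_OPEN) && ev->len > 0) {
                printf("opened: %s\n", ev->name);
                if (want != NULL && strcmp(ev->name, want) == 0)
                    seen = 1;
            }
            p += sizeof(struct inotify_event) + ev->len;  /* next record */
        }
    }
    return seen;
}
```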
If you use gdb (which is painful to use on programs without DWARF debug information), you could use catch syscall (gdb's interface to ptrace's PTRACE_SYSCALL) to find, and stop at, every raw system call.
BTW, some C standard libraries may implement their open C function with the openat system call (or use openat elsewhere internally). Check by studying the source code of your particular libc.so (probably GNU glibc, maybe musl-libc).
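For the depmod trace specifically, the flag combination O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC is, as far as I know, what glibc's opendir(3) passes to openat, and that internal call does not go through the PLT. So opendir is a likely (not confirmed) candidate to intercept instead. A minimal interposer sketch, assuming the logging format is just for illustration:

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>

/* Interposes opendir(). Built into an LD_PRELOAD library (comm.so in the
   question), the program's calls to opendir bind here first. */
DIR *opendir(const char *name)
{
    static DIR *(*real_opendir)(const char *);
    if (real_opendir == NULL) {
        /* Find the next definition in lookup order, normally libc's. */
        real_opendir = (DIR *(*)(const char *))dlsym(RTLD_NEXT, "opendir");
        if (real_opendir == NULL) {
            errno = ENOSYS;
            return NULL;
        }
    }
    fprintf(stderr, "[comm.so] opendir(\"%s\")\n", name);
    return real_opendir(name);
}
```

Built with something like gcc -shared -fPIC -o comm.so comm.c -ldl and run as LD_PRELOAD=./comm.so /sbin/depmod ..., it should log the directory if depmod reaches it through opendir rather than, say, openat plus fdopendir.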

Related

Does executable file of C++ program contain object code of system calls also

We use Linux system calls like fork(), pthread(), signal() and so on in C or C++ programs and compile the program to generate an executable file (a.out). Now my question is whether the file a.out contains the object code of all the Linux system calls used, or whether the executable contains only the calls to system functions and the system call functions are linked at runtime. Suppose I move my a.out file to some other Linux operating system that implements system calls with a different syntax and try to compile it there; will it work?
My question is: are the system call function definitions part of the a.out file?
User space binaries don't contain implementations of system calls. That would mean that any user could inject any code into the kernel and take over the system.
Instead, they need to switch to kernel mode by using a processor interrupt or a special instruction. The processor can then execute the system call implementation in the kernel.
A user-space library such as libc is usually used; it provides stubs that put the syscall arguments where the kernel's calling convention expects them and trigger the switch to kernel mode. Since libc is usually linked dynamically, these stubs don't appear in the executable file either.
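To make the stub idea concrete, here is a sketch of roughly what such a stub does, assuming x86-64 Linux and GCC/Clang inline assembly: the syscall number goes in rax, arguments in rdi, rsi and rdx, and the SYSCALL instruction clobbers rcx and r11. The names raw_syscall0 and raw_syscall3 are hypothetical, and unlike libc's stubs they do not convert a negative return into -1 plus errno.

```c
#include <sys/syscall.h>
#include <unistd.h>

#if !defined(__x86_64__)
#error "this sketch assumes x86-64 Linux"
#endif

/* Zero-argument system call: number in rax, result back in rax. */
long raw_syscall0(long nr)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr)
                      : "rcx", "r11", "memory");
    return ret;
}

/* Three-argument system call: arguments in rdi, rsi, rdx. On failure the
   kernel returns -errno in rax (left unconverted here). */
long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;
}
```

Because this code lives in the program (or in libc.so), only the trap into the kernel is "linked" at run time; the kernel-side implementation is never part of a.out.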

How to link kernel functions to user-space program?

I have a user-space program (Capstone). I would like to use it in the FreeBSD kernel. I believe most of the functions it uses have the same name, semantics and arguments (in FreeBSD, the kernel printf is also named printf). First I built it as a libcapstone.a library and linked my program with it. Since the include files differ between Linux user space and the FreeBSD kernel, the library cannot find symbols like sprintf, memset and vsnprintf. How can I make these symbols (from the FreeBSD kernel) visible to libcapstone.a?
If I directly include header files like <sys/systm.h> in the Linux user-space source code, I get errors such as undefined type u_int or u_char, even if I add -D_BSD_SOURCE to CFLAGS.
Additionally, are there better ways to do this?
You also need ; take a look at the kernel man pages, e.g. "man 9 printf". They list the required includes at the top.
Note, however, that you're trying to do something really hard. Some basic functions (e.g. printf) might be there; others differ completely (e.g. malloc(9)), and most POSIX APIs are simply not there. You won't be able to use open(2), socket(2), or fork(2).

override libc open() library function

I have overridden glibc's open() in my library and set LD_PRELOAD so that my library comes first, so when the process calls open(), the open defined in my library gets called.
THE PROBLEM: several other functions within glibc call open(); one such example is getpt(). When getpt() calls open(), the open() defined in glibc gets called. How can I make getpt() invoke the open() defined in my library?
Constraint: I don't have the option of compiling glibc.
As correctly stated by tmcguire, the call from posix_openpt to __open is a call to an internal symbol and cannot be interposed.
Effectively, glibc developers consider this call an implementation detail that you have no business changing.
I am looking for a compile-time solution
You can't have it.
rather than a run-time solution, because a run-time solution will have a performance impact.
A runtime solution need not have any performance impact (besides the overhead of calling your open instead of glibc's).
I only know of one way for a library to interpose glibc-internal calls: runtime patching. The idea is to:
1. find the address of libc.so.6 open (which is an alias for __open);
2. locate the boundaries of the glibc .text section at runtime;
3. scan it for CALL __open instructions;
4. for each such instruction:
   a. mprotect the page it's on to be writable;
   b. compute a new instruction that is CALL my_open and patch it "on top" of the original instruction;
   c. mprotect the page back to read and execute.
This is ugly, but it works fine on i*86 (32-bit) Linux, where a CALL can "reach" any other instruction within the 4GB address space. It doesn't work on x86_64, where a direct CALL is still limited to +/- 2GB, but the distance from your library to glibc can be larger than that.
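Step 4b and the reach limit can be made concrete with the encoding of a near CALL: opcode E8 followed by a signed 32-bit displacement measured from the end of the 5-byte instruction. The sketch below only computes the bytes (it does no mprotect or patching); encode_call_rel32 is a hypothetical name. As a check, encoding a call at 0x11e9ee to 0xdc3d0 reproduces the bytes e8 dd d9 fb ff shown in the posix_openpt disassembly quoted further down.

```c
#include <stdint.h>

/* Encodes a 5-byte x86 near CALL (E8 + rel32) placed at call_addr and
   targeting target. Returns 0 on success, or -1 if the displacement does
   not fit in a signed 32-bit field (the +/- 2GB reach on x86-64). */
int encode_call_rel32(uint8_t out[5], uint64_t call_addr, uint64_t target)
{
    /* rel32 is relative to the address of the next instruction. */
    int64_t rel = (int64_t)(target - (call_addr + 5));
    if (rel < INT32_MIN || rel > INT32_MAX)
        return -1;
    uint32_t u = (uint32_t)(int32_t)rel;
    out[0] = 0xE8;
    out[1] = (uint8_t)(u & 0xFF);          /* little-endian displacement */
    out[2] = (uint8_t)((u >> 8) & 0xFF);
    out[3] = (uint8_t)((u >> 16) & 0xFF);
    out[4] = (uint8_t)(u >> 24);
    return 0;
}
```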
In that case, you need to find a suitable trampoline within libc.so.6 to which you can redirect the original CALL, and into which you could place a register-indirect JMP to your final destination. Fortunately, libc.so.6 usually has multiple suitably-sized unused NOP regions due to function alignment.
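One possible register-indirect JMP for such a trampoline is movabs $target, %r11 followed by jmp *%r11, 13 bytes in total. The choice of r11 is an assumption of this sketch: in the x86-64 SysV ABI it is a scratch register that carries no arguments (unlike rax, which holds the vector-register count for variadic calls such as open). The function name encode_abs_jmp_r11 is hypothetical.

```c
#include <stdint.h>

/* Writes "movabs $target, %r11; jmp *%r11" into out (13 bytes). */
void encode_abs_jmp_r11(uint8_t out[13], uint64_t target)
{
    out[0] = 0x49;                       /* REX.W | REX.B */
    out[1] = 0xBB;                       /* MOV r64, imm64 with r11 (B8 + 3) */
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(target >> (8 * i));  /* imm64, little-endian */
    out[10] = 0x41;                      /* REX.B selects r11 */
    out[11] = 0xFF;                      /* JMP r/m64 (opcode extension /4) */
    out[12] = 0xE3;                      /* ModRM: mod=11, reg=4, rm=3 */
}
```

The original CALL is then patched to a rel32 CALL to this trampoline, which must itself lie within +/- 2GB of the call site.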
I was able to solve it at compile time simply by defining the getpt() function within my library.
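A sketch of that workaround, assuming that opening the pty multiplexer /dev/ptmx read-write is close enough to what glibc's getpt() does. Because this open() call is an ordinary external call from the preload library, the dynamic linker resolves it, so it reaches the interposed open() rather than glibc's internal __open.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>

/* Replacement getpt() for the preload library: opens a new pseudoterminal
   master through the dynamically resolved (hence interposable) open(). */
int getpt(void)
{
    return open("/dev/ptmx", O_RDWR | O_NOCTTY);
}
```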
This solution is incomplete because other functions within glibc (besides getpt()) could call open, and then glibc's own open will be called.
I can live with this solution for now, but I would need to fix it completely in future.
I don't think you can do this with LD_PRELOAD.
If you look at the disassembly of libc (using e.g. objdump --disassemble /lib64/libc.so.6 | grep -A20 "<getpt>:"), you can see that getpt() calls __open(), which is an alias of open().
000000000011e9d0 <posix_openpt>:
11e9d0: 53 push %rbx
[...]
11e9ee: e8 dd d9 fb ff callq dc3d0 <__open>
However, that call to __open is a PC-relative call that does not go through the PLT, meaning that you cannot interpose the symbol with LD_PRELOAD: this call inside libc never consults the dynamic symbol table. This is probably because libc binds such internal calls at link time (e.g. through hidden internal aliases, or by linking with -Bsymbolic).
The only option left is to do what strace does, and use ptrace to attach to the process. See this question on how that works.

How to invoke newly added system call by the function id without using syscall(__NR_mysyscall)

I am working with the Linux 3.9.3 kernel on Ubuntu 10.04. I have added a basic system call in the kernel directory of the linux-3.9.3 source tree. I am able to use it with syscall() by passing my newly added system call number as an argument. But I want to invoke it directly by its function name, as with the getpid() or open() system calls. Can anyone help me add it to the GNU C library? I went through a few documents but did not get a clear idea of how to accomplish it.
Assuming you are on 64-bit Linux x86-64, the relevant ABI is the x86-64 ABI. Also read the x86 calling conventions wiki page, the Linux Assembly HOWTO, and syscalls(2).
Syscalls use a different convention than ordinary function calls (e.g. all arguments are passed in registers, the fourth argument goes in r10 rather than rcx on x86-64, and on some systems the error condition uses the carry flag). Hence, you need a C wrapper to make your syscall available to C applications.
You could look at how existing C libraries, like GNU libc or musl libc, write such wrappers (you will need to write your own library for that syscall).
The musl libc source code is very readable; see e.g. its src/unistd/fsync.c.
I would suggest wrapping your new syscall in your own library without patching libc. Notice that the C wrappers for some uncommon syscalls live in separate libraries; e.g. request_key(2) has its C wrapper in libkeyutils.
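A minimal wrapper sketch in that spirit, assuming the syscall takes one int argument. The name mysyscall and the fallback number 1000 are placeholders: replace the number with the one you assigned in your kernel's syscall table. The generic syscall(2) helper moves the arguments into the syscall registers and turns a negative kernel return into -1 with errno set.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Placeholder number for the hypothetical new syscall; use the value from
   your patched kernel's syscall table instead. */
#ifndef __NR_mysyscall
#define __NR_mysyscall 1000
#endif

/* C wrapper so applications can call mysyscall(arg) by name; ship it in
   your own small library (plus a header declaring it) instead of libc. */
long mysyscall(int arg)
{
    return syscall(__NR_mysyscall, arg);
}
```

On a kernel without the new syscall, the call simply fails with -1 (typically errno ENOSYS).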

What are the differences between LD_PRELOAD and strace?

Both methods can be used to gather system calls along with their parameters and return values. When should we prefer LD_PRELOAD, and why? Maybe we can say that strace only gathers syscalls, while the LD_PRELOAD trick can gather library calls. However, there is another tracer for library calls, named ltrace.
strace uses the ptrace(2) syscall (probably with PTRACE_SYSCALL), so it catches every system call (through kernel hooks installed by ptrace). It works on any executable, even statically linked ones, or those using something other than your distribution's GNU glibc (e.g. musl-libc, old versions of busybox, or some utility written in assembly).
LD_PRELOAD tricks rely on the dynamic loader (e.g. /lib64/ld-linux-x86-64.so.2 or /lib/ld.so; see the ld.so(8) man page), so they won't work with statically linked executables (or those using something other than your dynamic loader and your GNU libc).
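For comparison, a typical LD_PRELOAD interposer looks roughly like the sketch below: it defines the symbol, logs, and forwards to the next definition found with dlsym(RTLD_NEXT, ...). It only sees calls that the dynamic linker routes to the exported openat; calls made inside libc, by ld.so, or through raw syscalls bypass it, and callers that use openat64 would need a separate wrapper. The log format is illustrative.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

/* Logging interposer for openat(), forwarding to the real one. */
int openat(int dirfd, const char *path, int flags, ...)
{
    static int (*real_openat)(int, const char *, int, ...);
    mode_t mode = 0;

    /* The mode argument exists only with O_CREAT or O_TMPFILE; O_TMPFILE
       contains the O_DIRECTORY bit, so all of its bits must be checked. */
#ifdef O_TMPFILE
    int needs_mode = (flags & O_CREAT) || (flags & O_TMPFILE) == O_TMPFILE;
#else
    int needs_mode = (flags & O_CREAT) != 0;
#endif
    if (needs_mode) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }
    if (real_openat == NULL) {
        real_openat = (int (*)(int, const char *, int, ...))dlsym(RTLD_NEXT, "openat");
        if (real_openat == NULL) {
            errno = ENOSYS;
            return -1;
        }
    }
    fprintf(stderr, "[preload] openat(%d, \"%s\", %#x)\n", dirfd, path, (unsigned)flags);
    return real_openat(dirfd, path, flags, mode);
}
```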
ltrace is probably also ptrace based.
And since all of these are free software, you can study their source code (and improve it).

Resources