What are the differences between LD_PRELOAD and strace?

Both methods are used to gather system calls, along with their parameters and return values. When should we prefer LD_PRELOAD, and why? Perhaps we can say that strace only gathers system calls, whereas the LD_PRELOAD trick can also gather library calls. However, there is another tracer for library calls, named ltrace.

strace uses the ptrace(2) syscall (probably with PTRACE_SYSCALL), so it catches every system call (through the kernel hooks installed by ptrace). It works on any executable, even statically linked ones, or those using something other than your distribution's GNU glibc (e.g. musl-libc, or some utility written in assembly, like old versions of busybox).
LD_PRELOAD tricks rely on the dynamic loader, e.g. /lib64/ld-linux-x86-64.so.2 or /lib/ld.so (see the ld.so(8) man page), so they won't work with statically linked executables (or those using something other than your dynamic loader and your GNU libc).
ltrace is probably also ptrace based.
And since all of these are free software, you can study their source code (and improve it).
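To make the LD_PRELOAD side concrete, here is a minimal sketch of an interposing library, assuming a glibc-based system. It wraps the libc openat function, logs the call, and forwards it to the real implementation found with dlsym(RTLD_NEXT, ...). The file name shim.c and the log format are my own choices for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for RTLD_NEXT and O_TMPFILE */
#endif
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

/* Same name and signature as libc's openat, so the dynamic loader
 * resolves calls from dynamically linked code to this definition first. */
int openat(int dirfd, const char *pathname, int flags, ...)
{
    static int (*real_openat)(int, const char *, int, ...);
    mode_t mode = 0;

    /* The optional mode argument is only present for O_CREAT / O_TMPFILE. */
    if ((flags & O_CREAT) || (flags & O_TMPFILE) == O_TMPFILE) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

    /* Look up the next openat in library search order (normally libc's). */
    if (real_openat == NULL)
        real_openat = (int (*)(int, const char *, int, ...))
                          dlsym(RTLD_NEXT, "openat");
    if (real_openat == NULL)
        return -1;

    int fd = real_openat(dirfd, pathname, flags, mode);
    fprintf(stderr, "[shim] openat(%d, \"%s\", 0x%x) = %d\n",
            dirfd, pathname, (unsigned)flags, fd);
    return fd;
}
```

Something like gcc -shared -fPIC -o shim.so shim.c -ldl followed by LD_PRELOAD=./shim.so ls should show the log lines. Only callers that go through the dynamically resolved openat symbol are seen; a static binary, or code issuing the raw system call, goes straight past it, which is exactly the difference from strace.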

Related

How to find out what functions to intercept with LD_PRELOAD?

I am trying to intercept all dynamically loaded functions that call syscall openat with a library comm.so using LD_PRELOAD mechanism.
Consider the following use of /sbin/depmod command:
#strace -f /sbin/depmod 3.10.0-693.17.1.el7.x86_64
(...)
openat(AT_FDCWD, "/lib/modules/3.10.0-693.17.1.el7.x86_64", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
I want to intercept the function that calls this openat syscall.
How to find out what is that function? openat, which may be an alias, and any other similar function, do not work - nothing gets intercepted.
I tried to use this command to find what dynamically loaded functions my command is using:
#readelf -p .dynstr /sbin/depmod
this prints out some .so libraries, so I used readelf on them recursively. At the end of the recursion, I have the following list of functions that have open and at in them:
openat
openat64
open_by_handle_at
__openat64_2
None of these work - they don't intercept the call returning the file descriptor 3.
OK, so how do I find out which other function I need to intercept? Do I have to go through all of the functions shown by the readelf command, recursively, one by one (there are a lot of them)?

The openat system call (or any other one; see syscalls(2) for a list) can be invoked without going through the openat function from the standard library, and it can be invoked from ld-linux(8) (which handles LD_PRELOAD). On my Debian/Sid system, the dynamic linker /lib/ld-linux.so.2 appears to use the openat system call (try, for example, strace /bin/true), and of course it uses its own open or openat function (not the one in libc.so).
Any system call can (in principle) be made by direct machine code (e.g. an appropriate SYSENTER or SYSCALL machine instruction), or through the indirect syscall(2) function (in both cases the openat C function is not used). See perhaps the Linux Assembly HowTo and the Linux x86 ABI spec for more.
If you want to intercept all of them (including those done by ld-linux, which is weird), you need to use ptrace(2) with PTRACE_SYSCALL in a way similar to strace(1). You'll be able to get the program counter and the call stack at that point.
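A rough sketch of that ptrace approach, assuming Linux and a process that is allowed to use ptrace (some sandboxes forbid it). The function name count_syscall_stops is hypothetical. It runs a command under PTRACE_SYSCALL and counts the syscall stops; each system call normally produces two stops, one on entry and one on exit.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] as a traced child and count its syscall-entry and
 * syscall-exit stops. Returns the count, or -1 on error. */
long count_syscall_stops(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                       /* child: ask to be traced */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(126);
        raise(SIGSTOP);                   /* let the parent attach options */
        execvp(argv[0], argv);
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* Make syscall stops report SIGTRAP|0x80 so they can be told apart
     * from ordinary signal stops. */
    if (ptrace(PTRACE_SETOPTIONS, pid, NULL, PTRACE_O_TRACESYSGOOD) < 0)
        return -1;

    long stops = 0;
    for (;;) {
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) < 0)
            break;
        if (waitpid(pid, &status, 0) < 0)
            break;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++;
    }
    return stops;
}
```

At each stop a real tracer would read the registers (PTRACE_GETREGS or PTRACE_GETREGSET) to get the syscall number, its arguments and the program counter. This sketch also swallows any signals sent to the child, which a real tracer should forward.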
If you care about following files and file descriptors, consider also inotify(7) facilities.
If you use gdb (which can be used, painfully, on programs without DWARF debug information) you could use catch syscall (the gdb way of using ptrace PTRACE_SYSCALL) to find (and probably "break at") every raw system call.
BTW, it is possible that some C standard libraries implement their open C function with the openat system call (or use openat elsewhere). Check by studying the source code of your particular libc.so (probably GNU glibc, maybe musl-libc).

How to link kernel functions to user-space program?

I have a user-space program (Capstone) that I would like to use in the FreeBSD kernel. I believe most of the functions it needs have the same name, semantics and arguments there (in FreeBSD, the kernel printf is also named printf). First I built it as a libcapstone.a library and linked my program with it. Since the include files differ between Linux user-space and the FreeBSD kernel, the library cannot find symbols like sprintf, memset and vsnprintf. How can I make these symbols (from the FreeBSD kernel) visible to libcapstone.a?
If I directly include header files like <sys/systm.h> in the Linux user-space source code, I get errors such as undefined type u_int or u_char, even if I add -D_BSD_SOURCE to CFLAGS.
Additionally, are there any better ways to do this?

You also need ; take a look at kernel man pages, eg "man 9 printf". They list required includes at the top.
Note, however, that you're trying to do something really hard. Some basic functions (e.g. printf) might be there; others differ completely (e.g. malloc(9)), and most POSIX APIs are simply not there. You won't be able to use open(2), socket(2), or fork(2).

How to invoke newly added system call by the function id without using syscall(__NR_mysyscall)

I am working with the Linux 3.9.3 kernel on Ubuntu 10.04. I have added a basic system call in the kernel directory of the linux-3.9.3 source tree. I am able to use it with syscall() by passing my new system call's number as an argument. But I want to invoke it directly by its function name, as with the getpid() or open() system calls. Can anyone help me add it to the GNU C library? I went through a few documents but did not get a clear idea of how to accomplish it.
Thanks!!!

Assuming you are on 64-bit Linux (x86-64), the relevant ABI is the x86-64 ABI. Read also the x86 calling conventions wikipage, the Linux Assembly HowTo, and syscalls(2).
Syscalls use a different convention than ordinary function calls (e.g. all arguments are passed in registers, and the error condition is signalled through the return register as a negative errno value on Linux, while some other systems use the carry bit). Hence, you need a C wrapper to make your syscall available to C applications.
You could look into the source code of existing C libraries, like GNU libc or musl libc (so you'll need to make your own library for that syscall).
The musl libc source code is very readable; see e.g. its src/unistd/fsync.c as an example.
I would suggest wrapping your new syscall in your own library without patching libc. Notice that some uncommon syscalls sit in a different library, e.g. request_key(2) has its C wrapper in libkeyutils.
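Such a wrapper in your own library can be very small. Here is a sketch. The names mysyscall and MY_SYSCALL_NR are hypothetical, and since I don't know the number you assigned, the macro defaults to SYS_getpid as a stand-in so that the code compiles and runs on an unpatched kernel.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Replace with the number assigned to the new system call in the
 * patched kernel. SYS_getpid is only a placeholder so the example
 * runs on a stock kernel. */
#ifndef MY_SYSCALL_NR
#define MY_SYSCALL_NR SYS_getpid
#endif

/* Callable by name from C, like getpid() or open(). syscall(2) does the
 * register setup and turns a kernel error into -1 with errno set. */
long mysyscall(void)
{
    return syscall(MY_SYSCALL_NR);
}
```

Put the prototype in a header, build this into a small static or shared library, and applications can call mysyscall() without knowing the number.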

How to build the elf interpreter (ld-linux.so.2/ld-2.17.so) as static library?

I apologize if my question is not precise because I don't have a lot
of Linux related experience. I'm currently building a Linux from
scratch (mostly following the guide at linuxfromscratch.org version
7.3). I ran into the following problem: when I build an executable it
gets a hardcoded path to something called ELF interpreter.
readelf -l program
shows something like
[Requesting program interpreter: /lib/ld-linux.so.2]
I traced this library ld-linux.so.2 to be part of glibc. I am not very
happy with this behaviour because it makes the binary very unportable
- if I change the location of /lib/ld-linux.so.2 the executable no
longer works and the only "fix" I found is to use the patchelf utility
from NixOS to change the hardcoded path to another hardcoded path. For
this reason I would like to link against a static version of the ld
library but such is not produced. And so this is my question, could
you please explain how could I build glibc so that it will produce a
static version of ld-linux.so.2 which I could later link to my
executables. I don't fully understand what this ld library does, but I
assume this is the part that loads other dynamic libraries (or at
least glibc.so). I would like to link my executables dynamically, but
I would like the dynamic linker itself to be statically built into
them, so they would not depend on hardcoded paths. Or alternatively I
would like to be able to set the path to the interpreter with
environment variable similar to LD_LIBRARY_PATH, maybe
LD_INTERPRETER_PATH. The goal is to be able to produce portable
binaries, that would run on any platform with the same ABI no matter
what the directory structure is.
Some background that may be relevant: I'm using Slackware 14 x86 to
build i686 compiler toolchain, so overall it is all x86 host and
target. I am using glibc 2.17 and gcc 4.7.x.

"I would like to be able to set the path to the interpreter with environment variable similar to LD_LIBRARY_PATH, maybe LD_INTERPRETER_PATH."
This is simply not possible. Read carefully (and several times) the execve(2), elf(5) & ld.so(8) man pages and the Linux ABI & ELF specifications. And also the kernel code doing execve.
The ELF interpreter is responsible for dynamic linking. It has to be a file (technically a statically linked ELF shared library) at some fixed location in the file hierarchy (often /lib/ld.so.2 or /lib/ld-linux.so.2 or /lib64/ld-linux-x86-64.so.2)
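To see where that fixed path lives, here is a sketch that reads the PT_INTERP program header of an ELF file, roughly what readelf -l prints as "Requesting program interpreter". It assumes the file has the same word size as the program reading it, and the function name read_elf_interp is my own.

```c
#include <elf.h>
#include <link.h>     /* ElfW(): Elf32_* or Elf64_* for the native word size */
#include <stdio.h>
#include <string.h>

/* Copy the interpreter path stored in the PT_INTERP segment of the ELF
 * file at 'path' into buf. Returns 0 on success, -1 if the file cannot
 * be read, is not a native-class ELF file, or has no PT_INTERP
 * (e.g. a statically linked executable). */
int read_elf_interp(const char *path, char *buf, size_t buflen)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int rc = -1;
    ElfW(Ehdr) eh;
    unsigned char native = (sizeof(void *) == 8) ? ELFCLASS64 : ELFCLASS32;

    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != native)
        goto out;

    for (unsigned i = 0; i < eh.e_phnum; i++) {
        ElfW(Phdr) ph;
        long off = (long)(eh.e_phoff + (ElfW(Off))i * eh.e_phentsize);

        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
            goto out;
        if (ph.p_type != PT_INTERP)
            continue;

        /* The segment holds the NUL-terminated interpreter path. */
        if (ph.p_filesz == 0 || ph.p_filesz > buflen)
            goto out;
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
            fread(buf, 1, ph.p_filesz, f) != ph.p_filesz)
            goto out;
        buf[ph.p_filesz - 1] = '\0';
        rc = 0;
        break;
    }
out:
    fclose(f);
    return rc;
}
```

The kernel reads this same string during execve(2) and maps the named file, which is why no environment variable can influence it: nothing from user space has run yet at that point.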
The old a.out format from the 1990s had a builtin dynamic linker, partly implemented in old Linux 1.x kernel. It was much less flexible, and much less powerful.
By allowing an (in principle) arbitrary dynamic linker path, the kernel makes it possible to have various dynamic linkers. But most systems have only one. This is a good way to parameterize the dynamic linker. If you want to try another one, install it in the file system and generate ELF executables mentioning that path.
With great pain and effort, you might make your own ld.so-like dynamic linker implementing your LD_INTERPRETER_PATH wish, but that linker still has to be an ELF shared library sitting at some fixed location in the file tree.
If you want a system that does not need any files at predefined, hard-wired locations (like /lib/ld.so, /dev/null, /sbin/init ...), you'll need to build all its executable binaries statically. You may want (but current Linux distributions usually don't do that) to have a few statically linked executables (like /sbin/init, /bin/sash...) that will enable you to repair a system broken to the point of not having any dynamic linker.
BTW, the /sbin/init (or /bin/sh) path is hard-wired inside the kernel itself. You may pass an argument to the kernel at boot time (e.g. with GRUB) to override the default. So even the kernel wants some files to be there!
As I commented, you might look into MUSL-Libc for an alternative Libc implementation (providing its own dynamic linker). Read also about VDSO and ASLR and initrd.
In practice, accept the fact that modern Linuxes and Unixes expect some non-empty file system ... Notice that dynamic linking and shared libraries are huge progress (things were much more painful in the 1990s Linux kernels and distributions).
Alternatively, define your own binary format, then make a kernel module or a binfmt_misc entry to handle it.
BTW, most (or all) of Linux is free software, so you can improve it (but this will take you months, or many years, of work). Please share your improvements by publishing them.
Read also Drepper's How to Write Shared Libraries paper; and this question.

I ran into the same issue. In my case I want to bundle my application with a different GLIBC than comes system installed. Since ld-linux.so must match the GLIBC version I can't simply deploy my application with the according GLIBC. The problem is that I can't run my application on older installations that don't have the required GLIBC version.
The path to the loader interpreter can be modified with the linker option --dynamic-linker=/path/to/interp. However, this needs to be set at build time and therefore would require my application to be installed in that location (or at least I would need to deploy the ld-linux.so that goes with my GLIBC in that location), which goes against a simple xcopy deployment.
So what's needed is an $ORIGIN option equivalent to what the -rpath option can handle. That would allow for a fully dynamic deployment.
The lack of a dynamic interpreter path (at runtime) leaves two options:
a) Use patchelf to modify the path before the executable gets launched.
b) Invoke the ld-linux.so directly with the executable as an argument.
Both options are not as 'integrated' as a compiled $ORIGIN path in the executable itself.

Linux userland exec

I need a C library which lets me exec() a statically linked binary, without invoking the execve() system call. The reason why the system call wouldn't work is that the binary file is not executable, and it's not possible to make it executable on that system. For dynamically linked binaries, running /lib/ld-linux.so.2 progname does the trick, but that segfaults on my statically linked binary.
I've found ul_exec 1.1 on http://archive.cert.uni-stuttgart.de/bugtraq/2004/01/msg00002.html , but that seems to segfault for its own Hello, World binary on my system.
One option would be to make a copy of the binary, make the copy executable, and call execve(). I'm looking for a solution which doesn't need such a copy (because of performance reasons).

I've updated The Grugq's userland exec to work with modern x86 Linuxes. I wrote an x86_64 userland exec from scratch.

Then how about a usermode filesystem (using python-fuse, for example) that maps the execute bit onto any file specified? Would that be too much of a performance hit?

There is a good short wiki article with some not-completely-production-ready implementations: http://plash.beasts.org/wiki/UserModeExec
