My library defines its own open(), overriding the one provided by glibc, and I have set LD_PRELOAD so that my library is loaded first. When the process calls open(), the open() defined in my library gets called.
THE PROBLEM: Several other functions within glibc call open(); one such example is getpt(). When getpt() calls open(), the open() defined in glibc gets called. How can I make getpt() invoke the open() defined in my library?
Constraints: I don't have the option of compiling glibc.
As correctly stated by tmcguire, the call from posix_openpt to __open is a call to an internal symbol, and cannot be interposed.
Effectively, glibc developers consider this call an implementation detail, that you have no business of changing.
"I am looking at compile time solution"
You can't have it.
"than run time solution cause run time solution will have performance impact."
A runtime solution need not have any performance impact (besides the overhead of calling your open instead of glibc's).
I only know of one way for a library to interpose glibc internal calls: runtime patching. The idea is to
find the address of libc.so.6 open (which is an alias for __open),
locate the boundaries of glibc .text section at runtime
scan it for CALL __open instructions
for any such instruction
mprotect the page it's on to be writable
compute a new instruction that is CALL my_open and patch it "on top" of the original instruction
mprotect the page back to read and execute
This is ugly, but it works fine for i*86 (32-bit) Linux, where a CALL can "reach" any other instruction within the 4GB address space. It doesn't work for x86_64, where a CALL is still limited to +/- 2GB, but the distance from your library to glibc could be more than that.
In that case, you need to find a suitable trampoline within libc.so.6 to which you can redirect the original CALL, and into which you could place a register-indirect JMP to your final destination. Fortunately, libc.so.6 usually has multiple suitably-sized unused NOP regions due to function alignment.
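The "compute a new instruction" step above can be sketched in C. This is only an illustration of the encoding, not a complete patcher: the function name encode_call_rel32 is hypothetical, and finding the call sites and calling mprotect are left out. It builds the 5-byte CALL rel32 form (opcode E8 followed by a little-endian 32-bit displacement measured from the end of the instruction) and reports when the target is out of the +/- 2GB range, which is the case that needs a trampoline.

```c
#include <stdint.h>

/* Encode a 5-byte x86 near CALL (opcode E8 + 32-bit displacement) that,
 * when placed at address call_site, transfers control to target.
 * Returns 0 on success, -1 if target is outside the +/- 2GB reach. */
int encode_call_rel32(uint64_t call_site, uint64_t target, uint8_t out[5])
{
    /* The displacement is relative to the address of the NEXT instruction,
     * i.e. call_site + 5. */
    int64_t disp = (int64_t)(target - (call_site + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;                  /* out of reach: a trampoline is needed */

    uint32_t rel = (uint32_t)(int32_t)disp;
    out[0] = 0xE8;                  /* CALL rel32 opcode */
    out[1] = (uint8_t)(rel);        /* displacement, least significant byte first */
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
    return 0;
}
```

For a call site at 0x11e9ee and a target at 0xdc3d0, this produces the bytes e8 dd d9 fb ff.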
I was able to solve it at compile time simply by defining the getpt() function within my library.
This solution is incomplete because there could be other functions within glibc (other than getpt()) that call open, and for those the open inside glibc will still be called.
I can live with this solution for now, but I would need to fix it completely in future.
I don't think you can do this with LD_PRELOAD.
If you look at the disassembly of libc (using e.g. objdump --disassemble /lib64/libc.so.6 | grep -A20 "<getpt>:"), you can see that getpt() calls __open(), which is an alias to open().
000000000011e9d0 <posix_openpt>:
11e9d0: 53 push %rbx
[...]
11e9ee: e8 dd d9 fb ff callq dc3d0 <__open>
However, that call to __open is a PC-relative call that does not go through the PLT, meaning that you cannot interpose the symbol with LD_PRELOAD, since such internal calls within libc do not use the PLT. This is probably because libc was linked with -Bsymbolic.
The only option left is to do what strace does, and use ptrace to attach to the process. See this question on how that works.
I am trying to intercept all dynamically loaded functions that call the openat syscall, with a library comm.so, using the LD_PRELOAD mechanism.
Consider the following use of /sbin/depmod command:
#strace -f /sbin/depmod 3.10.0-693.17.1.el7.x86_64
(...)
openat(AT_FDCWD, "/lib/modules/3.10.0-693.17.1.el7.x86_64", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
I want to intercept the function that calls this openat syscall.
How to find out what is that function? openat, which may be an alias, and any other similar function, do not work - nothing gets intercepted.
I tried to use this command to find what dynamically loaded functions my command is using:
#readelf -p .dynstr /sbin/depmod
this prints out some .so libraries, so I used readelf on them recursively. At the end of the recursion, I have the following list of functions that have open and at in them:
openat
openat64
open_by_handle_at
__openat64_2
None of these work - they don't intercept the call returning the file descriptor 3.
OK, so how to find out what other function do I need to intercept? Do I have to go through all of the functions shown by readelf command, and recursively so, one by one (there is a lot of them)?
The openat system call (or any other one, see syscalls(2) for a list) could be called without using the openat function from the standard library; and it could be called from ld-linux(8) (which handles LD_PRELOAD). On my Debian/Sid system it looks like the dynamic linker /lib/ld-linux.so.2 is using the openat system call (try for example strace /bin/true) and of course it uses its own open or openat function (not the one in libc.so).
Any system call could (in principle) be called by direct machine code (e.g. some appropriate SYSENTER machine instruction), or through the indirect syscall(2) function (and in both cases the openat C function won't be used). See perhaps the Linux Assembly HowTo for more and the Linux x86 ABI spec.
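As a small illustration of the indirect route, here is a sketch that opens a file through syscall(2) directly. The function name raw_openat_rdonly is hypothetical. Because the libc openat and open symbols are never called, an LD_PRELOAD library that interposes only those symbols would not see this call, although strace would still show an openat.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Opens path read-only by issuing the openat system call through
 * syscall(2), without calling libc's open() or openat() functions.
 * Returns a file descriptor, or -1 with errno set. */
int raw_openat_rdonly(const char *path)
{
    return (int)syscall(SYS_openat, AT_FDCWD, path, O_RDONLY | O_CLOEXEC);
}
```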
If you want to intercept all of them (including those done by ld-linux, which is weird), you need to use ptrace(2) with PTRACE_SYSCALL in a way similar to strace(1). You'll be able to get the program counter and the call stack at that point.
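A minimal sketch of that approach, assuming Linux and glibc's <sys/ptrace.h>: the hypothetical function count_syscall_stops runs a command under ptrace and counts its syscall stops. Each system call normally produces two stops (entry and exit). A real tool would read the registers at each stop (for example with PTRACE_GETREGS) to get the syscall number and program counter; that part is architecture-specific and omitted here.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] under ptrace and returns the number of syscall stops
 * (entry and exit stops are both counted), or -1 on failure. */
long count_syscall_stops(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced, stop so the parent can set options,
         * then exec the target program. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        raise(SIGSTOP);
        execvp(argv[0], argv);
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* Report syscall stops as SIGTRAP|0x80 so they can be told apart
     * from ordinary SIGTRAP stops (such as the one after execve). */
    if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) == -1) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }

    long stops = 0;
    for (;;) {
        /* Resume the child until its next syscall entry or exit. */
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) == -1)
            break;
        if (waitpid(pid, &status, 0) < 0)
            break;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++;    /* registers could be inspected here */
    }
    return stops;
}
```

Since the tracer sees the stops regardless of which code issued the system call, this also covers calls made by the dynamic linker or by hand-written assembly.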
If you care about following files and file descriptors, consider also inotify(7) facilities.
If you use gdb (which can be painfully used on programs without DWARF debug information) you could use catch syscall (the way to use ptrace PTRACE_SYSCALL in gdb) to find out (and probably "break at") every raw system call.
BTW, it could be possible that some C standard libraries are implementing their open C function with openat system call (or using openat elsewhere). Check by studying the source code of your particular libc.so (probably GNU glibc, maybe musl-libc).
Assume I have a very big source code and intend to make the rdx register totally unused during the execution, i.e., while generating the assembly code, all I want is to inform my compiler (GCC) that it should not use rdx at all.
NOTE: register rdx is just an example. I am OK with any available Intel x86 register.
I am even happy to update the source code of the compiler and use my custom GCC. But which changes to the source code are needed?
You tell GCC not to allocate a register via the -ffixed-reg option (gcc docs).
-ffixed-reg
Treat the register named reg as a fixed register; generated code should never refer to it (except perhaps as a stack pointer, frame pointer or in some other fixed role).
reg must be the name of a register. The register names accepted are machine-specific and are defined in the REGISTER_NAMES macro in the machine description macro file.
For example, gcc -ffixed-r13 will make gcc leave it alone entirely. Using registers that are part of the calling convention, or required for certain instructions, may be problematic.
You can put some global variable to this register.
For ARM CPU you can do it this way:
register volatile type *global_ptr asm ("r8");
This declaration reserves general-purpose register "r8" to hold the value of the global_ptr pointer.
See the source in U-Boot for real-life example:
http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/include/asm/global_data.h;h=4e3ea55e290a19c766017b59241615f7723531d5;hb=HEAD#l83
File arch/arm/include/asm/global_data.h (line ~83).
#define DECLARE_GLOBAL_DATA_PTR register volatile gd_t *gd asm ("r8")
I don't know whether there is a simple mechanism to tell that to gcc at run time. I would assume that you must recompile. From what I read I understand that there are description files for the different CPUs, e.g. this file, but what exactly needs to be changed in order to prevent gcc from using the register, and what potential side effects such a change could have, is beyond me.
I would ask on the gcc mailing list for assistance. Chances are that the modification is not so difficult per se, except that building gcc isn't trivial in my experience. In your case, if I analyze the situation correctly, a caveat applies. You are essentially cross-compiling, i.e. building for a different architecture. In particular I understand that you have to build your system and other libraries which your program uses because their code would normally use that register. If you intend to link dynamically you probably would also have to build your own ld.so (the dynamic loader) because starting a dynamically linked executable actually starts that loader which would use that register. (Therefore maybe linking statically is better.)
Consider the divq instruction - the dividend is represented by [rdx][rax], and, assuming the divisor (D) satisfies rdx < D, the quotient is stored in %rax and remainder in %rdx. There are no alternative registers that can be used here.
The same applies with the mul/mulq instructions, where the product is stored in [rdx][rax] - even the recent mulx instruction, while more flexible, still uses %rdx as a source register. (If memory serves)
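To make the register pairing concrete, here is a small model in C of the arithmetic that divq performs. It is a sketch, not the instruction itself: model_divq and struct divq_result are hypothetical names, and it assumes a 64-bit GCC or Clang target where unsigned __int128 is available. The point is that the 128-bit dividend is split across rdx (high half) and rax (low half), and both registers are overwritten by the results.

```c
#include <stdint.h>

/* The two result registers as divq leaves them. */
struct divq_result {
    uint64_t rax;   /* quotient  */
    uint64_t rdx;   /* remainder */
};

/* Models "divq divisor" with the dividend held in rdx:rax.
 * Returns 0 on success, or -1 where the real instruction would raise
 * a divide error: divisor == 0, or rdx >= divisor (the quotient would
 * not fit in 64 bits). */
int model_divq(uint64_t rdx, uint64_t rax, uint64_t divisor,
               struct divq_result *out)
{
    if (divisor == 0 || rdx >= divisor)
        return -1;

    /* Assemble the 128-bit dividend from its two 64-bit halves. */
    unsigned __int128 dividend = ((unsigned __int128)rdx << 64) | rax;
    out->rax = (uint64_t)(dividend / divisor);
    out->rdx = (uint64_t)(dividend % divisor);
    return 0;
}
```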
More importantly, %rdx is used to pass parameters in the x86-64 ELF ABI. You could never call C library functions (or any other ELF library for that matter) - even kernel syscalls use %rdx to pass parameters - though the register use is not the same.
I'm not clear on your motivation - but the fact is, you won't be able to do anything practical on any x86[-64] platform (let alone an ELF/Linux platform) - at least in user-space.
I was playing around with gdb and I'd like to remove the executable permission from a particular page. How could I go about doing that? I don't need to be able to do that from within gdb; it's just that I'd like to change the permission somehow (anything short of modifying the source code of the binary will do).
[EDIT]
I'm looking for a solution that works for binaries that are not linked against libc.
Use mprotect(): http://linux.die.net/man/2/mprotect
You can call it from within gdb, something like call mprotect(addr, len, 3) where 3 is the numeric value of PROT_READ|PROT_WRITE (at least on my system).
Even if the binary is not using libc, it must execute at least one system call to do anything useful.
What you need to do then is to arrange for correct arguments to be on the stack or in registers (details differ between processors and OSes, and you haven't told us which one you are running on), and then jump (using the GDB jump command) to the syscall instruction.
E.g. on Linux/x86_64, you would put 10 (__NR_mprotect) into %rax, addr into %rdi, len into %rsi, and 3 into %rdx. See documentation here.
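For reference, the same request written in C through syscall(2) looks like the sketch below. The function name make_non_executable is hypothetical, and this form is only meant to show the argument order (address, length, protection) that ends up in the registers listed above; a binary without libc would need the register setup done by hand in gdb as described.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sets the protection of [addr, addr + len) to read+write, which drops
 * the execute permission. Uses the raw system call number instead of
 * libc's mprotect() wrapper. addr must be page-aligned.
 * Returns 0 on success, -1 with errno set. */
int make_non_executable(void *addr, size_t len)
{
    /* PROT_READ | PROT_WRITE is the value 3 used in the gdb example. */
    return (int)syscall(SYS_mprotect, addr, len, PROT_READ | PROT_WRITE);
}
```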
I am working with the Linux 3.9.3 kernel on Ubuntu 10.04. I have added a basic system call in the kernel directory of the linux-3.9.3 source tree. I am able to use it with syscall() by passing my new system call's number as an argument. But I want to invoke it directly by name, as with the getpid() or open() system calls. Can anyone help me add it to the GNU C library? I went through a few documents but did not get a clear idea of how to accomplish it.
Thanks!!!
Assuming you are on 64-bit Linux (x86-64), the relevant ABI is the x86-64 ABI. Read also the x86 calling conventions wikipage, the Linux Assembly HOWTO, and syscalls(2).
So syscalls are using a different convention than ordinary function calls (e.g. all arguments are passed by registers, error condition could use the carry bit). Hence, you need a C wrapper to make your syscall available to C applications.
You could look into the source code of existing C libraries, like GNU libc or musl libc (so you'll need to make your own library for that syscall).
The musl libc source code is very readable; see e.g. its src/unistd/fsync.c as an example.
I would suggest wrapping your new syscall in your own library without patching libc. Notice that some uncommon syscalls are sitting in a different library, e.g. request_key(2) has its C wrapper in libkeyutils
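A wrapper of that kind can be very small. The sketch below is an assumption-laden illustration: the names MY_SYSCALL_NR and my_syscall are hypothetical, and SYS_getpid stands in for the real number only so that the code runs on an unmodified kernel. Replace the placeholder with the number you assigned to your system call, and adjust the parameter list to match it.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* PLACEHOLDER: replace with the number assigned to your new system call.
 * SYS_getpid is used here only so the sketch runs on a stock kernel. */
#define MY_SYSCALL_NR SYS_getpid

/* Thin C wrapper: lets callers write my_syscall() instead of
 * syscall(number). Returns the kernel's result, or -1 with errno set. */
long my_syscall(void)
{
    return syscall(MY_SYSCALL_NR);
}
```

Compiled into a small static or shared library with a matching header, this gives applications a named function without touching glibc.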
What is the difference between SYS_exit, sys_exit() and exit()?
What I understand :
The linux kernel provides system calls, which are listed in man 2 syscalls.
There are wrapper functions of those syscalls provided by glibc which have mostly similar names as the syscalls.
My question : In man 2 syscalls, there is no mention of SYS_exit and sys_exit(), for example. What are they?
Note : The syscall exit here is only an example. My question really is : What are SYS_xxx and sys_xxx()?
I'll use exit() as in your example although this applies to all system calls.
The functions of the form sys_exit() are the actual entry points to the kernel routine that implements the function you think of as exit(). These symbols are not even available to user-mode programmers. That is, unless you are hacking the kernel, you cannot link to these functions because their symbols are not available outside the kernel. If I wrote libmsw.a which had a file scope function like
static int msw_func() {}
defined in it, you would have no success trying to link to it because it is not exported in the libmsw symbol table; that is:
cc your_program.c libmsw.a
would yield an error like:
ld: cannot resolve symbol msw_func
because it isn't exported; the same applies for sys_exit() as contained in the kernel.
In order for a user program to get to kernel routines, the syscall(2) interface needs to be used to effect a switch from user mode to kernel mode. When that mode switch (sometimes called a trap) occurs, a small integer is used to look up the proper kernel routine in a kernel table that maps integers to kernel functions. An entry in the table has the form
{SYS_exit, sys_exit},
where SYS_exit is a preprocessor macro which is
#define SYS_exit (1)
and, on 32-bit x86, has been 1 since before you were born because there hasn't been a reason to change it (other architectures number it differently; on x86-64 it is 60). Because the numbers are small integers, the lookup is a simple array index.
As you note in your question, the proper way for a regular user-mode program to access sys_exit is through the thin wrapper in glibc (or similar core library). The only reason you'd ever need to mess with SYS_exit or sys_exit is if you were writing kernel code.
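The table lookup described above can be pictured with a toy user-space model. Everything here is hypothetical (the TOY_ names, the handlers, and the numbers); it is not kernel code, only a sketch of the idea that a small integer indexes an array of function pointers.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical call numbers, standing in for the SYS_xxx macros. */
enum { TOY_SYS_nop = 0, TOY_SYS_exit = 1, TOY_NR_CALLS };

typedef long (*toy_handler)(long arg);

/* Hypothetical handlers, standing in for the kernel's sys_xxx() routines. */
static long toy_sys_nop(long arg)  { (void)arg; return 0; }
static long toy_sys_exit(long arg) { return arg; }  /* just echoes the status */

/* The table: the call number is the array index. */
static const toy_handler toy_table[TOY_NR_CALLS] = {
    [TOY_SYS_nop]  = toy_sys_nop,
    [TOY_SYS_exit] = toy_sys_exit,
};

/* Looks up the handler for nr and calls it; unknown numbers get -ENOSYS. */
long toy_dispatch(long nr, long arg)
{
    if (nr < 0 || nr >= TOY_NR_CALLS || toy_table[nr] == NULL)
        return -ENOSYS;
    return toy_table[nr](arg);
}
```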
This is now addressed in man syscalls itself:
Roughly speaking, the code belonging to the system call with number __NR_xxx defined in /usr/include/asm/unistd.h can be found in the Linux kernel source in the routine sys_xxx(). (The dispatch table for i386 can be found in /usr/src/linux/arch/i386/kernel/entry.S.) There are many exceptions, however, mostly because older system calls were superseded by newer ones, and this has been treated somewhat unsystematically. On platforms with proprietary operating-system emulation, such as parisc, sparc, sparc64, and alpha, there are many additional system calls; mips64 also contains a full set of 32-bit system calls.
At least now, /usr/include/asm/unistd.h is just a preprocessor stub that includes one of the following:
/usr/include/asm/unistd_32.h
/usr/include/asm/unistd_x32.h
/usr/include/asm/unistd_64.h
The C function exit() is declared in stdlib.h. Think of this as a high-level, event-driven interface that allows you to register a callback with atexit():
/* Call all functions registered with `atexit' and `on_exit',
in the reverse of the order in which they were registered,
perform stdio cleanup, and terminate program execution with STATUS. */
extern void exit (int __status) __THROW __attribute__ ((__noreturn__));
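A short sketch of the callback registration that comment describes, using only standard C. The handler and function names are hypothetical. When the program later calls exit() or returns from main, the handlers run in the reverse of their registration order.

```c
#include <stdio.h>
#include <stdlib.h>

/* Handlers run at exit(), in the reverse of their registration order. */
static void first_registered(void)  { puts("first registered, runs last"); }
static void second_registered(void) { puts("second registered, runs first"); }

/* Registers both handlers. Returns 0 on success, -1 if atexit() fails. */
int register_handlers(void)
{
    if (atexit(first_registered) != 0)
        return -1;
    if (atexit(second_registered) != 0)
        return -1;
    return 0;
}
```

Calling the raw exit system call directly skips these handlers and the stdio cleanup, which is one practical difference between exit() and the system call underneath it.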
So, essentially, the kernel headers provide the system call numbers as preprocessor constants named __NR_xxx, and the C library's <sys/syscall.h> exposes the same numbers as SYS_xxx. sys_xxx() is the kernel-side routine that implements the call; it is not a function you can call from user space. The exit() function is part of the standard C library (declared in stdlib.h) and is ported to other operating systems that lack the Linux kernel ABI entirely (there may be no __NR_xxx constants) and potentially don't have sys_* functions either (you could write exit() to send the interrupt or use the vDSO in assembly).