How does ltrace work?
How does it find out that the program is calling library functions?
Is there a common code path that all calls to library functions go through? Maybe ltrace sets a breakpoint in this common code path?
Thanks!
Dynamic executables have a symbol table used by the linker when resolving references that need to be connected to library functions. (You can see this yourself by running objdump -T /path/to/binary).
This symbol table is accessible to other tools -- such as ltrace -- as well, so it's trivial to determine which functions need to be hooked and to walk that list, hooking each one individually.
See a talk on ltrace internals presented at the Ottawa Linux Symposium, which provides a detailed, function-by-function breakdown; to follow along in the source, see the official repository, or a third-party GitHub mirror.
Some newer releases (more recent than that talk) also hook the dlopen() call, to be able to trace invocation of dynamically loaded libraries as well. The mechanism there should be rather obvious on a moment's thought -- if one can replace dlopen() with a shim (when dlopen() itself is dynamically linked as above), one can then set a breakpoint on any function pointer it returns.
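The shim idea can be sketched in a few lines. This is not ltrace's actual implementation (ltrace drives the tracee with ptrace from the outside); it is just a hypothetical LD_PRELOAD-style wrapper showing that dlopen() can be intercepted like any other dynamically linked function:

    /* shim.c -- hypothetical dlopen() wrapper; build with:
     *   cc -shared -fPIC shim.c -o shim.so -ldl
     * and run a target as: LD_PRELOAD=./shim.so ./target */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>

    void *dlopen(const char *filename, int flags)
    {
        /* Find the real dlopen() that our definition shadows. */
        void *(*real_dlopen)(const char *, int) =
            (void *(*)(const char *, int))dlsym(RTLD_NEXT, "dlopen");

        void *handle = real_dlopen(filename, flags);
        fprintf(stderr, "dlopen(\"%s\", %d) = %p\n",
                filename ? filename : "(null)", flags, handle);
        /* A tracer would now walk the new object's symbol table and
         * plant breakpoints on the functions it exports. */
        return handle;
    }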
I'm encountering an issue which is elaborated in a good article, Shared Library Symbol Conflicts (on Linux). The problem is that when the executable and a .so define functions with the same name, and the .so calls that function, the call resolves to the definition in the executable rather than the one in the .so itself.
Let's talk about the case in this article. I understand that the DoLayer() function in layer.o has an external dependency on DoThing() when layer.o is compiled.
But when compiling the libconflict.so, shouldn't the external function dependency be resolved in-place and just replaced with the address of conflict.o/DoThing() statically?
Why does the layer.o/DoLayer() still use dynamic linking to find DoThing()? Is this a designed behavior?
Is this a designed behavior?
Yes.
At the time of the introduction of shared libraries on UNIX, the goal was to pretend that they work just as if the code were in a regular (archive) library.
Suppose you have foo() defined in both libfoo and libbar, and bar() in libbar calls foo().
The design goal was that cc main.c -lfoo -lbar works the same regardless of whether libfoo and libbar are archive or shared libraries. The only way to achieve this is to have libbar.so use dynamic linking to resolve the call from bar() to foo(), despite having a local version of foo().
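To make this concrete, here is a minimal multi-file sketch with assumed file names:

    /* foo1.c, compiled into libfoo.so: */
    #include <stdio.h>
    void foo(void) { puts("libfoo foo"); }

    /* foo2.c and bar.c, compiled into libbar.so: */
    #include <stdio.h>
    void foo(void) { puts("libbar foo"); }

    void foo(void);
    void bar(void) { foo(); }

    /* main.c: */
    void bar(void);
    int main(void) { bar(); return 0; }

    /* Build and run (-Wl,--no-as-needed may be needed on distros
     * that default to --as-needed):
     *   cc -shared -fPIC foo1.c -o libfoo.so
     *   cc -shared -fPIC foo2.c bar.c -o libbar.so
     *   cc main.c -o main -L. -lfoo -lbar
     *   LD_LIBRARY_PATH=. ./main        # prints "libfoo foo"
     * bar()'s call to foo() is resolved through the dynamic linker's
     * global lookup scope (main, libfoo.so, libbar.so), so libfoo's
     * foo() wins even though libbar has its own copy. Linking
     * libbar.so with -Wl,-Bsymbolic would bind the call to libbar's
     * own foo() instead. */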
This design makes it impossible to create a self-contained libbar.so -- its behavior (which functions it ends up calling) depends on what other functions are linked into the process. This is also the opposite of how Windows DLLs work.
Creating self-contained DSOs was not a consideration at the time, since UNIX was effectively open-source.
You can change the rules with special linker flags, such as -Bsymbolic. But the rules get complicated very quickly, and (since that isn't the default) you may encounter bugs in the linker or the runtime loader.
Yes, this is a designed behavior. When you link a program into a binary, all references to named external (non-static) functions are resolved into entries in the binary's dynamic symbol table. The shared libraries that are linked against are recorded as DT_NEEDED entries.
Then, when you run the binary, the dynamic linker loads each required shared library to a suitable address and resolves each symbol to an address. Sometimes this is done lazily, and sometimes it is done once at first startup. If there are multiple symbols with the same name, one of them will be chosen by the linker, and your program will likely crash since you may not end up with the right one.
Note that this is the behavior on Linux, where all symbols live in a single flat namespace. Windows resolves symbols differently, using a tree topology, which has both advantages (fewer conflicts) and disadvantages (the inability to allocate memory in one library and free it in another).
The Linux behavior is very important if you want things like LD_PRELOAD to work. This allows you to use debugging tools like Electric Fence and CPU profiling tools like the Google performance tools, or replace a memory allocator at runtime. None of these things would work if symbols were preferentially resolved to their binary or shared library.
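As an illustration, here is a minimal sketch of the LD_PRELOAD mechanism those tools rely on -- a malloc() interposer (shim.c is an assumed file name):

    /* shim.c -- logs every malloc(), then forwards to the real one.
     * Build: cc -shared -fPIC shim.c -o shim.so -ldl
     * Use:   LD_PRELOAD=./shim.so ls */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    void *malloc(size_t size)
    {
        static void *(*real_malloc)(size_t);
        if (!real_malloc)
            real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");

        void *p = real_malloc(size);
        /* Caveat: fprintf (and dlsym on some libcs) may itself
         * allocate, so real tools add re-entrancy guards here. */
        fprintf(stderr, "malloc(%zu) = %p\n", size, p);
        return p;
    }

This only works because of the flat namespace: the preloaded shim.so is searched before libc, so its malloc() wins.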
The GNU dynamic linker does support symbol versions, however, so that it's possible to load multiple versions of a shared library into the same program. Oftentimes distros like Debian will do this with libraries they expect to change frequently, like OpenSSL. If the program uses liba which uses OpenSSL 1.0 and libb which uses OpenSSL 1.1, then the program should still function in such a case since OpenSSL has versioned symbols, and each library will use the appropriate version of the relevant symbol.
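A sketch of how such versioning looks from the library author's side, following the pattern in Drepper's dsohowto (the version names VERS_1.0/VERS_2.0 and file names are assumptions, not OpenSSL's actual scheme):

    /* foo.c -- two implementations of foo(), bound to versions.
     * Version script (libfoo.map):
     *   VERS_1.0 { global: foo; };
     *   VERS_2.0 { global: foo; } VERS_1.0;
     * Build: cc -shared -fPIC foo.c -o libfoo.so \
     *           -Wl,--version-script=libfoo.map */
    int foo_v1(void) { return 1; }
    __asm__(".symver foo_v1, foo@VERS_1.0");   /* old version */

    int foo_v2(void) { return 2; }
    __asm__(".symver foo_v2, foo@@VERS_2.0");  /* @@ marks the default */

    /* Binaries linked against the old library keep resolving
     * foo@VERS_1.0; new links bind to foo@@VERS_2.0, so both can be
     * satisfied in one process without clashing. */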
There's a well-known technique for interposing dynamically linked binaries: creating a shared library and using the LD_PRELOAD variable. But it doesn't work for statically linked binaries.
One way is to write a static library that interposes the functions and link it with the application at compile time. But this isn't practical because re-compiling isn't always possible (think of third-party binaries, libraries, etc).
So I am wondering if there's a way to interpose statically linked binaries in the same way LD_PRELOAD works for dynamically linked binaries, i.e., with no code changes or re-compilation of existing binaries.
I am only interested in ELF on Linux. So it's not an issue if a potential solution is not "portable".
One way is to write a static library that interposes the functions and link it with the application at compile time.
One difficulty with such an interposer is that it can't easily call the original function (since it has the same name).
The linker --wrap=<symbol> option can help here.
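For example, a hypothetical malloc wrapper (the names __wrap_malloc and __real_malloc are the ones the --wrap machinery itself prescribes):

    /* wrap.c -- with `cc main.o wrap.o -Wl,--wrap=malloc`, every call
     * to malloc() is redirected to __wrap_malloc(), while
     * __real_malloc() reaches the original definition. */
    #include <stdio.h>
    #include <stdlib.h>

    void *__real_malloc(size_t size);   /* resolved by the linker */

    void *__wrap_malloc(size_t size)
    {
        void *p = __real_malloc(size);
        fprintf(stderr, "malloc(%zu) = %p\n", size, p);
        return p;
    }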
But this isn't practical because re-compiling
Re-compiling is not necessary here, only re-linking.
isn't always possible (think of third-party binaries, libraries, etc).
Third-party libraries work fine (relinking), but binaries are trickier.
It is still possible to do using the displaced-execution technique, but the implementation is quite tricky to get right.
I'll assume you want to interpose symbols in the main executable which came from a static library, which is equivalent to interposing a symbol defined in the executable. The question thus reduces to whether it's possible to intercept a function defined in the executable.
This is not possible (EDIT: at least not without a lot of work - see comments to this answer) for two reasons:
by default, symbols defined in the executable are not exported and so are not accessible to the dynamic linker (you can alter this via --export-dynamic or export lists, but this has unpleasant performance or maintenance side effects)
even if you export the necessary symbols, ELF requires the executable's dynamic symtab to always be searched first during symbol resolution (see section 1.5.4, "Lookup Scope", in dsohowto); the symtab of an LD_PRELOAD-ed library will always follow that of the executable and thus won't get a chance to intercept the symbols -- the sketch below demonstrates this
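A quick demonstration of the second point, with assumed file names:

    /* main.c -- defines and calls greet() itself. */
    #include <stdio.h>

    void greet(void) { puts("executable greet"); }

    int main(void)
    {
        greet();
        return 0;
    }

    /* shim.c, compiled separately:
     *   #include <stdio.h>
     *   void greet(void) { puts("preloaded greet"); }
     *
     *   cc main.c -o main
     *   cc -shared -fPIC shim.c -o shim.so
     *   LD_PRELOAD=./shim.so ./main   # still prints "executable greet"
     *
     * The call to greet() binds inside the executable and never
     * reaches the dynamic linker, so the preloaded definition is
     * ignored -- unlike a call to, say, malloc(), which shim.so
     * could intercept. */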
What you are looking for is called binary instrumentation (e.g., using Dyninst or ptrace). The idea is you write a mutator program that attaches to (or statically rewrites) your original program (called mutatee) and inserts code of your choice at specific points in the mutatee. The main challenge usually revolves around finding those insertion points using the API provided by the instrumentation engine. In your case, since you are mainly looking for static symbols, this can be quite challenging and would likely require heuristics if the mutatee is stripped of non-dynamic symbols.
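To give a flavour of what such an instrumentation engine does under the hood, here is a rough sketch of planting a breakpoint with raw ptrace. It assumes x86-64 Linux, a non-PIE mutatee, and a function address obtained beforehand (e.g. with nm); error handling and restart logic are omitted:

    /* tracer.c -- usage: ./tracer ./target <hex-addr> */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/user.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        unsigned long addr = strtoul(argv[2], NULL, 16);
        pid_t pid = fork();

        if (pid == 0) {                      /* child: become the mutatee */
            ptrace(PTRACE_TRACEME, 0, NULL, NULL);
            execl(argv[1], argv[1], (char *)NULL);
            _exit(127);
        }

        waitpid(pid, NULL, 0);               /* stopped after execve */

        /* Plant an int3 (0xCC) over the first byte of the function. */
        long orig = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, NULL);
        ptrace(PTRACE_POKETEXT, pid, (void *)addr,
               (orig & ~0xffL) | 0xcc);

        ptrace(PTRACE_CONT, pid, NULL, NULL);
        waitpid(pid, NULL, 0);               /* SIGTRAP: breakpoint hit */

        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, pid, NULL, &regs);
        printf("hit breakpoint at 0x%llx\n", regs.rip - 1);

        /* Restore the original word, rewind RIP, and resume. */
        ptrace(PTRACE_POKETEXT, pid, (void *)addr, orig);
        regs.rip -= 1;
        ptrace(PTRACE_SETREGS, pid, NULL, &regs);
        ptrace(PTRACE_CONT, pid, NULL, NULL);
        waitpid(pid, NULL, 0);
        return 0;
    }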
One modern Linux security hardening tactic is to compile and link code with the option -Wl,-z,noexecstack; this marks the DLL or binary as not needing an executable stack. This condition can be checked using readelf or other means.
I have been working with uClibc and noticed that it produces objects (.so files) that do not have this flag set. Yet uClibc has a configuration option UCLIBC_BUILD_NOEXECSTACK which according to the help means:
Mark all assembler files as noexecstack, which will mark uClibc
as not requiring an executable stack. (This doesn't prevent other
files you link against from claiming to need an executable stack, it
just won't cause uClibc to request it unnecessarily.)
This is a security thing to make buffer overflows harder to exploit.
...etc...
Some digging into the Makefiles confirms this: the flag is only applied to assembler files.
Because the flag is only passed to the assembler, does this mean that the uClibc devs have missed an important hardening flag? There are other options, for example UCLIBC_BUILD_RELRO, which do result in the equivalent flag being passed to the linker (as -Wl,-z,relro).
However, a casual observer could easily misread this and assume, as I originally did, that UCLIBC_BUILD_NOEXECSTACK actually marks the .so file, when in fact it does not. OpenWRT, for example, ensures that the flag is set when it builds uClibc.
Why would uClibc not do things the 'usual' way? What am I missing here? Are the libraries (e.g. librt.so, libpthread.so, etc) actually not NX?
EDIT
I was able to play with the Makefiles and get the noexecstack bit by using the -Wl,-z,noexecstack argument. So why would they not use that as well?
OK, it turns out after mailing-list conversation and further research that:
the GNU linker sets the DLL / executable stack state based on the 'lowest common denominator', i.e. if any linked or referenced part has an exec stack then the whole object is marked that way
the 'correct' way to resolve this problem is actually to find and fix assembly / object files that use an exec stack when they don't need to (see the sketch below)
Using the linker to 'fix' things is a workaround if you can't otherwise fix the root cause.
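For reference, the root-cause fix looks like this (sketched for GAS/GCC; the section name is the real one, the file layout is assumed):

    /* In a hand-written .S file, the marker is the directive:
     *     .section .note.GNU-stack,"",@progbits
     * The same note can be emitted from a C file containing inline
     * assembly by putting this at the end of the file: */
    __asm__(".section \".note.GNU-stack\",\"\",@progbits");

    /* Objects lacking this note make GNU ld assume an executable
     * stack is required, which then taints the whole DSO's
     * PT_GNU_STACK header. Check the result with:
     *     readelf -lW <file> | grep GNU_STACK */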
So for uClibc the solution is to submit a bug report so that the underlying objects get fixed. Otherwise anything linked with its static libraries won't get a non-exec stack.
For my own question: if building custom firmware that doesn't use any static libraries, it is possibly sufficient to use the linker flag.
References:
Ubuntu Security Team - Executable Stacks
I have an object file which has a main() function inside and just needs to be linked with the crt... objects to become an executable. Unfortunately I can only compile; I cannot link it into an executable.
So I decided to create a C program (on a PC with a working GCC and linker) that attaches object(s) to the end of itself and executes the attached code at run time (simulating a linked object).
I saw the DL API, but I don't know how to use it for this problem.
Can somebody help me understand how I can execute code attached at the end of an executable?
Avoid doing that; it would be a mess, and it probably won't work reliably, at least if the program is dynamically linked to libc6.so (e.g. because of ASLR).
Just use shared objects, i.e. dynamically linked libraries (see the dynamic linker wikipage). You need to learn about dlopen(3) etc.
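For instance, a minimal dlopen(3) sketch (plugin.so and plugin_main are assumed names):

    /* main.c -- load a shared object and call into it at run time.
     * Build: cc main.c -o main -ldl
     * with a plugin built as: cc -shared -fPIC plugin.c -o plugin.so */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        void *handle = dlopen("./plugin.so", RTLD_NOW);
        if (!handle) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return 1;
        }

        /* plugin.so is assumed to define: int plugin_main(void); */
        int (*plugin_main)(void) =
            (int (*)(void))dlsym(handle, "plugin_main");
        if (!plugin_main) {
            fprintf(stderr, "dlsym: %s\n", dlerror());
            return 1;
        }

        printf("plugin returned %d\n", plugin_main());
        dlclose(handle);
        return 0;
    }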
If you really insist, take many weeks to learn a lot more: read Levine's book Linkers and Loaders, read Advanced Linux Programming, read many man pages (including execve(2), mmap(2), elf(5), ld.so(8), ...), study the kernel code for execve and mmap, the GNU libc and MUSL libc source code (for details about implementations of the dynamic linker), the x86-64 ABI or the ABI for your target processor (is it an ARM?), learn more about the GNU binutils, etc., etc.
In short, your life is too short doing such messy things, unless you are already an expert, e.g. able to implement your own dynamic linker.
Addenda
Apparently your real issue is using tinycc on ARM (under Android, I am guessing). I would then ask on their mailing list (and perhaps contribute a patch), or simply use binutils and write your own GNU ld linker script to make it work. Then the question becomes entirely different and completely unrelated to your original one. There might be some previous attempts to solve this, according to Google searches.
I have a portal in my university LAN where people can upload code to programming puzzles in C/C++. I would like to make the portal secure so that people cannot make system calls via their submitted code. There might be several workarounds but I'd like to know if I could do it simply by setting some clever gcc flags. libc by default seems to include <unistd.h>, which appears to be the basic file where system calls are declared. Is there a way I could tell gcc/g++ to 'ignore' this file at compile time so that none of the functions declared in unistd.h can be accessed?
Some particular reason why chroot("/var/jail/empty"); setuid(65534); isn't good enough (assuming 65534 has sensible limits)?
Restricting access to the header file won't prevent you from accessing libc functions: they're still available if you link against libc - you just won't have the prototypes (and macros) to hand; but you can replicate them yourself.
And not linking against libc won't help either: system calls could be made directly via inline assembler (or even tricks involving jumping into data).
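To see why, here is a sketch of a program that writes to stdout without a single libc declaration, by issuing the system call directly (x86-64 Linux assumed):

    /* raw.c -- write(2) with no headers, via inline assembly. */
    static long raw_write(int fd, const void *buf, unsigned long count)
    {
        long ret;
        __asm__ volatile ("syscall"
                          : "=a"(ret)
                          : "a"(1),          /* __NR_write on x86-64 */
                            "D"((long)fd),   /* rdi */
                            "S"(buf),        /* rsi */
                            "d"(count)       /* rdx */
                          : "rcx", "r11", "memory");
        return ret;
    }

    int main(void)
    {
        raw_write(1, "no headers needed\n", 18);
        return 0;
    }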
I don't think this is a good approach in general. Running the uploaded code in a completely self-contained virtual sandbox (via QEMU or something like that, perhaps) would probably be a better way to go.
-D can override individual function names. For example:
gcc file.c -Dchown -Dchdir
Or you can set the include guard yourself:
gcc file.c -D_UNISTD_H
However, their effects can easily be reverted with #undef by intelligent submitters :)