Manually patching an ELF binary: adjusting symbol / version names (Linux)

This post asks about modifying, patching, or even hex-editing (for lack of better methods; see Notes) an ELF binary to adjust the functions, symbols, and versions it points to while keeping ld.so happy. I'm also asking for tools that would ease this process.
Recompiling the binaries is out of scope for the solution I am looking for, as are disassembling the binary and using LD_* environment variables (not because they aren't valid solutions, but because my use case is different). The exception is solutions akin to the one here: writing a 'dummy' function that exposes the desired soname and tricks ld.so in some clever way.
There are two situations for which I'd appreciate help from the community:
The first one, binary_a, has the following unmet dependency, among others:
$ ldd binary_a
binary_a: /usr/lib/x86_64-linux-gnu/libcdio.so.13: version `CDIO_13' not found (required by binary_a)
which is normal, since my OS has a newer version of libcdio, namely libcdio.so.17.
My goal is to try to adjust the binary to use that one instead.
What I did:
used a hex editor (since I couldn't find a tool for this) to change CDIO_13 -> CDIO_17 in the .dynstr section (yes, this version appears in libcdio.so.17; I checked with objdump -SdV libcdio.so.17 or objdump -T libcdio.so.17)
used patchelf (or, again, a hex editor) as follows:
$ patchelf --replace-needed libcdio.so.13 libcdio.so.17 binary_a
but this doesn't seem to work: ldd still complains (strictly speaking it is ld.so that complains, since ldd is just a bash wrapper script around it):
$ LD_DEBUG=libs ldd ./binary_a | fgrep -i libcdio.
30345: find library=libcdio.so.17 [0]; searching
30345: trying file=/usr/lib/x86_64-linux-gnu/libcdio.so.17
30345: /usr/lib/x86_64-linux-gnu/libcdio.so.17: error: version lookup error: version `CDIO_17' not found (required by ./binary_a) (continued)
./binary_a: /usr/lib/x86_64-linux-gnu/libcdio.so.17: version `CDIO_17' not found (required by ./binary_a)
libcdio.so.17 => /usr/lib/x86_64-linux-gnu/libcdio.so.17 (0x00007fbadca1e000)
To confirm: at this point there are no references to the old .so or to the old version (CDIO_13) in the binary, the soname of the dependency (see soname explained) is valid, and objdump / readelf have no issues with the modified binary: they see the new version, CDIO_17, throughout.
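readelf and objdump mostly show the strings. ld.so checks records as well. Version requirements live in .gnu.version_r as a chain of Elf64_Verneed entries, one per needed library and named by vn_file. Each entry has its own chain of Elf64_Vernaux entries, one per required version, carrying vna_hash, vna_flags and vna_name. The following is a minimal sketch of locating the Vernaux record for a given version name. It assumes a 64-bit binary whose .gnu.version_r and .dynstr sections have already been read into memory; find_vernaux is a hypothetical helper, not part of any library.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: walk the Elf64_Verneed chain of an in-memory
 * .gnu.version_r section and return the Elf64_Vernaux record whose version
 * name (looked up in .dynstr) equals `version`, or NULL if there is none.
 * `count` is the number of Elf64_Verneed entries (DT_VERNEEDNUM, or the
 * section header's sh_info). */
Elf64_Vernaux *find_vernaux(unsigned char *verneed, size_t count,
                            const char *dynstr, const char *version)
{
    unsigned char *p = verneed;
    for (size_t i = 0; i < count; i++) {
        Elf64_Verneed *vn = (Elf64_Verneed *)p;
        /* vn_aux: byte offset from this Verneed to its first Vernaux. */
        unsigned char *q = p + vn->vn_aux;
        for (unsigned j = 0; j < vn->vn_cnt; j++) {
            Elf64_Vernaux *aux = (Elf64_Vernaux *)q;
            if (strcmp(dynstr + aux->vna_name, version) == 0)
                return aux;
            q += aux->vna_next;   /* byte offset to the next Vernaux */
        }
        if (vn->vn_next == 0)     /* last Verneed in the chain */
            break;
        p += vn->vn_next;
    }
    return NULL;
}
```

The returned pointer can be used to inspect or patch vna_hash, vna_flags and vna_other. The modified bytes would then be written back at the same file offset.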
My guess is that ld.so rejects the binary for one of these reasons. The vna_hash field of the Elfxx_Vernaux struct (see 3 for reference) may no longer match after the edit. Some other anti-tampering safeguard in the ELF format may make ld.so invalidate the binary. Or ld.so's cache may somehow be interfering.
Or am I hitting the case mentioned in "2 s.o. same name": "if two .so file has same function name, only the first one would accepted."?
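On the vna_hash guess: as far as I can tell from glibc's elf/dl-version.c, the version check first compares vna_hash with the vd_hash of each version definition in the library. Only when the hashes match does it compare the names. Renaming the string without updating vna_hash would therefore be expected to produce exactly this "version not found" error. Below is a minimal sketch of recomputing the hash. It assumes a 64-bit binary and a version string that has already been edited in .dynstr. refresh_vernaux_hash is a hypothetical helper. elf_hash is the standard System V ELF hash, the same one used for DT_HASH.

```c
#include <elf.h>
#include <stdint.h>

/* Standard System V ELF hash, computed in 32 bits as the ABI specifies. */
static uint32_t elf_hash(const char *name)
{
    const unsigned char *s = (const unsigned char *)name;
    uint32_t h = 0, g;
    while (*s) {
        h = (h << 4) + *s++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* Hypothetical helper: after the version name referenced by aux->vna_name
 * has been changed in .dynstr, store the matching hash in vna_hash. */
void refresh_vernaux_hash(Elf64_Vernaux *aux, const char *dynstr)
{
    aux->vna_hash = (Elf64_Word)elf_hash(dynstr + aux->vna_name);
}
```

This fixes only the requirement side. The target library must still define the version (for example CDIO_17) in its own version definitions.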
Removing libcdio (and some other CD/DVD-related libraries) should, in theory, not affect the functionality of binary_a (a media player), since the laptop doesn't have a DVD drive anyway.
There is also the possibility of 'weakening' those symbols, but I am not sure whether that would suffice to make ld.so ignore them. I haven't tested it, but according to 5 it should bring me closer to my goal.
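A related but different knob is the vna_flags field of the version requirement itself. <elf.h> defines VER_FLG_WEAK for this purpose. glibc appears to treat a missing version as non-fatal when its requirement carries this flag, reporting it only in verbose mode instead of aborting. That relaxes only the version check: individual symbol lookups may still fail at binding time with an undefined-symbol error. Below is a minimal sketch, assuming a 64-bit binary whose .gnu.version_r and .dynstr are already in memory. weaken_version_requirements is a hypothetical helper name.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: set VER_FLG_WEAK on every Elf64_Vernaux listed under
 * the Elf64_Verneed entry whose vn_file names `libname`. `count` is the
 * number of Elf64_Verneed entries (DT_VERNEEDNUM). Returns how many
 * requirements were changed (already-weak ones are not counted). */
int weaken_version_requirements(unsigned char *verneed, size_t count,
                                const char *dynstr, const char *libname)
{
    int changed = 0;
    unsigned char *p = verneed;
    for (size_t i = 0; i < count; i++) {
        Elf64_Verneed *vn = (Elf64_Verneed *)p;
        if (strcmp(dynstr + vn->vn_file, libname) == 0) {
            unsigned char *q = p + vn->vn_aux;
            for (unsigned j = 0; j < vn->vn_cnt; j++) {
                Elf64_Vernaux *aux = (Elf64_Vernaux *)q;
                if (!(aux->vna_flags & VER_FLG_WEAK)) {
                    aux->vna_flags |= VER_FLG_WEAK;
                    changed++;
                }
                q += aux->vna_next;
            }
        }
        if (vn->vn_next == 0)
            break;
        p += vn->vn_next;
    }
    return changed;
}
```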
The second case involves binary_b, which depends on libncurses.so.6, version NCURSES6_5.0.19991023.
My goal here is to force it to use libncurses.so.5, which only exposes version NCURSES_5.0.19991023. Notice the missing '6' after 'NCURSES': hex-editing the name would change the length of the corresponding entry in the .dynstr table.
As with the previous binary, this version exists in libncurses.so.5, and ldd finds and reads the file.
Notes:
(1) I've tried elfsh (which, after I fixed some compile errors, fails because it does not recognise the ELF header) and rizin/cutter...
I've also tried patchelf (before making any modification), which fails with the following command, presumably because the file is already stripped:
- $ patchelf --remove-needed libcdio.so.13 binary_a
- $ ldd binary_a
Inconsistency detected by ld.so: dl-version.c: 205: _dl_check_map_versions: Assertion `needed != NULL' failed!
(2) objcopy and elfedit seem like good candidates. However, their man pages say little about whether (and how) this is possible, and my understanding of shared libraries in general remains meager, so I'd appreciate some good pointers here!
(3) The other idea was to manually add a new dummy version entry to the .dynstr table. This is not trivial (see the dt_hash article), so I'd need a step-by-step walkthrough. I would add NCURSES_5.0.19991023, take its offset, and write that offset into the vna_name field of the corresponding Elfxx_Vernaux structure, and/or overwrite the vna_flags field (?). I'm not sure (see 3).
Reference 5 deals with a similar but simpler case, one in which the versions already exist in the binary, and "Trick ld" describes another. I could perhaps use those solutions, provided I first managed to add the needed version to the binary somehow...
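The string-table side of idea (3) can be sketched as follows. ELF string tables are addressed by byte offset, and an offset may point into the middle of an existing entry, so an already present string (or suffix) can be reused. Otherwise the string is appended, and its offset becomes the new vna_name, with vna_hash recomputed from the new name. This is a minimal sketch on an in-memory copy only, and the helper names are hypothetical. Placing the grown table back into the file is the hard part and is not shown: it needs a new location, updated DT_STRTAB/DT_STRSZ entries, and a PT_LOAD segment covering it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the offset at which `s` already occurs as a
 * NUL-terminated string in tab[0..size), or -1. A match may start in the
 * middle of a longer entry, because string-table offsets can share suffixes. */
long strtab_find(const char *tab, size_t size, const char *s)
{
    size_t len = strlen(s);
    for (size_t off = 0; off + len < size; off++)
        if (memcmp(tab + off, s, len + 1) == 0)   /* compare including NUL */
            return (long)off;
    return -1;
}

/* Hypothetical helper: append `s` and its NUL to a heap-allocated copy of a
 * string table. Returns the new string's offset and updates *tab and *size,
 * or returns (size_t)-1 if realloc fails (the old table is left intact). */
size_t strtab_append(char **tab, size_t *size, const char *s)
{
    size_t len = strlen(s) + 1;
    char *grown = realloc(*tab, *size + len);
    if (grown == NULL)
        return (size_t)-1;
    memcpy(grown + *size, s, len);
    size_t off = *size;
    *tab = grown;
    *size += len;
    return off;
}
```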
Any pointers on what is wrong (why ldd complains) and how to achieve my goals would be much appreciated. Thanks!
References:
NewAppsOnOldGlibc
Symbol Versioning in the ELF header

Related

Linux ELF shared library issue

I am currently working with ELF files and trying to deal with loading SO files. I am trying to "forcibly" link a new SO dependency (a fake one, without actual calls into its code) into an executable file. To do that, I modified the .dynstr section contents: I created a new section, filled it with the new contents, and fixed up all sh_link fields of the Elf64_Shdr entries. I also modified the .dynamic section. It has more than one null entry, so I changed one of them to DT_NEEDED, pointing to the needed third-party SO name.
When analysed, my small test app appears to be fine (as readelf -d and objdump -p show). Nevertheless, when I try to run it, it says:
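The .dynamic step described above (reusing a spare DT_NULL entry) might look like the following minimal sketch. It assumes a 64-bit file whose .dynamic array has been read into memory; add_needed_in_spare_slot is a hypothetical name. The DT_NEEDED value is an offset into whatever string table DT_STRTAB points to at run time. That table is not necessarily the section called .dynstr in the section headers.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: turn the first DT_NULL entry of a .dynamic array into
 * DT_NEEDED, but only if another DT_NULL follows it, so the array stays
 * terminated. `name_off` is the library name's offset in the run-time string
 * table. Returns 0 on success, -1 if there is no spare slot. */
int add_needed_in_spare_slot(Elf64_Dyn *dyn, size_t n, Elf64_Xword name_off)
{
    for (size_t i = 0; i < n; i++) {
        if (dyn[i].d_tag == DT_NULL) {
            if (i + 1 >= n || dyn[i + 1].d_tag != DT_NULL)
                return -1;          /* only the terminator is left */
            dyn[i].d_tag = DT_NEEDED;
            dyn[i].d_un.d_val = name_off;
            return 0;
        }
    }
    return -1;
}
```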
error while loading shared libraries: ��oU: cannot open shared object file: No such file or directory
The name is different on every run, which makes me think some addresses in the loaded ELF are invalid.
I understand that this way of patching is highly error-prone, but I am interested anyway. So my question is: are there any ELF tools (like gdb or strace) that can debug the image loading process, i.e. tell you what is wrong before the entry point is reached? Or are there any switches or options that can help in this situation?
I have tried things like strace -d, but it didn't reveal anything interesting.
You do not mention patching DT_STRTAB and DT_STRSZ. These tags control how the dynamic loader locates the dynamic string table. The section headers are only used by the link editor, not at run time.
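A minimal sketch of that fix follows. It assumes a 64-bit file with .dynamic already in memory, and a new string table already placed somewhere the loader maps. retarget_dynstr is a hypothetical name. DT_STRTAB holds an address in the loaded image, not a file offset.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: point the dynamic loader at a relocated string table
 * by rewriting DT_STRTAB (an address) and DT_STRSZ (a size in bytes).
 * Stops at the first DT_NULL. Returns how many of the two tags were updated. */
int retarget_dynstr(Elf64_Dyn *dyn, size_t n, Elf64_Addr new_vaddr,
                    Elf64_Xword new_size)
{
    int updated = 0;
    for (size_t i = 0; i < n && dyn[i].d_tag != DT_NULL; i++) {
        if (dyn[i].d_tag == DT_STRTAB) {
            dyn[i].d_un.d_ptr = new_vaddr;
            updated++;
        } else if (dyn[i].d_tag == DT_STRSZ) {
            dyn[i].d_un.d_val = new_size;
            updated++;
        }
    }
    return updated;
}
```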
First of all, I did not find any way to debug this sanely. My solution came only from manually analysing the raw ELF file's hex bytes, the hard way.
My approach was right in general, though I forgot to mention the DT_STRTAB and DT_STRSZ modification (thanks to Florian Weimer for the reminder). The patchelf utility (see the postscript below) confirmed that I was on the right track.
The key point: when you add a new section at the end of the file, make sure you put the data into the PLT correctly. To add a new ".dynstr" section, I had to overwrite an auxiliary note segment (Elf**_Phdr::p_type == PT_NOTE) with a new segment for the new ".dynstr" section data. I'm not yet sure whether such overwriting might cause errors.
It turned out that I had stored a raw file ('offline') offset where I needed this data's RVA in the running image, i.e. its address after the system loader has mapped the ELF into memory ('online'). Once I fixed that, the ELF worked properly.
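The file-offset-to-address translation described above can be done with the PT_LOAD program headers: inside a segment, vaddr = p_vaddr + (offset - p_offset). A minimal sketch follows, assuming a 64-bit file whose program headers are in memory; offset_to_vaddr is a hypothetical name. For a position-independent executable or a shared object, the result is relative to the load base, which the loader adds to DT_STRTAB itself.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: translate a file offset into the virtual address it
 * will have once the loader maps the file, using the PT_LOAD segments.
 * Returns 0 and stores the address in *vaddr on success, or -1 if no
 * PT_LOAD segment's file image covers the offset. */
int offset_to_vaddr(const Elf64_Phdr *phdr, size_t phnum, Elf64_Off off,
                    Elf64_Addr *vaddr)
{
    for (size_t i = 0; i < phnum; i++) {
        const Elf64_Phdr *ph = &phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        if (off >= ph->p_offset && off - ph->p_offset < ph->p_filesz) {
            *vaddr = ph->p_vaddr + (off - ph->p_offset);
            return 0;
        }
    }
    return -1;
}
```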
P.S. I found a somewhat similar question: How can I change the filename of a shared library after building a program that depends on it? (A useful utility for exactly this purpose, patchelf, is mentioned there. It is available in Debian via APT and is a nice tool for the job.)

When a shared library is loaded, is it possible that it references something in the current binary?

Say I have a binary server, and when it's compiled, it's linked from server.c, static_lib.a, and dynamically with dynamic_lib.so.
When server is executed, it loads dynamic_lib.so, and on that code path dynamic_lib.so expects some symbols from static_lib.a. What I'm seeing is that dynamic_lib.so pulls in static_lib.so, so I essentially have two copies of static_lib in memory.
Let's assume there's no way we can change dynamic_lib.so, because it's a 3rd-party library.
My question is: is it possible to make dynamic_lib.so (or ld itself) search the current binary first? Or even to skip ld's search path entirely and just use the binary's symbol, or abort?
I tried to find some related docs, but it's not easy for linker noobs like me :-)
You cannot stop the library from loading static_lib.so, but you can trick it into using static_lib.a instead.
By default ld does not export any symbols from executables, but you can change this via -rdynamic. This option is quite crude, as it exports all global symbols. For finer-grained control you can use -Wl,--dynamic-list (see the example use in the Clang sources).

Binary linked against different shared libraries of the same package

I have two shared libraries that conflict with each other, and other binaries linked against them. More specifically, I have something like this:
top-lib1.so linked with libprotobuf.so;
top-lib2.so linked with libprotobuf-lite.so;
binary linked with top-lib1.so and top-lib2.so.
The problem is that when I launch my binary, it crashes due to memory corruption caused by a double free: the first free comes from protobuf.so and the second from protobuf-lite.so (see related bug).
I don't have access to the top-lib2.so sources, and I can't link top-lib1.so with protobuf-lite.so because of my app's functionality.
Thus my question is: how to deal with it?
I can't leave both due to this crash, I can't re-link my lib (top-lib1.so) with libprotobuf-lite.so, and I can't change top-lib2.so.
Is there any way to re-link top-lib2.so with libprotobuf.so without sources? Or is there any other possibility?
You do have a few choices.
The upstream bug you mentioned states that "libprotobuf.so has everything libprotobuf-lite.so has, and more". If that is indeed the case, one possible solution is to binary-patch top-lib2.so so that its DT_NEEDED entry in the .dynamic section references libprotobuf.so instead of libprotobuf-lite.so. The name itself is stored in .dynstr. The former name is shorter, so simply overwriting the string libprotobuf-lite.so with libprotobuf.so\0e.so is all you should need.
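The in-place overwrite described above can be sketched as below; patch_string_in_place is a hypothetical name. DT_NEEDED refers to the name by offset. Writing the shorter string plus its NUL over the old one therefore leaves every offset valid, and the leftover bytes ("e.so") become unreferenced padding. The function is a plain byte search. Check that the match is the string the DT_NEEDED entry actually points at. Also check that no other offset points into the middle of it, since string tables may share suffixes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrite the first NUL-terminated occurrence of
 * `from` in buf[0..len) with `to` (which must not be longer), copying the
 * new terminating NUL too. The remaining old bytes are left in place, so no
 * other offset in the buffer moves. Returns 0 on success, -1 if `to` is
 * longer than `from` or `from` is not found. */
int patch_string_in_place(unsigned char *buf, size_t len,
                          const char *from, const char *to)
{
    size_t from_len = strlen(from), to_len = strlen(to);
    if (to_len > from_len)
        return -1;
    for (size_t i = 0; i + from_len < len; i++) {
        if (memcmp(buf + i, from, from_len + 1) == 0) {
            memcpy(buf + i, to, to_len + 1);
            return 0;
        }
    }
    return -1;
}
```

Applied to "libprotobuf-lite.so" -> "libprotobuf.so", this leaves exactly the byte sequence libprotobuf.so\0e.so\0 that the answer mentions.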
If you don't want to binary-patch top-lib2.so, you have other choices:
You could link all of the object files comprising top-lib1.so, and all of those comprising libprotobuf.so, into the main binary and hide all of libprotobuf's symbols in it (via a linker script). If you do that, top-lib2.so can't tell that there is anything except the libprotobuf-lite.so it expects.
You could do the same with top-lib1.so -- i.e. hide libprotobuf inside of it.
You could link your copy of libprotobuf.so with -Wl,--default-symver, which attaches a default version named after the library (libprotobuf.so) to every symbol exported from libprotobuf.so. This avoids the symbol collision that causes the problem in the first place.

A workaround for the “Template Haskell + C” bug?

I've got the following situation:
Library X is a wrapper over some code in C.
Library A depends on library X.
Library B uses Template Haskell and depends on library A.
GHC bug #9010 makes it impossible to install library B using GHC 7.6. When TH is processed, GHCi fires up and tries to load library X, which fails with a message like
Loading package charsetdetect-ae-1.0 ... linking ... ghc:
~/.cabal/lib/x86_64-linux-ghc-7.6.3/charsetdetect-ae-1.0/
libHScharsetdetect-ae-1.0.a: unknown symbol `_ZTV15nsCharSetProber'
(the actual name of the “unknown symbol” differs from machine to machine).
Are there any workarounds for this problem (apart from “don't use Template Haskell”, of course)? Maybe library X has to be compiled differently, or there's some way to stop it from loading (as it shouldn't be called during code generation anyway)?
This is really one of the main reasons that 7.8 switched to dynamic GHCi by default. Rather than try to support every feature of every object file format, it builds dynamic libraries and lets the system dynamic loader handle them.
Try building with the g++ option -fno-weak. From the g++ man page:
-fno-weak
Do not use weak symbol support, even if it is provided by the linker. By default, G++ will use weak symbols if they are available. This option exists only for testing, and should not be used by end-users; it will result in inferior code and has no benefits. This option may be removed in a future release of G++.
There is another issue with __dso_handle. I found that you can at least get the library to load and apparently work by linking in a file which defines that symbol. I don't know whether this hack will cause anything to go wrong.
So, in X.cabal, add:
if impl(ghc < 7.8)
  cc-options: -fno-weak
  c-sources: cbits/dso_handle.c
where cbits/dso_handle.c contains
void *__dso_handle;

How to restrict access to symbols in shared object?

I have a plug-in in the form of a shared library (bar.so) that links into a larger program (foo). Both foo and bar.so depend on the same third party library (baz) but they need to keep their implementations of baz completely separate. So when I link foo (using the supplied object files and archives) I need it to ignore any use of baz in bar.so and vice versa.
Right now if I link foo with --trace-symbol=baz_fun where baz_fun is one of the offending symbols I get the following output:
bar.so: definition of baz_fun
foo/src.a(baz.o): reference to baz_fun
I believe this is telling me that foo is referencing baz_fun from bar.so (and execution of foo confirms this).
Solutions that I have tried:
Using objcopy to "localize" the symbols of interest: objcopy --localize-symbols=local.syms bar.so where local.syms contains all of the symbols of interest. I think I might just be confused here and maybe "local" doesn't mean what I think it means. Regardless, I get the same output from the link above. I should note that if I run the nm tool on bar.so prior to using objcopy all of the symbols in question have the T flag (upper-case indicating global) and after objcopy they have a t indicating they are local now. So it appears I am using objcopy correctly.
Compiling with -fvisibility=hidden. However, due to other constraints I need to use GCC 3.3, which doesn't appear to support that feature. I might be able to upgrade to a newer version of GCC, but I would like confirmation that compiling with this flag will help before heading down that road.
Other things to note:
I do not have access to the source code of either foo or baz
I would prefer to keep all of my plug-in in one shared object (bar.so). baz is actually a licensing library, so I don't want it separated.
Use dlopen to load your plugin with the RTLD_DEEPBIND flag.
Edit: please note that RTLD_DEEPBIND is Linux-specific and needs glibc 2.3.4 or newer.
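A minimal sketch of that suggestion follows. It assumes glibc on Linux; on glibc older than 2.34, link with -ldl. load_plugin_isolated is a hypothetical name, and the sketch also assumes the loading code can be changed to dlopen bar.so instead of linking it directly. RTLD_DEEPBIND makes the plug-in prefer its own definitions, and those of its dependencies, over the global scope, which includes the main program. RTLD_LOCAL keeps the plug-in's symbols out of the global scope, so foo does not bind to bar.so's copy of baz either.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load a plug-in so that its references to baz resolve
 * to its own copy (RTLD_DEEPBIND), while its symbols stay out of the global
 * scope (RTLD_LOCAL). Returns the handle, or NULL after printing dlerror(). */
void *load_plugin_isolated(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL | RTLD_DEEPBIND);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    return handle;
}
```

Plug-in entry points would then be fetched with dlsym(handle, "...") rather than called directly.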
