Force mapping between symbols and shared libraries - linux

I have an executable with four shared libraries, and the dependency tree looks like this: the executable app does a dlopen of foo.so and bar.so; foo.so in turn links to fooHelper.so, and bar.so links to barHelper.so.
Now, the issue is that fooHelper.so and barHelper.so have some of the same symbols. For instance, let us say we have a func with different implementations in fooHelper.so and barHelper.so. Is there a way to force foo.so to use fooHelper.so's implementation and bar.so to use barHelper.so's? What happens at present is that, depending on the order in which the helpers are linked, only one of the implementations of func is used by both foo.so and bar.so. This is because of the default Unix linkage model: if a definition of a symbol is already loaded, any other definitions from shared libraries loaded subsequently are simply discarded. Basically, func will be picked up from whichever helper library was linked first. I need a way to explicitly specify the appropriate mapping without changing the source code of the shared libraries.
I'm working on Linux with g++ 4.4.

Is there a way to force foo.so to use fooHelper.so's implementation and bar.so to use barHelper.so's?
Yes: that's what RTLD_LOCAL is for (when dlopening foo.so and bar.so).
RTLD_LOCAL
This is the converse of RTLD_GLOBAL, and the default if neither flag
is specified. Symbols defined in this library are not made available
to resolve references in subsequently loaded libraries.
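A minimal sketch of the dlopen calls in the main application, assuming the library names from the question: since foo.so links against fooHelper.so and bar.so against barHelper.so, opening each with RTLD_LOCAL keeps each dependency tree in its own lookup scope, so each library resolves func from its own helper.
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* RTLD_LOCAL (the default) keeps the symbols of each library,
           and of its dependencies, out of the global scope */
        void *foo = dlopen("./foo.so", RTLD_NOW | RTLD_LOCAL);
        void *bar = dlopen("./bar.so", RTLD_NOW | RTLD_LOCAL);
        if (!foo || !bar) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }
        /* ... use the libraries via dlsym ... */
        dlclose(bar);
        dlclose(foo);
        return 0;
    }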

If both funcs happen to be in the same namespace, you're in a bit of trouble - if you are programming in C. The term to look for is "function overloading". There have been previous discussions on this topic, e.g. this one:
function overloading in C
EDIT: http://litdream.blogspot.de/2007/03/dynamic-loading-using-dlopen-api-in-c.html

Related

Why are some foreign functions statically linked while others are dynamically linked?

I'm working on a program that needs to manipulate git repositories. I've decided to use libgit2. Unfortunately, the haskell bindings for it are several years out of date and lack several functions that I require. Because of this I've decided to write the portions that use libgit2 in C and call them through the FFI. For demonstration purposes one of them is called git_update_repo.
git_update_repo works perfectly when used in a pure C program, however when it's called from haskell an assertion fails indicating that the libgit2 global init function, git_libgit2_init, hasn't been called. But, git_libgit2_init is called by git_update_repo. And if I use gdb I can see that git_libgit2_init is indeed called and reports that the initialization has been successful.
I've used nm to examine the executables and found something interesting. In a pure C executable, all the libgit2 functions are dynamically linked (as expected). However, in my haskell executable, git_libgit2_init is dynamically linked, while the rest of the libgit2 functions are statically linked. I'm certain that this mismatch is the cause of my issue.
So why do certain functions get linked dynamically and others statically? How can I change this?
The relevant settings in my .cabal file are
cc-options: -g
c-sources:
  src/git-bindings.c
extra-libraries:
  git2

Compile part of all dependencies as shared libraries

Say I got (regular source) libraries A and B and executable E which depends on both.
Now, I want E to include the object files of A directly, whereas B should be added as a shared library (concrete use: B contains shared types of a plugin architecture). How would I do that with existing tools, preferably stack?
Is that possible or is it rather an all-or-nothing choice (use only shared libraries or link everything into the same binary)?
Optimally, I'd like to specify for each dependency if it should be linked statically or dynamically. Also, that should probably go into the .cabal file, but we have to work with what we got...
(Well, technically both are statically linked, but in the second case the object code is split up into different files; you get the idea.)

Dynamic loading and weak symbol resolution

While analyzing this question, I found out some things about the behavior of weak symbol resolution in the context of dynamic loading (dlopen) on Linux. Now I'm looking for the specifications governing this.
Let's take an example. Suppose there is a program a which dynamically loads libraries b.so and c.so, in that order. If c.so depends on two other libraries foo.so (actually libgcc.so in that example) and bar.so (actually libpthread.so), then usually symbols exported by bar.so can be used to satisfy weak symbol linkages in foo.so. But if b.so also depends on foo.so but not on bar.so, then these weak symbols will apparently not be linked against bar.so. It seems as if foo.so's linkages only look for symbols from a and b.so and all their dependencies.
This makes sense, to some degree, since otherwise loading c.so might change the behavior of foo.so at some point where b.so has already been using the library. On the other hand, in the question that got me started this caused quite a bit of trouble, so I wonder whether there is a way around this problem. And in order to find ways around, I first need a good understanding about the very exact details how symbol resolution in these cases is specified.
What is the specification or other technical document to define correct behavior in these scenarios?
Unfortunately, the authoritative documentation is the source code. Most distributions of Linux use glibc or its fork, eglibc. In the source code for both, the file that should document dlopen() reads as follows:
manual/libdl.texi
#c FIXME these are undocumented:
#c dladdr
#c dladdr1
#c dlclose
#c dlerror
#c dlinfo
#c dlmopen
#c dlopen
#c dlsym
#c dlvsym
What technical specification there is can be drawn from the ELF specification and the POSIX standard. The ELF specification is what makes a weak symbol meaningful. POSIX is the actual specification for dlopen() itself.
This is what I find to be the most relevant portion of the ELF specification.
When the link editor searches archive libraries, it extracts archive
members that contain definitions of undefined global symbols. The
member’s definition may be either a global or a weak symbol.
The ELF specification makes no reference to dynamic loading, so the rest of this paragraph is my own interpretation. The reason I find the above relevant is that resolving symbols occurs at a single "when". In the example you give, when program a dynamically loads b.so, the dynamic loader attempts to resolve undefined symbols. It may end up doing so with either global or weak symbols. When the program then dynamically loads c.so, the dynamic loader again attempts to resolve undefined symbols. In the scenario you describe, symbols in b.so were resolved with weak symbols. Once resolved, those symbols are no longer undefined. It doesn't matter whether global or weak symbols were used to define them. They're already no longer undefined by the time c.so is loaded.
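As an illustration (not taken from the libraries above; optional_hook is a made-up name standing in for symbols such as the pthread functions that libgcc refers to weakly), this is roughly what such a weak reference looks like, and its resolution happens exactly once, when the object containing it is loaded:
weakref.c
    #include <stdio.h>

    /* weak reference: if nothing in this object's lookup scope defines
       optional_hook at the moment the object is loaded, its address
       stays NULL instead of causing an unresolved-symbol error */
    extern void optional_hook(void) __attribute__((weak));

    void use_hook(void)
    {
        if (optional_hook)
            optional_hook();   /* a definition was visible at load time */
        else
            puts("optional_hook is unresolved");
    }
Compile: gcc weakref.c -shared -fpic -o weakref.so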
The ELF specification gives no precise definition of what a link editor is or when the link editor must combine object files. Presumably it's a non-issue because the document has dynamic-linking in mind.
POSIX describes some of the dlopen() functionality but leaves much up to the implementation, including the substance of your question. POSIX makes no reference to the ELF format or weak symbols in general. For systems implementing dlopen() there need not even be any notion of weak symbols.
http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlopen.html
POSIX compliance is part of another standard, the Linux Standard Base. Linux distributions may or may not choose to follow these standards and may or may not go to the trouble of being certified. For example, I understand that a formal Unix certification by Open Group is quite expensive -- hence the abundance of "Unix-like" systems.
An interesting point about the standards compliance of dlopen() is made on the Wikipedia article for dynamic loading. dlopen(), as mandated by POSIX, returns a void*, but C, as mandated by ISO, says that a void* is a pointer to an object and such a pointer is not necessarily compatible with a function pointer.
The fact remains that any conversion between function and object
pointers has to be regarded as an (inherently non-portable)
implementation extension, and that no "correct" way for a direct
conversion exists, since in this regard the POSIX and ISO standards
contradict each other.
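Concretely, the conversion in question looks like this (my_func is a hypothetical symbol name); the direct cast is the common idiom in practice, even though ISO C leaves the conversion undefined:
    #include <dlfcn.h>

    typedef void (*func_t)(void);

    void call_from(void *handle)
    {
        /* dlsym returns a void *; turning it into a function pointer
           is exactly the conversion ISO C does not define */
        func_t f = (func_t)dlsym(handle, "my_func");
        if (f)
            f();
    }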
The standards that do exist contradict one another, and what standards documents there are may not be especially meaningful anyway. Here's Ulrich Drepper writing about his disdain for Open Group and their "specifications".
http://udrepper.livejournal.com/8511.html
Similar sentiment is expressed in the post linked by rodrigo.
The reason I've made this change is not really to be more conformant
(it's nice but no reason since nobody complained about the old
behaviour).
After looking into it, I believe the proper answer to the question as you've asked it is that there is no right or wrong behavior for dlopen() in this regard. Arguably, once a search has resolved a symbol it is no longer undefined and in subsequent searches the dynamic loader will not attempt to resolve the already defined symbol.
Finally, as you state in the comments, what you describe in the original post is not correct. Dynamically loaded shared libraries can be used to resolve undefined symbols in previously dynamically loaded shared libraries. In fact, this isn't limited to undefined symbols in dynamically loaded code. Here is an example in which the executable itself has an undefined symbol that is resolved through dynamic loading.
main.c
    #include <dlfcn.h>

    void say_hi(void);

    int main(void) {
        void* symbols_b = dlopen("./dyload.so", RTLD_NOW | RTLD_GLOBAL);
        /* uh-oh, forgot to define this function */
        /* better remember to define it in dyload.so */
        say_hi();
        return 0;
    }
dyload.c
    #include <stdio.h>

    void say_hi(void) {
        puts("dyload.so: hi");
    }
Compile and run.
gcc-4.8 main.c -fpic -ldl -Wl,--unresolved-symbols=ignore-all -o main
gcc-4.8 dyload.c -shared -fpic -o dyload.so
$ ./main
dyload.so: hi
Note that the main executable itself was compiled as PIC.

Is there an equivalent of dyld for Linux?

Mac OS X provides a useful library for dynamic loading, called dyld. Among the many interesting functions for dynamic loading handling are functions to allow one to install callbacks which will be called by dyld whenever an image is loaded or unloaded, by dlopen and dlclose, respectively. Those functions are void _dyld_register_func_for_add_image(void (*func)(const struct mach_header* mh, intptr_t vmaddr_slide)) and void _dyld_register_func_for_remove_image(void (*func)(const struct mach_header* mh, intptr_t vmaddr_slide)), respectively.
I know it's not possible to have an exact port for Linux, because dyld functions deal with mach-o files and Linux uses ELF files.
So, is there an equivalent of the dyld library for Linux? Or, at least, is there an equivalent of those two functions, _dyld_register_func_for_add_image and _dyld_register_func_for_remove_image, in any Linux library? Or will I have to implement my own versions of these two myself? That would not be so hard, but I would have to find a way to make dlopen and dlclose call callback functions whenever they get called.
EDIT
To make things clearer, I need to make a library that has a callback function that must be called whenever an external library is dynamically loaded by dlopen. My callback function must perform some operations on any dynamically loaded library.
Yes, it is called dlopen(3), linked in with the standard -ldl library.
More precisely:
compile your plugin's source code with the -fPIC flag to get position-independent-code object files *.pic.o
make a shared library plugin by linking your *.pic.o files with gcc -shared (you can also link in other shared libraries).
use GCC function attributes, notably constructor and destructor functions (or static C++ data with explicit constructors & destructors, hence the name). Functions marked __attribute__((constructor)) are called at dlopen time of your plugin; those marked __attribute__((destructor)) in your plugin are called at dlclose time.
link the main program with the -rdynamic flag; this is useful and needed as soon as the plugin calls functions defined in the main program.
don't forget to declare your C++ plugin functions extern "C" (so the main program can look them up by their unmangled names).
use dlsym inside your main program to fetch function or data addresses inside your plugin (a minimal sketch of these steps follows this list).
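Here is a minimal, hedged sketch of those steps; plugin.c, host.c and plugin_answer are hypothetical names chosen for the example.
plugin.c
    #include <stdio.h>

    /* runs while dlopen("./plugin.so", ...) is loading the plugin */
    __attribute__((constructor))
    static void plugin_init(void) { puts("plugin loaded"); }

    /* runs while dlclose is unloading the plugin */
    __attribute__((destructor))
    static void plugin_fini(void) { puts("plugin unloaded"); }

    int plugin_answer(void) { return 42; }
host.c
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void) {
        void *h = dlopen("./plugin.so", RTLD_NOW);
        if (!h) { fprintf(stderr, "%s\n", dlerror()); return 1; }

        int (*answer)(void) = (int (*)(void))dlsym(h, "plugin_answer");
        if (answer)
            printf("plugin_answer() = %d\n", answer());

        dlclose(h);
        return 0;
    }
Compile (add -rdynamic to the second line if the plugin calls back into the program):
    gcc plugin.c -shared -fPIC -o plugin.so
    gcc host.c -ldl -o host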
There are indeed no hooks for dlopen comparable to what _dyld_register_func_for_add_image provides. You may want to use constructor functions and/or dl_iterate_phdr(3) to mimic that, as in the sketch below.
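Note that dl_iterate_phdr does not notify you when an object is loaded; it walks whatever is loaded at the moment you call it. Calling it from a constructor function in each plugin, or right after each dlopen, is roughly how it can mimic the add-image callback. A minimal sketch (the file name list_objects.c is just for the example):
list_objects.c
    #define _GNU_SOURCE
    #include <link.h>
    #include <stdio.h>

    /* called once per currently loaded ELF object */
    static int show_object(struct dl_phdr_info *info, size_t size, void *data)
    {
        (void)size; (void)data;  /* unused here */
        printf("loaded: %s (base address %p)\n",
               info->dlpi_name[0] ? info->dlpi_name : "(main program)",
               (void *)info->dlpi_addr);
        return 0;  /* keep iterating */
    }

    int main(void)
    {
        dl_iterate_phdr(show_object, NULL);
        return 0;
    }
Compile: gcc list_objects.c -o list_objects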
If you can change the plugin (the shared object which you dlopen), you could play constructor tricks inside it to mimic such hooks. Otherwise, use some convention of your own (e.g. a plugin that has a module_start function gets that module_start function called just after dlopen, etc.).
Some libraries wrap dlopen into something of a higher level. For example, Qt has QPluginLoader & QLibrary, etc.
There is also the LD_PRELOAD trick (perhaps you might redefine your own dlopen & dlclose through such a trick and have your modified functions run the hooks; a sketch follows below). The ifunc function attribute might also be relevant.
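As a hedged sketch of that LD_PRELOAD idea (preload_dlopen.c and the log message are made up for the example): a preloaded shared object can define its own dlopen, forward to the real one obtained via dlsym(RTLD_NEXT, ...), and run whatever hooks it wants around the call.
preload_dlopen.c
    /* Build: gcc preload_dlopen.c -shared -fPIC -o preload_dlopen.so -ldl
       Use:   LD_PRELOAD=./preload_dlopen.so ./your_program */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>

    void *dlopen(const char *filename, int flags)
    {
        static void *(*real_dlopen)(const char *, int);
        if (!real_dlopen)
            real_dlopen = (void *(*)(const char *, int))dlsym(RTLD_NEXT, "dlopen");

        void *handle = real_dlopen(filename, flags);
        /* this is the place to invoke registered "image added" callbacks */
        fprintf(stderr, "dlopen(%s) -> %p\n",
                filename ? filename : "(main program)", handle);
        return handle;
    }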
And since the GNU libc that provides dlopen is free software (there is also musl libc), you could patch it to suit your needs. dladdr(3) could be useful too!
addenda
If you are making your own runtime for some Objective-C, you should know the conventions of the Objective-C compiler using that runtime well, and you could probably have your own module loader instead of overloading dlopen...

Linker Errors for Unused Functionality: When do they occur?

Assume a static library libfoo that depends on another static library libbar for some functionality. These and my application are written in D. If my application only uses libfoo directly, and only calls functions from libfoo that do not reference symbols from libbar, sometimes the program links successfully without passing libbar to the linker and other times it doesn't.
Which of these happens seems to depend on what compiler I'm using to compile libfoo, libbar and my application, even though all compilers use the GCC toolchain to link. If I'm using DMD, I never receive linker errors if I don't pass libbar to the linker. If I'm using GDC, I sometimes do, for reasons I don't understand. If I'm using LDC, I always do.
What determines whether the GCC linker fails when a symbol referred in libfoo is undefined, but this symbol occurs in a function not referred to by the application object file?
What determines whether the GCC linker fails when a symbol referred in libfoo is undefined, but this symbol occurs in a function not referred to by the application object file?
If the linker complains about an unresolved symbol, then that symbol is referenced from somewhere.
Usually the linker will tell you which object the unresolved reference comes from, but if it doesn't, the -Wl,-y,unres_symbol flag (with unres_symbol replaced by the name of the symbol in question) should.
You may also want to read this description of how the whole thing works.
if the linker makes no effort to eliminate dead (unused) code in the libraries, it simply assumes all referenced symbols are used and tries to link them in
if it does perform elimination (for example through a simple mark-and-sweep algorithm; note that you cannot fully decide whether some code is unused, as that problem reduces to the halting problem), it can drop the unused references and avoid requiring the libraries they point to
this behavior is implementation defined (and there may be linker flags you can set to enable/disable it)
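As a hedged illustration of that difference (all file and function names here are hypothetical, and the behavior shown is GNU ld's usual rule of extracting an archive member only when it resolves a still-undefined symbol): whether libbar is required depends on whether the object code that references it ends up in the link at all.
foo_used.c
    int foo_used(void) { return 1; }            /* no reference to libbar */
foo_unused.c
    int bar_func(void);                         /* defined only in libbar */
    int foo_unused(void) { return bar_func(); }
app.c
    int foo_used(void);
    int main(void) { return foo_used(); }
Build:
    gcc -c foo_used.c foo_unused.c app.c
    ar rcs libfoo.a foo_used.o foo_unused.o
    gcc app.o -L. -lfoo -o app                  # links: only foo_used.o is pulled
                                                # from the archive, so bar_func is
                                                # never referenced
    gcc app.o foo_used.o foo_unused.o -o app    # fails: foo_unused.o is forced into
                                                # the link and bar_func is unresolved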

Resources