ELF Dynamic loader symbol lookup ordering - linux

What is the search order for symbol lookup when resolving dynamic relocations?
When resolving symbols for a shared library does the loader first search in the 'main executable' (to let the main executable override definitions...) or what?

Per my understanding, each executable object has its own "lookup scope":
The main executable is usually the first object in the "global" lookup scope. This means that symbols defined in the main executable would override those in dependent shared libraries. Shared objects that are added using the LD_PRELOAD facility are added to the global lookup scope, right after the main executable.
However, if the shared object being loaded uses the DF_SYMBOLIC flag, then symbol references that originate within that object will look for definitions within the object before searching in the global lookup scope.
Shared objects opened using dlopen() may have their own dependencies. If the RTLD_GLOBAL flag was not set during the call to dlopen(), these dependencies are added to the lookup scope for that object, but do not affect the global lookup scope. If the RTLD_GLOBAL flag was passed to dlopen(), then the shared object (and its dependencies) will be added to the "global" lookup scope, changing the behavior of subsequent symbol lookups.
Ulrich Drepper's guide "How to Write Shared Libraries" is recommended reading on this topic.

When resolving symbols for a shared library does the loader first search in the 'main executable' (to let the main executable override definitions...) or what?
Yes, exactly. The dynamic loader has a linked list of loaded ELF objects (the head of the list is _r_debug.r_map) and searches the dynamic symbol tables of the objects in that list linearly until it finds the symbol definition it is looking for.
The head of the list always points to the main executable. If a given symbol is exported from the main executable, then it (almost) always "wins" (overrides other definitions).
However, note that the -Bsymbolic linker flag changes the picture a bit.
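The list of loaded objects described above can be inspected from inside a process. The following is a minimal sketch, assuming glibc on Linux; it uses dl_iterate_phdr(), which the man page documents as visiting the main program first and reporting it with an empty name. The names load_order_state, visit_object and print_load_order are made up for this illustration, and the printed order is the load order, not a full model of every lookup scope.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Hypothetical helper state for this sketch. */
struct load_order_state {
    int count;       /* number of objects visited so far */
    int main_first;  /* 1 if the first object has an empty name */
};

/* Called once per loaded object, in load order. */
static int visit_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct load_order_state *st = data;
    const char *name = info->dlpi_name;
    int unnamed = (name == NULL || name[0] == '\0');

    (void)size;
    if (st->count == 0) {
        st->main_first = (name != NULL && name[0] == '\0');
        printf("%2d: %s\n", st->count, unnamed ? "(main executable)" : name);
    } else {
        printf("%2d: %s\n", st->count, unnamed ? "(unnamed)" : name);
    }
    st->count++;
    return 0; /* zero means: keep iterating */
}

/* Prints every loaded object and returns how many were visited.
   *main_first is set to 1 if the first one was the main program. */
int print_load_order(int *main_first)
{
    struct load_order_state st = { 0, 0 };

    dl_iterate_phdr(visit_object, &st);
    *main_first = st.main_first;
    return st.count;
}
```

Calling print_load_order() from main() of a dynamically linked program should list the executable first, followed by any LD_PRELOAD objects and then the regular dependencies.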

Related

Managing secondary dependencies of shared libraries

I have a problem with the secondary dependencies of shared libraries.
Suppose we have the following dependency tree:
libA.so
├─libB.so
└─libC.so
  └─libD.so
That is, shared library A depends on shared libraries B and C, and shared library C itself depends on shared library D. A good example for libC is the GSL shared library, which depends on CBLAS.
To make a self-sufficient, easy-to-use package and to avoid problems with different versions of installed shared libraries, I package libB, libC and libD with libA. As explained, e.g., in Drepper’s article “How to Write Shared Libraries” (2011), I add the link flag -Wl,-rpath,\$ORIGIN --enable-new-dtags to set the RUNPATH of libA.so, so that the dynamic loader can find the dependencies of libA in the corresponding directories without needing to set LD_LIBRARY_PATH (which has its downsides).
The problem is that setting the RUNPATH in libA does not help with the secondary dependencies, like libD.so.
Turning on the dynamic loader debug messages (via setting LD_DEBUG) reveals that the loader attempts to find libD.so only in the standard library locations, and not in the location of libA (contrary to what it does for finding libB or libC).
Is there a way to overcome this problem?
Indeed, I can link the libraries C and D statically, if I have the source code or properly compiled static libraries.
But is there a better way?
Solution:
As explained by Employed Russian the solution is to set RPATH instead of RUNPATH.
With recent versions of GCC, one should use the following link flag:
-Wl,--disable-new-dtags,-rpath,\$ORIGIN
RPATH is searched for transitive dependencies; that is, paths in RPATH are considered for everything that is dynamically loaded, even dependencies of dependencies.
In contrast, the dynamic linker (ld.so) does not search RUNPATH locations for transitive dependencies.
See also:
Wikipedia: https://en.wikipedia.org/wiki/Rpath#The_role_of_GNU_ld
SO question: How to set RPATH and RUNPATH with GCC/LD?
By setting DT_RUNPATH you are telling the loader that each of your binaries is linked with all of its dependencies.
But that isn't true for libC.so -- it (apparently) doesn't have DT_RUNPATH of its own.
I can link the libraries C and D statically, ... But is there a better way?
Yes: link libC.so with correct DT_RUNPATH (if libC.so and libD.so are in the same directory, then -Wl,-rpath,\$ORIGIN will work for libC.so as well).
Update:
The problem is that I may not have the source or the properly compiled object files of libC
In that case you should use RPATH instead of RUNPATH. Unlike the latter, the former applies to the object itself and to all dependencies of that object.
In other words, using --enable-new-dtags is wrong in this case -- you want the opposite.
In this case, there is no solution (besides static linking); correct?
The other solution (besides using RPATH) is to set LD_LIBRARY_PATH in the environment.
Update 2:
Difference between RPATH and RUNPATH
The difference is explained in the ld.so man page:
If a shared object dependency does not contain a slash, then it
is searched for in the following order:
o Using the directories specified in the DT_RPATH dynamic
section attribute of the binary if present and DT_RUNPATH
attribute does not exist. Use of DT_RPATH is deprecated.
...
o Using the directories specified in the DT_RUNPATH dynamic
section attribute of the binary if present. Such directories
are searched only to find those objects required by DT_NEEDED
(direct dependencies) entries and do not apply to those
objects' children, which must themselves have their own
DT_RUNPATH entries. This is unlike DT_RPATH, which is applied
to searches for all children in the dependency tree.
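To check which of the two tags a library such as libA.so actually carries, one can read its dynamic section from disk (this is the same information that readelf -d prints). The following is a rough sketch, assuming a 64-bit ELF file with the same byte order as the host; the function name elf64_path_tags and the TAG_ constants are invented for this example.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

enum { TAG_RPATH = 1, TAG_RUNPATH = 2 };

/* Returns a bitmask of TAG_RPATH / TAG_RUNPATH for the given file,
   or -1 if it cannot be read or is not a 64-bit ELF file. */
int elf64_path_tags(const char *path)
{
    Elf64_Ehdr eh;
    Elf64_Phdr ph;
    Elf64_Dyn dyn;
    int result = -1;
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64) {
        result = 0;
        /* Walk the program headers until PT_DYNAMIC is found. */
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
            if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
                result = -1;
                break;
            }
            if (ph.p_type != PT_DYNAMIC)
                continue;
            /* Scan the dynamic entries for the two search-path tags. */
            if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0) {
                result = -1;
                break;
            }
            for (size_t n = ph.p_filesz / sizeof dyn; n > 0; n--) {
                if (fread(&dyn, sizeof dyn, 1, f) != 1 || dyn.d_tag == DT_NULL)
                    break;
                if (dyn.d_tag == DT_RPATH)
                    result |= TAG_RPATH;
                if (dyn.d_tag == DT_RUNPATH)
                    result |= TAG_RUNPATH;
            }
            break;
        }
    }
    fclose(f);
    return result;
}
```

For the scenario in the question, a result of TAG_RUNPATH for libA.so would explain why libD.so is not found, while TAG_RPATH alone is what the --disable-new-dtags link is expected to produce.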

What is BIND_NOW on Linux ELF?

While resolving functions imported from shared libraries I ran into a problem and realized that it is related to BIND_NOW, which is one of the security features for ELF.
Since there was not enough information online, I couldn't get any further. What is BIND_NOW?
It means that symbols are resolved right when the object is loaded, instead of when they are first used (which is called lazy binding).
Quoted from man 3 dlopen:
One of the following two values must be included in flag:
RTLD_LAZY
Perform lazy binding. Only resolve symbols as the code that
references them is executed. If the symbol is never
referenced, then it is never resolved. (Lazy binding is
performed only for function references; references to
variables are always immediately bound when the library is
loaded.)
RTLD_NOW
If this value is specified, or the environment variable
LD_BIND_NOW is set to a nonempty string, all undefined symbols
in the library are resolved before dlopen() returns. If this
cannot be done, an error is returned.
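Whether a given program was linked to request immediate binding can be read from its dynamic section. Below is a minimal sketch, assuming glibc on Linux; it looks at the running program itself and checks the three markers defined in <elf.h> (the DT_BIND_NOW tag, DF_BIND_NOW in DT_FLAGS, and DF_1_NOW in DT_FLAGS_1). The function names are made up for the example.

```c
#define _GNU_SOURCE
#include <link.h>

/* Callback for dl_iterate_phdr(). Only the first object is inspected;
   glibc documents that object as the main program. */
static int check_bind_now(struct dl_phdr_info *info, size_t size, void *data)
{
    int *bind_now = data;

    (void)size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        const ElfW(Dyn) *dyn;

        if (ph->p_type != PT_DYNAMIC)
            continue;
        /* The dynamic section lives at load bias + p_vaddr in memory. */
        dyn = (const ElfW(Dyn) *)(info->dlpi_addr + ph->p_vaddr);
        for (; dyn->d_tag != DT_NULL; dyn++) {
            if (dyn->d_tag == DT_BIND_NOW)
                *bind_now = 1;
            else if (dyn->d_tag == DT_FLAGS && (dyn->d_un.d_val & DF_BIND_NOW))
                *bind_now = 1;
            else if (dyn->d_tag == DT_FLAGS_1 && (dyn->d_un.d_val & DF_1_NOW))
                *bind_now = 1;
        }
    }
    return 1; /* non-zero stops the iteration after the main program */
}

/* Returns 1 if the running executable is marked BIND_NOW, else 0. */
int main_program_is_bind_now(void)
{
    int bind_now = 0;

    dl_iterate_phdr(check_bind_now, &bind_now);
    return bind_now;
}
```

With GNU ld these markers are normally set by linking with -z now; many distributions enable that by default, so the result depends on the toolchain configuration.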

When to use --dynamic option in nm

Sometimes when I do nm on a .so file (for example, libstdc++.so.6), it says no symbols, and I need to use nm --dynamic. But for some other .so files, I can see the symbols without --dynamic.
The doc says:
Display the dynamic symbols rather than the normal symbols. This is only meaningful for dynamic objects, such as certain types of shared libraries.
But it is confusing... what "types" of shared libraries need --dynamic? How is this determined? During the compilation of the library? I thought all shared libraries are dynamic (I mean, can be dynamically loaded during run time), but seems that my understanding is not quite right.
It could very well be that a symbol which is not exported from your shared library ends up in the normal symbol table instead of the dynamic symbol table.
There are many types of symbols in ELF files.
Symbols in the normal symbol table. This is the output of plain nm libabc.so or objdump --syms libabc.so. These symbols are only used during static linking.
Symbols in the dynamic symbol table. This is the output of nm --dynamic libabc.so or objdump --dynamic-syms libabc.so. The dynamic symbol table is the one used by the run-time linker/loader, which binds symbols between the ELF file that refers to them and the ELF file that defines them. It is also used by the static linker when linking an application against a shared library it requires; this is what allows undefined-symbol errors to be reported at link time.
Hidden symbols. These are symbols marked with __attribute__((visibility("hidden"))). They are not exported and can only be used within the library. Visibility is checked during the linking step, and hence is enforced only for shared libraries. The default visibility is public, i.e. symbols are exported unless otherwise specified. The default can be changed with -fvisibility=default|internal|hidden|protected:
Set the default ELF image symbol visibility to the specified
option—all symbols will be marked with this unless overridden within
the code. Using this feature can very substantially improve linking
and load times of shared object libraries, produce more optimized
code, provide near-perfect API export and prevent symbol clashes. It
is strongly recommended that you use this in any shared objects you
distribute. Despite the nomenclature, default always means public ie;
available to be linked against from outside the shared object.
protected and internal are pretty useless in real-world usage so the
only other commonly used option will be hidden. The default if
-fvisibility isn't specified is default, i.e., make every symbol public—this causes the same behavior as previous versions of GCC.
An overview of these techniques, their benefits and how to use them is
at http://gcc.gnu.org/wiki/Visibility.
To answer your question: you would use the --dynamic option of nm when you want to list the symbols exported by your shared library, which are the only ones available to the ELF images that reference it.
You need to use --dynamic or -D option on a shared library if it is stripped and thus only contains a dynamic symbol table.
You may want to use this option for other shared libraries to explicitly display the dynamic symbol table as this is the table that is consulted by the dynamic linker.
The file utility indicates whether a shared library is stripped or not. Example:
$ file /usr/lib64/libcrypt-nss-2.26.so
[..] ELF 64-bit LSB shared object, x86-64 [..], not stripped
$ file /usr/lib64/libxml2.so.2.9.7
[..] ELF 64-bit LSB shared object, x86-64 [..], stripped
Example for how the different symbol tables may contain different symbols:
$ nm -D /usr/lib64/libcrypt-nss-2.26.so | wc -l
39
$ nm /usr/lib64/libcrypt-nss-2.26.so | wc -l
112
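The difference between the two tables can also be seen by reading the section headers directly. The sketch below assumes a 64-bit ELF file in host byte order; count_symbols and struct symtab_counts are names invented here. It counts raw table entries, so the numbers include the reserved null entry and section or file symbols and will not match nm | wc -l exactly.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

struct symtab_counts {
    long normal;   /* entries in SHT_SYMTAB (.symtab), read by plain nm */
    long dynamic;  /* entries in SHT_DYNSYM (.dynsym), read by nm -D */
};

/* Fills *out and returns 0, or returns -1 if the file cannot be read
   or is not a 64-bit ELF file. */
int count_symbols(const char *path, struct symtab_counts *out)
{
    Elf64_Ehdr eh;
    Elf64_Shdr sh;
    int rc = -1;
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;
    out->normal = 0;
    out->dynamic = 0;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64) {
        rc = 0;
        for (unsigned i = 0; i < eh.e_shnum; i++) {
            long off = (long)(eh.e_shoff + (Elf64_Off)i * eh.e_shentsize);
            if (fseek(f, off, SEEK_SET) != 0 || fread(&sh, sizeof sh, 1, f) != 1) {
                rc = -1;
                break;
            }
            if (sh.sh_entsize == 0)
                continue;
            if (sh.sh_type == SHT_SYMTAB)
                out->normal += (long)(sh.sh_size / sh.sh_entsize);
            else if (sh.sh_type == SHT_DYNSYM)
                out->dynamic += (long)(sh.sh_size / sh.sh_entsize);
        }
    }
    fclose(f);
    return rc;
}
```

For a stripped library the normal count should come out as 0 while the dynamic count stays non-zero, which is the situation where plain nm reports no symbols.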

dlopen() .so fails to find symbols in a stripped executable

I have an executable in linux - exe
This executable has some functions in it, that are used throughout the code:
sendMsg
debugPrint
I then want to dynamically load a .so that provides extra functionality to my executable.
In this shared library I include the headers for sendMsg and debugPrint.
I load this shared library with dlopen() and create an API with dlsym().
However, at dlopen() I use RTLD_NOW to resolve all symbols at load time.
It fails stating that it cannot find sendMsg symbol.
This symbol must be in the executable as the sendMsg.c is compiled in there.
However, my executable is stripped by the make process. As such, it would make sense that dlopen cannot find the symbol.
How can I solve this situation?
I could build the shared functions into a static library and link that static library into both exe and the .so. This would increase code size :(
I could remove the stripping of the exe so the symbols can be found
Do some compile time linking magic that I don't know about so the .so knows where the symbols are in exe
man ld:
-E
--export-dynamic
--no-export-dynamic
When creating a dynamically linked executable, using the -E option or the --export-dynamic option causes the linker to add all symbols to the dynamic symbol table. The
dynamic symbol table is the set of symbols which are visible from dynamic objects at run time.
If you do not use either of these options (or use the --no-export-dynamic option to restore the default behavior), the dynamic symbol table will normally contain only those
symbols which are referenced by some dynamic object mentioned in the link.
If you use "dlopen" to load a dynamic object which needs to refer back to the symbols defined by the program, rather than some other dynamic object, then you will probably
need to use this option when linking the program itself.
You can also use the dynamic list to control what symbols should be added to the dynamic symbol table if the output format supports it. See the description of
--dynamic-list.
Note that this option is specific to ELF targeted ports. PE targets support a similar function to export all symbols from a DLL or EXE; see the description of
--export-all-symbols below.
You can also pass the -rdynamic option to gcc/g++ (as noted in the comments). Depending on how your build is set up, this may be more convenient.
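A quick way to see whether the executable's own functions are reachable from a plugin is to look them up through a handle for the main program. This is a small sketch, assuming glibc (on versions older than 2.34, link with -ldl); sendMsg here is a hypothetical stand-in for the function in the question, and visible_to_plugins is a name invented for the example.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical stand-in for the executable's own sendMsg(). */
void sendMsg(const char *msg)
{
    printf("sendMsg: %s\n", msg);
}

/* Returns 1 if `name` can be found through the main program's handle,
   i.e. in the executable's dynamic symbol table or in a library that
   was loaded at startup; returns 0 otherwise. */
int visible_to_plugins(const char *name)
{
    void *self = dlopen(NULL, RTLD_NOW); /* NULL: handle for the main program */
    int found;

    if (!self)
        return 0;
    found = (dlsym(self, name) != NULL);
    dlclose(self);
    return found;
}
```

Built without -rdynamic, visible_to_plugins("sendMsg") is expected to return 0, because sendMsg is not placed in the dynamic symbol table; built with -rdynamic (or -Wl,--export-dynamic) it should return 1, and a plugin opened with RTLD_NOW can then resolve it.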

Main Program and Shared Library initializes same static variable in __static_initialization_and_destruction_0

Does anyone know why a library initialized within dlopen() would initialize a static variable owned by the main program? Both the main program and the shared library have a copy of the static variable, but for some reason the shared library re-initializes the main program's copy of the static variable and destructs it, causing a segfault when the main program attempts to destruct it.
Is this a case of bad name mangling in the symbol table?
This is a case where the runtime linker only wants a single active copy of a symbol in a process. If both a shared object and the executable have a copy of the symbol, the runtime linker will resolve all references to one of those.
What you can do to solve this problem is to use symbol reduction using the version command of the link editor when building the shared object. Make sure the symbol for the static variable is not global and you will get the behavior you are looking for.
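The answer above suggests a linker version script. As an alternative sketch (a different technique with the same goal), the variable can be kept out of the shared object's dynamic symbol table with GCC's visibility attribute, so references inside the library bind to the library's own copy. The names plugin_state and plugin_bump are hypothetical, and the example is plain C, not the C++ static object from the question.

```c
/* Hypothetical shared-object source, built with: gcc -shared -fPIC ... */

/* Hidden visibility: the symbol is not exported from the shared object,
   so a definition with the same name in the executable cannot take its
   place at run time. */
__attribute__((visibility("hidden"))) int plugin_state = 0;

/* Exported function that only ever touches the library's own copy. */
int plugin_bump(void)
{
    return ++plugin_state;
}
```

After building, nm -D on the resulting library should list plugin_bump but not plugin_state.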

Resources