How to do runtime binding based on CPU capabilities on linux - linux

Is it possible to have a linux library (e.g. "libloader.so") load another library to resolve any external symbols?
I've got a whole bunch of code that gets conditionally compiled for the SIMD level to be supported ( SSE2, AVX, AVX2 ). This works fine if the build platform is the same as the runtime platform. But it hinders reuse across different processor generations.
One thought is to have executable which calls function link to libloader.so that does not directly implement function. Rather, it resolves(binds?) that symbol from another loaded library e.g. libimpl_sse2.so, libimpl_avx2.so or so on depending on cpuflags.
There are hundreds of functions that need to be dynamically bound in this way, so changing the declarations or calling code is not practical.
The program linkage is fairly easy to change. The runtime environment variables could also be changed, but I'd prefer not to.
I've gotten as far as making an executable that builds and starts with unresolved external symbols (UES) via the ld flag --unresolved-symbols=ignore-all. But subsequent loading of the impl lib does not change the value of the UES function from NULL.

Edit: I found out later on that the technique described below will only work under limited circumstances. Specifically, your shared libraries must contain functions only, without any global variables. If there are globals inside the libraries that you want to dispatch to, then you will end up with a runtime dynamic linker error. This occurs because global variables are relocated before shared library constructors are invoked. Thus, the linker needs to resolve those references early, before the dispatching scheme described here has a chance to run.
One way of accomplishing what you want is to (ab)use the DT_SONAME field in your shared library's ELF header. This can be used to alter the name of the file that the dynamic loader (ld-linux-so*) loads at runtime in order to resolve the shared library dependency. This is best explained with an example. Say I compile a shared library libtest.so with the following command line:
g++ test.cc -shared -o libtest.so -Wl,-soname,libtest_dispatch.so
This will create a shared library whose filename is libtest.so, but its DT_SONAME field is set to libtest_dispatch.so. Let's see what happens when we link a program against it:
g++ testprog.cc -o test -ltest
Let's examine the runtime library dependencies for the resulting application binary test:
> ldd test
linux-vdso.so.1 => (0x00007fffcc5fe000)
libtest_dispatch.so => not found
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd1e4a55000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd1e4e4f000)
Note that instead of looking for libtest.so, the dynamic loader instead wants to load libtest_dispatch.so instead. You can exploit this to implement the dispatching functionality that you want. Here's how I would do it:
Create the various versions of your shared library. I assume that there is some "generic" version that can always be used, with other optimized versions utilized at runtime as appropriate. I would name the generic version with the "plain" library name libtest.so, and name the others however you choose (e.g. libtest_sse2.so, libtest_avx.so, etc.).
When linking the generic version of the library, override its DT_SONAME to something else, like libtest_dispatch.so.
Create a dispatcher library called libtest_dispatch.so. When the dispatcher is loaded at application startup, it is responsible for loading the appropriate implementation of the library. Here's pseudocode for what the implementation of libtest_dispatch.so might look like:
#include <dlfcn.h>
#include <stdlib.h>
// the __attribute__ ensures that this function is called when the library is loaded
__attribute__((constructor)) void init()
{
// manually load the appropriate shared library based upon what the CPU supports
// at runtime
if (avx_is_available) dlopen("libtest_avx.so", RTLD_NOW | RTLD_GLOBAL);
else if (sse2_is_available) dlopen("libtest_sse2.so", RTLD_NOW | RTLD_GLOBAL);
else dlopen("libtest.so", RTLD_NOW | RTLD_GLOBAL);
// NOTE: this is just an example; you should check the return values from
// dlopen() above and handle errors accordingly
}
When linking an application against your library, link it against the "vanilla" libtest.so, the one that has its DT_SONAME overridden to point to the dispatcher library. This makes the dispatching essentially transparent to any application authors that use your library.
This should work as described above on Linux. On Mac OS, shared libraries have an "install name" that is analogous to the DT_SONAME used in ELF shared libraries, so a process very similar to the above could be used instead. I'm not sure about whether something similar could be used on Windows.
Note: There is one important assumption made in the above: ABI compatibility between the various implementations of the library. That is, your library should be designed such that it is safe to link against the most generic version at link time while using an optimized version (e.g. libtest_avx.so) at runtime.

Related

Dynamic linking in a shared object .so file

I want to use an SDK/library that uses glibc2.14. My machine has glibc2.12. I installed glibc2.14 in a separate location. Used the SDK in my executable by using compile option --rpath and it works good.
Now, I want to use the SDK (that uses glibc2.14) in a shared object binary (.so). I tried --rpath and --dynamic-linker options but the shared object is not loaded and it gives me an error during runtime -
/lib64/libc.so.6: version ``GLIBC_2.14'' not found (required by /usr/local/lib/libsdk.so.1).
How do I make the shared object binary look at the glibc2.14?
Now, I want to use the SDK (that uses glibc2.14) in a shared object binary (.so).
You can't.
As this answer explains, ld-linux and libc.so must come from the same build of GLIBC.
Since the ld-linux is determined by the main executable (is hard-coded into it at static link time), it will not match your custom libc.so.6 no matter what you do to compile your .so.
In addition, specifying --dynamic-linker while building .so is pointless: it is the job of ld-linux to load the .so, so by definition ld-linux must already be loaded before any .so is loaded into the process.
P.S. If it were possible to make your .so use a different libc.so.6, the result would be an (almost) immediate crash, as libc.so.6 is not designed to work when multiple copies are loaded into the same process.
Update:
upgrade my OS which I don't think is possible because I am developing the software for a client and getting them to upgrade is not going to be easy. Second would be to ask the SDK supplier to recompile with glibc2.12.
GLIBC-2.14 was released 9 years ago. It's not unreasonable for your supplier to "only" support 2.14 (and later), and it is somewhat unreasonable for your client to run such an old OS.
I think you have 3rd possible option: have the client install newer GLIBC in parallel, and build their main executable with --rpath and --dynamic-linker flags (as you have done). Their binary will then have no problem loading your SDK.

Call dlopen with file descriptor?

I want to open a shared object as a data file and perform a verification check on it. The verification is a signature check, and I sign the shared object. If the verification is successful, I would like to load the currently opened shared object as a proper shared object.
First question: is it possible to call dlopen and load the shared object as a data file during the signature check so that code is not executed? According to the man pages, I don't believe so since I don't see a flag similar to RTLD_DATA.
Since I have the shared object open as a data file, I have the descriptor available. Upon successful verification, I would like to pass the descriptor to dlopen so the dynamic loader loads the shared object properly. I don't want to close the file then re-open it via dlopen because it could introduce a race condition (where the file verified is not the same file opened and executed).
Second question: how does one pass an open file to dlopen using a file descriptor so that dlopen performs customary initialization of the shared object?
On Linux, you probably could dlopen some /proc/self/fd/15 file (for file descriptor 15).
RTLD_DATA does not seems to exist. So if you want it, you have to patch your own dynamic loader. Perhaps doing that within MUSL Libc could be less hard. I still don't understand why you need it.
You have to trust the dlopen-ed plugin somehow (and it will run its constructor functions at dlopen time).
You could analyze the shared object plugin before dlopen-ing it by using some ELF parsing library, perhaps libelf or libbfd (from binutils); but I still don't understand what kind of analysis you want to make (and you really should explain that; in particular what happens if the plugin is indirectly linked to some bad behaving software). In other words you should explain more about your verification step. Notice that a shared object could overwrite itself....
Alternatively, don't use dlopen and just mmap your file (you'll need to parse some ELF and process relocations; see elf(5) and Levine's Linkers and Loaders for details, and look into the source code of your ld.so, e.g. in GNU glibc).
Perhaps using some JIT generation techniques might be useful (you would JIT generate code from some validated data), e.g. with GCCJIT, LLVM, or libjit or asmjit (or even LuaJit or SBCL) etc...
And if you have two file descriptors to the same shared object you probably won't have any race conditions.
An option is to build your ad-hoc static C or C++ source code analyzer (perhaps using some GCC plugin provided by you). That plugin might (with months, or perhaps years, of development efforts) check some property of the user C++ code. Beware of Rice's theorem (limiting the properties of every static source code analyzer). Then your program might (like my manydl.c does, or like RefPerSys will soon do, in mid 2020, or like the obsolete GCC MELT did a few years ago) take as input the user C++ code, run some static analysis on that C++ code (e.g. using your GCC plugin), compile that C++ code into a temporary shared object, and dlopen that shared object. So read Drepper's paper How to Write Shared Libraries.

Which libraries appear in /proc/$PID/pmaps?

On Linux you can inspect /proc/$PID/pmaps to see the libraries loaded by a particular program, and a program can open /proc/self/pmaps to examine the libraries it itself has loaded.
I know pmaps will only contain dynamic libraries, and obviously the kernel can't predict which libraries we might dlopen at a later point, so I expect those aren't included in /proc/self/maps. But I'm unsure of a few other other scenarios:
Are libraries that have been linked at build time but we haven't called any function in yet included? My understanding is the Linux delays linking symbols until the first time they are used, so I'm not sure if they'll show up.
Does pmaps contain all the libraries used recursively? E.g. if I look at each library in pmaps and run ldd on it, and then run ldd on those, ad nauseum, I shouldn't find any new libraries that weren't in the original pmaps? I tried this on a couple binaries and it appears to be so but maybe I'm getting lucky.
Are libraries that have been linked at build time but we haven't called any function in yet included?
Yes: the runtime loader will mmap every library that your executable directly depends on, before your program starts running.
You can find the list of such libraries by running
readelf -d a.out | grep NEEDED
Does pmaps contain all the libraries used recursively?
Yes: if a library that you directly depend on itself depends on some other library, the runtime loader will mmap the recursive dependencies as well.
My understanding is the Linux delays linking symbols until the first time they are used
That is mosty correct for function symbols, but false for data symbols, which can't be resolved lazily.
Also, whether the symbols are resolved lazily or not depends on LD_BIND_NOW environment variable, and on an equivalent setting in the executable dynamic section, controlled by -znow linker flag.
None of that changes the mmap pciture though; if you have a DT_NEEDED entry for foo.so in your dynamic section, then foo.so will be mmaped (and will show in /proc/self/*map*) independent of lazy or non-lazy resolution.
/proc/$pid/maps is not only going to list libraries that are loaded, but also ALL other mapped memory segments.
Read this thread and the article in there:
Understanding Linux /proc/id/maps

are ".o" files "loadable"?

I have been reading John R. Levine's Linkers and Loaders and I read that the properties of an object file will include one or more of the following.
file should be linkable
file should be loadable
file should be executable
Now, considering this example:
#include<stdio.h>
int main() {
printf("testing\n");
return 0;
}
Which I would compile and link with:
$ gcc -c t.c
$ gcc -o t t.o
I tried inspecting t.o using objdump and its type shows up as REL. What all properties does t.o satisfy? I believe that its linkable, non-executable. I would have believed that its not loadable(unless you create an .so file from the .o file); however the type REL means that its supposed to be relocated, and relocation would occur only in the context of loading, so I'm having a confusion here.
My doubts summarized :-
Are ".o" files loadable?
Reading resources regarding the sections present in a ".o", ".so" file - differences etc?
An object file (i.e., a file with the .o extension) is not loadable. This is because it lacks critical information about how to resolve all the symbols within it: in this case, the println symbol in particular would need additional information. (C compilers do not bind library identities into the object files they create, which is occasionally even useful.)
When you link the object file into a shared library (.so), you are adding that binding. Typically, you're also grouping a number of object files together and resolving references between them (plus a few more esoteric things). That then makes the result possible to load, since the loader can then just do resolution of references and loading of dependencies that it doesn't already know about.
Going from there to executable is typically just a matter of adding on the OS-defined program bootstrap. This is a small piece of code that the OS will start the program running by calling, and it typically works by loading the rest of the program and dependencies and then calling main() with information about the arguments. (It's also responsible for exiting cleanly if main returns.)
Just to set the context this link states somethings similar (emphasis for readability only);
A file may be linkable, used as input by a link editor or linking
loader. It my be executable, capable of being loaded into
memory and run as a program, loadable, capable of being loaded
into memory as a library along with a program, or any combination of
the three.
A .o file is a linker object file, which is according to this definition not executable and definitely linkable. Loadable is a tougher call, but since .o files are not loadable without some definitely not cross platform trickery, I'd say the spirit is that it's not loadable.

How tcamalloc gets linked to the main program

I want to know how malloc gets linked to the main program.Basically I have a program which uses several static and dynamic libraries.I am including all these in my makefile using option "-llibName1 -llibName2".
The documentation of TCmalloc says that we can override our malloc simply by calling "LD_PRELOAD=/usr/lib64/libtcmalloc.so".I am not able to understand how tcamlloc gets called to the all these static and dynamic libraries.Also how does tcmalloc also gets linked to STL libraries and new/delete operations of C++?
can anyone please give any insights on this.
"LD_PRELOAD=/usr/lib64/libtcmalloc.so" directs the loader to use libtcmalloc.so before any other shared library when resolving symbols external to your program, and because libtcmalloc.so defines a symbol named "malloc", that is the verison your program will use.
If you omit the LD_PRELOAD line, glibc.so (or whatever C library you have on your system) will be the first shared library to define a symbol named "malloc".
Note also that if you link against a static library which defines a symbol named "malloc" (and uses proper arguments, etc), or another shared library is loaded that defines a symbol named "malloc", your program will attempt to use that version of malloc.
That's the general idea anyway; the actual goings-on is quite interesting and I will have to direct to you http://en.wikipedia.org/wiki/Dynamic_linker as a starting point for more information.

Resources