Loading multiple shared libraries with different versions - linux

I have an executable on Linux that loads libfoo.so.1 (that's the SONAME) as one of its dependencies (via another shared library). It also links to another system library which, in turn, links to the system version, libfoo.so.2. As a result, both libfoo.so.1 and libfoo.so.2 are loaded during execution, and code that was supposed to call functions from the version-1 library ends up calling (binary-incompatible) functions from the newer system version 2, because some symbol names stay the same. The usual result is stack smashing and a subsequent segfault.
Now, the library that links against the older version is a closed-source third-party library, so I can't control which version of libfoo it compiles against. Given that, the only other option left seems to be rebuilding the system libraries that currently link against libfoo.so.2 so that they link against libfoo.so.1 instead.
Is there any way to avoid replacing system libraries with local copies that link against the older libfoo? Can I load both libraries and have each piece of code call the correct version of the symbols? Do I need some kind of special symbol-level versioning?

You may be able to do some version script tricks:
http://sunsite.ualberta.ca/Documentation/Gnu/binutils-2.9.1/html_node/ld_26.html
This may require that you write a wrapper around the library that pulls in libfoo.so.1, exporting the symbols you need explicitly and marking all others as local. For example:
MYSYMS {
global:
foo1;
foo2;
local:
*;
};
and use this version script when you link the wrapper, like:
gcc -shared -Wl,--version-script,mysyms.map -o mylib.so wrapper.o -L/dir/containing/libfoo.so.1 -lfoo
(note that -L takes the directory containing the library, not the library file itself)
This should make libfoo.so.1's symbols local to the wrapper and not available to the main exe.

I can only come up with a workaround: statically link your own copy of the "system library" you are using. For that static build, you could make it link against the same old version of libfoo as the third-party library does, provided it does not rely on anything from the newer version.
Perhaps it is also possible to avoid these problems by not linking to the third-party library the ordinary way. Instead, your program could load it at run time (with dlopen()), and perhaps its symbols could then be kept isolated from the rest. But I don't know much about that.
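For what it's worth, the run-time loading idea can be sketched with dlopen(). The library name "libthirdparty.so" and the entry point "thirdparty_init" below are made up; the point is the flags: RTLD_LOCAL keeps the loaded library's symbols out of the global scope, and the glibc-specific RTLD_DEEPBIND makes the library prefer its own dependencies' definitions over ones already present in the process. This is only a sketch, not a drop-in solution.

/* main.c -- minimal dlopen sketch; names are hypothetical */
#define _GNU_SOURCE            /* needed for RTLD_DEEPBIND on glibc */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *h = dlopen("libthirdparty.so", RTLD_NOW | RTLD_LOCAL | RTLD_DEEPBIND);
    if (!h) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    /* hypothetical entry point exported by the third-party library */
    int (*init)(void) = (int (*)(void))dlsym(h, "thirdparty_init");
    if (init)
        init();
    dlclose(h);
    return 0;
}
/* build: gcc main.c -o main -ldl   (the -ldl is only needed on older glibc) */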

Related

Shared library symbol conflicts and static linking (on Linux)

I'm encountering an issue which is elaborated in a good article, Shared Library Symbol Conflicts (on Linux). The problem is that when the executable and a .so both define functions with the same name, and the .so calls that function, the call resolves to the definition in the executable rather than the one in the .so itself.
Let's talk about the case in that article. I understand that DoLayer() in layer.o has an external dependency on DoThing() when layer.o is compiled.
But when libconflict.so is linked, shouldn't that external dependency be resolved in place and simply replaced with the address of conflict.o's DoThing(), statically?
Why does layer.o's DoLayer() still use dynamic symbol resolution to find DoThing()? Is this a designed behavior?
Is this a designed behavior?
Yes.
At the time of introduction of shared libraries on UNIX, the goal was to pretend that they work just as if the code was in a regular (archive) library.
Suppose you have foo() defined in both libfoo and libbar, and bar() in libbar calls foo().
The design goal was that cc main.c -lfoo -lbar works the same regardless of whether libfoo and libbar are archive or shared libraries. The only way to achieve this is to have libbar.so use dynamic linking to resolve the call from bar() to foo(), despite having a local definition of foo().
This design makes it impossible to create a self-contained libbar.so -- its behavior (which functions it ends up calling) depends on what other functions are linked into the process. This is also the opposite of how Windows DLLs work.
Creating self-contained DSOs was not a consideration at the time, since UNIX was effectively open-source.
You can change the rules with special linker flags, such as -Bsymbolic. But the rules get complicated very quickly, and (since that isn't the default) you may encounter bugs in the linker or the runtime loader.
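To make the foo()/bar() scenario concrete, here is a minimal sketch (the file names and build lines are mine, not from the question or the article). The call inside libbar binds to libfoo's foo() rather than libbar's own, because libfoo comes first in the process's global symbol search order; linking libbar with -Wl,-Bsymbolic would change that.

/* libfoo.c  --  gcc -shared -fPIC -o libfoo.so libfoo.c */
#include <stdio.h>
void foo(void) { puts("foo() from libfoo"); }

/* libbar.c  --  gcc -shared -fPIC -o libbar.so libbar.c */
#include <stdio.h>
void foo(void) { puts("foo() from libbar"); }
void bar(void) { foo(); }          /* resolved through the dynamic linker */

/* main.c  --  gcc main.c -L. -lfoo -lbar -Wl,-rpath,'$ORIGIN' -o main */
void foo(void);
void bar(void);
int main(void) { foo(); bar(); return 0; }
/* prints "foo() from libfoo" twice: even the call made inside libbar
   binds to libfoo's definition, because libfoo appears first in the
   lookup order (executable, then libfoo.so, then libbar.so) */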
Yes, this is a designed behavior. When you link a program into a binary, all the references to named external (non-static) functions are resolved to point into the symbol table for the binary. Any shared libraries that are linked against are specified as DT_NEEDED entries.
Then, when you run the binary, the dynamic linker loads each required shared library to a suitable address and resolves each symbol to an address. Sometimes this is done lazily, and sometimes it is done once at startup. If several loaded objects define a symbol with the same name, the dynamic linker picks the first one it finds in its search order, and your program may well crash because that is not the definition a given caller expected.
Note that this is the behavior on Linux, which has all symbols as a flat namespace. Windows resolves symbols differently, using a tree topology, which has both advantages (fewer conflicts) and disadvantages (the inability to allocate memory in one library and free it in another).
The Linux behavior is very important if you want things like LD_PRELOAD to work. This allows you to use debugging tools like Electric Fence and CPU profiling tools like the Google performance tools, or replace a memory allocator at runtime. None of these things would work if symbols were preferentially resolved to their binary or shared library.
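As a toy illustration of the LD_PRELOAD mechanism (not of any of the tools named above), here is a sketch of a malloc() wrapper that forwards to the real allocator via RTLD_NEXT. A real interposer needs re-entrancy guards that are omitted here, since dlsym() and even the logging can themselves allocate; treat this strictly as a sketch.

/* preload.c -- build: gcc -shared -fPIC -o preload.so preload.c -ldl
   use:               LD_PRELOAD=./preload.so ls                       */
#define _GNU_SOURCE            /* for RTLD_NEXT */
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

void *malloc(size_t size)
{
    static void *(*real_malloc)(size_t);
    if (!real_malloc)          /* find the next malloc in the search order */
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    void *p = real_malloc(size);
    char buf[64];
    int n = snprintf(buf, sizeof buf, "malloc(%zu) = %p\n", size, p);
    write(2, buf, (size_t)n);  /* write() avoids stdio allocating here */
    return p;
}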
The GNU dynamic linker does support symbol versions, however, so that it's possible to load multiple versions of a shared library into the same program. Oftentimes distros like Debian will do this with libraries they expect to change frequently, like OpenSSL. If the program uses liba which uses OpenSSL 1.0 and libb which uses OpenSSL 1.1, then the program should still function in such a case since OpenSSL has versioned symbols, and each library will use the appropriate version of the relevant symbol.
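For completeness, from the library author's side the mechanism looks roughly like this (LIBFOO_1.0/LIBFOO_2.0, foo() and the file names are placeholders of mine, not anything OpenSSL actually uses): a version script declares the version nodes, and .symver directives attach each implementation to a node, with @@ marking the default.

/* foo.c -- a minimal sketch of GNU symbol versioning
   build: gcc -shared -fPIC -Wl,--version-script,libfoo.map -o libfoo.so foo.c

   libfoo.map:
       LIBFOO_1.0 { global: foo; local: *; };
       LIBFOO_2.0 { global: foo; } LIBFOO_1.0;
*/
int foo_v1(void) { return 1; }     /* old, binary-incompatible implementation */
int foo_v2(void) { return 2; }     /* current implementation */

__asm__(".symver foo_v1, foo@LIBFOO_1.0");    /* kept for binaries linked long ago */
__asm__(".symver foo_v2, foo@@LIBFOO_2.0");   /* @@ = default for new links */

Binaries that were linked against foo@LIBFOO_1.0 keep calling foo_v1(), while anything linked against the new library picks up the default foo@@LIBFOO_2.0; both can coexist in one process, which is what saves the OpenSSL case described above.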

How to compile with make but also include all dependencies

I'm compiling a C++ program on Linux, and I can run make and it all compiles, but when I need to downgrade or change one of its dependencies for another program, it breaks. I was wondering if it was possible to create a standalone executable, with dependencies bundled inside? There aren't many dependencies, so size isn't an issue.
So what you're asking is: can you link with static versions of libraries (which are included in the program directly) instead of dynamic versions of libraries (shared libraries), which are kept external to your program.
The answer is "yes", but it's not always straightforward. First you have to ensure you actually have the static versions of the libraries installed in your system: the static and dynamic libraries are different files and often the "standard" installation provides only the dynamic library.
If you're already compiling code against those libraries you probably already have the static libraries installed because, at least on GNU/Linux systems, the static libraries are often included in the "dev" packages along with the header files etc. needed to compile code.
To make this work you need to modify your linker command line. If you have a sufficiently new version of the binutils package (which provides the linker), you can change your link line to replace arguments like -lssl -lcrypto with arguments like -l:libssl.a -l:libcrypto.a (don't forget the colon after the -l) and that should do it.
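Here is a sketch of what that looks like end to end, using libm as a stand-in (so it assumes the static archive libm.a is actually installed, which on some distributions needs an extra package). Depending on the library, the static link may also need extra trailing libraries such as -ldl or -lpthread that the archive itself depends on.

/* example.c
   dynamic link:  gcc example.c -o example -lm
   static libm:   gcc example.c -o example -l:libm.a
   verify:        ldd example    -- libm.so should no longer be listed   */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    double x = argc > 1 ? atof(argv[1]) : 2.0;
    printf("sqrt(%g) = %g\n", x, sqrt(x));   /* pulls sqrt in from libm */
    return 0;
}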

What to do when two shared libraries which uses different versions of the same 3rd party library?

I have a process A which uses two shared libraries, libA.so and libB.so, written by different people. Unfortunately, libA.so uses version 1.0 of the third-party library libD.so, while libB.so uses version 2.0 of the same library in static form, libD.a. I know that if libA.so and libB.so both used libD.so, errors might happen because of global symbol interposition. But does this situation have the same problem too?
I know the link flag -Bsymbolic could be used on libA.so or libB.so to force symbols to be resolved within the library itself first. To make process A run correctly, both libraries would have to be linked with this flag, am I right? However, I don't have the source code of libA.so, so I cannot re-link it.
To be more general: if one process uses two third-party libraries, each of which pulls in a different version of the same underlying library, will the same thing happen? Is there anything I can do to solve this problem?
This may or may not help you, but given the lack of information I'm hoping it at least sparks an idea or leads you to something similar.
This is an application that allows you to alter your shell settings on a per directory basis:
https://github.com/zimbatm/direnv
It sounds like you actually have an issue that would require you to recompile one of your libraries from source, though. That's not ideal, but if there is no build that uses a compatible third-party version, you might look for a completely different library to accomplish the original task.

Can I install both shared .so and static .a versions of a library?

My questions is related to this: Creating both static and shared C++ libraries
I'm compiling a library in order to install it in ~/local on two different systems. It seems that every time I do this I end up with linker problems that take me hours to figure out. The specific library I'm looking at is primesieve. In that library, it's the default to build static libraries only. Unfortunately the example code count_primes.cpp wouldn't link with the static version of the library on one of my systems, for whatever reason. Eventually I figured out how to build the shared version and the code now compiles nicely, with no ugly hacks necessary.
Given the above, it seems to be that compiling both static and shared versions is a good idea if you're working with multiple systems and want the best chance of having your code compile. Is this true? Are there reasons not to build both versions? I realize that this is a bit of a subjective question but it's a serious programming issue that I think many people here have probably encountered.
PS.
This is what I ended up using to compile and install both shared and static versions of primesieve to ~/local:
make
make lib
make install PREFIX=~/local
make clean
make lib SHARED=yes
make install PREFIX=~/local
The make clean is because of this. I then added this to my .bash_profile:
export LIBRARY_PATH=$LIBRARY_PATH:~/local/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/local/lib
export CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:~/local/include
Alternatively, without changing the environment variables I was able to compile the example program count_primes.cpp like this:
g++ -I ~/local/include/ -L ~/local/lib/ count_primes.cpp -lprimesieve
To use a static library you can just include it in the compilation as if it were a regular object file, e.g.
g++ -o foo foo.cpp /path/to/mylib.a
Of course, this means the library is linked statically.
In practice you can almost always link against the shared version instead, so there's often not much need for a static library at all.
There is no reason not to build both. Neither library will "do" anything on its own. The shared library will only be loaded if it is in a path searched by the dynamic linker (which is what you arranged by adding it to your LD_LIBRARY_PATH). The static one won't be used unless you explicitly link against it, and that only happens at link time.

Remove references in a shared object to another shared object

I want to remove references in a shared object to symbols of another shared object because the referenced object is GPL'd and I don't want to release my code under that license. (I don't think that the symbols provided by the referenced object are used by my code.) What are the steps to take to do this on Linux? I'm not steeped in this technology and it would help if you provided commands. Would another approach be better? Would it be better to somehow create a stub object replacing the referenced object?
Edit 1: I am using PyInstaller to build a self-contained archive containing the binaries for the code I've written along with all libraries that code requires and all libraries those libraries require. These libraries are shared objects that already exist on the build system. It would be too much work to forego PyInstaller and re-compile everything so that the GPL'd libraries are not linked in.
On Linux, the only libraries directly referenced by an executable or library are libc.so, ld-linux.so, linux-gate.so, plus anything you explicitly request on the compiler command line. As such, you can remove these references simply by removing them from the compiler command line.
Note that many times, pkg-config scripts will return all indirect dependencies as well as direct dependencies when queried for linker flags. You can either remove the unnecessary dependencies manually, or pass the -Wl,--as-needed flag to the linker to instruct it to remove unnecessary direct references to shared libraries automatically.
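A quick way to see the effect, using libm as a convenient stand-in for an unused dependency. Whether the first link actually records the unused library depends on your toolchain, since some distributions already pass --as-needed by default.

/* empty.c -- uses nothing from libm */
int main(void) { return 0; }

/* gcc empty.c -lm -o prog
   readelf -d prog | grep NEEDED     -> may list libm.so even though it is unused
   gcc empty.c -Wl,--as-needed -lm -o prog
   readelf -d prog | grep NEEDED     -> the libm entry is dropped                  */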
As for the pyinstaller bundling, keep in mind that indirectly linking GPL'd libraries via an intermediate library is already kind of a grey area; if you in addition merge them into a single file, this may not count as 'mere aggregation' and may not avoid the GPL restrictions. Also note that the GPL never mentions 'linking'; it's all about derivative works. I'm no lawyer and this is not legal advice, but the mere addition or removal of a NEEDED entry for a GPL'd library when no symbols are used seems unlikely to affect whether you are in violation of the GPL, when the GPL itself never mentions such a thing.
