Wanted to know in detail about how shared libraries work vis-a-vis static library - shared-libraries

I am working on creating and linking shared library (.so). While working with them, many questions popped up which i could not find satisying answers when i searched for them, hence putting them here. The questions about shared libraries i have are:
1.) How is shared library different than static library? What are the Key differences in way they are created, they execute?
2.) In case of a shared library at what point are the addresses where a particular function in shared library will be loaded and run from, given? Who gives those functions is load/run addresses?
3.) Will an application linked against shared library be slower in execution as compared to that which is linked with a static library?
4.) Will application executable size differ in these two cases?
5.) Can one do source level debugging of by stepping into functions defined inside a shared library? Is any thing extra needed to make these functions visible to the application?
6.) What are pros and cons in using either kind of library?
Thanks.
-AD

See this SO question When to use dynamic vs. static libraries and this HOWTO.

Related

CMake: the right way to deal with static and shared libraries

Let's consider Debian or Ubuntu distro where one can install some library package, say libfoobar, and a corresponding libfoobar-dev. The first one contains shared library object. The later usually contains headers, static library variant and cmake exported targets (say, FoobarTargets.cmake) which allow CMake stuff like find_package to work flawlessly. To do so FoobarTargets.cmake have to contain both targets: for static and for shared variant.
From the other side, I've read many articles stating that I (as a libfoobar's author) should refrain from declaring add_library(foobar SHARED ....) and add_library(foobar_static STATIC ...) in CMakeLists.txt in favour of building and installing library using BUILD_SHARED_LIBS=ON and BUILD_SHARED_LIBS=OFF. But this approach will export only one variant of the library into FoobarTargets.cmake because the later build will overwrite the first one.
So, the question is: how to do it the right way? So that package maintainer would not need to patch library's CMakeLists.txt to properly export both variants from the one side. And to adhere CMake's sense of a true way, removing duplicating targets which differ only by static/shared from the other side?
I wrote an entire blog post about this. A working example is available on GitHub.
The basic idea is that you need to write your FoobarConfig.cmake file in such a way that it loads one of FoobarSharedTargets.cmake or FoobarStaticTargets.cmake in a principled, user-controllable way, that's also tolerant to only one or the other being present. I advocate for the following strategy:
If the find_package call lists exactly one of static or shared among the required components, then load the corresponding set of targets.
If the variable Foobar_SHARED_LIBS is defined, then load the corresponding set of targets.
Otherwise, honor the setting of BUILD_SHARED_LIBS for consistency with FetchContent users.
In all cases, your users will link to Foobar::Foobar.
Ultimately, you cannot have both static and shared imported in the same subdirectory while also providing a consistent build-tree (FetchContent) and install-tree (find-package) interface. But this is not a big deal since usually consumers want only one or the other, and it's totally illegal to link both to a single target.

Interpose statically linked binaries

There's a well-known technique for interposing dynamically linked binaries: creating a shared library and and using LD_PRELOAD variable. But it doesn't work for statically-linked binaries.
One way is to write a static library that interpose the functions and link it with the application at compile time. But this isn't practical because re-compiling isn't always possible (think of third-party binaries, libraries, etc).
So I am wondering if there's a way to interpose statically linked binaries in the same LD_PRELOAD works for dynamically linked binaries i.e., with no code changes or re-compilation of existing binaries.
I am only interested in ELF on Linux. So it's not an issue if a potential solution is not "portable".
One way is to write a static library that interpose the functions and link it with the application at compile time.
One difficulty with such an interposer is that it can't easily call the original function (since it has the same name).
The linker --wrap=<symbol> option can help here.
But this isn't practical because re-compiling
Re-compiling is not necessary here, only re-linking.
isn't always possible (think of third-party binaries, libraries, etc).
Third-party libraries work fine (relinking), but binaries are trickier.
It is still possible to do using displaced execution technique, but the implementation is quite tricky to get right.
I'll assume you want to interpose symbols in main executable which came from a static library which is equivalent to interposing a symbol defined in executable. The question thus reduces to whether it's possible to intercept a function defined in executable.
This is not possible (EDIT: at least not without a lot of work - see comments to this answer) for two reasons:
by default symbols defined in executable are not exported so not accessible to dynamic linker (you can alter this via -export-dynamic or export lists but this has unpleasant performance or maintenance side effects)
even if you export necessary symbols, ELF requires executable's dynamic symtab to be always searched first during symbol resolution (see section 1.5.4 "Lookup Scope" in dsohowto); symtab of LD_PRELOAD-ed library will always follow that of executable and thus won't have a chance to intercept the symbols
What you are looking for is called binary instrumentation (e.g., using Dyninst or ptrace). The idea is you write a mutator program that attaches to (or statically rewrites) your original program (called mutatee) and inserts code of your choice at specific points in the mutatee. The main challenge usually revolves around finding those insertion points using the API provided by the instrumentation engine. In your case, since you are mainly looking for static symbols, this can be quite challenging and would likely require heuristics if the mutatee is stripped of non-dynamic symbols.

examining text segment in a statically linked executable

I have a statically linked application binary that links against multiple user libraries and the pthread library. The application only uses a limited set of functions from each of these libraries. From a previous post Size of a library and the executable and from my experiments I realize that the linker only includes functions(in the executable) that are used/needed and not the entire contents of library.
I want to find out which functions from each of the respective libraries are linked
to the executable and their addresses (VMA). Ultimately I want to compile a list that contains the start and the end Virtual Memory Addresses (VMAs) for the each of the libraries based on the functions (in the library) that are mapped to the text segment.
One approach to do this is to make a list of functions in the library and then look for each of these functions in the executable and the corresponding Virtual Memory address it is mapped to. But this seems rather tedious to me. Is there a simpler way to accomplish this? Thanks.
I want to find out which functions from each of the respective libraries are linked to the executable and their addresses (VMA).
Add -Wl,-Map=foo.map argument to your link line. The resulting foo.map file will tell you all of the above.
Ultimately I want to compile a list that contains the start and the end Virtual Memory Addresses (VMAs) for the each of the libraries
That assumes that the linker does not re-order functions (and thus all functions from a single library occupy continuous range of text addresses). This assumption is probably true in simple cases, but in no way is guaranteed. See e.g. this patch.

What are the pro and cons of statically linking a library?

I want to release an application I developed as a hobby both for Linux and Windows. This application depends on boost (and possibly other libraries). The norm for this kind of application (a chess engine) is to provide only an executable file and possibly some helper files.
I tough it would be a good idea to statically link the libraries so the executable would not have any dependencies. So the end user can just put the executable in a directory and start using it.
However, while doing some research online I found some negative comments about statically linking libraries, some even arguing that an application with statically linked libraries would be hardly portable, meaning that it would only run on my system of highly similar systems.
So what are the pros and cons of statically linking library?
I already know that the executable will be bigger. But I can't see why it would make my application less portable.
Pros:
No dependencies.
Cons:
Higher memory usage, as the OS can no longer use a shared copy of the library.
If the library needs to be updated, your application needs to be rebuilt. This is doubly important for libraries that then have security fixes.
Of course, a bigger issue for portability is the lack of source code distribution.
Let's say the static library "A" you include has a dependency on function "B". If this dependency can't be fulfilled by the target system, then your program won't run.
But if you're using dynamic linking, the user could maybe install another version of library "A" that uses function "C" instead of "B", so it can run successfully.
If you link the libraries statically, unless you add the smarts to also check the user's system for the libraries you've linked, you're locking your application to use those versions of the libraries until you update your executable. Security holes happen, and updates happen. (For a chess engine there may not be too much issue, but who knows.)
With dynamically linked libraries, if the library say X, you have linked with is not available at the user system, your code crashes ungracefully leaving the end user wondering.
Whereas, in the case of static libraries everything is fused into the executable, so a condition like above mayn't happen, the executable however will be very bulky.
The above problem in dynamically linked libraries can however, be eliminated by dynamic loading.

Any downsides to using statically linked applications on Linux?

I seen several discussions here on the subject, but wanted to ask about my particular situation:
If I have some 3rd part libraries which my application is using, and I'd like to link them together in order to save myself the hassle in LD_LIBRARY, etc., is there any downside to it on Linux, other then larger file size?
Also, is it possible to statically link only some libraries, and other (standard Linux libraries) to link dynamically?
Thanks.
It is indeed possible to dynamically link against some libraries and statically link against others.
It sounds like what you really want to do is dynamically link against the system libraries, and statically link against the nonstandard ones that a user may not have installed (or that different users may have different installations of).
That's perfectly reasonable.
It's not generally a good idea to statically link against system libraries, especially libc.
It can often make sense to statically link against libraries that do not come with the OS and that will not be distributed with your application.
There are some bits of libc - those that use nsswitch - that need to load libraries dynamically. This can cause problems if you want to produce a completely static binary.
Statically linking your 3rd party libraries into your application should be completely fine.
The statically linked binary will be larger than if you had uses a shared library, but I find that disadvantage outweight the library path hassles, provided I control the distribution of all the libraries involved. If you are dependant on a particular distros shared libraries, then you have no choice but to use dynamic linking.
The main disadvantage I see is your application loses any automatic bugfixes that might be applied to a shared library. On the flip-side you don't get new bugs.
Static linking does not just affect the file size of the library, it also affects the memory footprint and start up time of the application. Dynamically linked libraries are loaded once no matter how many programs use them. Statically linked libraries must be loaded once per program that uses them (because they are now part of that program).
To answer your second question, yes, it is possible to have dynamic and static libraries linked to the same application. Just be careful to avoid interlibrary dependencies so you don't have a problem with library order. You should be able to list the libraries in any arbitrary order. Where I work, we prefer to list them alphabetically.
Edit: To link a static library, use the flag -lfoo. To add a directory to the library search path, use -L/path/to/libfoo.
Edit: You don't have to link a dynamic library. Your program can use a function provided by your compiler to open a dynamic library at run time, or you can link it at compile time and the compiler will resolve the symbols but not include them in the binary. See pjc50's comment below.
Statically linking will make your binary bulky, but you wont need to have a shared version of that library on the target runtime environment. This is especially the case while developing embedded apps.

Resources