Any downsides to using statically linked applications on Linux? - linux

I seen several discussions here on the subject, but wanted to ask about my particular situation:
If I have some 3rd part libraries which my application is using, and I'd like to link them together in order to save myself the hassle in LD_LIBRARY, etc., is there any downside to it on Linux, other then larger file size?
Also, is it possible to statically link only some libraries, and other (standard Linux libraries) to link dynamically?
Thanks.

It is indeed possible to dynamically link against some libraries and statically link against others.
It sounds like what you really want to do is dynamically link against the system libraries, and statically link against the nonstandard ones that a user may not have installed (or that different users may have different installations of).
That's perfectly reasonable.
It's not generally a good idea to statically link against system libraries, especially libc.
It can often make sense to statically link against libraries that do not come with the OS and that will not be distributed with your application.

There are some bits of libc - those that use nsswitch - that need to load libraries dynamically. This can cause problems if you want to produce a completely static binary.
Statically linking your 3rd party libraries into your application should be completely fine.

The statically linked binary will be larger than if you had uses a shared library, but I find that disadvantage outweight the library path hassles, provided I control the distribution of all the libraries involved. If you are dependant on a particular distros shared libraries, then you have no choice but to use dynamic linking.

The main disadvantage I see is your application loses any automatic bugfixes that might be applied to a shared library. On the flip-side you don't get new bugs.

Static linking does not just affect the file size of the library, it also affects the memory footprint and start up time of the application. Dynamically linked libraries are loaded once no matter how many programs use them. Statically linked libraries must be loaded once per program that uses them (because they are now part of that program).

To answer your second question, yes, it is possible to have dynamic and static libraries linked to the same application. Just be careful to avoid interlibrary dependencies so you don't have a problem with library order. You should be able to list the libraries in any arbitrary order. Where I work, we prefer to list them alphabetically.
Edit: To link a static library, use the flag -lfoo. To add a directory to the library search path, use -L/path/to/libfoo.
Edit: You don't have to link a dynamic library. Your program can use a function provided by your compiler to open a dynamic library at run time, or you can link it at compile time and the compiler will resolve the symbols but not include them in the binary. See pjc50's comment below.

Statically linking will make your binary bulky, but you wont need to have a shared version of that library on the target runtime environment. This is especially the case while developing embedded apps.

Related

Interpose statically linked binaries

There's a well-known technique for interposing dynamically linked binaries: creating a shared library and and using LD_PRELOAD variable. But it doesn't work for statically-linked binaries.
One way is to write a static library that interpose the functions and link it with the application at compile time. But this isn't practical because re-compiling isn't always possible (think of third-party binaries, libraries, etc).
So I am wondering if there's a way to interpose statically linked binaries in the same LD_PRELOAD works for dynamically linked binaries i.e., with no code changes or re-compilation of existing binaries.
I am only interested in ELF on Linux. So it's not an issue if a potential solution is not "portable".
One way is to write a static library that interpose the functions and link it with the application at compile time.
One difficulty with such an interposer is that it can't easily call the original function (since it has the same name).
The linker --wrap=<symbol> option can help here.
But this isn't practical because re-compiling
Re-compiling is not necessary here, only re-linking.
isn't always possible (think of third-party binaries, libraries, etc).
Third-party libraries work fine (relinking), but binaries are trickier.
It is still possible to do using displaced execution technique, but the implementation is quite tricky to get right.
I'll assume you want to interpose symbols in main executable which came from a static library which is equivalent to interposing a symbol defined in executable. The question thus reduces to whether it's possible to intercept a function defined in executable.
This is not possible (EDIT: at least not without a lot of work - see comments to this answer) for two reasons:
by default symbols defined in executable are not exported so not accessible to dynamic linker (you can alter this via -export-dynamic or export lists but this has unpleasant performance or maintenance side effects)
even if you export necessary symbols, ELF requires executable's dynamic symtab to be always searched first during symbol resolution (see section 1.5.4 "Lookup Scope" in dsohowto); symtab of LD_PRELOAD-ed library will always follow that of executable and thus won't have a chance to intercept the symbols
What you are looking for is called binary instrumentation (e.g., using Dyninst or ptrace). The idea is you write a mutator program that attaches to (or statically rewrites) your original program (called mutatee) and inserts code of your choice at specific points in the mutatee. The main challenge usually revolves around finding those insertion points using the API provided by the instrumentation engine. In your case, since you are mainly looking for static symbols, this can be quite challenging and would likely require heuristics if the mutatee is stripped of non-dynamic symbols.

Loading Linux libraries at runtime

I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
Here is my specific problem: I want to publish a Linux program in ELF binary form that should run on as many distributions as possible so my mandatory dependencies are as low as it gets: The only libraries required under any circumstances are libpthread, libX11, librt and libm (and glibc of course). I'm linking dynamically against these libraries when I build my program using gcc.
Optionally, however, my program should also support ALSA (sound interface), the Xcursor, Xfixes, and Xxf86vm extensions as well as GTK. But these should only be used if they are available on the user's system, otherwise my program should still run but with limited functionality. For example, if GTK isn't there, my program will fall back to terminal mode. Because my program should still be able to run without ALSA, Xcursor, Xfixes, etc. I cannot link dynamically against these libraries because then the program won't start at all if one of the libraries isn't there.
So I need to manually check if the libraries are present and then open them one by one using dlopen() and import the necessary function symbols using dlsym(). This, however, leads to all kinds of problems:
1) Library naming conventions:
Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000". These extensions seem to differ from system to system. So which one should I choose when calling dlopen()? Using a hardcoded name here seems like a very bad idea because the names differ from system to system. So the only workaround that comes to my mind is to scan the whole library path and look for filenames starting with a "libXcursor.so" prefix and then do some custom version matching. But how do I know that they are really compatible?
2) Library search paths: Where should I look for the *.so files after all? This is also different from system to system. There are some default paths like /usr/lib and /lib but *.so files could also be in lots of other paths. So I'd have to open /etc/ld.so.conf and parse this to find out all library search paths. That's not a trivial thing to do because /etc/ld.so.conf files can also use some kind of include directive which means that I have to parse even more .conf files, do some checks against possible infinite loops caused by circular include directives etc. Is there really no easier way to find out the search paths for *.so?
So, my actual question is this: Isn't there a more convenient, less hackish way of achieving what I want to do? Is it really so complicated to create a Linux program that has some optional dependencies like ALSA, GTK, libXcursor... but should also work without it! Is there some kind of standard for doing what I want to do? Or am I doomed to do it the hackish way?
Thanks for your comments/solutions!
I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
This isn't a design flaw as far as creators of the system are concerned; it's an advantage -- it encourages you to distribute programs in source form. Oh, you wanted to sell your software? Sorry, that's not the use case Linux is optimized for.
Library naming conventions: Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000".
Yes, this is called external library versioning. Read about it here. As should be clear from that description, if you compiled your binaries using headers on a system that would normally give you libXcursor.so.1 as a runtime reference, then the only shared library you are compatible with is libXcursor.so.1, and trying to dlopen libXcursor.so.0.2000 will lead to unpredictable crashes.
Any system that provides libXcursor.so but not libXcursor.so.1 is either a broken installation, or is also incompatible with your binaries.
Library search paths: Where should I look for the *.so files after all?
You shouldn't be trying to dlopen any of these libraries using their full path. Just call dlopen("libXcursor.so.1", RTLD_GLOBAL);, and the runtime loader will search for the library in system-appropriate locations.

Linux static linking is dead?

In fact, -static gcc flag on Linux doesn't work now. Let me cite from the GNU libc FAQ:
2.22. Even statically linked programs need some shared libraries
which is not acceptable for me. What
can I do?
{AJ} NSS (for details just type `info
libc "Name Service Switch"') won't
work properly without shared
libraries. NSS allows using different
services (e.g. NIS, files, db, hesiod)
by just changing one configuration
file (/etc/nsswitch.conf) without
relinking any programs. The only
disadvantage is that now static
libraries need to access shared
libraries. This is handled
transparently by the GNU C library.
A solution is to configure glibc with
--enable-static-nss. In this case you can create a static binary that will
use only the services dns and files
(change /etc/nsswitch.conf for this).
You need to link explicitly against
all these services. For example:
gcc -static test-netdb.c -o test-netdb \
-Wl,--start-group -lc -lnss_files -lnss_dns -lresolv -Wl,--end-group
The problem with this approach is
that you've got to link every static
program that uses NSS routines with
all those libraries.
{UD} In fact, one cannot say anymore that a libc compiled with this option
is using NSS. There is no switch
anymore. Therefore it is highly
recommended not to use
--enable-static-nss since this makes the behaviour of the programs on the
system inconsistent.
Concerning that fact is there any reasonable way now to create a full-functioning static build on Linux or static linking is completely dead on Linux? I mean static build which:
Behaves exactly the same way as
dynamic build do (static-nss with
inconsistent behaviour is evil!);
Works on reasonable variations of glibc environment and Linux versions;
I think this is very annoying, and I think it is arrogant to call a feature "useless" because it has problems dealing with certain use cases. The biggest problem with the glibc approach is that it hard-codes paths to system libraries (gconv as well as nss), and thus it breaks when people try to run a static binary on a Linux distribution different from the one it was built for.
Anyway, you can work around the gconv issue by setting GCONV_PATH to point to the appropriate location, this allowed me to take binaries built on Ubuntu and run them on Red Hat.
Static linking is back on the rise!
Linus Torvalds is in support of static linking, and expressed concern about the amount of static linking in Linux distributions (see also this discussion).
Many (most?) Go programming language executables are statically linked.
The increased portability and backward compatibility is one reason for them being popular.
Other programming languages have similar efforts to make static linking really easy, for example:
Haskell (I am working on this effort)
Zig (see here for details)
Configurable Linux distributions / package sets like NixOS / nixpkgs make it possible to link a large fraction of their packages statically (for example, its pkgsStatic package set can provide all kinds of statically linked executables).
Static linking can result in better unused-code elimination at link time, making executables smaller.
libcs like musl make static linking easy and correct.
Some big software industry leaders agree on this. For example Google is writing new libc targeted at static linking ("support static non-PIE and static-PIE linking", "we do not intend to invest in at this point [in] dynamic loading and linking support").
Concerning that fact is there any reasonable way now to create a full-functioning static build on Linux or static linking is completely dead on Linux?
I do not know where to find the historic references, but yes, static linking is dead on GNU systems. (I believe it died during the transition from libc4/libc5 to libc6/glibc 2.x.)
The feature was deemed useless in light of:
Security vulnerabilities. Application which was statically linked doesn't even support upgrade of libc. If app was linked on system containing a lib vulnerability then it is going to be perpetuated within the statically linked executable.
Code bloat. If many statically linked applications are ran on the same system, standard libraries wouldn't be reused, since every application contains inside its own copy of everything. (Try du -sh /usr/lib to understand the extent of the problem.)
Try digging LKML and glibc mail list archives from 10-15 years ago. I'm pretty sure long ago I have seen something related on LKML.
Static linking doesn't seem to get much love in the Linux world. Here's my take.
People who do not see the appeal of static linking typically work in the realm of the kernel and lower-level operating system. Many *nix library developers have spent a lifetime dealing with the inevitable issues of trying to link a hundred ever-changing libraries together, a task they do every day. Take a look at autotools if you ever want to know the backflips they are comfortable performing.
But everyone else should not be expected to spend most of their time on this. Static linking will take you a long way towards being buffered from library churn. The developer can upgrade her software's dependencies according to the software's schedule, rather than being forced to do it the moment new library versions appear. This is important for user-facing applications with complex user interfaces that need to control the flux of the many lower-level libraries upon which they inevitably depend. And that's why I will always be a fan of static linking. If you can statically link cross-compiled portable C and C++ code, you have pretty much made the world your oyster, as you can more quickly deliver complex software to a wide range of the world's ever-growing devices.
There's lots to disagree with there, from other perspectives, and it's nice that open source software allows for them all.
Just because you have to dynamically link to the NSS service doesn't mean you can't statically link to any other library. All that FAQ is saying is that even "statically" linked programs have some dynamically-linked libraries. It's not saying that static linking is "impossible" or that it "doesn't work".
Adding on other answers:
Due to the reasons said in the other answers, it's not recommended for most of Linux distributions, but there are actually distributions that are made specifically to run statically linked binaries:
stali
morpheus
starchlinux
bifrost
From stali description:
static linux is based on a hand selected collection of the best tools
for each task and each tool being statically linked (including some X
clients such as st, surf, dwm, dmenu),
It also targets binary size reduction through the avoidance of glibc
and other bloated GNU libraries where possible (early experiments show
that statically linked binaries are usually smaller than their
dynamically linked glibc counterparts!!!). Note, this is pretty much
contrary to what Ulrich Drepper reckons about static linking.
Due to the side-benefit that statically linked binaries start faster,
the distribution also targets performance gains.
Statically linking also helps to for dependency reduction.
You can read more about it in this question about static vs dynamic linking.

What are the pro and cons of statically linking a library?

I want to release an application I developed as a hobby both for Linux and Windows. This application depends on boost (and possibly other libraries). The norm for this kind of application (a chess engine) is to provide only an executable file and possibly some helper files.
I tough it would be a good idea to statically link the libraries so the executable would not have any dependencies. So the end user can just put the executable in a directory and start using it.
However, while doing some research online I found some negative comments about statically linking libraries, some even arguing that an application with statically linked libraries would be hardly portable, meaning that it would only run on my system of highly similar systems.
So what are the pros and cons of statically linking library?
I already know that the executable will be bigger. But I can't see why it would make my application less portable.
Pros:
No dependencies.
Cons:
Higher memory usage, as the OS can no longer use a shared copy of the library.
If the library needs to be updated, your application needs to be rebuilt. This is doubly important for libraries that then have security fixes.
Of course, a bigger issue for portability is the lack of source code distribution.
Let's say the static library "A" you include has a dependency on function "B". If this dependency can't be fulfilled by the target system, then your program won't run.
But if you're using dynamic linking, the user could maybe install another version of library "A" that uses function "C" instead of "B", so it can run successfully.
If you link the libraries statically, unless you add the smarts to also check the user's system for the libraries you've linked, you're locking your application to use those versions of the libraries until you update your executable. Security holes happen, and updates happen. (For a chess engine there may not be too much issue, but who knows.)
With dynamically linked libraries, if the library say X, you have linked with is not available at the user system, your code crashes ungracefully leaving the end user wondering.
Whereas, in the case of static libraries everything is fused into the executable, so a condition like above mayn't happen, the executable however will be very bulky.
The above problem in dynamically linked libraries can however, be eliminated by dynamic loading.

How to create a shared object that is statically linked with pthreads and libstdc++ on Linux/gcc?

How to create a shared object that is statically linked with pthreads and libstdc++ on Linux/gcc?
Before I go to answering your question as it was described, I will note that it is not exactly clear what you are trying to achieve in the end, and there is probably a better solution to your problem.
That said - there are two main problems with trying to do what you described:
One is, that you will need to decompose libpthread and libstdc++ to the object files they are made with. This is because ELF binaries (used on Linux) have two levels of "run time" library loading - even when an executable is statically linked, the loader has to load the statically linked libraries within the binary on execution, and map the right memory addresses. This is done before the shared linkage of libraries that are dynamically loaded (shared objects) and mapped to shared memory. Thus, a shared object cannot be statically linked with such libraries, as at the time the object is loaded, all static linked libraries were loaded already. This is one difference between linking with a static library and a plain object file - a static library is not merely glued like any object file into the executable, but still contains separate tables which are referred to on loading. (I believe that this is in contrast to the much simpler static libraries in MS-DOS and classic Windows, .LIB files, but there may be more to those than I remember).
Of course you do not actually have to decompose libpthread and libstdc++, you can just use the object files generated when building them. Collecting them may be a bit difficult though (look for the objects referred to by the final Makefile rule of those libraries). And you would have to use ld directly and not gcc/g++ to link, to avoid linking with the dynamic versions as well.
The second problem is consequential. If you do the above, you will sure have such a shared object / dynamic library as you asked to build. However, it will not be very useful, as once you try to link a regular executable that uses those libpthread/libstdc++ (the latter being any C++ program) with this shared object, it will fail with symbol conflicts - the symbols of the static libpthread/libstdc++ objects you linked your shared object against will clash with the symbols from the standard libpthread/libstdc++ used by that executable, no matter if it is dynamically or statically linked with the standard libraries.
You could of course then try to either hide all symbols in the static objects from libstdc++/libpthread used by your shared library, make them private in some way, or rename them automatically on linkage so that there will be no conflict. However, even if you get that to work, you will find some undesireable results in runtime, since both libstdc++/libpthread keep quite a bit of state in global variables and structures, which you would now have duplicate and each unaware of the other. This will lead to inconsistencies between these global data and the underlying operating system state such as file descriptors and memory bounds (and perhaps some values from the standard C library such as errno for libstdc++, and signal handlers and timers for libpthread.
To avoid over-broad interpretation, I will add a remark: at times there can be sensible grounds for wanting to statically link against even such basic libraries as libstdc++ and even libc, and even though it is becoming a bit more difficult with recent systems and versions of those libraries (due to a bit of coupling with the loader and special linker tricks used), it is definitely possible - I did it a few times, and know of other cases in which it is still done. However, in that case you need to link a whole executable statically. Static linkage with standard libraries combined with dynamic linkage with other objects is not normally feasible.
Edit: One issue which I forgot to mention but is important to take into account is C++ specific. C++ was unfortunately not designed to work well with the classic model of object linkage and loading (used on Unix and other systems). This makes shared libraries in C++ not really portable as they should be, because a lot of things such as type information and templates are not cleanly separated between objects (often being taken, together with a lot of actual library code at compile time from the headers). libstdc++ for that reason is tightly coupled with GCC, and code compiled with one version of g++ will in general only work with the libstdc++ from with this (or a very similar) version of g++. As you will surely notice if you ever try to build a program with GCC 4 with any non-trivial library on your system that was built with GCC 3, this is not just libstdc++. If your reason for wanting to do that is trying to ensure that your shared object is always linked with the specific versions of libstdc++ and libpthread that it was built against, this would not help because a program that uses a different/incompatible libstdc++ would also be built with an incompatible C++ compiler or version of g++, and would thus fail to link with your shared object anyway, aside from the actual libstdc++ conflicts.
If you wonder "why wasn't this done simpler?", a general rumination worth pondering: For C++ to work nicely with dynamic/shared libraries (meaning compatibility across compilers, and the ability to replace a dynamic library with another version with a compatible interface without rebuilding everything that uses it), not just compiler standartization is needed, but at the level of the operating system's loader, the structure and interface of object and library files and the work of the linker would need to be significantly extended beyond the relatively simple Unix classics used on common operating systems (Microsoft Windows, Mach based systems and NeXTStep relatives such as Mac OS, VMS relatives and some mainframe systems also included) for natively built code today. The linker and dynamic loader would need to be aware of such things as templates and typing, having to some extent functionality of a small compiler to actually adapt the library's code to the type given to it - and (personal subjective observation here) it seems that higher-level intermediate intermediate code (together with higher-level languages and just-in-time compilation) is catching ground faster and likely to be standardized sooner than such extensions to the native object formats and linkers.
You mentioned in a separate comment that you are trying to port a C++ library to an embedded device. (I am adding a new answer here instead of editing my original answer here because I think other StackOverflow users interested in this original question may still be interested in that answer in its context)
Obviously, depending on how stripped down your embedded system is (I have not much embedded Linux experience, so I am not sure what is most likely), you may of course be able to just install the shared libstdc++ on it and dynamically link everything as you would do otherwise.
If dynamically linking with libstdc++ would not be good for you or not work on your system (there are so many different levels of embedded systems that one cannot know), and you need to link against a static libstdc++, then as I said, your only real option is static linking the executable using the library with it and libstdc++. You mentioned porting a library to the embedded device, but if this is for the purpose of using it in some code you write or build on the device and you do not mind a static libstdc++, then linking everything statically (aside from perhaps libc) is probably OK.
If the size of libstdc++ is a problem, and you find that your library is actually only using a small part of its interfaces, then I would nonetheless suggest first trying to determine the actual space you would save by linking against only the parts you need. It may be significant or not, I never looked that deep into libstdc++ and I suspect that it has a lot of internal dependencies, so while you surely do not need some of the interfaces, you may or may not still depend on a big part of its internals - I do not know and did not try, but it may surprise you. You can get an idea by just linking a binary using the library against a static build of it and libstdc++ (not forgetting to strip the binary, of course), and comparing the size of the resulting executable that with the total size of a (stripped) executable dynamically linked together with the full (stripped) shared objects of the library and libstdc++.
If you find that the size difference is significant, but do not want to statically link everything, you try to reduce the size of libstdc++ by rebuilding it without some parts you know that you do not need (there are configure-time options for some parts of it, and you can also try to remove some independent objects at the final creation of libstdc++.so. There are some tools to optimize the size of libraries - search the web (I recall one from a company named MontaVista but do not see it on their web site now, there are some others too).
Other than the straightforward above, some ideas and suggestions to think of:
You mentioned that you use uClibc, which I never fiddled with myself (my experience with embedded programming is a lot more primitive, mostly involving assembly programming for the embedded processor and cross-compiling with minimal embedded libraries). I assume you checked this, and I know that uClibc is intended to be a lightweight but rather full standard C library, but do not forget that C++ code is hardly independent on the C library, and g++ and libstdc++ depend on quite some delicate things (I remember problems with libc on some proprietary Unix versions), so I would not just assume that g++ or the GNU libstdc++ actually works with uClibc without trying - I don't recall seeing it mentioned in the uClibc pages.
Also, if this is an embedded system, think of its performance, compute power, overall complexity, and timing/simplicity/solidity requirements. Take into consideration the complexity involved, and think whether using C++ and threads is appropriate in your embedded system, and if nothing else in the system uses those, whether it is worth introducing for that library. It may be, not knowing the library or system I cannot tell (again, embedded systems being such a wide range nowadays).
And in this case also, just a quick link I stumbled upon looking for uClibc -- if you are working on an embedded system, using uClibc, and want to use C++ code on it -- take a look at uClibc++. I do not know how much of the standard C++ stuff you need and it already supports, and it seems to be an ongoing project, so not clear if it is in a state good enough for you already, but assuming that your work is also under development still, it might be a good alternative to GCC's libstdc++ for your embedded work.
I think this guy explains quite well why that wouldn't make sense. C++ code that uses your shared object but a different libstdc++ would link alright, but wouldn't work.

Resources