Determining binary compatibility under Linux

What is the best way to determine a pre-compiled binary's dependencies (specifically with regard to glibc and libstdc++ symbols & versions) and then ensure that a target system has these installed?
I have a limitation in that I cannot provide source code to compile on each machine (employer restriction), so the de facto response of "compile on each machine to ensure compatibility" is not suitable. I also don't wish to provide statically compiled binaries; that seems very much a case of using a hammer to open an egg.
I have considered a number of approaches which loosely center around determining the symbols/libraries my executable/library requires through use of commands such as
ldd -v </path/executable>
or
objdump -x </path/executable> | grep UND
and then somehow running a command on the target system to check whether such symbols, libraries and versions are provided (I am not entirely certain how to do this step).
This would then be followed by some pattern or symbol matching to ensure the correct versions, or greater, are present.
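For example, I imagine the extraction step could look something like this (an untested sketch; the path is a placeholder):
objdump -T /path/executable | grep -Eo 'GLIBC_[0-9.]+' | sort -uV
objdump -T /path/executable | grep -Eo 'GLIBCXX_[0-9.]+' | sort -uV
which lists every glibc/libstdc++ interface version the binary references, so the last entry is the minimum library version the target would have to provide.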
That said, I feel like this will already have been largely done for me and I'm suffering from ... "a knowledge gap ?" of how it is currently implemented.
Any thoughts/suggestions on how to proceed?
I should add that this is for the purposes of installing my software on a wide variety of Linux distributions - in particular customised clusters - which may not obey distribution guidelines or standardised packaging methods. The objective being a seamless install.
I wish to accomplish binary compatibility at install time, not at a subsequent runtime, which may be triggered by a user with insufficient privileges to install dependencies.
Also, as I don't have source code access to all the third-party libraries I use and install (specialised maths/engineering libraries), an in-code solution does not work so well. I suppose I could write a binary that tests whether certain symbols (& versions) are present, but this binary would itself have compatibility issues to run.
I think my solution has to be to compile against older libraries (as mentioned) and ship those, as well as using the LSB checker (which looks promising).

GNU libraries (glibc and libstdc++) support a mechanism called symbol versioning. First of all, these libraries export special symbols used by the dynamic linker to resolve the appropriate symbol version (CXXABI_* and GLIBCXX_* in libstdc++, GLIBC_* in glibc). A simple script to the tune of:
nm -D libc.so.6 | grep " A "
will return a list of version symbols, which can then be further shell-processed to establish the maximum supported libc interface version (the same works for libstdc++). From C code, one has the option of doing the same using dlvsym() (first dlopen() the library, then check whether a certain minimal version of the symbols you need can be looked up using dlvsym()).
Other options for obtaining the glibc version at runtime include the gnu_get_libc_version() and confstr() library calls.
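For example, a rough (untested) one-liner along those lines - the libc path varies by distribution:
nm -D /lib/x86_64-linux-gnu/libc.so.6 | awk '/ A GLIBC_[0-9]/ {print $NF}' | sort -uV | tail -n 1
prints the newest GLIBC_* version tag the installed libc provides (the same idea works with GLIBCXX_*/CXXABI_* against libstdc++.so.6).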
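From the shell, the rough counterparts are (the first corresponds to confstr(_CS_GNU_LIBC_VERSION)):
getconf GNU_LIBC_VERSION
ldd --version
both of which report the installed glibc release.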
However, the proper use of the versioning interface is to write code which explicitly links to a specific glibc/libstdc++ interface version. For example, code linking to the GLIBC_2.10 interface version is expected to work with any glibc version newer than 2.10 (all versions up to 2.18 and beyond). While it is possible to enable versioning on a per-symbol basis (using a ".symver" assembler/linker directive), the more reasonable approach is to set up a chroot environment with an older (minimum supported) version of the toolchain and compile the project against that; the result will run seamlessly with whatever newer version it encounters.

Use the Linux Application Checker tool ([1], [2], [3]) to check the binary compatibility of your application with various Linux distributions. You can also use it to check compatibility with your custom distribution.

Related

Create portable and static fortran linux binary?

I'm investigating options to create portable, static Linux binaries from Fortran code (in the sense that the binaries should be able to run on any new and any reasonably old Linux distro). If I understand correctly (extrapolating from C), the main issue for portability is that glibc is forwards but not backwards compatible (that is, static binaries created on old distros will work on newer ones but not vice versa). This at least seems to work in my limited tests so far (with one caveat: the use of scratch files causes segfaults when running on newer distros in some cases).
It seems at least in C that one can avoid compiling on old distros by adding legacy glibc headers, as described in
https://github.com/wheybags/glibc_version_header
This specific method does not work with Fortran code and compilers, but I would like to know if anyone knows of a similar approach (or, more specifically, what might be needed to create portable Fortran binaries: is an old glibc enough, or must one also use an old libgfortran, etc.)?
I suggest using the manylinux docker images as a starting point.
In short: manylinux is a "platform definition" for distributing binary wheels (Python packages that may contain compiled code) that run on most current Linux systems. The need for manylinux and its definition can be found in Python Enhancement Proposal 513.
Their images are based on CentOS 5 and include all the basic development tools, including gfortran. The process for you would be roughly the following (I did not test it, and it may require minor adjustments):
Run the docker image from https://github.com/pypa/manylinux
Compile your code with the flag -static-libgfortran
One possible tweak is needed if they don't ship the static version of libgfortran, in which case you could add it here.
The resulting code should run on most currently used Linux systems.
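A minimal sketch of those steps (untested; the image name is the one documented in the pypa/manylinux README at the time, and the source/output file names are placeholders):
docker pull quay.io/pypa/manylinux1_x86_64
docker run --rm -v "$PWD":/io -w /io quay.io/pypa/manylinux1_x86_64 \
    gfortran -O2 -static-libgfortran -o myprog main.f90
Afterwards, objdump -T myprog | grep -Eo 'GLIBC_[0-9.]+' | sort -uV can confirm how old a glibc the result actually requires.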

C++ .a: what affects portability across distros?

I'm building a .a from C++ code. It only depends on the standard library (libc++/libstdc++). From general reading, it seems that portability of binaries depends on
compiler version (because it can affect the ABI). For gcc, the ABI is linked to the major version number.
libc++/libstdc++ versions (because they could pass a vector<T> into the .a and its representation could change).
I.e. someone using the .a needs to use the same (major version of the) compiler and the same standard library.
As far as I can see, if compiler and standard library match, a .a should work across multiple distros. Is this right? Or is there gubbins relating to system calls, etc., meaning a .a for Ubuntu should be built on Ubuntu, .a for CentOS should be built on CentOS, and so on?
Edit: see If clang++ and g++ are ABI incompatible, what is used for shared libraries in binary? (though it doesn't answer this question).
Edit 2: I am not accessing any OS features explicitly (e.g. via system calls). My only interaction with the system is to open files and read from them.
It only depends on the standard library
It could also depend implicitly upon other things (think of resources like fonts, configuration files under /etc/, header files under /usr/include/, availability of /proc/, of /sys/, external programs run by system(3) or execvp(3), specific file systems or devices, particular ioctl-s, available or required plugins, etc...)
These are the kind of details which might make porting difficult. For example, look into nsswitch.conf(5).
The devil is in the details.
(In other words, without a lot more detail, your question doesn't make much sense.)
Linux is perceived as a free-software ecosystem. The usual way of porting something is to recompile it on - or at least for - the target Linux distribution. When you do that several times (for different and many Linux distros), you'll understand which details are significant in your particular software (and distributions).
Most of the time, recompiling and porting a library on a different distribution is really easy. Sometimes, it might be hard.
For shared libraries, reading Program Library HowTo, C++ dlopen miniHowTo, elf(5), your ABI specification (see here for some incomplete list), Drepper's How To Write Shared Libraries could be useful.
My recommendation is to prepare binary packages for various common Linux distributions. For example, a .deb for Debian & Ubuntu (some particular versions of them).
Of course a .deb for Debian might not work on Ubuntu (sometimes it does).
Look also into things like autoconf (or cmake). You may at least want to have some externally provided #define-d preprocessor strings (often passed with -D to gcc or g++) which would vary from one distribution to the next (e.g. on some distributions you print by popen-ing lp, on others by popen-ing lpr, on others by interacting with some CUPS server, etc.). Details matter.
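For instance (PRINT_COMMAND and print.cpp are hypothetical names, just to show the mechanism):
g++ -DPRINT_COMMAND='"lpr"' -c print.cpp    # on distributions shipping lpr
g++ -DPRINT_COMMAND='"lp"'  -c print.cpp    # on distributions shipping lp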
My only interaction with the system is to open files
But even these vary a lot from one distribution to another one.
It is probable that you won't be able to provide a single -and the same one- lib*.a for several distributions.
NB: you probably need to budget more work than you expect.

Different versions of compilers + libgcc encountered on Windows

I have a third-party library which depends on libgcc_s_sjlj-1.dll.
My own program is compiled under MSYS2 (mingw-w64) and it depends on libgcc_s_dw2-1.dll.
Please note that the third-party library is pure binaries (no source). Please also note that both libgcc_s_sjlj-1.dll and libgcc_s_dw2-1.dll are 32-bit, so I don't think it's an issue related to architecture.
The outcome is apparent: programs compiled against libgcc_s_dw2-1.dll can't work with third-party libraries based on libgcc_s_sjlj-1.dll. What I get is a missing entry point, __gxx_personality_sj0.
I can definitely try to adapt my toolchain to align with the third party's libgcc_s_sjlj-1.dll, but I do not know how much effort that would take. I can find no such setjmp/longjmp variant of the libgcc DLL under MSYS2. I am even afraid that I would need to discard the entire toolchain, because all the binaries I have under MSYS2 sit atop this libgcc_s_dw2-1.dll module.
My goal is straightforward: I would like to find a solution so that my code sits on top of libgcc_s_sjlj-1.dll instead of libgcc_s_dw2-1.dll. But I don't know if I am asking a stupid question simply because this is just not possible.
The terms dw2 and sjlj refer to two different types of exception handling that GCC can use on Windows. I don't know the details, but I wouldn't try to link binaries using different types. Since MSYS2 does not provide an sjlj toolchain, you'll have to find one somewhere else. I would recommend downloading one from the "MingW-W64-builds" project, which you can find listed on this page:
https://mingw-w64.org/doku.php/download
You could use MSYS2 as a Bash shell, but you probably cannot link against any of its libraries in your program; you would need to recompile all the libraries yourself (except for the closed-source third-party one).
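As a quick check, you can see which libgcc flavour any given binary expects by inspecting its DLL imports (objdump from binutils understands PE files; the file name below is a placeholder):
objdump -p some_library.dll | grep "DLL Name"
An import of libgcc_s_dw2-1.dll indicates DWARF-2 exception handling, while libgcc_s_sjlj-1.dll indicates setjmp/longjmp exception handling.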

GCC: Specifying a Minimum Shared Library Version

Background
I inherited and maintain a Linux shared library that is very closely coupled with specific hardware; let's call it libfoo.so.0.0.0. This library has been around for some time and "just worked". This library has now become a dependency for several higher-layer applications.
Now, unfortunately, new hardware designs have forced me to create symbols with wider types, thereby resulting in libfoo.so.0.1.0. There have been only additions; no deletions or other API changes. The original, narrow versions of the updated symbols still exist in their original form.
Additionally, I have an application (say, myapp) that depends on libfoo. It was originally written to support the 0.0.0 version of the library but has now been reworked to support the new 0.1.0 APIs.
For backwards compatibility reasons, I would like to be able to build myapp for either the old or new library via a compile flag. The kernel that a given build of myapp will be loaded on will always have exactly one version of the library, known at compile time.
The Question
It is very likely that libfoo will be updated again in the future.
When building myapp, is it possible to specify a minimum version of libfoo to link against based on a build flag?
I know it is possible to specify the library name directly on the build command line. Will this cause myapp to require exactly that version, or will later versions of the lib with the same major revision still be able to link against it (e.g. libfoo.so.0.2.0)? I am really hoping not to have to update every dependent app's build each time a new minor version is released.
Is there a more intelligent way of accomplishing this in an application-agnostic way?
References
How do you link to a specific version of a shared library in GCC
You are describing external library versioning, where the app is built against libfoo.so.0, libfoo.so.1, etc. Documentation here.
Using external library versioning requires that exactly the same version of libfoo.so.x be present at runtime.
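As a sketch of what external versioning looks like from the build side (file names are illustrative):
gcc -shared -fPIC -Wl,-soname,libfoo.so.0 -o libfoo.so.0.1.0 foo.c
ln -s libfoo.so.0.1.0 libfoo.so.0    # what the dynamic linker looks up at runtime
ln -s libfoo.so.0 libfoo.so          # what -lfoo resolves to at link time
gcc myapp.c -L. -lfoo -o myapp
objdump -p myapp | grep NEEDED       # shows: NEEDED libfoo.so.0
The application records only the soname (libfoo.so.0), so any 0.x.y carrying that soname satisfies it, but an incompatible change forces you to bump the soname and relink every dependent application.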
This is generally not the right technique on Linux, which, through the magic of symbol versioning, allows a single libfoo.so.Y to provide multiple incompatible definitions of the same symbol, and thus allows a single library to serve both the old and the new applications simultaneously.
In addition, if you are simply always adding new symbols, and are not modifying existing symbols in an incompatible way, then there is no reason to increment the external version. Keep libfoo.so at version 0, provide an int foo_version_X_Y; global variable in it (as well as all previous versions: foo_version_1_0, foo_version_1_1, etc.), and have each application binary read the variable that it requires. If an application requires a new symbol foo_version_1_2 and is run with an old library that only provides foo_version_1_1, then the application will fail to start with an obvious error.
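On a deployment target you can then verify this from the shell before running anything (paths and names are illustrative):
nm -D /usr/lib/libfoo.so.0 | grep ' foo_version_'    # version markers the installed library exports
ldd -r ./myapp                                       # reports unresolved symbols, e.g. foo_version_1_2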

Does recompiling a compiler have an effect on the compiled code?

I have to install without root access some software (the gromacs simulation package) on a cluster server, on which jobs can be sent through slurm. I only have direct access to the front-end machine, and the home directory is shared among all the servers and front-end. I had to manually build and install locally:
gcc 4.8
automake, autoconf, cmake
openmpi
lapack libs
gromacs
Right now, I have installed all of this only on the front-end, which is an older Intel Xeon machine. The production servers have newer AMD processors instead. This is my question: in order to achieve optimal performance, which parts of the aforementioned stack should be recompiled on the production servers? I guess it would make much sense to rebuild the final software (gromacs) and maybe the lapack libs, because of the different instruction sets and processor architecture, but I'm not exactly sure whether it would make any sense to rebuild the compiler or other parts of the system. Hence the question: does using a compiler (and the associated libraries) which have been built on a different machine result in higher execution times for the generated binaries?
In general, I'd expect a compiler to produce the same binaries if given the same input, so the answer would be no; but what about the libraries (such as libstdc++) which were compiled together with the compiler on the other machine?
thank you
In order to optimize gromacs (a parallel molecular dynamics code), you can forget about recompiling the compiler or the compilation tools: that's useless.
What you should go after instead is optimization of the code and the libraries it links against. For Intel CPUs, using the Intel C compiler makes a difference; it's possible you will observe some gains with AMD as well.
Another alternative is to use the Portland Group compiler.
Regarding MPI, you need to be sure it's configured for your interconnect (for example, if you have InfiniBand, avoid using the standard TCP version).
Regarding the LAPACK libraries, you need to install an optimized LAPACK (ACML for AMD, MKL for Intel; you can also get very good performance from GotoBLAS or ATLAS BLAS - they are included in many Linux distros).
You have not mentioned FFTs: they are indeed important for the electrostatics (Ewald summations) in the simulations, and FFTW is a good choice here. You need to install the correct version for the processor, or compile it on the target processor, because it performs a sort of "auto-tuning" during the build process.
Going lower than this (tools, compilers) makes no difference to the produced executables.
Building the GCC compiler already involves a three-stage bootstrap process, one of whose purposes is to QA the compiler by ensuring that the last two stages produce identical output. So there is no reason to believe that a further stage would have any effect at all.
