Create portable and static fortran linux binary? - linux

I'm investigating options for creating portable, static Linux binaries from Fortran code (in the sense that the binaries should be able to run on both any new and reasonably old Linux distros). If I understand correctly (extrapolating from C), the main portability issue is that glibc is backwards but not forwards compatible: binaries built on old distros will run on newer ones, but not vice versa. This at least seems to hold in my so far limited tests (with one caveat: the use of scratch files causes segfaults when running on newer distros in some cases).
It seems at least in C that one can avoid compiling on old distros by adding legacy glibc headers, as described in
https://github.com/wheybags/glibc_version_header
This specific method does not work with Fortran code and compilers, but I would like to know whether anyone knows of a similar approach (or, more specifically, what is needed to create portable Fortran binaries: is an old glibc enough, or must one also use an old libgfortran, etc.)?

I suggest using the manylinux docker images as a starting point.
In short: manylinux is a "platform definition" for distributing binary wheels (Python packages that may contain compiled code) that run on most current Linux systems. The motivation for manylinux and its definition can be found in Python Enhancement Proposal 513.
Their images are based on CentOS 5 and include all the basic development tools, including gfortran. The process for you would be (I did not test this, and it may require minor adjustments):
Run the docker image from https://github.com/pypa/manylinux
Compile your code with the flag -static-libgfortran
The one possible tweak is if they don't ship a static version of libgfortran, in which case you could add it yourself.
The resulting binary should run on most currently-used Linux systems.
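The steps above can be sketched as a shell session. This is untested and hedged: the image tag `quay.io/pypa/manylinux2014_x86_64` is a newer (CentOS 7 based) manylinux variant listed in the pypa/manylinux README at the time of writing, and the availability of gfortran inside it is an assumption to verify:

```shell
# Hypothetical sketch: build a Fortran binary inside a manylinux container
# so it links against an old glibc baseline. Image tag is an assumption.
docker pull quay.io/pypa/manylinux2014_x86_64
docker run --rm -v "$PWD":/src -w /src quay.io/pypa/manylinux2014_x86_64 \
    gfortran -O2 -static-libgfortran -static-libgcc hello.f90 -o hello

# Sanity check on the host: the result should only need old GLIBC_* symbols.
ldd hello
objdump -T hello | grep GLIBC_
```

The `-static-libgcc` flag is added alongside `-static-libgfortran` because gfortran binaries otherwise also pick up a dynamic libgcc dependency.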

What is the difference between alpine docker image and busybox docker image?

What is the difference between the alpine docker image and the busybox docker image?
When I check their Dockerfiles, alpine looks like this (for Alpine v3.12 - 3.12.7):
FROM scratch
ADD alpine-minirootfs-3.12.7-x86_64.tar.gz /
CMD ["/bin/sh"]
busybox looks like this:
FROM scratch
ADD busybox.tar.xz /
CMD ["sh"]
But as https://alpinelinux.org/about/ says
Alpine Linux is built around musl libc and busybox.
So what exactly is the difference?
I am also curious that many docker images (nodejs/nginx/php, to name just a few) provide images based on alpine but not on busybox. Why is that? What is the use case for the busybox image then? I need to emphasize that I am not looking for an answer about why A is better than B or vice versa, or a software recommendation.
I have been experiencing intermittent DNS lookup failures in my alpine containers, as described in musl-libc - Alpine's Greatest Weakness and Does Alpine have known DNS issue within Kubernetes?. That is one of the reasons I asked this question.
PS: https://musl.libc.org/ says "musl is an implementation of the C standard library built on top of the Linux system call API", and https://en.wikipedia.org/wiki/Alpine_Linux mentions:

It previously used uClibc as its C standard library instead of the traditional GNU C Library (glibc) most commonly used. Although it is more lightweight, it does have the significant drawback of being binary incompatible with glibc. Thus, all software must be compiled for use with uClibc to work properly. As of 9 April 2014, Alpine Linux switched to musl, which is partially binary compatible with glibc.
The key difference between these images is that older versions of the busybox image statically linked busybox against glibc (current versions dynamically link busybox against glibc because of its use of libnss even in the static configuration), whereas the alpine image dynamically links against musl libc.
Going into the weighting factors used to choose between these in detail would be off-topic here (software recommendation requests), but some key points:
Comparing glibc against musl libc, a few salient points (though there are certainly many other factors as well):
glibc is built for performance and portability over size (often adding special-case performance optimizations that take a large amount of code).
musl libc is built for correctness and size over performance (it's willing to be somewhat slower to have a smaller code size and to run in less RAM); and it's much more aggressive about having correct error reporting (instead of just exiting immediately) in the face of resource exhaustion.
glibc is more widely used, so bugs that manifest against its implementation tend to be caught more quickly. Often, when one is the first person to build a given piece of software against musl, one will encounter bugs (typically in that software, not in musl) or places where the maintainer explicitly chose to use GNU extensions instead of sticking to the libc standard.
glibc is licensed under the LGPL; only software under GPL-compatible terms can be statically linked against it. musl is under an MIT license and usable with fewer restrictions.
Comparing the advantages of a static build against a dynamic build:
If your system image will only have a single binary executable (written in C or otherwise using a libc), a static build is always better, as it discards any parts of your libraries that aren't actually used by that one executable.
If your system image is intended to have more binaries added that are written in C, using dynamic linking will keep the overall size down, since it allows those binaries to use the libc that's already there.
If your system image is intended to have more binaries added in a language that doesn't use libc (this can be the case for Go and Rust, for example), then you don't benefit from dynamic linking; you don't need the unused parts of libc there, because you won't be using them anyhow.
Honestly, these two images don't between them cover the whole space of possibilities; there are situations where neither is optimal. There would be value in an image with only busybox statically linked against musl libc (if everything you're going to add is in a non-C language), or an image with busybox dynamically linked against glibc (if you're going to add more binaries that need libc and aren't compatible with musl).
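The linkage claims above are straightforward to verify. A minimal sketch (the docker invocations are shown as comments since they need the daemon; `busybox:glibc` and `busybox:musl` are real tags on Docker Hub at the time of writing, but tags change):

```shell
# To inspect how each image's busybox binary is linked (requires docker):
#   docker run --rm busybox:glibc sh -c 'ldd /bin/busybox'
#   docker run --rm alpine:3.12 sh -c 'ldd /bin/busybox'
# A tiny helper that classifies ldd output; glibc's ldd prints
# "not a dynamic executable" for a statically linked binary.
classify() {
  if grep -q 'not a dynamic executable'; then
    echo static
  else
    echo dynamic
  fi
}

# Canned example: ldd output for a musl-linked busybox on alpine.
echo '/lib/ld-musl-x86_64.so.1 (0x7f8a00000000)' | classify
# → dynamic
```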
When I first asked the question I was not sure about the use case of the busybox docker image, and my link to the busybox Dockerfile was not entirely correct.
This was the correct Dockerfile link, and it explains many things. So busybox provides 3 different versions, built against glibc, musl, and uclibc.
A more appropriate question is: what is the difference between the alpine image and the busybox image built against musl? I still don't know the answer, except that the alpine image is more actively maintained.
"Use Cases and Tips for Using the BusyBox Docker Official Image" was published Jul 14 2022 (so quite recent), and it said "Maintaining the BusyBox image has also been an ongoing priority at Docker."
I still hope to see someone provide an answer about the use case of the BusyBox images built against glibc or uclibc.
--- update ---
As discussed in package manager for docker container running image busybox:uclibc ("Anything based on Busybox doesn't have a package manager. It's a single binary with a bunch of symlinks into it, and the way to add software to it is to write C code and recompile.") and also explained in Package manager for Busybox, busybox does NOT have a package manager; that is probably the reason why most people use alpine.

C++ .a: what affects portability across distros?

I'm building a .a from C++ code. It only depends on the standard library (libc++/libstdc++). From general reading, it seems that portability of binaries depends on
compiler version (because it can affect the ABI). For gcc, the ABI is linked to the major version number.
libc++/libstdc++ versions (because they could pass a vector<T> into the .a and its representation could change).
I.e. someone using the .a needs to use the same major version of the compiler and the same standard library.
As far as I can see, if compiler and standard library match, a .a should work across multiple distros. Is this right? Or is there gubbins relating to system calls, etc., meaning a .a for Ubuntu should be built on Ubuntu, .a for CentOS should be built on CentOS, and so on?
Edit: see If clang++ and g++ are ABI incompatible, what is used for shared libraries in binary? (though it doesn't answer this question).
Edit 2: I am not accessing any OS features explicitly (e.g. via system calls). My only interaction with the system is to open files and read from them.
It only depends on the standard library
It could also depend implicitly upon other things (think of resources like fonts, configuration files under /etc/, header files under /usr/include/, availability of /proc/, of /sys/, external programs run by system(3) or execvp(3), specific file systems or devices, particular ioctl-s, available or required plugins, etc...)
These are the kinds of details that might make porting difficult. For example, look into nsswitch.conf(5).
The evil is in the details.
(in other words, without a lot more detail, your question doesn't make much sense)
Linux is perceived as a free software ecosystem. The usual way of porting something is to recompile it on -or at least for- the target Linux distribution. When you do that several times (for different and many Linux distros), you'll understand what details are significant in your particular software (and distributions).
Most of the time, recompiling and porting a library on a different distribution is really easy. Sometimes, it might be hard.
For shared libraries, reading Program Library HowTo, C++ dlopen miniHowTo, elf(5), your ABI specification (see here for some incomplete list), Drepper's How To Write Shared Libraries could be useful.
My recommendation is to prepare binary packages for various common Linux distributions. For example, a .deb for Debian & Ubuntu (some particular versions of them).
Of course a .deb for Debian might not work on Ubuntu (sometimes it does).
Look also into things like autoconf (or cmake). You may want at least to have some externally provided #define-d preprocessor strings (often passed by -D to gcc or g++) which would vary from one distribution to the next (e.g. on some distributions, you print by popen-ing lp, on others, by popen-ing lpr, on others by interacting with some CUPS server etc...). Details matter.
My only interaction with the system is to open files
But even these vary a lot from one distribution to another one.
It is probable that you won't be able to provide a single -and the same one- lib*.a for several distributions.
NB: you probably need to budget more work than what you believe.

Different versions of compilers + libgcc on Windows encountered

I have a third-party library which depends on libgcc_s_sjlj-1.dll.
My own program is compiled under MSYS2 (mingw-w64) and it depends on libgcc_s_dw2-1.dll.
Please note that the third-party library is pure binaries (no source). Please also note that both libgcc_s_sjlj-1.dll and libgcc_s_dw2-1.dll are 32-bit, so I don't think it's an issue related to architecture.
The outcome is apparent: programs compiled against libgcc_s_dw2-1.dll can't work with third-party libraries based on libgcc_s_sjlj-1.dll. What I get is a missing entry point, __gxx_personality_sj0.
I can definitely try to adapt my toolchain to align with the third party's libgcc_s_sjlj-1.dll, but I do not know how much effort that would take. I find no variant of the libgcc dll under MSYS2 that uses this setjmp/longjmp version. I am even afraid that I would need to replace the entire toolchain, because all the binaries I have under MSYS2 sit atop this libgcc_s_dw2-1.dll module.
My goal is straightforward: I would like to find a solution so that my code will sit on top of libgcc_s_sjlj-1.dll instead of libgcc_s_dw2-1.dll. But I don't know if I am asking a stupid question simply because this is just not possible.
The terms dw2 and sjlj refer to two different types of exception handling that GCC can use on Windows. I don't know the details, but I wouldn't try to link binaries using the different types. Since MSYS2 does not provide an sjlj toolchain, you'll have to find one somewhere else. I would recommend downloading one from the "MingW-W64-builds" project, which you can find listed on this page:
https://mingw-w64.org/doku.php/download
You could use MSYS2 as a Bash shell, but you probably cannot link to any of its libraries in your program; you would need to recompile all the libraries yourself (except for the closed-source third-party one).

How to install a single Perl Crypt::OpenSSL::AES for use by different linux environments

I have a sticky problem that I am not quite sure how to solve. The situation is as follows:
We have a common 32bit perl 5.10.0
It is used by both 32bit and 64bit linux machines
Now the problem is that I need to install the Crypt::OpenSSL::AES module for this Perl; however, since it builds a shared library, a lot of problems appear:
If built on a 64-bit machine, the module is not usable, failing with a "wrong ELF class: ELFCLASS64" error for the generated AES.so
If built on a 32-bit machine, the module is not usable on the 64-bit machines, failing with undefined symbol: AES_encrypt
The problem I'm guessing is that the different machines have different versions of OpenSSL installed and they are not compatible with each other.
My question is given that I cannot change any of the machine configurations, what should I do to get the AES module working on all the machines?
Thanks!
I solved the problem with a combination of staticperl and building a statically linked Crypt::OpenSSL::AES, so that I have a single perl executable that is fully statically linked.
Given that I am not able to modify the environment, this is the best I can come up with.
Perl's default configuration very intentionally puts platform-specific things in a separate directory; you appear to have broken that model. Consider restoring it.
I assume you built your perl on a 32-bit machine, so during the build process, Configure didn't include any of the 'make this 32-bit' compiler switches. If you build a module on a 64-bit machine now, the build process will use exactly the same switches, so you get a 64-bit binary that can't be loaded by the 32-bit perl - not even on the 64-bit machines, because the 32-bit perl binary you're running there can't load a 64-bit shared library either.
You might try building your shared perl on a 64 bit machine, explicitly stating you want a 32 bit perl. There should be some configure parameters for this. That way, you have a perl that sets the "use 32 bit" compiler flag when building modules. Then, you can use that version of perl on each of the machines to build the module. The modules won't be identical, but each of them will run on its bit size, and your software distribution process could pull the correct module when distributing to a specific machine.
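A hedged sketch of what that Configure invocation might look like. `-des`, `-Dcc`, and `-Dprefix` are real Configure options, but the exact values here (the `-m32` approach, the install prefix) are assumptions that depend on your distro having a 32-bit multilib toolchain installed:

```shell
# Hypothetical: configure a 32-bit perl build on a 64-bit Linux host.
# Requires the 32-bit toolchain and libraries (e.g. gcc-multilib).
cd perl-5.10.0
sh Configure -des \
    -Dcc='gcc -m32' \
    -Dprefix=/opt/perl32
make && make test && make install

# Modules built with this perl inherit the -m32 flag from its config:
/opt/perl32/bin/perl Makefile.PL && make
```

Because ExtUtils::MakeMaker takes the compiler and flags from the perl that runs Makefile.PL, every XS module built this way comes out 32-bit as well.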
However, the real problem lies somewhat deeper. I assume someone in your company at some point said "We don't want to depend on what the distributions provide; let's build our own perl that we can copy everywhere." This sounds like a good idea, but it is NOT. Different Linux versions use different versions of shared libraries, different default directories for configuration files, different default path variables, etc. The configure process takes care of that and creates a perl binary for exactly your machine. If you copy this to a different machine, it might not find symbols in other versions of shared libraries. It might try to read libraries from directories that don't exist there. It might not include a workaround for some bug that was corrected on the machine where you built it, but needs the workaround on the older system you copied it to. Or it might provide a workaround for something that has long been fixed on the newer system, thus wasting CPU time.
So, essentially, creating one perl to copy everywhere will ONLY work well if you build a static perl that includes everything and doesn't need any shared libraries. The standard, shared-library-using perl you compile on one machine does NOT meet the "behaves the same everywhere I copy it to" requirement you probably had, because it depends far too much on the stuff "around" it.

Determining binary compatibility under linux

What is the best way to determine a pre-compiled binary's dependencies (specifically in regards to glibc and libstdc++ symbols & versions) and then ensure that a target system has these installed?
I have a limitation in that I cannot provide source code to compile on each machine (employer restriction), so the de facto response of "compile on each machine to ensure compatibility" is not suitable. I also don't wish to provide statically compiled binaries - that seems very much a case of using a hammer to open an egg.
I have considered a number of approaches which loosely center around determining the symbols/libraries my executable/library requires through use of commands such as
ldd -v </path/executable>
or
objdump -x </path/executable> | grep UND
and then somehow running a command on the target system to check that such symbols, libraries and versions are provided (I'm not entirely certain how to do this step).
This would then be followed by some pattern or symbol matching to ensure the correct versions, or greater, are present.
That said, I feel like this will largely have been done for me already, and I'm simply suffering from a knowledge gap about how it is currently implemented.
Any thoughts/suggestions on how to proceed?
I should add that this is for the purposes of installing my software on a wide variety of linux distributions - in particular customised clusters - which may not obey distribution guidelines or standardised packaging methods. The objective being a seamless install.
I wish to accomplish binary compatibility at install time, not at a subsequent runtime, which may occur by a user with insufficient privileges to install dependencies.
Also, as I don't have source code access to all the third party libraries I use and install (specialised maths/engineering libraries) then an in-code solution does not work so well. I suppose I could write a binary that tests whether certain symbols (&versions) are present, but this binary itself would have compatibility issues to run.
I think my solution has to be to compile against older libraries (as mentioned) and install this as well as using the LSB checker (looks promising).
GNU libraries (glibc and libstdc++) support a mechanism called symbol versioning. First of all, these libraries export special symbols used by dynamic linker to resolve the appropriate symbol version (CXXABI_* and GLIBCXX_* in libstdc++, GLIBC_* in glibc). A simple script to the tune of:
nm -D libc.so.6 | grep " A "
will return a list of version symbols, which can then be further shell-processed to establish the maximum supported libc interface version (the same works for libstdc++). From C code, one can do the same using dlvsym() (first dlopen() the library, then check whether certain minimal versions of the symbols you need can be looked up using dlvsym()).
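The shell-processing step might look like the sketch below. The nm output lines are canned samples so the pipeline is self-contained; on a real system you would pipe `nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep " A "` into it instead (the path varies by distro, and GNU sort's `-V` version sort is assumed):

```shell
# Extract the highest GLIBC_x.y version symbol from nm output.
# GLIBC_PRIVATE is skipped because the pattern requires a leading digit.
max_glibc_version() {
  grep -o 'GLIBC_[0-9][0-9.]*' | sort -V | tail -n 1
}

# Canned sample of `nm -D libc.so.6 | grep " A "` output:
printf '%s\n' \
  '0000000000000000 A GLIBC_2.2.5' \
  '0000000000000000 A GLIBC_2.10' \
  '0000000000000000 A GLIBC_2.17' \
  '0000000000000000 A GLIBC_PRIVATE' | max_glibc_version
# → GLIBC_2.17
```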
Other options for obtaining a glibc version in runtime include gnu_get_libc_version() and confstr() library calls.
However, the proper use of the versioning interface is to write code that explicitly links against a specific glibc/libstdc++ version. For example, code linking against the GLIBC_2.10 interface version is expected to work with any glibc version of 2.10 or newer. While it is possible to enable versioning on a per-symbol basis (using a ".symver" assembler/linker directive), the more reasonable approach is to set up a chroot environment with an older (minimal supported) version of the toolchain and compile the project against it (the result will seamlessly run with whatever newer version is encountered).
Use the Linux Application Checker tool ([1], [2], [3]) to check binary compatibility of your application with various Linux distributions. You can also check compatibility with your custom distribution using this tool.
