I have to install without root access some software (the gromacs simulation package) on a cluster server, on which jobs can be sent through slurm. I only have direct access to the front-end machine, and the home directory is shared among all the servers and front-end. I had to manually build and install locally:
gcc 4.8
automake, autoconf, cmake
openmpi
lapack libs
gromacs
Right now, I have installed all of this only on the front-end, which is an older Intel Xeon machine. The production servers have new AMD processor instead. This is my question: in order to achieve optimal performance, which parts of the aforementioned stack should be recompiled on the production servers? I guess it would make much sense to rebuild the final software (gromacs) and maybe the lapack libs, because of the different instruction sets and processor architecture, but I'm not exactly sure whether it would make any sense to rebuild the compiler or other parts of the system. Hence the question: does using a compiler (and the associated libraries) which have been built on a different machine result in higher execution times for the generated binaries?
In general, I'd expect a compiler to produce the same binaries if given the same output, so the answer would be no; but what about the libraries (as libstdc++) which have been compiled together with the compiler on the other machine?
thank you
In order to optimize gromacs (parallel molecular dynamics code), you can forget about recompiling the compileror the compilation tools: that's useless.
You should go after and check for optimizations. For Intel CPU using the Intel C compiler makes a difference. It's possible you observe some gains with AMDs as well.
Another alternative is to use the Portland Group compiler.
Regarding MPI, you need to be sure it's customized for your interconnect (for example, if you have infiniband, avoid to use the TCP standard version).
regarding lapack libraries, you need to install optimized lapack (ACML for AMDs, MKL for Intels. You can use with very good performance GOTO or ATLAS blas - they are included in many linux distros).
You have not mentioned FFT: they are indeed important for electromagnetics (Ewald summations) in the simulations: FFTW here is a good choice. You need to install the correct version for the processor or compile it on the target processor, because it performs a sort of "auto-tuning" in the compilation process.
Going below than this (tools, compilers) make no difference on the produced executables.
Building the GCC compiler already involves a four-stage bootstrap process, one of whose purposes is to QA the compiler by ensuring the last two stages produce the same output. So there is no reason to believe that a fifth stage will have any effect at all.
Related
I am trying to understand the concept of Linux From Scratch and would like to know why there are multiple passes for building binutils, gcc etc.
Why do we need pass1 and pass2 separately? Why can't we build the tools in pass 1 and then use them to build gcc , glibc, libstdc++ , etc.
The goal is to ensure that your build is consistent, no matter which compiler you're using to compile your compiler (and thus which bugs that compiler has).
Let's say you're building gcc 4.1 with gcc 3.2 (I'm going to call that gcc 3.2 "stage-0"). The folks who did QA for gcc 4.1 didn't test it to work correctly when built with any compiler other than gcc 4.1 -- hence, the need to first build a stage-1 gcc, and then use that stage-1 to compile a stage-2 compiler, to prevent any bugs in the stage-0 compiler from impacting the final result.
Then, the default compile process for gcc uses the stage-2 compiler to build a stage-3 compiler, and compares the two binaries: Any difference between them can be used as proof of presence of a bug.
(Of course, this is only an effective mechanism to avoid unintended bugs; see the classic Ken Thompson paper Reflections on Trusting Trust for a discussion of how intended bugs can survive this kind of measure).
This goes beyond gcc into the entire toolchain because the same principles apply throughout: If you have any differences in the result between building glibc-x.y on a system running glibc-x.y and a system running glibc-x.(y-1) and you don't do an extra pass to ensure that you're building in a match for your target environment, then reproducing those bugs (and testing proposed fixes) is made far more difficult than would otherwise be the case: Nobody who doesn't have your (typically undisclosed) build environment can necessarily recreate the bug!
I know this query is a bit old, but I have something to add to the answers: a clarification of the meaning of 'bootstrap'.
The primary reason for the multi-stage build is to eliminate every vestige of the build host's programs/config/libs from the resultant software. It's not enough to have fresh software compiled. You also have to avoid any and all references to the host's libraries, the host's kernel interfaces (kernel headers), the host's pkg versions, and all other such dependencies on the host system.
Suppose you happened to be a masochist and wanted to build Debian 4 on Fedora 27 (it should be possible). Simply building the software would pull in references to 27's libraries and other things. And your resultant system would not run because those things are not available when the final system is installed.
LFS eases the process somewhat by building simple x86-to-x86 binutils and gcc cross tools in Stage 1, then installing the headers for the kernel to be used in the final system, then glibc. Stage 2 (binutils and gcc) is built using the cross tools, which guarantees that the host's programs/libs/config are not used at all. The rest of the toolchain (I call it Stage 3) is built using the tools from Stage 2. Now the final stage can be built (with a few small adjustments) with the assurance that no part of the build host will be referenced or used, and that no part of the toolchain will be referenced or used. The final stage is built using a path much like PATH=/bin:/usr/bin:/tools/bin; thus as the final tools are built, they will be used instead of those in the toolchain.
Building a toolchain is not for the impatient. It took me months to update Smoothwall Express' build system and the pkgs used, because building a toolchain is fraught with peril. I battled many dragons, balrocs, and dwarfs. I referenced LFS often to figure out how they did it. The result is an automated re-entrant build system that builds the entire distro with no references to the host system. I primarily build it on Debian 8, but it's been known to build on Gentoo, and it supposed to be able to build on itself.
I'm investigating options to create portable static Linux binaries from Fortran code (in the sense that the binaries should be able to run on both any new and resonably old Linux distros). If I understand correctly (extrapolating from C) the main issue for portability is that glibc is forwards but not backwards compatible (that is static binaries created on old distros will work on newer but not vice versa). This at least seems to work in my so far limited tests (with one caveat that use of Scratch files causes segfaults running on newer distros in some cases).
It seems at least in C that one can avoid compiling on old distros by adding legacy glibc headers, as described in
https://github.com/wheybags/glibc_version_header
This specific method does not work on Fortran code and compilers, but I would like to know if anyone knows of a similar approach (or more specifically what might be needed to create portable Fortran binaries, is an old glibc enough or must one also use old libfortran etc.)?
I suggest to use the manylinux docker images as a starting point.
In short: manylinux is a "platform definition" to distribute binary wheels (Python packages that may contain compiled code) that run on most current linux systems. The need for manylinux and its definition can be found as Python Enhancement Proposal 513
Their images are based on CentOS 5 and include all the basic development tools, including gfortran. The process for you would be (I did not test and it may require minor adjustments):
Run the docker image from https://github.com/pypa/manylinux
Compile your code with the flag -static-libgfortran
The possible tweak is in the situation that they don't ship the static version of libgfortran in which case you could add it here.
The resulting code should run on most currently-used linux systems.
I'm building a .a from C++ code. It only depends on the standard library (libc++/libstdc++). From general reading, it seems that portability of binaries depends on
compiler version (because it can affect the ABI). For gcc, the ABI is linked to the major version number.
libc++/libstdc++ versions (because they could pass a vector<T> into the .a and its representation could change).
I.e. someone using the .a needs to use the same (major version of) the compiler + same standard library.
As far as I can see, if compiler and standard library match, a .a should work across multiple distros. Is this right? Or is there gubbins relating to system calls, etc., meaning a .a for Ubuntu should be built on Ubuntu, .a for CentOS should be built on CentOS, and so on?
Edit: see If clang++ and g++ are ABI incompatible, what is used for shared libraries in binary? (though it doens't answer this q.)
Edit 2: I am not accessing any OS features explicitly (e.g. via system calls). My only interaction with the system is to open files and read from them.
It only depends on the standard library
It could also depend implicitly upon other things (think of resources like fonts, configuration files under /etc/, header files under /usr/include/, availability of /proc/, of /sys/, external programs run by system(3) or execvp(3), specific file systems or devices, particular ioctl-s, available or required plugins, etc...)
These are kind of details which might make the porting difficult. For example look into nsswitch.conf(5).
The evil is in the details.
(in other words, without a lot more details, your question don't have much sense)
Linux is perceived as a free software ecosystem. The usual way of porting something is to recompile it on -or at least for- the target Linux distribution. When you do that several times (for different and many Linux distros), you'll understand what details are significant in your particular software (and distributions).
Most of the time, recompiling and porting a library on a different distribution is really easy. Sometimes, it might be hard.
For shared libraries, reading Program Library HowTo, C++ dlopen miniHowTo, elf(5), your ABI specification (see here for some incomplete list), Drepper's How To Write Shared Libraries could be useful.
My recommendation is to prepare binary packages for various common Linux distributions. For example, a .deb for Debian & Ubuntu (some particular versions of them).
Of course a .deb for Debian might not work on Ubuntu (sometimes it does).
Look also into things like autoconf (or cmake). You may want at least to have some externally provided #define-d preprocessor strings (often passed by -D to gcc or g++) which would vary from one distribution to the next (e.g. on some distributions, you print by popen-ing lp, on others, by popen-ing lpr, on others by interacting with some CUPS server etc...). Details matter.
My only interaction with the system is to open files
But even these vary a lot from one distribution to another one.
It is probable that you won't be able to provide a single -and the same one- lib*.a for several distributions.
NB: you probably need to budget more work than what you believe.
I am trying to understand the concept of Linux From Scratch and would like to know why there are multiple passes for building binutils, gcc etc.
Why do we need pass1 and pass2 separately? Why can't we build the tools in pass 1 and then use them to build gcc , glibc, libstdc++ , etc.
The goal is to ensure that your build is consistent, no matter which compiler you're using to compile your compiler (and thus which bugs that compiler has).
Let's say you're building gcc 4.1 with gcc 3.2 (I'm going to call that gcc 3.2 "stage-0"). The folks who did QA for gcc 4.1 didn't test it to work correctly when built with any compiler other than gcc 4.1 -- hence, the need to first build a stage-1 gcc, and then use that stage-1 to compile a stage-2 compiler, to prevent any bugs in the stage-0 compiler from impacting the final result.
Then, the default compile process for gcc uses the stage-2 compiler to build a stage-3 compiler, and compares the two binaries: Any difference between them can be used as proof of presence of a bug.
(Of course, this is only an effective mechanism to avoid unintended bugs; see the classic Ken Thompson paper Reflections on Trusting Trust for a discussion of how intended bugs can survive this kind of measure).
This goes beyond gcc into the entire toolchain because the same principles apply throughout: If you have any differences in the result between building glibc-x.y on a system running glibc-x.y and a system running glibc-x.(y-1) and you don't do an extra pass to ensure that you're building in a match for your target environment, then reproducing those bugs (and testing proposed fixes) is made far more difficult than would otherwise be the case: Nobody who doesn't have your (typically undisclosed) build environment can necessarily recreate the bug!
I know this query is a bit old, but I have something to add to the answers: a clarification of the meaning of 'bootstrap'.
The primary reason for the multi-stage build is to eliminate every vestige of the build host's programs/config/libs from the resultant software. It's not enough to have fresh software compiled. You also have to avoid any and all references to the host's libraries, the host's kernel interfaces (kernel headers), the host's pkg versions, and all other such dependencies on the host system.
Suppose you happened to be a masochist and wanted to build Debian 4 on Fedora 27 (it should be possible). Simply building the software would pull in references to 27's libraries and other things. And your resultant system would not run because those things are not available when the final system is installed.
LFS eases the process somewhat by building simple x86-to-x86 binutils and gcc cross tools in Stage 1, then installing the headers for the kernel to be used in the final system, then glibc. Stage 2 (binutils and gcc) is built using the cross tools, which guarantees that the host's programs/libs/config are not used at all. The rest of the toolchain (I call it Stage 3) is built using the tools from Stage 2. Now the final stage can be built (with a few small adjustments) with the assurance that no part of the build host will be referenced or used, and that no part of the toolchain will be referenced or used. The final stage is built using a path much like PATH=/bin:/usr/bin:/tools/bin; thus as the final tools are built, they will be used instead of those in the toolchain.
Building a toolchain is not for the impatient. It took me months to update Smoothwall Express' build system and the pkgs used, because building a toolchain is fraught with peril. I battled many dragons, balrocs, and dwarfs. I referenced LFS often to figure out how they did it. The result is an automated re-entrant build system that builds the entire distro with no references to the host system. I primarily build it on Debian 8, but it's been known to build on Gentoo, and it supposed to be able to build on itself.
I'm looking for the best way to develop and package different variants of a library with different compile settings but for the same ABI and then selecting the best fit at runtime. In more concrete terms, I'd like a NEON and non-NEON armeabi-v7a build.
The native library has a public C interface that third parties link to. They seem to need to link to one of the variants to prevent link errors, but I'd like to load the alternative variant at runtime if it's a better fit for the device, and have the runtime loader do the correct relocations.
From what I see so far it seems I need to give both variants the same file name, so need to put them in different folders. Subfolders under the abi folder don't seem to get copied by the package installation process so that approach doesn't work. The best suggestion I've seen so far is to manually copy one variant from the res folder to a known device path and to call System.loadLibrary() with a full path. Reference: https://groups.google.com/forum/#!topic/android-ndk/zu_dmcmUlMo
Is this still the best/recommended approach?
How will this interact with the binary translation done on non-arm devices? (Although I can supply an x86 build, some third parties may leave it out of their apk).
I'm assuming cpufeatures on a device using binary translation will not report the cpu family as ARM, so my proposed solution would be to build a standard armeabi-v7a library in the normal way (which I guess will get binary translated), and ship a NEON-supporting library in res/raw. Then at runtime if cpufeatures reports an ARM CPU with NEON support then copy out that library and call loadLibrary with the full path. Can anyone see any problems with that approach?
If you explicitly want to have two different builds of a lib, then yes, it's probably the best compromise.
First off - do note that many libraries that can use NEON can be built with those parts runtime-enabled so that you can have a normal ARMv7 build which doesn't strictly require NEON but can enable those codepaths at runtime if detected - e.g. libav/FFmpeg do that, and the same goes for many other similar libraries. This allows you to have one single ARMv7 binary that fully utilizes NEON where applicable, while still works on the few ARMv7 devices without NEON.
If you're trying to use compiler autovectorization, or if this is a library where the NEON routines aren't easily confined to restricted parts that are enabled at runtime (or hoping to gain extra performance by building the whole library with NEON enabled), your approach sounds sane.
Keep in mind that you want to have at least one native library that is packaged "normally" (which you seem to have, but which has been an issue in e.g. https://stackoverflow.com/a/29329413/3115956). On installation, the installer picks the best match of the bundled architectures and only extracts the libs from that one, and runs the process in that mode. On devices with multiple ABIs (32 and 64 bit), this is essential since if the process is started in a different mode it's too late to switch mode once you try to load a library in a different form.
On an x86 device that emulates ARM binaries, at least the cpufeatures library will return ARM if the process is running in ARM mode. If you use system properties to find the primary and secondary ABIs, you won't know which of them the current process is using though.
EDIT: x86 devices with binary translation actually seem to be able to load an armeabi library even if the same process already has loaded some bundled x86 libraries as well. So apparently this translation is done on a per library basis, not like 32 vs 64 bit, where a certain mode is chosen for the process at startup, which excludes loading any libraries of the other variant.