What is the difference between the alpine Docker image and the busybox Docker image?
When I check their Dockerfiles, alpine looks like this (for Alpine v3.12 - 3.12.7):
FROM scratch
ADD alpine-minirootfs-3.12.7-x86_64.tar.gz /
CMD ["/bin/sh"]
and busybox looks like this:
FROM scratch
ADD busybox.tar.xz /
CMD ["sh"]
But as https://alpinelinux.org/about/ says
Alpine Linux is built around musl libc and busybox.
So what exactly is the difference?
I am also curious why many Docker images (nodejs/nginx/php, to name a few) provide variants based on alpine but not on busybox. Why is that? What is the use case for the busybox image then? I need to emphasize that I am not looking for an answer about why A is better than B, or vice versa, or a software recommendation.
I have been experiencing intermittent DNS lookup failures with my alpine-based containers, as described in "musl-libc - Alpine's Greatest Weakness" and in "Does Alpine have known DNS issue within Kubernetes?". That is one of the reasons I asked this question.
PS, https://musl.libc.org/ says "musl is an implementation of the C standard library built on top of the Linux system call API" and https://en.wikipedia.org/wiki/Alpine_Linux mentions:
It previously used uClibc as its C standard library instead of the traditional GNU C Library (glibc) most commonly used. Although it is more lightweight, it does have the significant drawback of being binary incompatible with glibc. Thus, all software must be compiled for use with uClibc to work properly. As of 9 April 2014, Alpine Linux switched to musl, which is partially binary compatible with glibc.
The key difference between these is that older versions of the busybox image statically linked busybox against glibc (current versions dynamically link busybox against glibc due to use of libnss even in static configuration), whereas the alpine image dynamically links against musl libc.
Going into the weighting factors used to choose between these in detail would be off-topic here (software recommendation requests), but some key points:
Comparing glibc against musl libc, a few salient points (though there are certainly many other factors as well):
glibc is built for performance and portability over size (often adding special-case performance optimizations that take a large amount of code).
musl libc is built for correctness and size over performance (it's willing to be somewhat slower to have a smaller code size and to run in less RAM); and it's much more aggressive about having correct error reporting (instead of just exiting immediately) in the face of resource exhaustion.
glibc is more widely used, so bugs that manifest against its implementation tend to be caught more quickly. Often, when one is the first person to build a given piece of software against musl, one will encounter bugs (typically in that software, not in musl) or places where the maintainer explicitly chose to use GNU extensions instead of sticking to the libc standard.
glibc is licensed under LGPL terms, so only software under GPL-compatible terms can be statically linked against it; musl is under an MIT license and usable with fewer restrictions.
Comparing the advantages of a static build against a dynamic build:
If your system image will only have a single binary executable (written in C or otherwise using a libc), a static build is always better, as it discards any parts of your libraries that aren't actually used by that one executable.
If your system image is intended to have more binaries added that are written in C, using dynamic linking will keep the overall size down, since it allows those binaries to use the libc that's already there.
If your system image is intended to have more binaries added in a language that doesn't use libc (this can be the case for Go and Rust, for example), then you don't benefit from dynamic linking; you don't need the unused parts of libc there because you won't be using them anyhow.
Honestly, these two images between them don't cover the whole matrix of possibilities; there are situations where neither is optimal. There would be value in having an image with only busybox that statically links against musl libc (if everything you're going to add is in a non-C language), or an image with busybox that dynamically links against glibc (if you're going to add more binaries that need libc and aren't compatible with musl).
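To make the static-vs-musl case concrete, here is a rough sketch of my own (not part of the answer above): it uses alpine's musl toolchain to produce a fully static binary that could be dropped into a busybox-based or even a scratch image. The hello.c source file is hypothetical; gcc and musl-dev are the usual Alpine package names.
# Build a static binary against musl using the Alpine toolchain (package names per Alpine 3.12).
docker run --rm -v "$PWD":/work -w /work alpine:3.12 \
  sh -c 'apk add --no-cache gcc musl-dev && gcc -static -Os -o hello hello.c'
# The result carries no runtime libc dependency, so it should run on busybox:musl or even scratch.
file hello   # expected to report "statically linked"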
When I first asked the question, I was not sure about the use case of the busybox Docker image, and my link to the busybox Dockerfile was not entirely correct.
This was the correct Dockerfile link, and it explains many things. So busybox provides 3 different variants, built on glibc, musl, and uclibc.
A more appropriate question is: what is the difference between the alpine image and the busybox image built on musl? I still don't know the answer, except that the alpine image is more actively maintained.
"Use Cases and Tips for Using the BusyBox Docker Official Image" was published Jul 14 2022 (so quite new) and it said "Maintaining the BusyBox image has also been an ongoing priority at Docker."
I still hope someone can provide an answer about the use case for the BusyBox image built on glibc or uclibc.
--- update ---
As discussed in "package manager for docker container running image busybox:uclibc" ("Anything based on Busybox doesn't have a package manager. It's a single binary with a bunch of symlinks into it, and the way to add software to it is to write C code and recompile.") and also explained in "Package manager for Busybox", busybox does NOT have a package manager; that is probably the reason why most people use alpine.
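As a minimal illustration of that difference (my own sketch, not from the linked discussions; the curl package name and the image tags are assumptions): alpine can install software via apk, while a busybox image can only list the applets compiled into its single binary.
# alpine: apk is available, so adding a package is a one-liner.
docker run --rm alpine:3.12 sh -c 'apk add --no-cache curl && curl --version'
# busybox: no package manager; you can only enumerate the built-in applets.
docker run --rm busybox:musl busybox --list | head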
Related
I am trying to understand the concept of Linux From Scratch and would like to know why there are multiple passes for building binutils, gcc, etc.
Why do we need pass 1 and pass 2 separately? Why can't we build the tools in pass 1 and then use them to build gcc, glibc, libstdc++, etc.?
The goal is to ensure that your build is consistent, no matter which compiler you're using to compile your compiler (and thus which bugs that compiler has).
Let's say you're building gcc 4.1 with gcc 3.2 (I'm going to call that gcc 3.2 "stage-0"). The folks who did QA for gcc 4.1 didn't test it to work correctly when built with any compiler other than gcc 4.1 -- hence, the need to first build a stage-1 gcc, and then use that stage-1 to compile a stage-2 compiler, to prevent any bugs in the stage-0 compiler from impacting the final result.
Then, the default compile process for gcc uses the stage-2 compiler to build a stage-3 compiler, and compares the two binaries: Any difference between them can be used as proof of presence of a bug.
(Of course, this is only an effective mechanism to avoid unintended bugs; see the classic Ken Thompson paper Reflections on Trusting Trust for a discussion of how intended bugs can survive this kind of measure).
This goes beyond gcc into the entire toolchain because the same principles apply throughout: If you have any differences in the result between building glibc-x.y on a system running glibc-x.y and a system running glibc-x.(y-1) and you don't do an extra pass to ensure that you're building in a match for your target environment, then reproducing those bugs (and testing proposed fixes) is made far more difficult than would otherwise be the case: Nobody who doesn't have your (typically undisclosed) build environment can necessarily recreate the bug!
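For reference, the three-stage bootstrap described above corresponds to the classic gcc build targets. A minimal sketch (the version number and install prefix are assumptions, and exact targets vary between gcc releases):
tar xf gcc-4.1.2.tar.bz2
mkdir gcc-build && cd gcc-build
../gcc-4.1.2/configure --prefix=/opt/gcc-4.1 --enable-languages=c,c++
make bootstrap   # stage-1 built by the host compiler, stage-2 by stage-1, stage-3 by stage-2
make compare     # fails if the stage-2 and stage-3 object files differ
make install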
I know this query is a bit old, but I have something to add to the answers: a clarification of the meaning of 'bootstrap'.
The primary reason for the multi-stage build is to eliminate every vestige of the build host's programs/config/libs from the resultant software. It's not enough to have fresh software compiled. You also have to avoid any and all references to the host's libraries, the host's kernel interfaces (kernel headers), the host's pkg versions, and all other such dependencies on the host system.
Suppose you happened to be a masochist and wanted to build Debian 4 on Fedora 27 (it should be possible). Simply building the software would pull in references to Fedora 27's libraries and other things. And your resultant system would not run, because those things are not available when the final system is installed.
LFS eases the process somewhat by building simple x86-to-x86 binutils and gcc cross tools in Stage 1, then installing the headers for the kernel to be used in the final system, then glibc. Stage 2 (binutils and gcc) is built using the cross tools, which guarantees that the host's programs/libs/config are not used at all. The rest of the toolchain (I call it Stage 3) is built using the tools from Stage 2. Now the final stage can be built (with a few small adjustments) with the assurance that no part of the build host will be referenced or used, and that no part of the toolchain will be referenced or used. The final stage is built using a path much like PATH=/bin:/usr/bin:/tools/bin; thus as the final tools are built, they will be used instead of those in the toolchain.
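For flavor, the pass-1 cross binutils configure in an LFS-style build looks roughly like this (a sketch based on older LFS books that install the temporary toolchain under /tools; $LFS and $LFS_TGT are the usual LFS environment variables, and the exact flags and versions vary by LFS release):
../binutils-2.32/configure --prefix=/tools --with-sysroot=$LFS \
  --target=$LFS_TGT --disable-nls --disable-werror
make && make install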
Building a toolchain is not for the impatient. It took me months to update Smoothwall Express' build system and the packages used, because building a toolchain is fraught with peril. I battled many dragons, balrogs, and dwarfs. I referenced LFS often to figure out how they did it. The result is an automated re-entrant build system that builds the entire distro with no references to the host system. I primarily build it on Debian 8, but it's been known to build on Gentoo, and it's supposed to be able to build on itself.
I'm investigating options to create portable static Linux binaries from Fortran code (in the sense that the binaries should be able to run on both new and reasonably old Linux distros). If I understand correctly (extrapolating from C), the main issue for portability is that glibc is forwards but not backwards compatible (that is, static binaries created on old distros will work on newer ones, but not vice versa). This at least seems to work in my so far limited tests (with one caveat: use of scratch files causes segfaults when running on newer distros in some cases).
It seems at least in C that one can avoid compiling on old distros by adding legacy glibc headers, as described in
https://github.com/wheybags/glibc_version_header
This specific method does not work for Fortran code and compilers, but I would like to know if anyone knows of a similar approach (or, more specifically, what might be needed to create portable Fortran binaries: is an old glibc enough, or must one also use an old libgfortran, etc.)?
I suggest using the manylinux Docker images as a starting point.
In short: manylinux is a "platform definition" for distributing binary wheels (Python packages that may contain compiled code) that run on most current Linux systems. The need for manylinux and its definition can be found in Python Enhancement Proposal 513.
Their images are based on CentOS 5 and include all the basic development tools, including gfortran. The process for you would be (I did not test and it may require minor adjustments):
Run the docker image from https://github.com/pypa/manylinux
Compile your code with the flag -static-libgfortran
A possible tweak is needed if they don't ship the static version of libgfortran, in which case you could add it here.
The resulting code should run on most currently-used linux systems.
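A sketch of that workflow (untested; the image tag, source file, and output names are assumptions, and it presumes gfortran is on the image's default PATH; the CentOS 5-based image mentioned above was the original manylinux1 image, while newer manylinux tags also exist):
# Compile inside the manylinux container so the binary links against an old glibc.
docker run --rm -v "$PWD":/io -w /io quay.io/pypa/manylinux2014_x86_64 \
  gfortran -O2 -static-libgfortran -o myprog mycode.f90
# libgfortran is linked in statically; glibc is still linked dynamically, but against an old
# version, so the binary should also run on newer distributions (forward compatibility).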
I'm building a .a from C++ code. It only depends on the standard library (libc++/libstdc++). From general reading, it seems that portability of binaries depends on
compiler version (because it can affect the ABI). For gcc, the ABI is linked to the major version number.
libc++/libstdc++ versions (because they could pass a vector<T> into the .a and its representation could change).
I.e. someone using the .a needs to use the same (major version of) the compiler + same standard library.
As far as I can see, if compiler and standard library match, a .a should work across multiple distros. Is this right? Or is there gubbins relating to system calls, etc., meaning a .a for Ubuntu should be built on Ubuntu, .a for CentOS should be built on CentOS, and so on?
Edit: see "If clang++ and g++ are ABI incompatible, what is used for shared libraries in binary?" (though it doesn't answer this question).
Edit 2: I am not accessing any OS features explicitly (e.g. via system calls). My only interaction with the system is to open files and read from them.
It only depends on the standard library
It could also depend implicitly upon other things (think of resources like fonts, configuration files under /etc/, header files under /usr/include/, availability of /proc/, of /sys/, external programs run by system(3) or execvp(3), specific file systems or devices, particular ioctl-s, available or required plugins, etc...)
These are the kinds of details which might make porting difficult. For example, look into nsswitch.conf(5).
The evil is in the details.
(In other words, without a lot more details, your question doesn't make much sense.)
Linux is perceived as a free software ecosystem. The usual way of porting something is to recompile it on -or at least for- the target Linux distribution. When you do that several times (for different and many Linux distros), you'll understand what details are significant in your particular software (and distributions).
Most of the time, recompiling and porting a library on a different distribution is really easy. Sometimes, it might be hard.
For shared libraries, reading Program Library HowTo, C++ dlopen miniHowTo, elf(5), your ABI specification (see here for some incomplete list), Drepper's How To Write Shared Libraries could be useful.
My recommendation is to prepare binary packages for various common Linux distributions. For example, a .deb for Debian & Ubuntu (some particular versions of them).
Of course a .deb for Debian might not work on Ubuntu (sometimes it does).
Look also into things like autoconf (or cmake). You may want at least to have some externally provided #define-d preprocessor strings (often passed by -D to gcc or g++) which would vary from one distribution to the next (e.g. on some distributions, you print by popen-ing lp, on others, by popen-ing lpr, on others by interacting with some CUPS server etc...). Details matter.
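For example (a made-up sketch: PRINT_CMD, the file names, and the library name are all hypothetical), such a distribution-specific setting can be injected at build time with -D and archived into the .a as usual:
# On a distribution that prints via lpr; another build might pass -DPRINT_CMD='"lp"' instead.
g++ -DPRINT_CMD='"lpr"' -c printing.cpp -o printing.o
ar rcs libmylib.a printing.o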
My only interaction with the system is to open files
But even these vary a lot from one distribution to another one.
It is probable that you won't be able to provide a single -and the same one- lib*.a for several distributions.
NB: you probably need to budget more work than what you believe.
I have a few queries related to BusyBox & embutils.
Can anyone brief me on the differences between BusyBox and embutils?
It would also be great if you could illustrate scenarios in which these respective tools are used in an embedded Linux environment.
An extract from the book Building Embedded Linux Systems:
embutils was written by the author of diet libc with very similar goals.
Though it supports many of the most common Unix commands, embutils is still far from being as exhaustive as BusyBox. For example, version 0.18 still lacks find, grep, ifconfig, ps and route.
In contrast to BusyBox, however, embutils must be statically linked with diet libc; it can't be linked to any other library. Because diet libc is already very small, this can make embutils a better choice than BusyBox when just a few binaries are needed, because the overall size is smaller.
Be aware that the last embutils version is about 6 years old, so it is not well maintained compared to BusyBox. As usual, the choice depends on your needs, goals, and constraints.