Haskell Tool Stack and executable size

I created a Haskell CLI with the Stack tool. I've just successfully set up cross-compilation thanks to Travis, but I don't understand why the executable size is so different between Linux (6 MB), macOS (2 MB) and Windows (18 MB!). How come?
Release: https://github.com/unfog-io/unfog-cli/releases/tag/v0.1.2
Travis conf: https://github.com/unfog-io/unfog-cli/blob/master/.travis.yml
EDIT
When I compress the executables with tar.gz, the difference shrinks, but still! I now have Linux (1.35 MB), macOS (0.61 MB), Windows (3.93 MB) (see the release).

The difference between the Linux and MacOS builds is probably due to something called "split sections".
Enabling the -split-sections GHC flag on Linux puts each compiled function into its own linker section (instead of the historical approach of placing all functions into a single ".text" section). This allows the linker to drop code that isn't used with a granularity that isn't really possible otherwise.
Unfortunately, all dependencies need to be built with this flag in order to benefit. You can force Stack to rebuild everything properly by adding the following lines to your project's stack.yaml file:
ghc-options:
  "$everything": -split-sections
This method of specifying GHC options for dependencies is documented here.
If you rebuild your unfog with this change, it'll actually rebuild "base" and "vector" and everything else from scratch, so it might take a while. But, the resulting binary makes it worth it. Unstripped, its size drops from about 11Meg to 4Meg, and if you strip it, it's only:
-rwxr-xr-x 1 buhr buhr 1494696 Nov 27 19:40 unfog
which is even smaller than the MacOS version you posted.
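For reference, a rough way to reproduce this locally (a sketch only; the stack commands are standard, but the exact binary path depends on your resolver and platform):
stack build
# the built executable lives under stack's local install root
ls -l "$(stack path --local-install-root)/bin/unfog"
# stripping removes symbol/debug information and shrinks it further
strip "$(stack path --local-install-root)/bin/unfog"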
Now, as I understand it, the reason the original MacOS version is only 2Meg is that the MacOS linker already implements something similar to split sections. I'm not sure if also enabling -split-sections for the MacOS build might give an additional gain, or if -split-sections is an automatic default under MacOS. Anyway, it can't hurt to try it out.
For Windows, the main reason it's so gigantic is that the MinGW GCC toolchain is used to compile Windows binaries, so there's an entire compatibility layer of GNU-ish libraries (libc, libm, libpthread, libgmp, etc.), and -- unlike with the Linux and MacOS builds -- they all get statically linked into the Windows binary. The only dynamic linking for Windows is to standard Windows DLLs.
Note that -split-sections might or might not work on Windows. There are some comments on the bug tracker that make it unclear. Anyway, it might be worth trying out to see if it makes a difference.
Some additional references:
A GHC bug-tracker ticket about turning on -split-sections by default
A feature request to support -split-sections in Stack

Related

Handling autoconf with Android after NDK16

I'm trying to update an existing configuration in which we are cross-compiling for a number of targets; the question here is specifically about Android. More specifically, we are building code using cmake and the hunter package manager. However, we are building ICU via a step that uses autoconf/configure, called from cmake. Not sure that is specifically important, except that we have less control over the use of configure than is generally the case.
OK: we have a version that builds against an old NDK, but I am updating and have hit a problem identified by https://android.googlesource.com/platform/ndk/+/master/docs/UnifiedHeaders.md: with NDK r16 and later, the value of the sysroot parameter needs to vary between compilation and linkage. As it stands, the configure script tries to build a small program, conftest.c, and the program fails to link. Manually I can compile the code in two stages using -c and then link the resulting .o, but that is not what configure is trying to do.
Now the reality is that when I build this code, I don't actually need to link the code - I am generating a library which is used elsewhere. However that is not currently the way that configure sees it.
I may look to redo the configuration script to just check that the code can be compiled when cross compiling. However I am curious to know if anybody has managed to handle this sort of thing by keeping the existing config files and just changing the parameters by which the scripts are called.
When r19 releases to stable this problem will go away on its own (https://github.com/android-ndk/ndk/issues/780), but since that's still in beta it's not a good solution just yet.
Prior to r19 (this isn't really unique to r16+, this has always been the case and it was just asymptomatic previously), autoconf builds should be done using a standalone toolchain.
You, however, should not use a standalone toolchain for CMake, so odds are something about your configuration will need to change until r19 is released. Depending on the effort involved, it may make sense to stay on r15 until r19 is available.
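For reference, prior to r19 an autoconf build against a standalone toolchain would look roughly like this (a sketch only; the architecture, API level and install directory are assumptions):
# create a standalone toolchain from the NDK (r16-r18)
"$NDK/build/tools/make_standalone_toolchain.py" --arch arm --api 21 --install-dir "$HOME/android-toolchain"
export PATH="$HOME/android-toolchain/bin:$PATH"
export CC=arm-linux-androideabi-clang CXX=arm-linux-androideabi-clang++
./configure --host=arm-linux-androideabi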

Create portable and static fortran linux binary?

I'm investigating options to create portable static Linux binaries from Fortran code (in the sense that the binaries should be able to run on both new and reasonably old Linux distros). If I understand correctly (extrapolating from C), the main issue for portability is that glibc is forwards but not backwards compatible (that is, static binaries created on old distros will work on newer ones but not vice versa). This at least seems to work in my so far limited tests (with one caveat: the use of scratch files causes segfaults when running on newer distros in some cases).
It seems that, at least in C, one can avoid compiling on old distros by adding legacy glibc headers, as described in
https://github.com/wheybags/glibc_version_header
This specific method does not work on Fortran code and compilers, but I would like to know if anyone knows of a similar approach (or, more specifically, what might be needed to create portable Fortran binaries: is an old glibc enough, or must one also use an old libfortran, etc.)?
I suggest using the manylinux docker images as a starting point.
In short: manylinux is a "platform definition" for distributing binary wheels (Python packages that may contain compiled code) that run on most current Linux systems. The need for manylinux and its definition can be found in Python Enhancement Proposal 513.
Their images are based on CentOS 5 and include all the basic development tools, including gfortran. The process for you would be (I did not test and it may require minor adjustments):
Run the docker image from https://github.com/pypa/manylinux
Compile your code with the flag -static-libgfortran
A possible tweak is needed if they don't ship the static version of libgfortran, in which case you could add it here.
The resulting code should run on most currently-used linux systems.
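Put together, the build might look something like this (an untested sketch; the image tag and file names are assumptions):
# run the compile inside the CentOS 5-based manylinux image
docker run --rm -v "$PWD":/work -w /work quay.io/pypa/manylinux1_x86_64 \
    gfortran -O2 -static-libgfortran -o myprog myprog.f90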

GCC/G++: building without GNU unique object symbols for older Linux kernels

I am currently working on updating the build system for a large pile of code, which happens to include a Linux C++ project. It would be nice if all of the developers here could run a build when hacking around with their own ideas, so I was examining if it would be possible to build this on vaguely modern Linux systems despite the target system being 2.6.18.
By 'vaguely modern' I am estimating something like GCC 4.5+, something that a distribution in the past year or two might come with. Currently I solve the libstdc++ issue by compiling that in statically, and any glibc issues are neatly worked around by remapping to old versions of the memcpy symbols (and so on) with a quick bit of wrapper code. So far so good.
The one problem I can't seem to completely figure out is that certain symbols built into the executable from the .o files are of type 'u', which is a GNU unique object, an extension to the ELF standard that 2.6.18 doesn't seem to recognise at all. This means the executable won't run because it can't find the symbols, though they are in fact present (just of type '?' on the target, from 'nm').
One can disable the use of GNU unique objects when compiling G++ but it's not exactly the most convenient solution. I can't see any way to just disable it when compiling code (distro gcc/g++ invariably has this option on), and I imagine the only way to get the target system to recognise it would be to update ld-linux and the kernel. That's almost certainly not going to happen.
Is there an option I haven't found to disable these symbol types? Or perhaps is there some neat way around this, or something that I'm missing? I am beginning to suspect it will just have to be compiled on G++ 4.1.x, which will mean an old Linux installation or building that from source.
I was trying to deal with the same problem (which led me to find this question) and, after a bunch of research, came to the definitive conclusion that no, you are not missing anything; there is no way around this besides compiling your own g++. See this recent question on the gcc-help mailing list:
http://gcc.gnu.org/ml/gcc-help/2013-01/msg00008.html
I compared gcc sources and found that you can go as high as stock 4.4, as unique symbols were added in 4.5. However, on RHEL/CentOS 6 the default gcc is 4.4, but unique symbol support has been patched into it, so as usual one must beware of distribution-specific gcc versions. For me this is a huge bummer, as it means that things compiled on RHEL 6 can't be run on RHEL 5, even with a copy of libstdc++ made just for gcc 4.4 + RHEL 5.
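For reference, this is roughly what checking for the symbols and building your own g++ looks like (a sketch; the source path, prefix and the extra configure options are assumptions):
# any lowercase 'u' entries are GNU unique objects
nm my_binary | grep ' u '
# build either stock gcc 4.4.x (which predates unique objects) or a newer
# release configured without them
../gcc-src/configure --prefix="$HOME/toolchain" --enable-languages=c,c++ --disable-gnu-unique-object
make && make install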
Here's the message where unique symbol support was first proposed, by the way:
https://gcc.gnu.org/ml/gcc-patches/2009-07/msg01240.html
If you search around you'll find that people have complained about it on other lists for various reasons, but I guess it's here to stay.

F# on linux mono with Full Static Compilation

I would like to be able to run code written in F# on a Linux system (Debian), but it's unlikely that I'll be able to install Mono on it. Is there any way to compile the F# code to be fully static and have absolutely no dependencies on Mono? Basically, just end up with an executable binary that I could run like any other Linux binary?
Even on a stripped down account you can compile your own version of Mono - it is not particularly hard, see http://www.mono-project.com/Compiling_Mono. There are a few dependencies, but they aren't hard to find. You will need to prefix most of your run calls with mono though, like mono myapp.exe rather than ./myapp.exe
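Building Mono into your home directory typically boils down to something like this (a sketch; the tarball name and the prefix are assumptions):
tar xjf mono-X.Y.Z.tar.bz2        # a release tarball from mono-project.com
cd mono-X.Y.Z
./configure --prefix="$HOME/mono"
make && make install
export PATH="$HOME/mono/bin:$PATH"
mono myapp.exe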
Try AOT. But beware of its limitations.
Update:
I think I jumped to an answer a bit too fast and didn't dive deep enough to turn it into something useful. AOT will pre-compile code into shared libraries; under the right conditions this may increase performance.
Still, if you have a requirement not to install the Mono runtime on the client machine at all (why?), I think you should try mkbundle / mkbundle2. This will produce a huge self-contained executable (a C# Hello World plus dependencies generated a file of around 2.5 MB on my machine... with -z I got around 900 KB). You can try to combine it with the Linker to further strip out unused portions of the libraries that your application depends on.
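A minimal mkbundle invocation would look something like this (a sketch; the file names are assumptions, -z needs zlib support, and --static links the runtime in statically for the no-Mono-at-all case):
mkbundle --deps --static -z -o myapp myapp.exe
./myapp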
As for your second question, the F# compiler will generate CIL like any other .NET compiler, so it should not matter. Still, if your application contains either IL instructions that are not yet supported by the Mono AOT compiler (e.g., you need mkbundle2 to handle generics) or dependencies on externally linked libraries that you can't install on your Debian box, you are out of luck. I guess you will have to do a bit of trial and error yourself.

Can autotools create multi-platform makefiles

I have a plugin project I've been developing for a few years, where the plugin works with numerous combinations of [primary application version, 3rd party library version, 32-bit vs. 64-bit]. Is there a (clean) way to use autotools to create a single makefile that builds all versions of the plugin?
As far as I can tell from skimming through the autotools documentation, the closest approximation to what I'd like is to have N independent copies of the project, each with its own makefile. This seems a little suboptimal for testing and development as (a) I'd need to continually propagate code changes across all the different copies and (b) there is a lot of wasted space in duplicating the project so many times. Is there a better way?
EDIT:
I've been rolling my own solution for a while where I have a fancy makefile and some perl scripts to hunt down various 3rd party library versions, etc. As such, I'm open to other non-autotools solutions. For other build tools, I'd want them to be very easy for end users to install. The tools also need to be smart enough to hunt down various 3rd party libraries and headers without a huge amount of trouble. I'm mostly looking for a linux solution, but one that also works for Windows and/or the Mac would be a bonus.
If your question is:
Can I use the autotools on some machine A to create a single universal makefile that will work on all other machines?
then the answer is "No". The autotools do not even make a pretense at trying to do that. They are designed to contain portable code that will determine how to create a workable makefile on the target machine.
If your question is:
Can I use the autotools to configure software that needs to run on different machines, with different versions of the primary software which my plugin works with, plus various 3rd party libraries, not to mention 32-bit vs 64-bit issues?
then the answer is "Yes". The autotools are designed to be able to do that. Further, they work on Unix, Linux, MacOS X, BSD.
I have a program, SQLCMD (which pre-dates the Microsoft program of the same name by a decade and more), which works with the IBM Informix databases. It detects which version of the client software (called IBM Informix ESQL/C, part of the IBM Informix ClientSDK or CSDK) is installed, and whether it is 32-bit or 64-bit, and it adapts its functionality to what is available in the supporting product. It supports versions that have been released over a period of about 17 years. It is autoconfigured -- I had to write some autoconf macros for the Informix functionality, and for a couple of other gizmos (high resolution timing, presence of /dev/stdin etc). But it is doable.
On the other hand, I don't try to release a single makefile that fits all customer machines and environments; there are just too many possibilities for that to be sensible. But the autotools take care of the details for me (and my users). All they do is:
./configure
That's easier than working out how to edit the makefile. (Oh, for the first 10 years, the program was configured by hand. It was hard for people to do, even though I had pretty good defaults set up. That was why I moved to auto-configuration: it makes it much easier for people to install.)
Mr Fooz commented:
I want something in between. Customers will use multiple versions and bitnesses of the same base application on the same machine in my case. I'm not worried about cross-compilation such as building Windows binaries on Linux.
Do you need a separate build of your plugin for the 32-bit and 64-bit versions? (I'd assume yes - but you could surprise me.) So you need to provide a mechanism for the user to say
./configure --use-tppkg=/opt/tp/pkg32-1.0.3
(where tppkg is a code for your third-party package, and the location is specifiable by the user.) However, keep usability in mind: the fewer such options the user has to provide, the better; against that, do not hard-code things that should be optional, such as install locations. By all means look in default locations - that's good. And default to the bitness of the stuff you find. Maybe if you find both 32-bit and 64-bit versions, then you should build both -- that would require careful construction, though. You can always echo "Checking for TP-Package ..." and indicate what you found and where you found it. Then the installer can change the options. Make sure you document in './configure --help' what the options are; this is standard autotools practice.
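In configure.ac such an option is normally declared with AC_ARG_WITH, so it would be spelled --with-tppkg32 rather than --use-tppkg; a rough sketch (the names and the default path are illustrative):
AC_ARG_WITH([tppkg32],
  [AS_HELP_STRING([--with-tppkg32=DIR], [location of the 32-bit third-party package])],
  [TPPKG32DIR=$withval],
  [TPPKG32DIR=/opt/tp/pkg32])
AC_SUBST([TPPKG32DIR])
With AC_SUBST, the placeholder in Makefile.in would be written @TPPKG32DIR@ rather than the #TPPKG32DIR# style used in the fragment below.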
Do not do anything interactive though; the configure script should run, reporting what it does. The Perl Configure script (note the capital letter - it is a wholly separate automatic configuration system) is one of the few intensively interactive configuration systems left (and that is probably mainly because of its heritage; if starting anew, it would most likely be non-interactive). Such systems are more of a nuisance to configure than the non-interactive ones.
Cross-compilation is tough. I've never needed to do it, thank goodness.
Mr Fooz also commented:
Thanks for the extra comments. I'm looking for something like:
./configure --use-tppkg=/opt/tp/pkg32-1.0.3 --use-tppkg=/opt/tp/pkg64-1.1.2
where it would create both the 32-bit and 64-bit targets in one makefile for the current platform.
Well, I'm sure it could be done; I'm not so sure that it is worth doing by comparison with two separate configuration runs with a complete rebuild in between. You'd probably want to use:
./configure --use-tppkg32=/opt/tp/pkg32-1.0.3 --use-tppkg64=/opt/tp/pkg64-1.1.2
This indicates the two separate directories. You'd have to decide how you're going to do the build, but presumably you'd have two sub-directories, such as 'obj-32' and 'obj-64' for storing the separate sets of object files. You'd also arrange your makefile along the lines of:
FLAGS_32 = ...32-bit compiler options...
FLAGS_64 = ...64-bit compiler options...
TPPKG32DIR = #TPPKG32DIR#
TPPKG64DIR = #TPPKG64DIR#
OBJ32DIR = obj-32
OBJ64DIR = obj-64
BUILD_32 = #BUILD_32#
BUILD_64 = #BUILD_64#
TPPKGDIR =
OBJDIR =
FLAGS =
all: ${BUILD_32} ${BUILD_64}
build_32:
	${MAKE} TPPKGDIR=${TPPKG32DIR} OBJDIR=${OBJ32DIR} FLAGS=${FLAGS_32} build
build_64:
	${MAKE} TPPKGDIR=${TPPKG64DIR} OBJDIR=${OBJ64DIR} FLAGS=${FLAGS_64} build
build: ${OBJDIR}/plugin.so
This assumes that the plugin would be a shared object. The idea here is that the autotool would detect the 32-bit or 64-bit installs for the Third Party Package, and then make substitutions. The BUILD_32 macro would be set to build_32 if the 32-bit package was required and left empty otherwise; the BUILD_64 macro would be handled similarly.
When the user runs 'make all', it will build the build_32 target first and the build_64 target next. To build the build_32 target, it will re-run make and configure the flags for a 32-bit build. Similarly, to build the build_64 target, it will re-run make and configure the flags for a 64-bit build. It is important that all the flags affected by 32-bit vs 64-bit builds are set on the recursive invocation of make, and that the rules for building objects and libraries are written carefully - for example, the rule for compiling source to object must be careful to place the object file in the correct object directory - using GCC, for example, you would specify (in a .c.o rule):
${CC} ${CFLAGS} -o ${OBJDIR}/$*.o -c $*.c
The macro CFLAGS would include the ${FLAGS} value which deals with the bits (for example, FLAGS_32 = -m32 and FLAGS_64 = -m64), and so when building the 32-bit version, FLAGS = -m32 would be included in the CFLAGS macro.
The residual issue in the autotools is working out how to determine the 32-bit and 64-bit flags. If the worst comes to the worst, you'll have to write macros for that yourself. However, I'd expect (without having researched it) that you can do it using standard facilities from the autotools suite.
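One standard facility that fits here is the AX_CHECK_COMPILE_FLAG macro from the Autoconf Archive; a rough sketch (untested, and the variable names simply mirror the makefile fragment above):
AX_CHECK_COMPILE_FLAG([-m32], [FLAGS_32=-m32], [FLAGS_32=])
AX_CHECK_COMPILE_FLAG([-m64], [FLAGS_64=-m64], [FLAGS_64=])
AC_SUBST([FLAGS_32])
AC_SUBST([FLAGS_64])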
Unless you create yourself a carefully (even ruthlessly) symmetric makefile, it won't work reliably.
As far as I know, you can't do that. However, are you stuck with autotools? Is neither CMake nor SCons an option?
We tried it and it doesn't work! So we now use SCons.
Some articles on this topic: 1 and 2
Edit:
Some small example why I love SCons:
env.ParseConfig('pkg-config --cflags --libs glib-2.0')
With this line of code you add GLib to the compile environment (env). And don't forget the User Guide, which is just great for learning SCons (you really don't have to know Python!). For the end user you could try SCons with PyInstaller or something like that.
And in comparison to make, you use Python, a complete programming language! With this in mind you can do just about everything (more or less).
Have you ever considered using a single project with multiple build directories?
If your automake project is implemented in a proper way (i.e., NOT like gcc), the following is possible:
mkdir build1 build2 build3
cd build1
../configure $(YOUR_OPTIONS)
cd ../build2
../configure $(YOUR_OPTIONS2)
[...]
You are able to pass different configuration parameters, such as include directories and compilers (e.g. cross-compilers).
You can then even build them all with a single command line by running
make -C build1 && make -C build2 && make -C build3
