linux build system tool

Our project got pretty big and our build system does not scale anymore. We are doing cross-platform development on Linux machines. We have too many platforms to build against and even more build options. We believe that we need to upgrade our Makefile-based build environment.
These are the requirements (in an ideal world):
Fast (so no libtool)
Can do parallel builds
Cross compile friendly
Ccache integration
Does incremental builds and can short-circuit if certain conditions are met (skip the rebuild if options a, b, c have not changed, rebuild if they have)
Easily scriptable (python integration would be perfect)
User friendly syntax
Distributed system. Modules can be developed separately from each other
Can build third-party libraries (that use Autotools, CMake, ...)
Can track dependencies between modules (but flexible enough so that modules can be replaced by alternative external ones).
Built-in unit testing support
Large binaries can be stored separately from version control and can be downloaded if needed
Can keep track of open source licenses
Git integration
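The short-circuit requirement in the list above (rebuild only when options a, b, c change) can be sketched by hashing just the relevant build options and comparing the digest with a stamp file written by the previous build. This is a minimal sketch under assumptions: the stamp-file location and the option names are hypothetical, and real tools such as SCons or Ninja track this per target rather than once per build.

```python
import hashlib
import json
from pathlib import Path


def options_digest(options: dict, relevant: list[str]) -> str:
    """Hash only the options that should trigger a rebuild (e.g. a, b, c)."""
    subset = {key: options.get(key) for key in sorted(relevant)}
    # sort_keys makes the JSON, and therefore the hash, independent of dict order
    blob = json.dumps(subset, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def needs_rebuild(stamp: Path, options: dict, relevant: list[str]) -> bool:
    """True if the relevant options differ from those of the last recorded build."""
    current = options_digest(options, relevant)
    previous = stamp.read_text().strip() if stamp.exists() else None
    return current != previous


def record_build(stamp: Path, options: dict, relevant: list[str]) -> None:
    """Write the digest after a successful build so the next run can short-circuit."""
    stamp.write_text(options_digest(options, relevant))
```

Options outside the relevant list (a verbosity flag, say) can change freely without forcing a rebuild.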
Are you aware of any tools (or group of tools) that would meet (at least some of) these requirements? Currently I am leaning towards gyp+ninja, but the syntax is not very friendly and there is no documentation, so it is a tough sell.

You mention Python integration, so SCons sounds like it would fit the bill. It's entirely based around Python (the build scripts are in fact Python scripts), it is very flexible, and it meets quite a number of your other requirements.
From the web site:
SCons is an Open Source software construction tool—that is, a next-generation build tool. Think of SCons as an improved, cross-platform substitute for the classic Make utility with integrated functionality similar to autoconf/automake and compiler caches such as ccache. In short, SCons is an easier, more reliable and faster way to build software.

As far as C++ is concerned, a very good build system is CMake.

Ninja is not meant to be used directly by the end user, but rather by a higher-level build tool such as CMake. That combination is a really good option, especially for large, cross-platform projects. CMake has no built-in Python support, but you rarely if ever need external scripting with CMake, since it has tools for most common tasks.
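To illustrate why Ninja files are usually generated rather than written by hand, here is a minimal Python sketch of what a generator such as CMake (or gyp) does: it turns a list of sources into Ninja `rule` and `build` statements. The compiler command, target name and source paths are assumptions for illustration; CMake's real output is far more elaborate.

```python
from pathlib import PurePosixPath


def generate_ninja(target: str, sources: list[str], cc: str = "gcc") -> str:
    """Emit a minimal build.ninja that compiles each source and links the target."""
    lines = [
        "rule cc",
        f"  command = {cc} -c $in -o $out",
        "rule link",
        f"  command = {cc} $in -o $out",
        "",
    ]
    objects = []
    for src in sources:
        # One object file per source, next to it (hypothetical layout)
        obj = str(PurePosixPath(src).with_suffix(".o"))
        objects.append(obj)
        lines.append(f"build {obj}: cc {src}")
    lines.append(f"build {target}: link {' '.join(objects)}")
    lines.append(f"default {target}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_ninja("app", ["src/main.c", "src/util.c"]))
```

With CMake itself, the equivalent is to configure with `cmake -G Ninja` and then run `ninja`; the generated file is an implementation detail.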

Related

Why does cabal download and compile from source?

When I make a new project. Say, a web app using Snap.
I generate the skeleton using snap init barebones, make a new sandbox and then install the dependencies.
This takes forever. Seriously. If you have ever worked with pretty much any other web framework (node.js with express, for example), the process is nearly identical but takes a fraction of the time. I'm aware that most node dependencies do not require any compilation, but I find it really strange that this isn't considered a bigger problem. For example, I will never be able to run a Yesod app on my cheap VPS because the VPS isn't powerful enough to compile it and I can't really upload 500 MB of precompiled libraries.
The question is, why doesn't the repository host binaries instead of just code?
.NET is also compiled (to bytecode) but I can use its DLLs without any need for recompilation.
There are of course drawbacks to hosting binaries, like more storage space needed and multiple binaries per library for multiple OSs... But all these problems seem insignificant compared to the huge benefits that you get, such as:
No more compile errors
Much faster setup for new projects
Significantly less memory needed
Knowing that a library doesn't support your OS BEFORE you find out for yourself
I have trouble seeing why cabal hell exists in the first place. If all the libraries were available for dynamic linking, wouldn't the need for recompiling simply not exist at all?
Currently, one has to try really hard to stick with Haskell in these regards. It seems like the system punishes me for trying out things. If I want to add a new library to my project I have to be sure I'm willing to wait for 15-45(!!!) minutes for it to compile. Not to mention that a library fails to compile way more often than I'm comfortable with. After surviving the process, only then can I actually figure out if that library is what I want to use, or if it's even compatible with the rest of my project.

In a nutshell: because native code is hard.
If you want to host binaries for arbitrary systems, you have to match the binaries to each system you want to run on. That may mean compiling dozens of sets of binaries to support all of the systems the code will compile on.
On the other hand, you may well find that someone has compiled the code you need: your distribution provider may well provide packages for the Haskell libraries you need.

Because that's the easiest way to distribute everything while keeping it up to date. By offloading build costs to the users, library authors only need to provide source code.
This can be mitigated in various ways. For example, my CI setup uses CircleCI and Heroku. Nodes on both hold precached cabal sandboxes (it's actually very easy to set up). I build my project on Heroku, but there's no reason why you couldn't take prebuilt artifacts from your CI and deploy them directly.
As for dynamic linking, it is possible to link Haskell modules dynamically, but shared libraries more often than not are a source of problems. One look at Windows DLL hell should be enough to see this, and most commercial applications simply ship the DLLs they use anyway. If a library changes, the DLLs have to be replaced anyway, and the way Cabal does it makes it simplest to have the latest and greatest versions of everything.

First, note that on some platforms, you can in fact install binary libraries. For example, on my OpenSUSE Linux system, YaST will quite happily download and install certain Haskell libraries, without having to build anything from source.
Of course, this only covers a fairly small set of libraries, and all the RPMs will be many months out of date. (Not a big deal for X11, kind of a deal-breaker for something like Yesod that's under heavy development...)
I think another big part of the problem is that if you compile a Haskell library with GHC 7.6.4, then you cannot use that binary compiled library with GHC 7.8.3. So we're not just talking about one compiled binary for each OS; we're talking about one compiled binary for every OS + GHC minor point-release combination.
Oh, and did I mention? If you compile Yesod 1.4.0 against ByteString 0.9.2.0, then that compiled binary is useless if your system has ByteString 0.9.2.1 installed. So you potentially need one compiled binary for every OS, every GHC release, and every release of every library that it transitively depends on.
...This is partly why the Haskell Platform was invented. It's a single binary download that gives you a big heap of code that you don't need to compile from source, and where all the versions of the libraries in it are mutually compatible. (No dependency hell - the Haskell Platform maintainers sort that out for you!)
I do agree that binary packages would be extremely nice to have. But the above problems make it unlikely, IMHO.
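The combinatorial problem described in the answer above can be made concrete with a short Python sketch: a binary repository would need a key that includes the OS, the exact GHC version and the exact version of every transitive dependency, so changing any one of them means a different binary. The function names are hypothetical and the version numbers simply reuse the examples from the answer.

```python
import hashlib
from itertools import product


def binary_cache_key(os_name: str, ghc: str, deps: dict[str, str]) -> str:
    """Identify one prebuilt library binary; any change to an input changes the key."""
    parts = [os_name, ghc] + [f"{name}-{ver}" for name, ver in sorted(deps.items())]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


def all_keys(oses: list[str], ghcs: list[str],
             dep_versions: dict[str, list[str]]) -> set[str]:
    """Enumerate every binary a repository would have to host for one library version."""
    names = sorted(dep_versions)
    keys = set()
    for os_name, ghc, *versions in product(oses, ghcs, *(dep_versions[n] for n in names)):
        keys.add(binary_cache_key(os_name, ghc, dict(zip(names, versions))))
    return keys
```

The count grows multiplicatively: three operating systems, two GHC versions and two versions each of two dependencies already give 3 × 2 × 2 × 2 = 24 binaries for a single release of one library.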

Shipping a cross platform desktop application

We are trying to ship a cross-platform desktop application to the three major platforms (Windows, Mac OS X and Linux). Distribution is commonly done through an .exe installer on Windows and a .dmg on Mac OS X. My question is, what should we distribute on Linux?
I've seen companies distributing .sh binaries. Is that the best way to ship for Linux?
Thanks

Some companies, like Mozilla, distribute one tar.gz per architecture:
large, statically linked,
not updatable by standard tools,
hard to maintain centrally (in an organization),
dead easy to install
simple to release.
Other companies, like Google, distribute multiple package formats, or at least .rpm and .deb, aimed at major versions of major distributions.
compact, common dependencies are handled by package manager,
uses standard package manager, can be easily centrally maintained,
needs privileges to install,
needs to closely watch compatibility with supported distro releases,
needs complex packaging infrastructure.

Although .rpm and .deb packages are the standard for many
Linux distributions, as mentioned by 9000, you would be forced to
maintain multiple formats and also worry about differences between
versions of the package managers. For example, RPM packages
targeting 4.x won't work on 3.x:
http://fedora.linuxsir.org/fedoradocs/rpm-guide/en/ch-rpm-evolution.html
and in many cases the differences in how RPM dependencies are
expressed across distributions are quite big.
If your application is a popular open source app, then it is less of
an issue, because distribution maintainers will take care of creating
packages for it. Things change if you are a commercial or closed
source company, in particular if you also need to support Windows and
OS X.
Sometimes it is more convenient to have a single installer that will
work on top of different Linux distributions, or in some cases you need more
flexibility when asking your end user for required information before
the actual installation of your application. The shell scripts used by
some companies (internally packing a tar.gz) may work around this
limitation of the native packages, but they are still very limited in their UI
and very complex to maintain (most of the logic must be included in
the shell script, which must be portable among different shells...). They
also require pre-unpacking the bundled tar.gz.
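The "shell script internally packing a tar.gz" approach can be sketched as follows: a short shell header is concatenated with a gzipped tarball, and at install time the header uses `tail` to skip its own lines and pipes the rest into `tar`. This Python builder is a minimal illustration under assumptions (the header text, the default install directory `$HOME/myapp` and the output name are hypothetical), not how any particular vendor does it.

```python
import io
import tarfile
from pathlib import Path

# Shell header: unpack everything after the header into DEST, then stop before the payload.
# __PAYLOAD_LINE__ is replaced with the 1-based line number where the archive starts.
HEADER_TEMPLATE = """#!/bin/sh
set -e
DEST="${1:-$HOME/myapp}"
mkdir -p "$DEST"
tail -n +__PAYLOAD_LINE__ "$0" | tar -xzf - -C "$DEST"
echo "Installed to $DEST"
exit 0
"""


def build_installer(src_dir: Path, out_file: Path) -> None:
    """Write a self-extracting .sh: the shell header followed by a .tar.gz of src_dir."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(src_dir, arcname=".")
    header_lines = HEADER_TEMPLATE.count("\n")
    header = HEADER_TEMPLATE.replace("__PAYLOAD_LINE__", str(header_lines + 1))
    out_file.write_bytes(header.encode("ascii") + buf.getvalue())
    out_file.chmod(0o755)
```

The `exit 0` matters: it stops the shell before it reaches the binary payload. Everything beyond "unpack and copy" (prompts, checks, uninstall) has to be added to the header by hand, which is the maintenance burden the answer describes.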
There are some tools that allow creating binary executable installers,
independent of the Linux version and highly customizable. These tools
have the advantage of being easier to maintain and install, but
depending on the tool they may require some prerequisites on the target
machine, such as Java.
You could try our BitRock InstallBuilder (disclaimer, I'm one of the
developers), that creates binary installers that run in any Linux
version without any external dependencies or prerequisites. The
generated installers add a minimal size overhead to the packed files
and are really easy to customize. Also, the same project file can be
shared to generate multiple platforms so you could have the three
platforms with very little incremental work. We also support
generating .rpm and .deb packages but as mentioned above you would
lose flexibility. I want to note that we offer free licenses for open
source projects.

Packaging proprietary software for Linux

I'm doing cross-platform development and I want to build a nice, self-contained (!) package for Linux. I know that that's not the way it's usually done, but the application requires all data in one place, so I'm installing it into /opt, like many other proprietary software packages do. I will eventually provide deb and rpm packages, but it will only be .tar.gz for now. The user should extract it somewhere and it should work. I'd rather not have an installer.
First my questions, then the details:
How do other people package proprietary software for Linux?
Are there tools for packaging software including shared libraries?
Now for some details: This is my project's (I call it foo for this purpose) layout:
foo (binary)
config.ini
data
Now in the package, there will be two additional elements:
libs
foo.sh
libs will contain all the shared libraries the project requires, and foo.sh is a script that sets LD_LIBRARY_PATH to include libs. Therefore, the user will execute foo.sh and the program should start.
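The foo.sh launcher described above only has to do two things: locate libs relative to its own location and prepend it to LD_LIBRARY_PATH before starting the binary. A minimal sketch of the same logic in Python follows; the names foo and libs come from the question, and writing the launcher in Python rather than shell is purely for illustration (a shell script is the usual choice because it needs no interpreter).

```python
import os
import sys
from pathlib import Path


def launcher_env(app_dir: Path, base_env: dict[str, str]) -> dict[str, str]:
    """Return a copy of base_env with app_dir/libs prepended to LD_LIBRARY_PATH."""
    env = dict(base_env)
    libs = str(app_dir / "libs")
    existing = env.get("LD_LIBRARY_PATH")
    # Bundled libs come first so they win over any system copies
    env["LD_LIBRARY_PATH"] = f"{libs}:{existing}" if existing else libs
    return env


def main() -> None:
    """Start the bundled binary 'foo' with its private libs on the search path."""
    app_dir = Path(__file__).resolve().parent
    binary = str(app_dir / "foo")
    # execve replaces this process; the dynamic loader reads LD_LIBRARY_PATH at startup
    os.execve(binary, [binary, *sys.argv[1:]], launcher_env(app_dir, dict(os.environ)))
```

Installed next to foo, the launcher would end with a call to `main()`. The shell equivalent is along the lines of `LD_LIBRARY_PATH="$DIR/libs${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" exec "$DIR/foo" "$@"`, where DIR is the script's directory.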
I have a shell script that packages the software in the following steps:
Create empty directory and copy foo.sh to it
Invoke the build process and make install into the new directory
Copy shared libs from the filesystem
Package everything as .tar.gz
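The four packaging steps listed above map onto a short Python sketch. The CMake commands (`cmake --build` and `cmake --install --prefix`, assuming a project already configured in a `build` directory), the archive naming and the explicit library list are assumptions; the library list is passed in by hand, which is exactly the duplication the question complains about.

```python
import shutil
import subprocess
import tarfile
import tempfile
from pathlib import Path
from typing import Callable


def cmake_build_and_install(stage: Path, build_dir: str = "build") -> None:
    """Step 2, assuming a CMake project already configured in build_dir."""
    subprocess.run(["cmake", "--build", build_dir], check=True)
    subprocess.run(["cmake", "--install", build_dir, "--prefix", str(stage)], check=True)


def package(launcher: Path, shared_libs: list[Path], archive: Path,
            install: Callable[[Path], None] = cmake_build_and_install) -> Path:
    """Build a relocatable tarball such as foo-1.0.tar.gz following the four steps."""
    top = archive.name.removesuffix(".tar.gz")  # top-level folder, e.g. "foo-1.0"
    with tempfile.TemporaryDirectory() as tmp:
        stage = Path(tmp) / top
        # Step 1: create an empty directory and copy the launcher into it
        stage.mkdir()
        shutil.copy2(launcher, stage / launcher.name)
        # Step 2: build and install the project into the staging directory
        install(stage)
        # Step 3: copy the shared libraries into libs/
        libs = stage / "libs"
        libs.mkdir(exist_ok=True)
        for lib in shared_libs:
            # copy2 follows symlinks, so a versioned .so symlink becomes a real file
            shutil.copy2(lib, libs / lib.name)
        # Step 4: pack the staging directory as a single top-level folder
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(stage, arcname=top)
    return archive
```

The `install` parameter exists so the build step can be swapped out, for example for a plain `make install` or, in testing, for a stub.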
What do you think of this? There are some problems with this approach:
I have to hard code all dependencies twice (once in CMake, once in the packaging script)
I have to define the version number twice (once in the source code, once in the packaging script)
How do you do it?
Edit:
Another question that just came up: how do you determine which libraries your software depends on? I ran ldd foo, but there's an awful lot. I looked at how WorldOfGoo packages look, and they ship only very few libraries. How can I make assumptions about which libraries will be present on a user's system and which won't? Just install all targeted distributions into a virtual machine and see what's required?
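Part of the reason the ldd list is so long is that ldd prints the whole transitive closure of shared libraries, not only the ones the binary links directly (those appear as NEEDED entries in `readelf -d`). A minimal Python sketch for turning ldd output into a soname-to-path map follows; the function names are hypothetical and the parser only assumes the usual `soname => path (address)` line format.

```python
import re
import subprocess
from typing import Optional

# Lines look like "libgtk-3.so.0 => /lib/.../libgtk-3.so.0 (0x...)" or "libfoo.so.1 => not found"
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)")


def parse_ldd(output: str) -> dict[str, Optional[str]]:
    """Map each soname to the path it resolved to, or None if ldd reports 'not found'."""
    deps: dict[str, Optional[str]] = {}
    for line in output.splitlines():
        match = LDD_LINE.match(line)
        if match is None:
            continue  # linux-vdso and the dynamic loader have no '=>' mapping
        soname, target = match.groups()
        deps[soname] = None if target == "not" else target
    return deps


def ldd(binary: str) -> dict[str, Optional[str]]:
    """Run ldd on a binary you trust (ldd may execute parts of the program)."""
    result = subprocess.run(["ldd", binary], capture_output=True, text=True, check=True)
    return parse_ldd(result.stdout)
```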

Generic issues
Your way of packaging your software (with dependent libs) into /opt is how proprietary (and even open-source) software is packaged. It is the recommended practice of The Linux Foundation (see my answer to the other question for links).
External libs may be either compiled from scratch and embedded into your build process as a separate step (especially if you modify them), or fetched from packages of some distributions. The second approach is easier, but the first one allows more flexibility.
Note that it's not necessary to include some really low-level libraries (such as glibc or Xorg) in your package. They'd be better left to system vendors to tune, and you may just assume they exist. Moreover, there's the Linux Standard Base, which documents the most important libraries; these libraries exist almost everywhere and can be trusted.
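Following that advice, the packaging script can drop the libraries assumed to come with the system before copying the rest into libs. The exclusion list below is an assumption for illustration (glibc components, the dynamic loader and core X11/GL client libraries); which libraries you can safely assume depends on the distributions you target.

```python
# Hypothetical base set assumed to be provided by every target system.
SYSTEM_PREFIXES = (
    "libc.so", "libm.so", "libdl.so", "libpthread.so", "librt.so",
    "ld-linux", "libX11.so", "libxcb.so", "libGL.so",
)


def is_system_library(soname: str) -> bool:
    """True if the soname belongs to the assumed system base set."""
    # str.startswith accepts a tuple; "libc.so" does not match "libcairo.so.2"
    return soname.startswith(SYSTEM_PREFIXES)


def libraries_to_bundle(sonames: list[str]) -> list[str]:
    """Sonames that should be copied into the package's libs directory."""
    return sorted(s for s in sonames if not is_system_library(s))
```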
Note also that if you compile on a newer system, most likely users of older systems won't be able to use the result, while the reverse is not true. So, for better compatibility, it might be useful to compile the package on a system that's about two years old.
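One way to see this effect is to look at the versioned glibc symbols a binary references: the highest `GLIBC_x.y` version it needs is roughly the oldest glibc it can run on, and building on a newer system tends to raise it. A minimal Python sketch that extracts that maximum from `objdump -T` output; the function names are hypothetical and the regex only assumes that version tags appear as `GLIBC_` followed by dotted numbers.

```python
import re
import subprocess
from typing import Optional

# Matches GLIBC_2.2.5, GLIBC_2.34, ... but not GLIBC_PRIVATE or GLIBCXX_3.4
GLIBC_VERSION = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def max_glibc_version(symbol_table: str) -> Optional[tuple[int, ...]]:
    """Highest GLIBC symbol version referenced, as a comparable tuple."""
    versions = [tuple(int(part) for part in m.group(1).split("."))
                for m in GLIBC_VERSION.finditer(symbol_table)]
    return max(versions) if versions else None


def required_glibc(binary: str) -> Optional[tuple[int, ...]]:
    """Run objdump -T on the binary and report the newest glibc version it needs."""
    out = subprocess.run(["objdump", "-T", binary],
                         capture_output=True, text=True, check=True).stdout
    return max_glibc_version(out)
```

Comparing tuples rather than strings matters here: as strings, "2.4" would sort after "2.34".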
I just outlined some generic stuff, but I believe that Linux Developers Network website contains more information about packaging and portability.
Packaging
Judging by what I saw in the open-source distribution projects, your script does it the same way distribution vendors package software. Their scripts automatically patch sources, mimic installation of software and package the resultant folders into DEBs and RPMs.
A tar.gz, of course, could also work, but creating, for example, an RPM is not so complex that you should miss such an opportunity to make your users' lives so much easier.
Answering your questions,
Yes, you have to hard-code dependencies twice.
The thing is that when you hardcode them in CMake, you specify them in different terms than when you specify them in a packaging script. CMake refers to shared libraries and header files, while a packaging script refers to packages.
There's no cross-distribution one-to-one relationship between package names and shared libs and headers. It varies through distributions. Therefore, it should be specified twice.
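The mismatch between library names and package names can be illustrated with a small lookup table: the same shared library maps to differently named packages on Debian-family and Red Hat-family systems. The table below is a hypothetical, hand-maintained excerpt (package names also change between releases of the same distribution, so check them against the releases you actually target); it is the second copy of the dependency list the question refers to.

```python
# Hypothetical excerpt: soname -> package name per distribution family.
PACKAGE_NAMES = {
    "libz.so.1": {"deb": "zlib1g", "rpm": "zlib"},
    "libpng16.so.16": {"deb": "libpng16-16", "rpm": "libpng"},
    "libgtk-3.so.0": {"deb": "libgtk-3-0", "rpm": "gtk3"},
}


def package_dependencies(sonames: list[str], family: str) -> list[str]:
    """Translate the sonames a binary needs into package names for one distro family."""
    missing = [s for s in sonames if s not in PACKAGE_NAMES]
    if missing:
        raise KeyError(f"no package mapping for: {', '.join(missing)}")
    return sorted({PACKAGE_NAMES[s][family] for s in sonames})
```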
But the package can easily be re-packed by distribution vendors, especially if you strive to pack all dependent libs into it (so there'll be fewer external dependencies to port). Also, a tool that can port packages from one distribution to another will appear soon (I'll update my answer when it's released).
Yes, you have to specify your version twice.
But the thing is that you may organize your packaging process in such a way that package and software versions never get out-of-sync. Just make the packaging script check out from your repository (or download from your website) exactly the same version that the script will write to package specifications.
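A minimal sketch of keeping the two version numbers in sync: the packaging script reads the version from a single place in the checked-out source tree and derives the package name from it instead of hard-coding it. The header path `include/foo_version.h`, the `FOO_VERSION` macro and the archive naming scheme are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical single source of truth: #define FOO_VERSION "1.4.2"
VERSION_RE = re.compile(r'^#define\s+FOO_VERSION\s+"([^"]+)"', re.MULTILINE)


def read_version(header_text: str) -> str:
    """Extract the version string from the header's FOO_VERSION macro."""
    match = VERSION_RE.search(header_text)
    if match is None:
        raise ValueError("FOO_VERSION not found")
    return match.group(1)


def archive_name(project: str, version: str, arch: str) -> str:
    return f"{project}-{version}-{arch}.tar.gz"


def archive_name_from_tree(src_root: Path, arch: str = "x86_64") -> str:
    """Derive the package file name from the checked-out sources, never by hand."""
    header = (src_root / "include" / "foo_version.h").read_text()
    return archive_name("foo", read_version(header), arch)
```

With CMake, a common alternative is to declare the version once in `project(foo VERSION ...)` and generate the header from it with `configure_file`, so the packaging script and the code both read the same value.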
Analyzing Dependencies
To analyze dependencies of your software, you may use our open-source, free Linux Application Checker tool. It will report the list of libraries it depends on, show distributions your software is compatible with, and help your application be more portable across distributions. It turns out that sometimes more cross-distribution compatibility can be achieved by little effort, and you don't have to lock yourself into support of just a few selected distributions.

Think long and hard (or ask your product development department) which distributions / architectures you need to support.
Make sure that they fully understand the testing implications.
I expect you will come up with a very short list of supported distributions and architectures.
It really depends on which customers are paying for Linux support. Most people use Red Hat Enterprise Linux (on servers) or CentOS (which is indistinguishable from a technical perspective).
If you only need to support Red Hat, you only need to support RPM, job's a good'un.

What is the preferred way to publish a binary-only application for multiple Linux distributions? [closed]

I have a closed-source Linux application that I want to distribute. This application is using wxWidgets/GTK so there is a huge list of shared libraries (60+) that this application depends on.
What is the preferred way to publish the application and support the maximum number of distros?
Is it to build the application for each supported distribution and publish them separately? This has the drawback of being complicated to build (a chroot and a build per distro) and will only work on supported distributions.
Is it to add all shared libraries in the installer and use them with the LD_LIBRARY_PATH env variable (like VMware)? This has the drawback of increasing the size of the installer.
Is it to build a completely static application? This is surely not possible, as it would break some licenses.
Is it a mix of that or another option? How do most commercial vendors publish their own graphical (preferably GTK-based) application?

You should have a look at the Linux Standard Base. It's designed specifically to help people in your position. It defines an environment that third-party application developers can rely upon: there's a set version of libc and other libraries, and certain programs and directories live in known places. All of the main Linux distributions support LSB.
That said, you should still probably package the result specifically for each major distribution - just so that your customers can manage your app with their familiar package management tools.

Basically, there are two ways. You can choose both, if you wish.
The first way is the common way games and such do it. Make a lib/ subdirectory, use LD_LIBRARY_PATH and include just about every shared library you need. That ensures a pain-free experience for your users, but does make the installer bigger and probably the memory footprint bigger as well. I would not even attempt to reuse preexisting libraries, as they would tend to disappear as upgrades are made to the system.
The second way is to provide distribution packages. These are generally not that hard to make, will integrate nicely with the distributions, and will furthermore seem a lot more welcoming to your customers. The two downsides are: you'll need to do this for each distribution (Debian, Ubuntu, SUSE and Red Hat are probably a good start), and you will need to maintain them: as time goes on, some libraries will no longer be available in a specific version, and thus users will get dependency problems.
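To give a sense of how small a distribution package description can be, here is a Python sketch that writes the `DEBIAN/control` file for a binary .deb; `dpkg-deb --build <staging-dir>` can then turn the staging directory into a package. All field values are hypothetical, and a real package usually carries more fields (e.g. Section, Priority, Installed-Size) plus a longer description.

```python
from pathlib import Path


def control_file(name: str, version: str, arch: str, maintainer: str,
                 depends: list[str], summary: str) -> str:
    """Render a minimal binary-package control file (one 'Field: value' per line)."""
    fields = [
        ("Package", name),
        ("Version", version),
        ("Architecture", arch),
        ("Maintainer", maintainer),
    ]
    if depends:
        fields.append(("Depends", ", ".join(depends)))
    fields.append(("Description", summary))
    return "".join(f"{key}: {value}\n" for key, value in fields)


def write_control(stage: Path, text: str) -> Path:
    """Place the control file where dpkg-deb expects it inside the staging tree."""
    debian = stage / "DEBIAN"
    debian.mkdir(parents=True, exist_ok=True)
    path = debian / "control"
    path.write_text(text)
    return path
```

The Depends line is also where the maintenance cost mentioned above shows up: when a distribution renames or drops a library package, this list has to change.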

In your installer, check which libraries are installed and then download the binaries for those which aren't.
For additional comfort of your users, if there is no connection to the Internet, have the installer generate a key which you can enter on your website to receive a ZIP archive which you can then feed to the installer.
For utmost comfort, check which libraries are available on the target distro and ask the user to use the standard admin tool to install them. That way, you won't pollute the computer with different versions of the same library.
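The first step suggested above, checking which libraries are already installed, can be sketched with the standard library's `ctypes.util.find_library`, which looks a library up by its short name (without the `lib` prefix and `.so` suffix) and returns None if it cannot find it. The library names are hypothetical examples, and the lookup function is a parameter so the logic can be tested without touching the real system.

```python
import ctypes.util
from typing import Callable, Optional


def missing_libraries(names: list[str],
                      lookup: Callable[[str], Optional[str]] = ctypes.util.find_library
                      ) -> list[str]:
    """Return the short library names (e.g. 'gtk-3', 'png16') that are not installed."""
    return [name for name in names if lookup(name) is None]
```

An installer could call `missing_libraries(["gtk-3", "png16"])` and then either download the missing pieces or tell the user which distribution packages to install.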
That said: It might be smarter to put your valuable code into a link library and then provide that as binary blob in a source package. This way, your code is as protected as it would be in a pure binary and users can compile the glue code on their favorite system without you having to worry about stuff.
I mean: How much worth is the part of your code which sets up the UI? How much will you lose when someone steals that?

Linux lib / include organization for cross-compiled libraries?

We are cross-compiling an application for an embedded Linux target under desktop Linux. For testing and other purposes we are using statically linked libraries with our application. The testing library we are using is CMockery.
My question is: Where should the static libraries and include files for CMockery live, given that we are cross-compiling?
If we weren't cross-compiling, things would go in /usr/local/lib.
Some suggestions from our team have been:
/opt/google/lib and /opt/google/include
/opt/embeddedLinuxDistro/usr/local/share/google/lib (and include)
/usr/local/arch/lib (and include)
Any pointers appreciated!

Note: After writing this answer, my summary would be:
Keep anything that is non-standard to the Linux distro you're using separate. In fact keep files for different projects separate even if they share libraries. This will make it much easier to move your files to another machine, to setup multiple complete builds for testing, and most importantly to be able to recreate the build starting from scratch.
The decision is really subjective.
Do you just need one copy of the library for all users?
Does it rarely change?
If your build machine caught fire and you had no backups of that machine, how quickly and easily could you re-build your environment of libraries and cross-compilers?
I ask these questions, because if the library changes often or different users may need different versions, you're better off having it be portable. That is, you can specify in your build where to find the files.
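Making the location configurable, as suggested above, can be as simple as reading a prefix from an environment variable in the build script and deriving the include and library flags from it. The variable name `CMOCKERY_PREFIX`, the default path and the `-lcmockery` library name are assumptions; with CMake the usual equivalent is a cache variable or `CMAKE_PREFIX_PATH`.

```python
import os
from collections.abc import Mapping
from pathlib import Path

# Hypothetical project-specific default, following the "path that names your project" advice
DEFAULT_PREFIX = "/opt/myproject/arm-linux/cmockery"


def cmockery_flags(env: Mapping[str, str] = os.environ) -> list[str]:
    """Compiler/linker flags for CMockery, overridable via the CMOCKERY_PREFIX variable."""
    prefix = Path(env.get("CMOCKERY_PREFIX", DEFAULT_PREFIX))
    # -lcmockery assumes the static library is named libcmockery.a
    return [f"-I{prefix / 'include'}", f"-L{prefix / 'lib'}", "-lcmockery"]
```

Because nothing is hard-coded beyond the default, a second build machine (or a second, differently configured build on the same machine) only needs the variable set.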
Of your team's suggestions, I would lean towards a path that contains a reference to your project. This will make it easier a year from now (when someone asks you to setup another build machine) to reproduce everything.
Lastly, I wouldn't worry about trying to adhere to "standard" library locations because you're not creating and managing a Linux distribution. Furthermore, most people don't really know anything more than "/usr/lib" and "/usr/local/lib", and even the people who know those do not know the difference.
Do what's best for your project no matter what that may be.
