When I make a new project, say a web app using Snap, I generate the skeleton using snap init barebones, make a new sandbox, and then install the dependencies.
This takes forever. Seriously. If you have ever worked with pretty much any other web framework (node.js with express, for example), the process is nearly identical but takes a fraction of the time. I'm aware that most node dependencies do not require any compilation, but I find it really strange that this isn't considered a bigger problem. For example, I will never be able to run a Yesod app on my cheap VPS because the VPS isn't powerful enough to compile it and I can't really upload 500 MB of precompiled libraries.
The question is, why doesn't the repository host binaries instead of just code?
.NET is also compiled (to bytecode) but I can use its DLLs without any need for recompilation.
There are of course drawbacks to hosting binaries, such as more storage space needed and multiple binaries per library for multiple OSs. But all these problems seem insignificant compared to the huge benefits that you get, such as:
No more compile errors
Much faster setup for new projects
Significantly less memory needed
Knowing that a library doesn't support your OS BEFORE you find out for yourself
I have trouble seeing why cabal hell exists in the first place. If all the libraries were available for dynamic linking, wouldn't the need for recompiling simply not exist at all?
Currently, one has to try really hard to stick with Haskell in these regards. It seems like the system punishes me for trying out things. If I want to add a new library to my project I have to be sure I'm willing to wait for 15-45(!!!) minutes for it to compile. Not to mention that a library fails to compile way more often than I'm comfortable with. After surviving the process, only then can I actually figure out if that library is what I want to use, or if it's even compatible with the rest of my project.
In a nutshell: because native code is hard.
If you want to host binaries for arbitrary systems, you have to match the binaries to each system you want to run on. That may mean compiling dozens of sets of binaries to support all of the systems the code will compile on.
On the other hand, you may well find that someone has compiled the code you need: your distribution provider may well provide packages for the Haskell libraries you need.
Because that's the easiest way to distribute everything while keeping it up to date. By offloading build costs to the users, library authors only need to provide source code.
This can be mitigated in various ways. For example, my CI setup uses CircleCI and Heroku. Nodes on both hold precached cabal sandboxes (it's actually very easy to set up). I build my project on Heroku, but there's no reason why you couldn't take prebuilt artifacts from your CI and deploy them directly.
As for dynamic linking, it is possible to link Haskell modules dynamically, but shared libraries are more often than not a source of problems. One look at Windows DLL hell should be enough to see this, and most commercial applications simply ship the DLLs they use anyway. If a library changes, the DLLs have to be replaced in any case, and the way Cabal does it makes it simplest to have the latest and greatest versions of everything.
First, note that on some platforms, you can in fact install binary libraries. For example, on my OpenSUSE Linux system, YaST will quite happily download and install certain Haskell libraries, without having to build anything from source.
Of course, this only covers a fairly small set of libraries, and all the RPMs will be many months out of date. (Not a big deal for X11, kind of a deal-breaker for something like Yesod that's under heavy development...)
I think another big part of the problem is that if you compile a Haskell library with GHC 7.6.4, then you cannot use that binary compiled library with GHC 7.8.3. So we're not just talking about one compiled binary for each OS; we're talking about one compiled binary for every OS + GHC minor point-release combination.
Oh, and did I mention? If you compile Yesod 1.4.0 against ByteString 0.9.2.0, then that compiled binary is useless if your system has ByteString 0.9.2.1 installed. So you potentially need one compiled binary for every OS, every GHC release, and every release of every library that it transitively depends on.
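To put a rough number on that, here is a back-of-the-envelope sketch in Python. The platform names, GHC versions and dependency version counts are made-up figures chosen only for illustration, not real data; the point is simply that the counts multiply.

```python
from math import prod

def binaries_needed(platforms, ghc_versions, dep_version_counts):
    """Worst-case number of distinct builds for one release of one library.

    Each build is tied to one platform, one GHC release and one exact
    version of every transitive dependency, so the counts multiply.
    """
    return len(platforms) * len(ghc_versions) * prod(dep_version_counts.values())

# Hypothetical figures, for illustration only.
platforms = ["linux-x86_64", "windows-x86_64", "macos-x86_64"]
ghc_versions = ["7.6", "7.8"]
dep_version_counts = {"bytestring": 4, "text": 3, "containers": 2}

total = binaries_needed(platforms, ghc_versions, dep_version_counts)
print(total)  # 3 * 2 * (4 * 3 * 2) = 144
```

Even with these small numbers, a single release of a single library would need 144 builds to cover every combination.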
...This is partly why the Haskell Platform was invented. It's a single binary download that gives you a big heap of code that you don't need to compile from source, and where all the versions of the libraries in it are mutually compatible. (No dependency hell - the Haskell Platform maintainers sort that out for you!)
I do agree that binary packages would be extremely nice to have. But the above problems make it unlikely, IMHO.
Yesterday I learnt about a new Haskell tool called Stack. At first blush, it looks like it does much the same job as Cabal. So, what is the difference between them? Is Stack a replacement for Cabal? In which cases should I use Stack instead of Cabal? What can Stack do that Cabal can't?
Is stack a replacement for Cabal?
Yes and No.
In which cases should I use Stack instead of Cabal? What can Stack do that Cabal can't?
Stack uses the curated Stackage packages by default. That being so, any dependencies are known to build together, avoiding version conflict problems (which, back when they were commonplace in the Haskell experience, used to be known as "cabal hell"). Recent versions of Cabal also have measures in place to prevent conflicts. Still, setting up a reproducible build configuration in which you know exactly what will be pulled from the repositories is more straightforward with Stack. Note that there is also provision for using non-Stackage packages, so you are good to go even if a package isn't present in the Stackage snapshot.
Personally, I like Stack and would recommend that every Haskell developer use it. Its development is fast, and it has a much better UX. There are also things Stack does that Cabal doesn't yet provide:
Stack even downloads GHC for you and keeps it in an isolated location.
Docker support (which is very convenient for deploying your Haskell applications)
Reproducible Haskell scripts: you can pin the exact version of a package and get a guarantee that the script will always execute without any problem. (Cabal also has a script feature, but fully ensuring reproducibility with it is not quite as straightforward.)
Ability to do stack build --fast --file-watch. This will automatically rebuild whenever you change a local file. Using it along with the --pedantic option is a big selling point for me.
Stack supports creating projects using templates. It also supports your own custom templates.
Stack has built-in hpack support. It provides an alternative (IMO, a better) way of writing cabal files using a YAML file, a format which is more widely used in the industry.
Intero has a smooth experience when working with Stack.
There is a nice blog post explaining the difference: Why is Stack not Cabal? While Cabal has, in the intervening years since that post, evolved so as to overcome some of the issues discussed there, the discussion of the design goals and philosophy behind Stack remains relevant.
In what follows, I will refer to the two tools being compared as cabal-install and stack. In particular, I will use cabal-install to avoid confusion with the Cabal library, which is common infrastructure used by both tools.
Broadly speaking, we can say cabal-install and stack are frontends to Cabal. Both tools make it possible to build Haskell projects whose sets of dependencies might conflict with each other within the confines of a single system. The key difference between them lies in how they address this goal:
By default, cabal-install will, when asked to build a project, look at the dependencies specified in its .cabal file and use a dependency solver to figure out a set of packages and package versions that satisfy it. This set is drawn from Hackage as a whole -- all packages and all versions, past and present. Once a feasible build plan is found, the chosen version of the dependencies will be installed and indexed in a database somewhere in ~/.cabal. Version conflicts between dependencies are avoided by indexing the installed packages according to their versions (as well as other relevant configuration options), so that different projects can retrieve the dependency versions they need without stepping on each other's toes. This arrangement is what the cabal-install documentation means by "Nix-style local builds".
When asked to build a project, stack will, rather than going to Hackage, look at the resolver field of stack.yaml. In the default workflow, that field specifies a Stackage snapshot, which is a subset of Hackage packages with fixed versions that are known to be mutually compatible. stack will then attempt to satisfy the dependencies specified in the .cabal file (or possibly the package.yaml file -- different format, same role) using only what is provided by the snapshot. Packages installed from each snapshot are registered in separate databases, which do not interfere with each other.
We might say that the stack approach trades some setup flexibility for straightforwardness when it comes to specifying a build configuration. In particular, if you know that your project uses, say, the LTS 15.3 snapshot, you can go to its Stackage page and know, at a glance, the versions of any dependency stack might pull from Stackage. That said, both tools offer features that go beyond the basic workflows so that, by and large, each can do all that the other does (albeit possibly in a less convenient manner). For instance, there are ways to freeze exact versions of a known good build configuration and to solve dependencies with an old state of Hackage with cabal-install, and it is possible to require non-Stackage dependencies or override snapshot package versions while using stack.
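The difference between the two strategies can be illustrated with a toy model in Python. This is not how either tool is implemented: the package index, the snapshot and the version bounds are hypothetical, and the "solver" simply picks the newest version within bounds, whereas a real solver also has to account for each dependency's own constraints.

```python
# Toy index: every published version of each package (hypothetical data).
INDEX = {
    "text": [(1, 2, 3), (1, 2, 4), (2, 0)],
    "aeson": [(1, 4, 7), (1, 5, 6), (2, 0, 1)],
}

# Toy snapshot: exactly one pinned version per package (hypothetical data).
SNAPSHOT = {"text": (1, 2, 4), "aeson": (1, 4, 7)}

def in_bounds(version, lower, upper):
    """True if lower <= version < upper, using tuple comparison."""
    return lower <= version < upper

def solve_from_index(bounds):
    """Solver-style: pick the newest indexed version inside each bound."""
    plan = {}
    for pkg, (lower, upper) in bounds.items():
        candidates = [v for v in INDEX[pkg] if in_bounds(v, lower, upper)]
        if not candidates:
            raise LookupError(f"no version of {pkg} satisfies the bounds")
        plan[pkg] = max(candidates)
    return plan

def resolve_from_snapshot(bounds):
    """Snapshot-style: take the pinned version, then check it fits the bounds."""
    plan = {}
    for pkg, (lower, upper) in bounds.items():
        pinned = SNAPSHOT[pkg]
        if not in_bounds(pinned, lower, upper):
            raise LookupError(f"snapshot version of {pkg} is out of bounds")
        plan[pkg] = pinned
    return plan

# Bounds as they might appear in a package description (hypothetical).
bounds = {"text": ((1, 2), (2, 1)), "aeson": ((1, 4), (2, 0))}
print(solve_from_index(bounds))       # result changes as the index grows
print(resolve_from_snapshot(bounds))  # always the pinned versions
```

In this model, publishing a new version to the index can change the first plan, while the second plan only changes when the snapshot itself is changed.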
Lastly, another difference between cabal-install and stack which is big enough to be worth mentioning in this overview is that stack aims at providing a complete build environment, with features such as automatic GHC installation management and Docker integration. In contrast, cabal-install is meant to be orthogonal to other parts of the ecosystem, and so it doesn't attempt to provide this sort of feature (in particular, GHC versions have to be installed and managed separately, for instance through the ghcup tool).
From what I can glean from the FAQ, it seems that Stack uses the Cabal library, but not the cabal.exe binary (more correctly known as cabal-install). It looks like the aim of the project is automatic sandboxing and avoidance of dependency hell.
In other words, it uses the same Cabal package structure, it just provides a different front-end for managing this stuff. (I think!)
I need to build an application running on an embedded, vendor-supplied version of Linux. According to the documentation it has libc version 2.8.90. I have built a simple application in C++ on a desktop and copied the binary across to the hardware along with copies of the libraries it is linked to. In order to remove any potential conflicts from linking against different versions of libraries, I considered linking to the libraries statically. After some research I found the following question and answers, and reading through them gave me the impression that linking statically is not a good thing to do. What I could not find there (or anywhere else so far) was a simple explanation of why this is frowned upon. It would seem to me (pretty much a novice to Linux) to be a way of bundling my executable as a single package and running it on my hardware, but it is clearly considered a bad idea. Can someone please explain why?
Obviously I am aware that it would cause bloating of my binary but I am not worried about that. Additionally, I am aware of the licensing issues, but I am not concerned with that aspect of things particularly. This is not a commercial application so I do not think that it applies to me.
The advantages are, as you expect, a single binary that works without having to install the other dependencies and which you can easily move around.
The disadvantages are the size and the need to recompile the entire application if there's an update (e.g. a security fix) to the linked library and perhaps licensing issues (as you've noted).
Tradeoffs. If it solves your problem, go for it.
Our project got pretty big and our build system does not scale anymore. We are doing cross-platform development on Linux machines. We have too many platforms to build against and even more build options. We believe that we need to upgrade our Makefile-based build environment.
These are the requirements (in an ideal world):
Fast (so no libtool)
Can do parallel builds
Cross compile friendly
Ccache integration
Does incremental builds and can short circuit if certain conditions are met (short circuit if a,b,c options have not changed, rebuild if they did)
Easily scriptable (python integration would be perfect)
User friendly syntax
Distributed system. Modules can be developed separately from each other
Can build third party libraries (that use autotools, cmake ..)
Can track dependencies between modules (but flexible enough so that modules can be replaced by alternative external ones).
built-in unit testing support
Large binaries can be stored separately from the version control and can be downloaded if needed
Can keep track of open source licenses
git integration
Are you aware of any tools (or group of tools) that would meet (at least some of) these requirements? Currently I am leaning towards gyp+ninja. But syntax is not very friendly and there is no documentation. So it is a tough sell.
You mention Python integration, so SCons sounds like it would fit the bill. It's entirely based around Python (the build scripts are in fact Python scripts), it is very flexible, and it meets quite a number of your other requirements.
From the web site:
SCons is an Open Source software construction tool—that is, a next-generation build tool. Think of SCons as an improved, cross-platform substitute for the classic Make utility with integrated functionality similar to autoconf/automake and compiler caches such as ccache. In short, SCons is an easier, more reliable and faster way to build software.
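Since SCons build scripts are Python, the "short circuit if options a, b, c have not changed" requirement from the question can be sketched in a few lines of plain Python. This only illustrates the idea and is not SCons's actual mechanism; the function names and the stamp-file layout are assumptions made for this sketch.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def options_signature(options):
    """Stable hash of the build options (key order does not matter)."""
    encoded = json.dumps(options, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def needs_rebuild(options, stamp_file):
    """Return True if the options differ from those recorded last time."""
    signature = options_signature(options)
    if stamp_file.exists() and stamp_file.read_text() == signature:
        return False  # short circuit: nothing relevant changed
    # Simplification: a real build would record this only after succeeding.
    stamp_file.write_text(signature)
    return True

with tempfile.TemporaryDirectory() as tmp:
    stamp = Path(tmp) / "options.stamp"
    print(needs_rebuild({"a": 1, "b": "x", "c": True}, stamp))  # True: first build
    print(needs_rebuild({"c": True, "b": "x", "a": 1}, stamp))  # False: same options
    print(needs_rebuild({"a": 2, "b": "x", "c": True}, stamp))  # True: option a changed
```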
As far as C++ is concerned, a very good build system is CMake.
ninja is not supposed to be used by the end user, but rather by some other high-level build tool like CMake. And that is really a good option, especially for large and cross-platform projects. It has no built-in Python support, but you rarely or never need external scripting when using CMake, as it has tools for most common tasks.
I'm doing cross-platform development and I want to build a nice, self-contained (!) package for Linux. I know that that's not the way it's usually done, but the application requires all data in one place, so I'm installing it into /opt, like many other proprietary software packages do. I will eventually provide deb and rpm packages, but it will only be .tar.gz for now. The user should extract it somewhere and it should work. I'd rather not have an installer.
First my questions, then the details:
How do other people package proprietary software for Linux?
Are there tools for packaging software including shared libraries?
Now for some details: This is my project's (I call it foo for this purpose) layout:
foo (binary)
config.ini
data
Now in the package, there will be two additional elements:
libs
foo.sh
libs will contain all the shared libraries the project requires, and foo.sh is a script that sets LD_LIBRARY_PATH to include libs. Therefore, the user will execute foo.sh and the program should start.
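For illustration, here is what such a launcher does, sketched in Python rather than shell. The layout (a libs directory next to the foo binary) is taken from the description above; the function names are made up for this sketch.

```python
import os
import sys
from pathlib import Path

def launcher_env(app_dir, base_env):
    """Return a copy of base_env with app_dir/libs prepended to LD_LIBRARY_PATH."""
    env = dict(base_env)
    libs = str(Path(app_dir) / "libs")
    existing = env.get("LD_LIBRARY_PATH", "")
    # Bundled libraries come first; whatever the user already had is kept.
    env["LD_LIBRARY_PATH"] = libs + (os.pathsep + existing if existing else "")
    return env

def run(app_dir, args):
    """Replace the current process with the foo binary found in app_dir."""
    binary = str(Path(app_dir) / "foo")
    os.execve(binary, [binary] + list(args), launcher_env(app_dir, os.environ))

# A real launcher script would end with something like:
#     run(Path(__file__).resolve().parent, sys.argv[1:])
```

Prepending rather than overwriting LD_LIBRARY_PATH keeps any value the user has already set.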
I have a shell script that packages the software in the following steps:
Create empty directory and copy foo.sh to it
Invoke the build process and make install into the new directory
Copy shared libs from the filesystem
Package everything as .tar.gz
What do you think of this? There are some problems with this approach:
I have to hard code all dependencies twice (once in CMake, once in the packaging script)
I have to define the version number twice (once in the source code, once in the packaging script)
How do you do it?
Edit:
Another question that just came up: How do you determine which libraries your software depends on? I did an ldd foo, but there's an awful lot. I looked at how WorldOfGoo packages look, and they ship only very few libraries. How can I make assumptions about which libraries will be present on a user's system and which won't? Just install all targeted distributions into a virtual machine and see what's required?
Generic issues
Your way of packaging your stuff (with dependent libs) into /opt is how proprietary (and even open-source) software is packaged. It's a practice recommended by the Linux Foundation (see my answer to the other question for links).
External libs may be either compiled from scratch and embedded into your build process as a separate step (especially if you modify them), or fetched from packages of some distributions. The second approach is easier, but the first one allows more flexibility.
Note that it's not necessary to include some really low-level libraries (such as glibc, Xorg) in your package. They'd be better left to system vendors to tune, and you may just assume they exist. Moreover, there's the Linux Standard Base, which documents the most important libraries; these libraries exist almost everywhere, and can be trusted.
Note also that if you compile on a newer system, users of older systems most likely won't be able to use the result, while the reverse is not true. So, for better compatibility, it might be useful to compile the package on a system that is about two years old.
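As a rough sketch of that selection step, the following Python parses ldd-style output and separates the libraries to bundle from those assumed to exist on the target system. The sample output and the list of "system" library prefixes are illustrative assumptions; the right list depends on the distributions you target.

```python
# Library name prefixes assumed to be provided by every target system.
# This list is an illustrative assumption, not an authoritative one.
SYSTEM_PREFIXES = ("linux-vdso", "ld-linux", "libc.", "libm.", "libdl.",
                   "libpthread.", "libX11.")

def parse_ldd(output):
    """Map each library name in ldd-style output to its resolved path (or None)."""
    libs = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            name, _, rest = line.partition("=>")
            fields = rest.split()
            path = fields[0] if fields else None
            # "not found" or a bare load address means there is no usable path.
            if path is None or path == "not" or path.startswith("("):
                path = None
            libs[name.strip()] = path
        else:
            # Lines such as "linux-vdso.so.1 (0x...)" or "/lib64/ld-linux... (0x...)".
            first = line.split()[0]
            libs[first.rsplit("/", 1)[-1]] = first if first.startswith("/") else None
    return libs

def libs_to_bundle(libs):
    """Names of libraries that are not assumed to be on the target system."""
    return sorted(name for name in libs if not name.startswith(SYSTEM_PREFIXES))

# Hand-written sample in the usual ldd format (not real output).
SAMPLE = """
    linux-vdso.so.1 (0x00007ffd3a5f2000)
    libfoo.so.2 => /usr/local/lib/libfoo.so.2 (0x00007f1c2a000000)
    libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f1c29e00000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c29a00000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1c2a400000)
"""

print(libs_to_bundle(parse_ldd(SAMPLE)))  # ['libfoo.so.2']
```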
I just outlined some generic stuff, but I believe that Linux Developers Network website contains more information about packaging and portability.
Packaging
Judging by what I saw in the open-source distribution projects, your script does it the same way distribution vendors package software. Their scripts automatically patch sources, mimic installation of software and package the resultant folders into DEBs and RPMs.
Tar.gz, of course, could also work, but creating, for example, an RPM is not so complex that you should miss the opportunity to make your users' lives so much easier.
Answering your questions,
Yes, you have to hard-code dependencies twice.
The thing is that when you hard-code them in CMake, you specify them in different terms than when you specify them in a packaging script. CMake refers to shared libraries and header files, while the packaging script refers to packages.
There's no cross-distribution one-to-one relationship between package names and shared libs and headers. It varies across distributions. Therefore, it has to be specified twice.
But the package can be easily re-packed by distribution vendors, especially if you strive to pack all dependent libs into it (so there'll be fewer external dependencies to port). Also, a tool that can port packages from one distribution to another will appear soon (I'll update my answer when it's released).
Yes, you have to specify your version twice.
But the thing is that you may organize your packaging process in such a way that package and software versions never get out-of-sync. Just make the packaging script check out from your repository (or download from your website) exactly the same version that the script will write to package specifications.
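One way to keep the two in sync is to let the packaging script read the version from a single place. The sketch below assumes a hypothetical VERSION file at the top of the source tree and uses Python's standard tarfile module; the file and function names are illustrative, and a real script could just as well take the version from a repository tag.

```python
import tarfile
import tempfile
from pathlib import Path

def read_version(source_dir):
    """Read the version string from the (hypothetical) VERSION file."""
    return (Path(source_dir) / "VERSION").read_text().strip()

def make_tarball(name, staging_dir, source_dir, out_dir):
    """Pack staging_dir into <name>-<version>.tar.gz and return the archive path."""
    version = read_version(source_dir)
    archive = Path(out_dir) / f"{name}-{version}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # Everything is stored under a versioned top-level directory.
        tar.add(str(staging_dir), arcname=f"{name}-{version}")
    return archive

# Small demonstration with throwaway directories.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source"
    staging = Path(tmp) / "staging"
    source.mkdir()
    staging.mkdir()
    (source / "VERSION").write_text("1.2.3\n")
    (staging / "config.ini").write_text("[foo]\n")
    archive = make_tarball("foo", staging, source, tmp)
    print(archive.name)  # foo-1.2.3.tar.gz
```

Because the archive name and its top-level directory are both derived from the same file, the package version cannot drift from the source version.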
Analyzing Dependencies
To analyze the dependencies of your software, you may use our open-source, free Linux Application Checker tool. It will report the list of libraries your software depends on, show the distributions it is compatible with, and help your application be more portable across distributions. It turns out that more cross-distribution compatibility can sometimes be achieved with little effort, and you don't have to lock yourself into supporting just a few selected distributions.
Think long and hard (or ask your product development department) which distributions / architectures you need to support.
Make sure that they fully understand the testing implications.
I expect you will come up with a very short list of supported distributions and architectures.
It really depends on which customers are paying for Linux support. Most people use Red Hat Enterprise Linux (on servers) or CentOS (which is indistinguishable from a technical perspective).
If you only need to support Red Hat, you only need to support RPM, job's a good'un.
We are cross-compiling an application for an embedded Linux target under desktop Linux. For testing and other purposes we are using statically linked libraries with our application. The testing library we are using is CMockery.
My question is: Where should the static libraries and include files for CMockery live, given that we are cross-compiling?
If we weren't cross-compiling, things should go in /usr/local/lib.
Some suggestions from our team have been:
/opt/google/lib and /opt/google/include
/opt/embeddedLinuxDistro/usr/local/share/google/lib (and include)
/usr/local/arch/lib (and include)
Any pointers appreciated!
Note: After writing this answer, my summary would be:
Keep anything that is non-standard to the Linux distro you're using separate. In fact, keep files for different projects separate even if they share libraries. This will make it much easier to move your files to another machine, to set up multiple complete builds for testing, and most importantly to be able to recreate the build starting from scratch.
The decision is really subjective.
Do you just need one copy of the library for all users?
Does it rarely change?
If your build machine caught fire and you had no backups of that machine, how quickly and easily could you re-build your environment of libraries and cross-compilers?
I ask these questions, because if the library changes often or different users may need different versions, you're better off having it be portable. That is, you can specify in your build where to find the files.
Of your team's suggestions, I would lean towards a path that contains a reference to your project. This will make it easier a year from now (when someone asks you to set up another build machine) to reproduce everything.
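As a minimal sketch of "specify in your build where to find the files", the following Python builds the compiler and linker flags from a configurable prefix. The CMOCKERY_PREFIX variable, the default path and the library name passed to -l are all assumptions made for illustration; use whatever your project and toolchain actually call them.

```python
import os
from pathlib import Path

# Hypothetical project-specific default location.
DEFAULT_PREFIX = "/opt/myproject/cmockery"

def cmockery_flags(env, default_prefix=DEFAULT_PREFIX):
    """Build include and library flags from a prefix that can be overridden."""
    prefix = Path(env.get("CMOCKERY_PREFIX", default_prefix))
    return [
        f"-I{prefix / 'include'}",  # where the headers live
        f"-L{prefix / 'lib'}",      # where the static library lives
        "-lcmockery",               # assumed library name
    ]

# Uses the override if the variable is set, the default otherwise.
print(" ".join(cmockery_flags(os.environ)))
```

A new build machine then only needs the files unpacked somewhere and, if the location differs, one variable set.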
Lastly, I wouldn't worry about trying to adhere to "standard" library locations because you're not creating and managing a Linux distribution. Furthermore, most people don't really know anything more than "/usr/lib" and "/usr/local/lib", and even the people who know those do not know the difference.
Do what's best for your project no matter what that may be.