Arranging auxiliary tasks for a Haskell project

There are some repetitive auxiliary tasks that I usually have to run when developing or testing a project: downloading some data, setting up the database, cleaning the logs, and so on. In Ruby land they are handled by rake, while other languages prefer make or something else. Tasks occasionally depend on other tasks, so one task may need to run the subtasks it depends on first.
So, is there some conventional way to organize those tasks in a Haskell project?
I would assume that cabal could be used for that, but not all of those auxiliary tasks are about running Haskell code: sometimes it's just a case of performing rm -r logs/*.log or downloading some data with wget or curl. Would it make sense to make cabal's test target depend on other cabal targets that, ugh, run shell scripts/commands from Haskell code? (Is it even possible to have dependent targets in cabal at all?)
Alternatively, I could use make, but would "an average haskeller" (an "outside" project contributor, for example) find that intuitive? I believe one would first try cabal test, only to discover that it requires setting up the database for testing first, which in turn requires running a whole chain of other tasks. Would one even notice a Makefile in the first place?
I couldn't find any recipes for handling these auxiliary tasks in a Haskell project.

As far as I know, there's no de facto standard tool for this in Haskell projects.
But recently I heard about Shake, a monadic build system written in Haskell.
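To make that concrete, here's a rough sketch of what such auxiliary tasks could look like as a small Shake script (compile it once, then run e.g. ./build clean or ./build test). The URL, database name and task names below are made up for illustration; only the Shake API itself is real.

import Development.Shake

main :: IO ()
main = shakeArgs shakeOptions $ do
    -- "clean" plays the role of `rm -r logs/*.log`
    phony "clean" $
        removeFilesAfter "logs" ["//*.log"]

    -- a data file is fetched on demand (hypothetical URL)
    "data/input.csv" %> \out ->
        cmd_ "curl" ["-sSL", "-o", out, "https://example.com/input.csv"]

    -- "setup-db" needs the data first, then prepares a test database
    -- (hypothetical createdb invocation)
    phony "setup-db" $ do
        need ["data/input.csv"]
        cmd_ "createdb" ["myproject_test"]

    -- "test" runs the whole chain and finally hands off to cabal
    phony "test" $ do
        need ["setup-db"]
        cmd_ "cabal" ["test"]

Because the dependencies are expressed with need, running ./build test triggers the database setup, which in turn triggers the download, so contributors only have to learn one entry point.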

Related

Can I actually build and run an executable from the same package as part of a test suite?

It struck me that I do not really know of a way to black box test an executable packaged with Cabal.
With npm, for instance, I can run arbitrary shell commands, and I surely can wire it so that the necessary sources are transpiled and executed, and their side effects inspected.
Stack (as said here) builds the executables and publishes them in $PATH for the test suite, so I can easily run them.
But with Cabal, a test suite apparently cannot even depend on an executable, so there is no way to force the latter to be built. (Am I wrong about this?) And even then, I would have to know the path to the compiled binary.
How do I approach this problem?
The particulars of my situation are that an executable must extensively analyze the state of the system and branch accordingly, and I want to integration test that it does not forget to do so.
Note also that I am not at peace with running the relevant IO functions directly, because I find it not integrative enough. Rather, I would like to be able to run the individual IO functions and also run the program as a whole. In my case, there are testing shell scripts in place already, but I would really like to "bake them in".
It turns out that there is a (slightly hacky) way to do this, at least for now, using the new(ish) build-tool-depends Cabal field. There has been some discussion (https://github.com/haskell/cabal/issues/5411, https://github.com/haskell/cabal/pull/4104#issuecomment-266838873) of build-tool-depends only being available at build time, with a separate field for executables that should be available when running a component. However, that separate run-time tool-depends field doesn't exist yet. Luckily, Cabal (at least 2.1 and 2.2) doesn't draw this distinction at all: executables listed in build-tool-depends are actually available when cabal new-test runs a test suite. This means that you can use a pkg.cabal file that looks like this:
name: pkg

executable exe
  ...

test-suite test
  ...
  build-tool-depends: pkg:exe
And when you run the test suite, the executable will be built and on the PATH.
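Because the executable is on the PATH for the duration of the test run, the test suite can then shell out to it like any other command. A minimal sketch, reusing the exe name from the pkg.cabal above and assuming the program prints something and exits successfully when given --help:

import System.Process (readProcess)
import Control.Monad (when)

main :: IO ()
main = do
    -- "exe" resolves via the PATH that cabal new-test sets up
    out <- readProcess "exe" ["--help"] ""
    when (null out) $
        fail "expected some --help output from exe"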

Why is "cabal build" so slow compared with "make"?

I have a package with several executables, which I initially build using cabal build. Now I change one file that impacts just one executable, and cabal seems to take about a second or two to examine each executable to see if it's impacted or not. On the other hand, make, given an equivalent number of executables and source files, will determine in a fraction of a second what needs to be recompiled. Why the huge difference? Is there a reason cabal can't just build its own version of a makefile and go from there?
Disclaimer: I'm not familiar enough with Haskell or make internals to give technical specifics, but some web searching does offer insight that lines up with my proposal (I'm providing references to keep this from being pure opinion). Also, I'm assuming your makefile is calling ghc, as cabal apparently would.
Proposal: I believe there could be several key reasons, but the main one is that make is written in C, whereas cabal is written in Haskell. This would be coupled with superior dependency checking from make (although I'm not sure how to prove this without looking at the source code). Other supporting reasons, as found on the web:
cabal tries to do a lot more than simply compiling, e.g. appears to take steps with regard to packaging (https://www.haskell.org/cabal/)
cabal is written in Haskell, although the runtime is written in C (https://en.wikipedia.org/wiki/Glasgow_Haskell_Compiler)
Again, I'm not overly familiar with make internals, but make may simply have a faster dependency-checking mechanism that tracks these changes better. I point this out because, from the OP, it sounds like the difference is significant enough that cabal may be doing a blanket check against all dependencies. I suspect this would be the primary reason for the speed difference, if true.
At any rate, these are open source and can be downloaded from their respective sites (haskell.org/cabal/ and savannah.gnu.org/projects/make/) allowing anyone to examine specifics of the implementations.
It is also likely one could see a lot of variance in speed based upon the switches passed to the compilers in use.
Hope this at least points you in the right direction.

One multimode Haskell executable vs separate executables sharing a library

I'm working on a project now in which I configure the cabal file to build several executables which share the library built by the same cabal file. The cabal project is structured much like this one, with one library section followed by several executable sections that include this library in their build-depends sections.
I'm using this approach so I can make common functions available to any number of executables, and create more executables easily as needed.
Yet in his Monad Reader article on Hoogle (p. 33), Neil Mitchell advocates bundling a Haskell project into a single executable with multiple modes (e.g. by using his CmdArgs library): one mode to start a web server, another to query the database from the command line, and so on. Quote:
Provide one executable
Version 3 had four executable programs – one to generate ranking
information, one to do command line searching, one to do web
searching, and one to do regression testing. Version 4 has one
executable, which does all the above and more, controlled by flags.
There are many advantages to providing only one end program – it
reduces the chance of code breaking without noticing it, it makes the
total file size smaller by not duplicating the Haskell run-time system,
it decreases the number of commands users need to learn. The move to
one multipurpose executable seems to be a common theme, with tools
such as darcs and hpc both being based on one command with multiple
modes.
Is a single multimode executable really the better way to go? Are there countervailing reasons to stick with separate executables sharing the same library?
Personally, I'm more of a fan of the Unix philosophy "write programs that do one thing and do it well". However there are reasons for doing either way, so the only reasonable answer here is: it depends.
One example where it makes sense to bundle everything into the same executable is when you're targeting a platform that is very limited on resources (e.g. an embedded system). This is the approach taken by BusyBox.
On the other hand if you divide into multiple executables, you give your clients the option of just using those that matter to them. With a single executable, even if your client really just wanted one functionality, he'll have no way to get rid of the extra baggage.
I'm sure there are a lot of more reasons for going either way, but this just goes to show that there's no definitive answer. It depends on the use case.
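For concreteness, here is a minimal sketch of the single-executable style the quoted article describes, using CmdArgs; the mode names and the library functions they would dispatch to (runServer, runQuery) are hypothetical:

{-# LANGUAGE DeriveDataTypeable #-}
import System.Console.CmdArgs

data Mode
    = Serve {port :: Int}
    | Query {sql :: String}
    deriving (Data, Typeable, Show)

main :: IO ()
main = do
    m <- cmdArgs $ modes
        [ Serve {port = 8080} &= help "start the web server"
        , Query {sql = ""}    &= help "query the database from the command line"
        ]
    case m of
        -- in a real project these branches would call the shared library,
        -- e.g. runServer p and runQuery q
        Serve p -> putStrLn ("serve mode on port " ++ show p)
        Query q -> putStrLn ("query mode with: " ++ q)

The shared library stays exactly as it is; only the executable sections of the cabal file collapse into one.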

Differences between SCons and Shake

I'm working on a Python/Haskell project and I'm looking for alternatives to a Makefile. The obvious choices are Python's SCons and Haskell's Shake. Since I have no experience with either of them, I'd like to ask if there is any comparison of their drawbacks and advantages.
Update: The project has somewhat complex requirements for building:
Let the user configure the build - like options to enable/disable, paths to tools etc.
There are both Haskell and Python files generated at compile time. Their dependencies should work properly.
There are multiple Haskell programs that share most of the source files. I'd like it to work so that:
it's possible to build each one individually, not building the sources that aren't needed;
source files aren't built multiple times when compiling multiple programs;
yet achieve parallelism during compilation, if possible.
Check for several installed programs on target systems and their paths (like python, flock etc.)
Check for dependencies on target systems, both Python and Haskell.
Parametrize the build according to the dependencies - if the dependencies for testing are missing, it should still be possible to build the project, skipping the tests (and informing the user about it).
There is a Why Shake? document that gives reasons to choose Shake over other build systems, but it does not focus on a comparison to SCons.
Update: All of your requirements seem easy enough to express in Shake (ask on StackOverflow if you get stuck with any of them). As to Shake vs SCons:
Shake is particularly good at dealing with generated files with dependencies that cannot be statically predicted, particularly if you are generating the files from programs you compile.
Building the Haskell parts of your project is likely to be harder than building the Python (since Haskell has a richer structure and more complex compiler). Using Shake makes it easier to tap into existing examples of compiling Haskell and use libraries for parsing Haskell if you need it.
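As a small illustration of the "parametrize the build by what is installed" requirement, a Shake rule can probe for a tool at build time and skip the test target when it is missing (the tool and target names here are illustrative, not taken from the question):

import Development.Shake
import Control.Monad.IO.Class (liftIO)
import System.Directory (findExecutable)

main :: IO ()
main = shakeArgs shakeOptions $ do
    phony "test" $ do
        found <- liftIO (findExecutable "flock")
        case found of
            Nothing -> liftIO (putStrLn "flock not found; skipping the tests")
            Just _  -> cmd_ "cabal" ["test"]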
There is a SCons wiki page that compares it to other build tools, unfortunately there is no comparison there with Haskell/Shake.
Also, this question may help.
SCons really shines compared to other tools (especially make and cmake) thanks to its Python syntax and its implicit dependency system, which is very accurate and easy to use.

Is there a generic way to consume my dependency's grunt build process?

Let's say I have a project where I want to use Lo-Dash and jQuery, but I don't need all of the features.
Sure, both these projects have build tools so I can compile exactly the versions I need to save valuable bandwidth and parsing time, but I think it's quite uncomfortable and ugly to install both of them locally, generate my versions and then check them into my repository.
Much rather I'd like to integrate their grunt process into my own and create custom builds on the go, which would be much more maintainable.
The Lo-Dash team offers this functionality with a dedicated cli and even wraps it with a grunt task. That's very nice indeed, but I want a generic solution for this problem, as it shouldn't be necessary to have every package author replicate this.
I tried to achieve this somehow with grunt-shell hackery, but as far as I know it's not possible to install devDependencies more than one level deep, which makes it even more ugly to execute the required grunt tasks.
So what's your take on this, or should I just move this over to the 0.5.0 discussion of grunt?
What you ask assumes that the package has:
A dependency on Grunt to build a distribution; most popular libraries have this, but some of the less common ones may still use shell scripts or the npm run command for general minification/compression.
Some way of generating a custom build in the first place, with a dedicated tool like the ones Modernizr or Lo-Dash provide.
You could perhaps substitute number 2 with a generic one that parses both your source code and the library code and uses code coverage to eliminate unnecessary functions from the library. This is already being developed (see goldmine), however I can't make any claims about how good that is because I haven't used it.
Also, I'm not sure how that would work in an AMD context where there are a lot of interconnected dependencies; ideally you'd be able to run the r.js optimiser and get an almond build for production, then filter that for unnecessary functions (most likely with Istanbul), and then make sure that the filtered script still passed all your unit/integration tests. Not sure how that would end up looking, but it'd be pretty cool if that could happen. :-)
However, there is a task especially for running Grunt tasks from 'sub-gruntfiles' that you might like to have a look at: grunt-subgrunt.
