How to use 'make' with GHC Dependency Generation

I've got a couple of (independent) files that take quite a while to compile, so I thought I would try out parallel compilation, per Don Stewart's answer here.
I followed the directions here, so my makefile looks something like
quickbuild:
	ghc --make MyProg.hs -o MyProg

depend:
	ghc -M -dep-makefile makefile MyProg
# DO NOT DELETE: Beginning of Haskell dependencies
...
MyProg.o : MyProg.hs
MyProg.o : B.hi
MyProg.o : C.hi
...
# DO NOT DELETE: End of Haskell dependencies
(Note: contrary to the docs, GHC seems to default to "Makefile" rather than "makefile", even when "makefile" exists.)
My question is: How do I make quickbuild depend on any of the auto-gen dependencies (so that make will actually run in parallel)? I tried adding 'MyProg.o' to the dependency list of 'quickbuild', but 'make' (rightly) complained that there was no rule to build 'B.hi'.

I suggest not using make for this purpose.
Look at ghc-parmake and its issues, especially this one - GHC has a very sophisticated recompilation checker that you cannot replicate with Makefiles (it can detect e.g. if a package file outside of your own project changes).
You will also not get a large speedup (in practice rarely more than 2x) from a parallel make -j that runs multiple GHC processes, since starting many separate GHCs has high startup overhead that ghc --make avoids. In particular, each new GHC invocation has to parse and typecheck all the interface (.hi) files of every dependency of the module it is compiling; ghc --make caches them.
Instead, use the new ghc --make -j of GHC 7.8 - it is truly parallel.
It will be more reliable and less effort than a hand-written Makefile, and it does recompilation avoidance better than make can with its file time stamps.
At first glance this sounds like a drawback of Haskell, but in fact it is not. In other languages that commonly use make for building, say C++, it is impossible to notice when files outside of your project change; having a build system inside the compiler itself, as ghc --make does, makes it possible to notice this.
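Concretely, with that approach the hand-maintained dependency machinery disappears and the Makefile shrinks to a single rule along these lines (a sketch; the 4 after -j is an arbitrary thread count):
quickbuild:
	ghc --make -j4 MyProg.hs -o MyProg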

Related

Where do I define Arbitrary instances?

I can't figure out where to define Arbitrary instances for my datatype. If I put them in the package itself, then the package unnecessarily gains QuickCheck as a dependency. If I put them in the tests, then other packages can't make use of the instances. If I put them in a separate test-utils package, then they are orphan instances, the tests also have to live in a separate package, and stack test --coverage doesn't work.
What other options are there?
I'd usually pick the separate package option — but then I don't use stack test --coverage. Thanks for introducing me to it!
(Edit: I'd probably do this, and then use the test flag option only for running stack test --coverage --flag thepackage:arbitrary so that nobody else has to deal with the flags.)
It may also be worth raising the --coverage issue on the stack issue tracker, as it would be good for the coverage check to work in this case.
You ask for other options — the best one is probably a test flag.
A test flag
It is possible to define a flag in your cabal file (defaulting to false) so that the modules carrying your QuickCheck dependency are only built when the flag is enabled.
Place the required code in the directory arbitrary (for example). Then add the equivalent of the following to the relevant parts of your package.yaml (1st snippet) or the-library.cabal (2nd snippet) file:
flags:
  arbitrary:
    description: Compile with arbitrary instances
    default: false
    manual: true

library:
  ⁝
  when:
  - condition: flag(arbitrary)
    dependencies:
    - QuickCheck
    source-dirs:
    - arbitrary
flag arbitrary
  description: Compile with arbitrary instances
  manual: True
  default: False

library
  ⁝
  if flag(arbitrary)
    hs-source-dirs:
      arbitrary
    build-depends:
      QuickCheck
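For reference, the code placed in the arbitrary directory is just an ordinary module defining the (orphan) instances; the module and type names below are made up:
-- arbitrary/TheLibrary/Arbitrary.hs (hypothetical)
{-# OPTIONS_GHC -fno-warn-orphans #-}
module TheLibrary.Arbitrary () where

import Test.QuickCheck (Arbitrary (..), elements)
import TheLibrary (Colour (..))  -- the type lives in the main library

instance Arbitrary Colour where
  arbitrary = elements [Red, Green, Blue]
(The module also needs to be listed under the same flag condition, e.g. via exposed-modules in the cabal snippet; hpack should pick it up from source-dirs.)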
Then, packages which want to use the instances should add the following to their stack.yaml (1st) or cabal.project (2nd) files:
flags:
  the-library:
    arbitrary: true

constraints: the-library +arbitrary
But there's a slight problem… there is currently no way for a downstream package to depend on the +arbitrary version only in its test suite, unless it defines such a flag itself. This may be a price worth paying.
Note: I haven't tested the downstream packaging, yet.
Ivan Milenovic's blog was useful as an initial resource.
DerivingVia/Generic instances
There may be another possibility, now that GHC 8.6 has been released, with DerivingVia. There's a case study in Blöndal, Löh & Scott (2018) for Arbitrary instances.
You would create newtype wrappers, and implement Arbitrary for those newtypes.
It doesn't quite avoid the problem, as is. But you may be able to implement Generic for those newtypes in such a way that the instances derivable using generic-arbitrary match what you'd want.
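A minimal sketch of that idea (the type names are made up, and the wrapper still needs QuickCheck wherever it is defined):
{-# LANGUAGE DerivingVia #-}
module ArbitraryVia where

import Test.QuickCheck (Arbitrary (..), NonNegative (..))

-- Wrapper newtype carrying the generator behaviour we want.
newtype NonNegativeInt = NonNegativeInt Int

instance Arbitrary NonNegativeInt where
  arbitrary = NonNegativeInt . getNonNegative <$> arbitrary

-- Domain newtype that borrows the instance via DerivingVia.
newtype Age = Age Int
  deriving Arbitrary via NonNegativeInt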
There may be some other options. In particular, QuickCheck's dependencies aren't actually that heavy. And there are other testing libraries, too. Plus, note that there has been some discussion of separating an Arbitrary-like typeclass into a standalone library.
I would also have suggested internal libraries, but that doesn't allow other packages to use your instances.

How do I compile Haskell programs using Shake

I have a Haskell program that I want to compile with GHC, orchestrated by the Shake build system. Which commands should I execute, and under what circumstances should they be rerun?
There are two approaches to doing the compilation, and two approaches to getting the dependencies. You need to pick one from each set (all 4 combinations make sense), to come up with a combined approach.
Compilation
You can either:
Call ghc -c on each file in turn, depending on the .hs file and any .hi files it transitively imports, generating both a .hi and .o file. At the end, call ghc -o depending on all the .o files. For actual code see this example.
OR Call ghc --make once, depending on all .hs files. For actual code see this example.
The advantage of ghc --make is that it is faster than multiple calls to ghc -c, since GHC can load each .hi file only once instead of once per command. Typically the speedup is 3x. The disadvantages are that parallelism is harder (you can pass -j to ghc --make, but Shake still assumes each action consumes one CPU), and that two ghc --make compilations can't run at the same time if they overlap on any dependencies.
Dependencies
You can either:
Parse the Haskell files to find dependencies recursively. To parse a file you can either look for import statements (and perhaps #include statements) following a coding convention, or use a library such as haskell-src-exts. For actual code with a very approximate import parser see this example.
OR Use the output of ghc -M to detect the dependencies, which can be parsed using the Shake helper function parseMakefile. For actual code see this example.
The advantage of parsing the Haskell files yourself is that it copes with generated Haskell files and can be much quicker. The advantage of using ghc -M is that it is easier to support all GHC features.
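To make that concrete, here is a rough sketch that combines the ghc --make compilation approach with ghc -M dependencies (MyProg.hs and the output names are placeholders, and error handling is omitted):
import Development.Shake
import Development.Shake.FilePath
import Development.Shake.Util (parseMakefile)

main :: IO ()
main = shakeArgs shakeOptions $ do
    want ["MyProg" <.> exe]

    "MyProg" <.> exe %> \out -> do
        let depFile = out <.> "dep"
        -- Ask GHC for the module dependency graph of MyProg.hs.
        cmd_ "ghc -M MyProg.hs -dep-makefile" [depFile]
        -- Depend on every .hs file it mentions (skipping the .hi/.o
        -- entries), so editing any imported module reruns this rule.
        contents <- liftIO (readFile depFile)
        need [ d | (_, ds) <- parseMakefile contents
                 , d <- ds, takeExtension d == ".hs" ]
        -- A single ghc --make builds everything, keeping GHC's own
        -- recompilation checker in play.
        cmd_ "ghc --make MyProg.hs -o" [out]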

Cabal compilation conditional on compiling with llvm or not

In a library I'm writing I need to use CPP to choose between two blocks of code depending on whether my user is compiling with LLVM or the native code gen. Is there a way to detect this in the .cabal file and do something like
library
  -- not real:
  if backend(llvm)
    CPP-Options: -DUSING_LLVM
Or maybe it's even possible to detect arbitrary flags passed to GHC (instead of just -fllvm)?
Ah, I forgot to check the GHC docs. GHC defines the macro __GLASGOW_HASKELL_LLVM__ when -fllvm is specified (its value can also be used to check the LLVM version):
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/phases.html#options-affecting-the-c-pre-processor
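Minimal usage from Haskell, once the CPP extension is enabled for the module (a sketch):
{-# LANGUAGE CPP #-}
module Backend (usingLLVM) where

-- True exactly when this module was compiled with -fllvm.
usingLLVM :: Bool
#ifdef __GLASGOW_HASKELL_LLVM__
usingLLVM = True
#else
usingLLVM = False
#endif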

Why can't I replace libraries distributed with GHC? What would happen if I did?

I see in this answer and this one that "everything will break horribly" and Stack won't let me replace base, but it will let me replace bytestring. What's the problem with this? Is there a way to do this safely without recompiling GHC? I'm debugging a problem with the base libraries and it'd be very convenient.
N.B. when I say I want to replace base I mean with a modified version of base from the same GHC version. I'm debugging the library, not testing a program against different GHC releases.
Most libraries are collections of Haskell modules containing Haskell code. The meaning of those libraries is determined by the code in the modules.
The base package, though, is a bit different. Many of the functions and data types it offers are not implemented in standard Haskell; their meaning is not given by the code contained in the package, but by the compiler itself. If you look at the source of the base package (and the other boot libraries), you will see many operations whose complete definition is simply undefined. Special code in the compiler's runtime system implements these operations and exposes them.
For example, if the compiler didn't offer seq as a primitive operation, there would be no way to implement seq after the fact: no Haskell term that you can write down has the same type and semantics as seq unless it uses seq (or one of the Haskell extensions defined in terms of seq). Likewise, many of the pointer operations, ST operations, concurrency primitives, and so forth are implemented in the compiler itself.
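For instance, the closest you can get to seq in plain Haskell has the right type but not the forcing behaviour:
-- Type-checks, but never forces its first argument, unlike the real seq.
mySeq :: a -> b -> b
mySeq _ b = b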
Not only are these operations typically unimplementable, they also are typically very strongly tied to the compiler's internal data structures, which change from one release to the next. So even if you managed to convince GHC to use the base package from a different (version of the) compiler, the most likely outcome would simply be corrupted internal data structures with unpredictable (and potentially disastrous) results -- race conditions, trashing memory, space leaks, segfaults, that kind of thing.
If you need several versions of base, just install several versions of GHC. It's been carefully architected so that multiple versions can peacefully coexist on a single machine. (And in particular installing multiple versions definitely does not require recompiling GHC or even compiling GHC a first time, which seems to be your main concern.)

Profiling Template Haskell

I have a TH-heavy file which takes around 30 seconds to compile. What are some techniques I can use to help debug the performance of my Template Haskell?
If I understand the compile flow of TH correctly, the ordinary Haskell functions are executed during splicing at compile time. But you can, of course, also run them at runtime yourself.
For example, say you have something like $(foo x y ...) in your TH-heavy file. Create another file and call foo x y there, but don't splice the result. Then you'll be able to profile foo as usual. If the bottleneck is at the AST generation stage, you'll locate it. Don't forget to take laziness into account.
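A sketch of that setup, where genDecls stands in for your TH function and MyType for its argument (both hypothetical); note that compile-time-only queries such as reify won't work when the Q action is run in IO:
{-# LANGUAGE TemplateHaskell #-}
module ProfileTH where

import Language.Haskell.TH (pprint, runQ)
import MyModule (MyType, genDecls)  -- hypothetical

main :: IO ()
main = do
  -- Run the splice's Q action at runtime instead of splicing it.
  decs <- runQ (genDecls ''MyType)
  -- Force the generated AST so the work shows up in the profile.
  putStrLn (pprint decs)
Compile this driver with profiling enabled and run it with +RTS -p as usual.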
As of GHC 8, this can be done with -fexternal-interpreter.
Compile the library defining the TH function with profiling enabled, then compile the code* which uses the TH function in a splice with GHC options -fexternal-interpreter -opti+RTS -opti-p. This should produce a file called ghc-iserv-prof.prof.
This approach has the advantage that you can use the full functionality of the Q monad.
* A benchmark suite in the same cabal project as the TH library (but in a different hs-source-dir) also works. It might even work with a TH function defined and used in the same library, but I think you'll be profiling interpreted code then.

Resources