I have a Haskell program that I want to compile with GHC, orchestrated by the Shake build system. Which commands should I execute, and under what circumstances should they be rerun?
There are two approaches to doing the compilation, and two approaches to getting the dependencies. You need to pick one from each set (all 4 combinations make sense), to come up with a combined approach.
Compilation
You can either:
Call ghc -c on each file in turn, depending on the .hs file and any .hi files it transitively imports, generating both a .hi and .o file. At the end, call ghc -o depending on all the .o files. For actual code see this example.
OR Call ghc --make once, depending on all .hs files. For actual code see this example.
The advantage of ghc --make is that it is faster than multiple calls to ghc -c, since GHC can load each .hi file once per run instead of once per command; typically the speedup is around 3x. The disadvantage is that parallelism is harder (you can pass -j to ghc --make, but Shake still assumes each action consumes one CPU), and that two ghc --make compilations can't run at the same time if they overlap on any dependencies.
Dependencies
You can either:
Parse the Haskell files to find dependencies recursively. To parse a file you can either look for import statements (and perhaps #include statements) following a coding convention, or use a library such as haskell-src-exts. For actual code with a very approximate import parser see this example.
OR Use the output of ghc -M to detect the dependencies, which can be parsed using the Shake helper function parseMakefile. For actual code see this example.
The advantage of parsing the Haskell files yourself is that it works with generated Haskell files and can be much quicker. The advantage of using ghc -M is that it is easier to support all GHC features.
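A minimal sketch of such a very approximate import parser (a deliberate simplification: real code would also have to handle CPP, comments, multi-line imports, and package imports):

```haskell
import Data.Char (isSpace)
import Data.List (stripPrefix)

-- Map a module name like "Data.List" to a source path "Data/List.hs".
moduleToPath :: String -> FilePath
moduleToPath = (++ ".hs") . map (\c -> if c == '.' then '/' else c)

-- Very approximate: keep lines starting with "import", drop an
-- optional "qualified", then take the module name that follows.
scanImports :: String -> [FilePath]
scanImports src =
  [ moduleToPath (takeWhile (\c -> not (isSpace c) && c /= '(') rest')
  | line <- lines src
  , Just rest <- [stripPrefix "import " line]
  , let rest' = maybe rest id (stripPrefix "qualified " rest)
  ]

main :: IO ()
main = print (scanImports "import Data.List\nimport qualified Foo.Bar as B\nmain = return ()")
```

In a Shake rule you would then need the .hi files corresponding to the scanned imports, which pulls in their dependencies recursively.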
Related
I'm working on a program that needs to manipulate git repositories. I've decided to use libgit2. Unfortunately, the Haskell bindings for it are several years out of date and lack several functions that I require. Because of this I've decided to write the portions that use libgit2 in C and call them through the FFI. For demonstration purposes, one of them is called git_update_repo.
git_update_repo works perfectly when used in a pure C program; however, when it's called from Haskell an assertion fails, indicating that the libgit2 global init function, git_libgit2_init, hasn't been called. But git_libgit2_init is called by git_update_repo, and if I use gdb I can see that git_libgit2_init is indeed called and reports that the initialization was successful.
I've used nm to examine the executables and found something interesting. In a pure C executable, all the libgit2 functions are dynamically linked (as expected). However, in my haskell executable, git_libgit2_init is dynamically linked, while the rest of the libgit2 functions are statically linked. I'm certain that this mismatch is the cause of my issue.
So why do certain functions get linked dynamically and others statically? How can I change this?
The relevant settings in my .cabal file are
  cc-options: -g
  c-sources:
    src/git-bindings.c
  extra-libraries:
    git2
In a library I'm writing I need to use CPP to choose between two blocks of code depending on whether my user is compiling with LLVM or the native code gen. Is there a way to detect this in the .cabal file and do something like
library
-- not real:
if backend(llvm)
CPP-Options: -DUSING_LLVM
Or maybe it's even possible to detect arbitrary flags passed to GHC (instead of just -fllvm)?
Ah, I forgot to check the GHC docs. GHC defines the macro __GLASGOW_HASKELL_LLVM__ whenever -fllvm is specified (its value can also be used to check the LLVM version):
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/phases.html#options-affecting-the-c-pre-processor
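For example (a minimal sketch), a module can branch on the macro directly with CPP:

```haskell
{-# LANGUAGE CPP #-}
module Main where

main :: IO ()
main =
#ifdef __GLASGOW_HASKELL_LLVM__
  -- taken only when the module is compiled with -fllvm
  putStrLn "compiled with the LLVM backend"
#else
  putStrLn "compiled with the native code generator"
#endif
```

Compiled without -fllvm, the #else branch is taken.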
I see in this answer and this one that "everything will break horribly" and Stack won't let me replace base, but it will let me replace bytestring. What's the problem with this? Is there a way to do this safely without recompiling GHC? I'm debugging a problem with the base libraries and it'd be very convenient.
N.B. when I say I want to replace base I mean with a modified version of base from the same GHC version. I'm debugging the library, not testing a program against different GHC releases.
Most libraries are collections of Haskell modules containing Haskell code. The meaning of those libraries is determined by the code in the modules.
The base package, though, is a bit different. Many of the functions and data types it offers are not implemented in standard Haskell; their meaning is not given by the code contained in the package, but by the compiler itself. If you look at the source of the base package (and the other boot libraries), you will see many operations whose complete definition is simply undefined. Special code in the compiler's runtime system implements these operations and exposes them.
For example, if the compiler didn't offer seq as a primitive operation, there would be no way to implement seq after the fact: no Haskell term that you can write down will have the same type and semantics as seq unless it uses seq (or one of the Haskell extensions defined in terms of seq). Likewise, many of the pointer operations, ST operations, concurrency primitives, and so forth are implemented in the compiler itself.
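A small sketch of the point about seq: a plain Haskell definition with seq's type can never look at its first argument, so it cannot force it, while the primitive does:

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- The best an ordinary Haskell definition with seq's type can do:
-- it never inspects its first argument, so it cannot force it.
mySeq :: a -> b -> b
mySeq _ b = b

main :: IO ()
main = do
  r1 <- try (evaluate (seq (error "boom") ())) :: IO (Either ErrorCall ())
  r2 <- try (evaluate (mySeq (error "boom") ())) :: IO (Either ErrorCall ())
  putStrLn (either (const "seq forced the error") (const "seq ignored it") r1)
  putStrLn (either (const "mySeq forced the error") (const "mySeq ignored it") r2)
```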
Not only are these operations typically unimplementable, they also are typically very strongly tied to the compiler's internal data structures, which change from one release to the next. So even if you managed to convince GHC to use the base package from a different (version of the) compiler, the most likely outcome would simply be corrupted internal data structures with unpredictable (and potentially disastrous) results -- race conditions, trashing memory, space leaks, segfaults, that kind of thing.
If you need several versions of base, just install several versions of GHC. It's been carefully architected so that multiple versions can peacefully coexist on a single machine. (And in particular installing multiple versions definitely does not require recompiling GHC or even compiling GHC a first time, which seems to be your main concern.)
I've got a couple of (independent) files that take quite a while to compile, so I thought I would try out parallel compilation, per Don Stewart's answer here.
I followed the directions here, so my makefile looks something like
quickbuild:
	ghc --make MyProg.hs -o MyProg

depend:
	ghc -M -dep-makefile makefile MyProg
# DO NOT DELETE: Beginning of Haskell dependencies
...
MyProg.o : MyProg.hs
MyProg.o : B.hi
MyProg.o : C.hi
...
# DO NOT DELETE: End of Haskell dependencies
(Note: contrary to the docs, GHC seems to default to "Makefile" rather than "makefile", even when "makefile" exists.)
My question is: How do I make quickbuild depend on any of the auto-gen dependencies (so that make will actually run in parallel)? I tried adding 'MyProg.o' to the dependency list of 'quickbuild', but 'make' (rightly) complained that there was no rule to build 'B.hi'.
I suggest not using make for this purpose.
Look at ghc-parmake and its issues, especially this one - GHC has a very sophisticated recompilation checker that you cannot replicate with Makefiles (it can detect e.g. if a package file outside of your own project changes).
You will also not get a large speedup (in practice rarely more than 2x) from a parallel make -j running multiple GHCs at once, because each fresh GHC invocation has high startup overhead: it must parse and typecheck all the interface .hi files involved in all dependencies of the module it is compiling, whereas ghc --make caches them.
Instead, use the new ghc --make -j of GHC 7.8 - it is truly parallel.
It will be more reliable and less effort than your manually written Makefile, and do recompilation avoidance better than Make can do with its file time stamps.
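With that approach the two Makefile rules above collapse to a single one (a sketch; -j4 is an arbitrary core count):

```make
quickbuild:
	ghc --make -j4 MyProg.hs -o MyProg
```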
At first sight this sounds like a drawback of Haskell, but in fact it is not. In other languages that typically use make for building, say C++, it is impossible to notice when files outside of your project change; having the build system inside the compiler, as ghc --make does, makes it possible to notice this.
I have a TH-heavy file which takes around 30 seconds to compile. What are some techniques I can use to help debug the performance of my Template Haskell?
If I understand the compile flow of TH correctly, ordinary Haskell functions are executed during splicing at compile time. But you can, of course, also run them at runtime yourself.
For example, suppose your TH-heavy file contains something like $(foo x y ...). Create another file that calls foo x y directly but does not splice the result. Then you can profile foo as usual. If the bottleneck is at the AST-generation stage, you'll locate it there. Don't forget to take laziness into account.
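A sketch of that idea, with genAdd standing in for your foo (a hypothetical name): runQ lets you run the generator in IO and force the resulting AST, so you can time or profile it without ever splicing:

```haskell
{-# LANGUAGE TemplateHaskell #-}
module Main where

import Language.Haskell.TH

-- Hypothetical stand-in for an expensive TH generator like 'foo'.
genAdd :: Integer -> Q Exp
genAdd n = [| n + 1 |]

main :: IO ()
main = do
  ast <- runQ (genAdd 41)       -- run the generator at runtime, no splicing
  print (length (show ast) > 0) -- force the whole AST; beware laziness when timing
```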
As of GHC 8, this can be done with -fexternal-interpreter.
Compile the library defining the TH function with profiling enabled, then compile the code* which uses the TH function in a splice with GHC options -fexternal-interpreter -opti+RTS -opti-p. This should produce a file called ghc-iserv-prof.prof.
This approach has the advantage that you can use the full functionality of the Q monad.
* A benchmark suite in the same cabal project as the TH library (but in a different hs-source-dirs) also works. It might even work with a TH function defined and used in the same library, but I think you'd then be profiling interpreted code.
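For a cabal-based setup, the options above can be recorded in a cabal.project.local (a sketch; my-package is a placeholder for your package name):

```
profiling: True

package my-package
  ghc-options: -fexternal-interpreter -opti+RTS -opti-p
```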