Monitoring GHC activity - haskell

If GHC takes a long time to compile something, is there a way to find out what it's doing?
Firstly, it would be nice to know if I've actually crashed the compiler (i.e., put it into some sort of infinite loop somehow), or whether it's actually making progress, but just very slowly.
Secondly, it would be nice to know exactly what part of the compilation process GHC is having trouble with. Is it the parsing, or desugaring, or type-checking, or Core optimisation, or code generation, or...?
Is there some way to monitor what's going on? (Bearing in mind that if GHC is taking a long time, that probably means it's doing a lot of work, so if you ask for too much output it's going to be huge!)
GHC already tells you which modules it's trying to (re)compile. In my case, the problem is a single self-contained module. I'd like to know where GHC is getting stuck.

Following Daniel Fischer's comment, I tried running GHC with different verbosity options.
-v1: Produced a bit more output, but nothing during the main compilation step.
-v2: Tells you what step GHC is currently doing (parser, desugar, type check, simplifier, etc). This is pretty much what I actually wanted.
-v3: Appears to make the simplifier actually dump what it's doing to the console - bad idea while compiling 8MB of source code!
So it seems that -v2 is the place to start.
(In the specific case of the program that prompted this question, it seems GHC is spending forever in the type checking phase.)
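As a rough sketch of the -v2 approach, the phase banners can be separated from the rest of the output and timestamped, which makes it easier to see which phase has stalled. This assumes the banners begin with `***` and that the verbose output may arrive on stderr; `stamp_phases` and `Main.hs` are hypothetical names used only for illustration.

```shell
# Print only GHC's phase banners, each prefixed with the wall-clock time
# at which it appeared. Assumption: banners start with "***" under -v2.
stamp_phases() {
  while IFS= read -r line; do
    case "$line" in
      '***'*) printf '%s %s\n' "$(date +%H:%M:%S)" "$line" ;;
    esac
  done
}

# Usage (Main.hs is a placeholder module name):
#   ghc -v2 Main.hs 2>&1 | stamp_phases
```

If the last banner printed stays the same for minutes, that is the phase to look at; everything else -v2 prints is dropped, so the output stays small.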

Related

Elm Compiler running forever, computer just getting hot

I'm not sure what's causing this issue, but in a project I'm building, the compiler is taking hours just to compile a module. The total size of my codebase is 352KB, but none of the modules is larger than 10KB. I am using a Native port, but it's very trivial; I'm just fetching Date.now() with it.
Is there anything well-known that would cause the elm compiler to take forever to compile? I don't have many dependencies, but I'm using Html a lot. I would really appreciate any hints as to what would cause this.
Edit
So it turns out large case expressions will cause the optimizer to take a long time, as of 0.16. Here's the discussion on Elm-Discuss bringing up the issue, and a gist of the nasty case match.
I guess to be verbose and to keep a carrot out there, why would elm's compiler take this route for case-matching? What's the underlying machinery going on here? Why would the compiler take longer than an hour for optimizing 60+ pattern matches on a case statement?

Why is "cabal build" so slow compared with "make"?

Suppose I have a package with several executables, which I initially build using cabal build. If I then change one file that affects just one executable, cabal seems to take about a second or two to examine each executable to see whether it's affected. On the other hand, make, given an equivalent number of executables and source files, will determine in a fraction of a second what needs to be recompiled. Why the huge difference? Is there a reason cabal can't just build its own version of a makefile and go from there?
Disclaimer: I'm not familiar enough with Haskell or make internals to give technical specifics, but some web searching does offer some insight that lines up with my proposal (trying to avoid eliciting opinions by providing references). Also, I'm assuming your makefile is calling ghc, as cabal apparently would.
Proposal: I believe there could be several key reasons, but the main one is that make is written in C, whereas cabal is written in Haskell. This would be coupled with superior dependency checking from make (although I'm not sure how to prove this without looking at the source code). Other supporting reasons, as found on the web:
cabal tries to do a lot more than simply compiling, e.g. appears to take steps with regard to packaging (https://www.haskell.org/cabal/)
cabal is written in Haskell, although the run time is written in C (https://en.wikipedia.org/wiki/Glasgow_Haskell_Compiler)
Again, not being overly familiar with make internals, make may simply have a faster dependency checking mechanism, thereby better tracking these changes. I point this out because from the OP it sounds like the difference is significant enough that cabal may be doing a blanket check against all dependencies. I suspect this would be the primary reason for the speed difference, if true.
At any rate, these are open source and can be downloaded from their respective sites (haskell.org/cabal/ and savannah.gnu.org/projects/make/) allowing anyone to examine specifics of the implementations.
It is also likely one could see a lot of variance in speed based upon the switches passed to the compilers in use.
I hope this at least points you in the right direction.

Using the GHC API to do a "dry run" of code compilation

I'm working on a fairly simple text-editor for Haskell, and I'd like to be able to highlight static errors in code when the user hits "check."
Is there a way to use the GHC-API to do a "dry-run" of compiling a haskell file without actually compiling it? I'd like to be able to take a string and do all the checks of normal compilation, but without the output. The GHC-API would be ideal because then I wouldn't have to parse command-line output from GHC to highlight errors and such.
In addition, is it possible to do this check on a string, instead of on a file? (If not, I can just write it to a temp file, which isn't terribly efficient, but would work).
If this is possible, could you provide or point me to an example of how to do this?
This question asks the same thing, but it is from three years ago, at which time the answer was "GHC-API is new and there isn't good documentation yet." So my hope is that the status has changed.
EDIT: the "dry-run" restriction is because I'm doing this in a web-based setting where compilation happens server side, so I'd like to avoid unnecessary disk reads/writes every time the user hits "check". The executable would just get thrown away anyway, until they had a version ready to run.
Just to move this to an answer, this already exists as ghc-mod, here's the homepage. This already has frontends for Emacs, Sublime, and Vim so if you need examples of how to use it, there are plenty. In essence ghc-mod is just what you want, a wrapper around the GHC API designed for editors.
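Separately from ghc-mod, GHC's own -fno-code flag omits code generation, so a file is parsed and type checked but no object file or executable is produced. A minimal sketch of a "check only" wrapper follows; `check_only` and the `GHC_BIN` override are hypothetical conveniences, not part of GHC or ghc-mod.

```shell
# Type-check the given files without generating code.
# GHC_BIN defaults to "ghc"; setting GHC_BIN=echo prints the command
# instead of running it (a hypothetical dry-run convenience).
check_only() {
  "${GHC_BIN:-ghc}" -fno-code "$@"
}

# Usage (Editor.hs is a placeholder file name):
#   check_only Editor.hs    # errors are reported by ghc; non-zero exit on failure
```

This still reads the source from a file, so checking a string would mean writing it to a temporary file first, as the question suggests.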

How to inspect Haskell bytecode

I am trying to figure out a bug (a serious performance downgrade). Unfortunately, I wasn't able to figure out why by going back many different versions of my code.
I am suspecting it could be some modifications to libraries that I've updated, not to mention in the meanwhile I've updated to GHC 7.6 from 7.4 (and if anybody knows if some laziness behavior has changed I would greatly appreciate it!).
I have an older executable of this code that does not have this bug and thus I wonder if there are any tools to tell me the library versions I was linking to from before? Like if it can figure out the symbols, etc.
GHC creates executables, which are notoriously hard to understand... On my Linux box I can view the assembly code by typing in
objdump -d <executable filename>
but I get back over 100K lines of code from just a simple "Hello, World!" program written in Haskell.
If you happen to have the GHC .hi files, you can get some information about the executable by typing in
ghc --show-iface <hi filename>
This won't give you the assembly code, but you can get some extra information that may prove useful.
As I mentioned in the comment above, on Linux you can use "ldd" to see what C-system libraries you used in the compile, but that is also probably less than useful.
You can try to use a disassembler, but those are generally written to disassemble to C, not anything higher level and certainly not Haskell. That being said, GHC compiles to C as an intermediary (at least it used to; has that changed?), so you might be able to learn something.
Personally, I often find viewing system calls in action much more interesting than viewing pure assembly. On my Linux box, I can view all system calls by running strace (use Wireshark for the network traffic equivalent):
strace <program executable>
This also will generate a lot of data, so it might only be useful if you know of some specific place where direct real world communication (i.e., changes to a file on the hard disk drive) goes wrong.
In all honesty, you are probably better off just debugging the problem from source, although, depending on the actual problem, some of these techniques may help you pinpoint something.
Most of these tools have Mac and Windows equivalents.
Since much has changed in the last 9 years, and apparently this is still the first result a search engine gives on this question (like for me, again), an updated answer is in order:
First of all, yes, while Haskell does not specify a bytecode format, bytecode is also just a kind of machine code, for a virtual machine. So for the rest of the answer I will treat them as the same thing. The “Core” as well as the LLVM intermediate language, or even WASM, could be considered equivalent too.
Secondly, if your old binary is statically linked, then no matter what format your program is in, no symbols will be available to inspect, because that is what linking does. The same holds for bytecode, and even for a classic static #include in simple languages. So your old binary will be of no use. And given the optimisations compilers do, a classic decompiler will very likely never be able to work out which optimised bits originally came from which libraries, especially with stream fusion and similar “magic”.
Third, you can do the things you asked with a modern Haskell program, but you need to have your binaries compiled with -dynamic and -rdynamic, so that not only the C-calling-convention libraries (e.g. .so files) and the Haskell libraries, but also the runtime itself is dynamically loaded. That way you end up with a very small binary, consisting of only your actual code, dynamic linking instructions, and the exact data about what libraries and runtime were used to build it. And since the runtime is compiler-dependent, you will know the compiler too. So it would give you everything you need, but only if you compiled it right. (I recommend using such dynamic linking by default in any case as it saves memory.)
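A sketch of how such a dynamically linked binary might be inspected on Linux. It assumes the Haskell shared libraries follow the usual libHS naming (package name and version embedded in the file name); `haskell_libs`, `Main.hs` and `./Main` are hypothetical names.

```shell
# Keep only the Haskell shared libraries from `ldd` output read on stdin.
# Assumption: their file names start with "libHS", e.g. libHSbase-<version>...so
haskell_libs() {
  grep 'libHS'
}

# Usage (Main.hs and ./Main are placeholders):
#   ghc -dynamic Main.hs
#   ldd ./Main | haskell_libs
```

Each surviving line names one package (or the runtime) together with its version, which is the information the question was after.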
The last factor, which is easy to forget, is that even the exact same compiler version might behave vastly differently depending on what it was itself compiled with. (E.g. if somebody put a backdoor in the very first version of GHC, and all GHCs after that were compiled with that first GHC, and nobody ever checked, then that backdoor could still be in the code today, with no traces in any source or libraries whatsoever. … Or, for a less extreme case, the version of GHC your old binary was built with might have been compiled with different architecture options, leading it to put more optimised instructions into the binaries it produces unless told to cross-compile.)
Finally, of course, you can profile even compiled binaries, by profiling their system calls. This will give you clues about which part of the code acted differently and how. (E.g. if you notice that your new binary floods the system with some slow system calls where the old one just used a single fast one. A classic OpenGL example would be using fast display lists versus slow direct calls to draw triangles. Or using a different sorting algorithm, or having switched to a different kind of data structure that fits your work load badly and thrashes a lot of memory.)

How can I help SpecConstr in GHC?

I'm using GHC 7.4.1 to try to compile a program that uses Repa. But partway through compilation, I'm running out of memory. With ghc -v, I can see that it's getting stuck in the SpecConstr phase.
SpecConstr is one of GHC's Core-to-Core transformations. Simon Peyton Jones has a nice description here, and there's some code here, but it's pretty slow-going for me since I'm not very familiar with the inner workings of GHC.
I'd like to be able to help the compiler along somehow - is there a way to tell where it's getting stuck? Alternatively, is there a way to limit memory usage in this phase until I can recompile on a bigger machine?
Thanks,
Chad
You can try compiling with the flags -fspec-constr-threshold=n and -fspec-constr-count=n. More details are in the GHC docs. With 7.4.1, the defaults are n=200 for the threshold and n=3 for the count.
Without seeing code, though, it's possible you're running into this bug, in which case you may need to disable the SpecConstr pass entirely if the above options aren't sufficient.
In addition to John L's answer, ensure you compile with the flag -fno-liberate-case. The liberate case transform tends to cause code-blowup, which then makes SpecConstr's job harder.
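Put together, the flags from both answers might be passed like this. The values 100 and 2 are arbitrary illustrations chosen to be below the 7.4.1 defaults quoted above, not recommendations; `build_tamed`, `GHC_BIN` and `Main.hs` are hypothetical names.

```shell
# Compile with SpecConstr reined in and the liberate-case pass disabled.
# GHC_BIN defaults to "ghc"; setting GHC_BIN=echo prints the command
# instead of running it (a hypothetical dry-run convenience).
build_tamed() {
  "${GHC_BIN:-ghc}" -O2 \
    -fspec-constr-threshold=100 \
    -fspec-constr-count=2 \
    -fno-liberate-case \
    "$@"
}

# Usage (Main.hs is a placeholder module name):
#   build_tamed Main.hs
# If that is still not enough, SpecConstr can be switched off entirely:
#   ghc -O2 -fno-spec-constr Main.hs
```

Lowering the two limits makes SpecConstr specialise fewer and smaller functions, trading some runtime performance for a compile that finishes.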
