How can I help SpecConstr in GHC?

I'm using GHC 7.4.1 to try to compile a program that uses Repa. But partway through compilation, I'm running out of memory. With ghc -v, I can see that it's getting stuck in the SpecConstr phase.
SpecConstr is one of GHC's Core-to-Core transformations. Simon Peyton Jones has a nice description here, and there's some code here, but it's pretty slow-going for me since I'm not very familiar with the inner workings of GHC.
I'd like to be able to help the compiler along somehow - is there a way to tell where it's getting stuck? Alternatively, is there a way to limit memory usage in this phase until I can recompile on a bigger machine?
Thanks,
Chad

You can try compiling with the flags -fspec-constr-threshold=n and -fspec-constr-count=n. More details are in the GHC docs. With 7.4.1, the defaults are n=200 for the threshold and n=3 for the count.
Without seeing code, though, it's possible you're running into this bug. In that case you may need to disable the SpecConstr pass entirely (with -fno-spec-constr) if the above options aren't sufficient.

In addition to John L's answer, ensure you compile with the flag -fno-liberate-case. The liberate-case transform tends to cause code blowup, which then makes SpecConstr's job harder.
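As a rough sketch, the suggestions from both answers combine on the command line like this (MyModule.hs is a placeholder, and the threshold/count values are only starting points to experiment with):

ghc -O2 -fno-liberate-case -fspec-constr-threshold=100 -fspec-constr-count=2 MyModule.hs

Lowering the threshold and the count reduces how much specialisation SpecConstr attempts, trading some runtime performance for compile-time memory.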

Related

Confusions arising from a programming language whose compiler is written in itself

I happened to learn that the Haskell compiler is written in Haskell. This sounds strange to me. How is this possible? I mean, how can a compiler compile itself? Who compiles the compiler, then? And what is the ultimate code accepted by the machine?
Consider the first programming language to have a compiler. What language was its compiler written in? Going back even farther in time, how did people program before the era of compilers?
Broadly speaking, I am often confused about the border between software (e.g. programs written by people) and hardware (e.g. something executable on a physical machine).
P.S.: I have basic knowledge of compilers, such as lexical analysis, parsing, and code optimization. However, I know little about hardware (the machine).
It seems that the answer to a related post Implementing a compiler in “itself” does not go deeply into the border between software and hardware.
And I would like to see some concrete examples.
EDIT: Some comments mentioned the term "bootstrapping". It seems that there is some minimal core part of a language (like axioms/basic theorems in mathematics) which must be compiled in some lower-level way (rather than by the language itself). What is that core? Is it basically the same across different languages? Again, I would like to see some concrete examples.
As you can read in A History of Haskell, page 28, the first Haskell compiler was written in Lazy ML in June 1989. It implemented essentially all of Haskell 1.0.
Once this compiler existed, it could be used to compile the Haskell version of GHC. The first beta of GHC written in Haskell was released on 1 April 1991. The full release came in December 1992.
Because the Lazy ML-based compiler wasn't developed further, today you use a previous version of GHC to compile GHC. So if you want to build GHC 7.8, you use GHC 7.6 to build it. (In practice it's a bit more complicated, because there are multiple stages, and only the first stage, which doesn't support GHCi or Template Haskell, is built with GHC 7.6.)
That means that if you don't have a working Haskell compiler today, you have two options:
Try to install an LML compiler and compile the first version of GHC, written in Lazy ML. Then use this compiler to compile the next version, which is written in Haskell. Then again use that compiler to build the next version, and repeat until you have a reasonably recent compiler. It may be possible to skip a few versions, but I don't know how many. As you can imagine, this could take a lot of time.
(Much easier) Download pre-built GHC binaries.
Um... I have not tried this, but another route would be simply compiling to C and using a C compiler to compile the latest GHC. GHC is itself built in stages, so you don't really even need to convert the whole code base to C, just the first stage, which can then compile the rest. Certainly no need to dig up Lazy ML.
Edit: Note that the resulting compiler will not build binaries targeting the new platform; it would simply run on that platform and be a cross-compiler for targets that GHC already has backends for. Another note: I actually intended this as a response to bennofs' answer, not as a standalone answer to the OP.

GHC's profiling and code coverage options conflict?

For GHC, if I add the -fhpc option while the -prof -fprof-auto options are enabled, GHC does not add any cost centers to the code, and the profiling report shows only CAFs. However, if I remove -fhpc profiling works fine. Is there a reason for this? Is there a way to enable both of these features?
There is no deep reason behind this - the simple truth of the matter is that GHC's "Coverage" pass is only run once (at most), and only ever generates one kind of annotation.
I think at this point it would mainly be a question of somebody putting in a bit of time to implement a fix and properly check that mixtures of annotations don't cause bad side-effects. Opening a GHC ticket for this particular issue might be a good idea - especially if you have a good use-case.
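For reference, a minimal way to reproduce the conflict described above (Main.hs is a placeholder; the flags themselves are standard GHC options):

ghc -prof -fprof-auto -rtsopts -fhpc Main.hs
./Main +RTS -p -RTS

With -fhpc present, the resulting Main.prof reportedly contains only CAF entries; dropping -fhpc restores the expected per-function cost centres.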

Monitoring GHC activity

If GHC takes a long time to compile something, is there a way to find out what it's doing?
Firstly, it would be nice to know if I've actually crashed the compiler (i.e., put it into some sort of infinite loop somehow), or whether it's actually making progress, but just very slowly.
Secondly, it would be nice to know exactly what part of the compilation process GHC is having trouble with. Is it the parsing, or desugaring, or type-checking, or Core optimisation, or code generation, or...?
Is there some way to monitor what's going on? (Bearing in mind that if GHC is taking a long time, that probably means it's doing a lot of work, so if you ask for too much output it's going to be huge!)
GHC already tells you which modules it's trying to (re)compile. In my case, the problem is a single self-contained module. I'd like to know where GHC is getting stuck.
Following Daniel Fischer's comment, I tried running GHC with different verbosity options.
-v1: Produced a bit more output, but nothing during the main compilation step.
-v2: Tells you what step GHC is currently doing (parser, desugar, type check, simplifier, etc). This is pretty much what I actually wanted.
-v3: Appears to make the simplifier actually dump what it's doing to the console - bad idea while compiling 8MB of source code!
So it seems that -v2 is the place to start.
(In the specific case of the program that prompted this question, it seems GHC is spending forever in the type checking phase.)
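For reference, the invocation is simply (MyModule.hs is a placeholder):

ghc -v2 MyModule.hs

At -v2, GHC prints a marker as it enters each phase - lines along the lines of *** Parser:, *** Renamer/typechecker:, *** Desugar:, and *** Simplifier: (the exact wording varies between GHC versions) - so whichever marker it hangs after tells you which phase to investigate.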

Haskell measuring function performance

In Haskell, how can I 'simply' measure a function's performance? For example, how long it takes to run, or how much memory it takes? I am aware of profiling; however, is there a simpler way that will not require me to change my code too much?
Measuring how long it takes to run and how much memory it takes are two separate problems, namely benchmarking and profiling. Haskell has a well-defined set of tools for both. Neither problem requires you to make any changes to the actual application's code.
Benchmarking
This is done using libraries. There is an ultimate winner in that area, which was suggested by Niklas in the comments, namely Criterion. The library is very well designed, isn't hard to use, and produces very detailed data.
The workflow is the following: you create a separate module containing the setup of your benchmark, compile it, and run it with options. To get a reference on the available options, run it with the --help flag.
You can find examples of setup modules here.
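A minimal sketch of such a setup module, assuming the criterion package is installed (the benchmarked function is an arbitrary example; defaultMain, bench, and whnf are Criterion's standard entry points):

import Criterion.Main

-- An arbitrary function to measure.
lastOf :: Int -> Int
lastOf n = last [1 .. n]

main :: IO ()
main = defaultMain
  [ bench "last of 100k" $ whnf lastOf 100000
  , bench "last of 1M"   $ whnf lastOf 1000000
  ]

Compile it with optimisations (ghc -O2) and run the resulting binary; whnf forces each result to weak head normal form on every iteration, which avoids accidentally benchmarking an unevaluated thunk due to laziness.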
Profiling
There are enough good materials on that already, so I'll just refer to them (a sketch of the basic commands follows the list):
General reference on profiling
A tutorial in Real World Haskell
A tutorial on profiling with Cabal
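As a rough sketch of the basic workflow those references cover (Main.hs is a placeholder):

ghc -O2 -prof -fprof-auto -rtsopts Main.hs
./Main +RTS -p -RTS

This produces a Main.prof file breaking down time and allocation per cost centre.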
For extremely crude information on how individual functions perform compared to each other, you can use ghci:
Prelude> :set +s
Prelude> last [1..100000000]
100000000
(1.65 secs, 4000685276 bytes)
You need to be aware that ghci interprets code rather than compiling it, so it runs much slower than code built with ghc; that the timing and memory usage figures are approximate; and that absolutely no optimisation has been performed.
This means that it gives you only a very rough idea of how (in)efficient your code is, and is no substitute for proper benchmarking and profiling of compiled and optimised code, as detailed in Nikita Volkov's answer.

Could the compilation of the compiler affect the compiled programs?

My question probably sounds weird, but my point is: I have to compile a program using GCC. If I compile GCC from source, will I get a slight performance edge in software compiled with the freshly built GCC? What should I expect?
You won't get any faster programs out of a compiler built with optimizing flags. A program is the compiler's output, and optimizations don't change the output of a correct program, so the programs stay the same; an optimized build of the compiler may run faster itself, but it emits exactly the same code.
You might, however, profit from new available options if your distributor ships an incomplete compiler. Look through the GCC manual for any options you want to enable (like certain target architecture variants), and if you can't enable them in your current compiler build, there might be potential in a custom-built compiler. However, it is unlikely that it's worth it.
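A quick way to check the claim above yourself, sketched with a hypothetical install path (gcc -S emits the generated assembly instead of a binary):

gcc -O2 -S prog.c -o distro.s
/opt/custom-gcc/bin/gcc -O2 -S prog.c -o custom.s
diff distro.s custom.s

For the same GCC version and the same flags, the two files should be identical regardless of how each compiler binary itself was built.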
Not unless you're building a newer version of GCC, or enabling CLooG, Graphite, etc.
The performance difference is usually zero or negligible. In very rare cases you can see a noticeable difference, but it is not always an improvement; degradation is possible too.
