GHC's profiling and code coverage options conflict?

For GHC, if I add the -fhpc option while the -prof -fprof-auto options are enabled, GHC does not add any cost centres to the code, and the profiling report shows only CAFs. However, if I remove -fhpc, profiling works fine. Is there a reason for this? Is there a way to enable both of these features?

There is no deep reason behind this - the simple truth of the matter is that GHC's "Coverage" pass is only run once (at maximum), and only ever generates one kind of annotation.
I think at this point it would mainly be a question of somebody putting in a bit of time to implement a fix and properly check that mixtures of annotations don't cause bad side-effects. Opening a GHC ticket for this particular issue might be a good idea - especially if you have a good use-case.
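For reference, here is a minimal sketch of the two builds being compared (the module and the toy function are purely illustrative):

    -- Main.hs: a toy module, just so there is something to attach cost centres to.
    module Main where

    -- With -fprof-auto this should get its own cost centre in the .prof report.
    expensive :: Int -> Int
    expensive n = sum [i * i | i <- [1 .. n]]

    main :: IO ()
    main = print (expensive 100000)

    -- Profiling only: cost centres show up as expected.
    --   ghc -prof -fprof-auto Main.hs
    --   ./Main +RTS -p        -- Main.prof lists a cost centre for 'expensive'
    --
    -- Profiling plus coverage: as described in the question, the .prof report
    -- ends up containing only CAFs.
    --   ghc -prof -fprof-auto -fhpc Main.hs
    --   ./Main +RTS -p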

Correctly dynamically loading PIEs

Many discussions like this and this have warned us, with examples, that trying to dlopen a PIE can never be correct. The reasons are various: copy relocations, TLS, etc.
However, these problems can be circumvented if we loosen the restriction. This question showed us that compiling with -fPIC can eliminate copy relocations, and TLS seems to work all right.
This brings up the question of how far we are from correctly dynamically loading a PIE. I agree with the point made in link 1:
Bottom line: this was never designed to work, and you just happened to not step on many of the land-mines, so you thought it is working, when in fact you were exercising undefined behavior.
But I'm more interested in WHY we cannot do that, rather than in yet another failing example.
More specifically, users could write their own runtime dynamic linker, as this comment suggests, which could make some strong assumptions or compromises just for this purpose. Yet this requires extremely broad knowledge of compiling, linking and loading, some of which is known to be poorly documented.
So again, how do users correctly dynamically load PIEs, or at least how can they try to find a way to do that (or not to do that)?
But I'm more interested in WHY we cannot do that, rather than in yet another failing example.
Because the designers of GLIBC didn't intend to allow for this to happen and don't consider this to be a valid use case.
More specifically, users could write their own runtime dynamic linker
Absolutely. You are free to design your own libc and the dynamic loader to allow for this use case. That requirement will add some complexity, but there is no fundamental reason it can't be done.
You may also find an existing alternative libc implementation which doesn't have this restriction (either because it was designed in, or because the designers simply hadn't enforced the restriction, as was the case with GLIBC before this patch).
how do users correctly dynamically load PIEs
They don't.
how can they try to find a way to do that (or not to do that)?
The usual solution is to "not do that", and in fact the need to "do that" seems to be very esoteric.
Why do you need to dlopen a PIE executable in the first place?
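For what it's worth, here is a small sketch of how the failure typically shows up. It uses the dlopen binding from the Haskell unix package purely for illustration (the same applies to calling dlopen from C); ./hello-pie is a hypothetical PIE built separately, and the exact error text depends on your libc version.

    -- DlopenPie.hs: try to dlopen a PIE and report what the loader says.
    -- Hypothetical PIE to load, built e.g. with: gcc -fPIE -pie -o hello-pie hello.c
    module Main where

    import Control.Exception (SomeException, try)
    import System.Posix.DynamicLinker (DL, RTLDFlags (RTLD_NOW), dlclose, dlopen)

    main :: IO ()
    main = do
      r <- try (dlopen "./hello-pie" [RTLD_NOW]) :: IO (Either SomeException DL)
      case r of
        -- On a libc that enforces the restriction, dlopen refuses the object.
        Left err -> putStrLn ("dlopen failed: " ++ show err)
        -- On a libc without the check you may get a handle back, but per the
        -- discussion above that does not mean the result is well defined.
        Right dl -> putStrLn "dlopen succeeded" >> dlclose dl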

Why is "cabal build" so slow compared with "make"?

Suppose I have a package with several executables, which I initially build using cabal build. Now I change one file that impacts just one executable, and cabal seems to take about a second or two examining each executable to see whether it's impacted or not. On the other hand, make, given an equivalent number of executables and source files, will determine in a fraction of a second what needs to be recompiled. Why the huge difference? Is there a reason cabal can't just build its own version of a makefile and go from there?
Disclaimer: I'm not familiar enough with Haskell or make internals to give technical specifics, but some web searching does offer some insight that lines up with my proposal (I'm providing references to try to keep this from being purely opinion). Also, I'm assuming your makefile is calling ghc, as cabal apparently would.
Proposal: I believe there could be several key reasons, but the main one is that make is written in C, whereas cabal is written in Haskell. This would be coupled with superior dependency checking from make (although I'm not sure how to prove this without looking at the source code). Other supporting reasons, as found on the web:
cabal tries to do a lot more than simply compiling, e.g. it appears to take steps with regard to packaging (https://www.haskell.org/cabal/)
cabal is written in Haskell, although the runtime system is written in C (https://en.wikipedia.org/wiki/Glasgow_Haskell_Compiler)
Again, I'm not overly familiar with make internals, but make may simply have a faster dependency-checking mechanism, and therefore track these changes better. I point this out because, from the OP, the difference sounds significant enough that cabal may be doing a blanket check against all dependencies. If true, I suspect this would be the primary reason for the speed difference.
At any rate, both are open source and can be downloaded from their respective sites (haskell.org/cabal/ and savannah.gnu.org/projects/make/), allowing anyone to examine the specifics of the implementations.
It is also likely one could see a lot of variance in speed based upon the switches passed to the compilers in use.
Hope this at least points you in the right direction.

Monitoring GHC activity

If GHC takes a long time to compile something, is there a way to find out what it's doing?
Firstly, it would be nice to know if I've actually crashed the compiler (i.e., put it into some sort of infinite loop somehow), or whether it's actually making progress, but just very slowly.
Secondly, it would be nice to know exactly what part of the compilation process GHC is having trouble with. Is it the parsing, or desugaring, or type-checking, or Core optimisation, or code generation, or...?
Is there some way to monitor what's going on? (Bearing in mind that if GHC is taking a long time, that probably means it's doing a lot of work, so if you ask for too much output it's going to be huge!)
GHC already tells you which modules it's trying to (re)compile. In my case, the problem is a single self-contained module. I'd like to know where GHC is getting stuck.
Following Daniel Fischer's comment, I tried running GHC with different verbosity options.
-v1: Produced a bit more output, but nothing during the main compilation step.
-v2: Tells you what step GHC is currently doing (parser, desugar, type check, simplifier, etc). This is pretty much what I actually wanted.
-v3: Appears to make the simplifier actually dump what it's doing to the console - bad idea while compiling 8MB of source code!
So it seems that -v2 is the place to start.
(In the specific case of the program that prompted this question, it seems GHC is spending forever in the type checking phase.)
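To make that concrete, here is a sketch of the invocations (the module is a trivial stand-in for whatever is actually slow to compile):

    -- Slow.hs: stand-in for the module that takes forever; any module will do,
    -- since the interesting part is the verbosity flag passed to GHC.
    module Slow where

    value :: Int
    value = sum [i * i | i <- [1 .. 100 :: Int]]

    -- Invocations, per the notes above:
    --   ghc -v1 Slow.hs    -- little extra output during the main compilation step
    --   ghc -v2 Slow.hs    -- announces each phase (parser, desugarer,
    --                      -- type checker, simplifier, ...)
    --   ghc -v3 Slow.hs    -- also dumps simplifier detail; avoid on very
    --                      -- large modules
    --
    -- The verbose output is easiest to follow if you capture it, e.g.:
    --   ghc -v2 Slow.hs 2>&1 | tee build.log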

How can I help SpecConstr in GHC?

I'm using GHC 7.4.1 to try to compile a program that uses Repa. But partway through compilation, I'm running out of memory. With ghc -v, I can see that it's getting stuck in the SpecConstr phase.
SpecConstr is one of GHC's Core-to-Core transformations. Simon Peyton Jones has a nice description here, and there's some code here, but it's pretty slow-going for me since I'm not very familiar with the inner workings of GHC.
I'd like to be able to help the compiler along somehow - is there a way to tell where it's getting stuck? Alternatively, is there a way to limit memory usage in this phase until I can recompile on a bigger machine?
You can try compiling with the flags -fspec-constr-threshold=n and -fspec-constr-count=n. More details are in the GHC docs. With 7.4.1, the defaults are n=200 for the threshold and n=3 for the count.
Without seeing code, though, it's possible you're running into this bug, in which case you may need to disable the SpecConstr pass entirely if the above options aren't sufficient.
In addition to John L's answer, ensure you compile with the flag -fno-liberate-case. The liberate-case transform tends to cause code blowup, which then makes SpecConstr's job harder.
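Putting both answers together, here is a hedged sketch of how the flags might be applied per-module via OPTIONS_GHC pragmas (the module is hypothetical, and the numeric values are only starting points to experiment with, not recommendations):

    -- HotLoops.hs: hypothetical stand-in for the Repa-heavy module that blows up.
    {-# OPTIONS_GHC -fno-liberate-case #-}
    {-# OPTIONS_GHC -fspec-constr-threshold=100 -fspec-constr-count=2 #-}
    module HotLoops where

    -- The kind of tight, recursive loop SpecConstr likes to specialise.
    hotSum :: Int -> Int
    hotSum n = go 0 1
      where
        go acc i
          | i > n     = acc
          | otherwise = go (acc + i * i) (i + 1)

    -- The same flags can instead be passed on the command line (or via
    -- ghc-options in a .cabal file):
    --   ghc -O2 -fno-liberate-case -fspec-constr-threshold=100 \
    --       -fspec-constr-count=2 HotLoops.hs
    -- As a last resort, -fno-spec-constr disables the pass entirely.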

What is the difference between gcc optimization levels?

What is the difference between the different optimization levels in GCC? Assuming I don't care to have any debug hooks, why wouldn't I just use the highest level of optimization available to me? Does a higher level of optimization necessarily (i.e. provably) generate a faster program?
Yes, a higher level can sometimes mean a better-performing program. However, it can cause problems depending on your code. For example, branch-related optimizations (enabled at -O1 and up) can break poorly written multi-threaded programs by exposing a race condition. The optimizer will decide on something it considers better than what you wrote, and in some cases that might not work.
And sometimes the higher optimization levels (-O3) add no reasonable benefit but a lot of extra size. Your own testing can determine whether that size tradeoff brings a reasonable performance gain on your system.
As a final note, the GNU project compiles all of its programs at -O2 by default, and -O2 is fairly common elsewhere.
Generally optimization levels higher than -O2 (just -O3 for gcc but other compilers have higher ones) include optimizations that can increase the size of your code. This includes things like loop unrolling, lots of inlining, padding for alignment regardless of size, etc. Other compilers offer vectorization and inter-procedural optimization at levels higher than -O3, as well as certain optimizations that can improve speed a lot at the cost of correctness (e.g., using faster, less accurate math routines). Check the docs before you use these things.
As for performance, it's a tradeoff. In general, compiler designers try to tune these things so that they don't decrease the performance of your code, so -O3 will usually help (at least in my experience) but your mileage may vary. It's not always the case that really aggressive size-altering optimizations will improve performance (e.g. really aggressive inlining can get you cache pollution).
I found a web page containing some information about the different optimization levels. One thing I remember hearing somewhere is that optimization might actually break your program, and that can be an issue. But I'm not sure how much of an issue that is any longer. Perhaps today's compilers are smart enough to avoid those problems.
Sidenote:
It's quite hard to predict exactly which flags are turned on by the global -O directives on the gcc command line for different versions and platforms, and the documentation on the GCC site tends to become outdated quickly or not to cover the compiler internals in enough detail.
Here is an easy way to check exactly what happens on your particular setup when you use one of the -O flags and other -f flags and/or combinations thereof:
Create an empty source file somewhere: touch dummy.c
Run it through the compiler just as you normally would, with all the -O, -f and/or -m flags you would normally use, but adding -Q -v to the command line: gcc -c -Q -v dummy.c
Inspect the generated output, perhaps saving it to compare against a different run.
Change the command line to your liking, remove the generated object file via rm -f dummy.o, and re-run.
Also, always keep in mind that, from a purist point of view, most non-trivial optimizations generate "broken" code (where broken is defined as deviating from the optimal path in corner cases), so choosing whether or not to enable a certain set of optimization mechanisms sometimes boils down to choosing the level of correctness for the compiler output. There always have been (and currently are) bugs in any compiler's optimizer - just check the GCC mailing list and Bugzilla for some samples. Compiler optimization should only be used after actually performing measurements, since:
gains from using a better algorithm will dwarf any gains from compiler optimization,
there is no point in optimizing code that will run every once in a blue moon,
if the optimizer introduces bugs, it's immaterial how fast your code runs.
