Are code reloading and laziness two incompatible features in Haskell?

One of Haskell's best-known features is its laziness, which makes it possible to write more elegant and reusable code. Internally, laziness builds computations in memory that are executed only when needed, which can happen much later and at another place in the code. Also, the GHC API can reload new versions of modules, a feature that some programs use, most notably GHCi.
Now suppose that a lazy computation holds a reference to a function in a module, and the module is reloaded. Which version of the function will be executed when the lazy computation is finally fully evaluated?
It seems natural that the old version should be executed, because its semantics were the ones intended at the time the lazy computation was created, so running the new version instead seems unreasonable. However, if the old version fails, subsequent debugging will be problematic because the code that failed no longer exists in the system, making it hard to track down the cause of the failure.
All this reasoning leads me to think that laziness and code reloading (i.e. hot upgrades) are incompatible at a conceptual level. Is this really so, or is there some design decision in Haskell or GHC that solves this conceptual incompatibility?
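One way to picture the scenario without the GHC API is to simulate a "reload" by swapping a function stored in an IORef. This is only an illustration under that assumption (the IORef registry and the name simulateReload are hypothetical, and real module reloading in GHCi works differently): in the simulation, a thunk built before the swap holds a direct reference to the closure it was created from, so forcing it afterwards still runs the old version.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)

-- Hypothetical "reload" simulation: the IORef stands in for whichever
-- version of a function is currently loaded. Real GHCi reloading differs.
-- Returns (value of a thunk created before the swap, result of the new version).
simulateReload :: IO (Int, Int)
simulateReload = do
  current <- newIORef ((+ 1) :: Int -> Int)  -- "version 1"
  old <- readIORef current
  let pending = old 41                       -- a thunk; nothing is computed yet
  writeIORef current (* 2)                   -- "reload": install "version 2"
  new <- readIORef current
  -- Forcing 'pending' now still runs version 1: the thunk refers directly
  -- to the closure it was built from, not to the IORef.
  pending `seq` return (pending, new 41)

main :: IO ()
main = simulateReload >>= print   -- prints (42,82)
```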


How to reliably influence generated code at near machine level using GHC?

While this may sound like a theoretical question, suppose I decide to invest in building a mission-critical application in Haskell. A year later I find that I absolutely need to improve the performance of a very narrow bottleneck, and this will require optimizing memory access close to raw machine capabilities.
Some assumptions:
It isn't a real-time system: occasional latency spikes are tolerable (from interrupts, thread scheduling irregularities, occasional GC, etc.)
It isn't a numeric problem: data layout and cache-friendly access patterns matter most (avoiding pointer chasing, reducing conditional jumps, etc.)
Code may be tied to a specific GHC release (but no forking)
The performance goal requires in-place modification of pre-allocated off-heap arrays, taking alignment into account (C strings, bit-packed fields, etc.)
Data is statically bounded in arrays, and allocations are rarely if ever needed
What mechanisms does GHC offer to perform this kind of optimization? By "reliably" I mean that if a source change causes the code to stop performing well, it can be corrected in the source code without rewriting it in assembly.
Is it already possible using GHC-specific extensions and libraries?
Would a custom FFI help avoid C calling-convention overhead?
Could a special-purpose compiler plugin do it through a restricted source DSL?
Could a source code generator from a "high-level" assembly (LLVM?) be a solution?
It sounds like you're looking for unboxed arrays. "Unboxed" in Haskell-land means "has no runtime heap representation". You can usually learn whether some part of your code is compiled to an unboxed loop (a loop that performs no allocation) by looking at its Core representation (Core is a small, very Haskell-like intermediate language, and the first one in GHC's compilation pipeline). For example, if you see Int# in the Core output, that means an integer that has no heap representation (it will typically live in a register).
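A minimal sketch of the kind of loop one would check this way, assuming GHC with -O2 (the file name Loop.hs and the function sumTo are illustrative). The exact Core you get varies between GHC versions, so treat the comments as a guide to what to look for rather than a guaranteed output.

```haskell
{-# LANGUAGE BangPatterns #-}
-- Sum of 1..n as a strict accumulating loop. To inspect the Core, compile with
--   ghc -O2 -ddump-simpl -dsuppress-all Loop.hs
-- and look at the worker for 'go': if its arguments have type Int#, the loop
-- runs on unboxed machine integers with no per-iteration heap allocation.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go :: Int -> Int -> Int
    go !acc i
      | i > n = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumTo 1000000)
```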
When optimizing Haskell code, we regularly look at Core and expect to be able to correct performance regressions by changing the source code (e.g. adding a strictness annotation, or restructuring a function so that it can be inlined). This isn't always fun, but it will be fairly stable, especially if you pin your compiler version.
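As one sketch of such a source-level fix (the type Acc and the function mean are hypothetical): a running-mean fold that uses a strict data type with UNPACK'd fields instead of a lazy pair, so no chain of thunks builds up. Whether GHC then passes the two fields around unboxed is something to confirm in the -ddump-simpl output, not something the source alone guarantees.

```haskell
import Data.List (foldl')

-- Strict accumulator: both fields are evaluated every time an Acc is built,
-- so the fold does not build a chain of (s + x) thunks. UNPACK asks GHC to
-- store the fields inline rather than as pointers to boxed values.
data Acc = Acc {-# UNPACK #-} !Double {-# UNPACK #-} !Int

mean :: [Double] -> Double
mean xs = finish (foldl' step (Acc 0 0) xs)
  where
    step (Acc s n) x = Acc (s + x) (n + 1)
    finish (Acc s n) = s / fromIntegral n    -- NaN for an empty list

main :: IO ()
main = print (mean [1 .. 1000000])
```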
Back to unboxed arrays: GHC exposes a lot of low-level primops in GHC.Prim; in particular, it sounds like you want mutable unboxed arrays (MutableByteArray#). The primitive package exposes these primops behind a slightly safer, friendlier API, and it is what you should use (and depend on, if you are writing your own library).
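A small sketch using the primitive package, assuming it is installed (sumSquares and sumBuffer are hypothetical names). Note that newByteArray takes a size in bytes, while readByteArray and writeByteArray take an index counted in elements of the element type.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Monad.ST (ST, runST)
import Data.Primitive.ByteArray
  (MutableByteArray, newByteArray, readByteArray, writeByteArray)
import Data.Word (Word32)

-- Sums elements i..n-1 of a mutable unboxed Word32 buffer.
sumBuffer :: MutableByteArray s -> Int -> Int -> Word32 -> ST s Word32
sumBuffer arr n i !acc
  | i >= n = return acc
  | otherwise = do
      x <- readByteArray arr i          -- index counts ELEMENTS, not bytes
      sumBuffer arr n (i + 1) (acc + x)

-- Fills a buffer with n Word32 squares, then sums it; no boxed list of
-- results is built, the values live in one flat byte array.
sumSquares :: Int -> Word32
sumSquares n = runST $ do
  arr <- newByteArray (n * 4)           -- size in BYTES: n elements of 4 bytes
  mapM_ (\i -> writeByteArray arr i (fromIntegral (i * i) :: Word32)) [0 .. n - 1]
  sumBuffer arr n 0 0

main :: IO ()
main = print (sumSquares 1000)
```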
Many other libraries that implement unboxed arrays, such as vector, are built on MutableByteArray, but the point is that operations on that structure generate no garbage and are likely to compile down to fairly predictable machine instructions.
You might also like to check out this technique if you're doing numeric work and want to use a particular instruction or implement some loop directly in assembly.
GHC also has a very powerful FFI, so you can write portions of your program in C and call them from Haskell; GHC supports pinned arrays, among other structures, for this purpose.
If you need more control than these give you, then Haskell is likely the wrong language. It's impossible to tell from your description whether this is the case for your problem (your requirements seem contradictory: you need to be able to write a carefully cache-tuned algorithm, but arbitrary GC pauses are okay?).
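For example, binding a C library function takes a single foreign import declaration. This sketch binds strlen from the C standard library (cStrlen is a hypothetical wrapper name). Marking the import unsafe lowers the per-call overhead, which bears on the calling-convention question, but unsafe imports should only be used for short calls that never call back into Haskell.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))

-- Direct binding to the C standard library's strlen.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Marshals a Haskell String to a temporary NUL-terminated C string,
-- calls strlen on it, and converts the result back to Int.
cStrlen :: String -> IO Int
cStrlen s = withCString s (fmap fromIntegral . c_strlen)

main :: IO ()
main = cStrlen "hello" >>= print
```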
One last note: you can't rely on GHC's native code generator to perform any of the low-level strength reduction optimizations that e.g. GCC performs (GHC's NCG will probably never ever know about bit-twiddling hacks, autovectorization, etc. etc.). Instead you can try the LLVM backend, but whether you see a speedup in your program is by no means guaranteed.

GHC Partial Evaluation and Separate Compilation

Whole-program compilers like MLton create optimized binaries partly thanks to their ability to use the entire source of the program to perform partial evaluation: aggressively inlining constants and evaluating them until stuck, all during compilation!
This has been explored publicly a bit in the Haskell space by Gabriel Gonzalez's Morte.
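The classic illustration of partial evaluation is a power function whose exponent is known statically. The sketch writes out by hand the residual program a partial evaluator could produce for exponent 3; the names power and cube are illustrative. GHC does not inline self-recursive functions such as power, so it will not perform this unrolling on its own.

```haskell
-- Generic version: the exponent is only known at run time (assumes n >= 0).
power :: Int -> Int -> Int
power 0 _ = 1
power n x = x * power (n - 1) x

-- Hand-written residual program for the static exponent 3: the recursion
-- on n has been evaluated away ahead of time, leaving straight-line code.
cube :: Int -> Int
cube x = x * (x * (x * 1))

main :: IO ()
main = print (power 3 5, cube 5)   -- prints (125,125)
```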
Now, my understanding is that Haskell does not do very much of this, if any at all. The reason usually cited, as I understand it, is that it is antithetical to separate compilation. That makes sense as a reason to prohibit partial evaluation across source-file boundaries, but it seems like in-file partial evaluation would still be an option.
As far as I know, in-file partial evaluation is still not performed, though.
My question is: is this true? If so, what are the tradeoffs for performing in-file partial evaluation? If not, what is an example file where one can improve compiled performance by putting more functionality into the same file?
(Edit: To clarify the above, I know there are a lot of questions about what the best set of reductions to perform is; many are undecidable! I'd like to know the tradeoffs made in an "industrial-strength" compiler with separate compilation that live at a level above choosing the right equational theory, if there are any interesting things to talk about there. Things like compilation speed or file bloat are closer to the scope I'm interested in. Another question in the same space might be: "Why can't MLton get separate compilation just by compiling each module separately, leaving the API exposed, and then linking them all together?")
This is definitely an optimization that a small set of people are interested in and are pursuing. The Google search term to find information on it is "supercompilation". I believe there are at least two approaches floating about at the moment.
It seems one of the big tradeoffs is compilation-time resources (time and memory both), and at the moment the performance wins of paying these costs appear to be somewhat unpredictable. There's quite some work left. A few links:
A page on the GHC wiki
Neil Mitchell's Supero
Max Bolingbroke's Supercompilation by evaluation

Why doesn't Haskell support mutually recursive modules?

Haskell supports mutually recursive let-bindings, which is great. Haskell doesn't support mutually recursive modules, which is sometimes terrible. I know that GHC has its .hs-boot mechanism, but I think that's a bit of a hack.
As far as I know, transparent support for mutually recursive modules should be relatively "simple", and it can be done exactly like mutually recursive let-bindings: instead of taking each separate module as a compilation unit, I would take every strongly connected component of the module dependency graph as a compilation unit.
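Within a single module, the strongly-connected-component idea already works: GHC analyses mutually recursive top-level bindings together as one group. The comments in this sketch show, roughly, what splitting the same two functions across modules A and B requires today with an hs-boot file; the module names are hypothetical.

```haskell
-- Mutually recursive top-level bindings in one module: GHC's dependency
-- analysis treats them as a single strongly connected group. (Assumes n >= 0.)
isEven :: Int -> Bool
isEven 0 = True
isEven n = isOdd (n - 1)

isOdd :: Int -> Bool
isOdd 0 = False
isOdd n = isEven (n - 1)

-- Split across two modules, the import cycle must be broken by hand, roughly:
--   A.hs-boot:  module A where { isEven :: Int -> Bool }
--   B.hs:       module B where { import {-# SOURCE #-} A (isEven); isOdd = ... }
--   A.hs:       module A where { import B (isOdd); isEven = ... }

main :: IO ()
main = print (isEven 10, isOdd 7)   -- prints (True,True)
```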
Am I missing something here? Is there any non-trivial reason why Haskell doesn't support mutually recursive modules in this way?
This 6-year-old feature request ticket contains a fair amount of discussion, which you may have already seen. The gist of it is that it's not entirely a simple change as far as GHC is concerned. A few specific issues raised:
GHC currently has a lot of baked-in assumptions about how modules are processed during compilation, and the cost of changing those assumptions significantly would vastly outweigh the benefits of transparent support for mutually recursive modules.
Lumping groups of modules together means they have to be compiled together, which means more recompilation and awkwardness with generating separate .hi and .o files.
Backward compatibility with existing builds that use hs-boot files.
You have the potential for mutually-recursive bindings that cross module boundaries in a mutually-recursive module group, which raises issues with anything that involves implicit, module-level scope (such as defaulting, and possibly type class instances).
And of course, the potential for unknown, unanticipated bugs, as with anything that alters long-standing assumptions in GHC. Even without massive changes to the compilation process, many things are currently assumed to be compiled on a per-module basis.
A lot of people would like to see this supported, but so far nobody has either produced a possible implementation or worked out a detailed, well-specified design that handles all the fiddly corner cases of the sort mentioned above.

is Haskell a managed language?

I'm a complete newbie in Haskell. One thing that always bugs me is the ambiguity about whether Haskell is a managed (a term borrowed from Microsoft) language like Java or a language compiled to native code like C.
The GHC page says this "GHC compiles Haskell code either directly to native code or using LLVM as a back-end".
In the case of "compiled to native code", how can features like garbage collection be possible without something like a JVM?
Thanks so much for your answer. Conceptually, can you please help point out which of the following understandings of garbage collection in Haskell is correct:
GHC compiles Haskell code to native code. In the process of compiling, garbage collection routines will be added to the original program code?
There is a program that runs alongside a Haskell program to perform garbage collection?
As far as I am aware the term "managed language" specifically means a language that targets .NET/the Common Language Runtime. So no, Haskell is not a managed language and neither is Java.
Regarding what Haskell is compiled to: As the documentation you quoted says, GHC compiles Haskell to native code. It can do so by either directly emitting native code or by first emitting LLVM code and then letting LLVM compile that to native code. Either way the end result of running GHC is a native executable.
Besides GHC there are also other implementations of Haskell - most notably Hugs, which is a pure interpreter that never produces an executable (native or otherwise).
how can features like garbage collection be possible without something like a JVM?
The same way that they're possible with the JVM: Every time memory is allocated, it is registered with the garbage collector. Then from time to time the garbage collector runs, following the steps of the given garbage collection algorithm. GHC-compiled code uses generational garbage collection.
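To observe this from inside a program, GHC's base library provides GHC.Stats. This sketch (gcReport is a hypothetical name) allocates some short-lived garbage, forces a collection, and reads the runtime's counters; because the collector is generational, the total count normally includes minor collections as well as major ones. Statistics are only collected when the program runs with +RTS -T, which in a typical setup means compiling with -rtsopts.

```haskell
import Data.List (foldl')
import Data.Word (Word32)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performGC)

-- Allocates short-lived list cells, forces a major GC, then returns
-- (total GCs, major GCs) if the RTS was started with statistics enabled.
gcReport :: IO (Maybe (Word32, Word32))
gcReport = do
  print (foldl' (+) 0 [1 .. 1000000 :: Int])  -- list cells become garbage quickly
  performGC                                    -- request a major collection
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      s <- getRTSStats
      return (Just (gcs s, major_gcs s))
    else return Nothing

-- Build with: ghc -rtsopts Main.hs   Run with: ./Main +RTS -T
main :: IO ()
main = do
  r <- gcReport
  case r of
    Just (total, major) ->
      putStrLn ("GCs: " ++ show total ++ " (major: " ++ show major ++ ")")
    Nothing -> putStrLn "statistics disabled; run with +RTS -T"
```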
In response to your edit:
GHC compiles Haskell code to native code. In the process of compiling, garbage collection routines will be added to the original program code?
Basically. Except that saying "garbage collection routines will be added to the original program code" might paint the wrong picture. The GC routines are just part of the library that every Haskell program is linked against. The compiled code simply contains calls to those routines at the appropriate places.
Basically all there is to it is to call the GC's alloc function every time you would otherwise call malloc.
Just look at any GC library for C and how it's used: all you need to do is #include the library's header, link against the library, replace each occurrence of malloc with the GC library's alloc function (and remove all calls to free), and bam, your code is garbage collected.
There is a program that runs alongside a Haskell program to perform garbage collection?
whether Haskell is a managed(term borrowed from MS) language like Java
GHC-compiled programs include a garbage collector. (As far as I know, all implementations of Haskell include garbage collection, but this is not part of the specification.)
or a compile-to-native code like C?
GHC-compiled programs are compiled to native code. Hugs interprets programs, and does not compile to native code. There are several other implementations which all, as far as I know, compile to native code, but I list these separately because I'm not as confident of this fact.
In the case of "compiled to native code", how can features like garbage collection be possible without something like a JVM?
GHC-compiled programs include a runtime system that provides some basic capabilities like M-to-N green threading, garbage collection, and an IO manager. In a sense, this is a bit like having "something like a JVM" in that it provides many of the same features, but it's very different in implementation: there is no common bytecode across all architectures (and hence no "virtual machine").
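The green threads are easy to see in action: forkIO creates a thread managed entirely by the runtime system that is linked into the executable, not by the operating system. In this sketch (forkSum is a hypothetical name), thousands of such threads are cheap to create; getNumCapabilities reports how many capabilities (Haskell execution contexts, each run by an OS thread) they are multiplexed onto, which is the N in M-to-N.

```haskell
import Control.Concurrent (forkIO, getNumCapabilities, rtsSupportsBoundThreads)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

-- Spawns one lightweight RTS thread per number 1..n; each thread hands its
-- number back through an MVar, and the results are summed.
forkSum :: Int -> IO Int
forkSum n = do
  boxes <- forM [1 .. n] $ \i -> do
    box <- newEmptyMVar
    _ <- forkIO (putMVar box i)   -- green thread scheduled by the RTS
    return box
  results <- mapM takeMVar boxes  -- blocks until each thread has written
  return (sum results)

main :: IO ()
main = do
  caps <- getNumCapabilities
  putStrLn ("capabilities: " ++ show caps
            ++ ", threaded RTS: " ++ show rtsSupportsBoundThreads)
  total <- forkSum 10000
  print total
```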
which one of my following understandings of garbage collection in Haskell is correct:
GHC compiles Haskell code to native code. In the process of compiling, garbage collection routines will be added to the original program code?
There is a program that runs alongside a Haskell program to perform garbage collection?
Case 1 is correct: the runtime system code is added to the program during compilation (more precisely, it is linked in when the executable is built).
"Managed language" is an overloaded term, so here are one-word answers, followed by some details, for the different meanings that usually come to (my) mind:
Managed as in a CLR target
No, Haskell does not compile to Microsoft CLI's IL.
Well, I've read that there are some projects that can do that, but in my opinion, don't: the CLR isn't built for functional programming and lacks important optimizations for it, probably yielding research-language performance. If I really, really wanted to target the CLR, I'd use F#; it's not a purely functional language, but it's close.
N.B. This is the most accurate meaning of the term "managed language". The following meanings are, well, wrong, but nevertheless, unfortunately, common.
Managed as in automatically garbage-collected
Yes, and this is pretty much a must-have, even beyond what the specification requires: if we had to manage memory manually, it would destroy the functional style that lets us work in the high altitudes that are our beloved home.
It would also force impurity and an explicit memory model on the language.
Managed as in compiled to bytecode which is run by a VM
No (usually).
It depends on your backend:
Not only do we have different Haskell compilers today, some compilers also have different backends; there are even backends for JavaScript!
So if you do want to target a VM, you can use an existing backend for it or write one. But Haskell doesn't require it: just as you can compile to a native bare-metal binary, you can compile to anything else.
In contrast to CLR languages like C# [1] and VB.NET, and in contrast to Java etc., you don't have to target the JVM, the CLR, Mono, etc., as Haskell doesn't require a VM at all.
GHC is a good example. When you compile with GHC, it doesn't compile your code straight to binary: it first compiles to an intermediate language called Core, runs several Core-to-Core optimization passes, then translates to another language called STG, and only then proceeds to code generation (it can stop there if you tell it to). These days you can also use it to compile via LLVM IR, which benefits from LLVM's optimizations. For some programs, the LLVM backend lets GHC produce noticeably faster code. For more information about it and about GHC backends, go here.
The diagram below illustrates the GHC compilation pipeline, and here you can find more information about the various stages.
See the fork at the bottom with three different targets? Those are the backends I was referring to.
[1] A future exception and a fun fact: Microsoft is currently working on native .NET, cunningly named Microsoft .NET Native.
What, for you, is the defining feature of a "managed language"? The phrase "GHC compiles Haskell code either directly to native code or using LLVM as a back-end" that you quote is quite clear about what GHC does, so I suspect the "ambiguity" that bugs you is rather in the term "managed language" than in GHC's docs.
In the case of "compiled to native code", how can features like garbage collection be possible without something like a JVM?
How exactly do you think "something like a JVM" implements features like garbage collection? The JVM isn't magic, it's just a program like everything else. At some level you need to have native code in order for the CPU to execute it, so clearly features like garbage collection are possible in native code.
At your current stage, it's probably best to think of (GHC) Haskell as "managed", with the caveat that the platform GHC compiles to is not targeted by anything else. There is, of course, more to it than that, but that's a sufficient explanation until you have more Haskell experience.

Expression trees vs IL.Emit for runtime code specialization

I recently learned that it is possible to generate code at runtime in C#, and I would like to put this feature to use. I have code that does some very basic geometric calculations, like computing line-plane intersections, and I think I could gain some performance by generating specialized code for some of the methods, because many of the calculations are performed for the same plane or the same line over and over again.
The problem is that I'm not sure where to begin. From reading a few blog posts and browsing MSDN documentation I've come across two possible strategies for generating code at runtime: Expression trees and IL.Emit. Using expression trees seems much easier because there is no need to learn anything about OpCodes and various other MSIL related intricacies but I'm not sure if expression trees are as fast as manually generated MSIL. So are there any suggestions on which method I should go with?
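Independent of the choice between expression trees and IL, the core idea of specializing per plane can be sketched without any code generation: compute what depends only on the plane once, and return a function of the line. This is a language-neutral sketch written in Haskell, not the C# Expression or ILGenerator APIs; the types Vec, Plane and Line and the function intersectWith are hypothetical.

```haskell
-- Hypothetical 3-D vector type for the sketch.
data Vec = Vec !Double !Double !Double
  deriving (Eq, Show)

dot :: Vec -> Vec -> Double
dot (Vec a b c) (Vec x y z) = a * x + b * y + c * z

data Plane = Plane Vec Vec   -- a point on the plane, the plane's normal
data Line = Line Vec Vec     -- origin, direction: points are origin + t * dir

-- Specialize on the plane first, then apply the result to many lines.
intersectWith :: Plane -> Line -> Maybe Vec
intersectWith (Plane p0 n) = \(Line o dir) ->
  let denom = dot n dir
  in if denom == 0
       then Nothing                      -- line is parallel to the plane
       else
         let t = (d - dot n o) / denom
             Vec ox oy oz = o
             Vec dx dy dz = dir
         in Just (Vec (ox + t * dx) (oy + t * dy) (oz + t * dz))
  where
    -- d depends only on the plane, so it sits outside the lambda and can be
    -- shared by every call of the specialized function (GHC may still
    -- decide to recompute cheap expressions like this one).
    d = dot n p0

main :: IO ()
main = do
  let hitFloor = intersectWith (Plane (Vec 0 0 0) (Vec 0 0 1))  -- plane z = 0
  print (hitFloor (Line (Vec 0 0 5) (Vec 0 0 (-1))))
  print (hitFloor (Line (Vec 0 0 5) (Vec 1 0 0)))
```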
The performance of both is generally the same, as expression trees are internally traversed and emitted as IL using the same underlying system functions that you would be using yourself. It is theoretically possible to emit more efficient IL using the low-level functions, but I doubt that there would be any practically important performance gain. That would depend on the task, but I have not come across any practical optimisation of emitted IL compared to the IL emitted by expression trees.
I highly suggest getting ILSpy, a tool that decompiles CLR assemblies. With it you can look at the code that actually traverses expression trees and emits IL.
Finally, a caveat. I have used expression trees in a language parser, where function calls are bound to grammar rules that are compiled from a file at runtime. "Compiled" is the key word here. For many problems I came across, when what you want to achieve is known at compile time, you would not gain much performance from runtime code generation. Some CLR JIT optimizations might also be unavailable to dynamic code. This is only an opinion from my practice, and your domain may differ, but if performance is critical, I would rather look at native code and highly optimized libraries. Some of the work I have done would be snail-slow without LAPACK/MKL. But that is unsolicited advice, so take it with a grain of salt.
If I were in your situation, I would try alternatives from high level to low level, in increasing "needed time & effort" and decreasing reusability order, and I would stop as soon as the performance is good enough for the time being, i.e.:
first, I'd check to see if Math.NET, LAPACK or some similar numeric library already has similar functionality, or I can adapt/extend the code to my needs;
second, I'd try Expression Trees;
third, I'd check the Roslyn project (even though it is still in prerelease);
fourth, I'd think about writing common routines with unsafe C code;
[fifth, I'd think about quitting and starting a new career in a different profession :) ],
and only if none of these worked out would I be desperate enough to try emitting IL at runtime.
But perhaps I'm biased against low level approaches; your expertise, experience and point of view might be different.
