Can I use the OCaml GC with the LLVM GC interface? - garbage-collection

For my PHP LLVM backend I'd like to try out the OCaml GC. Is it possible to use it with LLVM? Especially:
Is the OCaml GC decoupled enough to be used outside of the compiler?
Is the LLVM GC interface mature enough to be used with the OCaml GC?

This shouldn't represent too much work as the OCaml GC is already handled in some way in LLVM: http://llvm.org/docs/GarbageCollection.html#the-erlang-and-ocaml-gcs . This means that stack frames descriptors are correctly emitted for function calls (not the smallest ones, but this should improve with current LLVM GC handling developments). An old version of LLVM's documentation tells that OCaml gc doesn't use write barriers, which is erroneous. So you should be careful to ensure that the generated code is correct for assignments.
For the LLVM GC interface, the current one is quite restricted and does not allows to generate very efficient code, but this should be sufficient to prototype while waiting for the next version that should contain some important changes on that side.

While it appears as though it would be relatively easy to tear out the OCaml GC and Frankenstein it into a different project, I'm not sure this is really something you would want to do in practice.
The OCaml garbage collector was designed with a functional programming style in mind, and this GC architecture might be a liability for a language such as PHP, which is usually not used in a functional style.
If you are set on doing this then I would suggest either waiting a few months for multicore support to be accepted into the OCaml compiler/runtime or using one of the various projects trying to bring multicore support to OCaml at the moment (the most serious of these would probably be this project by the people over at OCamllabs). Right now the OCaml GC lacks true multicore support, and while this isn't really much of a problem in practice, some people can't seem to live without it.

Related

How to reliably influence generated code at near machine level using GHC?

While this may sound as theoretical question, suppose I decide to invest and build a mission-critical application written in Haskell. A year later I find that I absolutely need to improve performance of some very thin bottleneck and this will require optimizing memory access close to raw machine capabilities.
Some assumptions:
It isn't realtime system - occasional latency spikes are tolerable (from interrupts, thread scheduling irregularities, occasional GC etc.)
It isn't a numeric problem - data layout and cache-friendly access patterns are most important (avoiding pointer chasing, reducing conditional jumps etc.)
Code may be tied to specific GHC release (but no forking)
Performance goal requires inplace modification of pre-allocated offheap arrays taking alignment into account (C strings, bit-packed fields etc.)
Data is statically bounded in arrays and allocations are rarely if ever needed
What mechanisms does GHC offer to perfom this kind of optimization? By saying reliably I mean that if source change causes code to no longer perform, it is correctible in source code without rewriting it in assembly.
Is it already possible using GHC-specific extensions and libraries?
Would custom FFI help avoid C calling convention overhead?
Could a special purpose compiler plugin do it through a restricted source DSL?
Could source code generator from a "high-level" assembly (LLVM?) be solution?
It sounds like you're looking for unboxed arrays. "unboxed" in haskell-land means "has no runtime heap representation". You can usually learn whether some part of your code is compiled to an unboxed loop (a loop that performs no allocation), say, by looking at the core representation (this is a very haskell-like language, that's the first stage in compilation). So e.g. you might see Int# in the core output which means an integer which has no heap representation (it's gonna be in a register).
When optimizing haskell code we regularly look at core and expect to be able to manipulate or correct for performance regressions by changing the source code (e.g. adding a strictness annotation, or fiddling with a function such that it can be inlined). This isn't always fun, but will be fairly stable especially if you are pinning your compiler version.
Back to unboxed arrays: GHC exposes a lot of low-level primops in GHC.Prim, in particular it sounds like you want mutable unboxed arrays (MutableByteArray). The primitive package exposes these primops behind a slightly safer, friendlier API and is what you should use (and depend on if writing your own library).
There are many other libraries that implement unboxed arrays, such as vector, and which are built on MutableByteArray, but the point is that operations on that structure generate no garbage and likely compile down to pretty predictable machine instructions.
You might also like to check out this technique if you're doing numeric work and want to use a particular instruction or implement some loop directly in assembly.
GHC also has a very powerful FFI, and you can research about how to write portions of your program in C and interop; haskell supports pinned arrays among other structures for this purpose.
If you need more control than those give you then haskell is likely the wrong language. It's impossible to tell from your description if this is the case for your problem (Your requirements seem contradictory: you need to be able to write a carefully cache-tuned algorithm, but arbitrary GC pauses are okay?).
One last note: you can't rely on GHC's native code generator to perform any of the low-level strength reduction optimizations that e.g. GCC performs (GHC's NCG will probably never ever know about bit-twiddling hacks, autovectorization, etc. etc.). Instead you can try the LLVM backend, but whether you see a speedup in your program is by no means guaranteed.

What happened to libgreen?

As far as I understand libgreen is not a part of Rust standard library anymore. Also I can't find a separate libgreen package. There are a few alternatives - coroutine, which does not provide actual green threads for now, and green-rs, which is broken. Do I right understand that for now there is no lightweight Go-like processes in Rust?
You are correct that there's no lightweight tasking library in std (or the rest of the main distribution), that green doesn't compile and that coroutine doesn't seem to fully handle the threading aspect yet. I do not know of any other library in this space.
As for what happened: the RFC linked to by that issue—RFC 230—is the canonical source of information. The summary is that it was found that the method by which green threading/IO was handled (std tried to abstract across both models, allowing them to be used interoperably automagically) was not worth the downsides. Now, std aims to just provide a minimum base-line of useful support: for IO/threading, that means "thin", safe wrappers for operating system functionality.
Read this https://aturon.github.io/blog/2016/08/11/futures/ and also:
Steve Klabnik's response in the comments:
In the beginning, Rust had only green threads. Eventually, it was
decided that a systems language without systems threads is... strange.
So we needed to add them. Why not add choice? Since the interfaces
could be the same, why not abstract over them, and you could just
choose which one you wanted?
At the same time, the problems with green threads by default were
becoming issues. Segmented stacks cause slow C interop. You need a
runtime to manage them, etc. Furthermore, the overall abstraction was
causing an unacceptable cost. The green threads weren't very green.
Plus, with the need to actually release someday looming, decisions
needed to be made regarding tradeoffs. And since Rust is supposed to
be a systems language, having 1:1 threads and basically no runtime
makes more sense than N:M threads and a runtime. . So libgreen was
removed, the interface was re-done to be 1:1 thread centric.
The 'release someday looming' is a big part of it. We want to be
really stable with Rust, and with all the things to do to actually
ship a 1.0, we didn't want to crystallize an interface we weren't
happy with. Heck, we pulled out a lot of libraries that are even less
important for similar reasons, like rand. Engineering is all about
tradeoffs, and we decided to choose minimalism.
mio is a non starter for us, as are most of the other async i/o frameworks for Rust, because we need Windows and besides we don't want
to get locked into an expensive to replace library which may get
orphaned.
Totally understood here, especially in the general case. In the
specific case, mio is going to either have Windows support, or a
windows-specific version of mio is going to be released, with a
higher-level package providing the features for all platforms. And in
this case, it's maintained by one of the people who's currently using
Rust heavily in production, so it's not likely to go away anytime
soon. But, unless you're actively involved, it's hard to know things
like that, which is, of itself an issue.
One of the reasons we were comfortable removing libgreen is that you
can write your own libraries to do different kinds of IO. 1.0 is a
strong core that we feel good about stabilizing forever, not the final
bit. Libraries like https://github.com/carllerche/mio can test out
different ways of handling things like async IO, and, when they're
mature enough, we can always pull them back in the standard library if
need be. But in the meantime, it's just one line to your Cargo.toml to
add them in.
And such text from reddit:
Unfortunately they ended up canning the greenlet support because
theirs were slower than kernel threads which in turn demonstrates
someone didn’t understand how to get a language compiler to generate
stackless coroutines effectively (not surprising, the number of
engineers wired the right way is not many in this world, but see
http://www.reddit.com/r/rust/comments/2l0a4b/do_rust_web_servers_use_libuv_through_libgreen_or/
for more detail). And they canned the async i/o because libuv is
“slow” (which it is only because it is single threaded only, plus
forces a malloc + free per async operation as the buffers must last
until completion occurs, plus it enforces a penalty over synchronous
i/o see
http://blog.kazuhooku.com/2014/09/the-reasons-why-i-stopped-using-libuv.html),
which was a real shame - they should have taken the opportunity to
replace libuv with something better (hint: ASIO + AFIO, and yes I know
they are both C++, but Rust could do with much better C++ interop than
the presently none it currently has) instead of canning
always-async-everything in what could have been an amazing step up
from C++ with most of the benefits of Erlang without the disadvantages
of Erlang.
For newcomers, there is now may, a crate that implements green threads similar to goroutines.

Efficient GC write barrier in generated C

I'm designing a system that up-front compiles CIL bytecode. In order to keep it relatively simple and also make it very portable, the system will emit C source code (however with all higher-level constructs like OOP factored out) instead of machine code. The intention is that a standard C compiler for the target platform will then be used on that code to get the end product.
Initially I intend to use a very simple GC approach such as stop-the-world. However, while the application doesn't require stellar performance, it does need decent performance, so eventually the GC may need to be changed.
I'm thinking ahead to eventually a more sophisticated GC requiring some sort of write barrier. I've looked at SATB and card marking approaches but I'm not ready to actually plan out a good GC yet. I just don't want to shoot myself in the foot by having the thing emit C source code only to later find that an efficient GC write barrier will demand inline assembly, largely defeating the purpose of emitting C.
So, my question is, can typical write barriers be efficiently implemented in C code? We can assume that the C compiler has a decent optimizer. It's also already a given that the resultant "source code" will be utterly illegible, so clarity doesn't matter.
I'm guessing that - at the expense of bloating the source files even more, - it probably can be done reasonably, but I'd appreciate word from people more experienced in GC design and/or compiler internals.
I am assuming you want a precise generational moving or copying GC.
You could have a write barrier in C; as an example, both Ocaml and MELT runtimes have generational GC with a write barrier. And qish is a generational copying GC with a write barrier, working with C.
(BTW, MELT is a domain specific language to extend GCC, and it is compiled to C, exactly like you want to do)
A more significant issue is how do you keep local pointers (and how the GC knows about them), that is the precise aspect of your GC. You may want to pack them in some local structure.... But then it could happen that the C compiler (e.g. GCC) is optimizing a bit less.
You might look into the source code of recent versions of MONO, they have a generational copying GC. Look also inside Chicken Scheme (also generating C code).
Notice that your C code generator will have to be changed to fit inside some (or yours) particular GC implementation (because each GC has slightly different invariants and expectations). Take also care of tail recursion (some C compilers, notably recent GCC, may optimize them in limited cases).
In Qish, MELT or Ocaml the write barrier (on the C side) is implemented by some macro (or inline functions) called for each touched pointer. Details are implementation specific. Your C code generator will have to take care of them.
Be careful that multi-threaded GC are difficult to design, and that debugging GC, even simple ones, takes a lot of time and is difficult.

is Haskell a managed language?

I'm a complete newbie in Haskell. One thing that always bugs me is the ambiguity in whether Haskell is a managed(term borrowed from MS) language like Java or a compile-to-native code like C?
The GHC page says this "GHC compiles Haskell code either directly to native code or using LLVM as a back-end".
In the case of "compiled to native code", how can features like garbage collection be possible without something like a JVM?
/Update/
Thanks so much for your answer. Conceptually, can you please help point out which one of my following understandings of garbage collection in Haskell is correct:
GHC compiles Haskell code to native code. In the processing of compiling, garbage collection routines will be added to the original program code?
OR
There is a program that runs along side a Haskell program to perform garbage collection?
As far as I am aware the term "managed language" specifically means a language that targets .NET/the Common Language Runtime. So no, Haskell is not a managed language and neither is Java.
Regarding what Haskell is compiled to: As the documentation you quoted says, GHC compiles Haskell to native code. It can do so by either directly emitting native code or by first emitting LLVM code and then letting LLVM compile that to native code. Either way the end result of running GHC is a native executable.
Besides GHC there are also other implementations of Haskell - most notably Hugs, which is a pure interpreter that never produces an executable (native or otherwise).
how can features like garbage collection be possible without something like a JVM?
The same way that they're possible with the JVM: Every time memory is allocated, it is registered with the garbage collector. Then from time to time the garbage collector runs, following the steps of the given garbage collection algorithm. GHC-compiled code uses generational garbage collection.
In response to your edit:
GHC compiles Haskell code to native code. In the processing of compiling, garbage collection routines will be added to the original program code?
Basically. Except that saying "garbage collection routines will be added to the original program code" might paint the wrong picture. The GC routines are just part of the library that every Haskell program is linked against. The compiled code simply contains calls to those routines at the appropriate places.
Basically all there is to it is to call the GC's alloc function every time you would otherwise call malloc.
Just look at any GC library for C and how it's used: All you need to do is to #include the library's header and link against the library, and replace each occurence of malloc with the GC library's alloc function (and remove all calls to free) and bam, your code is garbage collected.
There is a program that runs along side a Haskell program to perform garbage collection?
No.
whether Haskell is a managed(term borrowed from MS) language like Java
GHC-compiled programs include a garbage collector. (As far as I know, all implementations of Haskell include garbage collection, but this is not part of the specification.)
or a compile-to-native code like C?
GHC-compiled programs are compiled to native code. Hugs interprets programs, and does not compile to native code. There are several other implementations which all, as far as I know, compile to native code, but I list these separately because I'm not as confident of this fact.
In the case of "compiled to native code", how can features like garbage collection be possible without something like a JVM?
GHC-compiled programs include a runtime system that provides some basic capabilities like M-to-N green threading, garbage collection, and an IO manager. In a sense, this is a bit like having "something like a JVM" in that it provides many of the same features, but it's very different in implementation: there is no common bytecode across all architectures (and hence no "virtual machine").
which one of my following understandings of garbage collection in Haskell is correct:
GHC compiles Haskell code to native code. In the processing of compiling, garbage collection routines will be added to the original program code?
There is a program that runs along side a Haskell program to perform garbage collection?
Case 1 is correct: the runtime system code is added to the program code during compilation.
"Managed language" is an overloaded term so here are one-word answers and then some details for the usual different meanings that come to (my) mind:
Managed as in a CLR target
No, Haskell does not compile to Microsoft CLI's IL.
Well, I read there are some solutions that can do that, but imo, don't.. the CLR isn't built for FP and will seriously lack optimizations, probably yielding a research language performance. If I personally would really really want to target the CLR, I'd use F# -- it's not a functional language but it's close.
N.B. This is the most accurate and actual meaning for the term "managed language". The next meanings are, well, wrong, but nevertheless & unfortunately common.
Managed as in automatically garbage-collected
Yes, and this is pretty much a must have. I mean, beyond the specification: If we would have to garbage collect it would destroy the functional theme that makes us work in the high altitudes that are our beloved home.
It would also enforce impurity and a memory model.
Managed as in compiled to bytecode which is ran by a VM
No (usually).
It depends on your backend:
Not only we have different Haskell compilers today, some compilers have different backends -- there are even backends for JavaScript!
So if you do want to target a VM, you can use an existing / make a backend for it. But Haskell doesn't require it. So just as you can compile to native raw-metal binary, you can compile to anything else.
In contrast to CLR languages like C#1, VB.NET, and in contrast to Java, etc. you don't have to target a JVM, the CLR, Mono, etc. as Haskell doesn't require a VM at all.
GHC is a good example. When you compile in GHC, it doesn't compile you straight to binary, it compiles to an intermediate language called Core, and then optimizes from Core to Core for some times before it proceeds to another language called STG, and only then proceeds to code generation (it can stop there if you tell it to).2 And these days you can also use it to compile to LLVM bytecode (which is subject to some awesome optimizations). With the LLVM backend, GHC can produce wildly faster programs. For more information about it and about GHC backends, go here.
The diagram below illustrates the GHC compilation pipeline, and here you can find more information about the various stages.
See the fork at the bottom for three different targets? those are the backends I was referring to.
1 A future exception and a fun fact: Microsoft are currently working on native .NET! the cunningly named: Microsoft .NET Native.
What, for you, is the defining feature of a "managed language"? The phrase "GHC compiles Haskell code either directly to native code or using LLVM as a back-end" that you quote is quite clear about what GHC does, so I suspect the "ambiguity" that bugs you is rather in the term "managed language" than in GHC's docs.
In the case of "compiled to native code", how can features like garbage collection be possible without something like a JVM?
How exactly do you think "something like a JVM" implements features like garbage collection? The JVM isn't magic, it's just a program like everything else. At some level you need to have native code in order for the CPU to execute it, so clearly features like garbage collection are possible in native code.
For where you currently are, it's probably best to think of (GHC) Haskell as "managed," but that the platform GHC compiles to is not targeted by anything else. There is, of course, more to it than that, but that's a sufficient explanation in lieu of more Haskell experience.

Mixing Erlang and Haskell

If you've bought into the functional programming paradigm, the chances are that you like both Erlang and Haskell. Both have purely functional cores and other goodness such as lightweight threads that make them a good fit for a multicore world. But there are some differences too.
Erlang is a commercially proven fault-tolerant language with a mature distribution model. It has a seemingly unique feature in its ability to upgrade its version at runtime via hot code loading. (Way cool!)
Haskell, on the otherhand, has the most sophisticated type system of any mainstream language. (Where I define 'mainstream' to be any language that has a published O'Reilly book so Haskell counts.) Its straightline single threaded performance looks superior to Erlang's and its lightweight threads look even lighter too.
I am trying to put together a development platform for the rest of my coding life and was wondering whether it was possible to mix Erlang and Haskell to achieve a best of breed platform. This question has two parts:
I'd like to use Erlang as a kind of fault tolerant MPI to glue GHC runtime instances together. There would be one Erlang process per GHC runtime. If "the impossible happened" and the GHC runtime died, then the Erlang process would detect that somehow and die too. Erlang's hot code loading and distribution features would just continue to work. The GHC runtime could be configured to use just one core, or all cores on the local machine, or any combination in between. Once the Erlang library was written, the rest of the Erlang level code should be purely boilerplate and automatically generated on a per application basis. (Perhaps by a Haskell DSL for example.) How does one achieve at least some of these things?
I'd like Erlang and Haskell to be able to share the same garabage collector. (This is a much further out idea than 1.) Languages that run on the JVM and the CLR achieve greater mass by sharing a runtime. I understand there are technical limitations to running Erlang (hot code loading) and Haskell (higher kinded polymorphism) on either the JVM or the CLR. But what about unbundling just the garbage collector? (Sort of the start of a runtime for functional languages.) Allocation would obviously still have to be really fast, so maybe that bit needs to be statically linked in. And there should be some mechansim to distinguish the mutable heap from the immutable heap (incuding lazy write once memory) as GHC needs this. Would it be feasible to modify both HIPE and GHC so that the garbage collectors could share a heap?
Please answer with any experiences (positive or negative), ideas or suggestions. In fact, any feedback (short of straight abuse!) is welcome.
Update
Thanks for all 4 replies to date - each taught me at least one useful thing that I did not know.
Regarding the rest of coding life thing - I included it slightly tongue in cheek to spark debate, but it is actually true. There is a project that I have in mind that I intend to work on until I die, and it needs a stable platform.
In the platform I have proposed above, I would only write Haskell, as the boilerplate Erlang would be automatically generated. So how long will Haskell last? Well Lisp is still with us and doesn't look like it is going away anytime soon. Haskell is BSD3 open source and has achieved critical mass. If programming itself is still around in 50 years time, I would expect Haskell, or some continuous evolution of Haskell, will still be here.
Update 2 in response to rvirding's post
Agreed - implementing a complete "Erskell/Haslang" universal virtual machine might not be absolutely impossible, but it would certainly be very difficult indeed. Sharing just the garbage collector level as something like a VM, while still difficult, sounds an order of magnitude less difficult to me though. At the garbage collection model, functional languages must have a lot in common - the unbiquity of immutable data (including thunks) and the requirement for very fast allocation. So the fact that commonality is bundled tightly with monolithic VMs seems kind of odd.
VMs do help achieve critical mass. Just look at how 'lite' functional languages like F# and Scala have taken off. Scala may not have the absolute fault tolerance of Erlang, but it offers an escape route for the very many folks who are tied to the JVM.
While having a single heap makes
message passing very fast it
introduces a number of other problems,
mainly that doing GC becomes more
difficult as it has to be interactive
and globally non-interruptive so you
can't use the same simpler algorithms
as the per-process heap model.
Absolutely, that makes perfect sense to me. The very smart people on the GHC development team appear to be trying to solve part of the problem with a parallel "stop the world" GC.
http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel-gc/par-gc-ismm08.pdf
(Obviously "stop the world" would not fly for general Erlang given its main use case.) But even in the use cases where "stop the world" is OK, their speedups do not appear to be universal. So I agree with you, it is unlikely that there is a universally best GC, which is the reason I specified in part 1. of my question that
The GHC runtime could be configured to
use just one core, or all cores on the
local machine, or any combination in
between.
In that way, for a given use case, I could, after benchmarking, choose to go the Erlang way, and run one GHC runtime (with a singlethreaded GC) plus one Erlang process per core and let Erlang copy memory between cores for good locality.
Alternatively, on a dual processor machine with 4 cores per processor with good memory bandwidth on the processor, benchmarking might suggest that I run one GHC runtime (with a parallel GC) plus one Erlang process per processor.
In both cases, if Erlang and GHC could share a heap, the sharing would probably be bound to a single OS thread running on a single core somehow. (I am getting out of my depth here, which is why I asked the question.)
I also have another agenda - benchmarking functional languages independently of GC. Often I read of results of benchmarks of OCaml v GHC v Erlang v ... and wonder how much the results are confounded by the different GCs. What if choice of GC could be orthogonal to choice of functional language? How expensive is GC anyway? See this devil advocates blog post
http://john.freml.in/garbage-collection-harmful
by my Lisp friend John Fremlin, which he has, charmingly, given his post title "Automated garbage collection is rubbish". When John claims that GC is slow and hasn't really sped up that much, I would like to be able to counter with some numbers.
A lot of Haskell and Erlang people are interested in the model where Erlang supervises distribution, while Haskell runs the shared memory nodes in parallel doing all the number crunching/logic.
A start towards this is the haskell-erlang library: http://hackage.haskell.org/package/erlang
And we have similar efforts in Ruby land, via Hubris: http://github.com/mwotton/Hubris/tree/master
The question now is to find someone to actually push through the Erlang / Haskell interop to find out the tricky issues.
You're going to have an interesting time mixing GC between Haskell and Erlang. Erlang uses a per-process heap and copies data between processes -- as Haskell doesn't even have a concept of processes, I'm not sure how you would map this "universal" GC between the two. Furthermore, for best performance, Erlang uses a variety of allocators, each with slightly tweaked behaviours that I'm sure would affect the GC sub-system.
As with all things in software, abstraction comes at a cost. In this case, I rather suspect you'd have to introduce so many layers to get both languages over their impedance mismatch that you'd wind up with a not very performant (or useful) common VM.
Bottom line -- embrace the difference! There are huge advantages to NOT running everything in the same process, particularly from a reliability standpoint. Also, I think it's a little naive to expect one language/VM to last you for the rest of your life (unless you plan on a.) living a short time or b.) becoming some sort of code monk that ONLY works on a single project). Software development is all about mental agility and being willing to use the best available tools to build fast, reliable code.
Although this is a pretty old thread, if readers are still interested then it's worth taking a look at Cloud Haskell, which brings Erlang style concurrency and distribution to the GHC stable.
The forthcoming distributed-process-platform library adds support for OTP-esque constructs like gen_servers, supervision trees and various other "haskell flavoured" abstractions borrowed from and inspired by Erlang/OTP.
You could use an OTP gen_supervisor process to monitor Haskell instances that you spawn with open_port(). Depending on how the "port" exited, you would then be able to restart it or decide that it stopped on purpose and let the corresponding Erlang process die, too.
Fugheddaboudit. Even these language-independent VMs you speak of have trouble with data passed between languages sometimes. You should just serialize data between the two somehow: database, XML-RPC, something like that.
By the way, the idea of a single platform for the rest of your life is probably impractical, too. Computing technology and fashion change too often to expect that you can keep using just one language forever. Your very question points this out: no one language does everything we might wish, even today.
As dizzyd mentioned in his comment not all data in messages is copied, large binaries exist outside of the process heaps and are not copied.
Using a different memory structure to avoid having separate per-process heaps is certainly possible and has been done in a number of earlier implementations. While having a single heap makes message passing very fast it introduces a number of other problems, mainly that doing GC becomes more difficult as it has to be interactive and globally non-interruptive so you can't use the same simpler algorithms as the per-process heap model.
As long as we use have immutable data-structures there is no problem with robustness and safety. Deciding on which memory and GC models to use is a big trade-off, and unfortunately there universally best model.
While Haskell and Erlang are both functional languages they are in many respects very different languages and have very different implementations. It would difficult to come up with an "Erskell" (or Haslang) machine which could handle both languages efficiently. I personally think it is much better to keep them separate and to make sure you have a really good interface between them.
The CLR supports tail call optimization with an explicit tail opcode (as used by F#), which the JVM doesn't (yet) have an equivalent, which limits the implementation of such a style of language. The use of separate AppDomains does allow the CLR to hot-swap code (see e.g. this blog post showing how it can be done).
With Simon Peyton Jones working just down the corridor from Don Syme and the F# team at Microsoft Research, it would be a great disappointment if we didn't eventually see an IronHaskell with some sort of official status. An IronErlang would be an interesting project -- the biggest piece of work would probably be porting the green-threading scheduler without getting as heavyweight as the Windows Workflow engine, or having to run a BEAM VM on top the CLR.

Resources