Functional languages with concurrent garbage collectors? - garbage-collection

Microsoft's new F# programming language provides the powerful combination of functional programming (first-class lexical closures and tail calls) with an efficient concurrent garbage collector that makes it easy to leverage multicores.
OCaml, Haskell, Erlang and all free Lisp and Scheme implementations that I know of do not have concurrent GCs. Scala and Clojure have a concurrent GC but no tail calls.
So there appear to be no open source programming languages that combine these features. Is that correct?

Erlang has a shared-nothing model where each process has its own garbage collector. Whether you consider that to be concurrent GC or not is up to you, but it certainly scales very well as the number of processes goes up.

The latest version of GHC supports parallel GC; see the release notes.
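As an illustration, here is a minimal sketch of the kind of program that exercises it (this assumes the parallel package for Control.Parallel; the RTS flags are the standard GHC ones):

-- Compile with:  ghc -O2 -threaded ParDemo.hs
-- Run with:      ./ParDemo +RTS -N4 -s
-- (-N picks how many cores the runtime uses; -s prints runtime and GC statistics.)
import Control.Parallel (par, pseq)

nfib :: Int -> Integer
nfib n
  | n < 2     = 1
  | otherwise = nfib (n - 1) + nfib (n - 2)

main :: IO ()
main =
  let a = nfib 33
      b = nfib 32
  in a `par` (b `pseq` print (a + b))  -- spark a, evaluate b, then combine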

Scala has some tail-recursion optimization, but get SISC Scheme for the full thing.

Not really an answer to your question, but to the best of my knowledge, F# uses the standard .NET garbage collector, which is not concurrent; all threads are stopped during GC.
Edit: my mistake, there is a concurrent GC in multiprocessor mode.

Java is supposedly adding tail calls. When that happens, Clojure will get them. In the meantime, you can get them manually with the loop/recur mechanism.

Related

Can I use the OCaml GC with the LLVM GC interface?

For my PHP LLVM backend I'd like to try out the OCaml GC. Is it possible to use it with LLVM? Especially:
Is the OCaml GC decoupled enough to be used outside of the compiler?
Is the LLVM GC interface mature enough to be used with the OCaml GC?
This shouldn't represent too much work, as the OCaml GC is already handled in some way in LLVM: http://llvm.org/docs/GarbageCollection.html#the-erlang-and-ocaml-gcs . This means that stack frame descriptors are correctly emitted for function calls (not the smallest ones, but this should improve with current LLVM GC handling developments). An old version of LLVM's documentation says that the OCaml GC doesn't use write barriers, which is erroneous, so you should be careful to ensure that the generated code is correct for assignments.
As for the LLVM GC interface, the current one is quite restricted and does not allow you to generate very efficient code, but it should be sufficient for prototyping while waiting for the next version, which should contain some important changes on that side.
While it appears as though it would be relatively easy to tear out the OCaml GC and Frankenstein it into a different project, I'm not sure this is really something you would want to do in practice.
The OCaml garbage collector was designed with a functional programming style in mind, and this GC architecture might be a liability for a language such as PHP, which is usually not used in a functional style.
If you are set on doing this then I would suggest either waiting a few months for multicore support to be accepted into the OCaml compiler/runtime or using one of the various projects trying to bring multicore support to OCaml at the moment (the most serious of these would probably be this project by the people over at OCamllabs). Right now the OCaml GC lacks true multicore support, and while this isn't really much of a problem in practice, some people can't seem to live without it.

threads and high level languages

Can someone tell me why, if I use threads, it's better to use a low-level language like C++
rather than C# or Java? Someone asked me that in an interview and I didn't know the answer.
It's news to me. Higher-level languages provide easy-to-use abstractions over thread management, for example.
I expect the interviewer's point would make sense in context. It's dependent on the problem in hand - the level of timing control you need if you're writing a computer game or software for an engine management system may be greater than if you are writing a conference room booking system.
You trade off the low-level control and the associated learning curve and risk you get with lower-level languages for ease of use, safety and productivity of higher-level languages.
I don't think this is necessarily true. In Java (I can't comment on C#) a thread maps directly to a native thread. From here:
The Java HotSpot™ virtual machine currently associates each Java thread with a unique native thread. The relationship between the Java thread and the native thread is stable and persists for the lifetime of the Java thread.
Plus you have additional high-level constructs such as the Executor framework.
Going forward, functional languages (such as F# and Scala) encourage immutability, which contributes to a safer threaded environment.
There may well be scenarios where a low-level language offers more control (as with most requirements), but I suspect those will be fairly specialised situations. You have to balance that against the safety/productivity that the higher-level languages offer.
EDIT: From your comments supplementing the question, this may relate to running a garbage collector, the consequent garbage-collection pauses, and the impact on providing real-time performance and predictability. Threading in C/C++ may well offer some benefits in this area, since a garbage collection cycle is not going to kick off during some critical time-dependent code. For this reason (amongst others) Java can't be considered a real-time platform.
Like most answers: it depends. Languages with built-in threading facilities like C# and Java
will do some or most of the work needed for thread usage and synchronization for you.
With C++ you have to do it yourself, but you can employ better optimization techniques for your specific OS and platform.
Whether you use threads or not depends solely on the application, not on the language; and the language is a function of the design.
C++ provides more control, C# provides more abstraction, Java provides simplification, but in the end they all work the same way.

Mixing Erlang and Haskell

If you've bought into the functional programming paradigm, the chances are that you like both Erlang and Haskell. Both have purely functional cores and other goodness such as lightweight threads that make them a good fit for a multicore world. But there are some differences too.
Erlang is a commercially proven fault-tolerant language with a mature distribution model. It has a seemingly unique feature in its ability to upgrade its version at runtime via hot code loading. (Way cool!)
Haskell, on the other hand, has the most sophisticated type system of any mainstream language. (Where I define 'mainstream' to be any language that has a published O'Reilly book, so Haskell counts.) Its straight-line single-threaded performance looks superior to Erlang's, and its lightweight threads look even lighter too.
I am trying to put together a development platform for the rest of my coding life and was wondering whether it was possible to mix Erlang and Haskell to achieve a best of breed platform. This question has two parts:
I'd like to use Erlang as a kind of fault tolerant MPI to glue GHC runtime instances together. There would be one Erlang process per GHC runtime. If "the impossible happened" and the GHC runtime died, then the Erlang process would detect that somehow and die too. Erlang's hot code loading and distribution features would just continue to work. The GHC runtime could be configured to use just one core, or all cores on the local machine, or any combination in between. Once the Erlang library was written, the rest of the Erlang level code should be purely boilerplate and automatically generated on a per application basis. (Perhaps by a Haskell DSL for example.) How does one achieve at least some of these things?
I'd like Erlang and Haskell to be able to share the same garbage collector. (This is a much further out idea than 1.) Languages that run on the JVM and the CLR achieve greater mass by sharing a runtime. I understand there are technical limitations to running Erlang (hot code loading) and Haskell (higher-kinded polymorphism) on either the JVM or the CLR. But what about unbundling just the garbage collector? (Sort of the start of a runtime for functional languages.) Allocation would obviously still have to be really fast, so maybe that bit needs to be statically linked in. And there should be some mechanism to distinguish the mutable heap from the immutable heap (including lazy write-once memory), as GHC needs this. Would it be feasible to modify both HiPE and GHC so that the garbage collectors could share a heap?
Please answer with any experiences (positive or negative), ideas or suggestions. In fact, any feedback (short of straight abuse!) is welcome.
Update
Thanks for all 4 replies to date - each taught me at least one useful thing that I did not know.
Regarding the "rest of my coding life" thing: I included it slightly tongue in cheek to spark debate, but it is actually true. There is a project that I have in mind that I intend to work on until I die, and it needs a stable platform.
In the platform I have proposed above, I would only write Haskell, as the boilerplate Erlang would be automatically generated. So how long will Haskell last? Well Lisp is still with us and doesn't look like it is going away anytime soon. Haskell is BSD3 open source and has achieved critical mass. If programming itself is still around in 50 years time, I would expect Haskell, or some continuous evolution of Haskell, will still be here.
Update 2 in response to rvirding's post
Agreed: implementing a complete "Erskell/Haslang" universal virtual machine might not be absolutely impossible, but it would certainly be very difficult indeed. Sharing just the garbage collector level as something like a VM, while still difficult, sounds an order of magnitude less difficult to me though. At the garbage collection level, functional languages must have a lot in common: the ubiquity of immutable data (including thunks) and the requirement for very fast allocation. So the fact that this commonality is bundled tightly into monolithic VMs seems kind of odd.
VMs do help achieve critical mass. Just look at how 'lite' functional languages like F# and Scala have taken off. Scala may not have the absolute fault tolerance of Erlang, but it offers an escape route for the very many folks who are tied to the JVM.
While having a single heap makes message passing very fast it introduces a number of other problems, mainly that doing GC becomes more difficult as it has to be interactive and globally non-interruptive so you can't use the same simpler algorithms as the per-process heap model.
Absolutely, that makes perfect sense to me. The very smart people on the GHC development team appear to be trying to solve part of the problem with a parallel "stop the world" GC.
http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel-gc/par-gc-ismm08.pdf
(Obviously "stop the world" would not fly for general Erlang given its main use case.) But even in the use cases where "stop the world" is OK, their speedups do not appear to be universal. So I agree with you, it is unlikely that there is a universally best GC, which is the reason I specified in part 1. of my question that
The GHC runtime could be configured to use just one core, or all cores on the local machine, or any combination in between.
In that way, for a given use case, I could, after benchmarking, choose to go the Erlang way, and run one GHC runtime (with a single-threaded GC) plus one Erlang process per core and let Erlang copy memory between cores for good locality.
Alternatively, on a dual processor machine with 4 cores per processor with good memory bandwidth on the processor, benchmarking might suggest that I run one GHC runtime (with a parallel GC) plus one Erlang process per processor.
In both cases, if Erlang and GHC could share a heap, the sharing would probably be bound to a single OS thread running on a single core somehow. (I am getting out of my depth here, which is why I asked the question.)
I also have another agenda: benchmarking functional languages independently of GC. Often I read the results of benchmarks of OCaml vs GHC vs Erlang vs ... and wonder how much the results are confounded by the different GCs. What if the choice of GC could be orthogonal to the choice of functional language? How expensive is GC anyway? See this devil's-advocate blog post
http://john.freml.in/garbage-collection-harmful
by my Lisp friend John Fremlin, which he has, charmingly, given the title "Automated garbage collection is rubbish". When John claims that GC is slow and hasn't really sped up that much, I would like to be able to counter with some numbers.
A lot of Haskell and Erlang people are interested in the model where Erlang supervises distribution, while Haskell runs the shared memory nodes in parallel doing all the number crunching/logic.
A start towards this is the haskell-erlang library: http://hackage.haskell.org/package/erlang
And we have similar efforts in Ruby land, via Hubris: http://github.com/mwotton/Hubris/tree/master
The question now is to find someone to actually push through the Erlang / Haskell interop to find out the tricky issues.
You're going to have an interesting time mixing GC between Haskell and Erlang. Erlang uses a per-process heap and copies data between processes -- as Haskell doesn't even have a concept of processes, I'm not sure how you would map this "universal" GC between the two. Furthermore, for best performance, Erlang uses a variety of allocators, each with slightly tweaked behaviours that I'm sure would affect the GC sub-system.
As with all things in software, abstraction comes at a cost. In this case, I rather suspect you'd have to introduce so many layers to get both languages over their impedance mismatch that you'd wind up with a not very performant (or useful) common VM.
Bottom line -- embrace the difference! There are huge advantages to NOT running everything in the same process, particularly from a reliability standpoint. Also, I think it's a little naive to expect one language/VM to last you for the rest of your life (unless you plan on a.) living a short time or b.) becoming some sort of code monk that ONLY works on a single project). Software development is all about mental agility and being willing to use the best available tools to build fast, reliable code.
Although this is a pretty old thread, if readers are still interested then it's worth taking a look at Cloud Haskell, which brings Erlang style concurrency and distribution to the GHC stable.
The forthcoming distributed-process-platform library adds support for OTP-esque constructs like gen_servers, supervision trees and various other "haskell flavoured" abstractions borrowed from and inspired by Erlang/OTP.
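To give a flavour, a minimal sketch with distributed-process plus network-transport-tcp looks roughly like this (note that createTransport's argument list has changed across network-transport-tcp versions; this follows the older two-argument form from the early Cloud Haskell tutorials, so adjust it for your version):

import Control.Distributed.Process
import Control.Distributed.Process.Node
import Network.Transport.TCP (createTransport, defaultTCPParameters)

main :: IO ()
main = do
  -- one local "node" hosting many lightweight, Erlang-style processes
  Right transport <- createTransport "127.0.0.1" "10501" defaultTCPParameters
  node <- newLocalNode transport initRemoteTable
  runProcess node $ do
    self <- getSelfPid
    echo <- spawnLocal $ do
      (from, msg) <- expect :: Process (ProcessId, String)
      send from ("echo: " ++ msg)       -- reply to whoever wrote to us
    send echo (self, "hello")
    reply <- expect :: Process String
    liftIO (putStrLn reply)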
You could use an OTP gen_supervisor process to monitor Haskell instances that you spawn with open_port(). Depending on how the "port" exited, you would then be able to restart it or decide that it stopped on purpose and let the corresponding Erlang process die, too.
Fugheddaboudit. Even these language-independent VMs you speak of have trouble with data passed between languages sometimes. You should just serialize data between the two somehow: database, XML-RPC, something like that.
By the way, the idea of a single platform for the rest of your life is probably impractical, too. Computing technology and fashion change too often to expect that you can keep using just one language forever. Your very question points this out: no one language does everything we might wish, even today.
As dizzyd mentioned in his comment, not all data in messages is copied; large binaries exist outside of the process heaps and are not copied.
Using a different memory structure to avoid having separate per-process heaps is certainly possible and has been done in a number of earlier implementations. While having a single heap makes message passing very fast it introduces a number of other problems, mainly that doing GC becomes more difficult as it has to be interactive and globally non-interruptive so you can't use the same simpler algorithms as the per-process heap model.
As long as we have immutable data structures, there is no problem with robustness and safety. Deciding which memory and GC models to use is a big trade-off, and unfortunately there is no universally best model.
While Haskell and Erlang are both functional languages, they are in many respects very different languages and have very different implementations. It would be difficult to come up with an "Erskell" (or Haslang) machine which could handle both languages efficiently. I personally think it is much better to keep them separate and to make sure you have a really good interface between them.
The CLR supports tail-call optimization with an explicit tail opcode (as used by F#), for which the JVM doesn't (yet) have an equivalent; this limits the implementation of such a style of language. The use of separate AppDomains does allow the CLR to hot-swap code (see e.g. this blog post showing how it can be done).
With Simon Peyton Jones working just down the corridor from Don Syme and the F# team at Microsoft Research, it would be a great disappointment if we didn't eventually see an IronHaskell with some sort of official status. An IronErlang would be an interesting project; the biggest piece of work would probably be porting the green-threading scheduler without getting as heavyweight as the Windows Workflow engine, or having to run a BEAM VM on top of the CLR.

Which scripting languages support multi-core programming?

I have written a little Python application; a Task Manager screenshot from a typical run (source: weinzierl.name) shows that, while the application is perfectly multithreaded, it unsurprisingly uses only one CPU core.
Despite the fact that most modern scripting languages support multithreading, their scripts can run on one CPU core only.
Ruby, Python, Lua, PHP all can only run on a single core.
Even Erlang, which is said to be especially good for concurrent programming, is affected.
Is there a scripting language that has built-in support for threads that are not confined to a single core?
WRAP UP
Answers were not quite what I expected, but the TCL answer comes close.
I'd like to add Perl, which (much like TCL) has interpreter-based threads.
Jython, IronPython and Groovy fall under the umbrella of combining a proven language with the proven virtual machine of another language. Thanks for your hints in this direction.
I chose Aiden Bell's answer as Accepted Answer.
He does not suggest a particular language but his remark was most insightful to me.
You seem to use a definition of "scripting language" that may raise a few eyebrows, and I don't know what that implies about your other requirements.
Anyway, have you considered TCL? It will do what you want, I believe.
Since you are including fairly general-purpose languages in your list, I don't know how heavy an implementation is acceptable to you. I'd be surprised if one of the zillion Scheme implementations doesn't do native threads, but off the top of my head I can only remember that MzScheme used to, and I seem to remember that support was dropped. Certainly some of the Common Lisp implementations do this well. If Embeddable Common Lisp (ECL) does, it might work for you; I don't use it though, so I'm not sure what the state of its threading support is, and this may of course depend on the platform.
Update: Also, if I recall correctly, GHC Haskell doesn't do quite what you are asking, but may do effectively what you want since, again, as I recall, it will spin up a native thread per core or so and then run its lightweight threads across those.
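A minimal sketch of what that looks like from the programmer's side (only standard modules from base; compile with -threaded and run with +RTS -N so the runtime gets one OS thread per core):

import Control.Concurrent (forkIO, getNumCapabilities, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)

main :: IO ()
main = do
  caps <- getNumCapabilities          -- how many cores the RTS was given (+RTS -N)
  putStrLn ("Running on " ++ show caps ++ " capabilities")
  dones <- forM [1 .. 8 :: Int] $ \i -> do
    done <- newEmptyMVar
    _ <- forkIO $ do                  -- a cheap lightweight (green) thread
      let s = sum [1 .. 1000000 * i]  -- some CPU-bound busywork
      s `seq` putMVar done (i, s)
    return done
  forM_ dones (\d -> takeMVar d >>= print)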
You can freely multi-thread with the Python language in implementations such as Jython (on the JVM, as #Reginaldo mentions Groovy is) and IronPython (on .NET). For the classical CPython implementation of the Python language, as #Dan's comment mentions, multiprocessing (rather than threading) is the way to freely use as many cores as you have available.
Thread syntax may be static, but implementation across operating systems and virtual machines may change.
Your scripting language may use true threading on one OS and fake threads on another.
If you have performance requirements, it might be worth checking that the scripted threads fall through to the most beneficial layer in the OS. Userspace threads will be faster, but for largely blocking thread activity kernel threads will be better.
As Groovy is based on the Java virtual machine, you get support for true threads.
F# on .NET 4 has excellent support for parallel programming and extremely good performance as well as support for .fsx files that are specifically designed for scripting. I do all my scripting using F#.
An answer for this question has already been accepted, but just to add that besides tcl, the only other interpreted scripting language that I know of that supports multithreading and thread-safe programming is Qore.
Qore was designed from the bottom up to support multithreading; every aspect of the language is thread-safe; the language was designed to support SMP scalability and multithreading natively. For example, you can use the background operator to start a new thread or the ThreadPool class to manage a pool of threads. Qore will also throw exceptions with common thread errors so that threading errors (like potential deadlocks or errors with threading APIs like trying to grab a lock that's already held by the current thread) are immediately visible to the programmer.
Qore additionally supports thread resources; for example, a DatasourcePool allocation is treated as a thread-local resource; if you forget to commit or roll back a transaction before you end your thread, the thread resource handling for the DatasourcePool class will roll back the transaction automatically and throw an exception with user-friendly information about the problem and how it was solved.
Maybe it could be useful for you - an overview of Qore's features is here: Why use Qore?.
CSScript in combination with Parallel Extensions shouldn't be a bad option. You write your code in pure C# and then run it as a script.
It is not related to the threading mechanism. The problem is that (for example in Python) you have to get an interpreter instance to run the script. To acquire the interpreter you have to lock it, since it maintains reference counts and so on and must avoid concurrent access to these objects. Python uses pthreads, and they are real threads, but when you are working with Python objects only one thread runs while the others wait. This is called the GIL (Global Interpreter Lock), and it is the main problem that makes real parallelism impossible inside a single process.
https://wiki.python.org/moin/GlobalInterpreterLock
The other scripting languages may have kind of the same problem.
Guile supports POSIX threads which I believe are hardware threads.

What high level languages support multithreading? [closed]

I'm wondering which languages support (or don't support) native multithreading, and perhaps get some details about the implementation. Hopefully we can produce a complete overview of this specific functionality.
Erlang has built-in support for concurrent programming.
Strictly speaking, Erlang processes are greenlets. But the language and virtual machine are designed from the ground up to support concurrency. The language has specific control structures for asynchronous inter-process messaging.
In Python, greenlet is a third-party package that provides lightweight threads and channel-based messaging. But it does not bear the comparison with Erlang.
I suppose that the list of languages that are higher-level than Haskell is pretty short, and it has pretty good support for concurrency and parallelism.
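To give a concrete taste of that support, here is a minimal GHC sketch using software transactional memory (this assumes the stm package; the final sleep is a crude stand-in for properly waiting on the threads):

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM (atomically, newTVarIO, readTVarIO, modifyTVar')
import Control.Monad (replicateM_)

main :: IO ()
main = do
  counter <- newTVarIO (0 :: Int)
  replicateM_ 10 $ forkIO $                       -- ten lightweight threads
    replicateM_ 1000 $ atomically (modifyTVar' counter (+ 1))
  threadDelay 500000                              -- crude wait instead of a proper join
  readTVarIO counter >>= print                    -- should print 10000 once all threads finish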
With CPython, one has to remember the GIL. To summarize: only one processor is used, even on multiprocessor machines. There are multiple ways around this, as the comment shows.
Older versions of C and C++ (namely, C89, C99, C++98, and C++03) have no support at all in the core language, although libraries such as POSIX threads are available for pretty much every platform in common use today.
The newest versions of C and C++, C11 and C++11, do have built-in threading support in the language, but it's an optional feature of C11, so implementations such as single-core embedded systems can choose not to support it while supporting the rest of C11 if they desire.
Delphi/FreePascal also has support for threads.
I'll assume, from other answers, that it's only native on the Windows platforms.
Some nice libraries that implement better features on top of the TThread Object:
OmniThreadLibrary
BMThread
Clojure is an up-and-coming Lisp dialect for the JVM that is specifically designed to handle concurrency well.
It features a functional-style API, some very efficient implementations of various immutable data structures, and an agent system (a bit like actors in Scala and processes in Erlang). It even has software transactional memory.
All in all, Clojure goes to great lengths to help you write correct multithreaded and concurrent code.
I believe that the official Squeak VM does not support native (OS) threads, but that the GemStone version does.
(Feel free to edit this if not correct).
You need to define "native" in this context.
Java claims some sort of built-in multithreading, but it is just based on coarse-grained locking and some library support. At the moment, it is no more 'native' than C with POSIX threads. The next version of C++ (C++0x) will include a threading library as well.
I know Java and C# support multithreading and that the next version of C++ will support it directly... (The planned implementation is available as part of the boost.org libraries...)
Boost::thread is great; I'm not sure whether you can say it's part of the language though. It depends on whether you consider the CRT/STL/Boost to be 'part' of C++, or an optional add-on library.
(otherwise practically no language has native threading as they're all a feature of the OS).
This question doesn't make sense: whether a particular implementation chooses to implement threads as native threads or green threads has nothing to do with the language; that is an internal implementation detail.
There are Java implementations that use native threads and Java implementations that use green threads.
There are Ruby implementations that use native threads and Ruby implementations that use green threads.
There are Python implementations that use native threads and Python implementations that use green threads.
There are even POSIX Thread implementations that use green threads, e.g. the old LinuxThreads library or the GNU pth library.
And just because an implementation uses native threads doesn't mean that these threads can actually run in parallel; many implementations use a Global Interpreter Lock to ensure only one thread can run at a time. On the other hand, using green threads doesn't mean that they can't run in parallel: the BEAM Erlang VM for example can schedule its green threads (more precisely green processes) across multiple CPU cores, and the same is planned for the Rubinius Ruby VM.
Perl doesn't usefully support native threads.
Yes, there is a Perl threads module, and yes it uses native platform threads in its implementation. The problem is, it isn't very useful in the general case.
When you create a new thread using Perl threads, it copies the entire state of the Perl interpreter. This is very slow and uses lots of RAM. In fact it's probably slower than using fork() on Unix, as the latter uses copy-on-write and Perl threads do not.
But in general each language has its own threading model, and some are quite different from others. Python (mostly) uses native platform threads but has a big lock which ensures that only one thread runs Python code at once. This actually has some advantages.
Aren't threads out of fashion these days in favour of processes? (Think Google Chrome, IE8)
I made a multithreading extension for Lua recently, called Lua Lanes. It merges multithreading concepts so naturally into the language that I would not see 'built-in' multithreading being any better.
For the record, Lua's built-in co-operative multithreading (coroutines) can often also be used, with or without Lanes.
Lanes has no GIL, and runs code in separate Lua universes per thread. Thus, unless your C libraries crash, it is immune to the problems associated with thread usage. In fact, the concept is more like processes and message passing, although only one OS process is used.
Finally, Go is here, with multithreading built in via goroutines. People say it is structured along the lines of C. It is also easy to use and understand.
Perl and Python do. Ruby is working on it, but the threads in Ruby 1.8 are not really threads.
