Haskell on JVM? - haskell

I'm wondering if there is some way to make Haskell run on the JVM (compiled or interpreted)?
There exists JHaskell on Sourceforge but this one seems to be empty and dead.
GHC uses LLVM as compiler backend. Would it be a good idea or possible to compile LLVM to Java bytecode? Or maybe use a different compiler backend?

You may want to investigate Frege. Quoting from that page:
"Frege is a non-strict, pure functional programming language in the spirit of Haskell."
"Frege programs are compiled to Java and run in a JVM."
Based on a brief perusal of the language specification, Frege looks to be nearly a Haskell clone. Perhaps the phrase "in the spirit of Haskell" is simpy intended to set the proper expectation.

Haskell works beautifully on the JVM. See Eta, a project that brings full GHC 7.10.3 Haskell onto the JVM with type-safe Java interop.

The only language I know that is close to haskell in the JVM is CAL. CAL is heavily based on haskell but it doesn't have all haskell's features. The type system is similar to Haskell 98, and syntactic sugar like do notation is missing.
Here's a comparison of Haskell and CAL: CAL for Haskell Programmers
The eclipse plugin is very polished and useful.
Note that CAL is part of the Open Quark framework.
main site: http://openquark.org/Welcome.html
download page: http://openquark.org/Download.html
Source on Github: https://github.com/levans/Open-Quark

There are big but surmountable impediments to GHC building to the JVM:
http://www.haskell.org/haskellwiki/GHC:FAQ#Why_isn.27t_GHC_available_for_.NET_or_on_the_JVM.3F
(Got a spare year or two to make it happen?)

Related

Confusions arising from a programming language whose compiler is written in itself

By accident I knew that the compiler of Haskell is written in Haskell. It sounds strange to me. How is this possible, I mean, to compile itself? Who is to compile the compiler then? What is the ultimate code accepted by machine?
Consider the programming language who is the first to have a compiler. What is the language of its compiler? Going back even farther in time, how did people program before the era of compiler?
Broadly speaking, I am often confused about the border between software (e.g. programming written by people) and hardware (e.g. something executable on a physical machine).
P.S.: I have basic knowledge about compiler such as lexical analysis, parsing, and code optimization. However, I know little about hardware (the machine).
It seems that the answer to a related post Implementing a compiler in “itself” does not go deeply into the border between software and hardware.
And I would like to see some concrete examples.
EDIT: Some comments mentioned the term "bootstrapping". It seems that there is some minimum core part of a language (like axioms/basic theorems in mathematics) which must be compiled in a lower-level way (instead of by itself). What are they? Are they basically the same in different languages? Again, I would like to see some concrete examples.
As you can read in A history of Haskell page 28, the first haskell compiler was written in Lazy ML in June 1989. It implemented essentially all of Haskell 1.0.
Now that this compiler existed, it can then be used to compile the Haskell version of GHC. The first beta of GHC written in Haskell was release on 1 April 1991. The full release came in December 1992.
Because the Lazy ML-based compiler wasn't developed further, today you use a previous version of GHC to compile GHC. So if you want to build GHC 7.8, you use GHC 7.6 to build it (in practice, it's a bit more complicated, because there are multiple stages and only the first stage, which doesn't support GHCi or TemplateHaskell is built with GHC 7.6)
That means that if you don't have a working haskell compiler today, you have two options:
Try to install a LML compiler and compile the first version of GHC written in Lazy ML. Then use this compiler to compile the next version which is written in Haskell. Then again use that compiler to build the next version, and repeat until you have a reasonably recent compiler. It may be possible to skip a few versions, but I don't know how many. As you can imagine, this could take a lot of time.
(Much easier) Download pre-built GHC binaries.
Um... I have not tried this, but another route would be simply compiling to c and using a c compiler to compile latest ghc...ghc is itself built in stages, so you don't really even need to convert the whole code base to c, just the first stage, which then can compile the rest. Certainly no need to dig up Lazy ML.
Edit: Note the resulting compiler will not build binaries targeting the new platform, it would simply run on that platform and be a cross-platform compiler for targets that ghc already has backends for. Another note is that i actually intended this in response to bennofs answer, not as stand alone answer to the OP.

Compiled interpreted language

Is there a programming language, having usable interactive interpreter, even as it can be compiled to machine code?
Compilation vs. "interpretation" is essentially a matter of implementation, not the language itself. For example, MRI Ruby 1.8 is interpreted, while MacRuby is compiled to native machine code. Both include an interactive REPL. All the languages I know that have at least one machine-code compiler and at least one REPL:
Ruby
Python
Almost all Lisps (Lisp was the language that pioneered this technique, AFAIK)
OCaml
Haskell
Forth
If we're counting compilation to bytecode as well as machine code, it's true of the vast majority of popular bytecode-compiled languages:
Java
Scala
Groovy
Erlang
C#
F#
Smalltalk
Haskell, using the Glasgow Haskell Compiler which has an interactive "shell" called GHCi.
Many flavors of Lisp offer both options, including Clojure.
Two come to my mind : ocaml and scala (~= java), but I'm sure there must be a lot more out there.
And here's another one to burn your house down:
x86 Assembly
Yup, there are interpreters for this as well.
Javascript x86 Assembly Interpreter
Jasmin
At this point you're really in emulator land, but it does meet the requirements you state.
I'm wondering if it's easier to name compiled languages that someone hasn't cobbled up a working interpreter for. :-)
Lua has an interactive mode for one-liners and experimentation. It normally compiles to bytecode for its VM for execution. LuaJIT is an independent implementation of a Lua VM that also does just-in-time compilation to 32-bit x86. Support for 64-bit is underway, and support for ARM is frequently requested.
Compilation to a bytecode is often a reasonable compromise between a pure interpreter and a pure compiler. The VM can be tuned to the needs of the language, and JIT techniques can analyze the VM code as it executes and concentrate on frequently executed code paths and inner loops.
As others have mentioned, OCaml.
If managed code (.NET CLI) is close enough to machine code, F# would be a candidate as well. There are probably other .NET/Mono languages which meet the requirement as well.
You may regret you asked:
C and C++.
Why?
Ch
CINT
EIC
picocc
and there are probably others out there as well.
Plenty of languages offer an implementation that both interacts and compiles to machine code, but it's rare to do both at once. Standard ML of New Jersey is one that has an interactive loop but no bytecode: it simply compiles to machine code in memory and then branches to it.
Not exactly machine code, but Java can be compiled and also used via BeanShell.
I've used Ruby with an interpreter, and there seems to be a compiler here.
Icon used to have a compiler, but it falls in and out of maintenence. It may still work.
Python can be compiled to windows executables.
C# can be compiled by using SnippetCompiler, maybe this would act as an interactive interpreter for you?
Your question is a bit vague. Even Java would fit it:
by interactive interpreter, i mean
shell-like environment, where you can
work in the runtime interactively.
Java has this, e.g. in the Eclipse "scrapbook pages", where you can enter Java expressions and have them evaluated right away. Java is of course also a compiled language (and while it's usually compiled to bytecode, there are various compilers that output machine code).
So what are you looking for? Maybe you could explain your problem or interest.
I tried using mono/.net for a bit and found random GC pauses to be disagreeable (at least on my crusty old laptop). I looked at using gambit-c an implementation of scheme that can compile to C but it seemed difficult to work with because the docs were somewhat limited and the packages where not very easy to install and use.
I usually just stick to having an interpreted language such as python bound to C/C++ which is more painful but at least I know what I am in for.

How can a programming language be "implemented"?

maybe this is just a little misunderstanding but how can a programming language be implemented?
I'm not talking about how to implement my own programming language but about the word "implemented"?
I mean, you can implement a compiler or an interpreter, but a programming language?
What does it mean if I read "C++ is implemented in C" or "Python was implemented in C"?
I think a language is more sth. like a protocol of how someone thinks about things should be implemented. For example, if he wants do display a messagebox he can say the command for this is ShowMessageBox(string) and implement a compiler who will translate this into something that works on a computer (aside from the selected programming paradigms he imagines).
I think this question leads to the question "what is a programming language in reality"? A compiler, an interpreter or just a documented language standard about how things should be implemented in a language?
[EDIT]
Answer: Languages are never implemented, only compilers/interpreters etc. It's this simple.
Here's a very academic answer (from a longtime academic).
First I'll reframe the question:
What does it mean for a programming language to be implemented?
I'll start with "what is a programming language":
A programming language is a formal language (a set of utterances we can characterize precisely through algorithmic rules) such that a sentence in the language has a computational meaning. There are a variety of ways to give computation meaning; two of the most popular are that a computation stands for a function (from values to values, or from machine states to machine states) and that a computation stands for a machine that makes "state transitions" and interacts with the outside world.
A language is implemented when a means is provided to read in an utterance and perform the computation, that is, calculate the function or perform the behavior. The means is the implementation.
Typical implementations include
Direct interpretation of the language syntax. This model is rare but FORTH probably comes closest to it.
Translation of the syntax into virtual-machine code, also called bytecode, which is itself another language and which is interpreted. It is popular to write bytecode interpreters in C. Lua, Perl, Python, and Ruby are implemented more or less this way.
Translation of the syntax into hardware machine instructions, which is itself another language, and which is interpreted by your CPU. C and C++ are typically (but not always) implemented this way.
Direct interpretation of the language in hardware. IA-32 machine code and AMD64 machine code are implemented this way.
When a person says "Language X is implemented in Y", they are usually saying that a translator for X or an interpreter for X's bytecode is written in language Y.
One of the great secrets of compiler writers is the ability to write the compiler for language X in language X itself. If this interests you, get Andrew Appel's paper Axiomatic Bootstrapping: A Guide for Compiler Hackers.
Sometimes the answer to this question is not obvious. Squeak Smalltalk writes both a translator and a bytecode interpreter in Smalltalk, then translates the interpreter to C, which is translated to machine code. What is Squeak implemented in? Smalltalk.
Poke a professor; get a lecture.
You are right, those statements don't make any sense. It's pretty obvious that whoever made those statements doesn't understand the difference between a programming language and a compiler (or interpreter).
This is a surprisingly common problem. For example, sometimes people talk about interpreted languages or compiled languages. That's the same thing: languages aren't interpreted or compiled, they just are. Interpretation and compilation are traits of the implementation not the language.
Another goodie: Python has a GIL. No, it doesn't: one implementation of Python has a GIL, all the other implementations don't, and the Python Language itself certainly doesn't. Or: Ruby has green threads. Again, not true: Ruby has threads. Period. Whether any particular language implementation chooses to implement them as green threads, native threads, platform threads or whatever, is a trait of that particular implementation, not of Ruby. And of course my favorite: Ruby 1.9 is faster than Ruby 1.8. This doesn't even make sense: Ruby 1.9 and Ruby 1.8 are programming languages, i.e. a bunch of abstract mathematical rules. You cannot run a programming language, therefore a programming language can never be "faster" or "slower" than another one.
The most blatant confusion about the difference between programming languages and implementations is the Computer Language Benchmark Game, which claims to benchmark languages against each other but in fact benchmarks implementations.
All of these are just different expressions of the fact that apparently some people seem to be fundamentally incapable of grasping the concept of abstraction. Or at least the concept of having an abstract language and a concrete implementation of that language.
If we go back to the statement that "Python is implemented in C", it should now be obvious that that statement is not just wrong. If the statement were wrong that would imply that the statement even makes sense, i.e. that there is some possible world out there, in which it could at least theoretically be right. But that's not the case. The statement is neither wrong nor right, it simply doesn't make sense. If English were a typed language, it would be a type error.
Python is a programming language. Programming languages aren't implemented in anything. They are just implemented. Compilers and interpreters are implemented in languages. But even if you interpret the statement this way, it isn't true: Jython is implemented in Java, IronPython is implemented in C#, PyPy is implemented in RPython and Python, Pynie is implemented in PGE, NQP and PIR. (Oh, and all of those implementations have compilers, so there goes your "Python is an interpreted language".) Similar with Ruby: Rubinius is implemented in Ruby and C++, JRuby and XRuby are implemented in Java, IronRuby and Ruby.NET are implemented in C#, HotRuby is implemented in ECMAScript, Red Sun is implemented in ActionScript, RubyGoLightly is implemented in Go, Cardinal is implemented in PGE, NQP and PIR, SmallRuby is implemented in Smalltalk/X, MagLev is implemented in GemStone Smalltalk and Ruby, YARI is implemented in Io. And for C++: Clang (which is the C, C++ and Objective-C front-end for LLVM) is implemented in C++ (all three front-ends are implemented in C++).
"C++ is implemented in C". I understand this as "C++ compiler is written in C language". Quite simple, without too much philosophy.
Generally, C++ compiler can be written in any language, including C++ itself (except of the first compiler version).
"Python was implemented in C" means that at least one Python compiler (in this case the most commonly used one) is written using C. The developers of that implementation of Python made a deliberate decision not to use C++. As a statement it is incomplete as Python has also been implemented in Java, in C# and in Python.
The main relevance is that it gives you some idea of the systems you might be able to port the language onto: anything targeted by a C compiler should (at least in theory) be capable of running the C implementation of Python, but if they'd chosen to use C++ there would be a smaller set of systems that could run it.
C++ usually isn't implemented in C these days: I believe it is usually implemented in C++. It is quite common for languages to be implemented in the same language (or a subset of the language) as it means you are no longer dependent on some other unrelated language being available for the target. To bootstrap onto a new system you cross compile from some other system.
If you compile gcc for a new platform the build process involves compiling the source code once with whatever compiler is already available (perhaps an older gcc), then compiling it a second time with the newly compiled compiler, then compiling it a third time with the output from the second compilation. If the second and third versions aren't identical you get a build error. If they are identical then you've got a pretty good indication that it compiled correctly.
A programming language is a standard. Its interpreter or compiler is an implementation of this standard.
To build a new language, you don't necessarily needs to do in in low level machine code (assembly for instance). So, using another language to accomplish your goal (creating a new language here) is perfectly normal. So, when we say: Python was implemented in C, it just means that C was used to create that language. For instance, C can be complied on many different architecture, so the programmers doesn't have to take care of the different type of computers (portable).
A language is just a way to express yourself to the computer. Today, it can be done in various ways. But when you use the same syntax as the language and create your own framework, it's called a library or framework. A programming language is just a notation for writing program. If the notation change, you have a different language. Like French or Spanish comes from Latin. (French is implemented in Latin ;)
Why is there so many different languages? Because the goal of a language is to solve complex problems. So, depending on what you want to try yo accomplish, choosing the appropriate language can be an important decision.
The statement "Language X is implemented in Language Y" makes sense and is true if and only if there exists a canonical implementation of Language X and that implementation is written in Language Y. In common usage, either the first or the most popular implementation is often assumed to be canonical.
For example, Perl is one of the few languages with a definitive canon. "Python is implemented in C" makes sense if CPython is taken to be the canonical implementation of Python, and "C++ is implemented in C" is true for CFront, the original implementation of "C with classes" by Bjarne Stroustrup.
The direct answer:
Implementation in the context you are talking about just means written and language actually means compiler.
The original C++ compiler was as I understand it written in C. There is nothing (apart from knowledge and time) to stop you from writing a C++ compiler in another language.
Implementation is the code that makes software work. Often we talk about the implementation of a function as in: "the function has not been implemented yet."
eg
void foo()
{
//function has not been implemented yet
throw();
}
This often happens during the design phase of a program because the call needs to be there in order to write/debug/concept test the calling code but we haven't got round to implementing (writing the code to go insde the function)

When choosing a functional programming language for use with LLVM, what are the trade-offs?

Let's assume for the moment that C++ is not a functional programming language. If you want to write a compiler using LLVM for the back-end, and you want to use a functional programming language and its bindings to LLVM to do your work, you have two choices as far as I know: Objective Caml and Haskell. If there are others, then I'd like to know about those too.
I'm not asking for subjective opinions, so please don't give this the subjective tag. I want to make up my own mind about this, but I'm not sure I know what are all the trade-offs. So, StackOverflow to the rescue. What are the trade-offs?
Either OCaml or Haskell would be a good choice. Why not check out the LLVM tutorials for each language? The LLVM tutorial for OCaml is here: http://llvm.org/docs/tutorial/OCamlLangImpl1.html
Haskell has more momentum these days, but there are plenty of good parsing libraries for OCaml as well including the PEG parser generator Aurochs, Menhir, and the GLR parser generator Dypgen. Also check out this presentation on pcl a monadic parser combinator library for OCaml (like Parsec for Haskell) there's some good info in there comparing Haskell's and OCaml's approach: http://osp.janestreet.com/files/pcl.pdf
Some will say that laziness gives Haskell the edge in parsing, but you can get laziness in OCaml as well.
Haskell has higher level bindings to LLVM than OCaml (the Haskell ones provide some interesting type safety guarantees) and Haskell has by far more libraries to use (1700 packages on http://hackage.haskell.org) making it easier to glue together components.
Availability of native bindings need not constrain your choice of language. There is a third option, apart from using bindings or generating IR text directly:
You can use a language-neutral serialization format, such as Google's Protocol Buffers, to serve as the bridge from your front-end to your back-end. Protocol buffers are, after all, just ASTs in disguise.
Your front end, implemented in a functional language, then does what it is best at -- parsing, type checking, desugaring, core-to-core transformations, etc -- and the C++ backend takes the IR from your frontend and uses LLVM's feature-complete-by-definition native C++ API to do lowering from your-language-IR to LLVM IR. This makes it much easier to handle "advanced" features of LLVM such as debug metadata.
I'm using this strategy with hprotoc and associated Haskell bindings for protocol buffers, and am very happy with the results. There is much to be said for using the right tool for the job!
OCaml is the only functional language with bindings in the LLVM distro itself and documentation on llvm.org such as the Kaleidoscope tutorial. If you have OCaml installed when you build and install LLVM then it will automatically build and install the LLVM bindings for OCaml as well. Moreover, these OCaml bindings have been in use for years so they are mature and reliable.
I have been developing HLVM in OCaml using the standard LLVM bindings and found OCaml+LLVM to be an extremely powerful combination. HLVM provides tuples, arrays, unions, TCO of all tail calls, generic printing, FFI to C, JIT compilation and parallel garbage collection with a VM weighing in at under 2kLOC of OCaml code that took only a few man-weeks to develop from scratch. HLVM's numerical performance already far exceeds that of today's fastest open source FPLs including OCaml itself. I have published articles in the OCaml Journal describing how LLVM can be used from OCaml for everything from basic expression evaluation to advanced topics such as parallelism and garbage collection. You may also like this mini example.

Integrating Haskell in non-functional projects

I have looking to Haskell questions in SO, and I recovering my university notes on functional programming as a hobby. But I've always wondered how could something done in Haskell get outside Hugs interpreter and integrate with a C#, C++ or Java project. Has anybody done that? How?
Well, first of all, Haskell compiles to machine code, so you don't have to worry about the interpreter bit.
As far as integrating with other languages, your best bet is the Foreign Function Interface.
For integrating with .NET projects, there is also http://haskell.forkio.com/dotnet/
To integrate with other code, you need to use the FFI (as was already said). Usually, you would use GHC (the Glasgow Haskell Compiler) and compile to machine code, rather than use an interpreter like Hugs. (Most "real" projects use GHC instead of Hugs.)
Python has a subset which is pretty much a functional language.

Resources