How do I implement a language compiler using the Truffle Language Implementation Framework

So I'm investigating how to create a language compiler using Truffle. Let's just say for the purpose of this question that the language is called Emerald.
Emerald is a statically compiled language, and it runs on the JVM, just like Java.
The compiler for Emerald would be a program called emeraldc. The compiler emeraldc will compile source files like Hello.emerald to Hello.class.
I've not found any examples of using Truffle to create such a language. All language examples I've found are interpreted languages. None seem to compile to class files for example.

With GraalVM's Truffle framework, languages are specifically implemented as interpreters, but you can still get a compiler.
Languages are usually not intrinsically compiled or interpreted (you can interpret C and compile JavaScript, for example). There are even cases using a mix of both: for example, your Emerald compiler compiles from Emerald to Java bytecode, which can in turn be interpreted in a Java Virtual Machine and compiled Just-In-Time.
With GraalVM's Truffle framework, the typical setup is that you implement an interpreter for your language and GraalVM gives you a JIT compiler through partial evaluation of your interpreter. You might want to check this introduction.
If you want compilation Ahead-Of-Time, Truffle also has support for that.
However, there is currently no configuration in which the output of AOT or JIT compilation would be Java bytecode.
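To make "implement an interpreter for your language" a little more concrete, here is a minimal sketch of the self-executing AST node pattern that Truffle-style interpreters are built around. It is plain Python and does not use the Truffle API at all; the class names (Node, Const, Var, Add, Mul) are hypothetical and chosen only for illustration. In a real Truffle language the nodes would be Java classes, and partial evaluation would specialize a tree like this one into machine code.

```python
from dataclasses import dataclass


class Node:
    """Base class: every AST node knows how to execute itself."""

    def execute(self, env):
        raise NotImplementedError


@dataclass
class Const(Node):
    value: int

    def execute(self, env):
        return self.value


@dataclass
class Var(Node):
    name: str

    def execute(self, env):
        # Look the variable up in the current environment.
        return env[self.name]


@dataclass
class Add(Node):
    left: Node
    right: Node

    def execute(self, env):
        return self.left.execute(env) + self.right.execute(env)


@dataclass
class Mul(Node):
    left: Node
    right: Node

    def execute(self, env):
        return self.left.execute(env) * self.right.execute(env)


# AST for the expression (x + 2) * 3
tree = Mul(Add(Var("x"), Const(2)), Const(3))
result = tree.execute({"x": 4})
print(result)
```

The point of the pattern is that the interpreter is the tree itself: there is no separate code generator to write, which is why an Emerald-to-.class compiler is not something this approach produces.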

Related

Is groovy native to JVM or ported to JVM?

I know Jython and JRuby are ported to the JVM, and Scala/Clojure are native to the JVM. What about Groovy? Groovy looks like a dynamic language, so I guess it is ported, but it seems it could also be compiled.
For languages native to the JVM such as Scala, is there some tool to decompile the code back to source code?

"Ported" usually means "retargeted to run on." Groovy was designed to bring dynamic features from languages like Python and Smalltalk to Java. It was designed to be an extension of Java and in that sense it's native to the JVM and to the Java language. (The Groovy language, object model, and run-time libraries are extensions of Java's.)
But it sounds like you're asking about whether Groovy is interpreted or compiled. You can use groovyc to compile Groovy source code to Java .class files and run them in the JVM (linking in some Groovy run-time libraries). Or you can run Groovy source code interactively in GroovyShell, but what that does is compile, load, and run code for you incrementally.
A web search for [groovy decompiler] returns some possibilities for you.

I'm not sure whether it answers the entirety of your question, but the vast majority of the Groovy and Groovy-Eclipse compilers is written in Java, as seen in both projects' GitHub repositories.

How do functional language compilers work? [duplicate]

I've heard of the idea of bootstrapping a language, that is, writing a compiler/interpreter for the language in itself. I was wondering how this could be accomplished and looked around a bit, and saw someone say that it could only be done by either
writing an initial compiler in a different language.
hand-coding an initial compiler in Assembly, which seems like a special case of the first
To me, neither of these seems to actually be bootstrapping a language, in the sense that they both require outside support. Is there a way to actually write a compiler in its own language?

Is there a way to actually write a compiler in its own language?
You have to have some existing language to write your new compiler in. If you were writing a new, say, C++ compiler, you would just write it in C++ and compile it with an existing compiler first. On the other hand, if you were creating a compiler for a new language, let's call it Yazzleof, you would need to write the new compiler in another language first. Generally, this would be another programming language, but it doesn't have to be. It can be assembly, or if necessary, machine code.
If you were going to bootstrap a compiler for Yazzleof, you generally wouldn't write a compiler for the full language initially. Instead you would write a compiler for Yazzle-lite, the smallest possible subset of Yazzleof (well, a pretty small subset at least). Then in Yazzle-lite, you would write a compiler for the full language. (Obviously this can occur iteratively instead of in one jump.) Because Yazzle-lite is a proper subset of Yazzleof, you now have a compiler which can compile itself.
There is a really good writeup about bootstrapping a compiler from the lowest possible level (which on a modern machine is basically a hex editor), titled Bootstrapping a simple compiler from nothing. It can be found at https://web.archive.org/web/20061108010907/http://www.rano.org/bcompiler.html.

The explanation you've read is correct. There's a discussion of this in Compilers: Principles, Techniques, and Tools (the Dragon Book):
Write a compiler C1 for language X in language Y
Write compiler C2 for language X in language X, and use C1 to compile it
Now C2 is a fully self-hosting environment.

The way I've heard of is to write an extremely limited compiler in another language, then use that to compile a more complicated version, written in the new language. This second version can then be used to compile itself, and the next version. Each time it is compiled the last version is used.
This is the definition of bootstrapping:
the process of a simple system activating a more complicated system that serves the same purpose.
EDIT: The Wikipedia article on compiler bootstrapping covers the concept better than me.

A super interesting discussion of this is in Unix co-creator Ken Thompson's Turing Award lecture.
He starts off with:
What I am about to describe is one of many "chicken and egg" problems that arise when compilers are written in their own language. In this case, I will use a specific example from the C compiler.
and proceeds to show how he wrote a version of the Unix C compiler that would always allow him to log in without a password, because the C compiler would recognize the login program and add in special code.
The second pattern is aimed at the C compiler. The replacement code is a Stage I self-reproducing program that inserts both Trojan horses into the compiler. This requires a learning phase as in the Stage II example. First we compile the modified source with the normal C compiler to produce a bugged binary. We install this binary as the official C. We can now remove the bugs from the source of the compiler and the new binary will reinsert the bugs whenever it is compiled. Of course, the login command will remain bugged with no trace in source anywhere.
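For readers who have not met a "self-reproducing program" before, here is a tiny illustration of the idea in Python. It is only a sketch of the general technique (a quine); Thompson's own examples are written in C and are considerably more involved.

```python
# A program whose output is exactly its own two lines of code.
# The string s is a template for the whole program; %r re-inserts
# the quoted form of s, and %% becomes a literal percent sign.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Ignoring the comments, the text this prints is the same as the two lines of code that produced it. The compiler Trojan in the lecture relies on the same trick: the binary carries enough of a description of itself to re-insert that description whenever it compiles the compiler.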

Check out podcast Software Engineering Radio episode 61 (2007-07-06) which discusses GCC compiler internals, as well as the GCC bootstrapping process.

Donald E. Knuth actually bootstrapped WEB by writing its TANGLE processor in WEB itself and then translating that program by hand (into Pascal, the language TANGLE emits, rather than into assembly or machine code).

As I understand it, the first Lisp interpreter was bootstrapped by hand-compiling the constructor functions and the token reader. The rest of the interpreter was then read in from source.
You can check for yourself by reading the original McCarthy paper, Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I.

Every example of bootstrapping a language I can think of (C, PyPy) was done after there was a working compiler. You have to start somewhere, and reimplementing a language in itself requires writing a compiler in another language first.
How else would it work? I don't think it's even conceptually possible to do otherwise.

Another alternative is to create a bytecode machine for your language (or use an existing one if its features aren't very unusual) and write a compiler to bytecode, either in the bytecode, or in your desired language using another intermediate - such as a parser toolkit which outputs the AST as XML, then compile the XML to bytecode using XSLT (or another pattern matching language and tree-based representation). It doesn't remove the dependency on another language, but could mean that more of the bootstrapping work ends up in the final system.

It's the computer science version of the chicken-and-egg paradox. I can't think of a way not to write the initial compiler in assembler or some other language. If it could have been done, I should think Lisp could have done it.
Actually, I think Lisp almost qualifies. Check out its Wikipedia entry. According to the article, the Lisp eval function could be implemented on an IBM 704 in machine code, with a complete compiler (written in Lisp itself) coming into being in 1962 at MIT.

Some bootstrapped compilers or systems keep both the source form and the object form in their repository:
OCaml is a language which has both a bytecode compiler (targeting OCaml bytecode, which a bytecode interpreter then runs) and a native compiler (to x86-64, ARM, etc. assembler). Its svn repository contains both the source code (files */*.{ml,mli}) and the bytecode (file boot/ocamlc) form of the compiler. So when you build, it first uses that bytecode (from a previous version of the compiler) to compile itself. Later the freshly compiled bytecode is able to compile the native compiler. So the OCaml svn repository contains both *.ml[i] source files and the boot/ocamlc bytecode file.
The rust compiler downloads (using wget, so you need a working Internet connection) a previous version of its binary to compile itself.
MELT is a Lisp-like language to customize and extend GCC. It is translated to C++ code by a bootstrapped translator. The generated C++ code of the translator is distributed, so the svn repository contains both *.melt source files and melt/generated/*.cc "object" files of the translator.
J.Pitrat's CAIA artificial intelligence system is entirely self-generating. It is available as a collection of thousands of [A-Z]*.c generated files (also with a generated dx.h header file) with a collection of thousands of _[0-9]* data files.
Several Scheme compilers are also bootstrapped. Scheme48, Chicken Scheme, ...

What is Neko anyway?

I have started to use Haxe to convert my ActionScript 3 projects into NME, but I would like to know what Neko is in the world of Linux. I searched for it and found that it's an animated cat!
Can anyone please explain it to me?

Neko for most people is nothing more than a Haxe target. That's not technically true (it does have its own language, and could potentially be a target for other languages), but for most people, Neko is one of the Haxe output targets.
In the same way the Java Virtual Machine (JVM) can be targeted from multiple languages (See the list on wikipedia), Neko is a bytecode format that can theoretically be written to from multiple languages. For Neko however, most people seem to use Haxe to create their *.n files.
For Haxe programmers, the Neko target lets you:
Write command line tools and utilities (for example, haxelib and haxedoc are written in Haxe targeting Neko)
Write web apps or dynamic web pages - using mod_neko (or mod_tora) you get a web processor with the same sort of capabilities as PHP, but a fair bit faster.
Create games with NME (which originally started with Neko, it stands for Neko Media Engine), and compile them quickly, having a target closer to what CPP has, but which compiles a lot faster and where the output is cross platform.
A runtime that is closely tied into Haxe and can be used from within macros etc - so you can use all of the neko.* classes inside Macros.
If you're only interested in targeting SWF or JS, you'll probably not have much need for Neko. But if you are writing server side code, you'll appreciate the performance, and if you are writing CPP, you may appreciate having a simple target that is dead easy and super quick to compile, and which behaves similarly to CPP.
Of course, outside of Haxe, Neko is its own language... but to me at least it seems most people just use it with Haxe.
More Info:
If you want to write in the Neko language (See this tutorial) you might save your code as "myfile.neko" and compile with nekoc myfile.neko, which will compile a Neko bytecode file "myfile.n".
If you want to write in the Haxe language, you might save your file as "MyFile.hx" and compile with "haxe -neko myfile.n -main MyFile".
The "myfile.n" that is generated by both of these doesn't have human readable source code - this is the Neko bytecode. You can run it on any computer that has Neko installed by running neko myfile.n. You can turn it into an executable (that runs without Neko installed) for your platform/OS by running nekotools boot myfile.n.
Here is a tutorial on Getting Started With Neko, which covers both command line programs you write and (very very basic) web pages.

"Neko" is Japanese for "cat", which is probably why you found what you did.
Neko is also a virtual machine (a "VM") like the Java Virtual Machine ("JVM") or the .Net Common Language Runtime (".Net CLR").
Neko has a custom high-level language made as an easily targeted language backend (like C-- in a way, but not like LLVM, which is closer to an assembly language). In other words: It's something that a programming language can be translated into rather than a more involved "full" compilation (like to assembly, to bytecode, or to machine code). Neko's language can be translated into a bytecode, which is portable and is usually stored in a ".n" file.
Neko was made by Nicolas Cannasse (the same person that made the Haxe Programming language), which is probably why Haxe has a Neko target in its compiler, and the Haxe tools, such as "haxelib" use it. Because the tools are compiled into ".n" files, they only need to be built once, and then they work on any platform with the VM executable "neko" installed.
Perhaps a more interesting bit about neko, and why you should learn it for Haxe development is that it's the runtime used for compile-time macros. See this tutorial for how part of your program can be run at compile time with full access to the build machine, which means you could even do complex tasks, such as parse a data file, at compile time.

Neko provides a common runtime for several different languages, including JavaScript and Haxe. The compiler converts a source file (.neko) into a bytecode file (.n) that can be executed with the virtual machine. You can use the compiler as a standalone command-line executable separate from the virtual machine, or as a Neko library to perform compile-and-run for interactive languages. Neko was written by Nicolas Cannasse.
You can find the Neko tutorial here.

Compiled interpreted language

Is there a programming language that has a usable interactive interpreter and can also be compiled to machine code?

Compilation vs. "interpretation" is essentially a matter of implementation, not the language itself. For example, MRI Ruby 1.8 is interpreted, while MacRuby is compiled to native machine code. Both include an interactive REPL. All the languages I know that have at least one machine-code compiler and at least one REPL:
Ruby
Python
Almost all Lisps (Lisp was the language that pioneered this technique, AFAIK)
OCaml
Haskell
Forth
If we're counting compilation to bytecode as well as machine code, it's true of the vast majority of popular bytecode-compiled languages:
Java
Scala
Groovy
Erlang
C#
F#
Smalltalk

Haskell, using the Glasgow Haskell Compiler which has an interactive "shell" called GHCi.

Many flavors of Lisp offer both options, including Clojure.

Two come to mind: OCaml and Scala (~= Java), but I'm sure there must be a lot more out there.

And here's another one to burn your house down:
x86 Assembly
Yup, there are interpreters for this as well.
Javascript x86 Assembly Interpreter
Jasmin
At this point you're really in emulator land, but it does meet the requirements you state.
I'm wondering if it's easier to name compiled languages that someone hasn't cobbled up a working interpreter for. :-)

Lua has an interactive mode for one-liners and experimentation. It normally compiles to bytecode for its VM for execution. LuaJIT is an independent implementation of a Lua VM that also does just-in-time compilation to 32-bit x86. Support for 64-bit is underway, and support for ARM is frequently requested.
Compilation to a bytecode is often a reasonable compromise between a pure interpreter and a pure compiler. The VM can be tuned to the needs of the language, and JIT techniques can analyze the VM code as it executes and concentrate on frequently executed code paths and inner loops.
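As a rough illustration of the "compile to bytecode, then run it on a VM" arrangement described above, here is a toy compiler and stack machine for arithmetic expressions. This is a sketch only and has nothing to do with Lua's actual VM or instruction set: the opcode names (PUSH, ADD, SUB, MUL) and the functions compile_expr and run are made up for the example, and Python's own ast module is borrowed as the parser to keep it short.

```python
import ast
import operator

# Map Python AST operator types to our made-up opcode names.
OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
BINARY = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def compile_expr(source):
    """Compile an arithmetic expression into a flat list of (opcode, arg) pairs."""
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in OPS:
            # Post-order walk: operands first, then the operator.
            emit(node.left)
            emit(node.right)
            code.append((OPS[type(node.op)], None))
        else:
            raise ValueError("unsupported syntax")

    emit(ast.parse(source, mode="eval").body)
    return code


def run(code):
    """Execute the bytecode on a simple stack machine and return the result."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[op](left, right))
    return stack.pop()


bytecode = compile_expr("(1 + 2) * 4 - 5")
print(bytecode)
print(run(bytecode))
```

The flat instruction list is what a JIT would later inspect: it is much easier to count how often a run of opcodes executes than to analyze source text.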

As others have mentioned, OCaml.
If managed code (.NET CLI) is close enough to machine code, F# would be a candidate as well. There are probably other .NET/Mono languages which meet the requirement as well.

You may regret you asked:
C and C++.
Why?
Ch
CINT
EIC
picocc
and there are probably others out there as well.

Plenty of languages offer an implementation that both interacts and compiles to machine code, but it's rare to do both at once. Standard ML of New Jersey is one that has an interactive loop but no bytecode: it simply compiles to machine code in memory and then branches to it.
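The "compile in memory, then jump to it" style of interactive loop can be imitated with Python's built-in compile, eval and exec. This is only an analogy: CPython compiles each line to a bytecode object rather than to machine code as SML/NJ does, and the helper name repl_step is made up for this sketch.

```python
def repl_step(line, env):
    """Compile one line in memory, run it, and return its value (if any)."""
    try:
        # Expressions are compiled in 'eval' mode so their value can be returned.
        code = compile(line, "<repl>", "eval")
        return eval(code, env)
    except SyntaxError:
        # Statements (assignments, defs, ...) must be compiled in 'exec' mode.
        code = compile(line, "<repl>", "exec")
        exec(code, env)
        return None


env = {}
for line in ["x = 6", "def double(n): return n * 2", "double(x) + 1"]:
    value = repl_step(line, env)
    if value is not None:
        print(value)
```

Nothing is ever written to disk here: each line is compiled to a code object held in memory and executed immediately, which is the shape of the loop described above.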

Not exactly machine code, but Java can be compiled and also used via BeanShell.

I've used Ruby with an interpreter, and there seems to be a compiler here.

Icon used to have a compiler, but it falls in and out of maintenance. It may still work.

Python can be compiled to Windows executables.

C# can be compiled by using SnippetCompiler, maybe this would act as an interactive interpreter for you?

Your question is a bit vague. Even Java would fit it:
by interactive interpreter, i mean shell-like environment, where you can work in the runtime interactively.
Java has this, e.g. in the Eclipse "scrapbook pages", where you can enter Java expressions and have them evaluated right away. Java is of course also a compiled language (and while it's usually compiled to bytecode, there are various compilers that output machine code).
So what are you looking for? Maybe you could explain your problem or interest.

I tried using Mono/.NET for a bit and found random GC pauses to be disagreeable (at least on my crusty old laptop). I looked at using Gambit-C, an implementation of Scheme that can compile to C, but it seemed difficult to work with because the docs were somewhat limited and the packages were not very easy to install and use.
I usually just stick to having an interpreted language such as Python bound to C/C++, which is more painful, but at least I know what I am in for.

Programming languages with python-like syntax but native code generation

Can anyone point to a programming language which has Python-like syntax, but from the very beginning was designed to generate native code? I'm aware of Boo only, but it uses .NET, not native code generation. If nothing else, Python-like languages which generate .NET/Java bytecode are fine too.

Cython might do -- the C code it generates is for Python extensions, but the whole thing can be packaged up and you'll be running native code throughout (after the 'import';-).

I must admit that I don't quite understand your question, for two reasons:
You are asking for a language with native code generation, but native code generation has nothing to do with the language, it is a trait of the implementation. Every language can have an implementation with native code generation. Several Python implementations have native code generation. There are C compilers that compile to JVM bytecode, CIL bytecode or even ECMAScript sourcecode. There are even C interpreters. There are also compilers that compile Java sourcecode or JVM bytecode to native code.
Why do you care about the syntax? It is probably the least important factor about choosing a programming language.
Anyway, Nim is a programming language which has an implementation which supports native code generation (or more precisely an implementation which supports C source code generation) and whose syntax is a hybrid between Wirthian style (by the looks of it the most important influences are Oberon and Delphi) and Python.
However, the fact that it has Pythonic syntax isn't going to help you at all if you don't like European style language design or Wirthian style OOP.

Also found today Delight applying Python syntax on a D back-end.
And Converge too.

Check out Cobra
It is strongly influenced by Python, C#, Eiffel, Objective-C and other programming languages. It supports both static and dynamic typing. It has first class support for unit tests and contracts. Cobra provides both rapid development and performance in the same language.

shedskin compiles Python to C++
From shedskin project page
Shed Skin is an experimental compiler, that can translate pure, but implicitly statically typed Python programs into optimized C++. It can generate stand-alone programs or extension modules that can be imported and used in larger Python programs.

Genie which is part of the gnome project: http://live.gnome.org/Genie
I think it's exactly what you're looking for.

If you are happy with something that compiles down to Java bytecode you could have a look at Jython. Quoting from their FAQ:
JPython is an implementation of the Python programming language which is designed to run on the Java(tm) Platform. It consists of a compiler to compile Python source code down to Java bytecodes which can run directly on a JVM, a set of support libraries which are used by the compiled Java bytecodes, and extra support to make it trivial to use Java packages from within JPython.
I've not actually used it yet but am considering it on some projects where I have to integrate with an existing Java codebase.
HTH

PyPy is a project to re-implement Python in Python. One of its goals is to allow the use of multiple back-ends, including C. So you can take a pure Python program, convert it to C and compile it to native code. It is still a work in progress, so probably not suitable for production code.

You can find all of the previously mentioned languages, plus some more, here: http://wiki.python.org/moin/PythonImplementations

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula.
https://nim-lang.org/

You can also investigate IronPython - a Python implementation on the .NET framework

You can try Genie. It's the same as Vala, but with Python-like syntax. If you want to develop apps for Linux with GTK, and you want to compile them to native apps, Vala or Genie is a really good choice.
