which ANTLR4 output language runs fastest? - antlr4

Has anyone compared the speed differences in output language for an ANTLR4 project? They support C#, Java, Python2 and Python3. If you don't care much about the output language, which one would you recommend and why?

I haven't compared them myself, but it's very likely that you see the same differences like for any other code in the different languages. Why should that be different for the generated parser code? So I expect C# being the fastest here, followed by Java and then Python. The C++ target is under work currently, which I expect to be on par or even faster than C#. Still, I'm speculating here, even though it's an educated guess.
Hence, if the language doesn't matter I would obviously choose C# then, or, if you can wait a bit more, the C++ target (which is much more portable, if that is relevant for you).

Related

Tool for automated porting and language that can compile into others

I'm just asking this out of curiosity :
Is there any tool that can automatically convert a source code of reasonable complexity from one language to another ?
Is there any "meta-language" that can compile into several other languages ? For example CoffeeScript compiles into Javascript.
If you know any open-source example, it'd be great !
Thank you for your time.
PS: No idea how to tag this. Feel free to edit.
GCC converts complex C++ code into machine code and thus technically is an answer to your question. In fact, there are lots of compiler like this, but I don't think these are what you intended to ask.
There are tools that are hardwired to translate just one language to another as source code (another poster suggested "f2C", which is a perfect example). These are just like compilers... but rarer.
There are virtually no tools that will map from one language to many others, out of the box. The problem is that languages have different execution models, data types, and execution schemes, which such a translator has to simulate properly in the target language.
The are "code generators" that claim to do this, but they are largely IMHO specifications of rather simple functions that translate trivially to simple code in the target langauge.
If you want to translate one language to another in a sort of general way, you need a program transformation system, e.g., a system that can parse arbitrary langauges, and for which you can provide translation rules that map to other languages in a sort of straightforward way.
Our DMS Software Reengineering Toolkit is one of these. This SO What kinds of patterns could I enforce on the code to make it easier to translate to another programming language? discusses the issues in more detail.
You can convert Fortran code to C using the f2c tool.
For python, you can convert a subset of the language to C++ using shedskin.
The vala language is converted to C before the real compilation.

Programming language design

I created a programming language and wrote it in my computer. It is an experimental non-professional programming language that I created for fun.
A language needs the most important thing, a compiler.
Is it a good idea to convert the source code to C/++ and call GCC?
My language looks like C++ and Java, it would not be difficult to convert without a parser.
It is not my goal to optimize anything neither to generate a binary for each platform. If I generate a C source, I can compile it for many platforms and use GCC optimizations.
I do not know about tools that may help me, some tools that I know the name are yacc and llvm, but I do not know how they can help me.
The first part of fun is the design of programming language, the second part of fun is the implementation of runtime details. I think that a parser implementation is not a great fun.
Thanks
To counter Mehrdad, converting your language to C is a fine idea. Many language compilers compile to C, using it as a "portable assembly". Now, creating a front end for GCC is a fine idea, but it raises the bar in terms of initial complexity. Creating C code is FAR simpler than a front end to GCC.
Or, for that matter, convert it to any language you like that you think would be a suitable target, whatever you're comfortable with. I've written compilers that created Java code, for example.
The Grammar part of your language may not be great fun, but it will likely be your first point of frustration. So it's wise to pay a little attention to what other languages do, and to look at their grammars. Consider some simple Pascal recursive decent parsers, if you think your language could work with that. Or look for Yacc and ANTLR grammars.
The one that most folks find initially frustrating is simply expressions.
a + b * ( c - sqrt(12 / 4) + sin(30))
Many people have problems working with expressions. After you get expressions work, the rest can easily fall in to place (assuming an Algol/C like language vs some other style of syntax that you're working on).
I'm not exactly an expert on this topic myself, but from what I know, "converting" your code to C or C++ is a pretty bad idea, especially when using GCC.
GCC is designed to have a "plug-in" architecture. What you should instead do is create a front-end for the GCC compiler that is able to process the code from your language, and let the back-end of GCC take care of the code generation and optimization.
(I haven't done this myself so I don't know the details of how it would work.)
If your real goal -- and remember, by goal I mean, what you think will be fun to work on ;-) -- is language design, then I'd say it's perfectly fine to avoid writing your own full-blow compiler. The only question is, is it that much easier to write a Your Language to C(++) translater? It probably is easier than writing a frontend to gcc or LLVM (though those are serious approaches if your goal is different).
My advice is to start with a C/C++ translator.

Will Haskell be a good choice for my task?

I'm starting a new project and don't know which language to use.
My 'must have' requirements are:
Be able to run on Windows/LinuxMacOs natively (native executable) – user should be able to just run the .exe (when on Windows, for example) and see the results.
No runtimes/interpreters (no JVM, CLR etc.) – one file download should be enough to run the application.
Full Unicode support.
Be able to manipulate OS threads (create them, run multiple tasks in parallel on multi-core CPUs, etc.)
Be reasonably fast (Python level performance and better).
To have some kind of standard library that does low-level, mundane tasks.
Not very niche and have some community behind it to be able to ask questions.
My 'nice to have' requirements are:
Language should be functional.
It should have good string manipulation capabilities (not necessarily regex).
Not extremely hard to learn.
I'm thinking about Haskell now, but keeping in mind OCaml as well.
Update:
This application is intended to be a simple language parsing and manipulation utility.
Please advice, if my choice is correct.
Haskell:
1: It runs on Linux, Windows and OS X, in many cases without changes to source code.
2: Native binaries generated. No VM.
3: Full Unicode support. All UTF variants supported.
4: Full threading support, plus if you only want parallelisation then you can use "par" with a 100% guarantee that it only affects the time taken rather than the semantics.
5: As fast as C, although some tweaking can be required, the skills required are currently rather obscure, and apparently minor tweaks can have multiple orders of magnitude impact.
6: Standard library included, and "Hackage" has lots more packages including a range of parser libraries.
7: Friendly community on IRC (#haskell) and here.
Edit: On the "nice to have" points:
1: Haskell is an uncompromisingly pure functional language.
2: It has generally good string manipulation, with regexes if you want them. As someone said in a later comment, beware the efficiency of the built-in "String" type (it represents a string as a linked list of characters), but the ByteString and Text libraries will solve that for you.
3: Is it hard to learn? Its nowhere near as complicated as C++, and probably a lot simpler than Java or even maybe Python. But its pure functional nature means that it is very different to imperative languages. The problem is not so much learning Haskell as unlearning imperative thought patterns.
Haskell sounds like it fits the bill perfectly. GHC produces native code on OS X, linux and windows just fine, and in general has performance that is much better than Python (for many things, not everything).
The only strange request is the need for OS threads. Programs produced by GHC use lightweight threads, which perform much better than OS threads, and much easier to work with than pthreads.
Haskell is also excellent for language parsing, using libraries like Parsec.
We're also quite well known for how string and helpful the community is around Haskell.
To your third nice to have: Have a look at Real World Haskell, it's free and a very good introduction, including an introduction to all the points you need. (Such as parallel computing, string parsing, etc).
Maybe 'nice to have':
yes pure functional and lazy evaluation.
yes (as said before).
depends on you, I think it's hard to learn,
but gives you some great benefits.

Looking for a new language that supports both interpreted and native compilation modes

I currently program in Perl, Python, C#, C, C++, Java, and a few other languages, and I'm looking for a new language to use as a primary when doing personal projects.
My current criteria are:
can be run as an interpreted language (i.e., run without having to wait to compile it);
can be compiled to native code;
are strongly typed (even if optionally);
support macros/templating/code morphing/wtf you want to call it;
has a decent number of libraries for it, or easily accessible to it;
Ideas? Suggestions?
I would suggest that Haskell suits your criteria.
Can be run as an interpreted language? Yes, via GHCI.
Can be compiled to native code? Yes.
Is strongly typed? Very much so. Perhaps even the most strongly typed language today, with the exception of some theorem provers like Agda.
Support macros/templating/morphing? If you use template haskell. This is an optional extension of the language however, so most libraries don't use macros. I haven't used template haskell myself so i can't comment on if it's any good.
Has decent library support? The standard library is not bad. There is also Hackage, an open repository of Haskell libraries a bit in the style of CPAN.
Additionally, it sounds like you already know a lot of imperative/object oriented languages. IMHO if you learn another one of those langs. it will probably be a slightly different permutation of features you've already seen somewhere else. Adding another programming paradigm like functional programming to your toolbox will probably be a better learning experience. Though I guess whether that's an advantage or not depends on if you want to learn new things or be productive quickly.
Common Lisp fits: there is an optional typing, efficient native compilation is available, powerful REPL makes it a perfect choice for scripting, and there is a powerful macro metaprogramming.
OCaml fits as well, with CamlP4 for metaprogramming.
Scala? It does run scripts, although they are compiled (transparently) first. I'm not sure what you mean by code morphing etc, but it's pretty good for DSLs. It meets all your other requirements - compiled as much as Java is, strongly typed, and has a reasonable number of its own libraries as well as all of Java's. I'm still a beginner with it, but I like it so far.

How to create a language these days?

I need to get around to writing that programming language I've been meaning to write. How do you kids do it these days? I've been out of the loop for over a decade; are you doing it any differently now than we did back in the pre-internet, pre-windows days? You know, back when "real" coders coded in C, used the command line, and quibbled over which shell was superior?
Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily) but how do you build the compiler and standard libraries and so forth? What tools do you kids use these days?
One consideration that's new since the punched card era is the existence of virtual machines already bountifully provided with "standard libraries." Targeting the JVM or the .NET CLR instead of ye olde "language walled garden" saves you a lot of bootstrapping. If you're creating a compiled language, you may also find Java byte code or MSIL an easier compile target than machine code (of course, if you're in this for the fun of creating a tight optimising compiler then you'll see this as a bug rather than a feature).
On the negative side, the idioms of the JVM or CLR may not be what you want for your language. So you may still end up building "standard libraries" just to provide idiomatic interfaces over the platform facility. (An example is that every languages and its dog seems to provide its own method for writing to the console, rather than leaving users to manually call System.out.println or Console.WriteLine.) Nevertheless, it enables an incremental development of the idiomatic libraries, and means that the more obscure libraries for which you never get round to building idiomatic interfaces are still accessible even if in an ugly way.
If you're considering an interpreted language, .NET also has support for efficient interpretation via the Dynamic Language Runtime (DLR). (I don't know if there's an equivalent for the JVM.) This should help free you up to focus on the language design without having to worry so much about the optimisation of the interpreter.
I've written two compilers now in Haskell for small domain-specific languages, and have found it to be an incredibly productive experience. The parsec library makes playing with syntax easy, and interpreters are very simple to write over a Haskell data structure. There is a description of writing a Lisp interpreter in Haskell that I found helpful.
If you are interested in a high-performance backend, I recommend LLVM. It has a concise and elegant byte-code and the best x86/amd64 generating backend you can find. There is an optional garbage collector, and some experimental backends that target the JVM and CLR.
You can write a compiler in any language that produces LLVM bytecode. If you are adventurous enough to learn Haskell but want LLVM, there are a set of Haskell-LLVM bindings.
What has changed considerably but hasn't been mentioned yet is IDE support and interoperability:
Nowadays we pretty much expect Intellisense, step-by-step execution and state inspection "right in the editor window", new types that tell the debugger how to treat them and rather helpful diagnostic messages. The old "compile .x -> .y" executable is not enough to create a language anymore. The environment is nothing to focus on first, but affects willingness to adopt.
Also, libraries have become much more powerful, noone wants to implement all that in yet another language. Try to borrow, make it easy to call existing code, and make it easy to be called by other code.
Targeting a VM - as itowlson suggested - is probably a good way to get started. If that turns out a problem, it can still be replaced by native compilers.
I'm pretty sure you do what's always been done.
Write some code, and show your results to the world.
As compared to the olden times, there are some tools to make your job easier though. Might I suggest ANTLR for parsing your language grammar?
Speaking as someone who just built a very simple assembly like language and interpreter, I'd start out with the .NET framework or similar. Nothing can beat the powerful syntax of C# + the backing of the entire .NET community when attempting to write most things. From here i designed a simple bytecode format and assembly syntax and proceeeded to write my interpreter + assembler.
Like i said, it was a very simple language.
You should not accept wimpy solutions like using the latest tools. You should bootstrap the language by writing a minimal compiler in Visual Basic for Applications or a similar language, then write all the compilation tools in your new language and then self-compile it using only the language itself.
Also, what is the proposed name of the language?
I think recently there have not been languages with ALL CAPITAL LETTER names like COBOL and FORTRAN, so I hope you will call it something like MIKELANG with all capital letters.
Not so much an implementation but a design decision which effects implementation - if you make every statement of your language have a unique parse tree without context, you'll get something that it's easy to hand-code a parser, and that doesn't require large amounts of work to provide syntax highlighting for. Similarly simple things like using a different symbol for module namespaces and object namespaces ( unlike Java which uses . for both package and class namespaces ) means you can parse the code without loading every module that it refers to.
Standard libraries - include the equivalent of everything in C99 standard libraries other than setjmp. Add whatever else you need for your domain. Work out an easy way to do this, either something like SWIG or an in-line FFI such as Ruby's [can't remember module name] and Python's ctypes.
Building as much of the language in the language is an option, but projects which start out doing either give up (rubinius moved to using C++ for parts of its standard library), or is only for research purposes (Mozilla Narcissus)
I am actually a kid, haha. I've never written an actual compiler before or designed a language, but I have finished The Red Dragon Book, so I suppose I have somewhat of an idea (I hope).
It would depend firstly on the grammar. If it's LR or LALR I suppose tools like Bison/Flex would work well. If it's more LL, I'd use Spirit, which is a component of Boost. It allows you to write the language's grammar in C++ in an EBNF-like syntax, so no muddling around with code generators; the C++ compiler compiles the grammar for you. If any of these fail, I'd write an EBNF grammar on paper, and then proceed to do some heavy recursive descent parsing, which seems to work; if C++ can be parsed pretty well using RDP (as GCC does it), then I suppose with enough unit tests and patience you could write entire compilers using RDP.
Once I have a parser running and some sort of intermediate representation, it then depends on how it runs. If it's some bytecode or native code compiler, I'll use LLVM or libJIT to process it. LLVM is more suited for general compilation, but I like the libJIT API and documentation better. Alternatively, if I'm really lazy, I'll generate C code and let GCC do the actual compilation. Another alternative, is to target an existing VM, like Parrot or the JVM or the CLR. Parrot is the VM being designed for Perl. If it's just an interpreter, I'll walk the syntax tree.
A radical alternative is to use Prolog, which has syntax features which remarkably simulate EBNF. I have no experience with it though, and if I am not wrong (which I am almost certainly going to be), Prolog would be quite slow if used to parse heavy duty programming languages with a lot of syntactical constructs and quirks (read: C++ and Perl).
All this I'll do in C++, if only because I am more used to writing in it than C. I'd stay away from Java/Python or anything of that sort for the actual production code (writing compilers in C/C++ help to make it portable), but I could see myself using them as a prototyping language, especially Python, which I am partial towards. Of course, I've never actually done any of this before, so I'm not one to say.
On lambda-the-ultimate there's a link to Create Your Own Programming Language by Marc-André Cournoyer, which appears to describe how to leverage some modern tools for creating little languages.
Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily)
Just a hint: Look at some quite different languages first, before designing a new languge (i.e. languages with a very different evaluation strategy). Haskell and Oz come to mind. Though you should also know Prolog and Scheme. A year ago I also was like "hey, let's design a language that behaves exactly as I want", but fortunatly I looked at those other languages first (or you could also say unfortunatly, because now I don't know how I want a language to behave anymore...).
Before you start creating a language you should read this:
Hanspeter Moessenboeck, The Art of Niklaus Wirth
ftp://ftp.ssw.uni-linz.ac.at/pub/Papers/Moe00b.pdf
There's a big shortcut to implementing a language that I don't see in the other answers here. If you use one of Lukasiewicz's "unparenthesized" forms (ie. Forward Polish or Reverse Polish) you don't need a parser at all! With reverse polish, the dependencies go right-to-left so you simply execute each token as it's scanned. With forward polish, it's the reverse of that, so you actually execute the program "backwards", simplifying subexpressions until reaching the starting token.
To understand why this works, you should investigate the 3 primary tree-traversal algorithms: pre-order, in-order, post-order. These three traversals are the inverse of the parsing task that a language reader (i. parser) has to perform. Only the in-order notation "requires" a recursive decent to re-construct the expression tree. With the other two, you can get away with just a stack.
This may require more "thinking' and less "implementing".
BTW, if you've already found an answer (this question is a year old), you can post that and accept it.
Real coders still code in C. Just that it's a litte sharper.
Hmmm... language design? or writing a compiler?
If you want to write a compiler, you'd use Flex + Bison. (google)
Not an easy answer, but..
You essentially want to define a set of rules written in text (tokens) and then some parser that checks these rules and assembles them into fragments.
http://www.mactech.com/articles/mactech/Vol.16/16.07/UsingFlexandBison/
People can spend years on this, The above article talks about using two tools (Flex and Bison) That can be used to turn text into code you can feed to a compiler.
First I spent a year or so to actually think how the language should look like. At the same time I helped in developing Ioke (www.ioke.org) to learn language internals.
I have chosen Objective-C as implementation platform as it's fast (enough), simple and rich language. It also provides test framework so agile approach is a go. It also has a rich standard library I can build upon.
Since my language is simple on syntactic level (no keywords, only literals, operators and messages) I could go with Ragel (http://www.complang.org/ragel/) for building scanner. It's fast as hell and simple to use.
Now I have a working object model, scanner and simple operator shuffling plus standard library bootstrap code. I can even run a simple programs - as long as they fit in one file that is :)
Of course older techniques are still common (e.g. using Flex and Bison) many newer language implementations combine the lexing and parsing phase, by using a parser based on a parsing expression grammar (PEG). This works for recursive descent parsers created using combinators, or memoizing Packrat parsers. Many compilers are built using the Antlr framework also.
Use bison/flex which is the gnu version of yacc/lex. This book is extremely helpful.
The reason to use bison is it catches any conflicts in the language. I used it and it made my life many years easier (ok so i'm on my 2nd year but the first 6months was a few years ago writing it in C++ and the parsing/conflicts/results were terrible! :(.)
If you want to write a compiler obviously you need to read the Dragon Book ;)
Here is another good book that I have just read. It is practical and easier to understand than the Dragon Book:
http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=language+implementation+patterns&x=0&y=0
Mike --
If you're interested in an efficient native-code-generating compiler for Windows so you can get your bearings -- without wading through all the unnecessary widgets, gadgets, and other nonsense that clutter today's machines -- I recommend the Osmosian Order's Plain English development system. It includes a unique interface, a simplified file manager, a friendly text editor, a handy hexadecimal dumper, the compiler/linker (of course), and a wysiwyg page-layout application for documentation. Written entirely in Plain English, it is a quick download (less than a megabyte), small enough to understand in short order (about 25,000 lines of Plain English code, with just 4,000 in the compiler/linker), yet powerful enough to reproduce itself on a bottom-of-the-line Dell in less than three seconds. Really: three seconds. And it's free to all who write and ask for a copy, including the source code and and a rather humorous tongue-in-cheek 100-page manual. See www.osmosian.com for details on how to get a copy, or write to me directly with questions or comments: Gerry.Rzeppa#pobox.com

Resources