Is ANTLR 4 faster than ANTLR 3?

What should I expect from ANTLR 4?
Is it faster than ANTLR 3? I mean the parsing speed.
A note on code generation speed would be interesting too.
If there are differences, are they for design reasons?

First, the easy part: the ANTLR 4 tool performs only minimal analysis of the grammar, and in particular does not need to statically compute the DFA tables like ANTLR 3 did. As such, it's much, much faster than ANTLR 3 at generating parsers.
The initial 4.0 release of ANTLR 4 varies from slightly faster than ANTLR 3 to much slower than it, depending on the grammar and input. However, ANTLR 4 is able to handle many grammars and inputs that ANTLR 3 simply cannot handle at all. In addition, an optimized version of the ANTLR 4 runtime which substantially outperforms ANTLR 3 is already in development.
Debugging aids and how-to documentation are on the way to help users find and correct (or avoid) performance problems related to grammar design. I believe some of this material is available in the ANTLR 4 book as well.
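If you want numbers for your own grammar and inputs, a rough timing harness along the following lines is enough for a first comparison. This is a minimal sketch: MyLexer, MyParser, and entryRule are placeholders for whatever ANTLR generated from your grammar.

    import org.antlr.v4.runtime.ANTLRInputStream;
    import org.antlr.v4.runtime.CommonTokenStream;

    public class ParseBenchmark {
        public static void main(String[] args) throws Exception {
            String input = new String(java.nio.file.Files.readAllBytes(
                    java.nio.file.Paths.get(args[0])));
            // Warm up the JIT and ANTLR's adaptive prediction cache;
            // the first pass over an input is typically the slowest.
            for (int i = 0; i < 10; i++) {
                parseOnce(input);
            }
            long start = System.nanoTime();
            parseOnce(input);
            System.out.printf("parse took %.2f ms%n",
                    (System.nanoTime() - start) / 1e6);
        }

        private static void parseOnce(String input) {
            // MyLexer/MyParser/entryRule are hypothetical generated names.
            MyLexer lexer = new MyLexer(new ANTLRInputStream(input));
            MyParser parser = new MyParser(new CommonTokenStream(lexer));
            parser.entryRule();
        }
    }

Measuring both a cold and a warmed-up parse matters for ANTLR 4, since the adaptive algorithm caches DFA states as it goes.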

Related

Verify that a grammar is LL(1)

I want to verify that my ANTLR 4 grammar is LL(1). There was an option to do just that in older versions of ANTLR. Is there something similar in ANTLR 4?
I looked through the documentation but didn't find anything. The page on options in particular seems to be lacking; I didn't even find a list of all possible options.
One of the design goals of ANTLR 4 is allowing language designers to focus on writing accurate grammars rather than worrying about characteristics like "LL(1)" which have little to no impact on users of the language.
However, you can likely identify an LL(1) grammar by examining the generated parser: if there are no calls to adaptivePredict in the generated code, then the grammar is LL(1). The intent is for the inverse to be true as well, but since a call to adaptivePredict produces the same result as the inline version of an LL(1) decision, we have not rigorously evaluated this.
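That check is easy to automate. The following is a hedged sketch, not an official tool; it simply scans a generated parser source file for the adaptivePredict identifier:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class Ll1Check {
        public static void main(String[] args) throws IOException {
            // Pass the path of a generated parser, e.g. MyParser.java.
            String source = new String(Files.readAllBytes(Paths.get(args[0])));
            if (source.contains("adaptivePredict")) {
                System.out.println("adaptivePredict found: not proven LL(1)");
            } else {
                System.out.println("no adaptivePredict calls: grammar is LL(1)");
            }
        }
    }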

How can I manipulate the ATN constant generated by ANTLR v4 for Java?

I'm very new to ANTLR and use v4 to generate a lexer to integrate with NetBeans.
The generated Java file gives me an error, "constant too long", at the serialized ATN.
How can I configure ANTLR to generate a compliant String (or several of them)?
Kind regards,
Jan
This is a limitation of the first release of ANTLR 4 that has since been fixed. Here is the issue report:
Serialized ATN strings should be split when longer than 2^16 bytes (class file limitation)
Until ANTLR 4.1 is released sometime this summer, you have two options:
Build a current version of ANTLR from source code, and use that.
Modify your lexer/parser to be simpler, thus requiring fewer states.

Is it possible to mark up all programming languages under the object-oriented paradigm using a common markup schema?

I plan to develop a tool that converts a program written in one programming language (e.g. Java) into a common markup language (e.g. XML), and then converts that markup into another language (e.g. C#).
In simple words, it is a programming language converter that translates a program written in one language into another language.
I think it is possible, but I don't know where to start. I want to know whether this is feasible and learn about some existing systems.
What you are trying to do is extremely hard, but if you want to know what you are up against, I've listed the steps you need to follow below:
First the hard bit:
1. Obtain or derive an operational semantics for your source and target languages.
2. Enhance the semantics to capture your source and target memory models.
3. Unify the two enhanced semantics within a common operational model.
4. Define a mapping from your source language onto the common operational model.
5. Define a mapping from your operational model to your target language.
Step 4, as you pointed out in your question, is trivial.
Step 1 is difficult, as most languages do not have sufficiently formal semantics specified; but I recommend checking out http://lucacardelli.name/TheoryOfObjects.html as this is the best starting point for building a traditional OO semantics.
Step 2 is almost certainly impossible in general, but may be merely obscenely difficult if you are willing to sacrifice some efficiency.
Step 3 will depend on how clean the result of step 1 turned out, but is going to be anything from delicate and tricky to impossible.
Step 5 is not going to be trivial; it is effectively writing a compiler.
Ultimately, what you propose to do is impossible in general, due to the difficulties inherent in steps 1 and 2. However, it should be difficult but doable if you are willing to: severely restrict the source language constructs supported; pretty much forget about handling threads correctly; and pick two languages with sufficiently similar semantics (i.e. Java and C# are OK, but C++ and anything else is not).
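To make steps 4 and 5 concrete at a toy scale, here is a hedged sketch (every name in it is invented for illustration) of a minimal common operational model with one target-language emitter; a real system would need the model to cover the full semantics of both languages:

    // A deliberately tiny "common operational model": it can only
    // represent assigning an integer literal to a fresh variable.
    interface CommonNode {}

    class Assign implements CommonNode {
        final String variable;
        final int value;
        Assign(String variable, int value) {
            this.variable = variable;
            this.value = value;
        }
    }

    // Step 5 for one target: map the common model to C# source text.
    // A Java emitter would be a second class with the same shape.
    class CSharpEmitter {
        String emit(CommonNode node) {
            if (node instanceof Assign) {
                Assign a = (Assign) node;
                return "int " + a.variable + " = " + a.value + ";";
            }
            throw new IllegalArgumentException("unsupported node");
        }
    }

    public class ToyTranslator {
        public static void main(String[] args) {
            // Pretend step 4 already mapped the Java statement
            // "int x = 42;" into the common model.
            CommonNode node = new Assign("x", 42);
            System.out.println(new CSharpEmitter().emit(node)); // int x = 42;
        }
    }

Even this toy shows where the pain lives: the emitter is easy, but deciding what Assign must mean so that both the source and target semantics are preserved is exactly steps 1 through 3.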
It depends on what languages you want to support, but in general this is a huge and difficult task unless you plan to support only a very small subset of each language.
The real problem is that each programming language has different features (with some areas that overlap and others that don't) and different ways of solving the same problems, and it's pretty tricky to detect the problem the programmer is trying to solve and convert that to a new idiom. :) And think about the differences between GUIs created in different languages...
See http://xmlvm.org/ as an example (a project aimed at converting between source code of many different languages, with an XML middle-point). The site covers in some depth the challenges they are tackling and the compromises they make, and (if you still have any interest in this kind of project...) you can ask more specific follow-up questions there.
Notice specifically what the output source code looks like: it's not at all readable, maintainable, efficient, etc.
It is "technically easy" to produce XML for any single langauge: build a parser, construct and abstract syntax tree, and dump out that tree as XML. (I build tools that do this off-the-shelf for many languages). By technically easy, I mean that the community knows how to do this (see any compiler textbook, e.g., Aho&Ullman Dragon book). I do not mean this is a trivial exercise in terms of effort, because real languages are complicated and messy; there have been many attempts to build C++ parsers and few successes. (I have one of the successes, and it was expensive to get right).
What is really hard (and something I don't try to do) is to produce XML according to a single schema in which the language semantics are exposed. And without that, it will be essentially impossible to write a translator from a generic XML to an arbitrary target language. This is known as the UNCOL problem, and people have been looking for the answer since 1958. I note that the Wikipedia article seems to indicate the problem is solved, but you can't find many references to UNCOL in the literature since 1961.
The closest attempt I've seen to this is the OMG's "ASTM" model (http://www.omg.org/spec/ASTM/1.0/Beta1/); it exports XMI, which is XML. But the ASTM model has lots of escapes built into it to allow languages that it doesn't model perfectly (AFAIK, that means every language) to extend the XMI in arbitrary ways so that language-specific information can be encoded. Consequently, each language parser produces a custom version of the XMI, and thus each reader has to pretty much know about the extensions, and full generality vanishes.

Is ANTLR a DSL generator and an alternative to Intentional Programming?

I am struck by the ambition and creativity of Charles Simonyi's efforts to establish the field of Intentional Programming, first at Microsoft and then with his own company.
What exactly is Intentional Programming?
http://en.wikipedia.org/wiki/Intentional_programming
In this approach to software, a programmer first builds a toolbox specific to a given problem domain (such as life insurance). Domain experts, aided by the programmer, then describe the program's intended behavior in a What You See Is What You Get (WYSIWYG)-like manner. An automated system uses the program description and the toolbox to generate the final program. Successive changes are only done at the WYSIWYG level.
It seems to be such a useful and practical approach to programming, potentially circumventing many of the problems with current approaches to software development.
Essentially it seems to facilitate the creation of domain-specific languages by non-programmers (business/systems analysts) but at a stage much closer to real-life implementation than UML could provide. He says it will be completed eventually but that it is not there yet (almost 15 years later).
DSLs run the gamut from simple 5-line rule engines to complex applications like Ruby on Rails. So I imagine the delay in releasing his product has to do with the fact that he is dealing with simplifying a much higher level of abstraction because he has to essentially allow for the encapsulation of all domain languages at once.
So, my question is
(a) whether ANTLR could be an alternative to Intentional Programming, although perhaps a less user-friendly alternative which requires the intervention of programmers rather than permitting business analysts to generate the DSL? Could you use ANTLR to generate a DSL like Ruby on Rails (assuming it supported Ruby as an output, which I think it does not)? What can it not do? Also, I don't understand why it's called a "language parser" rather than a "language generator", since the latter describes what it is used for while the former describes how it achieves its end result.
and
(b) if ANTLR is different from Intentional Programming, is there anything similar to Intentional Programming?
In answer to part b), three systems that work in a similar space are:
JetBrains MPS
Eclipse Xtext
MetaCase MetaEdit+
Each of these products has different strengths and weaknesses, but all of them fall into the category of Language Workbenches. Intentional Software's Intentional Workbench is possibly the most ambitious product in this category to date, but is also not generally available.
MPS and Xtext are free, open-source products. MetaCase is the most mature, and is a commercial product. All of them have a steep learning curve.
I am not an expert on this, so treat with a large pinch of salt. However...
ANTLR itself is not a DSL generator, though it can be used to create code that interprets DSLs. It is a parser generator; a DSL generator would have to produce the grammar from which ANTLR generates a parser.
ANTLR is just a parser generator. In any non-trivial DSL, writing the parser is less than 50% of the effort expended in implementing the DSL. The evaluator/rule engine/code generator/scheduler, or whatever else your DSL does, probably requires more work and can't be generated the way a parser can.
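As a hedged illustration of where that line falls, suppose a hypothetical grammar Expr.g4 with labeled alternatives, from which ANTLR generated ExprParser and ExprBaseVisitor (via the -visitor option). ANTLR writes the parser for you; the evaluator below is the part you still write by hand:

    // Hypothetical grammar assumed here:
    //   expr : expr '+' expr  # Add
    //        | INT            # Int ;
    public class EvalVisitor extends ExprBaseVisitor<Integer> {
        @Override
        public Integer visitAdd(ExprParser.AddContext ctx) {
            // Evaluation logic: DSL-specific and hand-written.
            return visit(ctx.expr(0)) + visit(ctx.expr(1));
        }

        @Override
        public Integer visitInt(ExprParser.IntContext ctx) {
            return Integer.valueOf(ctx.INT().getText());
        }
    }

Multiply that by every operation your DSL supports, plus type checking, error reporting, and code generation, and the parser quickly becomes the smaller half of the work.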

Which ANTLR 4 output language runs fastest?

Has anyone compared the speed differences between the output languages for an ANTLR 4 project? They support C#, Java, Python 2 and Python 3. If you don't care much about the output language, which one would you recommend, and why?
I haven't compared them myself, but it's very likely that you will see the same differences as for any other code written in those languages. Why should it be different for the generated parser code? So I expect C# to be the fastest here, followed by Java and then Python. The C++ target is currently in the works, and I expect it to be on par with or even faster than C#. Still, I'm speculating here, even though it's an educated guess.
Hence, if the language doesn't matter, I would choose C#, or, if you can wait a bit longer, the C++ target (which is also much more portable, if that is relevant for you).
