I have an ANTLR3 grammar that builds an abstract syntax tree. I'm looking into upgrading to ANTLR4. However, it appears that ANTLR4 only builds parse trees and not abstract syntax trees. For example, the output=AST option is no longer recognized. Furthermore, neither "AST" nor "abstract syntax" appears in the text of The Definitive ANTLR 4 Reference.
I'm wondering if I'm missing something.
My application currently knows how to crawl over the AST produced by ANTLR3. Changing it to process a parse tree isn't impossible but it will be a bit of work. I want to be sure it's necessary before I start down that road.
ANTLR 4 produces parse trees based on the grammar instead of ASTs based on arbitrary AST operators and/or rewrite rules. This allows ANTLR 4 to automatically produce listener and visitor interfaces that you can implement in the code that uses your grammar.
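For example, assuming a hypothetical Expr.g4 grammar with labeled alternatives (expr : expr '+' expr # Add | INT # Number ;), ANTLR generates ExprLexer, ExprParser and ExprBaseVisitor classes that you can extend from any JVM language. Here is a minimal sketch in Kotlin; the grammar and rule names are assumptions for illustration only:

```kotlin
// Sketch only: assumes ANTLR has generated ExprLexer, ExprParser and ExprBaseVisitor
// from a hypothetical Expr.g4 with: expr : expr '+' expr # Add | INT # Number ;
import org.antlr.v4.runtime.CharStreams
import org.antlr.v4.runtime.CommonTokenStream

class EvalVisitor : ExprBaseVisitor<Int>() {
    // Each labeled alternative gets its own typed context and visit method.
    override fun visitAdd(ctx: ExprParser.AddContext): Int =
        visit(ctx.expr(0)) + visit(ctx.expr(1))

    override fun visitNumber(ctx: ExprParser.NumberContext): Int =
        ctx.INT().text.toInt()
}

fun main() {
    val lexer = ExprLexer(CharStreams.fromString("1 + 2"))
    val parser = ExprParser(CommonTokenStream(lexer))
    println(EvalVisitor().visit(parser.expr()))  // prints 3
}
```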
The change can be dramatic for users upgrading existing applications from version 3, but as a whole the new system is much easier to use and (especially) maintain.
I want to create a tool that can analyze C and C++ code and detect unwanted behaviors, based on a config file. I thought about using ANTLR for this task, as I already built a simple compiler with it from scratch a few years ago (variables, conditions, loops, and functions).
I grabbed C.g4 and CPP14.g4 from the ANTLR grammars repository. However, I noticed that they don't handle preprocessing, as that's a separate step in compilation.
I tried to find a grammar that handles the preprocessing part (updated to ANTLR4), with no luck. Moreover, I understand that if I go with two-step parsing I won't be able to retain the original location of each character, since I'd have already modified the input stream.
I wonder if there's a good ANTLR grammar or program (preferably Python, but I can deal with other languages as well) that can help me preprocess the C code. I also thought about using gcc -E, but then I won't be able to inspect the macro definitions. For example, I want to warn if a user used #pragma GCC (some students at my university, for whom I'm writing this program, used this to bypass some of the course's coding-style restrictions). Moreover, gcc -E will include library header contents, which I don't want to process.
My question, therefore, is whether you can recommend a grammar or program that I can use to preprocess C and C++ code. Alternatively, if you can guide me on how to create such a grammar myself, that would be perfect. I was able to write the basic #define, #pragma, etc. processing, but I'm unable to deal with conditionals and macro functions, as I'm unsure how to approach them.
Thanks in advance!
This question is almost off-topic as it asks for an external resource. However, it also bears a part that deserves some attention.
The term "preprocessor" already indicates what the handling of macros etc. is about. The parser never sees the disabled parts of the input, which also means it can be anything, which might not be part of the actual language to parse. Hence a good approach for parsing C-like languages is to send the input through a preprocessor (which can be a specialized input stream) to strip out all preprocessing constructs, to resolve macros and remove disabled text. The parse position is not a problem, because you can push the current token position before you open a new input stream and restore that when you are done with it. Store reported errors together with your input stream stack. This way you keep the correct token positions. I have used exactly this approach in my Windows resource file parser.
Kotlin and Groovy look like very similar languages with very similar features if we compile Groovy statically. Which features, apart from null safety, does Kotlin have that are missing in Groovy?
Kotlin is a JVM language which, IMO, is trying to improve on Java in features and conciseness while remaining imperative and static. Groovy has a similar concept, except it decided to go dynamic. As a result, a number of language features are similar.
Here are some differences I'm aware of:
Static vs dynamic: Groovy was designed as a dynamic language, and @CompileStatic, while a great annotation (I use it a lot), was added later. The feature feels a bit bolted on, and it does not force people to code in a static manner. It's not usable everywhere (e.g. my Spock tests seem to fail to compile with it), and even with it on, Groovy still shows some odd dynamic behaviour every now and then. Kotlin is 100% static, and dynamic is not an option.
There are a number of other features it has, though. I'd recommend you look at the reference (https://kotlinlang.org/docs/reference/), where you may spot a few more. For example (a few of these are shown in the sketch after this list):
Data classes - concise with a copy function (a bit like case classes in Scala)
The null safety check you mentioned (which is a big pro)
The ability to destructure items: val (name, age) = person
Higher-order functions, defined like fun <T> doStuff(body: (Int) -> T): T, which IMO are much better than Groovy Closures (and very similar to Scala's)
Type checks and smart casts are nice: https://kotlinlang.org/docs/reference/typecasts.html
Companion objects: in the same way that Scala tries to remove static methods from classes, Kotlin does the same thing.
Sealed Classes to restrict inheritance (again Scala has something similar)
The "Nothing" subtype, where everything is a supertype of it. (another crucial concept in Scala).
when expressions for basic pattern matching: https://kotlinlang.org/docs/reference/control-flow.html
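A quick, non-authoritative illustration of several items above (data classes with copy, destructuring, higher-order functions, sealed classes with an exhaustive when, smart casts); all types here are invented for the example:

```kotlin
data class Person(val name: String, val age: Int)          // data class: equals/hashCode/toString/copy for free

sealed class Shape                                          // sealed class: the compiler knows all subtypes
data class Circle(val radius: Double) : Shape()
data class Rect(val w: Double, val h: Double) : Shape()

fun area(s: Shape): Double = when (s) {                     // `when` over a sealed hierarchy is exhaustive
    is Circle -> Math.PI * s.radius * s.radius              // smart cast: `s` is already a Circle here
    is Rect -> s.w * s.h
}

fun <T> doStuff(body: (Int) -> T): T = body(42)             // higher-order function taking a function type

fun main() {
    val person = Person("Ada", 36)
    val (name, age) = person                                // destructuring declaration
    val older = person.copy(age = age + 1)                  // copy with one property changed
    println("$name, ${older.age}, area=${area(Circle(1.0))}, ${doStuff { it * 2 }}")
}
```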
As you can see, it does borrow from languages other than Groovy. They have attempted to cherry-pick a number of great features in an attempt to make a good language. Naturally Groovy has its own goodness; I've only focused on what Kotlin has and not vice versa.
Another plus is that, being made by an IDE vendor, the compiler is very quick and has great IDE support. I'm not saying Groovy does not have good support, but my current project takes a long time to compile, and "refactor method" always assumes you are coding in a dynamic fashion.
I'd recommend you try out the Koans to get a feel for them to see which features of the language you like and how it compares to groovy (https://github.com/Kotlin/kotlin-koans).
Kotlin is designed as a statically typed language, with a great type system and the other benefits of static typing. Groovy is first and foremost a dynamically typed language, and only secondarily a statically typed one.
When you enable static compilation in Groovy, you get just Java with syntactic sugar. Kotlin, on the other hand, has two kinds of references in its type system, nullable and non-nullable, so you can write code with fewer NPEs. If you are asking about only one feature, that's it.
A second great feature of Kotlin is that it doesn't do any implicit conversions; Groovy, on the other hand, implicitly converts double to BigDecimal and so on.
But Kotlin has a lot of other features, like smart casts, ADTs (doc), type-safe builders, zero-cost abstractions, and finally great IDE support.
Also, I'm not sure about the quality of Groovy's type inference (in closures, for example, we need additional annotations), but in Kotlin type inference works like a charm, without extra annotations in every piece of the language.
So statically typed compilation is a first-class citizen in Kotlin; in Groovy it is not.
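To make the nullability, explicit-conversion and type-inference points concrete, here is a small Kotlin-only sketch (nothing here is meant as a claim about Groovy):

```kotlin
fun describe(s: String?): Int {
    // s is a nullable reference; calling s.length directly would not compile.
    if (s == null) return 0
    return s.length                    // smart cast: inside this branch, s is a plain String
}

fun main() {
    val n = 1                          // type inference: n is an Int, no annotation needed
    // val l: Long = n                 // does not compile: no implicit widening conversion
    val l: Long = n.toLong()           // conversions are always explicit
    val maybe: String? = readLine()    // nullable reference
    println(describe(maybe))           // handled safely
    println(maybe?.length ?: -1)       // safe call plus elvis operator
    println(l)
}
```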
I want to verify that my ANTLR 4 grammar is LL(1). There is an option to do just that in older versions of ANTLR. Is there something similar in ANTLR 4?
I looked through the documentation but didn't find anything, though the page on options in particular seems to be lacking; I didn't even find a list of all possible options.
One of the design goals of ANTLR 4 is allowing language designers to focus on writing accurate grammars rather than worrying about characteristics like "LL(1)" which have little to no impact on users of the language.
However, it is likely that you can identify an LL(1) grammar by examining the generated parser. If there are no calls to adaptivePredict in the generated code, then the grammar is LL(1). The intent is for the inverse to also be true, but considering a call to adaptivePredict produces the same result as the inline version of an LL(1) decision, we have not rigorously evaluated this.
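If you want to automate that check, one rough option is to scan the generated parser source for adaptivePredict calls; the file path below is just an example and depends on where ANTLR writes your parser:

```kotlin
import java.io.File

fun main() {
    // Hypothetical output location; point this at your generated parser.
    val parserSource = File("gen/MyGrammarParser.java").readText()
    val calls = Regex("""adaptivePredict\s*\(""").findAll(parserSource).count()
    println(if (calls == 0) "No adaptivePredict calls: likely LL(1)"
            else "$calls adaptivePredict call(s): some decisions need more than one token of lookahead")
}
```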
I understand that ANTLR4 does some conversions on your grammar for you (getting rid of ambiguity, left-factoring, etc.) so that you can focus on writing more human-readable grammars rather than doing those conversions by hand to make the machine accept it. Is there any way to export my grammar after ANTLR makes these changes? I'd like to see what changes were made to my grammar.
I am trying to develop a simple C style scripting language for educational purposes.
Things I have done so far:
defined the syntax of the language
written code for tokenizing the language.
The features that I want to include at the moment:
Arithmetic
Conditionals
while loop (only)
At the moment I don't want to add other features to the language, as that would make the development process quite complex.
However, I don't know what the next steps in developing a language are. I have gone through many questions on SO, but they weren't very detailed. Kindly guide me through this process.
I think these answers are helpful starting out:
How to go about making your own programming language
How to develop Programming language like Coffee script?
If you have defined an EBNF grammar, then you can use a tool like Bison to create a parser. Using that parser to generate an abstract syntax tree, you can then proceed to create an interpreter for your language.
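To make the interpreter step concrete, here is a very small tree-walking sketch over an invented AST covering only arithmetic, assignment and a while loop (the node classes are assumptions for illustration, not produced by any particular tool):

```kotlin
sealed class Node
data class Num(val v: Int) : Node()
data class Var(val name: String) : Node()
data class Bin(val op: Char, val l: Node, val r: Node) : Node()
data class Assign(val name: String, val e: Node) : Node()
data class While(val cond: Node, val body: List<Node>) : Node()

// Evaluate a node against a mutable variable environment; everything is an Int for simplicity.
fun eval(n: Node, env: MutableMap<String, Int>): Int = when (n) {
    is Num -> n.v
    is Var -> env.getValue(n.name)
    is Assign -> eval(n.e, env).also { env[n.name] = it }
    is Bin -> {
        val a = eval(n.l, env)
        val b = eval(n.r, env)
        when (n.op) {
            '+' -> a + b
            '-' -> a - b
            '*' -> a * b
            '<' -> if (a < b) 1 else 0
            else -> error("unknown operator ${n.op}")
        }
    }
    is While -> {
        while (eval(n.cond, env) != 0) n.body.forEach { eval(it, env) }
        0
    }
}

fun main() {
    val env = mutableMapOf<String, Int>()
    // i = 0; while (i < 5) { i = i + 1 }
    eval(Assign("i", Num(0)), env)
    eval(While(Bin('<', Var("i"), Num(5)),
               listOf(Assign("i", Bin('+', Var("i"), Num(1))))), env)
    println(env["i"])   // prints 5
}
```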
A few years back I developed my own language too (it was an interpreted language), and by the time the language was ready for "others" to try, I found out there were a few things I should have done earlier, or better:
Solve tons of simple programming problems in that language
Solve just a few "hard core" programming problems with it (for example, from Project Euler)
Write a complete language specification, a few examples, a wiki or FAQ, anything that will spare you from answering the same questions all the time
Hope that helps.
Yes, having done this several times I know it's hard to know where to start. And you really don't need Lex or Yacc or Bison.
Make sure you have the definitions for your lexical elements (tokens) and grammar (in EBNF) nailed down.
Write a lexer for your tokens. It should be able to read a sample program emitting tokens and ending gracefully.
Write a symbol table. This is where you will put symbols as you recognise them. You can put reserved words and literals in here too, or not. It's a design choice.
Write a recursive descent parser, with a function for recognising each production in your grammar. You may need to modify your grammar to let you do this (a minimal sketch of this step appears at the end of this answer).
Write a tree/node manager for your AST (Abstract Syntax Tree). The parser adds nodes to the tree with links into the symbol table as it recognises productions.
Assuming you get this far, the final two steps are:
Walk the AST performing type and reference resolution, some kinds of optimisation, etc.
Walk the AST to emit code.
The last two steps turn out to be where most of the hard work is.
You will need some specific references, and what you choose depends on your level, what you like to read, and what language you like to write in. The Dragon Book is an obvious choice, but there are many others. I suggest you look here: Learning to write a compiler.
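To make the lexer and recursive descent steps above concrete, here is a minimal sketch for arithmetic expressions only; every name is invented for the example and error handling is kept to a bare minimum:

```kotlin
sealed class Tok {
    data class Num(val v: Int) : Tok()
    data class Op(val c: Char) : Tok()
    object End : Tok()
}

// "Write a lexer": turn source text into a list of tokens.
fun lex(src: String): List<Tok> {
    val toks = mutableListOf<Tok>()
    var i = 0
    while (i < src.length) {
        val c = src[i]
        when {
            c.isWhitespace() -> i++
            c.isDigit() -> {
                val start = i
                while (i < src.length && src[i].isDigit()) i++
                toks += Tok.Num(src.substring(start, i).toInt())
            }
            c in "+-*/()" -> { toks += Tok.Op(c); i++ }
            else -> error("unexpected character '$c' at offset $i")
        }
    }
    toks += Tok.End
    return toks
}

// The simplest possible "tree/node manager": plain AST classes.
sealed class Expr
data class Literal(val v: Int) : Expr()
data class Binary(val op: Char, val l: Expr, val r: Expr) : Expr()

// "Write a recursive descent parser": one function per production.
//   expr   : term (('+' | '-') term)* ;
//   term   : factor (('*' | '/') factor)* ;
//   factor : NUM | '(' expr ')' ;
class Parser(private val toks: List<Tok>) {
    private var pos = 0
    private fun peekOp(): Char? = (toks[pos] as? Tok.Op)?.c
    private fun expectOp(c: Char) { check(peekOp() == c) { "expected '$c'" }; pos++ }

    fun parseExpr(): Expr {
        var left = parseTerm()
        while (peekOp() == '+' || peekOp() == '-') {
            val op = (toks[pos++] as Tok.Op).c
            left = Binary(op, left, parseTerm())
        }
        return left
    }

    private fun parseTerm(): Expr {
        var left = parseFactor()
        while (peekOp() == '*' || peekOp() == '/') {
            val op = (toks[pos++] as Tok.Op).c
            left = Binary(op, left, parseFactor())
        }
        return left
    }

    private fun parseFactor(): Expr = when (val t = toks[pos++]) {
        is Tok.Num -> Literal(t.v)
        is Tok.Op -> {
            check(t.c == '(') { "unexpected '${t.c}'" }
            parseExpr().also { expectOp(')') }
        }
        Tok.End -> error("unexpected end of input")
    }
}

fun main() {
    val ast = Parser(lex("1 + 2 * (3 + 4)")).parseExpr()
    println(ast)  // Binary(op=+, l=Literal(v=1), r=Binary(op=*, ...)) showing * binds tighter than +
}
```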