How to do semantic analysis using ANTLR 4?

I am currently trying to implement a Ruby compiler. To create the parser and lexer I used ANTLR 4. Now I am unable to figure out how to add semantic analysis to the parser. Can someone explain how to do semantic analysis using the generated parser? It would help if you could explain with a simple example, say, how to check whether a variable is initialized before use.

Well, I can't describe everything you can and have to do, but I will try to show you the principle behind it.
ANTLR generates a parse tree for you, which you can then process with a ParseTreeWalker. The walker goes through the tree depth-first: it starts at the root and visits all children of each node in order (if you need to control which children get visited, use a generated visitor instead). If you have registered a ParseTreeListener with the walker, it gets notified at each step. There are two methods for each parser rule in your grammar: enterX, which is called when the walker enters a node for rule X (before its children are visited), and exitX, which is called when the walker exits that node (after all of its children have been visited).
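To make the enter/exit order concrete, here is a minimal sketch in Python that does not depend on the ANTLR runtime. The Node class, the walk() function and the enter_<rule>/exit_<rule> naming are hypothetical stand-ins; with the real runtime you would call ParseTreeWalker.DEFAULT.walk(listener, tree) and implement the generated enterX/exitX methods instead.

```python
class Node:
    """Hypothetical parse-tree node: a rule name plus child nodes."""

    def __init__(self, rule, children=()):
        self.rule = rule
        self.children = list(children)


def walk(listener, node):
    """Depth-first walk: enter the node, walk every child in order, then exit it."""
    enter = getattr(listener, "enter_" + node.rule, None)
    if enter is not None:
        enter(node)
    for child in node.children:
        walk(listener, child)
    exit_ = getattr(listener, "exit_" + node.rule, None)
    if exit_ is not None:
        exit_(node)


class TraceListener:
    """Records the order in which the walker fires its callbacks."""

    def __init__(self):
        self.events = []

    def enter_program(self, node):
        self.events.append("enter program")

    def exit_program(self, node):
        self.events.append("exit program")

    def enter_stmt(self, node):
        self.events.append("enter stmt")

    def exit_stmt(self, node):
        self.events.append("exit stmt")


tree = Node("program", [Node("stmt"), Node("stmt")])
listener = TraceListener()
walk(listener, tree)
print(listener.events)
# ['enter program', 'enter stmt', 'exit stmt', 'enter stmt', 'exit stmt', 'exit program']
```

Every exit callback fires only after all callbacks for that node's subtree, which is what makes exit methods the natural place for checks that need the children's results.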
This ParseTreeListener is where you can do your semantic analysis. You mentioned the check for undefined variables: for that, you hook into your declaration rule (override its enter or exit method), read out the variable name and store it in a collection such as a set. In Ruby, where local variables have no separate declaration, the first assignment plays that role. Then you hook into each rule that can contain a variable reference, read out its name and check whether it is in your collection of declared variables. If it is not, the variable is used before it has been defined.
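The same idea applied to the initialized-before-use check might look like the sketch below. It again uses a hypothetical Node/walk() pair instead of the ANTLR runtime, and assumes a toy grammar with assign, var, add and int rules. The name is registered in exit_assign rather than enter_assign, so the right-hand side is checked first and `c = c + 1` is flagged when c has no earlier assignment. A single flat set ignores scopes; a real Ruby checker would need a stack of scopes (for example one per method definition).

```python
class Node:
    """Hypothetical parse-tree node: rule name, optional token text, children."""

    def __init__(self, rule, text=None, children=()):
        self.rule = rule
        self.text = text
        self.children = list(children)


def walk(listener, node):
    """Depth-first walk calling enter_<rule> / exit_<rule> if the listener has them."""
    enter = getattr(listener, "enter_" + node.rule, None)
    if enter is not None:
        enter(node)
    for child in node.children:
        walk(listener, child)
    exit_ = getattr(listener, "exit_" + node.rule, None)
    if exit_ is not None:
        exit_(node)


class UseBeforeInitChecker:
    """Flags variable reads that are not preceded by an assignment."""

    def __init__(self):
        self.defined = set()   # names assigned so far (no scoping)
        self.errors = []       # names read before being assigned

    def enter_var(self, node):
        # A variable reference: it must already have been assigned.
        if node.text not in self.defined:
            self.errors.append(node.text)

    def exit_assign(self, node):
        # Register the target only after the right-hand side has been walked.
        self.defined.add(node.text)


# a = 1
# b = a + c
# c = c + 1
program = Node("program", children=[
    Node("assign", "a", [Node("int", "1")]),
    Node("assign", "b", [Node("add", children=[Node("var", "a"), Node("var", "c")])]),
    Node("assign", "c", [Node("add", children=[Node("var", "c"), Node("int", "1")])]),
])
checker = UseBeforeInitChecker()
walk(checker, program)
print(checker.errors)  # ['c', 'c']
```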
As an example on how something like that can be done you can have a look at a ParseTreeListener of mine here. The corresponding grammar can be found here.

Related

Given an antlr4 grammar, can I build up an expression tree?

So I have written my grammar in ANTLR 4 syntax. Then I set up code generation, and now I can parse source files in my own language. This works great!
The next step I took is to create an object model from the expression tree. This is also working well.
However, now I want to generate an expression from my object model.
Can I generate code using the API of the generated parser objects? Obviously, I can write methods that generate strings by hand. But I want to use an API generated from the grammar, to get some level of type safety and to detect errors when I change the grammar.
I'm using the latest ANTLR 4 release: 4.7.1.
There's no generated solution. You have to wire this all up manually.
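Since nothing is generated for this direction, one manual option is a small typed object model plus a printer that turns it back into source text. The sketch below assumes a hypothetical expression language with integers, variables and the four binary arithmetic operators, all left-associative; Num, Var, BinOp, emit() and PRECEDENCE are illustrative names, not ANTLR API. Parentheses are added only where precedence or associativity requires them.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str        # one of "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]

# Higher number binds tighter; all four operators are treated as left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def emit(expr: Expr, parent_prec: int = 0) -> str:
    """Turn an object-model expression back into source text."""
    if isinstance(expr, Num):
        return str(expr.value)
    if isinstance(expr, Var):
        return expr.name
    if isinstance(expr, BinOp):
        prec = PRECEDENCE[expr.op]
        left = emit(expr.left, prec)
        # Require a tighter binding on the right so a - (b - c) keeps its parentheses.
        right = emit(expr.right, prec + 1)
        text = f"{left} {expr.op} {right}"
        return f"({text})" if prec < parent_prec else text
    raise TypeError(f"unknown expression node: {expr!r}")


print(emit(BinOp("*", BinOp("+", Var("a"), Num(1)), Var("b"))))  # (a + 1) * b
print(emit(BinOp("-", Var("a"), BinOp("-", Var("b"), Var("c")))))  # a - (b - c)
```

The type checker then covers the object model, but not the grammar itself; one way to catch grammar drift is a test that feeds emit()'s output back into the generated parser and checks that it parses without errors.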

antlr4 handling incomplete rule match because of parse error in visitor

I'm new to ANTLR 4 and I'm trying to make good use of its ability to recover from parser errors and continue. I find that I can still visit the parse tree after a parse error, and a rule will be matched even though some of its elements are missing. This causes a problem in my visitor code, because it expects all elements of the rule match to be present and throws an exception.
Two options I'm thinking of:
1) After parsing, check parser.getNumberOfSyntaxErrors() > 0 and don't visit the parse tree if so. This stops an exception from being thrown, but does not give the user feedback that is as good as it could be. ANTLR 4 recovers nicely from errors and can get to the next independent section of what I'm trying to parse, so this is more drastic than I'd like.
2) I can wrap each self.visit() call in something that catches an exception and reacts accordingly. I think this would work.
But I'm wondering whether there is something in ctx or elsewhere that would tell me that what's below it in the parse tree is an incomplete match.
In case it is relevant, I'm using Python with ANTLR 4.
As you have seen, ANTLR 4 tries to resynchronize the input stream with the rule structure once it encounters an error. This is usually done by trying to detect a single missing token or a single extra token. Anything else usually leads to error nodes all the way to the end of the input.
Of course, if the input cannot be parsed successfully, the parse tree will be incomplete at least from the point of the error, and that point might be much earlier than where the actual error is located. This happens because of ANTLR's variable lookahead: to make a prediction early in the parsing process, the parser may consume much of the input, so that prediction (and with it the parse) can fail early.
That's why I recommend following path 1). Once you get a syntax error, there's not much in the parse tree you can rely on. It depends entirely on the grammar's structure which parts will be parsed successfully (don't assume it will always be everything up to the error position, as I just explained).
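Path 1) does not have to mean poor feedback: you can collect every syntax error the parser reports and show them all, while simply not visiting the tree. The sketch below is a plain-Python stand-in whose syntaxError method has the same parameters as the Python runtime's ErrorListener.syntaxError; in real code you would subclass antlr4.error.ErrorListener.ErrorListener (so the other callbacks get no-op defaults) and register it with parser.removeErrorListeners() and parser.addErrorListener(...). analyze() is a hypothetical helper, and the syntaxError call in the usage lines only simulates what the parser would report.

```python
class CollectingErrorListener:
    """Collects syntax errors instead of printing them.

    The syntaxError parameters mirror ANTLR's Python ErrorListener; in real code,
    subclass antlr4.error.ErrorListener.ErrorListener so the other callbacks exist.
    """

    def __init__(self):
        self.errors = []

    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        self.errors.append(f"line {line}:{column} {msg}")


def analyze(tree, errors, visit):
    """Hypothetical helper: visit the tree only if parsing reported no errors.

    Returns (visit result or None, list of error messages to show the user).
    """
    if errors:
        return None, list(errors)
    return visit(tree), []


# Real wiring (assumes a generated parser with a start rule named `program`
# and a hypothetical visitor class MyVisitor):
#   listener = CollectingErrorListener()
#   parser.removeErrorListeners()
#   parser.addErrorListener(listener)
#   tree = parser.program()
#   result, errors = analyze(tree, listener.errors, MyVisitor().visit)

listener = CollectingErrorListener()
# Simulate what the parser would report for a broken input.
listener.syntaxError(None, None, 3, 7, "missing ')' at 'end'", None)
result, errors = analyze("tree", listener.errors, lambda tree: "visited")
print(result, errors)  # None ["line 3:7 missing ')' at 'end'"]
```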

What is difference between DaemonStage and ElementProblemAnalyzer?

I'm developing a ReSharper plugin and I don't understand the difference between daemon stages and element problem analyzers.
When should I use one or the other, given that both provide code analysis?
An ElementProblemAnalyzer<T> is only called for specific nodes in the abstract syntax tree, while a daemon stage gets to process the whole file. The node types you're interested in are registered in the ElementProblemAnalyzerAttribute constructor, and the T parameter of the base class is their common node interface. If you're interested in just one node type, T is the interface for that node; if you're interested in several, it is their most specific common base type, perhaps ITreeNode or ICSharpTreeNode.
[ElementProblemAnalyzer(typeof(ICSharpArgument),…)]
public class MyAnalyzer : ElementProblemAnalyzer<ICSharpArgument>
{
    // ...
}
You'd use an element problem analyser if you only need to check a particular node, without looking at the rest of the file. You can still navigate from the node you're at: for example, given a method call, you could have an analyser for the argument that navigates from the argument node up to the method call node and checks whether the argument is the same as the parameter's default value, which would make it redundant code.
You would use a daemon stage if you need more context within the file, for example, a list of all of the methods in a class, or more control of how the abstract syntax tree is walked - you can skip child nodes of a method declaration if you're not interested in the statements or expressions within.
If it helps, element problem analysers are actually implemented as daemon stages. They're only supported by C#, VB, JS and XML. Each language has a daemon stage that walks the AST for error checking, and as it does so, calls Run for each analyser that's interested in each node type of the tree.
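The dispatch described above (one walk over the tree, with each analyser called only for the node types it registered for) can be sketched in plain Python. This is not ReSharper's API; Node, AnalyzerStage, register() and check_argument() are hypothetical names used only to illustrate the pattern.

```python
from collections import defaultdict


class Node:
    """Hypothetical syntax-tree node with a kind and children."""

    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)


class AnalyzerStage:
    """One walk over the whole tree; each analyzer only sees the kinds it registered for."""

    def __init__(self):
        self._by_kind = defaultdict(list)

    def register(self, kinds, analyzer):
        for kind in kinds:
            self._by_kind[kind].append(analyzer)

    def run(self, root):
        problems = []
        stack = [root]
        while stack:
            node = stack.pop()
            # .get() avoids inserting empty lists for kinds nobody registered.
            for analyzer in self._by_kind.get(node.kind, ()):
                analyzer(node, problems)
            # Push children in reverse so they are visited left to right.
            stack.extend(reversed(node.children))
        return problems


def check_argument(node, problems):
    # Stand-in for a per-node analyser such as a redundant-argument check.
    problems.append(f"checked {node.kind}")


stage = AnalyzerStage()
stage.register(["argument"], check_argument)
tree = Node("file", [Node("method", [Node("call", [Node("argument"), Node("argument")])])])
print(stage.run(tree))  # ['checked argument', 'checked argument']
```

The analysers never walk the tree themselves, which is why adding one is cheap; anything that needs file-wide context or custom traversal belongs in a stage of its own.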

How to find matching expressions while using a ParseTreeWalker

Say I'd like to find instances of the following expression using the Java7 grammar:
FoobarClass.getInstanceOfType("Bazz");
Using a ParseTreeWalker and listening to exitExpression() calls sounded like a good place to start. What surprised me was how much manual traversal of the Java7Parser.ExpressionContext was required to find expressions of this type.
What's the appropriate way to find matches for the above expression? At this point, using a regex instead of ANTLR 4 yields simpler code, but this won't scale.
ANTLR 4 does not currently include a feature that lets you write concrete or abstract syntax queries. We hope to add something in the future to help with this type of application.
I've needed to write a few pattern recognition features for ANTLR 4 parse trees. I implemented the predicate itself with relative success by extending BaseMyParserVisitor<Boolean> (the parser in this example is called MyParser).
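A predicate-style matcher along those lines can be sketched without the runtime: compare a pattern tree against every subtree, with ANY as a wildcard. The node kinds (call, member, name, string), the call() builder and ANY are a hypothetical simplified tree, not the actual shape of Java7Parser.ExpressionContext, so a real implementation would have to follow the Java7 grammar's rule structure.

```python
class Node:
    """Hypothetical simplified syntax-tree node: kind, optional text, children."""

    def __init__(self, kind, text=None, children=()):
        self.kind = kind
        self.text = text
        self.children = list(children)


ANY = object()  # wildcard: matches any single subtree


def matches(pattern, node):
    """True if node has the same shape as pattern (text=None in a pattern means 'any text')."""
    if pattern is ANY:
        return True
    if pattern.kind != node.kind:
        return False
    if pattern.text is not None and pattern.text != node.text:
        return False
    if len(pattern.children) != len(node.children):
        return False
    return all(matches(p, c) for p, c in zip(pattern.children, node.children))


def find_all(pattern, root):
    """Return every subtree of root that matches pattern, in pre-order."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if matches(pattern, node):
            found.append(node)
        stack.extend(reversed(node.children))
    return found


def call(cls, method, arg):
    # Builds the hypothetical tree for  cls.method(arg)
    return Node("call", children=[
        Node("member", children=[Node("name", cls), Node("name", method)]),
        arg,
    ])


# Pattern for FoobarClass.getInstanceOfType("Bazz")
pattern = call("FoobarClass", "getInstanceOfType", Node("string", '"Bazz"'))
source = Node("block", children=[
    call("FoobarClass", "getInstanceOfType", Node("string", '"Bazz"')),
    call("FoobarClass", "getInstanceOfType", Node("string", '"Other"')),
])
print(len(find_all(pattern, source)))  # 1
print(len(find_all(call("FoobarClass", "getInstanceOfType", ANY), source)))  # 2
```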

Syntax tree with parent in Haskell

I want to implement an AST in Haskell. I need a parent reference so it seems impossible to use a functional data structure. I've seen the following in an article. We define a node as:
type Tree = Node -> Node
A Node lets us get an attribute by a key of type Key a.
Is there anything to read about such a pattern? Could you give me some further links?
If you want a pure data structure with cyclic self-references, then as delnan says in the comments the usual term for that is "tying the knot". Searching for that term should give you more information.
Do note that data structures built by tying the knot are difficult (or impossible) to "update" in the usual manner--with a non-cyclic structure you can keep pieces of the original when building a new structure based on it, but changing any piece of a cycle requires you to rebuild the entire cycle as well. Depending on what you're doing, this may or may not be a problem, of course.
