Mutually left-recursive rules - antlr4

I have the following rules in a grammar:
CCExpression
: LiteralExpression
| CCParenthesizedExpression
| CCSimpleNameExpression
| CCCastExpression
| CCOperatorExpression
| CCConditionalExpression
;
CCOperatorExpression
: CCUnaryOperator CCExpression
| CCExpression CCBinaryOperator CCExpression
;
and I am getting the following error:
The following sets of rules are mutually left-recursive [CCExpression, CCOperatorExpression]
I tried to fold the CCOperatorExpression rule into the CCExpression rule:
CCExpression
: CCExpression CCBinaryOperator CCExpression
| CCUnaryOperator CCExpression
| '(' CCExpression ')'
| LiteralExpression
| CCSimpleNameExpression
| CCCastExpression
| CCConditionalExpression
;
but that didn't seem to help. I still get:
The following sets of rules are mutually left-recursive [CCExpression]
How can I fix this?

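That is because lexer rules can't be left-recursive; only parser rules can. In ANTLR, a rule whose name starts with an uppercase letter (such as CCExpression) is a lexer rule, so rename these rules to start with a lowercase letter to make them parser rules.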
See: Practical difference between parser rules and lexer rules in ANTLR?
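A minimal sketch of that fix, assuming every rule involved is meant to be a parser rule. The lowercase names are my own renaming, and the referenced rules (literalExpression, ccCastExpression, and so on) are assumed to be renamed the same way; any that really are tokens would keep their uppercase names.

```antlr
// Parser rule names start with a lowercase letter, so ANTLR 4 can
// rewrite the direct left recursion inside ccExpression.
ccExpression
    : ccExpression ccBinaryOperator ccExpression
    | ccUnaryOperator ccExpression
    | '(' ccExpression ')'
    | literalExpression
    | ccSimpleNameExpression
    | ccCastExpression
    | ccConditionalExpression
    ;
```

ANTLR 4 gives earlier alternatives higher precedence, so you may want the unary alternative listed before the binary one.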

Related

ANTLR parser to throw exception for "true and or false" statement

I'm using ANTLR 4 and have a fairly complex grammar. I'm trying to simplify here...
Given an expression like true and or false, I want a parsing error, since the operators defined expect expressions on either side and this input has the form expr operator operator expr.
My reduced grammar is:
grammar MappingExpression;
/* The start rule; begin parsing here.
operator precedence is implied by the ordering in this list */
// =======================
// = PARSER RULES
// =======================
expr:
| op=(TRUE|FALSE) # boolean
| expr op=AND expr # logand
| expr op=OR expr # logor
;
TRUE : 'true';
FALSE : 'false';
WS : [ \t\r\n]+ -> skip; // ignore whitespace
AND : 'and';
OR : 'or';
However, it seems that the parser stops after evaluating true even though it has identified all four tokens (e.g., the alt state returned becomes 2 in the parser).
If I can't get a parsing exception (because it treats what I consider operators as expressions), then with the entire parse tree I could at least throw a runtime exception for two operators in a row (e.g., 'and' and 'or').
Originally, I'd just had:
expr 'and' expr #logand
expr 'or' expr #logor
and this suffered the same parsing problem (stopping early).
You should get a parsing error if you force the parser to consume all tokens by "anchoring" a rule with the built-in EOF token:
parse
: expr EOF
;
Parsing the input true and or false then reports:
line 1:9 extraneous input 'or' expecting {'true', 'false'}
line 1:17 missing {'true', 'false'} at '<EOF>'
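Put together, the reduced grammar from the question might look like the sketch below. Two things here are my assumptions rather than part of the answer: the leading | after expr: is dropped, because it declares an empty alternative that would let an operand match nothing, and the entry rule keeps the name parse used above.

```antlr
grammar MappingExpression;

// Entry rule: EOF forces the parser to account for every token.
parse
    : expr EOF
    ;

// Alternatives listed first bind tighter, so AND binds tighter than OR.
expr
    : op=(TRUE|FALSE)    # boolean
    | expr op=AND expr   # logand
    | expr op=OR expr    # logor
    ;

TRUE  : 'true';
FALSE : 'false';
AND   : 'and';
OR    : 'or';
WS    : [ \t\r\n]+ -> skip;   // ignore whitespace
```

The caller then starts parsing at parse instead of expr.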
Bart Kiers' answer above is correct. I just wanted to add more detail for people working with Java who have run into incomplete parsing.
I had a fairly complex g4 file that defined expr as a series of alternatives, each labeled with a tag (the name following a #, which becomes a method name in the ExpressionsVisitor). While this seemed to work, there were situations where I expected parsing errors but received none. There were also situations where only part of the input was interpreted, making it impossible to process the entire input statement.
I repaired the g4 file as follows (the full version is here):
// =======================
// = PARSER RULES
// =======================
expr_to_eof : expr EOF ;
expr:
ID # id
| '*' # field_values
| DESCEND # descendant
| DOLLAR # context_ref
| ROOT # root_path
| ARR_OPEN exprOrSeqList? ARR_CLOSE # array_constructor
| OBJ_OPEN fieldList? OBJ_CLOSE # object_constructor
| expr '.' expr # path
| expr ARR_OPEN ARR_CLOSE # to_array
| expr ARR_OPEN expr ARR_CLOSE # array
| expr OBJ_OPEN fieldList? OBJ_CLOSE # object
| VAR_ID (emptyValues | exprValues) # function_call
| FUNCTIONID varList '{' exprList? '}' # function_decl
| VAR_ID ASSIGN (expr | (FUNCTIONID varList '{' exprList? '}')) # var_assign
| (FUNCTIONID varList '{' exprList? '}') exprValues # function_exec
| op=(TRUE|FALSE) # boolean
| op='-' expr # unary_op
| expr op=('*'|'/'|'%') expr # muldiv_op
| expr op=('+'|'-') expr # addsub_op
| expr op='&' expr # concat_op
| expr op=('<'|'<='|'>'|'>='|'!='|'=') expr # comp_op
| expr 'in' expr # membership
| expr 'and' expr # logand
| expr 'or' expr # logor
| expr '?' expr (':' expr)? # conditional
| expr CHAIN expr # fct_chain
| '(' (expr (';' (expr)?)*)? ')' # parens
| VAR_ID # var_recall
| NUMBER # number
| STRING # string
| 'null' # null
;
Based on Bart's suggestion I added the top rule expr_to_eof, which resulted in a method of that name being added to the MappingExpressionParser. So, in my Expressions class, where I had previously called tree = parser.expr(); I now needed to call tree = parser.expr_to_eof(); which returns a ParseTree whose last child is the Token.EOF.
Because my code needed to check some conditions for the first and last steps performed, it was easiest to strip out the <EOF> and get back the ParseTree I had been using (an ExprContext rather than an Expr_to_eofContext) with this statement:
newTree = ((Expr_to_eofContext)tree).expr();
So, overall, it was quite easy to fix a long-standing bug (and others I'd postponed addressing) just by adding the new rule to the .g4 file and changing the caller so it parses to end of file (<EOF>) and then extracts the entire expression that was parsed.
I expect this will allow me to add considerably more functions to JSONata4Java to match the JavaScript version, jsonata.js.
Thanks again Bart!

ANTLR4: what design pattern to follow?

I have an ANTLR4 rule "expression" that can be either "maths" or "comparison", but "comparison" can contain "maths". Here is the concrete code:
expression
: ID
| maths
| comparison
;
maths
: maths_atom ((PLUS | MINUS) maths_atom) ? // "?" because in fact there is first multiplication then pow and I don't want to force a multiplication to make an addition
;
maths_atom
: NUMBER
| ID
| OPEN_PAR expression CLOSE_PAR
;
comparison
: comp_atom ((EQUALS | NOT_EQUALS) comp_atom) ?
;
comp_atom
: ID
| maths // here is the expression of interest
| OPEN_PAR expression CLOSE_PAR
;
If I give, for instance, 6 as input, the parse tree is fine, because it detects maths. But the ANTLR4 plugin for IntelliJ IDEA marks my expression rule in red: ambiguity. Should I say goodbye to a short parse tree and allow maths only through comparison in expression, so that it is no longer ambiguous?
The problem is that when the parser sees 6, which is a NUMBER, it has two paths of reaching it through your grammar:
expression - maths - maths_atom - NUMBER
or
expression - comparison - comp_atom - maths - maths_atom - NUMBER
This ambiguity triggers the error that you see.
You can fix this by flattening your parser grammar as shown in this tutorial:
start
: expr EOF
;
expr
: expr (PLUS | MINUS) expr # ADDGRP
| expr (EQUALS | NOT_EQUALS) expr # COMPGRP
| OPEN_PAR expr CLOSE_PAR # PARENGRP
| NUMBER # NUM
| ID # IDENT
;

ANTLR4.7 listener for a rule when sub rules are labeled

I have an antlr4.7 grammar like this, where all sub rules are labeled.
date_expr
: attr op=( '+' | '-' ) dt_interval=ISO8601_INTERVAL
#dateexpr_Op
| DATETIME_NAME
#dateexpr_Named
| d=( DATETIME_LITERAL | DATE_LITERAL | TIME_LITERAL )
#dateexpr_Literal
| attr
#dateexpr_Attr
| '(' date_expr ')'
#dateexpr_Paren
;
I would like to annotate the tree when a date_expr rule completes. However, looking at the generated listener class, I see no exitDate_expr. How can I add this? Or do I have to use a visitor interface for it? I am not very familiar with grammar tools.
Thanks.
To get visit points before and after all of the labeled alternatives, wrap the labeled rule in a singleton rule:
anyDate : dateExpr ;
dateExpr
: attr op=( '+' | '-' ) dt_interval=ISO8601_INTERVAL #dateexpr_Op
| DATETIME_NAME #dateexpr_Named
| d=( DATETIME_LITERAL | DATE_LITERAL | TIME_LITERAL ) #dateexpr_Literal
| attr #dateexpr_Attr
| '(' dateExpr ')' #dateexpr_Paren
;
The ANTLR tool will then generate enterAnyDate and exitAnyDate methods in the listener interface (and visitAnyDate in the visitor interface), each taking an AnyDateContext.

Antlr4 parentheses and arithmetic

I am parsing an SQL-like language in which I need to handle arithmetic with precedence.
Things could be like this:
(a + b) - c
(a + b) / 1000
a + (b - c)
a + (SELECT...)
(SELECT... ) + (SELECT ...)
etc..
I am using the ANTLR4 listener pattern, and I can't find a way to build a tree representation for these arithmetic clauses.
Grammar parts:
arithmetic_select_clause:
result_column arithmeticExpression result_column # ArithmeticSelect
| result_column arithmeticExpression arithmetic_select_clause # ArithmeticSelect
| arithmetic_select_clause arithmeticExpression result_column # ArithmeticSelect
| '(' arithmetic_select_clause ')' # ArithmeticSelectParentheses
;
arithmeticExpression : '+' # arithmeticsAdd
| '-' # arithmeticsSubtract
| '*' # arithmeticsMultiply
| '/' # arithmeticsDivide
| '%' # arithmeticsModulus
;
I can create a tree using the ANTLR listeners, but I can't handle precedence.
Help please
ANTLR can help you there but you need to follow a few rules for it to do so. The arithmeticExpression rule needs to contain both operands and be directly recursive so that ANTLR can figure out how to rewrite it.
Here's an example of what you could do:
expression : '(' expression ')'
| expression op=('*'|'/'|'%') expression
| expression op=('+'|'-') expression
| result_column
| arithmetic_select_clause
;
This rule is left-recursive but ANTLR will rewrite it to eliminate the left-recursion. Relevant docs.
Notice how the levels of precedence are ordered. Each level gets its alternative. Same-precedence operators are on one level.
Also, for processing math expressions it's much easier to use a visitor than a listener. ANTLR can generate the base classes for you. It'll be much easier to traverse/process the parse tree in the precedence order this way.
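If you go the visitor route, labeling each alternative gives the generated visitor one method per precedence level. A sketch of the rule above with labels added; the label names are hypothetical, and result_column and arithmetic_select_clause are taken from the question as-is.

```antlr
// Once one alternative is labeled, ANTLR requires every alternative
// of the rule to be labeled.
expression
    : '(' expression ')'                       # parens
    | expression op=('*'|'/'|'%') expression   # mulDivMod
    | expression op=('+'|'-') expression       # addSub
    | result_column                            # column
    | arithmetic_select_clause                 # selectClause
    ;
```

For the Java target these labels would yield visitor methods such as visitMulDivMod and visitAddSub, and the op field on each context tells you which operator was matched.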

Struggling to parse array notation

I have a very simple grammar to parse statements.
Here are examples of the type of statements that need be parsed:
a.b.c
a.b.c == "88"
The issue I am having is that array notation is not matching. For example, things that are not working:
a.b[0].c
a[3][4]
I hope someone can point out what I am doing wrong here. (I am testing in ANTLRWorks)
Here is the grammar (generationUnit is my entry point):
grammar RatBinding;
generationUnit: testStatement | statement;
arrayAccesor : identifier arrayNotation+;
arrayNotation: '[' Number ']';
testStatement:
(statement | string | Number | Bool )
(greaterThanAndEqual
| lessThanOrEqual
| greaterThan
| lessThan | notEquals | equals)
(statement | string | Number | Bool )
;
part: identifier | arrayAccesor;
statement: part ('.' part )*;
string: ('"' identifier '"') | ('\'' identifier '\'');
greaterThanAndEqual: '>=';
lessThanOrEqual: '<=';
greaterThan: '>';
lessThan: '<';
notEquals : '!=';
equals: '==';
identifier: Letter (Letter|Digit)*;
Bool : 'true' | 'false';
ArrayLeft: '\u005B';
ArrayRight: '\u005D';
Letter
: '\u0024' |
'\u0041'..'\u005a' |
'\u005f' |
'\u0061'..'\u007a' |
'\u00c0'..'\u00d6' |
'\u00d8'..'\u00f6' |
'\u00f8'..'\u00ff' |
'\u0100'..'\u1fff' |
'\u3040'..'\u318f' |
'\u3300'..'\u337f' |
'\u3400'..'\u3d2d' |
'\u4e00'..'\u9fff' |
'\uf900'..'\ufaff'
;
Digit
: '\u0030'..'\u0039' |
'\u0660'..'\u0669' |
'\u06f0'..'\u06f9' |
'\u0966'..'\u096f' |
'\u09e6'..'\u09ef' |
'\u0a66'..'\u0a6f' |
'\u0ae6'..'\u0aef' |
'\u0b66'..'\u0b6f' |
'\u0be7'..'\u0bef' |
'\u0c66'..'\u0c6f' |
'\u0ce6'..'\u0cef' |
'\u0d66'..'\u0d6f' |
'\u0e50'..'\u0e59' |
'\u0ed0'..'\u0ed9' |
'\u1040'..'\u1049'
;
WS : [ \r\t\u000C\n]+ -> channel(HIDDEN)
;
You referenced the non-existent rule Number in the arrayNotation parser rule.
A Digit rule does exist in the lexer, but it will only match a single-digit number. For example, 1 is a Digit, but 10 is two separate Digit tokens so a[10] won't match the arrayAccesor rule. You probably want to resolve this in two parts:
Create a Number token consisting of one or more digits.
Number
: Digit+
;
Mark Digit as a fragment rule to indicate that it doesn't form tokens on its own, but is merely intended to be referenced from other lexer rules.
fragment // prevents a Digit token from being created on its own
Digit
: ...
You will not need to change arrayNotation because it already references the Number rule you created here.
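The two steps above can be sketched together as follows. This is an abbreviated fragment, not the full grammar: the character ranges are shortened to ASCII for brevity, and turning identifier into a lexer rule (Identifier) with Letter as a fragment is my assumption, made because a fragment such as Digit never reaches the parser as a token of its own, so a parser rule can no longer match on it.

```antlr
arrayNotation : '[' Number ']' ;

// One or more digits form a single token, so a[10] yields one Number.
Number : Digit+ ;

// Assumption: identifiers are matched in the lexer, since Digit is now
// a fragment and cannot be referenced usefully from a parser rule.
Identifier : Letter (Letter | Digit)* ;

// Abbreviated to ASCII here; the original grammar lists more ranges.
fragment Letter : 'a'..'z' | 'A'..'Z' | '_' | '$' ;
fragment Digit  : '0'..'9' ;
```

If you adopt an Identifier token like this, keep Bool defined above it so that true and false are still tokenized as Bool.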
Bah, waste of space. I used Number instead of Digit in my array declaration.

Resources