In a rule expr : expr '<' expr | ...;
the ANTLR parser will accept expressions like 1 < 2 < 3 and construct left-associative trees, corresponding to the bracketing (1 < 2) < 3.
You can tell ANTLR to treat operators as right associative, e.g.
expr : expr '<'<assoc=right> expr | ...;
to yield parse trees 1 < (2 < 3).
However, in many languages, relational operators are non-associative, i.e., an expression 1 < 2 < 3 is forbidden.
This can be specified in YACC and its derivatives.
Can it also be specified in ANTLR?
E.g., as expr : expr '<'<assoc=no> expr | ...;
I have not found anything about this in the ANTLR 4 book so far.
How about the following approach? Basically, the "result" of a < b has a type that is incompatible with another application of the operator < or >:
expression
: boolExpression
| nonBoolExpression
;
boolExpression
: nonBoolExpression '<' nonBoolExpression
| nonBoolExpression '>' nonBoolExpression
| ...
;
nonBoolExpression
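: nonBoolExpression '*' nonBoolExpression // direct left recursion; ANTLR 4 rejects mutual left recursion through expression
| nonBoolExpression '+' nonBoolExpression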
| ...
;
Personally, though, I'd go with Darien's suggestion and detect the error after parsing.
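To illustrate what detecting the error after parsing could look like, here is a minimal sketch in plain Python. It does not use ANTLR: the tiny hand-written parser only stands in for the left-associative tree that a rule like expr : expr '<' expr would produce, and the names tokenize, parse and check_non_assoc are made up for this example.

```python
import re

REL_OPS = ("<", ">")

def tokenize(text):
    # Integers, the two relational operators and parentheses; whitespace is skipped.
    return re.findall(r"\d+|[<>()]", text)

def parse(tokens):
    """Build a left-associative tree, as  expr : expr '<' expr | ... ;  would."""
    pos = 0

    def atom():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = expr()
            pos += 1  # skip the closing ')'
            return ("paren", inner)
        return ("num", int(tok))

    def expr():
        nonlocal pos
        left = atom()
        while pos < len(tokens) and tokens[pos] in REL_OPS:
            op = tokens[pos]
            pos += 1
            left = ("rel", op, left, atom())
        return left

    return expr()

def check_non_assoc(node):
    """Reject a relational node whose direct operand is another relational node."""
    if node[0] == "paren":
        check_non_assoc(node[1])
    elif node[0] == "rel":
        _, op, left, right = node
        for child in (left, right):
            if child[0] == "rel":
                raise SyntaxError(f"operator '{op}' is non-associative")
            check_non_assoc(child)

check_non_assoc(parse(tokenize("1 < 2")))        # accepted
check_non_assoc(parse(tokenize("(1 < 2) < 3")))  # accepted: explicit parentheses
```

With an ANTLR-generated parser the same check would presumably live in a listener or visitor for the comparison alternative; that part is not shown here.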
I'm using ANTLR 4 and have a fairly complex grammar. I'm trying to simplify here...
Given an expression like: true and or false I want a parsing error, since the operators expect an expression on either side and this input has the form expr operator operator expr
My reduced grammar is:
grammar MappingExpression;
/* The start rule; begin parsing here.
operator precedence is implied by the ordering in this list */
// =======================
// = PARSER RULES
// =======================
expr:
op=(TRUE|FALSE) # boolean
| expr op=AND expr # logand
| expr op=OR expr # logor
;
TRUE : 'true';
FALSE : 'false';
WS : [ \t\r\n]+ -> skip; // ignore whitespace
AND : 'and';
OR : 'or';
However, it seems that the parser stops after evaluating true, even though it has identified all four tokens (e.g., the alt state returned becomes 2 in the parser).
If I can't get a parsing exception (because it treats what I consider operators as expressions), then, given the entire parse tree, I could at least throw a runtime exception for two operators in a row (e.g., 'and' and 'or').
Originally, I'd just had:
expr 'and' expr #logand
expr 'or' expr #logor
and this suffered the same parsing problem (stopping early).
You should get a parsing error if you force the parser to consume all tokens by "anchoring" a rule with the built-in EOF token:
parse
: expr EOF
;
This is what I get when parsing the input true and or false:
See the error in the lower left corner:
line 1:9 extraneous input 'or' expecting {'true', 'false'}
line 1:17 missing {'true', 'false'} at '<EOF>'
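The effect of the EOF anchor can be sketched without ANTLR. The following plain Python matcher is only an illustration under simplifying assumptions (tokens are split on whitespace, and the grammar is reduced to bool (op bool)*); match_expr, parse_unanchored and parse_anchored are hypothetical names, not part of any ANTLR API.

```python
BOOLS = {"true", "false"}
OPS = {"and", "or"}

def match_expr(tokens, pos=0):
    """Return the index just past the longest prefix of the form
    bool (op bool)* , or None if the input does not start with a bool."""
    if pos >= len(tokens) or tokens[pos] not in BOOLS:
        return None
    pos += 1
    # Only consume an operator if a valid operand follows it.
    while pos + 1 < len(tokens) and tokens[pos] in OPS and tokens[pos + 1] in BOOLS:
        pos += 2
    return pos

def parse_unanchored(text):
    """Like calling the expr rule directly: a valid prefix is enough."""
    tokens = text.split()
    end = match_expr(tokens)
    if end is None:
        raise SyntaxError("expected 'true' or 'false'")
    return tokens[:end]  # anything after the prefix is silently ignored

def parse_anchored(text):
    """Like  parse : expr EOF ;  the whole input must be consumed."""
    tokens = text.split()
    end = match_expr(tokens)
    if end is None:
        raise SyntaxError("expected 'true' or 'false'")
    if end != len(tokens):
        raise SyntaxError(f"unexpected input {tokens[end]!r} at token {end}")
    return tokens
```

Here parse_unanchored("true and or false") returns just ['true'], which mirrors the "stops early" symptom from the question, while parse_anchored raises on the same input.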
Bart Kiers' answer above is correct. I just wanted to provide more details for people working with Java who have experienced incomplete parsing issues.
I had a fairly complex g4 file that defined expr as a series of alternatives, each labeled with a tag (the name following a #, which becomes a method name in the ExpressionsVisitor). While this seemed to work, there were situations where I expected parsing errors but received none. There were also situations where only part of the input was interpreted, making it impossible to process the entire input statement.
I repaired the g4 file as follows (the full version is here):
// =======================
// = PARSER RULES
// =======================
expr_to_eof : expr EOF ;
expr:
ID # id
| '*' # field_values
| DESCEND # descendant
| DOLLAR # context_ref
| ROOT # root_path
| ARR_OPEN exprOrSeqList? ARR_CLOSE # array_constructor
| OBJ_OPEN fieldList? OBJ_CLOSE # object_constructor
| expr '.' expr # path
| expr ARR_OPEN ARR_CLOSE # to_array
| expr ARR_OPEN expr ARR_CLOSE # array
| expr OBJ_OPEN fieldList? OBJ_CLOSE # object
| VAR_ID (emptyValues | exprValues) # function_call
| FUNCTIONID varList '{' exprList? '}' # function_decl
| VAR_ID ASSIGN (expr | (FUNCTIONID varList '{' exprList? '}')) # var_assign
| (FUNCTIONID varList '{' exprList? '}') exprValues # function_exec
| op=(TRUE|FALSE) # boolean
| op='-' expr # unary_op
| expr op=('*'|'/'|'%') expr # muldiv_op
| expr op=('+'|'-') expr # addsub_op
| expr op='&' expr # concat_op
| expr op=('<'|'<='|'>'|'>='|'!='|'=') expr # comp_op
| expr 'in' expr # membership
| expr 'and' expr # logand
| expr 'or' expr # logor
| expr '?' expr (':' expr)? # conditional
| expr CHAIN expr # fct_chain
| '(' (expr (';' (expr)?)*)? ')' # parens
| VAR_ID # var_recall
| NUMBER # number
| STRING # string
| 'null' # null
;
Based on Bart's suggestion, I added the top-level rule expr_to_eof, which caused a method of that name to be generated in MappingExpressionParser. So, in my Expressions class, where I had previously called tree = parser.expr(); I now call tree = parser.expr_to_eof(); which returns a ParseTree whose last child is the Token.EOF.
Because my code needs to check some conditions on the first and last steps performed, the easiest fix was to strip out the <EOF> and recover the ParseTree I had been using before (an ExprContext rather than an Expr_to_eofContext) with this statement:
newTree = ((Expr_to_eofContext)tree).expr();
So, overall, it was quite easy to fix a long-standing bug (and others I'd postponed addressing) just by adding the new rule to the .g4 file, changing the parser call so that it parses to end of file, and then extracting the entire expression that was parsed.
I expect this will allow me to add considerably more functions to JSONata4Java to match the JavaScript version jsonata.js
Thanks again Bart!
I have an ANTLR4 rule "expression" that can be either "maths" or "comparison", but "comparison" can contain "maths". Here is the concrete code:
expression
: ID
| maths
| comparison
;
maths
: maths_atom ((PLUS | MINUS) maths_atom) ? // "?" because in fact there is first multiplication then pow and I don't want to force a multiplication to make an addition
;
maths_atom
: NUMBER
| ID
| OPEN_PAR expression CLOSE_PAR
;
comparison
: comp_atom ((EQUALS | NOT_EQUALS) comp_atom) ?
;
comp_atom
: ID
| maths // here is the expression of interest
| OPEN_PAR expression CLOSE_PAR
;
If I give, for instance, 6 as input, the parse tree is fine, because it detects maths. But the ANTLR4 plugin for IntelliJ IDEA marks my expression rule in red: ambiguity. Should I say goodbye to a short parse tree and allow maths only through comparison in expression, so that it is no longer ambiguous?
The problem is that when the parser sees 6, which is a NUMBER, it has two paths of reaching it through your grammar:
expression - maths - maths_atom - NUMBER
or
expression - comparison - comp_atom - maths - maths_atom - NUMBER
This ambiguity triggers the error that you see.
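The two paths can also be enumerated mechanically. The sketch below is plain Python, not ANTLR tooling: it copies only the single-symbol alternatives of the grammar in the question into a hypothetical UNIT_ALTS table (the optional operator tails and the parenthesized alternatives are left out on purpose) and lists every chain of rules leading from expression to a given token.

```python
# Single-symbol alternatives of each rule, taken from the question's grammar.
# Assumption: the optional "(op atom)?" tails are not taken, and the
# OPEN_PAR ... CLOSE_PAR alternatives are omitted.
UNIT_ALTS = {
    "expression": ["ID", "maths", "comparison"],
    "maths": ["maths_atom"],
    "maths_atom": ["NUMBER", "ID"],
    "comparison": ["comp_atom"],
    "comp_atom": ["ID", "maths"],
}

def derivations(symbol, token, path=()):
    """Return every rule chain that starts at `symbol` and ends at `token`."""
    path = path + (symbol,)
    if symbol == token:
        return [path]
    found = []
    for alt in UNIT_ALTS.get(symbol, []):
        found.extend(derivations(alt, token, path))
    return found

for chain in derivations("expression", "NUMBER"):
    print(" - ".join(chain))
```

More than one chain for the same token means the input can be derived in more than one way. Under the same assumptions, a lone ID is reachable through four chains, since ID appears in expression, maths_atom and comp_atom.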
You can fix this by flattening your parser grammar as shown in this tutorial:
start
: expr EOF
;
expr
: expr (PLUS | MINUS) expr # ADDGRP
| expr (EQUALS | NOT_EQUALS) expr # COMPGRP
| OPEN_PAR expr CLOSE_PAR # PARENGRP
| NUMBER # NUM
| ID # IDENT
;
I am defining my own grammar using ANTLR 4, and I want the tree to be built according to operator precedence (+ * - /).
I found a sample that handles the precedence of * and +, and it works fine.
I tried to edit it to add the precedence of - and /, but I failed.
The grammar that handles the precedence of + and * is:
println:PRINTLN expression SEMICOLON {System.out.println($expression.value);};
expression returns [Object value]:
t1=factor {$value=(int)$t1.value;}
(PLUS t2=factor{$value=(int)$value+(int)$t2.value;})*;
factor returns [Object value]: t1=term {$value=(int)$t1.value;}
(MULT t2=term{$value=(int)$value*(int)$t2.value;})*;
term returns [Object value]:
NUMBER {$value=Integer.parseInt($NUMBER.text);}
| ID {$value=symbolTable.get($ID.text);}
| PAR_OPEN expression {$value=$expression.value;} PAR_CLOSE
;
MULT :'*';
PLUS :'+';
MINUS:'-';
DIV:'/' ;
How can I add the precedence of - and / to them?
In ANTLR3 (and ANTLR4) * and / can be given a higher precedence than + and - like this:
println
: PRINTLN expression SEMICOLON
;
expression
: factor ( PLUS factor
| MINUS factor
)*
;
factor
: term ( MULT term
| DIV term
)*
;
term
: NUMBER
| ID
| PAR_OPEN expression PAR_CLOSE
;
But in ANTLR4, this will also work:
println
: PRINTLN expression SEMICOLON
;
expression
: NUMBER
| ID
| PAR_OPEN expression PAR_CLOSE
| expression ( MULT | DIV ) expression // higher precedence
| expression ( PLUS | MINUS ) expression // lower precedence
;
You normally solve this by defining expression, term, and factor production rules. Here's a grammar (specified in EBNF) that implements unary + and unary -, along with the 4 binary arithmetic operators, plus parentheses:
start ::= expression
expression ::= term (('+' term) | ('-' term))*
term ::= factor (('*' factor) | ('/' factor))*
factor ::= (number | group | '-' factor | '+' factor)
group ::= '(' expression ')'
where number is a numeric literal.
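As a rough illustration of how the expression / term / factor layering produces the usual precedence, here is a small recursive-descent evaluator in plain Python that follows the EBNF above rule by rule. It is a sketch under assumptions: numbers are non-negative integer literals, '/' is true division, and the function name evaluate is invented for this example.

```python
import re

def evaluate(text):
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def expression():  # expression ::= term (('+' term) | ('-' term))*
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():  # term ::= factor (('*' factor) | ('/' factor))*
        value = factor()
        while peek() in ("*", "/"):
            if take() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():  # factor ::= number | group | '-' factor | '+' factor
        tok = take()
        if tok == "-":
            return -factor()
        if tok == "+":
            return factor()
        if tok == "(":  # group ::= '(' expression ')'
            value = expression()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)

    result = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result
```

Because term is only ever entered from expression, and factor only from term, a * or / always binds its operands before the surrounding + or - is applied; the loops make all four binary operators left-associative.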
I read "The Definitive ANTLR 4 Reference" and it says
While ANTLR v4 can handle direct left recursion, it can’t handle indirect left
recursion.
on page 71.
But in the JSON grammar on page 90 I see the following:
grammar JSON;
json: object
| array
;
object
: '{' pair (',' pair)* '}'
| '{' '}' // empty object
;
pair: STRING ':' value ;
array
: '[' value (',' value)* ']'
| '[' ']' // empty array
;
value
: STRING
| NUMBER
| object // indirect recursion
| array // indirect recursion
| 'true'
| 'false'
| 'null'
;
Is it correct?
The JSON grammar you mentioned is not a problem because it actually doesn't contain any indirect left recursion.
The rule value can produce array, and array can in turn produce something that contains value, but not as its leftmost part (there is a [ preceding value).
The value rule would only be a problem if there were some way for it to produce value again at the very start, followed by any terminals and non-terminals.
From the book
A left-recursive rule is one that
either directly or indirectly invokes itself on the left edge of an alternative.
Example:
expr : expr '*' expr // match subexpressions joined with '*'
| expr '+' expr // match subexpressions joined with '+' operator
| INT // matches simple integer atom
;
This is left recursion because at least one alternative begins immediately with expr. It is also direct left recursion.
Example of indirect left recursion:
expr : addition // indirectly invokes expr left recursively via addition
| ...
;
addition : expr '+' expr
;
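The distinction between direct and indirect left recursion can be checked mechanically by following only the leftmost symbol of each alternative. The following plain Python sketch is not how ANTLR implements its check; it assumes a grammar given as a dict of alternatives (lists of symbols) with no rule that can match empty input, and classify_left_recursion is a name made up here.

```python
def leftmost_rules(grammar, rule):
    """Rule names that can appear as the first symbol of an alternative."""
    return {alt[0] for alt in grammar[rule] if alt and alt[0] in grammar}

def classify_left_recursion(grammar, rule):
    first = leftmost_rules(grammar, rule)
    if rule in first:
        return "direct"
    seen, stack = set(), list(first)
    while stack:  # follow leftmost rule references transitively
        current = stack.pop()
        if current == rule:
            return "indirect"
        if current not in seen:
            seen.add(current)
            stack.extend(leftmost_rules(grammar, current))
    return "none"

JSON_GRAMMAR = {
    "value": [["STRING"], ["NUMBER"], ["object"], ["array"]],
    "object": [["{", "pair", "}"], ["{", "}"]],
    "pair": [["STRING", ":", "value"]],
    "array": [["[", "value", "]"], ["[", "]"]],
}
DIRECT = {"expr": [["expr", "*", "expr"], ["expr", "+", "expr"], ["INT"]]}
INDIRECT = {
    "expr": [["addition"], ["INT"]],
    "addition": [["expr", "+", "expr"]],
}

print(classify_left_recursion(JSON_GRAMMAR, "value"))  # none
print(classify_left_recursion(DIRECT, "expr"))         # direct
print(classify_left_recursion(INDIRECT, "expr"))       # indirect
```

In the JSON grammar the search stops immediately, because object and array both start with a literal token rather than a rule, which matches the explanation given above.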
I'm trying to match an operator of variable arity within a mathematical expression (e.g. "1 < 3 < x < 10" yields true, given that 3 < x < 10). Note that this is unlike how most languages would parse the expression.
The (simplified) production rule is:
expression: '(' expression ')' # parenthesisExpression
| expression ('*' | '/' | '%') expression # multiplicationExpression
| expression ('+' | '-') expression # additionExpression
| expression (SMALLER_THAN expression)+ # smallerThanExpression
| IDENTIFIER # variableExpression
;
How do we keep the precedence, but still parse the smallerThanExpression as greedily as possible?
For example, "1 < 1+1 < 3" should be parsed as a single parse node "smallerThanExpression" with three child nodes, each of which is an expression. At the moment, the smallerThanExpression is broken up into two smallerThanExpressions (1 < (1+1 < 3)).
To give an answer for "future generations": we fixed it by separating arithmetic expressions from the other expressions. We know that only arithmetic expressions can be used as operands for our variable-arity operators ('true < false' is not a valid expression).
expression:
'!' expression
| arithmetic (SMALLER_THAN arithmetic)+
| arithmetic (GREATER_THAN arithmetic)+
| ...
;
arithmetic:
'(' expression ')'
| arithmetic ('*' | '/' | '%') arithmetic
| arithmetic ('+' | '-') arithmetic
| IDENTIFIER
| ...
;
This forces an expression such as "x < y < z" to be parsed as a single 'expression' node with three 'arithmetic' nodes as children.
(Note that an identifier might refer to a non-integer object; this is checked in the context checker.)
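The shape of the resulting node can be sketched in plain Python. This is not the ANTLR grammar itself: it assumes integer literals instead of identifiers, supports only '+' inside the arithmetic part, folds each arithmetic operand to its value right away instead of building a subtree, and uses the invented names parse_chain and holds.

```python
import re

def parse_chain(text):
    """comparison : arithmetic ('<' arithmetic)+
    arithmetic : NUMBER ('+' NUMBER)*   (simplified for this sketch)"""
    tokens = re.findall(r"\d+|[<+]", text)
    pos = 0

    def arithmetic():
        nonlocal pos
        value = int(tokens[pos])
        pos += 1
        while pos < len(tokens) and tokens[pos] == "+":
            value += int(tokens[pos + 1])
            pos += 2
        return value

    operands = [arithmetic()]
    while pos < len(tokens) and tokens[pos] == "<":
        pos += 1
        operands.append(arithmetic())  # '+' binds tighter than '<'
    if len(operands) < 2:
        raise SyntaxError("expected at least one '<'")
    return ("smallerThan", operands)  # one node, n children

def holds(node):
    """A chain a < b < c is true when every adjacent pair is ordered."""
    _, operands = node
    return all(a < b for a, b in zip(operands, operands[1:]))

node = parse_chain("1 < 1+1 < 3")
print(node)         # ('smallerThan', [1, 2, 3])
print(holds(node))  # True
```

Because the operands come from a separate arithmetic level that cannot itself contain '<', the loop collects the whole chain into one node instead of nesting comparisons.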