How do I implement operator precedence (+ * - /) in my grammar? - antlr4

I am defining my own grammar with ANTLR 4 and I want the parse tree to be built according to operator precedence (+ * - /).
I found a sample that handles the precedence of * and +, and it works fine.
I tried to edit it to add - and /, but I failed :(
The grammar that handles the precedence of + and * is:
println:PRINTLN expression SEMICOLON {System.out.println($expression.value);};
expression returns [Object value]:
t1=factor {$value=(int)$t1.value;}
(PLUS t2=factor{$value=(int)$value+(int)$t2.value;})*;
factor returns [Object value]: t1=term {$value=(int)$t1.value;}
(MULT t2=term{$value=(int)$value*(int)$t2.value;})*;
term returns [Object value]:
NUMBER {$value=Integer.parseInt($NUMBER.text);}
| ID {$value=symbolTable.get($ID.text);}
| PAR_OPEN expression {$value=$expression.value;} PAR_CLOSE
;
MULT :'*';
PLUS :'+';
MINUS:'-';
DIV:'/' ;
How can I add the - and / operators with the correct precedence?

In ANTLR3 (and ANTLR4) * and / can be given a higher precedence than + and - like this:
println
: PRINTLN expression SEMICOLON
;
expression
: factor ( PLUS factor
| MINUS factor
)*
;
factor
: term ( MULT term
| DIV term
)*
;
term
: NUMBER
| ID
| PAR_OPEN expression PAR_CLOSE
;
But in ANTLR4, this will also work:
println
: PRINTLN expression SEMICOLON
;
expression
: NUMBER
| ID
| PAR_OPEN expression PAR_CLOSE
| expression ( MULT | DIV ) expression // higher precedence
| expression ( PLUS | MINUS ) expression // lower precedence
;
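For intuition about what the single-rule form expresses, here is a small precedence-climbing sketch in Python. It is only an analogy, not ANTLR's generated code. The PRECEDENCE table and the function names are made up for this illustration, and it handles integers, the four operators and parentheses only.

```python
import re

# Binding power per operator: a higher number binds tighter (assumed table).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def tokenize(text):
    """Split the input into integer literals, operators and parentheses."""
    return re.findall(r"\d+|[-+*/()]", text)

def parse(tokens, min_prec=1):
    """Consume tokens from the front and return a nested (op, left, right) tree."""
    tok = tokens.pop(0)
    if tok == "(":
        left = parse(tokens, 1)
        tokens.pop(0)  # discard the closing ")"
    else:
        left = int(tok)
    # Keep absorbing operators that bind at least as tightly as min_prec.
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Asking for a strictly higher level on the right gives left associativity.
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left

print(parse(tokenize("1+2*3")))  # ('+', 1, ('*', 2, 3))
print(parse(tokenize("8-3-2")))  # ('-', ('-', 8, 3), 2)
```

The multiplication ends up deeper in the tree than the addition, which is the same shape the grammar above is meant to produce.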

You normally solve this by defining expression, term, and factor production rules. Here's a grammar (specified in EBNF) that implements unary + and unary -, along with the 4 binary arithmetic operators, plus parentheses:
start ::= expression
expression ::= term (('+' term) | ('-' term))*
term ::= factor (('*' factor) | ('/' factor))*
factor ::= (number | group | '-' factor | '+' factor)
group ::= '(' expression ')'
where number is a numeric literal.
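The same EBNF can be turned into a hand-written recursive-descent evaluator, with one function per production. A minimal sketch in Python, assuming number means a non-negative integer literal and that the input is well formed (the evaluate helper and its inner function names are my own, not part of any library):

```python
import re

def evaluate(text):
    """Evaluate an arithmetic string following the expression/term/factor EBNF."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expression():  # expression ::= term (('+' term) | ('-' term))*
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():  # term ::= factor (('*' factor) | ('/' factor))*
        value = factor()
        while peek() in ("*", "/"):
            if take() == "*":
                value *= factor()
            else:
                value /= factor()  # true division, so the result is a float
        return value

    def factor():  # factor ::= number | group | '-' factor | '+' factor
        tok = take()
        if tok == "-":
            return -factor()
        if tok == "+":
            return factor()
        if tok == "(":  # group ::= '(' expression ')'
            value = expression()
            take()  # discard the closing ")"
            return value
        return int(tok)

    result = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result

print(evaluate("2+3*4"))     # 14
print(evaluate("-(2+3)*4"))  # -20
```

Because each level only calls the next tighter level for its operands, * and / are always grouped before + and -, and the loops make every binary operator left-associative.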

Related

ANTLR4: what design pattern to follow?

I have an ANTLR4 rule "expression" that can be either "maths" or "comparison", but "comparison" can contain "maths". Here is the concrete code:
expression
: ID
| maths
| comparison
;
maths
: maths_atom ((PLUS | MINUS) maths_atom) ? // "?" because in fact there is first multiplication then pow and I don't want to force a multiplication to make an addition
;
maths_atom
: NUMBER
| ID
| OPEN_PAR expression CLOSE_PAR
;
comparison
: comp_atom ((EQUALS | NOT_EQUALS) comp_atom) ?
;
comp_atom
: ID
| maths // here is the expression of interest
| OPEN_PAR expression CLOSE_PAR
;
If I give, for instance, 6 as input, the parse tree is fine, because it detects maths. But the ANTLR4 plugin for IntelliJ IDEA marks my expression rule in red: ambiguity. Should I say goodbye to a short parse tree and allow maths only through comparison in expression, so that it is no longer ambiguous?
The problem is that when the parser sees 6, which is a NUMBER, it has two paths of reaching it through your grammar:
expression - maths - maths_atom - NUMBER
or
expression - comparison - comp_atom - maths - maths_atom - NUMBER
This ambiguity triggers the error that you see.
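The competing paths can be enumerated mechanically. The Python sketch below is only an illustration: it copies the single-symbol alternatives of the question's grammar into a dictionary (the name UNIT_ALTERNATIVES and the derivations helper are made up here) and lists every chain of rules that ends in a given token. The optional operator tails and the parenthesised alternatives are left out, since a lone token never uses them.

```python
# Single-symbol alternatives copied from the question's grammar.
UNIT_ALTERNATIVES = {
    "expression": ["ID", "maths", "comparison"],
    "maths": ["maths_atom"],
    "maths_atom": ["NUMBER", "ID"],
    "comparison": ["comp_atom"],
    "comp_atom": ["ID", "maths"],
}

def derivations(rule, token, path=()):
    """Yield every chain of rules leading from `rule` to the single `token`."""
    path = path + (rule,)
    for symbol in UNIT_ALTERNATIVES[rule]:
        if symbol == token:
            yield path + (symbol,)
        elif symbol in UNIT_ALTERNATIVES:
            yield from derivations(symbol, token, path)

number_paths = list(derivations("expression", "NUMBER"))
for chain in number_paths:
    print(" - ".join(chain))
# expression - maths - maths_atom - NUMBER
# expression - comparison - comp_atom - maths - maths_atom - NUMBER

id_paths = list(derivations("expression", "ID"))
print(len(id_paths))  # 4
```

More than one chain for the same input means the parser has to pick one arbitrarily, which is exactly what the plugin flags.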
You can fix this by flattening your parser grammar as shown in this tutorial:
start
: expr | EOF
;
expr
: expr (PLUS | MINUS) expr # ADDGRP
| expr (EQUALS | NOT_EQUALS) expr # COMPGRP
| OPEN_PAR expr CLOSE_PAR # PARENGRP
| NUMBER # NUM
| ID # IDENT
;

antlr4 json grammar and indirect left recursion

I read "The Definitive ANTLR 4 Reference" and it says
While ANTLR v4 can handle direct left recursion, it can’t handle indirect left recursion.
on page 71.
But in the JSON grammar on page 90 I see the following:
grammar JSON;
json: object
| array
;
object
: '{' pair (',' pair)* '}'
| '{' '}' // empty object
;
pair: STRING ':' value ;
array
: '[' value (',' value)* ']'
| '[' ']' // empty array
;
value
: STRING
| NUMBER
| object // indirect recursion
| array // indirect recursion
| 'true'
| 'false'
| 'null'
;
Is it correct?
The JSON grammar you mentioned is not a problem, because it does not actually contain any indirect left recursion.
The rule value can produce array, and array can in turn produce something that contains value, but not as its leftmost part (there is a [ preceding value).
The value rule would only be a problem if some chain of rules could derive value again in the leftmost position, that is, value followed by further terminals and non-terminals.
From the book:
A left-recursive rule is one that either directly or indirectly invokes itself on the left edge of an alternative.
Example:
expr : expr '*' expr // match subexpressions joined with '*'
| expr '+' expr // match subexpressions joined with '+' operator
| INT // matches simple integer atom
;
This is left recursion because at least one alternative starts immediately with expr. It is also direct left recursion.
Example of indirect left recursion:
expr : addition // indirectly invokes expr left recursively via addition
| ...
;
addition : expr '+' expr
;
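The distinction can be checked mechanically by looking only at the first symbol of each alternative. Here is a rough Python sketch under simplifying assumptions: grammars are plain dictionaries from rule name to a list of alternatives, repetitions such as (',' pair)* are dropped because only the leftmost symbol matters, the elided alternatives are left out, and no alternative is empty. The helper names are made up for this example.

```python
def leftmost_rules(grammar, rule):
    """Rules that appear as the first symbol of some alternative of `rule`."""
    return {alt[0] for alt in grammar[rule] if alt and alt[0] in grammar}

def left_recursion(grammar, rule):
    """Return 'direct', 'indirect' or None for the given rule."""
    if rule in leftmost_rules(grammar, rule):
        return "direct"
    seen = set()
    stack = list(leftmost_rules(grammar, rule))
    while stack:
        current = stack.pop()
        if current == rule:
            return "indirect"  # reached ourselves again through another rule
        if current not in seen:
            seen.add(current)
            stack.extend(leftmost_rules(grammar, current))
    return None

JSON = {
    "json": [["object"], ["array"]],
    "object": [["{", "pair", "}"], ["{", "}"]],
    "pair": [["STRING", ":", "value"]],
    "array": [["[", "value", "]"], ["[", "]"]],
    "value": [["STRING"], ["NUMBER"], ["object"], ["array"],
              ["true"], ["false"], ["null"]],
}
DIRECT = {"expr": [["expr", "*", "expr"], ["expr", "+", "expr"], ["INT"]]}
INDIRECT = {"expr": [["addition"]], "addition": [["expr", "+", "expr"]]}

print(left_recursion(JSON, "value"))     # None
print(left_recursion(DIRECT, "expr"))    # direct
print(left_recursion(INDIRECT, "expr"))  # indirect
```

In the JSON grammar every path from value back to value passes a '{' or '[' token first, so the search never returns to the starting rule.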

ANTLR4 parentheses and arithmetic

I am parsing an SQL-like language in which I need to handle arithmetic with precedence.
Things could be like this:
(a + b) - c
(a + b) / 1000
a + (b - c)
a + (SELECT...)
(SELECT... ) + (SELECT ...)
etc..
I am using the ANTLR4 listener pattern, and I can't find a way to build a tree representation for these arithmetic clauses.
Grammar parts:
arithmetic_select_clause:
result_column arithmeticExpression result_column # ArithmeticSelect
| result_column arithmeticExpression arithmetic_select_clause # ArithmeticSelect
| arithmetic_select_clause arithmeticExpression result_column # ArithmeticSelect
| '(' arithmetic_select_clause ')' # ArithmeticSelectParentheses
;
arithmeticExpression : '+' # arithmeticsAdd
| '-' # arithmeticsSubtract
| '*' # arithmeticsMultiply
| '/' # arithmeticsDivide
| '%' # arithmeticsModulus
;
I can create a tree using the ANTLR listeners, but I can't handle precedence.
Help please
ANTLR can help you there but you need to follow a few rules for it to do so. The arithmeticExpression rule needs to contain both operands and be directly recursive so that ANTLR can figure out how to rewrite it.
Here's an example of what you could do:
expression : '(' expression ')'
| expression op=('*'|'/'|'%') expression
| expression op=('+'|'-') expression
| result_column
| arithmetic_select_clause
;
This rule is left-recursive but ANTLR will rewrite it to eliminate the left-recursion. Relevant docs.
Notice how the levels of precedence are ordered: alternatives listed earlier bind more tightly. Each level gets its own alternative, and operators of the same precedence share one level.
Also, for processing math expressions it's much easier to use a visitor than a listener. ANTLR can generate the base classes for you. It'll be much easier to traverse/process the parse tree in the precedence order this way.
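To illustrate why return values make the visitor style convenient for arithmetic, here is a hand-rolled sketch in Python. It does not use the ANTLR runtime: the Number, BinaryOp and EvalVisitor classes are hypothetical stand-ins for the generated context and visitor classes, and the tree is built by hand in the shape a precedence-aware grammar would produce.

```python
import operator

# Operator table for this sketch; '/' is true division.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "%": operator.mod}

class Number:
    def __init__(self, value):
        self.value = value

    def accept(self, visitor):
        return visitor.visit_number(self)

class BinaryOp:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def accept(self, visitor):
        return visitor.visit_binary(self)

class EvalVisitor:
    """Each visit method returns the value of the subtree it was given."""

    def visit_number(self, node):
        return node.value

    def visit_binary(self, node):
        left = node.left.accept(self)    # evaluate the operands first,
        right = node.right.accept(self)  # then combine them with the operator
        return OPS[node.op](left, right)

# Tree for 2 + 3 * 4: the multiplication sits below the addition.
tree = BinaryOp("+", Number(2), BinaryOp("*", Number(3), Number(4)))
print(tree.accept(EvalVisitor()))  # 14

# Tree for (2000 + 3000) / 1000: the parenthesised sum is the left operand.
grouped = BinaryOp("/", BinaryOp("+", Number(2000), Number(3000)), Number(1000))
print(grouped.accept(EvalVisitor()))  # 5.0
```

Since precedence is already encoded in the tree shape, the visitor needs no precedence logic of its own. With a listener, the same computation would need an explicit stack to carry intermediate results between callbacks.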

N-ary operator parsing

I'm trying to match an operator of variable arity (e.g. "1 < 3 < x < 10" yields true, given that 3 < x < 10) within a mathematical expression. (Note that this is unlike how most languages would parse the expression.)
The (simplified) production rule is:
expression: '(' expression ')' # parenthesisExpression
| expression ('*' | '/' | '%') expression # multiplicationExpression
| expression ('+' | '-') expression # additionExpression
| expression (SMALLER_THAN expression)+ # smallerThanExpression
| IDENTIFIER # variableExpression
;
How do we keep the precedence, but still parse the smallerThanExpression as greedily as possible?
For example, "1 < 1+1 < 3" should be parsed as a single parse node "smallerThanExpression" with three child nodes, each of which is an expression. At the moment, the smallerThanExpression is broken up into two smallerThanExpressions (1 < (1+1 < 3)).
To give an answer for "future generations": we fixed it by separating arithmetic expressions from the other expressions. We know that only arithmetic expressions can be used as operands for our variable-arity operators ('true < false' is not a valid expression).
expression:
'!' expression
| arithmetic (SMALLER_THAN arithmetic)+
| arithmetic (GREATER_THAN arithmetic)+
| ....
;
arithmetic:
'(' expression ')'
| arithmetic ('*' | '/' | '%') arithmetic
| arithmetic ('+' | '-') arithmetic
| IDENTIFIER
| ...
;
This forces an expression such as "x < y < z" to be parsed as a single 'expression' node with three 'arithmetic' nodes as children.
(Note that an identifier might refer to a non-integer object; this is checked in the context checker.)
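The effect of the split can be reproduced with a small hand-written parser. The Python sketch below is an illustration under assumptions, not the grammar's generated parser: function names are invented, only the < chain is modelled, operands are kept as strings, and the chain uses zero or more repetitions so that a plain arithmetic expression also parses.

```python
import re

def tokenize(text):
    """Identifiers and numbers become one token each; operators stand alone."""
    return re.findall(r"\w+|[-+*/%<()]", text)

def parse_expression(tokens):
    """One arithmetic operand, then any number of '<' operands, in one flat node."""
    operands = [parse_arithmetic(tokens)]
    while tokens and tokens[0] == "<":
        tokens.pop(0)
        operands.append(parse_arithmetic(tokens))
    return operands[0] if len(operands) == 1 else ("<", *operands)

def parse_arithmetic(tokens):
    """Additive level: never consumes '<', so a chain cannot nest here."""
    left = parse_term(tokens)
    while tokens and tokens[0] in ("+", "-"):
        op = tokens.pop(0)
        left = (op, left, parse_term(tokens))
    return left

def parse_term(tokens):
    """Multiplicative level."""
    left = parse_atom(tokens)
    while tokens and tokens[0] in ("*", "/", "%"):
        op = tokens.pop(0)
        left = (op, left, parse_atom(tokens))
    return left

def parse_atom(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse_expression(tokens)
        tokens.pop(0)  # discard the closing ")"
        return inner
    return tok

print(parse_expression(tokenize("1 < 1+1 < 3")))
# ('<', '1', ('+', '1', '1'), '3')
```

The chain node has three children, and the addition stays grouped inside the middle operand because the arithmetic level is finished before the next < is looked at.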

How can non-associative operators like "<" be specified in ANTLR4 grammars?

In a rule expr : expr '<' expr | ...;
the ANTLR parser will accept expressions like 1 < 2 < 3 and construct left-associative trees corresponding to the bracketing (1 < 2) < 3.
You can tell ANTLR to treat operators as right associative, e.g.
expr : expr '<'<assoc=right> expr | ...;
to yield parse trees 1 < (2 < 3).
However, in many languages, relational operators are non-associative, i.e., an expression 1 < 2 < 3 is forbidden.
This can be specified in YACC and its derivatives.
Can it also be specified in ANTLR?
E.g., as expr : expr '<'<assoc=no> expr | ...;
I have been unable to find anything about this in the ANTLR4 book so far.
How about the following approach? Basically, the "result" of a < b has a type that is not compatible with another application of the operator < or >:
expression
: boolExpression
| nonBoolExpression
;
boolExpression
: nonBoolExpression '<' nonBoolExpression
| nonBoolExpression '>' nonBoolExpression
| ...
;
nonBoolExpression
: nonBoolExpression '*' nonBoolExpression
| nonBoolExpression '+' nonBoolExpression
| ...
;
Personally, though, I'd go with Darien's suggestion and detect the error after parsing.
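Detecting the error after parsing could look roughly like the Python sketch below. It assumes the parse result has already been turned into nested tuples of the form (operator, operand, ...). That representation, the COMPARISONS set and the function name are all assumptions made for this example rather than anything ANTLR provides.

```python
COMPARISONS = {"<", ">"}

def check_non_associative(tree):
    """Raise SyntaxError if a comparison is a direct operand of a comparison."""
    if not isinstance(tree, tuple):
        return  # a leaf such as a number or identifier
    op, *operands = tree
    for operand in operands:
        if (op in COMPARISONS and isinstance(operand, tuple)
                and operand[0] in COMPARISONS):
            raise SyntaxError(
                f"chained comparison: {op!r} applied to the result of {operand[0]!r}")
        check_non_associative(operand)

# 1 + 2 < 3 is fine: the operand of '<' is an addition.
check_non_associative(("<", ("+", 1, 2), 3))

# 1 < 2 < 3 parsed left-associatively is rejected.
try:
    check_non_associative(("<", ("<", 1, 2), 3))
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)  # True
```

If explicitly parenthesised comparisons such as (1 < 2) < 3 should remain legal, the tree would need to keep a node for the parentheses so that the inner comparison is no longer a direct operand.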

Resources