How to write context free grammar aspect the priority of operations in antlr4

How to write context free grammar aspect the priority of operations in antlr4 - antlr4

we knew the priority of logical operation from strong to low:
Not
And
Or
I want to add logical operation to my grammar in way respect the priority of logical operation. ...
My grammar is:
expression : factor ( PLUS factor | MINUS factor )* ;
factor : term ( MULT term | DIV term )* ;
term : NUMBER | ID | PAR_OPEN expression PAR_CLOSE ;

With ANTLR3 and ANTLR 4, you can doe something like this:
expression
: or_expression
;
// lowest precedence
or_expression
: and_expression ( '||' and_expression )*
;
and_expression
: rel_expression ( '&&' rel_expression )*
;
rel_expression
: add_expression ( ( '<' | '<=' | '>' | '>=' ) add_expression )*
;
add_expression
: mult_expression ( ( '+' | '-' ) mult_expression )*
;
mult_expression
: unary_expression ( ( '*' | '/' ) unary_expression )*
;
unary_expression
: '-' atom
| atom
;
// highest precedence
atom
: NUMBER
| ID
| '(' expression ')'
;
And with ANTLR4, you can also write it like this (which is equivalent to the grammar above!):
expression
: '!' expression
| expression ( '*' | '/' ) expression // higher than '+' | '-'
| expression ( '+' | '-' ) expression // higher than '<' | '<=' | '>' | '>='
| expression ( '<' | '<=' | '>' | '>=' ) expression // higher than '&&'
| expression '&&' expression // higher than '||'
| expression '||' expression
| NUMBER
| ID
| '(' expression ')'
;

Related

Antlr4 Grammar fails to parse

I am new to ANTLR, and here is a grammar that I am working on and its failing for a given input string - A.B() && ((C.D() || E.F())).
I have tried a number of combinations but its failing on the same place.
grammar Expressions;
expression
: logicBlock (logicalConnector logicBlock)*
| NOT? '('? logicBlock ')'? (logicalConnector NOT? '('? logicBlock ')'?)*
;
logicBlock
: logicUnit comparator THINGS
| logicUnit comparator logicUnit
| logicUnit
;
logicUnit
: NOT? '(' method ')'
| NOT? method
;
method
: object '.' function ('.' function)*
;
object
: THINGS
|'(' THINGS ')'
;
function
: THINGS '(' arguments? ')'
;
arguments
: (object | function | method | logicUnit | logicBlock)
(
','
(object | function | method | logicUnit | logicBlock)
)*
;
logicalConnector
: AND | OR | PLUS | MINUS
;
comparator
: GT | LT | GTE | LTE | EQUALS | NOTEQUALS
;
AND : '&&' ;
OR : '||' ;
EQUALS : '==' ;
ASSIGN : '=' ;
GT : '>' ;
LT : '<' ;
GTE : '>=' ;
LTE : '<=' ;
NOTEQUALS : '!=' ;
NOT : '!' ;
PLUS : '+' ;
MINUS : '-' ;
IF : 'if' ;
THINGS
: [a-zA-Z] [a-zA-Z0-9]*
| '"' .*? '"'
| [0-9]+
| ([0-9]+)? '.' ([0-9])+
;
WS : [ \t\r\n]+ -> skip
;
The error that I getting for this input - A.B() && ((C.D() || E.F())) is below. any help and/or suggestion to improve would be highly appreciated.

This rule looks odd to me:
expression
: logicBlock (logicalConnector logicBlock)*
| NOT? '('? logicBlock ')'? (logicalConnector NOT? '('? logicBlock ')'?)*
// ^ ^ ^ ^
// | | | |
;
All the optional parenthesis can't be right: this means the the first can be present, and the other 3 would be omitted (and any other combination), leaving your expression with unbalanced parenthesis.
The get your grammar working in the input A.B() && ((C.D() || E.F())) , you'll want to do something like this:
expression
: logicBlock (logicalConnector logicBlock)*
| NOT? logicBlock (logicalConnector NOT? logicBlock)*
;
logicBlock
: '(' expression ')'
| logicUnit comparator THINGS
| logicUnit comparator logicUnit
| logicUnit
;
logicUnit
: '(' expression ')'
| NOT? '(' method ')'
| NOT? method
;
But ANTLR4 supports left recursive rules, allowing you to define the expression rule like this:
expression
: '(' expression ')'
| NOT expression
| expression comparator expression
| expression logicalConnector expression
| method
| object
| THINGS
;
method
: object '.' function ( '.' function )*
;
object
: THINGS
|'(' THINGS ')'
;
function
: THINGS '(' arguments? ')'
;
arguments
: expression ( ',' expression )*
;
logicalConnector
: AND | OR | PLUS | MINUS
;
comparator
: GT | LT | GTE | LTE | EQUALS | NOTEQUALS
;
making it much more readable. Sure, it doesn't produce the exact parse tree your original grammar was producing, but you might be flexible in that regard. This proposed grammar also matches more than yours: like NOT A, which your grammar does not allow, where my proposed grammar does accept it.

How to parse statement in order of desired precedence using antlr?

I have an RSQL grammar defined:
grammar Rsql;
statement
: L_PAREN wrapped=statement R_PAREN
| left=statement op=( AND_OPERATOR | OR_OPERATOR ) right=statement
| node=comparison
;
comparison
: single_comparison
| multi_comparison
| bool_comparison
;
single_comparison
: key=IDENTIFIER op=( EQ | NE | GT | GTE | LT | LTE ) value=single_value
;
multi_comparison
: key=IDENTIFIER op=( IN | NIN ) value=multi_value
;
bool_comparison
: key=IDENTIFIER op=EX value=boolean_value
;
boolean_value
: BOOLEAN
;
single_value
: boolean_value
| ( STRING_LITERAL | IDENTIFIER )
| NUMERIC_LITERAL
;
multi_value
: L_PAREN single_value ( COMMA single_value )* R_PAREN
| single_value
;
TRUE: 'true';
FALSE: 'false';
AND_OPERATOR: ';';
OR_OPERATOR: ',';
L_PAREN: '(';
R_PAREN: ')';
COMMA: ',';
EQ: '==';
NE: '!=';
IN: '=in=';
NIN: '=out=';
GT: '=gt=';
LT: '=lt=';
GTE: '=ge=';
LTE: '=le=';
EX: '=ex=';
IDENTIFIER
: [a-zA-Z_] [a-zA-Z_0-9]*
;
BOOLEAN
: TRUE
| FALSE
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( [-+]? DIGIT+ )?
| '.' DIGIT+ ( [-+]? DIGIT+ )?
;
STRING_LITERAL
: '\'' ( STRING_ESCAPE_SEQ | ~[\\\r\n'] )* '\''
| '"' ( STRING_ESCAPE_SEQ | ~[\\\r\n"] )* '"'
;
STRING_ESCAPE_SEQ
: '\\' .
;
fragment DIGIT : [0-9];
No matter how I attempt to parse this (listener/visitor), the statements with parenthesis always get evaluated in order. It is my understanding that the order in the rule would be the precedence. However, the parse tree for a statement like "name==foo,(name==bar;age=gt=35)" is always
no matter where the parenthesis appear. Please help me discover what I'm missing. Thanks!

how can I refactor this ANTLR4 grammar so that it isn't mutually left recursive?

I can't seem to figure out why this grammar won't compile. It compiled fine until I modified line 145 from
(Identifier '.')* functionCall
to
(primary '.')? functionCall
I've been trying to figure out how to solve this issue for a while but I can't seem to be able to. Here's the error:
The following sets of rules are mutually left-recursive [primary]
grammar Tadpole;
#header
{package net.tadpole.compiler.parser;}
file
: fileContents*
;
fileContents
: structDec
| functionDec
| statement
| importDec
;
importDec
: 'import' Identifier ';'
;
literal
: IntegerLiteral
| FloatingPointLiteral
| BooleanLiteral
| CharacterLiteral
| StringLiteral
| NoneLiteral
| arrayLiteral
;
arrayLiteral
: '[' expressionList? ']'
;
expressionList
: expression (',' expression)*
;
expression
: primary
| unaryExpression
| <assoc=right> expression binaryOpPrec0 expression
| <assoc=left> expression binaryOpPrec1 expression
| <assoc=left> expression binaryOpPrec2 expression
| <assoc=left> expression binaryOpPrec3 expression
| <assoc=left> expression binaryOpPrec4 expression
| <assoc=left> expression binaryOpPrec5 expression
| <assoc=left> expression binaryOpPrec6 expression
| <assoc=left> expression binaryOpPrec7 expression
| <assoc=left> expression binaryOpPrec8 expression
| <assoc=left> expression binaryOpPrec9 expression
| <assoc=left> expression binaryOpPrec10 expression
| <assoc=right> expression binaryOpPrec11 expression
;
unaryExpression
: unaryOp expression
| prefixPostfixOp primary
| primary prefixPostfixOp
;
unaryOp
: '+'
| '-'
| '!'
| '~'
;
prefixPostfixOp
: '++'
| '--'
;
binaryOpPrec0
: '**'
;
binaryOpPrec1
: '*'
| '/'
| '%'
;
binaryOpPrec2
: '+'
| '-'
;
binaryOpPrec3
: '>>'
| '>>>'
| '<<'
;
binaryOpPrec4
: '<'
| '>'
| '<='
| '>='
| 'is'
;
binaryOpPrec5
: '=='
| '!='
;
binaryOpPrec6
: '&'
;
binaryOpPrec7
: '^'
;
binaryOpPrec8
: '|'
;
binaryOpPrec9
: '&&'
;
binaryOpPrec10
: '||'
;
binaryOpPrec11
: '='
| '**='
| '*='
| '/='
| '%='
| '+='
| '-='
| '&='
| '|='
| '^='
| '>>='
| '>>>='
| '<<='
| '<-'
;
primary
: literal
| fieldName
| '(' expression ')'
| '(' type ')' (primary | unaryExpression)
| 'new' objType '(' expressionList? ')'
| primary '.' fieldName
| primary dimension
| (primary '.')? functionCall
;
functionCall
: functionName '(' expressionList? ')'
;
functionName
: Identifier
;
dimension
: '[' expression ']'
;
statement
: '{' statement* '}'
| expression ';'
| 'recall' ';'
| 'return' expression? ';'
| variableDec
| 'if' '(' expression ')' statement ('else' statement)?
| 'while' '(' expression ')' statement
| 'do' expression 'while' '(' expression ')' ';'
| 'do' '{' statement* '}' 'while' '(' expression ')' ';'
;
structDec
: 'struct' structName ('(' parameterList ')')? '{' variableDec* functionDec* '}'
;
structName
: Identifier
;
fieldName
: Identifier
;
variableDec
: type fieldName ('=' expression)? ';'
;
type
: primitiveType ('[' ']')*
| objType ('[' ']')*
;
primitiveType
: 'byte'
| 'short'
| 'int'
| 'long'
| 'char'
| 'boolean'
| 'float'
| 'double'
;
objType
: (Identifier '.')? structName
;
functionDec
: 'def' functionName '(' parameterList? ')' ':' type '->' functionBody
;
functionBody
: statement
;
parameterList
: parameter (',' parameter)*
;
parameter
: type fieldName
;
IntegerLiteral
: DecimalIntegerLiteral
| HexIntegerLiteral
| OctalIntegerLiteral
| BinaryIntegerLiteral
;
fragment
DecimalIntegerLiteral
: DecimalNumeral IntegerSuffix?
;
fragment
HexIntegerLiteral
: HexNumeral IntegerSuffix?
;
fragment
OctalIntegerLiteral
: OctalNumeral IntegerSuffix?
;
fragment
BinaryIntegerLiteral
: BinaryNumeral IntegerSuffix?
;
fragment
IntegerSuffix
: [lL]
;
fragment
DecimalNumeral
: Digit (Digits? | Underscores Digits)
;
fragment
Digits
: Digit (DigitsAndUnderscores? Digit)?
;
fragment
Digit
: [0-9]
;
fragment
DigitsAndUnderscores
: DigitOrUnderscore+
;
fragment
DigitOrUnderscore
: Digit
| '_'
;
fragment
Underscores
: '_'+
;
fragment
HexNumeral
: '0' [xX] HexDigits
;
fragment
HexDigits
: HexDigit (HexDigitsAndUnderscores? HexDigit)?
;
fragment
HexDigit
: [0-9a-fA-F]
;
fragment
HexDigitsAndUnderscores
: HexDigitOrUnderscore+
;
fragment
HexDigitOrUnderscore
: HexDigit
| '_'
;
fragment
OctalNumeral
: '0' [oO] Underscores? OctalDigits
;
fragment
OctalDigits
: OctalDigit (OctalDigitsAndUnderscores? OctalDigit)?
;
fragment
OctalDigit
: [0-7]
;
fragment
OctalDigitsAndUnderscores
: OctalDigitOrUnderscore+
;
fragment
OctalDigitOrUnderscore
: OctalDigit
| '_'
;
fragment
BinaryNumeral
: '0' [bB] BinaryDigits
;
fragment
BinaryDigits
: BinaryDigit (BinaryDigitsAndUnderscores? BinaryDigit)?
;
fragment
BinaryDigit
: [01]
;
fragment
BinaryDigitsAndUnderscores
: BinaryDigitOrUnderscore+
;
fragment
BinaryDigitOrUnderscore
: BinaryDigit
| '_'
;
// §3.10.2 Floating-Point Literals
FloatingPointLiteral
: DecimalFloatingPointLiteral FloatingPointSuffix?
| HexadecimalFloatingPointLiteral FloatingPointSuffix?
;
fragment
FloatingPointSuffix
: [fFdD]
;
fragment
DecimalFloatingPointLiteral
: Digits '.' Digits? ExponentPart?
| '.' Digits ExponentPart?
| Digits ExponentPart
| Digits
;
fragment
ExponentPart
: ExponentIndicator SignedInteger
;
fragment
ExponentIndicator
: [eE]
;
fragment
SignedInteger
: Sign? Digits
;
fragment
Sign
: [+-]
;
fragment
HexadecimalFloatingPointLiteral
: HexSignificand BinaryExponent
;
fragment
HexSignificand
: HexNumeral '.'?
| '0' [xX] HexDigits? '.' HexDigits
;
fragment
BinaryExponent
: BinaryExponentIndicator SignedInteger
;
fragment
BinaryExponentIndicator
: [pP]
;
BooleanLiteral
: 'true'
| 'false'
;
CharacterLiteral
: '\'' SingleCharacter '\''
| '\'' EscapeSequence '\''
;
fragment
SingleCharacter
: ~['\\]
;
StringLiteral
: '"' StringCharacters? '"'
;
fragment
StringCharacters
: StringCharacter+
;
fragment
StringCharacter
: ~["\\]
| EscapeSequence
;
fragment
EscapeSequence
: '\\' [btnfr"'\\]
| OctalEscape
| UnicodeEscape
;
fragment
OctalEscape
: '\\' OctalDigit
| '\\' OctalDigit OctalDigit
| '\\' ZeroToThree OctalDigit OctalDigit
;
fragment
ZeroToThree
: [0-3]
;
fragment
UnicodeEscape
: '\\' 'u' HexDigit HexDigit HexDigit HexDigit
;
NoneLiteral
: 'nil'
;
Identifier
: IdentifierStartChar IdentifierChar*
;
fragment
IdentifierStartChar
: [a-zA-Z$_] // these are the "java letters" below 0xFF
| // covers all characters above 0xFF which are not a surrogate
~[\u0000-\u00FF\uD800-\uDBFF]
{Character.isJavaIdentifierStart(_input.LA(-1))}?
| // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
[\uD800-\uDBFF] [\uDC00-\uDFFF]
{Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
;
fragment
IdentifierChar
: [a-zA-Z0-9$_] // these are the "java letters or digits" below 0xFF
| // covers all characters above 0xFF which are not a surrogate
~[\u0000-\u00FF\uD800-\uDBFF]
{Character.isJavaIdentifierPart(_input.LA(-1))}?
| // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
[\uD800-\uDBFF] [\uDC00-\uDFFF]
{Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
;
WS : [ \t\r\n\u000C]+ -> skip
;
LINE_COMMENT
: '#' ~[\r\n]* -> skip
;

The left recursive invocation needs to be the first, so no parenthesis can be placed before it.
You can rewrite it like this:
primary
: literal
| fieldName
| '(' expression ')'
| '(' type ')' (primary | unaryExpression)
| 'new' objType '(' expressionList? ')'
| primary '.' fieldName
| primary dimension
| primary '.' functionCall
| functionCall
;
which is equivalent.

Advice on handling an ambiguous operator in an ANTLR 4 grammar

I am writing an antlr grammar file for a dialect of basic. Most of it is either working or I have a good idea of what I need to do next. However, I am not at all sure what I should do with the '=' character which is used for both equality tests as well as assignment.
For example, this is a valid statement
t = (x = 5) And (y = 3)
This evaluates if x is EQUAL to 5, if y is EQUAL to 3 then performs a logical AND on those results and ASSIGNS the result to t.
My grammar will parse this; albeit incorrectly, but I think that will resolve itself once the ambiguity is resolved .
How do I differentiate between the two uses of the '=' character?
1) Should I remove the assignment rule from expression and handle these cases (assignment vs equality test) in my visitor and\or listener implementation during code generation
2) Is there a better way to define the grammar such that it is already sorted out
Would someone be able to simply point me in the right direction as to how best implement this language "feature"?
Also, I have been reading through the Definitive guide to ANTLR4 as well as Language Implementation Patterns looking for a solution to this. It may be there but I have not yet found it.
Below is the full parser grammar. The ASSIGN token is currently set to '='. EQUAL is set to '=='.
parser grammar wlParser;
options { tokenVocab=wlLexer; }
program
: multistatement (NEWLINE multistatement)* NEWLINE?
;
multistatement
: statement (COLON statement)*
;
statement
: declarationStat
| defTypeStat
| assignment
| expression
;
assignment
: lvalue op=ASSIGN expression
;
expression
: <assoc=right> left=expression op=CARAT right=expression #exponentiationExprStat
| (PLUS|MINUS) expression #signExprStat
| IDENTIFIER DATATYPESUFFIX? LPAREN expression RPAREN #arrayIndexExprStat
| left=expression op=(ASTERISK|FSLASH) right=expression #multDivExprStat
| left=expression op=BSLASH right=expression #integerDivExprStat
| left=expression op=KW_MOD right=expression #modulusDivExprStat
| left=expression op=(PLUS|MINUS) right=expression #addSubExprStat
| left=string op=AMPERSAND right=string #stringConcatenation
| left=expression op=(RELATIONALOPERATORS | KW_IS | KW_ISA) right=expression #relationalComparisonExprStat
| left=expression (op=LOGICALOPERATORS right=expression)+ #logicalOrAndExprStat
| op=KW_LIKE patternString #likeExprStat
| LPAREN expression RPAREN #groupingExprStat
| NUMBER #atom
| string #atom
| IDENTIFIER DATATYPESUFFIX? #atom
;
lvalue
: (IDENTIFIER DATATYPESUFFIX?) | (IDENTIFIER DATATYPESUFFIX? LPAREN expression RPAREN)
;
string
: STRING
;
patternString
: DQUOT (QUESTIONMARK | POUND | ASTERISK | LBRACKET BANG? .*? RBRACKET)+ DQUOT
;
referenceType
: DATATYPE
;
declarationStat
: constDecl
| varDecl
;
constDecl
: CONSTDECL? KW_CONST IDENTIFIER EQUAL expression
;
varDecl
: VARDECL (varDeclPart (COMMA varDeclPart)*)? | listDeclPart
;
varDeclPart
: IDENTIFIER DATATYPESUFFIX? ((arrayBounds)? KW_AS DATATYPE (COMMA DATATYPE)*)?
;
listDeclPart
: IDENTIFIER DATATYPESUFFIX? KW_LIST KW_AS DATATYPE
;
arrayBounds
: LPAREN (arrayDimension (COMMA arrayDimension)*)? RPAREN
;
arrayDimension
: INTEGER (KW_TO INTEGER)?
;
defTypeStat
: DEFTYPES DEFTYPERANGE (COMMA DEFTYPERANGE)*
;
This is the lexer grammar.
lexer grammar wlLexer;
NUMBER
: INTEGER
| REAL
| BINARY
| OCTAL
| HEXIDECIMAL
;
RELATIONALOPERATORS
: EQUAL
| NEQUAL
| LT
| LTE
| GT
| GTE
;
LOGICALOPERATORS
: KW_OR
| KW_XOR
| KW_AND
| KW_NOT
| KW_IMP
| KW_EQV
;
INSTANCEOF
: KW_IS
| KW_ISA
;
CONSTDECL
: KW_PUBLIC
| KW_PRIVATE
;
DATATYPE
: KW_BOOLEAN
| KW_BYTE
| KW_INTEGER
| KW_LONG
| KW_SINGLE
| KW_DOUBLE
| KW_CURRENCY
| KW_STRING
;
VARDECL
: KW_DIM
| KW_STATIC
| KW_PUBLIC
| KW_PRIVATE
;
LABEL
: IDENTIFIER COLON
;
DEFTYPERANGE
: [a-zA-Z] MINUS [a-zA-Z]
;
DEFTYPES
: KW_DEFBOOL
| KW_DEFBYTE
| KW_DEFCUR
| KW_DEFDBL
| KW_DEFINT
| KW_DEFLNG
| KW_DEFSNG
| KW_DEFSTR
| KW_DEFVAR
;
DATATYPESUFFIX
: PERCENT
| AMPERSAND
| BANG
| POUND
| AT
| DOLLARSIGN
;
STRING
: (DQUOT (DQUOTESC|.)*? DQUOT)
| (LBRACE (RBRACEESC|.)*? RBRACE)
| (PIPE (PIPESC|.|NEWLINE)*? PIPE)
;
fragment DQUOTESC: '\"\"' ;
fragment RBRACEESC: '}}' ;
fragment PIPESC: '||' ;
INTEGER
: DIGIT+ (E (PLUS|MINUS)? DIGIT+)?
;
REAL
: DIGIT+ PERIOD DIGIT+ (E (PLUS|MINUS)? DIGIT+)?
;
BINARY
: AMPERSAND B BINARYDIGIT+
;
OCTAL
: AMPERSAND O OCTALDIGIT+
;
HEXIDECIMAL
: AMPERSAND H HEXDIGIT+
;
QUESTIONMARK: '?' ;
COLON: ':' ;
ASSIGN: '=';
SEMICOLON: ';' ;
AT: '#' ;
LPAREN: '(' ;
RPAREN: ')' ;
DQUOT: '"' ;
LBRACE: '{' ;
RBRACE: '}' ;
LBRACKET: '[' ;
RBRACKET: ']' ;
CARAT: '^' ;
PLUS: '+' ;
MINUS: '-' ;
ASTERISK: '*' ;
FSLASH: '/' ;
BSLASH: '\\' ;
AMPERSAND: '&' ;
BANG: '!' ;
POUND: '#' ;
DOLLARSIGN: '$' ;
PERCENT: '%' ;
COMMA: ',' ;
APOSTROPHE: '\'' ;
TWOPERIODS: '..' ;
PERIOD: '.' ;
UNDERSCORE: '_' ;
PIPE: '|' ;
NEWLINE: '\r\n' | '\r' | '\n';
EQUAL: '==' ;
NEQUAL: '<>' | '><' ;
LT: '<' ;
LTE: '<=' | '=<';
GT: '>' ;
GTE: '=<'|'<=' ;
KW_AND: A N D ;
KW_BINARY: B I N A R Y ;
KW_BOOLEAN: B O O L E A N ;
KW_BYTE: B Y T E ;
KW_DATATYPE: D A T A T Y P E ;
KW_DATE: D A T E ;
KW_INTEGER: I N T E G E R ;
KW_IS: I S ;
KW_ISA: I S A ;
KW_LIKE: L I K E ;
KW_LONG: L O N G ;
KW_MOD: M O D ;
KW_NOT: N O T ;
KW_TO: T O ;
KW_FALSE: F A L S E ;
KW_TRUE: T R U E ;
KW_SINGLE: S I N G L E ;
KW_DOUBLE: D O U B L E ;
KW_CURRENCY: C U R R E N C Y ;
KW_STRING: S T R I N G ;
fragment BINARYDIGIT: ('0'|'1') ;
fragment OCTALDIGIT: ('0'|'1'|'2'|'3'|'4'|'5'|'6'|'7') ;
fragment DIGIT: '0'..'9' ;
fragment HEXDIGIT: ('0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9' | A | B | C | D | E | F) ;
fragment A: ('a'|'A');
fragment B: ('b'|'B');
fragment C: ('c'|'C');
fragment D: ('d'|'D');
fragment E: ('e'|'E');
fragment F: ('f'|'F');
fragment G: ('g'|'G');
fragment H: ('h'|'H');
fragment I: ('i'|'I');
fragment J: ('j'|'J');
fragment K: ('k'|'K');
fragment L: ('l'|'L');
fragment M: ('m'|'M');
fragment N: ('n'|'N');
fragment O: ('o'|'O');
fragment P: ('p'|'P');
fragment Q: ('q'|'Q');
fragment R: ('r'|'R');
fragment S: ('s'|'S');
fragment T: ('t'|'T');
fragment U: ('u'|'U');
fragment V: ('v'|'V');
fragment W: ('w'|'W');
fragment X: ('x'|'X');
fragment Y: ('y'|'Y');
fragment Z: ('z'|'Z');
IDENTIFIER
: [a-zA-Z_][a-zA-Z0-9_~]*
;
LINE_ESCAPE
: (' ' | '\t') UNDERSCORE ('\r'? | '\n')
;
WS
: [ \t] -> skip
;

Take a look at this grammar (Note that this grammar is not supposed to be a grammar for BASIC, it's just an example to show how to disambiguate using "=" for both assignment and equality):
grammar Foo;
program:
(statement | exprOtherThanEquality)*
;
statement:
assignment
;
expr:
equality | exprOtherThanEquality
;
exprOtherThanEquality:
boolAndOr
;
boolAndOr:
atom (BOOL_OP expr)*
;
equality:
atom EQUAL expr
;
assignment:
VAR EQUAL expr ENDL
;
atom:
BOOL |
VAR |
INT |
group
;
group:
LEFT_PARENTH expr RGHT_PARENTH
;
ENDL : ';' ;
LEFT_PARENTH : '(' ;
RGHT_PARENTH : ')' ;
EQUAL : '=' ;
BOOL:
'true' | 'false'
;
BOOL_OP:
'and' | 'or'
;
VAR:
[A-Za-z_]+ [A-Za-z_0-9]*
;
INT:
'-'? [0-9]+
;
WS:
[ \t\r\n] -> skip
;
Here is the parse tree for the input: t = (x = 5) and (y = 2);
In one of the comments above, I asked you if we can assume that the first equal sign on a line always corresponds to an assignment. I retract that assumption slightly... The first equal sign on a line always corresponds to an assignment unless it is contained within parentheses. With the above grammar, this is a valid line: (x = 2). Here is the parse tree:

ANTLR 4 precedence not as expected

I've defined a flavor of SQL that we use in my company as the following:
/** Grammars always start with a grammar header. This grammar is called
* GigyaSQL and must match the filename: GigyaSQL.g4
*/
grammar GigyaSQL;
parse
: selectClause
fromClause
( whereClause )?
( filterClause )?
( groupByClause )?
( limitClause )?
;
selectClause
: K_SELECT result_column ( ',' result_column )*
;
result_column
: '*' # selectAll
| table_name '.' '*' # selectAllFromTable
| select_expr ( K_AS? column_alias )? # selectExpr
| with_table # selectWithTable
;
fromClause
: K_FROM table_name
;
table_name
: any_name # simpleTable
| any_name K_WITH with_table # tableWithTable
;
any_name
: IDENTIFIER
| STRING_LITERAL
| '(' any_name ')'
;
with_table
: COUNTERS
;
select_expr
: literal_value
| range_function_in_select
| interval_function_in_select
| ( table_name '.' )? column_name
| function_name '(' argument_list ')'
;
whereClause
: K_WHERE condition_expr
;
condition_expr
: literal_value # literal
| ( table_name '.' )? column_name # column_name_expr
| unary_operator condition_expr # unary_expr
| condition_expr binary_operator condition_expr # binary_expr
| K_IFELEMENT '(' with_table ',' condition_expr ')' # if_element
| function_name '(' argument_list ')' # function_expr
| '(' condition_expr ')' # brackets_expr
| condition_expr K_NOT? K_LIKE condition_expr # like_expr
| condition_expr K_NOT? K_CONTAINS condition_expr # contains_expr
| condition_expr K_IS K_NOT? condition_expr # is_expr
//| condition_expr K_NOT? K_BETWEEN condition_expr K_AND condition_expr
| condition_expr K_NOT? K_IN '(' ( literal_value ( ',' literal_value )*) ')' # in_expr
;
filterClause
: K_FILTER with_table K_BY condition_expr
;
groupByClause
: K_GROUP K_BY group_expr ( ',' group_expr )*
;
group_expr
: literal_value
| ( table_name '.' )? column_name
| function_name '(' argument_list ')'
| range_function_in_group
| interval_function_in_group
;
limitClause
: K_LIMIT NUMERIC_LITERAL
;
argument_list
: ( select_expr ( ',' select_expr )* | '*' )
;
unary_operator
: MINUS
| PLUS
| '~'
| K_NOT
;
binary_operator
: ( '*' | DIVIDE | MODULAR )
| ( PLUS | MINUS )
//| ( '<<' | '>>' | '&' | '|' )
| ( LTH | LEQ | GTH | GEQ )
| ( EQUAL | NOT_EQUAL | K_IN | K_LIKE )
//| ( '=' | '==' | '!=' | '<>' | K_IS | K_IS K_NOT | K_IN | K_LIKE | K_GLOB | K_MATCH | K_REGEXP )
| K_AND
| K_OR
;
range_function_in_select
: K_RANGE '(' select_expr ')'
;
range_function_in_group
: K_RANGE '(' select_expr ',' range_pair (',' range_pair)* ')'
;
range_pair // Tried to use INT instead (for decimal numbers) but that didn't work fine (didn't parse a = 1 correctly)
: '"' NUMERIC_LITERAL ',' NUMERIC_LITERAL '"'
| '"' ',' NUMERIC_LITERAL '"'
| '"' NUMERIC_LITERAL ',' '"'
;
interval_function_in_select
: K_INTERVAL '(' select_expr ')'
;
interval_function_in_group
: K_INTERVAL '(' select_expr ',' NUMERIC_LITERAL ')'
;
function_name
: any_name
;
literal_value
: NUMERIC_LITERAL
| STRING_LITERAL
// | BLOB_LITERAL
| K_NULL
// | K_CURRENT_TIME
// | K_CURRENT_DATE
// | K_CURRENT_TIMESTAMP
;
column_name
: any_name
;
column_alias
: IDENTIFIER
| STRING_LITERAL
;
SPACES
: [ \u000B\t\r\n] -> skip
;
COUNTERS : 'counters' | 'COUNTERS';
//INT : '0' | DIGIT+ ;
EQUAL : '=';
NOT_EQUAL : '<>' | '!=';
LTH : '<' ;
LEQ : '<=';
GTH : '>';
GEQ : '>=';
//MULTIPLY: '*';
DIVIDE : '/';
MODULAR : '%';
PLUS : '+';
MINUS : '-';
K_AND : A N D;
K_AS : A S;
K_BY : B Y;
K_CONTAINS: C O N T A I N S;
K_DISTINCT : D I S T I N C T;
K_FILTER : F I L T E R;
K_FROM : F R O M;
K_GROUP : G R O U P;
K_IFELEMENT : I F E L E M E N T;
K_IN : I N;
K_INTERVAL : I N T E R V A L;
K_IS : I S;
K_LIKE : L I K E;
K_LIMIT : L I M I T;
K_NOT : N O T;
K_NULL : N U L L;
K_OR : O R;
K_RANGE : R A N G E;
K_REGEXP : R E G E X P;
K_SELECT : S E L E C T;
K_WHERE : W H E R E;
K_WITH : W I T H;
IDENTIFIER
: '"' (~'"' | '""')* '"'
| '`' (~'`' | '``')* '`'
| '[' ~']'* ']'
| [a-zA-Z_] [.a-zA-Z_0-9]* // TODO - need to check if the period is correcly handled
| [a-zA-Z_] [a-zA-Z_0-9]* // TODO check: needs more chars in set
;
STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
;
NUMERIC_LITERAL
:// INT
DIGIT+ ('.' DIGIT*)? ( E [-+]? DIGIT+ )?
| '.' DIGIT+ ( E [-+]? DIGIT+ )?
;
fragment DIGIT : [0-9];
fragment A : [aA];
fragment B : [bB];
fragment C : [cC];
fragment D : [dD];
fragment E : [eE];
fragment F : [fF];
fragment G : [gG];
fragment H : [hH];
fragment I : [iI];
fragment J : [jJ];
fragment K : [kK];
fragment L : [lL];
fragment M : [mM];
fragment N : [nN];
fragment O : [oO];
fragment P : [pP];
fragment Q : [qQ];
fragment R : [rR];
fragment S : [sS];
fragment T : [tT];
fragment U : [uU];
fragment V : [vV];
fragment W : [wW];
fragment X : [xX];
fragment Y : [yY];
fragment Z : [zZ];
and I try to parse the following query:
SELECT * from accounts where not data.zzz > 124
I get the following tree:
But I wanted to get the tree similar to when I use parenthesis:
SELECT * from accounts where not (data.zzz > 124)
I don't understand why it's working that way sinze the unary rule is before others.
Any suggestion?

That is the correct result for the given grammar. As you've already mentioned, the unary_operator is before the binary_operator meaning any operand for the NOT keyword is binded to it first before other operators. And since it is unary, it takes the data.zzz as its operand and after that the whole NOT expression becomes an operand of the binary_operator.
To get what you want, just shift down the unary_operator according to the precedence level of it (as I recall, in SQL, NOT's precedence is lower than that of binary operators, and the NOT operator should not have the same precedence as the MINUS PLUS and ~ just like what your grammar does) e.g.
condition_expr
: literal_value # literal
| ( table_name '.' )? column_name # column_name_expr
| condition_expr binary_operator condition_expr # binary_expr
| unary_operator condition_expr # unary_expr
| K_IFELEMENT '(' with_table ',' condition_expr ')' # if_element
| function_name '(' argument_list ')' # function_expr
| '(' condition_expr ')' # brackets_expr
| condition_expr K_NOT? K_LIKE condition_expr # like_expr
| condition_expr K_NOT? K_CONTAINS condition_expr # contains_expr
| condition_expr K_IS K_NOT? condition_expr # is_expr
//| condition_expr K_NOT? K_BETWEEN condition_expr K_AND condition_expr
| condition_expr K_NOT? K_IN '(' ( literal_value ( ',' literal_value )*) ')' # in_expr
;
And this gives what you want:

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to write context free grammar aspect the priority of operations in antlr4 - antlr4

Related

Antlr4 Grammar fails to parse

How to parse statement in order of desired precedence using antlr?

how can I refactor this ANTLR4 grammar so that it isn't mutually left recursive?

Advice on handling an ambiguous operator in an ANTLR 4 grammar

ANTLR 4 precedence not as expected

Categories

Resources