Parser rules:
expression : L_BRACKET expression R_BRACKET #Parenthesis
| left=expression op=AND right=expression #And
| left=expression op=OR right=expression #Or
| left=expression op=XOR right=expression #Xor
| left=expression op=IMPL right=expression #Impl
| left=expression op=EQUIV right=expression #Equiv
| left=expression op=VAR right=expression #Var
| op=NEG expression #Neg
| VALUE #Value
| VAR #Var
;
Input:
a or a and a and a and a
Parse tree:
I'd like the string to be simply evaluated from left to right. In other words, only the left node of the tree should have children in this case.
However, currently, AND seems to take priority over OR, and the aforementioned string is treated instead like a or (a and a and a and a)
How do I do so?
| left=expression op=(AND|OR|XOR|IMPL|EQUIV) right=expression #Operation
Related
I have a ANTR4 rule "expression" that can be either "maths" or "comparison", but "comparison" can contain "maths". Here a concrete code:
expression
: ID
| maths
| comparison
;
maths
: maths_atom ((PLUS | MINUS) maths_atom) ? // "?" because in fact there is first multiplication then pow and I don't want to force a multiplication to make an addition
;
maths_atom
: NUMBER
| ID
| OPEN_PAR expression CLOSE_PAR
;
comparison
: comp_atom ((EQUALS | NOT_EQUALS) comp_atom) ?
;
comp_atom
: ID
| maths // here is the expression of interest
| OPEN_PAR expression CLOSE_PAR
;
If I give, for instance, 6 as input, this is fine for the parse tree, because it detects maths. But in the ANTLR4 plugin for Intellij Idea, it mark my expression rule as red - ambiguity. Should I say goodbye to a short parse tree and allow only maths trough comparison in expression so it is not so ambiguous anymore ?
The problem is that when the parser sees 6, which is a NUMBER, it has two paths of reaching it through your grammar:
expression - maths - maths_atom - NUMBER
or
expression - comparison - comp_atom - NUMBER
This ambiguity triggers the error that you see.
You can fix this by flattening your parser grammar as shown in this tutorial:
start
: expr | <EOF>
;
expr
: expr (PLUS | MINUS) expr # ADDGRP
| expr (EQUALS | NOT_EQUALS) expr # COMPGRP
| OPEN_PAR expression CLOSE_PAR # PARENGRP
| NUMBER # NUM
| ID # IDENT
;
The error I am getting:
invalid string interpolation: `$$', `$'ident or `$'BlockExpr expected
Spark SQL:
val sql =
s"""
|SELECT
| ,CAC.engine
| ,CAC.user_email
| ,CAC.submit_time
| ,CAC.end_time
| ,CAC.duration
| ,CAC.counter_name
| ,CAC.counter_value
| ,CAC.usage_hour
| ,CAC.event_date
|FROM
| xyz.command AS CAC
| INNER JOIN
| (
| SELECT DISTINCT replace(split(get_json_object(metadata_payload, '$.configuration.name'), '_')[1], 'acc', '') AS account_id
| FROM xyz.metadata
| ) AS QCM
| ON QCM.account_id = CAC.account_id
|WHERE
| CAC.event_date BETWEEN '2019-10-01' AND '2019-10-05'
|""".stripMargin
val df = spark.sql(sql)
df.show(10, false)
You added s prefix which means you want the string be interpolated. It means all tokens prefixed with $ will be replaced with the local variable with the same name. From you code it looks like you do not use this feature, so you could just remove s prefix from the string:
val sql =
"""
|SELECT
| ,CAC.engine
| ,CAC.user_email
| ,CAC.submit_time
| ,CAC.end_time
| ,CAC.duration
| ,CAC.counter_name
| ,CAC.counter_value
| ,CAC.usage_hour
| ,CAC.event_date
|FROM
| xyz.command AS CAC
| INNER JOIN
| (
| SELECT DISTINCT replace(split(get_json_object(metadata_payload, '$.configuration.name'), '_')[1], 'acc', '') AS account_id
| FROM xyz.metadata
| ) AS QCM
| ON QCM.account_id = CAC.account_id
|WHERE
| CAC.event_date BETWEEN '2019-10-01' AND '2019-10-05'
|""".stripMargin
Otherwise if you really need the interpolation you have to quote $ sign like this:
val sql =
s"""
|SELECT
| ,CAC.engine
| ,CAC.user_email
| ,CAC.submit_time
| ,CAC.end_time
| ,CAC.duration
| ,CAC.counter_name
| ,CAC.counter_value
| ,CAC.usage_hour
| ,CAC.event_date
|FROM
| xyz.command AS CAC
| INNER JOIN
| (
| SELECT DISTINCT replace(split(get_json_object(metadata_payload, '$$.configuration.name'), '_')[1], 'acc', '') AS account_id
| FROM xyz.metadata
| ) AS QCM
| ON QCM.account_id = CAC.account_id
|WHERE
| CAC.event_date BETWEEN '2019-10-01' AND '2019-10-05'
|""".stripMargin
I have successfully split my expressions into arithmetic and boolean expressions like this:
/* entry point */
parse: formula EOF;
formula : (expr|boolExpr);
/* boolean expressions : take math expr and use boolean operators on them */
boolExpr
: bool
| l=expr operator=(GT|LT|GEQ|LEQ) r=expr
| l=boolExpr operator=(OR|AND) r=boolExpr
| l=expr (not='!')? EQUALS r=expr
| l=expr BETWEEN low=expr AND high=expr
| l=expr IS (NOT)? NULL
| l=atom LIKE regexp=string
| l=atom ('IN'|'in') '(' string (',' string)* ')'
| '(' boolExpr ')'
;
/* arithmetic expressions */
expr
: atom
| (PLUS|MINUS) expr
| l=expr operator=(MULT|DIV) r=expr
| l=expr operator=(PLUS|MINUS) r=expr
| function=IDENTIFIER '(' (expr ( ',' expr )* ) ? ')'
| '(' expr ')'
;
atom
: number
| variable
| string
;
But now I have HUGE performance problems. Some formulas I try to parse are utterly slow, to the point that it has become unbearable: more than an hour (I stopped at that point) to parse this:
-4.77+[V1]*-0.0071+[V1]*[V1]*0+[V2]*-0.0194+[V2]*[V2]*0+[V3]*-0.00447932+[V3]*[V3]*-0.0017+[V4]*-0.00003298+[V4]*[V4]*0.0017+[V5]*-0.0035+[V5]*[V5]*0+[V6]*-4.19793004+[V6]*[V6]*1.5962+[V7]*12.51966636+[V7]*[V7]*-5.7058+[V8]*-19.06596752+[V8]*[V8]*28.6281+[V9]*9.47136506+[V9]*[V9]*-33.0993+[V10]*0.001+[V10]*[V10]*0+[V11]*-0.15397774+[V11]*[V11]*-0.0021+[V12]*-0.027+[V12]*[V12]*0+[V13]*-2.02963068+[V13]*[V13]*0.1683+[V14]*24.6268688+[V14]*[V14]*-5.1685+[V15]*-6.17590512+[V15]*[V15]*1.2936+[V16]*2.03846688+[V16]*[V16]*-0.1427+[V17]*9.02302288+[V17]*[V17]*-1.8223+[V18]*1.7471106+[V18]*[V18]*-0.1255+[V19]*-30.00770912+[V19]*[V19]*6.7738
Do you have any idea on what the problem is?
The parsing stops when the parser enters the formula grammar rule.
edit Original problem here:
My grammar allows this:
// ( 1 LESS_EQUALS 2 )
1 <= 2
But the way I expressed it in my G4 file makes it also accept this:
// ( ( 1 LESS_EQUALS 2 ) LESS_EQUALS 3 )
1 <= 2 <= 3
Which I don't want.
My grammar contains this:
expr
: atom # atomArithmeticExpr
| (PLUS|MINUS) expr # plusMinusExpr
| l=expr operator=('*'|'/') r=expr # multdivArithmeticExpr
| l=expr operator=('+'|'-') r=expr # addsubtArithmeticExpr
| l=expr operator=('>'|'<'|'>='|'<=') r=expr # comparisonExpr
[...]
How can I tell Antlr that this is not acceptable?
Just split root into two. Either rename root 'expr' into 'rootexpr', or vice versa.
rootExpr
: atom # atomArithmeticExpr
| (PLUS|MINUS) expr # plusMinusExpr
| l=expr operator=('*'|'/') r=expr # multdivArithmeticExpr
| l=expr operator=('+'|'-') r=expr # addsubtArithmeticExpr
| l=expr operator=('>'|'<'|'>='|'<=') r=expr # comparisonExpr
EDIT: You cannot have cyclic reference => expr node in expr rule.
I've been trying to convert a grammar that I have found online into an antlr4 format. The original grammar is here: https://github.com/cv/asp-parser/blob/master/vbscript.bnf.
Short Version:
The current problem that I am running across I think is due to ambiguity at the lexing stage.
For example, I copied the rule for a floating point literal as below:
float_literal : DIGIT* '.' DIGIT+ ( 'e' PLUS_OR_MINUS? DIGIT+ )?
| DIGIT+ 'e' PLUS_OR_MINUS? DIGIT+;
Further up in the file I have a definition for letters:
LETTER: 'a'..'z';
It seems that because I am using 'e' in the float literal, that character can't be recognised as a letter? In my research I have come across the idea of having a token for each letter, so letter would become:
letter: A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z;
And I would replace any instances of 'e' with E. However there are much longer strings in this file, such as '.and'. So this approach would require replacing things like that with DOT A N D? Which doesn't seem right at all.
Am I doing something fundamentally wrong or is there something I can do to avoid this ambiguity?
Thanks,
Craig
The full grammar is below.
grammar vbscript;
/*===== Character Sets =====*/
SPACES: ' ' -> skip;
DIGIT: '0'..'9';
SEMI_COLON: ':';
NEW_LINE_CHARACTER: [\r\n]+;
WHITESPACE_CHARACTER: [ \t];
LETTER: 'a'..'z';
QUOTE: '"';
HASH: '#';
SQUARE_BRACE: '[' | ']';
PLUS_OR_MINUS: [+-];
ANYTHING_ELSE: ~('"' | '#');
ws: WHITESPACE_CHARACTER;
id_tail: (DIGIT | LETTER | '_');
string_character: ANYTHING_ELSE | DIGIT | WHITESPACE_CHARACTER | SEMI_COLON | LETTER | PLUS_OR_MINUS | SQUARE_BRACE;
id_name_char: ANYTHING_ELSE | DIGIT | WHITESPACE_CHARACTER | SEMI_COLON | LETTER | PLUS_OR_MINUS;
/*===== terminals =====*/
whitespace: ws+ | '_' ws* new_line?;
comment_line : '' | 'rem';
string_literal : '"' ( string_character | '""' )* '"';
float_literal : DIGIT* '.' DIGIT+ ( 'e' PLUS_OR_MINUS? DIGIT+ )?
| DIGIT+ 'e' PLUS_OR_MINUS? DIGIT+;
id : LETTER id_tail*
| '[' id_name_char* ']';
iddot : LETTER id_tail* '.'
| '[' id_name_char* ']' '.'
| 'and.'
| 'byref.'
| 'byval.'
| 'call.'
| 'case.'
| 'class.'
| 'const.'
| 'default.'
| 'dim.'
| 'do.'
| 'each.'
| 'else.'
| 'elseif.'
| 'empty.'
| 'end.'
| 'eqv.'
| 'erase.'
| 'error.'
| 'exit.'
| 'explicit.'
| 'false.'
| 'for.'
| 'function.'
| 'get.'
| 'goto.'
| 'if.'
| 'imp.'
| 'in.'
| 'is.'
| 'let.'
| 'loop.'
| 'mod.'
| 'new.'
| 'next.'
| 'not.'
| 'nothing.'
| 'null.'
| 'on.'
| 'option.'
| 'or.'
| 'preserve.'
| 'private.'
| 'property.'
| 'public.'
| 'redim.'
| 'rem.'
| 'resume.'
| 'select.'
| 'set.'
| 'step.'
| 'sub.'
| 'then.'
| 'to.'
| 'true.'
| 'until.'
| 'wend.'
| 'while.'
| 'with.'
| 'xor.';
dot_id : '.' LETTER id_tail*
| '.' '[' id_name_char* ']'
| '.and'
| '.byref'
| '.byval'
| '.call'
| '.case'
| '.class'
| '.const'
| '.default'
| '.dim'
| '.do'
| '.each'
| '.else'
| '.elseif'
| '.empty'
| '.end'
| '.eqv'
| '.erase'
| '.error'
| '.exit'
| '.explicit'
| '.false'
| '.for'
| '.function'
| '.get'
| '.goto'
| '.if'
| '.imp'
| '.in'
| '.is'
| '.let'
| '.loop'
| '.mod'
| '.new'
| '.next'
| '.not'
| '.nothing'
| '.null'
| '.on'
| '.option'
| '.or'
| '.preserve'
| '.private'
| '.property'
| '.public'
| '.redim'
| '.rem'
| '.resume'
| '.select'
| '.set'
| '.step'
| '.sub'
| '.then'
| '.to'
| '.true'
| '.until'
| '.wend'
| '.while'
| '.with'
| '.xor';
dot_iddot : '.' LETTER id_tail* '.'
| '.' '[' id_name_char* ']' '.'
| '.and.'
| '.byref.'
| '.byval.'
| '.call.'
| '.case.'
| '.class.'
| '.const.'
| '.default.'
| '.dim.'
| '.do.'
| '.each.'
| '.else.'
| '.elseif.'
| '.empty.'
| '.end.'
| '.eqv.'
| '.erase.'
| '.error.'
| '.exit.'
| '.explicit.'
| '.false.'
| '.for.'
| '.function.'
| '.get.'
| '.goto.'
| '.if.'
| '.imp.'
| '.in.'
| '.is.'
| '.let.'
| '.loop.'
| '.mod.'
| '.new.'
| '.next.'
| '.not.'
| '.nothing.'
| '.null.'
| '.on.'
| '.option.'
| '.or.'
| '.preserve.'
| '.private.'
| '.property.'
| '.public.'
| '.redim.'
| '.rem.'
| '.resume.'
| '.select.'
| '.set.'
| '.step.'
| '.sub.'
| '.then.'
| '.to.'
| '.true.'
| '.until.'
| '.wend.'
| '.while.'
| '.with.'
| '.xor.';
/*===== rules =====*/
new_line: (SEMI_COLON | NEW_LINE_CHARACTER)+;
program: new_line? global_stmt_list;
/*===== rules: declarations =====*/
class_decl: 'class' extended_id new_line member_decl_list 'end' 'class' new_line;
member_decl_list: member_decl*;
member_decl: field_decl | var_decl | const_decl | sub_decl | function_decl | property_decl;
field_decl:
'private' field_name other_vars_opt new_line
| 'public' field_name other_vars_opt new_line;
field_name: field_id '(' array_rank_list ')' | field_id;
field_id: id | 'default' | 'erase' | 'error' | 'explicit' | 'step';
var_decl: 'dim' var_name other_vars_opt new_line;
var_name: extended_id '(' array_rank_list ')' | extended_id;
other_vars_opt: (',' var_name other_vars_opt)?;
array_rank_list: (int_literal ',' array_rank_list | int_literal)?;
const_decl: access_modifier_opt 'const' const_list new_line;
const_list: extended_id '=' const_expr_def ',' const_list | extended_id '=' const_expr_def;
const_expr_def: '(' const_expr_def ')'
| '-' const_expr_def
| '+' const_expr_def
| const_expr;
sub_decl:
method_access_opt 'sub' extended_id method_arg_list new_line method_stmt_list 'end' 'sub' new_line
| method_access_opt 'sub' extended_id method_arg_list inline_stmt 'end' 'sub' new_line;
function_decl:
method_access_opt 'function' extended_id method_arg_list new_line method_stmt_list 'end' 'function' new_line
| method_access_opt 'function' extended_id method_arg_list inline_stmt 'end' 'function' new_line;
method_access_opt: 'public' 'default' | access_modifier_opt;
access_modifier_opt: ('public' | 'private')?;
method_arg_list: ('(' arg_list? ')')?;
arg_list: arg (',' arg_list)?;
arg: arg_modifier_opt extended_id ('(' ')')?;
arg_modifier_opt: ('byval' | 'byref')?;
property_decl: method_access_opt 'property' property_access_type extended_id method_arg_list new_line method_stmt_list 'end' 'property' new_line;
property_access_type: 'get' | 'let' | 'set';
/*===== rules: statements =====*/
global_stmt: option_explicit | class_decl | field_decl | const_decl | sub_decl | function_decl | block_stmt;
method_stmt: const_decl | block_stmt;
block_stmt:
var_decl
| redim_stmt
| if_stmt
| with_stmt
| select_stmt
| loop_stmt
| for_stmt
| inline_stmt new_line;
inline_stmt:
assign_stmt
| call_stmt
| sub_call_stmt
| error_stmt
| exit_stmt
| 'erase' extended_id;
global_stmt_list: global_stmt_list global_stmt | global_stmt;
method_stmt_list: method_stmt*;
block_stmt_list: block_stmt*;
option_explicit: 'option' 'explicit' new_line;
error_stmt: 'on' 'error' 'resume' 'next' | 'on' 'error' 'goto' int_literal;
exit_stmt: 'exit' 'do' | 'exit' 'for' | 'exit' 'function' | 'exit' 'property' | 'exit' 'sub';
assign_stmt:
left_expr '=' expr
| 'set' left_expr '=' expr
| 'set' left_expr '=' 'new' left_expr;
sub_call_stmt: qualified_id sub_safe_expr? comma_expr_list
| qualified_id sub_safe_expr?
| qualified_id '(' expr ')' comma_expr_list
| qualified_id '(' expr ')'
| qualified_id '(' ')'
| qualified_id index_or_params_list '.' left_expr_tail sub_safe_expr? comma_expr_list
| qualified_id index_or_params_list_dot left_expr_tail sub_safe_expr? comma_expr_list
| qualified_id index_or_params_list '.' left_expr_tail sub_safe_expr?
| qualified_id index_or_params_list_dot left_expr_tail sub_safe_expr?;
call_stmt: 'call' left_expr;
left_expr: qualified_id index_or_params_list '.' left_expr_tail
| qualified_id index_or_params_list_dot left_expr_tail
| qualified_id index_or_params_list
| qualified_id
| safe_keyword_id;
left_expr_tail: qualified_id_tail index_or_params_list '.' left_expr_tail
| qualified_id_tail index_or_params_list_dot left_expr_tail
| qualified_id_tail index_or_params_list
| qualified_id_tail;
qualified_id: iddot qualified_id_tail
| dot_iddot qualified_id_tail
| id
| dot_id;
qualified_id_tail: iddot qualified_id_tail
| id
| keyword_id;
keyword_id: safe_keyword_id
| 'and'
| 'byref'
| 'byval'
| 'call'
| 'case'
| 'class'
| 'const'
| 'dim'
| 'do'
| 'each'
| 'else'
| 'elseif'
| 'empty'
| 'end'
| 'eqv'
| 'exit'
| 'false'
| 'for'
| 'function'
| 'get'
| 'goto'
| 'if'
| 'imp'
| 'in'
| 'is'
| 'let'
| 'loop'
| 'mod'
| 'new'
| 'next'
| 'not'
| 'nothing'
| 'null'
| 'on'
| 'option'
| 'or'
| 'preserve'
| 'private'
| 'public'
| 'redim'
| 'resume'
| 'select'
| 'set'
| 'sub'
| 'then'
| 'to'
| 'true'
| 'until'
| 'wend'
| 'while'
| 'with'
| 'xor';
safe_keyword_id: 'default'
| 'erase'
| 'error'
| 'explicit'
| 'property'
| 'step';
extended_id: safe_keyword_id
| id;
index_or_params_list: index_or_params index_or_params_list
| index_or_params;
index_or_params: '(' expr comma_expr_list ')'
| '(' comma_expr_list ')'
| '(' expr ')'
| '(' ')';
index_or_params_list_dot: index_or_params index_or_params_list_dot
| index_or_params_dot;
index_or_params_dot: '(' expr comma_expr_list ').'
| '(' comma_expr_list ').'
| '(' expr ').'
| '(' ').';
comma_expr_list: ',' expr comma_expr_list
| ',' comma_expr_list
| ',' expr
| ',';
/* redim statement */
redim_stmt: 'redim' redim_decl_list new_line
| 'redim' 'preserve' redim_decl_list new_line;
redim_decl_list: redim_decl ',' redim_decl_list
| redim_decl;
redim_decl: extended_id '(' expr_list ')';
/* if statement */
if_stmt: 'if' expr 'then' new_line block_stmt_list else_stmt_list 'end' 'if' new_line
| 'if' expr 'then' inline_stmt else_opt end_if_opt new_line;
else_stmt_list: ('elseif' expr 'then' new_line block_stmt_list else_stmt_list
| 'elseif' expr 'then' inline_stmt new_line else_stmt_list
| 'else' inline_stmt new_line
| 'else' new_line block_stmt_list)?;
else_opt: ('else' inline_stmt)?;
end_if_opt : ('end' 'if')?;
/* with statement */
with_stmt: 'with' expr new_line block_stmt_list 'end' 'with' new_line;
/* loop statement */
loop_stmt: 'do' loop_type expr new_line block_stmt_list 'loop' new_line
| 'do' new_line block_stmt_list 'loop' loop_type expr new_line
| 'do' new_line block_stmt_list 'loop' new_line
| 'while' expr new_line block_stmt_list 'wend' new_line;
loop_type: 'while' | 'until';
/* for statement */
for_stmt: 'for' extended_id '=' expr 'to' expr step_opt new_line block_stmt_list 'next' new_line
| 'for' 'each' extended_id 'in' expr new_line block_stmt_list 'next' new_line;
step_opt: ('step' expr)?;
/* select statement */
select_stmt: 'select' 'case' expr new_line cast_stmt_list 'end' 'select' new_line;
cast_stmt_list: ('case' expr_list nl_opt block_stmt_list cast_stmt_list
| 'case' 'else' nl_opt block_stmt_list)?;
nl_opt: new_line?;
expr_list: expr ',' expr_list | expr;
/*===== rules: expressions =====*/
sub_safe_expr: sub_safe_imp_expr;
sub_safe_imp_expr: sub_safe_imp_expr 'imp' eqv_expr | sub_safe_eqv_expr;
sub_safe_eqv_expr: sub_safe_eqv_expr 'eqv' xor_expr
| sub_safe_xor_expr;
sub_safe_xor_expr: sub_safe_xor_expr 'xor' or_expr
| sub_safe_or_expr;
sub_safe_or_expr: sub_safe_or_expr 'or' and_expr
| sub_safe_and_expr;
sub_safe_and_expr : sub_safe_and_expr 'and' not_expr
| sub_safe_not_expr;
sub_safe_not_expr : 'not' not_expr
| sub_safe_compare_expr;
sub_safe_compare_expr : sub_safe_compare_expr 'is' concat_expr
| sub_safe_compare_expr 'is' 'not' concat_expr
| sub_safe_compare_expr '>=' concat_expr
| sub_safe_compare_expr '=>' concat_expr
| sub_safe_compare_expr '<=' concat_expr
| sub_safe_compare_expr '=<' concat_expr
| sub_safe_compare_expr '>' concat_expr
| sub_safe_compare_expr '<' concat_expr
| sub_safe_compare_expr '<>' concat_expr
| sub_safe_compare_expr '=' concat_expr
| sub_safe_concat_expr;
sub_safe_concat_expr : sub_safe_concat_expr '&' add_expr
| sub_safe_add_expr;
sub_safe_add_expr : sub_safe_add_expr '+' mod_expr
| sub_safe_add_expr '-' mod_expr
| sub_safe_mod_expr;
sub_safe_mod_expr : sub_safe_mod_expr 'mod' int_div_expr
| sub_safe_int_div_expr;
sub_safe_int_div_expr : sub_safe_int_div_expr '\\' mult_expr
| sub_safe_mult_expr;
sub_safe_mult_expr : sub_safe_mult_expr '*' unary_expr
| sub_safe_mult_expr '/' unary_expr
| sub_safe_unary_expr;
sub_safe_unary_expr : '-' unary_expr
| '+' unary_expr
| sub_safe_exp_expr;
sub_safe_exp_expr : sub_safe_value '^' exp_expr
| sub_safe_value;
sub_safe_value : const_expr
| left_expr
| '(' expr ')';
expr : imp_expr;
imp_expr : imp_expr 'imp' eqv_expr
| eqv_expr;
eqv_expr : eqv_expr 'eqv' xor_expr
| xor_expr;
xor_expr : xor_expr 'xor' or_expr
| or_expr;
or_expr : or_expr 'or' and_expr
| and_expr;
and_expr : and_expr 'and' not_expr
| not_expr;
not_expr : 'not' not_expr
| compare_expr;
compare_expr : compare_expr 'is' concat_expr
| compare_expr 'is' 'not' concat_expr
| compare_expr '>=' concat_expr
| compare_expr '=>' concat_expr
| compare_expr '<=' concat_expr
| compare_expr '=<' concat_expr
| compare_expr '>' concat_expr
| compare_expr '<' concat_expr
| compare_expr '<>' concat_expr
| compare_expr '=' concat_expr
| concat_expr;
concat_expr : concat_expr '&' add_expr
| add_expr;
add_expr : add_expr '+' mod_expr
| add_expr '-' mod_expr
| mod_expr;
mod_expr : mod_expr 'mod' int_div_expr
| int_div_expr;
int_div_expr : int_div_expr '\\' mult_expr
| mult_expr;
mult_expr : mult_expr '*' unary_expr
| mult_expr '/' unary_expr
| unary_expr;
unary_expr : '-' unary_expr
| '+' unary_expr
| exp_expr;
exp_expr : value '^' exp_expr
| value;
value : const_expr
| left_expr
| '(' expr ')';
const_expr : bool_literal
| int_literal
| float_literal
| string_literal
| nothing;
bool_literal : 'true'
| 'false';
int_literal : DIGIT+;
nothing : 'nothing'
| 'null'
| 'empty';
Your grammar defines "Literals" in the parser section. Note that ANTLR treats every lower case rule as parser rule (upper case rules are lexer rules).
Your small problem part may be solved like this:
FLOAT_LITERAL
: DIGIT* '.' DIGIT+ ( 'e' PLUS_OR_MINUS? DIGIT+ )?
| DIGIT+ 'e' PLUS_OR_MINUS? DIGIT+;
LETTER
: [a-z];
The ANTLR lexer prefers the longest matching rule (if two rules are in conflict, it prefers the first defined one). Both rules are completely disjunct so the sequence of definition is not relevant (its just more readable to define the more complex rules above the basic ones).
You can extend the second definition by upper case Characters:
LETTER
: [a-zA-Z];
To solve the overall problem of your grammar, you will need a complete rewriting of your grammar. Most rules of the terminals section should be lexer rules. However the terminals sections seems overly populated so that it could also be that some rules are workarounds for non existent parser rules.
I have a very simple grammar to parse statements.
Here are examples of the type of statements that need be parsed:
a.b.c
a.b.c == "88"
The issue I am having is that array notation is not matching. For example, things that are not working:
a.b[0].c
a[3][4]
I hope someone can point out what I am doing wrong here. (I am testing in ANTLRWorks)
Here is the grammar (generationUnit is my entry point):
grammar RatBinding;
generationUnit: testStatement | statement;
arrayAccesor : identifier arrayNotation+;
arrayNotation: '[' Number ']';
testStatement:
(statement | string | Number | Bool )
(greaterThanAndEqual
| lessThanOrEqual
| greaterThan
| lessThan | notEquals | equals)
(statement | string | Number | Bool )
;
part: identifier | arrayAccesor;
statement: part ('.' part )*;
string: ('"' identifier '"') | ('\'' identifier '\'');
greaterThanAndEqual: '>=';
lessThanOrEqual: '<=';
greaterThan: '>';
lessThan: '<';
notEquals : '!=';
equals: '==';
identifier: Letter (Letter|Digit)*;
Bool : 'true' | 'false';
ArrayLeft: '\u005B';
ArrayRight: '\u005D';
Letter
: '\u0024' |
'\u0041'..'\u005a' |
'\u005f '|
'\u0061'..'\u007a' |
'\u00c0'..'\u00d6' |
'\u00d8'..'\u00f6' |
'\u00f8'..'\u00ff' |
'\u0100'..'\u1fff' |
'\u3040'..'\u318f' |
'\u3300'..'\u337f' |
'\u3400'..'\u3d2d' |
'\u4e00'..'\u9fff' |
'\uf900'..'\ufaff'
;
Digit
: '\u0030'..'\u0039' |
'\u0660'..'\u0669' |
'\u06f0'..'\u06f9' |
'\u0966'..'\u096f' |
'\u09e6'..'\u09ef' |
'\u0a66'..'\u0a6f' |
'\u0ae6'..'\u0aef' |
'\u0b66'..'\u0b6f' |
'\u0be7'..'\u0bef' |
'\u0c66'..'\u0c6f' |
'\u0ce6'..'\u0cef' |
'\u0d66'..'\u0d6f' |
'\u0e50'..'\u0e59' |
'\u0ed0'..'\u0ed9' |
'\u1040'..'\u1049'
;
WS : [ \r\t\u000C\n]+ -> channel(HIDDEN)
;
You referenced the non-existent rule Number in the arrayNotation parser rule.
A Digit rule does exist in the lexer, but it will only match a single-digit number. For example, 1 is a Digit, but 10 is two separate Digit tokens so a[10] won't match the arrayAccesor rule. You probably want to resolve this in two parts:
Create a Number token consisting of one or more digits.
Number
: Digit+
;
Mark Digit as a fragment rule to indicate that it doesn't form tokens on its own, but is merely intended to be referenced from other lexer rules.
fragment // prevents a Digit token from being created on its own
Digit
: ...
You will not need to change arrayNotation because it already references the Number rule you created here.
Bah, waste of space. I Used Number instead of Digit in my array declaration.