in the LALR(1) parser, i know how to match the paired bracket, for example:
expr: /* empty */
| '(' expr ')'
;
can match the following input:
( ( ( ) ) )
but i am not sure how to write rules to match the input like this:
( ( ( ) ( ) ) ( ) ( ( ) ( ) ( ) ) )
this means there are many paired brackets inside a big paired bracket which are
in the same level. one simple example like this:
( ( ) ( ) )
suppose there are many nested brackets, how to write the rules?
Assuming you are looking at something similar to a lisp list then you would probably want to have a sub-expression for that purpose:
expr: /* empty */
| expr '+' expr
| ...
| list-expr
list-expr: '(' expr ')'
| list-expr '(' expr ')'
If you do not have other expressions (like the + shown in my example here) then you don't need to have a separate rule.
Note that will probably prevent you from having a function call defined with an expression:
expr: expr '(' expr ')' // C-like function call not compatible
You could still have identifier calls such as:
expr: IDENTIFIER '(' expr ')'
Related
I'm writing a dsl in a text in which people can declare some variables. the grammar is as follows:
Cosem:
cosem+=ID '=' 'COSEM' '(' class=INT ',' version=INT ',' obis=STRING ')' ;
Attributes :
attribute+=ID '=' 'ATTRIBUTE' '(' object=ID ',' attribute_name=STRING ')' ;
Action:
action+=ID '=' 'ACTION' '(' object=ID ',' action_name=STRING ')';
the Dsl has some methods like the print method:
Print:
'PRINT' '(' var0=(STRING|ID) (','var1+=(STRING|ID) )* ')' |
'PRINT' '(' ')'
;
I put all my variables in map so I can use them later in my code. the key is identifying them is their ID which is a string.
However, in my interpreter I can't make the différence between a string and an ID
def dispatch void exec(Print p) {
if (LocalMapAttribute.containsKey(p.var0) )
{print(LocalMapAttribute.get(p.var0))}
else if (LocalMapAction.containsKey(p.var0)){print(LocalMapAction.get(p.var0))}
else if (LocalMapCosem.containsKey(p.var0)){print(LocalMapCosem.get(p.var0))}
else
{print("erreeeur Print")}
p.var1.forEach[v
| if (LocalMapAttribute.containsKey(v)){print(LocalMapAttribute.get(v))}
else if (LocalMapAction.containsKey(v)){print(LocalMapAction.get(v))}
else if (LocalMapCosem.containsKey(v)){print(LocalMapCosem.get(v))}
else{print("erreur entre print")} ]
}
For example when I write PRINT ("attribut2",attribut2) the result shoud be
attribut2 "the value of attribut2"
but I get
"the value of attribut2" "the value of attribut2"
your current grammar structure makes it hard to do this since you throw away the information at the point where you fill the map.
you can use org.eclipse.xtext.nodemodel.util.NodeModelUtils.findNodesForFeature(EObject, EStructuralFeature) to obtain the actual text (which still may contain the original value including the ""
or you change your grammar to
var0=Value ...
Value: IDValue | StringValue;
IDValue: value=ID;
StringValue: value=STRING;
then you can have a look at the type (IDValue or StringValue) to decide wheather you need to put the text into "" (org.eclipse.xtext.util.Strings.convertToJavaString(String, boolean)) might be helpful
Or you can try to use a special replacement for STRINGValueaConcerter that does not strip the quotation marks
we knew the priority of logical operation from strong to low:
Not
And
Or
I want to add logical operation to my grammar in way respect the priority of logical operation. ...
My grammar is:
expression : factor ( PLUS factor | MINUS factor )* ;
factor : term ( MULT term | DIV term )* ;
term : NUMBER | ID | PAR_OPEN expression PAR_CLOSE ;
With ANTLR3 and ANTLR 4, you can doe something like this:
expression
: or_expression
;
// lowest precedence
or_expression
: and_expression ( '||' and_expression )*
;
and_expression
: rel_expression ( '&&' rel_expression )*
;
rel_expression
: add_expression ( ( '<' | '<=' | '>' | '>=' ) add_expression )*
;
add_expression
: mult_expression ( ( '+' | '-' ) mult_expression )*
;
mult_expression
: unary_expression ( ( '*' | '/' ) unary_expression )*
;
unary_expression
: '-' atom
| atom
;
// highest precedence
atom
: NUMBER
| ID
| '(' expression ')'
;
And with ANTLR4, you can also write it like this (which is equivalent to the grammar above!):
expression
: '!' expression
| expression ( '*' | '/' ) expression // higher than '+' | '-'
| expression ( '+' | '-' ) expression // higher than '<' | '<=' | '>' | '>='
| expression ( '<' | '<=' | '>' | '>=' ) expression // higher than '&&'
| expression '&&' expression // higher than '||'
| expression '||' expression
| NUMBER
| ID
| '(' expression ')'
;
So I defined a grammar to parse an C style syntax language:
grammar mygrammar;
program
: (declaration)*
(statement)*
EOF
;
declaration
: INT ID '=' expression ';'
;
assignment
: ID '=' expression ';'
;
expression
: expression (op=('*'|'/') expression)*
| expression (op=('+'|'-') expression)*
| relation
| INT
| ID
| '(' expression ')'
;
relation
: expression (op=('<'|'>') expression)*
;
statement
: expression ';'
| ifstatement
| loopstatement
| printstatement
| assignment
;
ifstatement
: IF '(' expression ')' (statement)* FI ';'
;
loopstatement
: LOOP '(' expression ')' (statement)* POOL ';'
;
printstatement
: PRINT '(' expression ')' ';'
;
IF : 'if';
FI : 'fi';
LOOP : 'loop';
POOL : 'pool';
INT : 'int';
PRINT : 'print';
ID : [a-zA-Z][a-zA-Z0-9]*;
INTEGER : [0-9]+;
WS : [ \r\n\t] -> skip;
And I can parse a simple test as this:
int i = (2+3)*3/2*(3+36);
int j = i;
int k = 2*1+i*3;
if (k > 2)
k = k + 1;
i = i / 3;
j = j / 3;
fi;
loop (i < 10)
i = i + 1 * (i+k);
j = (j + 1) * (j-k);
k = i + j;
print(k);
pool;
However, when I want to generate ANTLR Recogonizers in intelliJ, I got this error:
sCalc.g4:19:0: left recursive rule expression contains a left recursive alternative which can be followed by the empty string
I wonder if this is caused by my ID could be an empty string?
There are a couple of issues with your grammar:
you have INT as an alternative inside expression while you probably want INTEGER instead
there is no need to do expression (op=('+'|'-') expression)*: this will do: expression op=('+'|'-') expression
ANTLR4 does not support indirect left recursive rules: you must include relation inside expression
Something like this ought to do it:
grammar mygrammar;
program
: (declaration)*
(statement)*
EOF
;
declaration
: INT ID '=' expression ';'
;
assignment
: ID '=' expression ';'
;
expression
: expression op=('*'|'/') expression
| expression op=('+'|'-') expression
| expression op=('<'|'>') expression
| INTEGER
| ID
| '(' expression ')'
;
statement
: expression ';'
| ifstatement
| loopstatement
| printstatement
| assignment
;
ifstatement
: IF '(' expression ')' (statement)* FI ';'
;
loopstatement
: LOOP '(' expression ')' (statement)* POOL ';'
;
printstatement
: PRINT '(' expression ')' ';'
;
IF : 'if';
FI : 'fi';
LOOP : 'loop';
POOL : 'pool';
INT : 'int';
PRINT : 'print';
ID : [a-zA-Z][a-zA-Z0-9]*;
INTEGER : [0-9]+;
WS : [ \r\n\t] -> skip;
Also not that this (statement)* can simply be written as statement*
It's about your expression and relation rules. The expression rule can match relation in one alt, which in turn recurses back to expression. Rule relation additionally can potentially match nothing because of (op=('<'|'>') expression)*
A better approach is probably to have relation call expression and remove the relation alt from expression. Then use relation everywhere you used expression now. That's a typical scenario in expressions, starting out with low precedence operations as top level rules and drilling down to higher precedence rules, ultimately ending at a simple expression rule (or similar).
I am parsing a SQL like language.
I want to parse full SQL sentence like : SELECT .. FROM .. WHERE and also a simple expr line which can be a function, where clause, expr and arithmethics.
this is the important parts of the grammar:
// main rule
parse : (statments)* EOF;
// All optional statements
statments : select_statement
| virtual_column_statement
;
select_statement :
SELECT select_item ( ',' select_item )*
FROM from_cluase ( ',' from_cluase )*
(WHERE where_clause )?
( GROUP BY group_by_item (',' group_by_item)* )?
( HAVING having_condition (AND having_condition)* )?
( ORDER BY order_clause (',' order_clause)* )?
( LIMIT limit_clause)?
| '(' select_statement ')'
virtual_column_statement:
virtual_column_expression
;
virtual_column_expression :
expr
| where_clause
| function
| virtual_column_expression arithmetichOp=('*'|'/'|'+'|'-'|'%') virtual_column_expression
| '(' virtual_column_expression ')'
;
virtual_columns works great.
Select queries works too but after it finishes, it goes to virtual_column_statement too.
I want it to choose one.
How can I fix this?
EDIT:
After some research I found out antlr takes my query and seperate it to two different parts.
How can I fix this?
Thanks,
id
Your 'virtual_column_statement' appears to be part of the 'select_statement'. I expect that you are missing a ';' between the two rules.
Most of your 'select_statement' clauses are optional, so after matching the select and from clauses, if Antlr thinks that the balance of the input is better matched as a 'virtual_column_statement', then it will take that path.
Your choices are:
1) make your select_statement comprehensive and at least as general as your 'virtual_column_statement';
2) require a keyword at the beginning of the 'virtual_column_statement' to prevent Antlr from considering it as a partial alternate;
3) put the 'virtual_column_statement' in a separate parser grammar and don't send it any select input text.
The grammar below parses ( left part = right part # comment ), # comment is optional.
Two questions:
Sometimes warning (ANTLRWorks 1.4.2):
Decision can match input such as "{Int, Word}" using multiple alternatives: 1, 2 (referencing id2)
But only sometimes!
The next extension should be that the comment (id2) can contain chars '(' and ')'.
The grammar:
grammar NestedBrackets1a1;
//==========================================================
// Lexer Rules
//==========================================================
Int
: Digit+
;
fragment Digit
: '0'..'9'
;
Special
: ( TCUnderscore | TCQuote )
;
TCListStart : '(' ;
TCListEnd : ')' ;
fragment TCUnderscore : '_' ;
fragment TCQuote : '"' ;
// A word must start with a letter
Word
: ( 'a'..'z' | 'A'..'Z' | Special ) ('a'..'z' | 'A'..'Z' | Special | Digit )*
;
Space
: ( ' ' | '\t' | '\r' | '\n' ) { $channel = HIDDEN; }
;
//==========================================================
// Parser Rules
//==========================================================
assignment
: TCListStart id1 '=' id1 ( comment )? TCListEnd
;
id1
: Word+
;
comment
: '#' ( id2 )*
;
id2
: ( Word | Int )+
;
ANTLRStarter wrote:
Sometimes warning (ANTLRWorks 1.4.2): Decision can match input such as "{Int, Word}" using multiple alternatives: 1, 2 (referencing id2)
But only sometimes!
No, the grammar you posted will always produce this warning. Perhaps you don't always notice it (your IDE-plugin or ANTLRWorks might show it in a tab you don't have opened), but the warning is there. Convince yourself by creating a lexer/parser from the command line:
java -cp antlr-3.4-complete.jar org.antlr.Tool NestedBrackets1a1.g
will produce:
warning(200): NestedBrackets1a1.g:49:19:
Decision can match input such as "{Int, Word}" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
This is because you have a * after ( id2 ) inside your comment rule, and id2 also is a repetition of tokens: ( Word | Int )+. Let's say your input is "# foo bar" (a # followed by two Word tokens). ANTLR can now parse the input in more than 1 way: the 2 tokens "foo" and "bar" could be matched by ( id2 )*, where id2 matches a single Word token at a time, but "foo" and "bar" could also be matches in one go of the id2 rule.
Look at the merged rules:
comment
: '#' ( ( Word | Int )+ )*
;
See how you're repeating a repetition: ( ( ... )+ )*? This is usually a problem, as it is in your case.
Resolve this problem by either replacing the * with a ?:
comment
: '#' ( id2 )?
;
id2
: ( Word | Int )+
;
or by removing the +:
comment
: '#' ( id2 )*
;
id2
: ( Word | Int )
;
ANTLRStarter wrote:
The next extension should be that the comment (id2) can contain chars '(' and ')'.
That is asking for trouble since a comment is followed by a TCListEnd, which is a ). I don't recommend letting a comment match ).
EDIT
Note that comments are usually stripped from the source file while tokenizing the input source. That way you don't need to account for them in your parser rules. You can do that by "skipping" these tokens in a lexer rule:
Comment
: '#' ~('\r' | '\n')* {skip();}
;