Precedence ambiguity in ANTLR 4 - antlr4

I have a huge ANTLR grammar, and I am facing a problem with a small piece of it. The grammar has two rules, expr and set, as defined below:
expr:
id
|(PLUS|MINUS|MULTIPLY|AND|NEGATION)expr
| expr (MULTIPLY |DIVIDE| MODULO)
| expr (PLUS | MINUS) expr
;
set:
EMPTY
| MULTIPLY set
| set PLUS set
| UNION '(' set (COMMA set)* ')'
| INTER '(' set (COMMA set)* ')'
| expr
;
The problem is that a set of the form *s1 + *s2 should be reduced as follows:
set -> set PLUS set
and then each set in RHS should reduce to:
set -> MULTIPLY set
set -> expr
expr -> id
But instead it is reduced as:
set -> MULTIPLY set
set -> expr
expr -> expr PLUS expr
As a result, a set of the form *s1 + *s2 is parsed as *(s1 + *s2) instead of (*s1) + (*s2).
One of the alternatives of set reduces it to expr, and many other rules in the grammar reduce to expr in the same way. The problem occurs here because some of the alternatives in set and expr are similar. But because others differ, I cannot merge the two rules.
In set, even though the alternative MULTIPLY set has higher precedence than set PLUS set, the input as a whole is reduced by the MULTIPLY set alternative.
Is there a way to fix this issue?
EDIT:
Adding a working example:
Grammar:
grammar T;
expr
: ID
| ( PLUS | MINUS | MULTIPLY | AND | NEGATION ) expr
| expr ( MULTIPLY | DIVIDE | MODULO )
| expr ( PLUS | MINUS ) expr
;
set:
EMPTY
| MULTIPLY set
| set PLUS set
| UNION '(' set (COMMA set)* ')'
| INTER '(' set (COMMA set)* ')'
| expr
;
ID : [a-zA-Z] [a-zA-Z0-9]*;
PLUS : '+';
MINUS : '-';
MULTIPLY : '*';
AND : '&&';
NEGATION : '!';
DIVIDE : '/';
MODULO : '%';
COMMA : ',';
EMPTY: '\\empty';
UNION: '\\union';
INTER: '\\inter';
SPACES : [ \t\r\n] -> skip;
Code to execute it:
TLexer lexer = new TLexer(new ANTLRInputStream("*s1 + *s2"));
TParser parser = new TParser(new CommonTokenStream(lexer));
RuleContext tree = parser.set();
tree.inspect(parser);
Output it generated:
      set
     /   \
    *    set
          |
         expr
        / | \
       /  |  \
   expr   +   expr
     |        /  \
    s1       *   expr
                  |
                  s2

I can't reproduce this.
Given the grammar:
grammar T;
expr
: ID
| ( PLUS | MINUS | MULTIPLY | AND | NEGATION ) expr
| expr ( MULTIPLY | DIVIDE | MODULO )
| expr ( PLUS | MINUS ) expr
;
ID : [a-zA-Z] [a-zA-Z0-9]*;
PLUS : '+';
MINUS : '-';
MULTIPLY : '*';
AND : '&&';
NEGATION : '!';
DIVIDE : '/';
MODULO : '%';
SPACES : [ \t\r\n] -> skip;
your input *s1 + *s2 will be parsed as:
          expr
         / | \
        /  |  \
    expr   +   expr
    /  \       /  \
   *  expr    *  expr
       |          |
       s1         s2
Or, in plain code:
TLexer lexer = new TLexer(new ANTLRInputStream("*s1 + *s2"));
TParser parser = new TParser(new CommonTokenStream(lexer));
System.out.println(parser.expr().toStringTree(parser));
will print:
(expr (expr * (expr s1)) + (expr * (expr s2)))

Related

Can't output my own data type (* No instance for (Show Ops) arising from a use of `print')

I have the following data type :
data Ops = Add | Sub | Mul | Div | Mod
And this function :
charToOp :: Char -> Ops
charToOp x
  | x == '+' = Add
  | x == '-' = Sub
  | x == '*' = Mul
  | x == '/' = Div
  | x == '%' = Mod
In GHCi, if I try charToOp '+' I get the following message:
* No instance for (Show Ops) arising from a use of `print'
* In a stmt of an interactive GHCi command: print it
How do I fix this?
There is no Show instance for your data type. Try deriving one:
data Ops = Add | Sub | Mul | Div | Mod deriving (Show)
Now it works:
*Main> charToOp '-'
Sub
Alternatively, you can write your own Show instance.
Side note: you can use pattern matching instead of the guards:
charToOp :: Char -> Ops
charToOp x = case x of
  '+' -> Add
  '-' -> Sub
  '*' -> Mul
  '/' -> Div
  '%' -> Mod

Haskell source generated by happy has error "parse error on input 'data'"

I'm trying out Happy, the parser generator for Haskell. After generating the module happy.hs (which works without problems), I run ghc happy.hs and get the error: Line 297: parse error on input 'data'. Does anyone know how to solve this, or where I can find a solution?
I also tried loading the module in GHCi rather than compiling it with ghc, but I get the same error.
Code in happy.y (happy source):
-- TODO: add more of my things!!!
{
module Main where
}
%name calc
%tokentype { Token }
%error { parseError }
-- tokens
%token
let { TokenLet }
in { TokenIn }
int { TokenInt $$ }
var { TokenVar $$ }
'=' { TokenEq }
'+' { TokenPlus }
'-' { TokenSub }
'*' { TokenMul }
'/' { TokenDvd }
'(' { TokenOB }
')' { TokenCB }
%%
Exp : let var '=' Exp in Exp { Let $2 $4 $6 }
| Exp1 { Exp1 $1 }
Exp1 : Exp1 '+' Term { Plus $1 $3 }
| Exp1 '-' Term { Minus $1 $3 }
| Term { Term $1 }
Term : Term '*' Factor { Times $1 $3 }
| Term '/' Factor { Div $1 $3 }
| Factor { Factor $1 }
Factor : int { Int $1 }
| var { Var $1 }
| '(' Exp ')' { Brack $2 }
{
parseError :: [Token] -> a
parseError _ = error "Parse error! please try again..."
data Exp = Let String Exp Exp
| Exp1 Exp1
deriving Show
data Exp1 = Plus Exp1 Term
| Minus Exp1 Term
| Term Term
deriving Show
data Term = Times Term Factor
| Div Term Factor
| Factor Factor
deriving Show
data Factor = Int Int
| Var String
| Brack Exp
deriving Show
-- the tokens
data Token = TokenLet
| TokenIn
| TokenInt Int
| TokenVar String
| TokenEq
| TokenPlus
| TokenMinus
| TokenTimes
| TokenDiv
| TokenOB
| TokenCB
deriving Show
-- the lexer
lexer :: String -> [Token]
lexer [] = []
lexer (c:cs)
| isSpace c = lexer cs
| isAlpha c = lexVar (c:cs)
| isDigit c = lexNum (c:cs)
lexer ('=':cs) = TokenEq : lexer cs
lexer ('+':cs) = TokenPlus : lexer cs
lexer ('-':cs) = TokenMinus : lexer cs
lexer ('*':cs) = TokenTimes : lexer cs
lexer ('/':cs) = TokenDiv : lexer cs
lexer ('(':cs) = TokenOB : lexer cs
lexer (')':cs) = TokenCB : lexer cs
lexNum cs = TokenInt (read num) : lexer rest
where (num,rest) = span isDigit cs
lexVar cs =
case span isAlpha cs of
("let",rest) -> TokenLet : lexer rest
("in",rest) -> TokenIn : lexer rest
(var,rest) -> TokenVar var : lexer rest
-- the main function
main = getContents >>= print . calc . lexer
}
The line that causes the error in happy.hs, with ±10 lines of context (lines 287-307, inclusive):
287: calc tks = happyRunIdentity happySomeParser where
288: happySomeParser = happyThen (happyParse action_0 tks) (\x -> case x of
289: {HappyAbsSyn4 z -> happyReturn z; _other -> notHappyAtAll })
290:
291: happySeq = happyDontSeq
292:
293:
294: parseError :: [Token] -> a
295: parseError _ = error "Parse error! please try again..."
296:
297: data Exp = Let String Exp Exp -- <= I get error on this line!
298: | Exp1 Exp1
299: deriving Show
300:
301: data Exp1 = Plus Exp1 Term
302: | Minus Exp1 Term
303: | Term Term
304: deriving Show
305:
306: data Term = Times Term Factor
307: | Div Term Factor
I expect the program to run smoothly without any errors, but it doesn't.
Happy produces its generated code on column 1 of the file. Therefore, to have your code be considered part of the same module-where block as the generated code (which you cannot control), it must also live on column 1 of the file.
Delete four spaces from the beginning of the plain-Haskell lines of code in your happy file and you'll be on your way to the next error. Make sure that your deriving clauses and where clauses are indented one level from their surroundings when you are deleting spaces.

Running TestRig with wrong input string does not emit error message

I made a grammar which constructs comparison between expressions as follows.
grammar EtlExpression;
/** The start rule; begin parsing here. */
prog : comp ;
comp : expr ('='|'<='|'>='|'<'|'>') expr ;
expr : expr ('*'|'/') expr
| expr ('+'|'-') expr
| '(' expr ')'
| func
| ID
| STR
| NUM
;
func : ID '(' expr (',' expr)* ')' ; // function
STR : '"' ('\\"'|.)*? '"' ; // match string literals
ID : LETTER (LETTER|DIGIT)* ; // match identifiers
NUM : '-'? ('.' DIGIT+ | DIGIT+ ('.' DIGIT*)? ) ; // match number
WS : [ \t\r\n]+ -> skip ; // toss out whitespace
fragment
LETTER : [a-zA-Z] ;
fragment
DIGIT : [0-9] ;
Then I ran TestRig after compiling.
The result is as follows.
java -cp .;C:\App\Antlr4\antlr-4.7.1-complete.jar org.antlr.v4.gui.TestRig test.antlr.EtlExpression prog -tree
a < b = c
^Z
(prog (comp (expr a) < (expr b)))
The comp rule allows only one comparison operator, so I think this input should produce some kind of error, such as "line 1:6 token recognition error at: '='", but the parser just ignores the "= c" part.
Could you help me see what's wrong with the grammar, or how I can get the right message?
Thank you in advance.
The parser simply stops when it reaches the = that it cannot match, and ignores everything after it. If you force the parser to consume the entire input, it will report a syntax error instead. You can do that by adding EOF to your prog rule:
prog : comp EOF;
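The effect of the trailing EOF can be illustrated with a small hand-written sketch. This is not ANTLR-generated code: the class EofDemo, its methods, and the simplification of operands to single tokens are all assumptions made for illustration. It only shows why a rule without an end-of-input check can succeed on a prefix of the input.

```java
import java.util.List;

// Hypothetical hand-written sketch, not ANTLR output.
class EofDemo {
    static final List<String> COMPARATORS = List.of("=", "<=", ">=", "<", ">");

    // comp : operand comparator operand ;  (operands simplified to one token each)
    // Returns the index just past the tokens that were matched.
    static int parseComp(List<String> tokens) {
        if (tokens.size() < 3 || !COMPARATORS.contains(tokens.get(1))) {
            throw new IllegalArgumentException("expected: operand comparator operand");
        }
        return 3;
    }

    // requireEof = false models "prog : comp ;"
    // requireEof = true  models "prog : comp EOF ;"
    static String parseProg(List<String> tokens, boolean requireEof) {
        int end = parseComp(tokens);
        if (requireEof && end < tokens.size()) {
            return "error: unexpected '" + tokens.get(end) + "' at token " + end;
        }
        return "ok, consumed " + end + " of " + tokens.size() + " tokens";
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "<", "b", "=", "c");
        System.out.println(parseProg(input, false)); // leftover "= c" goes unnoticed
        System.out.println(parseProg(input, true));  // leftover "= c" is reported
    }
}
```

Without the end-of-input check, the sketch accepts a < b and silently leaves = c unread, which mirrors the tree printed by TestRig in the question. With the check, the leftover = is reported.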

How to rewrite a grammar to eliminate a shift-reduce conflict (in the Haskell Happy parser)

I'm trying to define a grammar for (Java-like) methods using the Happy LALR parser generator:
1. MD ::= some_prefix { list(VD) list(S) }
2. VD ::= T I
3. S ::= I = E | I [ E ] = E | etc...
4. T ::= I | byte | int | etc...
5. E ::= INT | E + E | etc...
Here,
MD: Method Declaration
VD: Variable Declaration
S: Statement
T: Type
I: Identifier
E: Expression
All the other tokens are terminals.
Within a method, variable declarations come first, followed by the statements.
As you can see, a VD can start with an I, because a variable of class type is declared with an identifier (I) as its type. A statement can also start with an I, because an assignment begins with the variable name, which is an I.
The problem is that both VD and S can start with an I. Therefore, the first production causes a shift/reduce conflict.
Is there a way to rewrite the grammar, or some other parser generator trick, to solve this problem?
I have specified the associativity and precedence of the operators. I have included only the minimum information needed to explain the problem. Let me know if you need more.
UPDATE:
Following is the grammar file
{
module Issue where
import Lexer
}
%name parser
%tokentype { Token }
%error { parseError }
%token
--===========================================================================
';' { TokenSemi }
id { TokenId $$ }
'{' { TokenLBrace }
'}' { TokenRBrace }
public { TokenPublickw }
'(' { TokenLParen }
')' { TokenRParen }
int { TokenInt $$ }
plus { TokenPlus }
inttype { TokenIntkw }
'=' { TokenAssign }
--===========================================================================
--===========================================================================
-- Precedence and associativity. Reference:
-- http://introcs.cs.princeton.edu/java/11precedence/
%right '='
%left plus
--===========================================================================
%%
MethodDecl :
public id '(' ')' '{' list(VarDecl) list(Statement) '}'
{ MethodDecl $2 (VarList $6) (BlockStatement $7) }
DataType :
inttype
{ DataType TypeInt }
| id
{ DataType (TypeClass $1) }
VarDecl :
DataType id ';'
{ VarDecl $1 $2 }
Statement :
id '=' Expression ';'
{ Assignment $1 $3 }
Expression :
int
{ IntLiteral $1 }
| Expression plus Expression
{ PlusExp $1 $3 }
--============================================================================
list1(p) :
p
{ [$1] }
| p list1(p)
{ $1 : $2 }
list(p) :
list1(p)
{ $1 }
| -- epsilon
{ [] }
--============================================================================
{
data AST = Goal AST [AST]
| BlockStatement [AST]
| IntLiteral Int
| PlusExp AST AST
| MethodDecl String AST AST
| DataType MJType
| Identifier String
| VarList [AST]
| VarDecl AST String
| Assignment String AST
deriving Show
data MJType = TypeInt
| TypeUnknown
| TypeClass String
deriving (Show,Eq)
parseError :: [Token] -> a
parseError (t:ts) = error ("Parser Error: " ++ (show t))
}
.info file generated by Happy parser with the details of shift-reduce conflicts and states
-----------------------------------------------------------------------------
Info file generated by Happy Version 1.19.4 from issue.y
-----------------------------------------------------------------------------
state 7 contains 1 shift/reduce conflicts.
state 9 contains 1 shift/reduce conflicts.
-----------------------------------------------------------------------------
Grammar
-----------------------------------------------------------------------------
%start_parser -> MethodDecl (0)
MethodDecl -> public id '(' ')' '{' list(VarDecl) list(Statement) '}' (1)
DataType -> inttype (2)
DataType -> id (3)
VarDecl -> DataType id ';' (4)
Statement -> id '=' Expression ';' (5)
Expression -> int (6)
Expression -> Expression plus Expression (7)
list(Statement) -> list1(Statement) (8)
list(Statement) -> (9)
list(VarDecl) -> list1(VarDecl) (10)
list(VarDecl) -> (11)
list1(Statement) -> Statement (12)
list1(Statement) -> Statement list1(Statement) (13)
list1(VarDecl) -> VarDecl (14)
list1(VarDecl) -> VarDecl list1(VarDecl) (15)
-----------------------------------------------------------------------------
Terminals
-----------------------------------------------------------------------------
';' { TokenSemi }
id { TokenId $$ }
'{' { TokenLBrace }
'}' { TokenRBrace }
public { TokenPublickw }
'(' { TokenLParen }
')' { TokenRParen }
int { TokenInt $$ }
plus { TokenPlus }
inttype { TokenIntkw }
'=' { TokenAssign }
-----------------------------------------------------------------------------
Non-terminals
-----------------------------------------------------------------------------
%start_parser rule 0
MethodDecl rule 1
DataType rules 2, 3
VarDecl rule 4
Statement rule 5
Expression rules 6, 7
list(Statement) rules 8, 9
list(VarDecl) rules 10, 11
list1(Statement) rules 12, 13
list1(VarDecl) rules 14, 15
-----------------------------------------------------------------------------
States
-----------------------------------------------------------------------------
State 0
public shift, and enter state 2
MethodDecl goto state 3
State 1
public shift, and enter state 2
State 2
MethodDecl -> public . id '(' ')' '{' list(VarDecl) list(Statement) '}' (rule 1)
id shift, and enter state 4
State 3
%start_parser -> MethodDecl . (rule 0)
%eof accept
State 4
MethodDecl -> public id . '(' ')' '{' list(VarDecl) list(Statement) '}' (rule 1)
'(' shift, and enter state 5
State 5
MethodDecl -> public id '(' . ')' '{' list(VarDecl) list(Statement) '}' (rule 1)
')' shift, and enter state 6
State 6
MethodDecl -> public id '(' ')' . '{' list(VarDecl) list(Statement) '}' (rule 1)
'{' shift, and enter state 7
State 7
MethodDecl -> public id '(' ')' '{' . list(VarDecl) list(Statement) '}' (rule 1)
id shift, and enter state 12
(reduce using rule 11)
'}' reduce using rule 11
inttype shift, and enter state 13
DataType goto state 8
VarDecl goto state 9
list(VarDecl) goto state 10
list1(VarDecl) goto state 11
State 8
VarDecl -> DataType . id ';' (rule 4)
id shift, and enter state 19
State 9
list1(VarDecl) -> VarDecl . (rule 14)
list1(VarDecl) -> VarDecl . list1(VarDecl) (rule 15)
id shift, and enter state 12
(reduce using rule 14)
'}' reduce using rule 14
inttype shift, and enter state 13
DataType goto state 8
VarDecl goto state 9
list1(VarDecl) goto state 18
State 10
MethodDecl -> public id '(' ')' '{' list(VarDecl) . list(Statement) '}' (rule 1)
id shift, and enter state 17
'}' reduce using rule 9
Statement goto state 14
list(Statement)goto state 15
list1(Statement)goto state 16
State 11
list(VarDecl) -> list1(VarDecl) . (rule 10)
id reduce using rule 10
'}' reduce using rule 10
State 12
DataType -> id . (rule 3)
id reduce using rule 3
State 13
DataType -> inttype . (rule 2)
id reduce using rule 2
State 14
list1(Statement) -> Statement . (rule 12)
list1(Statement) -> Statement . list1(Statement) (rule 13)
id shift, and enter state 17
'}' reduce using rule 12
Statement goto state 14
list1(Statement)goto state 23
State 15
MethodDecl -> public id '(' ')' '{' list(VarDecl) list(Statement) . '}' (rule 1)
'}' shift, and enter state 22
State 16
list(Statement) -> list1(Statement) . (rule 8)
'}' reduce using rule 8
State 17
Statement -> id . '=' Expression ';' (rule 5)
'=' shift, and enter state 21
State 18
list1(VarDecl) -> VarDecl list1(VarDecl) . (rule 15)
id reduce using rule 15
'}' reduce using rule 15
State 19
VarDecl -> DataType id . ';' (rule 4)
';' shift, and enter state 20
State 20
VarDecl -> DataType id ';' . (rule 4)
id reduce using rule 4
'}' reduce using rule 4
inttype reduce using rule 4
State 21
Statement -> id '=' . Expression ';' (rule 5)
int shift, and enter state 25
Expression goto state 24
State 22
MethodDecl -> public id '(' ')' '{' list(VarDecl) list(Statement) '}' . (rule 1)
%eof reduce using rule 1
State 23
list1(Statement) -> Statement list1(Statement) . (rule 13)
'}' reduce using rule 13
State 24
Statement -> id '=' Expression . ';' (rule 5)
Expression -> Expression . plus Expression (rule 7)
';' shift, and enter state 26
plus shift, and enter state 27
State 25
Expression -> int . (rule 6)
';' reduce using rule 6
plus reduce using rule 6
State 26
Statement -> id '=' Expression ';' . (rule 5)
id reduce using rule 5
'}' reduce using rule 5
State 27
Expression -> Expression plus . Expression (rule 7)
int shift, and enter state 25
Expression goto state 28
State 28
Expression -> Expression . plus Expression (rule 7)
Expression -> Expression plus Expression . (rule 7)
';' reduce using rule 7
plus reduce using rule 7
-----------------------------------------------------------------------------
Grammar Totals
-----------------------------------------------------------------------------
Number of rules: 16
Number of terminals: 11
Number of non-terminals: 10
Number of states: 29
At a glance, it seems the shift-reduce conflict might come from the expression grammar in rule 5.
Check out section 2.3, Using Precedences, in the Happy manual. You can also use the classic approach:
E => E + T | T
T => T * F | F
F => INT | ( E )
I found the solution. Instead of using two separate lists
list(VarDecl) list(Statement)
use one list
ordered_lists(VarDecl,Statement)
The definition of ordered_lists follows.
--A list of p followed by a list of q, where each list can be an empty list
ordered_lists(p,q) :
ordered_lists1(p,q)
{ $1 }
| -- epsilon
{ ([], []) }
--This list should contain at least one of p or q. A list of p followed by a
--list of q, where at most one list can be an empty list
ordered_lists1(p,q) :
p
{ ([$1], []) }
| q
{ ([], [$1]) }
| p ordered_lists1(p,q)
{ ($1:fst($2), snd($2)) }
| q list1(q)
{ ([], $1 : $2) }
Definition of the list1 is available in the question description. Please don't hesitate to post a better answer.

Is there a way to skip spaces for certain rules but not others?

My grammar contains the following:
assignment
: ID ASSIGN expr
;
expr
: MINUS expr #unaryMinusExpr
| NOT expr #notExpr
| expr MULT expr #multExpr
| expr DIV expr #divExpr
| expr PLUS expr #plusExpr
| expr MINUS expr #minusExpr
| expr LTEQ expr #lteqExpr
| expr GTEQ expr #gteqExpr
| expr LT expr #ltExpr
| expr GT expr #gtExpr
| expr NEQ expr #neqExpr
| expr EQ expr #eqExpr
| expr AND expr #andExpr
| expr OR expr #orExpr
| function #functionExpr
| atom #atomExpr
;
function
: ID OPAR (parameter (',' parameter)*)? CPAR
;
parameter
: STRING #stringParameter
| expr #exprParameter
;
atom
: OPAR expr CPAR #parExpr
| (INT | FLOAT) #numberAtom
| (TRUE | FALSE) #booleanAtom
| ID #idAtom
;
OR : '||';
AND : '&&';
EQ : '==';
NEQ : '!=';
GT : '>';
LT : '<';
GTEQ : '>=';
LTEQ : '<=';
PLUS : '+';
MINUS : '-';
MULT : '*';
DIV : '/';
NOT : '!';
OPAR : '(';
CPAR : ')';
OBRACE : '{';
CBRACE : '}';
ASSIGN : '=';
TRUE : 'true';
FALSE : 'false';
IF : 'if';
ELSE : 'else';
ID
: [a-zA-Z_] [a-zA-Z_0-9]*
;
SPACE
: [ \t\r\n] -> skip
;
The issue is that ID needs to be able to also contain MINUS, PLUS, etc. This means that I won't be able to tell when I have just an ID (this-isandid) or ID (this) MINUS ID (isandid).
What we'd like to do is to not skip the spaces around the operators in the expr, but otherwise skip spaces for all other rules. Is there a way to do that? I.e. we force the user to put spaces around operators when they really mean an expression as opposed to an ID containing e.g. MINUS.
I.e.
a-b is an ID
a - b is a minusExpr
a- b, a -b is an error
Or is there another way to allow e.g. MINUS in an ID and be able to tell the difference between an ID and a minusExpr?
You might be able to avoid conditional consideration of whitespace in the parser rules by modifying the ID rule within the lexer. Allow a - to appear in an ID, provided it is not at the beginning or end of the identifier.
ID
: [a-zA-Z_]
( '-'? [a-zA-Z_0-9]
)*
;
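To check which strings this ID rule would accept, the same pattern can be written as a java.util.regex expression. The sketch below is only an approximation of the lexer rule: the class name HyphenIdDemo and its helper methods are hypothetical, and it assumes that the ANTLR lexer, like the greedy regex here, takes the longest prefix that matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the regex mirrors the ID lexer rule above.
class HyphenIdDemo {
    // A letter or underscore, then any number of characters that are each
    // optionally preceded by a single '-'. A hyphen can therefore never be
    // first, last, or doubled.
    static final Pattern ID = Pattern.compile("[a-zA-Z_](-?[a-zA-Z_0-9])*");

    // True if the whole string is a single identifier.
    static boolean isId(String s) {
        return ID.matcher(s).matches();
    }

    // Length of the identifier at the start of the string, or 0 if there is none.
    static int idPrefixLength(String s) {
        Matcher m = ID.matcher(s);
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a-b", "this-isandid", "a-", "-b", "a--b"}) {
            System.out.println(s + " -> " + isId(s));
        }
        System.out.println("prefix of \"a- b\": " + idPrefixLength("a- b"));
    }
}
```

Here a-b and this-isandid are accepted as identifiers, while a-, -b and a--b are not. One caveat: for the input a- b the identifier ends after a, so a lexer using this rule would presumably produce ID MINUS ID, and the input would parse as a minusExpr rather than being rejected as the question asks.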
This can be done by using lexer modes. One mode (for example, the default mode) would skip spaces, and another mode would not.
