Running TestRig with wrong input string does not emit error message - antlr4

I wrote a grammar that parses a comparison between two expressions, as follows.
grammar EtlExpression;
/** The start rule; begin parsing here. */
prog : comp ;
comp : expr ('='|'<='|'>='|'<'|'>') expr ;
expr : expr ('*'|'/') expr
| expr ('+'|'-') expr
| '(' expr ')'
| func
| ID
| STR
| NUM
;
func : ID '(' expr (',' expr)* ')' ; // function
STR : '"' ('\\"'|.)*? '"' ; // match string literals
ID : LETTER (LETTER|DIGIT)* ; // match identifiers
NUM : '-'? ('.' DIGIT+ | DIGIT+ ('.' DIGIT*)? ) ; // match number
WS : [ \t\r\n]+ -> skip ; // toss out whitespace
fragment
LETTER : [a-zA-Z] ;
fragment
DIGIT : [0-9] ;
Then I ran TestRig after compiling.
The result is as follows.
java -cp .;C:\App\Antlr4\antlr-4.7.1-complete.jar org.antlr.v4.gui.TestRig test.antlr.EtlExpression prog -tree
a < b = c
^Z
(prog (comp (expr a) < (expr b)))
The comp rule allows only one comparison operator, so I expected this input to produce some kind of error, such as "line 1:6 token recognition error at: '='", but the parser just ignores the "= c" part.
What is wrong with the grammar, and how can I get a proper error message?
Thank you in advance.

The parser simply stops when it reaches the = it cannot match, leaving the rest of the input unread. If you force the parser to consume the entire input, an error will be reported on the console (ANTLR's default error listener writes to stderr). You can do that by adding EOF to your prog rule:
prog : comp EOF;

Related

How to correctly parse field access after function call via parser-combinators (makeExprParser) library?

I want to parse expressions like this: a().x. It should look like EAttrRef (EFuncCall (EVarRef "a") []) "x". Unfortunately my expression parser is stopping too soon, it only parses a() and then stops.
1:4:
|
1 | a().x
| ^
unexpected '.'
expecting end of input
Code:
pExpr :: Parser Expr
pExpr = lexeme p & dbg "pExpr" <?> "expression"
where
pTerm = try pVarRef <|> pELit
p = makeExprParser pTerm exprTable
exprTable = [[Postfix opIndexRef], [InfixL opAttrRef], [Postfix opFuncCall]]
opAttrRef :: Parser (Expr -> Expr -> Expr)
opAttrRef = do
symbol "." & dbg "opAttrRef symbol \".\""
return r
where
r x (EVarRef y) = EAttrRef x y
r x y = error [qq|opAttrRef got unexpected right operand $y (left operand was $x)|]
opFuncCall :: Parser (Expr -> Expr)
opFuncCall = do
symbol "("
args <- sepBy pExpr (symbol ",")
symbol ")" & dbg "opFuncCall symbol \")\""
return $ \funcExpr -> EFuncCall funcExpr args
opIndexRef = do
symbol "["
e <- pExpr
symbol "]" & dbg "opIndexRef symbol \"]\""
return $ \obj -> EIndexRef obj e
Debug output:
opAttrRef symbol "."> IN: "().x"
opAttrRef symbol "."> MATCH (EERR): <EMPTY>
opAttrRef symbol "."> ERROR:
opAttrRef symbol "."> offset=1:
opAttrRef symbol "."> unexpected '('
opAttrRef symbol "."> expecting '.'
pExpr> IN: ").x"
pExpr> MATCH (EERR): <EMPTY>
pExpr> ERROR:
pExpr> offset=2:
pExpr> unexpected ").x"
pExpr> expecting "false", "null", "true", '"', '+', '-', '[', digit, identifier, or integer
opFuncCall symbol ")"> IN: ").x"
opFuncCall symbol ")"> MATCH (COK): ')'
opFuncCall symbol ")"> VALUE: ")"
pExpr> IN: "a().x"
pExpr> MATCH (COK): "a()"
pExpr> VALUE: EFuncCall (EVarRef "a") []
It seems to me that makeExprParser is not calling opFuncCall a second time (compared to how the debug output for index access looks), but I have no idea why not.
It parses when I decrease the priority of opAttrRef, but then it produces wrong trees (e.g. the right operand of x.a() would be a(), which is incorrect; it should be a, and then the whole thing should be wrapped in the function call), so I can't use that. (I am quite sure the current priority is correct, since it is based on the reference of that language.)
Your current expression parser looks like the following BNF:
expr = funcOp ;
funcOp = attrOp , { "(" , expr, ")" } ;
attrOp = attrOp , "." , indexOp | indexOp ;
indexOp = term , { "[", expr, "]" } ;
Once it finishes parsing funcCall, it will not go back up in the operator table and parse any attrRef or indexRef.
The problem with decreasing the priority of opAttrRef is that it parses the left- and right-hand sides of the dot separately, when it seems you want the parser to read from left to right and be able to mix any of funcCall, attrRef, and indexRef. So if you want to be able to parse something like a[b](c).d(e)[f], I'd suggest changing opAttrRef from infix to postfix and flattening the operator table into:
exprTable = [[Postfix opIndexRef, Postfix opAttrRef, Postfix opFuncCall]]
At this point, the parser becomes:
expr = term , { indexRef | attrRef | funcCall } ;
If you need to allow several postfix operators in a row, you can rewrite your expression parser like this:
p = liftM2 (foldl (flip ($))) pTerm (many (opAttrRef <|> opIndexRef <|> opFuncCall))
The p parser can be used as a term parser for makeExprParser, if you want to add arithmetic, logic and other common operators.
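To see what the fold in that one-liner does once the parsing is done, here is a minimal sketch that leaves the parser library out entirely. It assumes the Expr constructors named in the question; the Postfix type synonym and the applyPostfixes helper are hypothetical names introduced only for this illustration. Each parsed postfix operator is treated as a function that wraps the expression to its left, and foldl (flip ($)) applies those functions in source order.

```haskell
module Main where

import Data.List (foldl')

-- Expression tree using the constructor names from the question.
data Expr
  = EVarRef String
  | EAttrRef Expr String
  | EFuncCall Expr [Expr]
  | EIndexRef Expr Expr
  deriving (Show, Eq)

-- Hypothetical synonym: a parsed postfix operator wraps the expression
-- that appears to its left.
type Postfix = Expr -> Expr

-- Hypothetical helper: apply the operators left to right, so the first
-- operator in the list ends up innermost in the resulting tree.
applyPostfixes :: Expr -> [Postfix] -> Expr
applyPostfixes = foldl' (flip ($))

main :: IO ()
main = do
  -- Stands in for what a parser would collect from "a().x":
  -- the term a, then a call with no arguments, then the attribute x.
  let ops = [\f -> EFuncCall f [], \o -> EAttrRef o "x"]
  print (applyPostfixes (EVarRef "a") ops)
  -- prints: EAttrRef (EFuncCall (EVarRef "a") []) "x"
```

In the real parser, liftM2 performs the same fold over the result of pTerm and the list returned by many, so the tree shape is the one the question asks for.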

Haskell source generated by happy has error "parse error on input 'data'"

I'm trying out Happy, the parser generator for Haskell. After generating the module happy.hs (Happy reported no problems), I run the command ghc happy.hs and get the error: Line 297: parse error on input 'data'. Does anyone know a solution, or where I can find one?
I also tried loading the module in GHCi rather than compiling it with ghc, but I get the same error.
Code in happy.y (happy source):
-- TODO: add more of my things!!!
{
module Main where
}
%name calc
%tokentype { Token }
%error { parseError }
-- tokens
%token
let { TokenLet }
in { TokenIn }
int { TokenInt $$ }
var { TokenVar $$ }
'=' { TokenEq }
'+' { TokenPlus }
'-' { TokenSub }
'*' { TokenMul }
'/' { TokenDvd }
'(' { TokenOB }
')' { TokenCB }
%%
Exp : let var '=' Exp in Exp { Let $2 $4 $6 }
| Exp1 { Exp1 $1 }
Exp1 : Exp1 '+' Term { Plus $1 $3 }
| Exp1 '-' Term { Minus $1 $3 }
| Term { Term $1 }
Term : Term '*' Factor { Times $1 $3 }
| Term '/' Factor { Div $1 $3 }
| Factor { Factor $1 }
Factor : int { Int $1 }
| var { Var $1 }
| '(' Exp ')' { Brack $2 }
{
parseError :: [Token] -> a
parseError _ = error "Parse error! please try again..."
data Exp = Let String Exp Exp
| Exp1 Exp1
deriving Show
data Exp1 = Plus Exp1 Term
| Minus Exp1 Term
| Term Term
deriving Show
data Term = Times Term Factor
| Div Term Factor
| Factor Factor
deriving Show
data Factor = Int Int
| Var String
| Brack Exp
deriving Show
-- the tokens
data Token = TokenLet
| TokenIn
| TokenInt Int
| TokenVar String
| TokenEq
| TokenPlus
| TokenMinus
| TokenTimes
| TokenDiv
| TokenOB
| TokenCB
deriving Show
-- the lexer
lexer :: String -> [Token]
lexer [] = []
lexer (c:cs)
| isSpace c = lexer cs
| isAlpha c = lexVar (c:cs)
| isDigit c = lexNum (c:cs)
lexer ('=':cs) = TokenEq : lexer cs
lexer ('+':cs) = TokenPlus : lexer cs
lexer ('-':cs) = TokenMinus : lexer cs
lexer ('*':cs) = TokenTimes : lexer cs
lexer ('/':cs) = TokenDiv : lexer cs
lexer ('(':cs) = TokenOB : lexer cs
lexer (')':cs) = TokenCB : lexer cs
lexNum cs = TokenInt (read num) : lexer rest
where (num,rest) = span isDigit cs
lexVar cs =
case span isAlpha cs of
("let",rest) -> TokenLet : lexer rest
("in",rest) -> TokenIn : lexer rest
(var,rest) -> TokenVar var : lexer rest
-- the main function
main = getContents >>= print . calc . lexer
}
The line that causes the error in happy.hs, with the 10 lines on either side of it (lines 287-307, inclusive):
287: calc tks = happyRunIdentity happySomeParser where
288: happySomeParser = happyThen (happyParse action_0 tks) (\x -> case x of
289: {HappyAbsSyn4 z -> happyReturn z; _other -> notHappyAtAll })
290:
291: happySeq = happyDontSeq
292:
293:
294: parseError :: [Token] -> a
295: parseError _ = error "Parse error! please try again..."
296:
297: data Exp = Let String Exp Exp -- <= I get error on this line!
298: | Exp1 Exp1
299: deriving Show
300:
301: data Exp1 = Plus Exp1 Term
302: | Minus Exp1 Term
303: | Term Term
304: deriving Show
305:
306: data Term = Times Term Factor
307: | Div Term Factor
I expect the program to run smoothly without any errors, but it doesn't.
Happy produces its generated code on column 1 of the file. Therefore, to have your code be considered part of the same module-where block as the generated code (which you cannot control), it must also live on column 1 of the file.
Delete four spaces from the beginning of the plain-Haskell lines of code in your happy file and you'll be on your way to the next error. Make sure that your deriving clauses and where clauses are indented one level from their surroundings when you are deleting spaces.
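The layout point can be seen in a small standalone module. This is only a sketch: the Token and Exp types below are cut-down stand-ins, not the ones from the question, and nothing here is generated by Happy. It just shows top-level declarations starting in column 1, with constructor and deriving lines indented beneath the declaration they belong to.

```haskell
module Main where

-- Every top-level declaration starts in column 1, the same column Happy
-- uses for the code it generates, so the layout rule sees one flat list
-- of declarations in the module's where block.
data Token = TokenInt Int | TokenPlus
  deriving Show

parseError :: [Token] -> a
parseError _ = error "Parse error! please try again..."

-- Continuation lines (constructors, deriving) stay indented relative to
-- the declaration they continue.
data Exp
  = Lit Int
  | Add Exp Exp
  deriving Show

main :: IO ()
main = print (Add (Lit 1) (Lit 2))
```

If the data declaration were indented instead, the layout rule would read it as a continuation of the preceding declaration rather than as a new one.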

How to rewrite a grammar to eliminate a shift-reduce conflict (in the Haskell Happy parser)

I'm trying to define a grammar for Java-like methods using the Happy LALR parser generator:
1. MD ::= some_prefix { list(VD) list(S) }
2. VD ::= T I
3. S ::= I = E | I [ E ] = E | etc...
4. T ::= I | byte | int | etc...
5. E ::= INT | E + E | etc...
Here,
MD: Method Declaration
VD: Variable Declaration
S: Statement
T: Type
I: Identifier
E: Expression
All the other tokens are terminals.
Within a method, variable declarations come first, followed by the statements.
As you can see, a VD can start with an I, since a variable can be declared with a class type, and a class type is an identifier (I). An S can also start with an I, because an assignment begins with a variable name, which is also an I.
The problem is that both VD and S can start with an I, so the first production causes a shift/reduce conflict.
Is there a way to rewrite the grammar, or some other parser-generator trick, to solve this problem?
I have specified the associativity and precedence for the operators. I have only mentioned the minimum set of information to explain the problem. Let me know if you need more information.
UPDATE:
Following is the grammar file
{
module Issue where
import Lexer
}
%name parser
%tokentype { Token }
%error { parseError }
%token
--===========================================================================
';' { TokenSemi }
id { TokenId $$ }
'{' { TokenLBrace }
'}' { TokenRBrace }
public { TokenPublickw }
'(' { TokenLParen }
')' { TokenRParen }
int { TokenInt $$ }
plus { TokenPlus }
inttype { TokenIntkw }
'=' { TokenAssign }
--===========================================================================
--===========================================================================
-- Precedence and associativity. Reference:
-- http://introcs.cs.princeton.edu/java/11precedence/
%right '='
%left plus
--===========================================================================
%%
MethodDecl :
public id '(' ')' '{' list(VarDecl) list(Statement) '}'
{ MethodDecl $2 (VarList $6) (BlockStatement $7) }
DataType :
inttype
{ DataType TypeInt }
| id
{ DataType (TypeClass $1) }
VarDecl :
DataType id ';'
{ VarDecl $1 $2 }
Statement :
id '=' Expression ';'
{ Assignment $1 $3 }
Expression :
int
{ IntLiteral $1 }
| Expression plus Expression
{ PlusExp $1 $3 }
--============================================================================
list1(p) :
p
{ [$1] }
| p list1(p)
{ $1 : $2 }
list(p) :
list1(p)
{ $1 }
| -- epsilon
{ [] }
--============================================================================
{
data AST = Goal AST [AST]
| BlockStatement [AST]
| IntLiteral Int
| PlusExp AST AST
| MethodDecl String AST AST
| DataType MJType
| Identifier String
| VarList [AST]
| VarDecl AST String
| Assignment String AST
deriving Show
data MJType = TypeInt
| TypeUnknown
| TypeClass String
deriving (Show,Eq)
parseError :: [Token] -> a
parseError (t:ts) = error ("Parser Error: " ++ (show t))
}
.info file generated by Happy parser with the details of shift-reduce conflicts and states
-----------------------------------------------------------------------------
Info file generated by Happy Version 1.19.4 from issue.y
-----------------------------------------------------------------------------
state 7 contains 1 shift/reduce conflicts.
state 9 contains 1 shift/reduce conflicts.
-----------------------------------------------------------------------------
Grammar
-----------------------------------------------------------------------------
%start_parser -> MethodDecl (0)
MethodDecl -> public id '(' ')' '{' list(VarDecl) list(Statement) '}' (1)
DataType -> inttype (2)
DataType -> id (3)
VarDecl -> DataType id ';' (4)
Statement -> id '=' Expression ';' (5)
Expression -> int (6)
Expression -> Expression plus Expression (7)
list(Statement) -> list1(Statement) (8)
list(Statement) -> (9)
list(VarDecl) -> list1(VarDecl) (10)
list(VarDecl) -> (11)
list1(Statement) -> Statement (12)
list1(Statement) -> Statement list1(Statement) (13)
list1(VarDecl) -> VarDecl (14)
list1(VarDecl) -> VarDecl list1(VarDecl) (15)
-----------------------------------------------------------------------------
Terminals
-----------------------------------------------------------------------------
';' { TokenSemi }
id { TokenId $$ }
'{' { TokenLBrace }
'}' { TokenRBrace }
public { TokenPublickw }
'(' { TokenLParen }
')' { TokenRParen }
int { TokenInt $$ }
plus { TokenPlus }
inttype { TokenIntkw }
'=' { TokenAssign }
-----------------------------------------------------------------------------
Non-terminals
-----------------------------------------------------------------------------
%start_parser rule 0
MethodDecl rule 1
DataType rules 2, 3
VarDecl rule 4
Statement rule 5
Expression rules 6, 7
list(Statement) rules 8, 9
list(VarDecl) rules 10, 11
list1(Statement) rules 12, 13
list1(VarDecl) rules 14, 15
-----------------------------------------------------------------------------
States
-----------------------------------------------------------------------------
State 0
public shift, and enter state 2
MethodDecl goto state 3
State 1
public shift, and enter state 2
State 2
MethodDecl -> public . id '(' ')' '{' list(VarDecl) list(Statement) '}' (rule 1)
id shift, and enter state 4
State 3
%start_parser -> MethodDecl . (rule 0)
%eof accept
State 4
MethodDecl -> public id . '(' ')' '{' list(VarDecl) list(Statement) '}' (rule 1)
'(' shift, and enter state 5
State 5
MethodDecl -> public id '(' . ')' '{' list(VarDecl) list(Statement) '}' (rule 1)
')' shift, and enter state 6
State 6
MethodDecl -> public id '(' ')' . '{' list(VarDecl) list(Statement) '}' (rule 1)
'{' shift, and enter state 7
State 7
MethodDecl -> public id '(' ')' '{' . list(VarDecl) list(Statement) '}' (rule 1)
id shift, and enter state 12
(reduce using rule 11)
'}' reduce using rule 11
inttype shift, and enter state 13
DataType goto state 8
VarDecl goto state 9
list(VarDecl) goto state 10
list1(VarDecl) goto state 11
State 8
VarDecl -> DataType . id ';' (rule 4)
id shift, and enter state 19
State 9
list1(VarDecl) -> VarDecl . (rule 14)
list1(VarDecl) -> VarDecl . list1(VarDecl) (rule 15)
id shift, and enter state 12
(reduce using rule 14)
'}' reduce using rule 14
inttype shift, and enter state 13
DataType goto state 8
VarDecl goto state 9
list1(VarDecl) goto state 18
State 10
MethodDecl -> public id '(' ')' '{' list(VarDecl) . list(Statement) '}' (rule 1)
id shift, and enter state 17
'}' reduce using rule 9
Statement goto state 14
list(Statement)goto state 15
list1(Statement)goto state 16
State 11
list(VarDecl) -> list1(VarDecl) . (rule 10)
id reduce using rule 10
'}' reduce using rule 10
State 12
DataType -> id . (rule 3)
id reduce using rule 3
State 13
DataType -> inttype . (rule 2)
id reduce using rule 2
State 14
list1(Statement) -> Statement . (rule 12)
list1(Statement) -> Statement . list1(Statement) (rule 13)
id shift, and enter state 17
'}' reduce using rule 12
Statement goto state 14
list1(Statement)goto state 23
State 15
MethodDecl -> public id '(' ')' '{' list(VarDecl) list(Statement) . '}' (rule 1)
'}' shift, and enter state 22
State 16
list(Statement) -> list1(Statement) . (rule 8)
'}' reduce using rule 8
State 17
Statement -> id . '=' Expression ';' (rule 5)
'=' shift, and enter state 21
State 18
list1(VarDecl) -> VarDecl list1(VarDecl) . (rule 15)
id reduce using rule 15
'}' reduce using rule 15
State 19
VarDecl -> DataType id . ';' (rule 4)
';' shift, and enter state 20
State 20
VarDecl -> DataType id ';' . (rule 4)
id reduce using rule 4
'}' reduce using rule 4
inttype reduce using rule 4
State 21
Statement -> id '=' . Expression ';' (rule 5)
int shift, and enter state 25
Expression goto state 24
State 22
MethodDecl -> public id '(' ')' '{' list(VarDecl) list(Statement) '}' . (rule 1)
%eof reduce using rule 1
State 23
list1(Statement) -> Statement list1(Statement) . (rule 13)
'}' reduce using rule 13
State 24
Statement -> id '=' Expression . ';' (rule 5)
Expression -> Expression . plus Expression (rule 7)
';' shift, and enter state 26
plus shift, and enter state 27
State 25
Expression -> int . (rule 6)
';' reduce using rule 6
plus reduce using rule 6
State 26
Statement -> id '=' Expression ';' . (rule 5)
id reduce using rule 5
'}' reduce using rule 5
State 27
Expression -> Expression plus . Expression (rule 7)
int shift, and enter state 25
Expression goto state 28
State 28
Expression -> Expression . plus Expression (rule 7)
Expression -> Expression plus Expression . (rule 7)
';' reduce using rule 7
plus reduce using rule 7
-----------------------------------------------------------------------------
Grammar Totals
-----------------------------------------------------------------------------
Number of rules: 16
Number of terminals: 11
Number of non-terminals: 10
Number of states: 29
At a glance, it seems the shift-reduce conflict might come from the expression grammar in rule 5.
Check out section 2.3, Using Precedences, in the Happy manual. You can also use the classic approach:
E => E + T | T
T => T * F | F
F => INT | ( E )
I found the solution. Instead of using 2 different lists
list(VarDecl) list(Statement)
use one list
ordered_lists(VarDecl,Statement)
Following is the definition of the ordered_lists.
--A list of p followed by a list of q, where each list can be an empty list
ordered_lists(p,q) :
ordered_lists1(p,q)
{ $1 }
| -- epsilon
{ ([], []) }
--This list should contain at least one p or q. A list of p followed by a
--list of q, where at most one list can be an empty list
ordered_lists1(p,q) :
p
{ ([$1], []) }
| q
{ ([], [$1]) }
| p ordered_lists1(p,q)
{ ($1:fst($2), snd($2)) }
| q list1(q)
{ ([], $1 : $2) }
Definition of the list1 is available in the question description. Please don't hesitate to post a better answer.

Precedence ambiguity in ANTLR 4

I have a huge ANTLR grammar, and I am facing a problem with a small piece of it. The grammar has two rules, expr and set, defined below:
expr:
id
|(PLUS|MINUS|MULTIPLY|AND|NEGATION)expr
| expr (MULTIPLY |DIVIDE| MODULO)
| expr (PLUS | MINUS) expr
;
set:
EMPTY
| MULTIPLY set
| set PLUS set
| UNION '(' set (COMMA set)* ')'
| INTER '(' set (COMMA set)* ')'
| expr
;
The problem is that a set of the form *s1 + *s2 should be reduced as follows:
set -> set PLUS set
and then each set in RHS should reduce to:
set -> MULTIPLY set
set -> expr
term -> id
But instead they are reducing as:
set -> MULTIPLY set
set -> expr
expr -> expr PLUS expr
As a result, a set of the form *s1 + *s2 is parsed as *(s1 + *s2) instead of (*s1) + (*s2).
One of the alternatives of set reduces it to expr, and there are many other rules in the grammar that reduce to expr in the same way. The problem occurs here because some of the alternatives in set and expr are similar. But because others are different, I cannot merge the two rules.
In set, even though the MULTIPLY set alternative has higher precedence than set PLUS set, the whole set is reduced by the MULTIPLY set alternative.
Is there a way to fix this issue?
EDIT:
Adding a working example :
Grammar:
grammar T;
expr
: ID
| ( PLUS | MINUS | MULTIPLY | AND | NEGATION ) expr
| expr ( MULTIPLY | DIVIDE | MODULO )
| expr ( PLUS | MINUS ) expr
;
set:
EMPTY
| MULTIPLY set
| set PLUS set
| UNION '(' set (COMMA set)* ')'
| INTER '(' set (COMMA set)* ')'
| expr
;
ID : [a-zA-Z] [a-zA-Z0-9]*;
PLUS : '+';
MINUS : '-';
MULTIPLY : '*';
AND : '&&';
NEGATION : '!';
DIVIDE : '/';
MODULO : '%';
COMMA : ',';
EMPTY: '\\empty';
UNION: '\\union';
INTER: '\\inter';
SPACES : [ \t\r\n] -> skip;
Code to execute it:
TLexer lexer = new TLexer(new ANTLRInputStream("*s1 + *s2"));
TParser parser = new TParser(new CommonTokenStream(lexer));
RuleContext tree = parser.set();
tree.inspect(parser);
Output it generated:
   set
  /   \
 *    set
       |
      expr
     / | \
    /  |  \
 expr  +  expr
  |      /  \
  s1    *   expr
             |
             s2
I can't reproduce this.
Given the grammar:
grammar T;
expr
: ID
| ( PLUS | MINUS | MULTIPLY | AND | NEGATION ) expr
| expr ( MULTIPLY | DIVIDE | MODULO )
| expr ( PLUS | MINUS ) expr
;
ID : [a-zA-Z] [a-zA-Z0-9]*;
PLUS : '+';
MINUS : '-';
MULTIPLY : '*';
AND : '&&';
NEGATION : '!';
DIVIDE : '/';
MODULO : '%';
SPACES : [ \t\r\n] -> skip;
your input *s1 + *s2 will be parsed as:
          expr
         / | \
        /  |  \
    expr   +   expr
    /  \       /  \
   *  expr    *  expr
       |          |
       s1         s2
Or, in plain code:
TLexer lexer = new TLexer(new ANTLRInputStream("*s1 + *s2"));
TParser parser = new TParser(new CommonTokenStream(lexer));
System.out.println(parser.expr().toStringTree(parser));
will print:
(expr (expr * (expr s1)) + (expr * (expr s2)))

Is there a way to skip spaces for certain rules but not others?

My grammar contains the following:
assignment
: ID ASSIGN expr
;
expr
: MINUS expr #unaryMinusExpr
| NOT expr #notExpr
| expr MULT expr #multExpr
| expr DIV expr #divExpr
| expr PLUS expr #plusExpr
| expr MINUS expr #minusExpr
| expr LTEQ expr #lteqExpr
| expr GTEQ expr #gteqExpr
| expr LT expr #ltExpr
| expr GT expr #gtExpr
| expr NEQ expr #neqExpr
| expr EQ expr #eqExpr
| expr AND expr #andExpr
| expr OR expr #orExpr
| function #functionExpr
| atom #atomExpr
;
function
: ID OPAR (parameter (',' parameter)*)? CPAR
;
parameter
: STRING #stringParameter
| expr #exprParameter
;
atom
: OPAR expr CPAR #parExpr
| (INT | FLOAT) #numberAtom
| (TRUE | FALSE) #booleanAtom
| ID #idAtom
;
OR : '||';
AND : '&&';
EQ : '==';
NEQ : '!=';
GT : '>';
LT : '<';
GTEQ : '>=';
LTEQ : '<=';
PLUS : '+';
MINUS : '-';
MULT : '*';
DIV : '/';
NOT : '!';
OPAR : '(';
CPAR : ')';
OBRACE : '{';
CBRACE : '}';
ASSIGN : '=';
TRUE : 'true';
FALSE : 'false';
IF : 'if';
ELSE : 'else';
ID
: [a-zA-Z_] [a-zA-Z_0-9]*
;
SPACE
: [ \t\r\n] -> skip
;
The issue is that ID needs to be able to also contain MINUS, PLUS, etc. This means that I won't be able to tell when I have just an ID (this-isandid) or ID (this) MINUS ID (isandid).
What we'd like to do is to not skip the spaces around the operators in the expr, but otherwise skip spaces for all other rules. Is there a way to do that? I.e. we force the user to put spaces around operators when they really mean an expression as opposed to an ID containing e.g. MINUS.
I.e.
a-b is an ID
a - b is a minusExpr
a- b, a -b is an error
Or is there another way to allow e.g. MINUS in an ID and be able to tell the difference between an ID and a minusExpr?
You might be able to avoid conditional consideration of whitespace in the parser rules by modifying the ID rule within the lexer. Allow a - to appear in an ID, provided it is not at the beginning or end of the identifier.
ID
: [a-zA-Z_]
( '-'? [a-zA-Z_0-9]
)*
;
This can be done by using lexer modes. One mode (for example, the default mode) would skip spaces, and another mode would not skip them.

Resources