How can I modify my EBNF to handle cases like `- not 12`, `not + -1` - programming-languages

I created EBNF for expressions below
<expression> ::= <or_operand> [ "or" <or_operand> ]
<or_operand> ::= <and_operand> [ "and" <and_operand> ]
<and_operand> ::= <equality_operand> [ ( "=" | "!=" ) <equality_operand> ]
<equality_operand> ::= <simple_expression> [ <relational_operator> <simple_expression> ]
<relational_op> ::= "<" | ">" | "<=" | ">="
<simple_expression> ::= <term> [ ( "+" | "-" ) <term> ]
<term> ::= <factor> [ ( "*" | "/" ) <factor> ]
<factor> ::= <literal>
| "(" <expression> ")"
| "not" <factor>
| ( "+" | "-" ) <factor>
<literal> ::= <boolean_literal> | <number>
<boolean_literal> ::= "true" | "false"
<number> ::= <digit> [ <digit> ]
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
My problem lies within the factor section
<factor> ::= <literal>
| "(" <expression> ")"
| "not" <factor>
| ( "+" | "-" ) <factor>
You can see that I included three unary operators not, -, and +. They work for specific types like not only applies to boolean values only and +/- applies to numbers only.
I don't know how to handle cases when not is mixed with +/- like not +7, - not true, etc. for example. Is there any way I can modify the grammar so not can never be mixed with +/-?
Will it suffice?
<factor> ::= <literal>
| "(" <expression> ")"
| ( "not" | ( "+" | "-" ) ) <factor>
Or maybe it's parser's job to solve this issue?

This is quite easy to solve. You have two different styles of expressions each with their own syntax, so you just do not mix them, and keep their syntax rules separated.
A boolean expression can only occur in certain places, such as an assignment or some kind of choice statement. A numerical expression can only occur in a comparison or an assignment. This is not something that is handled at the semantic level, and if one looks at the grammar for many languages, this is how it is solved.
So you have for numeric expressions:
<simple_expression> ::= <term> [ ( "+" | "-" ) <term> ]
<term> ::= <factor> [ ( "*" | "/" ) <factor> ]
<factor> ::= <number>
| "(" <expression> ")"
| ( "+" | "-" ) <factor>
<number> ::= <digit> [ <digit> ]
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
This is now self contained. We can now build this into boolean expressions:
<boolean_expression> ::= "not" <boolean_expression>
| <logical_expression>
<logical_expression> ::= <or_operand> [ "or" <or_operand> ]
<or_operand> ::= <and_operand> [ "and" <and_operand> ]
<and_operand> ::= <equality_operand> [ ( "=" | "!=" ) <equality_operand> ]
<equality_operand> ::= <simple_expression> [ <relational_operator> <simple_expression> ]
| <boolean_literal>
<relational_op> ::= "<" | ">" | "<=" | ">="
<boolean_literal> ::= "true" | "false"
Notice that I permitted the equality comparison of boolean literals, however if you did not want to permit this you could change the rules to only permit them for an and operand.
Now we can use these in another rule, such as assignment:
<assignment> ::= <variable> ":=" ( <simple_expression> | <boolean_expression> )
All is done.

Related

int('1.5') VS float('1.5') in python

Why does float('1.5') gives 1.5 as output as expected but int('1.5') gives a value error?
Shouldn't python automatically convert the string into float and then into integer.
Because 1.5 isn't a valid integer literal which is required by the int() function.
From the docs:
If x is not a number or if base is given, then x must be a string,
bytes, or bytearray instance representing an integer literal in radix
base.
Whereas integer literals are defined as follows:
integer ::= decinteger | bininteger | octinteger | hexinteger
decinteger ::= nonzerodigit (["_"] digit)* | "0"+ (["_"] "0")*
bininteger ::= "0" ("b" | "B") (["_"] bindigit)+
octinteger ::= "0" ("o" | "O") (["_"] octdigit)+
hexinteger ::= "0" ("x" | "X") (["_"] hexdigit)+
nonzerodigit ::= "1"..."9"
digit ::= "0"..."9"
bindigit ::= "0" | "1"
octdigit ::= "0"..."7"
hexdigit ::= digit | "a"..."f" | "A"..."F"
Source: https://docs.python.org/3/reference/lexical_analysis.html#integers

How to parse statement in order of desired precedence using antlr?

I have an RSQL grammar defined:
grammar Rsql;
statement
: L_PAREN wrapped=statement R_PAREN
| left=statement op=( AND_OPERATOR | OR_OPERATOR ) right=statement
| node=comparison
;
comparison
: single_comparison
| multi_comparison
| bool_comparison
;
single_comparison
: key=IDENTIFIER op=( EQ | NE | GT | GTE | LT | LTE ) value=single_value
;
multi_comparison
: key=IDENTIFIER op=( IN | NIN ) value=multi_value
;
bool_comparison
: key=IDENTIFIER op=EX value=boolean_value
;
boolean_value
: BOOLEAN
;
single_value
: boolean_value
| ( STRING_LITERAL | IDENTIFIER )
| NUMERIC_LITERAL
;
multi_value
: L_PAREN single_value ( COMMA single_value )* R_PAREN
| single_value
;
TRUE: 'true';
FALSE: 'false';
AND_OPERATOR: ';';
OR_OPERATOR: ',';
L_PAREN: '(';
R_PAREN: ')';
COMMA: ',';
EQ: '==';
NE: '!=';
IN: '=in=';
NIN: '=out=';
GT: '=gt=';
LT: '=lt=';
GTE: '=ge=';
LTE: '=le=';
EX: '=ex=';
IDENTIFIER
: [a-zA-Z_] [a-zA-Z_0-9]*
;
BOOLEAN
: TRUE
| FALSE
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( [-+]? DIGIT+ )?
| '.' DIGIT+ ( [-+]? DIGIT+ )?
;
STRING_LITERAL
: '\'' ( STRING_ESCAPE_SEQ | ~[\\\r\n'] )* '\''
| '"' ( STRING_ESCAPE_SEQ | ~[\\\r\n"] )* '"'
;
STRING_ESCAPE_SEQ
: '\\' .
;
fragment DIGIT : [0-9];
No matter how I attempt to parse this (listener/visitor), the statements with parenthesis always get evaluated in order. It is my understanding that the order in the rule would be the precedence. However, the parse tree for a statement like "name==foo,(name==bar;age=gt=35)" is always
no matter where the parenthesis appear. Please help me discover what I'm missing. Thanks!

How to write context free grammar aspect the priority of operations in antlr4

we knew the priority of logical operation from strong to low:
Not
And
Or
I want to add logical operation to my grammar in way respect the priority of logical operation. ...
My grammar is:
expression : factor ( PLUS factor | MINUS factor )* ;
factor : term ( MULT term | DIV term )* ;
term : NUMBER | ID | PAR_OPEN expression PAR_CLOSE ;
With ANTLR3 and ANTLR 4, you can doe something like this:
expression
: or_expression
;
// lowest precedence
or_expression
: and_expression ( '||' and_expression )*
;
and_expression
: rel_expression ( '&&' rel_expression )*
;
rel_expression
: add_expression ( ( '<' | '<=' | '>' | '>=' ) add_expression )*
;
add_expression
: mult_expression ( ( '+' | '-' ) mult_expression )*
;
mult_expression
: unary_expression ( ( '*' | '/' ) unary_expression )*
;
unary_expression
: '-' atom
| atom
;
// highest precedence
atom
: NUMBER
| ID
| '(' expression ')'
;
And with ANTLR4, you can also write it like this (which is equivalent to the grammar above!):
expression
: '!' expression
| expression ( '*' | '/' ) expression // higher than '+' | '-'
| expression ( '+' | '-' ) expression // higher than '<' | '<=' | '>' | '>='
| expression ( '<' | '<=' | '>' | '>=' ) expression // higher than '&&'
| expression '&&' expression // higher than '||'
| expression '||' expression
| NUMBER
| ID
| '(' expression ')'
;

Xtext : identify when a optional must be call

I am using xtext to define a grammar.
I have a problem with the runtime syntax evaluation.
the rule is signatureDeclaration. Here the full grammar:
// automatically generated by Xtext
grammar org.xtext.alloy.Alloy with org.eclipse.xtext.common.Terminals
import "http://fr.cuauh.als/1.0"
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
//specification ::= [module] open* paragraph*
//ok
Specification returns Specification:
{Specification}
(module=Module)?
(opens+=Library (opens+=Library)*)?
(paragraphs+=Paragraph (paragraphs+=Paragraph)*)?;
//module ::= "module" name [ "[" ["exactly"] name ("," ["exactly"] num)* "]" ]
//module ::= "module" name? [ "[" ["exactly"] name ("," ExactlyNum )* "]" ]
//ok
Module returns Module:
{Module}
'module' (name=IDName)? ('['(exactly?='exactly')? extensionName=[IDref] (nums+=ExactlyNums ( "," nums+=ExactlyNums)*)?']')?
;
IDName returns IDName:
name=ID
;
terminal ID : '^'?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'/')*;
//open ::= ["private"] "open" name [ "[" ref,+ "]" ] [ "as" name ]
//open ::= ["private"] "open" path [ "[" ref,+ "]" ] [ "as" name ]
//ok
Library returns Library:
{Library}
(private?='private')? 'open' path=EString ('['references+=Reference (',' references+=Reference)*']')? ('as' alias=Alias)?
;
//a path
//ok
//terminal PATH returns ecore::EString :
// ('a'..'z'|'A'..'Z'|'_'|'.')+('/'('a'..'z'|'A'..'Z'|'_'|'.')+)*
//;
//paragraph ::= factDecl | assertDecl | funDecl | cmdDecl | enumDecl | sigDecl
//paragraph ::= factDecl | assertDecl | funDecl | predDecl | cmdDecl | enumDecl | sigDecl
//ok
Paragraph returns Paragraph:
FactDeclaration | AssertDeclaration | FunctionDeclaration | PredicatDeclaration | CmdDeclaration | EnumerationDeclaration | SignatureDeclaration;
//cmdDecl ::= [name ":"] ("run"|"check") (name|block) scope
//cmdDecl ::= [name ":"] command (ref|block) scope ["expect (0|1)"]
//ok
CmdDeclaration returns CmdDeclaration:
(name=IDName ':')? operation=cmdOp referenceOrBlock=ReferenceOrBlock (scope=Scope)? (expect?='expect' expectValue=EInt)?;
//ok
ReferenceOrBlock returns ReferenceOrBlock:
BlockExpr | ReferenceName;
//sigDecl ::= sigQual* "sig" name,+ [sigExt] "{" decl,* "}" [block]
//sigDecl ::= ["private"] ["abstract"] [quant] "sig" name [sigExt] "{" relDecl,* "}" [block]
//ok
SignatureDeclaration returns SignatureDeclaration:
{SignatureDeclaration}
(isPrivate?='private')? (isAbstract?='abstract')? (quantifier=SignatureQuantifier)? 'sig' name=IDName (extension=SignatureExtension)? '{'
(relations+=RelationDeclaration ( ',' =>relations+=RelationDeclaration)* )?
'}'
(block=Block)?;
//ok
SignatureExtension returns SignatureExtension:
SignatureinInheritance | SignatureInclusion;
TypeScopeTarget returns TypeScopeTarget:
Int0 | Seq | ReferenceName;
//name ::= ("this" | ID) ["/" ID]*
//ok do not need to be part of the concrete syntax
//Name returns Name:
// thisOrId=ThisOrID ('/'ids+=ID0( "/" ids+=ID0)*)?;
//ok do not need to be part of the concrete syntax
//ThisOrID returns ThisOrID:
// ID0 | This;
EBoolean returns ecore::EBoolean:
'true' | 'false';
//["exactly"] num
//ok
ExactlyNums returns ExactlyNums:
exactly?='exactly' num=Number;
//ok do not need to be part of the concrete syntax
// IDName | IDref;
IDref returns IDref:
namedElement=[IDName]
;
This returns This:
{This}
'this'
;
EString returns ecore::EString:
STRING | ID;
//ok
EInt returns ecore::EInt:
'-'? INT;
Alias returns Alias:
{Alias}
name=IDName;
//factDecl ::= "fact" [name] block
//ok
FactDeclaration returns FactDeclaration:
'fact' (name=IDName)? block=Block;
//assertDecl ::= "assert" [name] block
//ok
AssertDeclaration returns AssertDeclaration:
'assert' (name=IDName)? block=Block ;
//funDecl ::= ["private"] "fun" [ref "."] name "(" decl,* ")" ":" expr block
//funDecl ::= ["private"] "fun" [ref "."] name "[" decl,* "]" ":" expr block
//funDecl ::= ["private"] "fun" [ref "."] name ":" expr block
//
//funDecl ::= ["private"] "fun" [ref "."] name ["(" paramDecl,* ")" | "[" paramDecl,* "]" ] ":" expr block
//ok
FunctionDeclaration returns FunctionDeclaration:
(private?='private')? 'fun' (reference=[Reference] ".")? name=IDName
('(' (parameters+=ParameterDeclaration ( "," parameters+=ParameterDeclaration)*)? ')'|
'[' (parameters+=ParameterDeclaration ( "," parameters+=ParameterDeclaration)*)? ']')?
':' ^returns=Expression
block=Block;
//funDecl ::= ["private"] "pred" [ref "."] name "(" decl,* ")" block
//funDecl ::= ["private"] "pred" [ref "."] name "[" decl,* "]" block
//funDecl ::= ["private"] "pred" [ref "."] name block
//
//predDecl ::= ["private"] "pred" [ref "."] name ["(" paramDecl,* ")" | "[" paramDecl,* "]" ] ":" block
//ok
PredicatDeclaration returns PredicatDeclaration:
(private?='private')? 'pred' (reference=[Reference|EString] ".")? name=IDName
('(' (parameters+=ParameterDeclaration ( "," parameters+=ParameterDeclaration)*)? ')'|
'[' (parameters+=ParameterDeclaration ( "," parameters+=ParameterDeclaration)*)? ']')?
block=Block;
//enumDecl ::= "enum" name "{" name ("," name)* "}"
//enumDecl ::= "enum" name "{" enumEl ("," enumEl)* "}"
//ok
EnumerationDeclaration returns EnumerationDeclaration:
'enum' name=IDName '{' enumeration+=EnumerationElement ( "," enumeration+=EnumerationElement)* '}';
//ok
EnumerationElement returns EnumerationElement:
{EnumerationElement}
name=IDName;
//"lone" | "one" | "some"
//ok
enum SignatureQuantifier returns SignatureQuantifier:
lone = 'lone' | one = 'one' | some = 'some';
//decl ::= ["private"] ["disj"] name,+ ":" ["disj"] expr
//decl ::= ["private"] ["disj"] name,+ ":" ["disj"] expr
//ok
RelationDeclaration returns RelationDeclaration:
(isPrivate?='private')? (varsAreDisjoint?='disj')? names+=VarDecl (',' names+=VarDecl)* ':' (expressionIsDisjoint?='disj')? expression=Expression;
//sigExt ::= "extends" ref
//ok
SignatureinInheritance returns SignatureinInheritance:
'extends' extends=Reference;
//sigExt ::= "in" ref ["+" ref]*
//ok
SignatureInclusion returns SignatureInclusion:
'in' includes+=Reference ( "+" includes+=Reference)* ;
//ok
VarDecl returns VarDecl:
{VarDecl}
name=IDName;
//decl ::= ["private"] ["disj"] name,+ ":" ["disj"] expr
//ok
ParameterDeclaration returns ParameterDeclaration:
(isPrivate?='private')? (varsAreDisjoint?='disj')? names+=VarDecl ( "," names+=VarDecl)* ':' (expressionIsDisjoint?='disj')? expression=Expression;
//("run"|"check")
//ok
enum cmdOp returns cmdOp:
run = 'run' | check = 'check';
//expr ::=
//1) "let" letDecl,+ blockOrBar
//2) | quant decl,+ blockOrBar
//3) | unOp expr
//4) | expr binOp expr
//5) | expr arrowOp expr
//6) | expr ["!"|"not"] compareOp expr
//7) | expr ("=>"|"implies") expr "else" expr
//8) | expr "[" expr,* "]"
//9) | number
//10) | "-" number
//11) | "none"
//12) | "iden"
//13) | "univ"
//14) | "Int"
//15) | "seq/Int"
//16) | "(" expr ")"
//17) | ["#"] Name
//18) | block
//19) | "{" decl,+ blockOrBar "}"
// expr ::= leftPart [rightPart]
Expression returns Expression:
lhs=NonLeftRecursiveExpression (=>parts=NaryPart)?;
//4) | expr binOp expr
//5) | expr arrowOp expr
//6) | expr ["!"|"not"] compareOp expr
//7) | expr ("=>"|"implies") expr "else" expr
//8) | expr "[" expr,* "]"
//ok
NaryPart returns NaryPart:
BinaryOrElsePart | CallPart;
//4) | expr binOp expr
//5) | expr arrowOp expr
//6) | expr ["!"|"not"] compareOp expr
//7) | expr ("=>"|"implies") expr "else" expr
//
//7) | expr ("=>"|"implies") expr "else" expr
//4)5)6) | expr binaryOperator expr*
//ok
BinaryOrElsePart returns BinaryOrElsePart:
=>('=>'|'implies') rhs=Expression (=>'else' else=Expression)? |
operator=BinaryOperator rhs=Expression ;
//8) | expr "[" expr,* "]"
//it is just the right part
//ok
CallPart returns CallPart:
{CallPart}
'['(params+=Expression ( "," params+=Expression)*)?']';
//1) "let" letDecl,+ blockOrBar
//2) | quant decl,+ blockOrBar
//19) | "{" decl,+ blockOrBar "}"
//18) | block
// | terminalExpression
NonLeftRecursiveExpression returns NonLeftRecursiveExpression:
LetExpression | QuantifiedExpression | CurlyBracketsExpression | BlockExpr | TerminalExpression;
//1) "let" letDecl,+ blockOrBar
//ok
LetExpression returns LetExpression:
'let' letDeclarations+=LetDeclaration ( "," letDeclarations+=LetDeclaration)* blockOrBar=BlockOrBar;
//2) | quant decl,+ blockOrBar
//ok
QuantifiedExpression returns QuantifiedExpression:
quantifier=QuantifiedExpressionQuantifier varDeclaration+=VarDeclaration ( "," varDeclaration+=VarDeclaration)* blockOrBar=BlockOrBar;
//
//binOp ::= "||" | "or" | "&&" | "and" | "&" | "<=>" | "iff" | "=>" | "implies" | "+" | "-" | "++" | "<:" | ":>" | "." | "<<" | ">>" | ">>>"
//compareOp ::= "=" | "in" | "<" | ">" | "=<" | ">="
//arrowOp ::= ["some"|"one"|"lone"|"set"] "->" ["some"|"one"|"lone"|"set"]
//ok
BinaryOperator returns BinaryOperator:
RelationalOperator | CompareOperator | ArrowOperator;
//ok
RelationalOperator returns RelationalOperator:
operator=RelationalOp;
//binOp ::= "||" | "or" | "&&" | "and" | "&" | "<=>" | "iff" | "=>" | "implies" | "+" | "-" | "++" | "<:" | ":>" | "." | "<<" | ">>" | ">>>"
//ok
enum RelationalOp returns RelationalOp:
or = '||' | and = '&&' | union = '+' | intersection = '&' | difference = '-' | equivalence = '<=>' | override = '++'
| domain = '<:' | range = ':>' | join = '.' ; // | lshift = '<<' | rshift = '>>' | rrshift = '>>>';
//["!"|"not"] compareOp
//ok
CompareOperator returns CompareOperator:
(negated?='!' | negated?='not')? operator=CompareOp;
//compareOp ::= "=" | "in" | "<" | ">" | "=<" | ">="
enum CompareOp returns CompareOp:
equal = '=' | inclusion = 'in' | lesser = '<' | greater = '>' | lesserOrEq = '<=' | greaterOrEq = '>=';
//arrowOp ::= ["some"|"one"|"lone"|"set"] "->" ["some"|"one"|"lone"|"set"]
//ok
ArrowOperator returns ArrowOperator:
{ArrowOperator}
(leftQuantifier=ArrowQuantifier)? '->' (=>rightQuantifier=ArrowQuantifier)?;
//"some"|"one"|"lone"|"set"
//ok
enum ArrowQuantifier returns ArrowQuantifier:
lone = 'lone' | one = 'one' | some = 'some' | set = 'set' ;
//19) | "{" decl,+ blockOrBar "}"
//ok
CurlyBracketsExpression returns CurlyBracketsExpression:
'{' varDeclarations+=VarDeclaration ( "," varDeclarations+=VarDeclaration)* blockOrBar=BlockOrBar '}';
//blockOrBar ::= block
//blockOrBar ::= "|" expr
//ok
BlockOrBar returns BlockOrBar:
BlockExpr | Bar;
//blockOrBar ::= "|" expr
//ok
Bar returns Bar:
'|' expression=Expression;
//block ::= "{" expr* "}"
//ok
BlockExpr returns BlockExpr:
{BlockExpr}
'{' (expressions+=Expression ( "," expressions+=Expression)*)?'}';
//3) unOp expr
// | finalExpression
TerminalExpression returns TerminalExpression:
UnaryExpr | finalExpression ;
//3) unOp expr
//ok
UnaryExpr returns UnaryExpr:
unOp=UnaryOperator expression=TerminalExpression;
//unOp ::= "!" | "not" | "no" | "some" | "lone" | "one" | "set" | "seq" | "#" | "~" | "*" | "^"
//unOp ::= "!" | "not" |"#" | "~" | "*" | "^"
//ok
enum UnaryOperator returns UnaryOperator:
not = 'not' | card = '#' | transpose = '~' | reflexiveClosure = '*' | closure = '^' | not2 = '!';
//16) | "(" expr ")"
//9) | number
//10) | "-" number
//17) | ["#"] Name
//11) | "none"
//12) | "iden"
//13) | "univ"
//14) | "Int"
//15) | "seq/Int"
//16) | "(" expr ")"
//10) | ["-"] number
//17) | "#" Name
//17) | reference
//12)13) | constante
finalExpression returns TerminalExpression:
BracketExpression | NumberExpression | NotExpandedExpression | ReferenceExpression | ConstantExpression;
//16) | "(" expr ")"
//ok
BracketExpression returns BracketExpression:
'('expression=Expression')';
//9) | number
//10) | "-" number
//ok
Number returns Number:
NumberExpression;
//ok
NumberExpression returns NumberExpression:
value=EInt;
//17) | ["#"] Name
//17) | "#" Name
//ok
NotExpandedExpression returns NotExpandedExpression:
'#' name=[IDref];
//ok
ReferenceExpression returns ReferenceExpression:
reference=Reference;
//ref ::= name | "univ" | "Int" | "seq/Int"
//ok
Reference returns Reference:
ReferenceName | ConstanteReference;
//ok
ConstanteReference returns ConstanteReference:
cst=constanteRef;
//13) | "univ"
//14) | "Int"
//15) | "seq/Int"
//ok
enum constanteRef returns constanteRef:
int = 'Int' | seqint = 'seq/Int' | univ = 'univ';
//ok
ReferenceName returns ReferenceName:
name=IDref;
//ok
ConstantExpression returns ConstantExpression:
constante=Constant;
//11) | "none"
//12) | "iden"
//ok
enum Constant returns Constant:
none = 'none' | iden = 'iden';
//ok
Block returns Block:
BlockExpr;
//letDecl ::= name "=" expr
//ok
LetDeclaration returns LetDeclaration:
varName=VarDecl '=' expression=Expression;
//quant ::= "all" | "no" | "some" | "lone" | "one" | "sum"
//quant ::= "all" | "no" | "some" | "lone" | "one" | "null"
//ok
enum QuantifiedExpressionQuantifier returns QuantifiedExpressionQuantifier:
no = 'no' | one = 'one' | lone = 'lone' | some = 'some' | all = 'all' | null = 'null';
//decl ::= ["private"] ["disj"] name,+ ":" ["disj"] expr
//ok
VarDeclaration returns VarDeclaration:
(isPrivate?='private')? (varsAreDisjoint?='disj')? names+=VarDecl ( "," names+=VarDecl)* ':' (expressionIsDisjoint?='disj')? expression=Expression;
//scope ::= "for" number ["expect" (0|1)]
//scope ::= "for" number "but" typescope,+ ["expect" (0|1)]
//scope ::= "for" typescope,+ ["expect" (0|1)]
//scope ::= ["expect" (0|1)]
//
//scope ::= "for" [number] ["but"] typescope,*
//ok
Scope returns Scope:
{Scope}
'for' (number=Number)? (but?='but')? (typeScope+=TypeScope ( "," typeScope+=TypeScope)*)?;
//typescope ::= ["exactly"] number [name|"int"|"seq"]
//typescope ::= ExactlyNumber target
TypeScope returns TypeScope:
num=ExactlyNums target=[TypeScopeTarget];
//[name|"int"|"seq"]
//[Validname|"int"|"seq"]
//ok
TypeScopeTarget_Impl returns TypeScopeTarget:
{TypeScopeTarget}
;
Int0 returns Int:
{Int}
'Int'
;
Seq returns Seq:
{Seq}
'Seq'
;
But at runtime of the editor, I have the following error on the little exemple :
sig A {}
sig B {
a : A
}
Multiple markers at this line (line a : A)
- no viable alternative at input 'A'
- missing EOF at '->'
- missing '}' at 'a'
the rule works for the first one, but not the second one. It is at is expect there is no relationDeclartion between the brackets.
I think it is due to the declaration of the rule relationDeclaration call with the form :
(relations+=RelationDeclaration ( ',' relations+=RelationDeclaration)*)?
I cannot get what is wrong.
What did I miss?
What can I do to make it work?
Thanks in advance.
in your grammar for RelationDeclaration it seems you miss some optional marking questionmarks (isPrivate?='private')? (varsAreDisjoint?='disj')? (the same problem is all over the grammar)
?= declares the attribute as optional, but only a ? around the rule call makes it actually optional

ANTLR 4.1 Variable ANTLR 4 token multiplicity yields error: "closure with at least one alternative that can match empty string"

Basically what I'm trying to do is create a grammar for Internationalized Resource Identifiers in ANTLR 4.1. The hardest time I've had thus far is trying to get the production rule for ipv6address working correctly. The way ipv6address is defined in RFC 3987 is that there are basically 9 different alternatives in ABNF format for that production rule alone:
IPv6address = 6( h16 ":" ) ls32
/ "::" 5( h16 ":" ) ls32
/ [ h16 ] "::" 4( h16 ":" ) ls32
/ [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
/ [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
/ [ *3( h16 ":" ) h16 ] "::" h16 ":" ls32
/ [ *4( h16 ":" ) h16 ] "::" ls32
/ [ *5( h16 ":" ) h16 ] "::" h16
/ [ *6( h16 ":" ) h16 ] "::"
Here, ls32 and h16 are both subrules defined as:
ls32 = ( h16 ":" h16 ) / IPv4address
And as such for h16:
h16 = 1*4HEXDIG
Where HEXDIG is a lexer rule for valid hexadecimal digits. I've tried to write this ABNF grammar with ANTLR syntax like such:
grammar IRI;
iri : scheme ':' ihier_part ('?' iquery)? ('#' ifragment)? ;
ihier_part : ('//' iauthority ipath_abempty
| ipath_absolute
| ipath_rootless)?
;
iri_reference : iri
| irelative_ref
;
absolute_IRI : scheme ':' ihier_part ('?' iquery)? ;
irelative_ref : irelative_part ('?' iquery)? ('#' ifragment)? ;
irelative_part : ('//' iauthority ipath_abempty
| ipath_absolute
| ipath_noscheme)?
;
iauthority : (iuserinfo '#')? ihost (':' port)? ;
iuserinfo : (iunreserved | pct_encoded | sub_delims | ':')* ;
ihost : ip_literal
| ipv4address
| ireg_name
;
ireg_name : (iunreserved | pct_encoded | sub_delims)* ;
ipath : (ipath_abempty
| ipath_absolute
| ipath_noscheme
| ipath_rootless)?
;
ipath_abempty : ('/' isegment)* ;
ipath_absolute : '/' (isegment_nz ('/' isegment)*)? ;
ipath_noscheme : isegment_nz_nc ('/' isegment)* ;
ipath_rootless : isegment_nz ('/' isegment)* ;
isegment : (ipchar)* ;
isegment_nz : (ipchar)+ ;
isegment_nz_nc : (iunreserved | pct_encoded | sub_delims | '#')+ ;
ipchar : iunreserved
| pct_encoded
| sub_delims
| ':'
| '#'
;
iquery : (ipchar | IPRIVATE | '/' | '?')* ;
ifragment : (ipchar | '/' | '?')* ;
iunreserved : ALPHA
| DIGIT
| '-'
| '.'
| '_'
| '~'
| UCSCHAR
;
fragment
UCSCHAR : '\u00A0'..'\uD7FF' | '\uF900'..'\uFDCF' | '\uFDF0'..'\uFFEF'
| '\u40000'..'\u4FFFD' | '\u50000'..'\u5FFFD' | '\u60000'..'\u6FFFD'
| '\u70000'..'\u7FFFD' | '\u80000'..'\u8FFFD' | '\u90000'..'\u9FFFD'
| '\uA0000'..'\uAFFFD' | '\uB0000'..'\uBFFFD' | '\uC0000'..'\uCFFFD'
| '\uD0000'..'\uDFFFD' | '\uE1000'..'\uEFFFD'
;
fragment
IPRIVATE : '\uE000'..'\uF8FF' | '\uF0000'..'\uFFFFD' | '\u100000'..'\u10FFFD' ;
scheme : ALPHA (ALPHA | DIGIT | '+' | '-' | '.')* ;
port : (DIGIT)* ;
ip_literal : '[' (ipv6address | ipvFuture) ']' ;
ipvFuture : 'v' (HEXDIG)+ '.' (unreserved | sub_delims | ':')+ ;
ipv6address
locals [int i1, i2, i3, i4, i5, i6, i7, i8, i9, i10 = 0;]
: ( {$i1<=6}? h16 ':' {$i1++;} )* ls32
| '::' ( {$i2<=5}? h16 ':' {$i2++;} )* ls32
| (h16)? '::' ( {$i3<=4}? h16 ':' {$i3++;} )* ls32
| ((h16 ':')? h16)? '::' ( {$i4<=3}? h16 ':'{$i4++;} )* ls32
| (( {$i5>=0 && $i5<=2}? h16 ':' {$i5++;} )* h16)? '::' ( {$i6<=2}? h16 ':' {$i6++;} )* ls32
| (( {$i7>=0 && $i7<=3}? h16 ':' {$i7++;} )* h16)? '::' h16 ':' ls32
| (( {$i8>=0 && $i8<=4}? h16 ':' {$i8++;} )* h16)? '::' ls32
| (( {$i9>=0 && $i9<=5}? h16 ':' {$i9++;} )* h16)? '::' h16
| (( {$i10>=0 && $i10<=6}? h16 ':' {$i10++;} )* h16)* '::'
;
h16
locals [int i = 1;]
: ( {$i>=1 && $i<=4}? HEXDIG {$i++;} )* ;
ls32 : h16 ':' h16 ;
ipv4address : DEC_OCTET '.' DEC_OCTET '.' DEC_OCTET '.' DEC_OCTET ;
DEC_OCTET : '0'..'9'
| '10'..'99'
| '100'..'199'
| '200'..'249'
| '250'..'255'
;
pct_encoded : '%' HEXDIG HEXDIG ;
unreserved : ALPHA | DIGIT | '-' | '.' | '_' | '~' ;
reserved : gen_delims
| sub_delims
;
gen_delims : ':' | '/' | '?' | '#' | '[' | ']' | '#' ;
sub_delims : '!' | '$' | '&' | '\'' | '(' | ')' ;
DIGIT : [0-9] ;
HEXDIG : [0-9A-F] ;
ALPHA : [a-zA-Z] ;
WS : [' ' | '\t' | '\r' | '\n']+ -> skip ;
In my ANTLR grammar, I'm trying to use semantic predicates in order to specify the multiplicity rules defined in the ABNF grammer, both for ipv6address and h16. When I execute the org.antlr.v4.Tool class, I get the following output:
warning(125): IRI.g4:68:20: implicit definition of token 'IPRIVATE' in parser
warning(125): IRI.g4:78:4: implicit definition of token 'UCSCHAR' in parser
error(153): IRI.g4:100:0: rule 'ipv6address' contains a closure with at least one alternative that can match an empty string
warning(154): IRI.g4:40:0: rule 'ipath' contains an optional block with at least one alternative that can match an empty string
warning(154): IRI.g4:100:0: rule 'ipv6address' contains an optional block with at least one alternative that can match an empty string
warning(154): IRI.g4:100:0: rule 'ipv6address' contains an optional block with at least one alternative that can match an empty string
warning(154): IRI.g4:100:0: rule 'ipv6address' contains an optional block with at least one alternative that can match an empty string
warning(154): IRI.g4:100:0: rule 'ipv6address' contains an optional block with at least one alternative that can match an empty string
warning(154): IRI.g4:100:0: rule 'ipv6address' contains an optional block with at least one alternative that can match an empty string
warning(154): IRI.g4:100:0: rule 'ipv6address' contains an optional block with at least one alternative that can match an empty string
Obviously I'd like to get rid of the warnings as well, but I need to get rid of the error stating 'ipv6address' contains a closure with at least one alternative that can match an empty string. I've seen similar posts on StackOverflow about multiple alternatives errors. However, none of them dealt with closures that could match the empty string. I also am pretty sure I'm going to have to define the Unicode characters in UCSCHAR past \uFFFF as surrogate pairs, but that I'll take care of later. Just need to know how to get rid of the closure problem for now.
There are quite some things going wrong:
0
What 280Z28 said.
1
'250'..'255' does not match the strings "250" ... "255": you need to match the numeric ranges as described in the original ABNF specs:
ABNF
dec-octet = DIGIT ; 0-9
/ %x31-39 DIGIT ; 10-99
/ "1" 2DIGIT ; 100-199
/ "2" %x30-34 DIGIT ; 200-249
/ "25" %x30-35 ; 250-255
ANTLR
dec_octet
: digit
| non_zero_digit digit
| D1 digit digit
| ...
;
2
You have a lot of conflicting lexer rules. Take these for example:
HEXDIG : [0-9A-F] ;
ALPHA : [a-zA-Z] ;
because HEXDIG is defined before ALPHA, the lexer will always create a HEXDIG when it sees 'A', for example. You must realize that the lexer does not produce tokens based on what the parser would like to receive. The lexer will go its own way and will never produce an ALPHA for the uppercase letters A-F.
3
fragment rules can only be used inside other lexer rules (or other fragment rules). You cannot use them inside parser rules.
4
Not really an issue, but the predicates make your grammar hard to read: if possible try to minimize predicates is my rule of thumb.
Your rule:
h16
locals [int i = 1;]
: ( {$i>=1 && $i<=4}? HEXDIG {$i++;} )* ;
could be written as:
h16
: HEXDIG HEXDIG HEXDIG HEXDIG
| HEXDIG HEXDIG HEXDIG
| HEXDIG HEXDIG
| HEXDIG
;
or even:
h16
: HEXDIG (HEXDIG (HEXDIG HEXDIG?)?)?
;
Most of these issues are easily fixed, but #2 is a more tricky one. What you could (should?) do is let the lexer create single-char tokens and let the parser match these single-char tokens into a whole. An example how you could let the parser match the dec-octet production from the official ABNF:
dec_octet
: digit // 0-9
| non_zero_digit digit // 10-99
| D1 digit digit // 100-199
| D2 (D0 | D1 | D2 | D3 | D4) digit // 200-249
| D2 D5 (D0 | D1 | D2 | D3 | D4 | D5) // 250-255
;
digit
: D0
| non_zero_digit
;
non_zero_digit
: D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9
;
// lexer rules
D0 : '0';
D1 : '1';
D2 : '2';
D3 : '3';
D4 : '4';
D5 : '5';
D6 : '6';
D7 : '7';
D8 : '8';
D9 : '9';
I've once written an IRI grammar for ANTLR 3. If you want, I could put it in Github somewhere.
Your h16 rule uses (...)* instead of (...)+, which allows it to match 0 digits. When you place h16* in your grammar, it means you allow any number of nothings in your parse tree, which would always result in an infinite loop running your system out of memory (creating parse tree nodes with no tokens).

Resources