Antlr4 Grammar/Rules - issue with solving BASIC print variable - antlr4

The scenario is that I want to create a BASIC (high level) language using ANTRL4.
The test input below is the creation of a variable called C$ and assigning an integer value. The value assignment works. The print statement works except where concatenating the variable to it:-
************ TEST CASE ****************
$C=15;
print "dangerdanger!"; # print works
print "Number of GB left=" + $C;
Using a Parse Tree Inspector I can see assignments are working fine but when it gets to the identification of the variable in the string it seems there is a mismatched input '+' expecting STMTEND.
I wondered if anyone could help me out here and see what adjustment I need to make to my rules and grammar to solve this issue.
Many thanks in advance.
Kevin
PS. As a side issue I would rather have C$ than $C but early days...
********RULES************
VARNAME : '$'('A'..'Z')*
;
CONCAT : '+'
;
STMTEND : SEMICOLON NEWLINE* | NEWLINE+
;
STRING : SQUOTED_STRING (CONCAT SQUOTED_STRING | CONCAT VARNAME)*
| DQUOTED_STRING (CONCAT DQUOTED_STRING | CONCAT VARNAME)*
;
fragment SQUOTED_STRING : '\'' (~['])* '\''
;
fragment DQUOTED_STRING
: '"' ( ESC_SEQ| ~('\\'|'"') )* '"'
;
fragment ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment HEX_DIGIT : '0x' ('0'..'9' | 'a'..'f' | 'A'..'F')+
;
fragment UNICODE_ESC : '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;
SEMICOLON : ';'
;
NEWLINE : '\r'?'\n'
************GRAMMAR************
print_command
: PRINT STRING STMTEND #printCommandLabel
;
assignment
: VARNAME EQUALS INTEGER STMTEND #assignInteger
| VARNAME EQUALS STRING STMTEND #assignString
;

You shouldn't try to create concat-expressions inside your lexer: that is the responsibility of the parser. Something like this should do it:
print_command
: PRINT STRING STMTEND #printCommandLabel
;
assignment
: VARNAME EQUALS expression STMTEND
;
expression
: expression CONCAT expression
| INTEGER
| STRING
| VARNAME
;
CONCAT
: '+'
;
VARNAME
: '$'('A'..'Z')*
;
STMTEND
: SEMICOLON NEWLINE*
| NEWLINE+
;
STRING
: SQUOTED_STRING
| DQUOTED_STRING
;
fragment SQUOTED_STRING
: '\'' (~['])* '\''
;
fragment DQUOTED_STRING
: '"' ( ESC_SEQ| ~('\\'|'"') )* '"'
;
fragment ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment HEX_DIGIT : '0x' ('0'..'9' | 'a'..'f' | 'A'..'F')+;
fragment UNICODE_ESC : '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT;
fragment SEMICOLON : ';';
fragment NEWLINE : '\r'?'\n';

Related

Why is ending parenthesis not recognizes as valid in my antlr4 grammar?

I am writing a DSL with ANTLR4, now I have a problem with ending right parenthesis. Why is this command not valid
This is the command:
set(buffer,variableX|"foo");
the parse tree with the error
Here is my grammar
grammar Expr;
prog: expr+ EOF;
expr:
statement #StatementExpr
|NOT expr #NotExpr
| expr AND expr #AndExpr
| expr (OR | XOR) expr #OrExpr
| function #FunctionExpr
| LPAREN expr RPAREN #ParenExpr
| writeCommand #WriteExpr
;
writeCommand: setCommand | setIfCommand;
statement: ID '=' getCommand NEWLINE #Assign;
setCommand: 'set' LPAREN variableType '|' parameter RPAREN SEMI;
setIfCommand: 'setIf' LPAREN variableType '|' expr '?' parameter ':' parameter RPAREN SEMI;
getCommand: getFieldValue | getInstanceAttribValue|getInstanceAttribValue|getFormAttribValue|getMandatorAttribValue;
getFieldValue: 'getFieldValue' LPAREN instanceID=ID COMMA fieldname=ID RPAREN;
getInstanceAttribValue: 'getInstanceAttrib' LPAREN instanceId=ID COMMA moduleId=ID COMMA attribname=ID RPAREN;
getFormAttribValue: 'getFormAttrib' LPAREN formId=ID COMMA moduleId=ID COMMA attribname=ID RPAREN;
getMandatorAttribValue: 'getMandatorAttrib' LPAREN mandator=ID COMMA moduleId=ID COMMA attribname=ID RPAREN;
twoParameterList: parameter '|' parameter;
parameter:variableType | constType;
pdixFuncton:ID;
constType:
ID
| '"' ID '"';
variableType:
valueType
|instanceType
|formType
|bufferType
|instanceAttribType
|formAttribType
|mandatorAttribType
;
valueType:'value' COMMA parameter (COMMA functionParameter)?;
instanceType: 'instance' COMMA instanceParameter;
formType: 'form' COMMA formParameter;
bufferType: 'buffer' COMMA ID '|' parameter;
instanceParameter: 'instanceId'
| 'instanceKey'
| 'firstpenId'
| 'lastpenId'
| 'lastUpdate'
| 'started'
;
formParameter: 'formId'
|'formKey'
|'lastUpdate'
;
functionParameter: 'lastPen'
| 'fieldGroup'
| ' fieldType'
| 'fieldSource'
| 'updateId'
| 'sessionId'
| 'icrConfidence'
| 'icrRecognition'
| 'lastUpdate';
instanceAttribType:('instattrib' | 'instanceattrib') COMMA attributeType;
formAttribType:'formattrib' COMMA attributeType;
mandatorAttribType: 'mandatorattrib' COMMA attributeType;
attributeType:ID '#' ID;
function:
commandIsSet #IsSet
| commandIsEmpty #IsEmpty
| commandIsEqual #IsEqual
;
commandIsSet: IS_SET LPAREN parameter RPAREN;
commandIsEmpty: IS_EMPTY LPAREN parameter RPAREN;
commandIsEqual: IS_EQUAL LPAREN twoParameterList RPAREN;
LPAREN : '(';
RPAREN : ')';
LBRACE : '{';
RBRACE : '}';
LBRACK : '[';
RBRACK : ']';
SEMI : ';';
COMMA : ',';
DOT : '.';
ASSIGN : '=';
GT : '>';
LT : '<';
BANG : '!';
TILDE : '~';
QUESTION : '?';
COLON : ':';
EQUAL : '==';
LE : '<=';
GE : '>=';
NOTEQUAL : '!=';
AND : 'and';
OR : 'or';
XOR :'xor';
NOT :'not' ;
INC : '++';
DEC : '--';
ADD : '+';
SUB : '-';
MUL : '*';
DIV : '/';
INT: [0-9]+;
NEWLINE: '\r'? '\n';
IS_SET:'isSet';
IS_EMPTY:'isEmpty';
IS_EQUAL:'isEqual';
WS: (' '|'\t' | '\n' | '\r' )+ -> skip;
ID
: JavaLetter JavaLetterOrDigit*
;
fragment
JavaLetter
: [a-zA-Z$_] // these are the "java letters" below 0xFF
| // covers all characters above 0xFF which are not a surrogate
~[\u0000-\u00FF\uD800-\uDBFF]
{Character.isJavaIdentifierStart(_input.LA(-1))}?
| // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
[\uD800-\uDBFF] [\uDC00-\uDFFF]
{Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
;
fragment
JavaLetterOrDigit
: [a-zA-Z0-9$_] // these are the "java letters or digits" below 0xFF
| // covers all characters above 0xFF which are not a surrogate
~[\u0000-\u00FF\uD800-\uDBFF]
{Character.isJavaIdentifierPart(_input.LA(-1))}?
| // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
[\uD800-\uDBFF] [\uDC00-\uDFFF]
{Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
;
fragment DoubleQuote: '"' ; // Hard to read otherwise.
The input set(buffer,variableX|"foo"); cannot be parsed by:
setCommand: 'set' LPAREN variableType '|' parameter RPAREN SEMI;
because buffer,variableX|"foo" is being matched by:
bufferType: 'buffer' COMMA ID '|' parameter;
which causes '|' parameter from the setCommand to not be able to match anything. If the input was something like set(buffer,variableX|"foo"|"bar");, it would parse successfully. Or remove '|' parameter from either the setCommand rule or from the bufferType rule.

How to parse statement in order of desired precedence using antlr?

I have an RSQL grammar defined:
grammar Rsql;
statement
: L_PAREN wrapped=statement R_PAREN
| left=statement op=( AND_OPERATOR | OR_OPERATOR ) right=statement
| node=comparison
;
comparison
: single_comparison
| multi_comparison
| bool_comparison
;
single_comparison
: key=IDENTIFIER op=( EQ | NE | GT | GTE | LT | LTE ) value=single_value
;
multi_comparison
: key=IDENTIFIER op=( IN | NIN ) value=multi_value
;
bool_comparison
: key=IDENTIFIER op=EX value=boolean_value
;
boolean_value
: BOOLEAN
;
single_value
: boolean_value
| ( STRING_LITERAL | IDENTIFIER )
| NUMERIC_LITERAL
;
multi_value
: L_PAREN single_value ( COMMA single_value )* R_PAREN
| single_value
;
TRUE: 'true';
FALSE: 'false';
AND_OPERATOR: ';';
OR_OPERATOR: ',';
L_PAREN: '(';
R_PAREN: ')';
COMMA: ',';
EQ: '==';
NE: '!=';
IN: '=in=';
NIN: '=out=';
GT: '=gt=';
LT: '=lt=';
GTE: '=ge=';
LTE: '=le=';
EX: '=ex=';
IDENTIFIER
: [a-zA-Z_] [a-zA-Z_0-9]*
;
BOOLEAN
: TRUE
| FALSE
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( [-+]? DIGIT+ )?
| '.' DIGIT+ ( [-+]? DIGIT+ )?
;
STRING_LITERAL
: '\'' ( STRING_ESCAPE_SEQ | ~[\\\r\n'] )* '\''
| '"' ( STRING_ESCAPE_SEQ | ~[\\\r\n"] )* '"'
;
STRING_ESCAPE_SEQ
: '\\' .
;
fragment DIGIT : [0-9];
No matter how I attempt to parse this (listener/visitor), the statements with parenthesis always get evaluated in order. It is my understanding that the order in the rule would be the precedence. However, the parse tree for a statement like "name==foo,(name==bar;age=gt=35)" is always
no matter where the parenthesis appear. Please help me discover what I'm missing. Thanks!

how can I refactor this ANTLR4 grammar so that it isn't mutually left recursive?

I can't seem to figure out why this grammar won't compile. It compiled fine until I modified line 145 from
(Identifier '.')* functionCall
to
(primary '.')? functionCall
I've been trying to figure out how to solve this issue for a while but I can't seem to be able to. Here's the error:
The following sets of rules are mutually left-recursive [primary]
grammar Tadpole;
#header
{package net.tadpole.compiler.parser;}
file
: fileContents*
;
fileContents
: structDec
| functionDec
| statement
| importDec
;
importDec
: 'import' Identifier ';'
;
literal
: IntegerLiteral
| FloatingPointLiteral
| BooleanLiteral
| CharacterLiteral
| StringLiteral
| NoneLiteral
| arrayLiteral
;
arrayLiteral
: '[' expressionList? ']'
;
expressionList
: expression (',' expression)*
;
expression
: primary
| unaryExpression
| <assoc=right> expression binaryOpPrec0 expression
| <assoc=left> expression binaryOpPrec1 expression
| <assoc=left> expression binaryOpPrec2 expression
| <assoc=left> expression binaryOpPrec3 expression
| <assoc=left> expression binaryOpPrec4 expression
| <assoc=left> expression binaryOpPrec5 expression
| <assoc=left> expression binaryOpPrec6 expression
| <assoc=left> expression binaryOpPrec7 expression
| <assoc=left> expression binaryOpPrec8 expression
| <assoc=left> expression binaryOpPrec9 expression
| <assoc=left> expression binaryOpPrec10 expression
| <assoc=right> expression binaryOpPrec11 expression
;
unaryExpression
: unaryOp expression
| prefixPostfixOp primary
| primary prefixPostfixOp
;
unaryOp
: '+'
| '-'
| '!'
| '~'
;
prefixPostfixOp
: '++'
| '--'
;
binaryOpPrec0
: '**'
;
binaryOpPrec1
: '*'
| '/'
| '%'
;
binaryOpPrec2
: '+'
| '-'
;
binaryOpPrec3
: '>>'
| '>>>'
| '<<'
;
binaryOpPrec4
: '<'
| '>'
| '<='
| '>='
| 'is'
;
binaryOpPrec5
: '=='
| '!='
;
binaryOpPrec6
: '&'
;
binaryOpPrec7
: '^'
;
binaryOpPrec8
: '|'
;
binaryOpPrec9
: '&&'
;
binaryOpPrec10
: '||'
;
binaryOpPrec11
: '='
| '**='
| '*='
| '/='
| '%='
| '+='
| '-='
| '&='
| '|='
| '^='
| '>>='
| '>>>='
| '<<='
| '<-'
;
primary
: literal
| fieldName
| '(' expression ')'
| '(' type ')' (primary | unaryExpression)
| 'new' objType '(' expressionList? ')'
| primary '.' fieldName
| primary dimension
| (primary '.')? functionCall
;
functionCall
: functionName '(' expressionList? ')'
;
functionName
: Identifier
;
dimension
: '[' expression ']'
;
statement
: '{' statement* '}'
| expression ';'
| 'recall' ';'
| 'return' expression? ';'
| variableDec
| 'if' '(' expression ')' statement ('else' statement)?
| 'while' '(' expression ')' statement
| 'do' expression 'while' '(' expression ')' ';'
| 'do' '{' statement* '}' 'while' '(' expression ')' ';'
;
structDec
: 'struct' structName ('(' parameterList ')')? '{' variableDec* functionDec* '}'
;
structName
: Identifier
;
fieldName
: Identifier
;
variableDec
: type fieldName ('=' expression)? ';'
;
type
: primitiveType ('[' ']')*
| objType ('[' ']')*
;
primitiveType
: 'byte'
| 'short'
| 'int'
| 'long'
| 'char'
| 'boolean'
| 'float'
| 'double'
;
objType
: (Identifier '.')? structName
;
functionDec
: 'def' functionName '(' parameterList? ')' ':' type '->' functionBody
;
functionBody
: statement
;
parameterList
: parameter (',' parameter)*
;
parameter
: type fieldName
;
IntegerLiteral
: DecimalIntegerLiteral
| HexIntegerLiteral
| OctalIntegerLiteral
| BinaryIntegerLiteral
;
fragment
DecimalIntegerLiteral
: DecimalNumeral IntegerSuffix?
;
fragment
HexIntegerLiteral
: HexNumeral IntegerSuffix?
;
fragment
OctalIntegerLiteral
: OctalNumeral IntegerSuffix?
;
fragment
BinaryIntegerLiteral
: BinaryNumeral IntegerSuffix?
;
fragment
IntegerSuffix
: [lL]
;
fragment
DecimalNumeral
: Digit (Digits? | Underscores Digits)
;
fragment
Digits
: Digit (DigitsAndUnderscores? Digit)?
;
fragment
Digit
: [0-9]
;
fragment
DigitsAndUnderscores
: DigitOrUnderscore+
;
fragment
DigitOrUnderscore
: Digit
| '_'
;
fragment
Underscores
: '_'+
;
fragment
HexNumeral
: '0' [xX] HexDigits
;
fragment
HexDigits
: HexDigit (HexDigitsAndUnderscores? HexDigit)?
;
fragment
HexDigit
: [0-9a-fA-F]
;
fragment
HexDigitsAndUnderscores
: HexDigitOrUnderscore+
;
fragment
HexDigitOrUnderscore
: HexDigit
| '_'
;
fragment
OctalNumeral
: '0' [oO] Underscores? OctalDigits
;
fragment
OctalDigits
: OctalDigit (OctalDigitsAndUnderscores? OctalDigit)?
;
fragment
OctalDigit
: [0-7]
;
fragment
OctalDigitsAndUnderscores
: OctalDigitOrUnderscore+
;
fragment
OctalDigitOrUnderscore
: OctalDigit
| '_'
;
fragment
BinaryNumeral
: '0' [bB] BinaryDigits
;
fragment
BinaryDigits
: BinaryDigit (BinaryDigitsAndUnderscores? BinaryDigit)?
;
fragment
BinaryDigit
: [01]
;
fragment
BinaryDigitsAndUnderscores
: BinaryDigitOrUnderscore+
;
fragment
BinaryDigitOrUnderscore
: BinaryDigit
| '_'
;
// §3.10.2 Floating-Point Literals
FloatingPointLiteral
: DecimalFloatingPointLiteral FloatingPointSuffix?
| HexadecimalFloatingPointLiteral FloatingPointSuffix?
;
fragment
FloatingPointSuffix
: [fFdD]
;
fragment
DecimalFloatingPointLiteral
: Digits '.' Digits? ExponentPart?
| '.' Digits ExponentPart?
| Digits ExponentPart
| Digits
;
fragment
ExponentPart
: ExponentIndicator SignedInteger
;
fragment
ExponentIndicator
: [eE]
;
fragment
SignedInteger
: Sign? Digits
;
fragment
Sign
: [+-]
;
fragment
HexadecimalFloatingPointLiteral
: HexSignificand BinaryExponent
;
fragment
HexSignificand
: HexNumeral '.'?
| '0' [xX] HexDigits? '.' HexDigits
;
fragment
BinaryExponent
: BinaryExponentIndicator SignedInteger
;
fragment
BinaryExponentIndicator
: [pP]
;
BooleanLiteral
: 'true'
| 'false'
;
CharacterLiteral
: '\'' SingleCharacter '\''
| '\'' EscapeSequence '\''
;
fragment
SingleCharacter
: ~['\\]
;
StringLiteral
: '"' StringCharacters? '"'
;
fragment
StringCharacters
: StringCharacter+
;
fragment
StringCharacter
: ~["\\]
| EscapeSequence
;
fragment
EscapeSequence
: '\\' [btnfr"'\\]
| OctalEscape
| UnicodeEscape
;
fragment
OctalEscape
: '\\' OctalDigit
| '\\' OctalDigit OctalDigit
| '\\' ZeroToThree OctalDigit OctalDigit
;
fragment
ZeroToThree
: [0-3]
;
fragment
UnicodeEscape
: '\\' 'u' HexDigit HexDigit HexDigit HexDigit
;
NoneLiteral
: 'nil'
;
Identifier
: IdentifierStartChar IdentifierChar*
;
fragment
IdentifierStartChar
: [a-zA-Z$_] // these are the "java letters" below 0xFF
| // covers all characters above 0xFF which are not a surrogate
~[\u0000-\u00FF\uD800-\uDBFF]
{Character.isJavaIdentifierStart(_input.LA(-1))}?
| // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
[\uD800-\uDBFF] [\uDC00-\uDFFF]
{Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
;
fragment
IdentifierChar
: [a-zA-Z0-9$_] // these are the "java letters or digits" below 0xFF
| // covers all characters above 0xFF which are not a surrogate
~[\u0000-\u00FF\uD800-\uDBFF]
{Character.isJavaIdentifierPart(_input.LA(-1))}?
| // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
[\uD800-\uDBFF] [\uDC00-\uDFFF]
{Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
;
WS : [ \t\r\n\u000C]+ -> skip
;
LINE_COMMENT
: '#' ~[\r\n]* -> skip
;
The left recursive invocation needs to be the first, so no parenthesis can be placed before it.
You can rewrite it like this:
primary
: literal
| fieldName
| '(' expression ')'
| '(' type ')' (primary | unaryExpression)
| 'new' objType '(' expressionList? ')'
| primary '.' fieldName
| primary dimension
| primary '.' functionCall
| functionCall
;
which is equivalent.

Antlr - not able to use Lexer token if it is assigned to another token in grammar

"In this example if i use 'MID('int = VALUE')' then it works fine. I want MID to be validated for INT value but when i use INT it gives error "mismatched input '9' expecting INT.
I am using antlr-4.2-complete version of antlr.
I am not able to understand the exact issue?
grammar DIExpression;
r: 'MID('int_val = INT')'
{
System.out.println("value equals: "+ $int_val.text);
};
VALUE : INT | STRING;
STRING : [0-9a-zA-Z_]+;
INT : [0-9]+;
WS : [ \t\r\n]+ -> skip ;
UPDATE:
I am giving input like MID(9)
The issue is that your rules are ambiguous. What should '9' be? It could be a STRING or and INT. I would highly recommend to use the predefined literals for STRING, WS, COMMENT and NEWLINE provided by the antlr community.
Be aware, this is antlr3 code! As you can see a String is sth in quotes (I guess that´s also what you want)
INT : '0'..'9'+
;
FLOAT
: ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
| '.' ('0'..'9')+ EXPONENT?
| ('0'..'9')+ EXPONENT
;
COMMENT
: '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
CHAR: '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
;
fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;

Antlr String can not match

I'm working on a little antlr problem. In my small custom DSL I want to be able to do a compare action between fields. I've got three fieldtypes (String, Int, Identifier) the Identifier is a variable name. I made a big specification but i've reduced my problem to a smaller grammer.
The problem is that when I try to use the String grammar notation, which you can add to your grammer using antlrworks, my Strings are seen as an identifier. This is my grammar:
grammar test;
x
: 'FROM' field_value EOF
;
field_value
: STRING
| INT
| identifier
;
identifier
: ID (('.' '`' ID '`')|('.' ID))?
| '`' ID '`' (('.' '`' ID '`')|('.' ID))?
;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
INT : '0'..'9'+
;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;
When I try to parse the following string FROM "Hello!" it returns a parsetree like this
<grammar test>
|
x
|
----------------------------
| | |
FROM field_value !
|
identifier
|
"Hello
It parses what I think should be a string to an identifier thought my identifier doesn't say anything about double quoets so it shouldn't match.
Besides I think my definition for a string is wrong, even though antlrworks generated it for me. Does anybody know why this happens?
Cheers!
There's nothing wrong with your grammar. The things that is messing it up for you is most probably the fact that you're using ANTLRWorks' interpreter. Don't. The interpreter doesn't work well.
Use ANTLRWorks' debugger instead (in your grammar, press CTRL + D), which works like a charm. This is what the debugger shows after parsing FROM "Hello!":

Resources