How to evaluate a DSL with PARSE in Rebol?

While learning about DSLs, I realized that the Parse dialect in Rebol can be a great lexer and parser. Here is a good example from the Parse tutorial:
expr: [term ["+" | "-"] expr | term]
term: [factor ["*" | "/"] term | factor]
factor: [primary "**" factor | primary]
primary: [some digit | "(" expr ")"]
digit: charset "0123456789"
probe parse "4/5+3**2-(5*6+1)" expr
;will output true
The code above verifies whether an expression conforms to the "grammar" defined above. My questions are:
How do I compute or evaluate it?
How do I express the precedence of operators such as "*" and "+"?

A modified version of the code by Boyko Bantchev:
arith-eval: funct [
a-exp [string!]
] [
op-stack: copy []
num-stack: copy []
pop: func [stk /local t] [t: last stk remove back tail stk t]
do.op: func [/local op x y] [
op: to-word pop op-stack
y: pop num-stack
x: pop num-stack
append num-stack do reduce [x op y]
]
num: op: none
expr: [term any [copy op [{+} | {-}] (append op-stack op) term (do.op)]]
term: [prim any [copy op [{*} | {/}] (append op-stack op) prim (do.op)]]
prim: [copy num some digit (append num-stack to-decimal num) | {(} expr {)}]
digit: charset {0123456789}
; whitespace is not allowed between tokens
either parse a-exp expr [num-stack/1] [{wrong expression}]
]
>> arith-eval "12+34*56-89/2"
== 1871.5
>>
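For readers who don't use Rebol, here is a minimal Python sketch of the same idea: one function per grammar rule (expr, term, prim), with the operator applied to a number stack as soon as its right operand has been parsed. The name arith_eval and the regex tokenizer are assumptions made for this illustration, not a translation of any library API.

```python
import re


def arith_eval(text):
    """Evaluate +, -, *, / and parentheses; whitespace is not allowed."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    if "".join(tokens) != text:          # some character was not a token
        return "wrong expression"
    pos = 0
    num_stack = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def apply_op(op):
        # Pop the right operand first, then the left one.
        y = num_stack.pop()
        x = num_stack.pop()
        if op == "+":
            num_stack.append(x + y)
        elif op == "-":
            num_stack.append(x - y)
        elif op == "*":
            num_stack.append(x * y)
        else:
            num_stack.append(x / y)

    def prim():
        nonlocal pos
        tok = peek()
        if tok is not None and tok.isdigit():
            num_stack.append(float(tok))
            pos += 1
        elif tok == "(":
            pos += 1
            expr()
            if peek() != ")":
                raise ValueError("expected )")
            pos += 1
        else:
            raise ValueError("expected a number or (")

    def term():
        nonlocal pos
        prim()
        while peek() in ("*", "/"):
            op = tokens[pos]
            pos += 1
            prim()
            apply_op(op)

    def expr():
        nonlocal pos
        term()
        while peek() in ("+", "-"):
            op = tokens[pos]
            pos += 1
            term()
            apply_op(op)

    try:
        expr()
    except ValueError:
        return "wrong expression"
    return num_stack[0] if pos == len(tokens) else "wrong expression"


print(arith_eval("12+34*56-89/2"))
```

Precedence falls out of the rule nesting: term is only entered from expr, so "*" and "/" are always reduced before the surrounding "+" or "-".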

1. Generate either a string or, better, a block (e.g. with collect), which you can then evaluate with do.
2. There is an old example dialect with operator precedence by Gabriele Santilli on rebol.org.

See below for good ordering parse rules.
REBOL []
math-expression?: func
[ {Returns a block of rebol code for a given mathematical expression string,
or none if mathematical expression is not correct.
>> e: math-expression? "1-2**3*6"
== [subtract 1.0 multiply power 2.0 3.0 6.0]
>> do e
== -47.0
>> x: 0
== 0
>> e: math-expression? "sqrt(1+1)/(1-x)**3"
== [divide square-root add 1.0 1.0 power subtract 1.0 x 3.0]
>> do e
== 1.4142135623731
}
Amath-expression {The mathematical expression to convert to REBOL code.}
/trace 'Rtrace {Set Rtrace from the syntax error to the mathematical expression end.}
/local
exprs fx expr text
digit number parameter primary factor term expression
]
[ exprs: copy []
fx: copy []
append/only exprs expr: copy []
digit: charset "0123456789"
number: [some digit]
parameter: charset "abcdefghijklmnopqrstuvwxyz"
primary:
[ opt
[ trace: "abs" (fx: copy [abs])
| trace: "acos" (fx: copy [arccosine/radians])
| trace: "arccos" (fx: copy [arccosine/radians])
| trace: "arcsin" (fx: copy [arcsine/radians])
| trace: "arctan" (fx: copy [arctangent/radians])
| trace: "asin" (fx: copy [arcsine/radians])
| trace: "atan" (fx: copy [arctangent/radians])
| trace: "cos" (fx: copy [cosine/radians])
| trace: "exp" (fx: copy [exp])
| trace: "ln" (fx: copy [log-e])
| trace: "log2" (fx: copy [log-2])
| trace: "log10" (fx: copy [log-10])
| trace: "sin" (fx: copy [sine/radians])
| trace: "sqrt" (fx: copy [square-root])
| trace: "tan" (fx: copy [tangent/radians])
]
trace: "("
( append/only expr fx
fx: copy []
append/only exprs expr: copy []
)
expression
trace: ")"
( insert head expr last second-to-last exprs
remove back tail second-to-last exprs
temp: tail second-to-last exprs
expr: append second-to-last exprs head expr
remove back tail exprs
expr: temp
)
| trace: "+" expression
| trace: "-" (append expr 'negate) expression
| trace: copy text number (append expr to decimal! trim/all text) ; (probe head exprs)
| trace: copy text parameter (append expr to word! trim/all text)
]
factor:
[ primary any
[ [ trace: "**" (insert expr 'power)
| trace: "^^" (insert expr 'power)
| trace: "//" (insert expr 'remainder)
]
primary
]
]
term:
[ factor any
[ [ trace: "*" (insert expr 'multiply)
| trace: "/" (insert expr 'divide)
]
factor
]
]
expression:
[ term any
[ [ trace: "+" (expr: tail insert head expr 'add)
| trace: "-" (expr: tail insert head expr 'subtract)
]
term
]
]
either parse Amath-expression expression
[ first exprs
]
[ if Rtrace [set Rtrace copy back trace]
none
]
]
a: b: i: x: y: 0
foreach e
[ "1 * (2 * (3 - sqrt(4 * sqrt(16)) + 5))"
"-1"
"+1"
"1 + 2 + 3"
"1 + 2 * 3"
"1 * 2 + 4 * 5"
"(1 + 2) * 3"
"-((1 + 2) * 3)"
"3 * (1 + 2)"
"SQRT(1-x**2)/(1-x)**3"
"3*(3+x)**sin(2-i)+cos((3*y)**3*(a-b))+1"
"+(1+-x)**-sqrt(-1--x)"
"3-2+1"
"4/2*2"
"2**3*6"
"1-2**3*6"
"4/5+3**2-(5*6+1)"
"sqrt(3*3)"
"1-x**3"
"(1-x)**3"
"(1+1)/(1-x)**3"
"sqrt(1+1)/(1-x)**3"
"-(1+1)/(1-x)**3"
"+(1+2)/-(1-2)"
"1 - 2"
"-(1 + 2)"
"-1/2"
"(-1)/2"
"-(1)/2"
]
[ print ""
print e
probe e: math-expression? e
prin "= "probe either error? err: try [e: first reduce e] [disarm err] [e]
; break
]
print ""
probe m: "2*(1*1+1))"
probe math-expression? m
probe math-expression?/trace m t: copy ""
probe t
probe m: "1+abs(-1"
probe math-expression? m
probe math-expression?/trace m t: copy ""
probe t
probe m: "1+2+"
probe math-expression? m
probe math-expression?/trace m t: copy ""
probe t
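The approach of the script above (turn the infix string into prefix code, then evaluate that code separately) can be sketched in Python as well. This is a simplified illustration under my own assumptions: the names to_prefix and eval_prefix are hypothetical, only numbers, single-letter variables, parentheses and unary signs are supported, named functions such as sqrt are left out, and a unary sign applies to the following primary only.

```python
import operator
import re

NAMES = {"+": "add", "-": "subtract", "*": "multiply",
         "/": "divide", "**": "power"}
FUNCS = {"add": operator.add, "subtract": operator.sub,
         "multiply": operator.mul, "divide": operator.truediv,
         "power": operator.pow}


def to_prefix(text):
    """Return a flat prefix list for text, or None on a syntax error."""
    source = text.replace(" ", "")
    tokens = re.findall(r"\d+|[a-z]|\*\*|[-+*/()]", source)
    if "".join(tokens) != source:
        return None
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = take()
        if tok == "(":
            inner = expression()
            if take() != ")":
                raise ValueError("expected )")
            return inner
        if tok == "+":
            return primary()
        if tok == "-":
            return ["negate"] + primary()
        if tok.isdigit():
            return [float(tok)]
        if tok.isalpha():
            return [tok]
        raise ValueError("unexpected token " + tok)

    def binary(operand, symbols):
        # Builds the rule: operand (symbol operand)*, folded to the left.
        def rule():
            left = operand()
            while peek() in symbols:
                name = NAMES[take()]
                left = [name] + left + operand()
            return left
        return rule

    factor = binary(primary, ("**",))
    term = binary(factor, ("*", "/"))
    expression = binary(term, ("+", "-"))

    try:
        code = expression()
    except ValueError:
        return None
    return code if pos == len(tokens) else None


def eval_prefix(code, env=None):
    """Evaluate a prefix list produced by to_prefix."""
    env = env or {}
    items = iter(code)

    def step():
        item = next(items)
        if isinstance(item, float):
            return item
        if item == "negate":
            return -step()
        if item in FUNCS:
            left = step()
            right = step()
            return FUNCS[item](left, right)
        return env[item]            # a variable name

    return step()


print(to_prefix("1-2**3*6"))
print(eval_prefix(to_prefix("1-2**3*6")))
```

Keeping the generated code as data, rather than evaluating while parsing, means the same expression can be re-evaluated later with different variable values.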

Related

JQ hex string conversion to ASCII

I end up in jq with a hex string that I want to convert to ASCII, within jq. Doing it outside jq would involve passing multiple conditions too, which overcomplicates matters and would really slow down the solution.
To be clear: it concerns a string translation like:
"0x4162634b6c6d" -> "AbcKlm"
Stripping the "0x" is easy ( .[2:] ), and I have the equivalent in a bash function:
function h2a ()
{
while read s; do
n=0;
while [[ "$n" -lt ${#s} ]]; do
h="${s:$n:2}";
printf "\x$h";
n="$(($n+2))";
done;
done
}
but I would really like to do this in native jq. I found Rosetta JQ, but am unable to convert it.
Thanks for the help!
Edit: making progress, found how to access substrings
Now, how do I convert and iterate?
You could use a function like this:
def decode_hex:
("0123456789abcdef"|split("")|with_entries({key:.value, value:.key})) as $hex_map |
def decode_nybble: $hex_map[ascii_downcase];
def decode_byte: (.[0:1]|decode_nybble * 16) + (.[1:2]|decode_nybble);
def pairs: range(0;length;2) as $i | .[$i:$i+2];
[pairs|decode_byte] | implode;
Then to use it, strip out any non-hex characters and pass the string in.
.[2:] | decode_hex
An interesting side note: to my surprise, strings are handled very differently from arrays. You cannot index into them directly or perform other array-like operations on them. You can see how awkward this gets by looking at how $hex_map and decode_byte were defined above.
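To make the table-lookup logic of decode_hex easier to follow outside jq, here is a small Python sketch of the same steps: build a map from hex digit to value, cut the string into two-character pairs, and turn each pair into one character. The Python function name and the optional "0x" handling are my own assumptions, not part of the jq answer.

```python
HEX_MAP = {digit: value for value, digit in enumerate("0123456789abcdef")}


def decode_hex(hex_string):
    """Decode a hex string such as "0x4162634b6c6d" into text."""
    # Strip an optional "0x" prefix, as .[2:] does in the jq version.
    digits = hex_string[2:] if hex_string.lower().startswith("0x") else hex_string
    chars = []
    for i in range(0, len(digits), 2):
        high, low = digits[i].lower(), digits[i + 1].lower()
        chars.append(chr(HEX_MAP[high] * 16 + HEX_MAP[low]))
    return "".join(chars)


print(decode_hex("0x4162634b6c6d"))  # AbcKlm
```

Like the jq version, this assumes an even number of valid hex digits; anything else raises an error.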
If an efficient solution is needed, then piggy-backing off @JeffMercado's answer:
def decode_hex:
def decode: if . < 58 then .-48 elif . < 91 then .-55 else .-87 end;
def decode_byte: map(decode) | (.[0] * 16) + .[1];
def pairs: explode | range(0;length;2) as $i | [.[$i], .[$i+1]];
[pairs|decode_byte] | implode;
Further extending @peak's answer and adding @stackprotector's suggestion to filter out non-printable characters:
def decode_hex:
def decode: if . < 58 then .-48 elif . < 91 then .-55 else .-87 end;
def decode_byte: map(decode) | (.[0] * 16) + .[1];
def pairs: explode | range(0;length;2) as $i | [.[$i], .[$i+1]];
def filter: if (. < 32 or (. >= 128 and . <= 159)) then 46 else . end;
[pairs|decode_byte|filter] | implode;
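The code point arithmetic in decode and filter may be easier to check when written out with comments. The following Python sketch mirrors the jq definitions above one to one; the Python function names are hypothetical.

```python
def decode_nybble(code_point):
    """Map the code point of one hex digit to its value 0..15."""
    if code_point < 58:        # '0'..'9' are code points 48..57
        return code_point - 48
    if code_point < 91:        # 'A'..'F' are code points 65..70
        return code_point - 55
    return code_point - 87     # 'a'..'f' are code points 97..102


def printable(code_point):
    """Replace C0 and C1 control characters with '.' (code point 46)."""
    if code_point < 32 or 128 <= code_point <= 159:
        return 46
    return code_point


def decode_hex_printable(hex_digits):
    """Decode pairs of hex digits; a trailing unpaired digit is ignored."""
    points = [ord(c) for c in hex_digits]
    decoded = [
        decode_nybble(points[i]) * 16 + decode_nybble(points[i + 1])
        for i in range(0, len(points) - 1, 2)
    ]
    return "".join(chr(printable(p)) for p in decoded)


print(decode_hex_printable("4162634b6c6d"))  # AbcKlm
```

Note that, as in the jq filter, DEL (code point 127) is passed through unchanged.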
$ cat q6.jq
{ "b":
[(
{ "a": (split("")) }
| .a[]
| gsub ("a"; "10"; "i") | gsub ("b"; "11"; "i") | gsub ("c"; "12"; "i") | gsub ("d"; "13"; "i") | gsub ("e"; "14"; "i") | gsub ("f"; "15"; "i")
)
]
}
| { "a": [ .b as $m | range(0; $m | length; 2) | { "q" : [ ($m[.]|tonumber), ($m[(. + 1)]|tonumber) ] } ] } |
[ (.a[].q) as $b | (($b[0]) * 16) as $d | ($b[1]) as $e | ($d+$e) ] | implode
$ echo '"4162634b6c6d"' | jq -f q6.jq
"AbcKlm"
$
Yes, I know: it's ugly and twice the size of the bash function, but it works. If you can improve it, please show us.

Antlr4 expressions are chaining

From what I have read, the way I defined my 'expression' rule should give me the following.
This is my input:
xyz = a + b + c + d
This should be my output:
xyz = ( ( a + b ) + ( c + d) )
But instead I get:
xyz = ( a + (b + (c + d) ) )
I bet this has been solved before and I just wasn't able to find the solution.
statementList : s=statement sl=statementList #multipleStatementList
| s=statement #singleStatementList
;
statement : statementAssign
| statementIf
;
statementAssign : var=VAR ASSIGN expr=expression #overwriteStatementAssign
| var=VAR PLUS ASSIGN expr=expression #addStatementAssign
| var=VAR MINUS ASSIGN expr=expression #subStatementAssign
;
;
expression : BRACKET_OPEN expr=expression BRACKET_CLOSE #priorityExp
| left=expression operand=('*'|'/') right=expression #mulDivExp
| left=expression operand=('+'|'-') right=expression #addSubExp
| <assoc=right> left=expression POWER right=expression #powExp
| variable=VAR #varExp
| number=NUMBER #numExp
;
BRACKET_OPEN : '(' ;
BRACKET_CLOSE : ')' ;
ASTERISK : '*' ;
SLASH : '/' ;
PLUS : '+' ;
MINUS : '-' ;
POWER : '^' ;
MODULO : '%' ;
ASSIGN : '=' ;
NUMBER : [0-9]+ ;
VAR : [a-z][a-zA-Z0-9\-]* ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
If you want the expression
xyz = ( ( a + b ) + ( c + d) )
then you'll need to control the binding of the + operator with parentheses, just as you've done in your expected output. Otherwise,
xyz = ( a + (b + (c + d) ) )
is how the parser is going to parse it, because all the + operators have the same precedence and the parser continues parsing until it reaches the end of the expression.
It recursively applies
left=expression operand=('+'|'-') right=expression
until the expression is complete, and you get the grouping you got. So use those parentheses; forcing the order of expression evaluation is what they're for. ;)
If you change your input to
xyz = a * b + c + d
you'll see what I mean about precedence: the multiplication rule appears before the addition rule, and hence binds earlier, which is the mathematical convention (absent parentheses to group terms).
You're doing it right and the parser is too. Just group what you want if you want a specific binding order.
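To make the possible groupings concrete, here is a small Python sketch (not ANTLR code; the helper names are made up for this example) that nests a chain of operands either to the left or to the right. A chain of equal-precedence operators without parentheses can only come out as one of these two shapes; the balanced grouping from the question matches neither, which is why it needs explicit parentheses.

```python
from functools import reduce


def group_left(operands, op="+"):
    """Nest a chain of operands to the left."""
    return reduce(lambda acc, x: f"({acc} {op} {x})", operands)


def group_right(operands, op="+"):
    """Nest a chain of operands to the right."""
    return reduce(lambda acc, x: f"({x} {op} {acc})", reversed(operands))


names = ["a", "b", "c", "d"]
print(group_left(names))   # (((a + b) + c) + d)
print(group_right(names))  # (a + (b + (c + d)))
```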

How to do Priority of Operations (+ * - /) in my grammars?

I am defining my own grammar using ANTLR 4, and I want the tree to be built correctly according to the priority of operations (+ * - /).
I found a sample that handles the priority of operations for * and +, and it works fine.
I tried to edit it to add the priority of operations for - and /, but I failed :(
The grammar for the priority of operations (+ *) is:
println:PRINTLN expression SEMICOLON {System.out.println($expression.value);};
expression returns [Object value]:
t1=factor {$value=(int)$t1.value;}
(PLUS t2=factor{$value=(int)$value+(int)$t2.value;})*;
factor returns [Object value]: t1=term {$value=(int)$t1.value;}
(MULT t2=term{$value=(int)$value*(int)$t2.value;})*;
term returns [Object value]:
NUMBER {$value=Integer.parseInt($NUMBER.text);}
| ID {$value=symbolTable.get($value=$ID.text);}
| PAR_OPEN expression {$value=$expression.value;} PAR_CLOSE
;
MULT :'*';
PLUS :'+';
MINUS:'-';
DIV:'/' ;
How can I add the priority of operations for - and / to it?
In ANTLR3 (and ANTLR4) * and / can be given a higher precedence than + and - like this:
println
: PRINTLN expression SEMICOLON
;
expression
: factor ( PLUS factor
| MINUS factor
)*
;
factor
: term ( MULT term
| DIV term
)*
;
term
: NUMBER
| ID
| PAR_OPEN expression PAR_CLOSE
;
But in ANTLR4, this will also work:
println
: PRINTLN expression SEMICOLON
;
expression
: NUMBER
| ID
| PAR_OPEN expression PAR_CLOSE
| expression ( MULT | DIV ) expression // higher precedence
| expression ( PLUS | MINUS ) expression // lower precedence
;
You normally solve this by defining expression, term, and factor production rules. Here's a grammar (specified in EBNF) that implements unary + and unary -, along with the 4 binary arithmetic operators, plus parentheses:
start ::= expression
expression ::= term (('+' term) | ('-' term))*
term ::= factor (('*' factor) | ('/' factor))*
factor ::= (number | group | '-' factor | '+' factor)
group ::= '(' expression ')'
where number is a numeric literal.
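As an illustration of how the EBNF rules above map to code, here is a minimal recursive-descent evaluator in Python, with one function per production (group is folded into factor). It is a sketch under my own assumptions: the tokenizer regex, the function names and the use of SyntaxError are choices made for this example.

```python
import re


def evaluate(text):
    """Evaluate an arithmetic expression following the EBNF above."""
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*/()]", text)
    if "".join(tokens) != "".join(text.split()):
        raise SyntaxError("unexpected character in input")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def factor():
        # factor ::= number | group | '-' factor | '+' factor
        tok = take()
        if tok == "-":
            return -factor()
        if tok == "+":
            return factor()
        if tok == "(":
            value = expression()
            if take() != ")":
                raise SyntaxError("expected )")
            return value
        if tok[0].isdigit():
            return float(tok)
        raise SyntaxError("unexpected token " + tok)

    def term():
        # term ::= factor (('*' factor) | ('/' factor))*
        value = factor()
        while peek() in ("*", "/"):
            if take() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def expression():
        # expression ::= term (('+' term) | ('-' term))*
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    result = expression()
    if pos != len(tokens):
        raise SyntaxError("unexpected token " + tokens[pos])
    return result


print(evaluate("1 + 2 * 3"))      # 7.0
print(evaluate("-(1 + 2) * 3"))   # -9.0
```

The loops in term and expression give left-to-right evaluation for operators of equal precedence, so "3-2+1" is (3-2)+1.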

ANTLR: VERY slow parsing

I have successfully split my expressions into arithmetic and boolean expressions like this:
/* entry point */
parse: formula EOF;
formula : (expr|boolExpr);
/* boolean expressions : take math expr and use boolean operators on them */
boolExpr
: bool
| l=expr operator=(GT|LT|GEQ|LEQ) r=expr
| l=boolExpr operator=(OR|AND) r=boolExpr
| l=expr (not='!')? EQUALS r=expr
| l=expr BETWEEN low=expr AND high=expr
| l=expr IS (NOT)? NULL
| l=atom LIKE regexp=string
| l=atom ('IN'|'in') '(' string (',' string)* ')'
| '(' boolExpr ')'
;
/* arithmetic expressions */
expr
: atom
| (PLUS|MINUS) expr
| l=expr operator=(MULT|DIV) r=expr
| l=expr operator=(PLUS|MINUS) r=expr
| function=IDENTIFIER '(' (expr ( ',' expr )* ) ? ')'
| '(' expr ')'
;
atom
: number
| variable
| string
;
But now I have HUGE performance problems. Some formulas I try to parse are utterly slow, to the point that it has become unbearable: more than an hour (I stopped at that point) to parse this:
-4.77+[V1]*-0.0071+[V1]*[V1]*0+[V2]*-0.0194+[V2]*[V2]*0+[V3]*-0.00447932+[V3]*[V3]*-0.0017+[V4]*-0.00003298+[V4]*[V4]*0.0017+[V5]*-0.0035+[V5]*[V5]*0+[V6]*-4.19793004+[V6]*[V6]*1.5962+[V7]*12.51966636+[V7]*[V7]*-5.7058+[V8]*-19.06596752+[V8]*[V8]*28.6281+[V9]*9.47136506+[V9]*[V9]*-33.0993+[V10]*0.001+[V10]*[V10]*0+[V11]*-0.15397774+[V11]*[V11]*-0.0021+[V12]*-0.027+[V12]*[V12]*0+[V13]*-2.02963068+[V13]*[V13]*0.1683+[V14]*24.6268688+[V14]*[V14]*-5.1685+[V15]*-6.17590512+[V15]*[V15]*1.2936+[V16]*2.03846688+[V16]*[V16]*-0.1427+[V17]*9.02302288+[V17]*[V17]*-1.8223+[V18]*1.7471106+[V18]*[V18]*-0.1255+[V19]*-30.00770912+[V19]*[V19]*6.7738
Do you have any idea on what the problem is?
The parsing stops when the parser enters the formula grammar rule.
edit Original problem here:
My grammar allows this:
// ( 1 LESS_EQUALS 2 )
1 <= 2
But the way I expressed it in my G4 file makes it also accept this:
// ( ( 1 LESS_EQUALS 2 ) LESS_EQUALS 3 )
1 <= 2 <= 3
Which I don't want.
My grammar contains this:
expr
: atom # atomArithmeticExpr
| (PLUS|MINUS) expr # plusMinusExpr
| l=expr operator=('*'|'/') r=expr # multdivArithmeticExpr
| l=expr operator=('+'|'-') r=expr # addsubtArithmeticExpr
| l=expr operator=('>'|'<'|'>='|'<=') r=expr # comparisonExpr
[...]
How can I tell Antlr that this is not acceptable?
Just split the root rule in two: either rename the root 'expr' to 'rootExpr', or vice versa.
rootExpr
: atom # atomArithmeticExpr
| (PLUS|MINUS) expr # plusMinusExpr
| l=expr operator=('*'|'/') r=expr # multdivArithmeticExpr
| l=expr operator=('+'|'-') r=expr # addsubtArithmeticExpr
| l=expr operator=('>'|'<'|'>='|'<=') r=expr # comparisonExpr
EDIT: You cannot have cyclic reference => expr node in expr rule.
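The idea of keeping the comparison out of the recursive rule can be sketched as a small Python recognizer: the root rule allows at most one comparison, and the arithmetic rule it refers to never refers back to the root. This is only an illustration of the structure, with hypothetical names; it recognizes integers only and does not build a tree, so operator precedence is not modelled.

```python
import re

TOKEN = re.compile(r"\s*(<=|>=|<|>|\d+|[-+*/()])")
COMPARISONS = ("<", ">", "<=", ">=")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected character")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def accepts(text):
    """Return True if text is: arith, optionally followed by ONE comparison."""
    try:
        tokens = tokenize(text)
    except SyntaxError:
        return False
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def atom():
        tok = take()
        if tok in ("+", "-"):          # unary sign
            atom()
        elif tok == "(":
            arith()                    # note: arith, not root
            if take() != ")":
                raise SyntaxError("expected )")
        elif not tok.isdigit():
            raise SyntaxError("unexpected token")

    def arith():
        atom()
        while peek() in ("+", "-", "*", "/"):
            take()
            atom()

    def root():
        arith()
        if peek() in COMPARISONS:      # 'if', not 'while': no chaining
            take()
            arith()

    try:
        root()
    except SyntaxError:
        return False
    return pos == len(tokens)


print(accepts("1 <= 2"))       # True
print(accepts("1 <= 2 <= 3"))  # False
```

Because the parenthesized case calls arith rather than root, "(1 <= 2) <= 3" is rejected as well.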

Advice on handling an ambiguous operator in an ANTLR 4 grammar

I am writing an antlr grammar file for a dialect of basic. Most of it is either working or I have a good idea of what I need to do next. However, I am not at all sure what I should do with the '=' character which is used for both equality tests as well as assignment.
For example, this is a valid statement
t = (x = 5) And (y = 3)
This evaluates if x is EQUAL to 5, if y is EQUAL to 3 then performs a logical AND on those results and ASSIGNS the result to t.
My grammar will parse this, albeit incorrectly, but I think that will sort itself out once the ambiguity is resolved.
How do I differentiate between the two uses of the '=' character?
1) Should I remove the assignment rule from expression and handle these cases (assignment vs. equality test) in my visitor and/or listener implementation during code generation?
2) Is there a better way to define the grammar so that it is already sorted out?
Would someone be able to simply point me in the right direction as to how best implement this language "feature"?
Also, I have been reading through the Definitive guide to ANTLR4 as well as Language Implementation Patterns looking for a solution to this. It may be there but I have not yet found it.
Below is the full parser grammar. The ASSIGN token is currently set to '='. EQUAL is set to '=='.
parser grammar wlParser;
options { tokenVocab=wlLexer; }
program
: multistatement (NEWLINE multistatement)* NEWLINE?
;
multistatement
: statement (COLON statement)*
;
statement
: declarationStat
| defTypeStat
| assignment
| expression
;
assignment
: lvalue op=ASSIGN expression
;
expression
: <assoc=right> left=expression op=CARAT right=expression #exponentiationExprStat
| (PLUS|MINUS) expression #signExprStat
| IDENTIFIER DATATYPESUFFIX? LPAREN expression RPAREN #arrayIndexExprStat
| left=expression op=(ASTERISK|FSLASH) right=expression #multDivExprStat
| left=expression op=BSLASH right=expression #integerDivExprStat
| left=expression op=KW_MOD right=expression #modulusDivExprStat
| left=expression op=(PLUS|MINUS) right=expression #addSubExprStat
| left=string op=AMPERSAND right=string #stringConcatenation
| left=expression op=(RELATIONALOPERATORS | KW_IS | KW_ISA) right=expression #relationalComparisonExprStat
| left=expression (op=LOGICALOPERATORS right=expression)+ #logicalOrAndExprStat
| op=KW_LIKE patternString #likeExprStat
| LPAREN expression RPAREN #groupingExprStat
| NUMBER #atom
| string #atom
| IDENTIFIER DATATYPESUFFIX? #atom
;
lvalue
: (IDENTIFIER DATATYPESUFFIX?) | (IDENTIFIER DATATYPESUFFIX? LPAREN expression RPAREN)
;
string
: STRING
;
patternString
: DQUOT (QUESTIONMARK | POUND | ASTERISK | LBRACKET BANG? .*? RBRACKET)+ DQUOT
;
referenceType
: DATATYPE
;
declarationStat
: constDecl
| varDecl
;
constDecl
: CONSTDECL? KW_CONST IDENTIFIER EQUAL expression
;
varDecl
: VARDECL (varDeclPart (COMMA varDeclPart)*)? | listDeclPart
;
varDeclPart
: IDENTIFIER DATATYPESUFFIX? ((arrayBounds)? KW_AS DATATYPE (COMMA DATATYPE)*)?
;
listDeclPart
: IDENTIFIER DATATYPESUFFIX? KW_LIST KW_AS DATATYPE
;
arrayBounds
: LPAREN (arrayDimension (COMMA arrayDimension)*)? RPAREN
;
arrayDimension
: INTEGER (KW_TO INTEGER)?
;
defTypeStat
: DEFTYPES DEFTYPERANGE (COMMA DEFTYPERANGE)*
;
This is the lexer grammar.
lexer grammar wlLexer;
NUMBER
: INTEGER
| REAL
| BINARY
| OCTAL
| HEXIDECIMAL
;
RELATIONALOPERATORS
: EQUAL
| NEQUAL
| LT
| LTE
| GT
| GTE
;
LOGICALOPERATORS
: KW_OR
| KW_XOR
| KW_AND
| KW_NOT
| KW_IMP
| KW_EQV
;
INSTANCEOF
: KW_IS
| KW_ISA
;
CONSTDECL
: KW_PUBLIC
| KW_PRIVATE
;
DATATYPE
: KW_BOOLEAN
| KW_BYTE
| KW_INTEGER
| KW_LONG
| KW_SINGLE
| KW_DOUBLE
| KW_CURRENCY
| KW_STRING
;
VARDECL
: KW_DIM
| KW_STATIC
| KW_PUBLIC
| KW_PRIVATE
;
LABEL
: IDENTIFIER COLON
;
DEFTYPERANGE
: [a-zA-Z] MINUS [a-zA-Z]
;
DEFTYPES
: KW_DEFBOOL
| KW_DEFBYTE
| KW_DEFCUR
| KW_DEFDBL
| KW_DEFINT
| KW_DEFLNG
| KW_DEFSNG
| KW_DEFSTR
| KW_DEFVAR
;
DATATYPESUFFIX
: PERCENT
| AMPERSAND
| BANG
| POUND
| AT
| DOLLARSIGN
;
STRING
: (DQUOT (DQUOTESC|.)*? DQUOT)
| (LBRACE (RBRACEESC|.)*? RBRACE)
| (PIPE (PIPESC|.|NEWLINE)*? PIPE)
;
fragment DQUOTESC: '\"\"' ;
fragment RBRACEESC: '}}' ;
fragment PIPESC: '||' ;
INTEGER
: DIGIT+ (E (PLUS|MINUS)? DIGIT+)?
;
REAL
: DIGIT+ PERIOD DIGIT+ (E (PLUS|MINUS)? DIGIT+)?
;
BINARY
: AMPERSAND B BINARYDIGIT+
;
OCTAL
: AMPERSAND O OCTALDIGIT+
;
HEXIDECIMAL
: AMPERSAND H HEXDIGIT+
;
QUESTIONMARK: '?' ;
COLON: ':' ;
ASSIGN: '=';
SEMICOLON: ';' ;
AT: '@' ;
LPAREN: '(' ;
RPAREN: ')' ;
DQUOT: '"' ;
LBRACE: '{' ;
RBRACE: '}' ;
LBRACKET: '[' ;
RBRACKET: ']' ;
CARAT: '^' ;
PLUS: '+' ;
MINUS: '-' ;
ASTERISK: '*' ;
FSLASH: '/' ;
BSLASH: '\\' ;
AMPERSAND: '&' ;
BANG: '!' ;
POUND: '#' ;
DOLLARSIGN: '$' ;
PERCENT: '%' ;
COMMA: ',' ;
APOSTROPHE: '\'' ;
TWOPERIODS: '..' ;
PERIOD: '.' ;
UNDERSCORE: '_' ;
PIPE: '|' ;
NEWLINE: '\r\n' | '\r' | '\n';
EQUAL: '==' ;
NEQUAL: '<>' | '><' ;
LT: '<' ;
LTE: '<=' | '=<';
GT: '>' ;
GTE: '>=' | '=>' ;
KW_AND: A N D ;
KW_BINARY: B I N A R Y ;
KW_BOOLEAN: B O O L E A N ;
KW_BYTE: B Y T E ;
KW_DATATYPE: D A T A T Y P E ;
KW_DATE: D A T E ;
KW_INTEGER: I N T E G E R ;
KW_IS: I S ;
KW_ISA: I S A ;
KW_LIKE: L I K E ;
KW_LONG: L O N G ;
KW_MOD: M O D ;
KW_NOT: N O T ;
KW_TO: T O ;
KW_FALSE: F A L S E ;
KW_TRUE: T R U E ;
KW_SINGLE: S I N G L E ;
KW_DOUBLE: D O U B L E ;
KW_CURRENCY: C U R R E N C Y ;
KW_STRING: S T R I N G ;
fragment BINARYDIGIT: ('0'|'1') ;
fragment OCTALDIGIT: ('0'|'1'|'2'|'3'|'4'|'5'|'6'|'7') ;
fragment DIGIT: '0'..'9' ;
fragment HEXDIGIT: ('0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9' | A | B | C | D | E | F) ;
fragment A: ('a'|'A');
fragment B: ('b'|'B');
fragment C: ('c'|'C');
fragment D: ('d'|'D');
fragment E: ('e'|'E');
fragment F: ('f'|'F');
fragment G: ('g'|'G');
fragment H: ('h'|'H');
fragment I: ('i'|'I');
fragment J: ('j'|'J');
fragment K: ('k'|'K');
fragment L: ('l'|'L');
fragment M: ('m'|'M');
fragment N: ('n'|'N');
fragment O: ('o'|'O');
fragment P: ('p'|'P');
fragment Q: ('q'|'Q');
fragment R: ('r'|'R');
fragment S: ('s'|'S');
fragment T: ('t'|'T');
fragment U: ('u'|'U');
fragment V: ('v'|'V');
fragment W: ('w'|'W');
fragment X: ('x'|'X');
fragment Y: ('y'|'Y');
fragment Z: ('z'|'Z');
IDENTIFIER
: [a-zA-Z_][a-zA-Z0-9_~]*
;
LINE_ESCAPE
: (' ' | '\t') UNDERSCORE ('\r'? | '\n')
;
WS
: [ \t] -> skip
;
Take a look at this grammar (Note that this grammar is not supposed to be a grammar for BASIC, it's just an example to show how to disambiguate using "=" for both assignment and equality):
grammar Foo;
program:
(statement | exprOtherThanEquality)*
;
statement:
assignment
;
expr:
equality | exprOtherThanEquality
;
exprOtherThanEquality:
boolAndOr
;
boolAndOr:
atom (BOOL_OP expr)*
;
equality:
atom EQUAL expr
;
assignment:
VAR EQUAL expr ENDL
;
atom:
BOOL |
VAR |
INT |
group
;
group:
LEFT_PARENTH expr RGHT_PARENTH
;
ENDL : ';' ;
LEFT_PARENTH : '(' ;
RGHT_PARENTH : ')' ;
EQUAL : '=' ;
BOOL:
'true' | 'false'
;
BOOL_OP:
'and' | 'or'
;
VAR:
[A-Za-z_]+ [A-Za-z_0-9]*
;
INT:
'-'? [0-9]+
;
WS:
[ \t\r\n] -> skip
;
Here is the parse tree for the input: t = (x = 5) and (y = 2);
[Figure: parse tree for the input above]
In one of the comments above, I asked you if we can assume that the first equal sign on a line always corresponds to an assignment. I retract that assumption slightly: the first equal sign on a line always corresponds to an assignment unless it is contained within parentheses. With the above grammar, this is a valid line: (x = 2). Here is the parse tree:
[Figure: parse tree for (x = 2)]
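The disambiguation rule used by the Foo grammar (an "=" at statement level is an assignment, an "=" inside an expression is an equality test) can be sketched as a tiny Python interpreter. All names here are hypothetical, only the assignment statement form is handled, and variables must already exist in the env dictionary.

```python
import re

TOKEN = re.compile(r"\s*(-?\d+|[A-Za-z_][A-Za-z_0-9]*|[=();])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected character")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def run_assignment(statement, env):
    """Execute one statement of the form: VAR '=' expr ';'."""
    tokens = tokenize(statement)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if expected is not None and tok != expected:
            raise SyntaxError("expected " + expected)
        pos += 1
        return tok

    def atom():
        tok = take()
        if tok == "(":
            value = expr()
            take(")")
            return value
        if tok in ("true", "false"):
            return tok == "true"
        if re.fullmatch(r"-?\d+", tok):
            return int(tok)
        return env[tok]                      # a variable

    def expr():
        left = atom()
        if peek() == "=":                    # inside expr: equality test
            take()
            return left == expr()
        while peek() in ("and", "or"):
            op = take()
            right = expr()
            left = bool(left and right) if op == "and" else bool(left or right)
        return left

    target = take()
    take("=")                                # statement level: assignment
    env[target] = expr()
    take(";")
    return env


env = run_assignment("t = (x = 5) and (y = 2);", {"x": 5, "y": 2})
print(env["t"])  # True
```

The parser never has to guess: which "=" it is looking at is decided purely by which rule it is in when it meets the token.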
