The Python code generated by ANTLR 4.9 has some syntax problems. For example, for the following ANTLR grammar:
e returns [ObjExpr v]
: a=e op=('*'|'/') b=e {
$v = ObjExpr($op.type)
$v.e1 = $a.v
$v.e2 = $b.v
}
| INT {
$v = ObjExpr(21)
$v.i = $INT.int
}
;
MUL : '*' ;
DIV : '/' ;
INT : [0-9]+ ;
NEWLINE:'\r'? '\n' ;
WS : [ \t]+ -> skip ;
The code generated is:
localctx.v = ObjExpr((0 if localctx.op is None else localctx.op.type())
localctx.v.e1 = localctx.a.v
localctx.v.e2 = localctx.b.v
Whereas the correct code should be:
localctx.v = ObjExpr((0 if localctx.op is None else localctx.op.type))
localctx.v.e1 = localctx.a.v
localctx.v.e2 = localctx.b.v
That is, the indentation is wrong and the parentheses don't match. Manually editing the generated parser file to fix these errors makes the code run properly. How do I report this bug and get it fixed?
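For reference, the shape the action should compile to can be sketched in plain Python; the Token and ObjExpr classes below are stand-ins for the runtime classes, and the token type value is hypothetical:

```python
# Stand-in classes, not the generated parser: they only mirror the shape
# of the action translation under discussion.
class Token:
    def __init__(self, type_):
        self.type = type_  # an attribute, so no trailing () after .type

class ObjExpr:
    def __init__(self, kind):
        self.kind = kind

op = Token(1)  # 1 is a hypothetical token type for MUL
# The corrected translation: a conditional expression with balanced parentheses.
v = ObjExpr(0 if op is None else op.type)
assert v.kind == 1
```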
I am creating my own language with ANTLR 4 and I would like to create a rule to define variables with their types, for example:
string = "string"
boolean = true
integer = 123
double = 12.3
string = string // reference to variable
Here is my grammar.
// lexer grammar
fragment LETTER : [A-Za-z];
fragment DIGIT : [0-9];
ID : LETTER+;
STRING : '"' ( ~ '"' )* '"' ;
BOOLEAN: ( 'true' | 'fase');
INTEGER: DIGIT+ ;
DOUBLE: DIGIT+ ('.' DIGIT+)*;
// parser grammar
program: main EOF;
main: study ;
study : studyBlock (assignVariableBlock)? ;
simpleAssign: name = ID '=' value = (STRING | BOOLEAN | INTEGER | BOOLEAN | ID);
listAssign: name = ID '=' value = listString #listStringAssign;
assign: simpleAssign #simpleVariableAssign
| listAssign #listOfVariableAssign
;
assignVariableBlock: assign+;
key: name = ID '[' value = STRING ']';
listString: '{' STRING (',' STRING)* '}';
studyParameters: (| ( simpleAssign (',' simpleAssign)*) );
studyBlock: 'study' '(' studyParameters ')' ;
When I test with this example, ANTLR displays the following errors:
study(timestamp = "10:30", region = "region", businessDate="2020-03-05", processType="ID")
bool = true
region = "region"
region = region
line 4:7 no viable alternative at input 'bool=true'
line 6:9 no viable alternative at input 'region=region'
How can I fix that?
When I test your grammar and start at the program rule with the given input, I get a valid parse tree, without any errors or warnings.
You either aren't starting at the correct parser rule, or are testing an old parser and need to regenerate the classes from your grammar.
I need to replace two separate strings in a text file and subsequently save the altered version as a new text file.
So far I have the following code:
fid = fopen('original_file.txt','rt') ;
X = fread(fid) ;
fclose(fid) ;
X = char(X.') ;
Y = strrep(X, 'results1.csv', 'results2.csv') ;
Z = strrep(X, 'plot1', 'plot2') ;
fid2 = fopen('new_file.txt','wt') ;
fwrite(fid2,Y) ;
fwrite(fid2,Z) ;
fclose (fid2) ;
The problem with this code is that it simply doubles the length of the text file; in other words, new_file.txt has twice as many lines as original_file.txt.
First the content is written with results1.csv changed to results2.csv, then the same content is appended with plot1 changed to plot2.
Can someone point out what I'm missing here?
The problem is that you create two independent variables, Y and Z, both derived from the original X, and write both of them to new_file.txt. To replace two separate strings, chain the strrep calls so the second replacement operates on the result of the first:
fid = fopen('original_file.txt','rt') ;
X = fread(fid) ;
fclose(fid) ;
X = char(X.') ;
Y = strrep(X, 'results1.csv', 'results2.csv') ;
Z = strrep(Y, 'plot1', 'plot2') ; % replace the second string, after the first replacement
fid2 = fopen('new_file.txt','wt') ;
fwrite(fid2,Z) ; % write just Z, with both replacements
fclose (fid2) ;
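The same chaining idea can be sketched in plain Python for comparison (the string here stands in for the file contents and is not from the question's actual file):

```python
# Chain the replacements: the second acts on the result of the first,
# and only the final string is written out once.
text = "output results1.csv then plot1"   # stand-in for the file contents
step1 = text.replace("results1.csv", "results2.csv")
step2 = step1.replace("plot1", "plot2")   # note: replace on step1, not text
assert step2 == "output results2.csv then plot2"
```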
I'm trying to use a semantic predicate in the lexer to look ahead one token but somehow I can't get it right. Here's what I have:
lexer grammar
lexer grammar TLLexer;
DirStart
: { getCharPositionInLine() == 0 }? '#dir'
;
DirEnd
: { getCharPositionInLine() == 0 }? '#end'
;
Cont
: 'contents' [ \t]* -> mode(CNT)
;
WS
: [ \t]+ -> channel(HIDDEN)
;
NL
: '\r'? '\n'
;
mode CNT;
CNT_DirEnd
: '#end' [ \t]* '\n'?
{ System.out.println("--matched end--"); }
;
CNT_LastLine
: ~ '\n'* '\n'
{ _input.LA(1) == CNT_DirEnd }? -> mode(DEFAULT_MODE)
;
CNT_Line
: ~ '\n'* '\n'
;
parser grammar
parser grammar TLParser;
options { tokenVocab = TLLexer; }
dirs
: ( dir
| NL
)*
;
dir
: DirStart Cont
contents
DirEnd
;
contents
: CNT_Line* CNT_LastLine
;
Essentially, each line in the CNT mode is free-form, but a line never begins with #end followed by optional whitespace. Basically, I want to keep matching the #end tag in the default lexer mode.
My test input is as follows:
#dir contents
..line..
#end
If I run this in grun, I get the following:
$ grun TL dirs test.txt
--matched end--
line 3:0 extraneous input '#end\n' expecting {CNT_LastLine, CNT_Line}
So clearly CNT_DirEnd gets matched, but somehow the predicate doesn't detect it.
I know that this particular task doesn't require a semantic predicate, but that's just the part that doesn't work. The actual parser, while it may be written without the predicate, will be a lot less clean if I simply move the matching of the #end tag into the mode CNT.
Thanks,
Kesha.
I think I figured it out. The member _input represents the characters of the original input, so _input.LA returns characters, not lexer token types. The numbers the lexer returns to the parser therefore have nothing to do with the values returned by _input.LA, and the predicate fails unless, by some weird luck, the character value returned by _input.LA(1) equals the token type of CNT_DirEnd.
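A tiny plain-Python illustration of the mismatch (the token type number here is hypothetical):

```python
# On a character stream, LA(1) yields a code point, not a token type.
la1 = ord("#")    # what _input.LA(1) returns when the next input is "#end"
CNT_DirEnd = 7    # hypothetical token type number assigned by the lexer
assert la1 == 35  # '#' is code point 35
assert la1 != CNT_DirEnd
```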
I modified the lexer as shown below and now it works, even though it is not as elegant as I hoped it would be (maybe someone knows a better way?)
lexer grammar TLLexer;
@lexer::members {
private static final String END_DIR = "#end";
private boolean isAtEndDir() {
StringBuilder sb = new StringBuilder();
int n = 1;
int ic;
// read characters until EOF
while ((ic = _input.LA(n++)) != -1) {
char c = (char) ic;
// we're interested in the next line only
if (c == '\n') break;
if (c == '\r') continue;
sb.append(c);
}
// Does the line begin with #end ?
if (sb.indexOf(END_DIR) != 0) return false;
// Is the #end followed by whitespace only?
for (int i = END_DIR.length(); i < sb.length(); i++) {
switch (sb.charAt(i)) {
case ' ':
case '\t':
continue;
default: return false;
}
}
return true;
}
}
[skipped .. nothing changed in the default mode]
mode CNT;
/* removed CNT_DirEnd */
CNT_LastLine
: ~ '\n'* '\n'
{ isAtEndDir() }? -> mode(DEFAULT_MODE)
;
CNT_Line
: ~ '\n'* '\n'
;
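For what it's worth, the same check ports to other targets almost mechanically. Here is a standalone Python sketch of the member, where lookahead(n) stands in for _input.LA(n) and is not part of any ANTLR API:

```python
END_DIR = "#end"

def is_at_end_dir(lookahead):
    # Collect characters up to the next newline (skipping '\r'), then test
    # whether the line is '#end' followed only by spaces and tabs.
    chars = []
    n = 1
    while True:
        ic = lookahead(n)
        if ic == -1:      # end of input, like _input.LA returning EOF
            break
        n += 1
        c = chr(ic)
        if c == '\n':
            break
        if c == '\r':
            continue
        chars.append(c)
    line = ''.join(chars)
    if not line.startswith(END_DIR):
        return False
    return all(ch in ' \t' for ch in line[len(END_DIR):])

def lookahead_for(text):
    # 1-based lookahead over a plain string, -1 past the end
    return lambda n: ord(text[n - 1]) if n <= len(text) else -1

assert is_at_end_dir(lookahead_for("#end  \n..."))
assert not is_at_end_dir(lookahead_for("..line..\n#end\n"))
```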
I have this groovy script that defines a closure that works properly.
escape = { str ->
str.collect{ ch ->
def escaped = ch
switch (ch) {
case "\"" : escaped = "\\\"" ; break
// other cases omitted for simplicity
}
escaped
}.join()
}
assert escape("\"") == "\\\"" //Sucess
But compilation fails when I add another closure that uses some GString interpolation to the script:
escape = { str ->
//Same as above
}
dummy = {
aStr = "abc"
"123${aStr}456"
}
//Compilation fails
I get the error
javax.script.ScriptException: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
Script650.groovy: 7: expecting anything but ''\n''; got it anyway # line 7, column 39.
case "\"" : escaped = "\\"" ; break
^
1 error
Even if the added closure is commented out:
escape = { str ->
//Same as above
}
/*dummy = {
aStr = "abc"
"123${aStr}456"
}*/
//Compilation fails
Still fails! What gives?
I am trying to parse a boolean expression of the following type
B1=p & A4=p | A6=p &(~A5=c)
I want a tree that I can use to evaluate the above expression. So I tried the example in ANTLR parser for and/or logic - how to get expressions between logic operators? with ANTLR 3, and it worked.
Now I want to do the same thing in ANTLR 4. I came up with the grammar below and it compiles, but I am having trouble writing the Java code.
Start of Antlr4 grammar
grammar TestAntlr4;
options {
output = AST;
}
tokens { AND, OR, NOT }
AND : '&';
OR : '|';
NOT : '~';
// parser/production rules start with a lower case letter
parse
: expression EOF! // omit the EOF token
;
expression
: or
;
or
: and (OR^ and)* // make `||` the root
;
and
: not (AND^ not)* // make `&&` the root
;
not
: NOT^ atom // make `~` the root
| atom
;
atom
: ID
| '('! expression ')'! // omit both `(` and `)`
;
// lexer/terminal rules start with an upper case letter
ID
:
(
'a'..'z'
| 'A'..'Z'
| '0'..'9' | ' '
| ('+'|'-'|'*'|'/'|'_')
| '='
)+
;
I have written the Java code (snippet below) to get a tree for the expression "B1=p & A4=p | A6=p &(~A5=c)". I am expecting & with children B1=p and |; the child | operator will have children A4=p and A6=p &(~A5=c), and so on.
Here is that Java code, but I am stuck trying to figure out how to get the tree. I was able to do this in ANTLR 3.
Java Code
String src = "B1=p & A4=p | A6=p &(~A5=c)";
CharStream stream = new ANTLRInputStream(src); // ANTLRInputStream already implements CharStream, no cast needed
TestAntlr4Lexer lexer = new TestAntlr4Lexer(stream);
TestAntlr4Parser parser = new TestAntlr4Parser(new CommonTokenStream(lexer)); // the parser was never constructed in the original snippet
parser.setBuildParseTree(true);
ParserRuleContext tree = parser.parse();
tree.inspect(parser);
if ( tree.children.size() > 0) {
System.out.println(" **************");
test.getChildren(tree, parser);
}
The getChildren method is below, but it does not seem to extract any tokens.
public void getChildren(ParseTree tree, TestAntlr4Parser parser ) {
for (int i=0; i<tree.getChildCount(); i++){
System.out.println(" Child i= " + i);
System.out.println(" expression = <" + tree.toStringTree(parser) + ">");
if ( tree.getChild(i).getChildCount() != 0 ) {
this.getChildren(tree.getChild(i), parser);
}
}
}
Could someone help me figure out how to write the parser in Java?
The output=AST option was removed in ANTLR 4, as were the ^ and ! operators you used in the grammar. ANTLR 4 produces parse trees instead of ASTs, so the root of the tree produced by a rule is the rule itself. For example, given the following rule:
and : not (AND not)*;
You will end up with an AndContext tree containing NotContext and TerminalNode children for the not and AND references, respectively. To make it easier to work with the trees, AndContext will contain a generated method not() which returns the list of context objects produced by the invocations of the not rule (return type List<? extends NotContext>). It also contains a generated method AND() which returns the list of TerminalNode instances created for each AND token that was matched.
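To make the accessor pattern concrete, here is a rough model in plain Python; the class and method names mimic the generated Java, but everything below is illustrative rather than actual generated code:

```python
# Hypothetical stand-ins for the generated context classes: the `and` rule
# becomes an AndContext whose not_()/AND() accessors return either the full
# list of children or the i-th one.
class NotContext:
    pass

class AndContext:
    def __init__(self, nots, and_tokens):
        self._nots = nots          # sub-trees from the `not` references
        self._ands = and_tokens    # TerminalNode stand-ins for AND tokens

    def not_(self, i=None):        # generated as not() in the Java target
        return self._nots if i is None else self._nots[i]

    def AND(self, i=None):
        return self._ands if i is None else self._ands[i]

# one AND token joining two `not` sub-trees, as in `a & b`
ctx = AndContext([NotContext(), NotContext()], ["&"])
assert len(ctx.not_()) == 2
assert ctx.AND(0) == "&"
```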