I have a problem with a string token in ANTLR. I must accept a string like "abc\n", but an error is thrown when I pass a string like
"abc
"
How do I catch the newline and ignore \n?
The rule
STRING : '"' ( ~[\\"\r\n] | '\\' ~[\r\n] )* '"';
will match:
"foo bar"
"foo\\bar"
"foo\"bar"
"foo\nbar"
and reject:
"foo
bar"
If you want to match the rejected input as well, do this:
STRING : '"' ( ~[\\"] | '\\' . )* '"';
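To sanity-check the first (single-line) rule outside of ANTLR, here is a minimal sketch that uses a java.util.regex pattern as a stand-in for the lexer rule. The class name StringRuleCheck and the method isString are made up for illustration, and a regex is only an approximation of how the ANTLR lexer behaves.

```java
import java.util.regex.Pattern;

public class StringRuleCheck {
    // Regex stand-in (an assumption, not ANTLR itself) for:
    //   STRING : '"' ( ~[\\"\r\n] | '\\' ~[\r\n] )* '"';
    static final Pattern SINGLE_LINE =
        Pattern.compile("\"(?:[^\\\\\"\\r\\n]|\\\\[^\\r\\n])*\"");

    // True when the whole input is one string literal under the rule above.
    static boolean isString(String input) {
        return SINGLE_LINE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isString("\"foo bar\""));      // true
        System.out.println(isString("\"foo\\\\bar\""));   // true  ("foo\\bar")
        System.out.println(isString("\"foo\\\"bar\""));   // true  ("foo\"bar")
        System.out.println(isString("\"foo\\nbar\""));    // true  (backslash + n)
        System.out.println(isString("\"foo\nbar\""));     // false (real line break)
    }
}
```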
I have the following rules for string and comment:
Double_quoted_string : '"' ( ~[\n\r] )* '"' ;
SL_Comment : '//' .*? '\r'? '\n' -> channel(HIDDEN) ;
But I see that for the following input:
printf("Hello \"something "); //printf("Bye ");
the string token getting generated is:
"Hello \"something "); //printf("Bye "
i.e. greedily the longest match is taken, without applying the rule for the comment.
I would like the string only to be "Hello \"something ". How should the rules be modified for this?
Like this
Double_quoted_string
: '"' ( ~[\\"\n\r] | '\\' [\\"] )* '"'
;
Short explanation of the inner ( ... )*:
~[\\"\n\r] matches any char except \, ", \n and \r
'\\' [\\"] matches \\ or \" *
* if you want to escape more, simply add them to the character class: '\\' [\\"'tbnrf] would match \\, \", \', \t, \b, \n, \r and \f
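As a rough illustration of why the escape-aware rule stops at the right quote, here is a small sketch that mimics both rules with java.util.regex. The class FirstStringToken and its members are hypothetical names, and the regexes only approximate the lexer rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstStringToken {
    // Stand-in for: '"' ( ~[\n\r] )* '"'   (the original, too greedy rule)
    static final Pattern NAIVE = Pattern.compile("\"[^\\n\\r]*\"");

    // Stand-in for: '"' ( ~[\\"\n\r] | '\\' [\\"] )* '"'
    static final Pattern ESCAPE_AWARE =
        Pattern.compile("\"(?:[^\\\\\"\\n\\r]|\\\\[\\\\\"])*\"");

    // Returns the first substring matched by the pattern, or null if none.
    static String first(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "printf(\"Hello \\\"something \"); //printf(\"Bye \");";
        System.out.println(first(NAIVE, line));
        // "Hello \"something "); //printf("Bye "
        System.out.println(first(ESCAPE_AWARE, line));
        // "Hello \"something "
    }
}
```

Because an unescaped " is excluded from the loop, the match cannot run past the closing quote, so the comment rule gets its chance on the rest of the line.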
The requirement for the assignment is:
"Illegal escape in string: " + wrong string: When the lexer detects an illegal
escape in string. The wrong string is from the beginning of the string to the
illegal escape.
All the supported escape sequences are as follows:
\b backspace
\f formfeed
\r carriage return
\n newline
\t horizontal tab
\' single quote
\" double quote
\\ backslash
I use the rule for "String" that this post recommended:
ANTLR4 - Need an explanation on this String Literals
STRINGLIT: '"' ( '\\' [btnfr"'\\] | ~[\b\t\f\r\n\\"] )* '"';
I also adapted it slightly for "Unterminated (or Unclosed) String" as follows:
UNCLOSE_STRING: '"' ( '\\' [btnfr"'\\] | ~[\b\t\f\r\n\\"] )* ;
So I tried to write down the prototype for that requirement like this:
ILLEGAL_ESCAPE: '"' .*? ESCAPE ;
fragment ESCAPE: [\b\f\r\n\t'"\\] ;
Can someone help me figure out what I have done wrong? I think the distinction between STRING and ILLEGAL_ESCAPE is not clear, so the result is not right.
I would appreciate it if you could fix it so that it meets the requirement I mentioned earlier. Thanks in advance!
Try to use the following lexer rule:
ILLEGAL_ESCAPE: '"' ('\\' ~[btnfr"'\\] | ~'\\')*;
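For what it's worth, here is one possible reading of the requirement ("from the beginning of the string to the illegal escape") sketched with java.util.regex rather than ANTLR. The class IllegalEscapeCheck and the method illegalPrefix are hypothetical, and the set of raw characters allowed inside the string is simplified (only \, ", CR and LF are excluded), so treat this as an assumption-laden sketch and not as the assignment's reference behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IllegalEscapeCheck {
    // Opening quote, then any run of legal escapes or ordinary characters,
    // then a backslash followed by a character that is NOT one of b t n f r " ' \
    static final Pattern ILLEGAL =
        Pattern.compile("\"(?:\\\\[btnfr\"'\\\\]|[^\\\\\"\\r\\n])*\\\\[^btnfr\"'\\\\]");

    // Returns the text from the opening quote up to and including the
    // illegal escape, or null when the input has no illegal escape.
    static String illegalPrefix(String input) {
        Matcher m = ILLEGAL.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(illegalPrefix("\"abc\\qdef\""));   // "abc\q
        System.out.println(illegalPrefix("\"a\\nb\""));       // null
    }
}
```

The same shape could be tried as a lexer rule (legal body in a loop, followed by one illegal escape), but that is untested here.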
I have this grammar :
grammar Hello;
STRING : '"' ( ESC | ~[\r\n"])* '"' ;
fragment ESC : '\\"' ;
r : STRING;
When I type a string such as:
"my name is : \" StackOverflow \" "
I want the result to be:
"my name is : "StackOverflow" "
But that is not the result I get when I test it.
What should I do to fix it? Your help will be appreciated.
There is no way to handle it in your grammar without targeting a specific language. You either strip the slashes when walking your parse tree in a listener or visitor, or embed target specific code in your grammar.
If Java is your target, you could do this:
STRING
: '"' ( ESC | ~[\r\n"] )* '"'
{
String text = getText();
text = text.substring(1, text.length() - 1);
text = text.replaceAll("\\\\(.)", "$1");
setText(text);
}
;
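The two string operations inside that action can be tried on their own in plain Java, without generating a lexer. A minimal sketch (the class Unescape and the method strip are made-up names; getText() and setText() are replaced by a plain parameter and return value):

```java
public class Unescape {
    // Same steps as the embedded action: drop the surrounding quotes,
    // then replace every backslash + character pair with just the character.
    static String strip(String tokenText) {
        String text = tokenText.substring(1, tokenText.length() - 1);
        return text.replaceAll("\\\\(.)", "$1");
    }

    public static void main(String[] args) {
        String raw = "\"my name is : \\\" StackOverflow \\\" \"";
        System.out.println(raw);         // "my name is : \" StackOverflow \" "
        System.out.println(strip(raw));  // my name is : " StackOverflow " 
    }
}
```

Note that this version also removes the outer quotes; if you want to keep them, skip the substring step and apply the replacement only to the inner text.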
How do I write a lexer rule to match a String literal which does not end in an escaped quote?
Here's my grammar:
lexer grammar StringLexer;
// from The Definitive ANTLR 4 Reference
STRING: '"' (ESC|.)*? '"';
fragment ESC : '\\"' | '\\\\' ;
Here's my java block:
String s = "\"\\\""; // looks like "\"
StringLexer lexer = new StringLexer(new ANTLRInputStream(s));
Token t = lexer.nextToken();
if (t.getType() == StringLexer.STRING) {
System.out.println("Saw a String");
}
else {
System.out.println("Nope");
}
This outputs Saw a String. Should "\" really match STRING?
Edit: Both 280Z28's and Bart's solutions are great; unfortunately I can only accept one.
For properly formed input, the lexer will match the text you expect. However, the use of the non-greedy operator will not prevent it from matching something with the following form:
'"' .*? '"'
To ensure strings are tokenized in the most "sane" way possible, I recommend using the following rules.
StringLiteral
: UnterminatedStringLiteral '"'
;
UnterminatedStringLiteral
: '"' (~["\\\r\n] | '\\' (. | EOF))*
;
If your language allows string literals to span across multiple lines, you would likely need to modify UnterminatedStringLiteral to allow matching end-of-line characters.
If you do not include the UnterminatedStringLiteral rule, the lexer will handle unterminated strings by simply ignoring the opening " character of the string and proceeding to tokenize the content of the string.
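To see how the two rules divide the input between them, here is a rough regex-based sketch in Java. The class StringKind and the method classify are hypothetical, and the regex only approximates the lexer: it matches the UnterminatedStringLiteral body as far as possible and then checks whether a closing " follows.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringKind {
    // Group 1: '"' (~["\\\r\n] | '\\' (. | EOF))*   Group 2: optional closing '"'
    static final Pattern P = Pattern.compile(
        "(\"(?:[^\"\\\\\\r\\n]|\\\\(?:.|\\z))*)(\"?)", Pattern.DOTALL);

    // Returns the name of the rule that would apply at the start of the input,
    // or null if the input does not start with a double quote.
    static String classify(String input) {
        Matcher m = P.matcher(input);
        if (!m.lookingAt()) return null;
        return m.group(2).isEmpty() ? "UnterminatedStringLiteral" : "StringLiteral";
    }

    public static void main(String[] args) {
        System.out.println(classify("\"abc\""));     // StringLiteral
        System.out.println(classify("\"abc"));       // UnterminatedStringLiteral
        System.out.println(classify("\"abc\\\""));   // UnterminatedStringLiteral
    }
}
```

The last case is the interesting one: in "abc\" the final quote is consumed as part of the escape, so no closing quote is left and the string counts as unterminated.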
Yes, "\" is matched by the STRING rule:
   STRING: '"' (ESC|.)*? '"';
            ^       ^     ^
            |       |     |
// matches: "       \     "
If you don't want the . to match the backslash (and quote), do something like this:
STRING: '"' ( ESC | ~[\\"] )* '"';
And if your string can't be spread over multiple lines, do:
STRING: '"' ( ESC | ~[\\"\r\n] )* '"';
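The difference between the two variants can be reproduced with java.util.regex, as a rough approximation of the lexer rules. In this sketch the class LazyVsStrict and its fields are made-up names; ESC is taken from the question ('\\"' | '\\\\').

```java
import java.util.regex.Pattern;

public class LazyVsStrict {
    // Stand-in for: '"' (ESC|.)*? '"'   where the dot may also match a backslash
    static final Pattern LAZY =
        Pattern.compile("\"(?:\\\\\"|\\\\\\\\|.)*?\"", Pattern.DOTALL);

    // Stand-in for: '"' ( ESC | ~[\\"] )* '"'
    static final Pattern STRICT =
        Pattern.compile("\"(?:\\\\\"|\\\\\\\\|[^\\\\\"])*\"");

    static boolean matches(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        String s = "\"\\\"";                       // the three characters "\"
        System.out.println(matches(LAZY, s));      // true: the dot eats the backslash
        System.out.println(matches(STRICT, s));    // false: a backslash must start an ESC
        String ok = "\"a\\\"b\"";                  // "a\"b"
        System.out.println(matches(STRICT, ok));   // true
    }
}
```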
I have the following definition of a fragment:
fragment CHAR :'a'..'z'|'A'..'Z'|'\n'|'\t'|'\\'|EOF;
Now I have to define a lexer rule for strings. I did the following:
STRING : '"'(CHAR)*'"' ;
However, in a string I want to match all of my characters except the newline '\n'. Any ideas how I can achieve that?
You'll also need to exclude " besides line breaks. Try this:
STRING : '"' ~('\r' | '\n' | '"')* '"' ;
The ~ negates char-sets.
But I want to negate only the newline from my CHAR set.
No other way than this AFAIK:
STRING : '"' CHAR_NO_NL* '"' ;
fragment CHAR_NO_NL : 'a'..'z'|'A'..'Z'|'\t'|'\\'|EOF;
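If you want a quick check of which inputs such a rule would accept, here is a small regex-based sketch in Java. The class CharNoNl is a hypothetical name, and the EOF alternative of the fragment is left out because it has no regex counterpart inside a character class.

```java
import java.util.regex.Pattern;

public class CharNoNl {
    // Stand-in for: '"' CHAR_NO_NL* '"'  with CHAR_NO_NL = letters, tab, backslash
    static final Pattern STRING = Pattern.compile("\"[a-zA-Z\\t\\\\]*\"");

    static boolean isString(String input) {
        return STRING.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isString("\"ab\tc\""));   // true  (contains a tab)
        System.out.println(isString("\"a\\b\""));    // true  (contains a backslash)
        System.out.println(isString("\"ab\nc\""));   // false (contains a newline)
    }
}
```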