Where is the InputMismatchException thrown? - antlr4

When I execute my program with a certain token in the wrong spot, it throws the InputMismatchException, saying something along the lines of
line 21:0 mismatched input '#' expecting {'in', '||', '&&', '==', '!=', '>=', '<=', '^', '>', '<', '+', '-', '*', '/', '%', '[', ';', '?'}
That is a terrible error message for the language I'm developing, so I want to change it, but I can't find where it comes from. I know why the error occurs, but I can't find the line of Java code that throws the InputMismatchException. I don't think it's anywhere in my project, so I assume it's somewhere in the ANTLR4 runtime. Is there a way to disable these error messages, or at least change them?
Edit:
The relevant parts of my grammar are as follows:
grammar Q;
parse
: header? ( allImport ';' )*? block EOF
;
block
: ( statement | functionDecl )* ( Return expression ';' )?
;
statement
: functionCall ';'
| ifStatement
| forStatement | forInStatement
| whileStatement
| tryCatchStatement
| mainFunctionStatement
| addWebServerTextStatement ';'
| reAssignment ';'
| classStatement
| constructorStatement ';'
| windowAddCompStatement ';'
| windowRenderStatement ';'
| fileWriteStatement ';'
| verifyFileStatement ';'
| objFunctionCall (';')?
| objCreateStatement ';'
| osExecStatement ';'
| anonymousFunction
| hereStatement ';'
;
And an example of the importStatement visit method is:
@Override
public QValue visitImportStatement(ImportStatementContext ctx) {
StringBuilder path = new StringBuilder();
StringBuilder text = new StringBuilder();
for (TerminalNode o : ctx.Identifier()) {
path.append("/").append(o.getText());
}
for (TerminalNode o : ctx.Identifier()) {
text.append(".").append(o.getText());
}
if (lang.allLibs.contains(text.toString().replace(".q.", "").toLowerCase(Locale.ROOT))) {
lang.parse(text.toString());
return QValue.VOID;
}
for (File f : lang.parsed) {
Path currentRelativePath = Paths.get("");
String currentPath = currentRelativePath.toAbsolutePath().toString();
File file = new File(currentPath + "/" + path + ".l");
if (f.getPath().equals(file.getPath())) {
return null;
}
}
QLexer lexer = null;
Path currentRelativePath = Paths.get("");
String currentPath = currentRelativePath.toAbsolutePath().toString();
File file = new File(currentPath + "/" + path + ".l");
lang.parsed.add(file);
try {
lexer = new QLexer(CharStreams.fromFileName(currentPath + "/" + path + ".l"));
} catch (IOException e) {
throw new Problem("Library or File not found: " + path, ctx);
}
QParser parser = new QParser(new CommonTokenStream(lexer));
parser.setBuildParseTree(true);
ParseTree tree = parser.parse();
Scope s = new Scope(lang.scope, false);
Visitor v = new Visitor(s, new HashMap<>());
v.visit(tree);
return QValue.VOID;
}
Because of the parse rule in my g4 file, the import statement MUST come before anything else (aside from a header statement), so the following throws an error:
class Main
#import src.main.QFiles.aLib;
fn main()
try
std::ln("orih");
onflaw
end
new Object as o();
o::set("val");
std::ln(o::get());
std::ln("itj");
end
end
And, as expected, it throws an InputMismatchException, but that's not in any of my code

You can remove the default error listener and register your own:
...
QParser parser = new QParser(new CommonTokenStream(lexer));
parser.removeErrorListeners();
parser.addErrorListener(new BaseErrorListener() {
@Override
public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine, String msg, RecognitionException e) {
throw new RuntimeException("Your own message here", e);
}
});
ParseTree tree = parser.parse();
...
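As for where the exception comes from: in the ANTLR 4 Java runtime, InputMismatchException is created inside DefaultErrorStrategy (in recoverInline and sync), the message text is built in DefaultErrorStrategy.reportInputMismatch, and the default ConsoleErrorListener prints it to stderr with the "line L:C" prefix. That is why the throw site is not in your project. If you would rather reword the message than replace it wholesale, here is a minimal sketch. It assumes the message keeps the "mismatched input X expecting Y" shape quoted in the question; the class FriendlySyntaxMessages and its reword method are made-up names for illustration and are not part of ANTLR.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of ANTLR): rewords the default
// "mismatched input X expecting Y" message into something shorter.
final class FriendlySyntaxMessages {
    // Assumed message shape, taken from the error quoted in the question.
    private static final Pattern MISMATCH =
            Pattern.compile("^mismatched input (.+?) expecting (.+)$", Pattern.DOTALL);

    private FriendlySyntaxMessages() {}

    static String reword(int line, int charPositionInLine, String antlrMessage) {
        String prefix = "Line " + line + ", column " + charPositionInLine + ": ";
        Matcher m = MISMATCH.matcher(antlrMessage);
        if (m.matches()) {
            // group(1) is the offending token as ANTLR displayed it, e.g. '#'.
            return prefix + m.group(1) + " is not allowed here";
        }
        // Unknown shape: keep the original text rather than guess.
        return prefix + antlrMessage;
    }
}
```

Inside the syntaxError override above, you could then write something like throw new RuntimeException(FriendlySyntaxMessages.reword(line, charPositionInLine, msg), e); instead of a fixed string.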

Related

ANTLR4 - named function arguments

My goal is to generate parser that could handle following code with named function parameters and nested function calls
fnCallY(namedArgStr = "xxx", namedArgZ=fnCallZ(namedArg="www"))
G4 lang file:
val : type_string
| function_call
;
function_call : function_name=ID arguments='('argument? (',' argument)* ')';
argument : name=ID '=' value=val ;
ID : [a-zA-Z_][a-zA-Z0-9_]*;
type_string : LITERAL;
fragment ESCAPED_QUOTE : '\\"';
LITERAL : '"' ( ESCAPED_QUOTE | ~('\n'|'\r') )*? '"'
| '\'' ( ESCAPED_QUOTE | ~('\n'|'\r') )*? '\'';
@Override
public void exitFunction_call(Test.Function_callContext ctx) {
List<Test.ArgumentContext> argument = ctx.argument();
for (Test.ArgumentContext arg : argument) {
Token name = arg.name;
Test.ValContext value = arg.value;
if (value.type_literal() == null || value.function_call() == null) {
throw new RuntimeException("Could not parse argument value");
}
}
}
arg.name holds the correct data, but I cannot get the parser to parse the part after the =.
The parser is recognizing the argument values.
(It's really valuable to learn the grun command line utility as it can test the grammar and tree structure without involving any of your own code)
This condition would appear to be your problem:
if (value.type_literal() == null || value.function_call() == null)
One or the other will always be null, so this condition is always true and the exception is thrown for every argument.
if (value.type_literal() == null && value.function_call() == null)
is probably what you want.
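To see the difference without running the parser, here is a small stand-alone sketch. It uses plain objects as stand-ins for the two generated accessors (literal and call are placeholder names, not the generated ANTLR API) and relies on the fact that a successfully parsed val matches exactly one alternative.

```java
// Stand-alone sketch: 'literal' and 'call' are placeholders for the two
// accessors on the generated ValContext; exactly one is non-null per val.
final class ArgumentCheck {
    private ArgumentCheck() {}

    // Original condition: true whenever at least one alternative is missing,
    // which is the case for every successfully parsed val.
    static boolean rejectedWithOr(Object literal, Object call) {
        return literal == null || call == null;
    }

    // Corrected condition: true only when neither alternative matched.
    static boolean rejectedWithAnd(Object literal, Object call) {
        return literal == null && call == null;
    }
}
```

With a string literal present and no function call, rejectedWithOr reports a failure while rejectedWithAnd does not; only when both are null do the two agree.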

Xtext: referring to an element from a different file does not work

Hello, I have two files in my Xtext editor: the first contains all the definitions and the second contains the recipe to execute. The grammar looks like this:
ServiceAutomationProgram:
('package' name=QualifiedName ';')?
imports+=ServiceAutomationImport*
definitions+=Definition*;
ServiceAutomationImport:
'import' importedNamespace=QualifiedNameWithWildcard ';';
Definition:
'define' ( TypeDefinition | ServiceDefinition |
SubRecipeDefinition | RecipeDefinition) ';';
TypeDefinition:
'quantity' name=ID ;
SubRecipeDefinition:
'subrecipe' name=ID '('( subRecipeParameters+=ServiceParameterDefinition (','
subRecipeParameters+=ServiceParameterDefinition)*)? ')' '{'
recipeSteps+=RecipeStep*
'}';
RecipeDefinition:
'recipe' name=ID '{' recipeSteps+=RecipeStep* '}';
RecipeStep:
(ServiceInvocation | SubRecipeInvocation) ';';
SubRecipeInvocation:
name=ID 'subrecipe' calledSubrecipe=[SubRecipeDefinition] '('( parameters+=ServiceInvocationParameter (',' parameters+=ServiceInvocationParameter)* )?')'
;
ServiceInvocation:
name=ID 'service' service=[ServiceDefinition]
'(' (parameters+=ServiceInvocationParameter (',' parameters+=ServiceInvocationParameter)*)? ')'
;
ServiceInvocationParameter:
ServiceEngineeringQuantityParameter | SubRecipeParameter
;
ServiceEngineeringQuantityParameter:
parameterName=[ServiceParameterDefinition] value=Amount;
ServiceDefinition:
'service' name=ID ('inputs' serviceInputs+=ServiceParameterDefinition (','
serviceInputs+=ServiceParameterDefinition)*)?;
ServiceParameterDefinition:
name=ID ':' (parameterType=[TypeDefinition])
;
SubRecipeParameter:
parameterName=[ServiceParameterDefinition]
;
QualifiedNameWithWildcard:
QualifiedName '.*'?;
QualifiedName:
ID ('.' ID)*;
Amount:
INT ;
....
Definition file (file.mydsl):
define quantity Temperature;
define service Heater inputs SetTemperature:Temperature;
define subrecipe sub_recursive() {
Heating1 service Heater(SetTemperature 10);
};
....
Recipe file (secondsfile.mydsl):
define recipe Main {
sub1 subrecipe sub_recursive();
};
.....
My generator looks like this:
override void doGenerate(Resource resource, IFileSystemAccess2 fsa, IGeneratorContext context) {
for (e : resource.allContents.toIterable.filter(RecipeDefinition)) {
e.class // just for demonstration: set a breakpoint here and traverse down the tree
}
}
As an example, I need the information at RecipeDefinition.recipesteps.subrecipeinvocation.calledsubrecipe.recipesteps.serviceinvocation.service.name, which is not accessible (null). So some of the deeply nested information gets lost (maybe due to lazy linking?).
To make the project runnable, also add the following to the scope provider:
public IScope getScope(EObject context, EReference reference) {
if (context instanceof ServiceInvocationParameter
&& reference == MyDslPackage.Literals.SERVICE_INVOCATION_PARAMETER__PARAMETER_NAME) {
ServiceInvocationParameter invocationParameter = (ServiceInvocationParameter) context;
List<ServiceParameterDefinition> candidates = new ArrayList<>();
if(invocationParameter.eContainer() instanceof ServiceInvocation) {
ServiceInvocation serviceCall = (ServiceInvocation) invocationParameter.eContainer();
ServiceDefinition calledService = serviceCall.getService();
candidates.addAll(calledService.getServiceInputs());
if(serviceCall.eContainer() instanceof SubRecipeDefinition) {
SubRecipeDefinition subRecipeCall=(SubRecipeDefinition) serviceCall.eContainer();
candidates.addAll(subRecipeCall.getSubRecipeParameters());
}
return Scopes.scopeFor(candidates);
}
else if(invocationParameter.eContainer() instanceof SubRecipeInvocation) {
SubRecipeInvocation serviceCall = (SubRecipeInvocation) invocationParameter.eContainer();
SubRecipeDefinition calledSub = serviceCall.getCalledSubrecipe();
candidates.addAll(calledSub.getSubRecipeParameters());
return Scopes.scopeFor(candidates);
}
}
return super.getScope(context, reference);
}
When I put everything in the same file it works, and it also works the first time the generator runs after launching the runtime. Afterwards, when doGenerate is triggered by saving in the editor, some information is missing. Any idea how to get at the missing information? Thanks a lot!

Semantically disambiguating an ambiguous syntax

Using Antlr 4 I have a situation I am not sure how to resolve. I originally asked the question at https://groups.google.com/forum/#!topic/antlr-discussion/1yxxxAvU678 on the Antlr discussion forum. But that forum does not seem to get a lot of traffic, so I am asking again here.
I have the following grammar:
expression
: ...
| path
;
path
: ...
| dotIdentifierSequence
;
dotIdentifierSequence
: identifier (DOT identifier)*
;
The concern here is that dotIdentifierSequence can mean a number of things semantically, and not all of them are "paths". But at the moment they are all recognized as paths in the parse tree and then I need to handle them specially in my visitor.
But what I'd really like is a way to express the dotIdentifierSequence usages that are not paths into the expression rule rather than in the path rule, and still have dotIdentifierSequence in path to handle path usages.
To be clear, a dotIdentifierSequence might be any of the following:
A path - this is a SQL-like grammar and a path expression would be like a table or column reference in SQL, e.g. a.b.c
A Java class name - e.g. com.acme.SomeJavaType
A static Java field reference - e.g. com.acme.SomeJavaType.SOME_FIELD
A Java enum value reference - e.g. com.acme.Gender.MALE
The idea is that during visitation "dotIdentifierSequence as a path" resolves as a very different type from the other usages.
Any idea how I can do this?
The issue here is that you're trying to distinguish between different kinds of "paths" while they are being built in the parser. Constructing paths inside the lexer would be easier (pseudo code follows):
grammar T;
tokens {
JAVA_TYPE_PATH,
JAVA_FIELD_PATH
}
// parser rules
PATH
: IDENTIFIER ('.' IDENTIFIER)*
{
String s = getText();
if (s is a Java class) {
setType(JAVA_TYPE_PATH);
} else if (s is a Java field) {
setType(JAVA_FIELD_PATH);
}
}
;
fragment IDENTIFIER : [a-zA-Z_] [a-zA-Z_0-9]*;
and then in the parser you would do:
expression
: JAVA_TYPE_PATH #javaTypeExpression
| JAVA_FIELD_PATH #javaFieldExpression
| PATH #pathExpression
;
But then, of course, input like this java./*comment*/lang.String would be tokenized wrongly.
Handling it all in the parser would mean manually looking ahead in the token stream and checking if either a Java type, or field exists.
A quick demo:
grammar T;
@parser::members {
String getPathAhead() {
Token token = _input.LT(1);
if (token.getType() != IDENTIFIER) {
return null;
}
StringBuilder builder = new StringBuilder(token.getText());
// Try to collect ('.' IDENTIFIER)*
for (int stepsAhead = 2; ; stepsAhead += 2) {
Token expectedDot = _input.LT(stepsAhead);
Token expectedIdentifier = _input.LT(stepsAhead + 1);
if (expectedDot.getType() != DOT || expectedIdentifier.getType() != IDENTIFIER) {
break;
}
builder.append('.').append(expectedIdentifier.getText());
}
return builder.toString();
}
boolean javaTypeAhead() {
String path = getPathAhead();
if (path == null) {
return false;
}
try {
return Class.forName(path) != null;
} catch (Exception e) {
return false;
}
}
boolean javaFieldAhead() {
String path = getPathAhead();
if (path == null || !path.contains(".")) {
return false;
}
int lastDot = path.lastIndexOf('.');
String typeName = path.substring(0, lastDot);
String fieldName = path.substring(lastDot + 1);
try {
Class<?> clazz = Class.forName(typeName);
return clazz.getField(fieldName) != null;
} catch (Exception e) {
return false;
}
}
}
expression
: {javaTypeAhead()}? path #javaTypeExpression
| {javaFieldAhead()}? path #javaFieldExpression
| path #pathExpression
;
path
: dotIdentifierSequence
;
dotIdentifierSequence
: IDENTIFIER (DOT IDENTIFIER)*
;
IDENTIFIER
: [a-zA-Z_] [a-zA-Z_0-9]*
;
DOT
: '.'
;
which can be tested with the following class:
package tl.antlr4;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.misc.NotNull;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
public class Main {
public static void main(String[] args) {
String[] tests = {
"mu",
"tl.antlr4.The",
"java.lang.String",
"foo.bar.Baz",
"tl.antlr4.The.answer",
"tl.antlr4.The.ANSWER"
};
for (String test : tests) {
TLexer lexer = new TLexer(new ANTLRInputStream(test));
TParser parser = new TParser(new CommonTokenStream(lexer));
ParseTreeWalker.DEFAULT.walk(new TestListener(), parser.expression());
}
}
}
class TestListener extends TBaseListener {
@Override
public void enterJavaTypeExpression(@NotNull TParser.JavaTypeExpressionContext ctx) {
System.out.println("JavaTypeExpression -> " + ctx.getText());
}
@Override
public void enterJavaFieldExpression(@NotNull TParser.JavaFieldExpressionContext ctx) {
System.out.println("JavaFieldExpression -> " + ctx.getText());
}
@Override
public void enterPathExpression(@NotNull TParser.PathExpressionContext ctx) {
System.out.println("PathExpression -> " + ctx.getText());
}
}
class The {
public static final int ANSWER = 42;
}
which would print the following to the console:
PathExpression -> mu
JavaTypeExpression -> tl.antlr4.The
JavaTypeExpression -> java.lang.String
PathExpression -> foo.bar.Baz
PathExpression -> tl.antlr4.The.answer
JavaFieldExpression -> tl.antlr4.The.ANSWER
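If you would rather keep the grammar free of predicates and continue deciding in the visitor, the same reflection checks can live in a small helper that classifies the text of a dotIdentifierSequence. The following is only a sketch under that assumption; DottedNames, Kind and classify are hypothetical names, and treating "public field found by reflection" as the test for a field or enum reference is a simplification.

```java
// Hypothetical helper for use from a visitor: classifies a dotted name by reflection.
final class DottedNames {
    enum Kind { JAVA_TYPE, JAVA_FIELD, PATH }

    private DottedNames() {}

    static Kind classify(String dotted) {
        if (loadOrNull(dotted) != null) {
            return Kind.JAVA_TYPE;
        }
        int lastDot = dotted.lastIndexOf('.');
        if (lastDot > 0) {
            // Everything before the last dot may be a type, the rest a field or enum constant.
            Class<?> owner = loadOrNull(dotted.substring(0, lastDot));
            if (owner != null && hasPublicField(owner, dotted.substring(lastDot + 1))) {
                return Kind.JAVA_FIELD;
            }
        }
        return Kind.PATH;
    }

    // initialize=false: look the class up without running its static initializers.
    private static Class<?> loadOrNull(String name) {
        try {
            return Class.forName(name, false, DottedNames.class.getClassLoader());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    private static boolean hasPublicField(Class<?> owner, String field) {
        try {
            owner.getField(field);
            return true;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }
}
```

A visitor method for the path rule could call DottedNames.classify(ctx.getText()) and build a different node type for each Kind. Enum constants are public static fields, so a reference such as com.acme.Gender.MALE would come back as JAVA_FIELD in this sketch.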

ANTLR4: Error message with complete offending source code line

When an error occurs, I want my compiler to generate an error message like clang does, one that contains the complete offending source code line.
Example:
1.c:3:7: error: use of undeclared identifier 'x'
if ( x== y) {
^
I have extended the ANTLR BaseErrorListener, but I have no idea how to get the offending source code line (as opposed to just the line number) as a string.
That's a simple thing. You get line number and char position in your error info. Use that to locate the position in your input. Then scan back and forward for line breaks. The text between those linebreaks is your source code line.
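The scan described above can be sketched as follows. This assumes the whole input is available as a String and that you have a character offset into it (for example the start index of the offending token); SourceLines and lineContaining are hypothetical names.

```java
// Hypothetical helper: returns the full line that contains the given character offset.
final class SourceLines {
    private SourceLines() {}

    static String lineContaining(String source, int offset) {
        int clamped = Math.max(0, Math.min(offset, source.length()));
        // Scan back to the previous line break (or the start of the input)...
        int start = source.lastIndexOf('\n', clamped - 1) + 1;
        // ...and forward to the next one (or the end of the input).
        int end = source.indexOf('\n', clamped);
        if (end < 0) {
            end = source.length();
        }
        String line = source.substring(start, end);
        // Drop a trailing carriage return from Windows-style line endings.
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }
}
```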
I created 2 methods in my extension of BaseErrorListener. One to get the source code line and another to get the '^' sign:
private String getSourceLine(String src, int line) {
String currentLine = "";
try (Scanner fileScanner = new Scanner(new File(src))) {
int currentLineNumber = 1;
while (fileScanner.hasNextLine()) {
currentLine = fileScanner.nextLine();
if (currentLineNumber == line) {
return currentLine + "\n";
}
currentLineNumber++;
}
} catch (Exception e) {
currentLine = "\n";
}
return currentLine;
}
private String getPointer(int charPosition) {
return StringUtils.repeat(' ', charPosition) + '^';
}
Calling them with:
String source = recognizer.getInputStream().getSourceName();
errorMessage += this.getSourceLine(source, line);
errorMessage += this.getPointer(charPositionInLine);
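If the input does not come from a file (so getSourceName() is not a usable path), the same message can be built from the source text held in memory. The sketch below assumes a 1-based line and a 0-based column, which is how ANTLR reports them; Diagnostics and format are hypothetical names, and it avoids the StringUtils dependency.

```java
// Hypothetical formatter: builds a clang-style diagnostic from in-memory source text.
final class Diagnostics {
    private Diagnostics() {}

    static String format(String fileName, String source, int line, int column, String message) {
        String[] lines = source.split("\r?\n", -1);
        // 'line' is 1-based, 'column' is 0-based; fall back to an empty line if out of range.
        String text = (line >= 1 && line <= lines.length) ? lines[line - 1] : "";
        StringBuilder pointer = new StringBuilder();
        for (int i = 0; i < column; i++) {
            // Copy tabs so the caret stays aligned with tab-indented code.
            pointer.append(i < text.length() && text.charAt(i) == '\t' ? '\t' : ' ');
        }
        pointer.append('^');
        return fileName + ":" + line + ":" + column + ": error: " + message + "\n"
                + text + "\n" + pointer;
    }
}
```

Copying tabs into the pointer line is a design choice: it keeps the caret under the right character regardless of the tab width of the terminal.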

Syntax error on my Groovy script?

I am using GroovyShell (2.1.7) to dynamically evaluate some Groovy code that I have stored off as a string.
GroovyShell shell = magicallyInstantiateAndBindGroovyShell();
The above method takes care of instantiating the shell, and binding all the required variables to it. Since I believe this is a syntax error, I won't clutter this question with all the variables the shell is being bound with, and what the code I'm trying to evaluate is actually doing. If it turns out that I need to add any more info to the question to help solve my problem, I'll happily oblige!
I then have a string of Groovy code that I am trying to evaluate:
com.me.myorg.myapp.ExpressionUtils.metaClass.filterMetadata = {
com.me.myorg.myapp.model.WidgetVO widget, List<String> properties ->
WidgetVO toReturn = new WidgetVO();
toReturn.setFizz(widget.getFizz());
if(widget.getBuzz().equalsIgnoreCase("BIMDER")) {
toReturn.setMode(widget.getMode());
}
for(String property : properties) {
if("some.prop".equals(property)) {
Preconditions.checkNotNull(widget.getDescriptions());
toReturn.setDescriptions(new ArrayList<DescriptionVO>());
DescriptionVO description = widget.getDescriptions().get(0);
toReturn.getDescriptions().add(description);
} else if("another.prop".equals(property)) {
Preconditions.checkNotNull(widget.getTitles().get(0));
toReturn.setTitles(new ArrayList<TitleVO>());
TitleVO title = widget.getTitles().get(0);
toReturn.getTitles().add(title);
}
}
return toReturn;
};
Which I actually have stored off as a string variable:
String code = "com.me.myorg.myapp.ExpressionUtils.metaClass.filterMetadata = { com.me.myorg.myapp.model.WidgetVO widget, List<String> properties -> WidgetVO toReturn = new WidgetVO(); toReturn.setFizz(widget.getFizz()); if(widget.getBuzz().equalsIgnoreCase(\"BIMDER\")) { toReturn.setMode(widget.getMode()); } for(String property : properties) { if(\"some.prop\".equals(property)) { Preconditions.checkNotNull(widget.getDescriptions()); toReturn.setDescriptions(new ArrayList<DescriptionVO>()); DescriptionVO description = widget.getDescriptions().get(0); toReturn.getDescriptions().add(description); } else if(\"another.prop\".equals(property)) { Preconditions.checkNotNull(widget.getTitles().get(0)); toReturn.setTitles(new ArrayList<TitleVO>()); TitleVO title = widget.getTitles().get(0); toReturn.getTitles().add(title); } } return toReturn; };
When I run:
shell.evaluate(code);
I get the following exception:
startup failed, Script1.groovy: 1: unexpected token: for @ line 1, column 294.
1 error
No signature of method: com.me.myorg.myapp.ExpressionUtils.metaClass.filterMetadata() is applicable for argument types: (com.me.myorg.myapp.model.WidgetVO, java.util.ArrayList) values: {com.me.myorg.myapp.model.WidgetVO@9427908c, ["some.prop", "another.prop"]}
Column 294 is the beginning of the for-loop... but to me, this seems like perfectly fine code. Am I forgetting a closing bracket anywhere? Some other syntax error? Where am I going awry? Thanks in advance!
You have:
if(widget.getBuzz().equalsIgnoreCase(\"BIMDER\")) { toReturn.setMode(widget.getMode()); } for(String property : properties)
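Because the whole script is on a single line, there is no line break to end the if statement, so you need a semicolon between its closing } and the for.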
Why not use a multi-line string?
String code = """com.me.myorg.myapp.ExpressionUtils.metaClass.filterMetadata = { com.me.myorg.myapp.model.WidgetVO widget, List<String> properties ->
| WidgetVO toReturn = new WidgetVO()
| toReturn.setFizz(widget.getFizz())
| if( widget.getBuzz().equalsIgnoreCase( "BIMDER" ) ) {
| toReturn.setMode(widget.getMode())
| }
| for( String property : properties ) {
| if( "some.prop" == property ) {
| Preconditions.checkNotNull( widget.descriptions )
| toReturn.descriptions = [ widget.descriptions[ 0 ] ]
| }
| else if( "another.prop" == property ) {
| Preconditions.checkNotNull( widget.titles[ 0 ] )
| toReturn.titles = [ widget.titles[ 0 ] ]
| }
| }
| toReturn
|}""".stripMargin()
