customized error reporting in v4 - antlr4

Another question on migrating code from v3 to v4:
For v3, I had a customized error reporting, using code like this (in the grammar file):
#members {
public void displayRecognitionError(String[] tokenNames,
RecognitionException e) {
String hdr = getErrorHeader(e);
String msg = getErrorMessage(e, tokenNames);
System.out.println("ERR:"+hdr+":"+msg);
errCount += 1;
}
}
In v4, when compiling the generated java files, I am getting the error:
MyParser.java:163: cannot find symbol
symbol : method getErrorMessage(org.antlr.v4.runtime.RecognitionException,java.lang.String[])
location: class MyParser
String msg = getErrorMessage(e, tokenNames);
^
Is this function replaced by some other function in v4? (I saw some questions and answers on ANTLRErrorListener, but I could not get how to use it for my situation.)

The displayRecognitionError method was removed in ANTLR 4, so even if you correct the body of that method it will not do anything. You need to remove the method from your grammar entirely, and implement ANTLRErrorListener instead. The documentation includes a list of classes that implement the interface, so you can reference those and/or extend one of them to produce the desired functionality.
Once you have an instance of an ANTLRErrorListener, you can use the following code to attach it to a Parser instance.
// remove the default error listener
parser.removeErrorListeners();
// add your custom error listener
parser.addErrorListener(listener);

Related

Setting types of parsed values in Antlr

I have a rule that looks like this:
INTEGER : [0-9]+;
myFields : uno=INTEGER COMMA dos=INTEGER
Right now to access uno I need to code:
Integer i = Integer.parseInt(myFields.uno.getText())
It would be much cleaner if I could tell antler to do that conversion for me; then I would just need to code:
Integer i = myFields.uno
What are my options?
You could write the code as action, but it would still be explicit conversion (eventually). The parser (like every parser) parses the text and then it's up to "parsing events" (achieved by listener or visitor or actions in ANTLR4) to create meaningful structures/objects.
Of course you could extend some of the generated or built-in classes and then get the type directly, but as mentioned before, at some point you'll always need to convert text to some type needed.
A standard way of handling custom operations on tokens is to embed them in a custom token class:
public class MyToken extends CommonToken {
....
public Integer getInt() {
return Integer.parseInt(getText()); // TODO: error handling
}
}
Also create
public class MyTokenFactory extends TokenFactory { .... }
to source the custom tokens. Add the factory to the lexer using Lexer#setTokenFactory().
Within the custom TokenFactory, override the method
Symbol create(int type, String text); // (typically override both factory methods)
to construct and return a new MyToken.
Given that the signature includes the target token type type, custom type-specific token subclasses could be returned, each with their own custom methods.
Couple of issues with this, though. First, in practice, it is not typically needed: the assignment var is statically typed, so as in the the OP example,
options { TokenLabelType = "MyToken"; }
Integer i = myFields.uno.getInt(); // no cast required
If Integer is desired & expected, use getInt(). If Boolean ....
Second, ANTLR options allows setting a TokenLabelType to preclude the requirement to manually cast custom tokens. Use of only one token label type is supported. So, to use multiple token types, manual casting is required.

Get rid of token recognition error

How can I get rid of the default ANTLR recognition error?
I want to write another message using my own error class instead of ANTLR's error.
I mean is there any possibility that some ANTLR error classes can be extended in order to display my own message?
More clearly, I do not want to see the following error message in my console :
token recognition error at:
If you simply want to suppress the messages, you can call lexer.removeErrorListeners(). However, a better approach is writing your lexer rules such that all possible input is tokenized, with the following rule at the end of the lexer. This will cause all error reporting to go through the parser instead of both the parser and lexer.
// handle characters which failed to match any other token
ErrorCharacter : . ;
In order to create a custom error handler you can extend the BaseErrorListener class and override the syntaxError method, e.g.:
public class MyErrorListener extends BaseErrorListener {
#Override
public void syntaxError( Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine,
String msg, RecognitionException e ) {
// method arguments should be used for more detailed report
throw new RuntimeException("syntax error occurred");
}
}
Now when you create a lexer and a parser you should remove the default error listeners and attach your custom one:
MyErrorListener errorListener = new MyErrorListener();
Lexer lexer = new MyLexer( ... );
lexer.removeErrorListeners();
lexer.addErrorListener( errorListener );
CommonTokenStream tokens = new CommonTokenStream( lexer );
Parser parser = new MyParser( tokens );
parser.removeErrorListeners();
parser.addErrorListener( errorListener );
The default message "line x:x token recognition error at: 'xxx'" comes from the default ConsoleErrorListener class. If you don't remove it using lexer/parser.removeErrorListeners() and only add your custom one it will still be triggered.
The error handling strategies are thoroughly described in a dedicated chapter of The Definitive ANTLR4 Reference book (mentioned on the ANTLR4 Documentation page). I currently have no access to the book itself, so would be grateful if someone edits this answer with a concrete page number of the book. Also, I couldn't find a related guide on the ANTLR4 doc page, so if it exists - a link would be helpful, too.

How to collect errors during run time given by a parser in Antlr4

I have upgraded from Antlr 3 to Antlr 4. I was using this code to catch exceptions using this code. But this is not working for Antlr 4.
partial class XParser
{
public override void ReportError(RecognitionException e)
{
base.ReportError(e);
Console.WriteLine("Error in Parser at line " + ":" + e.OffendingToken.Column + e.OffendingToken.Line + e.Message);
}
}
This is the error that appears
'Parser.ReportError(Antlr4.Runtime.RecognitionException)': no suitable method found to override
In Antlr 4 what is the expected way of accumulating errors that occurs in the input stream. I was unable to find a way to achieve this on the net. Please provide me some guidelines.
EDIT:
I have implemented the XParser as below
partial class XParser : IAntlrErrorListener<IToken>
{
public void SyntaxError(IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
{
Console.WriteLine("Error in parser at line " + ":" + e.OffendingToken.Column + e.OffendingToken.Line + e.Message);
}
}
As you said I can extend this parser class using any of the mentioned classes. But I was unable to register this listener, in the main program I am confused with passing argument as the listener. Please help me with the registering.
As I can see these classes has the capability of producing more meaningful error messages don't they?
You need to implement IAntlrErrorListener<IToken>. If all you want to is report errors like you have above, then you should focus on the SyntaxError method. Several base classes are available if you want to extend one.
ConsoleErrorListener
BaseErrorListener
DiagnosticErrorListener
The error listener is attached to the parser instance by calling parser.AddErrorListener(listener).
Edit: You need to create a new class which implements the error listener interface. You then attach the listener to the parser. The parser itself will not implement the error listener interface.

C#<-->JScript: invisible array?

I have a bit of a complex program which is giving me this apparently phantom error...
I'll start explaining with the help of this little example program I rigged that can throw my beautiful exception for anyone who runs it.
<!-- language: c# -->
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
namespace so_redux {
static class Program {
[STAThread]
static void Main(){
CS2JS _class=new CS2JS();
}
}//class Program
class CS2JS{
public CS2JS(){
Func<String,Object> js_eval=initJS();
Object cs_ic=initCS();
string xc;
object res;
cs_ic.GetType().GetMethod("init").Invoke(cs_ic,null);
AppDomain.CurrentDomain.SetData("trespasser",cs_ic);
xc=#"function edata(fieldname:String,ival:String):Object{
var sob=System.AppDomain.CurrentDomain.GetData('trespasser');
var v1=sob.GetType().GetField(fieldname).GetValue(sob);
function HASH(s1:String,s2:String):Object{
var q1=sob.GetType().GetField(s1).GetValue(sob);
return q1.ITEM(s2);
}
var v2=v1.ITEM(ival);
return eval(v2);
}
edata('HT','foo');";
res=js_eval(xc);
// var xx;xx=new Hashtable();xx['sda']='1';eval(xx['sda']); OK
}
Func<String,Object> initJS(){
System.CodeDom.Compiler.CodeDomProvider jcc;
System.CodeDom.Compiler.CompilerParameters jcp;
System.CodeDom.Compiler.CompilerResults jcr;
System.Reflection.Assembly jas;
object jis;
string code;
Type t_ii;
code=#"#set #debug(off)
import System;
import System.Collections;
import System.Collections.Generic;
package internal_namespace{class internal_class{
public function internal_method(_code:String):Object{
return eval(_code);
}
}
}";
jcc=Microsoft.JScript.JScriptCodeProvider.CreateProvider("JScript");
jcp=new System.CodeDom.Compiler.CompilerParameters();
jcp.CompilerOptions="/fast-";
jcp.GenerateExecutable=false;
jcp.GenerateInMemory=true;
jcp.IncludeDebugInformation=false;
jcp.ReferencedAssemblies.Add("System.dll");
jcp.ReferencedAssemblies.Add("System.Core.dll");
jcp.WarningLevel=4;
jcr=jcc.CompileAssemblyFromSource(jcp,code);
jas=jcr.CompiledAssembly;
jis=jas.CreateInstance("internal_namespace.internal_class");
t_ii=jas.GetType("internal_namespace.internal_class");
return (s1)=>t_ii.InvokeMember("internal_method",System.Reflection.BindingFlags.InvokeMethod,null,jis,new object[]{s1});
}//initJS
Object initCS(){
var v1= Microsoft.CSharp.CSharpCodeProvider.CreateProvider("CSharp");
System.CodeDom.Compiler.CompilerParameters v2=new System.CodeDom.Compiler.CompilerParameters();
v2.GenerateExecutable=false;
v2.GenerateInMemory=true;
v2.IncludeDebugInformation=false;
v2.WarningLevel=4;
v2.ReferencedAssemblies.Add("System.dll");
v2.ReferencedAssemblies.Add("System.Core.dll");
string _code="public Hashtable2 HT;"+
"public void init(){"+
"HT=new Hashtable2();"+
"HT[\"foo\"]=\"1\";"+
"HT[\"bar\"]=\"HASH('HT','foo')\";"+
"}";
var v3="using System;using System.Collections;using System.Collections.Generic;"+
"namespace internal_namespace{public class Hashtable2:Hashtable{"+
"public Hashtable2():base(){} public Hashtable2(int N):base(N){}"+
"public Object ITEM(Object K){return this[K];} }"+
"[Serializable]public class internal_class{"+
"public internal_class(){}"+
_code+
"\n}}";
var v4=v1.CompileAssemblyFromSource(v2,v3);
var v5=v4.CompiledAssembly;
var v6=v5.GetType("internal_namespace.internal_class");
var v7=v5.CreateInstance("internal_namespace.internal_class");
return v7;
}//initCS
}//class CS2JS
}//namespace so_redux
The exception that is thrown is "index out of bounds", and it's thrown from JScript. The problem? It's that there is no array!
What this code is doing: first a JScript interpreter is initialized by compiling at runtime a class that "exports" an eval (one could do a dll, but in this case I didn't).
Then a C# assembly is compiled, an assembly that "exports" some user code (the variable _code in initCS is originally loaded by reading a text file).
After the initialization of the newly compiled class (the invoking of init()), I need the two assemblies (JScript and C#) to interact, so I need to pass data between them, and I thought of using AppDomain.
Note: in the C# assembly an Hashtable2 is defined because I put in there an ITEM method that one can use in alternative to the common property this[]: in this way debugging is easier (for examply by showing a holy MessageBox when accessing the values).
So, I pass the class instantiated in initCS to JScript, and JScript reads the internal Hashtable (HT). What I need to do is evaluate the data in the Hashtable, because it is supposed to be able to alter itself dynamically.
Everything works fine if I eval a string not taken from the Hashtable -- in the moment I take whatever is in the Hashtable and pass it to eval, then exceptions happen. Attention: the strings are absolutely the same (even comparing them with Equals) and they work, the difference is only from where they come from.
When the JScript function edata evals a string taken from the Hashtable, JScript says "index out of bounds", but as I was saying: I don't see any array there... (and maybe the problem is exactly that, dunno).
I admit I have my limitations in JScript, so if anybody could lend a hand to help understand WTF is going on, I would be really happy.
Temporary solution/workaround splitting the JScript function:
xc=#"function odata(fieldname:String,ival:String):Object{
var sob=System.AppDomain.CurrentDomain.GetData('trespasser');
var v1=sob.GetType().GetField(fieldname).GetValue(sob);
return v1.ITEM(ival);
}
odata('HT','bar');";
res=js_eval(xc);
xc=#"function HASH(s1:String,s2:String):Object{
var sob=System.AppDomain.CurrentDomain.GetData('trespasser');
var q1=sob.GetType().GetField(s1).GetValue(sob);
return q1.ITEM(s2);
}
eval("+res.ToString()+");";
res=js_eval(xc);
but if anybody really got any idea of why is wrong in the first example, please explain me!

System.Linq.Dynamic .Select("new ...") does not appear to be thread safe

I grabbed System.Linq.Dynamic.DynamicQueryable from here:
http://weblogs.asp.net/scottgu/archive/2008/01/07/dynamic-linq-part-1-using-the-linq-dynamic-query-library.aspx
The issue that I am running into is in code that looks like this:
var results = dataContext.GetTable<MyClass>.Select("new (MyClassID, Name, Description)").Take(5);
It appears that if that line of code is executed by multiple threads near simultaneously, Microsoft's dynamic Linq code crashes in their ClassFactory.GetDynamicClass() method, which looks like this:
public Type GetDynamicClass(IEnumerable<DynamicProperty> properties)
{
rwLock.AcquireReaderLock(Timeout.Infinite);
try
{
Signature signature = new Signature(properties);
Type type;
if (!classes.TryGetValue(signature, out type))
{
type = CreateDynamicClass(signature.properties);
classes.Add(signature, type); // <-- crashes over here!
}
return type;
}
finally
{
rwLock.ReleaseReaderLock();
}
}
The crash is a simple dictionary error: "An item with the same key has already been added."
In Ms code, The rwLock variable is a ReadWriterLock class, but it does nothing to block multiple threads from getting inside classes.TryGetValue() if statement, so clearly, the Add will fail.
I can replicate this error pretty easily in any code that creates a two or more threads that try to execute the Select("new") statement.
Anyways, I'm wondering if anyone else has run into this issue, and if there are fixes or workarounds I can implement.
Thanks.
I did the following (requires .NET 4 or later to use System.Collections.Concurrent):
changed the classes field to a ConcurrentDictionary<Signature, Type> ,
removed all the ReaderWriterLock rwLock field and all the code referring to it,
updated GetDynamicClass to:
public Type GetDynamicClass(IEnumerable<DynamicProperty> properties) {
var signature = new Signature(properties);
return classes.GetOrAdd(signature, sig => CreateDynamicClass(sig.properties));
}
removed the classCount field and updated CreateDynamicClass to use classes.Count instead:
Type CreateDynamicClass(DynamicProperty[] properties) {
string typeName = "DynamicClass" + Guid.NewGuid().ToString("N");
...

Resources