ANTLR 4: Dynamic Token

ANTLR 4: Dynamic Token - antlr4

Given:
grammar Hbs;
var: START_DELIM ID END_DELIM;
START_DELIM: '{{';
END_DELIM: '}}';
I'd like to know how to change START_DELIM and END_DELIM at runtime to be for example <% and %>.
Does anyone know how to do this in ANTLR 4?
Thanks.

There's a way, but you'll need to tie your grammar to a target language (as of now, the only target is Java).
Here's a quick demo (I included some comments to clarify things):
grammar T;
#lexer::members {
// Some default values
private String start = "<<";
private String end = ">>";
public TLexer(CharStream input, String start, String end) {
this(input);
this.start = start;
this.end = end;
}
boolean tryToken(String text) {
// See if `text` is ahead in the CharStream.
for(int i = 0; i < text.length(); i++) {
if(_input.LA(i + 1) != text.charAt(i)) {
// Nope, we didn't find `text`.
return false;
}
}
// Since we found the text, increase the CharStream's index.
_input.seek(_input.index() + text.length() - 1);
return true;
}
}
parse
: START ID END
;
START
: {tryToken(start)}? .
// The `.` is needed because a lexer rule must match at least 1 char.
;
END
: {tryToken(end)}? .
;
ID
: [a-zA-Z]+
;
SPACE
: [ \t\r\n] -> skip
;
The { ... }? is a semantic predicate. See: https://github.com/antlr/antlr4/blob/master/doc/predicates.md
Here's a small test class:
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
public class Main {
private static void test(TLexer lexer) throws Exception {
TParser parser = new TParser(new CommonTokenStream(lexer));
ParseTree tree = parser.parse();
System.out.println(tree.toStringTree(parser));
}
public static void main(String[] args) throws Exception {
// Test with the default START and END.
test(new TLexer(new ANTLRInputStream("<< foo >>")));
// Test with a custom START and END.
test(new TLexer(new ANTLRInputStream("<? foo ?>"), "<?", "?>"));
}
}
Run the demo as follows:
*nix
java -jar antlr-4.0-complete.jar T.g4
javac -cp .:antlr-4.0-complete.jar *.java
java -cp .:antlr-4.0-complete.jar Main
Windows
java -jar antlr-4.0-complete.jar T.g4
javac -cp .;antlr-4.0-complete.jar *.java
java -cp .;antlr-4.0-complete.jar Main
And you'll see the following being printed to the console:
(parse << foo >>)
(parse <? foo ?>)

Related

<expression> expected, got '=' for variable assignment in Groovy

I've created following Groovy class in IntelliJ IDEA
1. package org.seleniumrun
2.
3. class Testing {
4. static void getJsonAsString(String endPointUrl) {
5. BufferedReader in = null;
6. }
7. .......
8. }
But it gives compilation error <expression> expected, got '=' at line number 6. I'm not getting how it is incorrect. Could you please help correct me if something is wrong here?

in is a reserved word used for loops. Rename it to something else (line 5) and its will work
Here is an example:
class Testing {
static void getJsonAsString(String endPointUrl) {
BufferedReader reader= null;
println"hello: $endPointUrl"
}
}
def t = new Testing()
t.getJsonAsString("http://google.com")
// and this is how "in" can be used
for( i in [1,2,3,4]) {
println i
}
This is a working code that prints:
hello: http://google.com
1
2
3
4

How to multiply strings in Haxe

I'm trying to multiply some string a by some integer b such that a * b = a + a + a... (b times). I've tried doing it the same way I would in python:
class Test {
static function main() {
var a = "Text";
var b = 4;
trace(a * b); //Assumed Output: TextTextTextText
}
}
But this raises:
Build failure Test.hx:6: characters 14-15 : String should be Int
There doesn't seem to be any information in the Haxe Programming Cookbook or the API Documentation about multiplying strings, so I'm wondering if I've mistyped something or if I should use:
class Test {
static function main() {
var a = "Text";
var b = 4;
var c = "";
for (i in 0...b) {
c = c + a;
}
trace(c); // Outputs "TextTextTextText"
}
}

Not very short, but array comprehension might help in some situations :
class Test {
static function main() {
var a = "Text";
var b = 4;
trace( [for (i in 0...b) a].join("") );
//Output: TextTextTextText
}
}
See on try.haxe.org.

The numeric multiplication operator * requires numeric types, like integer. You have a string. If you want to multiply a string, you have to do it manually by appending a target string within the loop.
The + operator is not the numeric plus in your example, but a way to combine strings.
You can achieve what you want by operator overloading:
abstract MyAbstract(String) {
public inline function new(s:String) {
this = s;
}
#:op(A * B)
public function repeat(rhs:Int):MyAbstract {
var s:StringBuf = new StringBuf();
for (i in 0...rhs)
s.add(this);
return new MyAbstract(s.toString());
}
}
class Main {
static public function main() {
var a = new MyAbstract("foo");
trace(a * 3); // foofoofoo
}
}

To build on tokiop's answer, you could also define a times function, and then use it as a static extension.
using Test.Extensions;
class Test {
static function main() {
trace ("Text".times(4));
}
}
class Extensions {
public static function times (str:String, n:Int) {
return [for (i in 0...n) str].join("");
}
}
try.haxe.org demo here

To build on bsinky answer, you can also define a times function as static extension, but avoid the array:
using Test.Extensions;
class Test {
static function main() {
trace ("Text".times(4));
}
}
class Extensions {
public static function times (str:String, n:Int) {
var v = new StringBuf();
for (i in 0...n) v.add(str);
return v.toString();
}
}
Demo: https://try.haxe.org/#e5937
StringBuf may be optimized for different targets. For example, on JavaScript target it is compiled as if you were just using strings https://api.haxe.org/StringBuf.html

The fastest method (at least on the JavaScript target from https://try.haxe.org/#195A8) seems to be using StringTools._pad.
public static inline function stringProduct ( s : String, n : Int ) {
if ( n < 0 ) {
throw ( 1 );
}
return StringTools.lpad ( "", s, s.length * n );
}
StringTools.lpad and StringTools.rpad can't seem to decide which is more efficient. It looks like rpad might be better for larger strings and lpad might be better for smaller strings, but they switch around a bit with each rerun. haxe.format.JsonPrinter uses lpad for concatenation, but I'm not sure which to recommend.

Semantically disambiguating an ambiguous syntax

Using Antlr 4 I have a situation I am not sure how to resolve. I originally asked the question at https://groups.google.com/forum/#!topic/antlr-discussion/1yxxxAvU678 on the Antlr discussion forum. But that forum does not seem to get a lot of traffic, so I am asking again here.
I have the following grammar:
expression
: ...
| path
;
path
: ...
| dotIdentifierSequence
;
dotIdentifierSequence
: identifier (DOT identifier)*
;
The concern here is that dotIdentifierSequence can mean a number of things semantically, and not all of them are "paths". But at the moment they are all recognized as paths in the parse tree and then I need to handle them specially in my visitor.
But what I'd really like is a way to express the dotIdentifierSequence usages that are not paths into the expression rule rather than in the path rule, and still have dotIdentifierSequence in path to handle path usages.
To be clear, a dotIdentifierSequence might be any of the following:
A path - this is a SQL-like grammar and a path expression would be like a table or column reference in SQL, e.g. a.b.c
A Java class name - e.g. com.acme.SomeJavaType
A static Java field reference - e.g. com.acme.SomeJavaType.SOME_FIELD
A Java enum value reference - e.g. com.acme.Gender.MALE
The idea is that during visitation "dotIdentifierSequence as a path" resolves as a very different type from the other usages.
Any idea how I can do this?

The issue here is that you're trying to make a distinction between "paths" while being created in the parser. Constructing paths inside the lexer would be easier (pseudo code follows):
grammar T;
tokens {
JAVA_TYPE_PATH,
JAVA_FIELD_PATH
}
// parser rules
PATH
: IDENTIFIER ('.' IDENTIFIER)*
{
String s = getText();
if (s is a Java class) {
setType(JAVA_TYPE_PATH);
} else if (s is a Java field) {
setType(JAVA_FIELD_PATH);
}
}
;
fragment IDENTIFIER : [a-zA-Z_] [a-zA-Z_0-9]*;
and then in the parser you would do:
expression
: JAVA_TYPE_PATH #javaTypeExpression
| JAVA_FIELD_PATH #javaFieldExpression
| PATH #pathExpression
;
But then, of course, input like this java./*comment*/lang.String would be tokenized wrongly.
Handling it all in the parser would mean manually looking ahead in the token stream and checking if either a Java type, or field exists.
A quick demo:
grammar T;
#parser::members {
String getPathAhead() {
Token token = _input.LT(1);
if (token.getType() != IDENTIFIER) {
return null;
}
StringBuilder builder = new StringBuilder(token.getText());
// Try to collect ('.' IDENTIFIER)*
for (int stepsAhead = 2; ; stepsAhead += 2) {
Token expectedDot = _input.LT(stepsAhead);
Token expectedIdentifier = _input.LT(stepsAhead + 1);
if (expectedDot.getType() != DOT || expectedIdentifier.getType() != IDENTIFIER) {
break;
}
builder.append('.').append(expectedIdentifier.getText());
}
return builder.toString();
}
boolean javaTypeAhead() {
String path = getPathAhead();
if (path == null) {
return false;
}
try {
return Class.forName(path) != null;
} catch (Exception e) {
return false;
}
}
boolean javaFieldAhead() {
String path = getPathAhead();
if (path == null || !path.contains(".")) {
return false;
}
int lastDot = path.lastIndexOf('.');
String typeName = path.substring(0, lastDot);
String fieldName = path.substring(lastDot + 1);
try {
Class<?> clazz = Class.forName(typeName);
return clazz.getField(fieldName) != null;
} catch (Exception e) {
return false;
}
}
}
expression
: {javaTypeAhead()}? path #javaTypeExpression
| {javaFieldAhead()}? path #javaFieldExpression
| path #pathExpression
;
path
: dotIdentifierSequence
;
dotIdentifierSequence
: IDENTIFIER (DOT IDENTIFIER)*
;
IDENTIFIER
: [a-zA-Z_] [a-zA-Z_0-9]*
;
DOT
: '.'
;
which can be tested with the following class:
package tl.antlr4;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.misc.NotNull;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
public class Main {
public static void main(String[] args) {
String[] tests = {
"mu",
"tl.antlr4.The",
"java.lang.String",
"foo.bar.Baz",
"tl.antlr4.The.answer",
"tl.antlr4.The.ANSWER"
};
for (String test : tests) {
TLexer lexer = new TLexer(new ANTLRInputStream(test));
TParser parser = new TParser(new CommonTokenStream(lexer));
ParseTreeWalker.DEFAULT.walk(new TestListener(), parser.expression());
}
}
}
class TestListener extends TBaseListener {
#Override
public void enterJavaTypeExpression(#NotNull TParser.JavaTypeExpressionContext ctx) {
System.out.println("JavaTypeExpression -> " + ctx.getText());
}
#Override
public void enterJavaFieldExpression(#NotNull TParser.JavaFieldExpressionContext ctx) {
System.out.println("JavaFieldExpression -> " + ctx.getText());
}
#Override
public void enterPathExpression(#NotNull TParser.PathExpressionContext ctx) {
System.out.println("PathExpression -> " + ctx.getText());
}
}
class The {
public static final int ANSWER = 42;
}
which would print the following to the console:
PathExpression -> mu
JavaTypeExpression -> tl.antlr4.The
JavaTypeExpression -> java.lang.String
PathExpression -> foo.bar.Baz
PathExpression -> tl.antlr4.The.answer
JavaFieldExpression -> tl.antlr4.The.ANSWER

If Else: String Equality (Java)

Lab Description : Compare two strings to see if each of the two strings contains the same letters in the
same order.
This is what I have so far far:
import static java.lang.System.*;
public class StringEquality
{
private String wordOne, wordTwo;
public StringEquality()
{
}
public StringEquality(String one, String two)
{
setWords (wordOne, wordTwo);
}
public void setWords(String one, String two)
{
wordOne = one;
wordTwo = two;
}
public boolean checkEquality()
{
if (wordOne == wordTwo)
return true;
else
return false;
}
public String toString()
{
String output = "";
if (checkEquality())
output += wordOne + " does not have the same letters as " + wordTwo;
else
output += wordOne + " does have the same letters as " + wordTwo;
return output;
}
}
My runner looks like this:
import static java.lang.System.*;
public class StringEqualityRunner
{
public static void main(String args[])
{
StringEquality test = new StringEquality();
test.setWords(hello, goodbye);
out.println(test);
}
}
Everything is compiling except for the runner. It keeps saying that hello and goodbye aren't variables. How can I fix this so that the program does not read hello and goodbye as variables, but as Strings?

You need to quote strings otherwise they are treated as variables.
"hello"
"goodbye"
so this would work better.
test.setWords("hello", "goodbye");

Problem with your code is with checkEquality(), you are comparing the string's position in memory when you use == use .equals() to check the string
public boolean checkEquality()
{
if (wordOne == wordTwo) //use wordOne.equals(wordTwo) here
return true;
else
return false;
}

Enclose them in double-quotes.

How to create a method at runtime using Reflection.emit

I'm creating an object at runtime using reflection emit. I successfully created the fields, properties and get set methods.
Now I want to add a method. For the sake of simplicity let's say the method just returns a random number. How do I define the method body?
EDIT:
Yes, I've been looking at the msdn documentation along with other references and I'm starting to get my head wrapped around this stuff.
I see how the example above is adding and/or multplying, but what if my method is doing other stuff. How do I define that "stuff"
Suppose I was generating the class below dynamically, how would I create the body of GetDetails() method?
class TestClass
{
public string Name { get; set; }
public int Size { get; set; }
public TestClass()
{
}
public TestClass(string Name, int Size)
{
this.Name = Name;
this.Size = Size;
}
public string GetDetails()
{
string Details = "Name = " + this.Name + ", Size = " + this.Size.ToString();
return Details;
}
}

You use a MethodBuilder to define methods. To define the method body, you call GetILGenerator() to get an ILGenerator, and then call the Emit methods to emit individual IL instructions. There is an example on the MSDN documentation for MethodBuilder, and you can find other examples of how to use reflection emit on the Using Reflection Emit page:
public static void AddMethodDynamically(TypeBuilder myTypeBld,
string mthdName,
Type[] mthdParams,
Type returnType,
string mthdAction)
{
MethodBuilder myMthdBld = myTypeBld.DefineMethod(
mthdName,
MethodAttributes.Public |
MethodAttributes.Static,
returnType,
mthdParams);
ILGenerator ILout = myMthdBld.GetILGenerator();
int numParams = mthdParams.Length;
for (byte x = 0; x < numParams; x++)
{
ILout.Emit(OpCodes.Ldarg_S, x);
}
if (numParams > 1)
{
for (int y = 0; y < (numParams - 1); y++)
{
switch (mthdAction)
{
case "A": ILout.Emit(OpCodes.Add);
break;
case "M": ILout.Emit(OpCodes.Mul);
break;
default: ILout.Emit(OpCodes.Add);
break;
}
}
}
ILout.Emit(OpCodes.Ret);
}
It sounds like you're looking for resources on writing MSIL. One important resource is the OpCodes class, which has a member for every IL instruction. The documentation describes how each instruction works. Another important resource is either Ildasm or Reflector. These will let you see the IL for compiled code, which will help you understand what IL you want to write. Running your GetDetailsMethod through Reflector and setting the language to IL yields:
.method public hidebysig instance string GetDetails() cil managed
{
.maxstack 4
.locals init (
[0] string Details,
[1] string CS$1$0000,
[2] int32 CS$0$0001)
L_0000: nop
L_0001: ldstr "Name = "
L_0006: ldarg.0
L_0007: call instance string ConsoleApplication1.TestClass::get_Name()
L_000c: ldstr ", Size = "
L_0011: ldarg.0
L_0012: call instance int32 ConsoleApplication1.TestClass::get_Size()
L_0017: stloc.2
L_0018: ldloca.s CS$0$0001
L_001a: call instance string [mscorlib]System.Int32::ToString()
L_001f: call string [mscorlib]System.String::Concat(string, string, string, string)
L_0024: stloc.0
L_0025: ldloc.0
L_0026: stloc.1
L_0027: br.s L_0029
L_0029: ldloc.1
L_002a: ret
}
To generate a method like that dynamically, you will need to call ILGenerator.Emit for each instruction:
ilGen.Emit(OpCodes.Nop);
ilGen.Emit(OpCodes.Ldstr, "Name = ");
ilGen.Emit(OpCodes.Ldarg_0);
ilGen.Emit(OpCodes.Call, nameProperty.GetGetMethod());
// etc..
You may also want to look for introductions to MSIL, such as this one: Introduction to IL Assembly Language.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

ANTLR 4: Dynamic Token - antlr4

Given: grammar Hbs; var: START_DELIM ID END_DELIM; START_DELIM: '{{'; END_DELIM: '}}'; I'd like to know how to change START_DELIM and END_DELIM at runtime to be for example <% and %>. Does anyone know how to do this in ANTLR 4? Thanks.

Related

<expression> expected, got '=' for variable assignment in Groovy

How to multiply strings in Haxe

Semantically disambiguating an ambiguous syntax

If Else: String Equality (Java)

How to create a method at runtime using Reflection.emit

Categories

Resources