Changing token value in ANTLR4 lexer

I'm trying to parse a language in ANTLR4 that is case-insensitive as far as identifiers go. If possible I'd like to push this onto the lexer, something like:
IDENT : [a-zA-Z]+ { /* set token = token.toUpper() */ };
Except I can't find anything in the documentation that would let me change a token's value in a lexer action, and judging from the generated code, nothing exposed to a lexer action seems to permit this.
Am I missing something or do I need to handle this in the application code?

You can do it this way:
IDENT : [a-zA-Z]+ { setText(getText().toUpperCase()); };
This appears to be the intended approach; a similar example is here
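For intuition, here is a small standalone Java sketch of what that action achieves: each run of letters becomes a token whose text is upper-cased, while the input itself is left alone (in ANTLR, setText only replaces the text of the token being emitted). The class and method names are made up for illustration and are not part of the ANTLR runtime. The sketch also uses toUpperCase(Locale.ROOT), because the no-argument toUpperCase() depends on the default locale (under a Turkish locale, "i" becomes a dotted capital İ). In the grammar action that would read getText().toUpperCase(java.util.Locale.ROOT).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical standalone sketch (not the ANTLR runtime) mimicking
// IDENT : [a-zA-Z]+ { setText(getText().toUpperCase()); };
final class CaseFoldingLexer {
    // Returns the text of every IDENT-like token, upper-cased.
    static List<String> identifiers(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (isAsciiLetter(input.charAt(i))) {
                int start = i;
                // Longest run of ASCII letters, like [a-zA-Z]+.
                while (i < input.length() && isAsciiLetter(input.charAt(i))) {
                    i++;
                }
                // Only the token text is normalized; `input` itself is never modified.
                // Locale.ROOT avoids locale-specific mappings such as the Turkish dotted i.
                tokens.add(input.substring(start, i).toUpperCase(Locale.ROOT));
            } else {
                i++; // other characters would be handled by other lexer rules
            }
        }
        return tokens;
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```

Because the comparison happens on normalized token text, later stages (parser, symbol table) can treat "foo", "Foo" and "FOO" as the same identifier without any extra work.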

Related

Pass Dynamic values to the rules in ANTLR4 grammar

I am new to ANTLR4.
I want to write a grammar that parses the syntax using values it reads dynamically.
Say my grammar is as shown in the image.
I need help so that HANDLERID accepts not only the values mentioned, but also a list of dynamic values returned by a function call. For example, a function returns a list containing {'ACD','GHY','XYZ' ..}. This is not to be confused with identifiers: these values are names of a defined set of objects, so writing a grammar rule for IDENTIFIER is not a solution.
Any help is appreciated.

Maybe actions are a viable solution? They are written in the target language and allow all kinds of processing. Written as a predicate (by appending a ? to the action block), they can even guide the parser in choosing which path to take.
Here's a typical form:
decl: type ID ';' { System.out.println("found a decl"); };
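or as a predicate in a lexer rule (lexer actions and predicates cannot use attribute references such as $ID.text, so the matched text is read with getText()):
HANDLERID: ID { isSpecialWord(getText()) }?;
which will only match IDs for which your internal function isSpecialWord returns true; put this rule before ID so it takes priority when the predicate holds. So essentially, you are not passing the lexer rule any values; you do the evaluation in your own code.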
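The predicate only needs some isSpecialWord(String) function in scope, for example defined in @members or in a parser/lexer base class. Below is a minimal Java sketch, assuming the allowed names come from a runtime call like the one described in the question; the HandlerIdRegistry class and its constructor are hypothetical.

```java
import java.util.Collection;
import java.util.Set;

// Hypothetical helper behind the isSpecialWord predicate. The names are
// supplied at runtime, e.g. from the function returning {"ACD", "GHY", "XYZ"}.
final class HandlerIdRegistry {
    private final Set<String> names;

    HandlerIdRegistry(Collection<String> dynamicNames) {
        // Immutable copy so the set stays stable for the duration of a parse.
        this.names = Set.copyOf(dynamicNames);
    }

    // True only for names that were supplied at runtime.
    boolean isSpecialWord(String text) {
        return text != null && names.contains(text);
    }
}
```

The grammar's isSpecialWord would then simply delegate to an instance of this class created before lexing starts, so the grammar never needs to change when the list does.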

Ways of keeping ANTLR4 grammar target independent

I'm writing a grammar for the C++ target, but I'd like to keep it working with Java as well, since ANTLR comes with great tools that work for grammars with a Java target. The book ("The Definitive ANTLR 4 Reference") says that the way to achieve target independence is to use listeners and/or visitors. There is one problem, though: any predicate, local variable, custom constructor, custom token class, etc. that I might need introduces a target language dependence that cannot be removed, at least according to what I took from the book. Since the book might be outdated, here are my questions:
Is there a way of declaring primitive variables in language independent way, something like:
item[$bool hasAttr]
:
type ( { $hasAttr }? attr | ) ID
;
where $bool would be translated to bool in C++ but to boolean in Java (a workaround would be to use int, but that most likely would not work for all potential targets).
Is there a way of declaring certain code fragments to be for specific target only, something like:
parser grammar testParser;
options
{
tokenVocab=testLexer;
}
@header
<lang=Cpp>{
#include "utils/helper.h"
}
<lang=Java>{
import test.utils.THelper;
}
@members
<lang=Cpp>{
public:
testParser(antlr4::TokenStream *input, utils::THelper *helper);
private:
utils::THelper *Helper;
public:
}
<lang=Java>{
public testParser(TokenStream input, THelper helper) {
this(input);
Helper = helper;
}
private THelper Helper;
}
start
:
(
<lang=Cpp>{ Helper->OnUnitStart(this); }
<lang=Java>{ Helper.OnUnitStart(this); }
unit
<lang=Cpp>{ _localctx = Helper->OnUnitEnd(this); }
<lang=Java>{ _localctx = Helper.OnUnitEnd(this); }
)*
EOF
;
...
For the time being I'm keeping two separate grammars: I change the Java one and merge the changes into the C++ one once I'm happy with the results. If possible, though, I'd rather keep everything in one file.

This target dependency is a real nuisance, and I have been thinking for a while about how to get rid of it cleanly. I still haven't found anything fully usable.
What you can do is stick with syntax that both Java and C++ understand. For example, write a predicate as a plain function call, as in a: { isValid() }? b c;, and implement such functions in a base class from which your parser is derived (ANTLR lets you specify such a base class via the grammar option superClass).
The C++ target also has a number of additional named actions that you can use for C++-specific code only.
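As a rough illustration of the superClass approach, the sketch below replaces the hypothetical `$bool hasAttr` rule argument with plain method calls whose syntax is identical in Java and C++, e.g. `{ pushHasAttr(true); } item { popHasAttr(); }` at the call site and `{ hasAttr() }? attr` inside item. The class and method names are assumptions; a real base class would extend org.antlr.v4.runtime.Parser (with a matching C++ class deriving from antlr4::Parser), which is left out so the sketch compiles on its own.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical superClass for the grammar. A real one would extend
// org.antlr.v4.runtime.Parser; that is omitted so the sketch compiles alone.
class TestParserBase {
    // Stands in for the rule argument `$bool hasAttr`; a stack keeps nested
    // invocations of `item` from overwriting each other's value.
    private final Deque<Boolean> hasAttrStack = new ArrayDeque<>();

    // Grammar action before calling item: { pushHasAttr(true); }
    protected void pushHasAttr(boolean value) {
        hasAttrStack.push(value);
    }

    // Grammar action after item returns: { popHasAttr(); }
    protected void popHasAttr() {
        hasAttrStack.pop();
    }

    // Predicate inside item: { hasAttr() }? attr
    protected boolean hasAttr() {
        Boolean top = hasAttrStack.peek();
        return top != null && top;
    }
}
```

A stack rather than a single boolean field is the design choice that matters here: rule arguments are per invocation, so recursive or nested calls need their own value.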

How can I pass a structure down a tree (i.e. an inherited attribute) when using Visitor pattern?

I'm using the C++ version of ANTLR4 to develop a DSL for a music product. I used to (30 years ago!) do this kind of thing by hand so it's mostly a pleasure to have something like ANTLR, particularly now that I don't have to insert code in the actual grammar definition itself.
I want to do type checking of actual vs formal args in a function call. In the grammar segment below, the 'actualParameter' can return the type of the expression. However, the 'actualParameterList' needs to return an array (say) of these types so that the code for functionCall can compare to the formal parameter list.
If I were handwriting this, the calls to visit or visitChildren would take an extra parameter after the context, so that I could create a new array at the appropriate place and then have child nodes fill in the details.
I suppose that instead of just calling visitChildren inside the 'visitActualParameterList' I could create the array there and manually call each child rather than just a simple visitChildren but that feels like a hack, and it becomes very sensitive to minor changes in the grammar.
Is there a better approach?
functionCall: Identifier LeftParen actualParameterList? RightParen
;
actualParameterList:
actualParameter anotherActualParameter
;
actualParameter:
expression
;
anotherActualParameter:
Comma actualParameter anotherActualParameter
|
;

You're on the right path. I would suggest something like:
functionCall: Identifier LPAREN actualParameterList? RPAREN
;
actualParameterList:
actualParameter (',' actualParameter)*
;
actualParameter:
expression
;
LPAREN : '(';
RPAREN : ')';
Using this, in the visitor for actualParameterList you can check each child to see whether it is an ActualParameterContext and, if so, explicitly call Visit on that child, which takes you into your expression evaluation code (presumably handled in the visitor for actualParameter). As you say, this avoids just generically visiting the children, and checking the type like this is very precise.
Here's an example of this pattern from my own code (in C# but surely you'll see the pattern in action):
for (int c = 0; c < context.ChildCount; c++)
{
if (context.GetChild(c) is SystemParser.ServerContext) // make sure correct type
{
string serverinfo = Visit(context.GetChild(c)); // visit the specific child and save return value, string in this case
sb.Append(serverinfo); // use result to fill array or do whatever
}
}
Now that you can see the pattern, back to your code. The syntax:
actualParameter (',' actualParameter)*
means that a parameter list has one actualParameter followed by zero or more additional ones, each preceded by a comma (* means zero or more). I used a ',' literal for readability; your Comma token works the same way.
As you suggest, Visitor is the perfect pattern for this because you can explicitly visit any node you need to. It won't hand you an array, but you can fill an array or any other structure with the results of visiting the children, as in the snippet from my code. My visitor returns strings, and I just append them to a StringBuilder. You can use the same pattern to build whatever you need.
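Here is a self-contained Java sketch of the same pattern applied to argument type checking: the list visitor visits only the actual-parameter children (skipping the comma tokens) and collects each argument's type into a List, which the functionCall code can compare with the formal parameter types. The node classes are hypothetical stand-ins for ANTLR-generated contexts so the example runs without the runtime. With real generated code you may not need the instanceof check at all: because actualParameter appears more than once in the rule, the Java target generates an actualParameter() accessor on ActualParameterListContext that returns a List<ActualParameterContext>.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for ANTLR-generated context classes.
interface Node {}

final class Terminal implements Node {
    final String text;
    Terminal(String text) { this.text = text; }
}

final class ActualParameterContext implements Node {
    final String exprType; // the type the argument expression evaluates to
    ActualParameterContext(String exprType) { this.exprType = exprType; }
}

final class ActualParameterListContext implements Node {
    final List<Node> children;
    ActualParameterListContext(List<Node> children) { this.children = children; }
}

final class TypeCheckVisitor {
    // Stand-in for visitActualParameter: returns the argument's type.
    String visitActualParameter(ActualParameterContext ctx) {
        return ctx.exprType;
    }

    // Visits only the ActualParameterContext children (skipping ',' tokens)
    // and collects their types in source order.
    List<String> visitActualParameterList(ActualParameterListContext ctx) {
        List<String> types = new ArrayList<>();
        for (Node child : ctx.children) {
            if (child instanceof ActualParameterContext) {
                types.add(visitActualParameter((ActualParameterContext) child));
            }
        }
        return types;
    }

    // Compares actual argument types with the declared formal types.
    boolean argumentsMatch(List<String> formalTypes, ActualParameterListContext args) {
        if (args == null) {
            // actualParameterList is optional: no arguments at all.
            return formalTypes.isEmpty();
        }
        return visitActualParameterList(args).equals(formalTypes);
    }
}
```

The list is created exactly where it is needed (in the list visitor) and returned upward, which sidesteps the need for an extra "inherited" parameter on visit.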

xtext inferrer: multiple entities

I am very new to Xtext/Xtend, therefore apologies in advance if the answer is obvious.
I would like to allow the end-users of my DSL to define a 'filter', that when applied and 'returns' true it means that they want to 'filter out' the given entity of data from consideration.
I want to allow them two ways of defining the filter:
A) by introspecting the attributes of a given data object and applying basic rules like
if (obj.field1<CURRENT_DATE && obj.field2=="EXPIRED")
{ return true;} else {return false;}
B) by executing a controlled snippet using 'eval' of my host language.
In other words, the user would be expected to type a valid code snippet of the host language into a string/code block.
I decided that the easiest way for me to support case A) would be to leverage the Xbase rules (including expressions, etc.).
Therefore I defined filters (mostly copying the ideas from Lorenzo's book):
Filter:
(FilterDSL | FilterCode);
FilterDSL:
'filterDSL' (type=JvmTypeReference)? name=ID
'(' (params+=FullJvmFormalParameter (',' params+=FullJvmFormalParameter)*)? ')'
body=XBlockExpression ;
FilterCode:
'filterCode' (type=JvmTypeReference)? name=ID
'(' (params+=FullJvmFormalParameter (',' params+=FullJvmFormalParameter)*)? ')'
'{'
body=STRING
'}';
Now, when trying to implement the Java mapping for my DSL via the inferrer stub in Xtend, I am running into multiple problems.
All of them likely indicate that I am missing some fundamental understanding.
Problem 1) fl.body is not defined: fl is of type Filter, not FilterDSL or FilterCode, and I do not understand how to check which type a given instance is, so that I can access the content of its 'body' feature.
Problem 2) I do not understand where the 'body' attribute in the inferrer method is defined and why. Is this part of Ecore? (I could not find it.)
Problem 3) What's the proper way to let a user specify a code block? A String does not seem right, as it does not allow multiple lines.
Problem 4) How do I correctly convert a code block into something that is accepted by 'body', so that it ends up in the generated code?
Problem 5) How do I set up multiple inferrers (I have more than one thing for which I need code generated, mostly by the Xbase code generator)?
I appreciate in advance any suggestions, or pointers to code examples solving similar problems.
As a side observation, the inferrer and its interplay with Xbase have so far been the most confusing and difficult things to understand.

In general, have a look at the Xtend docs at xtend-lang.org.
1) You can use if (x instanceof Type) or a switch statement with type guards (see the domain model example).
2) I don't understand that question. Both your FilterDSL and FilterCode EClasses should have a field with getter/setter named body: in FilterCode of type String, in FilterDSL of type XBlockExpression. The JvmTypesBuilder adds extension methods to JvmOperation called setBody(String) and setBody(XExpression), and syntax sugar lets you write body = .... instead of setBody(...).
(By the way, you can Ctrl+click to find out where a thing is defined.)
3) Strings are actually multiline.
4) This is answered by (2).
5) You don't need multiple inferrers; you can infer multiple things, e.g. by calling toClass or toField multiple times for the same input.
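To illustrate answer 1, here is the same type-based dispatch written in plain Java (Xtend's switch with type guards, e.g. `switch fl { FilterDSL: ... FilterCode: ... }`, compiles down to instanceof checks much like these). The classes are hypothetical stand-ins for the EMF types Xtext would generate; in the real model the bodies are an XBlockExpression and a String.

```java
// Hypothetical stand-ins for the EMF classes Xtext would generate from the grammar.
abstract class Filter {
    final String name;
    Filter(String name) { this.name = name; }
}

final class FilterDSL extends Filter {
    final Object body; // an XBlockExpression in the real model
    FilterDSL(String name, Object body) { super(name); this.body = body; }
}

final class FilterCode extends Filter {
    final String body; // the host-language snippet as a String
    FilterCode(String name, String body) { super(name); this.body = body; }
}

final class FilterBodies {
    // `body` is not declared on Filter, so narrow to the concrete type first.
    static String describe(Filter fl) {
        if (fl instanceof FilterDSL) {
            FilterDSL dsl = (FilterDSL) fl;
            return fl.name + " -> expression body: " + dsl.body;
        } else if (fl instanceof FilterCode) {
            FilterCode code = (FilterCode) fl;
            return fl.name + " -> code body: " + code.body;
        }
        throw new IllegalArgumentException("Unexpected filter type: " + fl.getClass().getName());
    }
}
```

In an inferrer, each branch is where the matching body (the XExpression or the String) would be handed to the generated operation.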

Is there any fast tool which performs constant substitution without stripping out comments in JavaScript source code?

For example, setting MYCONST = true would lead to the transformation of
if (MYCONST) {
console.log('MYCONST IS TRUE!'); // print important message
}
to
if (true) {
console.log('MYCONST IS TRUE!'); // print important message
}
Ideally, the tool would have a fast API accessible from Node.js.

A better way to achieve what you want -
Settings.js
var settings = {
MYCONST: true
};
MainCode.js
if (settings.MYCONST) {
console.log('MYCONST IS TRUE!'); // print important message
}
This way, you make a change to one single file.

Google's Closure Compiler does, among other things, inline constants that are annotated as such, leaving string contents untouched, but I am not sure whether it's a viable option for you.

Patch a beautifier, for example:
Get the JS Beautifier https://raw.github.com/einars/js-beautify/master/beautify.js, which is written in JS.
Replace the last line of the function print_token() with something like
output.push(token_text=="MYCONST"?"true":token_text);
Call js_beautify(your_code) from within Node.js.
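Whatever tool is used, the substitution has to be token-aware: a plain text replace would also rewrite the 'MYCONST IS TRUE!' string in the example. The Java sketch below (the same loop ports directly to JavaScript for use from Node.js) copies string literals and comments through unchanged and replaces only whole identifier tokens. It is a simplification, not a full JavaScript lexer: template literals and regex literals are not recognized, and identifiers directly preceded by '.' are skipped so that property accesses such as settings.MYCONST stay intact. The class and method names are hypothetical.

```java
import java.util.Map;

// Hypothetical token-aware constant substitution; not a complete JavaScript lexer.
final class ConstSubstituter {
    static String substitute(String src, Map<String, String> constants) {
        StringBuilder out = new StringBuilder(src.length());
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            char next = i + 1 < n ? src.charAt(i + 1) : '\0';
            if (c == '/' && next == '/') {
                // Line comment: copied verbatim up to the newline.
                int end = src.indexOf('\n', i);
                if (end < 0) end = n;
                out.append(src, i, end);
                i = end;
            } else if (c == '/' && next == '*') {
                // Block comment: copied verbatim through the closing */.
                int end = src.indexOf("*/", i + 2);
                end = end < 0 ? n : end + 2;
                out.append(src, i, end);
                i = end;
            } else if (c == '\'' || c == '"') {
                // String literal: copied verbatim through the matching quote, skipping escapes.
                int j = i + 1;
                while (j < n && src.charAt(j) != c) {
                    j += src.charAt(j) == '\\' ? 2 : 1;
                }
                j = Math.min(j + 1, n);
                out.append(src, i, j);
                i = j;
            } else if (Character.isJavaIdentifierStart(c)) {
                // Identifier: read the whole token before deciding whether to replace it.
                int j = i + 1;
                while (j < n && Character.isJavaIdentifierPart(src.charAt(j))) {
                    j++;
                }
                String word = src.substring(i, j);
                boolean afterDot = out.length() > 0 && out.charAt(out.length() - 1) == '.';
                out.append(afterDot ? word : constants.getOrDefault(word, word));
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

Matching whole tokens is what keeps MYCONSTANT or a string containing MYCONST from being touched, which is exactly the property a regex-based replace tends to get wrong.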

The Apache Ant build system supports a replace task that could be used to achieve this.

Edit: Whoops. Gotta read title first. Ignore me.
Google Closure Compiler has such a feature:
You can combine the @define tag in your code with the --define parameter to change variables at "compile" time.
Closure Compiler will also remove your if-statement, which is probably what you want.
Simply write your code like this:
/** @define {boolean} */
var MYCONST = false; // default value is necessary here
if (MYCONST) {
console.log('MYCONST IS TRUE!'); // print important message
}
And call the compiler with the parameter:
java -jar closure-compiler.jar --define=MYCONST=true --js pathto/file.js
Regarding your API request: Closure Compiler has a JSON API.
