ANTLR4: no rule index for labelled rules

For the grammar snippet from Java.g4,
statement
: block # blockStmt
| 'if' parExpression statement ('else' statement)? # ifStmt
| 'for' '(' forControl ')' statement # forStmt
| 'while' parExpression statement # whileStmt
;
All the alternatives are labelled.
I can get all StatementContext objects using this method:
Trees.getAllRuleNodes(root,JavaParser.Rule_statement);
But if I am only interested in getting the IfStmtContext objects, how can I use the above method without resorting to something like this?
for (ParseTree tree : statementContextList)
{
    if (tree instanceof IfStmtContext)
    {
        // add to a list
    }
}
The generated JavaParser doesn't create rule indexes for labelled rules.
Do I have to customize the grammar in some way to make them indexed?
Or is there another way to do this?
My code should be fast, so I need to remove as many iterations and conditions as possible. I would also like to get rid of the 'instanceof' checks wherever possible.

Related

Getting context name when using Labels for alternative subrules

ANTLR Version: 4.11.1
Grammar: cpp/CPP14Parser.g4 in https://github.com/antlr/grammars-v4/
ANTLR Target Language: C++
I modified the selectionStatement rule
From
If LeftParen condition RightParen statement (Else statement)?
| Switch LeftParen condition RightParen statement;
To
If LeftParen condition RightParen statement (Else statement)? # ifStatement
| Switch LeftParen condition RightParen statement # switchStatement
;
Just added labels to the alternatives.
In my ParserListener, using enterEveryRule and exitEveryRule, I create a list of entered and exited context names. The syntax used to find the context name is
parser.getRuleNames()[context->getRuleIndex()]
In the exitIfStatement function, I am unable to locate "ifStatement" in that list; I only find "selectionStatement". What change do I need to make to the context-name lookup to get "ifStatement"?

Chaining filters in JMESpath

I want to be able to chain together multiple filters using JMESpath but it appears you cannot filter on the output of a filter.
My working example is as follows:
// document:
{
pips: {
ancestors:[{
pid: 'p01234567'
}],
episode: {
more: 'data',
goes: 'here'
}
}
}
// working filter: `[pips][?ancestors[?pid=='p01234567'] && episode]`
But I would like to write my filter instead as follows, effectively to filter the output of another filter:
[pips][?ancestors[?pid=='p01234567']][?episode]
Any idea why this doesn't work?
I am building this in NodeJS using the following NPM package: https://www.npmjs.com/package/jmespath
Is there a mistake in the syntax I am using, is there a bug in the library I am using, or am I just trying to do something that is outside what JMESpath allows?
Thank you!
I found the reason: projections are evaluated in two steps. The left-hand side creates a JSON array of initial values, and the right-hand side is the expression that is projected onto each element of that array.
Solution: "pipe expressions", which allow you to "operate on the result of a projection".
So instead of the incorrect expression from before: [pips][?ancestors[?pid=='p01234567']][?episode]
This should instead be written as: [pips][?ancestors[?pid=='p01234567']] | [?episode]
And to undo the conversion of the initial document into an array, we can convert this back to an object like this: [pips][?ancestors[?pid=='p01234567']] | [?episode] | [0]
As a side note, I observed that using parentheses () also works, but using pipes is a bit cleaner.
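The difference between the chained and the piped form can be modelled outside JMESPath. The following is a simplified Python sketch, not the jmespath library: multiselect and filter_projection are hypothetical helpers that imitate only the two-step evaluation described above, and the ancestor key is assumed to be pid, as in the filters.

```python
# Simplified model of JMESPath projections (illustration only, not the real library).

def multiselect(value):
    # Models `[pips]`: wrap the selected value in a one-element list.
    return [value]

def filter_projection(value, condition, rhs=lambda item: item):
    # Models `[?condition]rhs`: keep the matching elements, apply rhs to EACH
    # kept element, and drop null (None) results. A non-list input yields null.
    if not isinstance(value, list):
        return None
    projected = [rhs(item) for item in value if condition(item)]
    return [item for item in projected if item is not None]

document = {
    "pips": {
        "ancestors": [{"pid": "p01234567"}],  # key assumed to be `pid`
        "episode": {"more": "data", "goes": "here"},
    }
}

def has_ancestor(item):
    # Models `ancestors[?pid=='p01234567']`; an empty result counts as false.
    matches = filter_projection(item.get("ancestors"),
                                lambda a: a.get("pid") == "p01234567")
    return bool(matches)

def has_episode(item):
    # Models the bare `episode` condition.
    return bool(item.get("episode"))

start = multiselect(document["pips"])

# Chained: the second filter is the right-hand side of the first projection,
# so it is applied to each element (an object, not a list) and yields null.
chained = filter_projection(
    start, has_ancestor,
    rhs=lambda item: filter_projection(item, has_episode))

# Piped: the first projection is finished, then the second filter is applied
# to the whole resulting list.
piped = filter_projection(filter_projection(start, has_ancestor), has_episode)

print(chained)   # the chained form loses the match
print(piped[0])  # the piped form keeps the pips object
```

In this model the chained form comes back empty because the inner filter sees a single object rather than an array, which is the behaviour the pipe is meant to avoid.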

Formatting string in Powershell but only first or specific occurrence of replacement token

I have a regular expression that I use several times in a script, where a single word gets changed but the rest of the expression remains the same. Normally I handle this by just creating a regular expression string with a format like the following example:
# Simple regex looking for exact string match
$regexTemplate = '^{0}$'
# Later on...
$someString = 'hello'
$someString -match ( $regexTemplate -f 'hello' ) # ==> True
However, I've written a more complex expression where I need to insert a variable into the expression template and... well regex syntax and string formatting syntax begin to clash:
$regexTemplate = '(?<=^\w{2}-){0}(?=-\d$)'
$awsRegion = 'us-east-1'
$subRegion = 'east'
$awsRegion -match ( $regexTemplate -f $subRegion ) # ==> Error
Which results in the following error:
InvalidOperation: Error formatting a string: Index (zero based) must be greater than or equal to zero and less than the size of the argument list.
I know what the issue is: the formatter is treating one of my regex quantifiers as a replacement token. Rather than opt for a string-interpolation approach or replace {0} myself, is there a way I can tell PowerShell/.NET to only replace the 0-indexed token? Or is there another way to achieve the desired output using format strings?
If a string template includes literal { and/or } characters, you need to double them ({{ and }}) so they do not interfere with the numbered placeholders.
Try:
$regexTemplate = '(?<=^\w{{2}}-){0}(?=-\d$)'
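The same doubling rule can be tried outside PowerShell. As a rough illustration, Python's str.format is used below as a stand-in for the .NET formatter behind -f (an assumption of equivalence for brace escaping only): it treats {{ and }} as literal braces and leaves {0} as the only placeholder. The variable names are illustrative.

```python
import re

# Doubled braces come out of formatting as single literal braces;
# {0} remains the only replacement token.
regex_template = r"(?<=^\w{{2}}-){0}(?=-\d$)"

aws_region = "us-east-1"
sub_region = "east"

pattern = regex_template.format(sub_region)
print(pattern)  # the regex with the quantifier {2} restored

match = re.search(pattern, aws_region)
print(match is not None)
```

If the inserted value could contain regex metacharacters, escaping it first (re.escape here, [regex]::Escape in PowerShell) would be a sensible precaution.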

How to handle conditionally existing components in action code?

This is another problem I am facing while migrating from antlr3 to antlr4. This problem is with the java action code for handling conditional components of rules. One example is shown below.
The following grammar+code worked in antlr3. Here, if the unary operator is not present, then a value of '0' is returned, and the java code checks for this value and takes appropriate action.
exprUnary returns [Expr e]
: (unaryOp)? e1=exprAtom
{if($unaryOp.i==0) $e = $e1.e;
else $e = new ExprUnary($unaryOp.i, $e1.e);
}
;
unaryOp returns [int i]
: '-' {$i = 1;}
| '~' {$i = 2;}
;
In antlr4, this code results in a null pointer exception during a run, because 'unaryOp' is 'null' if it is not present. But if I change the code like below, then antlr generation itself reports an error:
if($unaryOp==null) ...
java org.antlr.v4.Tool try.g4
error(67): missing attribute access on rule reference 'unaryOp' in '$unaryOp'
How should the action be coded for antlr4?
Another example of this situation is in if-then-[else] - here $s2 is null in antlr4:
ifStmt returns [Stmt s]
: 'if' '(' e=cond ')' s1=stmt ('else' s2=stmt)?
{$s = new StmtIf($e.e, $s1.s, $s2.s);}
;
NOTE: question 16392152 provides a solution to this question with listeners, but I am not using listeners; my requirement is for this to be handled in the action code.
There are at least two potential ways to correct this:
1. The "ANTLR 4" way to do it is to create a listener or visitor instead of placing the Java code inside of actions embedded in the grammar itself. This is the only way I would even consider solving the problem in my own grammars.
2. If you still use an embedded action, the most efficient way to check whether the item exists is to access the ctx property, e.g. $unaryOp.ctx. This property resolves to the UnaryOpContext you were assuming would be accessible by $unaryOp by itself.
ANTLR expects you to access an attribute rather than the bare rule reference. Try its text attribute instead: $unaryOp.text==null

ANTLR4: Tree construction

I am extending the generated base listener class and am attempting to read in some values; however, there doesn't seem to be any hierarchy in the order.
A cut down version of my grammar is as follows:
start: config_options+ ;
config_options: (KEY) EQUALS^ (PATH | ALPHANUM) (' '|'\r'|'\n')* ;
KEY: 'key' ;
EQUALS: '=' ;
ALPHANUM: [0-9a-zA-Z]+ ;
However, the parse tree of this implementation is flat at the config_options level (the terminal level), i.e. the rule start has many config_options children, but EQUALS is not the root of the config_options subtrees; all of the tokens have the rule config_options as their root node. How can I make one of the terminals a root node instead?
In this particular rule I don't want any of the spaces to be captured. I know that there is the -> skip directive for the lexer, but there are some cases where I do want the space, e.g. in a string: '"'(ALPHANUM|' ')'"'
(Note: the ^ does not seem to work.)
An example input is:
key=abcdefg
key=90weata
key=acbefg9
All I want to do is extract the key and value pairs. I would expect that the '=' would be the root and the two children would be the key and the value.
When you generate your grammar, you should be getting a syntax error over the use of the ^ operator, which was removed in ANTLR 4. ANTLR 4 generates parse trees, the roots of which are implicitly defined by the rules in your grammar. In other words, for the grammar you gave above the parse tree nodes will be start and config_options.
The generated config_options rule will return an instance of Config_optionsContext, which contains the following methods:
KEY() returns a TerminalNode for the KEY token.
EQUALS() (same for the EQUALS token)
PATH() (same for the PATH token)
ALPHANUM() (same for the ALPHANUM token)
You can call getSymbol() on a TerminalNode to get the Token instance.

Resources