Meaning for = in antlr4 - antlr4

I have a grammar like
rule1 : GO (count=DECIMAL)? ;
rule2 : name '=' expression
I dont understand the difference between '=' sign in rule1 and rule2

The assignment is a variable assignment. ANTLR4 will generate a member variable named count for you, which gets the DECIMAL token when matched (since it is optional, count might be empty/null).
You can use count for instance in your listener code to directly get that value, however you could also just use DECIMAL instead. So it's mostly useful for action code or predicates in your grammar. You can refer to such variables by using e.g. $count:
rule1: GO (count = DECIMAL)? { $count.toString().toInteger() < 4}?;
which matches only if GO is followed by a value less than 4.
Side note: toInteger() is just pseudo code here. Use your target's string-to-int conversion API.

Related

How can i specify parameter dependent on index value?

I'm trying to port code from DML 1.2 to DML 1.4. Here is part of code that i ported:
group rx_queue [i < NQUEUES] {
<...>
param desctype = i < 64 #? regs.SRRCTL12[i].DESCTYPE.val #: regs.SRRCTL2[i - 64].DESCTYPE.val; // error occurs here
<...>
}
Error says:
error: non-constant expression: cast(i, int64 ) < 64
How can i specify parameter dependent on index value?
I tried to use if...else instead ternary operator, but it says that conditional parameters are not allowed in DML.
Index parameters in DML are a slightly magical expressions; when used from within parameters, they can evaluate to either a constant or a variable depending on where the parameter is used from. Consider the following example:
group g[i < 5] {
param x = i * 4;
method m() {
log info: "%d", x;
log info: "%d", g[4 - i].x;
log info: "%d", g[2].x;
}
}
i becomes an implicit local variable within the m method, and in params, indices are a bit like implicit macro parameters. When the compiler encounters x in the first log statement, the param will expand to i * 4 right away. In the second log statement, the x param is taken from an object indexed with the expression 4 - i, so param expansion will instead insert (5 - i) * 4. In the third log statement, the x param is taken from a constant indexed object, so it expands to 2 * 4 which collapses into the constant 8.
Most uses of desctype will likely happen from contexts where indices are variable, and the #? expression requires a constant boolean as condition, so this will likely give an error as soon as anyone tries to use it.
I would normally advise you to switch from #? to ? in the definition of the desctype param, but that fails in this particular case: DMLC will report error: array index out of bounds on the i - 64 expression. This error is much more confusing, and happens because DMLC automatically evaluates every parameter once with all zero indices, to smoke out misspelled identifiers; this will include evaluation of SRRCTL2[i-64] which collapses into SRRCTL2[-64] which annoys DMLC.
This is arguably a compiler bug; DMLC should probably be more tolerant in this corner. (Note that even if we would remove the zero-indexed validation step from the compiler, your parameter would still give the same error message if it ever would be explicitly referenced with a constant index, like log info: "%d", rx_queue[0].desctype).
The reason why you didn't get an error in DML 1.2 is that DML 1.2 had a single ternary operator ? that unified 1.4's ? and #?; when evaluated with a constant condition the dead branch would be disregarded without checking for errors. This had some strange effects in other situations, but made your particular use case work.
My concrete advise would be to replace the param with a method; this makes all index variables unconditionally non-constant which avoids the problem:
method desctype() -> (uint64) {
return i < 64 ? regs.SRRCTL12[i].DESCTYPE.val : regs.SRRCTL2[i - 64].DESCTYPE.val;
}

element => command in Terraform

I see a code as below in https://github.com/terraform-aws-modules/terraform-aws-efs/blob/master/examples/complete/main.tf#L58
# Mount targets / security group
mount_targets = { for k, v in toset(range(length(local.azs))) :
element(local.azs, k) => { subnet_id = element(module.vpc.private_subnets, k) }
}
I am trying to understand what => means here. Also this command with for loop, element and =>.
Could anyone explain here please?
In this case the => symbol isn't an independent language feature but is instead just one part of the for expression syntax when the result will be a mapping.
A for expression which produces a sequence (a tuple, to be specific) has the following general shape:
[
for KEY_SYMBOL, VALUE_SYMBOL in SOURCE_COLLECTION : RESULT
if CONDITION
]
(The KEY_SYMBOL, portion and the if CONDITION portion are both optional.)
The result is a sequence of values that resulted from evaluating RESULT (an expression) for each element of SOURCE_COLLECTION for which CONDITION (another expression) evaluated to true.
When the result is a sequence we only need to specify one result expression, but when the result is a mapping (specifically an object) we need to specify both the keys and the values, and so the mapping form has that additional portion including the => symbol you're asking about:
{
for KEY_SYMBOL, VALUE_SYMBOL in SOURCE_COLLECTION : KEY_RESULT => VALUE_RESULT
if CONDITION
}
The principle is the same here except that for each source element Terraform will evaluate both KEY_RESULT and VALUE_RESULT in order to produce a key/value pair to insert into the resulting mapping.
The => marker here is just some punctuation so that Terraform can unambiguously recognize where the KEY_RESULT ends and where the VALUE_RESULT begins. It has no special meaning aside from being a delimiter inside a mapping-result for expression. You could think of it as serving a similar purpose as the comma between KEY_SYMBOL and VALUE_SYMBOL; it has no meaning of its own, and is only there to mark the boundary between two clauses of the overall expression.
When I read a for expression out loud, I typically pronounce => as "maps to". So with my example above, I might pronounce it as "for each key and value in source collection, key result maps to value result if the condition is true".
Lambda expressions use the operator symbol =, which reads as "goes to." Input parameters are specified on the operator's left side, and statement/expressions are specified on the right. Generally, lambda expressions are not directly used in query syntax but are often used in method calls. Query expressions may contain method calls.
Lambda expression syntax features are as follows:
It is a function without a name.
There are no modifiers, such as overloads and overrides.
The body of the function should contain an expression, rather than a statement.
May contain a call to a function procedure but cannot contain a call to a subprocedure.
The return statement does not exist.
The value returned by the function is only the value of the expression contained in the function body.
The End function statement does not exist.
The parameters must have specified data types or be inferred.
Does not allow generic parameters.
Does not allow optional and ParamArray parameters.
Lambda expressions provide shorthand for the compiler, allowing it to emit methods assigned to delegates.
The compiler performs automatic type inference on the lambda arguments, which is a key advantage.

Dropping various string variables in a loop in Stata

I want to a drop a great number of string variables that contain the word "Other" in their observations. As such, I tried the following loop to drop all the variables:
foreach var of varlist v1-v240 {
drop `var' if `var'=="Other"
}
What I get in return is the answer "syntax error". I would like to know not only a way to perform the task of dropping all the variables that contain the word "Other", but also why the code that I've entered returns an error.
The short answer on why your syntax is illegal, which #Dimitriy Masterov doesn't quite spell out, is that drop supports just two syntaxes, which can't be mixed, dropping variables and dropping observations. This is documented: see e.g. http://www.stata.com/help.cgi?drop and the corresponding on-line help and manual entry within Stata.
In addition to other solutions, findname from the Stata Journal would allow this solution:
findname, any(# == "Other")
drop `r(varlist)'
Your interpretation of contain is evidently 'is equal to' judging by your use of == as an operator, echoed above. If contain really means 'includes as substring', then you need a syntax such as
any(strpos(#, "Other"))
or
any(regexm(#, "Other"))
as #Dimitriy also explains.
If they are actual strings, this should work:
sysuse auto, clear
ds, has(type string) // get a list of string variables
// loop over each string variable, count observations that contain Buick anywhere, and drop the variable if N>0
foreach var of varlist `r(varlist)' {
count if regexm(`var',"Buick")
if r(N)>0 {
drop `var'
}
}
If "contains" means only contains, then you need to use "^Buick$" instead or
count if `var'=="Buick"
Beware of leading/trailing spaces.
The if qualifier restricts the scope of a command to those observations for which the value of the expression is true. Your code errors because you are asking Stata to drop a variable (a column) if some observations (rows) satisfy a condition. You could use the if qualifier to drop those observations or you can drop a variable, but not both simultaneously. My code uses the if command (a different beast) to verify the condition, and then drops the variable if that condition is satisfied.
You might be tempted to do something like
if `var'=="Other" {
drop `var'
}
but that will usually not work as expected (it would drop the variable only if the first observation was "Other").

ANTLR4 Semantic Predicates that is Context Dependent Does Not Work

I am parsing a C++ like declaration with this scaled down grammar (many details removed to make it a fully working example). It fails to work mysteriously (at least to me). Is it related to the use of context dependent predicate? If yes, what is the proper way to implement the "counting the number of child nodes logic"?
grammar CPPProcessor;
cppCompilationUnit : decl_specifier_seq? init_declarator* ';' EOF;
init_declarator: declarator initializer?;
declarator: identifier;
initializer: '=0';
decl_specifier_seq
locals [int cnt=0]
#init { $cnt=0; }
: decl_specifier+ ;
decl_specifier : #init { System.out.println($decl_specifier_seq::cnt); }
'const'
| {$decl_specifier_seq::cnt < 1}? type_specifier {$decl_specifier_seq::cnt += 1;} ;
type_specifier: identifier ;
identifier:IDENTIFIER;
CRLF: '\r'? '\n' -> channel(2);
WS: [ \t\f]+ -> channel(1);
IDENTIFIER:[_a-zA-Z] [0-9_a-zA-Z]* ;
I need to implement the standard C++ rule that no more than 1 type_specifier is allowed under an decl_specifier_seq.
Semantic predicate before type_specifier seems to be the solution. And the count is naturally declared as a local variable in decl_specifier_seq since nested decl_specifier_seq are possible.
But it seems that a context dependent semantic predicate like the one I used will produce incorrect parsing i.e. a semantic predicate that references $attributes. First an input file with correct result (to illustrate what a normal parse tree looks like):
int t=0;
and the parse tree:
But, an input without the '=0' to aid the parsing
int t;
0
1
line 1:4 no viable alternative at input 't'
1
the parsing failed with the 'no viable alternative' error (the numbers printed in the console is debug print of the $decl_specifier_cnt::cnt value as a verification of the test condition). i.e. the semantic predicate cannot prevent the t from being parsed as type_specifier and t is no longer considered a init_declarator. What is the problem here? Is it because a context dependent predicate having $decl_specifier_seq::cnt is used?
Does it mean context dependent predicate cannot be used to implement "counting the number of child nodes" logic?
EDIT
I tried new versions whose predicate uses member variable instead of the $decl_specifier_seq::cnt and surprisingly the grammar now works proving that the Context Dependent predicate did cause the previous grammar to fail:
....
#parser::members {
public int cnt=0;
}
decl_specifier
#init {System.out.println("cnt:"+cnt); }
:
'const'
| {cnt<1 }? type_specifier {cnt++;} ;
A normal parse tree is resulted:
This gives rise to the question of how to support nested rule if we must use member variables to replace the local variables to avoid context sensitive predicates?
And a weird result is that if I add a /*$ctx*/ after the predicate, it fails again:
decl_specifier
#init {System.out.println("cnt:"+cnt); }
:
'const'
| {cnt<1 /*$ctx*/ }? type_specifier {cnt++;} ;
line 1:4 no viable alternative at input 't'
The parsing failed with no viable alternative. Why the /*$ctx*/ causes the parsing to fail like when $decl_specifier_seq::cnt is used although the actual logic uses a member variable only?
And, without the /*$ctx*/, another issue related to the predicate called before #init block appears(described here)
ANTLR 4 evaluates semantic predicates in two cases.
The generated code evaluates a semantic predicate during parsing, and throws an exception of the evaluation returns false. All predicates traversed during parsing are evaluated in this way, including context-dependent predicates and predicates which do not appear at the left side of a decision.
The prediction method evaluates predicates in order to make correct decisions during parsing. In this case, predicates which appear anywhere other than the left edge of the decision being evaluated are assumed to return true (i.e. they are ignored). In addition, context-dependent predicates are only evaluated if the context data is available. The prediction algorithm will not create context structures that were not already provided by the parsing code. If a context-dependent predicate is encountered during prediction and no context is available, the predicate is assumed to return true (i.e. it is ignored for that decision).
The code generator does not evaluate the semantics of the target language, so it has no way to know that $ctx is semantically irrelevant when it appears in /*$ctx*/. Both cases result in the predicate being treated as context-dependent.

Token empty when matching grammar although rule matched

So my rule is
/* Addition and subtraction have the lowest precedence. */
additionExp returns [double value]
: m1=multiplyExp {$value = $m1.value;}
( op=AddOp m2=multiplyExp )* {
if($op != null){ // test if matched
if($op.text == "+" ){
$value += $m2.value;
}else{
$value -= $m2.value;
}
}
}
;
AddOp : '+' | '-' ;
My test ist 3 + 4 but op.text always returns NULL and never a char.
Does anyone know how I can test for the value of AddOp?
In the example from ANTLR4 Actions and Attributes it should work:
stat: ID '=' INT ';'
{
if ( !$block::symbols.contains($ID.text) ) {
System.err.println("undefined variable: "+$ID.text);
}
}
| block
;
Are you sure $op.text is always null? Your comparison appears to check for $op.text=="+" rather than checking for null.
I always start these answers with a suggestion that you migrate all of your action code to listeners and/or visitors when using ANTLR 4. It will clean up your grammar and greatly simplify long-term maintenance of your code.
This is probably the primary problem here: Comparing String objects in Java should be performed using equals: "+".equals($op.text). Notice that I used this ordering to guarantee that you never get a NullPointerException, even if $op.text is null.
I recommend removing the op= label and referencing $AddOp instead.
When you switch to using listeners and visitors, removing the explicit label will marginally reduce the size of the parse tree.
(Only relevant to advanced users) In some edge cases involving syntax errors, labels may not be assigned while the object still exists in the parse tree. In particular, this can happen when a label is assigned to a rule reference (your op label is assigned to a token reference), and an error appears within the labeled rule. If you reference the context object via the automatically generated methods in the listener/visitor, the instances will be available even when the labels weren't assigned, improving your ability to report details of some errors.

Resources