The sumScalarOperator rule gives me an error; it seems that ANTLR sees it as a possible infinite recursion loop. How can I avoid it?
sumScalarOperator: function SUM_TOKEN function;

function : INTEGER_TOKEN
         | NUMERIC_TOKEN
         | sumScalarOperator
         | ID;

ID : [A-Za-z_-] [a-zA-Z0-9_-]*;
INTEGER_TOKEN: [0-9]+;
NUMERIC_TOKEN: [0-9]+ '.' [0-9]+;
ANTLR4 can't cope with mutually left-recursive rules, but it can automatically rewrite rules with direct left recursion to eliminate it, so you can just feed it something like:
function : function SUM_TOKEN function # sumScalarOperator
         | INTEGER_TOKEN               # value
         | NUMERIC_TOKEN               # value
         | ID                          # value
         ;
Replace the value label with anything you need.
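For reference, here is a complete grammar in that style that compiles cleanly; the grammar name, the SUM_TOKEN definition ('+'), and the whitespace rule are assumptions, since the question doesn't show them:

grammar Sum;

function : function SUM_TOKEN function # sumScalarOperator
         | INTEGER_TOKEN               # value
         | NUMERIC_TOKEN               # value
         | ID                          # value
         ;

SUM_TOKEN     : '+';
ID            : [A-Za-z_-] [a-zA-Z0-9_-]*;
INTEGER_TOKEN : [0-9]+;
NUMERIC_TOKEN : [0-9]+ '.' [0-9]+;
WS            : [ \t\r\n]+ -> skip;

ANTLR rewrites the direct left recursion in function into an equivalent loop, so input like a + 1 + 2.5 parses without the recursion error.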
With jq, I'm trying to use entries in one array to index into a separate array. A simple JSON input would look like this:
{
  "myArray": [ "AA", "BB", "CC", "DD", "EE" ],
  "myFlags": [ 4, 3, 2, 1, 0 ]
}
jq's nifty 'as' operator is then able to bring the myArray array into scope and the indexing works fine:
.myArray as $Array | .myFlags | .[] | $Array[.] ====> yields "EE","DD","CC","BB","AA"
So far so jq-manual. However, if I try and move the $Array array access down into a function, the as-variable scope disappears:
def myFun: $Array[.]; .myArray as $Array | .myFlags | .[] | myFun
jq: error: $Array is not defined at <top-level>, line 1:
def myFun: $Array[.]; .myArray as $Array | .myFlags | .[] | myFun
To get around this, I currently pass down a temporary JSON object containing both the index and the array:
def myFun: .a[.b]; .myArray as $Array | .myFlags | .[] | { a: $Array, b: . } | myFun
Though this works, I have to say I'm not hugely comfortable with it.
Really, this doesn't feel like proper jq language behaviour to me. It seems to me that the 'as' scope ought to persist down into invoked def-functions. :-(
Is there a better way of extending as-scope down into def-functions? Am I missing some jq subtlety?
It actually is possible to make an "as" variable visible inside a function without passing it in as a parameter (see "Lexical Scoping" below), but it is usually unnecessary, and in fact using "as" variables is often unnecessary as well.
Avoiding the "as" variable
You can simplify your original query to:
.myArray[.myFlags[]]
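Against the sample input above (saved, say, as input.json), this produces the same stream as the original query:

$ jq '.myArray[.myFlags[]]' input.json
"EE"
"DD"
"CC"
"BB"
"AA"

Each value generated by .myFlags[] is used directly as an index into .myArray.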
Using function arguments
You can write jq functions with one or more arguments. This is the appropriate way to parameterize filters.
The syntax is quite conventional except that for functions with more than one argument, the semicolon (";") is used as the argument separator, both in the function definitions and invocations.
Note also that jq function arguments can themselves be filters, e.g. you could write:
def myFun(array; $ix): array | .[$ix];
myFun(.myArray; .myFlags[])
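Applied to the sample input, this again yields "EE", "DD", "CC", "BB", "AA": the first argument is the filter .myArray (re-evaluated against myFun's input), and each value generated by .myFlags[] is bound to $ix in turn.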
Lexical scoping
Here's an example showing how an 'as' variable can be made visible inside a function:
[1,2] as $array | def myFun: $array[1]; myFun
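Run against any input, this outputs 2. Because the def appears after the as binding, $array is lexically in scope inside the function body; with the def first, as in the question, you get the "$Array is not defined" error.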
With a feature containing a Scenario Outline and an Examples table like:
Feature: Demonstrate Issue

  Scenario Outline: Parameters with spaces don't match
    Given a variable containing <var>

    Examples:
      | var         |
      | No_Spaces   |
      | Some Spaces |
I'm struggling to match the <var> in my steps using cucumber expressions with cucumberjs.
I know that the {string} parameter type requires double quotes rather than the angle brackets, but I expected the anonymous type to match:
Given('a variable containing {}', function(expectedVar) {
  return true;
});
But it doesn't.
I know I can use the regex option:
Given(/^a variable containing (.*)$/, function(expectedVar) {
  return true;
});
I would just like to know where I'm going wrong in my use of the anonymous parameter type.
I must define a rule which expresses the following statement: {x in y | x > 0}.
For the first part of that comprehension, "x in y", I have the subrule:
FIRSTPART: Name "in" Name
where Name can be anything.
My problem is that I do not want greedy behaviour: the rule should parse until the "|" sign and then stop. Since I am new to ANTLR4, I do not know how to achieve that.
best regards,
Normally, the lexer/parser rules should represent the allowable syntax of the source input stream.
The evaluation (and consequences) of how the source matches any rule or subrule is a matter of semantics -- whether the input matches a particular subrule and whether that should control how the rule is finally evaluated.
Normally, semantics are implemented as part of the tree-walker analysis. You can use alternative subrule labels (#inExpr, etc.) to create easily distinguishable tree nodes for analysis purposes:
comprehension : LBrace expression property? RBrace ;

expression : ....
           | Name In Name    # inExpr
           | Name BinOp Name # binExpr
           | ....
           ;

property : Provided expression ;

BinOp : GT | LT | GTE | .... ;
Provided : '|' ;
In : 'in' ;
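For completeness, the token definitions the sketch leaves open might look like the following; the exact character set for Name and the whitespace handling are assumptions:

LBrace : '{' ;
RBrace : '}' ;
Name   : [a-zA-Z_] [a-zA-Z0-9_]* ;
WS     : [ \t\r\n]+ -> skip ;

fragment GT  : '>' ;
fragment LT  : '<' ;
fragment GTE : '>=' ;

Note that no non-greedy matching is needed here: since property can only begin with '|', the parser stops extending expression as soon as it sees the Provided token.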
I am writing a parser in bison/flex.
This is part of my code:
I want to implement the assignment production so that the identifier can be either a boolean_expr or an expr; its type will be checked against a symbol table.
So it allows something like:
int a = 1;
boolean b = true;
if(b) ...
However, I get a reduce/reduce conflict if I include identifier in both term and boolean_expr. Is there any way to solve this problem?
Essentially, what you are trying to do is to inject semantic rules (type information) into your syntax. That's possible, but it is not easy. More importantly, it's rarely a good idea. It's almost always best if syntax and semantics are well delineated.
All the same, as presented your grammar is unambiguous and LALR(1). However, the latter feature is fragile, and you will have difficulty maintaining it as you complete the grammar.
For example, you don't include your assignment syntax in your question, but presumably it would be something like:
assignment: identifier '=' expr
          | identifier '=' boolean_expr
          ;
Unlike the rest of the grammar shown, that production is ambiguous, because in
x = y
nothing is known about y, so y could be reduced to either a term or a boolean_expr.
A possibly more interesting example is the addition of parentheses to the grammar. The obvious way of doing that would be to add two productions:
term: '(' expr ')'
boolean_expr: '(' boolean_expr ')'
The resulting grammar is not ambiguous, but it is no longer LALR(1). Consider the two following declarations:
boolean x = (y) < 7
boolean x = (y)
In the first one, y must be an int so that (y) can be reduced to a term; in the second one y must be boolean so that (y) can be reduced to a boolean_expr. There is no ambiguity; once the < is seen (or not), it is entirely clear which reduction to choose. But < is not the lookahead token, and in fact it could be arbitrarily distant from y:
boolean x = ((((((((((((((((((((((y...
So the resulting unambiguous grammar is not LALR(k) for any k.
One way you could solve the problem would be to inject the type information at the lexical level, by giving the scanner access to the symbol table. Then the scanner could look up a scanned identifier token in the symbol table and use that information to decide between one of three token types (or more, if you have more datatypes): undefined_variable, integer_variable, and boolean_variable. Then you would have, for example:
declaration: "int" undefined_variable '=' expr
| "boolean" undefined_variable '=' boolean_expr
;
term: integer_variable
| ...
;
boolean_expr: boolean_variable
| ...
;
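As a sketch, the corresponding flex rule might look like this; lookup_symbol, the Symbol type, and the TYPE_INT tag are hypothetical symbol-table helpers, not part of flex or bison:

[A-Za-z_][A-Za-z0-9_]*   {
                           /* Consult the symbol table before choosing a token type. */
                           Symbol *sym = lookup_symbol(yytext);
                           yylval.sym = sym;
                           if (sym == NULL)           return undefined_variable;
                           if (sym->type == TYPE_INT) return integer_variable;
                           return boolean_variable;
                         }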
That will work, but it should be obvious that this is not scalable: every time you add a type, you'll have to extend both the grammar and the lexical description, because now the semantics are not only mixed up with the syntax, they have even gotten intermingled with the lexical analysis. Once you let semantics out of its box, it tends to contaminate everything.
There are languages for which this really is the most convenient solution: C parsing, for example, is much easier if typedef names and identifier names are distinguished so that you can tell whether (t)*x is a cast or a multiplication. (But it doesn't work so easily for C++, which has much more complicated name lookup rules, and also much more need for semantic analysis in order to find the correct parse.)
But, honestly, I'd suggest that you do not use C -- and much less C++ -- as a model of how to design a language. Languages which are hard for compilers to parse are also hard for human beings to parse. The "most vexing parse" continues to be a regular source of pain for C++ newcomers, and even sometimes trips up relatively experienced programmers:
class X {
public:
    X(int n = 0) : data_is_available_(n) {}
    operator bool() const { return data_is_available_; }
    // ...
private:
    bool data_is_available_;
    // ...
};

X my_x_object();
// ...
if (!my_x_object) {
    // This code is unreachable. Can you see why?
}
In short, you're best off with a language which can be parsed into an AST without any semantic information at all. Once the parser has produced the AST, you can do semantic analyses in separate passes, one of which will check type constraints. That's far and away the cleanest solution. Without explicit typing, the grammar is slightly simplified, because expressions no longer need to be segregated by type:
expr: conjunction | expr "or" conjunction ;
conjunction: comparison | conjunction "and" comparison ;
comparison: sum | sum '<' sum ;
sum: product | sum '+' product ;
product: term | product '*' term ;
term: identifier
    | constant
    | '(' expr ')'
    ;
Each action in the above would simply create a new AST node and set $$ to the new node. At the end of the parse, the AST is walked to verify that all exprs have the correct type.
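For instance, the "or" production might carry actions like the following; new_node and the OP_OR tag are hypothetical AST helpers:

expr: conjunction            { $$ = $1; }
    | expr "or" conjunction  { $$ = new_node(OP_OR, $1, $3); }
    ;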
If that seems like overkill for your project, you can do the semantic checks in the reduction actions, effectively intermingling the AST walk with the parse. That might seem convenient for immediate evaluation, but it also requires including explicit type information in the parser's semantic type, which adds unnecessary overhead (and, as mentioned, the inelegance of letting semantics interfere with the parser.) In that case, every action would look something like this:
expr : expr '+' expr { CheckArithmeticCompatibility($1, $3);
                       $$ = NewArithmeticNode('+', $1, $3);
                     }
I have trouble writing a parameterized test with Spock when one of the parameters needs the pipe character, for instance because it's a flag computation.
def "verify inferInputType()"() {
expect:
inputType == mPresenter.inferInputType(opt)
where:
opt | inputType
0 | 0
EDITTEXT_TYPE_ALPHANUM | InputType.TYPE_CLASS_TEXT
EDITTEXT_TYPE_NUM | InputType.TYPE_CLASS_NUMBER
EDITTEXT_TYPE_FLOAT | (InputType.TYPE_CLASS_NUMBER | InputType.TYPE_NUMBER_FLAG_DECIMAL)
}
The test fails with the following error message:
Row in data table has wrong number of elements (3 instead of 2) # line 25, column 9.
EDITTEXT_TYPE_FLOAT | InputType.TYPE_CLASS_NUMBER | InputType.TYPE_NUMBER_FLAG_DECIMAL
^
The only way I found to make it work is to wrap the parameter inside a closure, like this:
EDITTEXT_TYPE_FLOAT | { InputType.TYPE_CLASS_NUMBER | InputType.TYPE_NUMBER_FLAG_DECIMAL }()
But it's ugly; if someone has a better solution, please tell me.
You should be able to do:
InputType.TYPE_CLASS_NUMBER.or( InputType.TYPE_NUMBER_FLAG_DECIMAL )
Not sure if that is better ;-)
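Since a | b on integers is just sugar for a.or(b) in Groovy, the offending row becomes:

EDITTEXT_TYPE_FLOAT | InputType.TYPE_CLASS_NUMBER.or(InputType.TYPE_NUMBER_FLAG_DECIMAL)

which leaves only the single top-level | for Spock to treat as the column separator.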