Antlr4 Adjacent Token Precedence - antlr4

I'm running into a problem while building a complex grammar. A pet grammar to illustrate it is below:
grammar test;
start: (r1 | r2 | .)*
r1: A B
r2: B C
// A B C are tokens
When the following input occurs:
ABC
The parse tree looks like this:
start
| \
r1 C
| \
A B
But what I actually want is for it to look like this:
start
| \
A r2
| \
B C
I've tried reordering the rules & adding <assoc=right>, but nothing seems to work except removing rule r1, which is incorrect because I expect AB and BC to be valid inputs. What am I missing?
EDIT
It seems the above problem description oversimplifies the actual issue, so I'll give more details:
r3: rA r4 // prefers rA(classB classC) over (rA classB)classC
r4: classB? classC // also used elsewhere other than r3
rA: // rules to build A subtree, ends with classB? in 'some' cases
classB: B1 | B2 | ... | Bm
classC: C1 | C2 | ... | Cn
I've found that the following 'kind of' works:
r3: rA Bx classC | ...
But the following doesn't:
r3: <assoc=right> rA r4 | ... // still builds (rA classB)classC
I'm wondering if there's a way I can build the tree correctly while being able to utilize r4 and its associated code (and avoid having to put another m lines for all instances of B)?
PS. rA is expensive, so expanding B tokens in r3 like above throws performance to the dogs.

The problem I see here is that you tell the parser to produce the parse tree you don't want. If you don't want it then don't specify the grammar that is supposed to produce it.
Similar to what Mike Cargal came up with I think the real solution is to more explicitly specify what you want to see at the end. Here's something that works pretty well (using your initial problem description and MikeC's test input):
parser grammar testparser;
options {
tokenVocab = testlexer;
}
start: (A r2 | r1 | .)*? EOF;
r1: A B;
r2: B C;
lexer grammar testlexer;
A: 'A';
B: 'B';
C: 'C';
WHITE_SPACE: [ \u000B\t\r\n] -> skip ;
OTHER: .;
With the input AB!C2 I get this parse tree:
Leaving out C this changes to:
The main change is to specialise the start rule so that BC matches in its own subtree: add A in front of the r2 alternative and put that alternative first.
Note
Moving that single A down into the r2 rule will break this, because you would then tell the parser to create a subtree with ABC in it (which is what you don't want).

I suspect that you'll find that you're really fighting the way a recursive descent parser works.
When the parser tries to match rA it will try to match as many input tokens as possible, and it is not remotely aware of the parent's subsequent rule (i.e. the current rule's "next sibling"). (assoc=right isn't going to make the rule look up to its parent and next sibling; it's designed for building the correct parse trees for things like exponentiation.)
As such, you'd need to use something like a semantic predicate to "block" the wrong alternative from matching, by looking far enough ahead to determine whether it should actually match r4 (for example).
For your example, the following grammar parses the input "AB!C2" as
grammar test
;
start: r3 EOF;
r3
: rA r4 // prefers rA(classB classC) over (rA classB)classC
;
r4
: classB? classC // also used elsewhere other than r3
;
rA // rules to build A subtree, ends with classB? in 'some' cases
: A C1
| A
| {!( //
_input.get(_input.index()).getType() == C1 || //
_input.get(_input.index()).getType() == C2) //
}? A classB?
;
classB: B1 | B2;
classC: C1 | C2;
A: 'A';
B1: 'B1';
B2: 'B2';
C1: 'C1';
C2: 'C2';
WS: [ \n\r] -> skip;
OTHER: .;
However, that assumes you can determine an r4 rule match just by looking ahead at the second token that rule would possibly examine. I suspect that this is completely untenable from a maintenance and understanding standpoint.
You could slightly improve maintainability by including functions via the @parser::members feature. This allows more complex logic in your predicate without littering the actual grammar too badly.
grammar test
;
@parser::members {
private int[] laValues = new int[] { C1, C2 };
private boolean r4Follows() {
int la = _input.get(_input.index()).getType();
for (int i = 0; i < laValues.length; i++) {
if (la == laValues[i])
return true;
}
return false;
}
}
start: r3 EOF;
r3
: rA r4 // prefers rA(classB classC) over (rA classB)classC
;
r4
: classB? classC // also used elsewhere other than r3
;
rA // rules to build A subtree, ends with classB? in 'some' cases
: A C1
| A
| A classB {!r4Follows()}?
;
classB: B1 | B2;
classC: C1 | C2;
A: 'A';
B1: 'B1';
B2: 'B2';
C1: 'C1';
C2: 'C2';
WS: [ \n\r] -> skip;
OTHER: .;
However, this is still likely to be a mess to maintain. I think the bottom line is that what you're trying to do creates an ambiguity as to which rule classB belongs to.
A recursive descent parser doesn't really see an ambiguity; it just matches the rule it's working on at that time (rA) as best it can, and that includes pulling in the classB match. You'll have to introduce a semantic predicate to prevent that, and that's going to be pretty much unmanageable.
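The greedy behaviour described above can be illustrated with a small hand-written recursive descent sketch in Java. This is a simplified model, not ANTLR-generated code and not how ANTLR actually predicts alternatives: the class GreedyDescentSketch, the method parseR3 and the guardWithLookahead flag are hypothetical names, and the rules are cut down to rA: A classB? and r4: classB? classC. With the guard off, rA swallows the classB token; with the guard on, a one-token lookahead plays the role of the predicate and leaves classB for r4.

```java
import java.util.List;

class GreedyDescentSketch {
    static boolean isClassB(String t) { return t.equals("B1") || t.equals("B2"); }
    static boolean isClassC(String t) { return t.equals("C1") || t.equals("C2"); }

    // Parses r3: rA r4 over a token list and returns a bracketed tree.
    // Hypothetical simplification: rA is just "A classB?".
    static String parseR3(List<String> tokens, boolean guardWithLookahead) {
        int pos = 0;

        // rA: A classB?
        if (tokens.isEmpty() || !tokens.get(0).equals("A")) {
            throw new IllegalArgumentException("expected A");
        }
        StringBuilder rA = new StringBuilder("(rA A");
        pos++;
        if (pos < tokens.size() && isClassB(tokens.get(pos))) {
            boolean classCFollows =
                pos + 1 < tokens.size() && isClassC(tokens.get(pos + 1));
            // Without the guard, rA always takes the classB token (greedy).
            // With the guard, it leaves classB alone when a classC follows it.
            if (!(guardWithLookahead && classCFollows)) {
                rA.append(' ').append(tokens.get(pos));
                pos++;
            }
        }
        rA.append(')');

        // r4: classB? classC
        StringBuilder r4 = new StringBuilder("(r4");
        if (pos < tokens.size() && isClassB(tokens.get(pos))) {
            r4.append(' ').append(tokens.get(pos));
            pos++;
        }
        if (pos >= tokens.size() || !isClassC(tokens.get(pos))) {
            throw new IllegalArgumentException("expected classC");
        }
        r4.append(' ').append(tokens.get(pos)).append(')');
        pos++;

        if (pos != tokens.size()) {
            throw new IllegalArgumentException("unexpected trailing input");
        }
        return "(r3 " + rA + " " + r4 + ")";
    }

    public static void main(String[] args) {
        List<String> input = List.of("A", "B1", "C1");
        System.out.println(parseR3(input, false)); // (r3 (rA A B1) (r4 C1))
        System.out.println(parseR3(input, true));  // (r3 (rA A) (r4 B1 C1))
    }
}
```

The guard only looks one token past classB, which is the same limitation noted for the lookahead predicate: it works here because r4 can be recognised from a single following token.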
Addendum:
Against all that is holy in the ANTLR realm, I did work out a way to use the actual r4() rule to test for the match of a rule at the current position of the token stream. I HIGHLY recommend you file this under "stupid ANTLR tricks".
Since this actually attempts the parse of r4, depending on the complexity involved, it could substantially impact performance.
Also, I can make no guarantee that this doesn't still leave some state violated, though it worked on simple examples.
NOTE: The location of the semantic predicate is important as the current index of the token stream is advancing as you progress through the rule.
grammar test
;
@parser::header {import java.util.function.Supplier;}
@parser::members {
private BailErrorStrategy bailStrategy = new BailErrorStrategy();
private boolean ruleMatches(Supplier<ParserRuleContext> rule) {
boolean result = false;
// save state
int idx = _input.index();
int savedState = this.getState();
List<ParseTree> savedChildren = _ctx.children;
_ctx.children = new ArrayList<>();
ANTLRErrorStrategy savedErrStrategy = this.getErrorHandler();
this.setErrorHandler(bailStrategy);
try {
ParserRuleContext attempt = rule.get();
result = true;
} catch (ParseCancellationException pce) {
result = false;
} finally {
// restore state
this.setErrorHandler(savedErrStrategy);
_ctx.children = savedChildren;
this.setState(savedState);
_input.seek(idx);
}
return result;
}
}
start: r3 EOF;
r3
: rA r4? // prefers rA(classB classC) over (rA classB)classC
;
r4
: classB? classC // also used elsewhere other than r3
;
rA // rules to build A subtree, ends with classB? in 'some' cases
: A {!ruleMatches(this::r4)}? classB?
| A D
;
classB: B1 | B2;
classC: C1 | C2;
A: 'A';
B1: 'B1';
B2: 'B2';
C1: 'C1';
C2: 'C2';
D: 'D';
WS: [ \n\r] -> skip;
OTHER: .;
input "AB1C1"
input "ADC1"
input "AC1"
input "AB1"
(I believe it is now required that I go sacrifice an innocent kitten to the ANTLR gods to save my soul for posting that)

Related

Elegant way to parse Calculator.g4 in grammars-v4 using antlr4 listener model

I learned the basics of ANTLR4 grammars and tried to build a simple calculator. But I have no idea how to handle PLUS | MINUS and TIMES | DIV.
expression: multiplyingExpression ((PLUS | MINUS) multiplyingExpression)*;
multiplyingExpression: signedAtom ((TIMES | DIV) signedAtom)*;
signedAtom: PLUS signedAtom | MINUS signedAtom | func_ | atom;
(code extracted from Antlr4 sample calculator grammar)
It seems that no listener method handles PLUS | MINUS, because they are not defined as rules the way signedAtom and expression are, which get methods like exitXXX.
Does this mean a grammar like this can only be processed with the visitor model?
Example:
Here is an extremely simple example in Go.
calculator.g4
grammar calculator;
expression: atom ((PLUS | MINUS) atom)*;
atom: '1';
PLUS: '+';
MINUS: '-';
WS: [ \r\n\t]+ -> skip;
Code in Golang listener model
func NewCalculatorListenrImpl() *CalculatorListenrImpl {
return &CalculatorListenrImpl{
stack: stack.New(),
}
}
type CalculatorListenrImpl struct {
BasecalculatorListener
stack *stack.Stack
}
func (s *CalculatorListenrImpl) ExitExpression(ctx *ExpressionContext) {
left, op, right := s.stack.Pop().(int), s.stack.Pop().(string), s.stack.Pop().(int)
switch op {
case "+":
s.stack.Push(left + right)
case "-":
s.stack.Push(left - right)
}
}
func (s *CalculatorListenrImpl) ExitAtom(ctx *AtomContext) {
v, _ := strconv.ParseInt(ctx.GetText(), 10, 32)
s.stack.Push(int(v))
}
I can push the atom element onto the stack in exitAtom and then handle the logic in the exitExpression method. However, there is no listener method for (PLUS | MINUS). What I am looking for is a method like exitPLUS where I can simply push them onto the stack like atom. Maybe there are other ways to do this? I know there is some magic syntax like # and op=xxx, but these code fragments are copied from grammars-v4.
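One possible direction, sketched here under assumptions rather than taken from the thread: because expression is atom ((PLUS | MINUS) atom)*, the operator tokens sit between the atoms as children of the expression node, so the exit handler for expression could fold over its children from left to right instead of waiting for an operator callback. The Java sketch below models those children as a flat list of token texts; the class FlatExpressionFold and the method evaluate are hypothetical names, and in a real listener the loop would presumably run over the children of the expression context.

```java
import java.util.List;

class FlatExpressionFold {
    // children models the flattened children of one expression node:
    // atom (op atom)*, for example ["1", "+", "1", "-", "1"].
    static int evaluate(List<String> children) {
        int result = Integer.parseInt(children.get(0));
        // Operators are at odd indices, their right operands at even ones.
        for (int i = 1; i + 1 < children.size(); i += 2) {
            String op = children.get(i);
            int right = Integer.parseInt(children.get(i + 1));
            switch (op) {
                case "+":
                    result += right;
                    break;
                case "-":
                    result -= right;
                    break;
                default:
                    throw new IllegalArgumentException("unknown operator: " + op);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(List.of("1", "+", "1", "-", "1"))); // 1
        System.out.println(evaluate(List.of("1", "-", "1", "-", "1"))); // -1
    }
}
```

Folding left to right keeps subtraction left-associative, which a stack that pops operands in reverse order would have to handle explicitly.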

Lock Challenge in Alloy

I would like to solve the following lock challenge using Alloy.
My main issue is how to model the integers representing the digit keys.
I created a quick draft:
sig Digit, Position{}
sig Lock {
d: Digit one -> lone Position
}
run {} for exactly 1 Lock, exactly 3 Position, 10 Digit
In this context, could you please:
tell me if Alloy seems to you suitable to solve this kind of problem?
give me some pointers regarding the way I could model the key digits (without using Ints)?
Thank you.
My frame of this puzzle is:
enum Digit { N0,N1,N2,N3,N4,N5,N6,N7,N8,N9 }
one sig Code {a,b,c:Digit}
pred hint(h1,h2,h3:Digit, matched,wellPlaced:Int) {
matched = #(XXXX) // fix XXXX
wellPlaced = #(XXXX) // fix XXXX
}
fact {
hint[N6,N8,N2, 1,1]
hint[N6,N1,N4, 1,0]
hint[N2,N0,N6, 2,0]
hint[N7,N3,N8, 0,0]
hint[N7,N8,N0, 1,0]
}
run {}
Here is a simple way to get started; you do not always need sigs. The solution found is probably not the intended one, but that is because the requirements are ambiguous and I took a shortcut.
pred lock[ a,b,c : Int ] {
a=6 || b=8 || c= 2
a in 1+4 || b in 6+4 || c in 6+1
a in 0+6 || b in 2+6 || c in 2+0
a != 7 && b != 3 && c != 8
a = 7 || b=8 || c=0
}
run lock for 6 int
Look in the Text view for the answer.
Update: we had a discussion on the Alloy list, and I'd like to amend my solution with a more readable version:
let sq[a,b,c] = 0->a + 1->b + 2->c
let digit = { n : Int | n>=0 and n <10 }
fun correct[ lck : seq digit, a, b, c : digit ] : Int { # (Int.lck & (a+b+c)) }
fun wellPlaced[ lck : seq digit, a, b, c : digit ] : Int { # (lck & sq[a,b,c]) }
pred lock[ a, b, c : digit ] {
let lck = sq[a,b,c] {
1 = correct[ lck, 6,8,2] and 1 = wellPlaced[ lck, 6,8,2]
1 = correct[ lck, 6,1,4] and 0 = wellPlaced[ lck, 6,1,4]
2 = correct[ lck, 2,0,6] and 0 = wellPlaced[ lck, 2,0,6]
0 = correct[ lck, 7,3,8]
1 = correct[ lck, 7,8,0] and 0 = wellPlaced[ lck, 7,8,0]
}
}
run lock for 6 Int
Once you think the puzzle is solved, let's examine whether the solution is generic.
Here is another lock.
If you can't solve it in the same form, your solution may not be general enough.
Hint1: (1,2,3) - Nothing is correct.
Hint2: (4,5,6) - Nothing is correct.
Hint3: (7,8,9) - One number is correct but wrong placed.
Hint4: (9,0,0) - All numbers are correct, with one well placed.
Yes, I think Alloy is suitable for this kind of problem.
Regarding digits, you don't need integers at all: in fact, it is a bit irrelevant for this particular purpose if they are digits or any set of 10 different identifiers (no arithmetic is performed with them). You can use singleton signatures to declare the digits, all extending signature Digit, which should be marked as abstract. Something like:
abstract sig Digit {}
one sig Zero, One, ..., Nine extends Digit {}
A similar strategy can be used to declare the three different positions of the lock. And btw since you have exactly one lock you can also declare Lock as singleton signature.
I like Nomura's solution on this page. I made a slight modification to the predicate and the fact to solve the puzzle.
enum Digit { N0,N1,N2,N3,N4,N5,N6,N7,N8,N9 }
one sig Code {a,b,c: Digit}
pred hint(code: Code, d1,d2,d3: Digit, correct, wellPlaced:Int) {
correct = #((code.a + code.b + code.c)&(d1 + d2 + d3))
wellPlaced = #((0->code.a + 1->code.b + 2->code.c)&(0->d1 + 1->d2 + 2->d3))
}
fact {
some code: Code |
hint[code, N6,N8,N2, 1,1] and
hint[code, N6,N1,N4, 1,0] and
hint[code, N2,N0,N6, 2,0] and
hint[code, N7,N3,N8, 0,0] and
hint[code, N7,N8,N0, 1,0]
}
run {}
Update (2020-12-29):
The new puzzle presented by Nomura (https://stackoverflow.com/a/61022419/5005552) demonstrates a weakness in the original solution: it does not account for multiple uses of a digit within a code. A modification to the expression for "correct" fixes this. Intersect each guessed digit with the union of the digits from the passed code and sum them for the true cardinality. I encapsulated the matching in a function, which will return 0 or 1 for each digit.
enum Digit {N0,N1,N2,N3,N4,N5,N6,N7,N8,N9}
let sequence[a,b,c] = 0->a + 1->b + 2->c
one sig Code {c1, c2, c3: Digit}
fun match[code: Code, d: Digit]: Int { #((code.c1 + code.c2 + code.c3) & d) }
pred hint(code: Code, d1,d2,d3: Digit, correct, wellPlaced:Int) {
// The intersection of each guessed digit with the code (unordered) tells us
// whether any of the digits match each other and how many
correct = match[code,d1].plus[match[code,d2]].plus[match[code,d3]]
// The intersection of the sequences of digits (ordered) tells us whether
// any of the digits are correct AND in the right place in the sequence
wellPlaced = #(sequence[code.c1,code.c2,code.c3] & sequence[d1, d2, d3])
}
pred originalLock {
some code: Code |
hint[code, N6,N8,N2, 1,1] and
hint[code, N6,N1,N4, 1,0] and
hint[code, N2,N0,N6, 2,0] and
hint[code, N7,N3,N8, 0,0] and
hint[code, N7,N8,N0, 1,0]
}
pred newLock {
some code: Code |
hint[code, N1,N2,N3, 0,0] and
hint[code, N4,N5,N6, 0,0] and
hint[code, N7,N8,N9, 1,0] and
hint[code, N9,N0,N0, 3,1]
}
run originalLock
run newLock
run test {some code: Code | hint[code, N9,N0,N0, 3,1]}
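As a cross-check outside Alloy, the same hint semantics can be brute-forced over all 1000 three-digit codes. The Java sketch below is an independent illustration, not part of any answer: LockBruteForce, match, hint and solve are hypothetical names chosen to mirror the Alloy model, with "correct" counted per guessed digit (so a repeated guess digit counts twice) and "wellPlaced" counted per position.

```java
import java.util.ArrayList;
import java.util.List;

class LockBruteForce {
    // Mirrors the Alloy match function: 1 if digit d occurs anywhere in code.
    static int match(int[] code, int d) {
        for (int c : code) {
            if (c == d) return 1;
        }
        return 0;
    }

    // Mirrors the Alloy hint predicate for a three-digit guess.
    static boolean hint(int[] code, int[] guess, int correct, int wellPlaced) {
        int c = 0;
        int w = 0;
        for (int i = 0; i < 3; i++) {
            c += match(code, guess[i]);      // unordered match, per guessed digit
            if (code[i] == guess[i]) w++;    // same digit in the same position
        }
        return c == correct && w == wellPlaced;
    }

    // Each hint row is {d1, d2, d3, correct, wellPlaced}.
    static List<String> solve(int[][] hints) {
        List<String> solutions = new ArrayList<>();
        for (int n = 0; n < 1000; n++) {
            int[] code = { n / 100, (n / 10) % 10, n % 10 };
            boolean ok = true;
            for (int[] h : hints) {
                int[] guess = { h[0], h[1], h[2] };
                if (!hint(code, guess, h[3], h[4])) {
                    ok = false;
                    break;
                }
            }
            if (ok) solutions.add(String.format("%03d", n));
        }
        return solutions;
    }

    public static void main(String[] args) {
        int[][] originalLock = {
            {6, 8, 2, 1, 1}, {6, 1, 4, 1, 0}, {2, 0, 6, 2, 0},
            {7, 3, 8, 0, 0}, {7, 8, 0, 1, 0}
        };
        int[][] newLock = {
            {1, 2, 3, 0, 0}, {4, 5, 6, 0, 0}, {7, 8, 9, 1, 0}, {9, 0, 0, 3, 1}
        };
        System.out.println(solve(originalLock)); // [042]
        System.out.println(solve(newLock));      // [090]
    }
}
```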

inconsistent behavior of equivalence when prevent overflows is set to 'yes'

The following example shows two checks that appear equivalent, yet the second finds a counterexample while the first does not. When setting 'prevent overflow' to 'No', both return the same result:
sig A {
x : Int,
y : Int
}
pred f1[a:A] {
y = x ++ (a -> ((a.x).add[1]))
}
pred f2[a:A] {
a.y = (a.x).add[1]
y = x ++ (a -> a.y)
}
check {
all a : A | {
f1[a] => f2[a]
f2[a] => f1[a]
}
}
check {
all a:A | {
f1[a] <=> f2[a]
}
}
I am using Alloy 4.2_2015-02_22 on Ubuntu Linux with sat4j.
I am sorry not to be able to offer a more useful answer than the following. With luck, someone else will do better.
It is possible, I think, that your model exhibits a bug in Alloy. But perhaps instead the model exhibits a counter-intuitive consequence of the semantics of integers in Alloy when the Prevent Overflow flag is set. Those semantics are described in the paper Preventing arithmetic overflows in Alloy by Aleksandar Milicevic and Daniel Jackson. You may have an easier time following the details than I am having.
The following expressions seem to suggest (a) that the unintuitive results have to do with the fact that the law of excluded middle does not hold when negation is applied to undefined values, and (b) that in the case where a.x = 7 (or whatever the maximum value of Int is, in a particular scope), the predicates f1 and f2 behave differently:
pred xm1a[ a : A] { (f1[a] or not(f1[a])) }
pred xm2a[ a : A] { (f2[a] or not(f2[a])) }
pred xm1b[ a : A] { (not (f1[a] and not(f1[a]))) }
pred xm2b[ a : A] { (not (f2[a] and not(f2[a]))) }
// some sensible assertions: no counterexamples
check x1 { all a : A | a.x = 7 implies xm1a[a] }
check x2 { all a : A | a.x = 7 implies xm2a[a] }
check x3 { all a : A | a.x = 7 implies xm1b[a] }
check x4 { all a : A | a.x = 7 implies xm2b[a] }
// some odd assertions: counterexamples for y2 and y4
check y1 { all a : A | a.x = 7 implies not xm1a[a] }
check y2 { all a : A | a.x = 7 implies not xm2a[a] }
check y3 { all a : A | a.x = 7 implies not xm1b[a] }
check y4 { all a : A | a.x = 7 implies not xm2b[a] }
It's not clear (to me) whether the relevant difference between f1 and f2 is that f2 refers explicitly to a.y, or that f2 has two clauses with an implicit and between them.
If the arithmetic relation between a.x and a.y is important to the model, then figuring out exactly how overflow cases are handled will be essential. If all that matters is that a.x != a.y and y = x ++ (a -> a.y), then weakening the condition will have the nice side effect that the reader need not understand Alloy's overflow semantics. (I expect you realize that already; I mention it for the benefit of later readers.)
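To see why a.x = 7 is the interesting case, it helps to look at what (a.x).add[1] does at the edge of the integer range. The Java sketch below assumes Alloy's default Int bitwidth of 4 (values -8 to 7) and models only two's-complement wraparound, which is the behaviour with overflow prevention turned off; it does not model how the Prevent Overflows setting treats the overflowing case. FourBitInt and its methods are hypothetical names.

```java
class FourBitInt {
    // Assumption: default Alloy bitwidth of 4, so the range is -8..7.
    static final int BITWIDTH = 4;

    // Wraps v into the two's-complement range of BITWIDTH bits.
    static int wrap(int v) {
        int shift = 32 - BITWIDTH;
        return (v << shift) >> shift; // arithmetic shift keeps the sign bit
    }

    static int add(int a, int b) {
        return wrap(a + b);
    }

    // True when the wrapped sum differs from the mathematical sum.
    static boolean overflows(int a, int b) {
        return add(a, b) != a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(7, 1));       // -8
        System.out.println(overflows(7, 1)); // true
        System.out.println(overflows(3, 4)); // false
    }
}
```

Only instances with a.x = 7 hit the overflowing sum, which is consistent with the checks above restricting attention to that value.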

Creating an object for each relation in Alloy

I have the following def. in Alloy:
sig A {b : set B}
sig B{}
sig Q {s: A , t: B}
I want to add a set of constraints such that for each relation b1:b there exists one and only one Q1:Q where Q1.s and Q1.t refers to the source and target of b1, respectively. For example, if I have an instance which contains A1 and B1 and b1 connects them (i.e., b1: A1->B1), then I also would like to have a Q1 where Q1.s=A1 and Q1.t=B1.
Obviously, the number (cardinality) of Q atoms equals the number of tuples in the b relation.
I managed to write such a constraint as below:
t in s.b
all q1,q2:Q | q1.s=q2.s and q1.t=q2.t => q1=q2
all a1:A,b1:B | a1->b1 in b => some q:Q | q.s=a1 and q.t=b1
I am wondering if anyone has a more concise way to express my intentions as an Alloy fact. I am open to using the Alloy util package if it makes life easier.
Thanks
sig A { b : set B }
sig B {}
sig Q { ab : A -> B }{ one ab }
fact { b = Q.ab and #Q = #b }
I would complete @user1513683's answer by adding two relations, s and t, to make it a complete answer to the question:
sig A { b : set B }
sig B {}
sig Q { ab : A -> B , s:A, t:B}{ one ab and t=ab[s]}
fact { b = Q.ab and #Q = #b }
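The intended property (exactly one Q per tuple of b) can also be stated as a plain check over a concrete instance, which may help when comparing the formulations. The Java sketch below is only an illustration with hypothetical names (RelationReification, oneQPerTuple); it encodes each tuple a->b as a string and each Q atom by its (s, t) pair, then applies the three constraints from the question in order.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RelationReification {
    // b: the tuples of relation b, each encoded as a string such as "A1->B1".
    // q: one entry per Q atom, holding its (s, t) pair in the same encoding.
    static boolean oneQPerTuple(Set<String> b, List<String> q) {
        Set<String> seen = new HashSet<>();
        for (String pair : q) {
            if (!b.contains(pair)) return false; // t in s.b
            if (!seen.add(pair)) return false;   // no two Q atoms share (s, t)
        }
        return seen.equals(b);                   // every tuple of b has a Q
    }

    public static void main(String[] args) {
        Set<String> b = Set.of("A1->B1", "A1->B2");
        System.out.println(oneQPerTuple(b, List.of("A1->B1", "A1->B2"))); // true
        System.out.println(oneQPerTuple(b, List.of("A1->B1")));           // false
    }
}
```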

NDepend rule to warn if objects of a given type are compared using ==

As the title says: I need an NDepend rule (CQLinq) for C#/.NET code that fires whenever instances of a given type are compared using == (reference comparison). In other words, I want to force the programmer to use .Equals.
Note that the type in question has no overloaded equality operator.
Is this possible? If so, how? :)
Thanks, cheers,
Tim
With the following code we see that, for value types, == translates to the IL instruction ceq. This kind of usage cannot be detected with NDepend.
int i = 2;
int j = 3;
Debug.Assert(i == j);
var s1 = "2";
var s2 = "3";
Debug.Assert(s1 == s2);
However, for reference types that define the == operator, such as string, we can see that an operator method named op_Equality is called.
L_001d: call bool [mscorlib]System.String::op_Equality(string, string)
Hence we just need a CQLinq query that first matches all methods named op_Equality and then lists all callers of these methods. It can look like this:
let equalityOps = Methods.WithSimpleName("op_Equality")
from m in Application.Methods.UsingAny(equalityOps)
select new { m,
typesWhereEqualityOpCalled = m.MethodsCalled.Intersect(equalityOps).Select(m1 => m1.ParentType) }
This seems to work pretty well :)
