ANTLR4 not reporting ambiguity

Given the following grammar:
grammar ReportAmbiguity;
unit : statements+;
statements :
callStatement+
// '.' // <- uncomment this line
;
callStatement : 'CALL' ID (argsByRef | argsByVal)*;
argsByRef : ('BY' 'REF')? ID+;
argsByVal : 'BY' 'VAL' ID+;
ID : ('A'..'Z')+;
WS : (' '|'\n')+ -> channel(HIDDEN);
When parsing the string "CALL FUNCTION BY VAL A B" starting from the non-root rule callStatement, everything works and the parser correctly reports an ambiguity:
line 1:24 reportAttemptingFullContext d=6 (argsByVal), input='B'
line 1:24 reportAmbiguity d=6 (argsByVal): ambigAlts={1, 2}, input='B'
The parser correctly outputs the tree: (callStatement CALL FUNCTION (argsByVal BY VAL A B)).
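For reference, here is a minimal sketch of a driver that reproduces those reports with the Python runtime (the OP's actual test project is Java/Maven; the generated class names below are assumptions). The report* lines typically only appear when a DiagnosticErrorListener is attached and exact-ambiguity prediction is enabled:
from antlr4 import InputStream, CommonTokenStream
from antlr4.error.DiagnosticErrorListener import DiagnosticErrorListener
from antlr4.atn.PredictionMode import PredictionMode
# generated with: antlr4 -Dlanguage=Python3 ReportAmbiguity.g4
from ReportAmbiguityLexer import ReportAmbiguityLexer
from ReportAmbiguityParser import ReportAmbiguityParser

lexer = ReportAmbiguityLexer(InputStream("CALL FUNCTION BY VAL A B"))
parser = ReportAmbiguityParser(CommonTokenStream(lexer))
parser.addErrorListener(DiagnosticErrorListener())  # emits the report* lines
parser._interp.predictionMode = PredictionMode.LL_EXACT_AMBIG_DETECTION
tree = parser.callStatement()  # invoke the non-root rule directly
print(tree.toStringTree(recog=parser))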
Now uncomment the '.' line shown above and test everything again.
The parser still outputs the same tree, but the ambiguity reports are gone. Why is this obviously ambiguous grammar, with such ambiguous input, no longer reported as ambiguous?
(This is part of a bigger problem. I'm trying to understand this so I can pin down another possible problem with my grammar.)
EDIT 1
Using antlr4 version 4.6.
I've prepared a pet project on GitHub: https://github.com/rslemos/pet-grammars (in module g, run mvn clean test -Dtest=br.eti.rslemos.petgrammars.ReportAmbiguityUnitTest to test the commented version; uncomment the '.' line and run it again to see it fail).
EDIT 2
Changed unit: statements*; to unit: statements+;. This change by itself changes nothing about the original problem; it only enables another experiment (further edit pending).
EDIT 3
Another way to trigger this bug is to change unit: statements+; to unit: statements+ unit;.
As with adding '.' to statements, this change makes antlr4 forgo ambiguity detection.
I think this has something to do with an EOF that possibly follows argsByVal.
The first alternative (append '.' to statements) precludes EOF from appearing just after argsByVal.
The second one (appending unit to itself) makes unit a non-root rule (and it seems that antlr implicitly appends EOF to every root rule).
I always thought antlr4 rules were meant to be invoked any way we liked, with no rule given special treatment; the root rule is so called only because we (the grammar authors) know which rule is the root.
EDIT 4
Could be related to https://github.com/antlr/antlr4/issues/1545.

Related

Python ANTLR4 example - Parser doesn't seem to parse correctly

To demonstrate the problem, I'm going to create a simple grammar to merely detect Python-like variables.
I create a virtual environment and install antlr4-python3-runtime in it (pip install antlr4-python3-runtime), as mentioned in "Where can I get the runtime?".
Then, I create a PyVar.g4 file with the following content:
grammar PyVar;
program: IDENTIFIER+;
IDENTIFIER: [a-zA-Z_][a-zA-Z0-9_]*;
NEWLINE: '\n' | '\r\n';
WHITESPACE: [ ]+ -> skip;
Now if I test the grammar with grun, I can see that it detects the variables just fine.
Now I'm trying to write a parser in Python to do just that. I generate the Lexer and Parser, using this command:
antlr4 -Dlanguage=Python3 PyVar.g4
And they're generated with no errors.
But when I use the example provided in "How do I run the generated lexer and/or parser?", I get no output.
What am I not doing right?
There are two problems here.
1. The grammar:
In the line where I had,
program: IDENTIFIER+;
the parser will only detect one or more variables; it will not detect any newline. The output you see when running grun is produced by the lexer, which is why newlines are present among the tokens. So, for the parser to detect newlines, I had to replace the rule with something like this:
program: (IDENTIFIER | NEWLINE)+;
2. Printing the output of the parser
In the PyVar.py file, I created a tree with this line:
tree = parser.program()
But it didn't print its output, nor did I know how to; the OP's comment on this accepted answer suggests using tree.toStringTree().
Now if we fix those, we can see that it works.
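For completeness, here is a minimal sketch of what the driver script could look like (the file name PyVar.py and the sample input are assumptions on my part):
from antlr4 import InputStream, CommonTokenStream
from PyVarLexer import PyVarLexer
from PyVarParser import PyVarParser

input_stream = InputStream("foo bar\nbaz\n")
lexer = PyVarLexer(input_stream)
tokens = CommonTokenStream(lexer)
parser = PyVarParser(tokens)
tree = parser.program()
print(tree.toStringTree(recog=parser))  # prints the LISP-style parse tree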

Freemarker: How to check for ?api-able type (v 2.3.26)

In Freemarker, I have a Map<Map<...>> in the model.
Due to a FreeMarker glitch, querying the 2nd-level Map needs ?api. However, that escapes the normal value-existence checks and complicates things.
This is what I have:
<#if sortedStatsMap[rowTag.name]?? && sortedStatsMap[rowTag.name]?is_hash>
${mapToJson(sortedStatsMap[rowTag.name]?api.get(boxTag.name))!}
</#if>
This ends up with:
APINotSupportedTemplateException: The value doesn't support ?api. See requirements in the FreeMarker Manual.
(FTL type: sequence+extended_hash+string (wrapper: f.c.DefaultToExpression$EmptyStringAndSequence),
TemplateModel class: f.c.DefaultToExpression$EmptyStringAndSequence,
ObjectWapper: freemarker.template.DefaultObjectWrapper#1074040321(2.3.26, useAdaptersForContainers=true, forceLegacyNonListCollections=true, iterableSupport=true, exposureLevel=1, exposeFields=false, treatDefaultMethodsAsBeanMembers=true, sharedClassIntrospCache=#1896185155, ...))
The blamed expression:
==> sortedStatsMap[rowTag.name]! [in template "reports/templates/techReport-boxes.ftl" at line 152, column 84]
If I try
sortedStatsMap[rowTag.name]!?is_hash
then this also fails: if the value is missing, ! reportedly gives me an empty_string_and_sequence, to which ?is_hash can't be applied. (The error message says so.)
What's the proper logic to check whether I can use ?api.get(key)? Or the right way to use ! to handle missing values or a missing key?
You can check if a value supports ?api with ?has_api. Though maybe you don't need that; the example and the problems related to it should be clarified (see my comments).
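For what it's worth, a sketch of the check with the question's variables (untested; FreeMarker's && short-circuits, so ?has_api is only evaluated when the value exists):
<#if sortedStatsMap[rowTag.name]?? && sortedStatsMap[rowTag.name]?has_api>
${mapToJson(sortedStatsMap[rowTag.name]?api.get(boxTag.name))!}
</#if>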

ANTLR 4: Recognises 'and' but not 'or' without a space

I'm using the ANTLR 4 plugin in IntelliJ, and I have the most bizarre bug. I'll start with the relevant parser/lexer rules:
// Take care of whitespace.
WS : [ \r\t\f\n]+ -> skip;
OTHER: . -> skip;
STRING
: '"' [A-z ]+ '"'
;
evaluate // starting rule.
: textbox? // could be an empty textbox.
;
textbox
: (row '\n')*
;
row
: ability
| ability_list
;
ability
: activated_ability
| triggered_ability
| static_ability
;
triggered_ability
: trigger_words ',' STRING
;
trigger_words
: ('when'|'whenever'|'as') whenever_triggers|'at'
;
whenever_triggers
: triggerer (('or'|'and') triggerer)* // this line has the issue.
;
triggerer
: self
;
self: '~';
I pass it this text: whenever ~ or ~, and it fails on the or, saying line 1:10 mismatched input ' or' expecting {'or', 'and'}. However, if I add a space to the whenever_triggers rule's or string (making it ' or'|'and'), it works fine.
The weirdest thing is that if I try whenever ~ and ~, it works fine even without the rule having a space in the and string. This doesn't change if I make 'and'|'or' a lexer rule either. It's just bizarre. I've confirmed this bug happens when running the test rig in ANTLRWorks 2, so it's not just an IntelliJ thing.
This is an image of the parse tree when the error occurs.
Alright, you have found the answer more or less by yourself, so in this answer I will focus on explaining why the problem occurred in the first place.
First of all, for everyone stumbling upon this question: the problem was another implicit lexer rule, defined elsewhere in the grammar, that looked like this: ' or' (notice the whitespace). Changing it to 'or' resolved the problem.
But why was that a problem?
In order to understand that, you have to know what ANTLR does when you write '<something>' in one of your parser rules: while compiling the grammar, it generates a new implicit lexer rule for each such literal. These implicit rules are placed before the lexer rules defined in your grammar. The lexer then matches the input into tokens by trying each lexer rule in the order of declaration, so it always starts with the implicit token definitions before moving on to the topmost "real" lexer rule.
The problem is that the lexer isn't too clever about this process: once it has matched some input with the current lexer rule, it creates the corresponding token and moves on to the trailing input.
As a result, a later lexer rule that would also have matched that input (but as a different token type, since it is a different rule) is skipped, so the input may not get the expected token type; the earlier rules shadow the later ones.
In your example the conflicting rules are ' or' (Token 1) and 'or' (Token 2). Each of those implicit declarations results in a separate lexer rule, and since the first one is what got matched, I assume it is declared before the second one.
Now look at your input: whenever ~ or ~. The lexer starts tokenizing it, and the first rule it comes across (after the start of the input has been matched, of course) is ' or'; it matches, because there really is a space before the or. The input is therefore matched as Token 1.
The parser, on the other hand, expects Token 2 at this point, so it complains about the given input (although it is really complaining about the wrong token type). Altering the input to whenever ~or ~ results in the correct interpretation.
That is exactly why you shouldn't use implicit token definitions in your grammar (unless it is really small). Create a named lexer rule for every literal and start with the most specific rules: rules that match special character sequences (e.g. keywords) should be declared before general lexer rules like ID or STRING. Rules that match any character, to keep the lexer from throwing an error on unrecognized input, have to be declared last, as they would shadow every lexer rule after them.
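For illustration, here is a sketch of the lexer rules for the grammar above written explicitly, in the recommended order (the rule names are my own):
// most specific first: keywords and punctuation
WHEN     : 'when';
WHENEVER : 'whenever';
AS       : 'as';
AT       : 'at';
OR       : 'or';
AND      : 'and';
COMMA    : ',';
TILDE    : '~';
// general rules after the keywords ([A-Za-z ] avoids the [A-z] gotcha)
STRING   : '"' [A-Za-z ]+ '"';
WS       : [ \r\t\f\n]+ -> skip;
// the catch-all rule goes last, as it would shadow anything declared after it
OTHER    : . -> skip;
With these, the parser rules reference OR and AND instead of inline literals, and no implicit ' or' token can sneak in.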

Antlr4 match whole input string or bust

I am new to Antlr4 and have been wracking my brain for some days now about a behaviour that I simply don't understand. I have the following combined grammar and expect it to fail and report an error, but it doesn't:
grammar MWE;
parse: cell EOF;
cell: WORD;
WORD: ('a'..'z')+;
If I feed it the input
a4
I expect it not to be able to parse this, because I want it to match the whole input string and not just a part of it, as signified by the EOF. But instead it reports no error (I listen for errors with an error listener implementing the IAntlrErrorListener interface) and gives me the following parse tree:
(parse (cell a) <EOF>)
Why is this?
When the lexer reaches input that no lexer rule matches, its error recovery mechanism is to drop a character and continue with the next one. In your case, the lexer is dropping the 4 character, so your parser is seeing the equivalent of this input:
a
The solution is to instruct the lexer to create a token for the dropped character rather than ignore it, and to pass that token on to the parser, where an error will be reported. In the grammar, this rule takes the following form and is always added as the last rule. If you have multiple lexer modes, a rule of this form should appear as the last rule in the default mode as well as the last rule in each extra mode.
ErrChar
: .
;
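Applied to the MWE grammar, the result looks like this; feeding it a4 now yields a syntax error instead of a silently dropped character:
grammar MWE;
parse: cell EOF;
cell: WORD;
WORD: ('a'..'z')+;
ErrChar: .; // last rule: turns otherwise unmatched characters into tokens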

Prolog: Pass User-Input as Parameters

I am new to Prolog and therefore need help with the following task.
I have the program:
do(save) :- save_bal(bad).
do(act) :- save_bal(good), inc(good).
do(comb) :- save_bal(good), inc(bad).
save_bal(good) :- savMoney(X), depPeople(Y), Min is Y * 1000, X >= Min.
save_bal(bad) :- not(save_bal(good)).
inc(good) :- earn(Z), depPeople(Y), MinE is 3000 + Y * 400, Z >= MinE.
inc(bad) :- not(inc(good)).
savMoney(30000).
earn(60000).
depPeople(4).
My task is to rewrite this program so that the numbers 30000, 60000 and 4 are set by user input. How can I do this?
I tried:
:- read(A), savMoney(A).
:- read(B), earn(B).
:- read(C), depPeople(C).
But that won't work.
Can someone point me in the right direction?
Thanks in advance!
Prolog is a homoiconic language, so the first step you should take is to declare which predicates are data, and which are (so to speak) logic constraints on the data.
Then add, near the top of the file (just a stylistic hint), the declarations
:- dynamic(savMoney/1).
:- dynamic(earn/1).
:- dynamic(depPeople/1).
then you can add a service predicate, say user_update_store/1, like
user_update_store(Entry) :-
    AccessValueCurr =.. [Entry, ValueCurr],   % build e.g. savMoney(ValueCurr)
    (retract(AccessValueCurr) -> true ; ValueCurr = 0),
    format('enter value for ~a (current is ~w): ', [Entry, ValueCurr]),  % ~a prints the atom
    read(NewValue),
    % validate it's a number etc...
    StoreNewValue =.. [Entry, NewValue],
    assertz(StoreNewValue).
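For reference, =.. (the standard "univ" operator) converts between a term and the list of its functor and arguments, e.g.:
?- AccessValueCurr =.. [savMoney, ValueCurr].
AccessValueCurr = savMoney(ValueCurr).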
now you can start your user interface:
?- maplist(user_update_store, [savMoney, earn, depPeople]).
this code should work in every (ISO-compliant) Prolog. Note: I didn't test it...
HTH
CapelliC provided an excellent (better) answer while I was busy typing away at this monstrosity. In fact, I didn't end up addressing the question in your title, because you were passing parameters just fine; instead I wrote about assertz/1 and retract/1. However, I taught myself a fair amount while composing it, and you might also find it informative.
In your example code, we have 3 facts declared with the predicates savMoney/1, earn/1 and depPeople/1. We then have a number of rules that determine values based on these facts. A rule is of the form <head> :- <body>., which I sometimes read to myself as "<head> is true if <body> is true". We can think of a fact as a rule of the form <head> :- true., e.g., savMoney(30000) :- true., which we might read as "30000 is savMoney if true is true", and true is true or we're all screwed. (BTW, is savMoney short for saved money?)
A directive is of the form :- <body>. It is like a rule that must be tested in order for the program (or world) to be true (this is more evocative than accurate because, as you've seen, when a directive fails the whole program-world is not false; we just get a warning). When we consult a Prolog file, we add new rules and facts to our program-world, and these can even be impossible, nonsensical statements like a :- \+ a. ("a is true if not-a is true") [1]. That contradiction will cause a stack overflow if you query ?- a., but the program will load just fine. Directives, however, have to be evaluated and settled while the program loads, in the order they are encountered:
This program will throw a stack overflow error when the interpreter consults it.
a :- \+ a.
:- a.
This program will throw an undefined procedure error, because it is being directed to prove a before a has been entered into the database.
:- a.
a :- \+ a.
When we have a directive like :- read(A), savMoney(A)., it's not saying "read the value of user input into A and then set savMoney to A". Instead, it's saying something more like "if this program is loaded, then A is a value read in from user input and A is savMoney". Suppose you run the program and enter 100 at the first prompt (the plain prompt is |). What happens?
Prolog unifies the variable A with 100.
Prolog tries to prove savMoney(100).
It replies Warning: Goal (directive) failed: user:(read(_G2072),savMoney(_G2072)).
This is because, while savMoney(30000) is true, savMoney(100) is not. A directive does not assert the contents of its body; it only tells Prolog to prove those contents.
What you are trying to do is allow the user to assert a previously unknown fact into the database. As indicated by mbratch, this requires using the predicate assertz/1 [2]. However, predicates that can be changed at run time are differentiated from standard predicates.
If you try to redefine a predefined (static) predicate in a program, you'll get an error. E.g., consult a file consisting of the following declaration:
length(2, y).
You'll receive an error:
ERROR: /Users/aporiac/myprolog/swi/studies/test.pl:18:
No permission to modify static procedure `length/2'
Defined at /opt/local/lib/swipl-6.2.6/boot/init.pl:2708
This tells us that 'length/2' is static and that it is already defined in the init.pl file at line 2708.
The same happens if you try to assert a static predicate with assertz/1. You can try this by querying assertz(savMoney(100)) in swipl. In order to add new facts or rules about a predicate, we have to declare the predicate to be dynamic.
This is accomplished with dynamic/1. To assure that Prolog knows which of our predicates are to be counted as dynamic, we give it a directive like so [3]:
:- dynamic savMoney/1.
If you've added that to your file (before you define the predicate), you can then query ?- assertz(savMoney(100)). to add the new fact to the database. Now, if you query ?- savMoney(X)., you'll get
X = 30000;
X = 100.
There are now two possible values for X, because we've added another fact to the database.
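As an aside, this also shows how the OP's original directives could be made to work; a sketch for one of the facts (the same pattern applies to earn/1 and depPeople/1):
:- dynamic savMoney/1.
% assert the fact that was read, instead of trying to prove it
:- read(A), assertz(savMoney(A)).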
Of course, in your case you don't want to keep adding values to savMoney/1; you want to be able to update and replace the value.
That calls for retract/1. (If you think there's a chance that more than one occurrence of the asserted predicate could get added at some point, you can use retractall/1 to clear all instances.) Now we can write a rule like the following:
set_saved(Amount) :-
    retract( savMoney(_) ),
    assertz( savMoney(Amount) ).
set_saved(Amount) is true if savMoney(_) can be retracted (removed from the database) and the new fact savMoney(Amount) can be asserted.
I've just seen that CapelliC has provided a simple input interface and a much more concise solution to the problem, but here's my version of your example program in case it might be informative. (I didn't actually get around to adding the prompt and input, but querying, e.g., ?- set_saved(100). does what you'd expect; see the sample session after the listing.)
:- dynamic [ savMoney/1,
earn/1,
depPeople/1 ].
do(save) :- save_bal(bad).
do(act) :- save_bal(good), inc(good).
do(comb) :- save_bal(good), inc(bad).
save_bal(good) :- savMoney(X), depPeople(Y), Min is Y * 1000, X >= Min.
save_bal(bad) :- not(save_bal(good)).
inc(good) :- earn(Z), depPeople(Y), MinE is 3000 + Y * 400, Z >= MinE.
inc(bad) :- not(inc(good)).
savMoney(30000).
earn(60000).
depPeople(4).
set_saved(Amount) :-
    retract( savMoney(_) ),
    assertz( savMoney(Amount) ).
set_earned(Amount) :-
    retract( earn(_) ),
    assertz( earn(Amount) ).
set_people_in_department(Number) :-
    retract( depPeople(_) ),
    assertz( depPeople(Number) ).
report([Saved, Earned, People]) :-
    Saved = savMoney(_) , Saved,
    Earned = earn(_) , Earned,
    People = depPeople(_), People.
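A sample session with this file loaded might look like this (SWI-Prolog output format; a sketch, not a verbatim transcript):
?- set_saved(100), report(Facts).
Facts = [savMoney(100), earn(60000), depPeople(4)].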
1. \+/1 is the standard negation operator in SWI-Prolog; not/1 is deprecated.
2. assert/1 is equivalent to, and deprecated in favor of, assertz/1. asserta/1 asserts the fact or clause as the first instance of the predicate at hand, while assertz/1 asserts it as the last. (Cf. the manual section on the Database.)
3. Of course, this goes against the interpretation of directives I suggested before. My interpretation fits when you're using 'normal' predicates in a directive. But most often we see directives used for special predicates, as in module declarations (:- module(name, [<list of exported predicates>]).) or module imports (:- use_module([<list of modules>]).).
