(Programming language) How to judge whether this grammar is ambiguous

S -> ()
| (S)
| SS
Is this grammar ambiguous?
And how do I judge whether a grammar is ambiguous?
I learned what a parse tree is, but I do not know how to draw one for this grammar.

You can do the following: write the grammar in the format for yacc (or any other parser generator you're familiar with), like so:
%%
s: '(' ')' | '(' s ')' | s s;
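Run this through yacc and look for shift/reduce or reduce/reduce conflicts. An ambiguous grammar always produces conflicts. The converse does not hold: an unambiguous grammar that is not LALR(1) produces conflicts too, so a conflict is a hint to investigate rather than a proof of ambiguity.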
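A suspected ambiguity can also be confirmed directly by finding one string that has more than one parse tree. The following is a minimal Python sketch, not part of the original answer, that counts the parse trees of a string under S -> () | (S) | SS. The function name count_parse_trees is made up for this illustration.

```python
from functools import lru_cache


def count_parse_trees(text: str) -> int:
    """Count distinct parse trees of `text` for S -> () | (S) | SS."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of ways S can derive the substring text[i:j].
        total = 0
        if text[i:j] == "()":                      # S -> ()
            total += 1
        if j - i > 2 and text[i] == "(" and text[j - 1] == ")":
            total += count(i + 1, j - 1)           # S -> (S)
        for k in range(i + 1, j):                  # S -> SS, split at k
            total += count(i, k) * count(k, j)
        return total

    return count(0, len(text)) if text else 0


print(count_parse_trees("(())"))    # 1
print(count_parse_trees("()()()"))  # 2
```

The string ()()() can be grouped as (()())() or ()(()()) by the SS alternative, which gives two parse trees, so the grammar is ambiguous.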

Related

How does the latest ANTLR4 resolve the "dangling else" ambiguity?

I am using antlr 'org.antlr:antlr4:4.9.2' and came across the "dangling else" ambiguity problem; see the following grammar, IfStat.g4.
// file: IfStat.g4
grammar IfStat;
stat : 'if' expr 'then' stat
| 'if' expr 'then' stat 'else' stat
| expr
;
expr : ID ;
ID : LETTER (LETTER | [0-9])* ;
fragment LETTER : [a-zA-Z] ;
WS : [ \t\n\r]+ -> skip ;
I tested this grammar against the input "if a then if b then c else d". It is parsed as "if a then (if b then c else d)", as expected. How does ANTLR4 resolve this ambiguity?
ANTLR4 resolves an ambiguity by choosing the lowest-numbered alternative, that is the one listed first, among the alternatives that can successfully match the input.
You can enable ANTLR to report such ambiguities in your grammar. Check this Q&A for that: Ambiguity in grammar not reported by ANTLR
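To see the two competing readings side by side, here is a small Python sketch that enumerates every parse of the test input, trying the alternatives in the order they appear in IfStat.g4. It is only an illustration of the ambiguity and is not how ANTLR's prediction algorithm works internally; the helper names (is_id, parse_stat, all_parses) are made up.

```python
KEYWORDS = {"if", "then", "else"}


def is_id(token):
    # Rough stand-in for the ID lexer rule: a letter followed by letters/digits.
    return token.isalnum() and token[0].isalpha() and token not in KEYWORDS


def parse_stat(tokens, pos=0):
    """Yield (tree, next_pos) for every way `stat` can match at `pos`."""
    if pos >= len(tokens):
        return
    if (tokens[pos] == "if" and pos + 2 < len(tokens)
            and is_id(tokens[pos + 1]) and tokens[pos + 2] == "then"):
        cond = tokens[pos + 1]
        # Alternative 1: 'if' expr 'then' stat
        for body, end in parse_stat(tokens, pos + 3):
            yield ("if", cond, body), end
        # Alternative 2: 'if' expr 'then' stat 'else' stat
        for body, end in parse_stat(tokens, pos + 3):
            if end < len(tokens) and tokens[end] == "else":
                for other, end2 in parse_stat(tokens, end + 1):
                    yield ("if", cond, body, other), end2
    elif is_id(tokens[pos]):
        # Alternative 3: expr
        yield tokens[pos], pos + 1


def all_parses(text):
    tokens = text.split()
    # Keep only the parses that consume the whole input.
    return [tree for tree, end in parse_stat(tokens) if end == len(tokens)]


for tree in all_parses("if a then if b then c else d"):
    print(tree)
# ('if', 'a', ('if', 'b', 'c', 'd'))
# ('if', 'a', ('if', 'b', 'c'), 'd')
```

Two complete parses exist, which is the ambiguity. The first one found in alternative order is the one where the else belongs to the inner if, matching the result reported above.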

Why does my antlr grammar give me an error?

I have the little grammar below. node is the start production. When my input is (a:b) I get an error: line 1:1 extraneous input 'a' expecting {':', INAME}
Why is this?
EDIT - I forgot that the lexer and parser run as separate phases. By the time the parser runs, the lexer has completed. When the lexer runs it has no knowledge of the parser rules. It has already made the TYPE/INAME decision, choosing TYPE per @bart's reasoning below.
grammar g1;
TYPE: [A-Za-z_];
INAME: [A-Za-z_];
node: '(' namesAndTypes ')';
namesAndTypes:
INAME ':' TYPE
| ':' TYPE
| INAME
;
That is because the lexer will never produce an INAME token. The lexer works in the following way:
try to match as many characters as possible
when 2 or more lexer rules match the same characters, let the one defined first "win"
Because the input "a" and "b" both match the TYPE and INAME rules, the TYPE rule wins because it is defined first. It doesn't matter if the parser is trying to match an INAME rule, the lexer will not produce it. The lexer does not "listen" to the parser.
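The two lexer rules above can be mimicked in a few lines of Python. This is a simplified sketch of the longest-match, first-rule-wins policy and not ANTLR's actual lexer. The names LPAREN, RPAREN and COLON are made up here; in the real grammar ANTLR creates implicit tokens for the literals '(' , ')' and ':'.

```python
import re

# Rules in definition order; TYPE precedes INAME, as in grammar g1.
LEXER_RULES = [
    ("LPAREN", re.compile(r"\(")),
    ("RPAREN", re.compile(r"\)")),
    ("COLON", re.compile(r":")),
    ("TYPE", re.compile(r"[A-Za-z_]")),
    ("INAME", re.compile(r"[A-Za-z_]")),
]


def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in LEXER_RULES:
            match = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two rules match
            # the same number of characters.
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"no lexer rule matches at position {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("(a:b)"))
# [('LPAREN', '('), ('TYPE', 'a'), ('COLON', ':'), ('TYPE', 'b'), ('RPAREN', ')')]
```

No INAME token ever appears in the output, whatever the parser would have liked to see at that position.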
You could create some sort of ID rule, and then define type and iname parser rules instead:
ID: [A-Za-z_];
node
: '(' namesAndTypes ')'
;
namesAndTypes
: iname ':' type
| ':' type
| iname
;
type
: ID
;
iname
: ID
;

Precedence of alternation vs sequencing in ANTLR4

I believed that sequencing (implicitly given by order of subrules) had a higher priority in the ANTLR4 parser than alternation (explicitly given by the | character), meaning that
a : x | y z ;
was semantically identical to
a : x | ( y z) ;
Looking in the ANTLR4 book and searching generally, I can't find a clear statement of this, but it seems reasonable. However, given a rule
expression :
pmqident
|
constant
|
[snip]
|
'(' scalar_subquery ')'
|
unary_operator expression // this is unbracketed
|
expression binary_operator expression
[snip]
;
and I feed it this select - 2 / 3 I get this parse tree
whereas if I just add brackets around unary_operator expression and change absolutely nothing else, to get this
expression :
[snip]
'(' scalar_subquery ')'
|
( unary_operator expression ) // brackets added here
|
expression binary_operator expression
[snip]
;
and give it the same SQL, I get this
What am I misunderstanding?
(BTW and separately, the freaky parse of "- 2 / 3" into "(- ( 2 / 3))" is actually the one I want. That's how MSSQL does it. Mad world)
------
Ok, to reproduce (works for me), not utterly minimal but heavily stripped code. File is named MSSQL.g4:
grammar MSSQL;
expression :
constant
|
unary_operator expression // bracket/unbracket this
|
expression binary_operator expression
;
constant : INTEGER_CONST ;
INTEGER_CONST : [0-9]+ ;
binary_operator :
arithmetic_operator
;
arithmetic_operator :
subtract
|
divide
;
add_symbol : PLUS_SIGN ;
subtract : MINUS_SIGN ;
divide : DIVIDE_SIGN ;
unary_operator :
SIGN
;
SIGN : PLUS_SIGN | MINUS_SIGN ;
DIVIDE_SIGN : '/' ;
PLUS_SIGN : '+' ;
MINUS_SIGN : '-' ;
SKIPWS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
The DOS crud to compile it (relevant parts given):
set CurrDir=%~dp0
set CurrDir=%CurrDir:~0,-1%
cd %CurrDir%
java org.antlr.v4.Tool -Werror -o %CurrDir%\MSSQL MSSQL.g4
IF %ERRORLEVEL% NEQ 0 goto problem
javac %CurrDir%\MSSQL\MSSQL*.java
IF %ERRORLEVEL% NEQ 0 goto problem
cd ./MSSQL
echo enter sql...
java org.antlr.v4.gui.TestRig MSSQL expression -gui -trace -tokens
input is - 2 / 3
Running on win2k8R2, versions of bits are as follows
C:\Users\jan>java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
C:\Users\jan>java org.antlr.v4.Tool
ANTLR Parser Generator Version 4.5.1
Anything else needed? Can anyone reproduce?
Frankly I'm struggling to believe this is a bug. It's just too elemental.
FYI I found this originally not by bracketing/unbracketing but by hoisting the body of a subrule into rule, and noticed behaviour changed.
This answer is being written in the context of antlr/antlr4#564 not being fixed.
During the code generation process, ANTLR looks for a few specific patterns when rewriting left-recursive rules to work in a recursive-descent parser.
Consider the following rule:
expression
: INT
| '++' expression
| expression '++'
| expression '+' expression
;
Suffix: Top-level alternatives which start with a recursive invocation. In the example, the alternative expression '++' falls into this category.
Prefix: Top-level alternatives which end with a recursive invocation. In the example, the alternative '++' expression falls into this category.
Binary: Top-level alternatives which start and end with a recursive invocation. In the example, the alternative expression '+' expression falls into this category.
Other: Everything else. In the example, the alternative INT falls into this category.
When matching these patterns, no simplifications are performed. This includes removing otherwise-unnecessary parentheses, which is the basis of issue antlr/antlr4#564.
By including parentheses around a top-level alternative in a left-recursive rule, you force the alternative to be treated as Other. For alternatives that would normally be Suffix or Binary, this results in a compilation error due to left recursion that was not eliminated. For Prefix alternatives (which you have), the grammar still compiles but changes behavior, because the alternative is treated as a primary expression instead of an operator, which overrides its original position in the operator precedence sequence.
Note that including parentheses around a top-level alternative which was already in the Other category will not change behavior at all. Likewise, including parentheses around an alternative in a rule which is not left-recursive will not change behavior.
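The four categories can be illustrated with a toy classifier. This Python sketch is only a simplified model of the pattern matching described above, not ANTLR's actual implementation. The representation is an assumption made for the example: an alternative is a list of elements, and a nested list stands for a parenthesized group that the classifier never looks inside.

```python
def classify(rule_name, alternative):
    """Classify one top-level alternative of a left-recursive rule."""
    if not alternative:
        return "other"
    # A nested list (parenthesized group) never equals the rule name,
    # so a fully parenthesized alternative falls through to "other".
    starts = alternative[0] == rule_name
    ends = alternative[-1] == rule_name
    if starts and ends and len(alternative) > 1:
        return "binary"
    if starts:
        return "suffix"
    if ends:
        return "prefix"
    return "other"


print(classify("expression", ["INT"]))                                # other
print(classify("expression", ["'++'", "expression"]))                 # prefix
print(classify("expression", ["expression", "'++'"]))                 # suffix
print(classify("expression", ["expression", "'+'", "expression"]))    # binary
print(classify("expression", [["unary_operator", "expression"]]))     # other
```

The last call models ( unary_operator expression ): because no simplification removes the parentheses, the alternative no longer ends with a visible recursive invocation and is classified as Other.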

ANTLR4 parentheses and arithmetic

I am parsing an SQL-like language in which I need to handle arithmetic with precedence.
Things could be like this:
(a + b) - c
(a + b) / 1000
a + (b - c)
a + (SELECT...)
(SELECT... ) + (SELECT ...)
etc..
I am using the ANTLR4 listener pattern, and I can't find a way to build a representation tree for these arithmetic clauses.
Grammar parts:
arithmetic_select_clause:
result_column arithmeticExpression result_column # ArithmeticSelect
| result_column arithmeticExpression arithmetic_select_clause # ArithmeticSelect
| arithmetic_select_clause arithmeticExpression result_column # ArithmeticSelect
| '(' arithmetic_select_clause ')' # ArithmeticSelectParentheses
;
arithmeticExpression : '+' # arithmeticsAdd
| '-' # arithmeticsSubtract
| '*' # arithmeticsMultiply
| '/' # arithmeticsDivide
| '%' # arithmeticsModulus
;
I can create a tree using the ANTLR listeners, but I can't handle precedence.
Help please
ANTLR can help you there but you need to follow a few rules for it to do so. The arithmeticExpression rule needs to contain both operands and be directly recursive so that ANTLR can figure out how to rewrite it.
Here's an example of what you could do:
expression : '(' expression ')'
| expression op=('*'|'/'|'%') expression
| expression op=('+'|'-') expression
| result_column
| arithmetic_select_clause
;
This rule is left-recursive but ANTLR will rewrite it to eliminate the left-recursion. Relevant docs.
Notice how the levels of precedence are ordered: alternatives listed earlier bind tighter. Each level gets its own alternative, and operators of the same precedence share one alternative.
Also, for processing math expressions it's much easier to use a visitor than a listener. ANTLR can generate the base classes for you. It'll be much easier to traverse/process the parse tree in the precedence order this way.
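To make the idea of precedence levels concrete outside ANTLR, here is a hand-written precedence-climbing sketch in Python. It is not the code ANTLR generates and it ignores subqueries; it only shows how giving * / % a higher level than + - produces the expected tree. The tuple-based tree shape and all function names are assumptions of this example.

```python
import re

# Higher number binds tighter, mirroring the two operator alternatives above.
PRECEDENCE = {"*": 2, "/": 2, "%": 2, "+": 1, "-": 1}


def tokenize(text):
    # Identifiers, integers, operators and parentheses; whitespace is skipped.
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/%()]", text)


def parse_primary(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token == "(":
        tree, pos = parse_expr(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("missing ')'")
        return tree, pos + 1
    if token in PRECEDENCE or token == ")":
        raise SyntaxError(f"unexpected token {token!r}")
    return token, pos + 1


def parse_expr(tokens, pos, min_prec):
    left, pos = parse_primary(tokens, pos)
    while (pos < len(tokens) and tokens[pos] in PRECEDENCE
           and PRECEDENCE[tokens[pos]] >= min_prec):
        op = tokens[pos]
        # '+ 1' makes operators of the same level left-associative.
        right, pos = parse_expr(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left, pos


def parse(text):
    tokens = tokenize(text)
    tree, pos = parse_expr(tokens, 0, 1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse("a + b * c"))       # ('+', 'a', ('*', 'b', 'c'))
print(parse("(a + b) / 1000"))  # ('/', ('+', 'a', 'b'), '1000')
```

A visitor over the ANTLR parse tree would walk a structure of the same shape: each operator node has its two operands as children, already grouped by precedence.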

string recursion antlr lexer token

How do I build a lexer token that can handle recursion inside it, as in this string?
${*anything*${*anything*}*anything*}
Yes, you can use recursion inside lexer rules.
Take the following example:
${a ${b} ${c ${ddd} c} a}
which will be parsed correctly by the following grammar:
parse
: DollarVar
;
DollarVar
: '${' (DollarVar | EscapeSequence | ~Special)+ '}'
;
fragment
Special
: '\\' | '$' | '{' | '}'
;
fragment
EscapeSequence
: '\\' Special
;
as the interpreter inside ANTLRWorks shows (screenshot: http://img185.imageshack.us/img185/5471/recq.png).
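The recursive structure of the DollarVar rule can also be checked by hand. The following Python sketch mirrors the rule '${' (DollarVar | EscapeSequence | ~Special)+ '}' as a recursive function. It is an illustration under that reading of the grammar, not ANTLR-generated code, and the function name match_dollar_var is made up.

```python
SPECIAL = set("\\${}")  # the characters matched by the Special fragment


def match_dollar_var(text, pos=0):
    """Return the end index of a DollarVar starting at `pos`, or None."""
    if not text.startswith("${", pos):
        return None
    i = pos + 2
    items = 0  # the rule requires at least one inner element ('+')
    while i < len(text):
        if text.startswith("${", i):
            # Nested DollarVar: recurse, as the lexer rule does.
            end = match_dollar_var(text, i)
            if end is None:
                return None
            i = end
        elif text[i] == "\\" and i + 1 < len(text) and text[i + 1] in SPECIAL:
            i += 2  # EscapeSequence
        elif text[i] not in SPECIAL:
            i += 1  # ~Special
        else:
            break
        items += 1
    if items >= 1 and i < len(text) and text[i] == "}":
        return i + 1
    return None


sample = "${a ${b} ${c ${ddd} c} a}"
print(match_dollar_var(sample) == len(sample))  # True
```

The whole sample is consumed as one unit, which corresponds to the single DollarVar token the lexer hands to the parser.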
ANTLR's lexers do support recursion, as @BartK adeptly points out in his post, but you will only see a single token within the parser. If you need to interpret the various pieces within that token, you'll probably want to handle it within the parser.
IMO, you'd be better off doing something in the parser:
variable: DOLLAR LBRACE id variable? id RBRACE; // variable? is optional so that the recursion can terminate
By doing something like the above, you'll see all the necessary pieces and can build an AST or otherwise handle accordingly.
