Specify order of alternatives in happy parser

Specify order of alternatives in happy parser - haskell

I am working on a Happy parser for a language with the following types, and many more.
type :: { ... }
type :
'void' { ... }
| type '*' { ... } {- pointer -}
| type '(' types ')' { ... } {- function -}
| ... {- many more! -}
types :: { ... }
{- empty -} { ... }
| types ',' type { ... }
The language has apparently ambiguous syntax for calls.
callable :: { ... }
callable :
type operand { ... } {- return type -}
| type '(' types ')' '*' operand { ... } {- return and argument types -}
The second rule does not have the same meaning as the first when type takes on the type of a function pointer.
The ambiguity can be removed by adding a special rule for a type that isn't a function pointer. Barring doing so and duplicating all of the type definitions to produce something like
callable :: { ... }
callable :
typeThatIsNotAFunctionPointer operand { ... }
| type '(' types ')' '*' operand { ... }
How can I specify that the alternative type operand is only legal when the type '(' types ')' '*' operand alternative fails?
There are many questions on stack overflow about why a grammar has ambiguities (I found at least 7), and some about how to remove an ambiguity, but none about how to specify how to resolve an ambiguity.
Undesirable Solution
I'm aware that I can refactor the grammar for types to a giant convoluted mess.
neverConstrainedType :: { ... }
neverConstrainedType :
'int' { ... }
| ... {- many more! -}
voidType :: { ... }
voidType :
'void'
pointerType :: { ... }
pointerType :
type '*' { ... } {- pointer -}
functionType :: { ... }
type '(' types ')' { ... } {- function -}
type :: { ... }
type :
neverConstrainedType { ... }
| voidType { ... }
| pointerType { ... }
| functionType { ... }
typeNonVoid :: { ... } {- this already exists -}
typeNonVoid :
neverConstrainedType { ... }
| pointerType { ... }
| functionType { ... }
typeNonPointer :: { ... }
typeNonPointer :
neverConstrainedType { ... }
| voidType { ... }
| functionType { ... }
typeNonFunction :: { ... }
typeNonFunction :
neverConstrainedType { ... }
| voidType { ... }
| functionType { ... }
typeNonFunctionPointer :: { ... }
typeNonFunctionPointer :
typeNonPointer { ... }
| typeNonFunction '*' { ... }
And then define callable as
callable :: { ... }
callable :
typeNonFunctionPointer operand { ... }
| type '(' types ')' '*' operand { ... }

Basically you have what's called a shift/reduce conflict. You can google "resolve shift/reduce conflict" for more info and resources.
The basic idea in resolving shift/reduce conflicts is to refactor the grammar. For instance, this grammar is ambiguous:
%token id comma int
A : B comma int
B : id
| id comma B
The shift/reduce conflict can be eliminated by refactoring it as:
A : B int
B : id comma
| id comma B
In your case you could try something like this:
type : simple {0}
| func {0}
| funcptr {0}
simple : 'void' {0}
| simple '*' {0}
| funcptr '*' {0}
func : type '(' type ')' {0}
funcptr : func '*' {0}
The idea is this:
simple matches any type that is not a function or function pointer
func matches any function type
funcptr matches any function pointer type
That said, many of the things I've attempted to do in grammars I've found are better accomplished by analyzing the parse tree after it's been created.

Related

Xtext refering to element from different file does not work

Hello I am having two files in my xtext editor, the first one containing all definitions and the second one containing the executed recipe. The Grammar looks like this:
ServiceAutomationProgram:
('package' name=QualifiedName ';')?
imports+=ServiceAutomationImport*
definitions+=Definition*;
ServiceAutomationImport:
'import' importedNamespace=QualifiedNameWithWildcard ';';
Definition:
'define' ( TypeDefinition | ServiceDefinition |
SubRecipeDefinition | RecipeDefinition) ';';
TypeDefinition:
'quantity' name=ID ;
SubRecipeDefinition:
'subrecipe' name=ID '('( subRecipeParameters+=ServiceParameterDefinition (','
subRecipeParameters+=ServiceParameterDefinition)*)? ')' '{'
recipeSteps+=RecipeStep*
'}';
RecipeDefinition:
'recipe' name=ID '{' recipeSteps+=RecipeStep* '}';
RecipeStep:
(ServiceInvocation | SubRecipeInvocation) ';';
SubRecipeInvocation:
name=ID 'subrecipe' calledSubrecipe=[SubRecipeDefinition] '('( parameters+=ServiceInvocationParameter (',' parameters+=ServiceInvocationParameter)* )?')'
;
ServiceInvocation:
name=ID 'service' service=[ServiceDefinition]
'(' (parameters+=ServiceInvocationParameter (',' parameters+=ServiceInvocationParameter)*)? ')'
;
ServiceInvocationParameter:
ServiceEngineeringQuantityParameter | SubRecipeParameter
;
ServiceEngineeringQuantityParameter:
parameterName=[ServiceParameterDefinition] value=Amount;
ServiceDefinition:
'service' name=ID ('inputs' serviceInputs+=ServiceParameterDefinition (','
serviceInputs+=ServiceParameterDefinition)*)?;
ServiceParameterDefinition:
name=ID ':' (parameterType=[TypeDefinition]);
;
SubRecipeParameter:
parameterName=[ServiceParameterDefinition]
;
QualifiedNameWithWildcard:
QualifiedName '.*'?;
QualifiedName:
ID ('.' ID)*;
Amount:
INT ;
....
definitionfile file.mydsl:
define quantity Temperature;
define service Heater inputs SetTemperature:Temperature;
define subrecipe sub_recursive() {
Heating1 service Heater(SetTemperature 10);
};
....
recipefile secondsfile.mydsl:
define recipe Main {
sub1 subrecipe sub_recursive();
};
.....
In my generator file which looks like this:
override void doGenerate(Resource resource, IFileSystemAccess2 fsa, IGeneratorContext context) {
for (e : resource.allContents. toIterable.filter (RecipeDefinition)){
e.class;//just for demonstration add breakpoint here and //traverse down the tree
}
}
I need as an example the information RecipeDefinition.recipesteps.subrecipeinvocation.calledsubrecipe.recipesteps.serviceinvocation.service.name which is not accessible (null) So some of the very deep buried information gets lost (maybe due to lazylinking?).
To make the project executable also add to the scopeprovider:
public IScope getScope(EObject context, EReference reference) {
if (context instanceof ServiceInvocationParameter
&& reference == MyDslPackage.Literals.SERVICE_INVOCATION_PARAMETER__PARAMETER_NAME) {
ServiceInvocationParameter invocationParameter = (ServiceInvocationParameter) context;
List<ServiceParameterDefinition> candidates = new ArrayList<>();
if(invocationParameter.eContainer() instanceof ServiceInvocation) {
ServiceInvocation serviceCall = (ServiceInvocation) invocationParameter.eContainer();
ServiceDefinition calledService = serviceCall.getService();
candidates.addAll(calledService.getServiceInputs());
if(serviceCall.eContainer() instanceof SubRecipeDefinition) {
SubRecipeDefinition subRecipeCall=(SubRecipeDefinition) serviceCall.eContainer();
candidates.addAll(subRecipeCall.getSubRecipeParameters());
}
return Scopes.scopeFor(candidates);
}
else if(invocationParameter.eContainer() instanceof SubRecipeInvocation) {
SubRecipeInvocation serviceCall = (SubRecipeInvocation) invocationParameter.eContainer();
SubRecipeDefinition calledSub = serviceCall.getCalledSubrecipe();
candidates.addAll(calledSub.getSubRecipeParameters());
return Scopes.scopeFor(candidates);
}
}return super.getScope(context, reference);
}
When I put all in the same file it works as it does the first time executed after launching runtime but afterwards(when dogenerate is triggered via editor saving) some information is missing. Any idea how to get to the missing informations? thanks a lot!

bar.with{ ... this.#foo = ... }: groovy.lang.MissingMethodException: No signature of method: MyClass.setFoo() is applicable for argument types

(using Groovy 2.4.11)
The following pseudo-modified code:
enum EnumClass { a, b }
class Some {
Foo foo
Some() {
EnumClass.with{ this.#foo = new Foo( a ) }
}
Some setFoo( String _foo ) { ... }
}
is called like new Some() and brings up the following runtime exception:
groovy.lang.MissingMethodException: No signature of method: MyClass.setFoo() is applicable for argument types: (Foo) values: [Foo$12345] ...`
It looks as if the compiler thinks there would be some this.foo = ... instead of this.#foo = .... :-(
(This should not happen as I understand it and seems to be some bug)

Workaround: writing it like this (outside the with-closure) works ...
enum EnumClass { a, b }
class Some {
Foo foo
Some() {
//EnumClass.with{ this.#foo = new Foo( a ) } // throws exception
this.#foo = new Foo( EnumClass.a ) // works
}
Some setFoo( String _foo ) { ... }
}

How can I know that I can call a Perl 6 method that with a particular signature?

In Perl 6, a multi-dispatch language, you can find out if there is a method that matches a name. If there is, you get a list of Method objects that match that name:
class ParentClass {
multi method foo (Str $s) { ... }
}
class ChildClass is ParentClass {
multi method foo (Int $n) { ... }
multi method foo (Rat $r) { ... }
}
my $object = ChildClass.new;
for $object.can( 'foo' )
.flatmap( *.candidates )
.unique -> $candidate {
put join "\t",
$candidate.package.^name,
$candidate.name,
$candidate.signature.perl;
};
ParentClass foo :(ParentClass $: Str $s, *%_)
ChildClass foo :(ChildClass $: Int $n, *%_)
ChildClass foo :(ChildClass $: Rat $r, *%_)
That's fine, but it's a lot of work. I'd much rather have something simpler, such as:
$object.can( 'foo', $signature );
I can probably do a lot of work to make that possible, but am I missing something that's already there?

As I hit submit on that question I had this idea, which still seems like too much work. The cando method can test a Capture (the inverse of a signature). I can grep those that match:
class ParentClass {
multi method foo (Str $s) { ... }
}
class ChildClass is ParentClass {
multi method foo (Int $n) { ... }
multi method foo (Rat $r) { ... }
}
my $object = ChildClass.new;
# invocant is the first thing for method captures
my $capture = \( ChildClass, Str );
for $object.can( 'foo' )
.flatmap( *.candidates )
.grep( *.cando: $capture )
-> $candidate {
put join "\t",
$candidate.package.^name,
$candidate.name,
$candidate.signature.perl;
};
I'm not sure I like this answer though.

Semantically disambiguating an ambiguous syntax

Using Antlr 4 I have a situation I am not sure how to resolve. I originally asked the question at https://groups.google.com/forum/#!topic/antlr-discussion/1yxxxAvU678 on the Antlr discussion forum. But that forum does not seem to get a lot of traffic, so I am asking again here.
I have the following grammar:
expression
: ...
| path
;
path
: ...
| dotIdentifierSequence
;
dotIdentifierSequence
: identifier (DOT identifier)*
;
The concern here is that dotIdentifierSequence can mean a number of things semantically, and not all of them are "paths". But at the moment they are all recognized as paths in the parse tree and then I need to handle them specially in my visitor.
But what I'd really like is a way to express the dotIdentifierSequence usages that are not paths into the expression rule rather than in the path rule, and still have dotIdentifierSequence in path to handle path usages.
To be clear, a dotIdentifierSequence might be any of the following:
A path - this is a SQL-like grammar and a path expression would be like a table or column reference in SQL, e.g. a.b.c
A Java class name - e.g. com.acme.SomeJavaType
A static Java field reference - e.g. com.acme.SomeJavaType.SOME_FIELD
A Java enum value reference - e.g. com.acme.Gender.MALE
The idea is that during visitation "dotIdentifierSequence as a path" resolves as a very different type from the other usages.
Any idea how I can do this?

The issue here is that you're trying to make a distinction between "paths" while being created in the parser. Constructing paths inside the lexer would be easier (pseudo code follows):
grammar T;
tokens {
JAVA_TYPE_PATH,
JAVA_FIELD_PATH
}
// parser rules
PATH
: IDENTIFIER ('.' IDENTIFIER)*
{
String s = getText();
if (s is a Java class) {
setType(JAVA_TYPE_PATH);
} else if (s is a Java field) {
setType(JAVA_FIELD_PATH);
}
}
;
fragment IDENTIFIER : [a-zA-Z_] [a-zA-Z_0-9]*;
and then in the parser you would do:
expression
: JAVA_TYPE_PATH #javaTypeExpression
| JAVA_FIELD_PATH #javaFieldExpression
| PATH #pathExpression
;
But then, of course, input like this java./*comment*/lang.String would be tokenized wrongly.
Handling it all in the parser would mean manually looking ahead in the token stream and checking if either a Java type, or field exists.
A quick demo:
grammar T;
#parser::members {
String getPathAhead() {
Token token = _input.LT(1);
if (token.getType() != IDENTIFIER) {
return null;
}
StringBuilder builder = new StringBuilder(token.getText());
// Try to collect ('.' IDENTIFIER)*
for (int stepsAhead = 2; ; stepsAhead += 2) {
Token expectedDot = _input.LT(stepsAhead);
Token expectedIdentifier = _input.LT(stepsAhead + 1);
if (expectedDot.getType() != DOT || expectedIdentifier.getType() != IDENTIFIER) {
break;
}
builder.append('.').append(expectedIdentifier.getText());
}
return builder.toString();
}
boolean javaTypeAhead() {
String path = getPathAhead();
if (path == null) {
return false;
}
try {
return Class.forName(path) != null;
} catch (Exception e) {
return false;
}
}
boolean javaFieldAhead() {
String path = getPathAhead();
if (path == null || !path.contains(".")) {
return false;
}
int lastDot = path.lastIndexOf('.');
String typeName = path.substring(0, lastDot);
String fieldName = path.substring(lastDot + 1);
try {
Class<?> clazz = Class.forName(typeName);
return clazz.getField(fieldName) != null;
} catch (Exception e) {
return false;
}
}
}
expression
: {javaTypeAhead()}? path #javaTypeExpression
| {javaFieldAhead()}? path #javaFieldExpression
| path #pathExpression
;
path
: dotIdentifierSequence
;
dotIdentifierSequence
: IDENTIFIER (DOT IDENTIFIER)*
;
IDENTIFIER
: [a-zA-Z_] [a-zA-Z_0-9]*
;
DOT
: '.'
;
which can be tested with the following class:
package tl.antlr4;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.misc.NotNull;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
public class Main {
public static void main(String[] args) {
String[] tests = {
"mu",
"tl.antlr4.The",
"java.lang.String",
"foo.bar.Baz",
"tl.antlr4.The.answer",
"tl.antlr4.The.ANSWER"
};
for (String test : tests) {
TLexer lexer = new TLexer(new ANTLRInputStream(test));
TParser parser = new TParser(new CommonTokenStream(lexer));
ParseTreeWalker.DEFAULT.walk(new TestListener(), parser.expression());
}
}
}
class TestListener extends TBaseListener {
#Override
public void enterJavaTypeExpression(#NotNull TParser.JavaTypeExpressionContext ctx) {
System.out.println("JavaTypeExpression -> " + ctx.getText());
}
#Override
public void enterJavaFieldExpression(#NotNull TParser.JavaFieldExpressionContext ctx) {
System.out.println("JavaFieldExpression -> " + ctx.getText());
}
#Override
public void enterPathExpression(#NotNull TParser.PathExpressionContext ctx) {
System.out.println("PathExpression -> " + ctx.getText());
}
}
class The {
public static final int ANSWER = 42;
}
which would print the following to the console:
PathExpression -> mu
JavaTypeExpression -> tl.antlr4.The
JavaTypeExpression -> java.lang.String
PathExpression -> foo.bar.Baz
PathExpression -> tl.antlr4.The.answer
JavaFieldExpression -> tl.antlr4.The.ANSWER

Define function inside \score in LilyPond

I compile a large song book, and for that I would like to have many local definitions of functions, that will, in the end, be in an \include d file, but that makes no difference here. For this, I need to define the functions inside \score{ ... } scope. However, LilyPond keeps throwing errors.
The non-working example:
\version "2.17.26"
\book {
\header {
title = "This is a book"
}
\score {
xyz = { a' b' c'' }
abc = #(define-music-function
( parser location musicnotes )
( ly:music? )
#{
c' $musicnotes e'
#}
)
{ \abc { d' } f' \xyz }
\header {
piece = "First piece"
opus = "op. 1024"
}
}
\score {
xyz = { a' a' a' }
abc = #(define-music-function
( parser location musicnotes )
( ly:music? )
#{
e' $musicnotes c'
#}
)
{ \abc { d' } f' \xyz }
\header {
piece = "Second piece"
opus = "op. 1025"
}
}
}
Throws an error:
test.ly:10:17: error: unrecognized string, not in text script or \lyricmode
xyz = { a' b' c'' }
The following works, however, I have to give the functions unique names, which is frowned upon.
\version "2.17.26"
xyz = { a' b' c'' }
abc = #(define-music-function
( parser location musicnotes )
( ly:music? )
#{
c' $musicnotes e'
#}
)
xxyz = { a' a' a' }
aabc = #(define-music-function
( parser location musicnotes )
( ly:music? )
#{
e' $musicnotes c'
#}
)
\book {
\header {
title = "This is a book"
}
\score {
{ \abc { d' } f' \xyz }
\header {
piece = "First piece"
opus = "op. 1024"
}
}
\score {
{ \aabc { d' } f' \xxyz }
\header {
piece = "Second piece"
opus = "op. 1025"
}
}
}

Unfortunatey, it's not possible to stick assignments in a score. You can only put assignments in the following places:
the top level,
inside \display, \header, and \midi blocks
The LilyPond grammar makes this quite clear, even if the rest of the manual is a bit evasive about it. (Look at http://lilypond.org/doc/v2.17/Documentation/contributor/lilypond-grammar , and look for where the assignment rule gets used).
Assuming your assignments are not appropriate for the blocks listed above (which is definitely the case in this example), and assuming that you don't want to do something exotic like go and define your own Scheme modules and figure out how to use them in your LilyPond file, you have two choices:
Define xyz and abc, then define the music that will go into the first score. Then redefine xyz and abc before defining the music for the next score. This works because assignments overwrite whatever was previously there, and because LilyPond defines are generally processed in order. However, if you want some of your defines to be used in both scores and to be the same, you may get confused.
Settle for your approach, though I would pick a prefix or a suffix that makes it clearer which score the define goes with.
The first option would look something like this:
\version "2.18.0"
xyz = { a' b' c'' }
abc = #(define-music-function (parser location musicnotes)
(ly:music?)
#{ c' $musicnotes e' #})
smus_a = { \abc { d' } f' \xyz }
xyz = { a' a' a' }
abc = #(define-music-function (parser location musicnotes)
(ly:music?)
#{ e' $musicnotes c' #})
smus_b = { \abc { d' } f' \xyz }
\book {
\header {
title = "A Book!"
}
\score {
\smus_a
\header { piece = "First piece" }
}
\score {
\smus_b
\header { piece = "Second piece" }
}
}
This also works if the music-defining parts are refactored out into separate LilyPond source files.

It is possible!
But you have to define a command to define the variable or command:
parserDefine =
#(define-void-function (parser location name val)(symbol? scheme?)
(ly:parser-define! parser name val))
This is a void-function and can be called almost anywhere:
\score {
{
% you have to be in your music-expression
\parserDefine xyz { a' a' a' }
% There must be something between parserDefine and the call!
c'' \xyz
\parserDefine abc #(define-music-function
( parser location musicnotes )
( ly:music? )
#{
c' $musicnotes e'
#}
)
a' \abc d'
}
}
If the command is defined, you can call inside your music expressions. After you have done so, the parser needs a little lookahead, so that the variable really is available - here its the c''. You can optionally wrap the expression in another pair of curly braces.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Specify order of alternatives in happy parser - haskell

Related

Xtext refering to element from different file does not work

bar.with{ ... this.#foo = ... }: groovy.lang.MissingMethodException: No signature of method: MyClass.setFoo() is applicable for argument types

How can I know that I can call a Perl 6 method that with a particular signature?

Semantically disambiguating an ambiguous syntax

Define function inside \score in LilyPond

Categories

Resources