Context
I'm trying to generate a parser for BCP47 Language-Tag values, which are specified in ABNF (Augmented Backus–Naur form). I'm doing this in Haskell and would like to use the robust BNFC tool-chain, which expects LBNF (Labeled Backus–Naur form). I've searched for tooling to do this conversion automatically and found none, so I'm attempting to write an LBNF grammar by hand, using the ABNF as a reference.
Attempted so far
I've done a lot of searching, and I think this question may be useful, but I can't get bnfc to accept any use of ε; it always reports a syntax error at that character. For example,
Convert every option [ E ] to a fresh non-terminal X and add
X = ε | E.
-- ABNF option:
-- foo = [ E ]
-- Fresh X
Foo. Foo ::= X ;
-- add
X. X ::= ε | E ;
E. E ::= "e" ;
syntax error at line 8, column 10 due to lexer error
Giving up on that, I tried to get something even simpler working:
language = 2*ALPHA
I could not.
I've seen some BNF documentation (sorry, I've lost the link) with an example for digits that looked like:
number ::= digit
number ::= number digit
This makes sense to me, so I tried the following:
LanguageISO2. Language ::= ALPHA ALPHA ;
token ALPHA ( letter ) ;
This fails to parse "en", but does parse "e n". It's clear why, but what is the right way to do what I'm intending?
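A toy longest-match lexer shows the usual cause. The sketch below assumes, without having checked BNFC's generated lexer, that alongside the one-letter ALPHA token there is an identifier-like token for runs of letters, and that the longest match wins (ties go to the rule listed first). Under those assumptions "en" becomes a single identifier token, while "e n" becomes the two ALPHA tokens that Language ::= ALPHA ALPHA expects. TOKEN_RULES and tokenize are hypothetical names.

```python
import re

# Hypothetical token table: a one-letter ALPHA token (like `token ALPHA (letter)`)
# plus an assumed identifier-like rule for runs of letters.
TOKEN_RULES = [
    ("ALPHA", re.compile(r"[A-Za-z]")),
    ("IDENT", re.compile(r"[A-Za-z][A-Za-z0-9_']*")),
]


def tokenize(text):
    """Split text into (rule name, lexeme) pairs using longest match."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace only separates tokens
            continue
        best = None
        for name, pattern in TOKEN_RULES:
            match = pattern.match(text, pos)
            # a strictly longer match replaces the current best; ties keep the earlier rule
            if match and (best is None or len(match.group()) > len(best[1])):
                best = (name, match.group())
        if best is None:
            raise ValueError(f"lexer error at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens
```

So the parser receives one IDENT for "en" and has no rule that accepts it as a Language.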
I can make things kind of work by abusing token,
LanguageISO2. Language ::= ALPHA_TWO ;
token ALPHA_TWO ( letter letter ) ;
But this will quickly get out of hand as I handle 3*ALPHA and 5*8ALPHA, etc.
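Rather than writing each count out by hand, the rules can be generated. The following Python sketch (lbnf_rule, option_rules and repetition_rules are hypothetical helpers, not part of BNFC) expands an ABNF option [ E ] into a fresh category with an empty alternative, a bounded repetition n*m E into one rule per count, and an unbounded n*E into a left-recursive pair in the style of number ::= number digit. It assumes, as far as I know correctly, that LBNF spells an empty alternative as a rule with nothing between ::= and ; rather than with ε. It only addresses the size of the grammar: the repeated items are still separate tokens, so the "en" versus "e n" lexing problem remains.

```python
from __future__ import annotations


def lbnf_rule(label: str, cat: str, rhs: list[str]) -> str:
    """Render one labelled LBNF rule; an empty rhs gives an empty production."""
    return " ".join([f"{label}.", cat, "::=", *rhs, ";"])


def option_rules(cat: str, item: str) -> list[str]:
    """ABNF `[ item ]` as a fresh category: an empty and a non-empty alternative."""
    return [lbnf_rule(f"{cat}None", cat, []), lbnf_rule(f"{cat}Some", cat, [item])]


def repetition_rules(cat: str, item: str, lo: int, hi: int | None = None) -> list[str]:
    """ABNF `lo*hi item`; hi=None means unbounded (`lo*item`)."""
    if hi is None:
        # lo copies as the base case, then left recursion adds one item at a time
        return [
            lbnf_rule(f"{cat}Base", cat, [item] * lo),
            lbnf_rule(f"{cat}More", cat, [cat, item]),
        ]
    # one alternative per allowed count
    return [lbnf_rule(f"{cat}{n}", cat, [item] * n) for n in range(lo, hi + 1)]
```

For example, print("\n".join(repetition_rules("Variant", "AlphaNum", 5, 8))) would print the four rules for 5*8alphanum.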
Specific Question
Could someone convert the following to LBNF so I can see the right approach to these things?
langtag = (language
["-" script]
["-" region]
*("-" variant))
language = (2*3ALPHA [ extlang ])
extlang = *3("-" 3ALPHA) ; reserved for future use
script = 4ALPHA ; ISO 15924 code
region = 2ALPHA ; ISO 3166 code
/ 3DIGIT ; UN M.49 code
variant = 5*8alphanum ; registered variants
/ (DIGIT 3alphanum)
alphanum = (ALPHA / DIGIT) ; letters and numbers
Thanks very much in advance.
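One observation that may help: none of these rules is recursive, so this subset of the ABNF describes a regular language and can be matched by a single regular expression. That suggests handling it largely at the token level, since LBNF token definitions are regular expressions (though, as far as I know, without counted repetition, so 2*3ALPHA would have to be spelled out as letter letter letter?). The Python sketch below transcribes the quoted rules directly into a regex to check which strings they accept; it covers only this subset, not the full RFC 5646 grammar, and is_langtag is a hypothetical name.

```python
import re

ALPHA, DIGIT, ALNUM = "[A-Za-z]", "[0-9]", "[A-Za-z0-9]"

# language = 2*3ALPHA [ extlang ],  extlang = *3("-" 3ALPHA)
LANGUAGE = rf"{ALPHA}{{2,3}}(?:-{ALPHA}{{3}}){{0,3}}"
# script = 4ALPHA
SCRIPT = rf"{ALPHA}{{4}}"
# region = 2ALPHA / 3DIGIT
REGION = rf"(?:{ALPHA}{{2}}|{DIGIT}{{3}})"
# variant = 5*8alphanum / (DIGIT 3alphanum)
VARIANT = rf"(?:{ALNUM}{{5,8}}|{DIGIT}{ALNUM}{{3}})"
# langtag = language ["-" script] ["-" region] *("-" variant)
LANGTAG = re.compile(rf"{LANGUAGE}(?:-{SCRIPT})?(?:-{REGION})?(?:-{VARIANT})*")


def is_langtag(tag: str) -> bool:
    """True if the whole string matches the langtag subset quoted above."""
    return LANGTAG.fullmatch(tag) is not None
```

Backtracking in the regex engine resolves the overlaps between extlang, script and variant (for instance "zh-Hant-TW" is accepted with Hant as the script).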
I have a requirement where I want to extend an existing grammar A with additions defined in grammar B to produce a grammar C.
I have already tried importing grammar A in B, but that picks up only some of the things defined in grammar A. My guess is that the parts of A that B does not use are skipped when the classes are generated. That makes sense, since what I need is not inheritance but an intermix, merge, or combination of the two grammars.
Just for understanding (the original grammar is huge), an example:
File : A.g4:
grammar A;
keywords
: X
| Y
| Z
;
X: 'X';
Y: 'Y';
Z: 'Z';
File : B.g4:
grammar B;
keywords
: A
| B
| C
;
A: 'A';
B: 'B';
C: 'C';
File : C.g4:
grammar C;
keywords
: X
| Y
| Z
| A
| B
| C
;
X: 'X';
Y: 'Y';
Z: 'Z';
A: 'A';
B: 'B';
C: 'C';
Note: I cannot modify grammar A directly, but I want to retain all the functionality in grammar A along with the additional rules, keywords, etc. defined in grammar B, as shown above.
Any help will be much appreciated. Thanks.
Grammar import might not work the way you expect. Rules in the importing grammar take precedence over rules with the same name in an imported grammar, so an imported grammar cannot override a rule that your main grammar already defines. See also the description in the ANTLR4 repo:
Think of import as more like a smart include statement (which does not include rules that are already defined).
However, it should be possible to override a rule from one imported grammar with a rule from another. In your case, you would not define the keywords rule in your main grammar (I assume this is C). Import the grammars A and B in reverse order if you want B's keywords rule to take precedence over the one in A:
import B, A;
This is also demonstrated in the image from this Markdown file:
The rule r from grammar G2 is ignored, since it is imported last, so G3 kinda "overrides" it.
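The precedence rule described above can be modelled in a few lines of Python. This is a sketch of the "smart include" behaviour only, not ANTLR's implementation: it ignores nested imports and treats each grammar as a flat dictionary from rule names to definitions, and merge_imports is a hypothetical name.

```python
def merge_imports(main_rules, imports):
    """Combine rules the way the quoted description says import behaves.

    main_rules: rule name -> definition in the importing (main) grammar.
    imports: (grammar name, rules) pairs in the order of the import statement.
    Returns (merged rules, name of the grammar each rule came from).
    """
    merged = dict(main_rules)
    origin = {name: "main" for name in main_rules}
    for grammar, rules in imports:
        for name, body in rules.items():
            # a rule that is already defined is not included again
            if name not in merged:
                merged[name] = body
                origin[name] = grammar
    return merged, origin


GRAMMAR_A = {"keywords": "X | Y | Z", "X": "'X'", "Y": "'Y'", "Z": "'Z'"}
GRAMMAR_B = {"keywords": "A | B | C", "A": "'A'", "B": "'B'", "C": "'C'"}

# import B, A;  with no keywords rule in the main grammar
merged, origin = merge_imports({}, [("B", GRAMMAR_B), ("A", GRAMMAR_A)])
```

Under this model, keywords comes from B alone: A's X | Y | Z alternatives are not merged into it, although A's lexer rules X, Y and Z are still included because nothing else defines them.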
I'm currently reading Implementing functional languages: a tutorial by SPJ and the (sub)chapter I'll be referring to in this question is 3.8.7 (page 136).
The first remark there is that a reader following the tutorial has not yet implemented compilation of ECase expressions in the C scheme (that is, of ECase expressions appearing in non-strict contexts).
The proposed solution is to transform a Core program so that ECase expressions never appear in non-strict contexts. Specifically, each such occurrence gives rise to a new supercombinator with exactly one parameter, whose body corresponds to the original ECase expression, and the occurrence itself is replaced with a call to that supercombinator.
Below I present a (slightly modified) example of such a transformation from [1]:
t a b = Pack{2,1} ;
f x = Pack{2,2} (case t x 7 6 of
<1> -> 1;
<2> -> 2) Pack{1,0} ;
main = f 3
== transformed into ==>
t a b = Pack{2,1} ;
f x = Pack{2,2} ($Case1 (t x 7 6)) Pack{1,0} ;
$Case1 x = case x of
<1> -> 1;
<2> -> 2 ;
main = f 3
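The transformation above can be sketched in Python over a tiny tuple-based AST (the representation and the name lift_cases are hypothetical, not the tutorial's Core data types). Application arguments are treated as non-strict contexts; supercombinator bodies, case scrutinees and alternative bodies as strict ones. Like the example, it assumes the alternatives mention no free variables of the enclosing function, so a single parameter for the scrutinee is enough.

```python
from itertools import count

# Expressions: ("var", name) | ("num", n) | ("pack", tag, arity)
#            | ("ap", fun, arg) | ("case", scrutinee, [(tag, vars, body), ...])


def lift_cases(program):
    """program: name -> (params, body). Returns a copy with $CaseN combinators added."""
    fresh = count(1)
    new_defs = {}

    def walk(expr, strict):
        kind = expr[0]
        if kind == "ap":
            # the function keeps the surrounding context; the argument is non-strict
            return ("ap", walk(expr[1], strict), walk(expr[2], False))
        if kind == "case":
            scrut = walk(expr[1], True)
            alts = [(tag, vars_, walk(body, True)) for tag, vars_, body in expr[2]]
            if strict:
                return ("case", scrut, alts)
            name = f"$Case{next(fresh)}"
            # assumes the alternatives have no free variables besides their own binders
            new_defs[name] = (["x"], ("case", ("var", "x"), alts))
            return ("ap", ("var", name), scrut)
        return expr  # variables, numbers and constructors are unchanged

    result = {name: (params, walk(body, True)) for name, (params, body) in program.items()}
    result.update(new_defs)
    return result
```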
I implemented this solution and it works like a charm, that is, the output is Pack{2,2} 2 Pack{1,0}.
However, what I don't understand is: why all that trouble? I hope it's not just me, but my first thought for solving the problem was simply to implement compilation of ECase expressions in the C scheme. I did so by mimicking the E-scheme compilation rule (page 134 in [1], reproduced here for completeness). Starting from
E[[case e of alts]] p = E[[e]] p ++ [Casejump D[[alts]] p]
I wrote
C[[case e of alts]] p = C[[e]] p ++ [Eval] ++ [Casejump D[[alts]] p]
I added [Eval] because Casejump needs its argument on top of the stack in weak head normal form (WHNF), which the C scheme, unlike the E scheme, doesn't guarantee.
But then the output changes to the enigmatic Pack{2,2} 2 6.
The same happens when I use the E-scheme rule unchanged, i.e.
C[[case e of alts]] p = E[[e]] p ++ [Casejump D[[alts]] p]
So I guess my "obvious" solution is inherently wrong, and I can see that from the outputs. But I'm having trouble formulating a formal argument for why that approach was bound to fail.
Can someone provide such an argument or proof, or at least some intuition, for why the naive approach doesn't work?
The purpose of the C scheme is to not perform any computation, but just delay everything until an EVAL happens (which it might or might not). What are you doing in your proposed code generation for case? You're calling EVAL! And the whole purpose of C is to not call EVAL on anything, so you've now evaluated something prematurely.
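A toy Python model of this point (not the G-machine itself): building a constructor node in a non-strict context should only allocate a suspension for the case argument, while an EVAL during construction runs it immediately. If that case is never demanded, or fails to terminate, the eager version changes the program's behaviour. The names Thunk, build_lazily and build_eagerly are hypothetical, and the sketch does not model the stack layout behind the specific output Pack{2,2} 2 6.

```python
class Thunk:
    """A suspended computation that is evaluated at most once."""

    def __init__(self, compute):
        self._compute = compute
        self._evaluated = False
        self._value = None

    def force(self):
        if not self._evaluated:
            self._value = self._compute()
            self._evaluated = True  # later forces reuse the stored value
        return self._value


def diverging_case():
    # stands in for a case expression whose scrutinee fails or never terminates
    raise RuntimeError("case scrutinee evaluated")


def build_lazily():
    # C-scheme behaviour: the case argument is only suspended
    return ("Pack{2,2}", Thunk(diverging_case), "Pack{1,0}")


def build_eagerly():
    # a C scheme that EVALs the case while building the node
    return ("Pack{2,2}", diverging_case(), "Pack{1,0}")
```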
The only way you could generate code directly for case in the C scheme would be to add some new instruction to perform the case analysis once it's evaluated.
But we (Thomas Johnsson and I) decided it was simpler to just lift out such expressions. The exact historical details are lost in time though. :)
I'm designing a low-punctuation language in which I want to support the declaration of arrays using the following syntax:
512 by 512 of 255 // a 512x512 array filled with 255
100 of 0 // a 100-element array filled with 0
expr1 by expr2 by expr3 ... by exprN of exprFill
These array declarations are just one kind of expression among many.
I'm having a hard time figuring out how to write the grammar rules. I've simplified my grammar down to the simplest thing that reproduces my trouble:
grammar Dimensions;
program
: expression EOF
;
expression
: expression (BY expression)* OF expression
| INT
;
BY : 'by';
OF : 'of';
INT : [0-9]+;
WHITESPACE : [ \t\n\r]+ -> skip;
When I feed in 10 of 1, I get the parse I want.
When I feed in 20 by 10 of 1, the middle expression non-terminal slurps up the 10 of 1, leaving nothing to match the rule's trailing OF expression.
And I get the following warning:
line 2:0 mismatched input '<EOF>' expecting 'of'
The parse I'd like to see is
(program (expression (expression 20) by (expression 10) of (expression 1)) <EOF>)
Is there a way I can reformulate my grammar to achieve this? I feel that what I need is right-association across both BY and OF, but I don't know how to express this across two operators.
After some non-intellectual experimentation, I came up with some productions that seem to generate my desired parse:
expression
:<assoc=right> expression (BY expression)+ OF expression
|<assoc=right> expression OF expression
| INT
;
I don't know if there's a way I can express it with just one production.
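One way to get the desired shape is to stop the dimensions from being full expressions: parse each dimension at a tighter-binding level (here just INT) and make only the fill right-recursive, roughly expression : primary (BY primary)* OF expression | primary, where primary is a hypothetical rule for INT and other tight-binding forms. I have not run that through ANTLR; the Python recursive-descent sketch below (Parser and parse are hypothetical names, and tokens are assumed to be whitespace-separated) only shows the intended structure.

```python
class Parser:
    def __init__(self, src):
        self.tokens = src.split()  # assumes whitespace-separated tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def atom(self):
        tok = self.take()
        if not tok.isdigit():
            raise ValueError(f"expected an integer, got {tok!r}")
        return int(tok)

    def expression(self):
        # expression : atom (BY atom)* OF expression | atom
        dims = [self.atom()]
        while self.peek() == "by":
            self.take("by")
            dims.append(self.atom())  # dimensions never swallow 'of'
        if self.peek() == "of":
            self.take("of")
            return ("array", dims, self.expression())  # only the fill recurses
        if len(dims) > 1:
            raise ValueError("'by' dimensions must be followed by 'of <fill>'")
        return dims[0]

    def program(self):
        expr = self.expression()
        if self.peek() is not None:
            raise ValueError(f"unexpected trailing token {self.peek()!r}")
        return expr


def parse(src):
    return Parser(src).program()
```

Because the fill is parsed by a recursive call, 2 of 3 of 4 groups to the right, as 2 of (3 of 4).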
I have the following input in a text file, input.txt:
atom1,atom2,atom3
relation(atom1 ,[10,5,2])
relation(atom2 ,[3,10,2])
relation(atom3 ,[6,5,10])
The first line lists the atoms used in the relation predicates in the file, and each remaining line is a relation predicate whose values follow the order of that list. relation(atom1, [10,5,2]) means atom1 has a relation value of 10 with the first atom, 5 with the second and 2 with the third.
I need to read this file and represent the relation values for each atom separately. For example, these are the relation values that should be added for atom1:
assert(relation(atom1, atom1,10)).
assert(relation(atom1, atom2, 5)).
assert(relation(atom1, atom3, 2)).
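The expansion described above, one fact per position in the list, can be checked with a short Python sketch; this is not Prolog, and parse_relations is a hypothetical name. It pairs the i-th value on each relation line with the i-th atom from the first line.

```python
import re

# relation(<atom> ,[<comma-separated integers>]) with optional spaces
RELATION = re.compile(r"relation\(\s*(\w+)\s*,\s*\[([^\]]*)\]\s*\)")


def parse_relations(text):
    """Return (subject, other, value) triples, one per list position."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    atoms = [name.strip() for name in lines[0].split(",")]
    facts = []
    for line in lines[1:]:
        match = RELATION.fullmatch(line)
        if match is None:
            raise ValueError(f"unrecognised line: {line!r}")
        subject = match.group(1)
        values = [int(v) for v in match.group(2).split(",")]
        if len(values) != len(atoms):
            raise ValueError(f"expected {len(atoms)} values in {line!r}")
        # the i-th value relates subject to the i-th atom of the first line
        facts.extend((subject, other, value) for other, value in zip(atoms, values))
    return facts


SAMPLE = """atom1,atom2,atom3
relation(atom1 ,[10,5,2])
relation(atom2 ,[3,10,2])
relation(atom3 ,[6,5,10])
"""
```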
I have read some Prolog I/O tutorials and seen recommendations to use DCGs, but I'm a beginner Prolog programmer and am having trouble choosing a method for solving this problem, so I'm asking experienced Prolog programmers for help.
Since you didn't state which Prolog you're using, here is a snippet written for SWI-Prolog. I have tried to flag non-ISO builtins with references to the SWI-Prolog docs.
parse_input :-
open('input.txt', read, S),
parse_line(S, atoms(Atoms)),
repeat,
( parse_line(S, a_struct(relation(A, L)))
-> store(Atoms, A, L), fail
; true ),
close(S).
:- meta_predicate(parse_line(+, //)).
parse_line(S, Grammar) :-
% see http://www.swi-prolog.org/pldoc/doc_for?object=read_line_to_codes/2
read_line_to_codes(S, L),
L \= end_of_file,
phrase(Grammar, L).
% match any sequence
% note: clause order matters, the shortest match is tried first
star([]) --> [].
star([C|Cs]) --> [C], star(Cs).
% --- DCGs ---
% comma sep atoms
atoms(R) -->
star(S),
( ",",
{atom_codes(A, S), R = [A|As]},
atoms(As)
; {atom_codes(A, S), R = [A]}
).
% parse a struct X,
% but it's far easier to use a builtin :)
% see http://www.swi-prolog.org/pldoc/doc_for?object=atom_to_term/3
a_struct(X, Cs, []) :-
atom_codes(A, Cs),
atom_to_term(A, X, []).
% storage handler
:- dynamic(relation/3).
store(Atoms, A, L) :-
nth1(I, L, W),
nth1(I, Atoms, B),
assertz(relation(A, B, W)).
With the sample input.txt, I get:
?- parse_input.
true .
?- listing(relation).
:- dynamic relation/3.
relation(atom1, atom1, 10).
relation(atom1, atom2, 5).
relation(atom1, atom3, 2).
relation(atom2, atom1, 3).
relation(atom2, atom2, 10).
relation(atom2, atom3, 2).
relation(atom3, atom1, 6).
relation(atom3, atom2, 5).
relation(atom3, atom3, 10).
HTH