What is the grammar to have parentheses take the place of line breaks? - antlr4

I'm attempting to write a grammar to parse DNS zone files. Resource records are normally separated by newlines. However, a record can be broken across multiple lines by using parentheses. For example:
record1 part1 part2 part3 part4
or
record1 part1 ( part2
part3
part4
)
I can't work out how to allow the parentheses to appear anywhere within a record.

How about this (not thoroughly tested).
grammar:
grammar dns;
file : (record|NL)+ EOF ;
record : recordName recordPart+ (NL|EOF)
;
recordName : Something;
recordPart
: '(' recordPartOrNewLine+ ')'
| Something
;
recordPartOrNewLine
: NL
| recordPart
;
Something : [a-zA-Z0-9:.]+; // adjust!
WS: [ \t]+ -> skip;
NL : ('\r'? '\n')|'\r';
Comment : ';' ~[\r\n]* -> skip;
Test case (from Wikipedia):
example.com. 1800 IN SOA ns1.example.com. mailbox.example.com. (
100 ; Serial Number
300 ; Refresh Time
100 ; Retry Time
6000 ; Expire Time
600 ; Negative Caching Time
)
example.com. 1800 IN NS ns1.example.com.
ns1.example.com. 1800 IN A 172.27.182.17
ns1.example.com. 1800 IN AAAA 2001:db8::f:a
www.example.com. 1800 IN A 192.168.1.2
www.example.com. 1800 IN AAAA 2001:db8::1:2
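The rule the grammar encodes (a newline ends a record only when it is outside parentheses) can also be checked outside ANTLR. The following is only a rough sketch in Python, not part of the grammar above: the helper name logical_records is made up for illustration, and it ignores quoted strings, which real zone files may contain.

```python
def logical_records(text):
    """Group zone-file tokens into records; newlines inside (...) do not end a record."""
    records, current, depth = [], [], 0
    for line in text.splitlines():
        line = line.split(";", 1)[0]  # drop a trailing comment (quoted strings are not handled)
        for token in line.replace("(", " ( ").replace(")", " ) ").split():
            if token == "(":
                depth += 1
            elif token == ")":
                depth -= 1
                if depth < 0:
                    raise ValueError("unbalanced ')'")
            else:
                current.append(token)
        if depth == 0 and current:  # a newline outside parentheses ends the record
            records.append(current)
            current = []
    if depth != 0:
        raise ValueError("unclosed '('")
    return records


zone = """example.com. 1800 IN SOA ns1.example.com. mailbox.example.com. (
100 ; serial
300
)
example.com. 1800 IN NS ns1.example.com.
"""
for record in logical_records(zone):
    print(record)
```

Like recordPartOrNewLine in the grammar, the depth counter is what lets a newline be treated as ordinary whitespace while a parenthesis is open.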


Double digit format in impex

I created an impex similar to this:
INSERT_UPDATE Unit;code[unique=true];type(code)[default='TEST', unique=true] ;conversion;
; a ;; 0,001
; b ;; 0,001
; c ;; 1
; d ;; 1
; e ;; 1000
It works just fine on my local machine. But in another test environment the comma is not working as a digit separator. What should I do to indicate in this impex that it should use the comma as a digit separator?
It worked after I added this line to the impex header:
#% impex.setLocale(Locale.ENGLISH);
Like this:
#% impex.setLocale(Locale.ENGLISH);
INSERT_UPDATE Unit;code[unique=true];type(code)[default='TEST', unique=true] ;conversion;
; a ;; 0,001
; b ;; 0,001
; c ;; 1
; d ;; 1
; e ;; 1000
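Why a locale setting can change the outcome: the same text, such as 0,001, gives a different number depending on whether the comma is read as the decimal mark or as a grouping separator. The sketch below only illustrates that ambiguity in Python. The helper parse_number is hypothetical and is not part of impex, and the thread does not say which reading a given environment applied.

```python
def parse_number(text, decimal_sep, group_sep):
    """Parse a number using explicitly chosen separators (hypothetical helper)."""
    # Remove grouping separators first, then normalise the decimal mark to '.'.
    cleaned = text.strip().replace(group_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


# Comma read as the decimal mark
print(parse_number("0,001", decimal_sep=",", group_sep="."))
# Comma read as a grouping separator
print(parse_number("0,001", decimal_sep=".", group_sep=","))
```

Pinning the locale in the impex itself, as done above, makes the import independent of each server's default.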

ADA & GTK => function Get_Text

I want to create a top-level window and use this function in it.
There is no example anywhere...
Here is the complete description from /usr/share/ada/adainclude/gtkada/gtk-gentry.ads:
function Get_Text (The_Entry : access Gtk_Entry_Record) return UTF8_String;
-- Modify the text in the entry.
-- The text is cut at the maximum length that was set when the entry was
-- created.
-- The text replaces the current contents.
On Debian and related distributions, the directory becomes available after: sudo apt-get install libgtkada2.24.1-dev
I figured out how to use the Get_Text function with the entry.
manuBriot & andlabs =
I also found the signal in the entry's package that fires when the user presses Enter.
Finally, everything works fine now.
What does my program do?
It is a window that looks exactly like this: http://pix.toile-libre.org/?img=1450777307.png
After you type something and press Enter in the graphical entry, the result is printed on the command line.
Simple and useful for getting started with GTK.
WITH Gtk.Main ; USE Gtk.Main ;
WITH Gtk.Window ; USE Gtk.Window ;
WITH Gtk.Enums ; USE Gtk.Enums ;
WITH Gtk.Button ; USE Gtk.Button ;
WITH Gtk.Alignment ; USE Gtk.Alignment ;
WITH Gtk.Box ; USE Gtk.Box ;
WITH Gtk.Gentry; USE Gtk.Gentry;
WITH Ada.text_io; USE Ada.text_io;
WITH Gtk.Widget ; USE Gtk.Widget ;
with Gtk.Handlers;
PROCEDURE prototype IS
-----------------------
-- VARIABLES -- |
----------------------------------------------------------
win : Gtk_window ;
Btn1, Btn2 ,Btn3 : Gtk_Button ;
alignG, alignM ,alignD : Gtk_Alignment ;
Boite : Gtk_VBox ;
Boutons : Gtk_HBox ;
saisie : Gtk_Entry ;
----------------------------------------------------------
--Instanciation package(s) for connexion
----------------------------------------------------------
PACKAGE P_Callback IS NEW Gtk.Handlers.Callback(Gtk_Widget_Record);
USE P_Callback ;
----------------------------------------------------------
-- Handlers (or callbacks) |
----------------------------------------------------------
procedure Stop_Program(Emetteur : access Gtk_Widget_Record'class)
is
PRAGMA Unreferenced (Emetteur);
begin
Main_Quit;
end Stop_Program ;
procedure Handler_text(Ent : access Gtk_Widget_Record'class)
is begin
put_line(get_text(saisie));
end Handler_text ;
-------------------------------------------------
BEGIN
Init ;
----------------
-- NEW -- |
-------------------------------------------------
Gtk_New(win);
Gtk_New(saisie);
Gtk_New(Btn1, "Bouton 1") ;
Gtk_New(Btn2, "Bouton 2") ;
Gtk_New(Btn3, "Bouton 3") ;
Gtk_New(alignG,0.0,1.0,1.0,1.0);
Gtk_New(alignM,0.5,1.0,1.0,1.0);
Gtk_New(alignD,1.0,1.0,1.0,1.0);
Gtk_New_VBox
(Boite, homogeneous => false, Spacing => 0) ;
Gtk_New_HBox
(Boutons, homogeneous => false, Spacing => 0) ;
---------------------------------
-- Add |
---------------------------------
alignG.add(Btn1) ;
alignM.add(Btn2) ;
alignD.add(Btn3) ;
win.Add(Boite);
------------------------------------------
-- Connect |
------------------------------------------
Connect(Widget => win ,
Name => "destroy" ,
Cb => Stop_Program'access);
Connect(Widget => saisie ,
Name => "activate" ,
Cb => Handler_text'access);
------------------------------------------
-- Design Window |
------------------------------------------
Boite.Pack_Start(saisie);
Boite.Pack_Start(Boutons);
Boutons.Pack_Start(alignG);
Boutons.Pack_Start(alignM);
Boutons.Pack_Start(alignD);
win.Set_Default_Size(500,500) ;
win.set_position(Win_Pos_Mouse) ;
-- win.set_opacity(0.7) ;
win.Show_all ;
Main ;
END prototype ;
WITH Gtk.Main ; USE Gtk.Main ;
WITH Gtk.Window ; USE Gtk.Window ;
WITH Gtk.Gentry; USE Gtk.Gentry;
WITH Gtk.Box ; USE Gtk.Box ;
WITH Gtk.Enums ; USE Gtk.Enums ;
Procedure gtkada_get_a_entry is
win : Gtk_window ;
space : Gtk_Entry ;
the_box : Gtk_VBox ;
-- function Get_Text (The_Entry : access Gtk_Entry_Record) return UTF8_String;
-- How to use the function ???
begin
Init ;
Gtk_New(win);
Gtk_New(space);
Gtk_New_VBox
(the_box, homogeneous => false, Spacing => 0) ;
the_box.Pack_Start(space);
win.Add(the_box);
win.Set_Default_Size(300,200) ;
win.set_position(Win_Pos_Center) ;
win.Show_all ;
Main ;
end gtkada_get_a_entry;
All I want to do is use the Get_text function as it is described in the package.
The code I posted is minimal: it shows a text entry on screen, but again, it's useless if I cannot use the function.

Overlapping rules - mismatched input

My grammar (shown below, trimmed down from the original) requires somewhat overlapping rules:
grammar NOVIANum;
statement : (priorityStatement | integerStatement)* ;
priorityStatement : T_PRIO TwoDigits ;
integerStatement : T_INTEGER Integer ;
WS : [ \t\r\n]+ -> skip ;
T_PRIO : 'PRIO' ;
T_INTEGER : 'INTEGER' ;
Integer: OneToNine Digit* | ZERO ;
TwoDigits : Digit Digit ;
fragment OneToNine : ('1'..'9') ;
fragment Digit: ('0'..'9');
ZERO : [0] ;
so "Integer" and "TwoDigits" overlap to a certain extent.
The following input
INTEGER 10
PRIO 10
results in
line 2:5 mismatched input '10' expecting TwoDigits
when Integer precedes TwoDigits and in
line 1:8 mismatched input '10' expecting Integer
when TwoDigits precedes Integer in the grammar.
Is there a way around this?
Thanks - Alex
Edit:
Thanks @GRosenberg. Your suggestion, of course, worked for this small example, but when I integrated it into my full grammar it led to different mismatched input errors, sure enough.
The reason is another lexer rule that requires a range of '[1-4]', so I thought I'd be clever and turn it into
grammar NOVIANum;
statement : (priorityT | integerT | levelT )* ;
priorityT : T_PRIO twoDigits ;
integerT : T_INTEGER integer ;
levelT : T_LEVEL levelNumber ;
levelNumber : ( ZERO DIGIT ) | ( OneToFour (ZERO | DIGIT) ) ;
integer: ZERO* ( DIGIT ( DIGIT | ZERO )* ) ;
twoDigits : (ZERO | DIGIT) ( ZERO | DIGIT ) ;
oneToFour : OneToFour (DIGIT | ZERO) ;
WS : [ \t\r\n]+ -> skip ;
T_INTEGER : 'INTEGER' ;
T_LEVEL : 'LEVEL' ;
T_PRIO : 'PRIO' ;
DIGIT: OneToFour | FiveToNine ;
ZERO : '0' ;
OneToFour : [1-4] ;
FiveToNine : [5-9] ;
This still works for the previous inputs but ...
INTEGER 350
PRIO 10
LEVEL 01
LEVEL 05
LEVEL 10
LEVEL 49
results in
[@0,0:6='INTEGER',<2>,1:0]
[@1,8:8='3',<5>,1:8]
[@2,9:9='5',<5>,1:9]
[@3,10:10='0',<6>,1:10]
[@4,12:15='PRIO',<4>,2:0]
[@5,17:17='1',<5>,2:5]
[@6,18:18='0',<6>,2:6]
[@7,20:24='LEVEL',<3>,3:0]
[@8,26:26='0',<6>,3:6]
[@9,27:27='1',<5>,3:7]
[@10,29:33='LEVEL',<3>,4:0]
[@11,35:35='0',<6>,4:6]
[@12,36:36='5',<5>,4:7]
[@13,38:42='LEVEL',<3>,5:0]
[@14,44:44='1',<5>,5:6]
[@15,45:45='0',<6>,5:7]
[@16,47:51='LEVEL',<3>,6:0]
[@17,53:53='4',<5>,6:6]
[@18,54:54='9',<5>,6:7]
[@19,55:54='<EOF>',<-1>,6:8]
line 5:6 no viable alternative at input '1'
line 6:6 no viable alternative at input '4'
(statement (integerT INTEGER (integer 3 5 0)) (priorityT PRIO (twoDigits 1 0)) (levelT LEVEL (levelNumber 0 1)) (levelT LEVEL (levelNumber 0 5)) (levelT LEVEL (levelNumber 1 0)) (levelT LEVEL (levelNumber 4 9)))
What am I missing here?
Edit 2:
OK, answering my own question here. Of course
DIGIT: OneToFour | FiveToNine ;
kicks in where it shouldn't, even in this combined form,
so about the only way around this that I can think of would be
grammar NOVIANum;
statement : (priorityT | integerT | levelT )* ;
priorityT : T_PRIO twoDigits ;
integerT : T_INTEGER integer ;
levelT : T_LEVEL levelNumber ;
levelNumber : ( ZERO (OneToFour | FiveToNine) | ( OneToFour (ZERO | (OneToFour | FiveToNine)) ) ) ;
integer: ZERO* ( (OneToFour | FiveToNine) ( (OneToFour | FiveToNine) | ZERO )* ) ;
twoDigits : (ZERO | (OneToFour | FiveToNine)) ( ZERO | (OneToFour | FiveToNine) ) ;
WS : [ \t\r\n]+ -> skip ;
T_INTEGER : 'INTEGER' ;
T_LEVEL : 'LEVEL' ;
T_PRIO : 'PRIO' ;
// DIGIT: OneToFour | FiveToNine;
ZERO : '0' ;
OneToFour : [1-4] ;
FiveToNine : [5-9] ;
because when I create a parser rule for it like
oneToNine : OneToFour | FiveToNine ;
it'll give me this
(integerT INTEGER (integer (oneToNine 3) (oneToNine 5) 0))
which is ugly and harder to handle than just
(integerT INTEGER (integer 3 5 0))
As a general matter of design, always try to handle distinguishing elements and their objects (T_PRIO -> TwoDigits) at the same level, either parser or lexer. Presuming the semantic nature of the Integer and TwoDigits rules is important, promote them to the parser and let the lexer produce only digits. That is, don't over-constrain the lexer.
In the parser, you can let the integer rule functionally hide the twoDigits rule except in the evaluation of the priorityStatement rule:
priorityStatement : T_PRIO twoDigits ;
integerStatement : T_INTEGER integer ;
integer: ZERO | ( DIGIT ( DIGIT | ZERO )* ) ;
twoDigits : ( DIGIT | ZERO ) ( DIGIT | ZERO ) ;
T_PRIO : 'PRIO' ;
T_INTEGER : 'INTEGER' ;
DIGIT : [1-9] ;
ZERO : '0' ;
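To see the idea without running ANTLR, here is a rough sketch in Python of the same split: the lexer emits only keywords and single-digit tokens, and the distinction between an integer and a two-digit number is decided at the parser level. It is a model under stated assumptions, not ANTLR output. The function names lex and parse are made up for illustration, and the two-digit check accepts any two digits (00 to 99), as the TwoDigits rule in the question does.

```python
import re

# Lexer: keywords plus one token per digit, mirroring T_PRIO, T_INTEGER, DIGIT and ZERO.
TOKEN_SPEC = [("T_PRIO", r"PRIO"), ("T_INTEGER", r"INTEGER"),
              ("DIGIT", r"[1-9]"), ("ZERO", r"0"), ("WS", r"[ \t\r\n]+")]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.lastgroup != "WS":  # whitespace is skipped, as in the grammar
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse(text):
    tokens, out, i = lex(text), [], 0
    while i < len(tokens):
        keyword = tokens[i][0]
        j = i + 1
        while j < len(tokens) and tokens[j][0] in ("DIGIT", "ZERO"):
            j += 1  # collect the run of digit tokens after the keyword
        kinds = [kind for kind, _ in tokens[i + 1:j]]
        value = "".join(text_ for _, text_ in tokens[i + 1:j])
        if keyword == "T_PRIO":
            ok = len(kinds) == 2  # twoDigits: exactly two digit tokens
        elif keyword == "T_INTEGER":
            # integer: a lone ZERO, or a DIGIT followed by any digits
            ok = kinds == ["ZERO"] or (bool(kinds) and kinds[0] == "DIGIT")
        else:
            raise ValueError(f"expected a keyword, got {tokens[i][1]!r}")
        if not ok:
            raise ValueError(f"bad number {value!r} after {tokens[i][1]}")
        out.append((keyword, value))
        i = j
    return out


print(parse("INTEGER 10\nPRIO 10"))
```

One consequence of skipping whitespace in the lexer, in this sketch as in the grammar, is that PRIO 1 0 is accepted as well, because the parser never sees the space between the two digit tokens.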

antlr4 mismatch input error on sql parser

I am getting the following error when parsing, but I'm not sure why it's happening.
line 1:24 mismatched input '1' expecting NUM
line 1:24 mismatched input '1' expecting NUM
select a from abc limit 1 ;
--
grammar SQLCmd;
parse : sql
;
sql : ('select' ((columns (',' columns))|count) 'from')
tables
('where' condition ((and|or) condition))* (limit)? ';'
;
limit : 'limit' NUM
;
num : NUM
;
count : 'count(*)'
;
columns : VAL
;
tables : VAL
;
condition : ( left '=' right )+
;
and : 'and'
;
or : 'or'
;
left : VAL
;
right : VAL
;
VAL : [*a-z0-9A-Z~?]+
;
NUM : [0-9]+
;
WS : [ \t\n\r]+ -> skip
;
It looks like you have a VAL instead of a NUM.
The "1" matches both VAL and NUM with the same length, and when two lexer rules match the same text, ANTLR picks the one defined first. Since VAL comes first and accepts every digit string, the lexer never produces a NUM token.
Try putting the NUM rule before the VAL rule.
You could have found this out yourself by looking at the token types from the lexer. They tell you the actual type of each token that is present.
@TheAntlrGuy: Maybe one could add the actual token type to the error message?
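The selection rule behind this (the longest match wins, and on a tie the rule listed first wins) can be modelled in a few lines. This is a sketch in Python that imitates the behaviour for the two rules in question. It is not ANTLR code, and next_token is a name made up for illustration.

```python
import re


def next_token(rules, text, pos=0):
    """Pick the rule with the longest match at pos; on a tie the earlier rule wins."""
    best = None
    for name, pattern in rules:
        m = re.compile(pattern).match(text, pos)
        # Strictly longer only: an equal-length match never displaces an earlier rule.
        if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


VAL = ("VAL", r"[*a-z0-9A-Z~?]+")
NUM = ("NUM", r"[0-9]+")

print(next_token([VAL, NUM], "1"))   # VAL listed first, as in the question
print(next_token([NUM, VAL], "1"))   # NUM listed first, as suggested
print(next_token([NUM, VAL], "1a"))  # a longer match wins regardless of order
```

With NUM listed first, a purely numeric column or table name would also be lexed as NUM, so the parser rules that expect VAL may need to accept NUM too if such names are possible.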

Why does AntlrWorks 2 display warning 125 (implicit definition of token in parser) in this case?

I have a separate lexer and parser grammar (derived from the sample ModeTagsLexer/ModeTagsParser) and get a warning in AntlrWorks 2 that I don't understand:
warning(125): implicit definition of token OPEN in parser
If I replace the OPEN rule with '<' the warning goes away. I wonder what the difference is between OPEN and CLOSE, which gets no warning.
I'm using antlr-4.1-complete.jar and 2013-01-22-antlrworks-2.0.
Lexer STLexer.g4:
lexer grammar STLexer;
// Default mode rules (the SEA)
OPEN : '<' -> pushMode(ISLAND) ; // switch to ISLAND mode
TEXT : ~'<'+ ; // clump all text together
mode ISLAND;
CLOSE : '>' -> popMode ; // back to SEA mode
SLASH : '/' ;
ID : [a-zA-Z0-9"=]+ ; // match/send ID in tag to parser
WS : [ \t]+ -> channel(HIDDEN);
Parser STParser.g4:
parser grammar STParser;
options { tokenVocab=STLexer; } // use tokens from STLexer.g4
unit: (tag | TEXT)* ;
tag : OPEN ID+ CLOSE
| OPEN SLASH ID+ CLOSE
;
It even persists if I rename the rule slightly and remove the additional mode:
Lexer (modified):
lexer grammar STLexer;
// Default mode rules (the SEA)
OPPEN : '<' ;// -> pushMode(ISLAND) ; // switch to ISLAND mode
TEXT : ~'<'+ ; // clump all text together
//mode ISLAND;
CLOSE : '>' ; // -> popMode ; // back to SEA mode
SLASH : '/' ;
ID : [a-zA-Z0-9"=]+ ; // match/send ID in tag to parser
WS : [ \t]+ -> channel(HIDDEN);
Parser (modified):
parser grammar STParser;
options { tokenVocab=STLexer; } // use tokens from STLexer.g4
unit: (tag | TEXT)* ;
tag : ID OPPEN ID+ CLOSE
| ID OPPEN SLASH ID+ CLOSE
;
