How to programmatically handle objects in Xtext - dsl

I have a grammar defined like:
Key:
name=ID;
Step:
name+=[Key]+;
#Override terminal ID:
('a'..'z' | 'A'..'Z' | '_' | '-' | '0'..'9')
('a'..'z' | 'A'..'Z' | '_' | ' ' | '-' | '0'..'9')+;
And my input is:
When I open window Kitchen Window
Then I (see) Beautiful Garden
The question: How to programmatically handle input and split it to Key references in some internal rules.
4example In first string I want it to be:
[When] [I open window] [Kitchen Window]
And the second to be:
[Then] [I] [(see)] [Beautiful Garden]
I don't know how to split it until it reached scope or linker and should make a decision somewhere in the code. Where should I do it?

Related

Antlr4 Match Force Priority

I have a query grammar I am working on and have found one case that is proving difficult to solve. The below provides a minimal version of the grammar to reproduce it.
grammar scratch;
query : command* ; // input rule
RANGE: '..';
NUMBER: ([0-9]+ | (([0-9]+)? '.' [0-9]+));
STRING: ~([ \t\r\n] | '(' | ')' | ':' | '|' | ',' | '.' )+ ;
WS: [ \t\r\n]+ -> skip ;
command
: 'foo:' number_range # FooCommand
| 'bar:' item_list # BarCommand
;
number_range: NUMBER RANGE NUMBER # NumberRange;
item_list: '(' (NUMBER | STRING)+ ((',' | '|') (NUMBER | STRING)+)* ')' # ItemList;
When using this you can match things like bar:(bob, blah, 57, 4.5) foo:2..4.3 no problem. But if you put in bar:(bob.smith, blah, 57, 4.5) foo:2..4 it will complain line 1:8 token recognition error at: '.s' and split it into 'bob' and 'mith'. Makes sense, . is ignored as part of string. Although not sure why it eats the 's'.
So, change string to STRING: ~([ \t\r\n] | '(' | ')' | ':' | '|' | ',' )+ ; instead without the dot in it. And now it will recognize 2..4.3 as a string instead of number_range.
I believe that this is because the string matches more character in one stretch than other options. But is there a way to force STRING to only match if it hasn't already matched elements higher in the grammar? Meaning it is only a STRING if it does not contain RANGE or NUMBER?
I know I can add TERM: '"' .*? '"'; and then add TERM into the item_list, but I was hoping to avoid having to quote things if possible. But seems to be the only route to keep the .. range in, that I have found.
You could allow only single dots inside strings like this:
STRING : ATOM+ ( '.' ATOM+ )*;
fragment ATOM : ~[ \t\r\n():|,.];
Oh, and NUMBER: ([0-9]+ | (([0-9]+)? '.' [0-9]+)); is rather verbose. This does the same: NUMBER : ( [0-9]* '.' )? [0-9]+;

ANTLR4.7 listener for a rule when sub rules are labeled

I have an antlr4.7 grammar like this, where all sub rules are labeled.
date_expr
: attr op=( '+' | '-' ) dt_interval=ISO8601_INTERVAL
#dateexpr_Op
| DATETIME_NAME
#dateexpr_Named
| d=( DATETIME_LITERAL | DATE_LITERAL | TIME_LITERAL )
#dateexpr_Literal
| attr
#dateexpr_Attr
| '(' date_expr ')'
#dateexpr_Paren
;
I would like to annotate the tree when a date_expr rule completes. However, looking at the generated listener class, I see no exitDate_expr. How can I add this? Or, do I have to use a visitor interface for it. I am not much familiar with grammar tools.
Thanks.
To achieve beforeAllLabledAlts and afterAllLabledAlts visit points, wrap the labeled alt rule in a singleton rule:
anyDate : dateExpr ;
dateExpr
: attr op=( '+' | '-' ) dt_interval=ISO8601_INTERVAL #dateexpr_Op
| DATETIME_NAME #dateexpr_Named
| d=( DATETIME_LITERAL | DATE_LITERAL | TIME_LITERAL ) #dateexpr_Literal
| attr #dateexpr_Attr
| '(' date_expr ')' #dateexpr_Paren
;
The ANTLR tool will then generate the listener interface (and/or visitor interface) with AnyDateContext onEntry and onExit methods.

Parsing in Linux

I want to parse the compute zones in open-stack command output as below
+-----------------------+----------------------------------------+
| Name | Status |
+-----------------------+----------------------------------------+
| internal | available |
| |- controller | |
| | |- nova-conductor | enabled :-) 2016-07-07T08:09:57.000000 |
| | |- nova-consoleauth | enabled :-) 2016-07-07T08:10:01.000000 |
| | |- nova-scheduler | enabled :-) 2016-07-07T08:10:00.000000 |
| | |- nova-cert | enabled :-) 2016-07-07T08:10:00.000000 |
| Compute01 | available |
| |- compute01 | |
| | |- nova-compute | enabled :-) 2016-07-07T08:09:53.000000 |
| Compute02 | available |
| |- compute02 | |
| | |- nova-compute | enabled :-) 2016-07-07T08:10:00.000000 |
| nova | not available |
+-----------------------+----------------------------------------+
i want to parse the result as below, taking only nodes having nova-compute
Compute01;Compute02
I used below command:
nova availability-zone-list | awk 'NR>2 {print $2}' | grep -v '|' | tr '\n' ';'
but it returns output like this
;internal;Compute01;Compute02;nova;;
In Perl (and written rather more verbosely than is really necessary):
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $node; # Store current node name
my #compute_nodes; # Store known nova-compute nodes
while (<>) { # Read from STDIN
# If we find the start of line, followed by a pipe, a space and
# a series of word characters...
if (/^\| (\w+)/) {
# Store the series of word characters (i.e. the node name) in $node
$node = $1;
}
# If we find a line that contains "nova-compute", add the current
# node name in #compute_nodes
push #compute_nodes, $node if /nova-compute/;
}
# Print out all of the values in #compute_nodes
say join ';', #compute_nodes;
I detest one-line programs except for the most simple of applications. They are unnecessarily cryptic, they have none of the usual programming support, and they are stored only in the terminal buffer. Want to do the same thing tomorrow? You must start coding again
Here's a Perl solution. Run it as
$ perl nova-compute.pl command-output.txt
use strict;
use warnings 'all';
my ($node, #nodes);
while ( <> ) {
$node = $1 if /^ \| \s* (\w+) /x;
push #nodes, $node if /nova-compute/;
}
print join(';', #nodes), "\n";
output
Compute01;Compute02
Now all of that is saved on disk. It may be run again at any time, modified for similar results, or fixed if you got it wrong. It is also readable. No contest
$ nova availability-zone-list | awk '/^[|] [^|]/{node=$2} node && /nova-compute/ {s=s ";" node} END{print substr(s,2)}'
Compute01;Compute02
How it works:
/^[|] [^|]/{node=$2}
Any time a line begins with | followed by space followed by a character not |, then save the second field as a node name.
node && /nova-compute/ {s=s ";" node}
If node is non-empty and the current line contains nova-compute, then append node to the string s.
END{print substr(s,2)}
After we have read all the lines, print out string s minus its first character which is a superfluous ;.

Struggling to parse array notation

I have a very simple grammar to parse statements.
Here are examples of the type of statements that need be parsed:
a.b.c
a.b.c == "88"
The issue I am having is that array notation is not matching. For example, things that are not working:
a.b[0].c
a[3][4]
I hope someone can point out what I am doing wrong here. (I am testing in ANTLRWorks)
Here is the grammar (generationUnit is my entry point):
grammar RatBinding;
generationUnit: testStatement | statement;
arrayAccesor : identifier arrayNotation+;
arrayNotation: '[' Number ']';
testStatement:
(statement | string | Number | Bool )
(greaterThanAndEqual
| lessThanOrEqual
| greaterThan
| lessThan | notEquals | equals)
(statement | string | Number | Bool )
;
part: identifier | arrayAccesor;
statement: part ('.' part )*;
string: ('"' identifier '"') | ('\'' identifier '\'');
greaterThanAndEqual: '>=';
lessThanOrEqual: '<=';
greaterThan: '>';
lessThan: '<';
notEquals : '!=';
equals: '==';
identifier: Letter (Letter|Digit)*;
Bool : 'true' | 'false';
ArrayLeft: '\u005B';
ArrayRight: '\u005D';
Letter
: '\u0024' |
'\u0041'..'\u005a' |
'\u005f '|
'\u0061'..'\u007a' |
'\u00c0'..'\u00d6' |
'\u00d8'..'\u00f6' |
'\u00f8'..'\u00ff' |
'\u0100'..'\u1fff' |
'\u3040'..'\u318f' |
'\u3300'..'\u337f' |
'\u3400'..'\u3d2d' |
'\u4e00'..'\u9fff' |
'\uf900'..'\ufaff'
;
Digit
: '\u0030'..'\u0039' |
'\u0660'..'\u0669' |
'\u06f0'..'\u06f9' |
'\u0966'..'\u096f' |
'\u09e6'..'\u09ef' |
'\u0a66'..'\u0a6f' |
'\u0ae6'..'\u0aef' |
'\u0b66'..'\u0b6f' |
'\u0be7'..'\u0bef' |
'\u0c66'..'\u0c6f' |
'\u0ce6'..'\u0cef' |
'\u0d66'..'\u0d6f' |
'\u0e50'..'\u0e59' |
'\u0ed0'..'\u0ed9' |
'\u1040'..'\u1049'
;
WS : [ \r\t\u000C\n]+ -> channel(HIDDEN)
;
You referenced the non-existent rule Number in the arrayNotation parser rule.
A Digit rule does exist in the lexer, but it will only match a single-digit number. For example, 1 is a Digit, but 10 is two separate Digit tokens so a[10] won't match the arrayAccesor rule. You probably want to resolve this in two parts:
Create a Number token consisting of one or more digits.
Number
: Digit+
;
Mark Digit as a fragment rule to indicate that it doesn't form tokens on its own, but is merely intended to be referenced from other lexer rules.
fragment // prevents a Digit token from being created on its own
Digit
: ...
You will not need to change arrayNotation because it already references the Number rule you created here.
Bah, waste of space. I Used Number instead of Digit in my array declaration.

How to reuse Cucumber step definition with a table for the last parameter?

This code:
Then %{I should see the following data in the "Feeds" data grid:
| Name |
| #{name} |}
And this one:
Then "I should see the following data in the \"Feeds\" data grid:
| Name |
| #{name} |"
And this:
Then "I should see the following data in the \"Feeds\" data grid:\n| Name |\n| #{name} |"
And even this:
Then <<EOS
I should see the following data in the "Feeds" data grid:
| Name |
| #{name} |
EOS
Gives me:
Your block takes 2 arguments, but the Regexp matched 1 argument.
(Cucumber::ArityMismatchError)
tests/endtoend/step_definitions/instruments_editor_steps.rb:29:in `/^the editor shows "([^"]*)" in the feeds list$/'
melomel-0.6.0/lib/melomel/cucumber/data_grid_steps.rb:59:in `/^I should see the following data in the "([^"]*)" data grid:$/'
tests/endtoend/instruments_editor.feature:11:in `And the editor shows "myFeed" in the feeds list
This one:
Then "I should see the following data in the \"Feeds\" data grid: | Name || #{name} |"
And this one:
Then "I should see the following data in the \"Feeds\" data grid:| Name || #{name} |"
Gives:
Undefined step: "I should see the following data in the "Feeds" data grid:| Name || myFeed |" (Cucumber::Undefined)
./tests/endtoend/step_definitions/instruments_editor_steps.rb:31:in `/^the editor shows "([^"]*)" in the feeds list$/'
tests/endtoend/instruments_editor.feature:11:in `And the editor shows "myFeed" in the feeds list'
I've found the answer myself:
steps %Q{
Then I should see the following data in the "Feeds" data grid:
| Name |
| #{name} |
}
NOTE ON THE ABOVE: might seem obvious, but the new line after the first '{' is soooooo important
Another way:
Given /^My basic step:$/ do |table|
#do table operation
end
Given /^My referring step:$/ do |table|
table.hashes.each do |row|
row_as_table = %{
|prop1|prop2|
|#{row[:prop1]}|#{row[:prop2]}|
}
Given %{My basic step:}, Cucumber::Ast::Table.parse(row_as_table, "", 0)
end
end
You can also write it this way, using #table
Then /^some other step$/ do
Then %{I should see the following data in the "Feeds" data grid:}, table(%{
| Name |
| #{name} |
})
end
Consider using
Given /^events with:-$/ do |table|
Given %{I am on the event admin page}
table.hashes.each do |row|
Given %{an event with:-}, Cucumber::Ast::Table.new([row]).transpose
end
end
I find that much more elegant that building up the table by hand.
events with:- gets a table like this
| Form | Element | Label |
| foo | bar | baz |
and an event with:- gets a table like
| Form | foo |
| Element | bar |
| Label | baz |

Resources