I have data structures as such:
data IfTree = If Expr Statement IfTree | Else Statement | EndIf
data Statement = IfStatement IfTree
What I want is to make it impossible to write either of these combinations:
IfStatement $ Else ...
IfStatement $ EndIf
An IfStatement should only be able to take an If.
I am aware that I can hide the data constructors and only expose functions that compose them behind the scenes, but I want to limit this at the data type.
Update:
What I was trying to do was clunky. Thanks to the excellent answers and comments, a MUCH better way of handling this was given:
data Statement = If Expr Statement (Maybe Statement) | ...
or even:
data Stat = IfStat Expr Stat | IfElseStat Expr Stat Stat | …
This would be the traditional way of doing it:
data Stat = IfStat Expr Stat (Maybe Stat) | BarStat | BazStat | …
data Expr = FooExpr | …
-- if (foo) bar;
IfStat FooExpr BarStat Nothing
-- if (foo) bar; else baz;
IfStat FooExpr BarStat (Just BazStat)
The idea is to encode the grammar of your language as a data type, or at least the important bits. Else and EndIf don’t make sense outside of an If, so you don’t actually need to represent them.
You can inline the Maybe into the statement data type:
data Stat = IfStat Expr Stat | IfElseStat Expr Stat Stat | …
Or, if it makes sense for your language, you can add a representation for empty statements:
data Stat = IfStat Expr Stat Stat | EmptyStat | …
-- if (foo) bar;
-- if (foo) bar; else;
IfStat FooExpr BarStat EmptyStat
-- if (foo) bar; else baz;
IfStat FooExpr BarStat BazStat
However, normalising things like this can be problematic if you want exact pretty-printing later on.
Block statements can be handled similarly:
data Stat = … | BlockStat [Stat] | …
-- if (foo) { bar; baz; }
IfStat FooExpr (BlockStat [BarStat, BazStat]) EmptyStat
The problem is that an If isn't just an Else. You should define something like
data If = IfNoElse Expr Statement | IfElse Expr Statement Statement
data Statement = If If | While | ...
I'm using ANTLR 4 and have a fairly complex grammar. I'm trying to simplify here...
Given an input like true and or false, I want a parsing error: the operators are defined to take an expression on either side, and this input has the shape expr operator operator expr.
My reduced grammar is:
grammar MappingExpression;
/* The start rule; begin parsing here.
operator precedence is implied by the ordering in this list */
// =======================
// = PARSER RULES
// =======================
expr:
| op=(TRUE|FALSE) # boolean
| expr op=AND expr # logand
| expr op=OR expr # logor
;
TRUE : 'true';
FALSE : 'false';
WS : [ \t\r\n]+ -> skip; // ignore whitespace
AND : 'and';
OR : 'or';
However, the parser seems to stop after evaluating true, even though it has identified all four tokens (e.g., the alt state returned becomes 2 in the parser).
If I can't get a parsing exception (because the parser is treating what I deem operators as expressions), I could at least throw a runtime exception for two operators in a row (e.g., 'and' followed by 'or') if I got the entire parse tree.
Originally, I'd just had:
expr 'and' expr #logand
expr 'or' expr #logor
and this suffered the same parsing problem (stopping early).
You should get a parsing error if you force the parser to consume all tokens by "anchoring" a rule with the built-in EOF token:
parse
: expr EOF
;
This is what I get when parsing the input true and or false through the anchored rule:
line 1:9 extraneous input 'or' expecting {'true', 'false'}
line 1:17 missing {'true', 'false'} at '<EOF>'
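The effect of the anchor can be modelled without ANTLR. The following is a hand-written toy parser in Java, not ANTLR's actual prediction algorithm, and the names in it (EofAnchorDemo, parseExpr, parseToEof) are made up for illustration. It assumes whitespace-separated tokens and the reduced grammar from the question. Without the end-of-input check the parser accepts the longest valid prefix and stops quietly; with the check, leftover tokens become an error.

```java
import java.util.List;

// Toy model of "expr" versus "expr EOF"; all names here are hypothetical.
final class EofAnchorDemo {
    private final List<String> tokens;
    private int pos = 0;

    EofAnchorDemo(String input) {
        // Assumption: tokens are separated by whitespace.
        this.tokens = List.of(input.trim().split("\\s+"));
    }

    // Look ahead k tokens; past the end we report a synthetic <EOF> token.
    private String peek(int k) {
        int i = pos + k;
        return i < tokens.size() ? tokens.get(i) : "<EOF>";
    }

    private static boolean isBool(String t) { return t.equals("true") || t.equals("false"); }
    private static boolean isOp(String t)   { return t.equals("and") || t.equals("or"); }

    // expr : BOOL ((AND | OR) BOOL)*
    // Consumes the longest valid prefix and stops at the first token pair that does not fit.
    private void expr() {
        if (!isBool(peek(0))) {
            throw new IllegalStateException("expecting {'true', 'false'} at '" + peek(0) + "'");
        }
        pos++;
        while (isOp(peek(0)) && isBool(peek(1))) {
            pos += 2;
        }
    }

    // Unanchored entry point: returns how many tokens were consumed.
    static int parseExpr(String input) {
        var p = new EofAnchorDemo(input);
        p.expr();
        return p.pos;
    }

    // Anchored entry point, modelling "parse : expr EOF ;".
    static int parseToEof(String input) {
        var p = new EofAnchorDemo(input);
        p.expr();
        if (!p.peek(0).equals("<EOF>")) {
            throw new IllegalStateException("unexpected '" + p.peek(0) + "', expecting <EOF>");
        }
        return p.pos;
    }

    public static void main(String[] args) {
        System.out.println(parseExpr("true and or false")); // 1: only 'true' was consumed
        System.out.println(parseToEof("true and false"));   // 3: the whole input was consumed
        try {
            parseToEof("true and or false");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());             // unexpected 'and', expecting <EOF>
        }
    }
}
```

The error text differs from ANTLR's because ANTLR applies its own recovery strategy, but the principle is the same: the start rule has to demand end of input, otherwise a valid prefix counts as success.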
Bart Kiers' answer above is correct. I just wanted to provide more details for people working with Java who have experienced incomplete parsing issues.
I'd had a fairly complex g4 file that defined expr as a series of alternatives, each labelled with a tag (the name following a #, which becomes the method name in the ExpressionsVisitor). While this seemed to work, there were situations where I'd expected parsing errors but received none. I also had situations where only part of the input was interpreted, making it impossible to process the entire input statement.
I repaired the g4 file as follows (the full version is here):
// =======================
// = PARSER RULES
// =======================
expr_to_eof : expr EOF ;
expr:
ID # id
| '*' # field_values
| DESCEND # descendant
| DOLLAR # context_ref
| ROOT # root_path
| ARR_OPEN exprOrSeqList? ARR_CLOSE # array_constructor
| OBJ_OPEN fieldList? OBJ_CLOSE # object_constructor
| expr '.' expr # path
| expr ARR_OPEN ARR_CLOSE # to_array
| expr ARR_OPEN expr ARR_CLOSE # array
| expr OBJ_OPEN fieldList? OBJ_CLOSE # object
| VAR_ID (emptyValues | exprValues) # function_call
| FUNCTIONID varList '{' exprList? '}' # function_decl
| VAR_ID ASSIGN (expr | (FUNCTIONID varList '{' exprList? '}')) # var_assign
| (FUNCTIONID varList '{' exprList? '}') exprValues # function_exec
| op=(TRUE|FALSE) # boolean
| op='-' expr # unary_op
| expr op=('*'|'/'|'%') expr # muldiv_op
| expr op=('+'|'-') expr # addsub_op
| expr op='&' expr # concat_op
| expr op=('<'|'<='|'>'|'>='|'!='|'=') expr # comp_op
| expr 'in' expr # membership
| expr 'and' expr #logand
| expr 'or' expr # logor
| expr '?' expr (':' expr)? # conditional
| expr CHAIN expr # fct_chain
| '(' (expr (';' (expr)?)*)? ')' # parens
| VAR_ID # var_recall
| NUMBER # number
| STRING # string
| 'null' # null
;
Based on Bart's suggestion I added the top rule for expr_to_eof that resulted in that method being added to the MappingExpressionParser. So, in my Expressions class where before I'd called tree = parser.expr(); I now needed to call tree = parser.expr_to_eof(); which resulted in a ParseTree that included a last child for the Token.EOF.
Because my code needed to check some conditions for the first and last steps performed, it was easiest to strip out the <EOF> and get back the ParseTree I had been using (an ExprContext rather than an Expr_to_eofContext) with this statement:
newTree = ((Expr_to_eofContext)tree).expr();
So, overall, it was quite easy to fix a long-standing bug (and others I'd postponed addressing) just by adding the new rule in the .g4 file and changing the parser call so that it parses to end of file and then extracts the entire expression that was parsed.
I expect this will allow me to add considerably more functions to JSONata4Java to match the JavaScript version, jsonata.js.
Thanks again Bart!
I'm trying to get the CPU core temperature using the sensors command.
Inside conky, I wrote
$Core 0 Temp:$alignr${execi 1 sensors | grep 'Core 0' | awk {'print $3'}} $alignr${execibar 1 sensors | grep 'Core 0' | awk {'print $3'}}
Each second I'm running the exact same command, sensors | grep 'Core 0' | awk {'print $3'}, in two places for the exact same output. Is there a way to hold the output in a variable and use that variable in place of the commands?
conky does not have user variables. What you can do instead is call lua from conky to do this for you. The lua language is usually built-in to conky by default, so you need only put some code in a file, include the file in the conky setup file, and call the function. For example, these shell commands will create a test:
cat >/tmp/.conkyrc <<\!
conky.config = {
lua_load = '/tmp/myfunction.lua',
minimum_height = 400,
minimum_width = 600,
use_xft = true,
font = 'Times:size=20',
};
conky.text = [[
set ${lua myfunction t ${execi 1 sensors | awk '/^Core 0/{print 0+$3}'}}°C
get ${lua myfunction t}°C ${lua_bar myfunction t}
]]
!
cat >/tmp/myfunction.lua <<\!
vars = {}
function conky_myfunction(varname, arg)
if(arg~=nil)then vars[varname] = conky_parse(arg) end
return vars[varname]
end
!
conky -c /tmp/.conkyrc -o
In the myfunction.lua file, we declare a function myfunction() (which needs to be prefixed conky_ so we can call it from conky). It takes 2 parameters, the name of a variable, and a conky expression. It calls conky_parse() to evaluate the expression, and saves the value in a table vars, under the name provided by the caller. It then returns the resulting value to the caller. If no expression was given, it will return the previous value.
In the conky.text the line beginning set calls myfunction in lua with the arbitrary name of a variable, t, and the execi sensors expression to evaluate, save, and return. The line beginning get calls myfunction to just get the value.
lua_bar is similar to execbar, but calls a lua function; see man conky. However, it expects a number without the leading + that execbar accepts, so I've changed the awk to return this, and have added the °C to the conky text instead.
I have an ANTLR4 rule "expression" that can be either "maths" or "comparison", but "comparison" can contain "maths". Here is the concrete code:
expression
: ID
| maths
| comparison
;
maths
: maths_atom ((PLUS | MINUS) maths_atom) ? // "?" because in fact there is first multiplication then pow and I don't want to force a multiplication to make an addition
;
maths_atom
: NUMBER
| ID
| OPEN_PAR expression CLOSE_PAR
;
comparison
: comp_atom ((EQUALS | NOT_EQUALS) comp_atom) ?
;
comp_atom
: ID
| maths // here is the expression of interest
| OPEN_PAR expression CLOSE_PAR
;
If I give, for instance, 6 as input, the parse tree is fine, because it detects maths. But the ANTLR4 plugin for IntelliJ IDEA marks my expression rule in red: ambiguity. Should I say goodbye to a short parse tree and allow maths only through comparison in expression, so that it is no longer ambiguous?
The problem is that when the parser sees 6, which is a NUMBER, it has two paths of reaching it through your grammar:
expression - maths - maths_atom - NUMBER
or
expression - comparison - comp_atom - maths - maths_atom - NUMBER
This ambiguity triggers the error that you see.
You can fix this by flattening your parser grammar as shown in this tutorial:
start
: expr EOF
;
expr
: expr (PLUS | MINUS) expr # ADDGRP
| expr (EQUALS | NOT_EQUALS) expr # COMPGRP
| OPEN_PAR expr CLOSE_PAR # PARENGRP
| NUMBER # NUM
| ID # IDENT
;
Grammar:
grammar Test;
file: (procDef | statement)* EOF;
procDef: 'procedure' ID NL statement+ ;
statement: 'statement'? NL;
WS: (' ' | '\t') -> skip;
NL: ('\r\n' | '\r' | '\n');
ID: [a-zA-Z0-9]+;
Test data:
statement
procedure Proc1
statement
statement
The parser does what I want (i.e. statement+ is greedy), but it reports an ambiguity because it doesn't know whether the last statement belongs to procDef or file (as I understand it).
As predicates are language dependent I'd prefer not to use one.
The procedure is supposed to end when a statement that can't belong to it, such as 'procedure', occurs.
I also would prefer to have the statements bound to the procedure to avoid having to rearrange the structure later.
Edit
It seems I should expand my test data a bit (but I will leave the original as it is small and shows the ambiguity I want to solve).
I want to be able to handle situations like this:
statement
procedure Proc1
statement
statement
procedure Proc2
statement
statement
procedure Proc2a
statement
statement
global
statement
procedure Proc3
statement
statement
(The indentation is not significant.) I can do it without predicates with something like
file: (
commonStatement
| globalStatement
)* EOF;
procDef: 'procedure' ID NL commonStatement+ ;
commonStatement: 'statement'? NL;
globalStatement: 'global' NL | procDef (globalStatement | EOF);
but then the tree becomes deeper with each consecutive procDef, and that feels very undesirable.
Then a solution with predicates is actually preferable.
@parser::members { boolean inProc; }
file: (
{!inProc}? commonStatement
| globalStatement
)* EOF;
procDef: 'procedure' ID {inProc = true;} NL commonStatement+ ;
commonStatement: 'statement'? NL;
globalStatement: ('global' NL {inProc = false;} | procDef) ;
The situation is actually worse than this, as globally accessible commonStatements can occur without an intervening globalStatement (accessible through gotos), but there is no way a parser can distinguish between that and statements belonging to the procedure, so my plan was to just discourage such use (and I don't think it's common). In fact, it is perfectly legal to jump into procedure code as well ...
It may turn out that in the end I will have to examine runtime paths anyway (scope is very much determined at runtime), and the grammar might end up something like
file: (
commonStatement
| globalStatement
| procDef
)* EOF;
procDef: 'procedure' ID NL procStatement*;
commonStatement: 'statement'? NL;
procStatement: 'proc' NL;
globalStatement: 'global' NL;
We will see ...
By your criteria, it is impossible for a statement to follow a procDef. You are well within your rights to design a language that way, but I hope you have an answer ready for the FAQ "How do I write a statement which comes after a procedure definition?"
Writing the grammar is the easy part:
file: statement* procDef* EOF;
In a rule expr : expr '<' expr | ...;
the ANTLR parser will accept expressions like 1 < 2 < 3 (and construct left-associative trees corresponding to the bracketing (1 < 2) < 3).
You can tell ANTLR to treat operators as right associative, e.g.
expr : expr '<'<assoc=right> expr | ...;
to yield parse trees 1 < (2 < 3).
However, in many languages, relational operators are non-associative, i.e., an expression 1 < 2 < 3 is forbidden.
This can be specified in YACC and its derivatives.
Can it also be specified in ANTLR?
E.g., as expr : expr '<'<assoc=no> expr | ...;
I was unable to find something in the ANTLR4-book so far.
How about the following approach. Basically the "result" of a < b has a type not compatible for another application of operator < or >:
expression
: boolExpression
| nonBoolExpression
;
boolExpression
: nonBoolExpression '<' nonBoolExpression
| nonBoolExpression '>' nonBoolExpression
| ...
;
nonBoolExpression
: expression '*' expression
| expression '+' expression
| ...
;
Although personally I'd go with Darien and rather detect the error after parsing.
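Detecting the error after parsing could look roughly like the sketch below. It is written in Java against a made-up miniature AST (Num, Paren, BinOp) rather than ANTLR's generated context classes, so every name in it is an assumption; with ANTLR the same walk would typically live in a visitor or listener. The grammar keeps expr '<' expr as it is, and a pass over the tree rejects any comparison whose direct operand is itself a comparison. Parenthesised operands are kept as their own node, so (1 < 2) < 3 is still accepted. It needs Java 17 or later (sealed interfaces, records).

```java
import java.util.Optional;

// Post-parse check for non-associative comparison operators; all names are hypothetical.
final class ChainedComparisonCheck {
    // Miniature AST standing in for the parse tree.
    sealed interface Expr {}
    record Num(int value) implements Expr {}
    record Paren(Expr inner) implements Expr {}
    record BinOp(String op, Expr left, Expr right) implements Expr {}

    // Assumption: '<' and '>' are the only non-associative operators.
    static boolean isComparison(Expr e) {
        return e instanceof BinOp b && (b.op().equals("<") || b.op().equals(">"));
    }

    // Returns a message for the first chained comparison found, or empty if there is none.
    static Optional<String> firstViolation(Expr e) {
        if (e instanceof Paren p) {
            return firstViolation(p.inner());
        }
        if (!(e instanceof BinOp b)) {
            return Optional.empty(); // leaves cannot violate the rule
        }
        if (isComparison(b) && (isComparison(b.left()) || isComparison(b.right()))) {
            return Optional.of("chained '" + b.op() + "': comparison operators are non-associative");
        }
        Optional<String> left = firstViolation(b.left());
        return left.isPresent() ? left : firstViolation(b.right());
    }

    public static void main(String[] args) {
        // 1 < 2 < 3, parsed left-associatively
        Expr chained = new BinOp("<", new BinOp("<", new Num(1), new Num(2)), new Num(3));
        // (1 < 2) < 3, with explicit parentheses
        Expr grouped = new BinOp("<", new Paren(new BinOp("<", new Num(1), new Num(2))), new Num(3));

        System.out.println(firstViolation(chained).isPresent()); // true
        System.out.println(firstViolation(grouped).isPresent()); // false
    }
}
```

The advantage over restructuring the grammar is that the error message can name the offending operator directly, instead of surfacing as a generic syntax error.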