Bison simple postfix calculator using union with struct

I'm making a simple calculator that prints postfix notation, to learn Bison. I could make the postfix part work, but now I need to add assignments to variables (a-z). For example, a=3+2; should print a32+=;. I'm trying to modify my working postfix code so it can read a char too.
If I understand correctly, to be able to put different types in $$ I need a union, and to make the nodes I should use a struct, because in my case 'expr' can be an int or a char.
This is my parser:
%code requires{
struct nPtr{
char *val;
int num;
};
}
%union {
int iValue;
char sIndex;
struct nPtr *e;
};
%token PLUS MINUS STAR LPAREN RPAREN NEWLINE DIV ID NUMBER POW EQL
%type <iValue> NUMBER
%type <sIndex> ID
%type <e> expr line
%left PLUS MINUS
%left STAR DIV
%left POW
%left EQL
%%
line : /* empty */
|line expr NEWLINE { printf("=\n%d\n", $2.num); }
expr : LPAREN expr RPAREN { $$.num = $2.num; }
| expr PLUS expr { $$.num = $1.num + $3.num; printf("+"); }
| expr MINUS expr { $$.num = $1.num - $3.num; printf("-"); }
| expr STAR expr { $$.num = $1.num * $3.num; printf("*"); }
| expr DIV expr { $$.num = $1.num / $3.num; printf("/");}
| expr POW expr { $$.num = pow($1.num, $3.num); printf("**");}
| NUMBER { $$.num = $1.num; printf("%d", yylval); }
| ID EQL expr { printf("%c", yylval); }
;
%%
I have this in the lexer to handle the "=" and variables:
"=" { return EQL; }
[a-z] { yylval.sIndex = strdup(yytext); return ID; }
I get an error:
warning: empty rule for typed nonterminal, and no action
The only answer I found here about that is this one:
Bison warning: Empty rule for typed nonterminal
It says to just remove the /* empty */ part in:
line: /* empty */
| line expr NEWLINE { printf("=\n%d\n", $2.num); }
When I do that, I get three new messages:
warning: 3 nonterminals useless in grammar
warning: 10 rules useless in grammar
fatal error: start symbol line does not derive any sentence
I googled and found some solutions that just gave me other problems. For example, when I change:
line : line expr NEWLINE { printf("=\n%d\n", $2.num); }
to
line : expr NEWLINE { printf("=\n%d\n", $2.num); }
Bison accepts it, but when I try to build the code in Visual Studio I get a lot of errors like:
left of '.e' must have struct/union type
'pow': too few arguments for call
'=': cannot convert from 'int' to 'YYSTYPE'
That's as far as I got. I can't find a simple example similar to my needs; I just want to make 'expr' able to read a char and print it. If someone could check my code and recommend some changes, it would be really, really appreciated.

It is important to actually understand what you are asking Bison to do, and what it is telling you when it suggests that your instructions are wrong.
Applying a fix from someone else's problem will only work if they made the same mistake as you did. Making random changes based on Google searches is not a very structured way to debug, nor a good way to learn a new tool.
Now, what you said was:
%type <e> expr line
Which means that expr and line both produce a value whose union tag is e. (Tag e is a pointer to a struct nPtr. I doubt that's what you wanted either, but we'll start with the Bison warnings.)
If a non-terminal produces a value, it produces a value in every production. To produce a value, the semantic rule needs to assign a value (of the correct tag-type) to $$. For convenience, if there is no semantic action and if $1 has the correct tag-type, then Bison will provide the default rule $$ = $1. (That sentence is not quite correct, but it's a useful approximation.) Many people think that you should not actually rely on this default, but I think it's OK as long as you know that it is happening, and have verified that the preconditions are valid.
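For instance, with matching tags a unit rule needs no action at all, because the default rule applies (a contrived one-liner, not from the question's grammar; subexpr is a hypothetical non-terminal also tagged <e>):
expr : subexpr ; /* implicit $$ = $1, valid only because both carry tag <e> */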
In your case, you have two productions for line. Neither of them assigns a value to $$. This might suggest that you didn't really intend for line to even have a semantic value, and if we look more closely, we'll see that you never attempt to use a semantic value for any instance of the non-terminal line. Bison does not, as far as I know, attempt such a detailed code examination, but it is capable of noting that the production:
line : /* empty */
has no semantic action, and
does not have any symbols on the right-hand side, so no default action can be constructed.
Consequently, it warns you that there is some circumstance in which line will not have its value set. Think of this as the same warning as your C compiler might give you if it notices that a variable you use has never had a value assigned to it.
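The C analogue would be something like this (a made-up two-line example, purely to illustrate the warning):
int x;             /* declared, but never assigned */
printf("%d\n", x); /* compiler: 'x' may be used uninitialized */
Bison's warning about line is the grammar-level version of the same complaint.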
In this case, as I said, Bison has not noticed that you never use the value of line. It just assumes that you wouldn't have declared the type of the value if you didn't intend to somehow provide a value. You did declare the type, and you never provide a value. So your instructions to Bison were not clear, right?
Now, the solution is not to remove the production. There is no problem with the production, and if you remove it then line is left with only one production:
line : line expr NEWLINE
That's a recursive production, and the recursion can only terminate if there is some other production for line which does not recurse. That would be the production line: %empty, which you just deleted. So now Bison can see that the non-terminal line is useless: it can never derive any string consisting only of terminals.
But that's a big problem, because that's the non-terminal your grammar is supposed to recognize. If line can't derive any string, then your grammar cannot recognize anything. Also, because line is useless, and expr is only used by a production in line (other than recursive uses), expr itself can never actually be used. So it, too, becomes useless. (You can think of this as the equivalent of your C compiler complaining that you have unreachable code.)
So then you attempt to fix the problem by making (the only remaining) rule for line non-recursive. Now you have:
line : expr NEWLINE
That's fine, and it makes line useful again. But it does not recognise the language you originally set out to recognise, because it will now only recognise a single line. (That is, an expr followed by a NEWLINE, as the production says.)
I hope you have now figured out that your original problem was not the grammar, which was always correct, but rather the fact that you told Bison that line produced a value when you did not actually do anything to provide that value. In other words, the solution would have been to not tell Bison that line produces a value. (You could, instead, provide a value for line in the semantic rule associated with every production for line, but why would you do that if you have no intention of ever using that value?)
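Concretely, that just means removing line from your %type declaration and leaving everything else as it was:
%type <iValue> NUMBER
%type <sIndex> ID
%type <e> expr
With line undeclared, both of its productions are fine exactly as you originally wrote them, empty rule and all.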
Now, let's go back to the actual type declarations, to see why the C compiler complains. We can start with a simple one:
expr : NUMBER { $$.num = $1.num; }
$$ refers to the expr itself, which is declared as having type tag e, which is a struct nPtr *. The star is important: it indicates that the semantic value is a pointer. In order to get at a field in a struct from a pointer to the struct, you need to use the -> operator, just like in this simple C example:
struct nPtr the_value; /* The actual struct */
struct nPtr *p = &the_value; /* A pointer to the struct */
p.num = 3; /* Error: p is not a struct */
p->num = 3; /* Fine. p points to a struct */
/* which has a field called num */
(*p).num = 3; /* Also fine, means exactly the same thing. */
/* But don't write this. p->num is much */
/* more readable. */
It's also worth noting that in the C example, we had to make sure that p was initialised to the address of some actual memory. If we hadn't done that and we attempted to assign something to p->num, then we would be trying to poke a value into an undefined address. If you're lucky, that would segfault. Otherwise, it might overwrite anything in your executable.
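Translated into a Bison action, that means each rule must allocate a node before filling it in. A minimal sketch, assuming you keep the pointer tag e (error checking omitted):
expr : NUMBER { $$ = malloc(sizeof(struct nPtr)); /* fresh node for this expr */
                $$->num = $1;                     /* $1 is the <iValue> int */
                printf("%d", $$->num); }
You would also eventually need to free those nodes, which is one reason the pointer design may be more machinery than this calculator needs.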
If any of that was not obvious, please go back to your C textbook and make sure that you understand how C pointers work, before you try to use them.
So, that was the $$.num part. The actual statement was
$$.num = $1.num;
So now let's look at the right-hand side. $1 refers to a NUMBER, which has tag type iValue, which is an int. An int is not a struct and it has no named fields. $$.num = $1.num is going to be processed into something like:
int i = <some value>;
$$.num = i.num;
which is totally meaningless. Here, the compiler will again complain that iValue is not a struct, but for a different reason: on the left-hand side, e was a pointer to a struct (which is not a struct); on the right-hand side iValue is an integer (which is also not a struct).
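If it helps to see a version that at least compiles, the simplest consistent setup drops the struct entirely and gives expr the integer tag. This is only a sketch of the pattern, and it deliberately ignores ID for the moment:
%type <iValue> NUMBER expr
expr : NUMBER { $$ = $1; printf("%d", $1); }
| expr PLUS expr { $$ = $1 + $3; printf("+"); }
Once that builds cleanly, you can work out what extra information assignments really need to carry, and only then reach for a struct.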
I hope that gives you some ideas on what you need to do. If in doubt, there are a variety of fully-worked examples in the Bison manual.

Related

Including comments in AST

I'm planning on writing a Parser for some language. I'm quite confident that I could cobble together a parser in Parsec without too much hassle, but I thought about including comments into the AST so that I could implement a code formatter in the end.
At first, adding an extra parameter to the AST types seemed like a suitable idea (this is basically what was suggested in this answer). For example, instead of having
data Expr = Add Expr Expr | ...
one would have
data Expr a = Add a Expr Expr
and use a for whatever annotation (e.g. for comments that come after the expression).
However, there are some not so exciting cases. The language features C-like comments (// ..., /* .. */) and a simple for loop like this:
for (i in 1:10)
{
... // list of statements
}
Now, excluding the body there are at least 10 places where one could put one (or more) comments:
/*A*/ for /*B*/ ( /*C*/ i /*E*/ in /*F*/ 1 /*G*/ : /*H*/ 10 /*I*/ ) /*J*/
{ /*K*/
...
In other words, while the for loop could previously be comfortably represented as an identifier (i), two expressions (1 & 10) and a list of statements (the body), we would now have to include at least 10 more parameters or record fields for annotations.
This gets ugly and confusing quite quickly, so I wondered whether there is a clearly better way to handle this. I'm certainly not the first person wanting to write a code formatter that preserves comments, so there must be a decent solution. Or is writing a formatter just that messy?
You can probably capture most of those positions with just two generic comment productions:
Expr -> Comment Expr
Stmt -> Comment Stmt
This seems like it ought to capture comments A, C, F, H, J, and K for sure; possibly also G depending on exactly what your grammar looks like. That only leaves three spots to handle in the for production (maybe four, with one hidden in Range here):
Stmt -> "for" Comment "(" Expr Comment "in" Range Comment ")" Stmt
In other words: one before each literal string but the first. Seems not too onerous, ultimately.

Arduino and TinyGPS++ convert lat and long to a string

I'm having a problem parsing the lat and long coordinates from TinyGPS++ into a double or a String. The code that I'm using is:
String latt = ((gps.location.lat(),6));
String lngg = ((gps.location.lng(),6));
Serial.println(latt);
Serial.println(lngg);
The output that I'm getting is:
0.06
Does somebody know what I'm doing wrong? Does it have something to do with rounding, like a Math.Round function, in Arduino?
Thanks!
There are two problems:
1. This does not compile:
String latt = ((gps.location.lat(),6));
The error I get is
Wouter.ino:4: warning: left-hand operand of comma has no effect
Wouter:4: error: invalid conversion from 'int' to 'const char*'
Wouter:4: error: initializing argument 1 of 'String::String(const char*)'
There is nothing in the definition of the String class that would allow this statement. I was unable to reproduce printing values of 0.06 (in your question) or 0.006 (in a later comment). Please edit your post to have the exact code that compiles, runs and prints those values.
2. You are unintentionally using the comma operator.
There are two places a comma can be used: to separate arguments in a function call, and to sequence multiple expressions, where the whole thing evaluates to the last expression.
You're not calling a function here, so it is the latter use. What does that mean? Here's an example:
int x = (1+y, 2*y, 3+(int)sin(y), 4);
The variable x will be assigned the value of the last expression, 4. There are very few reasons that anyone would actually use the comma operator in this way. It is much more understandable to write:
int x;
1+y; // Just a calculation, result never used
2*y; // Just a calculation, result never used
3 + (int) sin(y); // Just a calculation, result never used
x = 4; // A (trivial) calculation, result stored in 'x'
The compiler will usually optimize out the first 3 statements and only generate code for the last one [1]. I usually see the comma operator in #define macros that are trying to avoid multiple statements.
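For instance, a macro along these lines (a generic illustration, not from your code) uses the comma operator so that a side effect and a value fit into a single expression:
#include <stdio.h>
#define LOG_AND_USE(x) (puts("using value"), (x)) /* run puts, then yield (x) */
int main(void) {
    int y = LOG_AND_USE(42); /* prints "using value"; y ends up as 42 */
    return y != 42;
}
Each comma-separated expression is evaluated left to right, and only the last one becomes the value of the whole expression.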
For your code, the compiler sees this
((gps.location.lat(),6))
And evaluates it as a call to gps.location.lat(), which returns a double value. The compiler throws this value away, and even warns you that it "has no effect."
Next, it sees a 6, which is the actual value of this expression. The parentheses get popped, leaving the 6 value to be assigned to the left-hand side of the statement, String latt =.
If you look at the declaration of String, it does not define how to take an int like 6 and either construct a new String, or assign it 6. The compiler sees that String can be constructed from const char *, so it tells you that it can't convert a numeric 6 to a const char *.
Unlike a compiler, I think I can understand what you intended:
double latt = gps.location.lat();
double lngg = gps.location.lng();
Serial.println( latt, 6 );
Serial.println( lngg, 6 );
The 6 is intended as an argument to Serial.println. And those arguments are correctly separated by a comma.
As a further bonus, it does not use the String class, which would undoubtedly cause headaches later. Really, don't use String. Instead, hold on to numeric values, like ints and floats, and convert them to text at the last possible moment (e.g., with println).
I have often wished for a compiler that would do what I mean, not what I say. :D
[1] Depending on y's type, evaluating the expression 2*y may have side effects that cannot be optimized away. The streaming operator << is a good example of a mathematical operator (left shift) with side effects that cannot be optimized away.
And in your code, calling gps.location.lat() may have modified something internal to the gps or location classes, so the compiler may not have optimized the function call away.
In all cases, the result of the call is not assigned because only the last expression value (the 6) is used for assignment.

Bison/Flex, reduce/reduce, identifier in different production

I am doing a parser in bison/flex.
This is part of my code:
I want to implement the assignment production, so that an identifier can be either a boolean_expr or an expr; its type will be checked via a symbol table.
So it allows something like:
int a = 1;
boolean b = true;
if(b) ...
However, I get a reduce/reduce conflict if I include identifier in both term and boolean_expr. Is there any way to solve this problem?
Essentially, what you are trying to do is to inject semantic rules (type information) into your syntax. That's possible, but it is not easy. More importantly, it's rarely a good idea. It's almost always best if syntax and semantics are well delineated.
All the same, as presented your grammar is unambiguous and LALR(1). However, the latter feature is fragile, and you will have difficulty maintaining it as you complete the grammar.
For example, you don't include your assignment syntax in your question, but presumably it would look something like this:
assignment: identifier '=' expr
| identifier '=' boolean_expr
;
Unlike the rest of the grammar shown, that production is ambiguous, because in:
x = y
without knowing anything about y, y could be reduced to either a term or a boolean_expr.
A possibly more interesting example is the addition of parentheses to the grammar. The obvious way of doing that would be to add two productions:
term: '(' expr ')'
boolean_expr: '(' boolean_expr ')'
The resulting grammar is not ambiguous, but it is no longer LALR(1). Consider the two following declarations:
boolean x = (y) < 7
boolean x = (y)
In the first one, y must be an int so that (y) can be reduced to a term; in the second one y must be boolean so that (y) can be reduced to a boolean_expr. There is no ambiguity; once the < is seen (or not), it is entirely clear which reduction to choose. But < is not the lookahead token, and in fact it could be arbitrarily distant from y:
boolean x = ((((((((((((((((((((((y...
So the resulting unambiguous grammar is not LALR(k) for any k.
One way you could solve the problem would be to inject the type information at the lexical level, by giving the scanner access to the symbol table. Then the scanner could look up a scanned identifier in the symbol table and use that information to decide between one of three token types (or more, if you have more datatypes): undefined_variable, integer_variable, and boolean_variable. Then you would have, for example:
declaration: "int" undefined_variable '=' expr
| "boolean" undefined_variable '=' boolean_expr
;
term: integer_variable
| ...
;
boolean_expr: boolean_variable
| ...
;
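On the Flex side, the identifier rule would then consult the symbol table before choosing which token to return. A sketch, where Symbol, lookup_symbol and T_BOOLEAN are hypothetical stand-ins for whatever symbol-table API you have:
[a-z][a-z0-9_]* {
    Symbol *sym = lookup_symbol(yytext);  /* hypothetical lookup */
    if (sym == NULL)            return undefined_variable;
    if (sym->type == T_BOOLEAN) return boolean_variable;
    return integer_variable;
}
The declaration productions above would then be responsible for entering each new variable into the symbol table with its declared type.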
That will work, but it should be obvious that this is not scalable: every time you add a type, you'll have to extend both the grammar and the lexical description, because now the semantics is not only mixed up with the syntax, it has even gotten intermingled with the lexical analysis. Once you let semantics out of its box, it tends to contaminate everything.
There are languages for which this really is the most convenient solution: C parsing, for example, is much easier if typedef names and identifier names are distinguished so that you can tell whether (t)*x is a cast or a multiplication. (But it doesn't work so easily for C++, which has much more complicated name lookup rules, and also much more need for semantic analysis in order to find the correct parse.)
But, honestly, I'd suggest that you do not use C -- and much less C++ -- as a model of how to design a language. Languages which are hard for compilers to parse are also hard for human beings to parse. The "most vexing parse" continues to be a regular source of pain for C++ newcomers, and even sometimes trips up relatively experienced programmers:
class X {
public:
X(int n = 0) : data_is_available_(n) {}
operator bool() const { return data_is_available_; }
// ...
private:
bool data_is_available_;
// ...
};
X my_x_object();
// ...
if (!my_x_object) {
// This code is unreachable. Can you see why?
}
In short, you're best off with a language which can be parsed into an AST without any semantic information at all. Once the parser has produced the AST, you can do semantic analyses in separate passes, one of which will check type constraints. That's far and away the cleanest solution. Without explicit typing, the grammar is slightly simplified, because an expr now can be any expr:
expr: conjunction | expr "or" conjunction ;
conjunction: comparison | conjunction "and" comparison ;
comparison: sum | sum '<' sum ;
sum: product | sum '+' product ;
product: term | product '*' term ;
term: identifier
| constant
| '(' expr ')'
;
Each action in the above would simply create a new AST node and set $$ to the new node. At the end of the parse, the AST is walked to verify that all exprs have the correct type.
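For example, an action in this style might be no more than the following (NewBinaryNode and OP_OR are hypothetical AST helpers, not a fixed API):
expr: expr "or" conjunction { $$ = NewBinaryNode(OP_OR, $1, $3); }
The type-checking pass later visits the OP_OR node and verifies that both children are boolean; no type information is needed during the parse itself.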
If that seems like overkill for your project, you can do the semantic checks in the reduction actions, effectively intermingling the AST walk with the parse. That might seem convenient for immediate evaluation, but it also requires including explicit type information in the parser's semantic type, which adds unnecessary overhead (and, as mentioned, the inelegance of letting semantics interfere with the parser.) In that case, every action would look something like this:
expr : expr '+' expr { CheckArithmeticCompatibility($1, $3);
$$ = NewArithmeticNode('+', $1, $3);
}

How to parse C-style comments with Alex lexer?

NB. I'm using this Alex template from Simon Marlow.
I'd like to create a lexer for C-style comments. My current approach creates separate tokens for the comment start, end, and middle, and for one-line comments:
%wrapper "monad"
tokens :-
<0> $white+ ;
<0> "/*" { mkL LCommentStart `andBegin` comment }
<comment> . { mkL LComment }
<comment> "*/" { mkL LCommentEnd `andBegin` 0 }
<0> "//" .*$ { mkL LSingleLineComment }
data LexemeClass
= LEOF
| LCommentStart
| LComment
| LCommentEnd
| LSingleLineComment
How can I reduce the number of middle tokens? For the input /*blabla*/ I will get 8 tokens instead of one!
How can I strip the // part from the single-line comment token?
Is it possible to lex comments without monad wrapper?
Have a look at this:
http://lpaste.net/107377
Test with something like:
echo "This /* is a */ test" | ./c_comment
which should print:
Right [W "This",CommentStart,CommentBody " is a ",CommentEnd,W "test"]
The key alex routines you need to use are:
alexGetInput -- gets the current input state
alexSetInput -- sets the current input state
alexGetByte -- returns the next byte and input state
andBegin -- return a token and set the current start code
Each of the routines commentBegin, commentEnd and commentBody has the following signature:
AlexInput -> Int -> Alex Lexeme
where Lexeme stands for your token type. The AlexInput parameter has the form (for the monad wrapper):
(AlexPosn, Char, [Bytes], String)
The Int parameter is the length of the match stored in the String field. Therefore the form of most token handlers will be:
handler :: AlexInput -> Int -> Alex Lexeme
handler (pos,_,_,inp) len = ... do something with (take len inp) and pos ...
In general it seems that a handler can ignore the Char and [Bytes] fields.
The handlers commentBegin and commentEnd can ignore both the AlexInput and Int arguments because they just match fixed length strings.
The commentBody handler works by calling alexGetByte to accumulate the comment body until "*/" is found. As far as I know C comments may not be nested so the comment ends at the first occurrence of "*/".
Note that the first character of the comment body is in the match0 variable. In fact, my code has a bug in it since it will not match "/**/" correctly. It should look at match0 to decide whether to start at loop or loopStar.
You can use the same technique to parse "//" style comments - or any token where a non-greedy match is required.
Another key point is that patterns like $white+ are qualified with a start code:
<0>$white+
This is done so that they are not active while processing comments.
You can use another wrapper, but note that the structure of the AlexInput type may be different -- e.g. for the basic wrapper it is just a 3-tuple: (Char,[Byte],String). Just look at the definition of AlexInput in the generated .hs file.
A final note... accumulating characters using ++ is, of course, rather inefficient. You probably want to use Text (or ByteString) for the accumulator.

Any nice record handling tricks in Haskell?

I'm aware of partial updates for records, like:
data A a b = A { a :: a, b :: b }
x = A { a=1,b=2 :: Int }
y = x { b = toRational (a x) + 4.5 }
Are there any tricks for doing only partial initialization, creating a subrecord type, or doing (de)serialization on a subrecord?
In particular, I found that the first of these lines works but the second does not:
read "A {a=1,b=()}" :: A Int ()
read "A {a=1}" :: A Int ()
You could always massage such input using a regular expression, but I'm curious what Haskell-like options exist.
Partial initialisation works fine: A {a=1} is a valid expression of type A Int (); the Read instance just doesn't bother parsing anything the Show instance doesn't output. The b field is initialised to error "...", where the string contains file/line information to help with debugging.
You generally shouldn't be using Read for any real-world parsing situations; it's there for toy programs that have really simple serialisation needs and debugging.
I'm not sure what you mean by "subrecord", but if you want serialisation/deserialisation that can cope with "upgrades" to the record format to contain more information while still being able to process old (now "partial") serialisations, then the safecopy library does just that.
You cannot leave some value in Haskell "uninitialized" (it would not be possible to "initialize" it later anyway, since Haskell is pure). If you want to provide "default" values for the fields, then you can make some "default" value for your record type, and then do a partial update on that default value, setting only the fields you care about. I don't know how you would implement read for this in a simple way, however.
