I have a grammar snippet like this:
expr: left=expr '=' right=expr #exprAssign
    | atom                     #exprAtom
    ;
atom: TEXT
    | ID
    ;
TEXT: '\'' (.)*? '\'';
ID: [A-Z]+;
In my visitor method
visitExprAtom()
how can I find out which parent context the atom is currently under? Let's say the method is invoked either for the left side of the assignment (call that fromLeft) or for the right side (fromRight).
I need to change my approach depending on whether the atom comes fromLeft or fromRight.
Thanks.
EDIT:
Let's assume I have a code snippet like this:
var varcompany = 'C1'; // creates a variable named company, initialized to C1
SELECT * FROM TABLE
WHERE COMPANY = varcompany; // COMPANY = 'C1'
The expr grammar above is parsing COMPANY = varcompany.
With this approach my parser thinks COMPANY is a variable and tries to find it in my map. I also use the atom expression in other DSL scenarios, where it needs to act differently: in SQL it must return 'C1', but in the other scenarios it must return C1.
I'm planning on writing a parser for some language. I'm quite confident that I could cobble together a parser in Parsec without too much hassle, but I thought about including comments in the AST so that I could implement a code formatter in the end.
At first, adding an extra parameter to the AST types seemed like a suitable idea (this is basically what was suggested in this answer). For example, instead of having
data Expr = Add Expr Expr | ...
one would have
data Expr a = Add a (Expr a) (Expr a)
and use a for whatever annotation (e.g. for comments that come after the expression).
However, there are some not so exciting cases. The language features C-like comments (// ..., /* .. */) and a simple for loop like this:
for (i in 1:10)
{
... // list of statements
}
Now, excluding the body, there are at least 10 places where one could put one (or more) comments:
/*A*/ for /*B*/ ( /*C*/ i /*E*/ in /*F*/ 1 /*G*/ : /*H*/ 10 /*I*/ ) /*J*/
{ /*K*/
...
In other words, while the for loop could previously be comfortably represented as an identifier (i), two expressions (1 & 10) and a list of statements (the body), we would now have to include at least 10 more parameters or record fields for annotations.
This gets ugly and confusing quite quickly, so I wondered whether there is a clearly better way to handle this. I'm certainly not the first person who wants to write a code formatter that preserves comments, so there must be a decent solution. Or is writing a formatter just that messy?
You can probably capture most of those positions with just two generic comment productions:
Expr -> Comment Expr
Stmt -> Comment Stmt
This seems like it ought to capture comments A, C, F, H, J, and K for sure; possibly also G depending on exactly what your grammar looks like. That only leaves three spots to handle in the for production (maybe four, with one hidden in Range here):
Stmt -> "for" Comment "(" Expr Comment "in" Range Comment ")" Stmt
In other words: one Comment before each literal string except the first. Seems not too onerous, ultimately.
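In AST terms, those two productions amount to a pair of wrapping constructors. A minimal Haskell sketch (all type and constructor names here are illustrative, not taken from the question's grammar):
type Comment = String

data Expr
  = Var String
  | Lit Int
  | Add Expr Expr
  | CommentedExpr Comment Expr              -- Expr -> Comment Expr
  deriving Show

data Range = Range Expr Expr
  deriving Show

data Stmt
  = ExprStmt Expr
  | CommentedStmt Comment Stmt              -- Stmt -> Comment Stmt
  | For Comment Expr Comment Range Comment [Stmt]
    -- "for" Comment "(" Expr Comment "in" Range Comment ")" body
  deriving Show
The wrapping constructors absorb most of the comment positions, so only the three slots left over in the for production need dedicated fields.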
I'm a beginner in Haskell playing around with parsing and building an AST. I wonder how one would go about defining types like the following:
A Value can either be an Identifier or a Literal. Right now, I simply have a type Value with two constructors (taking the name of the identifier and the value of the string literal respectively):
data Value = Id String
| Lit String
However, then I wanted to create a type representing an assignment in an AST, so I need something like
data Assignment = Asgn Value Value
But clearly, I want the first part of an Assignment to always be an Identifier! So I guess I should make Identifier and Literal separate types to better distinguish things:
data Identifier = Id String
data Literal = Lit String
But how do I define Value now? I thought of something like this:
-- this doesn't actually work...
data Value = (Id String) -- How to make Value be either an Identifier
| (Lit String) -- or a Literal?
I know I can simply do
data Value = ValueId Identifier
| ValueLit Literal
but this struck me as sort of inelegant and got me wondering if there is a better solution?
I first tried to restructure my types so that I would be able to do it with GADTs, but in the end the simpler solution was to go with leftroundabout's suggestion. I guess it's not that "inelegant" after all.
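For comparison, here is roughly what the GADT route could look like; this is only a sketch, and the ValueKind kind with its 'IdK/'LitK tags is my own naming, not anything from the question:
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

data ValueKind = IdK | LitK

-- The constructor used is reflected in the type index.
data Value (k :: ValueKind) where
  Id  :: String -> Value 'IdK
  Lit :: String -> Value 'LitK

-- The left-hand side is restricted to identifiers by the type checker,
-- while the right-hand side may be either kind of Value.
data Assignment where
  Asgn :: Value 'IdK -> Value k -> Assignment
This buys the invariant at compile time, at the cost of the extensions and some friction when you want to mix values of different kinds in one list, which is presumably why the plain wrapper types won out.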
I am writing a parser in bison/flex.
This is part of my code:
I want to implement the assignment production, so that an identifier can be either a boolean_expr or an expr; its type will be checked using a symbol table.
So it allows something like:
int a = 1;
boolean b = true;
if(b) ...
However, I get a reduce/reduce conflict if I include identifier in both term and boolean_expr. Is there any way to solve this problem?
Essentially, what you are trying to do is to inject semantic rules (type information) into your syntax. That's possible, but it is not easy. More importantly, it's rarely a good idea. It's almost always best if syntax and semantics are well delineated.
All the same, as presented your grammar is unambiguous and LALR(1). However, the latter feature is fragile, and you will have difficulty maintaining it as you complete the grammar.
For example, you don't include your assignment syntax in your question, but it would presumably look something like this:
assignment: identifier '=' expr
| identifier '=' boolean_expr
;
Unlike the rest of the grammar shown, that production is ambiguous, because:
x = y
without knowing anything about y, y could be reduced to either term or boolean_expr.
A possibly more interesting example is the addition of parentheses to the grammar. The obvious way of doing that would be to add two productions:
term: '(' expr ')'
boolean_expr: '(' boolean_expr ')'
The resulting grammar is not ambiguous, but it is no longer LALR(1). Consider the following two declarations:
boolean x = (y) < 7
boolean x = (y)
In the first one, y must be an int so that (y) can be reduced to a term; in the second one y must be boolean so that (y) can be reduced to a boolean_expr. There is no ambiguity; once the < is seen (or not), it is entirely clear which reduction to choose. But < is not the lookahead token, and in fact it could be arbitrarily distant from y:
boolean x = ((((((((((((((((((((((y...
So the resulting unambiguous grammar is not LALR(k) for any k.
One way you could solve the problem would be to inject the type information at the lexical level, by giving the scanner access to the symbol table. Then the scanner could look up a scanned identifier token in the symbol table and use the information found there to decide between one of three token types (or more, if you have more datatypes): undefined_variable, integer_variable, and boolean_variable. Then you would have, for example:
declaration: "int" undefined_variable '=' expr
| "boolean" undefined_variable '=' boolean_expr
;
term: integer_variable
| ...
;
boolean_expr: boolean_variable
| ...
;
That will work, but it should be obvious that this is not scalable: every time you add a type, you'll have to extend both the grammar and the lexical description, because now the semantics is not only mixed up with the syntax, it has even gotten intermingled with the lexical analysis. Once you let semantics out of its box, it tends to contaminate everything.
There are languages for which this really is the most convenient solution: C parsing, for example, is much easier if typedef names and identifier names are distinguished so that you can tell whether (t)*x is a cast or a multiplication. (But it doesn't work so easily for C++, which has much more complicated name lookup rules, and also much more need for semantic analysis in order to find the correct parse.)
But, honestly, I'd suggest that you do not use C -- and much less C++ -- as a model of how to design a language. Languages which are hard for compilers to parse are also hard for human beings to parse. The "most vexing parse" continues to be a regular source of pain for C++ newcomers, and even sometimes trips up relatively experienced programmers:
class X {
public:
    X(int n = 0) : data_is_available_(n) {}
    operator bool() const { return data_is_available_; }
    // ...
private:
    bool data_is_available_;
    // ...
};
X my_x_object();
// ...
if (!my_x_object) {
// This code is unreachable. Can you see why?
}
In short, you're best off with a language which can be parsed into an AST without any semantic information at all. Once the parser has produced the AST, you can do semantic analyses in separate passes, one of which will check type constraints. That's far and away the cleanest solution. Without explicit typing, the grammar is slightly simplified, because an expr can now be any expr:
expr: conjunction | expr "or" conjunction ;
conjunction: comparison | conjunction "and" comparison ;
comparison: sum | sum '<' sum ;
sum: product | sum '+' product ;
product: term | product '*' term ;
term: identifier
| constant
| '(' expr ')'
;
Each action in the above would simply create a new AST node and set $$ to the new node. At the end of the parse, the AST is walked to verify that all exprs have the correct type.
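The question is posed in bison/flex terms, but the shape of such a pass is easy to sketch. Here is a minimal illustration in Haskell (used elsewhere on this page; every type and function name below is invented for the example):
data Type = TInt | TBool deriving (Eq, Show)

data Expr
  = Var String
  | IntLit Int
  | BoolLit Bool
  | Add  Expr Expr
  | Less Expr Expr
  | And  Expr Expr
  deriving Show

type SymTab = [(String, Type)]

-- The parser built this tree without any type information;
-- the checker walks it afterwards and enforces the constraints.
check :: SymTab -> Expr -> Either String Type
check env (Var x)     = maybe (Left ("undefined variable: " ++ x)) Right (lookup x env)
check _   (IntLit _)  = Right TInt
check _   (BoolLit _) = Right TBool
check env (Add a b)   = expect env TInt a  >> expect env TInt b  >> Right TInt
check env (Less a b)  = expect env TInt a  >> expect env TInt b  >> Right TBool
check env (And a b)   = expect env TBool a >> expect env TBool b >> Right TBool

expect :: SymTab -> Type -> Expr -> Either String ()
expect env want e = do
  got <- check env e
  if got == want
    then Right ()
    else Left ("expected " ++ show want ++ ", got " ++ show got)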
If that seems like overkill for your project, you can do the semantic checks in the reduction actions, effectively intermingling the AST walk with the parse. That might seem convenient for immediate evaluation, but it also requires including explicit type information in the parser's semantic type, which adds unnecessary overhead (and, as mentioned, the inelegance of letting semantics interfere with the parser.) In that case, every action would look something like this:
expr : expr '+' expr  { CheckArithmeticCompatibility($1, $3);
                        $$ = NewArithmeticNode('+', $1, $3);
                      }
I have a type as follows:
data Stitch mark = OverStitch mark (Stitch mark) | TokenStitch | TerminalStitch
There can only ever be one single TerminalStitch value. So I wish I could define this value at the top level of my module, something like this:
terminalStitch :: Stitch
terminalSticth = TerminalStitch -- <--- value = constructor()
But it doesn't seem to work. What should I do instead?
Well, the concrete problem here is a typo:
terminalSticth = TerminalStitch
-- ^ swapped the letters
Also, in your type signature, you need to give Stitch a type parameter:
terminalStitch :: Stitch a
What are you trying to achieve here? In Haskell you can't compare things "by identity", only by value. So using terminalStitch is completely identical to just using TerminalStitch.
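Putting the two fixes together, the definition would presumably read:
data Stitch mark = OverStitch mark (Stitch mark) | TokenStitch | TerminalStitch

terminalStitch :: Stitch a
terminalStitch = TerminalStitch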
The classic way to define Haskell functions is:
f1 :: String -> Int
f1 ('-' : cs) = f1 cs + 1
f1 _ = 0
I'm kinda unsatisfied with writing the function name on every line. Nowadays I usually write it the following way, using the pattern guards extension, and consider it more readable and modification-friendly:
f2 :: String -> Int
f2 s
  | '-' : cs <- s = f2 cs + 1
  | otherwise     = 0
Do you think the second example is more readable, modifiable, and elegant? What about the generated code? (I haven't had time to look at the desugared output yet, sorry!) What are the cons? The only one I see is the use of an extension.
Well, you could always write it like this:
f3 :: String -> Int
f3 s = case s of
  ('-' : cs) -> f3 cs + 1
  _          -> 0
Which means the same thing as the f1 version. If the function has a lengthy or otherwise hard-to-read name, and you want to match against lots of patterns, this probably would be an improvement. For your example here I'd use the conventional syntax.
There's nothing wrong with your f2 version, as such, but it seems a slightly frivolous use of a syntactic GHC extension that's not common enough to assume everyone will be familiar with it. For personal code it's not a big deal, but I'd stick with the case expression for anything you expect other people to be reading.
I prefer writing the function name when I am pattern matching on something, as is shown in your case. I find it more readable.
I prefer using guards when I have some conditions on the function arguments, which helps avoid the if/else I would otherwise have to use if I followed the first pattern.
So, to answer your questions:
Do you think that second example is more readable, modifiable and elegant?
No, I prefer the first one, which is simple and readable. But it more or less depends on your personal taste.
What about generated code?
I don't think there will be any difference in the generated code. Both are just pattern matching.
What are cons?
Well, pattern guards are useful for pattern matching more cleanly, instead of using let or the like:
addLookup env var1 var2
  | Just val1 <- lookup var1 env
  , Just val2 <- lookup var2 env
  = val1 + val2
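For example, addLookup [("x", 1), ("y", 2)] "x" "y" evaluates to 3 (note that Prelude's lookup takes the key first and the list second, which is why the calls above read lookup var1 env).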
Well, the con is of course that you need to use an extension, and also that it is not Haskell 98 (which you might not consider much of a con).
On the other hand, for trivial pattern matching on function arguments I will just use the first method, which is simple and readable.