Haskell recursive problem, tiny parser. A few things - haskell

data Expr = Var Char | Tall Int | Sum Expr Expr | Mult Expr Expr | Neg Expr | Let Expr Expr Expr
deriving(Eq, Show)
That is the datatype for Expr, I have a few questions. I'm suppose to parse expressions like *(Expr,Expr) as shown in the datatype definition. However I do have some problems with "creating" a valid Expr. I use pattern matching for recognizing the different things Expr can be. Some more code:
parseExpr :: String -> (Expr, String)
parseExpr ('*':'(':x:',':y:')':s) = (Mult (parseExpr [x] parseExpr [y]),s)
This is not working, obviously. The return type of parseExpr is to return the rest of the expression that is to be parsed an a portion of the parsed code as an Expr. The right side of this code is the problem. I can't make a valid Expr. The function is suppose to call it self recursively until the problem is solved.
ANOTHER problem is that I don't know how to do the pattern matching against Var and Tall. How can I check that Var is an uppercase character between A-Z and that Tall is 0-9 and return it as a valid Expr?
Generally I can just look at a few parts of the string to understand what part of Expr I'm dealing with.
Input like: parseProg "let X be 9 in *(X , 2)" Would spit out: Let (Var 'X') (Tall 9) (Mult (Var 'X') (Tall 2))

Your parseExpr function returns a pair, so of course you cannot use its result directly to construct an Expr. The way I would write this would be something like
parseExpr ('*':'(':s) = (Mult x y, s'')
where (x,',':s') = parseExpr s
(y,')':s'') = parseExpr s'
The basic idea is that, since parseExpr returns the leftover string as the second argument of the pair, you need to save that string in each recursive call you make, and when you've handled all the subexpressions, you need to return whatever is left over. And obviously the error handling here sucks, so you may want to think about that a bit more if this is intended to be a robust parser.
Handling Var and Tall I would do by just extracting the first character as is and have an if to construct an Expr of the appropriate type.
And if you want to write more complex parsers in Haskell, you'll want to look at the Parsec library, which lets you write a parser as pretty much the grammar of the language you're parsing.

Related

Is it possible to access a specific piece of a custom data type in Haskell?

I'm very new to haskell, and functional programming in general, I'm switching back and fourth between two different books on haskell, but I can't seem to find an answer to my question. Say I have a custom datatype like the one below
data Expr
= Let String Expr Expr
| Binary BinOp Expr Expr
| Unary UnOp Expr
| Literal Literal
| Var String
and I have an instance of this data type that is in the form of the first constructor Let String Expr Expr, is it possible to access a specific piece of that Expr? For example if I wanted to access the String within that specific instance.
Pattern matching is your answer.
Something like this should do the trick:
myfunction :: Expr -> SomeReturnType
myfunction (Let str _ _) = doSomethingWith str -- "str" here is your string
You'll want to handle the other cases as well though, so you don't cause a runtime error:
myfunction :: Expr -> SomeReturnType
myfunction (Let str _ _) = doSomethingElse str
myfunction (Binary _ _ _) = somethingEvenDifferent
myfunction (Unary _ _) = etc
--- etc...
the _ just says to ignore the value at that position in the constructor.
Also, as #Bergi mentioned, there are many other places you can use pattern matching, like let or case statements, just always be sure to handle all the cases that your value could potentially be at that point in your program.

Is using #-patterns to get the pattern value redundant?

I'm going through Write Yourself a Scheme in 48 Hours, and in it I've come across some seemingly redundant code; they use #-patterns and then return the value itself, let me explain.
Here's the relevant code:
data LispVal = Atom String
| List [LispVal]
| DottedList [LispVal] LispVal
| Number Integer
| String String
| Bool Bool
eval :: LispVal -> LispVal -- code in question starts here
eval val#(String _) = val
eval val#(Number _) = val
eval val#(Bool _) = val
eval (List [Atom "quote", val]) = val
It seems to me that the entire eval function could just as easily be re-written as
eval :: LispVal -> LispVal
eval (List [Atom "quote", val]) = val
eval val = val
And have the bottom case account for all the #-patterns in the original code.
Am I mistaken in thinking this, and is there an actual benefit of doing it the way they did? Or is the other way more concise?
One difference is that the original code is undefined for values constructed with Atom, i.e. there is no line
eval val#(Atom _) = val
And whether this is a copy’n’paste error or not, it highlights the important difference in style:
The first style encourages you to think about each value individually, making an explicit assertion “this is the right equation for this”. If you later add more constructors t othe LispVal type, you get runtime errors (or compiler warnings with -fwarn-incomplete-patterns, which is good practice).
The second style asserts: eval will only have to look at List values, and all others can be treated individually. In particular, later additions to the data type should work just as well, and you don’t want to be bothered about this function then.
Operationally, they are equivalent.

Parsec parsing in Haskell

I have 2 parsers:
nexpr::Parser (Expr Double)
sexpr::Parser (Expr String)
How do I build a parser that tries one and then the other if it doesn't work? I can't figure out what to return. There must be a clever way to do this.
Thanks.
EDIT:
Adding a bit more info...
I'm learning Haskel, so I started with :
data Expr a where
N::Double -> Expr Double
S::String -> Expr String
Add::Expr Double -> Expr Double -> Expr Double
Cat::Expr String -> Expr String -> Expr String
then I read about F-algebra (here) and so I changed it to:
data ExprF :: (* -> *) -> * -> * where
N::Double -> ExprF r Double
S::String -> ExprF r String
Add::r Double -> r Double -> ExprF r Double
Cat::r String -> r String -> ExprF r String
with
type Expr = HFix ExprF
so my parse to:
Parser (Expr Double)
is actually:
Parser (ExprF HFix Double)
Maybe I'm biting off more than I can chew...
As noted in the comments, you can have a parser like this
nOrSexpr :: Parser (Either (Expr Double) (Expr String))
nOrSexpr = (Left <$> nexpr) <|> (Right <$> sexpr)
However, I think the reason that you are having this difficulty is because you are not representing your parse tree as a single type, which is the more usual thing to do. Something like this:
data Expr =
ExprDouble Double
| ExprInt Int
| ExprString String
That way you can have parsers for each kind of expression that are all of type Parser Expr. This is the same as using Either but more flexible and maintainable. So you might have
doubleParser :: Parser Expr
doubleParser = ...
intParser :: Parser Expr
intParser = ...
stringParser :: Parser Expr
stringParser = ...
exprParser :: Parser Expr
exprParser = intParser <|> doubleParser <|> stringParser
Note that the order of the parsers does matter and use can use Parsec's try function if backtracking is needed.
So, for example, if you want to have a sum expression now, you can add to the data type
data Expr =
ExprDouble Double
| ExprInt Int
| ExprString String
| ExprSum Expr Expr
and make the parser
sumParser :: Parser Expr
sumParser = do
a <- exprParser
string " + "
b <- exprParser
return $ ExprSum a b
UPDATE
Well, I take my hat off to you diving straight into GADTs if you are just starting with Haskell. I have been reading through the paper you linked and noticed this immediately in the first paragraph:
The jury is still out on whether the additional type-safety provided by GADTs is worth the added inconvenience of working with them.
There are three points worth taking away here I think. The first is simply that I would have a go with the simpler way of doing things first, to get an idea of how it works and why you might want to add more type safety, before trying to more complicated type theoretical stuff. That comment may not help so feel free to ignore it!
Secondly, and more importantly, your representation...
data ExprF :: (* -> *) -> * -> * where
N :: Double -> ExprF r Double
S :: String -> ExprF r String
Add :: r Double -> r Double -> ExprF r Double
Cat :: r String -> r String -> ExprF r String
...is specifically designed to not allow ill formed type expressions. Contrasted with mine which can, eg ExprSum (ExprDouble 5.0) (ExprString "test"). So the question you really want to ask is what should actually happen when the parser attempts to parse something like "5.0 + \"test\""? Do you want it to just not parse, or do you want it to return a nice message saying that this expression is the wrong type? Compilers are usually designed in multiple stages for this reason. The first pass turns the input into an abstract syntax tree (AST), and further passes annotate this tree with type judgements. This annotated AST can then be transformed into the semantic representation that you really want it in.
So in your case I would recommend two stages. first, parse into a dumb representation like mine, that will give you the correct tree shape but allow ill-typed expressions. Like
data ExprAST =
ExprASTDouble Double
| ExprASTInt Int
| ExprASTString String
| ExprASTAdd Expr Expr
Then have another function that will typecheck the ExprAST. Something like
typecheck :: ExprAST -> Maybe (ExprF HFix a)
(You could also use Either and return either the typechecked GADT or an error string saying what the problem is.) The further problem here is that you don't know what a is statically. The other answer solves this by using type tags and an existential wrapper, which you might find to be the best way to go. I feel like it might be simpler to have a top level expression in your GADT that all expressions must live in, so an entire parse will always have the same type. In the end there is usually only one program type.
My third, and last, point is related to this
The jury is still out on whether the additional type-safety provided by GADTs is worth the added inconvenience of working with them.
The more type safety, generally the more work you have to do to get it. You mention you are new to Haskell, yet this adventure has taken us right to the edge of what it is capable of doing. The type of the parsed expression cannot depend only on the input string in a Haskell function, because it does not allow for dependant types. If you want to go down this path, I might suggest you have a look at a language called Idris. A great introduction to what it is capable of can be found in this video, in which he constructs a typesafe printf.
The problem described looks to be using Parsec to parse into a GADT representation, for which probably the easiest solution would be parse into a monotype representation and then have a (likely partial) type checking phase to produce the well-typed GADT, if it can. The monotype representation could be an existential wrapper over a GADT term, with a type-tag to reify the GADT index.
EDIT: a quick example
Let's define a type for type-tags and an existential wrapper:
data Type :: * -> * where
TDouble :: Type Double
TString :: Type String
data Judgement f = forall ix. Judgement (f ix) (Type ix)
With the example GADT given in the original post, we only have a problem with the outer-most production, which we need to parse to a monotype as we don't know statically which expression type we will get at runtime:
pExpr :: Parser (Judgement Expr)
pExpr = Judgement <$> pDblExpr <*> pure TDouble
<|> Judgement <$> pStrExpr <*> pure TString
We can write a type check phase to produce a GADT or fail, depending on whether the type assertion succeeds or not:
typecheck :: Judgement Expr -> Type ix -> Maybe (Expr ix)
typecheck (Judgement e TDouble) TDouble = Just e
typecheck (Judgement e TString) TString = Just e
typecheck _ _ = Nothing

Parsec not consuming all Input?

I'am writing some code to parse commands from the Simple Imperative Language defined in
Theory of Programming Languages (Reynolds, 1998).
I have a lexer module that given a string extracts the tokens from it if it's a valid language expression and then I pass that list of tokens to the parser which should build an internal representation of the command (defined as an algebraic data type).
These are my Tokens:
--Tokens for the parser
data Token = Kw Keyword
| Num Int
| Op Operator
| Str String
| Sym Symbol
deriving Show
I'm having trouble with binary operators. I'll put as an example the sum, but it happens the same with all of them, either boolean or integers.
For example if I'd run the program parse "x:=2+3"
I should get the following list of tokens from the lexer
[Str "x", Op Colon, Op Equal, Num 2, OP, Plus, Num 3]
which is actually what I'm getting.
But then the parser should return the command
Assign "x" (Ibin Plus (Const 2) (Const 3)
which is the correct representation of the command. But instead of that I'm getting the following representation:
Assign "x" (Const 2)
I guess that I screwed it at some point in the pIntExpr function because the variable identifier and the := of the assignment are parsed OK and it's not parsing the last elements. Here are the relevant parsers for this example, to see if someone can orientate me in what I'm doing wrong.
-- Integer expressions
data IntExpr = Const Int
| Var Iden --Iden=String
| Neg IntExpr
| IBin OpInt IntExpr IntExpr
deriving Show
type TParser = Parsec [Token] ()
--Internal representation of the commands
data Comm = Skip
| Assign Iden IntExpr
| If Assert Comm Comm
| Seq Comm Comm
| While Assert Comm
| Newvar Iden IntExpr Comm
deriving Show
--Parser for non sequential commands
pComm' :: TParser Comm
pComm' = choice [pif,pskip,pAssign,pwhile,pNewVar]
--Parser for the assignment command
pAssign :: TParser Comm
pAssign = do v <- pvar
_ <- silentOp Colon
_ <- silentOp Equal
e <- pIntExp
return $ Assign v e
-- Integer expressions parser
pIntExp :: TParser IntExpr
pIntExp = choice [ var' --An intexp is either a variable
, num --Or a numeric constant
, pMul --Or <intexp>x<intexp>
, pSum --Or <intexp>+<intexp>
, pRes --Or <intexp>-<intexp>
, pDiv --Division
, pMod --Modulus
, pNeg --Unary "-"
]
-- Parser for <intexp>+<intexp>
pSum :: TParser IntExpr
pSum = do
e <- pIntExp
_ <- silentOp Lexer.Plus
e' <- pIntExp
return $ IBin Lang.Plus e e'
UPDATE TAKING INTO ACCOUNT AndrewC's ANSWER
Unfortunately moving the var' parser down in the choice list didn't work, it yields the same result. But I took AndrewC's answer into account and tried to "manually" trace the execution (I'm not familiar with ghci's debugger and ended up doing lot of single steps and got lost eventually).
This is how I reason it:
I got this token list from the lexer:
[Str "x", Op Colon, Op Equal, Num 2, OP Plus, Num 3]
So, the pComm' parser fails with pif and pskip, but succeds with pAssign, consuming Str "x", Op Colon and Op Equal and trying to parse
[Num 2, OP Plus, Num 3] with pIntExp (!!)
The pIntExp parser then tries the var' parser and fails, but succeds with the num parser consuming the Num 2 token and therefore returning the erroneous result Assign "x" (Const 2).
So with AndrewC's advice in mind about choice, I moved num parser down in the list too. For the sake of simplicity I'll consider pIntExp as
choice [pSum, num, var´] that it's what's relevant for this particular example.
The first part of the reasoning remains the same. So I'll restart from (!!) where we had
[Num 2, Op Plus, Num 3] to be parsed by pIntExp
pIntExp tries now first with pSum, which in turn "calls" pIntExp again,
which will try pSum again, and so the program hangs. I tried it and it indeed hangs and never ends.
So I was wondering if there's a form to make the pSum parser "lookahead" for the Op Plus token and then parse the corresponding expressions?
UPDATE 2: After "googling" a little bit more now that I've identified the problem I found that the combinational parsers chainl1 and/or chainl might be just what I need.
I'll be playing with these and if I work it out post the solution
The choice function tries the parser it's given in the order they are in the list.
Since yoiur parser for variables appears before your parser for the more complicated addition expression, it suceeds before the other is tried.
To solve this problem, put the variable parser after any expressions that start with a variable (and think through any other substring-matching issues when using choice.
Similar problems incude 3 - 4 + 1 evaluating to -2. People expect left association in the absence of other priorities (so sum - term instead of term - sum).
You also might not want 1 + 10 * 5 to eveluate to 55, so you'll have to be careful around + and * etc if you want to implement operator precedence. You can achieve this by parsing an expression made up of multiplication as a term and then an additive expression as a sum of terms.

Complex Parsec Parsers

I don't quite know how else to ask. I think I need general guidance here. I've got something like this:
expr = buildExpressionParser table term
<?> "expression"
term = choice [
(float >>= return . EDouble)
, try (natural >>= return . EInteger)
, try (stringLiteral >>= return . EString)
, try (reserved "true" >> return (EBool True))
, try (reserved "false" >> return (EBool False))
, try assign
, try ifelse
, try lambda
, try array
, try eseq
, parens expr
]
<?> "simple expression"
When I test that parser, though, I mostly get problems... like when I try to parse
(a,b) -> "b"
it is accepted by the lambda parser, but the expr parser hates it. And sometimes it even hangs up completely in eternal rules.
I've read through Write Yourself a Scheme, but it only parses the homogeneous source of Scheme.
Maybe I am generally thinking in the wrong direction.
EDIT: Here the internal parsers:
assign = do
i <- identifier
reservedOp "="
e <- expr
return $ EAssign i e
ifelse = do
reserved "if"
e <- expr
reserved "then"
a <- expr
reserved "else"
b <- expr
return $ EIfElse e a b
lambda = do
ls <- parens $ commaSep identifier
reservedOp "->"
e <- expr
return $ ELambda ls e
array = (squares $ commaSep expr) >>= return . EArray
eseq = do
a <- expr
semi <|> (newline >>= (\x -> return [x]))
b <- expr
return $ ESequence a b
table = [
[binary "*" EMult AssocLeft, binary "/" EDiv AssocLeft, binary "%" EMod AssocLeft ],
[binary "+" EPlus AssocLeft, binary "-" EMinus AssocLeft ],
[binary "~" EConcat AssocLeft],
[prefixF "not" ENot],
[binaryF "and" EAnd AssocLeft, binaryF "or" EAnd AssocLeft]
]
And by "hates it" I meant that it tells me it expects an integer or a floating point.
What Edward in the comments and I are both trying to do is mentally run your parser, and that is a little difficult without more of the parser to go on. I'm going to make some guesses here, and maybe they will help you refine your question.
Guess 1): You have tried GHCI> parse expr "(input)" "(a,b) -> \"b\" and it has returned Left …. It would be helpful to know what the error was.
Guess 2): You have also tried GHCI> parse lambda "(input)" "(a,b) -> \"b\" and it returned Right …. based on this Edward an I have both deduced that somewhere in either your term parser or perhaps in the generated expr parser there is a conflict That is some piece of the parser is succeeding in matching the beginning of the string and returning a value, but what remains is no longer valid. It would be helpful if you would try GHCI> parse term "(input)" "(a,b) -> \"b\" as this would let us know whether the problem was in term or expr.
Guess 3): The string "(a,b)" is by itself a valid expression in the grammar as you have programmed it. (Though perhaps not as you intended to program it ;-). Try sending that through the expr parser and see what happens.
Guess 4): Your grammar is left recursive. This is what causes it to get stuck and loop forever. Parsec is a LL(k) parser. If you are used to Yacc and family which are LR(1) or LR(k) parsers, the rules for recursion are exactly reversed. If you didn't understand this last sentence thats OK, but let us know.
Guess 5): The code in the expression builder looks like it came from the function's documentation. I think you may have found the term expression somewhere as well. If that is the case you you point to where it came from. if not could you explain in a few sentences how you think term ought to work.
General Advice: The large number of try statements are eventually (a.k.a. now) going to cause you grief. They are useful in some cases but also a little naughty. If the next character can determine what choice should succeed there is no need for them. If you are just trying to get something running lots of backtracking will reduce the number of intermediate forms, but it also hides pathological cases and makes errors more obscure.
There appears to be left recursion, which will cause the parser to hang if the choice in term ever gets to eseq:
expr -> term -> eseq -> expr
The term (a,b) will not parse as a lambda, or an array, so it will fall into the eseq loop.
I don't see why (a,b) -> "b" doesn't parse as an expr, since the choice in term should hit upon the lambda, which you say works, before reaching the eseq. What is the position reported in the parse error?

Resources