Complex Parsec Parsers

Complex Parsec Parsers - haskell

I don't quite know how else to ask. I think I need general guidance here. I've got something like this:
expr = buildExpressionParser table term
<?> "expression"
term = choice [
(float >>= return . EDouble)
, try (natural >>= return . EInteger)
, try (stringLiteral >>= return . EString)
, try (reserved "true" >> return (EBool True))
, try (reserved "false" >> return (EBool False))
, try assign
, try ifelse
, try lambda
, try array
, try eseq
, parens expr
]
<?> "simple expression"
When I test that parser, though, I mostly get problems... like when I try to parse
(a,b) -> "b"
it is accepted by the lambda parser, but the expr parser hates it. And sometimes it even hangs up completely in eternal rules.
I've read through Write Yourself a Scheme, but it only parses the homogeneous source of Scheme.
Maybe I am generally thinking in the wrong direction.
EDIT: Here the internal parsers:
assign = do
i <- identifier
reservedOp "="
e <- expr
return $ EAssign i e
ifelse = do
reserved "if"
e <- expr
reserved "then"
a <- expr
reserved "else"
b <- expr
return $ EIfElse e a b
lambda = do
ls <- parens $ commaSep identifier
reservedOp "->"
e <- expr
return $ ELambda ls e
array = (squares $ commaSep expr) >>= return . EArray
eseq = do
a <- expr
semi <|> (newline >>= (\x -> return [x]))
b <- expr
return $ ESequence a b
table = [
[binary "*" EMult AssocLeft, binary "/" EDiv AssocLeft, binary "%" EMod AssocLeft ],
[binary "+" EPlus AssocLeft, binary "-" EMinus AssocLeft ],
[binary "~" EConcat AssocLeft],
[prefixF "not" ENot],
[binaryF "and" EAnd AssocLeft, binaryF "or" EAnd AssocLeft]
]
And by "hates it" I meant that it tells me it expects an integer or a floating point.

What Edward in the comments and I are both trying to do is mentally run your parser, and that is a little difficult without more of the parser to go on. I'm going to make some guesses here, and maybe they will help you refine your question.
Guess 1): You have tried GHCI> parse expr "(input)" "(a,b) -> \"b\" and it has returned Left …. It would be helpful to know what the error was.
Guess 2): You have also tried GHCI> parse lambda "(input)" "(a,b) -> \"b\" and it returned Right …. based on this Edward an I have both deduced that somewhere in either your term parser or perhaps in the generated expr parser there is a conflict That is some piece of the parser is succeeding in matching the beginning of the string and returning a value, but what remains is no longer valid. It would be helpful if you would try GHCI> parse term "(input)" "(a,b) -> \"b\" as this would let us know whether the problem was in term or expr.
Guess 3): The string "(a,b)" is by itself a valid expression in the grammar as you have programmed it. (Though perhaps not as you intended to program it ;-). Try sending that through the expr parser and see what happens.
Guess 4): Your grammar is left recursive. This is what causes it to get stuck and loop forever. Parsec is a LL(k) parser. If you are used to Yacc and family which are LR(1) or LR(k) parsers, the rules for recursion are exactly reversed. If you didn't understand this last sentence thats OK, but let us know.
Guess 5): The code in the expression builder looks like it came from the function's documentation. I think you may have found the term expression somewhere as well. If that is the case you you point to where it came from. if not could you explain in a few sentences how you think term ought to work.
General Advice: The large number of try statements are eventually (a.k.a. now) going to cause you grief. They are useful in some cases but also a little naughty. If the next character can determine what choice should succeed there is no need for them. If you are just trying to get something running lots of backtracking will reduce the number of intermediate forms, but it also hides pathological cases and makes errors more obscure.

There appears to be left recursion, which will cause the parser to hang if the choice in term ever gets to eseq:
expr -> term -> eseq -> expr
The term (a,b) will not parse as a lambda, or an array, so it will fall into the eseq loop.
I don't see why (a,b) -> "b" doesn't parse as an expr, since the choice in term should hit upon the lambda, which you say works, before reaching the eseq. What is the position reported in the parse error?

Related

understanding trifecta parser <|> and try

While reading Haskell book I came across trifecta
I'm trying to wrap my head around but still not able to understand <|>
I have following questions.
in simple words (<|>) = Monadic Choose ?
p = a <|> b -- use parser a if not then use b ?
if yes then why following parser is failing ?
parseFraction :: Parser Rational
parseFraction = do
numerator <- decimal
char '/'
denominator <- decimal
case denominator of
0 -> fail "denominator cannot be zero"
_ -> return (numerator % denominator)
type RationalOrDecimal = Either Rational Integer
parseRationalOrDecimal = (Left <$> parseFraction) <|> (Right<$> decimal)
main = do
let p f i = parseString f mempty i
print $ p (some (skipMany (oneOf "\n") *> parseRationalOrDecimal <* skipMany (oneOf "\n"))) "10"
in perfect world if a is parseFraction is going to fail then <|> should go with decimal but this is not the case.
but when I use try it works.
what I'm missing ?
why we need to use try when <|> should run second parser on first failure ?
parseRationalOrDecimal = try (Left <$> parseFraction) <|> (Right<$> decimal)

The reason is beacuse parseFraction consumes input before failing therefore, it is considered to be the correct branch in the choice. Let me give you and example:
Let say you are writing a python parser and you have to decide if a declaration is a class or a function (keyword def), then you write
parseExpresion = word "def" <|> word "class" -- DISCLAIMER: using a ficticious library
Then if the user writes def or class it will match, but if the user writes det It will try the first branch and match de and then fail to match expected f because t was found. It will not bother to try the next parser, because the error is considered to be in the first branch. It'd make little sense to try the class parser since likely, the error is in the first branch.
In your case parseFraction matches some digits and then fails because / isn't found, and then it doesn't bother to try decimal parser.
This is a desing decision, some other libraries use a different convention (ex: Attoparsec always backtrack on failure), and some functions claim to "not consume input" (ex: notFollowedBy)
Notice that there is a trade-off here:
First: If <|> behaves as you expect the following
parse parseRationalOrDecimal "123456789A"
will first parse all numbers until "A" is found and then it will parse again! all numbers until "A" is found... so doing the same computation twice just to return a failure.
Second: If you care more about error messages the current behaviour is more convinient. Following the python example, imagine:
parseExpresion = word "def" <|> word "class" <|> word "import" <|> word "type" <|> word "from"
If the user types "frmo" the, the parser will go to the last branch and will raise and error like expected "from" but "frmo" was found Whereas, if all alternatives must be checked the error would be something more like expected one of "def", "class", "import", "type" of "from" which is less close to the actual typo.
As I said, it is a library desing decision, I am just trying to convince you that there are good reasons to not try all alternatives automatically, and use try if you explicitly want to do so.

Why do Haskell inferred types in return type polymorphism lead to runtime errors?

The reason I'd choose to use Haskell is because of its rich type system. This gives me more information at compile-time about my program, helping me have confidence that it is sound.
In addition, it would appear that Haskell is an optimal language in which to approach the expression problem, as Haskell typeclasses can dispatch on return type. (In contrast to Clojure protocols - which can only dispatch on first argument).
When I explore a Haskell polymorphic return value function like read:
read :: (Read a) => String -> a
with the following program:
addFive :: Int -> Int
addFive x = x + 5
main :: IO ()
main = do
print (addFive (read "11"))
putStrLn (read "11")
I get the following result:
Runtime error
...
prog: Prelude.read: no parse
So I appear to be getting a runtime error in a language with a superior type system.
Contrast this with the equivalent code in Clojure:
(defn add-five [x] (+ 5 x))
(println (add-five (read-string "11")))
(println (read-string "11"))
This gives the following result:
16
11
My question is Why do Haskell inferred types in return type polymorphism lead to runtime errors? Shouldn't it pick them up at compile-time?

That runtime error has nothing to do with polymorphism, and everything to do with the fact that the string "11" can't be parsed as a list of characters by the read function.
Here are things that work. Note that "11" can, at runtime, be parsed as an Int and "\"Some More String\"" can, at runtime, be parsed as a string.
print $ 5 + read "11"
print $ "Some string" ++ read "\"Some More String\""
Here are some things that don't work. They don't work because "Not an integer" can not be parsed as an Int and "11" can't be parsed as a string.
print $ 5 + read "Not an integer"
print $ "Some string" ++ read "11"
As was pointed out in the answer to your previous question, the type information has already been inferred at compile time. The read functions have already been selected. Imagine if we had two functions readInt :: String -> Int and readString :: String -> String that were provided for the read function for the Read instances for Int and String respectively. The compiler has already, at compile time, replaced the occurrences of read with the original respective functions:
print $ 5 + readInt "Not an integer"
print $ "Some string" ++ readString "11"
This must have happened at compile time precisely because type information is eliminated at compile time, as was explained in the answer to your previous question.

A part of the issue here is that in Haskell one can define partial functions, i.e., functions which may fail on certain inputs. Examples are read, head, tail. Non-exhaustive pattern matching is the common cause of this partiality, others including error, undefined, and infinite recursion (even if in this case you do not get a runtime error, obviously).
In particular, read is a bit nasty since it requires you to ensure that the string can be parsed. This is usually harder than ensuring that a list is non empty, for instance. One should use a safer variant such as
readMaybe :: Read a => String -> Maybe a
main = do
print $ readMaybe "11" :: Maybe Int -- prints Just 11
print $ readMaybe "11" :: Maybe String -- prints Nothing
Another part of the issue is that polymorphic values (such as read "11") are actually functions in disguise, since they depend on the type at which they are evaluated, as seen in the example above. The monomorphism restriction is an attempt to make them behave more as non-functions: it forces the compiler to find a single type for all the uses of the polymorphic value. If this is possible, the polymorphic value is evaluated only at that type, and the result can be shared in all the uses. Otherwise, you get a type error, even if the code would have been typeable without the restriction.
For example, the following code
main = do
let x = readMaybe "11"
print $ x :: Maybe Int
print $ x :: Maybe Int
parses 11 once if the monomorphism restriction is on, and twice if it is off (unless the compiler is smart enough to do some optimization). By comparison,
main = do
let x = readMaybe "11"
print $ x :: Maybe Int
print $ x :: Maybe String
raises a compile-time type error if the monomorphism restriction is on, and compiles and runs just fine if it is off (printing "Just 11" and "Nothing").
So, there is no clear winner between enabling and disabling the restriction.

The type of read is
(Read a) => String -> a
which implies it (compiler or interpreter, actually) will choose its return type according to the requirement of context.
Therefore, in addFive (read "11"), because addFive requires a Int, the type of read chosen by compiler will be String -> Int; in putStrLn (read "11"), it will be String->String because putStrLn requires a String.
And this choice happens at compile time, which means after compilation, your program sort of equals
main = do
print (addFive (readInt "11"))
putStrLn (readString "11")
But this readString cannot parse its argument "11" as a string, so it crash at run time.
The fix of this problem is simple:
main = do
print (addFive (read "11"))
putStrLn (read "\"11\"")

Parsec not consuming all Input?

I'am writing some code to parse commands from the Simple Imperative Language defined in
Theory of Programming Languages (Reynolds, 1998).
I have a lexer module that given a string extracts the tokens from it if it's a valid language expression and then I pass that list of tokens to the parser which should build an internal representation of the command (defined as an algebraic data type).
These are my Tokens:
--Tokens for the parser
data Token = Kw Keyword
| Num Int
| Op Operator
| Str String
| Sym Symbol
deriving Show
I'm having trouble with binary operators. I'll put as an example the sum, but it happens the same with all of them, either boolean or integers.
For example if I'd run the program parse "x:=2+3"
I should get the following list of tokens from the lexer
[Str "x", Op Colon, Op Equal, Num 2, OP, Plus, Num 3]
which is actually what I'm getting.
But then the parser should return the command
Assign "x" (Ibin Plus (Const 2) (Const 3)
which is the correct representation of the command. But instead of that I'm getting the following representation:
Assign "x" (Const 2)
I guess that I screwed it at some point in the pIntExpr function because the variable identifier and the := of the assignment are parsed OK and it's not parsing the last elements. Here are the relevant parsers for this example, to see if someone can orientate me in what I'm doing wrong.
-- Integer expressions
data IntExpr = Const Int
| Var Iden --Iden=String
| Neg IntExpr
| IBin OpInt IntExpr IntExpr
deriving Show
type TParser = Parsec [Token] ()
--Internal representation of the commands
data Comm = Skip
| Assign Iden IntExpr
| If Assert Comm Comm
| Seq Comm Comm
| While Assert Comm
| Newvar Iden IntExpr Comm
deriving Show
--Parser for non sequential commands
pComm' :: TParser Comm
pComm' = choice [pif,pskip,pAssign,pwhile,pNewVar]
--Parser for the assignment command
pAssign :: TParser Comm
pAssign = do v <- pvar
_ <- silentOp Colon
_ <- silentOp Equal
e <- pIntExp
return $ Assign v e
-- Integer expressions parser
pIntExp :: TParser IntExpr
pIntExp = choice [ var' --An intexp is either a variable
, num --Or a numeric constant
, pMul --Or <intexp>x<intexp>
, pSum --Or <intexp>+<intexp>
, pRes --Or <intexp>-<intexp>
, pDiv --Division
, pMod --Modulus
, pNeg --Unary "-"
]
-- Parser for <intexp>+<intexp>
pSum :: TParser IntExpr
pSum = do
e <- pIntExp
_ <- silentOp Lexer.Plus
e' <- pIntExp
return $ IBin Lang.Plus e e'
UPDATE TAKING INTO ACCOUNT AndrewC's ANSWER
Unfortunately moving the var' parser down in the choice list didn't work, it yields the same result. But I took AndrewC's answer into account and tried to "manually" trace the execution (I'm not familiar with ghci's debugger and ended up doing lot of single steps and got lost eventually).
This is how I reason it:
I got this token list from the lexer:
[Str "x", Op Colon, Op Equal, Num 2, OP Plus, Num 3]
So, the pComm' parser fails with pif and pskip, but succeds with pAssign, consuming Str "x", Op Colon and Op Equal and trying to parse
[Num 2, OP Plus, Num 3] with pIntExp (!!)
The pIntExp parser then tries the var' parser and fails, but succeds with the num parser consuming the Num 2 token and therefore returning the erroneous result Assign "x" (Const 2).
So with AndrewC's advice in mind about choice, I moved num parser down in the list too. For the sake of simplicity I'll consider pIntExp as
choice [pSum, num, var´] that it's what's relevant for this particular example.
The first part of the reasoning remains the same. So I'll restart from (!!) where we had
[Num 2, Op Plus, Num 3] to be parsed by pIntExp
pIntExp tries now first with pSum, which in turn "calls" pIntExp again,
which will try pSum again, and so the program hangs. I tried it and it indeed hangs and never ends.
So I was wondering if there's a form to make the pSum parser "lookahead" for the Op Plus token and then parse the corresponding expressions?
UPDATE 2: After "googling" a little bit more now that I've identified the problem I found that the combinational parsers chainl1 and/or chainl might be just what I need.
I'll be playing with these and if I work it out post the solution

The choice function tries the parser it's given in the order they are in the list.
Since yoiur parser for variables appears before your parser for the more complicated addition expression, it suceeds before the other is tried.
To solve this problem, put the variable parser after any expressions that start with a variable (and think through any other substring-matching issues when using choice.
Similar problems incude 3 - 4 + 1 evaluating to -2. People expect left association in the absence of other priorities (so sum - term instead of term - sum).
You also might not want 1 + 10 * 5 to eveluate to 55, so you'll have to be careful around + and * etc if you want to implement operator precedence. You can achieve this by parsing an expression made up of multiplication as a term and then an additive expression as a sum of terms.

Why does only the first defined infix operator parse when using Parsec's buildExpressionParser?

I'm trying to write a parser for the propositional calculus using Parsec. The parser uses the buildExpressionParser function from Text.Parsec.Expr. Here's the code where I define the logical operators.
operators = [ [Prefix (string "~" >> return Negation)]
, [binary "&" Conjunction]
, [binary "|" Disjunction]
, [binary "->" Conditional]
, [binary "<->" Biconditional]
]
binary n c = Infix (spaces >> string n >> spaces >> return c) AssocRight
expr = buildExpressionParser operators term
<?> "compound expression"
I've omitted the parsers for variables, terms and parenthesised expressions, but if you think they may be relevant to the problem you can read the full source for the parser.
The parser succeeds for expressions which use only negation and conjunction, i.e. the only prefix operator and the first infix operator.
*Data.Logic.Propositional.Parser2> runPT expr () "" "p & ~q"
Right (p ∧ ¬q)
Expressions using any other operators fail on the first character of the operator, with an error like the following:
*Data.Logic.Propositional.Parser2> runPT expr () "" "p | q"
Left (line 1, column 3):
unexpected "|"
expecting space or "&"
If I comment out the line defining the parser for conjunctions, then the parser for disjunction will work (but the rest will still fail). Putting them all into a single list (i.e. of the same precedence) doesn't work either: the same problem still manifests itself.
Can anyone point out what I'm doing wrong? Many thanks.
Thanks to Daniel Fischer for such a prompt and helpful answer.
In order to finish making this parser work correctly, I also needed to handle repeated applications of the negation symbol, so that e.g. ~~p would parse correctly. This SO answer showed me how to do it, and the change I made to the parser can be found here.

Your problem is that
binary n c = Infix (spaces >> string n >> spaces >> return c) AssocRight
the first tried infix operator consumes a space before it fails, so the later possibilities are not tried. (Parsec favours consuming parsers, and <|> only tries to run the second parser if the first failed without consuming any input.)
To have the other infix operators tried if the first fails, you could either wrap the binary parsers in a try
binary n c = Infix (try $ ...) AssocRight
so that when such a parser fails, it does not consume any input, or, better, and the conventional solution to that problem, remove the initial spaces from it,
binary n c = Infix (string n >> spaces >> return c) AssocRight
and have all your parsers consume spaces after the token they parsed
variable = do c <- letter
spaces
return $ Variable (Var c)
<?> "variable"
parens p = do char '('
spaces
x <- p
char ')'
spaces
return x
<?> "parens"
Of course, if you have parsers that can parse operators with a common prefix, you would still need to wrap those in a try so that if e.g parsing >= fails, >>= can still be tried.
Mocking up a datatype for the propositions and changing the space-consuming behaviour as indicated above,
*PropositionalParser Text.Parsec> head $ runPT expr () "" "p | q -> r & s"
Right (Conditional (Disjunction (Variable (Var 'p')) (Variable (Var 'q'))) (Conjunction (Variable (Var 'r')) (Variable (Var 's'))))
even a more complicated expression is parsed.

Haskell recursive problem, tiny parser. A few things

data Expr = Var Char | Tall Int | Sum Expr Expr | Mult Expr Expr | Neg Expr | Let Expr Expr Expr
deriving(Eq, Show)
That is the datatype for Expr, I have a few questions. I'm suppose to parse expressions like *(Expr,Expr) as shown in the datatype definition. However I do have some problems with "creating" a valid Expr. I use pattern matching for recognizing the different things Expr can be. Some more code:
parseExpr :: String -> (Expr, String)
parseExpr ('*':'(':x:',':y:')':s) = (Mult (parseExpr [x] parseExpr [y]),s)
This is not working, obviously. The return type of parseExpr is to return the rest of the expression that is to be parsed an a portion of the parsed code as an Expr. The right side of this code is the problem. I can't make a valid Expr. The function is suppose to call it self recursively until the problem is solved.
ANOTHER problem is that I don't know how to do the pattern matching against Var and Tall. How can I check that Var is an uppercase character between A-Z and that Tall is 0-9 and return it as a valid Expr?
Generally I can just look at a few parts of the string to understand what part of Expr I'm dealing with.
Input like: parseProg "let X be 9 in *(X , 2)" Would spit out: Let (Var 'X') (Tall 9) (Mult (Var 'X') (Tall 2))

Your parseExpr function returns a pair, so of course you cannot use its result directly to construct an Expr. The way I would write this would be something like
parseExpr ('*':'(':s) = (Mult x y, s'')
where (x,',':s') = parseExpr s
(y,')':s'') = parseExpr s'
The basic idea is that, since parseExpr returns the leftover string as the second argument of the pair, you need to save that string in each recursive call you make, and when you've handled all the subexpressions, you need to return whatever is left over. And obviously the error handling here sucks, so you may want to think about that a bit more if this is intended to be a robust parser.
Handling Var and Tall I would do by just extracting the first character as is and have an if to construct an Expr of the appropriate type.
And if you want to write more complex parsers in Haskell, you'll want to look at the Parsec library, which lets you write a parser as pretty much the grammar of the language you're parsing.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Complex Parsec Parsers - haskell

Related

understanding trifecta parser <|> and try

Why do Haskell inferred types in return type polymorphism lead to runtime errors?

Parsec not consuming all Input?

Why does only the first defined infix operator parse when using Parsec's buildExpressionParser?

Haskell recursive problem, tiny parser. A few things

Categories

Resources