Refactoring do notation into applicative style - haskell

So I have been working on a simple expression solver in Haskell. I have been trying to refactor some of my code from do notation to applicative style, mainly because I want to learn how applicatives work. I am lost on how to refactor this:
factor :: Parser Expr
factor = do
        char '('
        x <- buildExpr
        char ')'
        return x
    <|> number
    <|> variables
    <?> "simple expression"
What would be the way to write this in applicative style? I have tried the following, but it won't type check:
factor = pure buildExpr <$> (char '(' *> buildExpr *> char ')')
where buildExpr has type Parser Expr.

Short answer:
factor = (char '(' *> buildExpr <* char ')') <|> number <|> variables
    <?> "simple expression"
Long answer:
<$> has this type:
(<$>) :: (Functor f) => (a -> b) -> f a -> f b
In other words, it takes a function and a value of a type that is an instance of Functor (and returns something we don’t care about at the moment). Unfortunately, you aren’t giving it a function as the first argument; you’re giving it pure buildExpr, which is a Parser that, when executed, consumes no input and yields buildExpr. If you really wanted to do that, you could, with <*>:
factor = pure buildExpr <*> (char '(' *> buildExpr *> char ')')
That would run pure buildExpr, extract the function out of it, and then run that on the result of (char '(' *> buildExpr *> char ')'). But unfortunately, we can’t do that either: buildExpr is a Parser of some sort, not a function.
If you think about it enough, the thought should pass through your mind: why are we mentioning buildExpr twice if we only want to parse one? It turns out that it is sufficient to mention it only once. In fact, this probably does almost what you want:
factor = char '(' *> buildExpr *> char ')'
The only problem is that it will yield the Char ')', not the result of buildExpr. Darn! But looking through the documentation and matching up the types, you should eventually be able to figure out that if you replace the second *> with a <*, it’ll all work out as you want it to:
factor = char '(' *> buildExpr <* char ')'
A good mnemonic for this is that the arrow points to the value you want to keep. Here, we don’t care about the parentheses, so the arrow points away; but we do want to keep the result of buildExpr, so the arrows point inwards toward it.
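To see the difference between the two spellings, here is a small sketch that can be run with only base. It uses ReadP instead of Parsec, and number is just a stand-in for buildExpr; the run helper is made up for this example.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP (ReadP, char, munch1, readP_to_S)

-- Stand-in for buildExpr (an assumption): parses a run of digits.
number :: ReadP Int
number = read <$> munch1 isDigit

-- Both arrows point right: the last result wins, so we get ')'.
keepsParen :: ReadP Char
keepsParen = char '(' *> number *> char ')'

-- Both arrows point at number: its result is the one that is kept.
keepsNumber :: ReadP Int
keepsNumber = char '(' *> number <* char ')'

-- Hypothetical helper: succeed only on a unique parse of the whole input.
run :: ReadP a -> String -> Maybe a
run p s = case [x | (x, "") <- readP_to_S p s] of
  [x] -> Just x
  _   -> Nothing

main :: IO ()
main = do
  print (run keepsParen "(42)")   -- Just ')'
  print (run keepsNumber "(42)")  -- Just 42
```

The types already tell the story: keepsParen is a ReadP Char, while keepsNumber is a ReadP Int.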

All these operators are left associative; the < and/or > points to things which contribute values; it's $ for thing-to-left-is-pure-value and * for thing-to-left-is-applicative-computation.
My rule of thumb for using these operators goes as follows. First, list the components of your grammatical production and classify them as "signal" or "noise" depending on whether they contribute semantically important information. Here, we have
char '(' -- noise
buildExpr -- signal
char ')' -- noise
Next, figure out what the "semantic function" is, which takes the values of the signal components and gives the value for the whole production. Here, we have
id -- pure semantic function, then a bunch of component parsers
char '(' -- noise
buildExpr -- signal
char ')' -- noise
Now, each component parser will need to be attached to what comes before it with an operator, but which?
always start with <
next $ for the first component (as the pure function's just before), or * for every other component
then comes > if the component is signal, or nothing if it's noise
So that gives us
id              -- pure semantic function, then a bunch of parsers
  <$  char '('  -- first, noise
  <*> buildExpr -- later, signal
  <*  char ')'  -- later, noise
If the semantic function is id, as here, you can get rid of it and use *> to glue noise to the front of the signal which is id's argument. I usually choose not to do that, just so that I can see the semantic function sitting clearly at the beginning of the production. Also, you can build a choice between such productions by interspersing <|> and you don't need to wrap any of them in parentheses.
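As a rough illustration of the recipe, here is a sketch with three productions, each starting with its semantic function and joined by <|> without parentheses. It uses ReadP from base rather than Parsec, and the Expr type, the Neg production, and the run helper are invented for the example.

```haskell
import Control.Applicative ((<|>))
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP (ReadP, char, munch1, readP_to_S)

-- Illustrative expression type (an assumption, not the asker's Expr).
data Expr = Num Int | Neg Expr
  deriving (Eq, Show)

-- Each production: semantic function first, then <$ or <$> for the first
-- component and <* or <*> for the rest, depending on signal or noise.
factor :: ReadP Expr
factor = id  <$  char '(' <*> factor <* char ')'   -- noise, signal, noise
     <|> Neg <$  char '-' <*> factor               -- noise, signal
     <|> Num . read <$> munch1 isDigit             -- signal

-- Hypothetical helper: succeed only on a unique parse of the whole input.
run :: ReadP a -> String -> Maybe a
run p s = case [x | (x, "") <- readP_to_S p s] of
  [x] -> Just x
  _   -> Nothing

main :: IO ()
main = print (run factor "(-(7))")  -- Just (Neg (Num 7))
```

This works without parentheses because <$, <$>, <* and <*> are all infixl 4, while <|> is infixl 3 and so binds more loosely.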

Related

How to parse a character range into a tuple

I want to parse strings like "0-9" into ('0', '9') but I think my two attempts look a bit clumsy.
numRange :: Parser (Char, Char)
numRange = (,) <$> digitChar <* char '-' <*> digitChar
numRange' :: Parser (Char, Char)
numRange' = liftM2 (,) (digitChar <* char '-') digitChar
I kind of expected that there already is an operator that sequences two parsers and returns both results in a tuple. If there is then I can't find it. I'm also having a hard time figuring out the desired signature in order to search on hoogle.
I tried Applicative f => f a -> f b -> f (a, b) based off the signature of <* but that only gives unrelated results.
The applicative form:
numRange = (,) <$> digitChar <* char '-' <*> digitChar
is standard. Anyone familiar with monadic parsers will immediately understand what this does.
The disadvantage of the liftM2 (or equivalently liftA2) form, or of a function with signature:
pair :: Applicative f => f a -> f b -> f (a, b)
pair = liftA2 (,)
is that the resulting parser expressions:
pair (digitChar <* char '-') digitChar
pair digitChar (char '-' *> digitChar)
obscure the fact that the char '-' syntax is not actually part of either digit parser. As a result, I think this is more likely to be confusing than the admittedly ugly applicative syntax.
I kind of expected that there already is an operator that sequences two parsers and returns both results in a tuple.
There is; it's liftA2 (,), as you noticed. However, you aren't sequencing two parsers, you are sequencing three. Even though you can treat this as a "metasequence" of two two-parser sequencing operations, those two operations are different:
In digitChar <* char '-', you ignore the result of the second parser (and in my opinion, <* always looks like a typo for <*>).
In ... <*> digitChar, you use both results.
If you don't like using the applicative operators directly, consider using do syntax along with the ApplicativeDo extension and write
numRange :: Parser (Char, Char)
numRange = do
  x <- digitChar
  char '-'
  y <- digitChar
  return (x,y)
It's longer, but it's arguably more readable than either of the two using <*, which I always think looks like a typo for <*>.
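For comparison, the three spellings discussed in this thread can be put side by side. This is only a sketch using ReadP from base; digitChar here is a local stand-in for the library function of the same name, and run is a made-up helper.

```haskell
import Control.Applicative (liftA2)
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP (ReadP, char, readP_to_S, satisfy)

-- Local stand-in (an assumption) for the parser library's digitChar.
digitChar :: ReadP Char
digitChar = satisfy isDigit

applicativeForm, liftedForm, doForm :: ReadP (Char, Char)
applicativeForm = (,) <$> digitChar <* char '-' <*> digitChar
liftedForm      = liftA2 (,) (digitChar <* char '-') digitChar
doForm = do
  x <- digitChar
  _ <- char '-'
  y <- digitChar
  return (x, y)

-- Hypothetical helper: succeed only on a unique parse of the whole input.
run :: ReadP a -> String -> Maybe a
run p s = case [x | (x, "") <- readP_to_S p s] of
  [x] -> Just x
  _   -> Nothing

main :: IO ()
main = mapM_ (\p -> print (run p "0-9")) [applicativeForm, liftedForm, doForm]
```

All three accept the same input and build the same tuple; the difference is purely in how visibly the '-' separator sits between the two digits.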

understanding trifecta parser <|> and try

While reading the Haskell book I came across trifecta.
I'm trying to wrap my head around it, but I still can't understand <|>.
I have the following questions.
In simple words, is (<|>) a monadic choice?
p = a <|> b -- use parser a, and if that fails use b?
If yes, then why does the following parser fail?
parseFraction :: Parser Rational
parseFraction = do
  numerator <- decimal
  char '/'
  denominator <- decimal
  case denominator of
    0 -> fail "denominator cannot be zero"
    _ -> return (numerator % denominator)
type RationalOrDecimal = Either Rational Integer
parseRationalOrDecimal = (Left <$> parseFraction) <|> (Right <$> decimal)
main = do
  let p f i = parseString f mempty i
  print $ p (some (skipMany (oneOf "\n") *> parseRationalOrDecimal <* skipMany (oneOf "\n"))) "10"
In a perfect world, if parseFraction fails then <|> should fall through to decimal, but that is not the case.
When I use try, though, it works.
What am I missing?
Why do we need try when <|> should run the second parser once the first fails?
parseRationalOrDecimal = try (Left <$> parseFraction) <|> (Right<$> decimal)
The reason is that parseFraction consumes input before failing, so it is considered to be the correct branch of the choice. Let me give you an example:
Let's say you are writing a Python parser and you have to decide whether a declaration is a class or a function (keyword def). Then you write
parseExpresion = word "def" <|> word "class" -- DISCLAIMER: using a fictitious library
Then if the user writes def or class it will match, but if the user writes det, it will try the first branch, match de, and then fail because it expected f but found t. It will not bother to try the next parser, because the error is considered to be in the first branch. It would make little sense to try the class parser, since the error is most likely in the first branch.
In your case parseFraction matches some digits and then fails because / isn't found, so the decimal parser is never tried.
This is a design decision. Some other libraries use a different convention (e.g. Attoparsec always backtracks on failure), and some functions are documented as not consuming input (e.g. notFollowedBy).
Notice that there is a trade-off here:
First: if <|> behaved as you expect, the following
parse parseRationalOrDecimal "123456789A"
would first parse all the digits until "A" is found, and then parse the same digits again until "A" is found, doing the same computation twice just to return a failure.
Second: if you care more about error messages, the current behaviour is more convenient. Following the Python example, imagine:
parseExpresion = word "def" <|> word "class" <|> word "import" <|> word "type" <|> word "from"
If the user types "frmo", the parser will go to the last branch and raise an error like expected "from" but "frmo" was found. Whereas if all alternatives had to be checked, the error would be something more like expected one of "def", "class", "import", "type" or "from", which is further from the actual typo.
As I said, it is a library design decision. I am just trying to convince you that there are good reasons not to try all alternatives automatically, and to use try when you explicitly want that.
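The "consumed input" rule can be modelled in a few lines. The following is a toy sketch written from scratch with only base; it is not how trifecta or Parsec are implemented, and every name in it (Parser, orElse, try', charP, digits, fraction) is made up for illustration.

```haskell
import Data.Char (isDigit)

-- Toy parser: the Bool records whether any input was consumed; the Maybe
-- holds the result and the remaining input on success.
newtype Parser a = Parser { runParser :: String -> (Bool, Maybe (a, String)) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    (c, Just (a, rest)) -> (c, Just (f a, rest))
    (c, Nothing)        -> (c, Nothing)

instance Applicative Parser where
  pure a = Parser $ \s -> (False, Just (a, s))
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    (c1, Nothing)        -> (c1, Nothing)
    (c1, Just (f, rest)) -> case pa rest of
      (c2, Just (a, rest')) -> (c1 || c2, Just (f a, rest'))
      (c2, Nothing)         -> (c1 || c2, Nothing)

-- Choice in the style described above: the right branch runs only if the
-- left one failed WITHOUT consuming input.
orElse :: Parser a -> Parser a -> Parser a
orElse (Parser p) (Parser q) = Parser $ \s -> case p s of
  (False, Nothing) -> q s
  other            -> other

-- On failure, pretend that nothing was consumed.
try' :: Parser a -> Parser a
try' (Parser p) = Parser $ \s -> case p s of
  (_, Nothing) -> (False, Nothing)
  ok           -> ok

charP :: Char -> Parser Char
charP c = Parser $ \s -> case s of
  (x:xs) | x == c -> (True, Just (c, xs))
  _               -> (False, Nothing)

digits :: Parser Integer
digits = Parser $ \s -> case span isDigit s of
  ("", _)    -> (False, Nothing)
  (ds, rest) -> (True, Just (read ds, rest))

fraction :: Parser (Integer, Integer)
fraction = (,) <$> digits <* charP '/' <*> digits

withoutTry, withTry :: Parser (Either (Integer, Integer) Integer)
withoutTry = (Left <$> fraction) `orElse` (Right <$> digits)
withTry    = try' (Left <$> fraction) `orElse` (Right <$> digits)

parse :: Parser a -> String -> Maybe a
parse p s = fmap fst (snd (runParser p s))

main :: IO ()
main = do
  print (parse withoutTry "10")  -- Nothing
  print (parse withTry "10")     -- Just (Right 10)
  print (parse withTry "1/2")    -- Just (Left (1,2))
```

In withoutTry, fraction eats the digits "10" before failing on the missing '/', so orElse never runs the second branch; try' resets the consumed flag and the choice proceeds.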

Why does only the first defined infix operator parse when using Parsec's buildExpressionParser?

I'm trying to write a parser for the propositional calculus using Parsec. The parser uses the buildExpressionParser function from Text.Parsec.Expr. Here's the code where I define the logical operators.
operators = [ [Prefix (string "~" >> return Negation)]
            , [binary "&" Conjunction]
            , [binary "|" Disjunction]
            , [binary "->" Conditional]
            , [binary "<->" Biconditional]
            ]
binary n c = Infix (spaces >> string n >> spaces >> return c) AssocRight
expr = buildExpressionParser operators term
    <?> "compound expression"
I've omitted the parsers for variables, terms and parenthesised expressions, but if you think they may be relevant to the problem you can read the full source for the parser.
The parser succeeds for expressions which use only negation and conjunction, i.e. the only prefix operator and the first infix operator.
*Data.Logic.Propositional.Parser2> runPT expr () "" "p & ~q"
Right (p ∧ ¬q)
Expressions using any other operators fail on the first character of the operator, with an error like the following:
*Data.Logic.Propositional.Parser2> runPT expr () "" "p | q"
Left (line 1, column 3):
unexpected "|"
expecting space or "&"
If I comment out the line defining the parser for conjunctions, then the parser for disjunction will work (but the rest will still fail). Putting them all into a single list (i.e. of the same precedence) doesn't work either: the same problem still manifests itself.
Can anyone point out what I'm doing wrong? Many thanks.
Thanks to Daniel Fischer for such a prompt and helpful answer.
In order to finish making this parser work correctly, I also needed to handle repeated applications of the negation symbol, so that e.g. ~~p would parse correctly. This SO answer showed me how to do it, and the change I made to the parser can be found here.
Your problem is that with
binary n c = Infix (spaces >> string n >> spaces >> return c) AssocRight
the first infix operator tried consumes a space before it fails, so the later possibilities are not tried. (Parsec favours consuming parsers, and <|> only tries to run the second parser if the first failed without consuming any input.)
To have the other infix operators tried if the first fails, you could either wrap the binary parsers in a try
binary n c = Infix (try $ ...) AssocRight
so that when such a parser fails, it does not consume any input, or, better, and the conventional solution to that problem, remove the initial spaces from it,
binary n c = Infix (string n >> spaces >> return c) AssocRight
and have all your parsers consume spaces after the token they parsed
variable = do c <- letter
              spaces
              return $ Variable (Var c)
           <?> "variable"
parens p = do char '('
              spaces
              x <- p
              char ')'
              spaces
              return x
           <?> "parens"
Of course, if you have parsers for operators that share a common prefix, you would still need to wrap those in a try, so that if e.g. parsing >= fails, >>= can still be tried.
Mocking up a datatype for the propositions and changing the space-consuming behaviour as indicated above,
*PropositionalParser Text.Parsec> head $ runPT expr () "" "p | q -> r & s"
Right (Conditional (Disjunction (Variable (Var 'p')) (Variable (Var 'q'))) (Conjunction (Variable (Var 'r')) (Variable (Var 's'))))
even a more complicated expression is parsed.
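The "every token eats the whitespace after it" convention can be sketched independently of Parsec. The example below uses ReadP from base, which backtracks freely and therefore does not reproduce the original failure; it only illustrates the token discipline. The Prop type and the lexeme, symbol and parseProp helpers are assumptions made for this sketch.

```haskell
import Data.Char (isLower)
import Text.ParserCombinators.ReadP

-- Illustrative proposition type (an assumption, smaller than the asker's).
data Prop = Var Char | And Prop Prop | Or Prop Prop
  deriving (Eq, Show)

-- Hypothetical helper: run a parser, then skip trailing whitespace.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

symbol :: String -> ReadP String
symbol = lexeme . string

variable :: ReadP Prop
variable = Var <$> lexeme (satisfy isLower)

-- Operator parsers never start by skipping spaces: the previous token
-- has already consumed them.
conj, disj :: ReadP Prop
conj = chainr1 variable (And <$ symbol "&")
disj = chainr1 conj (Or <$ symbol "|")

-- Leading whitespace is skipped once, at the very start.
parseProp :: String -> Maybe Prop
parseProp s = case readP_to_S (skipSpaces *> disj <* eof) s of
  [(p, "")] -> Just p
  _         -> Nothing

main :: IO ()
main = print (parseProp "p | q & r")
-- Just (Or (Var 'p') (And (Var 'q') (Var 'r')))
```

Because no operator parser begins with optional whitespace, an operator either fails on its first character or is the right one, which is exactly the property the Parsec version needs.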

Complex Parsec Parsers

I don't quite know how else to ask. I think I need general guidance here. I've got something like this:
expr = buildExpressionParser table term
    <?> "expression"
term = choice [
      (float >>= return . EDouble)
    , try (natural >>= return . EInteger)
    , try (stringLiteral >>= return . EString)
    , try (reserved "true" >> return (EBool True))
    , try (reserved "false" >> return (EBool False))
    , try assign
    , try ifelse
    , try lambda
    , try array
    , try eseq
    , parens expr
    ]
    <?> "simple expression"
When I test that parser, though, I mostly get problems... like when I try to parse
(a,b) -> "b"
it is accepted by the lambda parser, but the expr parser hates it. And sometimes it even hangs up completely in eternal rules.
I've read through Write Yourself a Scheme, but it only parses the homogeneous source of Scheme.
Maybe I am generally thinking in the wrong direction.
EDIT: Here the internal parsers:
assign = do
    i <- identifier
    reservedOp "="
    e <- expr
    return $ EAssign i e
ifelse = do
    reserved "if"
    e <- expr
    reserved "then"
    a <- expr
    reserved "else"
    b <- expr
    return $ EIfElse e a b
lambda = do
    ls <- parens $ commaSep identifier
    reservedOp "->"
    e <- expr
    return $ ELambda ls e
array = (squares $ commaSep expr) >>= return . EArray
eseq = do
    a <- expr
    semi <|> (newline >>= (\x -> return [x]))
    b <- expr
    return $ ESequence a b
table = [
    [binary "*" EMult AssocLeft, binary "/" EDiv AssocLeft, binary "%" EMod AssocLeft ],
    [binary "+" EPlus AssocLeft, binary "-" EMinus AssocLeft ],
    [binary "~" EConcat AssocLeft],
    [prefixF "not" ENot],
    [binaryF "and" EAnd AssocLeft, binaryF "or" EAnd AssocLeft]
    ]
And by "hates it" I meant that it tells me it expects an integer or a floating point.
What Edward in the comments and I are both trying to do is mentally run your parser, and that is a little difficult without more of the parser to go on. I'm going to make some guesses here, and maybe they will help you refine your question.
Guess 1): You have tried GHCI> parse expr "(input)" "(a,b) -> \"b\"" and it has returned Left …. It would be helpful to know what the error was.
Guess 2): You have also tried GHCI> parse lambda "(input)" "(a,b) -> \"b\"" and it returned Right …. Based on this, Edward and I have both deduced that somewhere in either your term parser or perhaps in the generated expr parser there is a conflict. That is, some piece of the parser is succeeding in matching the beginning of the string and returning a value, but what remains is no longer valid. It would be helpful if you would try GHCI> parse term "(input)" "(a,b) -> \"b\"", as this would let us know whether the problem is in term or expr.
Guess 3): The string "(a,b)" is by itself a valid expression in the grammar as you have programmed it. (Though perhaps not as you intended to program it ;-). Try sending that through the expr parser and see what happens.
Guess 4): Your grammar is left recursive. This is what causes it to get stuck and loop forever. Parsec is an LL(k) parser. If you are used to Yacc and family, which are LR(1) or LR(k) parsers, the rules for recursion are exactly reversed. If you didn't understand this last sentence that's OK, but let us know.
Guess 5): The code in the expression builder looks like it came from the function's documentation. I think you may have found the term expression somewhere as well. If that is the case, could you point to where it came from? If not, could you explain in a few sentences how you think term ought to work?
General Advice: The large number of try statements is eventually (a.k.a. now) going to cause you grief. They are useful in some cases but also a little naughty. If the next character can determine which choice should succeed, there is no need for them. If you are just trying to get something running, lots of backtracking will reduce the number of intermediate forms, but it also hides pathological cases and makes errors more obscure.
There appears to be left recursion, which will cause the parser to hang if the choice in term ever gets to eseq:
expr -> term -> eseq -> expr
The term (a,b) will not parse as a lambda, or an array, so it will fall into the eseq loop.
I don't see why (a,b) -> "b" doesn't parse as an expr, since the choice in term should hit upon the lambda, which you say works, before reaching the eseq. What is the position reported in the parse error?

haskell parsec problem

I am new to Haskell and am learning the Parsec library.
Here is an example:
nesting :: Parser Int
nesting = do{ char '('
            ; n <- nesting
            ; char ')'
            ; m <- nesting
            ; return (max (n+1) m)
            }
        <|> return 0
So what are n and m? Why are n and m Ints, and why are they greater than 0?
Parsec is a monadic parsing library, so you probably should first introduce yourself to monads and the syntactic sugar that is the do notation.
nesting is a parser which you can see as a computation (monad) with a result of type Int.
Whenever you see code like this n <- nesting in a do block, it means run the monad nesting and bind the result to n.
To see how this parser works try running it by hand. For example use the string "()".
It goes like this:
Tries the parser in the do block, succeeds parsing '(', runs the parser recursively and binds the result to n.
Tries the parser in the do block, fails parsing '(', tries the next parser (return 0) which always succeeds with the value 0.
n now has the value 0, because that was the result of running the parser recursively. Next in the do block is the parser char ')', it succeeds, calls the parser again recursively and binds the result to m. Same as above the result in m is 0.
Now the whole result of the computation is max (n+1) m which is 1.
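As you can see, this parses nested parentheses. Roughly speaking, n holds the nesting depth inside the current pair of parentheses, while m holds the nesting depth of whatever follows that pair, so the overall result is the maximum nesting depth of the input.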
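To run the parser by machine rather than by hand, here is a sketch of the same definition ported to ReadP from base. This is an adaptation, not the Parsec original: ReadP's left-biased <++ is used in place of <|> to approximate Parsec's choice, and the depth wrapper is a made-up helper.

```haskell
import Text.ParserCombinators.ReadP (ReadP, char, eof, readP_to_S, (<++))

nesting :: ReadP Int
nesting = nested <++ return 0   -- fall back to 0 only if nested fails
  where
    nested = do
      _ <- char '('
      n <- nesting              -- depth inside this pair of parentheses
      _ <- char ')'
      m <- nesting              -- depth of whatever follows the pair
      return (max (n + 1) m)

-- Hypothetical wrapper: require the whole input to be consumed.
depth :: String -> Maybe Int
depth s = case readP_to_S (nesting <* eof) s of
  [(d, "")] -> Just d
  _         -> Nothing

main :: IO ()
main = do
  print (depth "()")      -- Just 1
  print (depth "(())()")  -- Just 2
  print (depth "")        -- Just 0
```

For "(())()", the first pair contributes n + 1 = 2 and the trailing "()" contributes m = 1, so the result is max 2 1 = 2.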

Resources