understanding trifecta parser <|> and try - haskell

While reading the Haskell book, I came across trifecta.
I'm trying to wrap my head around it, but I'm still not able to understand <|>.
I have the following questions:
In simple words, is (<|>) a monadic choice?
p = a <|> b -- use parser a, and if that fails, use b?
If so, why does the following parser fail?
import Control.Applicative ((<|>), some)
import Data.Ratio ((%))
import Text.Trifecta
parseFraction :: Parser Rational
parseFraction = do
  numerator <- decimal
  char '/'
  denominator <- decimal
  case denominator of
    0 -> fail "denominator cannot be zero"
    _ -> return (numerator % denominator)
type RationalOrDecimal = Either Rational Integer
-- Fails on "10": parseFraction consumes the digits before the missing '/'.
parseRationalOrDecimal :: Parser RationalOrDecimal
parseRationalOrDecimal = (Left <$> parseFraction) <|> (Right <$> decimal)
main :: IO ()
main = do
  let p f i = parseString f mempty i
  print $ p (some (skipMany (oneOf "\n") *> parseRationalOrDecimal <* skipMany (oneOf "\n"))) "10"
In a perfect world, if parseFraction fails, <|> should fall back to decimal, but that is not what happens.
When I use try, however, it works.
What am I missing?
Why do we need try when <|> should run the second parser after the first one fails?
parseRationalOrDecimal = try (Left <$> parseFraction) <|> (Right<$> decimal)

The reason is that parseFraction consumes input before failing; therefore, it is considered to be the correct branch of the choice. Let me give you an example:
Let's say you are writing a Python parser and have to decide whether a declaration is a class or a function (keyword def). Then you write
parseExpression = word "def" <|> word "class" -- DISCLAIMER: using a fictitious library
Then if the user writes def or class, it will match, but if the user writes det, it will try the first branch, match de, and then fail because it expected f but found t. It will not bother to try the next parser, because the error is considered to be in the first branch. It would make little sense to try the class parser, since the error is most likely in the first branch.
In your case, parseFraction matches some digits and then fails because / isn't found, so it doesn't bother to try the decimal parser.
This is a design decision. Some other libraries use a different convention (e.g., Attoparsec always backtracks on failure), and some functions are documented not to consume input (e.g., notFollowedBy).
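The rule is easier to see in a toy model. The sketch below uses only base and is a hypothetical illustration, not trifecta's implementation: `Parser`, `try'`, `decimal`, `fraction` and `parseValue` are made up for this example. Each result records whether input was consumed, `<|>` only runs its right-hand parser when the left one failed without consuming anything, and `try'` rewrites a consuming failure as a non-consuming one.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit)
import Data.Ratio ((%))

-- Toy model (not trifecta's real implementation): every result records
-- whether the parser consumed any input before succeeding or failing.
data Result a
  = Ok a String Bool   -- value, remaining input, consumed?
  | Err String Bool    -- message, consumed?

newtype Parser a = Parser { runParser :: String -> Result a }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Ok a rest c -> Ok (f a) rest c
    Err e c     -> Err e c

instance Applicative Parser where
  pure a = Parser $ \s -> Ok a s False
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Err e c      -> Err e c
    Ok f rest c1 -> case pa rest of
      Ok a rest' c2 -> Ok (f a) rest' (c1 || c2)
      Err e c2      -> Err e (c1 || c2)

-- Committed choice: the right parser runs only if the left one failed
-- WITHOUT consuming input.
instance Alternative Parser where
  empty = Parser $ \_ -> Err "empty" False
  Parser p <|> Parser q = Parser $ \s -> case p s of
    Err _ False -> q s
    other       -> other

-- Toy `try`: a failure is reported as if no input had been consumed.
try' :: Parser a -> Parser a
try' (Parser p) = Parser $ \s -> case p s of
  Err e _ -> Err e False
  ok      -> ok

satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \s -> case s of
  (c : rest) | ok c -> Ok c rest True
  _                 -> Err "unexpected input" False

char :: Char -> Parser Char
char c = satisfy (== c)

decimal :: Parser Integer
decimal = read <$> some (satisfy isDigit)

fraction :: Parser Rational
fraction = (%) <$> decimal <* char '/' <*> decimal

withoutTry, withTry :: Parser (Either Rational Integer)
withoutTry = (Left <$> fraction) <|> (Right <$> decimal)
withTry    = try' (Left <$> fraction) <|> (Right <$> decimal)

-- Keep only the parsed value (ignoring leftover input).
parseValue :: Parser a -> String -> Maybe a
parseValue p s = case runParser p s of
  Ok a _ _ -> Just a
  Err _ _  -> Nothing
```

Here `parseValue withoutTry "10"` is `Nothing`, because `fraction` consumed "10" before failing on the missing '/', while `parseValue withTry "10"` is `Just (Right 10)`. Real Parsec-family `try` also rewinds the input; in this model the right-hand parser simply receives the original string.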
Notice that there is a trade-off here:
First: if <|> behaved as you expect, the following
parse parseRationalOrDecimal "123456789A"
would first parse all the digits until "A" is found and then parse all the digits again until "A" is found, doing the same computation twice just to return a failure.
Second: if you care more about error messages, the current behaviour is more convenient. Following the Python example, imagine:
parseExpression = word "def" <|> word "class" <|> word "import" <|> word "type" <|> word "from"
If the user types "frmo", the parser will go to the last branch and raise an error like expected "from" but "frmo" was found, whereas if all alternatives had to be checked, the error would be something more like expected one of "def", "class", "import", "type" or "from", which points less directly at the actual typo.
As I said, it is a library design decision. I am just trying to convince you that there are good reasons not to try all alternatives automatically, and to use try when you explicitly want to do so.

Related

program mode determined from arguments count [optparse-applicative] [duplicate]

I'm trying to use the optparse-applicative library in a program that should perform a different action depending on the number of arguments.
For example, the argument parsing for a program which calculates perimeters:
module TestOpts where
import Options.Applicative
type Length = Double
data PerimeterCommand
  = GeneralQuadranglePerimeter Length Length Length Length
  | RectanglePerimeter Length Length
parsePerimeterCommand :: Parser PerimeterCommand
parsePerimeterCommand = parseQuadPerimeter <|> parseRectPerimeter
parseQuadPerimeter = GeneralQuadranglePerimeter <$>
  parseLength "SIDE1" <*>
  parseLength "SIDE2" <*>
  parseLength "SIDE3" <*>
  parseLength "SIDE4"
parseRectPerimeter = RectanglePerimeter <$>
  parseLength "WIDTH" <*> parseLength "HEIGHT"
parseLength name = argument auto (metavar name)
Only the first argument to <|> will ever successfully parse. I think some kind of argument backtracking is required, similar to Parsec's try combinator.
Any ideas on how to parse alternative sets of arguments, when the first alternative may consume some arguments of the next alternative?
Please note: this answer was written by the optparse-applicative author, Paolo Capriotti.
You can't do this with optparse-applicative directly. The main feature
of optparse-applicative is that options can be parsed in any order. If
you want to work mainly with arguments (which are positional), you are
better off having two levels of parsers: use many argument in
optparse-applicative, then pass the resulting array to a normal parser
(say using Parsec). If you only have positional arguments, then
optparse-applicative won't buy you very much, and you could just parse
the arguments manually with Parsec.
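Following the suggestion to collect the positional arguments first and parse the list afterwards, a base-only sketch could dispatch on how many arguments arrived. This parses the list by hand rather than with Parsec; `perimeterCommand` and `perimeter` are hypothetical helpers, and in a real program the `[String]` might come from something like `many (strArgument (metavar "LENGTH"))` in optparse-applicative.

```haskell
import Text.Read (readMaybe)

type Length = Double

data PerimeterCommand
  = GeneralQuadranglePerimeter Length Length Length Length
  | RectanglePerimeter Length Length
  deriving (Show, Eq)

-- Decide which command was meant from the number of positional arguments.
perimeterCommand :: [String] -> Either String PerimeterCommand
perimeterCommand args = do
  lengths <- traverse readLength args
  case lengths of
    [a, b, c, d] -> Right (GeneralQuadranglePerimeter a b c d)
    [w, h]       -> Right (RectanglePerimeter w h)
    _            -> Left ("expected 2 or 4 lengths, got " ++ show (length lengths))
  where
    readLength s = maybe (Left ("not a number: " ++ s)) Right (readMaybe s)

perimeter :: PerimeterCommand -> Length
perimeter (GeneralQuadranglePerimeter a b c d) = a + b + c + d
perimeter (RectanglePerimeter w h) = 2 * (w + h)
```

Because the whole list is visible at once, no backtracking is needed: its length selects the branch before any field is interpreted.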

What does Haskell's <|> operator do?

Going through Haskell's documentation is always a bit of a pain for me, because all the information you get about a function is often nothing more than a type such as f a -> f [a], which could mean any number of things.
That is the case with the <|> function.
All I'm given is (<|>) :: f a -> f a -> f a and that it's an "associative binary operation"...
Upon inspecting Control.Applicative, I learn that it does seemingly unrelated things depending on the instance.
instance Alternative Maybe where
  empty = Nothing
  Nothing <|> r = r
  l <|> _ = l
OK, so it returns the right argument if there is no left one; otherwise it returns the left one. Got it. This leads me to believe it's a "left or right" operator, which kind of makes sense given its use of | and |'s historical use as "OR".
instance Alternative [] where
  empty = []
  (<|>) = (++)
Except that here it just calls the list concatenation operator, which breaks my idea down...
So what exactly is that function? What is it used for? Where does it fit in the grand scheme of things?
Typically it means "choice" or "parallel" in that a <|> b is either a "choice" of a or b or a and b done in parallel. But let's back up.
Really, there is no practical meaning to operations in typeclasses like (<*>) or (<|>). These operations are given meaning in two ways: (1) via laws and (2) via instantiations. If we are not talking about a particular instance of Alternative then only (1) is available for intuiting meaning.
So "associative" means that a <|> (b <|> c) is the same as (a <|> b) <|> c. This is useful as it means that we only care about the sequence of things chained together with (<|>), not their "tree structure".
Other laws include identity with empty. In particular, a <|> empty = empty <|> a = a. In our intuition with "choice" or "parallel" these laws read as "a or (something impossible) must be a" or "a alongside (empty process) is just a". It indicates that empty is some kind of "failure mode" for an Alternative.
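These laws can be spot-checked for the two instances quoted in the question. The small sketch below uses only base (the helper names `associative` and `identity` are made up here) and evaluates both sides of each law on a few sample values; these are spot checks, not proofs.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Applicative (Alternative (..))

-- a <|> (b <|> c) == (a <|> b) <|> c
associative :: (Alternative f, Eq (f a)) => f a -> f a -> f a -> Bool
associative a b c = (a <|> (b <|> c)) == ((a <|> b) <|> c)

-- empty <|> a == a  and  a <|> empty == a
identity :: (Alternative f, Eq (f a)) => f a -> Bool
identity a = (empty <|> a) == a && (a <|> empty) == a

maybeChecks :: [Bool]
maybeChecks =
  [ associative Nothing (Just 'b') (Just 'c')
  , associative (Just 'a') Nothing (Just 'c')
  , identity (Just 'a')
  , identity (Nothing :: Maybe Char)
  ]

listChecks :: [Bool]
listChecks =
  [ associative [1, 2] [3] [4, 5 :: Int]
  , identity [1, 2, 3 :: Int]
  , identity ([] :: [Int])
  ]
```

For `Maybe`, `empty` is `Nothing` and `<|>` keeps the first `Just`; for lists, `empty` is `[]` and `<|>` is `++`. Both satisfy the same two laws.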
There are other laws with how (<|>)/empty interact with fmap (from Functor) or pure/(<*>) (from Applicative), but perhaps the best way to move forward in understanding the meaning of (<|>) is to examine a very common example of a type which instantiates Alternative: a Parser.
If x :: Parser A and y :: Parser B then (,) <$> x <*> y :: Parser (A, B) parses x and then y in sequence. In contrast, (fmap Left x) <|> (fmap Right y) parses either x or y, beginning with x, to try out both possible parses. In other words, it indicates a branch in your parse tree, a choice, or a parallel parsing universe.
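base ships a small parser library, Text.ParserCombinators.ReadP, whose `<|>` is a symmetric choice that keeps every successful parse, which makes the "parallel parsing universe" reading concrete. In this sketch `x` and `y` are hypothetical stand-ins for the `x :: Parser A` and `y :: Parser B` above: `x` reads digits and `y` reads letters or digits, so an input like "42" can be parsed both ways.

```haskell
import Control.Applicative ((<|>))
import Data.Char (isAlphaNum, isDigit)
import Text.ParserCombinators.ReadP (ReadP, char, eof, munch1, readP_to_S)

x :: ReadP Int
x = read <$> munch1 isDigit

y :: ReadP String
y = munch1 isAlphaNum

-- Sequence: x, then a space, then y.
xThenY :: ReadP (Int, String)
xThenY = (,) <$> x <* char ' ' <*> y

-- Choice: both branches are explored; every successful parse is kept.
xOrY :: ReadP (Either Int String)
xOrY = (Left <$> x) <|> (Right <$> y)

-- Run a parser and keep the results that consumed the whole input.
complete :: ReadP a -> String -> [a]
complete p s = [a | (a, _) <- readP_to_S (p <* eof) s]
```

`complete xOrY "42"` contains both `Left 42` and `Right "42"`, while `complete xOrY "ab"` is `[Right "ab"]`. A Parsec-style `<|>` would instead stop at the first branch that succeeds and return only `Left 42` for "42".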
(<|>) :: f a -> f a -> f a actually tells you quite a lot, even without considering the laws for Alternative.
It takes two f a values and has to give one back. So it will have to combine or select from its inputs somehow. It's polymorphic in the type a, so it will be completely unable to inspect whatever values of type a might be inside an f a; this means it can't do the "combining" by combining a values, so it must do it purely in terms of whatever structure the type constructor f adds.
The name helps a bit too. Some sort of "OR" is indeed the vague concept the authors were trying to indicate with the name "Alternative" and the symbol "<|>".
Now if I've got two Maybe a values and I have to combine them, what can I do? If they're both Nothing I'll have to return Nothing, with no way to create an a. If at least one of them is a Just ... I can return one of my inputs as-is, or I can return Nothing. There are very few functions that are even possible with the type Maybe a -> Maybe a -> Maybe a, and for a class whose name is "Alternative" the one given is pretty reasonable and obvious.
How about combining two [a] values? There are more possible functions here, but it's pretty obvious what this is likely to do. The name "Alternative" gives you a good hint, provided you're familiar with the standard "nondeterminism" interpretation of the list monad/applicative: if you see an [a] as a "nondeterministic a" with a collection of possible values, then the obvious way of "combining two nondeterministic a values" that might deserve the name "Alternative" is to produce a nondeterministic a that could be any of the values from either of the inputs.
And for parsers; combining two parsers has two obvious broad interpretations that spring to mind; either you produce a parser that would match what the first does and then what the second does, or you produce a parser that matches either what the first does or what the second does (there are of course subtle details of each of these options that leave room for options). Given the name "Alternative", the "or" interpretation seems very natural for <|>.
So, seen from a sufficiently high level of abstraction, these operations do all "do the same thing". The type class is really for operating at that high level of abstraction where these things all "look the same". When I'm operating on a single known instance I just think of the <|> operation as exactly what it does for that specific type.
An interesting example of an Alternative that isn't a parser or a MonadPlus-like thing is Concurrently, a very useful type from the async package.
For Concurrently, empty is a computation that goes on forever. And (<|>) executes its arguments concurrently, returns the result of the first one that completes, and cancels the other one.
These seem very different, but consider:
Nothing <|> Nothing == Nothing
[] <|> [] == []
Just a <|> Nothing == Just a
[a] <|> [] == [a]
Nothing <|> Just b == Just b
[] <|> [b] == [b]
So... these are actually very, very similar, even if the implementation looks different. The only real difference is here:
Just a <|> Just b == Just a
[a] <|> [b] == [a, b]
A Maybe can only hold one value (or zero, but not any other amount). But hey, if they were both identical, why would you need two different types? The whole point of them being different is, you know, to be different.
In summary, the implementation may look totally different, but these are actually quite similar.

Why does only the first defined infix operator parse when using Parsec's buildExpressionParser?

I'm trying to write a parser for the propositional calculus using Parsec. The parser uses the buildExpressionParser function from Text.Parsec.Expr. Here's the code where I define the logical operators.
operators = [ [Prefix (string "~" >> return Negation)]
            , [binary "&" Conjunction]
            , [binary "|" Disjunction]
            , [binary "->" Conditional]
            , [binary "<->" Biconditional]
            ]
binary n c = Infix (spaces >> string n >> spaces >> return c) AssocRight
expr = buildExpressionParser operators term
       <?> "compound expression"
I've omitted the parsers for variables, terms and parenthesised expressions, but if you think they may be relevant to the problem you can read the full source for the parser.
The parser succeeds for expressions which use only negation and conjunction, i.e. the only prefix operator and the first infix operator.
*Data.Logic.Propositional.Parser2> runPT expr () "" "p & ~q"
Right (p ∧ ¬q)
Expressions using any other operators fail on the first character of the operator, with an error like the following:
*Data.Logic.Propositional.Parser2> runPT expr () "" "p | q"
Left (line 1, column 3):
unexpected "|"
expecting space or "&"
If I comment out the line defining the parser for conjunctions, then the parser for disjunction will work (but the rest will still fail). Putting them all into a single list (i.e. of the same precedence) doesn't work either: the same problem still manifests itself.
Can anyone point out what I'm doing wrong? Many thanks.
Thanks to Daniel Fischer for such a prompt and helpful answer.
In order to finish making this parser work correctly, I also needed to handle repeated applications of the negation symbol, so that e.g. ~~p would parse correctly. This SO answer showed me how to do it, and the change I made to the parser can be found here.
Your problem is that
binary n c = Infix (spaces >> string n >> spaces >> return c) AssocRight
the first tried infix operator consumes a space before it fails, so the later possibilities are not tried. (Parsec favours consuming parsers, and <|> only tries to run the second parser if the first failed without consuming any input.)
To have the other infix operators tried if the first fails, you could either wrap the binary parsers in a try
binary n c = Infix (try $ ...) AssocRight
so that when such a parser fails, it does not consume any input, or, better, and the conventional solution to that problem, remove the initial spaces from it,
binary n c = Infix (string n >> spaces >> return c) AssocRight
and have all your parsers consume spaces after the token they parsed
variable = do c <- letter
              spaces
              return $ Variable (Var c)
           <?> "variable"
parens p = do char '('
              spaces
              x <- p
              char ')'
              spaces
              return x
           <?> "parens"
Of course, if you have parsers that can parse operators with a common prefix, you would still need to wrap those in a try, so that if e.g. parsing >= fails, >>= can still be tried.
Mocking up a datatype for the propositions and changing the space-consuming behaviour as indicated above,
*PropositionalParser Text.Parsec> head $ runPT expr () "" "p | q -> r & s"
Right (Conditional (Disjunction (Variable (Var 'p')) (Variable (Var 'q'))) (Conjunction (Variable (Var 'r')) (Variable (Var 's'))))
even a more complicated expression is parsed.
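Putting that advice together, here is a self-contained sketch using Parsec's Text.Parsec.Expr. The `Prop` type is a mock-up (the question's real type isn't shown), and `lexeme`, `symbol` and `parseProp` are small local helpers rather than library functions. Every token parser consumes only the spaces that follow it, so an operator whose first character doesn't match fails without consuming input, and the next row of the table gets its turn.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String (Parser)

-- Mock-up of the proposition type; the question's real type is not shown.
data Prop
  = Var Char
  | Negation Prop
  | Conjunction Prop Prop
  | Disjunction Prop Prop
  | Conditional Prop Prop
  | Biconditional Prop Prop
  deriving (Show, Eq)

-- Run a token parser, then skip the spaces that FOLLOW it.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: String -> Parser String
symbol = lexeme . string

variable :: Parser Prop
variable = lexeme (Var <$> letter) <?> "variable"

parens :: Parser a -> Parser a
parens = between (symbol "(") (symbol ")")

term :: Parser Prop
term = parens expr <|> variable

-- Highest precedence first, as in the question.
operators :: OperatorTable String () Identity Prop
operators =
  [ [Prefix (Negation <$ symbol "~")]
  , [binary "&" Conjunction]
  , [binary "|" Disjunction]
  , [binary "->" Conditional]
  , [binary "<->" Biconditional]
  ]
  where
    -- No leading `spaces`: a non-matching operator fails on its first
    -- character without consuming input.
    binary name con = Infix (con <$ symbol name) AssocRight

expr :: Parser Prop
expr = buildExpressionParser operators term <?> "compound expression"

-- Skip leading spaces once, then require the whole input to be consumed.
parseProp :: String -> Either ParseError Prop
parseProp = parse (spaces *> expr <* eof) ""
```

With this layout `parseProp "p | q -> r & s"` yields `Right (Conditional (Disjunction (Var 'p') (Var 'q')) (Conjunction (Var 'r') (Var 's')))`, matching the result shown above. As noted in the question's edit, a single `Prefix` entry still doesn't accept repeated negation such as `~~p`.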

Complex Parsec Parsers

I don't quite know how else to ask. I think I need general guidance here. I've got something like this:
expr = buildExpressionParser table term
    <?> "expression"
term = choice [
      (float >>= return . EDouble)
    , try (natural >>= return . EInteger)
    , try (stringLiteral >>= return . EString)
    , try (reserved "true" >> return (EBool True))
    , try (reserved "false" >> return (EBool False))
    , try assign
    , try ifelse
    , try lambda
    , try array
    , try eseq
    , parens expr
    ]
    <?> "simple expression"
When I test that parser, though, I mostly get problems... like when I try to parse
(a,b) -> "b"
it is accepted by the lambda parser, but the expr parser hates it. And sometimes it even hangs completely, looping forever.
I've read through Write Yourself a Scheme, but it only parses the homogeneous source of Scheme.
Maybe I am generally thinking in the wrong direction.
EDIT: Here are the internal parsers:
assign = do
  i <- identifier
  reservedOp "="
  e <- expr
  return $ EAssign i e
ifelse = do
  reserved "if"
  e <- expr
  reserved "then"
  a <- expr
  reserved "else"
  b <- expr
  return $ EIfElse e a b
lambda = do
  ls <- parens $ commaSep identifier
  reservedOp "->"
  e <- expr
  return $ ELambda ls e
array = (squares $ commaSep expr) >>= return . EArray
eseq = do
  a <- expr
  semi <|> (newline >>= (\x -> return [x]))
  b <- expr
  return $ ESequence a b
table = [
    [binary "*" EMult AssocLeft, binary "/" EDiv AssocLeft, binary "%" EMod AssocLeft ],
    [binary "+" EPlus AssocLeft, binary "-" EMinus AssocLeft ],
    [binary "~" EConcat AssocLeft],
    [prefixF "not" ENot],
    [binaryF "and" EAnd AssocLeft, binaryF "or" EAnd AssocLeft]
  ]
And by "hates it" I mean that it tells me it expects an integer or a floating-point number.
What Edward in the comments and I are both trying to do is mentally run your parser, and that is a little difficult without more of the parser to go on. I'm going to make some guesses here, and maybe they will help you refine your question.
Guess 1): You have tried GHCI> parse expr "(input)" "(a,b) -> \"b\"" and it returned Left …. It would be helpful to know what the error was.
Guess 2): You have also tried GHCI> parse lambda "(input)" "(a,b) -> \"b\"" and it returned Right …. Based on this, Edward and I have both deduced that somewhere in either your term parser or the generated expr parser there is a conflict. That is, some piece of the parser succeeds in matching the beginning of the string and returns a value, but what remains is no longer valid. It would be helpful if you would try GHCI> parse term "(input)" "(a,b) -> \"b\"", as this would tell us whether the problem is in term or expr.
Guess 3): The string "(a,b)" is by itself a valid expression in the grammar as you have programmed it. (Though perhaps not as you intended to program it ;-). Try sending that through the expr parser and see what happens.
Guess 4): Your grammar is left recursive. This is what causes it to get stuck and loop forever. Parsec is an LL(k) parser. If you are used to Yacc and family, which are LR(1) or LR(k) parsers, the rules for recursion are exactly reversed. If you didn't understand this last sentence, that's OK, but let us know.
Guess 5): The code in the expression builder looks like it came from the function's documentation. I think you may have found the term expression somewhere as well. If that is the case, could you point to where it came from? If not, could you explain in a few sentences how you think term ought to work?
General Advice: The large number of try statements are eventually (a.k.a. now) going to cause you grief. They are useful in some cases but also a little naughty. If the next character can determine what choice should succeed there is no need for them. If you are just trying to get something running lots of backtracking will reduce the number of intermediate forms, but it also hides pathological cases and makes errors more obscure.
There appears to be left recursion, which will cause the parser to hang if the choice in term ever gets to eseq:
expr -> term -> eseq -> expr
The term (a,b) will not parse as a lambda, or an array, so it will fall into the eseq loop.
I don't see why (a,b) -> "b" doesn't parse as an expr, since the choice in term should hit upon the lambda, which you say works, before reaching the eseq. What is the position reported in the parse error?
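One common way out of the expr -> term -> eseq -> expr loop is to take the sequence rule out of term and build sequences one level above it, so every recursive call has to consume a token first. A minimal Parsec sketch under heavy simplifying assumptions: the reduced `E` type reuses two of the question's constructor names but is otherwise hypothetical, `term` stands in for the whole `buildExpressionParser table term` layer, and only integers, parentheses and `;`-separated sequences are handled.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical, heavily reduced AST.
data E = EInteger Integer | ESequence E E
  deriving (Show, Eq)

lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: String -> Parser String
symbol = lexeme . string

-- term must consume a digit or '(' before any recursive call.
term :: Parser E
term = EInteger . read <$> lexeme (many1 digit)
   <|> between (symbol "(") (symbol ")") eseq

-- Sequences live ABOVE term: `a; b; c` becomes ESequence a (ESequence b c).
eseq :: Parser E
eseq = chainr1 term (ESequence <$ symbol ";")

parseE :: String -> Either ParseError E
parseE = parse (spaces *> eseq <* eof) ""
```

Because `term` can no longer reach `eseq` without first consuming input, the parser always makes progress, and `chainr1` replaces the left-recursive `eseq = expr ; expr` rule.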
