Foldl-like operator for Parsec - haskell

Suppose I have a function of this type:
once :: (a, b) -> Parser (a, b)
Now, I would like to repeatedly apply this parser (somewhat like using >>=) and use its last output to feed it in the next iteration.
Using something like
sequence :: (a, b) -> Parser (a, b)
sequence inp = once inp >>= sequence
with specifying the initial values for the first parser doesn't work, because it would go on until it inevitably fails. Instead, I would like it to stop when it would fail (somewhat like many).
Trying to fix it using try makes the computation too complex (adding try in each iteration).
sequence :: (a, b) -> Parser (a, b)
sequence inp = try (once inp >>= sequence) <|> pure inp
In other words, I am looking for a function somewhat similar to foldl on Parsers, which stops when the next Parser would fail.

If your once parser fails immediately without consuming input, you don't need try. As a concrete example, consider a rather silly once parser that uses a pair of delimiters to parse the next pair of delimiters:
once :: (Char, Char) -> Parser (Char, Char)
once (c1, c2) = (,) <$ char c1 <*> anyChar <*> anyChar <* char c2
You can parse a nested sequence using:
onces :: (Char, Char) -> Parser (Char, Char)
onces inp = (once inp >>= onces) <|> pure inp
which works fine:
> parseTest (onces ('(',')')) "([])[{}]{xy}xabyDONE"
('a','b')
You only need try if your once might fail after parsing input. For example, the following won't parse without try:
> parseTest (onces ('(',')')) "([])[not valid]"
parse error at (line 1, column 8):
unexpected "t"
expecting "]"
because we start parsing the opening delimiter [ before discovering not valid].
(With try, it returns the correct ('[',']').)
All that being said, I have no idea how you came to the conclusion that using try makes the computation "too complex". If you are just guessing from something you've read about try being potentially inefficient, then you've misunderstood. try can cause problems if it's used in a manner than can result in a big cascade of backtracking. That's not a problem here -- at most, you're backtracking a single once, so don't worry about it.

Related

How to parse a character range into a tuple

I want to parse strings like "0-9" into ('0', '9') but I think my two attempts look a bit clumsy.
numRange :: Parser (Char, Char)
numRange = (,) <$> digitChar <* char '-' <*> digitChar
numRange' :: Parser (Char, Char)
numRange' = liftM2 (,) (digitChar <* char '-') digitChar
I kind of expected that there already is an operator that sequences two parsers and returns both results in a tuple. If there is then I can't find it. I'm also having a hard time figuring out the desired signature in order to search on hoogle.
I tried Applicative f => f a -> f b -> f (a, b) based off the signature of <* but that only gives unrelated results.
The applicative form:
numRange = (,) <$> digitChar <* char '-' <*> digitChar
is standard. Anyone familiar with monadic parsers will immediately understand what this does.
The disadvantage of the liftM2 (or equivalently liftA2) form, or of a function with signature:
pair :: Applicative f => f a -> f b -> f (a, b)
pair = liftA2 (,)
is that the resulting parser expressions:
pair (digitChar <* char '-') digitChar
pair digitChar (char '-' *> digitChar)
obscure the fact that the char '-' syntax is not actually part of either digit parser. As a result, I think this is more likely to be confusing than the admittedly ugly applicative syntax.
I kind of expected that there already is an operator that sequences two parsers and returns both results in a tuple.
There is; it's liftA2 (,) as you noticed. However, you aren't sequencing two parser, you are sequencing three parsers. Even though you can treat this as a "metasequence" of two two-parser sequencing operations, those two operations are different:
In digitChar <* char '-', you ignore the result of the second parser (and in my opinion, <* always looks like a typo for <*>).
In ... <*> digitChar, you use both results.
If you don't like using the applicative operators directly, consider using do syntax along with the ApplicativeDo extension and write
numRange :: Parser (Char, Char)
numRange = do
x <- digitChar
char '-'
y <- digitChar
return (x,y)
It's longer, but it's arguably more readable than either of the two using <*, which I always think looks like a typo for <*>.

Describing my Parser type as a series of monad transformers

I have to describe the Parser type as a series of monad transformers.
As far as I understand, monad transformers are used to wrap monads into another monad. But I don't understand what is the task here.
Instead of defining a new type for Parser, you can simply define it as a type alias for a type created by one or more monad transformers. That is, you definition would look something like
type Parser a = SomeMonadT <some set of monads and types>
Your task, then, is to determine which monad transformer(s) to use, and what the arguments to the transformer(s) should be.
Before we begin to define combinators that act on parsers, we must choose a representation for a parser first.
A parser takes a string and produces an output that can be just about anything. A list parser will produce a list as it's output, an integer parser will produce Ints, a JSON parser might return a custom ADT representing a JSON.
Therefore, it makes sense to make Parser a polymorphic type. It also makes sense to return a list of results instead of a single result since grammar can be ambiguous, and there may be several ways to parse the same input string.
An empty list, then, implies the parser failed to parse the provided input.
newtype Parser a = Parser { parse :: String -> [(a, String)] }
You might wonder why we return the tuple (a, String), not just a. Well, a parser might not be able to parse the entire input string. Often, a parser is only intended to parse some prefix of the input, and let another parser do the rest of the parsing. Thus, we return a pair containing the parse result a and the unconsumed string subsequent parsers can use.
We can start by describing some basic parsers that do very little work. A result parser always succeeds in parsing without consuming the input string.
result :: a -> Parser a
result val = Parser $ \inp -> [(val, inp)]
item unconditionally accepts the first character of any input string.
item :: Parser Char
item = Parser parseItem
where
parseItem [] = []
parseItem (x:xs) = [(x, xs)]
Let's try some of these parsers in GHCi:
*Main> parse (result 42) "abc"
[(42, "abc")]
*Main> parse item "abc"
[('a', "bc")]
Say we want a parser that consumes a string if its first character satisfies a predicate. We can generalize this idea by writing a function that takes a (Char -> Bool) predicate and returns a parser that only consumes an input string if its first character returns True when supplied to the predicate.
The simplest solution for this would be:
sat :: (Char -> Bool) -> Parser Char
sat p = Parser parseIfSat
where
parseIfSat (x : xs) = if p x then [(x, xs)] else []
Using the previously defined item parser (this requires a Monad instance for the type Parser, which I leave to you as an exercise):
sat p =
-- Apply `item`, if it fails on an empty string, we simply short circuit and get `[]`.
item >>= \x ->
if p x
then result x
else zero
parseIfSat [] = []
Now we can use the sat combinator to describe several useful parsers. For example, parser for ASCII digits:
-- import Data.Char (isDigit, isLower, isUpper)
digit :: Parser Char
digit = sat isDigit
You get the idea. You start by defining elementary parsers, and use those to build more complex parsers. The type Parser shown here is actually StateT Monad Transformer, it combines State and [] in this case.
The code shown was taken from here.

"string" Implementation in Text.Parser.Char

First, just some quick context. I'm going through the Haskell Programming From First Principles book, and ran into the following exercise.
Try writing a Parser that does what string does, but using char.
I couldn't figure it out, so I checked out the source for the implementation. I'm currently trying to wrap my head around it. Here it is:
class Parsing m => CharParsing m where
-- etc.
string :: CharParsing m => String -> m String
string s = s <$ try (traverse_ char s) <?> show s
My questions are as follows, from most to least specific.
Why is show necessary?
Why is s <$ necessary? Doesn't traverse char s <?> s work the same? In other words, why do we throw away the results of the traversal?
What is going on with the traversal? I get what a list traversal does, so I guess I'm confused about the Applicative/Monad instances for Parser. On a high level, I get that the traversal applies char, which has type CharParsing m => Char -> m Char, to every character in string s, and then collects all the results into something of type Parser [Char]. So the types make sense, but I have no idea what's going on in the background.
Thanks in advance!
1) Why is show necessary?
Because showing a string (or a Text, etc.) escapes special characters, which makes sense for error messages:
GHCi> import Text.Parsec -- Simulating your scenario with Parsec.
GHCi> runParser ((\s -> s <$ try (traverse_ char s) <?> s) "foo\nbar") () "" "foo"
Left (line 1, column 4):
unexpected end of input
expecting foo
bar
GHCi> runParser ((\s -> s <$ try (traverse_ char s) <?> show s) "foo\nbar") () "" "foo"
Left (line 1, column 4):
unexpected end of input
expecting "foo\nbar"
2) Why is s <$ necessary? Doesn't traverse char s <?> s work the same? In other words, why do we throw away the results of the traversal?
The result of the parse is unnecessary because we know in advance that it would be s (if the parse were successful). traverse would needlessly reconstruct s from the results of parsing each individual character. In general, if the results are not needed it is a good idea to use traverse_ (which just combines the effects, discarding the results without trying to rebuild the data structure) rather than traverse, so that is likely why the function is written the way it is.
3) What is going on with the traversal?
traverse_ char s (traverse_, and not traverse, as explained above) is a parser. It tries to parse, in order, each character in s, while discarding the results, and it is built by sequencing parsers for each character in s. It may be helpful to remind that traverse_ is just a fold which uses (*>):
-- Slightly paraphrasing the definition in Data.Foldable:
traverse_ :: (Foldable t, Applicative f) => (a -> f b) -> t a -> f ()
traverse_ f = foldr (\x u -> f x *> u) (pure ())

Haskell parser combinator - do notation

I was reading a tutorial regarding building a parser combinator library and i came across a method which i don't quite understand.
newtype Parser a = Parser {parse :: String -> [(a,String)]}
chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a
chainl p op a = (p `chainl1` op) <|> return a
chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
p `chainl1` op = do {a <- p; rest a}
where rest a = (do f <- op
b <- p
rest (f a b))
<|> return a
bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s
the bind is the implementation of the (>>=) operator. I don't quite get how the chainl1 function works. From what I can see you extract f from op and then you apply it to f a b and you recurse, however I do not get how you extract a function from the parser when it should return a list of tuples?
Start by looking at the definition of Parser:
newtype Parser a = Parser {parse :: String -> [(a,String)]}`
A Parser a is really just a wrapper around a function (that we can run later with parse) that takes a String and returns a list of pairs, where each pair contains an a encountered when processing the string, along with the rest of the string that remains to be processed.
Now look at the part of the code in chainl1 that's confusing you: the part where you extract f from op:
f <- op
You remarked: "I do not get how you extract a function from the parser when it should return a list of tuples."
It's true that when we run a Parser a with a string (using parse), we get a list of type [(a,String)] as a result. But this code does not say parse op s. Rather, we are using bind here (with the do-notation syntactic sugar). The problem is that you're thinking about the definition of the Parser datatype, but you're not thinking much about what bind specifically does.
Let's look at what bind is doing in the Parser monad a bit more carefully.
bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s
What does p >>= f do? It returns a Parser that, when given a string s, does the following: First, it runs parser p with the string to be parsed, s. This, as you correctly noted, returns a list of type [(a, String)]: i.e. a list of the values of type a encountered, along with the string that remained after each value was encountered. Then it takes this list of pairs and applies a function to each pair. Specifically, each (a, s') pair in this list is transformed by (1) applying f to the parsed value a (f a returns a new parser), and then (2) running this new parser with the remaining string s'. This is a function from a tuple to a list of tuples: (a, s') -> [(b, s'')]... and since we're mapping this function over every tuple in the original list returned by parse p s, this ends up giving us a list of lists of tuples: [[(b, s'')]]. So we concatenate (or join) this list into a single list [(b, s'')]. All in all then, we have a function from s to [(b, s'')], which we then wrap in a Parser newtype.
The crucial point is that when we say f <- op, or op >>= \f -> ... that assigns the name f to the values parsed by op, but f is not a list of tuples, b/c it is not the result of running parse op s.
In general, you'll see a lot of Haskell code that defines some datatype SomeMonad a, along with a bind method that hides a lot of the dirty details for you, and lets you get access to the a values you care about using do-notation like so: a <- ma. It may be instructive to look at the State a monad to see how bind passes around state behind the scenes for you. Similarly, here, when combining parsers, you care most about the values the parser is supposed to recognize... bind is hiding all the dirty work that involves the strings that remain upon recognizing a value of type a.

Convert String to Tuple, Special Formatting in Haskell

For a test app, I'm trying to convert a special type of string to a tuple. The string is always in the following format, with an int (n>=1) followed by a character.
Examples of Input String:
"2s"
"13f"
"1b"
Examples of Desired Output Tuples (Int, Char):
(2, 's')
(13, 'f')
(1, 'b')
Any pointers would be extremely appreciated. Thanks.
You can use readS to parse the int and get the rest of the string:
readTup :: String -> (Int, Char)
readTup s = (n, head rest)
where [(n, rest)] = reads s
a safer version would be:
maybeReadTup :: String -> Maybe (Int, Char)
maybeReadTup s = do
[(n, [c])] <- return $ reads s
return (n, c)
Here's one way to do it:
import Data.Maybe (listToMaybe)
parseTuple :: String -> Maybe (Int, Char)
parseTuple s = do
(int, (char:_)) <- listToMaybe $ reads s
return (int, char)
This uses the Maybe Monad to express the possible parse failure. Note that if the (char:_) pattern fails to match (i.e., if there is only a number with no character after it), this gets translated into a Nothing result (this is due to how do notation works in Haskell. It calls the fail function of the Monad if pattern matches fail. In the case of Maybe a, we have fail _ = Nothing). The function also evaluates to Nothing if reads can't read an Int at the beginning of the input. If this happens, reads gives [] which is then turned into Nothing by listToMaybe.

Resources