Infinite loop when calling many on custom Parser [duplicate] - haskell

I'm trying to implement my own Applicative parser, here's the code I use:
{-# LANGUAGE ApplicativeDo, LambdaCase #-}
module Parser where
-- Implementation of an Applicative Parser
import Data.Char
import Control.Applicative (some, many, empty, (<*>), (<$>), (<|>), Alternative)
data Parser a = Parser { runParser :: String -> [(a, String)] }
instance Functor Parser where
-- fmap :: (a -> b) -> (Parser a -> Parser b)
fmap f (Parser p) = Parser (\s -> [(f a, s') | (a,s') <- p s])
instance Applicative Parser where
-- pure :: a -> Parser a
-- <*> :: Parser (a -> b) -> Parser a -> Parser b
pure x = Parser $ \s -> [(x, s)]
(Parser pf) <*> (Parser p) = Parser $ \s ->
[(f a, s'') | (f, s') <- pf s, (a, s'') <- p s']
instance Alternative Parser where
-- empty :: Parser a
-- <|> :: Parser a -> Parser a -> Parser a
empty = Parser $ \_s -> []
(Parser p1) <|> (Parser p2) = Parser $ \s ->
case p1 s of [] -> p2 s
xs -> xs
char :: Char -> Parser Char
char c = Parser $ \case (c':cs) | c == c' -> [(c,cs)] ; _ -> []
main = print $ runParser (some $ char 'A') "AAA"
When I run it, it gets stuck and never returns. After digging into the problem I pinpointed the root cause to be my implementation of the <|> method. If I use the following implementation then everything goes as expected:
instance Alternative Parser where
empty = Parser $ \_s -> []
p1 <|> p2 = Parser $ \s ->
case runParser p1 s of [] -> runParser p2 s
xs -> xs
These two implementations are, in my understanding, quite equivalent. What I guess is that this may have something to do with Haskell's lazy evaluation scheme. Can someone explain what's going on?

Fact "star": in your implementation of (<*>):
Parser p1 <*> Parser p2 = ...
...we must compute enough to know that both arguments are actually applications of the Parser constructor to something before we may proceed to the right-hand side of the equation.
Fact "pipe strict": in this implementation:
Parser p1 <|> Parser p2 = ...
...we must compute enough to know that both parsers are actually applications of the Parser constructor to something before we may proceed to the right-hand side of the equals sign.
Fact "pipe lazy": in this implementation:
p1 <|> p2 = Parser $ ...
...we may proceed to the right-hand side of the equals sign without doing any computation on p1 or p2.
This is important, because:
some v = some_v where
some_v = pure (:) <*> v <*> (some_v <|> pure [])
Let's take your first implementation, the one about which we know the "pipe strict" fact. We want to know if some_v is an application of Parser to something. Thanks to fact "star", we must therefore know whether pure (:), v, and some_v <|> pure [] are applications of Parser to something. To know this last one, by fact "pipe strict", we must know whether some_v and pure [] are applications of Parser to something. Whoops! We just showed that to know whether some_v is an application of Parser to something, we need to know whether some_v is an application of Parser to something -- an infinite loop!
On the other hand, with your second implementation, to check whether some_v is a Parser _, we still must check pure (:), v, and some_v <|> pure [], but thanks to fact "pipe lazy", that's all we need to check -- we can be confident that some_v <|> pure [] is a Parser _ without first checking recursively that some_v and pure [] are.
(And next, you will learn about newtype -- and be confused yet again when changing from data to newtype makes both implementation work!)

Related

haskell parser combinator infinite loop

I'm trying to write a simple parser via Haskell, but stuck at an infinite loop.
the code is:
import Control.Applicative (Alternative, empty, many, (<|>))
data Parser a = Parser {runParser :: String -> [(a, String)]}
instance Functor Parser where
fmap f (Parser p) = Parser $ \s -> [(f x', s') | (x', s') <- p s]
instance Applicative Parser where
pure x = Parser $ \s -> [(x, s)]
(Parser pf) <*> (Parser p) = Parser $ \s -> [(f' x, ss') | (f', ss) <- pf s, (x, ss') <- p ss]
instance Alternative Parser where
empty = Parser $ \s -> []
(Parser p1) <|> (Parser p2) = Parser $ \s ->
case p1 s of
[] -> p2 s
xs -> xs
singleSpaceParser :: Parser Char
singleSpaceParser = Parser $ \s ->
( case s of
x : xs -> if x == ' ' then [(' ', xs)] else []
[] -> []
)
multiSpaceParser :: Parser [Char]
multiSpaceParser = many singleSpaceParser
I just load this file in ghci, and run:
runParser multiSpaceParser " 123"
I expect it to get [(" ", "123")], but actually it got an infinite loop
I used trace to debug, and it seems that many is wrong
How can I fix this bug?
Let's assume
many p = (:) <$> p <*> many p <|> pure []
and consider the call
many singleSpaceParser " 123"
(The string does not actually matter here, the many singleSpaceParser call will always loop.)
One reduction step yields
((:) <$> singleSpaceParser <*> many singleSpaceParser <|> pure []) " 123"
Now observe that, in order to reduce the call to (<|>), we have to evaluate both arguments of (<|>) to be of the shape Parser ....
Let's consider doing that for (:) <$> singleSpaceParser <*> many singleSpaceParser. As both (<$>) and (<*>) are infixl 4, this is an application of <*> at the outermost level.
But now observe that in order to reduce (<*>), we again have to evaluate both arguments of (<*>) to be of the shape Parser ..., so in particular the recursive call many singleSpaceParser.
This is where we get the infinite loop.
By switching data to newtype (or alternatively, at least avoiding aggressively pattern-matching on the Parser constructor in all the second arguments), these problems can be avoided.

Destructuring `Maybe (a, b)`

Sort of a followup to my last question. I'm going through Brent Yorgey's Haskell course and I'm trying to solve an exercise that asks us to create an instance of Applicative for the following type:
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }
runParser parsers a string and returns a token and the remaining string. p1 <*> p2 should in this case apply a function generated by runParser p1 to the token generated by runParser p2 (applied to what's left of the string after running runParser p1).
So far I have:
(Parser { runParser = run }) <*> (Parser { runParser = run' }) = Parser run''
where run'' s = (first <$> f) <*> (s' >>= run')
where f = fst <$> run s
s' = snd <$> run s
(first <$> f) <*> (s' >>= run') seems pretty concise to me, but the nested where's and the weird destructuring of run s look "off". Is there a nicer way to write this?
First, let me rewrite this a bit to avoid pattern matching:
p <*> q = Parser run
where run s = (first <$> f) <*> (s' >>= runParser q)
where f = fst <$> runParser p s
s' = snd <$> runParser p s
Here, I’ve simply used the field accessor runParser :: Parser a -> String -> Maybe (a, String) instead of directly pattern-matching on the arguments. This is considered a more idiomatic method of accessing newtyped functions in Haskell.
Next, there are some obvious simplifications which can be made, particularly inlining some functions:
p <*> q = Parser $ \s -> (first <$> f) <*> (s' >>= runParser q)
where
f = fst <$> runParser p s
s' = snd <$> runParser p s
(Note that s now has to be explicitly passed to the functions in the where block so they can access it. Don’t worry, I’ll get rid of this in a minute.)
One confusing thing in this implementation is the nested applicatives and monads. I’ll rewrite that section slightly to make it a bit clearer:
p <*> q = Parser $ \s ->
let qResult = s' s >>= runParser q
in first <$> f s <*> qResult
where
f s = fst <$> runParser p s
s' s = snd <$> runParser p s
Next, let’s get rid of those annoying f and s' definitions. We can do this using pattern-matching. By pattern-matching on the output of runParser p s, we can access these values directly:
p <*> q = Parser $ \s ->
case runParser p s of
Nothing -> Nothing
Just (f, s') ->
let qResult = runParser q s'
in first f <$> qOutput
(Note that, since f and s' are no longer in Maybe, much of the applicative and monadic plumbing which was required before is now unneeded. One <$> still remains, since runParser q s' can still fail).
Let’s rewrite this slightly, by inlining qResult:
p <*> q = Parser $ \s ->
case runParser p s of
Nothing -> Nothing
Just (f, s') -> first f <$> runParser q s'
Now observe a pattern in this code. It does runParser p s, fails if this fails; otherwise it uses the value in another computation which can fail. This just sounds like monadic sequencing! So let’s rewrite it with >>=:
p <*> q = Parser $ \s -> runParser p s >>= \(f, s') -> first f <$> runParser q s'
And finally, the whole thing can be rewritten in do-notation for readability:
p <*> q = Parser $ \s -> do
(f, s') <- runParser p s
qResult <- runParser q s'
return $ first f qResult
Much easier to read! And what makes this version particularly nice is that it’s easy to see what’s going on — run the first parser, get its output and use it to run the second parser, then combine the results.
In my eye, there's no shame in keeping it simple using basic pattern matching only, without relying too much on <*>, <$>, first, and other library functions.
Parser pF <*> Parser pX = Parser $ \s -> do
(f, s' ) <- pF s
(x, s'') <- pX s'
return (f x, s'')
The above do block is in the Maybe monad.

Haskell - some, many implementation

In the article: "Write you a Haskell" (page 34) the following interpretation of "some" and "many" is given:
Derived automatically from the Alternative typeclass definition are
the many and some functions. many takes a single function argument and
repeatedly applies it until the function fails, and then yields the
collected results up to that point. The some function behaves similar
except that it will fail itself if there is not at least a single
match.
-- | One or more.
some :: f a -> f [a]
some v = some_v
where
many_v = some_v <|> pure []
some_v = (:) <$> v <*> many_v
-- | Zero or more.
many :: f a -> f [a]
many v = many_v
where
many_v = some_v <|> pure []
some_v = (:) <$> v <*> many_v
I have been trying to understand this implementation for a while.
I dont understand how "many" and "some" could be applied to "lists" or "Maybe".
Also I am not sure about (:) <$> v <*> many_v.
How does one derive this?
From ghc/libraries/base/GHC/Base.hs there is this recursive declaration:
... :: f a -> f [a]
...
many_v = some_v <|> pure []
some_v = liftA2 (:) v many_v -- (:) <$> v <*> many_v
The argument v must be a value of a type that is instance of
Alternative (and Applicative and Functor).
v is lifted already and just needs to be applied (<*>).
The application results in a lifted list.
some and many make recursive applications of the constructor of v,
and put the constructed values into a list.
some stops application, when the first empty is constructed
many continues application beyond that
[] is instance of Alternative:
instance Alternative [] where
empty = []
(<|>) = (++)
some tries with list:
av = [[2], [2,3], [], [2,3,5]]
some av -- no error, but never stops
do {v <- some av; return v;} -- no error, but never stops
Comparing to letter in
What are Alternative's "some" and "many" useful for?:
import Control.Monad(Functor(..))
import Control.Applicative
import Data.Char
newtype P a = P { runP :: String -> [(a,String)] }
instance Functor P where
fmap f (P q) = P (\s -> [ (f y,ys) | (y,ys) <- q s])
instance Applicative P where
pure x = P (\s -> [(x,s)])
P p <*> P q = P (\s -> [(x y, ys) | (x,xs) <- p s, (y,ys) <- q xs])
letter = P p where
p (x:xs) | isAlpha x = [(x,xs)]
p _ = []
instance Alternative P where
P p <|> P q = P (\s-> p s ++ q s)
empty = P (\s-> [])
with usage:
runP (many letter) "ab123"
letter is a smart constructor, that
constructs a value of P with field runP using local variable p.
(The result of the accessor) runP is a function.
P is a type implementing Alternative as well as a constructor.
P stands for parser.
fmap is not used in some and many.
<*> leads to an application of the function runP,
whose argument is not yet here.
Basically some and many construct a program that will eat from its argument.
The argument must be a list.
Due to laziness the recursion stops at the first constructor.
p = some letter -- constructs a program accessed via #runP#
runP p "a1" -- [("a","1")]
q = some [2] -- basically also a program
q -- never stops
Recursive application is not empty ([] for list),
because there is no source of criteria to stop, i.e. no input.
These stop:
some [] -- []
many [] -- [[]]
some Nothing -- Nothing
many Nothing -- Just []
There are more requirements on the argument of some and many (v).
v is value of a type that is instance of Alternative.
The recursive application of the constructor of the v type must stop with empty.
v must contain a function applied during construction with <*> to have stop criteria.
Conclusion:
some and many cannot be applied to list values,
even though list implements Alternative.
Why does list implement Alternative?
I think, it is to add the Monoid interface <|> and empty on top of Applicative,
not because of some and many.
foldr (<|>) [] [[2],[],[3,4]] -- [2,3,4]
some and many seem to be just for parser construction
or more generally construction of a program consuming input
and producing more outputs, of which some can be empty.
That is quite general already.
But the place in Alternative is only justified,
if most of Alternative instances have a sensible usage for it.
That is not the case.

Why does this loops with 'data' but not 'newtype'?

Here is the code :
import Control.Applicative
-- newtype Parser a = Parser { runParser :: String -> [(a, String)] }
data Parser a = Parser { runParser :: String -> [(a, String)] }
instance Functor Parser where
fmap f (Parser p) = Parser (\s -> [(f x, s') | (x, s') <- p s ] )
instance Applicative Parser where
pure a = Parser (\s -> [(a, s)])
Parser q <*> Parser p = Parser (\s -> [(f x, s'') | (f, s') <- q s, (x, s'') <- p s'])
instance Alternative Parser where
empty = Parser (\s -> [])
Parser q <|> Parser p = Parser (\s -> q s ++ p s)
item = Parser (\s -> case s of
(x:xs) -> [(x, xs)]
_ -> []
)
With the current code, runParser (some item) "abcd" loops, but if Parser is declared as newtype, it works just fine.
This is a great way of getting at one of the difference between data and newtype. The heart of the problem here is actually in the pattern matching of the <|> definition.
instance Alternative Parser where
empty = Parser (\s -> [])
Parser q <|> Parser p = Parser (\s -> q s ++ p s)
Remember that at runtime, a newtype becomes the same thing as the type it is wrapping. Then, when a newtype is pattern matched, GHC doesn't do anything - there is no constructor to evaluate to WNHF.
On the contrary, when a data is matched, seeing the pattern Parser q tells GHC it needs to evaluate that parser to WNHF. That is a problem, because some is an infinite fold of <|>. There are two ways to solve the problem with data:
Don't have Parser patterns in <|>:
instance Alternative Parser where
empty = Parser (\s -> [])
q <|> p = Parser (\s -> runParser q s ++ runParser p s)
Use lazy patterns:
instance Alternative Parser where
empty = Parser (\s -> [])
~(Parser q) <|> ~(Parser p) = Parser (\s -> q s ++ p s)

Monadic type confusion

I am going through Write Yourself a Scheme in Haskell. Its a great tutorial, but I've run into a wall with one of the parsing exercises:
parseNumber :: Parser LispVal
parseNumber = liftM (Number . read) $ many1 digit
Rewrite parseNumber using:
Do-notation
explicit sequencing with the >>= operator
I had no problems with do-notation:
parseNumber :: Parser LispVal
parseNumber = do x <- many1 digit
let y = read x
return $ Number y
For #2 I've tried a bunch of variations such as:
parseNumber :: Parser LispVal
parseNumber = (many1 digit) >>= (liftM (Number . read))
but I keep running into type errors. I have two questions.
Why am I getting type errors? Am I misunderstanding the monadic bind operator?
Why AREN'T I getting similar type errors with my do-notation solution?
I feel like I am missing a fundamental concept regarding types?
You're attempting a non-trivial transformation from do-notation to bind notation, I recommend doing it the "trivial" way, and then making it points-free.
Recall:
x <- m === m >>= \x ->
let x = e === let x = e in
Then you have:
parseNumber = many1 digit >>= \x ->
let y = read x in
return (Number y)
(I've removed the $ to avoid precedence problems.)
We can then convert this into:
parseNumber = many1 digit >>= \x -> return (Number (read x))
= many1 digit >>= return . Number . read
Now, if you want to use liftM, you need to stop using bind, since the lifted function expects a monadic value as its argument.
parseNumber = liftM (Number . read) (many1 digit)
In your case, bind has type:
(>>=) :: Parser a -> (a -> Parser b) -> Parser b
(since you're using Parser as the Monad)
You give bind two arguments: the first one, many1 digit, should be ok (regarding the type); but the type of the second argument is the result type of liftM, namely Parser a -> Parser b and this does not fit the second argument's expected type (a -> Parser b)!
Without having tested it: instead of using liftM (Number.read) as second argument of bind, try using return . Number . read - this should have the right type and gives probably what you want...

Resources