Haskell - some, many implementation

In the article "Write You a Haskell" (page 34), the following interpretation of some and many is given:

Derived automatically from the Alternative typeclass definition are the many and some functions. many takes a single function argument and repeatedly applies it until the function fails, and then yields the collected results up to that point. The some function behaves similarly, except that it will fail itself if there is not at least a single match.
-- | One or more.
some :: f a -> f [a]
some v = some_v
  where
    many_v = some_v <|> pure []
    some_v = (:) <$> v <*> many_v

-- | Zero or more.
many :: f a -> f [a]
many v = many_v
  where
    many_v = some_v <|> pure []
    some_v = (:) <$> v <*> many_v
I have been trying to understand this implementation for a while.
I don't understand how many and some could be applied to lists or Maybe.
Also, I am not sure about (:) <$> v <*> many_v.
How does one derive this?

In ghc/libraries/base/GHC/Base.hs there is this recursive definition:

... :: f a -> f [a]
...
many_v = some_v <|> pure []
some_v = liftA2 (:) v many_v -- (:) <$> v <*> many_v
The argument v must be a value of a type that is an instance of Alternative (and hence of Applicative and Functor).
v is already lifted and just needs to be applied (<*>).
The application results in a lifted list.
some and many recursively apply the constructor of v and collect the constructed values into a list:
some stops applying as soon as the first empty is produced, while many continues applying beyond that.
[] is an instance of Alternative:

instance Alternative [] where
  empty = []
  (<|>) = (++)
Trying some with a list:

av = [[2], [2,3], [], [2,3,5]]
some av                         -- no error, but never stops
do { v <- some av; return v }   -- no error, but never stops
Compare this to letter in What are Alternative's "some" and "many" useful for?:
import Control.Monad (Functor(..))
import Control.Applicative
import Data.Char

newtype P a = P { runP :: String -> [(a, String)] }

instance Functor P where
  fmap f (P q) = P (\s -> [ (f y, ys) | (y, ys) <- q s ])

instance Applicative P where
  pure x = P (\s -> [(x, s)])
  P p <*> P q = P (\s -> [ (x y, ys) | (x, xs) <- p s, (y, ys) <- q xs ])

letter = P p
  where
    p (x:xs) | isAlpha x = [(x, xs)]
    p _ = []

instance Alternative P where
  P p <|> P q = P (\s -> p s ++ q s)
  empty = P (\s -> [])
with usage:

runP (many letter) "ab123"

letter is a smart constructor that builds a value of P whose field runP is the local function p.
The accessor runP therefore yields a function.
P (for parser) is both a type implementing Alternative and a data constructor.
fmap is not used in some and many.
<*> leads to an application of the function runP, whose argument is not available yet.
Essentially, some and many construct a program that will consume its input; the input must be a list.
Due to laziness, the recursion stops at the first constructor.
p = some letter  -- constructs a program accessed via runP
runP p "a1"      -- [("a","1")]
q = some [2]     -- basically also a program
q                -- never stops
The recursive application never produces empty ([] for lists), because there is no criterion to stop, i.e. no input.
These stop:
some [] -- []
many [] -- [[]]
some Nothing -- Nothing
many Nothing -- Just []
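To see why these terminate, one can expand the mutual recursion by hand; here is a minimal sketch for the Maybe case (my own expansion of the definitions above, not from the original sources):

-- many Nothing, i.e. v = Nothing:
--   some_v = (:) <$> Nothing <*> many_v  ==>  Nothing <*> many_v  ==>  Nothing
--            (Maybe's <*> yields Nothing without ever forcing many_v)
--   many_v = some_v <|> pure []          ==>  Nothing <|> Just []  ==>  Just []
-- hence some Nothing = Nothing and many Nothing = Just [].
-- The [] case is analogous: [] <*> many_v = [] without forcing many_v.
-- By contrast, with v = Just x or a non-empty list, <*> must force many_v,
-- which is why those calls never stop.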
There are more requirements on the argument v of some and many.
v is a value of a type that is an instance of Alternative.
The recursive application of the constructor of v's type must eventually stop with empty.
v must contain a function that is applied during construction with <*>, to provide a stopping criterion.
Conclusion:
some and many cannot usefully be applied to list values, even though list implements Alternative.
Why does list implement Alternative, then?
I think it is to add the Monoid-like interface (<|> and empty) on top of Applicative, not because of some and many.

foldr (<|>) [] [[2],[],[3,4]] -- [2,3,4]
some and many seem to be intended for parser construction, or more generally for constructing a program that consumes input and produces multiple outputs, of which some can be empty.
That is quite general already.
But their place in Alternative is only justified if most Alternative instances have a sensible use for them, and that is not the case.

Related

haskell parser combinator infinite loop

I'm trying to write a simple parser in Haskell, but I am stuck in an infinite loop. The code is:
import Control.Applicative (Alternative, empty, many, (<|>))

data Parser a = Parser { runParser :: String -> [(a, String)] }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> [ (f x', s') | (x', s') <- p s ]

instance Applicative Parser where
  pure x = Parser $ \s -> [(x, s)]
  (Parser pf) <*> (Parser p) =
    Parser $ \s -> [ (f' x, ss') | (f', ss) <- pf s, (x, ss') <- p ss ]

instance Alternative Parser where
  empty = Parser $ \s -> []
  (Parser p1) <|> (Parser p2) = Parser $ \s ->
    case p1 s of
      [] -> p2 s
      xs -> xs

singleSpaceParser :: Parser Char
singleSpaceParser = Parser $ \s ->
  case s of
    x : xs -> if x == ' ' then [(' ', xs)] else []
    []     -> []

multiSpaceParser :: Parser [Char]
multiSpaceParser = many singleSpaceParser
I just load this file in ghci and run:

runParser multiSpaceParser " 123"

I expect to get [(" ", "123")], but it actually enters an infinite loop.
I used trace to debug, and it seems that many is what loops.
How can I fix this bug?
Let's assume
many p = (:) <$> p <*> many p <|> pure []
and consider the call
many singleSpaceParser " 123"
(The string does not actually matter here, the many singleSpaceParser call will always loop.)
One reduction step yields
((:) <$> singleSpaceParser <*> many singleSpaceParser <|> pure []) " 123"
Now observe that, in order to reduce the call to (<|>), we have to evaluate both arguments of (<|>) to be of the shape Parser ....
Let's consider doing that for (:) <$> singleSpaceParser <*> many singleSpaceParser. As both (<$>) and (<*>) are infixl 4, this is an application of <*> at the outermost level.
But now observe that in order to reduce (<*>), we again have to evaluate both arguments of (<*>) to be of the shape Parser ..., so in particular the recursive call many singleSpaceParser.
This is where we get the infinite loop.
By switching data to newtype (or alternatively, at least avoiding aggressively pattern-matching on the Parser constructor in all the second arguments), these problems can be avoided.
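A minimal sketch of that second option, assuming we keep data: use the runParser field accessor instead of matching on the Parser constructor, so that neither <*> nor <|> forces its arguments to WHNF before producing a Parser:

instance Applicative Parser where
  pure x = Parser $ \s -> [(x, s)]
  pf <*> p = Parser $ \s ->
    [ (f' x, ss') | (f', ss) <- runParser pf s, (x, ss') <- runParser p ss ]

instance Alternative Parser where
  empty = Parser $ \s -> []
  p1 <|> p2 = Parser $ \s ->
    case runParser p1 s of
      [] -> runParser p2 s
      xs -> xs

With these definitions, many singleSpaceParser no longer loops, because the recursive call is only forced once runParser is applied to actual input.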

Why does `data` cause an infinite loop while `newtype` does not

I am learning Arrow following the tutorial Programming with Arrows. I've typed the following code according to the paper, except that SF is defined with data, not with newtype as in the paper (actually, I made this change by chance, since I typed the code from memory):
import Control.Category
import Control.Arrow
import Prelude hiding (id, (.))

data SF a b = SF { runSF :: [a] -> [b] }  -- this is the change: data instead of newtype as in the paper

-- The following code is the same as in the paper
instance Category SF where
  id = SF $ \x -> x
  (SF f) . (SF g) = SF $ \x -> f (g x)

instance Arrow SF where
  arr f = SF $ map f
  first (SF f) = SF $ unzip >>> first f >>> uncurry zip

instance ArrowChoice SF where
  left (SF f) = SF $ \xs -> combine xs (f [y | Left y <- xs])
    where
      combine (Left _ : ys) (z:zs) = Left z : combine ys zs
      combine (Right y : ys) zs    = Right y : combine ys zs
      combine [] _                 = []

delay :: a -> SF a a
delay x = SF $ init . (x:)

mapA :: ArrowChoice a => a b c -> a [b] [c]
mapA f = arr listcase >>>
         arr (const []) ||| (f *** mapA f >>> arr (uncurry (:)))

listcase :: [a] -> Either () (a, [a])
listcase []     = Left ()
listcase (x:xs) = Right (x, xs)
When I load the file in ghci and execute runSF (mapA (delay 0)) [[1,2,3],[4,5,6]], it triggers an infinite loop and finally runs out of memory. If I change data back to newtype, everything is OK. The same problem happens in GHC 8.0.2, 8.2.2 and 8.6.3, and it also exists when I compile the code into an executable.
I thought the only difference between data and newtype, when defining a data structure with a single field, was the runtime cost. But this problem seems to imply more differences between them, or there may be something about the Arrow type class that I haven't noticed.
Does anyone have any ideas? Thanks very much!
Let's look at this example.
data A = A [Int]
  deriving (Show)

cons :: Int -> A -> A
cons x (A xs) = A (x:xs)

ones :: A
ones = cons 1 ones
We would expect that ones should be A [1,1,1,1...], because all we have done is wrap a list in a data constructor. But we would be wrong. Recall that pattern matches are strict for data constructors. That is, cons 1 undefined = undefined rather than A (1 : undefined). So when we try to evaluate ones, cons pattern matches on its second argument, which causes us to evaluate ones... we have a problem.
newtypes don't do this. At runtime newtype constructors are invisible, so it's as if we had written the equivalent program on plain lists
cons :: Int -> [Int] -> [Int]
cons x ys = x:ys

ones = cons 1 ones
which is perfectly productive, since when we try to evaluate ones, there is a : constructor between us and the next evaluation of ones.
You can get back the newtype semantics by making your data constructor pattern matches lazy:
cons x ~(A xs) = A (x:xs)
This is the problem with your code (I have run into this exact problem doing this exact thing). There are a few reasons data pattern matches are strict by default; the most compelling one I see is that pattern matching would otherwise be impossible if the type had more than one constructor. There is also a small runtime overhead to lazy pattern matching, needed to fix some subtle garbage-collector leaks.
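Applied to the SF code above, here is a sketch of one way to keep data and still avoid the loop (my own adaptation of the lazy-pattern advice; switching to newtype remains the simpler fix): replace the constructor matches with the runSF field accessor, which is equivalent to using lazy patterns like ~(SF f), so no instance method forces its argument before producing an SF:

instance Category SF where
  id = SF id
  f . g = SF (runSF f . runSF g)

instance Arrow SF where
  arr f = SF (map f)
  first f = SF (unzip >>> first (runSF f) >>> uncurry zip)

instance ArrowChoice SF where
  left f = SF $ \xs -> combine xs (runSF f [y | Left y <- xs])
    where
      combine (Left _ : ys) (z:zs) = Left z : combine ys zs
      combine (Right y : ys) zs    = Right y : combine ys zs
      combine [] _                 = []

Now the recursive mapA f is only unrolled as the input list demands it, just as with newtype.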

Is there a lazy mapM?

At first glance I thought these two functions would work the same:
firstM _ [] = return Nothing
firstM p (x:xs) = p x >>= \r -> if r then return (Just x) else firstM p xs

firstM' p xs = fmap listToMaybe (mapM p xs)
But they don't. In particular, firstM stops as soon as the first p x is true. But firstM', because of mapM, needs to evaluate the whole list.
Is there a "lazy mapM" that enables the second definition, or at least one that doesn't require explicit recursion?
There isn't (can't be) a safe, Monad-polymorphic lazy mapM. But the monad-loops package contains many lazy monadic variants of various pure functions, and includes firstM.
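For illustration, a hedged usage sketch of firstM from monad-loops (Control.Monad.Loops); it runs the monadic predicate only until the first True, so even an infinite list is fine:

import Control.Monad.Loops (firstM)

main :: IO ()
main = do
  -- Tests only 1, 2 and 3, despite the infinite input list:
  r <- firstM (\x -> putStrLn ("testing " ++ show x) >> return (x > 2)) ([1 ..] :: [Int])
  print r  -- Just 3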
One solution is to use ListT, the list monad transformer. This type interleaves side effects and results, so you can peek at the initial element without running the whole computation first.
Here's an example using ListT:
import Control.Monad
import qualified ListT
firstM :: Monad m => (a -> Bool) -> [a] -> m (Maybe a)
firstM p = ListT.head . mfilter p . ListT.fromFoldable
(Note that the ListT defined in transformers and mtl is buggy and should not be used. The version I linked above should be okay, though.)
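A possible usage sketch, assuming the list-t package used in the snippet above:

-- ghci> firstM even [1, 3, 4, 5] :: IO (Maybe Int)
-- Just 4
-- Only the prefix up to the first match is forced, so infinite lists work too:
-- ghci> firstM (> 10) [1 ..] :: IO (Maybe Int)
-- Just 11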
If there is, I doubt it's called mapM.
As I recall, mapM is defined in terms of sequence:

mapM :: Monad m => (a -> m b) -> [a] -> m [b]
mapM f = sequence . map f
and the whole point of sequence is to guarantee that all the side-effects are done before giving you anything.
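For reference, a sketch of the classic list-only definition of sequence (modern base generalizes this to Traversable):

sequence :: Monad m => [m a] -> m [a]
sequence []       = return []
sequence (m : ms) = m >>= \x -> sequence ms >>= \xs -> return (x : xs)

The recursive call sits under the binds, so whether anything can be returned early depends entirely on how strict the monad's >>= is.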
Rather than using some alternate mapM, you could get away with just map and sequence by changing the container from [a] to Maybe a:

firstM p xs = sequence $ listToMaybe (map p xs)

or even:

firstM p xs = mapM p $ listToMaybe xs

now that mapM and sequence can operate on generic Traversables, not just lists.

Under what circumstances are monadic computations tail-recursive?

In Haskell Wiki's Recursion in a monad there is an example that is claimed to be tail-recursive:
f 0 acc = return (reverse acc)
f n acc = do
  v <- getLine
  f (n - 1) (v : acc)
While the imperative notation leads us to believe that it is tail-recursive, it's not so obvious at all (at least to me). If we desugar the do block we get
f 0 acc = return (reverse acc)
f n acc = getLine >>= \v -> f (n-1) (v : acc)
and rewriting the second line leads to
f n acc = (>>=) getLine (\v -> f (n-1) (v : acc))
So we see that f occurs inside the second argument of >>=, not in a tail-recursive position. We'd need to examine IO's >>= to get an answer.
Clearly, having the recursive call as the last line in a do block isn't a sufficient condition for a function to be tail-recursive.
Let's say that a monad is tail-recursive iff every recursive function in this monad defined as

f = do
  ...
  f ...

or equivalently

f ... = (...) >>= \x -> f ...

is tail-recursive. My question is:
What monads are tail-recursive?
Is there some general rule that we can use to immediately distinguish tail-recursive monads?
Update: Let me give a specific counter-example: the [] monad is not tail-recursive according to the above definition. If it were, then

f 0 acc = acc
f n acc = do
  r <- acc
  f (n - 1) (map (r +) acc)

would have to be tail-recursive. However, desugaring the second line leads to

f n acc = acc >>= \r -> f (n - 1) (map (r +) acc)
        = (flip concatMap) acc (\r -> f (n - 1) (map (r +) acc))
Clearly, this isn't tail-recursive, and IMHO cannot be made. The reason is that the recursive call isn't the end of the computation. It is performed several times and the results are combined to make the final result.
A monadic computation that refers to itself is never tail-recursive. However, in Haskell you have laziness and corecursion, and that is what counts. Let's use this simple example:
forever :: (Monad m) => m a -> m b
forever c' = let c = c' >> c in c
Such a computation runs in constant space if and only if (>>) is nonstrict in its second argument. This is really very similar to lists and repeat:
repeat :: a -> [a]
repeat x = let xs = x : xs in xs
Since the (:) constructor is nonstrict in its second argument this works and the list can be traversed, because you have a finite weak-head normal form (WHNF). As long as the consumer (for example a list fold) only ever asks for the WHNF this works and runs in constant space.
The consumer in the case of forever is whatever interprets the monadic computation. If the monad is [], then (>>) is non-strict in its second argument, when its first argument is the empty list. So forever [] will result in [], while forever [1] will diverge. In the case of the IO monad the interpreter is the very run-time system itself, and there you can think of (>>) being always non-strict in its second argument.
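Checking the list case concretely (my own illustration; forever' is a local copy of the definition above, renamed to avoid clashing with Control.Monad.forever):

forever' :: Monad m => m a -> m b
forever' c' = let c = c' >> c in c

-- ghci> forever' ([] :: [Int]) :: [Int]
-- []        -- [] >> c never demands c
-- ghci> take 3 (forever' [1 :: Int] :: [Int])
-- diverges: c = [ y | _ <- [1], y <- c ] needs the WHNF of c itself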
What really matters is constant stack space. Your first example is tail recursive modulo cons, thanks to the laziness.
The (getLine >>=) will be executed and will evaporate, leaving us again with the call to f. What matters is, this happens in a constant number of steps - there's no thunk build-up.
Your second example,
f 0 acc = acc
f n acc = concat [ f (n - 1) $ map (r +) acc | r <- acc ]

will be only linear (in n) in its thunk build-up, as the result list is accessed from the left (again due to the laziness, as concat is non-strict). If it is consumed at the head, it can run in O(1) space (not counting the linearly growing thunks f(0), f(1), ..., f(n-1) at the left edge).
Much worse would be
f n acc = concat [ f (n-1) $ map (r +) $ f (n-1) acc | r <- acc]
or in do-notation,
f n acc = do
  r <- acc
  f (n - 1) $ map (r +) $ f (n - 1) acc

because there is extra forcing due to the information dependency. The same would happen if the bind of the given monad were a strict operation.

Is Haskell's mapM not lazy?

UPDATE: Okay this question becomes potentially very straightforward.
q <- mapM return [1..]
Why does this never return?
Does mapM not lazily deal with infinite lists?
The code below hangs. However, if I replace line A by line B, it doesn't hang anymore. Alternatively, if I precede line A with splitRandom $, it also doesn't hang.
Q1 is: Is mapM not lazy? Otherwise, why does replacing line A with line B "fix" this code?
Q2 is: Why does preceding line A with splitRandom "solve" the problem?
import Control.Monad.Random
import Control.Applicative

f :: (RandomGen g) => Rand g (Double, [Double])
f = do
  b <- splitRandom $ sequence $ repeat $ getRandom
  c <- mapM return b   -- A
  -- let c = map id b  -- B
  a <- getRandom
  return (a, c)

splitRandom :: (RandomGen g) => Rand g a -> Rand g a
splitRandom code = evalRand code <$> getSplit

t0 = do
  (a, b) <- evalRand f <$> newStdGen
  print a
  print (take 3 b)
The code generates an infinite list of random numbers lazily. Then it generates a single random number. By using splitRandom, I can evaluate this latter random number first, before the infinite list. This can be demonstrated by returning b instead of c from the function.
However, if I apply mapM to the list, the program hangs. To prevent this hanging, I have to apply splitRandom again before the mapM. I was under the impression that mapM could deal with this lazily.
Well, there's lazy, and then there's lazy. mapM is indeed lazy in that it doesn't do more work than it has to. However, look at the type signature:
mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]
Think about what this means: You give it a function a -> m b and a bunch of as. A regular map can turn those into a bunch of m bs, but not an m [b]. The only way to combine the bs into a single [b] without the monad getting in the way is to use >>= to sequence the m bs together to construct the list.
In fact, mapM is precisely equivalent to sequence . map.
In general, for any monadic expression, if the value is used at all, the entire chain of >>=s leading to the expression must be forced, so applying sequence to an infinite list can't ever finish.
If you want to work with an unbounded monadic sequence, you'll either need explicit flow control--e.g., a loop termination condition baked into the chain of binds somehow, which simple recursive functions like mapM and sequence don't provide--or a step-by-step sequence, something like this:
data Stream m a = Nil | Stream a (m (Stream m a))
...so that you only force as many monad layers as necessary.
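As a hedged sketch of how such a stream can be consumed (fromActions and takeStream are illustrative names, not from any library):

-- Run one action per step; the tail stays an unevaluated m (Stream m a):
fromActions :: Monad m => [m a] -> m (Stream m a)
fromActions []       = return Nil
fromActions (x : xs) = do
  a <- x
  return (Stream a (fromActions xs))

-- Take the first n elements, forcing only n monad layers:
takeStream :: Monad m => Int -> m (Stream m a) -> m [a]
takeStream 0 _  = return []
takeStream n ms = do
  s <- ms
  case s of
    Nil          -> return []
    Stream a ms' -> fmap (a :) (takeStream (n - 1) ms')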
Edit: Regarding splitRandom, what's going on there is that you're passing it a Rand computation, evaluating that with the seed splitRandom gets, then returning the result. Without the splitRandom, the seed used by the single getRandom has to come from the final result of sequencing the infinite list, hence it hangs. With the extra splitRandom, the seed only needs to thread through the two splitRandom calls, so it works. The final list of random numbers works because you've left the Rand monad at that point and nothing depends on its final state.
Okay this question becomes potentially very straightforward.
q <- mapM return [1..]
Why does this never return?
It's not necessarily true. It depends on the monad you're in.
For example, with the identity monad, you can use the result lazily and it terminates fine:
newtype Identity a = Identity a

-- Functor and Applicative instances are required for Monad in modern GHC:
instance Functor Identity where
  fmap f (Identity x) = Identity (f x)

instance Applicative Identity where
  pure = Identity
  Identity f <*> Identity x = Identity (f x)

instance Monad Identity where
  Identity x >>= k = k x

-- "foo" is the infinite list of all the positive integers
foo :: [Integer]
Identity foo = do
  q <- mapM return [1..]
  return q

main :: IO ()
main = print $ take 20 foo -- [1 .. 20]
Here's an attempt at a proof that mapM return [1..] doesn't terminate. Let's assume for the moment that we're in the Identity monad (the argument will apply to any other monad just as well):
mapM return [1..]                           -- initial expression
sequence (map return [1..])                 -- unfold mapM

let k m m' = m >>= \x ->
             m' >>= \xs ->
             return (x : xs)
in foldr k (return []) (map return [1..])   -- unfold sequence
So far so good...
-- unfold foldr
let k m m' = m >>= \x ->
             m' >>= \xs ->
             return (x : xs)
    go []     = return []
    go (y:ys) = k y (go ys)
in go (map return [1..])

-- unfold map so we have enough of a list to pattern-match go:
go (return 1 : map return [2..])

-- unfold go:
k (return 1) (go (map return [2..]))

-- unfold k:
(return 1) >>= \x -> go (map return [2..]) >>= \xs -> return (x:xs)
Recall that in the Identity monad return a = Identity a and Identity a >>= f = f a. Continuing:
-- unfold >>= :
(\x -> go (map return [2..]) >>= \xs -> return (x:xs)) 1
-- apply 1 to \x -> ... :
go (map return [2..]) >>= \xs -> return (1:xs)
-- unfold >>= :
(\xs -> return (1:xs)) (go (map return [2..]))
Note that at this point we'd love to apply to \xs, but we can't yet! We have to instead continue unfolding until we have a value to apply:
-- unfold map for go:
(\xs -> return (1:xs)) (go (return 2 : map return [3..]))

-- unfold go:
(\xs -> return (1:xs)) (k (return 2) (go (map return [3..])))

-- unfold k:
(\xs -> return (1:xs)) ((return 2) >>= \x2 ->
                        (go (map return [3..])) >>= \xs2 ->
                        return (x2:xs2))

-- unfold >>= :
(\xs -> return (1:xs)) ((\x2 -> (go (map return [3..])) >>= \xs2 ->
                                return (x2:xs2)) 2)
At this point, we still can't apply to \xs, but we can apply to \x2. Continuing:
-- apply 2 to \x2 :
(\xs -> return (1:xs)) ((go (map return [3..])) >>= \xs2 ->
                        return (2:xs2))

-- unfold >>= :
(\xs -> return (1:xs)) ((\xs2 -> return (2:xs2)) (go (map return [3..])))
Now we've gotten to a point where neither \xs nor \xs2 can be reduced yet! Our only choice is:
-- unfold map for go, and so on...
(\xs -> return (1:xs))
  ((\xs2 -> return (2:xs2))
    (go ((return 3) : (map return [4..]))))
So you can see that, because of foldr, we're building up a series of functions to apply, starting from the end of the list and working our way back up. Because at each step the input list is infinite, this unfolding will never terminate and we will never get an answer.
This makes sense if you look at this example (borrowed from another StackOverflow thread, I can't find which one at the moment). In the following list of monads:
mebs = [Just 3, Just 4, Nothing]
we would expect sequence to catch the Nothing and return a failure for the whole thing:
sequence mebs = Nothing
However, for this list:
mebs2 = [Just 3, Just 4]
we would expect sequence to give us:
sequence mebs2 = Just [3, 4]
In other words, sequence has to see the whole list of monadic computations, string them together, and run them all in order to come up with the right answer. There's no way sequence can give an answer without seeing the whole list.
Note: The previous version of this answer asserted that foldr computes starting from the back of the list, and wouldn't work at all on infinite lists, but that's incorrect! If the operator you pass to foldr is lazy on its second argument and produces output with a lazy data constructor like a list, foldr will happily work with an infinite list. See foldr (\x xs -> replicate x x ++ xs) [] [1..] for an example. But that's not the case with our operator k.
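A quick check of that example (my own illustration):

-- ghci> take 10 $ foldr (\x xs -> replicate x x ++ xs) [] [1..]
-- [1,2,2,3,3,3,4,4,4,4]
-- Each step puts list constructors (via ++) between us and the rest of the
-- fold, so take can cut the recursion short.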
This question shows very well the difference between the IO monad and other monads. In the background, mapM builds an expression with a bind operation (>>=) between all the list elements, to turn the list of monadic expressions into a monadic expression of a list. What is different in the IO monad is that Haskell's execution model runs expressions during the bind in IO. This is exactly what finally forces (in a purely lazy world) anything to be executed at all.
So the IO monad is special in a way: it uses the sequencing paradigm of bind to actually enforce execution of each step, and this is what makes our program execute anything at all in the end. Other monads are different; they have other meanings for the bind operator, depending on the monad. IO is the one monad that executes things in the bind, and this is the reason why IO types are "actions".
The following example shows that other monads do not enforce execution, the Maybe monad for example. Finally, this leads to the result that a mapM in the IO monad returns an expression which, when executed, executes each single element before returning the final value.
There are nice papers about this, start here or search for denotational semantics and Monads:
Tackling the awkward squad: http://research.microsoft.com/en-us/um/people/simonpj/papers/marktoberdorf/mark.pdf
Example with Maybe Monad:
module Main where

fstMaybe :: [Int] -> Maybe [Int]
fstMaybe = mapM (\x -> if x == 3 then Nothing else Just x)

main :: IO ()
main = print (fstMaybe [1..])
-- prints Nothing, and terminates even on [1..],
-- because the Maybe bind short-circuits at x == 3
Let's talk about this in a more generic context.
As the other answers said, mapM is just a combination of sequence and map. So the problem is why sequence is strict in certain monads. However, this is not restricted to monads: it applies to Applicatives as well, since we have sequenceA, which shares the same implementation as sequence in most cases.
Now look at the (specialized for lists) type signature of sequenceA:

sequenceA :: Applicative f => [f a] -> f [a]

How would you do this? You were given a list, so you would like to use foldr on this list.

sequenceA = foldr f b where ...
-- f :: f a -> f [a] -> f [a]
-- b :: f [a]

Since f is an Applicative, you know what b could be: pure []. But what is f?
Obviously it is a lifted version of (:):

(:) :: a -> [a] -> [a]

So now we know how sequenceA works:

sequenceA = foldr f b
  where
    f a b = (:) <$> a <*> b
    b     = pure []

or

sequenceA = foldr ((<*>) . fmap (:)) (pure [])
Assume you were given a lazy list (x:_|_). The above definition of sequenceA gives

sequenceA (x:_|_) === (:) <$> x <*> foldr ((<*>) . fmap (:)) (pure []) _|_
                  === (:) <$> x <*> _|_

So now we see that the problem is reduced to considering whether f <*> _|_ is _|_ or not. Obviously, if f is strict this is _|_, but if f is not strict, then to allow evaluation to stop we require <*> itself to be non-strict.
So the criterion for an applicative functor to have a sequenceA that can stop early is that its <*> operator be non-strict in its second argument. A simple test would be

const a <$> _|_ === _|_ ====> strict sequenceA
-- remember f <$> a === pure f <*> a

If we are talking about monads, the criterion is

_|_ >> const a === _|_ ===> strict sequence
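Applying the test to two applicatives from earlier in this thread (my own worked examples):

-- Identity (a newtype, so its pattern matches force nothing):
--   const 1 <$> (_|_ :: Identity Int)  is not _|_  ==>  lazy sequenceA
-- Maybe (fmap must match on Nothing/Just):
--   const 1 <$> (_|_ :: Maybe Int)     is _|_      ==>  strict sequenceA
-- (the fstMaybe example above still terminated on [1..] only because a
--  Nothing short-circuited the chain of binds at x == 3)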
