Cont Monad breaks laziness in Haskell - haskell

I was trying the Cont monad, and discovers the following problem.
First construct a infinite list and lift all the elements to a Cont monad
Use sequence operation to get a Cont monad on the infinite list.
When we try to run the monad, with head, for example, it falls into infinite loop
while trying to expand the continuation and the head is never called.
The code looks like this:
let inff = map (return :: a -> Cont r a) [0..]
let seqf = sequence inff
runCont seqf head
So is this a limitation of the Cont monad implementation in Haskell?
If so, how do we improve this?

The reason is that even though the value of the head element of sequence someList depends only on the first elemenent of someList, the effect of sequence someList can generally depend on all the effects of someList (and it does for most monads). Therefore, if we want to evaluate the head element, we still need to evaluate all the effects.
For example, if we have a list of Maybe values, the result of sequence someList is Just only if all the elements of someList are Just. So if we try to sequence an infinite list, we'd need to examine its infinite number of elements if they're all Just.
The same applies for Cont.
In the continuation monad, we can escape any time from the computation and return a result that is different from what has been computed so far.
Consider the following example:
test :: (Num a, Enum a) => a
test = flip runCont head $
callCC $ \esc -> do
sequence (map return [0..100] ++ [esc [-1]])
or directly using cont instead of callCC:
test' :: (Num a, Enum a) => a
test' = flip runCont head $
sequence (map return [0..100] ++ [cont (const (-1))])
The result of test is just -1. After processing the first 100 elements, the final element can decide to escape all of this and return -1 instead. So in order to see what is the head element of sequence someList in Cont, we again need to compute them all.

This is not a flaw with the Cont monad so much as sequence. You can get similar results for Either, for example:
import Control.Monad.Instances ()
xs :: [Either a Int]
xs = map Right [0..] -- Note: return = Right, for Either
ys :: Either a [Int]
ys = sequence xs
You can't retrieve any elements of ys until it computes the entire list, which will never happen.
Also, note that: sequence (map f xs) = mapM f xs, so we can simplify this example to:
>>> import Control.Monad.Instances
>>> mapM Right [0..]
<Hangs forever>
There are a few monads where mapM will work on an infinite list of values, specifically the lazy StateT monad and Identity, but they are the exception to the rule.
Generally, mapM/sequence/replicateM (without trailing underscores) are anti-patterns and the correct solution is to use pipes, which allows you to build effectful streams that don't try to compute all the results up front. The beginning of the pipes tutorial describes how to solve this in more detail, but the general rule of thumb is that any time you write something like:
example1 = mapM f xs
example2 = sequence xs
You can transform it into a lazy Producer by just transforming it to:
example1' = each xs >-> Pipes.Prelude.mapM f
example2' = each xs >-> Pipes.Prelude.sequence
Using the above example with Either, you would write:
>>> import Pipes
>>> let xs = each [0..] >-> mapM Right :: Producer Int (Either a) ()
Then you can lazily process the stream without generating all elements:
>>> Pipes.Prelude.any (> 10) xs
Right True

Related

How can I print every item in an infinite list with proper IO monad

I have this simple definition of the integers, done with an infinite list:
nats = 0: map (+1) nats
And I would like to make a version of that infinite list that print every element.
I did it with Data.Trace (trace)
nats = 0: map (\x -> 1 + trace (show x) x) nats
But it's a bit cheating.
It would be nicer to do it properly with the IO monad.
I tried to do it with an infinite list of type IO [Int] with the following code:
nats' = (fmap f nats') >>= mapM (\x -> do{print x; return x})
where f l = 0:map (+1) l
The type signatures work, but however when I try to evaluate it with take 10 'nats,
it gets stuck in an infinite recursion without printing anything.
What is wrong with this code, is the type signature right and what would be the right way to approach this problem?
EDIT :
I have a variant of this problem where I want to create an infinite list containing all the inputs of the user.
How can I do that?
The IO monad is strict. When we run x <- ioAction, we fully execute ioAction and observe all its side effects before x is bound to a value. Consequently, if ioAction is an infinite loop, it will never return a value to be bound to to x.
In your case, the code has the form
nats' = (fmap f nats') >>= ...
which is equivalent to
nats' = nats' >>= return . f >>= ...
So, nats' will recurse before executing return . f, causing an infinite loop and preventing any output.
How to make this actually work?
The most idiomatic Haskell way is to separate side effects from actual computation: we define a pure value
nats = 0 : map (+1) nats
and the we print it
nats' = for_ nats print
Note that above there is little point to use for/traverse/mapM (instead of their _ variants), since the list is infinite so it will never return a value, but only perform infinitely many prints.
If you really want to interleave IO and computation, even if that's usually less idiomatic, you can use recursion:
nats' = go 0
where go l = print l >>= go (l+1)
If you want to try returning a list anyways, even if that will actually never happen, you can use this:
nats' = go 0
where go l = do
print l
ls <- go (l+1)
return (l:ls)
However, this is unnecessarily complex.

Why does `fmap (take 10) . sequence . fmap return $ [1..] :: m [Int]` only work with some monads?

What I wish to have is an endless supply of numbers that are either random or non-deterministic. I went on to program it like this:
supply :: Monad m => (Int -> m Int) -> m [Int]
supply action = sequence . fmap action $ [1..]
— With action being either \n -> randomRIO (1, n) or \n -> [1.. n]
Unfortunately, I could not take anything from that supply.
When I replaced action with return and tried out different monads, I discovered that Identity and Reader work, but they are not so useful in this case.
λ flip runReader 13 (fmap (take 10) (supply return))
[1,2,3,4,5,6,7,8,9,10]
λ runIdentity (fmap (take 10) (supply return))
[1,2,3,4,5,6,7,8,9,10]
λ [] : (fmap (take 10) (supply return))
[[]^CInterrupted.
fmap (take 10) (supply return) :: IO [Int]
^CInterrupted.
There must be a reason for some monads having this quality of hanging when sequenced, but I do not see it. Is this a strictness issue? What is the hallmark difference, for example, between this Identity and list instance? Why is it that I can have a stream assembled out of Identity i, but not of similarly trivial singleton lists [i]?
It shouldn't be surprising that this doesn't work in IO. You are building up an infinite list of IO actions, and then sequence turns that into a single IO action that produces the entire list by performing all of the underlying actions. It must perform all of the actions immediately because they may have side effects.
Clearly, this will never finish. If you want this to work, you need something like unsafeInterleaveIO.
The [] example is a bit more subtle. The following will also hang:
> map (take 10) $ transpose [[x] | x <- [1..]]
[[1,2,3,4,5,6,7,8,9,10]^CInterrupted.
transpose must traverse the whole infinite list, looking for anything that might have two elements in it to decide whether there should be a second row. sequence is essentially the same.

Are there ways to call two functions (one just after another) in purely functional language? (in non-io mode)

I'm trying to understand order of execution in purely functional language.
I know that in purely functional languages, there is no necessary execution order.
So my question is:
Suppose there are two functions.
I would like to know all ways in which I can call one function after another (except nested call of one function from another) (and except io-mode).
I would like to see examples in Haskell or pseudo-code.
There is no way to do what you describe, if the functions are totally independent and you don't use the result of one when you call the other.
This is because there is no reason to do this. In a side effect free setting, calling a function and then ignoring its result is exactly the same as doing nothing for the amount of time it takes to call that function (setting aside memory usage).
It is possible that seq x y will evaluate x and then y, and then give you y as its result, but this evaluation order isn't guaranteed.
Now, if we do have side effects, such as if we are working inside a Monad or Applicative, this could be useful, but we aren't truly ignoring the result since there is context being passed implicitly. For instance, you can do
main :: IO ()
main = putStrLn "Hello, " >> putStrLn "world"
in the IO Monad. Another example would be the list Monad (which could be thought of as representing a nondeterministic computation):
biggerThanTen :: Int -> Bool
biggerThanTen n = n > 10
example :: String
example = filter biggerThanTen [1..15] >> return 'a' -- This evaluates to "aaaaa"
Note that even here we aren't really ignoring the result. We ignore the specific values, but we use the structure of the result (in the second example, the structure would be the fact that the resulting list from filter biggerThanTen [1..15] has 5 elements).
I should point out, though, that things that are sequenced in this way aren't necessarily evaluated in the order that they are written. You can sort of see this with the list Monad example. This becomes more apparent with bigger examples though:
example2 :: [Int]
example2 =
[1,2,3] >>=
(\x -> [10,100,1000] >>=
(\y -> return (x * y))) -- ==> [10,100,1000,20,200,2000,30,300,3000]
The main takeaway here is that evaluation order (in the absence of side effects like IO and ignoring bottoms) doesn't affect the ultimate meaning of code in Haskell (other than possible differences in efficiency, but that is another topic). As a result, there is never a reason to call two functions "one after another" in the fashion described in the question (that is, where the calls are totally independent from each other).
Do notation
Do notation is actually exactly equivalent to using >>= and >> (there is actually one other thing involved that takes care of pattern match failures, but that is irrelevant to the discussion at hand). The compiler actually takes things written in do notation and converts them to >>= and >> through a process called "desugaring" (since it removes the syntactic sugar). Here are the three examples from above written with do notation:
IO Example
main :: IO ()
main = do
putStrLn "Hello, "
putStrLn "World"
First list example
biggerThanTen :: Int -> Bool
biggerThanTen n = n > 10
example :: String -- String is a synonym for [Char], by the way
example = do
filter biggerThanTen [1..15]
return 'a'
Second list example
example2 :: [Int]
example2 = do
x <- [1,2,3]
y <- [10,100,1000]
return (x * y)
Here is a side-by-side comparison of the conversions:
do --
m -- m >> n
n --
do --
x <- m -- m >>= (\x ->
... -- ...)
The best way to understand do notation is to first understand >>= and return since, as I said, that's what the compiler transforms do notation into.
As a side-note, >> is just the same as >>=, it just ignores the "result" of it's left argument (although it preserves the "context" or "structure"). So all definitions of >> must be equivalent to m >> n = m >>= (\_ -> n).
Expanding the >>= in the second list example
To help drive home the point that Monads are not usually impure, lets expand the >>= calls in the second list example, using the Monad definition for lists. The definition is:
instance Monad [] where
return x = [x]
xs >>= f = concatMap f xs
and we can convert example2 into:
Step 0 (what we already have)
example2 :: [Int]
example2 =
[1,2,3] >>=
(\x -> [10,100,1000] >>=
(\y -> return (x * y)))
Step 1 (converting the first >>=)
example2 =
concatMap
(\x -> [10,100,1000] >>=
(\y -> return (x * y)))
[1,2,3]
Step 2
example2 =
concatMap
(\x -> concatMap
(\y -> return (x * y))
[10,100,1000])
[1,2,3]
Step 3
example2 =
concatMap
(\x -> concatMap
(\y -> [x * y])
[10,100,1000])
[1,2,3]
So, there is no magic going on here, just normal function calls.
You can write a function whose arguments depend on the evaluation of another function:
-- Ads the first two elements of a list together
myFunc :: [Int] -> Int
myFunc xs = (head xs) + (head $ tail xs)
If that's what you mean. In this case, you can't get the output of myFunc xs without evaluating head xs, head $ tail xs and (+). There is an order here. However, the compiler can choose which order to execute head xs and head $ tail xs in since they are not dependent on each other, but it can't do the addition without having both of the other results. It could even choose to evaluate them in parallel, or on different machines. The point is that pure functions, because they have no side effects, don't have to be evaluated in a given order until their results are interdependent.
Another way to look at the above function is as a graph:
myFunc
|
(+)
/ \
/ \
head head
\ |
\ tail
\ /
xs
In order to evaluate a node, all nodes below it have to be evaluated first, but different branches can be evaluated in parallel. First xs must be evaluated, at least partially, but after that the two branches can be evaluated in parallel. There are some nuances due to lazy evaluation, but this is essentially how the compiler constructs evaluation trees.
If you really want to force one function call before the other, you can use the seq function. It takes two arguments, forces the first to be evaluated, then returns the second, e.g.
myFunc2 :: [Int] -> Int
myFunc2 xs = hxs + (hxs `seq` (head $ tail xs))
where hxs = head xs
This will force head xs to evaluate before head $ tail xs, but this is more dealing with strictness than sequencing functions.
Here is an easy way:
case f x of
result1 -> case g y of
result2 -> ....
Still, unless g y uses something from result1 and the subsequent calculations something from result2, or the pattern is such that the result must be evaluated, there is no guarantee that either of f or g are actually called, nor in what order.
Still, you wanted a way to call one function after another, and this is such a way.

Iteratively printing every integer in a List

Say I have a List of integers l = [1,2]
Which I want to print to stdout.
Doing print l produces [1,2]
Say I want to print the list without the braces
map print l produces
No instance for (Show (IO ())) arising from a use of `print'
Possible fix: add an instance declaration for (Show (IO ()))
In a stmt of an interactive GHCi command: print it
`:t print
print :: Show a => a -> IO ()
So while I thought this would work I went ahead and tried:
map putStr $ map show l
Since I suspected a type mismatch from Integer to String was to blame. This produced the same error message as above.
I realize that I could do something like concatenating the list into a string, but I would like to avoid that if possible.
What's going on? How can I do this without constructing a string from the elements of the List?
The problem is that
map :: (a -> b) -> [a] -> [b]
So we end up with [IO ()]. This is a pure value, a list of IO actions. It won't actually print anything. Instead we want
mapM_ :: (a -> IO ()) -> [a] -> IO ()
The naming convention *M means that it operates over monads and *_ means we throw away the value. This is like map except it sequences each action with >> to return an IO action.
As an example mapM_ print [1..10] will print each element on a new line.
Suppose you're given a list xs :: [a] and function f :: Monad m => a -> m b. You want to apply the function f to each element of xs, yielding a list of actions, then sequence these actions. Here is how I would go about constructing a function, call it mapM, that does this. In the base case, xs = [] is the empty list, and we simply return []. In the recursive case, xs has the form x : xs. First, we want to apply f to x, giving the action f x :: m b. Next, we want recursively call mapM on xs. The result of performing the first step is a value, say y; the result of performing the second step is a list of values, say ys. So we collect y and ys into a list, then return them in the monad:
mapM :: Monad m => (a -> m b) -> [a] -> m [b]
mapM f [] = return []
mapM f (x : xs) = f x >>= \y -> mapM f ys >>= \ys -> return (y : ys)
Now we can map a function like print, which returns an action in the IO monad, over a list of values to print: mapM print [1..10] does precisely this for the list of integers from one through ten. There is a problem, however: we aren't particularly concerned about collecting the results of printing operations; we're primarily concerned about their side effects. Instead of returning y : ys, we simply return ().
mapM_ :: Monad m => (a -> m b) ->[a] -> m ()
mapM_ f [] = return ()
mapM_ f (x : xs) = f x >> mapM_ f xs
Note that mapM and mapM_ can be defined without explicit recursion using the sequence and sequence_ functions from the standard library, which do precisely what their names imply. If you look at the source code for mapM and mapM_ in Control.Monad, you will see them implemented that way.
Everything in Haskell is very strongly typed, including code to perform IO!
When you write print [1, 2], this is just a convenience wrapper for putStrLn (show [1, 2]), where show is a function that turns a (Show'able) object into a string. print itself doesn't do anything (in the side effect sense of do), but it outputs an IO() action, which is sort of like a mini unrun "program" (if you excuse the sloppy language), which isn't "run" at its creation time, but which can be passed around for later execution. You can verify the type in ghci
> :t print [1, 2]
print [1, 2]::IO()
This is just an object of type IO ().... You could throw this away right now and nothing would ever happen. More likely, if you use this object in main, the IO code will run, side effects and all.
When you map multiple putStrLn (or print) functions onto a list, you still get an object whose type you can view in ghci
> :t map print [1, 2]
map print [1, 2]::[IO()]
Like before, this is just an object that you can pass around, and by itself it will not do anything. But unlike before, the type is incorrect for usage in main, which expects an IO() object. In order to use it, you need to convert it to this type.
There are many ways to do this conversion.... One way that I like is the sequence function.
sequence $ map print [1, 2]
which takes a list of IO actions (ie- mini "programs" with side effects, if you will forgive the sloppy language), and sequences them together as on IO action. This code alone will now do what you want.
As jozefg pointed out, although sequence works, sequence_ is a better choice here....
Sequence not only concatinates the stuff in the IO action, but also puts the return values in a list.... Since print's return value is IO(), the new return value becomes a useless list of ()'s (in IO). :)
Using the lens library:
[1,2,3] ^! each . act print
You might write your own function, too:
Prelude> let l = [1,2]
Prelude> let f [] = return (); f (x:xs) = do print x; f xs
Prelude> f l
1
2

Meaning of `<-` in do block in Haskell

I am trying to understand Haskell monads, reading "Monads for the Curious Programmer". I've run into the example of List Monad:
tossDie=[1,2,3,4,5,6]
toss2Dice = do
n <- tossDie
m <- tossDie
return (n+m)
main = print toss2Dice
The way do block produces m as 36-element list I sort of understand - it maps every element of n as a list of 6 elements and then concatenates those lists. What I do not understand is how n is changed by the presence of m <- tossDie, from 6 elements list to 36 elements. Obviously "we first bind n and then bind m" is wrong understanding here, but what is right?
I am also not fully clear on how two-argument function is applied in do block. I suspect it's a case of curying, but I am slightly hazy as to how exactly it works.
Can someone please explain the above two mysteries?
For lists (such as tossDie), the do notation acts like a list comprehension -- that is to say, as if each variable binding were a nested foreach loop.
The do-block expression:
toss2Dice = do { n <- tossDie; m <- tossDie; return (n+m) }
does the same thing as this list comprehension:
toss2Dice = [ n+m | n <- tossDie, m <- tossDie ]
The result is comparable to the following imperative pseudocode:
toss2Dice = []
foreach n in tossDie:
foreach m in tossDie:
toss2Dice.push_back(n+m)
except that the Haskell examples generate their results lazily, on demand, instead of eagerly and all at once.
If you look at the monad instance for lists, you can see how this works:
instance Monad [] where
xs >>= f = concat (map f xs)
return x = [x]
Starting at the beginning of the do block, each variable binding creates a loop over the remainder of the block:
do { n <- tossDie; m <- tossDie; return (n+m) }
===> tossDie >>= \n -> do { m <- tossDie; return (n+m) }
===> concat ( map (\n -> do { m <- tossDie; return (n+m) }) tossDie )
Note that map function iterates over the items in the list tossDie, and the results are concatenated. The mapping function is the remainder of the do block, so the first binding effectively creates an outer loop around it.
Additional bindings create successively nested loops; and finally, the return function creates a singleton list out of each computed value (n+m) so that the "bind" function >>= (which expects lists) can concatenate them properly.
The interesting bit I assume is this:
toss2Dice = do
n <- tossDie
m <- tossDie
return (n+m)
This is somewhat equivalent to the following Python:
def toss2dice():
for n in tossDie:
for m in tossDie:
yield (n+m)
When it comes to the list monad, you can view the binding arrows (<-) in do notation as traditional imperative "foreach" loops. Everything after
n <- tossDie
belongs to the "loop body" of that foreach loop, and so will be evaluated once for every value in tossDie assigned to n.
If you want the desugaring from do notation to actual bind operators >>=, it looks like this:
toss2Dice =
tossDie >>= (\n ->
tossDie >>= (\m ->
return (n+m)
)
)
Notice how the "inner loop body"
(\n ->
tossDie >>= (\m ->
return (n+m)
)
)
Will get executed once for every value in tossDie. This is pretty much the equivalent to the nested Python loops.
Technical mumbo-jumbo: The reason you get "foreach" loops from the binding arrows has to do with the particular monad you are working with. The arrows mean different things for different monads, and to know what they mean for a particular monad you have to do some sleuthing and figure out how that monad works in general.
The arrows get desugared into calls to the bind operator, >>=, which works differently for different monads as well – this is the reason the bind arrows <- also work differently for different monads!
In the case of the list monad, the bind operator >>= takes a list to the left and a function returning a list to the right, and sort of applies that function to every element of the list. If we want to double every element in a list in a cumbersome way, we could imagine doing it as
λ> [1, 2, 3, 4] >>= \n -> return (n*2)
[2,4,6,8]
(return is required to make the types work out. >>= expects a function that returns a list, and return will, for the list monad, wrap a value in a list.) To illustrate a perhaps more powerful example, we can start by imagining the function
λ> let posneg n = [n, -n]
λ> posneg 5
[5,-5]
Then we can write
λ> [1, 2, 3, 4] >>= posneg
[1,-1,2,-2,3,-3,4,-4]
to count the natural numbers between -4 and 4.
The reason the list monad works this way is that this particular behaviour of the bind operator >>= and return makes the monad laws hold. The monad laws are important for us (and perhaps the adventurous compiler) because they let us change code in ways that we know will not break anything.
A very cute side effect of this is that it makes lists very handy to represent uncertainty in values: Say you are building an OCR thingey that's supposed to look at an image and turn it into text. You might encounter a character that could be either 4 or A or H, but you are not sure. By letting the OCR thingey work in the list monad and return the list ['A', '4', 'H'] you have covered your bases. Actually working with the scanned text then become very easy and readable with do notation for the list monad. (It sorta looks like you're working with single values, when in fact you are just generating all possible combinations!)
Adding to #kqr answer:
>>= for list monad is actually concatMap, a function that maps elements to lists of elements and concatenates the lists, but with arguments flipped:
concatMap' x f = concat (map f x)
or alternatively
concatMap' = flip concatMap
return is just
singleElementList x = [x]
Now we can replace >>= with concatMap' and singleElementList:
toss2Dice =
concatMap' tossDie (\n ->
concatMap' tossDie (\m ->
singleElementList (n+m)
)
)
Now we can replace the 2 functions with their bodies:
toss2Dice =
concat (map (\n ->
concat (map (\m ->
[n+m]
) tossDice)
) tossDice)
Remove extra line breaks:
toss2Dice = concat (map (\n -> concat (map (\m -> [n+m]) tossDice)) tossDice)
Or shorter with concatMap:
toss2Dice = concatMap (\n -> concatMap (\m -> [n+m]) tossDice) tossDice
following nponeccop's advice, with
for = flip concatMap
your code becomes
toss2Dice =
for {- n in -} tossDie {- call -}
(\n-> for {- m in -} tossDie {- call -}
(\m-> [n+m]))
where it is plainly visible that we have nested functions, one inside another; so the inner function (\m-> [n+m]), being in the scope of outer function's argument n, has access to it (to the argument n that is). So it uses the value of the argument to the outer function, which is the same on each invocation of the inner function, however many times it is invoked during the same invocation of the outer function.
This can be re-written with named functions,
toss2Dice =
for {- each elem in -} tossDie {- call -} g
where g n = for {- each elem in -} tossDie {- call -} h
where h m = [n+m]
The function h is defined inside g, i.e. in g's argument's scope. And that's how h gets to use both m and n even though only m is its argument.
So in fact we do indeed "first bind n and then bind m" here. In nested fashion, that is.

Resources