Is there a lazy Session IO Monad? - haskell

You have a sequence of actions that prefer to be executed in chunks due to some high-fixed overhead like packet headers or making connections. The limit is that sometimes the next action depends on the result of a previous one in which case, all pending actions are executed at once.
Example:
mySession :: Session IO ()
a <- readit -- nothing happens yet
b <- readit -- nothing happens yet
c <- readit -- nothing happens yet
if a -- all three readits execute because we need a
then write "a"
else write "..."
if b || c -- b and c already available
...
This reminds me of so many Haskell concepts but I can't put my finger on it.
Of course, you could do something obvious like:
[a,b,c] <- batch([readit, readit, readit])
But I'd like to hide the fact of chunking from the user for slickness purposes.
Not sure if Session is the right word. Maybe you can suggest a better one? (Packet, Batch, Chunk and Deferred come to mind.)
Update
I think there was a really good answer last night that I read on my phone but when I came back to look for it today it was gone. Was I dreaming?

I don't think you can do exactly what you want, since what you describe exploits haskell's lazy evaluation to have the evaluation of a force the actions that compute b and c, and there's no way to seq on unspecified values.
What I could do was hack together a monad transformer that delayed actions sequenced via >> so that they could be executed all together:
data Session m a = Session { pending :: [ m () ], final :: m a }
runSession :: Monad m => Session m a -> m a
runSession (Session ms ma) = foldr (flip (>>)) (return ()) ms >> ma
instance Monad m => Monad (Session m) where
return = Session [] . return
s >>= f = Session [] $ runSession s >>= (runSession . f)
(Session ms ma) >> (Session ms' ma') =
Session (ms' ++ (ma >> return ()) : ms) ma'
This violates some monad laws, but lets you do something like:
liftIO :: IO a -> Session IO a
liftIO = Session []
exampleSession :: Session IO Int
exampleSession = do
liftIO $ putStrLn "one"
liftIO $ putStrLn "two"
liftIO $ putStrLn "three"
liftIO $ putStrLn "four"
trace "five" $ return 5
and get
ghci> runSession exampleSession
five
one
two
three
four
5
ghci> length (pending exampleSession)
4

This is very similar to what Haxl does.
For more info:
Open sourcing haxl - Facebook Code Blog
ICFP 2014 talk

You could use the unsafeInterleaveIO function. It is a dangerous function that can introduce bugs to your program if not used carefully, but it does what you're asking for.
You can insert it into your example code like this:
lazyReadits :: IO [a]
lazyReadits = unsafeInterleaveIO $ do
a <- readit
r <- lazyReadits
return (a:r)
unsafeInterleaveIO makes the action as a whole lazy, but once it starts evaluating it will evaluate as if it had been strict. This means in my above example: readit will run as soon as something tests whether the returned list is empty or not. If I'd used mapM unsafeInterleaveIO (replicate 3 readit) instead, then readit would only be run when the actual elements of the list are evaluated, which would make the contents of the list depend on the order in which its elements are inspected, which is one example of how unsafeInterleaveIO can introduce bugs.

Related

Lazy evaluation of IO actions

I'm trying to write code in source -> transform -> sink style, for example:
let (|>) = flip ($)
repeat 1 |> take 5 |> sum |> print
But would like to do that using IO. I have this impression that my source can be an infinite list of IO actions, and each one gets evaluated once it is needed downstream. Something like this:
-- prints the number of lines entered before "quit" is entered
[getLine..] >>= takeWhile (/= "quit") >>= length >>= print
I think this is possible with the streaming libraries, but can it be done along the lines of what I'm proposing?
Using the repeatM, takeWhile and length_ functions from the streaming library:
import Streaming
import qualified Streaming.Prelude as S
count :: IO ()
count = do r <- S.length_ . S.takeWhile (/= "quit") . S.repeatM $ getLine
print r
This seems to be in that spirit:
let (|>) = flip ($)
let (.>) = flip (.)
getContents >>= lines .> takeWhile (/= "quit") .> length .> print
The issue here is that Monad is not the right abstraction for this, and attempting to do something like this results in a situation where referential transparency is broken.
Firstly, we can do a lazy IO read like so:
module Main where
import System.IO.Unsafe (unsafePerformIO)
import Control.Monad(forM_)
lazyIOSequence :: [IO a] -> IO [a]
lazyIOSequence = pure . go where
go :: [IO a] -> [a]
go (l:ls) = (unsafePerformIO l):(go ls)
main :: IO ()
main = do
l <- lazyIOSequence (repeat getLine)
forM_ l putStrLn
This when run will perform cat. It will read lines and output them. Everything works fine.
But consider changing the main function to this:
main :: IO ()
main = do
l <- lazyIOSequence (map (putStrLn . show) [1..])
putStrLn "Hello World"
This outputs Hello World only, as we didn't need to evaluate any of l. But now consider replacing the last line like the following:
main :: IO ()
main = do
x <- lazyIOSequence (map (putStrLn . show) [1..])
seq (head x) putStrLn "Hello World"
Same program, but the output is now:
1
Hello World
This is bad, we've changed the results of a program just by evaluating a value. This is not supposed to happen in Haskell, when you evaluate something it should just evaluate it, not change the outside world.
So if you restrict your IO actions to something like reading from a file nothing else is reading from, then you might be able to sensibly lazily evaluate things, because when you read from it in relation to all the other IO actions your program is taking doesn't matter. But you don't want to allow this for IO in general, because skipping actions or performing them in a different order can matter (and above, certainly does). Even in the reading a file lazily case, if something else in your program writes to the file, then whether you evaluate that list before or after the write action will affect the output of your program, which again, breaks referential transparency (because evaluation order shouldn't matter).
So for a restricted subset of IO actions, you can sensibly define Functor, Applicative and Monad on a stream type to work in a lazy way, but doing so in the IO Monad in general is a minefield and often just plain incorrect. Instead you want a specialised streaming type, and indeed Conduit defines Functor, Applicative and Monad on a lot of it's types so you can still use all your favourite functions.

IO in Haskell not visible

I have a little interactive game program. The interactive part looks like this,
main :: IO ()
main = playGame newGame
where
playGame :: Game -> IO ()
playGame game =
do putStr $ show game
putStr $ if gameOver game then "Another game? (y or n) > "
else show (whoseMove game) ++ " to play (row col) > "
moveWords <- fmap (words . fmap cleanChar ) getLine
if stopGame game moveWords
then return ()
else playGame $ if gameOver game then newGame else makeMove game moveWords
This works fine. It displays the game state, asks for the next move, applies that move to the state, displays the new state, etc.
Then I saw a video by Moss Collum in which he showed a game that uses the following strategy for interacting with the user.
...
userInput <- getContents
foldM_ updateScreen (12, 40) (parseInput userInput) where
...
I couldn't find a reference to foldM_ but assuming it was some sort of fold I tried this. (I actually tried a number of things, but this seems clearest.)
main' :: IO ()
main' = do
moveList <- fmap (map words . lines . map cleanChar) getContents
let states = scanl makeMove newGame moveList
foldl (\_ state -> putStr . show $ state) (return ()) states
When I run it, I never get the game state printed out until after hitting end-of-file, at which point the correct final game state is printed. Before that, I can enter moves, and they are processed properly (according to the final game state), but I never see the intermediate states. (The idea is that lazy evaluation should print the game states as they become available.)
I'd appreciate help understanding why I don't see the intermediate states and what, if anything, I can do to fix it.
Also, after entering end-of-file (^Z on Windows) the program refuses to play again, saying that the handle has been closed. To play again I have to restart the program. Is there a way to fix that?
First, let's start with foldM:
foldM :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
a is the type of things in our container, and b is our result type. It takes a b (the value we've accumulated so far in our fold, ana(then ext thing in the list, and returns an m b, meaning it returns a b and does some monadic actions in the monad m.
Then it takes an initial value, a container of a's, and it returns an
This is basically like a normal fold function, but each step of the fold is monadic, and the final result is monadic. So the actions will actually be performed if you use foldM.
Now let's look at foldM_:
foldM_ :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m ()
Source
This is the same, but the final result is m (). This is for the case where we don't care about the final result. We keep the intermediate results and use them to perform monadic actions at each step, but when we finish, we only care about what we did in the monad, so we throw away the result.
In your case, t would be List, and m would be IO. So foldM_ iterates through the list, doing the selected IO actions at each stage, then throws away the final result.
fold doesn't actually sequence the actions together, it just folds over your list normally, even if your final result is IO. So your foldl creates an IO action, namely putStr . show $ state, then passes it to the next step of the fold. But, you ignore the first argument of your fold, so it throws the IO away without ever doing anything with it!
This is a tricky thing in Haskell. A value of type IO Something doesn't actually do the action when you create it. It just creates an IO value, which is just an instruction to the runtime of what to do when it runs main. If you throw it away, and it never gets sequenced into an IO that your main performs, then the side-effect will never happen.

How do I "continue" in a `Monad` loop?

Often times I found myself in need of skipping the rest of the iteration (like continue in C) in Haskell:
forM_ [1..100] $ \ i ->
a <- doSomeIO
when (not $ isValid1 a) <skip_rest_of_the_iteration>
b <- doSomeOtherIO a
when (not $ isValid2 b) <skip_rest_of_the_iteration>
...
However, I failed to find an easy way to do so. The only way I am aware of is probably the Trans.Maybe, but is it necessary to use a monad transform to achieve something so trivial?
Remember that loops like this in Haskell are not magic...they're just normal first-class things that you can write yourself.
For what it's worth, I don't think it's too useful to think of MaybeT as a Monad transformer. To me, MaybeT is just a newtype wrapper to give an alternative implementation of (>>=)...just like how you use Product, Sum, First, And, etc. to give alternative implementations of mappend and mempty.
Right now, (>>=) for you is IO a -> (a -> IO b) -> IO b. But it'd be more useful to have (>>=) here be IO (Maybe a) -> (a -> IO (Maybe b) -> IO (Maybe b). As soon as you get to the first action that returns a Nothing, it's really impossible to "bind" any further. That's exactly what MaybeT gives you. You also get a "custom instance" of guard, guard :: Bool -> IO (Maybe a), instead of guard :: IO a.
forM_ [1..100] $ \i -> runMaybeT $ do
a <- lift doSomeIO
guard (isValid1 a)
b <- lift $ doSomeOtherIO a
guard (isValid2 b)
...
and that's it :)
MaybeT is not magic either, and you can achieve basically the same effect by using nested whens. It's not necessary, it just makes things a lot simpler and cleaner :)
Here's how you would do it using bare-bones recursion:
loop [] = return () -- done with the loop
loop (x:xs) =
do a <- doSomeIO
if ...a...
then return () -- exit the loop
else do -- continuing with the loop
b <- doSomeMoreIO
if ...b...
then return () -- exit the loop
else do -- continuing with the loop
...
loop xs -- perform the next iteration
and then invoke it with:
loop [1..100]
You can tidy this up a bit with the when function from Control.Monad:
loop [] = return ()
loop (x:xs) =
do a <- doSomeIO
when (not ...a...) $ do
b <- doSomeMoreIO
when (not ...b...) $ do
...
loop xs
There is also unless in Control.Monad which you might prefer to use.
Using #Ørjan Johansen 's helpful advice, here is an simple example:
import Control.Monad
loop [] = return ()
loop (x:xs) = do
putStrLn $ "x = " ++ show x
a <- getLine
when (a /= "stop") $ do
b <- getLine
when (b /= "stop") $ do
print $ "iteration: " ++ show x ++ ": a = " ++ a ++ " b = " ++ b
loop xs
main = loop [1..3]
If you want to loop over a list or other container to perform actions and/or produce a summary value, and you're finding the usual convenience tools like for_ and foldM aren't good enough for the job, you might want to consider foldr, which is plenty strong enough for the job. When you're not really looping over a container, you can use plain old recursion or pull in something like https://hackage.haskell.org/package/loops or (for a very different flavor) https://hackage.haskell.org/package/machines or perhaps https://hackage.haskell.org/package/pipes.

What's the meaning of IO actions within pure functions?

I thought that in principle Haskell's type system would forbid calls to impure functions (i.e. f :: a -> IO b) from pure ones, but today I realized that by calling them with return they compile just fine. In this example:
h :: Maybe ()
h = do
return $ putStrLn "???"
return ()
h works in the Maybe monad, but it's a pure function nevertheless. Compiling and running it simply returns Just () as one would expect, without actually doing any I/O. I think Haskell's laziness puts the things together (i.e. putStrLn's return value is not used - and can't since its value constructors are hidden and I can't pattern match against it), but why is this code legal? Are there any other reasons that makes this allowed?
As a bonus, related question: in general, is it possible to forbid at all the execution of actions of a monad from within other ones, and how?
IO actions are first-class values like any other; that's what makes Haskell's IO so expressive, allowing you to build higher-order control structures (like mapM_) from scratch. Laziness isn't relevant here,1 it's just that you're not actually executing the action. You're just constructing the value Just (putStrLn "???"), then throwing it away.
putStrLn "???" existing doesn't cause a line to be printed to the screen. By itself, putStrLn "???" is just a description of some IO that could be done to cause a line to be printed to the screen. The only execution that happens is executing main, which you constructed from other IO actions, or whatever actions you type into GHCi. For more information, see the introduction to IO.
Indeed, it's perfectly conceivable that you might want to juggle about IO actions inside Maybe; imagine a function String -> Maybe (IO ()), which checks the string for validity, and if it's valid, returns an IO action to print some information derived from the string. This is possible precisely because of Haskell's first-class IO actions.
But a monad has no ability to execute the actions of another monad unless you give it that ability.
1 Indeed, h = putStrLn "???" `seq` return () doesn't cause any IO to be performed either, even though it forces the evaluation of putStrLn "???".
Let's desugar!
h = do return (putStrLn "???"); return ()
-- rewrite (do foo; bar) as (foo >> do bar)
h = return (putStrLn "???") >> do return ()
-- redundant do
h = return (putStrLn "???") >> return ()
-- return for Maybe = Just
h = Just (putStrLn "???") >> Just ()
-- replace (foo >> bar) with its definition, (foo >>= (\_ -> bar))
h = Just (putStrLn "???") >>= (\_ -> Just ())
Now, what happens when you evaluate h?* Well, for Maybe,
(Just x) >>= f = f x
Nothing >>= f = Nothing
So we pattern match the first case
f x
-- x = (putStrLn "???"), f = (\_ -> Just ())
(\_ -> Just ()) (putStrLn "???")
-- apply the argument and ignore it
Just ()
Notice how we never had to perform putStrLn "???" in order to evaluate this expression.
*n.b. It is somewhat unclear at which point "desugaring" stops and "evaluation" begins. It depends on your compiler's inlining decisions. Pure computations could be evaluated entirely at compile time.

Strict evaluation techniques for concurrent channels in Haskell

I'm toying with Haskell threads, and I'm running into the problem of communicating lazily-evaluated values across a channel. For example, with N worker threads and 1 output thread, the workers communicate unevaluated work and the output thread ends up doing the work for them.
I've read about this problem in various documentation and seen various solutions, but I only found one solution that works and the rest do not. Below is some code in which worker threads start some computation that can take a long time. I start the threads in descending order, so that the first thread should take the longest, and the later threads should finish earlier.
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan -- .Strict
import Control.Concurrent.MVar
import Control.Exception (finally, evaluate)
import Control.Monad (forM_)
import Control.Parallel.Strategies (using, rdeepseq)
main = (>>=) newChan $ (>>=) (newMVar []) . run
run :: Chan (Maybe String) -> MVar [MVar ()] -> IO ()
run logCh statVars = do
logV <- spawn1 readWriteLoop
say "START"
forM_ [18,17..10] $ spawn . busyWork
await
writeChan logCh Nothing -- poison the logger
takeMVar logV
putStrLn "DONE"
where
say mesg = force mesg >>= writeChan logCh . Just
force s = mapM evaluate s -- works
-- force s = return $ s `using` rdeepseq -- no difference
-- force s = return s -- no-op; try this with strict channel
busyWork = say . show . sum . filter odd . enumFromTo 2 . embiggen
embiggen i = i*i*i*i*i
readWriteLoop = readChan logCh >>= writeReadLoop
writeReadLoop Nothing = return ()
writeReadLoop (Just mesg) = putStrLn mesg >> readWriteLoop
spawn1 action = do
v <- newEmptyMVar
forkIO $ action `finally` putMVar v ()
return v
spawn action = do
v <- spawn1 action
modifyMVar statVars $ \vs -> return (v:vs, ())
await = do
vs <- modifyMVar statVars $ \vs -> return ([], vs)
mapM_ takeMVar vs
Using most techniques, the results are reported in the order spawned; that is, the longest-running computation first. I interpret this to mean that the output thread is doing all the work:
-- results in order spawned (longest-running first = broken)
START
892616806655
503999185040
274877906943
144162977343
72313663743
34464808608
15479341055
6484436675
2499999999
DONE
I thought the answer to this would be strict channels, but they didn't work. I understand that WHNF for strings is insufficient because that would just force the outermost constructor (nil or cons for the first character of the string). The rdeepseq is supposed to fully evaluate, but it makes no difference. The only thing I've found that works is to map Control.Exception.evaluate :: a -> IO a over all the characters in the string. (See the force function comments in the code for several different alternatives.) Here's the result with Control.Exception.evaluate:
-- results in order finished (shortest-running first = correct)
START
2499999999
6484436675
15479341055
34464808608
72313663743
144162977343
274877906943
503999185040
892616806655
DONE
So why don't strict channels or rdeepseq produce this result? Are there other techniques? Am I misinterpreting why the first result is broken?
There are two issues going on here.
The reason the first attempt (using an explicit rnf) doesn't work is that, by using return, you've created a thunk that fully evaluates itself when it is evaluated, but the thunk itself has not being evaluated. Notice that the type of evaluate is a -> IO a: the fact that it returns a value in IO means that evaluate can impose ordering:
return (error "foo") >> return 1 == return 1
evaluate (error "foo") >> return 1 == error "foo"
The upshot is that this code:
force s = evaluate $ s `using` rdeepseq
will work (as in, have the same behavior as mapM_ evaluate s).
The case of using strict channels is a little trickier, but I believe this is due to a bug in strict-concurrency. The expensive computation is actually being run on the worker threads, but it's not doing you much good (you can check for this explicitly by hiding some asynchronous exceptions in your strings and seeing which thread the exception surfaces on).
What's the bug? Let's take a look at the code for strict writeChan:
writeChan :: NFData a => Chan a -> a -> IO ()
writeChan (Chan _read write) val = do
new_hole <- newEmptyMVar
modifyMVar_ write $ \old_hole -> do
putMVar old_hole $! ChItem val new_hole
return new_hole
We see that modifyMVar_ is called on write before we evaluate the thunk. The sequence of operations then is:
writeChan is entered
We takeMVar write (blocking anyone else who wants to write to the channel)
We evaluate the expensive thunk
We put the expensive thunk onto the channel
We putMVar write, unblocking all of the other threads
You don't see this behavior with the evaluate variants, because they perform the evaluation before the lock is acquired.
I’ll send Don mail about this and see if he agrees that this behavior is kind of suboptimal.
Don agrees that this behavior is suboptimal. We're working on a patch.

Resources