Haskell: How lazy is the lazy `Control.Monad.ST.Lazy` monad?

I have been experimenting with the strict and lazy ST monads, and I do not understand clearly the degree of laziness of each.
For example, using the lazy Control.Monad.State.Lazy monad we can write:
main = print $ flip evalState "a" $ do
    forever $ put "b"
    put "c"
    get
This works fine and outputs "c". Dually, the same code for the strict Control.Monad.State.Strict variant will run put "b" forever, and hang.
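For reference, a sketch of the strict variant (only the import changes; the strict bind forces every put, so this never finishes):
import Control.Monad (forever)
import Control.Monad.State.Strict

main :: IO ()
main = print $ flip evalState "a" $ do
    forever $ put "b"   -- never gets past this line
    put "c"
    get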
Intuitively, I would expect the same duality to hold for the ST monads. That is, given the code:
main = print $ S.runST $ do
    r <- newSTRef "a"
    forever $ writeSTRef r "b"
    writeSTRef r "c"
    readSTRef r
Control.Monad.ST.Lazy should output "c", while Control.Monad.ST.Strict should hang. However, they both loop indefinitely. I believe there is a valid reason for this (for example: reading backwards, the reference r has not yet been allocated by the time the last writeSTRef is called), but it somehow feels like we could do better.

How lazy is the lazy Control.Monad.ST.Lazy monad?
Surprisingly, it is perfectly lazy. But Data.STRef.Lazy isn't.
ST.Lazy is lazy
Let's focus on another example for a second:
import qualified Control.Monad.ST as S
import qualified Control.Monad.ST.Lazy as L
squared :: Monad m => m [Integer]
squared = mapM (return . (^2)) [1..]
ok, oops :: [Integer]
ok = L.runST squared
oops = S.runST squared
Even though ok and oops should do the same thing, we can only get the elements of ok. If we tried to use head oops, it would never return. With ok, however, we can take arbitrarily many elements.
Or, to compare them to the non-monadic squared list, they behave like:
ok', oops' :: [Integer]
ok' = map (^2) [1..]
oops' = let x = map (^2) [1..] in force x -- Control.DeepSeq.force
That's because the strict version evaluates all state operations, even though they're not required for our result. On the other hand, the lazy version delays the operations:
This module presents an identical interface to Control.Monad.ST, except that the monad delays evaluation of state operations until a value depending on them is required.
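A quick GHCi check makes the difference visible (a sketch; the second command never returns):
ghci> take 3 ok
[1,4,9]
ghci> head oops   -- loops forever: the strict runST performs the whole infinite computation first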
What about readSTRef?
Now let's focus again on your example. Note that we can get an infinite loop with even simpler code:
main = print $ L.runST $ do
    forever $ return ()
    r <- newSTRef "a"
    readSTRef r
If we add an additional return at the end …
main = print $ L.runST $ do
    forever $ return ()
    r <- newSTRef "a"
    readSTRef r
    return "a"
… everything is fine. So apparently there's something strict in newSTRef or readSTRef. Let's have a look at their implementation:
import qualified Data.STRef as ST
newSTRef = strictToLazyST . ST.newSTRef
readSTRef = strictToLazyST . ST.readSTRef
writeSTRef r a = strictToLazyST (ST.writeSTRef r a)
And there's the culprit. Data.STRef.Lazy is actually implemented via Data.STRef, which is meant for Control.Monad.ST.Strict. strictToLazyST only hides this detail:
strictToLazyST :: ST.ST s a -> ST s a
strictToLazyST m = ST $ \s -> …  -- (body elided)
Convert a strict ST computation into a lazy one. The strict state thread passed to strictToLazyST is not performed until the result of the lazy state thread it returns is demanded.
Now let's put things together:
- in main, we want to print the value given by the lazy ST computation
- the lazy ST computation's value is given by a lazy readSTRef
- the lazy readSTRef is actually implemented as a lazy wrapper around the strict readSTRef
- the strict readSTRef evaluates the state as if it were a strict one
- the strict evaluation of forever $ return () bites us
So the current ST.Lazy is lazy enough. It's the Data.STRef.Lazy that's too strict. As long as Data.STRef.Lazy is based on strictToLazyST, this behavior will endure.
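Following the same reasoning, here is a sketch of how the return trick from above applies to the original program: by ending the block with a pure return, the printed value no longer depends on the strict readSTRef, so none of the state operations are ever demanded and this should print "done" immediately:
import Control.Monad (forever)
import qualified Control.Monad.ST.Lazy as L
import Data.STRef.Lazy

main :: IO ()
main = print $ L.runST $ do
    r <- newSTRef "a"
    forever $ writeSTRef r "b"
    writeSTRef r "c"
    _ <- readSTRef r
    return "done"   -- only this pure value is demanded, so the loop above never runs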

Related

How to avoid the IO monad when solving arithmetic problems in SBV

I am trying to solve arithmetic problems with SBV.
For example
solution :: SymbolicT IO ()
solution = do
    [x, y] <- sFloats ["x", "y"]
    constrain $ x + y .<= 2
Main> s1 = sat solution
Main> s2 = isSatisfiable solution
Main> s1
Satisfiable. Model:
  x = -1.2030502e-17 :: Float
  y = -2.2888208e-37 :: Float
Main> :t s1
s1 :: IO SatResult
Main> s2
True
Main> :t s2
s2 :: IO Bool
While I can do useful things this way, it would be easier for me to work with the pure values (SatResult or Bool) rather than with the IO monad.
According to the documentation
sat :: Provable a => a -> IO SatResult
constrain :: SolverContext m => SBool -> m ()
sFloats :: [String] -> Symbolic [SFloat]
type Symbolic = SymbolicT IO
Given the types of the functions I use, I understand why I always end up in the IO monad.
But look at the generalized versions of the functions, for example sFloats:
sFloats :: MonadSymbolic m => [String] -> m [SFloat]
Depending on the type of the function, I can work with a monad other than IO. This gives me hope that I could reach a more convenient monad, for example the Identity monad.
Unfortunately, the examples always solve the problem within the IO monad, so I couldn't find any that would work for me. Besides that, I don't have much experience working with monads.
Finally, my question is:
Is there any way to avoid the IO monad when solving such a problem with SBV?
Thanks in advance
SBV calls out to the SMT solver of your choice (most likely z3, but others are available too), and presents the results back to you. This means that it performs IO under the hood, and thus you cannot be outside the IO monad. You can create custom monads using MonadSymbolic, but that will not get you out of the IO monad: Since the call to the SMT solver does IO you'll always be in IO.
(And I'd strongly caution against uses of unsafePerformIO as suggested in one of the comments. This is really a bad idea; and you can find lots more information on this elsewhere why you shouldn't do so.)
Note that this is no different than any other IO based computation in Haskell: You perform the IO "in-the-wrapper," but once you get your results, you can do whatever you'd like to do with them in a "pure" environment.
Here's a simple example:
import Data.SBV
import Data.SBV.Control

example :: IO ()
example = runSMT $ do
    [x, y] <- sFloats ["x", "y"]
    constrain $ x + y .<= 2
    query $ do
        cs <- checkSat
        case cs of
          Unsat -> io $ putStrLn "Unsatisfiable"
          Sat   -> do
              xv <- getValue x
              yv <- getValue y
              let result = use xv yv
              io $ putStrLn $ "Result: " ++ show result
          _ -> error $ "Solver said: " ++ show cs

-- Use the results from the solver, in a purely functional way
use :: Float -> Float -> Float
use x y = x + y
Now you can say:
*Main> example
Result: -Infinity
The function example has type IO (), because it does involve calling out to the solver and getting the results. However, once you extract those results (via calls to getValue), you can pass them to the function use which has a very simple purely functional type. So, you keep the "wrapper" in the monad, but actual processing, use-of-the values, etc., remain in the pure world.
Alternatively, you can also extract the values and continue from there:
import Data.SBV
import Data.SBV.Control

example :: IO (Maybe (Float, Float))
example = runSMT $ do
    [x, y] <- sFloats ["x", "y"]
    constrain $ x + y .<= 2
    query $ do
        cs <- checkSat
        case cs of
          Unsat -> pure Nothing
          Sat   -> do
              xv <- getValue x
              yv <- getValue y
              pure $ Just (xv, yv)
          _ -> error $ "Solver said: " ++ show cs
Now you can say:
*Main> Just (a, b) <- example
*Main> a
-Infinity
*Main> b
4.0302105e-21
Long story short: Don't avoid the IO monad. It's there for a very good reason. Get into it, get your results out, and then the rest of your program can remain purely functional, or whatever other monad you might find yourself in.
Note that none of this is really SBV specific. This is the usual Haskell paradigm of how to use functions with side-effects. (For instance, anytime you use readFile to read the contents of a file to process it further.) Do not try to "get rid of the IO." Instead, simply work with it.
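A minimal sketch of that everyday pattern (the file name is just a placeholder):
main :: IO ()
main = do
    contents <- readFile "input.txt"   -- the IO happens here
    let n = length (lines contents)    -- purely functional processing of the result
    print n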
Depending on the type of the function, I can work with a monad other than IO.
Not meaningfully different, in the sense you'd hope. Every instance of this class is going to be some transformed version of IO. Sorry!
Time to make a plan that involves understanding and working with IO.

Lazy evaluation of IO actions

I'm trying to write code in source -> transform -> sink style, for example:
let (|>) = flip ($)
repeat 1 |> take 5 |> sum |> print
But I would like to do that using IO. I have the impression that my source can be an infinite list of IO actions, with each one evaluated once it is needed downstream. Something like this:
-- prints the number of lines entered before "quit" is entered
[getLine..] >>= takeWhile (/= "quit") >>= length >>= print
I think this is possible with the streaming libraries, but can it be done along the lines of what I'm proposing?
Using the repeatM, takeWhile and length_ functions from the streaming library:
import Streaming
import qualified Streaming.Prelude as S
count :: IO ()
count = do
    r <- S.length_ . S.takeWhile (/= "quit") . S.repeatM $ getLine
    print r
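A hypothetical session, typing hello, world and then quit:
ghci> count
hello
world
quit
2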
This seems to be in that spirit:
let (|>) = flip ($)
let (.>) = flip (.)
getContents >>= lines .> takeWhile (/= "quit") .> length .> print
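In the same lazy-IO spirit, a sketch using interact, which is built on the same lazy getContents:
main :: IO ()
main = interact $ show . length . takeWhile (/= "quit") . lines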
The issue here is that Monad is not the right abstraction for this, and attempting to do something like this results in a situation where referential transparency is broken.
Firstly, we can do a lazy IO read like so:
module Main where

import System.IO.Unsafe (unsafePerformIO)
import Control.Monad (forM_)

lazyIOSequence :: [IO a] -> IO [a]
lazyIOSequence = pure . go
  where
    go :: [IO a] -> [a]
    go (l:ls) = (unsafePerformIO l) : go ls

main :: IO ()
main = do
    l <- lazyIOSequence (repeat getLine)
    forM_ l putStrLn
When run, this behaves like cat: it reads lines and echoes them back. Everything works fine.
But consider changing the main function to this:
main :: IO ()
main = do
    l <- lazyIOSequence (map (putStrLn . show) [1..])
    putStrLn "Hello World"
This outputs Hello World only, as we didn't need to evaluate any of l. But now consider replacing the last line as follows:
main :: IO ()
main = do
    x <- lazyIOSequence (map (putStrLn . show) [1..])
    seq (head x) (putStrLn "Hello World")
Same program, but the output is now:
1
Hello World
This is bad: we've changed the result of a program just by evaluating a value. That is not supposed to happen in Haskell; evaluating something should just evaluate it, not change the outside world.
So if you restrict your IO actions to something like reading from a file that nothing else touches, you might be able to evaluate things lazily in a sensible way, because it doesn't matter when that read happens relative to the other IO actions your program performs. But you don't want to allow this for IO in general, because skipping actions or performing them in a different order can matter (and above, it certainly does). Even in the lazy-file-reading case, if something else in your program writes to the file, then whether you evaluate that list before or after the write will affect your program's output, which again breaks referential transparency (because evaluation order shouldn't matter).
So for a restricted subset of IO actions, you can sensibly define Functor, Applicative and Monad on a stream type to work in a lazy way, but doing so for IO in general is a minefield and often just plain incorrect. Instead you want a specialised streaming type, and indeed Conduit defines Functor, Applicative and Monad on a lot of its types, so you can still use all your favourite functions.
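For example, here is a sketch of the original pipeline with the conduit package (assuming its Conduit module and the repeatMC, takeWhileC and lengthC combinators):
import Conduit

-- counts the lines entered before "quit", streaming in constant space
main :: IO ()
main = do
    n <- runConduit $ repeatMC getLine .| takeWhileC (/= "quit") .| lengthC
    print (n :: Int)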

Is evaluate or $! sufficient to WHNF-force a value in a multithreaded monadic context, or do I need pseq?

The following seems to work (as in: it keeps saying Surely tomorrow every second)
import Control.Concurrent
import Control.Concurrent.MVar
import Control.Exception (evaluate)

main :: IO ()
main = do
    godot <- newEmptyMVar
    forkIO $ do
        g <- evaluate $ last [0..]
        putMVar godot g
    let loop = do
            threadDelay $ 10^6
            g <- tryTakeMVar godot
            case g of
              Just g  -> return ()
              Nothing -> putStrLn "Surely tomorrow." >> loop
    loop
This uses evaluate to ensure last [0..] is actually forced to WHNF before filling the MVar – if I change the forked thread to
forkIO $ do
    let g = last [0..]
    putMVar godot g
then the program terminates.
However, evaluate uses seq. In the context of deterministic parallelism, it's always emphasized that seq is not sufficient to actually guarantee evaluation order. Does this problem not arise in a monadic context, or should I better use
forkIO $ do
    let g = last [0..]
    g `pseq` putMVar godot g
to ensure the compiler can't reorder the evaluation so tryTakeMVar succeeds prematurely?
The point of pseq is to ensure that after the parent thread sparks a computation with par, it does not immediately proceed to try to evaluate the result of the sparked computation itself, but instead does its own job first. See the documentation for an example. When you're working more explicitly with concurrency, you shouldn't need pseq.
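For reference, a sketch of the deterministic-parallelism idiom where pseq matters (the classic par/pseq pattern from Control.Parallel, not part of the program above):
import Control.Parallel (par, pseq)

-- spark s1 in parallel, make the parent evaluate s2 first, then combine
sumBoth :: [Int] -> [Int] -> Int
sumBoth xs ys = s1 `par` (s2 `pseq` s1 + s2)
  where
    s1 = sum xs
    s2 = sum ys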
If I'm not totally wrong, evaluating last [0..] to WHNF would take an infinite amount of time, because WHNF for an Int means that you know the exact number.
putMVar will not start executing before last [0..] is evaluated to WHNF (which as we know takes forever), because putMVar will need the RealWorld-token (s) returned by the call to evaluate. (Or to put it more simply: evaluate works. It finishes only after evaluating its argument to WHNF.)
evaluate :: a -> IO a
evaluate a = IO $ \s -> seq# a s
                       -- this ^

putMVar (MVar mvar#) x = IO $ \ s# ->
          -- which is used here ^^
    case putMVar# mvar# x s# of
         -- is needed here ^^
        s2# -> (# s2#, () #)
where seq# is a GHC-prim function that guarantees to return (# a, s #) only after evaluating a to WHNF (that's its purpose). That is, only after a has been evaluated to WHNF can s be used in the call to putMVar. Although these tokens are purely imaginary ("RealWorld is deeply magical..."), they are respected by the compiler, and the whole IO monad is built on top of them.
So yes, evaluate is enough in this case. evaluate is more than seq: it combines IO-monadic sequencing with seq#-sequencing to produce its effect.
In fact, the pseq version looks a bit fishy to me, because it ultimately depends on lazy, whereas evaluate ultimately depends on seq# and monadic token-passing. And I trust seq# a bit more.

Is there a lazy Session IO Monad?

You have a sequence of actions that prefer to be executed in chunks due to some high fixed overhead, like packet headers or making connections. The catch is that sometimes the next action depends on the result of a previous one, in which case all pending actions are executed at once.
Example:
mySession :: Session IO ()
mySession = do
    a <- readit -- nothing happens yet
    b <- readit -- nothing happens yet
    c <- readit -- nothing happens yet
    if a -- all three readits execute because we need a
      then write "a"
      else write "..."
    if b || c -- b and c already available
    ...
This reminds me of so many Haskell concepts but I can't put my finger on it.
Of course, you could do something obvious like:
[a,b,c] <- batch([readit, readit, readit])
But I'd like to hide the fact of chunking from the user for slickness purposes.
Not sure if Session is the right word. Maybe you can suggest a better one? (Packet, Batch, Chunk and Deferred come to mind.)
Update
I think there was a really good answer last night that I read on my phone, but when I came back to look for it today, it was gone. Was I dreaming?
I don't think you can do exactly what you want, since what you describe exploits Haskell's lazy evaluation to have the evaluation of a force the actions that compute b and c, and there's no way to seq on unspecified values.
What I could do was hack together a monad transformer that delayed actions sequenced via >> so that they could be executed all together:
data Session m a = Session { pending :: [ m () ], final :: m a }

runSession :: Monad m => Session m a -> m a
runSession (Session ms ma) = foldr (flip (>>)) (return ()) ms >> ma

-- (on a modern GHC you would also need Functor and Applicative instances,
--  e.g. fmap = liftM, pure = return, (<*>) = ap)
instance Monad m => Monad (Session m) where
    return = Session [] . return
    s >>= f = Session [] $ runSession s >>= (runSession . f)
    (Session ms ma) >> (Session ms' ma') =
        Session (ms' ++ (ma >> return ()) : ms) ma'
This violates some monad laws, but lets you do something like:
liftIO :: IO a -> Session IO a
liftIO = Session []

exampleSession :: Session IO Int
exampleSession = do
    liftIO $ putStrLn "one"
    liftIO $ putStrLn "two"
    liftIO $ putStrLn "three"
    liftIO $ putStrLn "four"
    trace "five" $ return 5   -- trace comes from Debug.Trace
and get
ghci> runSession exampleSession
five
one
two
three
four
5
ghci> length (pending exampleSession)
4
This is very similar to what Haxl does.
For more info:
Open sourcing haxl - Facebook Code Blog
ICFP 2014 talk
You could use the unsafeInterleaveIO function. It is a dangerous function that can introduce bugs to your program if not used carefully, but it does what you're asking for.
You can insert it into your example code like this:
lazyReadits :: IO [a]
lazyReadits = unsafeInterleaveIO $ do
    a <- readit
    r <- lazyReadits
    return (a:r)
unsafeInterleaveIO makes the action as a whole lazy, but once it starts evaluating, it evaluates as if it had been strict. In the example above, that means readit will run as soon as something tests whether the returned list is empty or not. If I'd used mapM unsafeInterleaveIO (replicate 3 readit) instead, then readit would only run when the actual elements of the list are evaluated, which would make the contents of the list depend on the order in which its elements are inspected; that is one example of how unsafeInterleaveIO can introduce bugs.
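For comparison, a sketch of that per-element variant (readit stands for whatever IO action the question has in mind):
-- each element's IO is deferred separately, so the order of the reads
-- depends on which list elements get forced first (the risky part)
lazyReadits3 :: IO [a]
lazyReadits3 = mapM unsafeInterleaveIO (replicate 3 readit)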

How to best "waste" a roughly specified time by only "burning CPU" with pure functional calculations?

I occasionally would like to delay specific parts of a pure algorithm while developing / testing, so I can monitor the evaluation simply by watching the lazy result build up piece by piece (which would generally be too fast to be useful in the final, un-delayed version). I then find myself inserting ugly stuff like sum [1..1000000] `seq` q, which kind of works (though often with the usual thunk-explosion problems, because I never think much about this), but is rather trial-and-error-like.
Is there a nicer, more controllable alternative that's still just as simple, when I want to do some quick testing in that way and can't be bothered to do proper profiling, criterion etc.?
I'd also like to avoid unsafePerformIO $ threadDelay, though I reckon this might actually be an appropriate use.
This looping solution avoids calling threadDelay, but still calls unsafePerformIO, so maybe we don't gain much:
import Data.AdditiveGroup
import Data.Thyme.Clock
import Data.Thyme.Clock.POSIX
import System.IO.Unsafe

pureWait :: NominalDiffTime -> ()
pureWait time = let tsList = map unsafePerformIO (repeat getPOSIXTime) in
    case tsList of
        (t:ts) -> loop t ts
  where
    loop t (t':ts') = if (t' ^-^ t) > time
                        then ()
                        else loop t ts'

main :: IO ()
main = do
    putStrLn . show $ pureWait (fromSeconds 10)
UPDATE: Here's an alternative solution. First determine (using IO) how many iterations you need to achieve a given delay, and then just use a pure looping function.
import Data.List (foldl', genericTake, intersperse)

pureWait :: Integer -> Integer
pureWait i = foldl' (+) 0 $ genericTake i $ intersperse (negate 1) (repeat 1)

calibrate :: NominalDiffTime -> IO Integer
calibrate timeSpan = let iterations = iterate (*2) 2 in loop iterations
  where
    loop (i:is) = do
        t1 <- getPOSIXTime
        if pureWait i == 0
            then do
                t2 <- getPOSIXTime
                if (t2 ^-^ t1) > timeSpan
                    then return i
                    else loop is
            else error "should never happen"

main :: IO ()
main = do
    requiredIterations <- calibrate (fromSeconds 10)
    putStrLn $ "iterations required for delay: " ++ show requiredIterations
    putStrLn . show $ pureWait requiredIterations
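If all you want is the quick seq trick from the question in reusable form, a tiny pure helper is enough (a sketch; n is a rough amount of busy work, not a wall-clock duration):
spin :: Int -> a -> a
spin n x = sum [1 .. n] `seq` x

-- e.g. slow down the construction of a lazy result piece by piece
slowSquares :: [Int]
slowSquares = map (\i -> spin 1000000 (i * i)) [1 .. 20]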
