Repeated evaluation of pure expression in IO action - haskell

I have a procedure that (a) does some IO, (b) constructs a lookup table, and (c) returns an IO action that uses the lookup table. But when compiled with -O, GHC (version 6.12.1) inlines the construction the lookup table, so that it is reevaluated for every call of the IO action.
Example:
module Main where
import Data.Array
import Data.IORef
import Control.Monad
makeAction getX getY sumRef = do
x <- getX
let a = listArray (0, 1000) [x ..]
return $ do
y <- getY
modifyIORef sumRef (\sum -> sum + a ! y)
main = do
sumRef <- newIORef 0
action <- makeAction getX getY sumRef
replicateM_ 100000 action
n <- readIORef sumRef
putStrLn (show n)
where
getX = return (1 :: Int)
getY = return 0
Is this issue well-known enough to have a standard GHC-foolproof workaround - or how would you adjust the program so that a isn't repeatedly being allocated?

The easiest workaround is to force evaluation by using strictness annotations.
{-# LANGUAGE BangPatterns #-}
Then force allocation by simply making a strict using a ! ("bang").
let !a = listArray (0, 1000) [x ..]
Alternatively, if you are working in the IO monad, strictness annotations may not always help. To force evaluation of an expression before some IO action is run, you can use evaluate. For example:
let a = listArray (0, 1000) [x ..]
evaluate a

Try forcing a when constructing the monadic value to return:
makeAction getX getY sumRef = do
x <- getX
let a = listArray (0, 1000) [x ..]
return $ a `seq` do
y <- getY
modifyIORef sumRef (\sum -> sum + a ! y)

Related

How do I recover lazy evaluation of a monadically constructed list, after switching from State to StateT?

With the following code:
(lazy_test.hs)
-- Testing lazy evaluation of monadically constructed lists, using State.
import Control.Monad.State
nMax = 5
foo :: Int -> State [Int] Bool
foo n = do
modify $ \st -> n : st
return (n `mod` 2 == 1)
main :: IO ()
main = do
let ress = for [0..nMax] $ \n -> runState (foo n) []
sts = map snd $ dropWhile (not . fst) ress
print $ head sts
for = flip map
I can set nMax to 5, or 50,000,000, and I get approximately the same run time:
nMax = 5:
$ stack ghc lazy_test.hs
[1 of 1] Compiling Main ( lazy_test.hs, lazy_test.o )
Linking lazy_test ...
$ time ./lazy_test
[1]
real 0m0.019s
user 0m0.002s
sys 0m0.006s
nMax = 50,000,000:
$ stack ghc lazy_test.hs
[1 of 1] Compiling Main ( lazy_test.hs, lazy_test.o )
Linking lazy_test ...
$ time ./lazy_test
[1]
real 0m0.020s
user 0m0.002s
sys 0m0.005s
which is as I expect, given my understanding of lazy evaluation mechanics.
However, if I switch from State to StateT:
(lazy_test2.hs)
-- Testing lazy evaluation of monadically constructed lists, using StateT.
import Control.Monad.State
nMax = 5
foo :: Int -> StateT [Int] IO Bool
foo n = do
modify $ \st -> n : st
return (n `mod` 2 == 1)
main :: IO ()
main = do
ress <- forM [0..nMax] $ \n -> runStateT (foo n) []
let sts = map snd $ dropWhile (not . fst) ress
print $ head sts
for = flip map
then I see an extreme difference between the respective run times:
nMax = 5:
$ stack ghc lazy_test2.hs
[1 of 1] Compiling Main ( lazy_test2.hs, lazy_test2.o )
Linking lazy_test2 ...
$ time ./lazy_test2
[1]
real 0m0.019s
user 0m0.002s
sys 0m0.004s
nMax = 50,000,000:
$ stack ghc lazy_test2.hs
[1 of 1] Compiling Main ( lazy_test2.hs, lazy_test2.o )
Linking lazy_test2 ...
$ time ./lazy_test2
[1]
real 0m29.758s
user 0m25.488s
sys 0m4.231s
And I'm assuming that's because I'm losing lazy evaluation of the monadically constructed list, when I switch to the StateT-based implementation.
Is that correct?
Can I recover lazy evaluation of a monadically constructed list, while keeping with the StateT-based implementation?
In your example, you're only running one foo action per runState, so your use of State and/or StateT is essentially irrelevant. You can replace the use of foo with the equivalent:
import Control.Monad
nMax = 50000000
main :: IO ()
main = do
ress <- forM [0..nMax] $ \n -> return (n `mod` 2 == 1, [n])
let sts = map snd $ dropWhile (not . fst) ress
print $ head sts
and it behaves the same way.
The issue is the strictness of the IO monad. If you ran this computation in the Identity monad instead:
import Control.Monad
import Data.Functor.Identity
nMax = 50000000
main :: IO ()
main = do
let ress = runIdentity $ forM [0..nMax] $ \n -> return (n `mod` 2 == 1, [n])
let sts = map snd $ dropWhile (not . fst) ress
print $ head sts
then it would run lazily.
If you want to run lazily in the IO monad, you need to do it explicitly with unsafeInterleaveIO, so the following would work:
import System.IO.Unsafe
import Control.Monad
nMax = 50000000
main :: IO ()
main = do
ress <- lazyForM [0..nMax] $ \n -> return (n `mod` 2 == 1, [n])
let sts = map snd $ dropWhile (not . fst) ress
print $ head sts
lazyForM :: [a] -> (a -> IO b) -> IO [b]
lazyForM (x:xs) f = do
y <- f x
ys <- unsafeInterleaveIO (lazyForM xs f)
return (y:ys)
lazyForM [] _ = return []
The other answer by K A Buhr explains why State vs StateT is not the pertinent factor (IO is), and also points out how your example is strangely constructed (in that the State(T) part isn't actually used as each number uses a new state []). But aside from those points, I'm not sure I would say "losing lazy evaluation of the monadically constructed list", because if we understand something like "lazy evaluation = evaluated only when needed", then foo does indeed need to run on every element on the input list in order to perform all the effects, so lazy evaluation is not being "lost". You are getting what you asked for. (It just so happens that foo doesn't perform any IO, and perhaps someone else can comment with if it's ever possible for a compiler/GHC to optimize it away on this basis, but you can easily see why GHC does the naive thing here.)
This is a common, well-known problem in Haskell. There are various libraries (best known of which are streaming, pipes, conduit) which solve the problem by giving you streams (basically lists) which are lazy in the effects too. If I recreate your example in a streaming style,
import Data.Function ((&))
import Control.Monad.State
import Streaming
import qualified Streaming.Prelude as S
foo :: Int -> StateT [Int] IO Bool
foo n =
(n `mod` 2 == 1) <$ modify (n:)
nMax :: Int
nMax = 5000000
main :: IO ()
main = do
mHead <- S.head_ $ S.each [0..nMax]
& S.mapM (flip runStateT [] . foo)
& S.dropWhile (not . fst)
print $ snd <$> mHead
then both versions run practically instantaneously. To make the difference more apparent, imagine that foo also called print "hi". Then the streaming version, being lazy in the effects, would print only twice, whereas your original versions would both print nMax times. As they're lazy in the effects, then the whole list doesn't need to be traversed in order to short-circuit and finish early.

Simple Haskell program not behaving correct

I'm new to Haskell and trying to write simple program to find maximal element and it's index from intput. I receive values to compare one by one. Maximal element I'm holding in maxi variable, it's index - in maxIdx. Here's my program:
loop = do
let maxi = 0
let maxIdx = 0
let idx = 0
let idxN = 0
replicateM 5 $ do
input_line <- getLine
let element = read input_line :: Int
if maxi < element
then do
let maxi = element
let maxIdx = idx
hPutStrLn stderr "INNER CHECK"
else
hPutStrLn stderr "OUTER CHECK"
let idx = idxN + 1
let idxN = idx
print maxIdx
loop
Even though I know elements coming are starting from bigger to smaller (5, 4, 3, 2, 1) program enters INNER CHECK all the time (it should happen only for the first element!) and maxIdx is always 0.
What am I doing wrong?
Thanks in advance.
Anyway, let's have fun.
loop = do
let maxi = 0
let maxIdx = 0
let idx = 0
let idxN = 0
replicateM 5 $ do
input_line <- getLine
let element = read input_line :: Int
if maxi < element
then do
let maxi = element
let maxIdx = idx
hPutStrLn stderr "INNER CHECK"
else
hPutStrLn stderr "OUTER CHECK"
let idx = idxN + 1
let idxN = idx
print maxIdx
loop
is not a particularly Haskelly code (and as you know is not particularly correct).
Let's make if Haskellier.
What do we do here? We've an infinite loop, which is reading a line 5 times, does something to it, and then calls itself again for no particular reason.
Let's split it:
import Control.Monad
readFiveLines :: IO [Int]
readFiveLines = replicateM 5 readLn
addIndex :: [Int] -> [(Int, Int)]
addIndex xs = zip xs [0..]
findMaxIndex :: [Int] -> Int
findMaxIndex xs = snd (maximum (addIndex xs))
loop :: ()
loop = loop
main :: IO ()
main = do xs <- readFiveLines
putStrLn (show (findMaxIndex xs))
snd returns the second element from a tuple; readLn is essentially read . getLine; zip takes two lists and returns a list of pairs; maximum finds a maximum value.
I left loop intact in its original beauty.
You can be even Haskellier if you remember that something (huge expression) can be replaced with something $ huge expression ($ simply applies its left operand to its right operand), and the functions can be combined with .: f (g x) is the same as (f . g) x, or f . g $ x (see? it's working for the left side as well!). Additionally, zip x y can be rewritten as x `zip` y
import Control.Monad
readFiveLines :: IO [Int]
readFiveLines = replicateM 5 readLn
addIndex :: [Int] -> [(Int, Int)]
addIndex = (`zip` [0..])
findMaxIndex :: [Int] -> Int
findMaxIndex = snd . maximum . addIndex
main :: IO ()
main = do xs <- readFiveLines
putStrLn . show . findMaxIndex $ xs
As for debug print, there's a package called Debug.Trace and a function traceShow which prints its first argument (formatted with show, hence the name) to stderr, and returns its second argument:
findMaxIndex :: [Int] -> Int
findMaxIndex = snd . (\xs -> traceShow xs (maximum xs)) . addIndex
That allows you to tap onto any expression and see what's coming in (and what are the values around — you can show tuples, lists, etc.)
I think alf's answer is very good, but for what it's worth, here's how I would interpret your intention.
{-# LANGUAGE FlexibleContexts #-}
module Main where
import System.IO
import Control.Monad.State
data S = S { maximum :: Int
, maximumIndex :: Int
, currentIndex :: Int }
update :: Int -> Int -> S -> S
update m mi (S _ _ ci) = S m mi ci
increment :: S -> S
increment (S m mi ci) = S m mi (ci+1)
next :: (MonadIO m, MonadState S m) => m ()
next = do
S maxi maxIdx currIdx <- get
input <- liftIO $ getLine
let element = read input :: Int
if maxi < element
then do
modify (update element currIdx)
liftIO $ hPutStrLn stderr "INNER CHECK"
else
liftIO $ hPutStrLn stderr "OUTER CHECK"
modify increment
run :: Int -> IO S
run n = execStateT (replicateM_ n next) (S 0 0 0)
main :: IO ()
main = do
S maxi maxIdx _ <- run 5
putStrLn $ "maxi: " ++ (show maxi) ++ " | maxIdx: " ++ (show maxIdx)
This uses a monad transformer to combine a stateful computation with IO. The get function retrieves the current state, and the modify function lets you change the state.

Haskell - Couldn't match type [] with IO

I am new at Haskell. Why am I getting the error message
(Couldn't match type '[]' with 'IO' — Haskell) in folowing code.
In main I only need time of algorithm running without the result.
Only want to measure algorithm time.
qsort1 :: Ord a => [a] -> [a]
qsort1 [] = []
qsort1 (p:xs) = qsort1 lesser ++ [p] ++ qsort1 greater
where
lesser = [ y | y <- xs, y < p ]
greater = [ y | y <- xs, y >= p ]
main = do
start <- getCurrentTime
qsort1 (take 1000000 $ randomRs (1, 100000) (mkStdGen 42))
end <- getCurrentTime
print (diffUTCTime end start)
Your main function isn't right. Unless qsort1 is an IO action you cannot perform it in an IO monad. Instead you can put it in the let binding:
main = do
start <- getCurrentTime
let x = qsort1 (take 1000000 $ randomRs ((1 :: Int), 100000) (mkStdGen 42))
end <- getCurrentTime
print (diffUTCTime end start)
Also note that I have explicitly given a type annotation for 1 to avoid some compile errors.
But that being said you cannot actually find the the total time taken to do the sorting because of lazy evaluation. x will never be computed because it's never used in the program. If you run main, it give you this output which is definetly wrong:
λ> main
0.000001s
Instead you can use this to calculate the computation:
main = do
start <- getCurrentTime
let x = qsort1 (take 1000000 $ randomRs ((1 :: Int), 100000) (mkStdGen 42))
print x
end <- getCurrentTime
print (diffUTCTime end start)
Instead of printing, you can also use the BangPatterns extension to force the computation of qsort1:
main = do
start <- getCurrentTime
let !x = qsort1 (take 1000000 $ randomRs ((1 :: Int), 100000) (mkStdGen 42))
end <- getCurrentTime
print (diffUTCTime end start)
BangPatterns will not lead to full evaluation as #kosmikus points out. Instead use a library like criterion which has been specially made for benchnmarking.
I used method below and it works fine:
main = do
let arr = take 1000000 $ randomRs ((1 :: Int), 10000000) (mkStdGen 59)
defaultMain [
bgroup "qs" [ bench "1" $ nf quickSort arr ]
]

How to make a random list using IO in Haskell

I'm trying to do a flocking simulation in order to better teach myself haskell. I'm running into trouble when trying to generate the initial state for the simulation which requires randomness. I'm trying to generate a list of Boids which all have random initial positions and directions.
In the main function I call this using
let numBoids = 10
rBoids <- randomBoids numBoids
And rBoids I indend to store in an IORef which I can then update every frame, which I think is the right way to do things?
And here is the code which fails:
-- Type for the flocking algorithm
data Boid = Boid {
boidPosition :: Vector2(GLfloat)
, boidDirection :: Vector2(GLfloat)
} deriving Show
randomBoids :: Int -> IO ([Boid])
randomBoids 0 = do
return []
randomBoids n = do
b <- randomBoid
bs <- (randomBoids (n-1))
return b : bs
randomBoid = do
pos <- randomVector
vel <- randomVector
return (Boid pos vel)
randomVector = do
x <- randomRIO(-1.0, 1.0)
y <- randomRIO(-1.0, 1.0)
return (Vector2 x y)
What actually fails is return b : bs. If I change this into return [b] it compiles. The error given is:
Couldn't match expected type `IO [Boid]' with actual type `[a0]'
In the expression: return b : bs
In the expression:
do { b <- randomBoid;
bs <- (randomBoids (n - 1));
return b : bs }
In an equation for `randomBoids':
randomBoids n
= do { b <- randomBoid;
bs <- (randomBoids (n - 1));
return b : bs }
I'm pretty lost here, and my understanding of the whole imperative-code-in-a-functional language (and monads) is shaky to say the least. Any help would be most appreciated!
Gangadahr is correct. I only wanted to mention that you can shorten your code a LOT:
import Control.Applicative
import Control.Monad
randomBoids n = replicateM n randomBoid
randomBoid = Boid <$> randomVector <*> randomVector
randomVector = Vector2 <$> randomRIO (-1, 1) <*> randomRIO (-1, 1)
The first function takes advantage of replicateM, which is a very useful function when you want to repeat a monadic action and collect the results. The latter two functions use Applicative style, which is enormously useful.
The reason you are getting the error is because return b : bs will make the compiler interpret it as (return b): bs To fix this, you can change the statement to return (b:bs). That will make the statement return an IO[Boid]
The typechecker is reading return x : xs as (return x) : xs. If you write return (x:xs) it will typecheck.

how to force the evaluation of this expression?

module Main where
import Control.Parallel(par,pseq)
import Text.Printf
import Control.Exception
import System.CPUTime
import Data.List
import IO
import Data.Char
import Control.DeepSeq
time :: IO t -> IO t
time a = do
start <- getCPUTime
v <- a
end <- getCPUTime
let diff = (fromIntegral (end - start)) / (10^12)
printf "Computation time: %0.3f sec\n" (diff :: Double)
return v
learquivo :: FilePath -> IO ([[Int]])
learquivo s = do
content <- readFile s
return (read content)
main :: IO ()
main = do
t5 <- getCPUTime
content <- learquivo "mkList1.txt"
let !mapasort = rnf $ map sort content
t6 <- getCPUTime
let diffft6t5 = (fromIntegral (t6 - t5)) / (10^12)
printf "Computation time Mapasort: %0.3f sec\n" (diffft6t5 :: Double)
How to tell if it evaluates all elements of content?
let !mapasort = rnf $ map sort content
I used the line in winghci:
*Main> let !mapasort = rnf $ map sort content
But, returned:
*Main> mapasort ()
Thanks
I see two questions:
1) Why is mapsort evaluating to unit, ().
Because the rnf function always returns (). See the documentation.
2) Is everything evaluated
Yes. The DeepSeq instance (which is where rnf lives) for list just calls the deepseq instance for each element in the list:
rnf [] = ()
rnf (x:xs) = rnf x `seq` rnf xs
Your elements all Ints, which have a correct NFData instance.
And I'd add two more questions:
3) How should this benchmarking be done correctly?
Using Criterion. There are many Criterion advocates here on SO, you can find answers that would serve as good examples with a search.
4) How should this evaluation be forced for non-benchmarking purposes?
Using the parallel package.
import Control.Parallel.Strategies
...
let !mapsort = (map sort content) `using` (evalList rdeepseq)
or still using rnf:
let mapsort = map sort content
!_ = rnf mapsort

Resources