How to force evaluation in Haskell? - haskell

I am relatively new to Haskell and I am trying to learn how different actions can be executed in sequence using the do notation.
In particular, I am writing a program to benchmark an algorithm (a function)
foo :: [String] -> [String]
To this purpose I would like to write a function like
import System.CPUTime
benchmark :: [String] -> IO Integer
benchmark inputList = do
    start <- getCPUTime
    let r = foo inputList
    end <- getCPUTime
    return (end - start) -- Possible conversion needed.
The last line might need a conversion (e.g. to milliseconds) but this is not the topic of this question.
Is this the correct way to measure the time needed to compute function foo on some argument inputList?
In other words, will the expression foo inputList be completely reduced before the action end <- getCPUTime is executed? Or will r only be bound to the thunk foo inputList?
More generally, how can I ensure that an expression is completely evaluated before some action is executed?
This question was asked a few months ago on Programmers (see here) and had an accepted answer there, but it was closed as off-topic because it belongs on Stack Overflow. The question could not be migrated to Stack Overflow because it was older than 60 days. So, in agreement with the moderators, I am reposting the question here and posting the accepted answer myself because I think it contains some useful information.

Answer originally given by user ysdx on programmers:
Indeed, your version will not benchmark your algorithm. As r is not used, it will not be evaluated at all.
You should be able to do it with DeepSeq:
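import Control.DeepSeq (deepseq)
import System.CPUTime (getCPUTime)
benchmark :: [String] -> IO Integer
benchmark inputList = do
    start <- getCPUTime
    let r = foo inputList
    -- deepseq fully evaluates r before the second clock reading.
    end <- r `deepseq` getCPUTime
    return (end - start)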
(a `deepseq` b) is some "magic" expression which forces the complete/recursive evaluation of a before returning b.
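For reference, here is a self-contained sketch of the deepseq approach. The definition of foo (a plain sort) and the test input are stand-ins I made up so that the program compiles; the input is also forced before the first clock reading so that building it is not counted. It assumes the deepseq package is available (it ships with GHC).

```haskell
import Control.DeepSeq (deepseq)
import Data.List (sort)
import System.CPUTime (getCPUTime)

-- Hypothetical stand-in for the algorithm under test.
foo :: [String] -> [String]
foo = sort

-- Returns the elapsed CPU time in picoseconds.
benchmark :: [String] -> IO Integer
benchmark inputList = do
    -- Force the input first so that building it is not counted.
    start <- inputList `deepseq` getCPUTime
    -- Fully evaluate the result before reading the clock again.
    end <- foo inputList `deepseq` getCPUTime
    return (end - start)

main :: IO ()
main = do
    picos <- benchmark (map show [100000, 99999 .. 1 :: Int])
    -- getCPUTime counts picoseconds; 10^9 picoseconds are one millisecond.
    putStrLn ("CPU time: " ++ show (picos `div` 1000000000) ++ " ms")
```

The result type must have an NFData instance for deepseq to work; [String] has one out of the box.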

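I would use the language extension -XBangPatterns; I find it quite expressive in such situations. Note that a bang pattern forces r only to weak head normal form (the outermost constructor), so for a list the elements may remain unevaluated. You would write "let !r = foo inputList", as in: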
{-# LANGUAGE BangPatterns #-}
import System.CPUTime
benchmark :: [String] -> IO Integer
benchmark inputList = do
    start <- getCPUTime
    let !r = foo inputList
    end <- getCPUTime
    return (end - start)
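To see how far a bang pattern actually evaluates a list, here is a small sketch. The list partial is a made-up example whose tail raises an error when touched: the bang pattern does not reach it, while force from Control.DeepSeq does.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.DeepSeq (force)
import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical list whose second cell fails when evaluated.
partial :: [Int]
partial = 1 : error "tail was evaluated"

main :: IO ()
main = do
    -- The bang pattern stops at the first (:) cell, so the error is not hit.
    let !xs = partial
    putStrLn "bang pattern: only the outermost constructor was forced"
    -- force evaluates the whole structure, so the error surfaces here.
    result <- try (evaluate (force xs)) :: IO (Either ErrorCall [Int])
    putStrLn (either (const "force: the tail was evaluated") (const "no error") result)
```

So for a benchmark of a function returning a list, a bang pattern alone would typically time only the work needed to produce the first cell.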

Related

Haskell. How to do IO inside pure Haskell function? How to print intermediate results while function is being executed? [duplicate]

In many imperative programming languages like Java, C or Python we can easily add a print function which can give us information about the intermediate state of the program.
My goal is to find a way to do something like that in Haskell. I want a function that not only computes a value but also prints something. The function below is a simplified version of what I want to do. My actual function is too complicated and incomprehensible without context to show it here.
My idea is to have a "pure" Haskell function with an auxiliary function inside that has the type signature [Int] -> IO () -> Int. An IO parameter is initialized in the where clause as a do block. Unfortunately, the do block is not executed when I run the function in GHCi, although the function compiles successfully:
module Tests where
-- Function returns the sum of the list and tries to print some info
-- but no IO actually happens
pureFuncWithIO :: [Int] -> Int
pureFuncWithIO [] = 0
pureFuncWithIO nums = auxIOfunc nums (return ())
  where
    auxIOfunc [] _ = 0
    auxIOfunc (n : ns) _ = n + auxIOfunc ns (sneakyIOaction n)
    sneakyIOaction n
      = do -- Not executed
        putStrLn $ "adding " ++ (show n);
        return ()
Output in GHCI test:
*Tests> pureFuncWithIO [1,2,3,4,5]
15
Meanwhile, I expected something like this:
*Tests> pureFuncWithIO [1,2,3,4,5]
adding 1
adding 2
adding 3
adding 4
adding 5
15
Is it possible to come up with a way to have IO inside, keeping the return type of the outer-most function, not an IO a flavor? Thanks!
This type signature
pureFuncWithIO :: [Int] -> Int
is promising to the caller that no side effect (like prints) will be observed. The compiler will reject any attempt to perform IO. Some exceptions exist for debugging (Debug.Trace), but they are not meant to be left in production code. There also are some "forbidden", unsafe low-level functions which should never be used in regular code -- you should pretend these do not exist at all.
If you want to do IO, you need an IO return type.
pureFuncWithIO :: [Int] -> IO Int
Doing so lets you weave side effects into the rest of the code.
pureFuncWithIO [] = return 0
pureFuncWithIO (n : ns) = do
    putStrLn $ "adding " ++ show n
    res <- pureFuncWithIO ns
    return (n + res)
A major point in the design of Haskell is to have a strict separation between functions which cannot do IO and those which can. Doing IO in a non-IO context is what the Haskell type system was designed to prevent.
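For the debugging exception mentioned above, a minimal sketch with Debug.Trace could look like this. The name pureFuncWithTrace is mine; trace writes its message to stderr at the moment the surrounding expression is evaluated, so the order of messages depends on evaluation order and should not be relied on.

```haskell
import Debug.Trace (trace)

-- Same sum as in the question; the type stays pure.
-- trace is meant for debugging only, not for production code.
pureFuncWithTrace :: [Int] -> Int
pureFuncWithTrace [] = 0
pureFuncWithTrace (n : ns) =
    trace ("adding " ++ show n) (n + pureFuncWithTrace ns)

main :: IO ()
main = print (pureFuncWithTrace [1, 2, 3, 4, 5])
```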
Your sneakyIOaction is not executed because you pass its result as a parameter to auxIOfunc but never use that parameter, and Haskell, being lazy, never evaluates it.
If you try to use that parameter, you will find that you can't: its type does not allow you to do anything with it except combine it with other IO actions.
There is a way to do what you want, but it is on the dark side. You need unsafePerformIO:
unsafePerformIO :: IO a -> a
It basically allows you to execute any IO action. The tricky part is that you have to consume the result; otherwise Haskell may skip the action because of laziness. If you really want to use it but don't actually need the result, you may want to look into seq.

Haskell: How to use a HashMap in a main function

I beg for your help, speeding up the following program:
main = do
    jobsToProcess <- fmap read getLine
    forM_ [1..jobsToProcess] $ \_ -> do
        [r, k] <- fmap (map read . words) getLine :: IO [Int]
        putStrLn $ doSomeReallyLongWorkingJob r k
There could(!) be a lot of identical jobs to do, but it's not up to me to modify the inputs, so I tried to use Data.HashMap to cache already processed jobs. I have already optimized the algorithms in the doSomeReallyLongWorkingJob function, and by now it seems to be about as fast as C.
Unfortunately, I'm not able to implement a simple cache without producing a lot of errors. I need a simple cache of type HashMap (Int, Int) Int, but every time I have too many or too few brackets. And if I do manage to define the cache, I get stuck putting data into it or retrieving data from it because of lots of errors.
I have already Googled for some hours, but I'm stuck. BTW: the result of the long-running function is an Int as well.
It's pretty simple to make a stateful action that caches operations. First some boilerplate:
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State
import Data.Map (Map)
import qualified Data.Map as M
import Debug.Trace
I'll use Data.Map, but of course you can substitute in a hash map or any similar data structure without much trouble. My long-running computation will just add up its arguments. I'll use trace to show when this computation is executed; we'll hope not to see the output of the trace when we enter a duplicate input.
reallyLongRunningComputation :: [Int] -> Int
reallyLongRunningComputation args = traceShow args $ sum args
Now the caching operation will just look up whether we've seen a given input before. If we have, we'll return the precomputed answer; otherwise we'll compute the answer now and store it.
cache :: (MonadState (Map a b) m, Ord a) => (a -> b) -> a -> m b
cache f x = do
    mCached <- gets (M.lookup x)
    case mCached of
        -- depending on your goals, you may wish to force `result` here
        Nothing -> modify (M.insert x result) >> return result
        Just cached -> return cached
  where
    result = f x
The main function now just consists of calling cache reallyLongRunningComputation on appropriate inputs.
main = do
    iterations <- readLn
    flip evalStateT M.empty . replicateM_ iterations
        $ liftIO getLine
        >>= liftIO . mapM readIO . words
        >>= cache reallyLongRunningComputation
        >>= liftIO . print
Let's try it in ghci!
> main
5
1 2 3
[1,2,3]
6
4 5
[4,5]
9
1 2
[1,2]
3
1 2
3
1 2 3
6
As you can see by the bracketed outputs, reallyLongRunningComputation was called the first time we entered 1 2 3 and the first time we entered 1 2, but not the second time we entered these inputs.
I hope I'm not too far off base, but first you need a way to carry the past jobs around with you. The easiest would be to use foldM instead of forM.
import Control.Monad
import Data.Maybe
main = do
    jobsToProcess <- fmap read getLine
    foldM doJobAcc acc0 [1..jobsToProcess]
  where
    acc0 = --initial value of some type of accumulator, i.e. hash map
    doJobAcc acc _ = do
        [r, k] <- fmap (map read . words) getLine :: IO [Int]
        case getFromHash acc (r,k) of
            Nothing -> do
                -- the job is pure in the question, so bind it with let
                let i = doSomeReallyLongWorkingJob r k
                return $ insertNew acc (r,k) i
            Just i -> do
                return acc
Note, I don't actually use the interface for putting and getting the hash table key. It doesn't actually have to be a hash table; Data.Map from containers could work. Or even a list if it's going to be a small one.
Another way to carry around the hash table would be to use a State transformer monad.
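As a concrete version of the foldM idea, here is a sketch that uses Data.Map from containers as the accumulator. The body of doSomeReallyLongWorkingJob and the helper lookupOrCompute are made up for illustration; the real job and a HashMap could be swapped in the same way.

```haskell
import Control.Monad (foldM_)
import qualified Data.Map.Strict as M

-- Hypothetical stand-in for the expensive pure job.
doSomeReallyLongWorkingJob :: Int -> Int -> Int
doSomeReallyLongWorkingJob r k = sum [r .. r + k]

-- Look the job up in the cache; compute and store it on a miss.
lookupOrCompute :: M.Map (Int, Int) Int -> (Int, Int) -> (Int, M.Map (Int, Int) Int)
lookupOrCompute cache key@(r, k) =
    case M.lookup key cache of
        Just v  -> (v, cache)
        Nothing -> let v = doSomeReallyLongWorkingJob r k
                   in (v, M.insert key v cache)

main :: IO ()
main = do
    jobsToProcess <- readLn :: IO Int
    -- foldM_ threads the cache through the loop and discards it at the end.
    foldM_ step M.empty [1 .. jobsToProcess]
  where
    step cache _ = do
        [r, k] <- fmap (map read . words) getLine :: IO [Int]
        let (v, cache') = lookupOrCompute cache (r, k)
        print v
        return cache'
```

Keeping the lookup logic in a pure helper makes it easy to check in GHCi without typing input lines.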
I am just adding this answer since I feel like the other answers are diverging a bit from the original question, namely using hashtable constructs in the main function (inside the IO monad).
Here is a minimal hashtable example using the hashtables package. To install the package with cabal, simply use
cabal install hashtables
In this example, we simply put some values in a hashtable and use lookup to print a value retrieved from the table.
import qualified Data.HashTable.IO as H
main :: IO ()
main = do
    t <- H.new :: IO (H.CuckooHashTable Int String)
    H.insert t 22 "Hello world"
    H.insert t 5 "No problem"
    msg <- H.lookup t 5
    print msg
Notice that we need to use explicit type annotation to specify which implementation of the hashtable we wish to use.

How to best "waste" a roughly specified time by only "burning CPU" with pure functional calculations?

I occasionally would like to delay specific parts of a pure algorithm while developing / testing, so I can monitor the evaluation simply by watching the lazy result build up piece by piece (which would generally be too fast to be useful in the final, un-delayed version). I then find myself inserting ugly stuff like sum [1..1000000] `seq` q, which kind of works (though often with the usual thunk-explosion problems, because I never think much about this), but is rather trial-and-error-like.
Is there a nicer, more controllable alternative that's still just as simple, when I want to do some quick testing in that way and can't be bothered to do proper profiling, criterion etc.?
I'd also like to avoid unsafePerformIO $ threadDelay, though I reckon this might actually be an appropriate use.
This looping solution avoids calling threadDelay, but still calls unsafePerformIO, so maybe we don't gain much:
import Data.AdditiveGroup
import Data.Thyme.Clock
import Data.Thyme.Clock.POSIX
import System.IO.Unsafe
pureWait :: NominalDiffTime -> ()
pureWait time = let tsList = map unsafePerformIO ( repeat getPOSIXTime ) in
    case tsList of
        (t:ts) -> loop t ts
  where
    loop t (t':ts') = if (t' ^-^ t) > time
        then ()
        else loop t ts'
main :: IO ()
main = do
    putStrLn . show $ pureWait (fromSeconds 10)
UPDATE: Here's an alternative solution. First determine (using IO) how many iterations you need to achieve a given delay, and then just use a pure looping function.
import Data.List (foldl', genericTake, intersperse)
pureWait :: Integer -> Integer
pureWait i = foldl' (+) 0 $ genericTake i $ intersperse (negate 1) (repeat 1)
calibrate :: NominalDiffTime -> IO Integer
calibrate timeSpan = let iterations = iterate (*2) 2 in loop iterations
  where
    loop (i:is) = do
        t1 <- getPOSIXTime
        if pureWait i == 0
            then do
                t2 <- getPOSIXTime
                if (t2 ^-^ t1) > timeSpan
                    then return i
                    else loop is
            else error "should never happen"
main :: IO ()
main = do
    requiredIterations <- calibrate (fromSeconds 10)
    putStrLn $ "iterations required for delay: " ++ show requiredIterations
    putStrLn . show $ pureWait requiredIterations

Dice Game in Haskell

I'm trying to spew out randomly generated dice for every roll that the user plays. The user has 3 rolls per turn and he gets to play 5 turns (I haven't implemented this part yet and I would appreciate suggestions).
I'm also wondering how I can display the colors randomly. I have the list of tuples in place, but I reckon I need some function that uses random and that list to match those colors. I'm struggling as to how.
module Main where
import System.IO
import System.Random
import Data.List
diceColor = [("Black",1),("Green",2),("Purple",3),("Red",4),("White",5),("Yellow",6)]
{-
randomList :: (RandomGen g) -> Int -> g -> [Integer]
random 0 _ = []
randomList n generator = r : randomList (n-1) newGenerator
where (r, newGenerator) = randomR (1, 6) generator
-}
rand :: Int -> [Int] -> IO ()
rand n rlst = do
    num <- randomRIO (1::Int, 6)
    if n == 0
        then doSomething rlst
        else rand (n-1) (num:rlst)
doSomething x = putStrLn (show (sort x))
main :: IO ()
main = do
    --hSetBuffering stdin LineBuffering
    putStrLn "roll, keep, score?"
    cmd <- getLine
    doYahtzee cmd
    --rand (read cmd) []
doYahtzee :: String -> IO ()
doYahtzee cmd = do
    if cmd == "roll"
        then rand 5 []
        else do print "You won"
There are really a lot of errors sprinkled throughout this code, which suggests to me that you tried to build the whole thing at once. This is a recipe for disaster; you should be building very small things and testing them often in ghci.
Lecture aside, you might find the following facts interesting (in order of the associated errors in your code):
List is deprecated; you should use Data.List instead.
No let is needed for top-level definitions.
Variable names must begin with a lower case letter.
Class prerequisites are separated from a type by =>.
The top-level module block should mainly have definitions; you should associate every where clause (especially the one near randomList) with a definition by either indenting it enough not to be a new line in the module block or keeping it on the same line as the definition you want it to be associated with.
do introduces a block; those things in the block should be indented equally and more than their context.
doYahtzee is declared and used as if it has three arguments, but seems to be defined as if it only has one.
The read function is used to parse a String. Unless you know what it does, using read to parse a String from another String is probably not what you want to do -- especially on user input.
putStrLn only takes one argument, not four, and that argument has to be a String. However, making a guess at what you wanted here, you might like the (!!) and print functions.
dieRoll doesn't seem to be defined anywhere.
It's possible that there are other errors, as well. Stylistically, I recommend that you check out replicateM, randomRs, and forever. You can use hoogle to search for their names and read more about them; in the future, you can also use it to search for functions you wish existed by their type.
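To make the replicateM suggestion concrete, here is a small sketch that rolls five dice and maps each face to a color with lookup. The helper names colorOf and rollDice are mine, and the table is keyed by face number (the reverse of the question's tuple order) so that lookup can be used directly. It assumes the random package is installed.

```haskell
import Control.Monad (replicateM)
import Data.List (sort)
import System.Random (randomRIO)

-- Face number to color, following the question's diceColor table.
diceColor :: [(Int, String)]
diceColor = zip [1 ..] ["Black", "Green", "Purple", "Red", "White", "Yellow"]

-- Nothing for a face outside 1..6.
colorOf :: Int -> Maybe String
colorOf face = lookup face diceColor

-- Roll n six-sided dice.
rollDice :: Int -> IO [Int]
rollDice n = replicateM n (randomRIO (1, 6))

main :: IO ()
main = do
    dice <- rollDice 5
    print (sort dice)
    print (map colorOf (sort dice))
```

The three rolls per turn and five turns could then be built from rollDice with replicateM_ or a small recursive loop.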

Haskell: can't use getCPUTime

I have:
main :: IO ()
main = do
    iniciofibonaccimap <- getCPUTime
    let fibonaccimap = map fib listaVintesete
    fimfibonaccimap <- getCPUTime
    let difffibonaccimap = (fromIntegral (fimfibonaccimap - iniciofibonaccimap)) / (10^12)
    printf "Computation time fibonaccimap: %0.3f sec\n" (difffibonaccimap :: Double)
listaVintesete :: [Integer]
listaVintesete = replicate 100 27
fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
But
*Main> main
Computation time fibonaccimap: 0.000 sec
I do not understand why this happens.
Please help, thanks.
As others have said, this is due to lazy evaluation. To force evaluation you should use the deepseq package and BangPatterns:
{-# LANGUAGE BangPatterns #-}
import Control.DeepSeq
import Text.Printf
import System.CPUTime
main :: IO ()
main = do
    iniciofibonaccimap <- getCPUTime
    let !fibonaccimap = rnf $ map fib listaVintesete
    fimfibonaccimap <- getCPUTime
    let difffibonaccimap = (fromIntegral (fimfibonaccimap - iniciofibonaccimap)) / (10^12)
    printf "Computation time fibonaccimap: %0.3f sec\n" (difffibonaccimap :: Double)
...
In the above code you should notice three things:
It compiles (modulo the ... of functions you defined above). When you post code for questions please make sure it runs (iow, you should include imports)
The use of rnf from deepseq. This forces the evaluation of each element in the list.
The bang pattern on !fibonaccimap, meaning "do this now, don't wait". A bang pattern forces its value to weak head normal form (WHNF, basically just the outermost constructor); here that value is the () returned by rnf, so forcing it makes rnf traverse the whole list. Without the bang, the rnf call would itself remain unevaluated.
Resulting in:
$ ghc --make ds.hs
$ ./ds
Computation time fibonaccimap: 6.603 sec
If you're intending to do benchmarking you should also use optimization (-O2) and the Criterion package instead of getCPUTime.
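For the Criterion route, a minimal sketch might look like the following. It assumes the criterion package is installed; the benchmark labels and the choice of 10 copies of 27 are arbitrary and only there for illustration.

```haskell
import Criterion.Main (bench, defaultMain, nf)

fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

main :: IO ()
main = defaultMain
    [ -- nf evaluates the result to normal form on every run.
      bench "fib 27" (nf fib 27)
    , bench "map fib over 10 copies of 27" (nf (map fib) (replicate 10 27))
    ]
```

Criterion runs each benchmark many times and reports statistics, so there is no need to force evaluation or read the clock by hand.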
Haskell is lazy. The computation you request in the line
let fibonaccimap = map fib listaVintesete
doesn't actually happen until you somehow use the value of fibonaccimap. Thus to measure the time used, you'll need to introduce something that will force the program to perform the actual computation.
ETA: I originally suggested printing the last element to force evaluation. As TomMD points out, this is nowhere near good enough -- I strongly recommend reading his response here for an actually working way to deal with this particular piece of code.
I suspect you are a "victim" of lazy evaluation. Nothing forces the evaluation of fibonaccimap between the timing calls, so it's not computed.
Edit
I suspect you're trying to benchmark your code, and in that case it should be pointed out that there are better ways to do this more reliably.
10^12 is an integer, which forces the value of fromIntegral to be an integer, which means difffibonaccimap is assigned a rounded value, so it's 0 if the time is less than half a second. (That's my guess, anyway. I don't have time to look into it.)
Lazy evaluation has in fact bitten you, as the other answers have said. Specifically, 'let' doesn't force the evaluation of an expression, it just scopes a variable. The computation won't actually happen until its value is demanded by something, which probably won't happen until an actual IO action needs its value. So you need to put your print statement between your getCPUTime evaluations. Of course, this will also get the CPU time used by print in there, but most of print's time is waiting on IO. (Terminals are slow.)
