Haskell: can't use getCPUTime

I have:
main :: IO ()
main = do
  iniciofibonaccimap <- getCPUTime
  let fibonaccimap = map fib listaVintesete
  fimfibonaccimap <- getCPUTime
  let difffibonaccimap = (fromIntegral (fimfibonaccimap - iniciofibonaccimap)) / (10^12)
  printf "Computation time fibonaccimap: %0.3f sec\n" (difffibonaccimap :: Double)
listaVintesete :: [Integer]
listaVintesete = replicate 100 27
fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
But when I run it:
*Main> main
Computation time fibonaccimap: 0.000 sec
I do not understand why this happens.
Help me, thanks.

As others have said, this is due to lazy evaluation. To force evaluation you should use the deepseq package and BangPatterns:
{-# LANGUAGE BangPatterns #-}
import Control.DeepSeq
import Text.Printf
import System.CPUTime
main :: IO ()
main = do
  iniciofibonaccimap <- getCPUTime
  let !fibonaccimap = rnf $ map fib listaVintesete
  fimfibonaccimap <- getCPUTime
  let difffibonaccimap = (fromIntegral (fimfibonaccimap - iniciofibonaccimap)) / (10^12)
  printf "Computation time fibonaccimap: %0.3f sec\n" (difffibonaccimap :: Double)
...
In the above code you should notice three things:

1. It compiles (modulo the ... of functions you defined above). When you post code for questions, please make sure it runs (in other words, include the imports).
2. The use of rnf from deepseq. This forces the evaluation of each element in the list.
3. The bang pattern on !fibonaccimap, meaning "do this now, don't wait". This forces the list to be evaluated to weak head normal form (WHNF, basically just the first constructor, (:)). Without it, the rnf application would itself remain unevaluated.
Resulting in:
$ ghc --make ds.hs
$ ./ds
Computation time fibonaccimap: 6.603 sec
If you're intending to do benchmarking, you should also compile with optimization (-O2) and use the criterion package instead of getCPUTime.
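For illustration, a minimal criterion benchmark of the same computation might look like the following sketch (the benchmark name and structure are my own; nf forces the result to normal form on every iteration, so no manual deepseq plumbing is needed):

import Criterion.Main

listaVintesete :: [Integer]
listaVintesete = replicate 100 27

fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

main :: IO ()
main = defaultMain
  [ bench "fibonaccimap" $ nf (map fib) listaVintesete ]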

Haskell is lazy. The computation you request in the line
let fibonaccimap = map fib listaVintesete
doesn't actually happen until you somehow use the value of fibonaccimap. Thus to measure the time used, you'll need to introduce something that will force the program to perform the actual computation.
ETA: I originally suggested printing the last element to force evaluation. As TomMD points out, this is nowhere near good enough; I strongly recommend reading his answer above for an approach that actually works for this particular piece of code.

I suspect you are a "victim" of lazy evaluation. Nothing forces the evaluation of fibonaccimap between the timing calls, so it's not computed.
Edit
I suspect you're trying to benchmark your code; in that case, it should be pointed out that there are better, more reliable ways to do it (for example, the criterion package mentioned above).

10^12 is an integer, which forces the value of fromIntegral to be an integer, which means difffibonaccimap is assigned a rounded value, so it's 0 if the time is less than half a second. (That's my guess, anyway. I don't have time to look into it.)

Lazy evaluation has in fact bitten you, as the other answers have said. Specifically, 'let' doesn't force the evaluation of an expression, it just scopes a variable. The computation won't actually happen until its value is demanded by something, which probably won't happen until an actual IO action needs its value. So you need to put your print statement between your getCPUTime evaluations. Of course, this will also get the CPU time used by print in there, but most of print's time is waiting on IO. (Terminals are slow.)
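A minimal sketch of that suggestion, reusing the question's definitions (the print sits between the two getCPUTime calls, so demanding the list forces the computation inside the timed region; the shortened names inicio/fim are mine):

import Text.Printf
import System.CPUTime

listaVintesete :: [Integer]
listaVintesete = replicate 100 27

fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

main :: IO ()
main = do
  inicio <- getCPUTime
  let fibonaccimap = map fib listaVintesete
  print fibonaccimap  -- demanding the whole list forces the computation
  fim <- getCPUTime
  printf "Computation time fibonaccimap: %0.3f sec\n"
         (fromIntegral (fim - inicio) / 10^12 :: Double)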

Related

Using timeout with non-IO function haskell

I have a function fun1 that is not in IO and can be computationally expensive, so I want to run it for at most a specified number of seconds. I found the function timeout, but it requires fun1 to be an IO action.
timeout :: Int -> IO a -> IO (Maybe a)
How can I circumvent this, or is there a better approach to achieve my goal?
Edit:
I revised the first sentence: fun1 is NOT IO; it has type fun1 :: Formula -> Bool.
Something close to what talex said, except with the seq moved, should work. Here is an example using the inefficient fib as the expensive computation.
Prelude> import System.Timeout
Prelude System.Timeout> :{
Prelude System.Timeout| let fib 0 = 0
Prelude System.Timeout| fib 1 = 1
Prelude System.Timeout| fib n = fib (n-1) + fib (n-2)
Prelude System.Timeout| :}
Prelude System.Timeout> timeout 1000000 (let x = fib 44 in x `seq` return x)
Nothing
Prelude System.Timeout>
Limiting function execution to a specific time length is not pure (i.e. it does not ensure the same result every time), hence you should not be pursuing such behavior outside of IO. You can, for example, use something evil like unsafePerformIO (timeout 1000 (pure fun1)) but such usage will quickly lead to programs that are hard to understand with unexpected quirks. A better idea may be to define a custom monad that allows limited time execution and can be lifted to IO but I don't know if such a thing exists.
import System.Timeout (timeout)
import Control.Exception (evaluate)
import Control.DeepSeq (NFData, force)

-- Evaluate a pure value to weak head normal form, giving up after t microseconds.
timeoutPure :: Int -> a -> IO (Maybe a)
timeoutPure t = timeout t . evaluate

-- The same, but force the value completely (deep evaluation).
timeoutPureDeep :: NFData a => Int -> a -> IO (Maybe a)
timeoutPureDeep t = timeoutPure t . force
You may not want to actually write these functions, but they demonstrate the right approach. evaluate is better than seq for this sort of thing, because seq can potentially be moved around by the compiler, escaping the timeout. I'm not sure if that's actually possible in this case, but it's better to just do the thing that's sure to work than to try to analyze carefully whether the riskier approach is okay.
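A hypothetical usage, with the naive fib standing in for the asker's fun1 (the function and the one-second budget are my own assumptions):

fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

main :: IO ()
main = do
  r <- timeoutPure 1000000 (fib 40)  -- the budget is in microseconds, so one second
  print r                            -- Just 102334155 if it finished in time, Nothing otherwise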

Why can't I force an IO action with seq?

Given this code snippet:
someFunction x = print x `seq` 1
main = do print (someFunction "test")
why doesn't the print x print test when the code is executed?
$./seq_test
1
If I replace it with error I can check that the left operand of seq is indeed evaluated.
How could I achieve my expected output:
test
1
modifying only someFunction?
Evaluating an IO action does nothing whatsoever. That's right!
If you like, values of IO type are merely "instruction lists". So all you do with that seq is force the program to be sure¹ of what should be done if the action were actually used. And using an action has nothing to do with evaluation; it means monadically binding it into the main call. But since, as you say, someFunction has a non-monadic signature, that can't happen here.
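A small sketch of that distinction (this example is mine, not from the original answer):

main :: IO ()
main = do
  let action = print "test"      -- builds the "instruction list"; nothing runs
  action `seq` putStrLn "forced" -- seq evaluates the IO value, yet "test" is still not printed
  action                         -- binding the action into main finally executes it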
What you can do... but don't, is
import System.IO.Unsafe (unsafePerformIO)
someFunction x = unsafePerformIO (print x) `seq` 1
this actually couples evaluation to IO execution. That is normally a really bad idea in Haskell, since evaluation can happen in a completely unforeseeable order, possibly a different number of times than you think (because the compiler assumes referential transparency), and other mayhem scenarios.
The correct solution is to change the signature to be monadic:
someFunction :: String -> IO Int
someFunction x = do
  print x
  return 1

main = do
  y <- someFunction "test"
  print y
¹ And as it happens, the program is as sure as possible anyway, even without seq. Any more details can only be obtained by executing the action.
seq evaluates expressions to weak head normal form, which is simply the outermost constructor (or lambda application). Forcing print x to WHNF merely builds the IO action value without running it, so seq doesn't do anything visible here.
You can get the result you're looking for with the function Debug.Trace.trace.
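For example, a sketch (note that trace prints its message to stderr, as a side effect of the value being demanded):

import Debug.Trace (trace)

someFunction :: String -> Int
someFunction x = trace x 1  -- the printing happens when the result is evaluated

main = do print (someFunction "test")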

How to force evaluation in Haskell?

I am relatively new to Haskell and I am trying to learn how different actions can be executed in sequence using the do notation.
In particular, I am writing a program to benchmark an algorithm (a function)
foo :: [String] -> [String]
To this purpose I would like to write a function like
import System.CPUTime

benchmark :: [String] -> IO Integer
benchmark inputList = do
  start <- getCPUTime
  let r = foo inputList
  end <- getCPUTime
  return (end - start) -- Possible conversion needed.
The last line might need a conversion (e.g. to milliseconds) but this is not the topic of this question.
Is this the correct way to measure the time needed to compute function foo on some argument inputList?
In other words, will the expression foo inputList be completely reduced before the action end <- getCPUTime is executed? Or will r only be bound to the thunk foo inputList?
More in general, how can I ensure that an expression is completely evaluated before some action is executed?
This question was asked a few months ago on Programmers (see here) and had an accepted answer there, but it was closed as off-topic because it belongs on Stack Overflow. The question could not be migrated to Stack Overflow because it is older than 60 days. So, in agreement with the moderators, I am reposting the question here and posting the accepted answer myself, because I think it contains some useful information.
Answer originally given by user ysdx on programmers:
Indeed, your version will not benchmark your algorithm. Since r is never used, it will not be evaluated at all.
You should be able to do it with DeepSeq:
benchmark :: [String] -> IO Integer
benchmark inputList = do
  start <- getCPUTime
  let r = foo inputList
  end <- r `deepseq` getCPUTime
  return (end - start)
(a `deepseq` b) is some "magic" expression which forces the complete/recursive evaluation of a before returning b.
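Equivalently, the same package's force can be combined with Control.Exception.evaluate; a sketch under the same assumption that foo is in scope:

import Control.DeepSeq (force)
import Control.Exception (evaluate)
import System.CPUTime

benchmark :: [String] -> IO Integer
benchmark inputList = do
  start <- getCPUTime
  _ <- evaluate (force (foo inputList))  -- fully evaluate the result before stopping the clock
  end <- getCPUTime
  return (end - start)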
I would use the language extension -XBangPatterns; I find it quite expressive in such situations. So you would say let !r = foo inputList, as in:

{-# LANGUAGE BangPatterns #-}
import System.CPUTime

benchmark :: [String] -> IO Integer
benchmark inputList = do
  start <- getCPUTime
  let !r = foo inputList
  end <- getCPUTime
  return (end - start)

(Note that a bang pattern only forces r to weak head normal form, i.e. the first (:) cell of the list, so for a complete benchmark of foo you would still want deepseq as above.)

Sequencing IO actions in parallel

I have a function that returns an IO action,
f :: Int -> IO Int
I would like to compute this function in parallel for multiple values of the argument. My naive implementation was as follows:
import Control.Parallel.Strategies

vals = [1..10]

main = do
  results <- mapM f vals
  let results' = results `using` parList rseq
  mapM_ print results'
My reasoning was that mapM f vals has type IO [Int], so results is bound to the resulting list; results' applies a parallel strategy to that list; and the final mapM_ requests the actual values by printing them. Since what is to be printed has already been sparked in parallel, the program should parallelize.
After being happy that it does indeed use all my CPUs, I noticed that the program performs worse (in wall-clock time) when run with +RTS -N8 than without any RTS flags. The only explanation I can think of is that the first mapM has to sequence, i.e. perform, all the IO actions already; but that would not cause a slowdown, it would merely make the -N8 run as fast as the unparallelized one, because all the work would be done by the master thread. Running the program with +RTS -N8 -s yields SPARKS: 36 (11 converted, 0 overflowed, 0 dud, 21 GC'd, 4 fizzled), which surely isn't optimal, but unfortunately I can't make any sense of it.
I suppose I've hit one of the beginner's stumbling blocks in Haskell parallelization or the internals of the IO monad. What am I doing wrong?
Background info: f n is a function that returns the solution for Project Euler problem n. Since many of them need to read data files, I put the result into the IO monad. An example of what one looks like:
-- Problem 13: Work out the first ten digits of the sum of one-hundred 50-digit numbers.
euler 13 = fmap (first10 . sum) numbers
  where
    numbers = fmap (map read . explode '\n') $ readFile "problem_13"
    first10 n
      | n < 10^10 = n -- 10^10 is the first number with 11 digits
      | otherwise = first10 $ n `div` 10
The full file can be found here (it's a bit long, but the first few "euler X" functions should be representative enough); the main file where I do the parallelism is this one.
Strategies are for parallel execution of pure computations. If it really is mandatory that your f returns an IO value, then consider using the async package instead. It provides useful combinators for running IO actions concurrently.
For your use case, mapConcurrently looks useful:
import Control.Concurrent.Async

vals = [1..10]

main = do
  results <- mapConcurrently f vals
  mapM_ print results
(I haven't tested though, because I don't know what your f is exactly.)
Try the parallel-io package. It allows you to change any mapM_ into parallel_.
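A sketch of how that might look (the squaring f here is a hypothetical stand-in for the asker's Euler solver; the package's global pool should be shut down before the program exits):

import Control.Concurrent.ParallelIO.Global (parallel, stopGlobalPool)

f :: Int -> IO Int
f n = return (n * n)  -- hypothetical stand-in for the real f

main :: IO ()
main = do
  results <- parallel (map f [1..10])  -- run the IO actions in parallel
  mapM_ print results
  stopGlobalPool                       -- flush and shut down the global thread pool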

Are there any problems with this Haskell function for strictly timing a computation?

Recently I was trying to determine the time needed to calculate a waveform using the vector storage type.
I wanted to do so without requiring to print the length or something like that. Finally I came up with the following two definitions. It seems simple enough, and from what I can tell it prints a non-zero computation time as expected the first time I run the function, but I'm wondering if there are any laziness caveats here that I've missed.
{-# LANGUAGE BangPatterns #-}
import System.IO
import System.CPUTime
import qualified Data.Vector.Storable as V

timerIO f = do
  start <- getCPUTime
  x <- f
  let !y = x
  end <- getCPUTime
  let diff = (fromIntegral (end - start)) / (10^12)
  print $ "Computation time: " ++ show diff ++ " sec\n"

timer f = timerIO $ do return f
main :: IO ()
main = do
  let sr = 1000.0
      time = V.map (/ sr) $ V.enumFromN 0 120000 :: V.Vector Float
      wave = V.map (\x -> sin $ x * 2 * pi * 10) time :: V.Vector Float
  timer wave
  timer wave
prints,
Computation time: 0.16001 sec
Computation time: 0.0 sec
Are there any hidden bugs here? I'm not sure that the let with a bang pattern is really the best way to go here. Is there a more concise way to write this? Are there any standard functions that already do this that I should know about?
Edit: I should mention that I had read about criterion but in this case I was not looking for a robust way to calculate average timing for profiling-only purposes; rather I was looking for a simple / low-overhead way to integrate single timers into my program for tracing the timing of some computations during normal running of the application. Criterion is cool, but this was a slightly different use case.
If evaluating to weak head normal form is enough (for strict Vectors or UArrays it is), then your timing code works well¹. However, instead of the bang pattern in the let-binding, you could put a bang on the monadic bind,
start <- getCPUTime
!x <- f
end <- getCPUTime
which to me looks nicer, or you could use Control.Exception.evaluate
start <- getCPUTime
x <- f >>= evaluate
end <- getCPUTime
which has the advantage of (supposed) portability, whereas bang patterns are a GHC extension. If WHNF is not enough, you would need to force full evaluation, for example using rnf or deepseq, like
start <- getCPUTime
!x <- rnf `fmap` f
end <- getCPUTime
However, repeatedly timing the same computation with that is hairy. If, as in your example, you give the thing a name, and call it
timer wave
timer wave
the compiler shares the computation, so it's only done once and all but the first timer calls return zero (or very close to zero) times. If you call it with code instead of a name,
timer (V.map (\x -> sin $ x * 2 * pi * 10) time :: V.Vector Float)
timer (V.map (\x -> sin $ x * 2 * pi * 10) time :: V.Vector Float)
the compiler can still share the computation, if it does common subexpression elimination. And although GHC doesn't do much CSE, it does some and I'm rather confident it would spot and share this (when compiling with optimisations). To reliably make the compiler repeat the computations, you need to hide the fact that they are the same from it (or use some low-level internals), which is not easy to do without influencing the time needed for the computation.
¹ It works well if the computation takes a significant amount of time. If it takes only a short time, the jitter introduced by outside influences (CPU load, scheduling, ...) will make single timings far too unreliable. Then you should do multiple measurements, and for that, as has been mentioned elsewhere, the criterion library is an excellent way to relieve you of the burden of writing robust timing code.
Are you familiar with the deepseq package? It's used by the criterion package for pretty much the purpose you describe.
Speaking of which, you may want to consider whether criterion itself does what you need anyway.
