Looking at the following example from Parallel and Concurrent Programming in Haskell:
main = do
  [n] <- getArgs
  let test = [test1,test2,test3,test4] !! (read n - 1)
  t0 <- getCurrentTime
  r <- evaluate (runEval test)
  printTimeSince t0
  print r
  printTimeSince t0
test1 = do
  x <- rpar (fib 36)
  y <- rpar (fib 35)
  return (x,y)
The book shows how to compile it:
$ ghc -O2 rpar.hs -threaded
And then running the above test:
$ ./rpar 1 +RTS -N2
time: 0.00s
(24157817,14930352)
time: 0.83s
If I understand correctly, the Eval Monad (using rpar) results in both fib 36 and fib 35 being computed in parallel.
Does the actual work, i.e. computing fib ..., occur when calling runEval test? Or is evaluate ... perhaps required? Or, finally, does it get computed only when print r evaluates it entirely?
It's not clear to me when the actual work gets performed for rpar.
Here's my guess, but I can't seem to replicate this on my laptop; there are too many imports I'd have to get from cabal.
test1 = do
  x <- rpar (fib 36)
  y <- rpar (fib 35)
  return (x,y)
In this, you spark the evaluation of (fib 36) and (fib 35) in parallel, but you don't wait for them: you just return (x,y) immediately, while x and y are still evaluating. Then, when you get to print r, you are forced to wait until x and y finish evaluating.
In theory, the following code should force test1 to wait until x and y have finished evaluating before returning them.
test1 = do
  x <- rpar (fib 36)
  y <- rpar (fib 35)
  rseq x
  rseq y
  return (x,y)
Then, running this should give you approximately
$ ./rpar 1 +RTS -N2
time: 0.83s
(24157817,14930352)
time: 0.83s
hopefully...
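For reference, here's a self-contained sketch of the experiment (my own reconstruction rather than the book's exact code; it needs only the parallel and time packages):

import Control.Exception (evaluate)
import Control.Parallel.Strategies (runEval, rpar, rseq)
import Data.Time.Clock (diffUTCTime, getCurrentTime)

fib :: Int -> Integer
fib n | n < 2     = fromIntegral n
      | otherwise = fib (n - 1) + fib (n - 2)

main :: IO ()
main = do
  t0 <- getCurrentTime
  -- evaluate forces the pair to WHNF here, inside IO
  r <- evaluate $ runEval $ do
         x <- rpar (fib 36)
         y <- rpar (fib 35)
         rseq x
         rseq y
         return (x, y)
  t1 <- getCurrentTime
  print r
  t2 <- getCurrentTime
  putStrLn $ "time after evaluate: " ++ show (diffUTCTime t1 t0)
  putStrLn $ "time after print:    " ++ show (diffUTCTime t2 t0)

Compile with ghc -O2 -threaded and run with +RTS -N2, as in the book.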
EDIT
Finally got back to my machine and replicated the setup; my suggested code gives the expected result. However, the OP raises another good question: if evaluate only evaluates to WHNF, why does any work at all happen before print is called?
The answer is in the Monad instance of Control.Parallel.Strategies; in other words, it isn't evaluate that pushes the evaluation of x and y, but runEval. The Eval monad's (>>=) is strict in its first argument: in x >>= f, it evaluates x before applying f (please check out this question before continuing). Then, de-sugaring test1 gives:
test1 = (rpar (fib 36)) >>= (\x ->
        (rpar (fib 35)) >>= (\y ->
        (rseq x) >>= (\_ ->
        (rseq y) >>= (\_ ->
        return (x,y)))))
Then, since rpar only "sparks" evaluation, it uses par (which begins evaluating its first argument but immediately returns the second) and immediately returns Done. rseq, however (like seq, it evaluates its argument to WHNF), does not return Done until its argument is actually evaluated. Hence, without the rseq calls you know that x and y have begun to be evaluated, but have no assurance that they have finished; with those calls, you know both x and y are evaluated before return is called on them.
Related
I am dealing with memory leaks in my Haskell program, and I was able to isolate them to a very basic laziness problem in dealing with arrays. I understand what's happening there: the first element of the array is computed, while the rest produce delayed computations that consume the heap. Unfortunately, I was unable to force strictness for the entire array computation.
I tried various combinations of seq, BangPatterns, and ($!) without much success.
import Control.Monad

force x = x `seq` x

loop :: [Int] -> IO ()
loop x = do
  when (head x `mod` 10000000 == 0) $ print x
  let x' = force $ map (+1) x
  loop x'

main = loop $ replicate 200 1
The profile with standard profiling options didn't give me any more information than I already have:
ghc -prof -fprof-auto-calls -rtsopts test.hs
./test +RTS -M300M -p -hc
This runs out of memory in a matter of seconds.
force x = x `seq` x
That's useless. seq doesn't mean "evaluate this thing now"; it means "evaluate the left thing before returning the result of evaluating the right thing". When they're the same, it does nothing, and your force is equivalent to just id. Try this instead:
import Control.DeepSeq
import Control.Monad

loop :: [Int] -> IO ()
loop x = do
  when (head x `mod` 10000000 == 0) $ print x
  let x' = map (+1) x
  loop $!! x'

main = loop $ replicate 200 1
That evaluates x' and everything in it before loop x', which is useful.
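To see why the original force is just id, here's a quick GHCi sketch; let-bindings are lazy, so neither binding evaluates anything until demanded, and at that point x `seq` x and x behave identically:

λ> let y = (\x -> x `seq` x) (undefined :: Int)  -- no exception: y is a thunk
λ> let z = id (undefined :: Int)                 -- likewise
λ> y `seq` ()
*** Exception: Prelude.undefined
λ> z `seq` ()
*** Exception: Prelude.undefined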
Alternatively, Control.DeepSeq has a force function that is useful. Its semantics in this case are "evaluate all of the elements of your list before returning the result of evaluating any of it". If you used its force function in place of your own, your original code would otherwise work, since the first line of loop does evaluate the beginning of the list.
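A sketch of that variant, with Control.DeepSeq's force replacing the homemade one and everything else as in the original:

import Control.DeepSeq (force)
import Control.Monad (when)

loop :: [Int] -> IO ()
loop x = do
  when (head x `mod` 10000000 == 0) $ print x
  -- force x' = x' `deepseq` x': the moment head x' is demanded on the
  -- next iteration, the whole list is evaluated, so thunks cannot pile up
  let x' = force $ map (+1) x
  loop x'

main :: IO ()
main = loop $ replicate 200 1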
I am trying to follow the implementation of the countdown problem shown in this video (https://www.youtube.com/watch?v=rlwSBNI9bXE&) and I thought it would be a good problem to try running in parallel.
data Op =
    Add
  | Sub
  | Mul
  | Div
  deriving (Eq)

data Expr =
    Val Int
  | App Op Expr Expr

--{ helper functions }

solutions' :: [Int] -> Int -> [Expr]
solutions' ns n =
  [e | ns' <- choices ns, (e, m) <- results ns', m == n]
I tried following some other posts on how to do this and came up with something like this:
instance NFData Op where
  rnf Add = Add `deepseq` ()
  rnf Sub = Sub `deepseq` ()
  rnf Mul = Mul `deepseq` ()
  rnf Div = Div `deepseq` ()

instance NFData Expr where
  rnf (Val a) = a `deepseq` ()
  rnf (App o l r) = (rnf o) `deepseq` (rnf l) `deepseq` (rnf r)

solutions' :: [Int] -> Int -> [Expr]
solutions' ns n =
  ([e | ns' <- choices ns, (e, m) <- results ns', m == n]
    `using` parList rdeepseq)
It compiles, but the program crashes when I try to run it. To be honest, I was really just guessing at what I wrote.
How do I get this to run in parallel?
When I run it in GHCi:
>λ= r = (solutions' [1,3,7,10,25,50] 765)
(0.00 secs, 0 bytes)
>λ= mapM_ print r
*** Exception: stack overflow
>λ=
If I compile with
ghc ./Countdown.hs +RTS -N8 -s
and then run the executable, it does not terminate.
Ok, so I just clicked at a random timestamp in the video, and by sheer luck I got a slide that describes what's wrong.
For our example, only about 5 million of the 33 million possible expressions are valid.
So, this means that you are evaluating
_fiveMillionList `using` parList rdeepseq
Now, the way (`using` parList _strat) works is that it immediately forces the entire spine of the list: when you begin evaluating your expression, parList forces all the cells of the list to exist. Further, as @DavidFletcher notes, your parallelism is actually useless. Because the filtering happens underneath the using, forcing the entire spine of the list also forces all 33 million Exprs to exist: you need to know how many elements passed the (==) test, so you need to create the Exprs to test them. They don't all need to exist simultaneously, but in the end 5 million of them (not counting the Exprs recursively contained in them), plus 5 million (:) constructors, will be held in memory. To add insult to injury, you then proceed to create 5 million more objects in the form of dud sparks. And all of this is orchestrated by 5 million calls to the Eval monad's (>>=). I'm not sure which one of these exactly sits resident in memory long enough to cause a stack overflow, but I'm fairly sure that parList is the culprit.
Perhaps try a more reasonable Strategy. I think you are pretty much forced into using parBuffer, because you need laziness. Using parBuffer n strat, if you evaluate a (:)-cell, then the strategy ensures that the next n - 1 elements have been sparked. So, essentially, it "runs ahead" of any consumer that starts at the head of the list, maintaining a buffer of parallely-evaluated elements. Something like parBuffer 1000 rdeepseq should be fine.
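Concretely, the fix can be as small as swapping the strategy; a sketch, assuming the question's choices and results functions are in scope:

import Control.Parallel.Strategies (using, parBuffer, rdeepseq)

solutions' :: [Int] -> Int -> [Expr]
solutions' ns n =
  [e | ns' <- choices ns, (e, m) <- results ns', m == n]
    `using` parBuffer 1000 rdeepseq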
Your NFData instances could use some work. They aren't the problem, but they don't really demonstrate a sound understanding of how evaluation works. I'll just leave them here:
instance NFData Op where
  -- (seq/deepseq) x y is defined by
  -- "if you want to evaluate (seq/deepseq) x y to WHNF, then you must
  --  evaluate x to WHNF/NF, then evaluate y to WHNF."
  -- But e.g. Add is already in WHNF and NF, so seq Add and deepseq Add are
  -- no-ops; the actual evaluation is already done by the case in rnf's
  -- equations. You could even write rnf x = x `seq` (), but I think it's
  -- best to be explicit.
  rnf Add = ()
  rnf Sub = ()
  rnf Mul = ()
  rnf Div = ()

instance NFData Expr where
  rnf (Val a) = a `deepseq` ()
  -- rnf o, rnf l :: ()
  -- WHNF and NF are the same thing for the type (); all its constructors
  -- are nullary. Therefore deepseq (rnf x) y = seq (rnf x) y, but then
  -- seq (rnf x) y = deepseq x y {by definition}.
  rnf (App o l r) = o `deepseq` l `deepseq` rnf r
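As an aside, a lower-maintenance sketch: with a reasonably recent GHC and deepseq, both instances can be derived via GHC.Generics instead of written by hand (the DeriveAnyClass route needs GHC 7.10 or later):

{-# LANGUAGE DeriveGeneric, DeriveAnyClass #-}

import Control.DeepSeq (NFData)
import GHC.Generics (Generic)

data Op = Add | Sub | Mul | Div
  deriving (Eq, Generic, NFData)

data Expr = Val Int | App Op Expr Expr
  deriving (Generic, NFData)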
I was reading https://wiki.haskell.org/Do_notation_considered_harmful and was surprised to read the following lines
Newcomers might think that the order of statements determines the order of execution. ... The order of statements is also not the criterion for the evaluation order.
The wiki post gave some examples that demonstrate this property. While the examples make sense, I still don't fully believe that the statement is true, since if I write something like
main = do
  putStrLn "foo"
  putStrLn "bar"
  putStrLn "baz"
The three lines come out in the order of the statements. So what exactly is going on here?
What it says is that the order of statements doesn't determine the evaluation order. As @chi points out, in the IO monad the effects are sequenced in order, but the evaluation order of the expressions involved is still not known. An example with a different monad will make the concept clear:
test = do
  x <- Just (2 + undefined)
  y <- Nothing
  return (x + y)
In GHCi:
λ> test
Nothing
The above code has three statements. It can be de-sugared into the following form:
Just (2 + undefined) >>= \x -> Nothing >>= \y -> return (x + y)
Now since (>>=) is left associative, it will be evaluated like this:
(Just (2 + undefined) >>= \x -> Nothing) >>= \y -> return (x + y)
Note that the Maybe monad's (>>=) is defined like this:
(>>=) :: Maybe a -> (a -> Maybe b) -> Maybe b
Nothing >>= _ = Nothing -- A failed computation returns Nothing
(Just x) >>= f = f x -- Applies function f to value x
Applying the function \x -> Nothing to the value (2 + undefined) results in Nothing.
The expression 2 + undefined is left unevaluated, thanks to Haskell's lazy evaluation.
Now we have a reduced form:
Nothing >>= \y -> return (2 + undefined + y)
Looking at the Monad instance for it, you can see that this will produce Nothing because Nothing >>= _ = Nothing.
What if the binding were strict instead (note that this needs the BangPatterns extension):
{-# LANGUAGE BangPatterns #-}

test = do
  !x <- Just (2 + undefined)
  y <- Nothing
  return (y + x)
Demo in GHCi:
λ> test
*** Exception: Prelude.undefined
If we follow a strict evaluation procedure, you can see that the order actually matters. But in a lazy setting, the order of statements doesn't. Hence the wiki's claim: "the order of statements is not the criterion for the evaluation order".
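To connect this back to the IO example from the question: even there, only the effects are ordered; evaluation of the pure expressions is still driven by demand. A small sketch of my own:

main :: IO ()
main = do
  -- bound, but never demanded: no exception is ever raised
  let x = error "never evaluated" :: Int
  putStrLn "foo"
  putStrLn "bar"
  -- length forces only the spine of the list, never its elements
  print (length [x, x])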
I have the following code:
doSomething :: [Int] -> [Int]
doSomething arg = arg ++ [1]
afterThreeTurns = do
  first <- ["test"]
  doSomething [1] -- COMMENT THIS
  return first
This returns:
*Main> afterThreeTurns
["test","test"]
If I take out the line marked COMMENT THIS, it returns ["test"] as expected. Why? The way I see it, doSomething should have no effect on first?
Since doSomething [1] is [1,1], your code is equivalent to:
afterThreeTurns = do
  first <- ["test"]
  x <- [1,1]
  return first
This is the same as the list comprehension [ first | first <- ["test"], x <- [1,1] ], which explains why you are getting a list of length 2.
Note that the variable x is not referenced anywhere, so this could also be written:
afterThreeTurns = do
  first <- ["test"]
  _ <- [1,1]
  return first
Here is an analogous case using the IO monad. The code:
thirdLine = do
  getLine
  getLine
  x <- getLine
  putStrLn $ "The third line is: " ++ x
is the same as:
thirdLine = do
  _ <- getLine
  _ <- getLine
  x <- getLine
  putStrLn $ "The third line is: " ++ x
You can get GHC to flag these kinds of monadic statements with the -fwarn-unused-do-bind compiler flag. In your example GHC will emit the warning:
...: Warning:
A do-notation statement discarded a result of type ‘Int’
Suppress this warning by saying ‘_ <- doSomething [1]’
or by using the flag -fno-warn-unused-do-bind
Let's turn this into the equivalent calls to >>=:
["test"] >>= (\first ->
doSomething [1] >>= (\_ -> return first))
The compiler always does this internally with do. The two ways of writing it are exactly the same.
Now, the >>= for [] is the same as concatMap with its arguments flipped, so let's go ahead and make that transformation as well (and apply the definition return x = [x]) and reduce:
concatMap (\first -> concatMap (\_ -> [first]) (doSomething [1])) ["test"]
concatMap (\first -> concatMap (\_ -> [first]) ([1] ++ [1])) ["test"]
concatMap (\first -> concatMap (\_ -> [first]) [1, 1]) ["test"]
concatMap (\first -> concat [[first], [first]]) ["test"]
concatMap (\first -> [first, first]) ["test"]
concat [["test"], ["test"]]
["test", "test"]
Intuitively, the [] Monad can be thought of as representing a "nondeterministic" computation (that is, a computation that may take on one of several different possible results). When you combine two nondeterministic computations in this way, the number of results multiply. This is due to the different "paths" that can be taken, one for each possibility at each branch.
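For instance (a toy example of my own), two nondeterministic choices multiply into six results:

pairs :: [(Int, Char)]
pairs = do
  x <- [1, 2, 3]  -- three possibilities
  y <- "ab"       -- two possibilities for each x
  return (x, y)
-- pairs == [(1,'a'),(1,'b'),(2,'a'),(2,'b'),(3,'a'),(3,'b')]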
For reference, here are the conversions between do notation and >>= / >> calls (note that m1 >> m2 must always be equivalent to m1 >>= (\_ -> m2)):
do x <- m1          m1 >>= (\x ->
   m2 x                m2 x
   ...                 ...)

do m1               m1 >>
   m2                  m2

do x                x
In the List monad (or []) the "effect" is returning multiple results instead of one. So, your doSomething function doesn't contribute to the result, but does contribute to the effect, thus changing the length of the final list.
I stumbled upon a problem with the Eval monad and the rpar Strategy in Haskell. Consider the following code:
module Main where

import Control.Parallel.Strategies

main :: IO ()
main = print . sum . inParallel2 $ [1..10000]

inParallel :: [Double] -> [Double]
inParallel xss = runEval . go $ xss
  where
    go [] = return []
    go (x:xs) = do
      x' <- rpar $ x + 1
      xs' <- go xs
      return (x':xs')

inParallel2 :: [Double] -> [Double]
inParallel2 xss = runEval . go $ xss
  where
    go [] = return []
    go [x] = return $ [x + 1]
    go (x:y:xs) = do
      (x',y') <- rpar $ (x + 1, y + 1)
      xs' <- go xs
      return (x':y':xs')
I compile and run it like this:
ghc -O2 -Wall -threaded -rtsopts -fforce-recomp -eventlog eval.hs
./eval +RTS -N3 -ls -s
When I use the inParallel function, parallelism works as expected. In the output runtime statistics I see:
SPARKS: 100000 (100000 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
When I switch to the inParallel2 function, all parallelism is gone:
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
Why doesn't evaluation of tuples work in parallel? I tried forcing the tuple before passing it to rpar:
rpar $!! (x + 1, y + 1)
but there was still no parallelism. What am I doing wrong?
The rpar strategy annotates a term for possible evaluation in parallel, but only up to weak head normal form, which essentially means, up to the outermost constructor. So for an integer or double, that means full evaluation, but for a pair, only the pair constructor, not its components, will get evaluated.
Forcing the pair before passing it to rpar is not going to help. Now you're evaluating the pair locally, before annotating the already evaluated tuple for possible parallel evaluation.
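A quick GHCi sketch of the difference: seq stops at the outermost constructor, so a pair of undefined components is already in WHNF, while a plain undefined thunk is not:

λ> (undefined, undefined) `seq` "ok"  -- the (,) constructor is already WHNF
"ok"
λ> (undefined :: Int) `seq` "ok"      -- forcing the thunk itself throws
*** Exception: Prelude.undefined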
You probably want to combine the rpar with the rdeepseq strategy, thereby stating that the term should be completely evaluated, if possible in parallel. You can do this by saying
(rpar `dot` rdeepseq) (x + 1, y + 1)
The dot operator is for composing strategies.
There is, however, yet another problem with your code: pattern matching forces immediate evaluation, and therefore using pattern matching for rpar-annotated expressions is usually a bad idea. In particular, the line
(x',y') <- (rpar `dot` rdeepseq) (x + 1, y + 1)
will defeat all parallelism, because before the spark can be picked up for evaluation by another thread, the local thread will already start evaluating it in order to match the pattern. You can prevent this by using a lazy / irrefutable pattern:
~(x',y') <- (rpar `dot` rdeepseq) (x + 1, y + 1)
Or alternatively use fst and snd to access the components of the pair.
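Putting it together, a sketch of how the inner case of go could look, with the rest of inParallel2 unchanged:

go (x:y:xs) = do
  -- spark the pair and fully evaluate its components, possibly in parallel;
  -- the lazy pattern keeps the local thread from forcing it right away
  ~(x', y') <- (rpar `dot` rdeepseq) (x + 1, y + 1)
  xs' <- go xs
  return (x' : y' : xs')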
Finally, don't expect actual speedup if you create sparks that are as cheap as adding one to an integer. While sparks themselves are relatively cheap, they are not cost-free, so they work better if the computation you are annotating for parallel evaluation is somewhat costly.
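On granularity: when the per-element work is this cheap, one standard option (not specific to this code, just a library strategy) is to evaluate the list in coarser chunks with parListChunk, so each spark carries a worthwhile amount of work:

import Control.Parallel.Strategies (using, parListChunk, rdeepseq)

-- one spark per 1000-element chunk instead of one per element
inParallel3 :: [Double] -> [Double]
inParallel3 xs = map (+1) xs `using` parListChunk 1000 rdeepseq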
You might want to read some tutorials on using strategies, such as Simon Marlow's Parallel and Concurrent Programming in Haskell, or my own Deterministic Parallel Programming in Haskell.