I'm confused about parallel Haskell - multithreading

How is this code:
parfib :: Int -> Int
parfib 0 = 1
parfib 1 = 1
parfib n = nf2 `par` (nf1 `par` (nf1+nf2+1))
  where nf1 = parfib (n-1)
        nf2 = parfib (n-2)
Better than this:
parfib :: Int -> Int
parfib 0 = 1
parfib 1 = 1
parfib n = nf2 `par` (nf1 `seq` (nf1+nf2+1))
  where nf1 = parfib (n-1)
        nf2 = parfib (n-2)
I don't get the explanations I've found online that say "In order to guarantee that the main expression is evaluated in the right order (i.e. without blocking the main task on the child task) the seq annotation is used".
Why is seq used? I know it forces the interpreter to evaluate parfib (n-1) first, but why is that necessary?
When executing the second program, won't the interpreter spark a new computation to evaluate nf2 while evaluating nf1 of the nf1+nf2+1 expression in parallel? Why does it need to be told to start with nf1?

It doesn't make much sense to evaluate nf1 in parallel with nf1+..., since the latter depends on nf1; all that would do is block on the spark of nf1. With seq, the parent evaluates nf1 itself and only then uses it in the sum, instead of blocking on a spark.
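For reference, here is a runnable sketch of the second version (the main function and the test argument are my additions, and pseq is used instead of seq because, unlike seq, it guarantees that its first argument is evaluated first):
import Control.Parallel (par, pseq)

parfib :: Int -> Int
parfib 0 = 1
parfib 1 = 1
parfib n = nf2 `par` (nf1 `pseq` (nf1 + nf2 + 1))
  where nf1 = parfib (n-1)
        nf2 = parfib (n-2)

main :: IO ()
main = print (parfib 30)
-- compile with: ghc -O2 -threaded parfib.hs
-- run with:     ./parfib +RTS -N2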

It could be because we want to minimize the number of sparks.
From my understanding, the two versions will produce the same result,
but with the first option you spark two additional computations (nf1 and nf2), whereas with seq you spark only one (nf2) and evaluate nf1 in the current thread.


Non-pointfree style is substantially slower

I have the following, oft-quoted code for calculating the nth Fibonacci number in Haskell:
fibonacci :: Int -> Integer
fibonacci = (map fib [0..] !!)
  where fib 0 = 0
        fib 1 = 1
        fib n = fibonacci (n-2) + fibonacci (n-1)
Using this, I can do calls such as:
ghci> fibonacci 1000
and receive an almost instantaneous answer.
However, if I modify the above code so that it's not in pointfree style, i.e.
fibonacci :: Int -> Integer
fibonacci x = (map fib [0..] !!) x
  where fib 0 = 0
        fib 1 = 1
        fib n = fibonacci (n-2) + fibonacci (n-1)
it is substantially slower. To the extent that a call such as
ghci> fibonacci 1000
hangs.
My understanding was that the above two pieces of code were equivalent, but GHCi begs to differ. Does anyone have an explanation for this behaviour?
To observe the difference, you should probably look at Core. My guess is that this boils down to comparing (roughly)
let f = map fib [0..] in \x -> f !! x
to
\x -> let f = map fib [0..] in f !! x
The latter will recompute f from scratch on every invocation. The former does not, effectively caching the same f for each invocation.
It happens that in this specific case, GHC was able to optimize the second into the first, once optimization is enabled.
Note however that GHC does not always perform this transformation, since this is not always an optimization. The cache used by the first is kept in memory forever. This might lead to a waste of memory, depending on the function at hand.
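As a sketch (not necessarily what GHC produces), you can get the caching back in the non-point-free form by making sure the list is bound outside the lambda, so that it does not depend on the argument:
fibonacci :: Int -> Integer
fibonacci = \x -> table !! x
  where
    -- 'table' does not mention x, so it is built once and shared across calls
    table = map fib [0 ..]
    fib 0 = 0
    fib 1 = 1
    fib n = fibonacci (n - 2) + fibonacci (n - 1)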
I tried to find the reference but struck out; I think I have it on my PC at home.
What I read was that functions using a fixed point were inherently faster.
There are other reasons for using a fixed point. I encountered one in writing this iterative Fibonacci function. I wanted to see how an iterative version would perform, but then I realized I had no ready way to measure. I am a Haskell neophyte. But here is an iterative version for someone to test.
I could not get this definition to be accepted unless I used the composition dot after the first last function.
I could not reduce it further. The [0,1] parameter is fixed and not to be supplied as a parameter value.
Prelude> fib = last . flip take (iterate (\ls -> ls ++ [last ls + last (init ls)]) [0,1])
Prelude> fib 25
[0,1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765,10946,17711,28657,46368,75025]

Par function underlying logic

How does the par function work? Its signature is:
par :: a -> b -> b
But this is strange. Why isn't it:
par :: (a -> b) -> a -> b
(get function, execute it in new thread and return result) ?
Another question: is this normal Haskell multithreading?
par is for speculative parallelism, and relies on laziness.
You speculate that the unevaluated a should be computed while you're busy working on b.
Later in your program you might refer to a again, and it will be ready.
Here's an example. We wish to add 3 numbers together. Each number is expensive to compute. We can compute them in parallel, then add them together:
main = a `par` b `par` c `pseq` print (a + b + c)
  where
    a = ack 3 10
    b = fac 42
    c = fib 34

fac 0 = 1
fac n = n * fac (n-1)

ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))

fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
Because we don't "execute functions" in Haskell. We evaluate values; that's how we control processor activity. What par x y does is basically: while evaluating the result y, the runtime will also pre-evaluate x, even though x itself hasn't been asked for yet.
Note that this isn't necessarily the nicest way of writing parallel code now. Check out newer alternatives like the Eval monad. You may want to read Simon Marlow's book.
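For illustration, a rough sketch of the same idea written with the Eval monad (rpar/rseq from Control.Parallel.Strategies; the fib workload here is just a stand-in):
import Control.Parallel.Strategies (runEval, rpar, rseq)

fib :: Int -> Integer
fib n | n < 2     = fromIntegral n
      | otherwise = fib (n - 1) + fib (n - 2)

main :: IO ()
main = print $ runEval $ do
  a <- rpar (fib 30)   -- sparked: may be evaluated on another core
  b <- rpar (fib 31)
  _ <- rseq a          -- wait for a
  _ <- rseq b          -- wait for b
  return (a + b)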
In addition to the previous answers, it is worth pointing out that a and b will be evaluated to weak head normal form (WHNF) only (i.e. only to the outermost constructor), so it can be useful to force full evaluation using deepseq.
In terms of operational semantics, par creates a spark, which is a pointer to a thunk (an unevaluated computation), and adds it to the spark pool. This is very cheap and it is possible to have millions of sparks. Thread creation is advisory: the runtime system can decide not to turn a spark into a thread, and can prune superfluous parallelism by ignoring the spark or by subsuming the child spark in the parent.
The picture you show could indicate an issue with your code, where the thread executed on CPU 2 has significantly less work to do (load imbalance).
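To make the WHNF point concrete, here is a small sketch (the expensive function is a made-up stand-in) that uses force from Control.DeepSeq so that the sparked list is evaluated all the way, not just to its first constructor:
import Control.DeepSeq (force)
import Control.Parallel (par, pseq)

expensive :: Int -> Int
expensive n = sum [1 .. n * 10000]   -- stand-in for real work

main :: IO ()
main =
  let xs = force (map expensive [1 .. 50])   -- evaluated to normal form when the spark runs
      ys = map expensive [51 .. 100]
  in  xs `par` (sum ys `pseq` print (sum xs + sum ys))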

How to use the memoize function in Data.Function.Memoize

After failing to construct my own memoization table I turned to said class and tried using it to speed up a double recursive definition of the Fibonacci sequence:
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
I have tried using the memoize function of the class in several ways, but even the construction below seems about as slow (eating 10 seconds at fib 33):
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = memoize fib (n-1) + memoize fib (n-2)
fib' :: Int -> Int
fib' n = memoize fib n
I have tried distributing memoize in other ways, but the performance doesn't seem to improve.
I know there are other ways to have this problem in particular computed more efficiently, but for my original problem I would like to make use of the Memoize package. So my question is, how does one improve performance with the memoize function in this package?
Obviously memoisation is only useful if you do it precisely once and then call the memoised function multiple times, whereas in your approach you keep memoising the function over and over again. That's not the idea!
fib' n = memoize fib n
is the right start, yet won't work as desired for a subtle reason: it's not a constant applicative form because it explicitly mentions its argument.
fib' = memoize fib
is correct. Now, for this to actually speed up the fib function you must refer back to the memoised version.
fib n = fib' (n-1) + fib' (n-2)
You see: just adhering to DRY strictly and avoiding as much boilerplate as possible gets you to the correct version!
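Putting those pieces together, a minimal complete module might look like this (the main action and the test value are mine, just for checking):
import Data.Function.Memoize (memoize)

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib' (n - 1) + fib' (n - 2)   -- recursive calls go through the memoised version

fib' :: Int -> Int
fib' = memoize fib                    -- a constant applicative form, so the table is shared

main :: IO ()
main = print (fib 33)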
As others said, the problem is that you are memoizing the top-level calls to the function, but you are not using that information to avoid recomputing the recursive calls. Let's see how we could make sure that the recursive calls are also cached.
First, we have the obvious import.
import Data.Function.Memoize
And then we are going to describe the call graph of the function fib. To do that, we write a higher order function which uses its argument instead of a recursive call:
fib_rec :: (Int -> Int) -> Int -> Int
fib_rec f 0 = 0
fib_rec f 1 = 1
fib_rec f n = f (n - 1) + f (n - 2)
Now, what we want is an operator which takes such a higher-order function and somehow "ties the knot", making sure that the recursive calls are indeed the function we are interested in. We could write fix:
fix :: ((a -> b) -> (a -> b)) -> (a -> b)
fix f = f (fix f)
but then we are back to an inefficient solution: we never memoize anything. The alternative solution is to write something that looks like fix but makes sure that memoization happens all over the place. Let's call it memoized_fix:
memoized_fix :: Memoizable a => ((a -> b) -> (a -> b)) -> (a -> b)
memoized_fix = memoize . go
  where go f = f (memoized_fix f)
And now you have your efficient function fib_mem:
fib_mem :: Int -> Int
fib_mem = memoized_fix fib_rec
You don't even have to write memoized_fix yourself, it's part of the memoize package.
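If I remember correctly it is exported as memoFix (check the package documentation for the exact name and signature), so the whole thing reduces to something like:
import Data.Function.Memoize (memoFix)

fib_rec :: (Int -> Int) -> Int -> Int
fib_rec _ 0 = 0
fib_rec _ 1 = 1
fib_rec f n = f (n - 1) + f (n - 2)

fib_mem :: Int -> Int
fib_mem = memoFix fib_rec

main :: IO ()
main = print (fib_mem 33)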
The memoize package won't magically turn your function into a fast version. What it will do is avoid recomputing anything it has already computed:
import Data.Function.Memoize
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
fib_mem = memoize fib
val = 40
main = do
  print $ fib val     -- slow
  print $ fib val     -- slow
  print $ fib_mem val -- slow
  print $ fib_mem val -- fast!
What you need is a way to avoid recomputing any value in the recursive calls. A simple way to do that would be to compute the Fibonacci sequence as an infinite list and take the nth element:
fibs :: [Int]
fibs = 0:1:(zipWith (+) fibs (tail fibs))
fib_mem n = fibs !! n
A more general technique is to carry a Map Int Int and insert the results inside.
fib :: (Map Int Int) -> Int -> (Int, Map Int Int)
-- implementation left as exercise :-)
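For reference, one possible way to fill in that exercise (a sketch; the wrapper fib_mem' is my own name):
import Data.Map (Map)
import qualified Data.Map as Map

fib :: Map Int Int -> Int -> (Int, Map Int Int)
fib m 0 = (0, m)
fib m 1 = (1, m)
fib m n =
  case Map.lookup n m of
    Just v  -> (v, m)                    -- already computed: reuse it
    Nothing ->
      let (a, m1) = fib m  (n - 1)       -- thread the growing cache through the calls
          (b, m2) = fib m1 (n - 2)
          v       = a + b
      in  (v, Map.insert n v m2)

fib_mem' :: Int -> Int
fib_mem' n = fst (fib Map.empty n)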

What does "!!" mean in haskell?

There are two functions written in the Haskell wiki website:
Function 1
fib = (map fib' [0 ..] !!)
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)
Function 2
fib x = map fib' [0 ..] !! x
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)
What does the "!!" mean?
This is actually more difficult to read than it would seem at first, because operators in Haskell are more generic than in other languages.
The first thing that we are all thinking of telling you is to go look this up yourself. If you do not already know about Hoogle, this is the time to become familiar with it. You can ask it to tell you what a function does by name, or (and this is even cooler) you can give it the type of a function and it will offer suggestions for functions that implement that type.
Here is what hoogle tells you about this function (operator):
(!!) :: [a] -> Int -> a
List index (subscript) operator, starting from 0. It is an
instance of the more general genericIndex, which takes an index
of any integral type.
Let us assume that you need help reading this. The first line tells us that (!!) is a function that takes a list of things ([a]) and an Int, then gives you back one of the things in the list (a). The description tells you what it does: it gives you the element of the list indexed by the Int. So, xs !! i works like xs[i] would in Java, C or Ruby.
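A couple of quick examples in ghci:
ghci> [10, 20, 30, 40] !! 2
30
ghci> "hello" !! 1
'e'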
Now we need to talk about how operators work in Haskell. I'm not going to give you the whole story here, but I will at least let you know that there is more going on than in other programming languages. Operators "always" take two arguments and return something (a -> b -> c). You can use them just like a normal function:
add x y
(+) x y -- same as above
But, by default, you can also use them between expressions (the word for this is 'infix'). You can also make a normal function work like an operator with backticks:
x + y
x `add` y -- same as above
What makes the first code example you gave confusing (especially for new Haskell coders) is that the !! operator is used as a function (in a section) rather than in the typical operator (infix) position. Let me add some bindings so it is clearer:
-- return the ith Fibonacci number
fib :: Int -> Int  -- (actually more general than this, but don't worry about it)
fib i = fibs !! i
  where
    fibs :: [Int]
    fibs = map fib' [0 ..]
    fib' :: Int -> Int
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)
You can now work your way back to example 1. Make sure you understand what map fib' [0 ..] means.
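For instance, with the bindings above loaded into ghci you should see something like:
ghci> map fib [0 .. 7]
[0,1,1,2,3,5,8,13]
ghci> fib 30
832040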
I'm sorry your question got down-voted, because if you understood what was going on the answer would have been easy to look up, but if you don't know how operators work in Haskell it is very hard to mentally parse the code above.
(!!) :: [a] -> Int -> a
List index (subscript) operator, starting from 0. It is an instance of the more general genericIndex, which takes an index of any integral type.
See here: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-List.html#g:16

How to exploit any parallelism in my haskell parallel code?

I've just started working with Haskell's semi-explicit parallelism in GHC 6.12. I've written the following Haskell code to compute, in parallel, the map of the Fibonacci function over a list of 4 elements and, at the same time, the map of the function sumEuler over two elements.
import Control.Parallel
import Control.Parallel.Strategies
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
mkList :: Int -> [Int]
mkList n = [1..n-1]
relprime :: Int -> Int -> Bool
relprime x y = gcd x y == 1
euler :: Int -> Int
euler n = length (filter (relprime n) (mkList n))
sumEuler :: Int -> Int
sumEuler = sum . (map euler) . mkList
-- parallel initiation of list walk
mapFib :: [Int]
mapFib = map fib [37, 38, 39, 40]
mapEuler :: [Int]
mapEuler = map sumEuler [7600, 7600]
parMapFibEuler :: Int
parMapFibEuler = (forceList mapFib) `par` (forceList mapEuler `pseq` (sum mapFib + sum mapEuler))
-- how to evaluate in whnf form by forcing
forceList :: [a] -> ()
forceList [] = ()
forceList (x:xs) = x `pseq` (forceList xs)
main = do putStrLn (" sum : " ++ show parMapFibEuler)
To improve the parallelism of my program, I rewrote it with par and pseq and a forcing function to force WHNF evaluation. My problem is that, looking at ThreadScope, it appears I didn't gain any parallelism. Worse, I didn't get any speedup.
That's why I have these two questions:
Question 1 How could I modify my code to exploit any parallelism ?
Question 2 How could I write my program in order to use Strategies (parMap, parList, rdeepseq and so on ...) ?
First improvement with Strategies
Following the suggestion given in one of the answers:
parMapFibEuler = (mapFib, mapEuler) `using` s `seq` (sum mapFib + sum mapEuler) where
  s = parTuple2 (seqList rseq) (seqList rseq)
The parallelism now appears in ThreadScope, but not enough to get a significant speedup.
The reason you aren't seeing any parallelism here is because your spark has been garbage collected. Run the program with +RTS -s and note this line:
SPARKS: 1 (0 converted, 1 pruned)
the spark has been "pruned", which means removed by the garbage collector. In GHC 7 we made a change to the semantics of sparks, such that a spark is now garbage collected (GC'd) if it is not referred to by the rest of the program; the details are in the "Seq no more" paper.
Why is the spark GC'd in your case? Look at the code:
parMapFibEuler :: Int
parMapFibEuler = (forceList mapFib) `par` (forceList mapEuler `pseq` (sum mapFib + sum mapEuler))
the spark here is the expression forceList mapFib. Note that the value of this expression is not required by the rest of the program; it only appears as an argument to par. GHC knows that it isn't required, so it gets garbage collected.
The whole point of the recent changes to the parallel package was to let you easily avoid this bear trap. A good rule of thumb is to use Control.Parallel.Strategies rather than par and pseq directly. My preferred way to write this would be
parMapFibEuler :: Int
parMapFibEuler = runEval $ do
  a <- rpar $ sum mapFib
  b <- rseq $ sum mapEuler
  return (a+b)
but sadly this doesn't work with GHC 7.0.2, because the spark sum mapFib is floated out as a static expression (a CAF), and the runtime doesn't think sparks that point to static expressions are worth keeping (I'll fix this). This wouldn't happen in a real program, of course! So let's make the program a bit more realistic and defeat the CAF optimisation:
parMapFibEuler :: Int -> Int
parMapFibEuler n = runEval $ do
  a <- rpar $ sum (take n mapFib)
  b <- rseq $ sum (take n mapEuler)
  return (a+b)

main = do [n] <- fmap (fmap read) getArgs
          putStrLn (" sum : " ++ show (parMapFibEuler n))
Now I get good parallelism with GHC 7.0.2. However, note that #John's comments also apply: generally you want to look for more fine-grained parallelism so as to let GHC use all your processors.
Your parallelism is far too coarse-grained to have much beneficial effect. The largest chunks of work that can efficiently be done in parallel are in sumEuler, so that's where you should add your par annotations. Try changing sumEuler to:
sumEuler :: Int -> Int
sumEuler = sum . (parMap rseq euler) . mkList
parMap is from Control.Parallel.Strategies; it expresses a map that can be performed in parallel. The first argument, rseq, which has type Strategy a, is used to force the computation to a specific point; otherwise no work would be done, due to laziness. rseq is fine for most numeric types.
It's not useful to add parallelism to fib here, below about fib 40 there isn't enough work to make it worthwhile.
In addition to threadscope, it's useful to run your program with the -s flag. Look for a line like:
SPARKS: 15202 (15195 converted, 0 pruned)
in the output. Each spark is an entry in a work queue to possibly be performed in parallel. Converted sparks are actually done in parallel, while pruned sparks mean that the main thread got to them before a worker thread had the chance to do so. If the pruned number is high, it means your parallel expressions are too fine-grained. If the total number of sparks is low, you aren't trying to do enough in parallel.
Finally, I think parMapFibEuler is better written as:
parMapFibEuler :: Int
parMapFibEuler = sum (mapFib `using` parList rseq) + sum mapEuler
mapEuler is simply too short to have any parallelism usefully expressed here, especially as euler is already performed in parallel. I'm doubtful that it makes a substantial difference for mapFib either. If the lists mapFib and mapEuler were longer, parallelism here would be more useful. Instead of parList you may be able to use parBuffer, which tends to work well for list consumers.
Making these two changes cuts the runtime from 12s to 8s for me, with GHC 7.0.2.
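A self-contained sketch of the parBuffer variant mentioned above (the workload, list length and buffer size are made up for illustration):
import Control.Parallel.Strategies (parBuffer, rseq, using)

expensive :: Int -> Int
expensive n = sum [1 .. n * 100000]   -- toy stand-in for per-element work

main :: IO ()
main = print (sum (map expensive [1 .. 64] `using` parBuffer 8 rseq))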
Hmmm... Maybe?
((forceList mapFib) `par` (forceList mapEuler)) `pseq` (sum mapFib + sum mapEuler)
I.e. spark mapFib in the background, calculate mapEuler, and only after that (mapEuler) add their sums.
Actually I guess you can do something like:
parMapFibEuler = a `par` b `pseq` (a+b) where
  a = sum mapFib
  b = sum mapEuler
About Q2:
As I understand it, Strategies are a way of combining data structures with par and seq.
You can write your forceList as withStrategy (seqList rseq).
As well you can write your code like:
parMapFibEuler = (mapFib, mapEuler) `using` s `seq` (sum mapFib + sum mapEuler) where
  s = parTuple2 (seqList rseq) (seqList rseq)
I.e. the strategy applied to the tuple of the two lists will force their evaluation in parallel, but each list itself will be evaluated sequentially.
First off, I assume you know your fib definition is awful and you're just doing this to play with the parallel package.
You seem to be going for parallelism at the wrong level. Parallelizing mapFib and mapEuler won't give a good speed-up because there is more work to compute mapFib. What you should do is compute each of these very expensive elements in parallel, which is slightly finer-grained but not overly so:
mapFib :: [Int]
mapFib = parMap rdeepseq fib [37, 38, 39, 40]
mapEuler :: [Int]
mapEuler = parMap rdeepseq sumEuler [7600, 7600, 7600,7600]
parMapFibEuler :: Int
parMapFibEuler = sum a + sum b
  where
    a = mapFib
    b = mapEuler
Also, I originally resisted using Control.Parallel.Strategies over Control.Parallel, but I have come to like it, as it is more readable and avoids issues like yours, where one expects parallelism and has to squint at the code to figure out why there isn't any.
Finally, you should always post how you compile and how you run code you're expecting to be parallelized. For example:
$ ghc --make -rtsopts -O2 -threaded so.hs -eventlog -fforce-recomp
[1 of 1] Compiling Main ( so.hs, so.o )
Linking so ...
$ ./so +RTS -ls -N2
sum : 299045675
This yields an event log that can be viewed in ThreadScope (screenshot omitted).
