Par function underlying logic - Haskell

How does the par function work? Its signature is:
par :: a -> b -> b
But this is strange. Why isn't it:
par :: (a -> b) -> a -> b
(take a function, execute it in a new thread, and return the result)?
Another question: is this normal Haskell multithreading?

par is for speculative parallelism, and relies on laziness.
You speculate that the unevaluated a should be computed while you're busy working on b.
Later in your program you might refer to a again, and it will be ready.
Here's an example. We wish to add 3 numbers together. Each number is expensive to compute. We can compute them in parallel, then add them together:
main = a `par` b `par` c `pseq` print (a + b + c)
  where
    a = ack 3 10
    b = fac 42
    c = fib 34
    fac 0 = 1
    fac n = n * fac (n-1)
    ack 0 n = n+1
    ack m 0 = ack (m-1) 1
    ack m n = ack (m-1) (ack m (n-1))
    fib 0 = 0
    fib 1 = 1
    fib n = fib (n-1) + fib (n-2)
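To actually get parallelism from this, the program has to be compiled with the threaded runtime and run with more than one capability; roughly like this (the file name sum3.hs is just an example):

ghc -O2 -threaded sum3.hs
./sum3 +RTS -N3 -s

The -s flag makes the runtime print statistics on exit, including how many sparks were converted into real parallel evaluation.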

Because we don't "execute functions" in Haskell; we evaluate values, and that is how we control processor activity. What par x y does is basically: while evaluating the result y, the runtime will also pre-evaluate x, even though x itself hasn't been demanded yet.
Note that this isn't necessarily the nicest way of writing parallel code nowadays. Check out newer alternatives like the Eval monad. You may want to read Simon Marlow's book, Parallel and Concurrent Programming in Haskell.
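For reference, here is a minimal sketch of the same three-way sum written with the Eval monad from Control.Parallel.Strategies (reusing the ack/fac/fib definitions from the example above):

import Control.Parallel.Strategies (runEval, rpar, rseq)

main :: IO ()
main = print $ runEval $ do
    a <- rpar (ack 3 10)   -- spark a for parallel evaluation
    b <- rpar (fac 42)     -- spark b for parallel evaluation
    c <- rseq (fib 34)     -- evaluate c on the current thread
    _ <- rseq a            -- wait for the sparked a
    _ <- rseq b            -- wait for the sparked b
    return (a + b + c)
  where
    fac 0 = 1
    fac n = n * fac (n-1)
    ack 0 n = n+1
    ack m 0 = ack (m-1) 1
    ack m n = ack (m-1) (ack m (n-1))
    fib 0 = 0
    fib 1 = 1
    fib n = fib (n-1) + fib (n-2)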

In addition to the previous answers, it is worth pointing out that a and b will be evaluated only to weak head normal form (WHNF), i.e. only the outermost reduction or constructor is applied, so it can be useful to force full evaluation using deepseq.
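For example, a minimal sketch using force from Control.DeepSeq (expensive is a hypothetical stand-in for real work):

import Control.DeepSeq (force)
import Control.Parallel (par, pseq)

main :: IO ()
main = xs `par` (sum ys `pseq` print (sum xs + sum ys))
  where
    -- without force, the spark would only evaluate xs to its first (:) cell
    xs = force (map expensive [1 .. 1000]) :: [Integer]
    ys = map expensive [1001 .. 2000]
    expensive n = n * n * n  -- hypothetical stand-in for real work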
In terms of operational semantics, par creates a spark, which is a pointer to a thunk (an unevaluated computation) added to the spark pool. This is very cheap, and it is possible to have millions of sparks. Thread creation is advisory: the runtime system can decide not to turn a spark into a thread, and can prune superfluous parallelism by ignoring sparks or by subsuming a child spark in its parent.
The picture you show could indicate an issue with your code, where the thread executing on CPU2 has significantly less work to do (load imbalance).

Related

How does Haskell manage memory of recursive function calls

I have been working on a problem that benefits a lot from caching the results of my functions, and in my research I came across this article. I am stunned at how simple the core of the "Memoization with recursion" section is, namely:
memoized_fib :: Int -> Integer
memoized_fib = (map fib [0 ..] !!)
  where fib 0 = 0
        fib 1 = 1
        fib n = memoized_fib (n-2) + memoized_fib (n-1)
I feel like I understand how it works, but do correct me if I'm wrong: this function saves a list which is populated using the same function.
What bothers me is that I don't understand why this works. Originally I was under the impression that once Haskell evaluates a function it releases the memory that was used to store the variables inside it, but here it seems that if part of the list was evaluated by one call of this function, those values are still available to another call of the same function.
Just typing this up makes my head hurt, because I don't understand why a value used in the calculation of fib 2 should be available in the calculation of fib 3, or better yet fib 100.
My gut feeling tells me that this behaviour has two problems (I'm probably wrong, but again not sure why):
- purity of this function: we are evaluating one call using a value that did not arrive via the parameters of this function
- memory leaks: I am no longer sure when Haskell will release the memory held by this list
I think it's easier to understand if you compare your definition to this:
not_memoized_fib :: Int -> Integer
not_memoized_fib m = map fib [0 ..] !! m
  where fib 0 = 0
        fib 1 = 1
        fib n = not_memoized_fib (n-2) + not_memoized_fib (n-1)
The definition above is essentially the same as yours, except that it takes an explicit argument m. It is a so-called eta-expansion of the previous function, and is semantically equivalent to it. Yet, operationally, this has drastically worse performance, since memoization here does not take place.
Why? Well, your function defines the list map fib [0..] before taking the (implicit) input parameter m, so there is only one list around, for all m we may pass later on as arguments. Instead, in not_memoized_fib we first take m as input, and then define the list, making the function create a list for every call to not_memoized_fib, destroying performance.
It is even easier to see if we use let and lambdas instead of where. Compare
memoized :: Int -> Integer
memoized = let
    list = map fib [0..]
    fib 0 = 0
    fib 1 = 1
    fib n = memoized (n-1) + memoized (n-2)
  in \m -> list !! m
  --   ^^ here we take m, after everything is defined

with its let-over-lambda (*) code structure, to
not_memoized :: Int -> Integer
not_memoized = \m -> let
    -- ^^ here we take m, before everything is defined, so
    -- we define the local bindings {list,fib} at every call
    list = map fib [0..]
    fib 0 = 0
    fib 1 = 1
    fib n = not_memoized (n-1) + not_memoized (n-2)
  in list !! m
with the let inside the lambda.
In the former case, there is only one list around, while in the latter there is one list for each call.
(*) a searchable term.
The list map fib [0..] is created as part of the definition of the function, rather than each time the function is called. Due to laziness, though, the list is only "realized" (evaluated) as far as any given call requires.
Say your first call is memoized_fib 10. This will cause the first 10 Fibonacci numbers to actually be computed and stored in memory, and they will stay in memory for the duration of the program. Subsequent calls with a smaller argument don't need to compute anything; subsequent calls with larger arguments need only compute those elements that occur later in the list than the largest existing element.
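You can watch this sharing happen by instrumenting the recursive case with Debug.Trace (a sketch; the trace call is purely for observation):

import Debug.Trace (trace)

memoized_fib :: Int -> Integer
memoized_fib = (map fib [0 ..] !!)
  where fib 0 = 0
        fib 1 = 1
        fib n = trace ("computing fib " ++ show n)
                      (memoized_fib (n-2) + memoized_fib (n-1))

main :: IO ()
main = do
    print (memoized_fib 5)  -- traces "computing fib 2" .. "computing fib 5", once each
    print (memoized_fib 6)  -- traces only "computing fib 6"; indices 0..5 are reused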

What functions are cached in Haskell?

I have the following code:
memoize f = (map f [0 ..] !!)
fib' 0 = 1
fib' 1 = 1
fib' n = fib' (n - 1) + fib' (n - 2)
fibMemo n = memoize fib' n
fibMemo' = memoize fib'
(I am aware that this Fibonacci implementation has exponential time complexity and does not use the cache.)
The first time I execute fibMemo' 30 it takes 3 seconds, and the second time it takes ~0 seconds, because the result is cached. But the first version, fibMemo, never gets its result cached; it always takes 3 seconds to execute. The only difference is the definition (and as far as I know the two are equivalent).
So my question is: which functions are cached in Haskell?
I have already read https://wiki.haskell.org/Memoization and it does not resolve my question.
Essentially, the functions you defined behave as the following ones:
fibMemo n = let m = map fib' [0..] in m !! n
fibMemo' = let m = map fib' [0..] in (m !!)
Why is fibMemo' more efficient? Well, we can rewrite it as
fibMemo' = let m = map fib' [0..] in \n -> m !! n
which makes it more clear that the single list m gets created before n is taken as input. This means that all the calls to fibMemo' will use the same m. The first call evaluates a part of m slowly, and the successive calls will reuse that cached result (assuming the call hits the cache, of course, otherwise another part of m is evaluated and cached).
Instead, fibMemo is equivalent to
fibMemo = \n -> let m = map fib' [0..] in m !! n
which takes the input n before the list m gets created. So, each call gets a new cache, which is pointless, since the whole purpose of a cache is that it is reused later.
The order of the lambda \n -> vs. the let m = .. matters a lot for performance. Since m = .. does not use n, technically the let m = .. can be floated outwards, essentially turning fibMemo into fibMemo' without affecting the semantics. However, as you discovered, this does not preserve performance in general!
This is indeed an optimization that GHC could perform (the so-called full-laziness transformation), but it does not here, because it can easily make the memory behaviour significantly worse.
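If you want the caching to survive regardless of how the function itself is written, one option (a sketch; fibCache is a name introduced here) is to give the shared list a top-level binding:

fib' :: Int -> Integer
fib' 0 = 1
fib' 1 = 1
fib' n = fib' (n - 1) + fib' (n - 2)

fibCache :: [Integer]
fibCache = map fib' [0 ..]   -- one top-level list, shared by every call

fibMemo :: Int -> Integer
fibMemo n = fibCache !! n    -- eta-expanded, yet still backed by the shared list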

Non-pointfree style is substantially slower

I have the following, oft-quoted code for calculating the nth Fibonacci number in Haskell:
fibonacci :: Int -> Integer
fibonacci = (map fib [0..] !!)
  where fib 0 = 0
        fib 1 = 1
        fib n = fibonacci (n-2) + fibonacci (n-1)
Using this, I can do calls such as:
ghci> fibonacci 1000
and receive an almost instantaneous answer.
However, if I modify the above code so that it's not in pointfree style, i.e.
fibonacci :: Int -> Integer
fibonacci x = (map fib [0..] !!) x
  where fib 0 = 0
        fib 1 = 1
        fib n = fibonacci (n-2) + fibonacci (n-1)
it is substantially slower. To the extent that a call such as
ghci> fibonacci 1000
hangs.
My understanding was that the above two pieces of code were equivalent, but GHCi begs to differ. Does anyone have an explanation for this behaviour?
To observe the difference, you should probably look at the Core. My guess is that this boils down to comparing (roughly)
let f = map fib [0..] in \x -> f !! x
to
\x -> let f = map fib [0..] in f !! x
The latter will recompute f from scratch on every invocation. The former does not, effectively caching the same f for each invocation.
It happens that in this specific case, GHC was able to optimize the second into the first, once optimization is enabled.
Note however that GHC does not always perform this transformation, since this is not always an optimization. The cache used by the first is kept in memory forever. This might lead to a waste of memory, depending on the function at hand.
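Note also that GHCi compiles code without optimization, so the eta-expanded version hanging in GHCi is consistent with the transformation only happening when optimization is enabled.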
I tried to find the reference but struck out; I think I have it on my PC at home. What I read was that functions using a fixed point were inherently faster.
There are other reasons for using a fixed point. I encountered one in writing this iterative Fibonacci function: I wanted to see how an iterative version would perform, and then realized I had no ready way to measure. I am a Haskell neophyte, but here is an iterative version for someone to test.
I could not get this to define unless I used the composition dot after the first last function, and I could not reduce it further. The [0,1] parameter is fixed and is not to be supplied as an argument.
Prelude> fib = last . flip take (iterate (\ls -> ls ++ [last ls + last (init ls)]) [0,1])
Prelude> fib 25
[0,1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765,10946,17711,28657,46368,75025]
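Note that this fib actually returns the whole list of Fibonacci numbers up to index n, not just the nth one. For comparison, here is a more conventional iterative version (a sketch using a pair as the loop state; fibIter is a name introduced here):

fibIter :: Int -> Integer
fibIter n = fst (iterate step (0, 1) !! n)
  where step (a, b) = (b, a + b)   -- one iteration: shift the (fib k, fib (k+1)) window

fibIter 25 yields 75025, matching the last element of the list above.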

Haskell: repeat a function a large number of times without stack overflow

As a newbie to Haskell I am trying to iterate a function (e.g., the logistic map) a large number of times. In an imperative language this would be a simple loop; in Haskell, however, I end up with a stack overflow. Take for example this code:
main = print $ iter 1000000
f x = 4.0*x*(1.0-x)
iter :: Int -> Double
iter 0 = 0.3
iter n = f $ iter (n-1)
For a small number of iterations the code works, but for a million iterations I get a stack space overflow:
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.
I cannot understand why this happens; the tail recursion should be fine here.
Maybe the problem is lazy evaluation. I experimented with several ways to force strict evaluation, by inserting $! or seq at various positions, but with no success.
What would be the Haskell way to iterate a function a huge number of times?
I have tried suggestions from related posts (here or here), but I always ended up with a stack overflow for a large number of iterations, e.g., main = print $ iterate f 0.3 !! 1000000.
The problem is that your definition
iter :: Int -> Double
iter 0 = 0.3
iter n = f $ iter (n-1)
tries to evaluate in the wrong direction. Unfolding it for a few steps, we obtain
iter n = f (iter (n-1))
= f (f (iter (n-2)))
= f (f (f (iter (n-3))))
...
and the entire call stack from iter 1000000 to iter 0 has to be built before anything can be evaluated. It would be the same in a strict language. You have to organise it so that part of the evaluation can take place before recurring. The usual way is to have an accumulation parameter, like
iter n = go n 0.3
  where
    go 0 x = x
    go k x = go (k-1) (f x)
Then adding strictness annotations, in case the compiler doesn't already add them, will make it run smoothly without consuming stack.
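Concretely, a sketch of the accumulator version with the strictness made explicit (via BangPatterns), which runs in constant stack space:

{-# LANGUAGE BangPatterns #-}

f :: Double -> Double
f x = 4.0 * x * (1.0 - x)

iter :: Int -> Double
iter n = go n 0.3
  where
    go 0 !x = x
    go k !x = go (k-1) (f x)   -- x is forced at each step, so no thunk chain builds up

main :: IO ()
main = print (iter 1000000)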
The iterate variant has the same problem as your iter, except that the chain of deferred computations is built inside-out rather than outside-in. Because it is built inside-out, a stricter version of iterate (or a consumption pattern where the earlier iterations are forced as you go) solves the problem:
iterate' :: (a -> a) -> a -> [a]
iterate' f x = x `seq` (x : iterate' f (f x))
calculates iterate' f 0.3 !! 1000000 without problem.
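(If memory serves, a strict iterate' with essentially this behaviour has since been added to Data.List.)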

Am confused about parallel Haskell

How is this code:
parfib :: Int -> Int
parfib 0 = 1
parfib 1 = 1
parfib n = nf2 `par` (nf1 `par` (nf1+nf2+1))
  where nf1 = parfib (n-1)
        nf2 = parfib (n-2)
Better than this:
parfib :: Int -> Int
parfib 0 = 1
parfib 1 = 1
parfib n = nf2 `par` (nf1 `seq` (nf1+nf2+1))
  where nf1 = parfib (n-1)
        nf2 = parfib (n-2)
I don't get the explanations I've found online that say "in order to guarantee that the main expression is evaluated in the right order (i.e. without blocking the main task on the child task), the seq annotation is used".
Why is seq used? I know it forces the interpreter to evaluate parfib (n-1) first, but why is that necessary?
When executing the second program, won't the interpreter spark a new task to evaluate nf2 while evaluating nf1 of the nf1+nf2+1 expression in parallel? Why tell it specifically that it should start with nf1?
It doesn't make much sense to evaluate nf1 in parallel with nf1+nf2+1, since the latter depends on nf1: all the main expression could do is block on the spark for nf1. Using seq, the sum only tries to use nf1 once it is known to have been evaluated.
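As an aside, pseq is generally preferred over seq for this kind of ordering: seq only promises that both arguments are evaluated before the result is needed, while pseq guarantees its first argument is evaluated first. A sketch of the second version with pseq:

import Control.Parallel (par, pseq)

parfib :: Int -> Int
parfib 0 = 1
parfib 1 = 1
parfib n = nf2 `par` (nf1 `pseq` (nf1 + nf2 + 1))
  where nf1 = parfib (n-1)
        nf2 = parfib (n-2)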
It could also be because we want to minimize the number of sparks.
From my understanding the two versions will produce the same result, but with the first option you spark two additional computations (nf1 and nf2), whereas with seq you spark only one additional computation (nf1).

Resources