How does parBuffer work? - Haskell

I was looking at the code of parBuffer in parallel-3.2.0.4, but I am missing something about how it works. I don't see how it can create new sparks aside from the initial ones.
As far as I can see, it uses start in parBufferWHNF to force the first n elements to be sparked with par, and then, going through ret, it uses par again on the same entries (shouldn't this just discard y, and not risk getting the spark GC'd?) while returning the corresponding result. Then it returns xs directly, without any additional spark creation, since rdeepseq just calls pseq.
But clearly, testing code like this
withStrategy (parBuffer 10 rdeepseq) $ take 100 [ expensive stuff ]
I can see all 100 sparks in the GHC RTS information. So where are the other 90 created?
Here is the code I was looking at:
parBufferWHNF :: Int -> Strategy [a]
parBufferWHNF n0 xs0 = return (ret xs0 (start n0 xs0))
  where -- ret :: [a] -> [a] -> [a]
        ret (x:xs) (y:ys) = y `par` (x : ret xs ys)
        ret xs     _      = xs

        -- start :: Int -> [a] -> [a]
        start 0   ys     = ys
        start !_n []     = []
        start !n  (y:ys) = y `par` start (n-1) ys
-- | Like 'evalBuffer' but evaluates the list elements in parallel when
-- pushing them into the buffer.
parBuffer :: Int -> Strategy a -> Strategy [a]
parBuffer n strat = parBufferWHNF n . map (withStrategy strat)

parBuffer is conceptually similar to a circular buffer with a constant window size rolling over the input and producing the output; it is useful when implementing pipeline parallelism or working with lazy streams.
Its implementation depends on how the result is evaluated: it makes use of laziness and graph sharing (which explains why the sparks are not discarded) to produce output as input is consumed, ensuring that the number of outstanding sparks is limited to N and hence that constant space is used (as opposed to parList, which is linear in the length of the argument list).
The start function creates the initial N sparks and passes the rest of the input to ret unsparked. The ret function takes two lists (xs0, and xs0 without its first N elements, as returned by start) and sparks an element from the second list every time an element of the result (the x) is consumed -- in practice, this happens as the user demands the results -- until there are no elements left.
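For instance, a minimal sketch of the test described in the question (expensive here is just a placeholder for the question's "expensive stuff"):

import Control.Parallel.Strategies (parBuffer, rdeepseq, withStrategy)

-- A stand-in for "expensive stuff"; any CPU-bound pure function will do.
expensive :: Int -> Int
expensive x = sum [1 .. 200000 + x]

main :: IO ()
main =
  -- Compile with -threaded and run with "+RTS -N -s" to see the SPARKS line.
  -- The first 10 sparks are created by start; the remaining 90 are created
  -- one by one in ret as sum walks down the result list.
  print . sum $ withStrategy (parBuffer 10 rdeepseq) (map expensive [1 .. 100])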


Understanding non-strictness in Haskell with a recursive example

What is the difference between these two, in terms of evaluation?
Why does this one "obey" (not sure how to put it) non-strictness:
recFilter :: (a -> Bool) -> [a] -> [a]
recFilter _ [] = []
recFilter p (h:tl) = if (p h)
                       then h : recFilter p tl
                       else recFilter p tl
while this doesn't?
recFilter :: (a -> Bool) -> [a] -> Int -> [a]
recFilter _ xs 0 = xs
recFilter p (h:tl) len
    | p h       = recFilter p (tl ++ [h]) (len-1)
    | otherwise = recFilter p tl (len-1)
Is it possible to write a tail-recursive function non-strictly?
To be honest, I also don't understand the call stack of the first example, because I can't see where that h : goes. Is there a way to see this in GHCi?
The non-tail-recursive function roughly consumes a portion of the input (the first element) to produce a portion of the output (well, if it isn't filtered out, at least). Then recursion handles the next portion of the input, and so on.
Your tail-recursive function will recurse until len becomes zero, and only at that point will it output the whole result.
Consider this pseudocode:
def rec1(p,xs):
    case xs:
        []     -> []
        (y:ys) -> if p(y): print y
                  rec1(p,ys)
and compare it with this accumulator-based variant. I'm not using len since I use a separate accumulator argument, which I assume to be initially empty.
def rec2(p,xs,acc):
    case xs:
        []     -> print acc
        (y:ys) -> if p(y):
                      rec2(p,ys,acc++[y])
                  else:
                      rec2(p,ys,acc)
rec1 prints before recursing: it does not need to inspect the whole input list to start printing its output. It works in a "streaming" fashion, in a sense. Instead, rec2 only starts to print at the very end, after the input list has been completely scanned.
In your Haskell code there are no prints, of course, but you can think of returning x : function call as "printing x", since x is made available to the caller of our function before the function call is actually made. (Well, to be pedantic, this depends on how the caller consumes the output list, but I'll neglect this.)
Hence the non-tail-recursive code can also work on infinite lists. Even on finite inputs, performance is improved: if we call head (rec1 p xs), we only evaluate xs up to the first non-discarded element. By contrast, head (rec2 p xs) would fully filter the whole list xs, even though we don't need that.
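One rough way to see this in GHCi (a sketch; it reuses the first recFilter from the question) is to feed it an infinite list and demand only a prefix:

-- The guarded-recursive filter from the question:
recFilter :: (a -> Bool) -> [a] -> [a]
recFilter _ [] = []
recFilter p (h:tl) = if p h then h : recFilter p tl
                            else recFilter p tl

-- In GHCi:
--   ghci> take 3 (recFilter even [0..])
--   [0,2,4]
-- Only enough of [0..] is evaluated to produce three kept elements; an
-- accumulator-style version (like rec2 above) could never return here,
-- because it would have to scan the whole infinite list first.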
The second implementation does not make much sense: a variable named len will not automatically contain the length of the list, so you need to pass it in yourself; and for infinite lists this cannot work, since there is no length at all.
You likely want to produce something like:
recFilter :: (a -> Bool) -> [a] -> [a]
recFilter p = go []
  where go ys []                 = ys  -- (1)
        go ys (x:xs) | p x       = go (ys ++ [x]) xs
                     | otherwise = go ys xs
where we thus have an accumulator to which we append the items in the list, and then eventually return the accumulator.
The problem with the second approach is that, as long as the accumulator is not returned, Haskell needs to keep recursing before the result reaches even weak head normal form (WHNF). This means that if we pattern match the result against [] or (_:_), we have to recurse all the way to case (1), since the other cases only produce a new expression and thus never yield a data constructor we can pattern match on.
This is in contrast to the first filter, where pattern matching on [] or (_:_) only needs to reach the first case (1) or the second case (2), where the expression produces a list data constructor. Only if we require extra elements for the pattern match, for example (_:_:_), do we need to evaluate the recFilter p tl inside case (2) of the first implementation:
recFilter :: (a -> Bool) -> [a] -> [a]
recFilter _ [] = []  -- (1)
recFilter p (h:tl) = if (p h)
                       then h : recFilter p tl  -- (2)
                       else recFilter p tl
For more information, see the Laziness section of the Wikibook on Haskell that describes how laziness works with thunks.

Store "constants" without reevaluating them in Haskell

I'm tinkering with a program evaluating optimal yahtzee play. Something that will need to be done often is performing a given task for every possible roll. The following function creates a list that contains all possible rolls, which will be mapped over frequently.
type Roll = [Int]

rollspace :: Int -> [Roll]
rollspace depth = worker [[]] 0
  where m xs n      = map (\e -> n:e) xs
        addRoll xs  = m xs 1 ++ m xs 2 ++ m xs 3 ++ m xs 4 ++ m xs 5 ++ m xs 6
        worker xs i = if i == depth then xs else worker (addRoll xs) (i+1)
This function works as intended, and produces a list of 7776 items when run with a depth of 5. However, it seems terribly inefficient to generate the list of all rolls at a given depth every time it is needed, especially because the depths will only be in the range 1-5. Is there any way to store the rollspace lists for the needed depths and reference them without reevaluating, or does Haskell compile away this issue?
Is there any way to store the rollspace lists for the needed depths and reference them without re-evaluating?
Yes, sure, and it's even technology you already know.
rollspace5 :: [Roll]
rollspace5 = rollspace 5
rollspace4 :: [Roll]
rollspace4 = rollspace 4
-- etc.
If it's important that you have a "function-like" object that accepts a number as input -- i.e. rather than knowing statically that you want depth 5 -- you have a few options, including:
rollspaceStored :: Int -> [Roll]
rollspaceStored 5 = rollspace5
rollspaceStored 4 = rollspace4
-- etc.
rollspaceStored other = rollspace other
import Data.IntMap (IntMap, fromList)

rollspaceMap :: IntMap [Roll]
rollspaceMap = fromList [(n, rollspace n) | n <- [0..5]]
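A small usage sketch for the map-based option (findWithDefault simply falls back to recomputing for depths outside the table; rollspaceCached is a name introduced here, and it relies on Roll, rollspace and rollspaceMap as defined above):

import Data.IntMap (findWithDefault)

rollspaceCached :: Int -> [Roll]
rollspaceCached n = findWithDefault (rollspace n) n rollspaceMap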
There are also packages on Hackage for memoization more generally; that search term should be enough to find them.

Haskell function parameter force evaluation

I need to take the last n elements from a list, using O(n) memory, so I wrote this code:
take' :: Int -> [Int] -> [Int]
take' n xs = (helper $! (length $! xs) - n + 1) xs
  where helper skip []       = []
        helper skip (x : xs) = if skip == 0 then xs else (helper $! skip - 1) xs
main = print (take' 10 [1 .. 100000])
This code takes O(|L|) memory, where |L| is the length of the given list.
But when I write this code:
take' :: Int -> [Int] -> [Int]
take' n xs = helper (100000 - n + 1) xs
  where helper skip []       = []
        helper skip (x : xs) = if skip == 0 then xs else (helper $! skip - 1) xs
main = print (take' 10 [1 .. 100000])
This code now takes only O(n) memory (the only change is (helper $! (length $! xs) - n + 1) -> helper (100000 - n + 1)).
So, as I understand it, Haskell for some reason doesn't evaluate length xs before the first call of helper, so it leaves a thunk in skip, and Haskell has to keep this value in every stack frame instead of doing plain tail recursion. But in the second piece of code it evaluates (100000 - n + 1) and passes the plain value to helper.
So the problem is: how can I evaluate the length of the list before the first call of helper while using only O(n) memory?
The other answer referred to what it means to be a good consumer. You have posted two versions of your function, one which works for arbitrary-length lists but is not a good consumer, and one which is a good consumer but assumes a particular list length. For completeness, here is a function that is a good consumer and works for arbitrary list lengths:
takeLast n xs = go (drop n xs) xs
  where
    -- Walk the two copies of the list in lockstep; the first one starts
    -- n elements ahead, so when it runs out, the second one is exactly
    -- the last n elements.
    go (_:xs) (_:ys) = go xs ys
    go _      ys     = ys
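A quick usage sketch (the expected behavior here is my reading of the code, not part of the original answer):

main :: IO ()
main = print (takeLast 10 [1 .. 100000 :: Int])
-- prints the last ten elements, [99991..100000]; because takeLast is a
-- good consumer, only O(n) of the list is live at any moment.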
The second version does not really take only O(n) memory. Regardless of what take' does: you start off with a list of length L, and that has to be stored somewhere.
The reason it effectively takes O(n) memory is that the list is only used by one “good consumer” here, namely helper. Such a consumer deconstructs the list from head to last; because no reference to the head is needed anywhere else, the garbage collector can immediately start cleaning up those first elements – before the list comprehension has even built up the rest of the list!
That changes, however, if you compute the length of that list before using helper. This already forces the entire list to be NF'd†, and as I said this inevitably takes O(L) memory. Because you're still holding a reference to the list's head for the later call to helper, the garbage collector cannot reclaim anything before the whole list is in memory.
So, it really has nothing to do with strict evaluation. In fact the only way you could achieve your goal is by making it less strict (require only a sublist of length n to be evaluated at any given time).
†More precisely: it forces the list's spine to normal form. The elements aren't evaluated, but it's still O(L).

How can this haskell rolling sum implementation be improved?

How can I improve the following rolling sum implementation?
import Control.Monad.State

type Buffer = State BufferState (Maybe Double)
type BufferState = ( [Double] , Int, Int )

-- circular buffer
buff :: Double -> Buffer
buff newVal = do
    (list, ptr, len) <- get
    -- if the list is not full yet just accumulate the new value
    if length list < len
      then do
        put (newVal : list, ptr, len)
        return Nothing
      else do
        let nptr        = (ptr - 1) `mod` len
            (as,(v:bs)) = splitAt ptr list
            nlist       = as ++ (newVal : bs)
        put (nlist, nptr, len)
        return $ Just v

-- create initial state for circular buffer
initBuff l = ( [] , l-1 , l )

-- use the circular buffer to calculate a rolling sum
rollSum :: Double -> State (Double,BufferState) (Maybe Double)
rollSum newVal = do
    (acc, bState) <- get
    let (lv, bState') = runState (buff newVal) bState
        acc'          = acc + newVal
    -- subtract the old value if the circular buffer is full
    case lv of
      Just x  -> put (acc' - x, bState') >> (return $ Just (acc' - x))
      Nothing -> put (acc', bState') >> return Nothing

test :: (Double,BufferState) -> [Double] -> [Maybe Double] -> [Maybe Double]
test state []     acc = acc
test state (x:xs) acc =
    let (a,s) = runState (rollSum x) state
    in  test s xs (a:acc)

main :: IO ()
main = print $ test (0, initBuff 3) [1,1,1,2,2,0] []
Buffer uses the State monad to implement a circular buffer. rollSum uses the State monad again to keep track of the rolling sum value and the state of the circular buffer.
How could I make this more elegant?
I'd like to implement other functions like rolling average or a difference, what could I do to make this easy?
Thanks!
EDIT
I forgot to mention I am using a circular buffer as I intend to use this code on-line and process updates as they arrive - hence the need to record state. Something like
newRollingSum = update rollingSum newValue
I haven't managed to decipher all of your code, but here is the plan I would take for solving this problem. First, an English description of the plan:
We need windows into the list of length n starting at each index:
1. Make windows of arbitrary length.
2. Truncate long windows to length n.
3. Drop the last n-1 of these, which will be too short.
4. For each window, add up the entries.
This was the first idea I had; for windows of length three it's an okay approach because step 2 is cheap on such a short list. For longer windows, you may want an alternate approach, which I will discuss below; but this approach has the benefit that it generalizes smoothly to functions other than sum. The code might look like this:
import Data.List

rollingSums n xs
    = map sum                               -- add up the entries
    . zipWith (flip const) (drop (n-1) xs)  -- drop the last n-1
    . map (take n)                          -- truncate long windows
    . tails                                 -- make arbitrarily long windows
    $ xs
If you're familiar with the "equational reasoning" approach to optimization, you might spot a first place we can improve the performance of this function: by swapping the first map and zipWith, we can produce a function with the same behavior but with a map f . map g subterm, which can be replaced by map (f . g) to get slightly less allocation.
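For concreteness, that rewrite might look like this (a sketch; it should behave the same as rollingSums above):

import Data.List (tails)

rollingSums' n xs
    = zipWith (flip const) (drop (n-1) xs)  -- drop the last n-1
    . map (sum . take n)                    -- map f . map g fused into map (f . g)
    . tails                                 -- make arbitrarily long windows
    $ xs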
Unfortunately, for large n, this adds n numbers together in the inner loop; we would prefer to simply add the value at the "front" of the window and subtract the one at the "back". So we need to get trickier. Here's a new idea: we'll traverse the list twice in parallel, n positions apart. Then we'll use a simple function for getting the rolling sum (of unbounded window length) of prefixes of a list, namely, scanl (+), to convert this traversal into the actual sums we're interested in.
rollingSumsEfficient n xs = scanl (+) firstSum deltas where
    firstSum = sum (take n xs)
    deltas   = zipWith (-) (drop n xs) xs -- front - back
There's one twist, which is that scanl never returns an empty list. So if it's important that you be able to handle short lists, you'll want another equation that checks for these. Don't use length, as that forces the entire input list into memory before starting the computation -- a potentially lethal performance mistake. Instead add a line like this above the previous definition:
rollingSumsEfficient n xs | null (drop (n-1) xs) = []
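Assembled, with a type signature added for clarity, the two equations fit together like this (a sketch):

rollingSumsEfficient :: Num a => Int -> [a] -> [a]
rollingSumsEfficient n xs | null (drop (n-1) xs) = []  -- too short for even one window
rollingSumsEfficient n xs = scanl (+) firstSum deltas
  where
    firstSum = sum (take n xs)
    deltas   = zipWith (-) (drop n xs) xs               -- front - back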
We can try these two out in ghci. You'll notice that they do not quite have the same behavior as yours:
*Main> rollingSums 3 [10^n | n <- [0..5]]
[111,1110,11100,111000]
*Main> rollingSumsEfficient 3 [10^n | n <- [0..5]]
[111,1110,11100,111000]
On the other hand, the implementations are considerably more concise and are fully lazy in the sense that they work on infinite lists:
*Main> take 5 . rollingSums 10 $ [1..]
[55,65,75,85,95]
*Main> take 5 . rollingSumsEfficient 10 $ [1..]
[55,65,75,85,95]
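Since the question also mentions wanting a rolling average or difference, here is a hedged sketch of how the windows-based pipeline generalizes; windows, rollingBy and rollingAvg are names introduced here, not from the answer above:

import Data.List (tails)

-- All complete windows of length n (the same trick as in rollingSums).
windows :: Int -> [a] -> [[a]]
windows n xs = zipWith (flip const) (drop (n-1) xs) (map (take n) (tails xs))

-- Apply any summary function to each window.
rollingBy :: ([a] -> b) -> Int -> [a] -> [b]
rollingBy f n = map f . windows n

rollingAvg :: Fractional a => Int -> [a] -> [a]
rollingAvg n = rollingBy ((/ fromIntegral n) . sum) n

-- e.g. rollingBy maximum 3 [1,3,2,5,4]  ==  [3,5,5]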
An efficient implementation of a rolling sum in Haskell:
rollingSums :: Num a => Int -> [a] -> Maybe [a]
rollingSums n xs | n <= 0    = Nothing
                 | otherwise = Just $ if length as == n then go (sum as) xs bs else []
  where
    (as, bs) = splitAt n xs
    go s xs []     = [s]
    go s xs (y:ys) = s : go (s + y - head xs) (tail xs) ys
Assuming that sum((i+1)...(i+1+n)) = sum(i..(i+n)) - arr[i] + arr[i+n+1].
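For reference, a quick sanity check of this version against the GHCi session above (worked out by hand, so treat it as a sketch):

rollingSums 3 [10^n | n <- [0..5]]  ==  Just [111,1110,11100,111000]
rollingSums 0 [1,2,3]               ==  Nothing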

Project Euler 23: insight on this stackoverflow-ing program needed

Hi Haskell fellows. I'm currently working on the 23rd problem of Project Euler. Where I am at the moment is that my code seems right to me - not in the "good algorithm" sense, but in the "should work" sense - yet it produces a stack space overflow.
I do know that my algorithm isn't perfect (in particular I could certainly avoid computing such a big intermediate result at each recursion step in my worker function).
Though, being in the process of learning Haskell, I'd like to understand why this code fails so miserably, in order to avoid this kind of mistake next time.
Any insight on why this program is wrong will be appreciated.
import qualified Data.List as Set ((\\))

main = print $ sum $ worker abundants [1..28123]

-- Limited list of abundant numbers
abundants :: [Int]
abundants = filter (\x -> (sum (divisors x)) - x > x) [1..28123]

-- Given a positive number, returns its divisors unordered.
divisors :: Int -> [Int]
divisors x | x > 0     = [1..squareRoot x] >>=
                         (\y -> if mod x y == 0
                                  then let d = div x y in
                                       if y == d
                                         then [y]
                                         else [y, d]
                                  else [])
           | otherwise = []

worker :: [Int] -> [Int] -> [Int]
worker (a:[]) prev = prev Set.\\ [a + a]
worker (a:as) prev = worker as (prev Set.\\ (map ((+) a) (a:as)))

-- http://www.haskell.org/haskellwiki/Generic_number_type#squareRoot
(^!) :: Num a => a -> Int -> a
(^!) x n = x^n

squareRoot :: Int -> Int
squareRoot 0 = 0
squareRoot 1 = 1
squareRoot n =
    let twopows = iterate (^!2) 2
        (lowerRoot, lowerN) =
            last $ takeWhile ((n>=) . snd) $ zip (1:twopows) twopows
        newtonStep x = div (x + div n x) 2
        iters    = iterate newtonStep (squareRoot (div n lowerN) * lowerRoot)
        isRoot r = r^!2 <= n && n < (r+1)^!2
    in  head $ dropWhile (not . isRoot) iters
Edit: the exact error is Stack space overflow: current size 8388608 bytes. Increasing the stack memory limit through +RTS -K... doesn't solve the problem.
Edit 2: about the sqrt thing, I just copy-pasted it from the link in the comments, to avoid having to convert Integers to Doubles and face the rounding problems, etc.
In the future, it's polite to attempt a bit of minimalization on your own. For example, with a bit of playing, I was able to discover that the following program also stack-overflows (with an 8M stack):
main = print (worker [1..1000] [1..1000])
...which really nails down just what function is screwing you over. Let's take a look at worker:
worker (a:[]) prev = prev Set.\\ [a + a]
worker (a:as) prev = worker as (prev Set.\\ (map ((+) a) (a:as)))
Even on my first read, this function was red-flagged in my mind, because it's tail-recursive. Tail recursion in Haskell is generally not such a great idea as it is in other languages; guarded recursion (where you produce at least one constructor before recursing, or recurse some small number of times before producing a constructor) is generally better for lazy evaluation. And in fact, here, what's happening is that each recursive call to worker is building a deeper- and deeper-ly nested thunk in the prev argument. When the time comes to finally return prev, we have to go very deeply into a long chain of Set.\\ calls to work out just what it was we finally have.
This problem is obfuscated slightly by the fact that the obvious strictness annotation doesn't help. Let's massage worker until it works. The first observation is that the first clause is completely subsumed by the second one. This is stylistic; it shouldn't affect the behavior (except on empty lists).
worker [] prev = prev
worker (a:as) prev = worker as (prev Set.\\ map (a+) (a:as))
Now, the obvious strictness annotation:
worker [] prev = prev
worker (a:as) prev = prev `seq` worker as (prev Set.\\ map (a+) (a:as))
I was surprised to discover that this still stack overflows! The sneaky thing is that seq on lists only evaluates far enough to learn whether the list matches either [] or _:_. The following does not stack overflow:
import Control.DeepSeq
worker [] prev = prev
worker (a:as) prev = prev `deepseq` worker as (prev Set.\\ map (a+) (a:as))
I didn't plug this final version back into the original code, but it at least works with the minimized main above. By the way, you might like the following implementation idea, which also stack overflows:
import Control.Monad
worker as bs = bs Set.\\ liftM2 (+) as as
but which can be fixed by using Data.Set instead of Data.List, and no strictness annotations:
import Control.Monad
import Data.Set as Set
worker as bs = toList (fromList bs Set.\\ fromList (liftM2 (+) as as))
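As an aside, to see the seq versus deepseq distinction from the explanation above in isolation, here is a tiny illustrative sketch (not part of the original answer):

import Control.DeepSeq (deepseq)

-- seq only forces the outermost constructor, so the undefined buried in
-- the list is never touched and this evaluates to 42:
shallow :: Int
shallow = ([1, undefined] :: [Int]) `seq` 42

-- deepseq forces the whole list, elements included, so demanding this
-- throws Prelude.undefined:
deep :: Int
deep = ([1, undefined] :: [Int]) `deepseq` 42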
As Daniel Wagner correctly said, the problem is that
worker (a:as) prev = worker as (prev Set.\\ (map ((+) a) (a:as)))
builds a badly nested thunk. You can avoid that and get somewhat better performance than with deepseq by exploiting the fact that both arguments to worker are sorted in this application. Thus you can get incremental output by noting that at any step everything in prev smaller than 2*a cannot be the sum of two abundant numbers, so
worker (a:as) prev = small ++ worker as (large Set.\\ map (+ a) (a:as))
  where
    (small, large) = span (< a+a) prev
does better. However, it's still bad because (\\) cannot use the sortedness of the two lists. If you replace it with
minus xxs@(x:xs) yys@(y:ys)
    = case compare x y of
        LT -> x : minus xs  yys
        EQ ->     minus xs  ys
        GT ->     minus xxs ys
minus xs _ = xs -- originally forgot the case for one empty list
(or use the data-ordlist package's version), calculating the set-difference is O(length) instead of O(length^2).
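Putting the two ideas together, here is a sketch of the incremental worker using this ordered-list minus (not verbatim from either answer; it assumes both argument lists are sorted, as they are in this program, and uses minus as defined above):

worker :: [Int] -> [Int] -> [Int]
worker []     prev = prev
worker (a:as) prev = small ++ worker as (minus large (map (+ a) (a:as)))
  where
    (small, large) = span (< a+a) prev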
Ok, I loaded it up and gave it a shot. Daniel Wagner's advice is pretty good, probably better than mine. The problem is indeed with the worker function, but I was going to suggest using Data.MemoCombinators to memoize your function instead.
Also, your divisors algorithm is kind of silly. There's a much better way to do that. It's kind of mathy and would require a lot of TeX, so here's a link to a math.stackexchange page about how to do that. The one I was talking about, was the accepted answer, though someone else gives a recursive solution that I think would run faster. (It doesn't require prime factorization.)
https://math.stackexchange.com/questions/22721/is-there-a-formula-to-calculate-the-sum-of-all-proper-divisors-of-a-number
