Haskell bottom-up DP space leak - haskell

Apologies if this is too specific, I am new here and not exactly sure what is reasonable. I have been bashing my head against this problem for hours with nothing to show for it. The following code is my implementation of a competitive programming problem.
module Main where

import Data.List (foldl', groupBy)
import Debug.Trace

type Case = (Int, [(Int, Int)])
type Soln = Int

main = interact handle

handle :: String -> String
handle = fmt . solve . parse

fmt :: Soln -> String
fmt s = show s ++ "\n"

parse :: String -> Case
parse s = (l, fs)
  where
    (l:_:fs') = map read $ words s
    fs = pairs fs'

pairs :: [a] -> [(a, a)]
pairs [] = []
pairs (a:b:s) = (a, b) : pairs s

solve :: Case -> Soln
solve c@(l, fs) = last $ foldl' run [0..l] f
  where
    f = concat $ map rep $ map combine $ groupBy samev fs
    samev a b = snd a == snd b
    combine a = (sum $ map fst a, snd $ head a)
    rep (n, v) = replicate (min n (l `div` v)) v

    run :: [Int] -> Int -> [Int]
    run b v = take v b ++ zipWith min b (drop v b)
    -- run b v = take v b ++ zipMin b (drop v b)

zipMin :: [Int] -> [Int] -> [Int]
zipMin [] _ = []
zipMin _ [] = []
zipMin (a:as) (b:bs) = min a b : zipMin as bs
The intent is that this works like a bottom-up dynamic programming solution, generating each row of the DP table from the previous one using the fold in solve. In theory GHC should be able to optimize out all the old rows of the table. However, running this program on a moderately large input, with approximately l = 2000 and length f = 5000, I get this:
> time ./E < E.in
0
1.98user 0.12system 0:02.10elapsed 99%CPU (0avgtext+0avgdata 878488maxresident)k
0inputs+0outputs (0major+219024minor)pagefaults 0swaps
That's using 878 MB of memory at peak! The table I am generating is only 10,000 Ints, and in theory I only need one row at a time! It seems obvious that this is some form of thunk leak or other space leak. Profiling reveals that run is consuming 99% of total runtime and memory. Digging further indicated that 98% of that was in the zipWith call. Interestingly, replacing the call to zipWith min with my custom zipMin function produces a significant improvement:
> time ./E < E.in
0
1.39user 0.08system 0:01.48elapsed 99%CPU (0avgtext+0avgdata 531400maxresident)k
0inputs+0outputs (0major+132239minor)pagefaults 0swaps
That's just 70% of the run time, and 60% of the memory! I have tried all sorts of things to make this work. I know (++) is generally a bad idea, so I replaced the lists in run with Data.Sequence.Seq Int... and it got slower and used more memory. I am not particularly experienced with Haskell, but I am at my wit's end here. I am sure the answer to this problem exists somewhere on SO, but I am too new to Haskell to be able to find it, it seems.
Any help any of you can offer is very much appreciated. I would love an explanation of exactly what I have done wrong, how to diagnose it in future, and how to fix it.
Thanks in advance.
EDIT:
After following Steven's excellent advice and replacing my lists with unboxed vectors, the performance is... uh... significantly improved:
> time ./E < E.in
0
0.01user 0.00system 0:00.02elapsed 80%CPU (0avgtext+0avgdata 5000maxresident)k
24inputs+0outputs (0major+512minor)pagefaults 0swaps

So, by using foldl' you have ensured that the accumulator will be in WHNF. Putting a list in WHNF only evaluates it as far as the first constructor, i.e. the first cons cell. The remainder of the list exists as a thunk, and will be passed around as a thunk to the subsequent calls of your fold. Since you are interested in multiple positions in the list at once (that is, you are dropping some parts of it in the zipWith), large portions of the lists from previous iterations are being kept alive.
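One way to see this concretely (just an illustration of the diagnosis, not the fix recommended below; it needs the deepseq package, and solveForced is a made-up name): fully forcing each new row before the next fold step means earlier rows are no longer reachable through chains of thunks.

import Control.DeepSeq (force)
import Data.List (foldl')

-- Hypothetical variant of solve: force makes foldl' evaluate every element
-- of the new row, so earlier rows can be garbage collected.
solveForced :: Int -> [Int] -> Int
solveForced l f = last $ foldl' (\b v -> force (run b v)) [0 .. l] f
  where
    run b v = take v b ++ zipWith min b (drop v b)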
The structure you need here is an unboxed vector. An unboxed vector will ensure that everything is maximally strict, and will run in far less memory.
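For concreteness, here is a rough sketch of the same fold over Data.Vector.Unboxed (from the vector package); the parsing is elided and solve is shown taking the already-expanded list f, so treat the names and types as illustrative rather than a drop-in replacement:

import Data.List (foldl')
import qualified Data.Vector.Unboxed as U

-- Every element of an unboxed vector is evaluated when the vector is built,
-- so no thunks referring to earlier rows can accumulate.
run :: U.Vector Int -> Int -> U.Vector Int
run b v = U.take v b U.++ U.zipWith min b (U.drop v b)

solve :: Int -> [Int] -> Int
solve l f = U.last $ foldl' run (U.enumFromTo 0 l) f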

Related

How to avoid infinite loop in zipWith a self reference?

I'd like to create a list data structure that can zipWith with better behavior under self reference. This is for an esoteric language that will rely on self reference and laziness to be Turing complete using only values (no user functions). I've already created it, called Atlas, but it has many built-ins; I'd like to reduce those and be able to compile/interpret it in Haskell.
The issue is that zipWith checks whether either list is empty and returns empty if so. But in the case where one of its arguments depends on the result of the zipWith itself, that check loops infinitely. Essentially I'd like it to detect this case and have faith that the list won't be empty. Here is an example using DList:
import Data.DList
import Data.List (uncons)
zipDL :: (a->b->c) -> DList a -> DList b -> DList c
zipDL f a b = fromList $ zipL f (toList a) (toList b)
zipL :: (a->b->c) -> [a] -> [b] -> [c]
zipL _ [] _ = []
zipL _ _ [] = []
zipL f ~(a:as) ~(b:bs) = f a b : zipL f as bs
a = fromList [5,6,7]
main = print dh
  where
    d = zipDL (+) a $ snoc (fromList dt) 0
    ~(Just (dh, dt)) = uncons $ toList d
This code would sum the list 5,6,7, except for the issue. It can be fixed by removing zipL _ _ [] = [], because then it assumes the result won't be empty, and it does in fact turn out not to be empty. But this is a bad solution, because we can't always assume that it is the second list that could have the self reference.
Another way of explaining it is to talk about the sizes of these lists.
The size of zip a b = min (size a) (size b)
So in this example: size d = min (size a) (size d-1+1)
But therein lies the problem: if the size of d is 0, then the size of d is 0; if the size of d is 1, the size is 1; however, once the size of d is taken to be greater than the size of a, then the size would be the size of a, which is a contradiction. But any size from 0 to the size of a works, which means it is undefined.
Essentially I want to detect this case and make the size of d = a.
So far the only thing I have figured out is to make all lists lists of Maybe, and terminate lists with a Nothing value. Then in the application of the zipWith binary function, return Nothing if either value is Nothing. You can then take out both of the [] checks in zip, because you can think of all lists as being infinite. Finally, to make the summation example work, instead of doing a snoc, do a map, and replace any Nothing value with the snoc value. This works because when checking the second list for Nothing, it can lazily return true, since no value of the second list can be Nothing.
Here is that code:
import Data.Maybe
data L a = L (Maybe a) (L a)
nil :: L a
nil = L Nothing nil
fromL :: [a] -> L a
fromL [] = nil
fromL (x:xs) = L (Just x) (fromL xs)
binOpMaybe :: (a->b->c) -> Maybe a -> Maybe b -> Maybe c
binOpMaybe f Nothing _ = Nothing
binOpMaybe f _ Nothing = Nothing
binOpMaybe f (Just a) (Just b) = Just (f a b)
zip2W :: (a->b->c) -> L a -> L b -> L c
zip2W f ~(L a as) ~(L b bs) = L (binOpMaybe f a b) (zip2W f as bs)
unconsL :: L a -> (Maybe a, Maybe (L a))
unconsL ~(L a as) = (a, Just as)
mapOr :: a -> L a -> L a
mapOr v ~(L a as) = L (Just $ fromMaybe v a) $ mapOr v as
main = print h
  where
    a = fromL [4,5,6]
    b = zip2W (+) a (mapOr 0 (fromJust t))
    (h, t) = unconsL b
The downside to this approach is that it needs this other operator that maps with Just . fromMaybe initialvalue. This is a less intuitive operator than ++. And without it the language could be built entirely on ++, uncons, and (:[]), which would be pretty neat.
The other thing I've figured out, in the current Ruby implementation, is to throw an error when a value depends on itself, and catch it in the empty list detection. But this is very hacky and not entirely sound, although it does work for cases like this. I don't think this can work in Haskell, since I don't think you can detect self dependence?
Sorry for the long description and the very odd use case. I've spent tons of time thinking about this, but haven't solved it yet and can't explain it any more succinctly! Not expecting an answer but figured it is worth a shot, thanks for considering.
EDIT:
After seeing it framed as a greatest fixed point question, it seems like a poor question because there is no efficient general solution to such a problem. For example, suppose the code was b = zipWith (+) a (if length b < 1 then [1] else []).
For my purposes it could still be nice to handle some cases correctly - the example provided does have a solution. So I could reframe the question as: when can we find the greatest fixed point efficiently and what is that fixed point? But I believe there is no simple answer to such a question, and so it would be a poor basis for a programming language to rely on ad hoc rules.
Sounds like you want a greatest fixed point. I'm not sure I've seen this done before, but maybe it's possible to make a sensible type class for types that support those.
class GF a where gfix :: (a -> a) -> a

instance GF a => GF [a] where
    gfix f = case (f (repeat undefined), f []) of
        (_:_, _) -> b:bs where
            b = gfix (\a' -> head (f (a':bs)))
            bs = gfix (\as' -> tail (f (b:as')))
        ([], []) -> []
        _ -> error "no fixed point greater than bottom exists"

-- use the usual least fixed point. this ain't quite right, but
-- it works for this example, and maybe it's Good Enough
instance GF Int where gfix f = let x = f x in x
Try it out in ghci:
> gfix (\xs -> zipWith (+) [5,6,7] (tail xs ++ [0])) :: [Int]
[18,13,7]
This implementation isn't particularly efficient; e.g. replacing [5,6,7] with [1..n] results in a runtime that's quadratic in n. Perhaps with some cleverness that can be improved, but it's not immediately obvious to me how that would go.
I have an answer for this specific case, not general.
appendRepeat :: a -> [a] -> [a]
appendRepeat v a = h : appendRepeat v t
  where
    ~(h, t) =
      if null a
        then (v, [])
        else (head a, tail a)

a = [4,5,6]

main = print $ head b
  where
    b = zipWith (+) a $ appendRepeat 0 (tail b)
appendRepeat appends an infinite repetition of a value to the end of a list. But the key thing about it is that it doesn't check whether the list is empty before deciding that it returns a non-empty list whose tail is a recursive call. This way laziness never ends up in an infinite loop checking the zipWith _ [] case.
So this code works, and for the purposes of the original question, it can be used to convert the language to just using 2 simple functions (++ and :[]). But the interpreter would need to do some static analysis for appending a repeated value and replace it with this special appendRepeat function (which can easily be done in Atlas). It seems hacky to make only this one implementation switcheroo, but that is all that is needed.

Benchmarking Filter and Partition

I was testing the performance of the partition function for lists and got some strange results, I think.
We have that partition p xs == (filter p xs, filter (not . p) xs), but we chose the first implementation because it only performs a single traversal over the list. Yet, the results I got say that it may be better to use the implementation that does two traversals.
Here is the minimal code that shows what I'm seeing
import Criterion.Main
import System.Random
import Data.List (partition)
mypartition :: (a -> Bool) -> [a] -> ([a],[a])
mypartition p l = (filter p l, filter (not . p) l)

randList :: RandomGen g => g -> Integer -> [Integer]
randList gen 0 = []
randList gen n = x:xs
  where
    (x, gen') = random gen
    xs = randList gen' (n - 1)

main = do
  gen <- getStdGen
  let arg10000000 = randList gen 10000000
  defaultMain
    [ bgroup "filters -- split list in half "
        [ bench "partition100" $ nf (partition (>= 50)) arg10000000
        , bench "mypartition100" $ nf (mypartition (>= 50)) arg10000000
        ]
    ]
I ran the tests both with -O and without it, and both times I get that the double traversal is better.
I am using ghc-7.10.3 with criterion-1.1.1.0
My questions are:
Is this expected?
Am I using Criterion correctly? I know that laziness can be tricky and (filter p xs, filter (not . p) xs) will only do two traversals if both elements of the tuple are used.
Does this have something to do with the way lists are handled in Haskell?
Thanks a lot!
There is no black or white answer to the question. To dissect the problem consider the following code:
import Control.DeepSeq
import Data.List (partition)
import System.Environment (getArgs)
mypartition :: (a -> Bool) -> [a] -> ([a],[a])
mypartition p l = (filter p l, filter (not . p) l)

main :: IO ()
main = do
    let cnt = 10000000
        xs = take cnt $ concat $ repeat [1 .. 100 :: Int]
    args <- getArgs
    putStrLn $ unwords $ "Args:" : args
    case args of
        [percent, fun] ->
            let p = (read percent >=)
            in case fun of
                "partition"      -> print $ rnf $ partition p xs
                "mypartition"    -> print $ rnf $ mypartition p xs
                "partition-ds"   -> deepseq xs $ print $ rnf $ partition p xs
                "mypartition-ds" -> deepseq xs $ print $ rnf $ mypartition p xs
                _                -> err
        _ -> err
  where
    err = putStrLn "Sorry, I do not understand."
I do not use Criterion, so as to have better control over the order of evaluation. To get timings, I use the +RTS -s runtime option. The different test cases are executed using different command line options. The first command line option defines for which percentage of the data the predicate holds. The second command line option chooses between the different tests.
The tests distinguish two cases:
The data is generated lazily (2nd argument partition or mypartition).
The data is already fully evaluated in memory (2nd argument partition-ds or mypartition-ds).
The result of the partitioning is always evaluated from left to right, i.e. starting with the list that contains all the elements for which the predicate holds.
In case 1 partition has the advantage that elements of the first resulting list get discarded before all elements of the input list were even produced. Case 1 is especially good, if the predicate matches many elements, i.e. the first command line argument is large.
In case 2, partition cannot play out this advantage, since all elements are already in memory.
For mypartition, in any case all elements are held in memory after the first resulting list is evaluated, because they are needed again to compute the second resulting list. Therefore there is not much of a difference between the two cases.
It seems that the more memory is used, the harder garbage collection gets. Therefore partition is well suited if the predicate matches many elements and the lazy variant is used.
Conversely, if the predicate does not match many elements or all elements are already in memory, mypartition performs better, since its recursion does not deal with pairs in contrast to partition.
The Stackoverflow question “Irrefutable pattern does not leak memory in recursion, but why?” might give some more insights about the handling of pairs in the recursion of partition.
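For reference, partition in Data.List is defined (roughly) as a foldr whose step function matches the accumulating pair with an irrefutable pattern; that is the pair handling alluded to above:

-- Roughly the Data.List definition (paraphrased): the result pair is built
-- lazily, one foldr step per element, via an irrefutable pattern.
partition :: (a -> Bool) -> [a] -> ([a], [a])
partition p xs = foldr (select p) ([], []) xs

select :: (a -> Bool) -> a -> ([a], [a]) -> ([a], [a])
select p x ~(ts, fs)
  | p x       = (x : ts, fs)
  | otherwise = (ts, x : fs)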

Directly generating specific subsets of a powerset?

Haskell's expressiveness enables us to rather easily define a powerset function:
import Control.Monad (filterM)
powerset :: [a] -> [[a]]
powerset = filterM (const [True, False])
To be able to perform my task it is crucial for said powerset to be sorted by a specific function, so my implementation kind of looks like this:
import Data.List (sortBy)
import Data.Ord (comparing)
powersetBy :: Ord b => ([a] -> b) -> [a] -> [[a]]
powersetBy f = sortBy (comparing f) . powerset
Now my question is whether there is a way to generate only a subset of the powerset given a specific start and end point, where f(start) < f(end) and |start| < |end|. For example, my parameter is a list of integers ([1,2,3,4,5]) and the subsets are sorted by their sum. Now I want to extract only the subsets in a given range, let's say 3 to 7. One way to achieve this would be to filter the powerset to include only my range, but this seems (and is) inefficient when dealing with larger sets:
badFunction :: Ord b => b -> b -> ([a] -> b) -> [a] -> [[a]]
badFunction start end f = filter (\x -> f x >= start && f x <= end) . powersetBy f
badFunction 3 7 sum [1,2,3,4,5] produces [[1,2],[3],[1,3],[4],[1,4],[2,3],[5],[1,2,3],[1,5],[2,4],[1,2,4],[2,5],[3,4]].
Now my question is whether there is a way to generate this list directly, without having to generate all 2^n subsets first, since it will improve performance drastically by not having to check all elements but rather generating them "on the fly".
If you want to allow for completely general ordering-functions, then there can't be a way around checking all elements of the powerset. (After all, how would you know there isn't a special clause built in that gives, say, the particular set [6,8,34,42] a completely different ranking from its neighbours?)
However, you could already make the algorithm drastically faster by
Only sorting after filtering: sorting is O(n · log n), so you want to keep n low here; for the O(n) filtering step it matters less. (And anyway, the number of elements doesn't change through sorting.)
Applying the ordering-function only once to each subset.
So
import Control.Arrow ((&&&))

lessBadFunction :: Ord b => (b,b) -> ([a]->b) -> [a] -> [[a]]
lessBadFunction (start,end) f
    = map snd . sortBy (comparing fst)
    . filter (\(k,_) -> k>=start && k<=end)
    . map (f &&& id)
    . powerset
Basically, let's face it, powersets of anything but a very small basis are infeasible. The particular application “sum in a certain range” is pretty much a packaging problem; there are quite efficient ways to do that kind of thing, but you'll have to give up the idea of perfect generality and of quantification over general subsets.
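For the sum-in-a-range case specifically, here is a rough sketch of generating the qualifying subsets directly, pruning any branch whose running sum has already overshot the upper bound (it assumes non-negative elements, and the name subsetsInRange is made up for illustration):

-- Sketch: enumerate only subsets whose sum lies in [lo, hi]; a branch is cut
-- as soon as its running sum exceeds hi (sound only for non-negative inputs).
subsetsInRange :: Int -> Int -> [Int] -> [[Int]]
subsetsInRange lo hi = go 0
  where
    go acc _ | acc > hi = []
    go acc []           = [[] | acc >= lo]
    go acc (x:xs)       = map (x:) (go (acc + x) xs)  -- include x
                       ++ go acc xs                   -- exclude x

subsetsInRange 3 7 [1,2,3,4,5] yields the same thirteen subsets as badFunction 3 7 sum [1,2,3,4,5], just in a different order.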
Since your problem is essentially a constraint satisfaction problem, using an external SMT solver might be the better alternative here; assuming you can afford the extra IO in the type and the need for such a solver to be installed. The SBV library allows construction of such problems. Here's one encoding:
import Data.SBV
-- c is the cost type
-- e is the element type
pick :: (Num e, SymWord e, SymWord c) => c -> c -> ([SBV e] -> SBV c) -> [e] -> IO [[e]]
pick begin end cost xs = do
    solutions <- allSat constraints
    return $ map extract $ extractModels solutions
  where
    extract ts = [x | (t, x) <- zip ts xs, t]
    constraints = do
        tags <- mapM (const free_) xs
        let tagged = zip tags xs
            finalCost = cost [ite t (literal x) 0 | (t, x) <- tagged]
        solve [finalCost .>= literal begin, finalCost .<= literal end]
test :: IO [[Integer]]
test = pick 3 7 sum [1,2,3,4,5]
We get:
Main> test
[[1,2],[1,3],[1,2,3],[1,4],[1,2,4],[1,5],[2,5],[2,3],[2,4],[3,4],[3],[4],[5]]
For large lists, this technique will beat out generating all subsets and filtering, assuming the cost function generates reasonable constraints. (Addition will typically be OK; if you have multiplications, the backend solver will have a harder time.)
(As a side note, you should never use filterM (const [True, False]) to generate power-sets to start with! While that expression is cute and fun, it is extremely inefficient!)

Interleaving list functions

Let's say I'm given two functions:
f :: [a] -> b
g :: [a] -> c
I want to write a function that is the equivalent of this:
h x = (f x, g x)
But when I do that, for large lists inevitably I run out of memory.
A simple example is the following:
x = [1..100000000::Int]
main = print $ (sum x, product x)
I understand this is the case because the list x is being stored in memory without being garbage collected. It would be better if f and g instead worked on x in, well, "parallel".
Assuming I can't change f and g, nor want to make a separate copy of x (assume x is expensive to produce) how can I write h without running into out of memory issues?
A short answer is: you can't. Since you have no control over f and g, you have no guarantee that the functions process their input sequentially. Such a function might just as well keep the whole list stored in memory before producing the final result.
However, if your functions are expressed as folds, the situation is different. This means that we know how to incrementally apply each step, so we can parallelize those steps in one run.
There are many resources about this area. For example:
Haskell: Can I perform several folds over the same lazy list without keeping list in memory?
Classic Beautiful folding
More beautiful fold zipping
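As a minimal illustration of the idea those links describe (a hand-rolled sketch; the Fold type and helpers here are not taken from any particular library), two folds can be paired so that a single traversal advances both states:

{-# LANGUAGE ExistentialQuantification #-}
import Data.List (foldl')

-- A left fold packaged as a step function, an initial state, and a finaliser.
data Fold a b = forall s. Fold (s -> a -> s) s (s -> b)

-- Run one packaged fold over a list in a single strict pass.
runFold :: Fold a b -> [a] -> b
runFold (Fold step z done) = done . foldl' step z

-- Pair two folds; the seqs keep both states evaluated at every step.
both :: Fold a b -> Fold a c -> Fold a (b, c)
both (Fold stepL zL doneL) (Fold stepR zR doneR) =
  Fold
    (\(l, r) x -> let l' = stepL l x
                      r' = stepR r x
                  in l' `seq` r' `seq` (l', r'))
    (zL, zR)
    (\(l, r) -> (doneL l, doneR r))

sumF, productF :: Num a => Fold a a
sumF     = Fold (+) 0 id
productF = Fold (*) 1 id

main :: IO ()
main = print $ runFold (both sumF productF) [1 .. 20 :: Int]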
The pattern of consuming a sequence of values with properly defined space bounds is solved more generally with pipe-like libraries such as conduit, iteratees or pipes. For example, in conduit, you could express the combination of computing sums and products as:
import Control.Monad.Identity
import Data.Conduit
import Data.Conduit.List (fold, sourceList)
import Data.Conduit.Internal (zipSinks)
product', sum' :: (Monad m, Num a) => Sink a m a
sum' = fold (+) 0
product' = fold (*) 1
main = print . runIdentity $ sourceList (replicate (10^6) 1) $$
       zipSinks sum' product'
If you can turn your functions into folds, you can then just use them with a scan:
x = [1..100000000::Int]
main = mapM_ print . tail . scanl foo (a0,b0) . takeWhile (not.null)
       . unfoldr (Just . splitAt 1000) -- adjust the chunk length as needed
       $ x

foo (a,b) x = let a2 = f' a $ f x ; b2 = g' b $ g x
              in a2 `seq` b2 `seq` (a2, b2)
f :: [t] -> a -- e.g. sum
g :: [t] -> b -- (`rem` 10007) . product
f' :: a -> a -> a -- e.g. (+)
g' :: b -> b -> b -- ((`rem` 10007) .) . (*)
We consume the input in chunks for better performance. Compiled with -O2, this should run in constant space. The interim results are printed as an indication of progress.
If you can't turn your function into a fold, this means it has to consume the whole list to produce any output and this trick doesn't apply.
You can use multiple threads to evaluate f x and g x in parallel.
E.g.
import Control.Parallel (par, pseq)

x :: [Int]
x = [1..10^8]

main = print $ let a = sum x
                   b = product x
               in a `par` b `pseq` (a,b)
It's a nice way to exploit GHC's parallel runtime to prevent a space leak by doing two things at once.
Alternatively, you need to fuse f and g into a single pass.
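For the sum/product example that fusion could look something like this (a sketch under the assumption that both functions are folds; the name sumAndProduct is illustrative):

import Data.List (foldl')

-- One traversal advances both accumulators; the seqs keep them evaluated,
-- so neither the input list nor a chain of thunks is retained.
sumAndProduct :: [Int] -> (Int, Int)
sumAndProduct = foldl' step (0, 1)
  where
    step (s, p) x = let s' = s + x
                        p' = p * x
                    in s' `seq` p' `seq` (s', p')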

Haskell: turning a non-infinite list into an infinite one + laziness (euler 391)

I've written the following Haskell code to produce a list where the nth element is the number of 1s in writing 1..n as binary numbers (it's related to euler 391, incidentally):
buildList :: a -> (a -> a) -> [a]
buildList start f = start : buildList (f start) f
differences :: [[Int]]
differences = buildList [0] (\x -> x ++ map (+1) x)
sequenceK' :: Int -> [Int]
sequenceK' n = tail $ scanl (+) 0 (last $ take n differences)
which results in sequenceK' n giving a list of 2^(n-1) elements.
This question has two parts:
a) Why does the time taken to compute head $ sequenceK' n increase with n? -- due to ghc's laziness, I would expect the time to remain more or less constant.
b) Is it possible to define an infinite version of this list so that I can do things like take and takeWhile without having to worry about the value of the parameter passed to sequenceK'?
a) Because you're calling last $ take n differences, which has to do more work the bigger n is.
b) Yep, it's possible. The least-thinking solution is to just take the earliest element we see at each particular depth:
*Main> take 20 . map head . transpose $ differences
[0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3]
The better solution is to generate only the meaningful bits. We can do this by observing the following equality:
differences' = 1 : (differences' >>= \x -> [x, x+1])
Actually, this is slightly off, as you can probably guess:
*Main> take 20 differences'
[1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3]
But it's easily fixed by just tacking a 0 on front.
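Concretely, the fixed infinite version and the corresponding infinite running-sum list might look like this (a sketch; the names differencesInf and sequenceKInf are made up, and sequenceKInf agrees with sequenceK' n on any prefix that n is large enough to cover):

-- Tack the 0 onto the front of the self-referential sequence of bit counts,
-- then take running sums to get an infinite analogue of sequenceK'.
differencesInf :: [Int]
differencesInf = 0 : core
  where core = 1 : (core >>= \x -> [x, x + 1])

sequenceKInf :: [Int]
sequenceKInf = tail $ scanl (+) 0 differencesInf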

Resources