I am trying to figure out how recursion combined with list comprehensions/monadic operations causes space leaks.
I have a small test program:
module Main where
permute :: [a] -> Integer -> [[a]]
permute _ 0 = [[]]
permute xs' n = [x:xs | x <- xs', xs <- permute xs' (n-1)]
chars1 = ['0'..'9']
main = do
  putStrLn $ (permute chars1 10000)!!100000000 -- Leaks
  print $ [1..]!!100000000 -- Does not leak
Now, my understanding is that the function permute can be expanded as
xs' >>= \x -> permute xs' (n-1) >>= \xs -> return (x:xs)
So lots of (permute xs' n) will be stacked before a result can be retrieved. Am I understanding correctly? If so, what can I do to make sure the function does not leak?
This code is a little bit weird because permute doesn't actually, well, permute things, but assuming it's what you intended, it behaves in much the same way as your [1..] !! 100000000 example; it's just that computing the 100000000th element is a lot more work in your code!
The important thing is that your code is lazy: we can easily compute the 1st element of permute [0..] 10 even though there are clearly infinitely many of them.
What you might be thinking of, though, is that when you actually run this it takes a really long time to produce any output, even though you might hope it would produce the 1st character... wait a bit... produce the 2nd character... wait a bit... and so on. This is not really a "leak" in the Haskell sense and cannot be avoided in this scenario. Your list is essentially constructed with
map ('0':) (permute chars (n - 1)) ++ ... ++ map ('9':) (permute chars (n - 1))
The issue is that in order to know the first character of the selected element, you need to figure out which "block" it falls in, which involves computing the length of the block, which involves evaluating permute chars (n - 1) fully. So what will happen is that you force each recursive call in turn until the resulting list has enough elements to evaluate the call to !!. This is not unexpected, though. You can, however, significantly speed up this code with one fairly simple trick: swap the x and xs generators in the list comprehension.
permute :: [a] -> Integer -> [[a]]
permute _ 0 = [[]]
permute xs' n = [x:xs | xs <- permute xs' (n-1), x <- xs']
Do note that the increased sharing could in theory lead to increased memory usage (something may not be GCed as soon in this version) but I'm unable to come up with a not-absurdly-contrived program which does this.
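As a quick sanity check of the swapped version (my own GHCi example, not part of the answer above), the early elements stream out without forcing the whole search space, since each level of the recursion only has to force the front of the level below it:
*Main> take 3 (permute ['0'..'9'] 5)
["00000","10000","20000"]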
Related
I run out of memory trying to run moderate inputs such as this:
variation_models 15 25
Also, running higher numbers for ncars seems to make a huge difference in speed and memory usage.
The slowdown is expected (there are more things to compare), but the exponential increase in memory usage doesn't make sense to me.
import Control.Monad
orderedq f [] = True
orderedq f (x:[]) = True
orderedq f (x:y:zs) = f x y && orderedq f (y:zs)
num_orderedq = orderedq (<=)
adds_up_to n xs = n == sum xs
both_conditions f g xs = f xs && g xs
variation_models ncars nlocations =
    filter (both_conditions (adds_up_to nlocations) num_orderedq) $ replicateM ncars [1..nlocations-ncars+1]
What is causing the large difference in memory usage? replicateM?
I think you've seen elsewhere that your specific problem (creating ordered lists of integers that sum to a given number) is better solved using an alternative algorithm, rather than filtering a huge list of lists of integers.
However, getting back to your original issue, it is possible to construct an equivalent of:
replicateM p [1..n]
that runs in exponential time (of course) but constant space.
The problem is that this expression is more or less equivalent to the recursion:
badPower 0 _ = pure []
badPower p n = [x:xs | x <- [1..n], xs <- badPower (p-1) n]
So, in the list comprehension, for each selected x, the whole list badPower (p-1) n needs to be re-generated from the start. GHC, sensibly enough, decides to keep badPower (p-1) n around so it doesn't need to be recomputed each time. So, the badPower p n call needs the entire badPower (p-1) n list kept in memory, which already accounts for n^(p-1) elements and exponential memory use, even without considering badPower (p-2) n, etc.
If you just flip the order of the implicit loops around:
goodPower 0 _ = pure []
goodPower p n = [x:xs | xs <- goodPower (p-1) n, x <- [1..n]]
That fixes the problem. Even though the list goodPower (p-1) n is "big", we take its first element, use it once for each of the n values of x, and then we can discard it and move on to the next element. So, goodPower (p-1) n can be garbage collected as it's used.
Note that goodPower generates the elements in a different order than badPower, with the first coordinate of the lists varying fastest, instead of the last. (If this matters, you can map reverse $ goodPower .... While reverse is "slow", it's only being applied to short lists here.)
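Loading the two definitions into GHCi makes the ordering difference visible (my own example, not part of the original answer):
Prelude> badPower 2 3
[[1,1],[1,2],[1,3],[2,1],[2,2],[2,3],[3,1],[3,2],[3,3]]
Prelude> goodPower 2 3
[[1,1],[2,1],[3,1],[1,2],[2,2],[3,2],[1,3],[2,3],[3,3]]
Prelude> map reverse (goodPower 2 3) == badPower 2 3
True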
Anyway, the following program runs (practically) forever, but does so in constant space:
power :: Int -> [a] -> [[a]]
power 0 _ = [[]]
power p lst = [x:xs | xs <- power (p-1) lst, x <- lst ]
main = do
  print $ length (power 15 [1..11])
replicateM :: Applicative m => Int -> m a -> m [a]
When m is [], the list monad's join makes replicateM p xs build every length-p sequence of elements drawn from xs, with repetition allowed. For a list of n elements there are n^p such sequences, which is where the exponential growth comes from.
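To see the growth concretely (my own GHCi example): the list-monad replicateM enumerates every length-p selection with repetition, so the count is n^p:
Prelude> import Control.Monad
Prelude Control.Monad> replicateM 2 [1,2,3]
[[1,1],[1,2],[1,3],[2,1],[2,2],[2,3],[3,1],[3,2],[3,3]]
Prelude Control.Monad> length (replicateM 3 [1..10])
1000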
I'm trying to learn haskell and implemented a function conseq that would return a list of consecutive elements of size n.
conseq :: Int -> [Int] -> [[Int]]
conseq n x
  | n == length(x) = [x]
  | n > length(x) = [x]
  | otherwise = [take n x] ++ (conseq n (drop 1 x))
This works correctly.
> take 5 $ conseq 2 [1..10]
[[1,2],[2,3],[3,4],[4,5],[5,6]]
However, if I pass [1..] instead of [1..10], the program gets stuck in an infinite loop.
As I understood it, Haskell has lazy evaluation, so I should still be able to get the same result, right? Is it length? Shouldn't the first two conditions evaluate to false as soon as the length becomes greater than n?
What did I misunderstand?
One of the main reasons why using length is not a good idea is that when it has to be evaluated on an infinite list, it will get stuck in an infinite loop.
The good news, however, is that we don't need length (using it would also make the time complexity worse). We can work with two enumerators over the list, one running n-1 places ahead of the other. Once the one running ahead reaches the end of the list, we know that the trailing one has at most n-1 elements left, and thus we can stop yielding values:
conseq :: Int -> [a] -> [[a]]
conseq n ys = go (drop (n-1) ys) ys
  where go [] _ = []
        go (_:as) ba@(~(_:bs)) = take n ba : go as bs
This gives us:
Prelude> conseq 3 [1 ..]
[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7],[6,7,8],[7,8,9],[8,9,10],[9,10,11],[10,11,12],[11,12,13],[12,13,14],[13,14,15],[14,15,16],[15,16,17],[16,17,18],[17,18,19],[18,19,20],[19,20,21],[20,21,22],[21,22,23],[22,23,24],[23,24,25],[24,25,26],[25,26,27],…
Prelude> conseq 3 [1 .. 4]
[[1,2,3],[2,3,4]]
The first thing your function does is calculate length(x), so it knows whether it should return [x], [x], or [take n x] ++ (conseq n (drop 1 x))
length counts the number of elements in the list - all the elements. If you ask for the length of an infinite list, it never finishes counting.
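If you want to keep the shape of your original definition, here is a minimal sketch (my own, not from either answer) that replaces the length test with a bounded check using take, so it also works on infinite lists:
hasAtLeast :: Int -> [a] -> Bool
hasAtLeast k xs = length (take k xs) == k  -- inspects at most k cells, so it terminates on infinite lists

conseq' :: Int -> [a] -> [[a]]
conseq' n xs
  | hasAtLeast (n + 1) xs = take n xs : conseq' n (drop 1 xs)
  | otherwise             = [xs]
With this, take 5 $ conseq' 2 [1..] produces [[1,2],[2,3],[3,4],[4,5],[5,6]] without hanging.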
I'm curious how I should go about improving the performance of a Haskell routine that finds the lexicographically minimal cyclic rotation of a string.
import Data.List
swapAt n = f . splitAt n where f (a,b) = b++a
minimumrotation x = minimum $ map (\i -> swapAt i x) $ elemIndices (minimum x) x
I'd imagine that I should use Data.Vector rather than lists because Data.Vector provides in-place operations, probably just manipulating some indices into the original data. I shouldn't actually need to bother tracking the indices myself to avoid excess copying, right?
I'm curious how the ++ impacts the optimization, though. I'd imagine it produces a lazy string thunk that never does the appending until the string gets read that far. Ergo, the a should never actually be appended onto the b whenever minimum can eliminate that string early, for example because it begins with a later letter. Is this correct?
xs ++ ys adds some overhead in all the list cells from xs, but once it reaches the end of xs it's free — it just returns ys.
Looking at the definition of (++) helps to see why:
[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)
i.e., it has to "re-build" the entire first list as the result is traversed. This article is very helpful for understanding how to reason about lazy code in this way.
The key thing to realise is that appending isn't done all at once; a new linked list is incrementally built by first walking through all of xs, and then putting ys where the [] would go.
So, you don't have to worry about reaching the end of b and suddenly incurring the one-time cost of "appending" a to it; the cost is spread out over all the elements of b.
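A tiny GHCi check (my own, not from the answer) makes this visible: taking the head of an append only forces the first cell of the left list, and the right operand is never touched at all:
Prelude> head ([1, undefined] ++ undefined)
1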
Vectors are a different matter entirely; they're strict in their structure, so even examining just the first element of xs V.++ ys incurs the entire overhead of allocating a new vector and copying xs and ys to it, just like in a strict language. The same applies to mutable vectors (except that the cost is incurred when you perform the operation, rather than when you force the resulting vector), although I think you'd have to write your own append operation with those anyway. You could represent a bunch of appended (immutable) vectors as [Vector a] or similar if this is a problem for you, but that just moves the overhead to when you flatten it back into a single Vector, and it sounds like you're more interested in mutable vectors.
Try
minimumrotation :: Ord a => [a] -> [a]
minimumrotation xs = minimum . take len . map (take len) $ tails (cycle xs)
  where
    len = length xs
I expect that to be faster than what you have, though index-juggling on an unboxed Vector or UArray would probably be faster still. But is it really a bottleneck?
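A quick GHCi check of this version (my own example, not from the answer):
*Main> minimumrotation "bca"
"abc"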
If you're interested in fast concatenation and a fast splitAt, use Data.Sequence.
I've made some stylistic modifications to your code, to make it look more like idiomatic Haskell, but the logic is exactly the same, except for a few conversions to and from Seq:
import qualified Data.Sequence as S
import qualified Data.Foldable as F
minimumRotation :: Ord a => [a] -> [a]
minimumRotation xs = F.toList
                   . F.minimum
                   . fmap (`swapAt` xs')
                   . S.elemIndicesL (F.minimum xs')
                   $ xs'
  where xs' = S.fromList xs
        swapAt n = f . S.splitAt n
          where f (a,b) = b S.>< a
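Usage is unchanged; a quick check of my own:
*Main> minimumRotation "cabcab"
"abcabc"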
Given Exercise 14 from 99 Haskell Problems:
(*) Duplicate the elements of a list.
E.g.:
*Main> dupli''' [1..10]
[1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10]
I've implemented 4 solutions:
{-- my first attempt --}
dupli :: [a] -> [a]
dupli [] = []
dupli (x:xs) = replicate 2 x ++ dupli xs
{-- using concatMap and replicate --}
dupli' :: [a] -> [a]
dupli' xs = concatMap (replicate 2) xs
{-- using foldl --}
dupli'' :: [a] -> [a]
dupli'' xs = foldl (\acc x -> acc ++ [x,x]) [] xs
{-- using foldl 2 --}
dupli''' :: [a] -> [a]
dupli''' xs = reverse $ foldl (\acc x -> x:x:acc) [] xs
Still, I don't know how to really measure performance.
So, what's the recommended function (from the list above) in terms of performance?
Any suggestions?
These all seem more complicated (and/or less efficient) than they need to be. Why not just this:
dupli [] = []
dupli (x:xs) = x:x:(dupli xs)
Your last example is close to a good fold-based implementation, but you should use foldr, which will obviate the need to reverse the result:
dupli = foldr (\x xs -> x:x:xs) []
As for measuring performance, the "empirical approach" is profiling. As Haskell programs grow in size, they can get fairly hard to reason about in terms of runtime and space complexity, and profiling is your best bet. Also, a crude but often effective empirical approach when gauging the relative complexity of two functions is to simply compare how long they each take on some sufficiently large input; e.g. time how long length $ dupli [1..1000000] takes and compare it to dupli'', etc.
But for a program this small it shouldn't be too hard to figure out the runtime complexity of the algorithm based on your knowledge of the data structure(s) in question--in this case, lists. Here's a tip: any time you use concatenation (x ++ y), the runtime complexity is O(length x). If concatenation is used inside of a recursive algorithm operating on all the elements of a list of size n, you will essentially have an O(n^2) algorithm. Both examples I gave, and your last example, are O(n), because the only operation used inside the recursive definition is (:), which is O(1).
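To make the O(n^2) concrete for the foldl-based dupli'', here is a hand expansion (my own illustration, not from the answer): every step has to re-walk the accumulator built so far before it can attach the next pair.
dupli'' [1,2,3]
  = foldl (\acc x -> acc ++ [x,x]) [] [1,2,3]
  = (([] ++ [1,1]) ++ [2,2]) ++ [3,3]
The k-th append traverses the 2*(k-1) elements already accumulated, so the total work is about 2*(0 + 1 + ... + (n-1)) = O(n^2), whereas the foldr and direct-recursion versions only ever prepend with (:).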
As recommended you can use the criterion package. A good description is http://www.serpentine.com/blog/2009/09/29/criterion-a-new-benchmarking-library-for-haskell/.
To summarize it here and adapt it to your question, here are the steps.
Install criterion with
cabal install criterion -fchart
And then add the following to your code
import Criterion.Main
l = [(1::Int)..1000]
main = defaultMain [ bench "1" $ nf dupli l
                   , bench "2" $ nf dupli' l
                   , bench "3" $ nf dupli'' l
                   , bench "4" $ nf dupli''' l
                   ]
You need the nf in order to force the evaluation of the whole result list. Otherwise you'll get just the thunk for the computation.
After that compile and run
ghc -O --make dupli.hs
./dupli -t png -k png
and you get pretty graphs of the running times of the different functions.
It turns out that dupli''' is the fastest of your functions, but the foldr version that pelotom listed beats everything.