I have a function
myLength = foldl (\ x _ -> x + 1) 0
which fails with stack overflow with input around 10^6 elements (myLength [1..1000000] fails). I believe that is due to the thunk build up since when I replace foldl with foldl', it works.
So far so good.
But now I have another function to reverse a list :
myReverse = foldl (\ acc x -> x : acc) []
which uses the lazy version foldl (instead of foldl')
When I do
myLength . myReverse $ [1..1000000].
This time it works fine. I fail to understand why foldl works for the later case and not for former?
To clarify here myLength uses foldl' while myReverse uses foldl
Here's my best guess, though I'm no expert on Haskell internals (yet).
While building the thunk, Haskell allocates all the intermediate accumulator variables on the heap.
When performing the addition as in myLength, it needs to use the stack for intermediate variables. See this page. Excerpt:
The problem starts when we finally evaluate z1000000:
Note that z1000000 = z999999 +
1000000. So 1000000 is pushed on the stack. Then z999999 is evaluated.
Note that z999999 = z999998 + 999999.
So 999999 is pushed on the stack. Then
z999998 is evaluated:
Note that z999998 = z999997 + 999998.
So 999998 is pushed on the stack. Then
z999997 is evaluated:
However, when performing list construction, here's what I think happens (this is where the guesswork begins):
When evaluating z1000000:
Note that z1000000 = 1000000 :
z999999. So 1000000 is stored inside
z1000000, along with a link (pointer)
to z999999. Then z999999 is evaluated.
Note that z999999 = 999999 : z999998.
So 999999 is stored inside z999999,
along with a link to z999998. Then
z999998 is evaluated.
etc.
Note that z999999, z999998 etc. changing from a not-yet-evaluated expression into a single list item is an everyday Haskell thing :)
Since z1000000, z999999, z999998, etc. are all on the heap, these operations don't use any stack space. QED.
Related
I'm reading Learn You a Haskell (loving it so far) and it teaches how to implement elem in terms of foldl, using a lambda. The lambda solution seemed a bit ugly to me so I tried to think of alternative implementations (all using foldl):
import qualified Data.Set as Set
import qualified Data.List as List
-- LYAH implementation
elem1 :: (Eq a) => a -> [a] -> Bool
y `elem1` ys =
foldl (\acc x -> if x == y then True else acc) False ys
-- When I thought about stripping duplicates from a list
-- the first thing that came to my mind was the mathematical set
elem2 :: (Eq a) => a -> [a] -> Bool
y `elem2` ys =
head $ Set.toList $ Set.fromList $ filter (==True) $ map (==y) ys
-- Then I discovered `nub` which seems to be highly optimized:
elem3 :: (Eq a) => a -> [a] -> Bool
y `elem3` ys =
head $ List.nub $ filter (==True) $ map (==y) ys
I loaded these functions in GHCi and did :set +s and then evaluated a small benchmark:
3 `elem1` [1..1000000] -- => (0.24 secs, 160,075,192 bytes)
3 `elem2` [1..1000000] -- => (0.51 secs, 168,078,424 bytes)
3 `elem3` [1..1000000] -- => (0.01 secs, 77,272 bytes)
I then tried to do the same on a (much) bigger list:
3 `elem3` [1..10000000000000000000000000000000000000000000000000000000000000000000000000]
elem1 and elem2 took a very long time, while elem3 was instantaneous (almost identical to the first benchmark).
I think this is because GHC knows that 3 is a member of [1..1000000], and the big number I used in the second benchmark is bigger than 1000000, hence 3 is also a member of [1..bigNumber] and GHC doesn't have to compute the expression at all.
But how is it able to automatically cache (or memoize, a term that Land of Lisp taught me) elem3 but not the two other ones?
Short answer: this has nothing to do with caching, but the fact that you force Haskell in the first two implementations, to iterate over all elements.
No, this is because foldl works left to right, but it will thus keep iterating over the list until the list is exhausted.
Therefore you better use foldr. Here from the moment it finds a 3 it in the list, it will cut off the search.
This is because foldris defined as:
foldr f z [x1, x2, x3] = f x1 (f x2 (f x3 z))
whereas foldl is implemented as:
foldl f z [x1, x2, x3] = f (f (f (f z) x1) x2) x3
Note that the outer f thus binds with x3, so that means foldl first so if due to laziness you do not evaluate the first operand, you still need to iterate to the end of the list.
If we implement the foldl and foldr version, we get:
y `elem1l` ys = foldl (\acc x -> if x == y then True else acc) False ys
y `elem1r` ys = foldr (\x acc -> if x == y then True else acc) False ys
We then get:
Prelude> 3 `elem1l` [1..1000000]
True
(0.25 secs, 112,067,000 bytes)
Prelude> 3 `elem1r` [1..1000000]
True
(0.03 secs, 68,128 bytes)
Stripping the duplicates from the list will not imrpove the efficiency. What here improves the efficiency is that you use map. map works left-to-right. Note furthermore that nub works lazy, so nub is here a no op, since you are only interested in the head, so Haskell does not need to perform memberchecks on the already seen elements.
The performance is almost identical:
Prelude List> 3 `elem3` [1..1000000]
True
(0.03 secs, 68,296 bytes)
In case you work with a Set however, you do not perform uniqueness lazily: you first fetch all the elements into the list, so again, you will iterate over all the elements, and not cut of the search after the first hit.
Explanation
foldl goes to the innermost element of the list, applies the computation, and does so again recursively to the result and the next innermost value of the list, and so on.
foldl f z [x1, x2, ..., xn] == (...((z `f` x1) `f` x2) `f`...) `f` xn
So in order to produce the result, it has to traverse all the list.
Conversely, in your function elem3 as everything is lazy, nothing gets computed at all, until you call head.
But in order to compute that value, you just the first value of the (filtered) list, so you just need to go as far as 3 is encountered in your big list. which is very soon, so the list is not traversed. if you asked for the 1000000th element, eleme3 would probably perform as badly as the other ones.
Lazyness
Lazyness ensure that your language is always composable : breaking a function into subfunction does not changes what is done.
What you are seeing can lead to a space leak which is really about how control flow works in a lazy language. both in strict and in lazy, your code will decide what gets evaluated, but with a subtle difference :
In a strict language, the builder of the function will choose, as it forces evaluation of its arguments: whoever is called is in charge.
In a lazy language, the consumer of the function chooses. whoever called is in charge. It may choose to only evaluate the first element (by calling head), or every other element. All that provided its own caller choose to evaluate his own computation as well. there is a whole chain of command deciding what to do.
In that reading, your foldl based elem function uses that "inversion of control" in an essential way : elem gets asked to produce a value. foldl goes deep inside the list. if the first element if y then it return the trivial computation True. if not, it forwards the requests to the computation acc. In other words, what you read as values acc, x or even True, are really placeholders for computations, which you receive and yield back. And indeed, acc may be some unbelievably complex computation (or divergent one like undefined), as long as you transfer control to the computation True, your caller will never see the existence of acc.
foldr vs foldl vs foldl' VS speed
As suggested in another answer, foldr might best your intent on how to traverse the list, and will shield you away from space leaks (whereas foldl' will prevent space leaks as well if you really want to traverse the other way, which can lead to buildup of complex computations ... and can be very useful for circular computation for instance).
But the speed issue is really an algorithmic one. There might be better data structure for set membership if and only if you know beforehand that you have a certain pattern of usage.
For instance, it might be useful to pay some upfront cost to have a Set, then have fast membership queries, but that is only useful if you know that you will have such a pattern where you have a few sets and lots of queries to those sets. Other data structure are optimal for other patterns, and it's interesting to note that from a API/specification/interface point of view, they are usually the same to the consumer. That's a general phenomena in any languages, and why many people love abstract data types/modules in programming.
Using foldr and expecting to be faster really encodes the assumption that, given your static knowledge of your future access pattern, the values you are likely to test membership of will sit at the beginning. Using foldl would be fine if you expect your values to be at the end of it.
Note that using foldl, you might construct the entire list, you do not construct the values themselves, until you need it of course, for instance to test for equality, as long as you have not found the searched element.
I read the old Russian translate of the Learn You a Haskell for Great Good! book. I see that the current English version (online) is newer, therefore I look it time of time also.
The quote:
When you put together two lists (even if you append a singleton list
to a list, for instance: [1,2,3] ++ [4]), internally, Haskell has to
walk through the whole list on the left side of ++. That's not a
problem when dealing with lists that aren't too big. But putting
something at the end of a list that's fifty million entries long is
going to take a while. However, putting something at the beginning of
a list using the : operator (also called the cons operator) is
instantaneous.
I assumed that Haskell has to walk through the whole list to get the last item of the list for the foldr, foldr1, scanr and scanr1 functions. Also I assumed that Haskell will do the same for getting a previous element (and so on for each item).
But I see I was mistaken:
UPD
I try this code and I see the similar time of processing for both cases:
data' = [1 .. 10000000]
sum'r = foldr1 (\x acc -> x + acc ) data'
sum'l = foldl1 (\acc x -> x + acc ) data'
Is each list of Haskell bidirectional? I assume that for getting last item of list Haskell at first are to iterate each item and to remember the necessary item (last item for example) for getting (later) the previous item of bidirectional list (for lazy computation). Am I right?
It's tricky since Haskell is lazy.
Evaluating head ([1..1000000]++[1..1000000]) will return immediately, with 1. The lists will never be fully created in memory: only the first element of the first list will be.
If you instead demand the full list [1..1000000]++[1..1000000] then ++ will indeed have to create a two-million long list.
foldr may or may not evaluate the full list. It depends on whether the function we use is lazy. For example, here's map f xs written using foldr:
foldr (\y ys -> f y : ys) [] xs
This is efficient as map f xs is: lists cells are produced on demand, in a streaming fashion. If we need only the first ten elements of the resulting list, then we indeed create only the first ten cells -- foldr will not be applied to the rest of the list. If we need the full resulting list, then foldr will be run over the full list.
Also note that xs++ys can be defined similarly in terms of foldr:
foldr (:) ys xs
and has similar performance properties.
By comparison, foldl instead always runs over the whole list.
In the example you mention we have longList ++ [something], appending to the end of the list. This only costs constant time if all we demand is the first element of the resulting list. But if we really need the last element we added, then appending will need to run over the whole list. This is why appending at the end is considered O(n) instead of O(1).
In the last update, the question speaks about computing the sum with foldr vs foldl, using the (+) operator. In such case, since (+) is strict (it needs both arguments to compute result) then both folds witll need to scan the whole list. The performance in such cases can be comparable. Indeed, they would compute, respectively
1 + (2 + (3 + (4 + ..... -- foldr
(...(((1 + 2) + 3) +4) + .... -- foldl
By comparison foldl' would be more memory efficient, since it starts reducing the above sum before building the above giant expression. That is, it would compute 1+2 first (3), then 3+3 (6), then 6 + 4 (10),... keeping in memory only the last result (a single integer) while the list is being scanned.
To the OP: the topic of laziness is not easy to grasp the first time. It is quite vast -- you just met a ton of different examples which have subtle but significant performance differences. It's hard to explain everything succinctly -- it's just too broad. I'd recommend to focus on small examples and start digesting those first.
If want to pretend that Haskell is strict and I have an algorithm in mind that does not exploit laziness (so for instance it does not use infinite lists), what problems can occur if I used only strict data types and annotated any function that I use, to be strict in its arguments? Will there be a performance penalty, if so how bad; can worse problems occur? I know it is dirty, pointless and ugly to mindlessly make every function and data type strict, and I do not intend to do so in practice but I only want to understand if by doing so, Haskell becomes strict by default?
Secondly, if I tone down the paranoia, and only make the data structures strict: will I have to worry about space leaks brought about by a lazy implementation only when I am using some form of accumulation? In other words, assume that the algorithm would not exhibit a space leak in a strict language. Also assume that I implemented it in Haskell using only strict data structures, but was careful to use seq to evaluate any variable that was being passed on in a recursion, or used functions which internally are careful to do that (like fold'), would I avoid any space leaks? Remember that I am assuming that in a strict language, the same algorithm does not lead to a space leak. So it is a question about the implementation difference between lazy and strict.
The reason I ask the second question is because apart from cases where one is trying to take advantage of laziness by using a lazy data structure, or a spine strict one, all the examples of space leaks that I have seen until now, only involve thunks developing in an accumulator because it was not the function that was recursively called did not evaluate the accumulator before applying itself on it. I am aware that if one wants to take advantage of laziness then one has to be extra careful, but that caution would be needed in a strict by default language too.
Thank you.
Laziness speeding things up
You could be worse off. The naive definition of ++ is:
xs ++ ys = case xs of (x:xs) -> x : (xs ++ ys)
[] -> ys
Laziness makes this O(1), though it may also add O(1) processing to extract the cons. Without laziness, the ++ needs to be evaluated immediately causing an O(n) operation. (If you've never seen the O(.) notation, it is something computer science has stolen from engineers: given a function f the set O( f(n) ) is the set of all algorithms which are eventually at-worst-proportional to f(n), where n is the number of bits of input fed to the function. [Formally, there exists a k and N such that for all n > N the algorithm takes time less than k * f(n).] So I'm saying that laziness makes the above operation O(1) or eventually constant-time, but adds a constant overhead to each extraction, whereas strictness makes the operation O(n) or eventually linear in the number of list elements, assuming that those elements have a fixed size.
There are some practical examples here but the O(1) added processing time can potentially also "stack up" into an O(n) dependency, so the most obvious examples are O(n2) both ways. Still there can be a difference in these examples. For example, one situation that doesn't work well is using a stack (last-in first-out, which is the style of Haskell lists) for a queue (first-in first-out).
So here's a quick library consisting of strict left-folds; I've used case statements so that each line can be pasted into GHCi (with a let):
data SL a = Nil | Cons a !(SL a) deriving (Ord, Eq, Show)
slfoldl' f acc xs = case xs of Nil -> acc; Cons x xs' -> let acc' = f acc x in acc' `seq` slfoldl' f acc' xs'
foldl' f acc xs = case xs of [] -> acc; x : xs' -> let acc' = f acc x in acc' `seq` foldl' f acc' xs'
slappend xs ys = case xs of Nil -> ys; Cons x xs' -> Cons x (slappend xs' ys)
sl_test n = foldr Cons Nil [1..n]
test n = [1..n]
sl_enqueue xs x = slappend xs (Cons x Nil)
sl_queue = slfoldl' sl_enqueue Nil
enqueue xs x = xs ++ [x]
queue = foldl' enqueue []
The trick here is that both queue and sl_queue follow the xs ++ [x] pattern to append an element to the end of the list, which takes a list and builds up an exact copy of that list. GHCi can then run some simple tests. First we make two items and force their thunks to prove that this operation itself is quite fast and not too prohibitively expensive in memory:
*Main> :set +s
*Main> let vec = test 10000; slvec = sl_test 10000
(0.02 secs, 0 bytes)
*Main> [foldl' (+) 0 vec, slfoldl' (+) 0 slvec]
[50005000,50005000]
(0.02 secs, 8604632 bytes)
Now we do the actual tests: summing the queue-versions:
*Main> slfoldl' (+) 0 $ sl_queue slvec
50005000
(22.67 secs, 13427484144 bytes)
*Main> foldl' (+) 0 $ queue vec
50005000
(1.90 secs, 4442813784 bytes)
Notice that both of these suck in terms of memory-performance (the list-append stuff is still secretly O(n2)) where they eventually occupy gigabytes of space, but the strict version nevertheless occupies three times the space and takes ten times the time.
Sometimes the data structures should be changed
If you really want a strict queue, there are a couple options. One is finger trees as in Data.Sequence -- the viewr way they do things is a little complicated but works to get the rightmost elements. However that is a bit heavy and one common solution is O(1) amortized: define the structure
data Queue x = Queue !(SL x) !(SL x)
where the SL terms are the strict stacks above. Define a strict reverse, let's call it slreverse, the obvious way, then consider:
enqueue :: Queue x -> x -> Queue x
enqueue (Queue xs ys) el = Queue xs (Cons el ys)
dequeue :: Queue x -> Maybe (x, Queue x)
dequeue (Queue Nil Nil) = Nothing
dequeue (Queue Nil (Cons x xs)) = Just (x, Queue (slreverse xs) Nil)
dequeue (Queue (Cons x xs ys)) = Just (x, Queue xs ys)
This is "amortized O(1)": each time that a dequeue reverses the list, costing O(k) steps for some k, we ensure that we are creating a structure which won't have to pay these costs for k more steps.
Laziness hides errors
Another interesting point comes from the data/codata distinction, where data are finite structures traversed by recursion on subunits (that is, every data expression halts) while codata are the rest of the structures -- strict lists vs. streams. It turns out that when you properly make this distinction, there is no formal difference between strict data and lazy data -- the only formal difference between strict and lazy is how they handle terms within themselves which loop infinitely: strict will explore the loop and hence will also loop infinitely, while lazy will simply hand the infinite-loop onwards without descending into it.
As such you will find that x = slhead (Cons x undefined) will fail where head (x : undefined) succeeds. So you may "uncover" hidden infinite loops or bugs when you do this.
Caution when making "everything strict"
Not everything necessarily becomes strict when you use strict data structures in your language: notice that I made a point above to define strict foldl, not foldl, for both lists and strict-lists. Common data structures in Haskell will be lazy -- lists, tuples, stuff in popular libraries -- and explicit calls to seq still help when building up a complicated expression.
Thinking Functionally with Haskell provides the following code for calculating the mean of a list of Float's.
mean :: [Float] -> Float
mean [] = 0
mean xs = sum xs / fromIntegral (length xs)
Prof. Richard Bird comments:
Now we are ready to see what is really wrong with mean: it has a space leak. Evaluating mean [1..1000] will cause the list to be expanded and retained in memory after summing because there is a second pointer to it, namely in the computation of its length.
If I understand this text correctly, he's saying that, if there was no pointer to xs in the length computation, then the xs memory could've been freed after calculating the sum?
My confusion is - if the xs is already in memory, isn't the length function simply going to use the same memory that's already being taken up?
I don't understand the space leak here.
The sum function does not need to keep the entire list in memory; it can look at an element at a time then forget it as it moves to the next element.
Because Haskell has lazy evaluation by default, if you have a function that creates a list, sum could consume it without the whole list ever being in memory (each time a new element is generated by the producing function, it would be consumed by sum then released).
The exact same thing happens with length.
On the other hand, the mean function feeds the list to both sum and length. So during the evaluation of sum, we need to keep the list in memory so it can be processed by length later.
[Update] to be clear, the list will be garbage collected eventually. The problem is that it stays longer than needed. In such a simple case it is not a problem, but in more complex functions that operate on infinite streams, this would most likely cause a memory leak.
Others have explained what the problem is. The cleanest solution is probably to use Gabriel Gonzalez's foldl package. Specifically, you'll want to use
import qualified Control.Foldl as L
import Control.Foldl (Fold)
import Control.Applicative
meanFold :: Fractional n => Fold n (Maybe n)
meanFold = f <$> L.sum <*> L.genericLength where
f _ 0 = Nothing
f s l = Just (s/l)
mean :: (Fractional n, Foldable f) => f n -> Maybe n
mean = L.fold meanFold
if there was no pointer to xs in the length computation, then the xs memory could've been freed after calculating the sum?
No, you're missing the important aspect of lazy evaluation here. You're right that length will use the same memory as was allocated during the sum call, the memory in which we had expanded the whole list.
But the point here is that allocating memory for the whole list shouldn't be necessary at all. If there was no length computation but only the sum, then memory could've been freed during calculating the sum. Notice that the list [1..1000] is lazily generated only when it is consumed, so in fact the mean [1..1000] should run in constant space.
You might write the function like the following, to get an idea of how to avoid such a space leak:
import Control.Arrow
mean [] = 0
mean xs = uncurry (/) $ foldr (\x -> (x+) *** (1+)) (0, 0) xs
-- or more verbosely
mean xs = let (sum, len) = foldr (\x (s, l) -> (x+s, 1+l)) (0, 0)
in sum / len
which should traverse xs only once. However, Haskell is damn lazy - and computes the first tuple components only when evaluating sum and the second ones only later for len. We need to use some more tricks to actually force the evaluation:
{-# LANGUAGE BangPatterns #-}
import Data.List
mean [] = 0
mean xs = uncurry (/) $ foldl' (\(!s, !l) x -> (x+s, 1+l)) (0,0) xs
which really runs in constant space, as you can confirm in ghci by using :set +s.
The space leak is that the entire evaluated xs is held in memory for the length function. This is wasteful, as we aren't going to be using the actual values of the list after evaluating sum, nor do we need them all in memory at the same time, but Haskell doesn't know that.
A way to remove the space leak would be to recalculate the list each time:
sum [1..1000] / fromIntegral (length [1..1000])
Now the application can immediately start discarding values from the first list as it is evaluating sum, since it is not referenced anywhere else in the expression.
The same applies for length. The thunks it generates can be marked for deletion immediately, since nothing else could possibly want it evaluated further.
EDIT:
Implementation of sum in Prelude:
sum l = sum' l 0
where
sum' [] a = a
sum' (x:xs) a = sum' xs (a+x)
The canonical implementation of length :: [a] -> Int is:
length [] = 0
length (x:xs) = 1 + length xs
which is very beautiful but suffers from stack overflow as it uses linear space.
The tail-recursive version:
length xs = length' xs 0
where length' [] n = n
length' (x:xs) n = length xs (n + 1)
doesn't suffer from this problem, but I don't understand how this can run in constant space in a lazy language.
Isn't the runtime accumulating numerous (n + 1) thunks as it moves through the list? Shouldn't this function Haskell to consume O(n) space and lead to stack overflow?
(if it matters, I'm using GHC)
Yes, you've run into a common pitfall with accumulating parameters. The usual cure is to force strict evaluation on the accumulating parameter; for this purpose I like the strict application operator $!. If you don't force strictness, GHC's optimizer might decide it's OK for this function to be strict, but it might not. Definitely it's not a thing to rely on—sometimes you want an accumulating parameter to be evaluated lazily and O(N) space is just fine, thank you.
How do I write a constant-space length function in Haskell?
As noted above, use the strict application operator to force evaluation of the accumulating parameter:
clength xs = length' xs 0
where length' [] n = n
length' (x:xs) n = length' xs $! (n + 1)
The type of $! is (a -> b) -> a -> b, and it forces the evaluation of the a before applying the function.
Running your second version in GHCi:
> length [1..1000000]
*** Exception: stack overflow
So to answer your question: Yes, it does suffer from that problem, just as you expect.
However, GHC is smarter than the average compiler; if you compile with optimizations turned out, it'll fix the code for you and make it work in constant space.
More generally, there are ways to force strictness at specific points in Haskell code, preventing the building of deeply nested thunks. A usual example is foldl vs. foldl':
len1 = foldl (\x _ -> x + 1) 0
len2 = foldl' (\x _ -> x + 1) 0
Both functions are left folds that do the "same" thing, except that foldl is lazy while foldl' is strict. The result is that len1 dies with a stack overflow in GHCi, while len2 works correctly.
A tail-recursive function doesn't need to maintain a stack, since the value returned by the function is simply going to be the value returned by the tail call. So instead of creating a new stack frame, the current one gets re-used, with the locals overwritten by the new values passed into the tail call. So every n+1 gets written into the same place where the old n was, and you have constant space usage.
Edit - Actually, as you've written it, you're right, it'll thunk the (n+1)s and cause an overflow. Easy to test, just try length [1..1000000].. You can fix that by forcing it to evaluate it first: length xs $! (n+1), which will then work as I said above.