I have been wondering about this a lot, and I haven't found satisfying answers.
Why is (++) "expensive"? Under lazy evaluation, we won't evaluate an expression like
xs ++ ys
before it's necessary, and even then, we will only evaluate the parts we need, when we need them.
Can someone explain what I'm missing?
If you access the whole resulting list, lazy evaluation won't save any computation. It will only delay it until you need each particular element, but in the end you have to compute the same thing.
If you traverse the concatenated list xs ++ ys, accessing each element of the first part (xs) adds a small constant overhead for checking whether xs has been exhausted yet.
So, it makes a big difference if you associate ++ to the left or to the right.
If you associate n lists of length k to the left like (..(xs1 ++ xs2) ... ) ++ xsn then accessing each of the first k elements will take O(n) time, accessing each of the next k ones will take O(n-1) etc. So traversing the whole list will take O(k n^2). You can check that
sum $ foldl (++) [] (replicate 100000 [1])
takes really long.
If you associate n lists of length k to the right like xs1 ++ ( ... (xs(n-1) ++ xsn) ... ) then you'll get only constant overhead for each element, so traversing the whole list will take only O(k n). You can check that
sum $ foldr (++) [] (replicate 100000 [1])
is quite reasonable.
Edit: This is just the magic hidden behind ShowS. If you convert each string xs to showString xs :: String -> String (showString is just an alias for (++)) and compose these functions, then no matter how you associate their composition, at the end they will be applied from right to left - just what we need to get the linear time complexity. (This is simply because (f . g) x is f (g x).)
You can check that both
length $ (foldl (.) id (replicate 1000000 (showString "x"))) ""
and
length $ (foldr (.) id (replicate 1000000 (showString "x"))) ""
run in a reasonable time (foldr is a bit faster because it has less overhead when composing functions from the right, but both are linear in the number of elements).
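For illustration, here is a minimal sketch of that ShowS style as ordinary code (render is just a made-up name for the example):
render :: [String] -> String
render chunks = foldr (.) id (map showString chunks) ""
-- each chunk becomes a function String -> String; composing them only
-- builds a chain of functions, and the final application to "" walks
-- every chunk exactly once, so the whole thing stays linear.
For example, render ["foo", "bar", "baz"] gives "foobarbaz".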
It's not too expensive on its own; the problem arises when you start combining a whole lot of ++ applications from left to right: such a chain is evaluated like
( ([1,2] ++ [3,4]) ++ [5,6] ) ++ [7,8]
≡ let a = ([1,2] ++ [3,4]) ++ [5,6]
≡ let b = [1,2] ++ [3,4]
≡ let c = [1,2]
in head c : tail c ++ [3,4]
≡ 1 : [2] ++ [3,4]
≡ 1 : 2 : [] ++ [3,4]
≡ 1 : 2 : [3,4]
≡ [1,2,3,4]
in head b : tail b ++ [5,6]
≡ 1 : [2,3,4] ++ [5,6]
≡ 1:2 : [3,4] ++ [5,6]
≡ 1:2:3 : [4] ++ [5,6]
≡ 1:2:3:4 : [] ++ [5,6]
≡ 1:2:3:4:[5,6]
≡ [1,2,3,4,5,6]
in head a : tail a ++ [7,8]
≡ 1 : [2,3,4,5,6] ++ [7,8]
≡ 1:2 : [3,4,5,6] ++ [7,8]
≡ 1:2:3 : [4,5,6] ++ [7,8]
≡ 1:2:3:4 : [5,6] ++ [7,8]
≡ 1:2:3:4:5 : [6] ++ [7,8]
≡ 1:2:3:4:5:6 : [] ++ [7,8]
≡ 1:2:3:4:5:6 : [7,8]
≡ [1,2,3,4,5,6,7,8]
where you clearly see the quadratic complexity. Even if you only want to evaluate up to the n-th element, you still have to dig your way through all those lets. That's why ++ is infixr, for [1,2] ++ ( [3,4] ++ ([5,6] ++ [7,8]) ) is actually much more efficient. But if you're not careful while designing, say, a simple serialiser, you may easily end up with a chain like the one above. This is the main reason why beginners are warned about ++.
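For reference, the standard fixity declaration in the Prelude is
infixr 5 ++
so a chain written without parentheses, like xs ++ ys ++ zs, already groups to the right.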
That aside, Prelude.++ is slow compared to e.g. ByteString operations for the simple reason that it works by traversing linked lists, which always have suboptimal cache usage etc., but that's not as problematic: it prevents you from achieving C-like performance, but properly written programs using only plain lists and ++ can still easily rival similar programs written in e.g. Python.
I would like to add one thing or two to Petr's answer.
As he pointed out, repeatedly appending lists at the beginning is quite cheap, while appending at the end is not. This is true as long as you use Haskell's plain lists.
However, there are certain circumstances in which you HAVE TO append to the end (e.g., you are building a string to be printed out). With regular lists you have to deal with the quadratic complexity mentioned by his answer, but there's a way better solution in these cases: difference lists (see also my question on the topic).
Long story short, by describing lists as compositions of functions instead of concatenation of shorter lists you are able to append lists or individual elements at the beginning or at the end of your difference list by composing functions, in constant time. Once you're done, you can extract a regular list in linear time (in the number of elements).
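To make that concrete, here is a minimal difference-list sketch (the names are illustrative only; the dlist package on Hackage provides a polished version of the same idea):
newtype DList a = DList ([a] -> [a])
emptyD :: DList a
emptyD = DList id                           -- the empty difference list
snocD :: DList a -> a -> DList a
snocD (DList f) x = DList (f . (x:))        -- append one element at the end, O(1)
appendD :: DList a -> DList a -> DList a
appendD (DList f) (DList g) = DList (f . g) -- concatenate two difference lists, O(1)
toListD :: DList a -> [a]
toListD (DList f) = f []                    -- convert back once at the end, O(n)
For instance, toListD (foldl snocD emptyD [1..5]) is [1,2,3,4,5], and none of the repeated snocD calls ever re-copies the prefix built so far.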
Related
Most resources recommend using foldl', so what is the reason for using foldr in concat instead of foldl'?
EDIT I talk about laziness and productivity in this answer, and in my excitement I forgot a very important point that jpmariner focuses on in their answer: left-associating (++) is quadratic time!
foldl' is appropriate when your accumulator is a strict type, like most small types such as Int, or even large spine-strict data structures like Data.Map. If the accumulator is strict, then the entire list must be consumed before any output can be given. foldl' uses tail recursion to avoid blowing up the stack in these cases, but foldr doesn't and will perform badly. On the other hand, foldl' must consume the entire list in this way.
foldl f z [] = z
foldl f z [1] = f z 1
foldl f z [1,2] = f (f z 1) 2
foldl f z [1,2,3] = f (f (f z 1) 2) 3
The final element of the list is required to evaluate the outermost application, so there is no way to partially consume the list. If we expand this with (++), we will see:
foldl (++) [] [[1,2],[3,4],[5,6]]
= (([] ++ [1,2]) ++ [3,4]) ++ [5,6]
^^
= ([1,2] ++ [3,4]) ++ [5,6]
= ((1 : [2]) ++ [3,4]) ++ [5,6]
^^
= (1 : ([2] ++ [3,4])) ++ [5,6]
^^
= 1 : (([2] ++ [3,4]) ++ [5,6])
(I admit this looks a little magical if you don't have a good feel for cons lists; it's worth getting dirty with the details though)
See how we have to evaluate every (++) (marked with ^^ when they are evaluated) on the way down before the 1 bubbles out to the front? The 1 is "hiding" under function applications until then.
foldr, on the other hand, is good for non-strict accumulators like lists, because it allows the accumulator to yield information before the entire list is consumed, which can bring many classically linear-space algorithms down to constant space! This also means that if your list is infinite, foldr is your only choice, unless your goal is to heat your room using your CPU.
foldr f z [] = z
foldr f z [1] = f 1 z
foldr f z [1,2] = f 1 (f 2 z)
foldr f z [1,2,3] = f 1 (f 2 (f 3 z))
foldr f z [1..] = f 1 (f 2 (f 3 (f 4 (f 5 ...
We have no trouble expressing the outermost applications without having to see the entire list. Expanding foldr the same way we did foldl:
foldr (++) [] [[1,2],[3,4],[5,6]]
= [1,2] ++ ([3,4] ++ ([5,6] ++ []))
= (1 : [2]) ++ ([3,4] ++ ([5,6] ++ []))
^^
= 1 : ([2] ++ ([3,4] ++ ([5,6] ++ [])))
1 is yielded immediately without having to evaluate any of the (++)s but the first one. Because none of those (++)s are evaluated, and Haskell is lazy, they don't even have to be generated until more of the output list is consumed, meaning concat can run in constant space for an expression like this
concat [ [1..n] | n <- [1..] ]
which in a strict language would require intermediate lists of arbitrary length.
If these reductions look a little too magical, and if you want to go deeper, I suggest examining the source of (++) and doing some simple manual reductions against its definition to get a feel for it. (Just remember [1,2,3,4] is notation for 1 : (2 : (3 : (4 : [])))).
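For reference, the Prelude definition is essentially
(++) :: [a] -> [a] -> [a]
[]     ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)
which makes it clear that only the left argument is rebuilt, one cons cell per step, while the right argument is shared untouched.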
In general, the following seems to be a strong rule of thumb for efficiency: use foldl' when your accumulator is a strict data structure, and foldr when it's not. And if you see a friend using regular foldl and don't stop them, what kind of friend are you?
What is the reason for using foldr in concat instead of foldl'?
What if the result gets fully evaluated?
If you consider [1,2,3] ++ [6,7,8] within an imperative programming mindset, all you have to do is redirect the next pointer at node 3 towards node 6, assuming of course you may alter your left side operand.
This being Haskell, you may NOT alter your left side operand, unless the optimizer is able to prove that ++ is the sole user of its left side operand.
Short of such a proof, other Haskell expressions pointing to node 1 have every right to assume that node 1 is forever at the beginning of a list of length 3. In Haskell, the properties of a pure expression cannot be altered during its lifetime.
So, in the general case, operator ++ has to do its job by duplicating its left side operand, and the duplicate of node 3 may then be set to point to node 6. On the other hand, the right side operand can be taken as is.
So if you fold the concat expression starting from the right, each component of the concatenation must be duplicated exactly once. But if you fold the expression starting from the left, you are facing a lot of repetitive duplication work.
Let's try to check that quantitatively. To ensure that no optimizer will get in the way by proving anything, we'll just use the ghci interpreter. Its strong point is interactivity, not optimization.
So let's introduce the various candidates to ghci, and switch statistics mode on:
$ ghci
λ>
λ> import qualified Data.List as L
λ>
λ> myConcat0 = L.foldr (++) []
λ> myConcat1 = L.foldl (++) []
λ> myConcat2 = L.foldl' (++) []
λ>
λ> :set +s
λ>
We'll force full evaluation by using lists of numbers and printing their sum.
First, let's get baseline performance by folding from the right:
λ>
λ> sum $ concat [ [x] | x <- [1..10000::Integer] ]
50005000
(0.01 secs, 3,513,104 bytes)
λ>
λ> sum $ myConcat0 [ [x] | x <- [1..10000::Integer] ]
50005000
(0.01 secs, 3,513,144 bytes)
λ>
Second, let's fold from the left, to see whether that improves matters or not.
λ>
λ> sum $ myConcat1 [ [x] | x <- [1..10000::Integer] ]
50005000
(1.26 secs, 4,296,646,240 bytes)
λ>
λ> sum $ myConcat2 [ [x] | x <- [1..10000::Integer] ]
50005000
(1.28 secs, 4,295,918,560 bytes)
λ>
So folding from the left allocates much more transient memory and takes much more time, probably because of this repetitive duplication work.
As a last check, let's double the problem size:
λ>
λ> sum $ myConcat2 [ [x] | x <- [1..20000::Integer] ]
200010000
(5.91 secs, 17,514,447,616 bytes)
λ>
We see that doubling the problem size causes the resource consumptions to get multiplied by about 4. Folding from the left has quadratic cost in the case of concat.
Looking at the excellent answer by luqui, we see that the two concerns:
the need to be able to access the beginning of the result list lazily
the need to avoid quadratic cost for full evaluation
both point in the same direction, that is, in favor of folding from the right.
Hence the Haskell library's concat function uses foldr.
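Indeed, the library definition is essentially
concat :: [[a]] -> [a]
concat = foldr (++) []
(GHC's actual version adds list-fusion rules, but the shape is the same).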
Addendum:
After running some tests using GHC v8.6.5 with the -O3 option instead of ghci, it appears that my preconceived idea of the optimizer interfering with the measurements was erroneous.
Even with -O3, for a problem size of 20,000, the foldr-based concat function is about 500 times faster than the foldl'-based one.
So either the optimizer fails to prove that it is OK to alter/reuse the left operand, or it just does not try at all.
In this answer on CodeReview, the question asker and the answerer both seem to show disdain for the (++) operator. Is this due to its speed (causing the algorithm to explicitly run in O(n^2) where n is the length of the list, iirc)? Is this premature optimization if not otherwise tested, as Haskell is known for being difficult to reason about time complexity? Should others avoid the (++) operator in their programs?
It depends.
Consider the expression
foldl (++) [] list
This expression concatenates a list of lists into a single list, but has the aforementioned quadratic complexity. This happens because the implementation of (++) traverses the entire left list and prepends each element to the right list (while keeping the correct order, of course).
Using a right fold, we get linear complexity:
foldr (++) [] list
This is due to the (++) operator's implementation, which traverses only the left argument and prepends it to the right.
[1,2] ++ [3,4] ++ [5,6]
is equal to
-- Example as created by foldr
[1,2] ++ ([3,4] ++ [5,6])
== [1,2] ++ [3,4,5,6]
== [1,2,3,4,5,6] -- all good, no element traversed more than once
which only traverses each list element once.
Now switching the parentheses so that the first two lists are grouped together is more expensive, since some elements are traversed multiple times, which is inefficient.
-- Example as created by foldl
([1,2] ++ [3,4]) ++ [5,6]
== [1,2,3,4] ++ [5,6]
== [1,2,3,4,5,6] -- the sublist [1,2] was traversed twice due to the ordering of the appends
All in all, watch out for such cases and you should be fine.
I know the question title might be misleading, because I'm not concatenating an infinite list with anything here. Feel free to propose something more suitable.
Here is a working implementation of the cycle function from Prelude using foldr:
fold_cycle :: [a] -> [a]
fold_cycle xs = foldr step [] [1..]
where step x acc = xs ++ acc
If we switch the operands of ++ to acc ++ xs, this function no longer works. It produces a stack overflow, which, by my understanding, is the result of trying to produce a never-ending expression for later evaluation.
I have trouble understanding what is the reason behind this. My reasoning is that regardless of the order of operands, foldr should evaluate step once, produce the new accumulator and proceed to evaluate the step function again if necessary. Why is there a difference?
foldr doesn't evaluate the accumulator at all if it's not forced. fold_cycle works precisely because it doesn't necessarily evaluate acc.
fold_cycle [1, 2]
reduces to
[1, 2] ++ ([1, 2] ++ ([1, 2] ++ ([1, 2] ++ ...
Which allows us to evaluate prefixes of the result because ++ lets us traverse the first argument without evaluating the second.
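You can check this in GHCi; only a finite prefix of that right-nested chain ever gets forced:
take 5 (fold_cycle [1, 2])   -- [1,2,1,2,1]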
If we use step _ acc = acc ++ xs, the parentheses in the above expression associate to the left instead of right. But since we have an infinite number of appends, the expression ends up like this:
((((((((((((((... -- parentheses all the way down
Intuitively, we would have to step over an infinite number of parentheses to inspect the first element of the resulting list.
EDIT: It seems what I was calling "lazy" here, isn't what "lazy" means. I am not sure what the proper term is. Some people are saying that the term I am looking for is "productive", but I can't find any definitions of that term in this context. What I want is a function that can work with infinite lists. I will change any "lazy" into "productive", using my best understanding of the term.
The function
f a = a:(f (a-1))
generates an infinite list in a productive way, because the a : comes before any further evaluation.
This means you can do take 10 (f 0) and it's fine.
However, the function
h a = h (a ++ map (subtract 1) a)
isn't productive and will never terminate, since the a ++ is buried inside the recursive call.
Because of this, you cannot do head (h [0]), even though it is clear that it is 0.
Is there a general strategy I can apply to turn a non-productive function into a productive function?
Specifically, the problem I'm trying to solve is to make the following productive while lazily consuming its second argument:
binarily a [] = a
binarily a (b:bs) = binarily (a ++ map (b+) a) bs
h generates a growing sequence. On [0] for example:
[0] ++
[0, -1] ++
[0, -1, -1, -2] ++
[0, -1, -1, -2, -1, -2, -2, -3] ++ ...
Note that it shows the pattern [x, f x, f (f x), …]—at each step, you’re computing one more iteration of the function. That’s exactly what iterate :: (a -> a) -> a -> [a] is for, and the fold of ++s is exactly concat:
h = concat . iterate go
where go x = x ++ map (subtract 1) x
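As a quick sanity check (the expected output here is just what the expansion above predicts):
take 7 (h [0])   -- [0,0,-1,0,-1,-1,-2]
which matches [0] ++ [0,-1] ++ [0,-1,-1,-2] ++ ..., and it terminates because concat can emit each chunk before iterate produces the next one.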
Here’s one implementation of binarily using the same principle:
binarily a bs
= concatMap fst
. takeWhile (not . null . snd)
$ iterate go (a, bs)
where
go (acc, b : bs) = (acc ++ map (b +) acc, bs)
go x = x
We iterate the function and take elements from the stream while bs (the snd) is not . null (if bs is infinite, this just takes the whole stream), and then we concatenate the intermediate accumulators (the fst values).
You’ll notice that if you didn’t have the takeWhile in there, you would end up with an infinitely repeating series of tuples where the snd is []. So what we’re doing is streaming until we hit a fixed point, i.e., turning recursion (fix) into streaming. :)
The 'general strategy' is to define your function so that it can compute a part of the result value before recursing.
In f, the topmost expression is an application of the (:) function, which is non-strict in its second argument. This means that it doesn't even need to evaluate f (a-1) if you don't need the remainder of the list.
In h, the first thing the function does is to recurse - i.e. it doesn't produce a "partial result".
Your binarily function actually is "lazy": it's non-strict in its first argument, so
take 10 $ binarily [1..] [1..5]
terminates.
Yet another, perhaps more "concrete" way of writing your lazy/productive binarily:
binarily a l = a ++ binRest a l
binRest a [] = []
binRest a (b:bs) = a' ++ binRest (a ++ a') bs
where
a' = map (b+) a
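A quick check that this agrees with the original on finite input and is productive on infinite input (the expected outputs come from unfolding the definitions by hand):
binarily [0] [1,2,3]          -- [0,1,2,3,3,4,5,6]
take 10 (binarily [0] [1..])  -- [0,1,2,3,3,4,5,6,4,5]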
EDIT: I was asked for some explanation of my thought process. Let's start by looking at what the binarily in the original post passes as its first argument at each step, if we start with binarily a (b1:b2:b3:...):
a
a ++ map (b1+) a
a ++ map (b1+) a ++ map (b2+) (a ++ map (b1+) a)
a ++ map (b1+) a ++ map (b2+) (a ++ map (b1+) a) ++ map (b3+) (a ++ map (b1+) a ++ map (b2+) (a ++ map (b1+) a))
It is clear that we can produce the a ++ pretty immediately, and then in the next step map (b1+) is applied to that, so a straight concat $ iterate ... a like in #JonPurdy's answer at first seems like it should work. Actually, because we go through the bs list, the scanl function is a better match than iterate.
But if we try that, we see there is still a mismatch: the part added in the third step above is not a function of the part added to the argument in the second step, but of the whole argument in the second step. concat $ scanl ... doesn't quite fit this.
It turns out, in fact, that the produced part in the very first step doesn't fit the regular pattern of all the rest.
Thus I split it into two functions: first, binarily, which handles what to produce in the first step and then passes on to binRest for all the other steps.
Second, binRest, which takes as its first argument everything produced so far, uses it to calculate what to produce in this step, and then recurses.
The proper term that you were looking for is productive, not just "lazy"...
head (h [10]) is just not defined, at all. The reduction sequence is: h [10] => h [10,9] => h [10,9,9,8] => h [10,9,9,8,9,8,8,7] => .... It's true that the head of this internal sequence is always the same, 10, but the reduction itself never stops. And no, it's not the same as f 10 => [10,9,8,7,....
Your function,
binarily a [] = a
binarily a (b:bs) = binarily (a ++ map (b+) a) bs
{- try it out with [b1,b2,b3,b4] :
a b1 the arguments received;
a1#(a ++ (map (b1+) a)) b2 if we've already produced a, we just need
to produce (map (b1+) a) next
a2#(a1 ++ (map (b2+) a1)) b3 if we've already produced a1, we just need
to produce (map (b2+) a1) next
a3#(a2 ++ (map (b3+) a2)) b4 ai#... name the interim values
a4#(a3 ++ (map (b4+) a3)) [] a4 is returned -}
is equivalent to
import Data.List (mapAccumL)
-- mapAccumL :: (acc -> x -> (acc, y)) -> acc -> [x] -> (acc, [y])
binarily a bs = (last . snd) (mapAccumL g a bs)
where
g a b = let anew = a ++ map (b+) a
in (anew, anew) -- (next_accum, part_result)
mapAccumL captures the pattern of accumulating and producing parts of the full result at the same time. The list :: [y] (the snd field of its return value) is produced lazily, built from all ys as they are returned by the step function which is called for each x in the list :: [x] (your (b:bs)). So as long as we're ignoring the final accumulated value in the fst field of the result, the function works with infinite input too.
Obviously part of the next result is present in the previous one here, and can be returned right away:
binarily a bs = a ++ (concat . snd) (mapAccumL g a bs)
where
g a b = let -- for each b in bs:
res = map (b+) a -- this part of the result
anew = a ++ res -- next accumulator value
in (anew, res)
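As a cross-check against the earlier formulations, I'd expect the same results here, e.g.
binarily [0] [1,2,3]          -- [0,1,2,3,3,4,5,6]
take 10 (binarily [0] [1..])  -- [0,1,2,3,3,4,5,6,4,5]
where the second one works because the snd list of mapAccumL is produced lazily, as described above.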
I'm currently at the 6th chapter of Learn You a Haskell... I just recently started working my way through the 99 questions.
The 3rd problem is to find the K'th element of a list. I've implemented it using take and zip.
The problem I have is understanding the alternate solution offered:
elementAt''' xs n = head $ foldr ($) xs
$ replicate (n - 1) tail
I'm "almost there" but I don't quite get it. I know the definition of the $ but.. Can you please explain to me the order of the execution of the above code. Also, is this often used as a solution to various problems, is this idiomatic or just... acrobatic ?
If you expand the definition of foldr
foldr f z (x1:x2:x3:...:[]) = x1 `f` (x2 `f` (x3 `f` (... `f` z)))
you see that elementAt''' becomes
elementAt''' xs n = head (tail $ tail $ ... $ tail $ xs)
(note: it should be replicate n tail instead of replicate (n-1) tail if indexing is 0-based).
So you apply tail to xs the appropriate number of times, which has the same result as drop (n-1) xs if xs is long enough, but raises an error if it's too short, and take the head of the resulting list (with drop (n-1), a too-short xs would also lead to an error, just from head instead).
What it does is thus
discard the first element of the list
discard the first element of the resulting list (n-1 times altogether)
take the head of the resulting list
Also, is this often used as a solution to various problems, is this idiomatic or just... acrobatic
In this case, just acrobatic. The foldr has to expand the full application before it can work back to the front taking the tails, thus it's less efficient than the straightforward traversal.
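For comparison, a direct recursive traversal (one of the usual solutions to that problem; the name elementAt' is just for this example) would be
elementAt' :: [a] -> Int -> a
elementAt' (x:_)  1 = x
elementAt' (_:xs) n = elementAt' xs (n - 1)
elementAt' _      _ = error "index out of bounds"
which walks the list only once and stops as soon as it reaches the requested position.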
Break it down into the two major steps. First, the function replicates tail
(n-1) times. So you end up with something like
elementAt''' xs n = head $ foldr ($) xs [tail, tail, tail, ..., tail]
Now, the definition of foldr on a list expands to something like this
foldr f x [y1, y2, y3, ..., yn] = (y1 `f` (y2 `f` (... (yn `f` x) ...)))
So, that fold will expand to (replace f with $ and all the ys with tail)
foldr ($) xs [tail, tail, tail, ..., tail]
= (tail $ (tail $ (tail $ ... (tail xs))) ... )
{- Since $ is right associative anyway -}
= tail $ tail $ tail $ tail $ ... $ tail xs
where there are (n-1) calls to tail composed together. After taking n-1
tails, it just extracts the first element of the remaining list and gives that
back.
Another way to write it that makes the composition more explicit (in my opinion) would
be like this
elementAt n = head . (foldr (.) id $ replicate (n-1) tail)
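For example (1-based indexing, as in the original problem):
elementAt 3 "haskell"       -- 's'
elementAt 3 [10,20,30,40]   -- 30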