Avoiding ++ in Haskell

In this answer on Code Review, both the question asker and the answerer seem to show disdain for the (++) operator. Is this due to its speed (causing the algorithm to run in O(n^2) time, where n is the length of the list, iirc)? Is avoiding it premature optimization if not otherwise tested, given that Haskell is known for being difficult to reason about with respect to time complexity? Should others avoid the (++) operator in their programs?

It depends.
Consider the expression
foldl (++) [] list
This expression concatenates a list of lists into a single list, but it has the aforementioned quadratic complexity. This happens because (++) works by traversing its entire left argument and prepending each element to the right argument (preserving order, of course); with foldl, the growing accumulator is that left argument, so it gets re-traversed at every step.
Using a right fold, we get linear complexity:
foldr (++) [] list
This is due to the (++) operator's implementation, which traverses only the left argument and prepends it to the right.
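For reference, (++) is defined in the Prelude essentially like this (the real GHC source adds rewrite rules for list fusion, but the shape is the same):

(++) :: [a] -> [a] -> [a]
[]     ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)   -- recursion is on the left argument only; ys is never traversed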
[1,2] ++ [3,4] ++ [5,6]
is equal to
-- Example as created by foldr
[1,2] ++ ([3,4] ++ [5,6])
== [1,2] ++ [3,4,5,6]
== [1,2,3,4,5,6] -- all good, no element traversed more than once
which only traverses each list element once.
Switching the parentheses around the first two lists instead, as a left fold does, is more expensive: now some elements are traversed multiple times, which is inefficient.
-- Example as created by foldl
([1,2] ++ [3,4]) ++ [5,6]
== [1,2,3,4] ++ [5,6]
== [1,2,3,4,5,6] -- the sublist [1,2] was traversed twice due to the ordering of the appends
All in all, watch out for such cases and you should be fine.

Related

Why does the function concat use foldr? Why not foldl'?

Most resources recommend using foldl', so what is the reason for using foldr in concat instead of foldl'?
EDIT I talk about laziness and productivity in this answer, and in my excitement I forgot a very important point that jpmariner focuses on in their answer: left-associating (++) is quadratic time!
foldl' is appropriate when your accumulator is a strict type, like most small types such as Int, or even large spine-strict data structures like Data.Map. If the accumulator is strict, then the entire list must be consumed before any output can be given. foldl' uses tail recursion to avoid blowing up the stack in these cases, but foldr doesn't and will perform badly. On the other hand, foldl' must consume the entire list in this way.
foldl f z [] = z
foldl f z [1] = f z 1
foldl f z [1,2] = f (f z 1) 2
foldl f z [1,2,3] = f (f (f z 1) 2) 3
The final element of the list is required to evaluate the outermost application, so there is no way to partially consume the list. If we expand this with (++), we will see:
foldl (++) [] [[1,2],[3,4],[5,6]]
= (([] ++ [1,2]) ++ [3,4]) ++ [5,6]
^^
= ([1,2] ++ [3,4]) ++ [5,6]
= ((1 : [2]) ++ [3,4]) ++ [5,6]
^^
= (1 : ([2] ++ [3,4])) ++ [5,6]
^^
= 1 : (([2] ++ [3,4]) ++ [5,6])
(I admit this looks a little magical if you don't have a good feel for cons lists; it's worth getting dirty with the details though)
See how we have to evaluate every (++) (marked with ^^ as they are evaluated) on the way down before the 1 bubbles out to the front? The 1 is "hiding" under function applications until then.
foldr, on the other hand, is good for non-strict accumulators like lists, because it allows the accumulator to yield information before the entire list is consumed, which can bring many classically linear-space algorithms down to constant space! This also means that if your list is infinite, foldr is your only choice, unless your goal is to heat your room using your CPU.
foldr f z [] = z
foldr f z [1] = f 1 z
foldr f z [1,2] = f 1 (f 2 z)
foldr f z [1,2,3] = f 1 (f 2 (f 3 z))
foldr f z [1..] = f 1 (f 2 (f 3 (f 4 (f 5 ...
We have no trouble expressing the outermost applications without having to see the entire list. Expanding foldr the same way we did foldl:
foldr (++) z [[1,2],[3,4],[5,6]]
= [1,2] ++ ([3,4] ++ ([5,6] ++ []))
= (1 : [2]) ++ ([3,4] ++ ([5,6] ++ []))
^^
= 1 : ([2] ++ ([3,4] ++ ([5,6] ++ [])))
1 is yielded immediately without having to evaluate any of the (++)s but the first one. Because none of the others are evaluated, and Haskell is lazy, they don't even have to be generated until more of the output list is consumed, meaning concat can run in constant space for an expression like this
concat [ [1..n] | n <- [1..] ]
which in a strict language would require intermediate lists of arbitrary length.
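You can see the laziness at work directly in ghci:

> take 10 $ concat [ [1..n] | n <- [1..] ]
[1,1,2,1,2,3,1,2,3,4]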
If these reductions look a little too magical, and if you want to go deeper, I suggest examining the source of (++) and doing some simple manual reductions against its definition to get a feel for it. (Just remember [1,2,3,4] is notation for 1 : (2 : (3 : (4 : [])))).
In general, the following seems to be a strong rule of thumb for efficiency: use foldl' when your accumulator is a strict data structure, and foldr when it's not. And if you see a friend using regular foldl and don't stop them, what kind of friend are you?
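For example, summing with a strict Int accumulator is the foldl' case of the rule of thumb (a minimal sketch; total and the test input are mine):

import Data.List (foldl')

-- strict accumulator: foldl' folds the whole list in constant stack space
total :: [Int] -> Int
total = foldl' (+) 0

main :: IO ()
main = print (total [1 .. 1000000])   -- 500000500000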
Why use foldr in concat instead of foldl'?
What if the result gets fully evaluated?
If you consider [1,2,3] ++ [6,7,8] within an imperative programming mindset, all you have to do is redirect the next pointer at node 3 towards node 6, assuming of course you may alter your left side operand.
This being Haskell, you may NOT alter your left side operand, unless the optimizer is able to prove that ++ is the sole user of its left side operand.
Short of such a proof, other Haskell expressions pointing to node 1 have every right to assume that node 1 is forever at the beginning of a list of length 3. In Haskell, the properties of a pure expression cannot be altered during its lifetime.
So, in the general case, operator ++ has to do its job by duplicating its left side operand, and the duplicate of node 3 may then be set to point to node 6. On the other hand, the right side operand can be taken as is.
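A tiny illustration of that sharing (ys and zs are just names for the example):

ys :: [Int]
ys = [6, 7, 8]

zs :: [Int]
zs = [1, 2, 3] ++ ys
-- zs reduces to 1 : 2 : 3 : ys: three fresh cells for the left operand,
-- while the tail of zs is literally ys, shared without any duplication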
So if you fold the concat expression starting from the right, each component of the concatenation must be duplicated exactly once. But if you fold the expression starting from the left, you are facing a lot of repetitive duplication work.
Let's try to check that quantitatively. To ensure that no optimizer will get in the way by proving anything, we'll just use the ghci interpreter. Its strong point is interactivity, not optimization.
So let's introduce the various candidates to ghci, and switch statistics mode on:
$ ghci
λ>
λ> import qualified Data.List as L
λ> myConcat0 = L.foldr (++) []
λ> myConcat1 = L.foldl (++) []
λ> myConcat2 = L.foldl' (++) []
λ>
λ> :set +s
λ>
We'll force full evaluation by using lists of numbers and printing their sum.
First, let's get baseline performance by folding from the right:
λ>
λ> sum $ concat [ [x] | x <- [1..10000::Integer] ]
50005000
(0.01 secs, 3,513,104 bytes)
λ>
λ> sum $ myConcat0 [ [x] | x <- [1..10000::Integer] ]
50005000
(0.01 secs, 3,513,144 bytes)
λ>
Second, let's fold from the left, to see whether that improves matters or not.
λ>
λ> sum $ myConcat1 [ [x] | x <- [1..10000::Integer] ]
50005000
(1.26 secs, 4,296,646,240 bytes)
λ>
λ> sum $ myConcat2 [ [x] | x <- [1..10000::Integer] ]
50005000
(1.28 secs, 4,295,918,560 bytes)
λ>
So folding from the left allocates much more transient memory and takes much more time, probably because of this repetitive duplication work.
As a last check, let's double the problem size:
λ>
λ> sum $ myConcat2 [ [x] | x <- [1..20000::Integer] ]
200010000
(5.91 secs, 17,514,447,616 bytes)
λ>
We see that doubling the problem size causes the resource consumptions to get multiplied by about 4. Folding from the left has quadratic cost in the case of concat.
Looking at the excellent answer by luqui, we see that both concerns:
the need to access the beginning of the result list lazily
the need to avoid quadratic cost under full evaluation
point in the same direction, that is, in favor of folding from the right.
Hence the Haskell library's concat function uses foldr.
Addendum:
After running some tests using GHC v8.6.5 with the -O3 option instead of ghci, it appears that my preconceived idea of the optimizer messing with the measurements was erroneous.
Even with -O3, for a problem size of 20,000, the foldr-based concat function is about 500 times faster than the foldl'-based one.
So either the optimizer fails to prove that it is OK to alter/reuse the left operand, or it just does not try at all.

Haskell Cycle function

The code for cycle is as follows
cycle :: [a] -> [a]
cycle [] = errorEmptyList "cycle"
cycle xs = xs' where xs' = xs ++ xs'
I would appreciate an explanation of how the last line works. I feel that it would go off into infinite recursion without returning, or shall I say, without printing anything on the screen. I guess my intuition is wrong.
A list, like basically everything else in Haskell, is lazily evaluated.
Roughly, to make an OOP analogy, you can think of a list as a sort of "iterator object". When queried, it reports on whether there is a next element, and if so, what is such element and what is the tail of the list (this being another "iterator object").
A list defined as
xs = 1 : xs
does not cause non-termination. It corresponds to an "iterator object" o that, when queried, answers: "the next element is 1, and the rest of the list can be queried using o". Basically, it returns itself.
This is no different than a list having as a tail a "pointer" to the list itself: a circular list. This takes a constant amount of space.
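For instance:

> take 5 (let xs = 1 : xs in xs)
[1,1,1,1,1]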
Appending with ++ works the same:
xs = [1] ++ xs
is identical to the previous list.
In your code, the part
where xs' = xs ++ xs'
crafts a list that starts with xs and then continues with the list itself xs'. Operationally, it is an "iterator object" o that returns, one by one, the elements of xs, and when the last element of xs is returned, it is paired with "you can query the rest of the list at o". Again, a back-pointer, which builds a sort of circular list.
Let's take out the last line separately:
cycle xs = xs' where xs' = xs ++ xs'
Now, let's try to reduce it:
cycle xs = xs ++ (xs ++ (xs ++ (xs ++ ...)))
You can see that it expands infinitely. But note that this is not how expressions get reduced in Haskell. Expressions are reduced to WHNF only when they are demanded. So, let's demand some values from the cycle function:
ghci > take 1 $ cycle [1..]
[1]
This is how the take function is implemented:
take n _ | n <= 0 = []
take _ [] = []
take n (x:xs) = x : take (n-1) xs
Now, because of the pattern matching, the value n will be evaluated first. Since it is already in normal form, no further reduction needs to be done, and it is checked whether it is less than or equal to zero. Since that condition fails, we move on to the second equation. Here, the second argument is checked against []. As usual, Haskell evaluates it only to WHNF, which is 1:_, where _ represents a thunk. Now the whole expression has been reduced to 1 : take 0 _. Since this value has to be printed in ghci, 1 : take 0 _ is reduced further. Following similar steps as above, we get 1:[], which is [1].
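Spelled out as a reduction trace (a sketch of the steps just described):

take 1 (cycle [1..])
= take 1 (1 : <thunk>)   -- the pattern match forces cycle's result to WHNF
= 1 : take 0 <thunk>     -- third equation of take
= 1 : []                 -- first equation, since 0 <= 0
= [1]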
Hence cycle [1,2,3] will be reduced to WHNF of the form (1:xs) and ultimately, here, to [1]. But if the cycle function itself were strict in its implementation, it would just go into an infinite loop:
import Control.DeepSeq (NFData, deepseq)

cycle :: NFData a => [a] -> [a]
cycle [] = []
cycle xs = let xs' = xs ++ xs'
           in  deepseq xs xs'
You can test that in ghci:
ghci > take 1 $ cycle [1..]
^CInterrupted.

Why xs ++ acc works when using foldr on an infinite list, but acc ++ xs doesn't?

I know the question title might be misleading, because I'm not concatenating an infinite list with anything here. Feel free to propose something more suitable.
Here is a working implementation of the cycle function from Prelude using foldr:
fold_cycle :: [a] -> [a]
fold_cycle xs = foldr step [] [1..]
  where step x acc = xs ++ acc
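A quick sanity check that it behaves like cycle:

> take 7 (fold_cycle [1,2,3])
[1,2,3,1,2,3,1]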
If we switch the operands of ++ to acc ++ xs, this function no longer works. It produces a stack overflow, which, by my understanding, is the result of trying to produce a never-ending expression for later evaluation.
I have trouble understanding what is the reason behind this. My reasoning is that regardless of the order of operands, foldr should evaluate step once, produce the new accumulator and proceed to evaluate the step function again if necessary. Why is there a difference?
foldr doesn't evaluate the accumulator at all if it's not forced. fold_cycle works precisely because it doesn't necessarily evaluate acc.
fold_cycle [1, 2]
reduces to
[1, 2] ++ ([1, 2] ++ ([1, 2] ++ ([1, 2] ++ ...
Which allows us to evaluate prefixes of the result because ++ lets us traverse the first argument without evaluating the second.
If we use step _ acc = acc ++ xs, the parentheses in the above expression associate to the left instead of right. But since we have an infinite number of appends, the expression ends up like this:
((((((((((((((... -- parentheses all the way down
Intuitively, we would have to step over an infinite number of parentheses to inspect the first element of the resulting list.
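To make that concrete, here is the flipped variant (badCycle is my name for it); demanding even its head diverges:

badCycle :: [a] -> [a]
badCycle xs = foldr step [] [1..]
  where step _ acc = acc ++ xs
  -- acc is now the LEFT operand of (++); (++) must pattern-match on it,
  -- which forces the next step's (++), and so on forever

-- head (badCycle [1,2,3])  -- never returns (or overflows the stack)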

Efficient version of 'inits'

That is, inits "abc" == ["","a","ab","abc"]
There is a standard version of inits in Data.List, but below I have written a version myself:
myInits = f id
  where
    f start (l:ls) = (start []) : f (start . (l:)) ls
    f start []     = [start []]
Whilst my version is quite a bit simpler than the standard version, I suspect it's not as good for efficiency reasons. I suspect that when myInits l is fully evaluated it takes O(n^2) space. Take for example, myTails, an implementation of tails:
myTails a@(_:ls) = a : myTails ls
myTails []       = [[]]
Which is almost exactly the same as the standard version and I suspect achieves O(n) space by reusing the tails of the lists.
Could someone explain:
Why my version of inits is bad.
Why another approach is better (either the standard one in Data.List or your own).
Your myInits uses a technique called a difference list to make functions that build lists in linear time. I believe, but haven't checked, that completely evaluating myInits takes O(n^2) time and O(n^2) space. Fully evaluating inits also requires O(n^2) time and space. Any version of inits will require O(n^2) space: lists built with : and [] can only share their tails, and there are no tails in common among the results of inits.
The version of inits in Data.List uses an amortized O(1) queue, much like the simpler queue described in the second half of a related answer. The Snoc referenced in the Data.List source code is word play on Cons (another name for :) backwards: it appends an item to the end of the list.
Briefly experimenting with these functions suggests myInits performs satisfactorily when used sparsely on a large list. On my computer, in ghci, myInits [1..] !! 8000000 yields results in a few seconds. Unfortunately, I have the horrifyingly inefficient implementation that shipped with ghc 7.8.3, so I can't compare myInits to inits.
Strictness
There is one big difference between myInits and inits and between myTails and tails. They have different meanings when applied to undefined or _|_ (pronounced "bottom", another symbol we use for undefined).
inits has the strictness property inits (xs ++ _|_) = inits xs ++ _|_, which, when xs is the empty list [], says that inits will still yield at least one result when applied to undefined:
inits (xs ++ _|_) = inits xs ++ _|_
inits ([] ++ _|_) = inits [] ++ _|_
inits _|_ = [[]] ++ _|_
inits _|_ = [] : _|_
We can see this experimentally
> head . inits $ undefined
[]
myInits does not have this property either for the empty list or for longer lists.
> head $ myInits undefined
*** Exception: Prelude.undefined
> take 3 $ myInits ([1,2] ++ undefined)
[[],[1]*** Exception: Prelude.undefined
We can fix this if we realize that f in myInits would yield start [] in either branch. Therefore, we can delay the pattern matching until it is needed to decide what to do next.
myInits' = f id
  where
    f start list = (start []) :
      case list of
        (l:ls) -> f (start . (l:)) ls
        []     -> []
This increase in laziness makes myInits' work just like inits.
> head $ myInits' undefined
[]
> take 3 $ myInits' ([1,2] ++ undefined)
[[],[1],[1,2]]
Similarly, the difference between your myTails and tails in Data.List is that tails yields the entire list as the first result before deciding if there will be a remainder of the list. The documentation says it obeys tails _|_ = _|_ : _|_, but it actually obeys a much stronger rule that's hard to describe easily.
The building of the prefix functions can be separated from their reification as actual lists:
diffInits = map ($ []) . scanl (\a x -> a . (x:)) id
This is noticeably faster (tested inside GHCi), and is lazier than your version (see Cirdec's answer for the discussion):
diffInits _|_ == [] : _|_
diffInits (xs ++ _|_) == diffInits xs ++ _|_
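For example:

> diffInits "abc"
["","a","ab","abc"]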

The performance of (++) with lazy evaluation

I have been wondering about this a lot, and I haven't found satisfying answers.
Why is (++) "expensive"? Under lazy evaluation, we won't evaluate an expression like
xs ++ ys
before it is necessary, and even then, we will only evaluate the parts we need, when we need them.
Can someone explain what I'm missing?
If you access the whole resulting list, lazy evaluation won't save any computation. It will only delay it until you need each particular element, but at the end, you have to compute the same thing.
If you traverse the concatenated list xs ++ ys, accessing each element of the first part (xs) adds a little constant overhead, checking if xs was spent or not.
So, it makes a big difference if you associate ++ to the left or to the right.
If you associate n lists of length k to the left like (..(xs1 ++ xs2) ... ) ++ xsn then accessing each of the first k elements will take O(n) time, accessing each of the next k ones will take O(n-1) etc. So traversing the whole list will take O(k n^2). You can check that
sum $ foldl (++) [] (replicate 100000 [1])
takes really long.
If you associate n lists of length k to the right like xs1 ++ ( .. (xs(n-1) ++ xsn) .. ) then you'll get only constant overhead for each element, so traversing the whole list will be only O(k n). You can check that
sum $ foldr (++) [] (replicate 100000 [1])
is quite reasonable.
Edit: This is just the magic hidden behind ShowS. If you convert each string xs to showString xs :: String -> String (showString is just an alias for (++)) and compose these functions, then no matter how you associate their composition, at the end they will be applied from right to left - just what we need to get the linear time complexity. (This is simply because (f . g) x is f (g x).)
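As a concrete sketch of that style (greeting is just an illustrative name; ShowS and showString are from the Prelude, and ShowS is a synonym for String -> String):

greeting :: ShowS
greeting = showString "hello" . showString ", " . showString "world"

main :: IO ()
main = putStrLn (greeting "")   -- prints "hello, world"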
You can check that both
length $ (foldl (.) id (replicate 1000000 (showString "x"))) ""
and
length $ (foldr (.) id (replicate 1000000 (showString "x"))) ""
run in a reasonable time (foldr is a bit faster because it has less overhead when composing functions from the right, but both are linear in the number of elements).
It's not too expensive on its own; the problem arises when you start combining a whole lot of ++ from left to right. Such a chain is evaluated like
( ([1,2] ++ [3,4]) ++ [5,6] ) ++ [7,8]
≡ let a = ([1,2] ++ [3,4]) ++ [5,6]
        ≡ let b = [1,2] ++ [3,4]
                ≡ let c = [1,2]
                  in  head c : tail c ++ [3,4]
          ≡ 1 : [2] ++ [3,4]
          ≡ 1 : 2 : [] ++ [3,4]
          ≡ 1 : 2 : [3,4]
          ≡ [1,2,3,4]
          in  head b : tail b ++ [5,6]
        ≡ 1 : [2,3,4] ++ [5,6]
        ≡ 1:2 : [3,4] ++ [5,6]
        ≡ 1:2:3 : [4] ++ [5,6]
        ≡ 1:2:3:4 : [] ++ [5,6]
        ≡ 1:2:3:4:[5,6]
        ≡ [1,2,3,4,5,6]
  in  head a : tail a ++ [7,8]
≡ 1 : [2,3,4,5,6] ++ [7,8]
≡ 1:2 : [3,4,5,6] ++ [7,8]
≡ 1:2:3 : [4,5,6] ++ [7,8]
≡ 1:2:3:4 : [5,6] ++ [7,8]
≡ 1:2:3:4:5 : [6] ++ [7,8]
≡ 1:2:3:4:5:6 : [] ++ [7,8]
≡ 1:2:3:4:5:6 : [7,8]
≡ [1,2,3,4,5,6,7,8]
where you clearly see the quadratic complexity. Even if you only want to evaluate up to the n-th element, you still have to dig your way through all those lets. That's why ++ is infixr, for [1,2] ++ ( [3,4] ++ ([5,6] ++ [7,8]) ) is actually much more efficient. But if you're not careful while designing, say, a simple serialiser, you may easily end up with a chain like the one above. This is the main reason why beginners are warned about ++.
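(For reference, the fixity declaration from the Prelude:)

infixr 5 ++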
That aside, Prelude.++ is slow compared to e.g. ByteString operations for the simple reason that it works by traversing linked lists, which always have suboptimal cache usage etc., but that's not as problematic: it prevents you from achieving C-like performance, but properly written programs using only plain lists and ++ can still easily rival similar programs written in e.g. Python.
I would like to add a thing or two to Petr's answer.
As he pointed out, repeatedly appending lists at the beginning is quite cheap, while appending at the end is not. This is true as long as you use Haskell's lists.
However, there are certain circumstances in which you HAVE TO append at the end (e.g., you are building a string to be printed out). With regular lists you have to deal with the quadratic complexity mentioned in his answer, but there's a far better solution in these cases: difference lists (see also my question on the topic).
Long story short, by describing lists as compositions of functions instead of concatenations of shorter lists, you can append lists or individual elements at the beginning or at the end of your difference list by composing functions, in constant time. Once you're done, you can extract a regular list in linear time (in the number of elements).
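A minimal sketch of the idea (the names here are illustrative; the dlist package provides a polished version):

-- a difference list represents a list as a "prepend me" function
type DList a = [a] -> [a]

fromList :: [a] -> DList a
fromList = (++)

toList :: DList a -> [a]
toList f = f []

-- O(1) concatenation and O(1) append-at-the-end, both by composition
append :: DList a -> DList a -> DList a
append = (.)

snoc :: DList a -> a -> DList a
snoc f x = f . (x:)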
