Efficient version of 'inits' - haskell

That is, inits "abc" == ["","a","ab","abc"]
There is a standard version of inits in Data.List, but below I have written a version myself:
myInits = f id
  where
    f start (l:ls) = (start []) : f (start . (l:)) ls
    f start []     = [start []]
Whilst my version is quite a bit simpler than the standard version, I suspect it's not as good for efficiency reasons. I suspect that when myInits l is fully evaluated it takes O(n^2) space. Take, for example, myTails, an implementation of tails:
myTails a@(_:ls) = a : myTails ls
myTails [] = [[]]
This is almost exactly the same as the standard version, and I suspect it achieves O(n) space by reusing the tails of the lists.
Could someone explain:
Why my version of inits is bad.
Why another approach is better (either the standard one in Data.List or your own).

Your myInits uses a technique called a difference list to build lists in linear time. I believe, but haven't checked, that completely evaluating myInits takes O(n^2) running time and O(n^2) space. Fully evaluating inits also requires O(n^2) running time and space. Any version of inits will require O(n^2) space: lists built with : and [] can only share their tails, and there are no tails in common among the results of inits. The version of inits in Data.List uses an amortized O(1) queue, much like the simpler queue described in the second half of a related answer. The Snoc referenced in the Data.List source code is wordplay on Cons (another name for :) spelled backwards: appending an item to the end of a list rather than the front.
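As a quick illustration of the difference-list idea (growRight and builtLeftToRight below are my own sketch, not part of the original code): a partially built list is represented as a function [a] -> [a], appending one element on the right is a single composition with (x:), and the finished list is obtained by applying the function to [].

growRight :: ([a] -> [a]) -> a -> ([a] -> [a])
growRight start x = start . (x:)

builtLeftToRight :: [Int]
builtLeftToRight = foldl growRight id [1,2,3] $ []   -- [1,2,3]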
Briefly experimenting with these functions suggests myInits performs satisfactorily when used sparsely on a large list. On my computer, in ghci, myInits [1..] !! 8000000 yields results in a few seconds. Unfortunately, I have the horrifyingly inefficient implementation that shipped with ghc 7.8.3, so I can't compare myInits to inits.
Strictness
There is one big difference between myInits and inits and between myTails and tails. They have different meanings when applied to undefined or _|_ (pronounced "bottom", another symbol we use for undefined).
inits has the strictness property inits (xs ++ _|_) = inits xs ++ _|_, which, when xs is the empty list [], says that inits will still yield at least one result when applied to undefined:
inits (xs ++ _|_) = inits xs ++ _|_
inits ([] ++ _|_) = inits [] ++ _|_
inits _|_ = [[]] ++ _|_
inits _|_ = [] : _|_
We can see this experimentally:
> head . inits $ undefined
[]
myInits does not have this property either for the empty list or for longer lists.
> head $ myInits undefined
*** Exception: Prelude.undefined
> take 3 $ myInits ([1,2] ++ undefined)
[[],[1]*** Exception: Prelude.undefined
We can fix this if we realize that f in myInits would yield start [] in either branch. Therefore, we can delay the pattern matching until it is needed to decide what to do next.
myInits' = f id
  where
    f start list = (start []) :
      case list of
        (l:ls) -> f (start . (l:)) ls
        []     -> []
This increase in laziness makes myInits' work just like inits.
> head $ myInits' undefined
[]
> take 3 $ myInits' ([1,2] ++ undefined)
[[],[1],[1,2]]
Similarly, the difference between your myTails and tails in Data.List is that tails yields the entire list as the first result before deciding if there will be a remainder of the list. The documentation says it obeys tails _|_ = _|_ : _|_, but it actually obeys a much stronger rule that's hard to describe easily.
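A lazier variant of myTails in the same spirit as myInits' (my own sketch, not from the original post) yields the whole argument before pattern matching on it, and so also satisfies tails _|_ = _|_ : _|_:

myTails' :: [a] -> [[a]]
myTails' xs = xs : case xs of
                     (_:rest) -> myTails' rest
                     []       -> []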

The building of the prefix functions can be separated from their reification as actual lists:
diffInits = map ($ []) . scanl (\a x -> a . (x:)) id
This is noticeably faster (tested inside GHCi), and is lazier than your version (see Cirdec's answer for the discussion):
diffInits _|_ == [] : _|_
diffInits (xs ++ _|_) == diffInits xs ++ _|_
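Checking those two properties in GHCi (with the diffInits above in scope):

> head (diffInits undefined)
[]
> take 3 (diffInits ([1,2] ++ undefined))
[[],[1],[1,2]]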

Related

Why does function concat use foldr? Why not foldl'

In most resources it is recommended to use foldl', so what is the cause of using foldr in concat instead of foldl'?
EDIT: I talk about laziness and productivity in this answer, and in my excitement I forgot a very important point that jpmariner focuses on in their answer: left-associating (++) is quadratic time!
foldl' is appropriate when your accumulator is a strict type, like most small types such as Int, or even large spine-strict data structures like Data.Map. If the accumulator is strict, then the entire list must be consumed before any output can be given. foldl' uses tail recursion to avoid blowing up the stack in these cases, but foldr doesn't and will perform badly. On the other hand, foldl' must consume the entire list in this way.
foldl f z [] = z
foldl f z [1] = f z 1
foldl f z [1,2] = f (f z 1) 2
foldl f z [1,2,3] = f (f (f z 1) 2) 3
The final element of the list is required to evaluate the outermost application, so there is no way to partially consume the list. If we expand this with (++), we will see:
foldl (++) [] [[1,2],[3,4],[5,6]]
= (([] ++ [1,2]) ++ [3,4]) ++ [5,6]
^^
= ([1,2] ++ [3,4]) ++ [5,6]
= ((1 : [2]) ++ [3,4]) ++ [5,6]
^^
= (1 : ([2] ++ [3,4])) ++ [5,6]
^^
= 1 : (([2] ++ [3,4]) ++ [5,6])
(I admit this looks a little magical if you don't have a good feel for cons lists; it's worth getting dirty with the details though)
See how we have to evaluate every (++) (marked with ^^ when they are evaluated) on the way down before the 1 bubbles out to the front? The 1 is "hiding" under function applications until then.
foldr, on the other hand, is good for non-strict accumulators like lists, because it allows the accumulator to yield information before the entire list is consumed, which can bring many classically linear-space algorithms down to constant space! This also means that if your list is infinite, foldr is your only choice, unless your goal is to heat your room using your CPU.
foldr f z [] = z
foldr f z [1] = f 1 z
foldr f z [1,2] = f 1 (f 2 z)
foldr f z [1,2,3] = f 1 (f 2 (f 3 z))
foldr f z [1..] = f 1 (f 2 (f 3 (f 4 (f 5 ...
We have no trouble expressing the outermost applications without having to see the entire list. Expanding foldr the same way we did foldl:
foldr (++) [] [[1,2],[3,4],[5,6]]
= [1,2] ++ ([3,4] ++ ([5,6] ++ []))
= (1 : [2]) ++ ([3,4] ++ ([5,6] ++ []))
^^
= 1 : ([2] ++ ([3,4] ++ ([5,6] ++ [])))
1 is yielded immediately without having to evaluate any of the (++)s but the first one. Because none of those (++)s are evaluated, and Haskell is lazy, they don't even have to be generated until more of the output list is consumed, meaning concat can run in constant space for an expression like this
concat [ [1..n] | n <- [1..] ]
which in a strict language would require intermediate lists of arbitrary length.
If these reductions look a little too magical, and if you want to go deeper, I suggest examining the source of (++) and doing some simple manual reductions against its definition to get a feel for it. (Just remember [1,2,3,4] is notation for 1 : (2 : (3 : (4 : [])))).
In general, the following seems to be a strong rule of thumb for efficiency: use foldl' when your accumulator is a strict data structure, and foldr when it's not. And if you see a friend using regular foldl and don't stop them, what kind of friend are you?
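A quick illustration of that rule of thumb; sumAll and squaresOfEvens below are my own example names, not from the answer above:

import Data.List (foldl')

-- strict accumulator (a single Int): foldl' is the right tool
sumAll :: [Int] -> Int
sumAll = foldl' (+) 0

-- lazy accumulator (a list being built): foldr yields output incrementally,
-- so e.g. take 3 (squaresOfEvens [1..]) == [4,16,36] even on an infinite input
squaresOfEvens :: [Int] -> [Int]
squaresOfEvens = foldr step []
  where
    step x acc
      | even x    = x * x : acc
      | otherwise = acc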
cause of using foldr in concat instead of foldl'?
What if the result gets fully evaluated?
If you consider [1,2,3] ++ [6,7,8] within an imperative programming mindset, all you have to do is redirect the next pointer at node 3 towards node 6, assuming of course you may alter your left side operand.
This being Haskell, you may NOT alter your left side operand, unless the optimizer is able to prove that ++ is the sole user of its left side operand.
Short of such a proof, other Haskell expressions pointing to node 1 have every right to assume that node 1 is forever at the beginning of a list of length 3. In Haskell, the properties of a pure expression cannot be altered during its lifetime.
So, in the general case, operator ++ has to do its job by duplicating its left side operand, and the duplicate of node 3 may then be set to point to node 6. On the other hand, the right side operand can be taken as is.
So if you fold the concat expression starting from the right, each component of the concatenation must be duplicated exactly once. But if you fold the expression starting from the left, you are facing a lot of repetitive duplication work.
Let's try to check that quantitatively. To ensure that no optimizer will get in the way by proving anything, we'll just use the ghci interpreter. Its strong point is interactivity not optimization.
So let's introduce the various candidates to ghci, and switch statistics mode on:
$ ghci
λ>
λ> import qualified Data.List as L
λ>
λ> myConcat0 = L.foldr (++) []
λ> myConcat1 = L.foldl (++) []
λ> myConcat2 = L.foldl' (++) []
λ>
λ> :set +s
λ>
We'll force full evaluation by using lists of numbers and printing their sum.
First, let's get baseline performance by folding from the right:
λ>
λ> sum $ concat [ [x] | x <- [1..10000::Integer] ]
50005000
(0.01 secs, 3,513,104 bytes)
λ>
λ> sum $ myConcat0 [ [x] | x <- [1..10000::Integer] ]
50005000
(0.01 secs, 3,513,144 bytes)
λ>
Second, let's fold from the left, to see whether that improves matters or not.
λ>
λ> sum $ myConcat1 [ [x] | x <- [1..10000::Integer] ]
50005000
(1.26 secs, 4,296,646,240 bytes)
λ>
λ> sum $ myConcat2 [ [x] | x <- [1..10000::Integer] ]
50005000
(1.28 secs, 4,295,918,560 bytes)
λ>
So folding from the left allocates much more transient memory and takes much more time, probably because of this repetitive duplication work.
As a last check, let's double the problem size:
λ>
λ> sum $ myConcat2 [ [x] | x <- [1..20000::Integer] ]
200010000
(5.91 secs, 17,514,447,616 bytes)
λ>
We see that doubling the problem size causes resource consumption to be multiplied by about 4. Folding from the left has quadratic cost in the case of concat.
Looking at the excellent answer by luqui, we see that both concerns:
the need to be able to access the beginning of the result list lazily
the need to avoid quadratic cost for full evaluation
both happen to vote in the same way, that is, in favor of folding from the right.
Hence the Haskell library's concat function uses foldr.
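For reference, the Report defines concat as essentially the one-liner below; GHC's real definition adds list-fusion machinery on top, but the fold direction is the same:

concat :: [[a]] -> [a]
concat xss = foldr (++) [] xss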
Addendum:
After running some tests using GHC v8.6.5 with the -O3 option instead of ghci, it appears that my preconceived idea of the optimizer messing up the measurements was erroneous.
Even with -O3, for a problem size of 20,000, the foldr-based concat function is about 500 times faster than the foldl'-based one.
So either the optimizer fails to prove that it is OK to alter/reuse the left operand, or it just does not try at all.

Haskell Cycle function

The code for cycle is as follows:
cycle :: [a] -> [a]
cycle [] = errorEmptyList "cycle"
cycle xs = xs' where xs' = xs ++ xs'
I would appreciate an explanation of how the last line works. I feel that it would go off into an infinite recursion without returning, or shall I say without printing, anything on the screen. I guess my intuition is wrong.
A list, like basically everything else in Haskell, is lazily evaluated.
Roughly, to make an OOP analogy, you can think of a list as a sort of "iterator object". When queried, it reports whether there is a next element, and if so, what that element is and what the tail of the list is (that tail being another "iterator object").
A list defined as
xs = 1 : xs
does not cause non termination. It corresponds to an "iterator object" o that, when queried, answers: "the next element is 1, and the rest of the list can be queried using o". Basically, it returns itself.
This is no different than a list having as a tail a "pointer" to the list itself: a circular list. This takes a constant amount of space.
Appending with ++ works the same:
xs = [1] ++ xs
is identical to the previous list.
In your code, the part
where xs' = xs ++ xs'
crafts a list that starts with xs and then continues with the list itself xs'. Operationally, it is an "iterator object" o that returns, one by one, the elements of xs, and when the last element of xs is returned, it is paired with "you can query the rest of the list at o". Again, a back-pointer, which builds a sort of circular list.
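A quick check in ghci (my own example) shows that only as much of that circular structure as we demand is ever produced:

> take 7 (cycle [1,2,3])
[1,2,3,1,2,3,1]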
Let's take out the last line separately:
cycle xs = xs' where xs' = xs ++ xs'
Now, let's try to reduce it:
cycle xs = xs ++ (xs ++ (xs ++ (xs ++ ...)))
You can see that it expands infinitely. But note that this is not how expressions get reduced in Haskell. Expressions are reduced to WHNF only when that is demanded. So, let's demand some values from the cycle function:
ghci > take 1 $ cycle [1..]
[1]
This is how the take function is implemented:
take n _ | n <= 0 = []
take _ [] = []
take n (x:xs) = x : take (n-1) xs
Now, because of the guard n <= 0 in the first equation, the value n is evaluated first. Since it is already in normal form, no further reduction needs to be done, and it is checked whether it is less than or equal to zero. Since that condition fails, we move on to the second equation. Here, the second argument is checked to see if it is equal to []. As usual, Haskell evaluates it only to WHNF, which is 1:_, where _ represents a thunk. Now the whole expression reduces to 1 : take 0 _. Since this value has to be printed in ghci, 1 : take 0 _ is reduced further. Following similar steps, we get 1:[], which is [1].
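Spelling that out as a reduction sketch (using _ for an unevaluated thunk, as above):

take 1 (cycle [1..])
= take 1 (1 : _)     -- cycle's result reduced to WHNF
= 1 : take 0 _       -- third equation of take
= 1 : []             -- first equation of take, since 0 <= 0
= [1]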
Hence cycle [1,2,3] gets reduced to WHNF of the form (1:_) and, under take 1, is ultimately reduced to [1]. But if the cycle function itself is strict in its implementation, then it will just go into an infinite loop:
-- needs Control.DeepSeq (from the deepseq package) for NFData and deepseq
cycle :: NFData a => [a] -> [a]
cycle [] = []
cycle xs = let xs' = xs ++ xs'
           in deepseq xs xs'
You can test that in ghci:
ghci > take 1 $ cycle [1..]
^CInterrupted.

Avoiding ++ in Haskell

In this answer on CodeReview, the question asker and the answerer both seem to show disdain for the (++) operator. Is this due to its speed (causing the algorithm to run in O(n^2), where n is the length of the list, iirc)? Is this premature optimization if not otherwise tested, as Haskell is known for being difficult to reason about in terms of time complexity? Should others avoid the (++) operator in their programs?
It depends.
Consider the expression
foldl (++) [] list
This expression concatenates a list of lists into a single list, but has the aforementioned quadratic complexity. This happens because the implementation of (++) traverses the entire left list and prepends each of its elements to the right list (while keeping the correct order, of course).
Using a right fold, we get linear complexity:
foldr (++) [] list
This is due to the (++) operator's implementation, which traverses only the left argument and prepends it to the right.
[1,2] ++ [3,4] ++ [5,6]
is equal to
-- Example as created by foldr
[1,2] ++ ([3,4] ++ [5,6])
== [1,2] ++ [3,4,5,6]
== [1,2,3,4,5,6] -- all good, no element traversed more than once
which only traverses each list element once.
Switching the parentheses around so that the first two lists are grouped is more expensive, since some elements are now traversed multiple times.
-- Example as created by foldl
([1,2] ++ [3,4]) ++ [5,6]
== [1,2,3,4] ++ [5,6]
== [1,2,3,4,5,6] -- the sublist [1,2] was traversed twice due to the ordering of the appends
All in all, watch out for such cases and you should be fine.
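If you really do need to append repeatedly on the right, a common workaround is the difference-list idea already used by myInits in the first question above; DList, toDList, fromDList and concatLeft below are illustrative names of my own:

type DList a = [a] -> [a]

toDList :: [a] -> DList a
toDList xs = (xs ++)

fromDList :: DList a -> [a]
fromDList d = d []

-- left-to-right accumulation stays cheap: each step is one function
-- composition, and every (++) is evaluated only once, right-associated,
-- when fromDList finally applies the whole chain to []
concatLeft :: [[a]] -> [a]
concatLeft = fromDList . foldl (\acc xs -> acc . toDList xs) id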

What are the space complexities of inits and tails?

TL;DR
After reading the passage about persistence in Okasaki's Purely Functional Data Structures and going over his illustrative examples about singly linked lists (which is how Haskell's lists are implemented), I was left wondering about the space complexities of Data.List's inits and tails...
It seems to me that
the space complexity of tails is linear in the length of its argument, and
the space complexity of inits is quadratic in the length of its argument,
but a simple benchmark indicates otherwise.
Rationale
With tails, the original list can be shared. Computing tails xs simply consists in walking along list xs and creating a new pointer to each element of that list; no need to recreate part of xs in memory.
In contrast, because each element of inits xs "ends in a different way", there can be no such sharing, and all the possible prefixes of xs must be recreated from scratch in memory.
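To see why the sharing happens for tails, here is roughly the shape of its definition (a sketch, ignoring the list-fusion wrapper in Data.List): every element of the result is literally a suffix of the input, not a copy.

tails :: [a] -> [[a]]
tails xs = xs : case xs of
                  []       -> []
                  _ : rest -> tails rest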
Benchmark
The simple benchmark below shows there isn't much of a difference in memory allocation between the two functions:
-- Main.hs
import Data.List (inits, tails)

main = do
  let intRange = [1 .. 10 ^ 4] :: [Int]
  print $ sum intRange
  print $ fInits intRange
  print $ fTails intRange

fInits :: [Int] -> Int
fInits = sum . map sum . inits

fTails :: [Int] -> Int
fTails = sum . map sum . tails
After compiling my Main.hs file with
ghc -prof -fprof-auto -O2 -rtsopts Main.hs
and running
./Main +RTS -p
the Main.prof file reports the following:
COST CENTRE   MODULE   %time   %alloc
fInits        Main      60.1     64.9
fTails        Main      39.9     35.0
The memory allocated for fInits and that allocated for fTails have the same order of magnitude... Hum...
What is going on?
Are my conclusions about the space complexities of tails (linear) and inits (quadratic) correct?
If so, why does GHC allocate roughly as much memory for fInits and fTails? Does list fusion have something to do with this?
Or is my benchmark flawed?
The implementation of inits in the Haskell Report, which is identical or nearly identical to the implementations used up to base 4.7.0.1 (GHC 7.8.3), is horribly slow. In particular, the fmap applications stack up recursively, so forcing successive elements of the result gets slower and slower.
inits [1,2,3,4] = [] : fmap (1:) (inits [2,3,4])
= [] : fmap (1:) ([] : fmap (2:) (inits [3,4]))
= [] : [1] : fmap (1:) (fmap (2:) ([] : fmap (3:) (inits [4])))
....
The simplest asymptotically optimal implementation, explored by Bertram Felgenhauer, is based on applying take with successively larger arguments:
{-# LANGUAGE BangPatterns #-}

inits xs = [] : go (1 :: Int) xs
  where
    go !l (_:ls) = take l xs : go (l+1) ls
    go _  []     = []
Felgenhauer was able to eke some extra performance out of this using a private, non-fusing version of take, but it was still not as fast as it could be.
The following very simple implementation is significantly faster in most cases:
inits = map reverse . scanl (flip (:)) []
In some weird corner cases (like map head . inits), this simple implementation is asymptotically non-optimal. I therefore wrote a version using the same technique, but based on Chris Okasaki's Banker's queues, that is both asymptotically optimal and nearly as fast. Joachim Breitner optimized it further, primarily by using a strict scanl' rather than the usual scanl, and this implementation got into GHC 7.8.4. inits can now produce the spine of the result in O(n) time; forcing the entire result requires O(n^2) time because none of the conses can be shared among the different initial segments. If you want really absurdly fast inits and tails, your best bet is to use Data.Sequence; Louis Wasserman's implementation is magical. Another possibility would be to use Data.Vector—it presumably uses slicing for such things.
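For the Data.Sequence route, the container comes with its own inits and tails; a minimal usage sketch (prefixes and suffixes are my own names):

import qualified Data.Sequence as Seq
import Data.Sequence (Seq)

prefixes :: Seq a -> Seq (Seq a)
prefixes = Seq.inits

suffixes :: Seq a -> Seq (Seq a)
suffixes = Seq.tails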

How lazy is Haskell's `++`?

I'm curious how I should go about improving the performance of a Haskell routine that finds the lexicographically minimal cyclic rotation of a string.
import Data.List
swapAt n = f . splitAt n where f (a,b) = b++a
minimumrotation x = minimum $ map (\i -> swapAt i x) $ elemIndices (minimum x) x
I'd imagine that I should use Data.Vector rather than lists because Data.Vector provides in-place operations, probably just manipulating some indices into the original data. I shouldn't actually need to bother tracking the indices myself to avoid excess copying, right?
I'm curious how the ++ impacts the optimization though. I'd imagine it produces a lazy string thunk that never does the appending until the string gets read that far. Ergo, the a should never actually be appended onto the b whenever minimum can eliminate that string early, say because it begins with some later letter. Is this correct?
xs ++ ys adds some overhead in all the list cells from xs, but once it reaches the end of xs it's free — it just returns ys.
Looking at the definition of (++) helps to see why:
[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)
i.e., it has to "re-build" the entire first list as the result is traversed. This article is very helpful for understanding how to reason about lazy code in this way.
The key thing to realise is that appending isn't done all at once; a new linked list is incrementally built by first walking through all of xs, and then putting ys where the [] would go.
So, you don't have to worry about reaching the end of b and suddenly incurring the one-time cost of "appending" a to it; the cost is spread out over all the elements of b.
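You can see this in ghci (my own examples): only as much of the left operand as is demanded ever gets rebuilt, and the right operand is untouched until the left one runs out:

> head ([1,2,3] ++ undefined)
1
> take 2 ([1,2,3] ++ undefined)
[1,2]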
Vectors are a different matter entirely; they're strict in their structure, so even examining just the first element of xs V.++ ys incurs the entire overhead of allocating a new vector and copying xs and ys to it — just like in a strict language. The same applies to mutable vectors (except that the cost is incurred when you perform the operation, rather than when you force the resulting vector), although I think you'd have to write your own append operation with those anyway. You could represent a bunch of appended (immutable) vectors as [Vector a] or similar if this is a problem for you, but that just moves the overhead to when you flatten it back into a single Vector, and it sounds like you're more interested in mutable vectors.
Try
minimumrotation :: Ord a => [a] -> [a]
minimumrotation xs = minimum . take len . map (take len) $ tails (cycle xs)
  where
    len = length xs
I expect that to be faster than what you have, though index-juggling on an unboxed Vector or UArray would probably be still faster. But, is it really a bottleneck?
If you're interested in fast concatenation and a fast splitAt, use Data.Sequence.
I've made some stylistic modifications to your code, to make it look more like idiomatic Haskell, but the logic is exactly the same, except for a few conversions to and from Seq:
import qualified Data.Sequence as S
import qualified Data.Foldable as F

minimumRotation :: Ord a => [a] -> [a]
minimumRotation xs = F.toList
                   . F.minimum
                   . fmap (`swapAt` xs')
                   . S.elemIndicesL (F.minimum xs')
                   $ xs'
  where
    xs' = S.fromList xs
    swapAt n = f . S.splitAt n
      where f (a,b) = b S.>< a

Resources