How lazy is Haskell's `++`?

I'm curious how I should go about improving the performance of a Haskell routine that finds the lexicographically minimal cyclic rotation of a string.
import Data.List
swapAt n = f . splitAt n where f (a,b) = b++a
minimumrotation x = minimum $ map (\i -> swapAt i x) $ elemIndices (minimum x) x
I'd imagine that I should use Data.Vector rather than lists because Data.Vector provides in-place operations, probably just manipulating some indices into the original data. I shouldn't actually need to bother tracking the indices myself to avoid excess copying, right?
I'm curious how the ++ impacts the optimization, though. I'd imagine it produces a lazy string thunk that never does the appending until the string gets read that far. Ergo, a should never actually be appended onto b whenever minimum can eliminate that string early, say because it begins with a letter late in the alphabet. Is this correct?

xs ++ ys adds some overhead in all the list cells from xs, but once it reaches the end of xs it's free — it just returns ys.
Looking at the definition of (++) helps to see why:
[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)
i.e., it has to "re-build" the entire first list as the result is traversed. This article is very helpful for understanding how to reason about lazy code in this way.
The key thing to realise is that appending isn't done all at once; a new linked list is incrementally built by first walking through all of xs, and then putting ys where the [] would go.
So, you don't have to worry about reaching the end of b and suddenly incurring the one-time cost of "appending" a to it; the cost is spread out over all the elements of b.
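To make the laziness concrete, here is a tiny check (illustrative, not from the original answer): head demands only the first cons of the result, so the appended right-hand list is never evaluated.
-- forcing only the head of xs ++ ys never touches ys
demoHead :: Int
demoHead = head ([1, 2, 3] ++ undefined)  -- evaluates to 1, no exception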
Vectors are a different matter entirely; they're strict in their structure, so even examining just the first element of xs V.++ ys incurs the entire overhead of allocating a new vector and copying xs and ys to it — just like in a strict language. The same applies to mutable vectors (except that the cost is incurred when you perform the operation, rather than when you force the resulting vector), although I think you'd have to write your own append operation with those anyway. You could represent a bunch of appended (immutable) vectors as [Vector a] or similar if this is a problem for you, but that just moves the overhead to when you flatten it back into a single Vector, and it sounds like you're more interested in mutable vectors.

Try
minimumrotation :: Ord a => [a] -> [a]
minimumrotation xs = minimum . take len . map (take len) $ tails (cycle xs)
  where
    len = length xs
I expect that to be faster than what you have, though index-juggling on an unboxed Vector or UArray would probably be still faster. But, is it really a bottleneck?

If you're interested in fast concatenation and a fast splitAt, use Data.Sequence.
I've made some stylistic modifications to your code, to make it look more like idiomatic Haskell, but the logic is exactly the same, except for a few conversions to and from Seq:
import qualified Data.Sequence as S
import qualified Data.Foldable as F
minimumRotation :: Ord a => [a] -> [a]
minimumRotation xs = F.toList
                   . F.minimum
                   . fmap (`swapAt` xs')
                   . S.elemIndicesL (F.minimum xs')
                   $ xs'
  where
    xs' = S.fromList xs
    swapAt n = f . S.splitAt n
      where f (a,b) = b S.>< a
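A quick sanity check in GHCi (session reconstructed for illustration, not from the original answer):
*Main> minimumRotation "bca"
"abc"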


Fast powerset implementation with complement set

I would like to have a function
powersetWithComplements :: [a] -> [([a], [a])]
such that, for example:
powersetWithComplements [1,2,3] = [([],[1,2,3]),([3],[1,2]),([2],[1,3]),([2,3],[1]),([1],[2,3]),([1,3],[2]),([1,2],[3]),([1,2,3],[])]
It is easy to obtain some implementation, for example
import Control.Monad (filterM)

powerset :: [a] -> [[a]]
powerset = filterM (const [False, True])

powersetWithComplements s = let p = powerset s in zip p (reverse p)
Or
powersetWithComplements s = [ (x, s \\ x) | x <- powerset s]
But I estimate that the performance of both of these would be really poor. What would be an optimal approach? It is possible to use a different data structure than the [] list.
Well, you should see a powerset like this: you enumerate over the items of the set, and you decide whether you put each item in the "selection" (first item of the tuple) or not (second item of the tuple). By enumerating these choices exhaustively, we get the powerset.
So we can do the same, for instance using recursion:
import Control.Arrow(first, second)
powersetWithComplements [] = [([],[])]
powersetWithComplements (x:xs) = map (second (x:)) rec ++ map (first (x:)) rec
  where rec = powersetWithComplements xs
So here the map (second (x:)) prepends x to the second item of every tuple in rec, and the map (first (x:)) does the same for the first item of the tuples of rec, where rec is the recursion on the tail of the items.
Prelude Control.Arrow> powersetWithComplements [1,2,3]
[([],[1,2,3]),([3],[1,2]),([2],[1,3]),([2,3],[1]),([1],[2,3]),([1,3],[2]),([1,2],[3]),([1,2,3],[])]
The advantage of this approach is that we do not generate a complement list for every list we generate: we concurrently build the selection and the complement. Furthermore we can reuse the lists we construct in the recursion, which reduces the memory footprint.
In both time complexity and memory complexity, the powersetWithComplements function is equal to the powerset function (note that this is asymptotic complexity; in absolute terms it will of course require more time, since we do an extra amount of work), since prepending to a list is usually O(1), and we now build two lists (and a tuple) for every original list.
Since you are looking for a "fast" implementation, I thought I would share some benchmark experiments I did with Willem's solution.
I thought using a DList instead of a plain list would be a big improvement, since DLists have constant-time append, whereas appending lists is linear in the size of the left argument.
import Control.Arrow (first, second)
import Data.DList (toList)
import qualified Data.DList as DList

psetDL :: [a] -> [([a],[a])]
psetDL = toList . go
  where
    go [] = DList.singleton ([],[])
    go (x:xs) = (second (x:) <$> rec) <> (first (x:) <$> rec)
      where
        rec = go xs
But that did not have a significant effect.
I suspected this is because we are traversing both sublists anyway because of the fmap ((<$>)). We can avoid that traversal by doing something similar to CPS-converting the function: passing down the accumulated sets as parameters rather than returning them.
psetTail :: [a] -> [([a],[a])]
psetTail = go [] []
  where
    go a b [] = [(a,b)]
    go a b (x:xs) = go a (x:b) xs <> go (x:a) b xs
This yielded a 220% improvement on a list of size 20. Now since we aren't traversing the lists from fmapping, we can get rid of the append traversal by using a DList:
psetTailDL :: [a] -> [([a],[a])]
psetTailDL = toList . go [] []
  where
    go a b [] = DList.singleton (a,b)
    go a b (x:xs) = go a (x:b) xs <> go (x:a) b xs
Which yields an additional 20% improvement.
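For reference, here is a sketch of how such measurements might be set up with criterion; the harness below is an assumption (the answer does not show its benchmark code), but the size-20 input matches the experiment described above.
import Criterion.Main (bench, defaultMain, nf)

main :: IO ()
main = defaultMain
  [ bench "psetDL"     $ nf psetDL     input
  , bench "psetTail"   $ nf psetTail   input
  , bench "psetTailDL" $ nf psetTailDL input
  ]
  where
    input = [1 .. 20] :: [Int]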
I guess the best is inspired by your reverse discovery:
partitions s = filterM (const [False,True]) s
         `zip` filterM (const [True,False]) s
rather than a likely stackoverflower
partitions [] = [([],[])]
partitions (x:xs) = [p | (l,r) <- partitions xs, p <- [(l,x:r),(x:l,r)]]
or a space-and-time-efficient finite list indexer
import Data.Array
import Data.Bits
import Data.List
partitions s = [ (map (a!) f, map (a!) t)
               | n <- [length s], a <- [listArray (0,n-1) s],
                 m <- [0..2^n-1], (f,t) <- [partition (testBit m) [0..n-1]] ]

How does GHC know how to cache one function but not the others?

I'm reading Learn You a Haskell (loving it so far) and it teaches how to implement elem in terms of foldl, using a lambda. The lambda solution seemed a bit ugly to me so I tried to think of alternative implementations (all using foldl):
import qualified Data.Set as Set
import qualified Data.List as List
-- LYAH implementation
elem1 :: (Eq a) => a -> [a] -> Bool
y `elem1` ys =
  foldl (\acc x -> if x == y then True else acc) False ys
-- When I thought about stripping duplicates from a list
-- the first thing that came to my mind was the mathematical set
elem2 :: (Eq a) => a -> [a] -> Bool
y `elem2` ys =
  head $ Set.toList $ Set.fromList $ filter (==True) $ map (==y) ys
-- Then I discovered `nub` which seems to be highly optimized:
elem3 :: (Eq a) => a -> [a] -> Bool
y `elem3` ys =
  head $ List.nub $ filter (==True) $ map (==y) ys
I loaded these functions in GHCi and did :set +s and then evaluated a small benchmark:
3 `elem1` [1..1000000] -- => (0.24 secs, 160,075,192 bytes)
3 `elem2` [1..1000000] -- => (0.51 secs, 168,078,424 bytes)
3 `elem3` [1..1000000] -- => (0.01 secs, 77,272 bytes)
I then tried to do the same on a (much) bigger list:
3 `elem3` [1..10000000000000000000000000000000000000000000000000000000000000000000000000]
elem1 and elem2 took a very long time, while elem3 was instantaneous (almost identical to the first benchmark).
I think this is because GHC knows that 3 is a member of [1..1000000], and the big number I used in the second benchmark is bigger than 1000000, hence 3 is also a member of [1..bigNumber] and GHC doesn't have to compute the expression at all.
But how is it able to automatically cache (or memoize, a term that Land of Lisp taught me) elem3 but not the other two?
Short answer: this has nothing to do with caching; it is because in the first two implementations you force Haskell to iterate over all the elements.
No, this is because foldl works left to right and will keep iterating until the list is exhausted.
Therefore you better use foldr. Here, from the moment it finds a 3 in the list, it cuts off the search.
This is because foldr is defined as:
foldr f z [x1, x2, x3] = f x1 (f x2 (f x3 z))
whereas foldl is implemented as:
foldl f z [x1, x2, x3] = f (f (f z x1) x2) x3
Note that with foldl the outermost f binds with x3, so even if, due to laziness, you do not evaluate the first operand, you still need to iterate to the end of the list to build that expression.
If we implement the foldl and foldr version, we get:
y `elem1l` ys = foldl (\acc x -> if x == y then True else acc) False ys
y `elem1r` ys = foldr (\x acc -> if x == y then True else acc) False ys
We then get:
Prelude> 3 `elem1l` [1..1000000]
True
(0.25 secs, 112,067,000 bytes)
Prelude> 3 `elem1r` [1..1000000]
True
(0.03 secs, 68,128 bytes)
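An equivalent way to write the foldr version makes the short-circuiting explicit via (||), which does not force its second argument once the first is True (elemR is a name introduced here for illustration):
-- (||) returns True without evaluating acc, so the fold stops at the first match
elemR :: Eq a => a -> [a] -> Bool
elemR y = foldr (\x acc -> x == y || acc) False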
Stripping the duplicates from the list will not improve the efficiency. What improves the efficiency here is that you use map, which works left-to-right. Note furthermore that nub works lazily, so nub is here a no-op: since you are only interested in the head, Haskell does not need to perform membership checks on the already-seen elements.
The performance is almost identical:
Prelude List> 3 `elem3` [1..1000000]
True
(0.03 secs, 68,296 bytes)
In case you work with a Set, however, you do not perform the uniqueness check lazily: you first fetch all the elements into the set, so again you will iterate over all the elements, and not cut off the search after the first hit.
Explanation
foldl combines the accumulator with the first element of the list, then recursively combines that result with the next element, and so on:
foldl f z [x1, x2, ..., xn] == (...((z `f` x1) `f` x2) `f`...) `f` xn
So in order to produce the result, it has to traverse the whole list.
Conversely, in your function elem3, as everything is lazy, nothing gets computed at all until you call head. But in order to compute that value, you just need the first value of the (filtered) list, so you only need to go as far as the first 3 encountered in your big list, which is very soon; the list is not fully traversed. If you asked for the 1000000th element, elem3 would probably perform as badly as the other ones.
Laziness
Laziness ensures that your language is always composable: breaking a function into subfunctions does not change what is done.
What you are seeing can lead to a space leak, which is really about how control flow works in a lazy language. Both in strict and in lazy languages, your code decides what gets evaluated, but with a subtle difference:
In a strict language, the builder of the function chooses, as it forces evaluation of its arguments: whoever is called is in charge.
In a lazy language, the consumer of the function chooses: whoever called is in charge. It may choose to evaluate only the first element (by calling head), or every other element. All that provided its own caller chooses to evaluate its own computation as well; there is a whole chain of command deciding what to do.
In that reading, your foldl-based elem function uses that "inversion of control" in an essential way: elem gets asked to produce a value; foldl goes deep inside the list. If the first element is y, then it returns the trivial computation True. If not, it forwards the request to the computation acc. In other words, what you read as the values acc, x, or even True are really placeholders for computations, which you receive and yield back. And indeed, acc may be some unbelievably complex computation (or a divergent one like undefined); as long as you transfer control to the computation True, your caller will never see the existence of acc.
foldr vs foldl vs foldl' vs speed
As suggested in another answer, foldr might best match your intent on how to traverse the list, and it will shield you from space leaks (foldl' will prevent space leaks as well if you really want to traverse the other way; plain lazy foldl can lead to a buildup of complex computations, which can nevertheless be very useful for circular computations, for instance).
But the speed issue is really an algorithmic one. There might be a better data structure for set membership, if and only if you know beforehand that you have a certain pattern of usage.
For instance, it might be useful to pay some upfront cost to build a Set and then have fast membership queries, but that is only useful if you know you will have such a pattern, with a few sets and lots of queries against those sets. Other data structures are optimal for other patterns, and it's interesting to note that, from an API/specification/interface point of view, they usually look the same to the consumer. That's a general phenomenon in any language, and it is why many people love abstract data types/modules in programming.
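As a sketch of that trade-off (the names below are illustrative, not from the question): pay the construction cost once, then every query is cheap.
import qualified Data.Set as Set

-- build the set once (O(n log n)); each query is then O(log n),
-- which only pays off when there are many queries per set
memberMany :: Ord a => [a] -> [a] -> [Bool]
memberMany universe qs = map (`Set.member` s) qs
  where
    s = Set.fromList universe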
Using foldr and expecting it to be faster really encodes the assumption that, given your static knowledge of your future access pattern, the values you are likely to test membership of will sit at the beginning of the list. Using foldl would be fine if you expect your values to be at the end.
Note that using foldl you might traverse the entire list, but you do not construct the values themselves until you need them, for instance to test for equality, as long as you have not found the searched element.

Problems with enforcing strictness in haskell

If I want to pretend that Haskell is strict and I have an algorithm in mind that does not exploit laziness (for instance, it does not use infinite lists), what problems can occur if I use only strict data types and annotate every function I use to be strict in its arguments? Will there be a performance penalty, and if so, how bad? Can worse problems occur? I know it is dirty, pointless and ugly to mindlessly make every function and data type strict, and I do not intend to do so in practice, but I only want to understand whether, by doing so, Haskell becomes strict by default?
Secondly, if I tone down the paranoia and only make the data structures strict: will I have to worry about space leaks brought about by a lazy implementation only when I am using some form of accumulation? In other words, assume that the algorithm would not exhibit a space leak in a strict language. Also assume that I implemented it in Haskell using only strict data structures, but was careful to use seq to evaluate any variable that was being passed on in a recursion, or used functions which internally are careful to do that (like foldl'): would I avoid any space leaks? Remember that I am assuming that in a strict language the same algorithm does not lead to a space leak. So it is a question about the implementation difference between lazy and strict.
The reason I ask the second question is that, apart from cases where one is trying to take advantage of laziness by using a lazy data structure (or a spine-strict one), all the examples of space leaks I have seen until now involve thunks developing in an accumulator because the function that was recursively called did not evaluate the accumulator before applying itself to it. I am aware that if one wants to take advantage of laziness then one has to be extra careful, but that caution would be needed in a strict-by-default language too.
Thank you.
Laziness speeding things up
You could be worse off. The naive definition of ++ is:
xs ++ ys = case xs of
  (x:xs) -> x : (xs ++ ys)
  []     -> ys
Laziness makes this O(1), though it may also add O(1) processing to extract each cons. Without laziness, the ++ needs to be evaluated immediately, causing an O(n) operation. (If you've never seen the O(.) notation, it is something computer science has stolen from engineers: given a function f, the set O(f(n)) is the set of all algorithms which are eventually at-worst-proportional to f(n), where n is the number of bits of input fed to the function. Formally, there exist k and N such that for all n > N the algorithm takes time less than k * f(n).) So I'm saying that laziness makes the above operation O(1), or eventually constant-time, but adds a constant overhead to each extraction, whereas strictness makes the operation O(n), or eventually linear in the number of list elements, assuming those elements have a fixed size.
There are some practical examples here, but the O(1) added processing time can potentially also "stack up" into an O(n) dependency, so the most obvious examples are O(n^2) both ways. Still, there can be a difference in these examples; for example, one situation that doesn't work well is using a stack (last-in first-out, which is the style of Haskell lists) for a queue (first-in first-out).
So here's a quick library consisting of strict left-folds; I've used case statements so that each line can be pasted into GHCi (with a let):
data SL a = Nil | Cons a !(SL a) deriving (Ord, Eq, Show)
slfoldl' f acc xs = case xs of Nil -> acc; Cons x xs' -> let acc' = f acc x in acc' `seq` slfoldl' f acc' xs'
foldl' f acc xs = case xs of [] -> acc; x : xs' -> let acc' = f acc x in acc' `seq` foldl' f acc' xs'
slappend xs ys = case xs of Nil -> ys; Cons x xs' -> Cons x (slappend xs' ys)
sl_test n = foldr Cons Nil [1..n]
test n = [1..n]
sl_enqueue xs x = slappend xs (Cons x Nil)
sl_queue = slfoldl' sl_enqueue Nil
enqueue xs x = xs ++ [x]
queue = foldl' enqueue []
The trick here is that both queue and sl_queue follow the xs ++ [x] pattern to append an element to the end of the list, which takes a list and builds up an exact copy of that list. GHCi can then run some simple tests. First we make two items and force their thunks to prove that this operation itself is quite fast and not too prohibitively expensive in memory:
*Main> :set +s
*Main> let vec = test 10000; slvec = sl_test 10000
(0.02 secs, 0 bytes)
*Main> [foldl' (+) 0 vec, slfoldl' (+) 0 slvec]
[50005000,50005000]
(0.02 secs, 8604632 bytes)
Now we do the actual tests: summing the queue-versions:
*Main> slfoldl' (+) 0 $ sl_queue slvec
50005000
(22.67 secs, 13427484144 bytes)
*Main> foldl' (+) 0 $ queue vec
50005000
(1.90 secs, 4442813784 bytes)
Notice that both of these suck in terms of memory performance (the list-append stuff is still secretly O(n^2)), eventually occupying gigabytes of space, but the strict version nevertheless occupies three times the space and takes ten times the time.
Sometimes the data structures should be changed
If you really want a strict queue, there are a couple of options. One is finger trees, as in Data.Sequence; the viewr way they do things is a little complicated but works to get the rightmost elements. However, that is a bit heavyweight, and one common solution is amortized O(1): define the structure
data Queue x = Queue !(SL x) !(SL x)
where the SL terms are the strict stacks above. Define a strict reverse, let's call it slreverse, the obvious way, then consider:
enqueue :: Queue x -> x -> Queue x
enqueue (Queue xs ys) el = Queue xs (Cons el ys)

dequeue :: Queue x -> Maybe (x, Queue x)
dequeue (Queue Nil Nil) = Nothing
-- front exhausted: reverse the back, so the oldest element ends up on top
dequeue (Queue Nil ys) = dequeue (Queue (slreverse ys) Nil)
dequeue (Queue (Cons x xs) ys) = Just (x, Queue xs ys)
This is "amortized O(1)": each time that a dequeue reverses the list, costing O(k) steps for some k, we ensure that we are creating a structure which won't have to pay these costs for k more steps.
Laziness hides errors
Another interesting point comes from the data/codata distinction, where data are finite structures traversed by recursion on subunits (that is, every data expression halts) while codata are the rest of the structures -- strict lists vs. streams. It turns out that when you properly make this distinction, there is no formal difference between strict data and lazy data -- the only formal difference between strict and lazy is how they handle terms within themselves which loop infinitely: strict will explore the loop and hence will also loop infinitely, while lazy will simply hand the infinite-loop onwards without descending into it.
As such you will find that slhead (Cons x undefined) will fail where head (x : undefined) succeeds (slhead being the evident head function for SL: the strict tail means Cons x undefined diverges as soon as it is forced). So you may "uncover" hidden infinite loops or bugs when you do this.
Caution when making "everything strict"
Not everything necessarily becomes strict when you use strict data structures in your language: notice that I made a point above to define strict foldl, not foldl, for both lists and strict-lists. Common data structures in Haskell will be lazy -- lists, tuples, stuff in popular libraries -- and explicit calls to seq still help when building up a complicated expression.

Long working of program that count Ints

I want to write a program that takes an array of Ints and a length, and returns an array that contains, at position i, all the elements equal to i. For example:
[0,0,0,1,3,5,3,2,2,4,4,4] 6 -> [[0,0,0],[1],[2,2],[3,3],[4,4,4],[5]]
[0,0,4] 7 -> [[0,0],[],[],[],[4],[],[]]
[] 3 -> [[],[],[]]
[2,2] 3 -> [[],[],[2,2]]
So, that's my solution
import Data.List
import Data.Function
f :: [Int] -> Int -> [[Int]]
f ls len = g 0 ls' []
  where
    ls' = group . sort $ ls
    g :: Int -> [[Int]] -> [[Int]] -> [[Int]]
    g val [] accum
      | len == val = accum
      | otherwise  = g (val+1) [] (accum ++ [[]])
    g val (x:xs) accum
      | len == val    = accum
      | val == head x = g (val+1) xs (accum ++ [x])
      | otherwise     = g (val+1) (x:xs) (accum ++ [[]])
But the query f [] 1000000 runs for a really long time. Why?
I see we're accumulating over some data structure. I think foldMap. I ask "Which Monoid"? It's some kind of lists of accumulations. Like this
newtype Bunch x = Bunch {bunch :: [x]}

instance Semigroup x => Semigroup (Bunch x) where
  Bunch xss <> Bunch yss = Bunch (glom xss yss)
    where
      glom [] yss = yss
      glom xss [] = xss
      glom (xs : xss) (ys : yss) = (xs <> ys) : glom xss yss

instance Semigroup x => Monoid (Bunch x) where
  mempty = Bunch []
Our underlying elements have some associative operator <>, and we can thus apply that operator pointwise to a pair of lists, just like zipWith does, except that when we run out of one of the lists, we don't truncate, rather we just take the other. Note that Bunch is a name I'm introducing for purposes of this answer, but it's not that unusual a thing to want. I'm sure I've used it before and will again.
If we can translate
0 -> Bunch [[0]] -- single 0 in place 0
1 -> Bunch [[],[1]] -- single 1 in place 1
2 -> Bunch [[],[],[2]] -- single 2 in place 2
3 -> Bunch [[],[],[],[3]] -- single 3 in place 3
...
and foldMap across the input, then we'll get the right number of each in each place. There should be no need for an upper bound on the numbers in the input to get a sensible output, as long as you are willing to interpret [] as "the rest is silence". Otherwise, like Procrustes, you can pad or chop to the length you need.
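Concretely, that translation plus foldMap could be sketched without any further machinery (groupIntsSimple is a name introduced here; it relies only on the Bunch monoid above):
-- i becomes i empty slots followed by [i]; the Bunch monoid then
-- zips these pointwise without truncating
groupIntsSimple :: [Int] -> [[Int]]
groupIntsSimple = bunch . foldMap (\i -> Bunch (replicate i [] ++ [[i]]))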
Note, by the way, that when mappend's first argument comes from our translation, we do a bunch of ([]++) operations, a.k.a. ids, then a single ([i]++), a.k.a. (i:), so if foldMap is right-nested (which it is for lists), then we will always be doing cheap operations at the left end of our lists.
Now, as the question works with lists, we might want to introduce the Bunch structure only when it's useful. That's what Control.Newtype is for. We just need to tell it about Bunch.
instance Newtype (Bunch x) [x] where
  pack = Bunch
  unpack = bunch
And then it's
groupInts :: [Int] -> [[Int]]
groupInts = ala' Bunch foldMap (basis !!) where
  basis = ala' Bunch foldMap id [iterate ([]:) [], [[[i]] | i <- [0..]]]
What? Well, without going to town on what ala' is in general, its impact here is as follows:
ala' Bunch foldMap f = bunch . foldMap (Bunch . f)
meaning that, although f is a function to lists, we accumulate as if f were a function to Bunches: the role of ala' is to insert the correct pack and unpack operations to make that just happen.
We need (basis !!) :: Int -> [[Int]] to be our translation. Hence basis :: [[[Int]]] is the list of images of our translation, computed on demand at most once each (i.e., the translation, memoized).
For this basis, observe that we need these two infinite lists
[ []            [ [[0]]
, [[]]          , [[1]]
, [[],[]]       , [[2]]
, [[],[],[]]    , [[3]]
...             ...
combined Bunchwise. As both lists have the same length (infinity), I could also have written
basis = zipWith (++) (iterate ([]:) []) [[[i]] | i <- [0..]]
but I thought it was worth observing that this also is an example of Bunch structure.
Of course, it's very nice when something like accumArray hands you exactly the sort of accumulation you need, neatly packaging a bunch of grungy behind-the-scenes mutation. But the general recipe for an accumulation is to think "What's the Monoid?" and "What do I do with each element?". That's what foldMap asks you.
The (++) operator copies the left-hand list. For this reason, adding to the beginning of a list is quite fast, but adding to the end of a list is very slow.
In summary, avoid adding things to the end of a list. Try to always add to the beginning instead. One simple way to do that is to build the list backwards, and then reverse it at the end. A more devious trick is to use "difference lists" (Google it). Another possibility is to use Data.Sequence rather than a list.
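A minimal sketch of the difference-list trick mentioned above (illustrative names; the dlist package provides a polished version):
-- represent a list as "a function that prepends it"; adding to the end
-- then becomes function composition, which is O(1)
type DList a = [a] -> [a]

snocD :: DList a -> a -> DList a
snocD dl x = dl . (x :)

fromListD :: [a] -> DList a
fromListD = (++)

toListD :: DList a -> [a]
toListD dl = dl []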
The first thing that should be noted is that the most obvious way to implement this is to use a data structure that allows random access; an array is an obvious choice. Note that you need to add elements to the array multiple times and somehow "join" them.
accumArray is perfect for this.
So we get:
f l i = elems $ accumArray (\l e -> e:l) [] (0,i-1) (map (\e -> (e,e)) l)
And we're good to go (see full code here).
This approach does involve converting the final array back into a list, but that step is very likely faster than say sorting the list, which often involves scanning the list at least a few times for a list of decent size.
Whenever you use ++ you have to recreate the entire left-hand list, since lists are immutable.
A simple solution would be to use :, but that builds the list in reverse order. However, that can be fixed using reverse, which results in building only two lists (instead of one million in your case).
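Here is a sketch of that build-backwards-then-reverse idea applied to the question's function (f' and go are names introduced here, reusing the same group/sort setup):
import Data.List (group, sort)

f' :: [Int] -> Int -> [[Int]]
f' ls len = reverse (go 0 (group (sort ls)) [])
  where
    -- counted high enough: stop
    go val _ acc
      | val == len = acc
    -- no groups left: cons an empty slot and keep counting
    go val [] acc = go (val + 1) [] ([] : acc)
    -- cons the matching group, or an empty slot, onto the accumulator
    go val gss@(g:gs) acc
      | val == head g = go (val + 1) gs (g : acc)
      | otherwise     = go (val + 1) gss ([] : acc)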
Your concept of glomming things onto an accumulator is a very useful one, and both MathematicalOrchid and Guvante show how you can use that concept reasonably efficiently. But in this case, there is a simpler approach that is likely also faster. You started with
group . sort $ ls
and this was a very good place to start! You get a list that's almost the one you want, except that you need to fill in some blanks. How can we figure those out? The simplest way, though probably not quite the most efficient, is to work with a list of all the numbers you want to count up to: [0 .. len-1].
So we start with
f ls len = g [0 .. len-1] (group . sort $ ls)
  where
    ?
How do we define g? By pattern matching!
f ls len = g [0 .. len-1] (group . sort $ ls)
  where
    -- We may or may not have some lists left,
    -- but we counted as high as we decided we
    -- would
    g [] _ = []
    -- We have no lists left, so the rest of the
    -- numbers are not represented
    g ns [] = map (const []) ns
    -- This shouldn't be possible, because group
    -- doesn't make empty lists.
    g _ ([]:_) = error "group isn't working!"
    -- Finally, we have some work to do!
    g (n:ns) xls@(xl@(x:_):xls')
      | n == x = xl : g ns xls'
      | otherwise = [] : g ns xls
That was nice, but making the list of numbers isn't free, so you might be wondering how you can optimize it. One method I invite you to try is using your original technique of keeping a separate counter, but following this same sort of structure.

How do you efficiently find a union of a list of lists of values in haskell?

Since a code example is worth a thousand words I'll start with that:
import Data.List (tails)

testList = [1,2,2,3,4,5]

testSet = map sumMapper $ tails testList
  where sumMapper [] = []
        sumMapper (a:b) = sumMap a b

sumMap a b = map (+ a) b
This code takes a list and computes the sums of all pairs of its elements (I'd also be interested in the efficiency of this). The output of testSet is:
[[3,3,4,5,6],[4,5,6,7],[5,6,7],[7,8],[9],[],[]]
I would like to find the union of these lists (to make it into a set) but I feel that:
whatIWant = foldl1 union testSet
will have bad performance (the real lists will be thousands of elements long).
Is this the correct solution or am I missing something obvious?
You might want to try
nub $ concat theListOfLists
In the version using union, the code that cuts out duplicates gets run many times. Here the code that pulls out the unique values runs only once, on the full list.
There is also a Data.Set library; you could alternatively use
import qualified Data.Set as S
S.fromList $ concat theListOfLists
The important point is that the code (here and above) that pulls out duplicates only gets run on the full list once, rather than over and over again.
edit: Rein mentions below that nub is O(n^2), so you should avoid the first solution above in favor of something O(n log n), as Data.Set.fromList should be. As others have mentioned in the comments, you need something that enforces Ord a to get the proper O(n log n) complexity; Data.Set does, and nub does not.
I will leave the two solutions (poor performance and good performance) because I think the resulting discussion was useful.
If you're using elements that are members of the Ord typeclass, as in your example, you can use Data.Set:
import Data.List (foldl')
import qualified Data.Set as Set

whatYouWant = foldl' (\acc xs -> acc `Set.union` Set.fromList xs) Set.empty testSet
This has the advantage of taking space proportional to the size of the largest sublist rather than to the size of the entire concatenated list as does the Set.fromList . concat solution. The strict foldl' also prevents buildup of unevaluated thunks, preventing O(n) stack and heap space usage.
Generally speaking, an Ord constraint allows more efficient algorithms than an Eq constraint because it allows you to build a tree. This is also the reason that nub is O(n^2): the more efficient algorithm requires Ord rather than just Eq.
Since union is an associative operation (a+(b+c)==(a+b)+c), you can use tree-shaped folding for a logarithmic advantage in time complexity:
_U [] = []
_U (xs:t) = union xs (_U (pairs t))
pairs (xs:ys:t) = union xs ys : pairs t
pairs t = t
Of course Data.List.union itself is O(n^2) in general, but if your testList is ordered non-decreasing, all the lists will be too, and you can use a linear ordUnion instead of union, for a solution which is linearithmic overall and shouldn't leak space:
ordUnion :: (Ord a) => [a] -> [a] -> [a]
ordUnion a [] = a
ordUnion [] b = b
ordUnion (x:xs) (y:ys) = case compare x y of
  LT -> x : ordUnion xs (y:ys)
  EQ -> x : ordUnion xs ys
  GT -> y : ordUnion (x:xs) ys
To prevent duplicates which might slip through, one more function is needed to process _U's output: a linear ordNub :: (Ord a) => [a] -> [a], with an obvious implementation.
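Since _U's output here is non-decreasing, duplicates can only sit next to each other, so one sketch of that linear ordNub is simply an adjacent-duplicate filter:
ordNub :: (Ord a) => [a] -> [a]
ordNub (x : ys@(y : _))
  | x == y    = ordNub ys
  | otherwise = x : ordNub ys
ordNub xs = xs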
Using the left-preferential (\(x:xs) ys -> x : ordUnion xs ys) could be even more productive overall (forcing smaller portions of the input at any given moment):
g testList = ordNub . _U $ [map (+ a) b | (a:b) <- tails testList]
  where
    _U [] = []
    _U ((x:xs):t) = x : ordUnion xs (_U (pairs t))
    pairs ((x:xs):ys:t) = (x : ordUnion xs ys) : pairs t
    pairs t = t
see also:
data-ordlist package
even less forcing "implicit heap" by apfelmus
Tree-like folds
