I have a merge function which takes time O(log n) to combine two trees into one, and a listToTree function which converts an initial list of elements to singleton trees and repeatedly calls merge on each successive pair of trees until only one tree remains.
Function signatures and relevant implementations are as follows:
merge :: Tree a -> Tree a -> Tree a -- O(log n) where n is the size of the input trees
singleton :: a -> Tree a -- O(1)
empty :: Tree a -- O(1)
listToTree :: [a] -> Tree a -- Supposedly O(n)
listToTree = listToTreeR . map singleton
listToTreeR :: [Tree a] -> Tree a
listToTreeR [] = empty
listToTreeR (x:[]) = x
listToTreeR xs = listToTreeR (mergePairs xs)
mergePairs :: [Tree a] -> [Tree a]
mergePairs [] = []
mergePairs (x:[]) = [x]
mergePairs (x:y:xs) = merge x y : mergePairs xs
This is a slightly simplified version of exercise 3.3 in Purely Functional Data Structures by Chris Okasaki.
According to the exercise, I shall now show that listToTree takes O(n) time. Which I can't. :-(
There are trivially ceil(log n) recursive calls to listToTreeR, meaning ceil(log n) calls to mergePairs.
The running time of mergePairs depends on the length of the list and the sizes of the trees. At the step where the list has length 2^h, mergePairs performs 2^(h-1) merges of trees of size n/2^h, and each such merge takes O(log(n/2^h)); here h = log n at the first recursive step and h = 1 at the last. Each call to mergePairs thus takes time 2^(h-1) * log(n/2^h). For example, with n = 8: the first call merges 8 singletons (4 merges of size-1 trees), the next merges 4 trees of size 2 (2 merges), and the last merges 2 trees of size 4 (1 merge).
I'm having trouble taking this analysis any further. Can anyone give me a hint in the right direction?
It's almost there. You already know the expression is

X = sum_{h=1}^{m} 2^(h-1) * log(n / 2^h), where m = log n,

so the only problem is to evaluate this sum. Using log(A*B) = log A + log B and log(2^N) = N we have

log(n / 2^h) = log n - h = m - h

and therefore

X = sum_{h=1}^{m} 2^(h-1) * (m - h).

With the help of a calculator, we can find that X = O(2^m) = O(n), which is expected.
(If you want to compute this yourself, search for "Geometric series", or approximate the sum using an integral.)
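For reference, the evaluation goes like this (substituting j = m - h):

X = sum_{h=1}^{m} 2^(h-1) * (m - h)
  = sum_{j=0}^{m-1} j * 2^(m-j-1)
  = 2^(m-1) * sum_{j=0}^{m-1} j / 2^j
  <= 2^(m-1) * 2 = 2^m = n,

since sum_{j>=0} j / 2^j = 2. (The exact value is X = 2^m - m - 1, which is easy to verify by induction.)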
I would like to have a function
powersetWithComplements :: [a] -> [([a], [a])]
Such that for example:
powersetWithComplements [1,2,3] = [([],[1,2,3]),([3],[1,2]),([2],[1,3]),([2,3],[1]),([1],[2,3]),([1,3],[2]),([1,2],[3]),([1,2,3],[])]
It is easy to obtain some implementation, for example
powerset :: [a] -> [[a]]
powerset = filterM (const [False, True])
powersetWithComplements s = let p = powerset s in zip p (reverse p)
Or
powersetWithComplements s = [ (x, s \\ x) | x <- powerset s]
But I estimate that the performance of both of these would be really poor. What would be an optimal approach? It is possible to use a different data structure than the [] list.
Well you should see a powerset like this: you enumerate over the items of the set, and you decide whether you put these in the "selection" (first item of the tuple), or not (second item of the tuple). By enumerating over these selections exhaustively, we get the powerset.
So we can do the same, for instance using recursion:
import Control.Arrow(first, second)
powersetWithComplements [] = [([],[])]
powersetWithComplements (x:xs) = map (second (x:)) rec ++ map (first (x:)) rec
where rec = powersetWithComplements xs
So here map (second (x:)) prepends x to the second item of every tuple in rec, and map (first (x:)) does the same for the first item of every tuple in rec, where rec is the recursion on the tail of the list.
Prelude Control.Arrow> powersetWithComplements [1,2,3]
[([],[1,2,3]),([3],[1,2]),([2],[1,3]),([2,3],[1]),([1],[2,3]),([1,3],[2]),([1,2],[3]),([1,2,3],[])]
The advantage of this approach is that we do not generate a complement list for every list we generate: we concurrently build the selection, and complement. Furthermore we can reuse the lists we construct in the recursion, which will reduce the memory footprint.
In both time and memory complexity, the powersetWithComplements function will be equal to the powerset function (equal in complexity, that is; in absolute terms it of course requires more time, since we do an extra amount of work), since prepending an element to a list is done in O(1), and we now build two lists (and a tuple) for every original list.
Since you are looking for a "fast" implementation, I thought I would share some benchmark experiments I did with Willem's solution.
I thought using a DList instead of a plain list would be a big improvement, since DLists have constant-time append, whereas appending lists is linear in the size of the left argument.
import Control.Arrow (first, second)
import Data.DList (DList, toList)
import qualified Data.DList as DList

psetDL :: [a] -> [([a],[a])]
psetDL = toList . go
  where
    go :: [a] -> DList ([a],[a])
    go [] = DList.singleton ([],[])
    go (x:xs) = (second (x:) <$> rec) <> (first (x:) <$> rec)
      where
        rec = go xs
But that did not have a significant effect.
I suspected this is because we are traversing both sublists anyway because of the fmap (<$>). We can avoid the traversal by doing something similar to CPS-converting the function, passing down the accumulated sets as parameters rather than returning them.
psetTail :: [a] -> [([a],[a])]
psetTail = go [] []
where
go a b [] = [(a,b)]
go a b (x:xs) = go a (x:b) xs <> go (x:a) b xs
This yielded a 220% improvement on a list of size 20. Now since we aren't traversing the lists from fmapping, we can get rid of the append traversal by using a DList:
psetTailDL :: [a] -> [([a],[a])]
psetTailDL = toList . go [] []
where
go a b [] = DList.singleton (a,b)
go a b (x:xs) = go a (x:b) xs <> go (x:a) b xs
Which yields an additional 20% improvement.
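The benchmark harness itself isn't shown above; a minimal criterion setup along the following lines (the size-20 input is from the text above, the group name is arbitrary) reproduces this kind of comparison:

import Criterion.Main (bench, bgroup, defaultMain, nf)

-- Benchmark each variant on a 20-element list, forcing the
-- full result with nf so laziness doesn't skew the timings.
main :: IO ()
main = defaultMain
  [ bgroup "powersetWithComplements"
      [ bench "psetDL"     $ nf psetDL     [1..20 :: Int]
      , bench "psetTail"   $ nf psetTail   [1..20 :: Int]
      , bench "psetTailDL" $ nf psetTailDL [1..20 :: Int]
      ]
  ]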
I guess the best is inspired by your reverse discovery
partitions s=filterM(const[False,True])s
`zip`filterM(const[True,False])s
rather than a likely stackoverflower
partitions[]=[([],[])]
partitions(x:xs)=[p|(f,t)<-partitions xs,p<-[(f,x:t),(x:f,t)]]
or a space-and-time-efficient finite list indexer
import Data.Array
import Data.Bits
import Data.List
partitions s=[(map(a!)f,map(a!)t)
|n<-[length s],a<-[listArray(0,n-1)s],
m<-[0..2^n-1],(f,t)<-[partition(testBit m)[0..n-1]]]
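For example, with the first version (the three versions agree up to the order of the pairs):

> partitions "ab"
[("","ab"),("b","a"),("a","b"),("ab","")]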
Given an integer n, how can I build the list containing all lists of length n^2 containing exactly n copies of each integer x < n? For example, for n = 2, we have:
[0,0,1,1], [0,1,0,1], [1,0,0,1], [0,1,1,0], [1,0,1,0], [1,1,0,0]
This can be easily done combining permutations and nub:
f :: Int -> [[Int]]
f n = nub . permutations $ concatMap (replicate n) [0..n-1]
But that is way too inefficient. Is there any simple way to encode the efficient/direct algorithm?
Sure, it's not too hard. We'll start with a list of n copies of each number less than n, and repeatedly choose one to start our result with. First, a function for choosing an element from a list:
zippers :: [a] -> [([a], a, [a])]
zippers = go [] where
go l (h:r) = (l,h,r) : go (h:l) r
go _ [] = []
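For example (note that the left part comes back reversed, which is fine for our purposes):

> zippers "abc"
[("",'a',"bc"),("a",'b',"c"),("ba",'c',"")]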
Now we'll write a function that produces all possible interleavings of some input lists. Internally we'll maintain the invariant that each [a] is non-empty; hence we'll have to establish that invariant before we start recursing. In fact, this will be wasted work in the way we intend to call this function, but for good abstraction we might as well handle all inputs correctly, right?
interleavings :: [[a]] -> [[a]]
interleavings = go . filter (not . null) where
go [] = [[]]
go xss = do
(xssl, x:xs, xssr) <- zippers xss
        (x:) <$> go ([xs | not (null xs)] ++ xssl ++ xssr)
And now we're basically done. All we have to do is feed in an appropriate starting list.
f :: Int -> [[Int]]
f n = interleavings (replicate n <$> [1..n])
Try it in ghci:
> f 2
[[1,1,2,2],[1,2,2,1],[1,2,1,2],[2,2,1,1],[2,1,1,2],[2,1,2,1]]
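As a sanity check, the number of interleavings is the multinomial coefficient (n^2)! / (n!)^n; for n = 3 that's 9!/(3!)^3 = 1680:

> length (f 3)
1680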
Did I implement inorder tree traversal using tail recursion correctly?
inorder :: Tree a -> [a] -> [a]
inorder (Leaf n) temp = n:temp
inorder (Node (n, left, right)) temp = inorder left (n:inorder right temp)
Tree is declared as
data Tree a = Leaf a | Node (a, Tree a, Tree a) deriving Show
and returns
[2,1,3] on call inorder three [] where three = Node (1, Leaf 2, Leaf 3)
This technically isn't tail recursive because you have a recursive call inorder right temp in a non-tail position. One way to fix this would be with continuations. You write a function which takes an accumulator like before, but rather than the accumulator being just a list it's actually a function representing the work left to do in the computation. This means that instead of making a non-tail call and just returning, we can always tail call because the context we need is saved to the continuation.
inorder :: Tree a -> [a]
inorder = go id
  where go :: ([a] -> r) -> Tree a -> r
        go k (Leaf n) = k [n]
        go k (Node (n, l, r)) = go (\ls -> go (\rs -> k $ ls ++ n : rs) r) l
Here every call is a tail call as required, but it's quite inefficient because it requires a ++ operation at every level, pushing us into quadratic costs. A more efficient algorithm would avoid building up an explicit list and instead build up a difference list, delaying the construction of the concrete structure and giving a more efficient algorithm:
type Diff a = [a] -> [a] -- A difference list is just a function
nil :: Diff a
nil xs = xs
cons :: a -> Diff a -> Diff a
cons a d = (:) a . d
append :: Diff a -> Diff a -> Diff a
append xs ys = xs . ys
toList :: Diff a -> [a]
toList xs = xs []
Note that all of these operations are O(1), except for toList which is O(n) in the number of entries. The important point here is that diff lists are cheap and easy to append, so we'll construct these in our algorithm and build the concrete list only at the very end:
inorder :: Tree a -> [a]
inorder = go toList
  where go :: (Diff a -> r) -> Tree a -> r
        go k (Leaf n) = k (cons n nil)
        go k (Node (n, l, r)) =
          go (\ls -> go (\rs -> k $ ls `append` cons n rs) r) l
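A quick check with the tree from the question:

> inorder (Node (1, Leaf 2, Leaf 3))
[2,1,3]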
And now, through gratuitous application of functions we've gotten a completely unidiomatic Haskell program. You see, in Haskell we don't really care about tail calls because we generally want to handle infinite structures correctly, and that's not really possible if we demand everything be tail recursive. In fact, I would say that while not tail recursive, the code you originally had is the most idiomatic; that's even how toList is implemented in Data.Set! It has the property that we can lazily consume the resulting list, and it will lazily process only as much of the tree as needed. So in your implementation, something like
min :: Tree a -> Maybe a
min = listToMaybe . flip inorder []
is going to be pretty darn close to how you would implement it by hand, efficiency-wise! It will not have to traverse the whole tree first like my version will. These sorts of compositional effects of laziness pay more dividends in real Haskell code than syntactically making our code use only tail calls (which does nothing to actually guarantee space usage anyway).
I'm attempting to improve the efficiency of a particular function in my code which is taking up a large amount of the runtime. After profiling, I believe this is because of the concat within the code. How could I go about improving this code to be quicker?
chunk :: C -> [A] -> [[A]]
chunk c = go []
where s = Set.fromList (map snd (Map.toList c))
go :: [A] -> [A] -> [[A]]
go l [] = [l | member l s]
go l (x:xs) = if member l s then l : go [x] xs
else go (l ++ [x]) xs
Thanks for your help!
A simple solution would be to use Seq, whose snoc operation is O(1). This involves converting the input first and then converting the result back, which would be worthwhile unless the set is large compared to the average length of the lists.
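For illustration, a sketch of that idea (untested; A and C are the types from the question, and the set stores Seqs so membership tests need no per-step conversion):

import Data.Foldable (toList)
import Data.Sequence (Seq, (|>))
import qualified Data.Map as Map
import qualified Data.Sequence as Seq
import qualified Data.Set as Set

chunk :: C -> [A] -> [[A]]
chunk c = map toList . go Seq.empty
  where -- convert the input lists to Seqs once, up front
        s = Set.fromList (map (Seq.fromList . snd) (Map.toList c))
        go :: Seq A -> [A] -> [Seq A]
        go l [] = [l | Set.member l s]
        go l (x:xs)
          | Set.member l s = l : go (Seq.singleton x) xs
          | otherwise      = go (l |> x) xs   -- snoc is O(1)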
There is however another problem, and that is testing the membership of the lists (or similar structures). Comparison or testing equality on lists is O(n), and in your case, where you test the membership of a list that is likely a sub-list of a list contained in the set, the testing will indeed be Ω(n). So even then, the complexity of chunk could be in the order of O(n^2) where n is the length of the list argument.
It seems that using a trie would be a better solution. A trie is much more efficient than a set of lists in both memory and time complexity. And especially useful for this case is the operation that takes the sub-trie of all elements with a given prefix in O(1).
An example code (untested):
chunk :: C -> [A] -> [[A]]
chunk c = go trie []
  where trie = Trie.fromList (map snd (Map.toList c))
        go :: Trie -> [A] -> [A] -> [[A]]
        go s l [] = [reverse l | Trie.member [] s]
        go s l (x:xs)
          | Trie.member [] s = reverse l : go (Trie.lookupPrefix [x] trie) [x] xs
          | otherwise = go (Trie.lookupPrefix [x] s) (x : l) xs
Now each step of go should take only O(1) amortized cost (the only non-constant operation is reverse, but it is O(1) amortized, as reversing a k-element list occurs only once after k steps), making chunk linear overall.
And now we can also make a further improvement: When the sub-trie is empty, we know we'll never return an additional element, as we'll never reach the matching case. So we could add a pattern to the top
go s _ _ | Trie.null s = []
The list-tries package seems to be just perfect for this.
Working with long strings, I came across a rather big problem when creating suffix trees in Haskell.
Some construction algorithms (such as this version of Ukkonen's algorithm) require establishing links between nodes. These links "point" at a node in the tree. In imperative languages such as Java, C#, etc. this is no problem because of reference types.
Are there ways of emulating this behaviour in Haskell? Or is there a completely different alternative?
You can tie a recursive knot: a value that isn't determined until the result of a computation can be used in the construction of data within that same computation.
The following computation builds a list of values that each hold the total number of items in the list even though the total is computed by the same function that's building the list. The let binding in zipCount passes one of the results of zipWithAndCount as the first argument to zipWithAndCount.
zipCount :: [a] -> [(a, Int)]
zipCount xs =
let (count, zipped) = zipWithAndCount count xs
in zipped
zipWithAndCount :: Num n => b -> [a] -> (n, [(a, b)])
zipWithAndCount y [] = (0, [])
zipWithAndCount y (x:xs) =
let (count', zipped') = zipWithAndCount y xs
in (count' + 1, (x, y):zipped')
Running this example produces a list where each item holds the total number of items in the list:
> zipCount ['a'..'e']
[('a',5),('b',5),('c',5),('d',5),('e',5)]
This idea can be applied to Ukkonen's algorithm by passing in the end positions (the #s) that aren't known until the entire result is known.
The general idea of recursively passing a result into a function is called a least fixed point, and is implemented in Data.Function by
fix :: (a -> a) -> a
fix f = let x = f x in x
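For intuition, laziness is what makes such a definition productive rather than a loop; for example:

> take 5 (fix (1:))
[1,1,1,1,1]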
We can write zipCount in points-free style in terms of zipWithAndCount and fix.
import Data.Function
zipCount :: [a] -> [(a, Int)]
zipCount = snd . fix . (. fst) . flip zipWithAndCount
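A quick check that the points-free version behaves the same:

> zipCount "abc"
[('a',3),('b',3),('c',3)]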