Is there a way to avoid copying the whole search path of a binary tree on insert? - haskell

I've just started working my way through Okasaki's Purely Functional Data Structures, but have been doing things in Haskell rather than Standard ML. However, I've come across an early exercise (2.5) that's left me a bit stumped on how to do things in Haskell:
Inserting an existing element into a binary search tree copies the entire search path
even though the copied nodes are indistinguishable from the originals. Rewrite insert using exceptions to avoid this copying. Establish only one handler per insertion rather than one handler per iteration.
Now, my understanding is that ML, being an impure language, gets by with a conventional approach to exception handling, not so different from, say, Java's, so you can accomplish it with something like this:
datatype Tree = E | T of Tree * int * Tree

exception ElementPresent

fun insert (x, t) =
  let
    fun go E = T (E, x, E)
      | go (T (l, y, r)) =
          if x < y then T (go l, y, r)
          else if y < x then T (l, y, go r)
          else raise ElementPresent
  in
    go t
  end
  handle ElementPresent => t
I don't have an ML implementation, so this may not be quite right in terms of the syntax.
My issue is that I have no idea how this can be done in Haskell, outside of doing everything in the IO monad, which seems like cheating and, even if it's not cheating, would seriously limit the usefulness of a function which really doesn't do any mutation. I could use the Maybe monad:
data Tree a = Empty | Fork (Tree a) a (Tree a)
  deriving (Show)

insert :: (Ord a) => a -> Tree a -> Tree a
insert x t = maybe t id (go t)
  where
    go Empty = return (Fork Empty x Empty)
    go (Fork l y r)
      | x < y     = do l' <- go l; return (Fork l' y r)
      | x > y     = do r' <- go r; return (Fork l y r')
      | otherwise = Nothing
This means everything winds up wrapped in Just on the way back up when the element isn't found, which requires more heap allocation, and sort of defeats the purpose. Is this allocation just the price of purity?
EDIT to add: A lot of why I'm wondering about the suitability of the Maybe solution is that the optimization described only seems to save you the constructor calls you would otherwise make in the case where the element already exists, i.e., heap allocations proportional to the length of the search path. The Maybe version also avoids those constructor calls when the element already exists, but then incurs a number of Just constructor calls equal to the length of the search path. I understand that a sufficiently smart compiler could elide all the Just allocations, but I don't know whether, say, the current version of GHC is really that smart.

In terms of cost, the ML version is actually very similar to your Haskell version. Every recursive call in the ML version results in a stack frame, and the same is true in the Haskell version; this stack usage is proportional to the length of the path you traverse in the tree. Also, both versions will of course allocate new nodes for the entire path if an insertion is actually performed.

In your Haskell version, every recursive call might also eventually result in the allocation of a Just node. This goes on the minor heap, which is just a block of memory with a bump pointer. For all practical purposes, GHC's minor heap is roughly equivalent in cost to the stack. Since these are short-lived allocations, they won't normally end up being moved to the major heap at all.

GHC generally cannot elide path copying in cases like that. However, there is a way to do it manually, without incurring any of the indirection/allocation costs of Maybe. Here it is:
{-# LANGUAGE MagicHash #-}

import GHC.Prim (reallyUnsafePtrEquality#)

data Tree a = Empty | Fork (Tree a) a (Tree a)
  deriving (Show)

insert :: (Ord a) => a -> Tree a -> Tree a
insert x Empty = Fork Empty x Empty
insert x node@(Fork l y r)
  | x < y = let l' = insert x l in
      case reallyUnsafePtrEquality# l l' of
        1# -> node
        _  -> Fork l' y r
  | x > y = let r' = insert x r in
      case reallyUnsafePtrEquality# r r' of
        1# -> node
        _  -> Fork l y r'
  | otherwise = node
The pointer equality function does exactly what's in the name. Here it is safe because even if the equality returns a false negative we only do a bit of extra copying, and nothing worse happens.
It's not the most idiomatic or prettiest Haskell, but the performance benefits can be significant. In fact, this trick is used very frequently in unordered-containers.

As fizruk indicates, the Maybe approach is not significantly different from what you'd get in Standard ML. Yes, the whole path is copied, but the new copy is discarded if it turns out not to be needed. The Just constructor itself may not even be allocated on the heap—it can't escape from insert, let alone the module, and you don't do anything weird with it, so the compiler is free to analyze it to death.
Edit
There are efficiency problems, now that I think of it. Your use of Maybe conceals the fact that you're actually making two passes—one down to find the insertion point and one up to build the tree. The solution to this is to drop Maybe Tree in favor of (Tree, Bool) and use strictness annotations, or to switch to continuation-passing style. Also, if you choose to stay with the three-way logic, you may want to use the three-way comparison function, compare; see the sketch below. Alternatively, you can go all the way to the bottom each time and check later whether you hit a duplicate.
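A minimal sketch of the (Tree, Bool) idea (insertFlag is an illustrative name; strictness annotations and unboxed pairs are left out). The Bool records whether anything below actually changed, and compare collapses the guards into a single three-way branch:

insertFlag :: Ord a => a -> Tree a -> Tree a
insertFlag x t = fst (go t)
  where
    go Empty = (Fork Empty x Empty, True)
    go node@(Fork l y r) = case compare x y of
      LT -> case go l of
              (l', True) -> (Fork l' y r, True)
              _          -> (node, False)
      GT -> case go r of
              (r', True) -> (Fork l y r', True)
              _          -> (node, False)
      EQ -> (node, False)

When nothing below has changed, the original node is returned as-is, so no Fork cells are allocated along the search path of a duplicate.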

If you have a predicate that checks whether the key is already in the tree, you can look before you leap:
insert x t = if contains t x then t else insert' x t
This traverses the tree twice, of course. Whether that's as bad as it sounds should be determined empirically: it might just load the relevant part of the tree into the cache.
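For completeness, here is a sketch of the membership test assumed by that one-liner (insert' stands for the plain path-copying insert from the question):

contains :: Ord a => Tree a -> a -> Bool
contains Empty _ = False
contains (Fork l y r) x = case compare x y of
  LT -> contains l x
  GT -> contains r x
  EQ -> True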

Related

Apply function to all pairs efficiently

I need a second order function pairApply that applies a binary function f to all unique pairs of a list-like structure and then combines them somehow. An example / sketch:
pairApply (+) f [a, b, c] = f a b + f a c + f b c
Some research leads me to believe that Data.Vector.Unboxed will probably have good performance (I will also need fast access to specific elements); it's also necessary for Statistics.Sample, which would come in handy further down the line.
With this in mind I have the following, which almost compiles:
import qualified Data.Vector.Unboxed as U

pairElement :: (U.Unbox a, U.Unbox b)
            => U.Vector a
            -> (a -> a -> b)
            -> Int
            -> a
            -> U.Vector b
pairElement v f idx el = U.map (f el) $ U.drop (idx + 1) v

pairUp :: (U.Unbox a, U.Unbox b)
       => (a -> a -> b)
       -> U.Vector a
       -> U.Vector (U.Vector b)
pairUp f v = U.imap (pairElement v f) v

pairApply :: (U.Unbox a, U.Unbox b)
          => (b -> b -> b)
          -> b
          -> (a -> a -> b)
          -> U.Vector a
          -> b
pairApply combine neutral f v = folder $ U.map folder (pairUp f v)
  where folder = U.foldl combine neutral
The reason this doesn't compile is that there is no Unbox instance for U.Vector (U.Vector a). I have been able to create new unboxed instances in other cases using Data.Vector.Unboxed.Deriving, but I'm not sure it would be so easy in this case (transform it to a pair where the first element is all the inner vectors concatenated and the second is the lengths of the vectors, to know how to unpack?).
My question can be stated in two parts:
Does the above implementation make sense at all or is there some quick library function magic etc that could do it much easier?
If so, is there a better way to make an unboxed vector of vectors than the one sketched above?
Note that I'm aware that foldl is probably not the best choice; once I've got the implementation sorted I plan to benchmark with a few different folds.
There is no way to define a classical instance for Unbox (U.Vector b), because that would require preallocating a memory area in which each element (i.e. each subvector!) has the same fixed amount of space. But in general, each of them may be arbitrarily big, so that's not feasible at all.
It might in principle be possible to define that instance by storing only a flattened form of the nested vector plus an extra array of indices (where each subvector starts). I once briefly gave this a try; it actually seems somewhat promising as far as immutable vectors are concerned, but a G.Vector instance also requires a mutable implementation, and that's hopeless for such an approach (because any mutation that changes the number of elements in one subvector would require shifting everything behind it).
Usually, it's just not worth it, because if the individual element vectors aren't very small the overhead of boxing them won't matter, i.e. often it makes sense to use B.Vector (U.Vector b).
For your application however, I would not do that at all – there's no need to ever wrap the upper element-choices in a single triangular array. (And it would be really bad for performance to do that, because it makes the algorithm take O(n²) memory rather than the O(n) which is all that's needed.)
I would just do the following:
pairApply combine neutral f v =
  U.ifoldl' (\acc i p -> U.foldl' (\acc' q -> combine acc' $ f p q)
                                  acc
                                  (U.drop (i+1) v))
            neutral v
This corresponds pretty much to the obvious nested-loops imperative implementation
pairApply(combine, b, f, v):
  for(i in 0..length(v)-1):
    for(j in i+1..length(v)-1):
      b = combine(b, f(v[i], v[j]));
  return b;
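For a quick sanity check of the fold version above, a toy example (assuming pairApply as defined above is in scope, with the expected value computed by hand):

import qualified Data.Vector.Unboxed as U

-- sum of products over all unique pairs of [1,2,3]:
-- 1*2 + 1*3 + 2*3 = 11
main :: IO ()
main = print $ pairApply (+) 0 (*) (U.fromList [1, 2, 3 :: Int])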
My answer is basically the same as leftaroundabout's nested-loops imperative implementation:
import Data.List (foldl')
import Data.Vector (Vector, (!))

pairApply :: (Int -> Int -> Int) -> Vector Int -> Int
pairApply f v = foldl' (+) 0 [f (v ! i) (v ! j) | i <- [0 .. n-1], j <- [i+1 .. n-1]]
  where n = length v
As far as I can tell, there is no performance issue with this implementation.
Non-polymorphic for simplicity.

ghci compiler optimization: calling a function with the same parameter twice

In the simple code below, part of the definition of a function that deletes an element from a binary search tree:
deleteB x (Node n l r) | x == n = Node (leastB r) l (deleteB (leastB r) r)
does the compiler optimize the code so that it calls (leastB r) only once, as if it were:
deleteB x (Node n l r) | x == n = Node k l (deleteB k r)
  where k = leastB r
?
In other words, is the compiler able to understand that, since the parameter r isn't changed within the body of deleteB, the result of calling the same function (leastB) on it can't give different results, and hence it is useless to compute it twice?
More generally, how could I tell whether the compiler performs this optimization if amazing Stack Overflow did not exist? Thanks.
If you want to know what GHC "really did", you want to look at the "Core" output.
GHC takes your Haskell source code, which is extremely high-level, and transforms it into a sequence of lower and lower-level languages:
Haskell ⇒ Core ⇒ STG ⇒ C−− ⇒ assembly language ⇒ machine code
Almost all of the high-level optimisations happen in Core. The one you're asking about is basically "common subexpression elimination" (CSE). If you think about it, this is a time / space tradeoff: by saving the previous result, you're using less CPU time but more RAM. If the result you're trying to store is tiny (e.g., an integer), this can be worth it. If the result is huge (e.g., the entire contents of that 17GB text file you just loaded), this is probably a Very Bad Idea.
As I understand it (⇒ not very well!), GHC tends not to do CSE. But if you want to know for sure, in your specific case, you want to look at the Core that your program has actually been compiled into. I believe the switch you want is -ddump-prep.
http://www.haskell.org/ghc/docs/7.0.2/html/users_guide/options-debugging.html
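For example, here is a self-contained module to experiment with (the module name, the leastB definition, and the final catch-all equation are assumed here just to make the question's snippet compile). Compile with ghc -O2 -ddump-simpl -dsuppress-all Demo.hs and check whether leastB r shows up once or twice in the resulting Core:

module Demo where

data Tree = Node Int Tree Tree | Leaf

-- assumed helper: leftmost (least) value in the tree
leastB :: Tree -> Int
leastB (Node n Leaf _) = n
leastB (Node _ l _)    = leastB l
leastB Leaf            = error "leastB: empty tree"

deleteB :: Int -> Tree -> Tree
deleteB x (Node n l r)
  | x == n = Node (leastB r) l (deleteB (leastB r) r)
deleteB _ t = t   -- stub for the remaining cases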
GHC does not perform this optimization because it is not always an optimization space-wise.
For instance, consider
n = 1000000
x = (length $ map succ [1..n], length $ map pred [1..n])
In a lazy language such as Haskell, one would expect this to run in constant space. Indeed, the list-generating expression [1..n] should lazily produce one element at a time, which would be transformed by succ/pred because of the maps and then counted by length. (Even better, succ and pred are not computed at all, since length does not force the list elements.) After this, the produced element can be garbage collected, the list generator can produce the next element, and so on. In real implementations, one would not expect every single element to be garbage collected immediately, but if the garbage collector is good, only a constant number of them should be in memory at any time.
By comparison, the "optimized" code
n = 1000000
l = [1..n]
x = (length $ map succ l, length $ map pred l)
does not allow the elements of l to be garbage collected until both components of x are evaluated. So, while it produces the list only once, it uses O(n) words of memory to store the full list. This is likely to lead to lower performance than the unoptimized code.

Do I need to take explicit actions to facilitate sharing with persistent data structures?

I come from an imperative background and am trying to implement a simple disjoint sets (“union-find”) data structure to get some practice with creating and modifying (persistent) data structures in Haskell. The goal is to have a simple implementation, but I am also concerned about efficiency, and my question is related to this.
First, I created a disjoint-set forest implementation with union by rank and started by defining a data type for a “point”:
data Point = Point
  { _value  :: Int
  , _parent :: Maybe Point
  , _rank   :: Int
  } deriving Show
A disjoint-set forest is an IntMap with Int → Point mappings:
type DSForest = IntMap Point
empty :: DSForest
empty = I.empty
A singleton set is simply a mapping from its value x to a Point with value x, no parent, and a rank of 0:
makeSet :: DSForest -> Int -> DSForest
makeSet dsf x = I.insert x (Point x Nothing 0) dsf
Now, the interesting part – union. This operation will modify a point by setting the other point as its parent (and in some cases change its rank). In the case where the Points' ranks differ, the lower-ranked Point is simply “updated” (a new Point is created) to have the other as its parent. In the case where they are equal, a new Point is created with its rank increased by one:
union :: DSForest -> Int -> Int -> DSForest
union dsf x y | x == y = dsf
union dsf x y =
    if _value x' == _value y'
      then dsf
      else case compare (_rank x') (_rank y') of
        GT -> I.insert (_value y') y'{ _parent = Just x' } dsf
        LT -> I.insert (_value x') x'{ _parent = Just y' } dsf
        -- 1) increase x's rank by one:
        EQ -> let x'' = x'{ _rank = _rank x' + 1 }
                  -- 2) update the value for x's rank to point to the new x:
                  dsf' = I.insert (_value x'') x'' dsf
                  -- 3) then update y to have the new x as its parent:
              in I.insert (_value y') y'{ _parent = Just x'' } dsf'
  where x' = dsf ! findSet dsf x
        y' = dsf ! findSet dsf y
Now, to my real question, if in the EQ case I had instead done the following:
EQ -> let dsf' = I.insert (_value x') x'{ _rank = _rank x' + 1 } dsf
      in I.insert (_value y') y'{ _parent = Just x'{ _rank = _rank x' + 1 } } dsf'
I.e. first insert a new Point x with its rank increased, and then have y's parent be a separately constructed new Point x with its rank increased. Would this mean that they no longer point to the same Point in memory? (Does this even matter? Should I worry about these things when using/creating persistent data structures?)
And just for completeness, here is findSet:
findSet :: DSForest -> Int -> Int
findSet dsf' x' = case _parent (dsf' ! x') of
  Just (Point v _ _) -> findSet dsf' v
  Nothing            -> x'
(General comments about the efficiency and design of this code are also welcome.)
would this mean that they no longer point to the same Point in memory?
I don't think you should be concerned with this, as it is just an implementation detail of the runtime system (the RTS of Haskell) for immutable values.
As for another suggestion, I would make findSet return the Point itself rather than the key, as that would eliminate the lookup in union.
findSet :: DSForest -> Int -> Point
findSet dsf' x' = case _parent pt of
    Just (Point v _ _) -> findSet dsf' v
    Nothing            -> pt
  where
    pt = dsf' ! x'
Make the corresponding changes in the union function, sketched below.
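A sketch of union adapted accordingly (only the where clause changes, so the question's comments are dropped; the imports are as in the question):

union :: DSForest -> Int -> Int -> DSForest
union dsf x y | x == y = dsf
union dsf x y =
    if _value x' == _value y'
      then dsf
      else case compare (_rank x') (_rank y') of
        GT -> I.insert (_value y') y'{ _parent = Just x' } dsf
        LT -> I.insert (_value x') x'{ _parent = Just y' } dsf
        EQ -> let x''  = x'{ _rank = _rank x' + 1 }
                  dsf' = I.insert (_value x'') x'' dsf
              in I.insert (_value y') y'{ _parent = Just x'' } dsf'
  where x' = findSet dsf x   -- no more (!) lookups here
        y' = findSet dsf y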
First comment: the disjoint-set union-find data structure is very, very difficult to do well in a purely functional way. If you are just trying to get practice with persistent data structures, I strongly recommend starting with simpler structures like binary search trees.
Now, to see one problem, consider your findSet function. It does not implement path compression! That is, it does not make all the nodes along the path to the root point directly to the root. To do that, you would want to update all those points in the DSForest, so your function would then return (Int, DSForest) or perhaps (Point, DSForest). Doing this in a monad to handle all the plumbing of passing the DSForest around would be easier than passing that forest around manually.
But now a second issue. Suppose you modify findSet as just described. It still wouldn't do quite what you want. In particular, suppose you have a chain where 2 is a child of 1, 3 is a child of 2, and 4 is a child of 3. Now you do a findSet on 3. This will update 3's point so that its parent is 1 instead of 2. But 4's parent is still the old 3 point, whose parent is 2. This may not matter too much, because it looks like you never really do anything with the parent Point except pull out its value (in findSet). But the very fact that you never do anything with the parent Point except pull out its value says to me that it should be a Maybe Int instead of a Maybe Point.
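To make both suggestions concrete, here is a minimal sketch (Point' and findSet' are illustrative names) that stores the parent as a key rather than a Point and threads the forest through so path compression can update it:

import Data.IntMap (IntMap, (!))
import qualified Data.IntMap as I

data Point' = Point'
  { value'  :: Int
  , parent' :: Maybe Int   -- key of the parent, not the Point itself
  , rank'   :: Int
  }

-- find the root, compressing the path along the way
findSet' :: IntMap Point' -> Int -> (Int, IntMap Point')
findSet' dsf x = case parent' (dsf ! x) of
  Nothing -> (x, dsf)
  Just p  ->
    let (root, dsf') = findSet' dsf p
        -- repoint x directly at the root
        dsf''        = I.insert x ((dsf' ! x) { parent' = Just root }) dsf'
    in (root, dsf'')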
Let me repeat and expand on what I said at the beginning. Disjoint sets are a particularly hard data structure to handle in a functional/persistent way, so I strongly recommend starting with an easier tree structure like binary search trees or leftist heaps or even abstract syntax trees. Those structures have the property that all access goes through the root--that is, you always start at the root and work your way down through the tree to get to the right place. This property makes the kind of sharing that is the hallmark of persistent data structures MUCH easier.
The disjoint set data structure does not have that property. Instead of always starting at the root and working down to the nodes of interest, you start at arbitrary nodes and work your way back up to the root. When you have unrestricted entry points like this, often the easiest way to handle it is to mediate all the sharing through a separate map (DSForest in your case), but that means passing that map back and forth everywhere.
Sharing is a compiler thing. When it recognizes common subexpressions, a compiler may choose to represent them both by the same object in memory. But even with CSE enabled (it can be switched off with -fno-cse), it is under no obligation to do so, and the two might be (and usually are, in the absence of optimization) represented by two different, though equal in value, objects in memory. Re: referential transparency.
OTOH, when we name something and use that name twice, we (reasonably) expect it to represent the same object in memory. But the compiler might choose to duplicate it and use two separate copies at the two different use sites, although it is not known to do so. But it might. Re: referential transparency.
See also:
How is this fibonacci-function memoized?
double stream feed to prevent unneeded memoization?
Here are a few examples with list-producing functions, drawing from the last link above. They rely on the compiler not duplicating anything, i.e. indeed sharing any named object as expected under call-by-need lambda calculus operational semantics (as explained by nponeccop in the comments), and not introducing any extra sharing on its own to eliminate common subexpressions:
Sharing fixpoint combinator, creating a loop:
fix f = x where x = f x
Non-sharing fixpoint combinator, creating a telescoping multistage chain (i.e. a regular recursion chain):
_Y f = f (_Y f)
Two-stage combination: a loop and a feed:
_2 f = f (fix f)
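A quick illustration of the difference, using the infinite list of ones (the names ones1/ones2/ones3 are illustrative; fix, _Y, and _2 are as defined above, and the comments describe the expected call-by-need behaviour):

ones1 = fix (1:)  -- a genuine loop: one cons cell that points back at itself
ones2 = _Y  (1:)  -- a telescoping chain: each new element demanded is built
                  -- by a fresh (_Y (1:)) call
ones3 = _2  (1:)  -- one freshly built cell feeding into the shared loop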

tail call memory management in haskell

I'm using the following control structure (which I think is tail recursive)
untilSuccessOrBigError :: (Eq e) => (Integer -> Either e a) -> Integer -> e -> Either e a
untilSuccessOrBigError f count bigError =
  case f count of
    Right x -> Right x
    Left e  -> if e == bigError
                 then Left e
                 else untilSuccessOrBigError f (count - 1) e
to do iterative deepening
iterativeDeepening :: (a -> [a]) -> (a -> Bool) -> (a -> Bool) -> a -> Either String a
iterativeDeepening stepFunc isAValidSolution isGraphBottom x =
  untilSuccessOrBigError
    (\count -> dfsWithMaxDepth stepFunc isAValidSolution isGraphBottom count x)
    (-1)
    "Reached graph bottom"
will this free memory (since it will technically no longer be reachable) after each deepening iteration? If not, how should I rewrite the control structure?
P.S.
On second thought, it looks like this might fail, since tail-recursive structures frequently need to access things on the stack (like adding to the previous value), even if this one doesn't.
P.P.S.
On third thought, it seems the values inside dfsWithMaxDepth can be discarded as soon as dfsWithMaxDepth returns, and a bunch of answers won't take up much memory.
At first glance, there's no reason this won't be garbage collected properly, or why TCO won't be performed.
In general, you should think about TCO and garbage collection in a different way in Haskell -- lots of related questions listed here on SO seem helpful. Fundamentally, you want to imagine the operational semantics of GHC Haskell as lazy graph reduction.
Imagine that you just have the whole AST in memory, with additional arrows from every usage of a name to its definition, and you ask for the value of "main." Now to get that, you need to look at the value of the first function called in main, and substitute it in, which in turn means that you need to chase the next thing that needs to be evaluated, etc. Now at some point in this reduction, you'll notice that things that used to be pointed to as expressions have now been computed, and replaced with the values they represent. Then those expressions can get garbage collected. Meanwhile, the trace you've got from the top of the graph down to whatever piece you're now evaluating is the "stack". So if to evaluate a structure, you need to evaluate "inside" that structure, then that's going to grow your stack.
The above is sloppy and handwavy, but might help to give an intuition.

How can iterative deepening search be implemented efficiently in Haskell?

I have an optimization problem I want to solve. You have some kind of data-structure:
data Foo = Foo
  { fooA :: Int
  , fooB :: Int
  , fooC :: Int
  , fooD :: Int
  , fooE :: Int
  }
and a rating function:
rateFoo :: Foo -> Int
I have to optimize the result of rateFoo by changing the values in the struct. In this specific case, I decided to use iterative deepening search to solve the problem. The (infinite) search tree for the best optimization is created by another function, which simply applies all possible changes recursively to the tree:
fooTree :: Foo -> Tree
My searching function looks something like this:
optimize :: Int -> Foo -> Foo
optimize threshold foo = undefined
The question I had, before I start is this:
As the tree can be generated from the data at each point, is it possible to have only the parts of the tree generated which are currently needed by the algorithm? Is it possible to have the memory freed and the tree regenerated if needed, in order to save memory (a leaf at level n can be generated in O(n), and n remains small, but not small enough to keep the whole tree in memory over time)?
Is this something I can expect from the runtime? Can the runtime unevaluate expressions (turn an evaluated expression back into an unevaluated one)? Or what is the dirty hack I have to do for this?
The runtime does not unevaluate expressions.
There's a straightforward way to get what you want however.
Consider a zipper-like structure for your tree. Each node holds a value and a thunk representing down, up, etc. When you move to the next node, you can either move normally (placing the previous node value in the corresponding slot) or forgetfully (placing an expression which evaluates to the previous node in the right slot). Then you have control over how much "history" you hang on to.
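For concreteness, here is a minimal sketch of such a zipper for a binary tree (illustrative names; the "forgetful" move is described in the final comment rather than implemented):

data Tree a = Leaf | Node (Tree a) a (Tree a)

-- breadcrumbs recording how to rebuild the parent on the way up
data Crumb a = WentLeft a (Tree a)   -- parent value, right sibling
             | WentRight a (Tree a)  -- parent value, left sibling

type Zipper a = (Tree a, [Crumb a])

left :: Zipper a -> Maybe (Zipper a)
left (Node l x r, cs) = Just (l, WentLeft x r : cs)
left _                = Nothing

right :: Zipper a -> Maybe (Zipper a)
right (Node l x r, cs) = Just (r, WentRight x l : cs)
right _                = Nothing

up :: Zipper a -> Maybe (Zipper a)
up (t, WentLeft x r : cs)  = Just (Node t x r, cs)
up (t, WentRight x l : cs) = Just (Node l x t, cs)
up (_, [])                 = Nothing

-- A "forgetful" move would store, in place of the sibling subtree itself,
-- an expression that regenerates it from the node's data (e.g. via
-- something like fooTree), so the old subtree can be collected and
-- rebuilt on demand.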
Here's my advice:
1. Just implement your algorithm in the most straightforward way possible.
2. Profile.
3. Optimize for speed or memory use if necessary.
I very quickly learned that I'm not smart and/or experienced enough to reason about what GHC will do or how garbage collection will work. Sometimes things that I'm sure will be disastrously memory-inefficient work smoothly the first time around, and (less often) things that seem simple require lots of fussing with strictness annotations, etc.
The Real World Haskell chapter on profiling and optimization is incredibly helpful once you get to steps 2 and 3.
For example, here's a very simple implementation of IDDFS, where f expands children, p is the search predicate, and x is the starting point.
search :: (a -> [a]) -> (a -> Bool) -> a -> Bool
search f p x = any (\d -> searchTo f p d x) [1..]
  where
    searchTo f p d x
      | d == 0    = False
      | p x       = True
      | otherwise = any (searchTo f p $ d - 1) (f x)
I tested by searching for "abbaaaaaacccaaaaabbaaccc" with children x = [x ++ "a", x ++ "bb", x ++ "ccc"] as f. It seems reasonably fast and requires very little memory (linear with the depth, I think). Why not try something like this first and then move to a more complicated data structure if it isn't good enough?
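Reconstructed from that description (main is an assumed wrapper; children plays the role of f), the test looks like:

main :: IO ()
main = print $ search children (== "abbaaaaaacccaaaaabbaaccc") ""
  where
    children x = [x ++ "a", x ++ "bb", x ++ "ccc"]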
