Do I need to take explicit actions to facilitate sharing with persistent data structures? - haskell

I come from an imperative background and am trying to implement a simple disjoint sets (“union-find”) data structure to get some practice with creating and modifying (persistent) data structures in Haskell. The goal is to have a simple implementation, but I am also concerned about efficiency, and my question is related to this.
First, I created a disjoint-set forest implementation with union by rank and started by defining a data type for a “point”:
import Data.IntMap (IntMap, (!))
import qualified Data.IntMap as I

data Point = Point
    { _value  :: Int
    , _parent :: Maybe Point
    , _rank   :: Int
    } deriving Show
A disjoint-set forest is an IntMap with Int → Point mappings:
type DSForest = IntMap Point
empty :: DSForest
empty = I.empty
A singleton set is simply a mapping from its value x to a Point with value x, no parent, and a rank of 0:
makeSet :: DSForest -> Int -> DSForest
makeSet dsf x = I.insert x (Point x Nothing 0) dsf
Now, the interesting part: union. This operation will modify a point by setting the other point as its parent (and in some cases changing its rank). Where the Points' ranks differ, the lower-ranked Point is simply “updated” (a new Point is created) to have the other as its parent. Where they are equal, a new Point is created with its rank increased by one:
union :: DSForest -> Int -> Int -> DSForest
union dsf x y | x == y = dsf
union dsf x y =
    if _value x' == _value y'
      then dsf
      else case compare (_rank x') (_rank y') of
        GT -> I.insert (_value y') y'{ _parent = Just x' } dsf
        LT -> I.insert (_value x') x'{ _parent = Just y' } dsf
        -- 1) increase x's rank by one:
        EQ -> let x'' = x'{ _rank = _rank x' + 1 }
                  -- 2) update the value for x's rank to point to the new x:
                  dsf' = I.insert (_value x'') x'' dsf
                  -- 3) then update y to have the new x as its parent:
              in I.insert (_value y') y'{ _parent = Just x'' } dsf'
  where x' = dsf ! findSet dsf x
        y' = dsf ! findSet dsf y
Now, to my real question, if in the EQ case I had instead done the following:
        EQ -> let dsf' = I.insert (_value x') x'{ _rank = _rank x' + 1 } dsf
              in I.insert (_value y') y'{ _parent = Just x'{ _rank = _rank x' + 1 } } dsf'
That is, first insert a new Point x with its rank increased, and then have y's parent be a (separately constructed) new Point x with its rank increased. Would this mean that they no longer point to the same Point in memory? (Does this even matter? Should I worry about these things when using/creating persistent data structures?)
And just for completeness, here is findSet:
findSet :: DSForest -> Int -> Int
findSet dsf' x' = case _parent (dsf' ! x') of
    Just (Point v _ _) -> findSet dsf' v
    Nothing            -> x'
(General comments about the efficiency and design of this code are also welcome.)

would this mean that they no longer point to the same Point in memory?
I don't think you should be concerned with this: it is just an implementation detail of the Haskell runtime system (RTS) for immutable values.
As for other suggestions: I would make the function findSet return the Point itself rather than the key, as that eliminates the extra lookup in union.
findSet :: DSForest -> Int -> Point
findSet dsf' x' = case _parent pt of
    Just (Point v _ _) -> findSet dsf' v
    Nothing            -> pt
  where
    pt = dsf' ! x'
Make appropriate changes in the union function.
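Concretely, only the where clause of union needs to change (a sketch; the rest of the body stays as before):

  where x' = findSet dsf x   -- was: dsf ! findSet dsf x
        y' = findSet dsf y   -- was: dsf ! findSet dsf y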

First comment: the disjoint-set union-find data structure is very, very difficult to do well in a purely functional way. If you are just trying to get practice with persistent data structures, I strongly recommend starting with simpler structures like binary search trees.
Now, to see one problem, consider your findSet function. It does not implement path compression! That is, it does not make all the nodes along the path to the root point directly to the root. To do that, you would want to update all those points in the DSForest, so your function would then return (Int, DSForest) or perhaps (Point, DSForest). Doing this in a monad to handle all the plumbing of passing the DSForest around would be easier than passing that forest around manually.
But now a second issue. Suppose you modify findSet as just described. It still wouldn't do quite what you want. In particular, suppose you have a chain where 2 is a child of 1, 3 is a child of 2, and 4 is a child of 3. And now you do a findSet on 3. This will update 3's point so that its parent is 1 instead of 2. But 4's parent is still the old 3 point, whose parent is 2. This may not matter too much, because it looks like you never really do anything with the parent Point except pull out its value (in findSet). But the very fact that you never do anything with the parent Point except pull out its value says to me that it should be a Maybe Int instead of a Maybe Point.
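Combining both suggestions, here is a minimal sketch of my own (findSetM and the Maybe Int parent are assumptions, not the question's code) of path compression with the plumbing hidden in the State monad:

import Control.Monad.State
import Data.IntMap (IntMap, (!))
import qualified Data.IntMap as I

-- assumed variant of the question's Point, with parents stored by key
data Point = Point { _value :: Int, _parent :: Maybe Int, _rank :: Int }
type DSForest = IntMap Point

findSetM :: Int -> State DSForest Int
findSetM x = do
    dsf <- get
    case _parent (dsf ! x) of
      Nothing -> return x
      Just p  -> do
        root <- findSetM p
        -- path compression: repoint x directly at the root
        modify (I.adjust (\pt -> pt { _parent = Just root }) x)
        return root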
Let me repeat and expand on what I said at the beginning. Disjoint sets are a particularly hard data structure to handle in a functional/persistent way, so I strongly recommend starting with an easier tree structure like binary search trees or leftist heaps or even abstract syntax trees. Those structures have the property that all access goes through the root--that is, you always start at the root and work your way down through the tree to get to the right place. This property makes the kind of sharing that is the hallmark of persistent data structures MUCH easier.
The disjoint set data structure does not have that property. Instead of always starting at the root and working down to the nodes of interest, you start at arbitrary nodes and work your way back up to the root. When you have unrestricted entry points like this, often the easiest way to handle it is to mediate all the sharing through a separate map (DSForest in your case), but that means passing that map back and forth everywhere.

Sharing is a compiler thing. When it recognizes common sub-expressions, a compiler may choose to represent them both by the same object in memory, but it is under no obligation to do so (and with a switch like -fno-cse it can be told not to); the two might be, and in the absence of common-subexpression elimination usually are, represented by two different, though equal-valued, objects in memory. Re: referential transparency.
OTOH, when we name something and use that name twice, we (reasonably) expect it to represent the same object in memory. But the compiler might choose to duplicate it and use two separate copies at the two different use sites, although it is not known to do so. But it might. Re: referential transparency.
See also:
How is this fibonacci-function memoized?
double stream feed to prevent unneeded memoization?
Here are a few examples with list-producing functions, drawing from the last link above. They rely on the compiler not duplicating anything, i.e. indeed sharing any named object as expected from call-by-need lambda calculus operational semantics (as explained by nponeccop in the comments), and not introducing any extra sharing on its own to eliminate common subexpressions:
Sharing fixpoint combinator, creating a loop:
fix f = x where x = f x
Non-sharing fixpoint combinator, creating a telescoping multi-stage chain (i.e. a regular recursion chain):
_Y f = f (_Y f)
Two-stage combination, a loop and a feed:
_2 f = f (fix f)
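To make the operational difference concrete, here is a minimal sketch of my own (not from the answer): under call by need, ones below is a single self-referential cons cell, a loop in memory, while ones' allocates a fresh cell for every element demanded, the telescoping chain described above.

import Data.Function (fix)   -- fix f = let x = f x in x  (the sharing combinator)

ones :: [Int]
ones = fix (1:)    -- one cell pointing at itself: constant space

_Y :: (a -> a) -> a
_Y f = f (_Y f)    -- the non-sharing combinator

ones' :: [Int]
ones' = _Y (1:)    -- 1 : 1 : 1 : ..., a new cell per step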

Related

Random walk on a pointed container

Let us consider a dwarf wandering in a tunnel. I will define a type that represents this
situation thusly:
data X a = X { xs :: [a], i :: Int }
display :: X Bool -> IO ()
display X{..} = putStrLn (concatMap f xs) where { f True = "*" ; f False = "-" }
Here you see a dwarf in a section of a tunnel:
λ display x
-*---
It is discovered that a pointed container is an instance of Comonad. I can use this
instance here to define a function that simulates my dwarf moving right:
shiftRight :: X Bool -> Bool
shiftRight x@X{..} | let i' = i - 1 in i' `isInRange` x && xs !! i' = True
                   | otherwise = False
See:
λ traverse_ display $ scanl (&) x (replicate 4 (extend shiftRight))
-*---
--*--
---*-
----*
-----
Spectacularly, this same operation works with any number of dwarves, in any pointed container,
and so can be extended to a whole dwarf fortress if desired. I can similarly define a function
that moves a dwarf leftwards, or in any other deterministic fashion.
But now what if I want my dwarf to wander around aimlessly? Now my "shift randomly" must
only place a dwarf to the right if the same dwarf is not being placed to the left (for that would
make two dwarves out of one), and also it must never place two dwarves in the same place (which
would make one dwarf out of two). In other words, "shift randomly" must be linear (as in
"linear logic") when applied over a comonadic fortress.
One approach I have in mind is to assign some sort of state to dwarves that tracks the available
moves for a dwarf, removing moves from every relevant dwarf when we decide that the location is
taken by one of them. This way, the remaining dwarves will not be able to take that move. Or we
may track availability of locations. I am thinking that some sort of a "monadic" extendM
might be useful. (It would compare to the usual extend as traverse compares to fmap.)
But I am not aware of any prior art.
The easiest way to solve this is by using the MonadRandom library, which introduces a new monad for random computations. So let’s set up a computation using random numbers:
-- normal comonadic computation
type CoKleisli w a b = w a -> b
-- randomised comonadic computation
type RCoKleisli w a b = w a -> Rand b
Now, how to apply this thing? It’s easy enough to extend it:
halfApply :: Comonad w => (w a -> Rand b) -> (w a -> w (Rand b))
halfApply = extend
But this doesn’t quite work: it gives us a container of randomised values, whereas we want a randomised container of values. In other words, we need to find something which can do w (Rand b) -> Rand (w b). And in fact there does exist such a function: sequenceA! As the documentation states, if we apply sequenceA to a w (Rand b), it will run each Rand computation, then accumulate the results to get a Rand (w b) — which is exactly what we want! So:
fullApply :: (Comonad w, Traversable w, Applicative f)
          => (w a -> f b) -> (w a -> f (w b))
fullApply c = sequenceA . extend c
As you can see from the type signature above, this actually works for any Applicative (because all we require is that each applicative computation can be run in turn), but requires w to be Traversable (so we can traverse over each value in w).
(For more on this sort of thing, I recommend this blog post, plus its second part. If you want to see the above technique in action, I recommend my own probabilistic cellular automata library, back when it still used comonads instead of my own typeclass.)
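Since fullApply works for any Applicative, here is a minimal runnable sketch of my own (using the list Applicative for nondeterminism and the Comonad instance for NonEmpty from the comonad package; all names besides fullApply are assumptions):

import Control.Comonad (extend, extract)
import Data.List.NonEmpty (NonEmpty(..))

-- each position either keeps its value or bumps it by one
bumpMaybe :: NonEmpty Int -> [Int]
bumpMaybe w = [extract w, extract w + 1]

-- fullApply bumpMaybe enumerates every whole-container outcome:
-- ghci> fullApply bumpMaybe (1 :| [2, 3])
-- [1 :| [2,3], 1 :| [2,4], 1 :| [3,3], ...]   -- 2^3 = 8 results in all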
So that answers one half of your question; that is, how to get probabilistic behaviour using comonads. The second half is:
… and also it must never place two dwarves in the same place …
This I’m not too sure about, but one solution could be to split your comonadic computation into three stages:
1. Convert every dwarf probabilistically to a diff stating whether that dwarf will move left, right, or stay. Type for this operation: mkDiffs :: X Dwarf -> Rand (X DwarfDiff)
2. Execute each diff, but keeping the original dwarf positions. Type for this operation: execDiffs :: X DwarfDiff -> X (DwarfDiff, [DwarfDiffed])
3. Resolve situations where dwarfs have collided. Type for this operation: resolve :: X (DwarfDiff, [DwarfDiffed]) -> Rand (X Dwarf)
Types used above:
data Dwarf = Dwarf | NoDwarf
data DwarfDiff = MoveLeft | MoveRight | DontMove | NoDiff
data DwarfDiffed = MovedFromLeft | MovedFromRight | NothingMoved
Example of what I’m talking about:
myDwarfs = X [NoDwarf ,Dwarf ,NoDwarf ,Dwarf ,Dwarf ,Dwarf ] 0
mkDiffs myDwarfs
= X [NoDiff ,MoveRight ,NoDiff ,MoveLeft ,MoveRight ,DontMove ] 0
execDiffs (mkDiffs myDwarfs)
= X [(NoDiff,[NothingMoved]),(MoveRight,[NothingMoved]),(NoDiff,[MovedFromRight,MovedFromLeft]),(MoveLeft,[NothingMoved]),(MoveRight,[NothingMoved]),(DontMove,[MovedFromLeft])] 0
resolve (execDiffs (mkDiffs myDwarfs))
= X [NoDwarf ,NoDwarf ,Dwarf ,Dwarf ,Dwarf , Dwarf ] 0
As you can see, the above solution is pretty complicated. I have an alternate recommendation: don’t use comonads for this problem! Comonads are great for when you need to update one value based on its context, but are awful at updating multiple values simultaneously. The issue is that comonads such as your X are zippers, which store a data structure as a single ‘focused’ value plus a surrounding ‘context’. As I said, this is great for updating a focused value based on its context, but if you need to update multiple values, you have to shoehorn your computation into this value+context mould… which, as we saw above, can be pretty tricky. So possibly comonads aren’t the best choice for this application.

Apply function to all pairs efficiently

I need a second order function pairApply that applies a binary function f to all unique pairs of a list-like structure and then combines them somehow. An example / sketch:
pairApply (+) f [a, b, c] = f a b + f a c + f b c
Some research leads me to believe that Data.Vector.Unboxed probably will have good performance (I will also need fast access to specific elements); it is also necessary for Statistics.Sample, which would come in handy further down the line.
With this in mind I have the following, which almost compiles:
import qualified Data.Vector.Unboxed as U

pairElement :: (U.Unbox a, U.Unbox b)
            => U.Vector a
            -> (a -> a -> b)
            -> Int
            -> a
            -> U.Vector b
pairElement v f idx el = U.map (f el) $ U.drop (idx + 1) v

pairUp :: (U.Unbox a, U.Unbox b)
       => (a -> a -> b)
       -> U.Vector a
       -> U.Vector (U.Vector b)
pairUp f v = U.imap (pairElement v f) v

pairApply :: (U.Unbox a, U.Unbox b)
          => (b -> b -> b)
          -> b
          -> (a -> a -> b)
          -> U.Vector a
          -> b
pairApply combine neutral f v = folder $ U.map folder (pairUp f v)
  where folder = U.foldl combine neutral
The reason this doesn't compile is that there is no Unbox instance for U.Vector a. I have been able to create new unboxed instances in other cases using Data.Vector.Unboxed.Deriving, but I'm not sure it would be so easy in this case (transform it to a tuple pair where the first element is all the inner vectors concatenated and the second is the length of the vectors, to know how to unpack?)
My question can be stated in two parts:
1. Does the above implementation make sense at all, or is there some quick library function magic etc. that could do it much easier?
2. If so, is there a better way to make an unboxed vector of vectors than the one sketched above?
Note that I'm aware that foldl is probably not the best choice; once I've got the implementation sorted I plan to benchmark with a few different folds.
There is no way to define a classical instance for Unbox (U.Vector b), because that would require preallocating a memory area in which each element (i.e. each subvector!) has the same fixed amount of space. But in general, each of them may be arbitrarily big, so that's not feasible at all.
It might in principle be possible to define that instance by storing only a flattened form of the nested vector plus an extra array of indices (where each subvector starts). I once briefly gave this a try; it actually seems somewhat promising as far as immutable vectors are concerned, but a G.Vector instance also requires a mutable implementation, and that's hopeless for such an approach (because any mutation that changes the number of elements in one subvector would require shifting everything behind it).
Usually, it's just not worth it, because if the individual element vectors aren't very small the overhead of boxing them won't matter, i.e. often it makes sense to use B.Vector (U.Vector b).
For your application, however, I would not do that at all – there's no need to ever wrap the upper element-choices in a single triangular array. (And it would be really bad for performance to do so, because it would make the algorithm take O(n²) memory rather than the O(n) which is all that's needed.)
I would just do the following:
pairApply combine neutral f v
  = U.ifoldl' (\acc i p -> U.foldl' (\acc' q -> combine acc' $ f p q)
                                    acc
                                    (U.drop (i+1) v))
              neutral v
This corresponds pretty much to the obvious nested-loops imperative implementation
pairApply(combine, b, f, v):
    for(i in 0..length(v)-1):
        for(j in i+1..length(v)-1):
            b = combine(b, f(v[i], v[j]));
    return b;
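A quick usage sketch of the fold version above (example values of my own): the sum of squared differences over all unique pairs.

-- ghci> pairApply (+) 0 (\x y -> (x - y)^2) (U.fromList [1, 2, 4 :: Int])
-- 14   -- (1-2)^2 + (1-4)^2 + (2-4)^2 = 1 + 9 + 4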
My answer is basically the same as leftaroundabout's nested-loops imperative implementation:
import Data.List (foldl')
import Data.Vector (Vector, (!))

pairApply :: (Int -> Int -> Int) -> Vector Int -> Int
pairApply f v = foldl' (+) 0 [f (v ! i) (v ! j) | i <- [0..n-1], j <- [i+1..n-1]]
  where n = length v
As far as I can tell, there is no performance issue with this implementation.
Non-polymorphic for simplicity.

What are the benefits of replacing a Haskell record with a function?

I was reading this interesting article about continuations and I discovered this clever trick. Where I would naturally have used a record, the author instead uses a function with a sum type as the first argument.
So for example, instead of doing this
data Processor = Processor { processString :: String -> IO ()
                           , processInt    :: Int -> IO ()
                           }

processor = Processor (\s -> print $ "Hello " ++ s)
                      (\x -> print $ "value" ++ show x)
We can do this:
data Arg = ArgString String | ArgInt Int
processor :: Arg -> IO ()
processor (ArgString s) = print $ "Hello " ++ s
processor (ArgInt x)    = print $ "value" ++ show x
Apart from being clever, what are the benefits of it over a simple record?
Is it a common pattern, and does it have a name?
Well, it's just a simple isomorphism. In ADT algebra (writing function types as exponentials):
IO()^String × IO()^Int
≅ IO()^(String + Int)
The obvious benefit of the RHS is perhaps that it only contains IO() once – DRY FTW.
This is a very loose example but you can see the Arg method as being an initial encoding and the Processor method as being a final encoding. They are, as others have noted, of equal power when viewed in many lights; however, there are some differences.
Initial encodings enable us to examine the "commands" being executed. In some sense, it means we've sliced the operation so that the input and the output are separated. This lets us choose many different outputs given the same input.
Final encodings enable us to abstract over implementations more easily. For instance, if we have two values of type Processor then we can treat them identically even if the two have different effects or achieve their effects by different means. This kind of abstraction is popularized in OO languages.
Initial encodings enable (in some sense) an easier time adding new functions since we just have to add a new branch to the Arg type. If we had many different ways of building Processors then we'd have to update each of these mechanisms.
Honestly, what I've described above is rather stretched. It is the case that Arg and Processor fit these patterns somewhat, but they do not do so in such a significant way as to really benefit from the distinction. It may be worth studying more examples if you're interested—a good search term is the "expression problem" which emphasizes the distinction in points (2) and (3) above.
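To make the first point concrete, here is a small illustration of my own (the names run and describe are assumptions): with the initial encoding we can write several interpreters over the same commands, which the final encoding cannot offer.

data Arg = ArgString String | ArgInt Int   -- the question's Arg, repeated

run :: Arg -> IO ()
run (ArgString s) = putStrLn ("Hello " ++ s)
run (ArgInt n)    = print n

-- a second interpretation of the very same values; a Processor record
-- already committed to IO () admits no such reinterpretation:
describe :: Arg -> String
describe (ArgString _) = "a string command"
describe (ArgInt _)    = "an int command"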
To expand a bit on leftaroundabout's response, there is a way of writing function types as Output^Input, because of cardinality (how many things there are). So for example if you think about all of the mappings of the set {0, 1, 2} of cardinality 3 to the set {0, 1} of cardinality 2, you see that 0 can map to 0 or 1, independent of 1 mapping to 0 or 1, independent of 2 mapping to 0 or 1. When counting the total number of functions we get 2 * 2 * 2, or 2^3.
In this same way of writing, sum types are written with + and product types are written with *, and there is a cute way to phrase this as Out^(In1 + In2) = Out^In1 * Out^In2; we could write the isomorphism as:
combiner :: (a -> z, b -> z) -> Either a b -> z
combiner (za, zb) e_ab = case e_ab of Left a -> za a; Right b -> zb b
splitter :: (Either a b -> z) -> (a -> z, b -> z)
splitter z_eab = (\a -> z_eab $ Left a, \b -> z_eab $ Right b)
and we can reify it in your code with:
type Processor = Either String Int -> IO ()
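For instance (a sketch of my own), the question's processor can be rebuilt from its two halves with combiner:

processor :: Either String Int -> IO ()
processor = combiner ( \s -> print $ "Hello " ++ s
                     , \x -> print $ "value" ++ show x )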
So what's the difference? There aren't many:
The combined form requires both things to have the exact same tail-end. You can't apply combiner to something of type a -> b -> z since that parses as a -> (b -> z) and b -> z is not unifiable with z. If you wanted to unify a -> b -> z with c -> z then you have to first uncurry the function to (a, b) -> z, which looks like a bit of work -- it's just not an issue when you use the record version.
The split form is also a little more concise for application; you just write fst split a instead of combined $ Left a. But this also means that you can't quite do something like yz . combined (whose equivalent is (yz . fst split, yz . snd split)) so easily. When you've actually got the Processor record defined it might be worth it to extend its kind to * -> * and make it a Functor.
The record can in general participate in type classes more easily than the sum-type-function.
Sum types will look more imperative, so they'll probably be clearer to read. For example, if I hand you the pattern withProcState p () [Read path1, Apply (map toUpper), Write path2] it's pretty easy to see that this feeds the processor with commands to uppercase path1 into path2. The equivalent of defining processors would look like procWrite p path2 $ procApply p (map toUpper) $ procRead p path1 () which is still pretty clear but not quite as awesome as the previous case.

Is there a way to avoid copying the whole search path of a binary tree on insert?

I've just started working my way through Okasaki's Purely Functional Data Structures, but have been doing things in Haskell rather than Standard ML. However, I've come across an early exercise (2.5) that's left me a bit stumped on how to do things in Haskell:
Inserting an existing element into a binary search tree copies the entire search path
even though the copied nodes are indistinguishable from the originals. Rewrite insert using exceptions to avoid this copying. Establish only one handler per insertion rather than one handler per iteration.
Now, my understanding is that ML, being an impure language, gets by with a conventional approach to exception handling, not so different from, say, Java's, so you can accomplish it with something like this:
datatype Tree = E | T of Tree * int * Tree
exception ElementPresent
fun insert (x, t) =
    let fun go E = T (E, x, E)
          | go (T (l, y, r)) =
              if x < y then T (go l, y, r)
              else if y < x then T (l, y, go r)
              else raise ElementPresent
    in go t end
    handle ElementPresent => t
I don't have an ML implementation, so this may not be quite right in terms of the syntax.
My issue is that I have no idea how this can be done in Haskell, outside of doing everything in the IO monad, which seems like cheating (and even if it's not cheating, it would seriously limit the usefulness of a function which really doesn't do any mutation). I could use the Maybe monad:
data Tree a = Empty | Fork (Tree a) a (Tree a)
    deriving (Show)

insert :: (Ord a) => a -> Tree a -> Tree a
insert x t = maybe t id (go t)
  where go Empty = return (Fork Empty x Empty)
        go (Fork l y r)
          | x < y     = do l' <- go l; return (Fork l' y r)
          | x > y     = do r' <- go r; return (Fork l y r')
          | otherwise = Nothing
This means everything winds up wrapped in Just on the way back up when the element isn't found, which requires more heap allocation, and sort of defeats the purpose. Is this allocation just the price of purity?
EDIT to add: A lot of why I'm wondering about the suitability of the Maybe solution is that the optimization described only seems to save you all the constructor calls you would need in the case where the element already exists, which means heap allocations proportional to the length of the search path. The Maybe also avoids those constructor calls when the element already exists, but then you get a number of Just constructor calls equal to the length of the search path. I understand that a sufficiently smart compiler could elide all the Just allocations, but I don't know if, say, the current version of GHC is really that smart.
In terms of cost, the ML version is actually very similar to your Haskell version.
Every recursive call in the ML version results in a stack frame. The same is true in the
Haskell version. This is going to be proportional in size to the path that you traverse in
the tree. Also, both versions will of course allocate new nodes for the entire path if an insertion is actually performed.
In your Haskell version, every recursive call might also eventually result in the
allocation of a Just node. This will go on the minor heap, which is just a block of
memory with a bump pointer. For all practical purposes, GHC's minor heap is roughly equivalent in
cost to the stack. Since these are short-lived allocations, they won't normally end up
being moved to the major heap at all.
GHC generally cannot elide path copying in cases like that. However, there is a way to do it manually, without incurring any of the indirection/allocation costs of Maybe. Here it is:
{-# LANGUAGE MagicHash #-}

import GHC.Prim (reallyUnsafePtrEquality#)

data Tree a = Empty | Fork (Tree a) a (Tree a)
    deriving (Show)

insert :: (Ord a) => a -> Tree a -> Tree a
insert x Empty = Fork Empty x Empty
insert x node@(Fork l y r)
  | x < y = let l' = insert x l in
      case reallyUnsafePtrEquality# l l' of
        1# -> node
        _  -> Fork l' y r
  | x > y = let r' = insert x r in
      case reallyUnsafePtrEquality# r r' of
        1# -> node
        _  -> Fork l y r'
  | otherwise = node
The pointer equality function does exactly what's in the name. Here it is safe because even if the equality returns a false negative we only do a bit of extra copying, and nothing worse happens.
It's not the most idiomatic or prettiest Haskell, but the performance benefits can be significant. In fact, this trick is used very frequently in unordered-containers.
As fizruk indicates, the Maybe approach is not significantly different from what you'd get in Standard ML. Yes, the whole path is copied, but the new copy is discarded if it turns out not to be needed. The Just constructor itself may not even be allocated on the heap—it can't escape from insert, let alone the module, and you don't do anything weird with it, so the compiler is free to analyze it to death.
Edit
There are efficiency problems, now that I think of it. Your use of Maybe conceals the fact that you're actually making two passes—one down to find the insertion point and one up to build the tree. The solution to this is to drop Maybe Tree in favor of (Tree,Bool) and use strictness annotations, or to switch to continuation-passing style. Also, if you choose to stay with the three-way logic, you may want to use the three-way comparison function. Alternatively, you can go all the way to the bottom each time and check later if you hit a duplicate.
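Here is a hedged sketch of the continuation-passing variant just suggested (insertCPS is an assumed name): on a duplicate we return the original tree directly, so neither Just cells nor copied nodes are built on the way back up.

insertCPS :: Ord a => a -> Tree a -> Tree a
insertCPS x t0 = go t0 id
  where
    go Empty        k = k (Fork Empty x Empty)
    go (Fork l y r) k
      | x < y     = go l (\l' -> k (Fork l' y r))
      | x > y     = go r (\r' -> k (Fork l y r'))
      | otherwise = t0   -- duplicate found: drop the continuations entirely

Note that the continuations themselves are still allocated on the way down, so this trades Maybe cells for closures rather than eliminating allocation outright.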
If you have a predicate that checks whether the key is already in the tree, you can look before you leap:
insert x t = if contains t x then t else insert' x t
This traverses the tree twice, of course. Whether that's as bad as it sounds should be determined empirically: it might just load the relevant part of the tree into the cache.

How can iterative deepening search be implemented efficiently in Haskell?

I have an optimization problem I want to solve. You have some kind of data-structure:
data Foo = Foo
    { fooA :: Int
    , fooB :: Int
    , fooC :: Int
    , fooD :: Int
    , fooE :: Int
    }
and a rating function:
rateFoo :: Foo -> Int
I have to optimize the result of rateFoo by changing the values in the struct. In this specific case, I decided to use iterative deepening search to solve the problem. The (infinite) search tree for the best optimization is created by another function, which simply applies all possible changes recursively to the tree:
fooTree :: Foo -> Tree
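For concreteness, one possible shape for that tree (an assumption on my part; the question leaves Tree and the set of changes abstract):

data Tree = Node Foo [Tree]

-- hypothetical generator: apply every single-field tweak, then recurse
fooTree :: Foo -> Tree
fooTree foo = Node foo (map fooTree (tweaks foo))
  where
    tweaks f = [ f { fooA = fooA f + d } | d <- [-1, 1] ]
            ++ [ f { fooB = fooB f + d } | d <- [-1, 1] ]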
My searching function looks something like this:
optimize :: Int -> Foo -> Foo
optimize threshold foo = undefined
The question I had, before I start is this:
As the tree can be generated from the data at each point, is it possible to have only the parts of the tree generated which are currently needed by the algorithm? Is it possible to have the memory freed and the tree regenerated if needed in order to save memory? (A leaf at level n can be generated in O(n), and n remains small, but not small enough to have the whole tree in memory over time.)
Is this something I can expect from the runtime? Can the runtime unevaluate expressions (turn an evaluated expression into an unevaluated one)? Or what is the dirty hack I have to do for this?
The runtime does not unevaluate expressions.
There's a straightforward way to get what you want however.
Consider a zipper-like structure for your tree. Each node holds a value and a thunk representing down, up, etc. When you move to the next node, you can either move normally (placing the previous node value in the corresponding slot) or forgetfully (placing an expression which evaluates to the previous node in the right slot). Then you have control over how much "history" you hang on to.
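A rough sketch of that idea (entirely my own names and layout, assuming a rose-tree search space): each field of the zipper is a thunk until the search walks there, and the up field can hold either the real parent or an expression that regenerates it.

data Zipper a = Zipper
  { focus    :: a
  , children :: [Zipper a]        -- generated only when demanded
  , up       :: Maybe (Zipper a)  -- real parent, or a regenerating thunk
  }

-- build a zipper from an expansion function; moving "normally" is what
-- this knot-tying gives you, since each child's up is the live parent
unfoldZipper :: (a -> [a]) -> a -> Zipper a
unfoldZipper f = go Nothing
  where
    go parent x =
      let z = Zipper x [ go (Just z) c | c <- f x ] parent
      in  z

Moving "forgetfully" would instead store in up an expression that rebuilds the parent from the generator, so the evaluated history can be garbage-collected at the cost of recomputation.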
Here's my advice:
1. Just implement your algorithm in the most straightforward way possible.
2. Profile.
3. Optimize for speed or memory use if necessary.
I very quickly learned that I'm not smart and/or experienced enough to reason about what GHC will do or how garbage collection will work. Sometimes things that I'm sure will be disastrously memory-inefficient work smoothly the first time around, and, less often, things that seem simple require lots of fussing with strictness annotations, etc.
The Real World Haskell chapter on profiling and optimization is incredibly helpful once you get to steps 2 and 3.
For example, here's a very simple implementation of IDDFS, where f expands children, p is the search predicate, and x is the starting point.
search :: (a -> [a]) -> (a -> Bool) -> a -> Bool
search f p x = any (\d -> searchTo f p d x) [1..]
  where
    searchTo f p d x
      | d == 0    = False
      | p x       = True
      | otherwise = any (searchTo f p $ d - 1) (f x)
I tested by searching for "abbaaaaaacccaaaaabbaaccc" with children x = [x ++ "a", x ++ "bb", x ++ "ccc"] as f. It seems reasonably fast and requires very little memory (linear with the depth, I think). Why not try something like this first and then move to a more complicated data structure if it isn't good enough?
