Tail call memory management in Haskell

I'm using the following control structure (which I think is tail recursive)
untilSuccessOrBigError :: (Eq e) => (Integer -> (Either e a)) -> Integer -> e -> (Either e a)
untilSuccessOrBigError f count bigError
  = case f count of
      Right x -> Right x
      Left e  -> (if e == bigError then Left e else untilSuccessOrBigError f (count - 1) e)
to do iterative deepening
iterativeDeepening :: (a -> [a]) -> (a -> Bool) -> (a -> Bool) -> a -> Either String a
iterativeDeepening stepFunc isAValidSolution isGraphBottom x
  = untilSuccessOrBigError
      (\count -> dfsWithMaxDepth stepFunc isAValidSolution isGraphBottom count x)
      (-1)
      "Reached graph bottom"
Will this free memory after each deepening step (since the program will no longer technically be able to reach it)? If not, how should I rewrite the control structure?
P.S.
On second thought, it looks like this will fail, since tail recursive structures frequently need to be able to access things on the stack, like adding to the previous value, even if it doesn't in this case. – Roman A. Taycher Nov 28 at 12:33
P.P.S.
On third thought, it occurs to me that it can discard the values inside dfsWithMaxDepth as soon as dfsWithMaxDepth returns, and a bunch of answers won't take up much memory. – Roman A. Taycher Nov 2

At first glance, there's no reason why this won't be garbage collected properly, or why TCO won't be performed.
In general, you should think about TCO and garbage collection in a different way in Haskell -- lots of related questions listed here on SO seem helpful. Fundamentally you want to imagine the operational semantics of GHC Haskell as lazy graph reduction.
Imagine that you just have the whole AST in memory, with additional arrows from every usage of a name to its definition, and you ask for the value of "main." Now to get that, you need to look at the value of the first function called in main, and substitute it in, which in turn means that you need to chase the next thing that needs to be evaluated, etc. Now at some point in this reduction, you'll notice that things that used to be pointed to as expressions have now been computed, and replaced with the values they represent. Then those expressions can get garbage collected. Meanwhile, the trace you've got from the top of the graph down to whatever piece you're now evaluating is the "stack". So if to evaluate a structure, you need to evaluate "inside" that structure, then that's going to grow your stack.
The above is sloppy and handwavy, but might help to give an intuition.
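One small, hedged addition that goes beyond the answer above: the loop in the question is already a tail call, and the main thunk it can accumulate across iterations is the (count - 1) argument. If you want to be defensive about that, force the counter before recursing:

untilSuccessOrBigError' :: Eq e => (Integer -> Either e a) -> Integer -> e -> Either e a
untilSuccessOrBigError' f count bigError =
  case f count of
    Right x -> Right x
    Left e
      | e == bigError -> Left e
      | otherwise     -> let count' = count - 1   -- force the counter so no chain of (-1) thunks builds up
                         in count' `seq` untilSuccessOrBigError' f count' e

(A bang pattern on count would achieve the same thing.)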

Related

Haskell taking in two lists with an int and returning a tuple

I am trying to learn Haskell and saw an exercise which says
Write two different Haskell functions having the same type:
[a] -> [b] -> Int -> (a,b)
So from my understanding, the function should take in two lists and an int, and return a tuple of the types of the lists.
What I tried so far was
together :: [a] -> [b] -> Int -> (a,b)
together [] [] 0 = (0,0)
together [b] [a] x = if x == a | b then (b,a) else (0,0)
I know I am way off but any help is appreciated!
First you need to make your mind up what the function should return. That is partly determined by the signature. But still you can come up with a lot of functions that return different things, but have the same signature.
Here one of the most straightforward functions is probably to return the elements that are placed at the index determined by the third parameter.
It makes no sense to return (0,0), since a and b are not per se numerical types. Furthermore if x == a | b is not semantically valid. You can write this as x == a || x == b, but this will not work, since a and b are not per se Ints.
We can implement a function that returns the heads of the two lists in case the index is 0. In case the index is negative, or at least one of the two lists is exhausted, then we can raise an error. I leave it as an exercise what to do in case the index is greater than 0:
together :: [a] -> [b] -> Int -> (a,b)
together [] _ _ = error "List exhausted"
together _ [] _ = error "List exhausted"
together (a:_) (b:_) 0 = (a, b)
together (a:_) (b:_) n
    | n < 0 = error "Negative index!"
    | …
You thus still need to fill in the ….
I generally dislike those "write any function with this signature"-type exercises precisely because of how arbitrary they are. You're supposed to figure out a definition that would make sense for that particular signature and implement it. In a lot of cases, you can wing it by ignoring as many arguments as possible:
fa :: [a] -> [b] -> Int -> (a,b)
fa (a:_) (b:_) _ = (a,b)
fa _ _ _ = error "Unfortunately, this function can't be made total because lists can be empty"
The error here is the important bit to note. You attempted to go around that problem by returning 0s, but this will only work when 0 is a valid value for types of a and b. Your next idea could be some sort of a "Default" value, but not every type has such a concept. The key observation is that without any knowledge about a type, in order to produce a value from a function, you need to get this value from somewhere else first*.
If you actually wanted a more sensible definition, you'd need to think up a use for that Int parameter; maybe it's the nth element from each list? With the help of take :: Int -> [a] -> [a] and head :: [a] -> a this should be doable as an exercise.
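For instance, a sketch of that idea (my own code and name, using drop together with head; it is partial in the same way head is, erroring on out-of-range indices):

nthOfEach :: [a] -> [b] -> Int -> (a, b)
nthOfEach xs ys n = (head (drop n xs), head (drop n ys))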
Again, your idea of comparing x with a won't work for all types; not every type is comparable with an Int. You might think that this would make generic functions awfully limited; that's the point where you typically learn about how to express certain expectations about the types you get, which will allow you to operate only on certain subsets of all possible types.
* That's also the reason why id :: a -> a has only one possible implementation.
Write two different Haskell functions having the same type:
[a] -> [b] -> Int -> (a,b)
As Willem and Bartek have pointed out, there are a lot of gibberish functions that have this type.
Bartek took the approach of picking two based on what the simplest functions with that type could look like. One was a function that did nothing but throw an error. And one was picking the first element of each list, hoping they were not empty and failing otherwise. This is a somewhat theoretical approach, since you probably don't ever want to use those functions in practice.
Willem took the approach of suggesting an actually useful function with that type and proceeded to explore how to exhaust the possible patterns of such a function: For lists, match the empty list [] and the non-empty list a:_, and for integers, match some stopping point, 0 and some categories n < 0 and ….
A question that arises to me is whether there is any other equally useful function with this type signature, or if a second function would necessarily have to be hypothetically constructed. It would seem natural that the Int argument has some relation to the positions of elements in [a] and [b], since they are also integers, especially because a single pair (a,b) is returned.
But the only remotely useful functions (in the sense of not being completely silly) that I can think of are small variations of this: For example, the Int could be the position from the end rather than from the beginning, or if there's not enough elements in one of the lists, it could default to the last element of a list rather than an error. Neither of these are very pleasing to make ("from the end" conflicts with the list being potentially infinite, and having a fall-back to the last element of a list conflicts with the fact that lists don't necessarily have a last element), so it is tempting to go with Bartek's approach of writing the simplest useless function as the second one.

Why Do We Need Maybe Monad Over Either Monad

I was playing around with the Maybe and Either monad types (chaining, applying conditional functions according to the returned value, also returning an error message saying which chained function failed, etc.). It seems to me like we can achieve the same and more of what Maybe does by using the Either monad. So my question is: what is the practical or conceptual difference between the two?
You are of course right that Maybe a is isomorphic to Either Unit a. The thing is that they are often semantically used to denote different things, a bit like the difference between returning null and throwing a NoSuchElementException:
Nothing/None denotes the "expected" missing of something, while
Left e denotes an error in getting it, for whatever reason.
That said, we might even combine the two to something like:
query :: Either DBError (Maybe String)
where we express both the possibility of a missing value (a DB NULL) and an error in the connection, the DBMS, or whatever (not saying that there aren't better designs, but you get the point).
Sometimes, the border is fluid; for safeHead :: [a] -> Maybe a, we could say that the expected possibility of the error is encoded in the intent of the function, while something like safeDivide might be encoded as Float -> Float -> Either FPError Float or Float -> Float -> Maybe Float, depending on the use case (again, just some stupid examples...).
If in doubt, the best option is probably to use a custom result ADT with semantic encoding (like data QueryResult = Success String | Null | Failure DBError), and to prefer Maybe to simple cases where it is "traditionally expected" (a subjective point, which however will be mostly OK if you gain experience).
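To make the safeHead / safeDivide examples concrete, here is a minimal sketch (the FPError type is just a stand-in I made up):

safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

data FPError = DivideByZero deriving Show

safeDivide :: Float -> Float -> Either FPError Float
safeDivide _ 0 = Left DivideByZero
safeDivide x y = Right (x / y)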
#phg's answer is great. I will chime in with something that helped clear it up for me when I was learning them:
Maybe is one (value) or none – ie, you have a value or you have nothing
Either is a logical disjunction, but you always have at least one (value) - ie, you have one or the other, but not both.
Maybe is great for cases where you may or may not have a value - for example, looking for an item in a list: if the list contains it, we get (Just x), otherwise we get Nothing.
Either is the perfect representation of a branch in your code - it's going to go one way or the other; Left or Right. We use a mnemonic to remember it: Right is the right (correct) way; Left is the wrong way (Error). This is not its only use of course, but definitely the most common.
I know the differences might seem subtle at first, but really they're suitable for very different things.
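A small, hedged illustration of both intuitions (the functions and names here are mine):

findFirst :: (a -> Bool) -> [a] -> Maybe a        -- a value or nothing
findFirst p xs = case filter p xs of
  (x:_) -> Just x
  []    -> Nothing

parseAge :: String -> Either String Int           -- Right (correct) way or Left (error) way
parseAge s = case reads s of
  [(n, "")] | n >= 0 -> Right n
  _                  -> Left ("invalid age: " ++ s)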
Well, you see, we can put this to the extreme by saying that all product types can be represented by just 2-tuples and all non-recursive sum types by Either. To additionally represent recursive types, we need a fixpoint type.
For example, why have 4-tuples (a,b,c,d) when we could as well write (a, (b, (c,d))) or (((a,b), c), d) ?
Or why have lists, when the following works as well?
data Y f = Y (f (Y f))
type List a = Y ((,) (Either () a))
nil = Y (Left (), undefined)
cons a as = Y (Right a, as)
infixr 4 cons
numbers = 1 `cons` 2 `cons` 3 `cons` nil
-- this is like foldl
reduce f z (Y (Left (), _)) = z
reduce f z (Y (Right x, xs)) = reduce f (f z x) xs
total = reduce (+) 0 numbers

Correct terminology for continuations

I've been poking around continuations recently, and I got confused about the correct terminology. Here Gabriel Gonzalez says:
A Haskell continuation has the following type:
newtype Cont r a = Cont { runCont :: (a -> r) -> r }
i.e. the whole (a -> r) -> r thing is the continuation (sans the wrapping)
The Wikipedia article seems to support this idea by saying
a continuation is an abstract representation of the control state
of a computer program.
However, here the authors say that
Continuations are functions that represent "the remaining computation to do."
but that would only be the (a->r) part of the Cont type. And this is in line with what Eugene Ching says here:
a computation (a function) that requires a continuation function in order
to fully evaluate.
We’re going to be seeing this kind of function a lot, hence, we’ll give it
a more intuitive name. Let’s call them waiting functions.
I've seen another tutorial (Brian Beckman and Erik Meijer) where they call the whole thing (the waiting function) the observable and the function which is required for it to complete the observer.
What is the continuation: the (a->r)->r thingy or just the (a->r) thing (sans the wrapping)?
Is the wording observable/observer about correct?
Are the citations above really contradictory, is there a common truth?
What is the continuation, the (a->r)->r thingy or just the (a->r) thing (sans the wrapping)?
I would say that the a -> r bit is the continuation and the (a -> r) -> r is "in continuation passing style" or "is the type of the continuation monad."
I am going to go off on a long digression on the history of continuations which is not really relevant to the question...so be warned.
It is my belief that the first published paper on continuations was "Continuations: A Mathematical Semantics for Handling Full Jumps" by Strachey and Wadsworth (although the concept was already folklore). The idea of that paper is I think a pretty important one. Early semantics for imperative programs attempted to model commands as state transformer functions. For example, consider the simple imperative language given by the following BNF
Command    := set <Expression> to <Expression>
            | skip
            | <Command> ; <Command>

Expression := !<Expression>
            | <Number>
            | <Expression> + <Expression>
Here we use expressions as pointers. The simplest denotational semantics interprets the state as a function from natural numbers to natural numbers:
S = N -> N
We can interpret expressions as functions from state to the natural numbers
E[[e : Expression]] : S -> N
and commands as state transducers.
C[[c : Command]] : S -> S
This denotational semantics can be spelled out rather simply:
E[[n : Number]](s) = n
E[[a + b]](s) = E[[a]](s) + E[[b]](s)
E[[!e]](s) = s(E[[e]](s))
C[[skip]](s) = s
C[[set a to b]](s) = \n -> if n = E[[a]](s) then E[[b]](s) else s(n)
C[[c_1;c_2]](s) = (C[[c_2]] . C[[c_1]])(s)
A simple program in this language might look like
set 0 to 1;
set 1 to (!0) + 1
which would be interpreted as function that turns a state function s into a new function that is just like s except it maps 0 to 1 and 1 to 2.
This was all well and good, but how do you handle branching? Well, if you think about it a lot you can probably come up with a way to handle if and loops that go an exact number of times...but what about general while loops?
Strachey and Wadsworth showed us how to do it. First of all, they pointed out that these "state transducer functions" were pretty important, and so decided to call them "command continuations" or just "continuations."
C = S -> S
From this they defined a new semantics, which we will provisionally define this way
C'[[c : Command]] : C -> C
C'[[c]](cont) = cont . C[[c]]
What is going on here? Well, observe that
C'[[c_1]](C[[c_2]]) = C[[c_1 ; c_2]]
and further
C'[[c_1]](C'[[c_2]](cont)) = C'[[c_1 ; c_2]](cont)
Instead of doing it this way, we can inline the definition
C'[[skip]](cont) = cont
C'[[set a to b]](cont) = cont . \s -> \n -> if n = E[[a]](s) then E[[b]](s) else s(n)
C'[[c_1 ; c_2]](cont) = C'[[c_1]](C'[[c_2]](cont))
What has this bought us? Well, a way to interpret while, that's what!
Command := ... | while <Expression> do <Command> end
C'[[while e do c end]](cont) =
let loop = \s -> if E[[e]](s) = 0 then C'[[c]](loop)(s) else cont(s)
in loop
or, using a fixpoint combinator
C'[[while e do c end]](cont)
= Y (\f -> \s -> if E[[e]](s) = 0 then C'[[c]](f)(s) else cont(s))
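If it helps to see this executable, here is a rough Haskell transcription of the semantics above. The data types and names are my own, and I keep the convention above that 0 plays the role of "true" in the while test.

import Data.Function (fix)

type S = Integer -> Integer   -- the state: addresses to values
type C = S -> S               -- a "command continuation"

data Expr = Lit Integer | Deref Expr | Add Expr Expr
data Cmd  = Set Expr Expr | Skip | Seq Cmd Cmd | While Expr Cmd

-- E[[e]] : S -> N
evalE :: Expr -> S -> Integer
evalE (Lit n)   _ = n
evalE (Deref e) s = s (evalE e s)
evalE (Add a b) s = evalE a s + evalE b s

-- C'[[c]] : C -> C
evalC :: Cmd -> C -> C
evalC Skip        cont = cont
evalC (Set a b)   cont = cont . (\s n -> if n == evalE a s then evalE b s else s n)
evalC (Seq c1 c2) cont = evalC c1 (evalC c2 cont)
evalC (While e c) cont =
  fix (\loop s -> if evalE e s == 0 then evalC c loop s else cont s)

So, for instance, evalC prog id is the meaning of prog run with the "do nothing afterwards" continuation.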
Anyways...that is history and not particularly important...except in so far as it showed how to interpret programs mathematically, and set the language of "continuation."
Also, the approach to denotational semantics of "1. define a new semantic function in terms of the old 2. inline 3. profit" works surprisingly often. For example, it is often useful to have your semantic domain form a lattice (think, abstract interpretation). How do you get that? Well, one option is to take the powerset of the domain, and inject into this by interpreting your functions as singletons. If you inline this powerset construction you get something that can either model non-determinism or, in the case of abstract interpretation, various amounts of information about a program other than exact certainty as to what it does.
Various other work followed. Here I skip over many greats such as the lambda papers... But, perhaps the most notable was Griffin's landmark paper "A Formulae-as-Types Notion of Control" which showed a connection between continuation passing style and classical logic. Here the connection between "continuation" and "Evaluation context" is emphasized
That is, E represents the rest of the computation that remains to be done after N is evaluated. The context E is called the continuation (or control context) of N at this point in the evaluation sequence. The notation of evaluation contexts allows, as we shall see below, a concise specification of the operational semantics of operators that manipulate continuations (indeed, this was its intended use [3, 2, 4, 1]).
making clear that the "continuation" is "just the a -> r bit"
This all looks at things from the point of view of semantics and sees continuations as functions. The thing is, continuations as functions give you more power than you get with something like Scheme's callCC. So, another perspective on continuations is that they are variables in the program which internalize the call stack. Parigot had the idea to make continuation variables a separate syntactic category, leading to the elegant lambda-mu calculus in "λμ-Calculus: An algorithmic interpretation of classical natural deduction."
Is the wording observable/observer about correct?
I think it is, in so far as it is what Erik Meijer uses. It is non-standard terminology in academic PL.
Are the citations above really contradictory, is there a common truth?
Let us look at the citations again
a continuation is an abstract representation of the control state of a computer program.
In my interpretation (which I think is pretty standard) a continuation models what a program should do next. I think Wikipedia is consistent with this.
A Haskell continuation has the following type:
This is a bit odd. But, note that later in the post Gabriel uses language which is more standard and supports my use of language.
That means that if we have a function with two continuations:
(a1 -> r) -> ((a2 -> r) -> r)
Fueled by reading about continuations via Andrzej Filinski's Declarative Continuations and Categorical Duality I adopt the following terminology and understanding.
A continuation on values of a is a "black hole which accepts values of a". You can see it as a black box with one operation—you feed it a value a and then the world ends. Locally at least.
Now let's assume we're in Haskell and I demand that you construct for me a function forall r . (a -> r) -> r. Let's say, for now, that a ~ Int and it'll look like
f :: forall r . (Int -> r) -> r
f cont = _
where the type hole has a context like
r :: Type
cont :: Int -> r
-----------------
_ :: r
Clearly, the only way we can comply with these demands is to pass an Int into the cont function and return its result, after which no further computation can happen. This models the idea of "feed an Int to the continuation and then the world ends".
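Spelled out (my completion of the hole above; the choice of 3 is arbitrary):

{-# LANGUAGE ExplicitForAll #-}

f :: forall r . (Int -> r) -> r
f cont = cont 3   -- feed cont an Int, hand back whatever it gives us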
So, I would call the function (a -> r) the continuation so long as it's in a context with a fixed-but-unknown r and a demand to return that r. For instance, the following is not so much of a continuation
forall r . (a -> r) -> (r, a)
as we're clearly allowed to pass back out more information from our failing universe than the continuation alone allows.
On "Observable"
I'm personally not a fan of the "observer"/"observable" terminology. In that terminology we might write
newtype Observable a = O { observe :: forall r . (a -> r) -> r }
so that we have observe :: Observable a -> (a -> r) -> r which ensures that exactly one a will be passed to an "observer" a -> r "observing" it. This gives a very operational view to the type above while Cont or even the scarily named Yoneda Identity explains much more declaratively what the type actually is.
I think the point is to somehow hide the complexity of Cont behind metaphor to make it less scary for "the average programmer", but that just adds an extra layer of metaphor for behavior to leak out of. Cont and Yoneda Identity explain exactly what the type is without dressing it up.
I suggest recalling the calling convention for C on x86 platforms, because of its use of the stack and registers to pass arguments around. This will turn out to be very useful for understanding the abstraction.
Suppose, function f calls function g and passes 0 to it. This will look like so:
mov eax, 0
call g -- now eax is the first argument,
-- and the stack has the address of return point, f'
g: -- here goes g that uses eax to compute the return value
mov eax,1 -- which by calling convention is placed in eax
ret -- get the return point, f', off the stack, and jump there
f': ...
You see, placing the return point f' on the stack is the same as passing a function pointer as one of the arguments, and then the return is the same as calling the given function and passing it a value. So from g's point of view, the return point to f looks like a function of one argument, f' :: a -> r. As you understand, the state of the stack completely captures the state of the computation f was performing, which needed an a from g in order to proceed.
At the same time, at the point g is called it looks like a function that accepts a function of one argument (we place the pointer of that function on the stack), which will eventually compute the value of type r that the code from f': onwards was meant to compute, so the type becomes g :: (a->r)->r.
Since f' is given a value of type a from "somewhere", f' can be seen as the observer of g - which is, conversely, the observable.
This is only intended to give a basic idea and tie it somehow to the world you probably already know. The magic of continuations permits more tricks than just converting "plain" computation into computation with continuations.
When we refer to a continuation, we mean the part that lets us continue calculating a result.
An operation in the Continuation Monad is analogous to a function that is incomplete and so is waiting on another function to complete it. That said, the Continuation Monad is itself a valid construct that can be used to complete another Continuation Monad; that is what the binding operator (>>=) for the Cont Monad does.
When writing code that involves callCC or Call with Current Continuation, you are passing the current Cont Monad into another Cont Monad so that the second one can make use of it. For example, it might prematurely end execution by calling the first Cont Monad, and from there the cycle can either repeat or diverge into a different Continuation Monad.
Which part is the continuation differs depending on which perspective you use. In my personal opinion, the best way to describe a continuation is in relation to another construct.
So if we return to our example of two Cont Monads interacting, from the perspective of the first Monad the continuation is the (a -> r) -> r (because that is the unwrapped type of the first Monad) and from the perspective of the second Monad the continuation is the (a -> r) (because that is the unwrapped type of the first monad when a is substituted for (a -> r)).
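For a concrete, hedged example of the "prematurely end execution" behaviour described above, using Control.Monad.Cont from mtl (the function and names are mine):

import Control.Monad (when)
import Control.Monad.Cont

-- If the divisor is zero we escape early through the captured continuation
-- `exit`; otherwise the rest of the block runs as usual.
divideOr :: Int -> Int -> Int -> Cont r Int
divideOr def x y = callCC $ \exit -> do
  when (y == 0) (exit def)
  return (x `div` y)

main :: IO ()
main = do
  print (runCont (divideOr 0 10 2) id)   -- prints 5
  print (runCont (divideOr 0 10 0) id)   -- prints 0 (escaped early)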

Is there a way to avoid copying the whole search path of a binary tree on insert?

I've just started working my way through Okasaki's Purely Functional Data Structures, but have been doing things in Haskell rather than Standard ML. However, I've come across an early exercise (2.5) that's left me a bit stumped on how to do things in Haskell:
Inserting an existing element into a binary search tree copies the entire search path
even though the copied nodes are indistinguishable from the originals. Rewrite insert using exceptions to avoid this copying. Establish only one handler per insertion rather than one handler per iteration.
Now, my understanding is that ML, being an impure language, gets by with a conventional approach to exception handling not so different from, say, Java's, so you can accomplish it with something like this:
datatype Tree = E | T of Tree * int * Tree
exception ElementPresent
fun insert (x, t) =
  let fun go E = T (E, x, E)
        | go (T (l, y, r)) =
            if x < y then T (go l, y, r)
            else if y < x then T (l, y, go r)
            else raise ElementPresent
  in go t
  end
  handle ElementPresent => t
I don't have an ML implementation, so this may not be quite right in terms of the syntax.
My issue is that I have no idea how this can be done in Haskell, outside of doing everything in the IO monad, which seems like cheating, and even if it's not cheating, would seriously limit the usefulness of a function which really doesn't do any mutation. I could use the Maybe monad:
data Tree a = Empty | Fork (Tree a) a (Tree a)
  deriving (Show)

insert :: (Ord a) => a -> Tree a -> Tree a
insert x t = maybe t id (go t)
  where go Empty = return (Fork Empty x Empty)
        go (Fork l y r)
          | x < y     = do l' <- go l; return (Fork l' y r)
          | x > y     = do r' <- go r; return (Fork l y r')
          | otherwise = Nothing
This means everything winds up wrapped in Just on the way back up when the element isn't found, which requires more heap allocation, and sort of defeats the purpose. Is this allocation just the price of purity?
EDIT to add: A lot of why I'm wondering about the suitability of the Maybe solution is that the optimization described only seems to save you all the constructor calls you would need in the case where the element already exists, which means heap allocations proportional to the length of the search path. The Maybe also avoids those constructor calls when the element already exists, but then you get a number of Just constructor calls equal to the length of the search path. I understand that a sufficiently smart compiler could elide all the Just allocations, but I don't know if, say, the current version of GHC is really that smart.
In terms of cost, the ML version is actually very similar to your Haskell version. Every recursive call in the ML version results in a stack frame. The same is true in the Haskell version. This is going to be proportional in size to the path that you traverse in the tree. Also, both versions will of course allocate new nodes for the entire path if an insertion is actually performed.

In your Haskell version, every recursive call might also eventually result in the allocation of a Just node. This will go on the minor heap, which is just a block of memory with a bump pointer. For all practical purposes, GHC's minor heap is roughly equivalent in cost to the stack. Since these are short-lived allocations, they won't normally end up being moved to the major heap at all.
GHC generally cannot elide path copying in cases like that. However, there is a way to do it manually, without incurring any of the indirection/allocation costs of Maybe. Here it is:
{-# LANGUAGE MagicHash #-}

import GHC.Prim (reallyUnsafePtrEquality#)

data Tree a = Empty | Fork (Tree a) a (Tree a)
  deriving (Show)

insert :: (Ord a) => a -> Tree a -> Tree a
insert x Empty = Fork Empty x Empty
insert x node@(Fork l y r)
  | x < y = let l' = insert x l in
      case reallyUnsafePtrEquality# l l' of
        1# -> node
        _  -> Fork l' y r
  | x > y = let r' = insert x r in
      case reallyUnsafePtrEquality# r r' of
        1# -> node
        _  -> Fork l y r'
  | otherwise = node
The pointer equality function does exactly what's in the name. Here it is safe because even if the equality returns a false negative we only do a bit of extra copying, and nothing worse happens.
It's not the most idiomatic or prettiest Haskell, but the performance benefits can be significant. In fact, this trick is used very frequently in unordered-containers.
As fizruk indicates, the Maybe approach is not significantly different from what you'd get in Standard ML. Yes, the whole path is copied, but the new copy is discarded if it turns out not to be needed. The Just constructor itself may not even be allocated on the heap—it can't escape from insert, let alone the module, and you don't do anything weird with it, so the compiler is free to analyze it to death.
Edit
There are efficiency problems, now that I think of it. Your use of Maybe conceals the fact that you're actually making two passes—one down to find the insertion point and one up to build the tree. The solution to this is to drop Maybe Tree in favor of (Tree,Bool) and use strictness annotations, or to switch to continuation-passing style. Also, if you choose to stay with the three-way logic, you may want to use the three-way comparison function. Alternatively, you can go all the way to the bottom each time and check later if you hit a duplicate.
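For what it's worth, a hedged sketch of the (Tree, Bool) variant (reusing the Tree type from the question; the Bool records whether anything actually changed, so the original tree can be returned untouched when it didn't):

insertTB :: Ord a => a -> Tree a -> Tree a
insertTB x t = let (t', changed) = go t in if changed then t' else t
  where
    go Empty = (Fork Empty x Empty, True)
    go node@(Fork l y r)
      | x < y     = let (l', c) = go l in (if c then Fork l' y r else node, c)
      | x > y     = let (r', c) = go r in (if c then Fork l y r' else node, c)
      | otherwise = (node, False)

The strictness annotations mentioned above would go on the components of that pair.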
If you have a predicate that checks whether the key is already in the tree, you can look before you leap:
insert x t = if contains t x then t else insert' x t
This traverses the tree twice, of course. Whether that's as bad as it sounds should be determined empirically: it might just load the relevant part of the tree into the cache.
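For completeness, the contains used above isn't defined in the post; a minimal version might look like:

contains :: Ord a => Tree a -> a -> Bool
contains Empty        _ = False
contains (Fork l y r) x
  | x < y     = contains l x
  | x > y     = contains r x
  | otherwise = True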

How can iterative deepening search be implemented efficiently in Haskell?

I have an optimization problem I want to solve. You have some kind of data-structure:
data Foo = Foo
  { fooA :: Int
  , fooB :: Int
  , fooC :: Int
  , fooD :: Int
  , fooE :: Int
  }
and a rating function:
rateFoo :: Foo -> Int
I have to optimize the result of rateFoo by changing the values in the struct. In this specific case, I decided to use iterative deepening search to solve the problem. The (infinite) search tree for the best optimization is created by another function, which simply applies all possible changes recursively to the tree:
fooTree :: Foo -> Tree
My searching function looks something like this:
optimize :: Int -> Foo -> Foo
optimize threshold foo = undefined
The question I had, before I start, is this:
As the tree can be generated from the data at each point, is it possible to have only those parts of the tree generated which are currently needed by the algorithm? Is it possible to have the memory freed and the tree regenerated if needed in order to save memory (a leaf at level n can be generated in O(n), and n remains small, but not small enough to have the whole tree in memory over time)?
Is this something I can expect from the runtime? Can the runtime unevaluate expressions (turn an evaluated expression into an unevaluated one)? Or what is the dirty hack I have to do for this?
The runtime does not unevaluate expressions.
There's a straightforward way to get what you want however.
Consider a zipper-like structure for your tree. Each node holds a value and a thunk representing down, up, etc. When you move to the next node, you can either move normally (placing the previous node value in the corresponding slot) or forgetfully (placing an expression which evaluates to the previous node in the right slot). Then you have control over how much "history" you hang on to.
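A rough sketch of that idea, specialised to a tree that can be regenerated on demand from a node value by a step function (all of the names here are mine, not part of the original answer):

data Cursor a = Cursor
  { focus :: a           -- the node we are currently looking at
  , path  :: [(a, Int)]  -- parent values and which child we took, root last
  }

-- Descend into the i-th child. Only the parent *value* is kept; the expanded
-- list of children falls out of scope and can be garbage collected.
down :: (a -> [a]) -> Int -> Cursor a -> Maybe (Cursor a)
down step i (Cursor x ps) =
  case drop i (step x) of
    (c:_) -> Just (Cursor c ((x, i) : ps))
    []    -> Nothing

-- Going back up is cheap because the parent value was remembered; a more
-- "forgetful" variant could instead store a recipe for recomputing it.
up :: Cursor a -> Maybe (Cursor a)
up (Cursor _ [])            = Nothing
up (Cursor _ ((p, _) : ps)) = Just (Cursor p ps)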
Here's my advice:
1. Just implement your algorithm in the most straightforward way possible.
2. Profile.
3. Optimize for speed or memory use if necessary.
I very quickly learned that I'm not smart and/or experienced enough to reason about what GHC will do or how garbage collection will work. Sometimes things that I'm sure will be disastrously memory-inefficient work smoothly the first time around, and–less often–things that seem simple require lots of fussing with strictness annotations, etc.
The Real World Haskell chapter on profiling and optimization is incredibly helpful once you get to steps 2 and 3.
For example, here's a very simple implementation of IDDFS, where f expands children, p is the search predicate, and x is the starting point.
search :: (a -> [a]) -> (a -> Bool) -> a -> Bool
search f p x = any (\d -> searchTo f p d x) [1..]
  where
    searchTo f p d x
      | d == 0    = False
      | p x       = True
      | otherwise = any (searchTo f p $ d - 1) (f x)
I tested by searching for "abbaaaaaacccaaaaabbaaccc" with children x = [x ++ "a", x ++ "bb", x ++ "ccc"] as f. It seems reasonably fast and requires very little memory (linear with the depth, I think). Why not try something like this first and then move to a more complicated data structure if it isn't good enough?
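For reference, here is that test written out as a little harness (my reconstruction of it; in particular, starting from the empty string is an assumption):

main :: IO ()
main = print (search children (== "abbaaaaaacccaaaaabbaaccc") "")
  where
    children x = [x ++ "a", x ++ "bb", x ++ "ccc"]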
