I recently started using Haskell, and it will probably only be for a short while: I've been asked to use it to better understand functional programming for a class I am taking at university.
Now I have a slight problem with what I am trying to do. I want to build a binary tree breadth-first, but I think my conditions are messed up or simply wrong.
So essentially, if I give it
["A1-Gate", "North-Region", "South-Region", "Convention Center", "Rectorate", "Academic Building1", "Academic Building2"] and [0.0, 0.5, 0.7, 0.3, 0.6, 1.2, 1.4, 1.2], my tree should come out with "A1-Gate" at the root and the remaining nodes filled in level by level.
But my test run results are not what I expected, so perhaps a sharp-eyed Haskell expert can help me spot what I am doing wrong.
Output:
*Main> l1 = ["A1-Gate", "North-Region", "South-Region", "Convention Center",
"Rectorate", "Academic Building1", "Academic Building2"]
*Main> l3 = [0.0, 0.5, 0.7, 0.3, 0.6, 1.2, 1.4, 1.2]
*Main> parkingtree = createBinaryParkingTree l1 l3
*Main> parkingtree
Node "North-Region" 0.5
(Node "A1-Gate" 0.0 EmptyTree EmptyTree)
(Node "Convention Center" 0.3
(Node "South-Region" 0.7 EmptyTree EmptyTree)
(Node "Academic Building2" 1.4
(Node "Academic Building1" 1.2 EmptyTree EmptyTree)
(Node "Rectorate" 0.6 EmptyTree EmptyTree)))
"A1-Gate" should be the root, but it ends up as a child with no children, so my conditions are pretty messed up.
If I could get some guidance it would help. Below is what I've written so far:
data Tree = EmptyTree | Node [Char] Float Tree Tree deriving (Show, Eq, Ord)

insertElement location cost EmptyTree =
    Node location cost EmptyTree EmptyTree
insertElement newlocation newcost (Node location cost left right) =
    if (left == EmptyTree && right == EmptyTree)
      then Node location cost (insertElement newlocation newcost EmptyTree) right
      else if (left == EmptyTree && right /= EmptyTree)
        then Node location cost (insertElement newlocation newcost EmptyTree) right
        else if (left /= EmptyTree && right == EmptyTree)
          then Node location cost left (insertElement newlocation newcost EmptyTree)
          else Node newlocation newcost EmptyTree (Node location cost left right)

buildBPT [] = EmptyTree
--buildBPT (xs:[]) = insertElement (fst xs) (snd xs) (buildBPT [])
buildBPT (x:xs) = insertElement (fst x) (snd x) (buildBPT xs)

createBinaryParkingTree a b = buildBPT (zip a b)
Thank you for any guidance. Yes, I have looked at some similar questions; I do think my problem is different, but if you think a certain post has a clear answer that will help, I am willing to go take a look at it.
Here's a corecursive solution.
{- bft(Xs,T) :- bft( Xs, [T|Q], Q).   % if you don't read Prolog, see (*)

   bft( [], Nodes, []) :- maplist( =(empty), Nodes).
   bft( [X|Xs], [N|Nodes], [L,R|Q]) :- N = node(X,L,R),
                                       bft( Xs, Nodes, Q).
-}
data Tree a = Empty | Node a (Tree a) (Tree a) deriving Show
bft :: [a] -> Tree a
bft xs = head nodes   -- the breadth-first tree
  where
    nodes = zipWith g (map Just xs ++ repeat Nothing)  -- values, and Empty leaves...
                      (pairs $ tail nodes)             -- branches...
    g (Just x) (lt, rt) = Node x lt rt
    g Nothing  _        = Empty
    pairs ~(a : ~(b : c)) = (a, b) : pairs c
{-
  nodes!!0 = g (Just (xs!!0)) (nodes!!1, nodes!!2)          .
  nodes!!1 = g (Just (xs!!1)) (nodes!!3, nodes!!4)        .   .
  nodes!!2 = g (Just (xs!!2)) (nodes!!5, nodes!!6)       . . . .
  ................                                     .................
-}
nodes is the breadth-first enumeration of all the subtrees of the result tree. The tree itself is the top subtree, i.e., the first in this list. We create Nodes from each x in the input xs, and when the input is exhausted we create Emptys by using an indefinite number of Nothings instead (the true number of Empty leaves is length xs + 1, but we don't need to care about that).
And we didn't have to count at all.
Testing:
> bft [1..4]
Node 1 (Node 2 (Node 4 Empty Empty) Empty) (Node 3 Empty Empty)
> bft [1..10]
Node 1
  (Node 2
     (Node 4
        (Node 8 Empty Empty)
        (Node 9 Empty Empty))
     (Node 5
        (Node 10 Empty Empty)
        Empty))
  (Node 3
     (Node 6 Empty Empty)
     (Node 7 Empty Empty))
How does it work: the key is g's laziness, in that it doesn't force lt's nor rt's value, while the tuple structure is readily served by pairs, which is very lazy in its own right. So both are just like the not-yet-set variables in that Prolog pseudocode (*) when served as the 2nd and 3rd arguments to g. But then, for the next x in xs, the node referred to by this lt becomes the result of the next invocation of g.
And then it's rt's turn, etc. And when xs ends and we hit the Nothings, g stops pulling values from pairs's output altogether. So pairs stops advancing on nodes too, and the list is thus never finished, though it's defined as an unending stream of Emptys past that point, just to be on the safe side.
(*) Prolog's variables are explicitly set-once: they are allowed to be in a not-yet-assigned state. Haskell's (x:xs) is Prolog's [X | Xs].
The pseudocode: maintain a queue; enqueue "unassigned pointer"; for each x in xs: { set pointer in current head of the queue to Node(x, lt, rt) where lt, rt are unassigned pointers; enqueue lt; enqueue rt; pop queue }; set all pointers remaining in queue to Empty; find resulting tree in the original head of the queue, i.e. the original first "unassigned pointer" (or "empty box" instead of "unassigned pointer" is another option).
This Prolog's "queue" is of course fully persistent: "popping" does not mutate any data structure and doesn't change any outstanding references to the queue's former head -- it just advances the current pointer into the queue. So what's left in the wake of all this queuing, is the bfs-enumeration of the built tree's nodes, with the tree itself its head element -- the tree is its top node, with the two children fully instantiated to the bottom leaves by the time the enumeration is done.
Update: dfeuer came up with a much simplified version of it which is much closer to the Prolog original (the one in the comment at the top of this post) and can be much clearer. Look for more efficient code and discussion in his post. Using the simple [] instead of dfeuer's more efficient infinite stream type data IS a = a :+ IS a for the subtrees queue, it becomes
bftree :: [a] -> Tree a
bftree xs = t
  where
    t : q = go xs q
    go []     _               = repeat Empty
    go (x:ys) ~(l : ~(r : q)) = Node x l r : go ys q
    --  -READ-  -----READ----   ---WRITE---
{-
  xs    = [ x  x2  x3  x4  x5  x6  x7  x8  … ]
  (t:q) = [ t  l   r   ll  lr  rl  rr  llr  …  Empty  Empty  …  … ]
-}
For comparison, the opposite operation of breadth-first enumeration of a tree is
bflist :: Tree a -> [a]
bflist t = [x | Node x _ _ <- q]
  where
    q = t : go 1 q
    go 0 _                = []
    go i (Empty      : q) = go (i-1) q
    go i (Node _ l r : q) = l : r : go (i+1) q
    --    -----READ------   -WRITE-
How does bftree work: t : q is the list of the tree's sub-trees in breadth-first order. A particular invocation of go (x:ys) uses l and r before they are defined by subsequent invocations of go, either with another x further down the ys, or by go [] which always returns Empty. The result t is the very first in this list, the topmost node of the tree, i.e. the tree itself.
This list of tree nodes is created by the recursive invocations of go at the same speed with which the input list of values xs is consumed, but is consumed as the input to go at twice that speed, because each node has two child nodes.
These extra nodes thus must also be defined, as Empty leaves. We don't care how many are needed and simply create an infinite list of them to fulfill any need, although the actual number of empty leaves will be one more than there were xs.
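As a quick sanity check (my own GHCi run, not from the original answer), the two functions should be mutual inverses on finite lists:

> bflist (bftree [1..10]) == [1..10]
True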
This is actually the same scheme as used in computer science for decades for array-backed trees, where tree nodes are placed in breadth-first order in a linear array. Curiously, in such a setting both conversions are a no-op: only our interpretation of the same data changes, how we handle and interact with it.
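For instance, the array layout mentioned above is pure index arithmetic; here is a minimal sketch (a hypothetical helper of mine, not part of the answer's code):

-- In the array view, the node stored at index i has its children at
-- indices 2*i+1 and 2*i+2; an out-of-range index plays the role of an
-- Empty leaf, so neither conversion has to move any data.
childIndices :: Int -> (Int, Int)
childIndices i = (2*i + 1, 2*i + 2)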
Update: the below solution is big-O optimal and (I think) pretty easy to understand, so I'm leaving it here in case anyone's interested. However, Will Ness's solution is much more beautiful and, especially when optimized a bit, can be expected to perform better in practice. It is much more worthy of study!
I'm going to ignore the fake edge labels for now and just focus on the core of what's happening.
A common pattern in algorithm design is that it's sometimes easier to solve a more general problem. So instead of trying to build a tree, I'm going to look at how to build a forest (a list of trees) with a given number of trees. I'll make the node labels polymorphic to avoid having to think about what they look like; you can of course use the same building technique with your original tree type.
data Tree a = Empty | Node a (Tree a) (Tree a)

-- Build a tree from a breadth-first list
bft :: [a] -> Tree a
bft xs = case bff 1 xs of
    []  -> Empty
    [t] -> t
    _   -> error "something went wrong"
-- Build a forest of nonempty trees.
-- The given number indicates the (maximum)
-- number of trees to build.
bff :: Int -> [a] -> [Tree a]
bff _ [] = []
bff n xs = case splitAt n xs of
    (front, rear) -> combine front (bff (2 * n) rear)
  where
    combine :: [a] -> [Tree a] -> [Tree a]
    -- you write this
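If you want to check your work, here is one way the hole could be filled (a sketch of my own, not necessarily dfeuer's intended solution): each value in front becomes a Node whose children are the next two trees of the recursively built forest, with Empty padding once the forest runs out. It can be dropped into the where clause above.

combine [] _ = []
combine (y : ys) ts = case ts of
    l : r : ts' -> Node y l r         : combine ys ts'
    [l]         -> Node y l Empty     : combine ys []
    []          -> Node y Empty Empty : combine ys []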
Here's a full, industrial-strength, maximally lazy implementation. This is the most efficient version I've been able to come up with that's as lazy as possible. A slight variant is less lazy but still works for fully-defined infinite inputs; I haven't tried to test which would be faster in practice.
{-# LANGUAGE BangPatterns #-}   -- needed for the !_ patterns below

bft' :: [a] -> Tree a
bft' xs = case bff' 1 xs of
    []  -> Empty
    [t] -> t
    _   -> error "whoops"

bff' :: Int -> [a] -> [Tree a]
bff' !_ [] = []
bff' n xs = combine n xs (bff' (2 * n) (drop n xs))
  where
    -- The "take" portion of the splitAt in the original
    -- bff is integrated into this version of combine. That
    -- lets us avoid allocating an intermediate list we don't
    -- really need.
    combine :: Int -> [a] -> [Tree a] -> [Tree a]
    combine 0 !_ ~[] = []  -- These two lazy patterns are just documentation
    combine _k [] ~[] = []
    combine k (y : ys) ts = Node y l r : combine (k - 1) ys dropped
      where
        (l, ~(r, dropped)) = case ts of  -- This lazy pattern matters.
            []       -> (Empty, (Empty, []))
            t1 : ts' -> (t1, case ts' of
                               []        -> (Empty, [])
                               t2 : ts'' -> (t2, ts''))
For the less-lazy variant, replace (l, ~(r, dropped)) with the strict (!l, !r, dropped) and adjust the RHS accordingly.
For true industrial strength, forests should be represented using lists strict in their elements:
data SL a = Cons !a (SL a) | Nil
And the pairs in the above (l, ~(r, dropped)) should both be represented using a type like
data LSP a b = LSP !a b
This should avoid some (pretty cheap) run-time checks. More importantly, it makes it easier to see where things are and aren't getting forced.
The method that you appear to have chosen is to build the tree up backwards: from bottom-to-top, right-to-left; starting from the last element of your list. This makes your buildBPT function look nice, but requires your insertElement to be overly complex. To construct a binary tree in a breadth-first fashion this way would require some difficult pivots at every step past the first three.
Adding 8 nodes to the tree would require the following steps (see how the nodes are inserted from last to first):
after inserting 8:
    8

after inserting 7:
      7
     / \
    8   .

after inserting 6:
      6
     / \
    7   8

after inserting 5:
        5
       / \
      6   7
     / \
    8   .

after inserting 4:
        4
       / \
      5   6
     / \
    7   8

after inserting 3:
          3
        /   \
       4     5
      / \   / \
     6   7 8   .

after inserting 2:
          2
        /   \
       3     4
      / \   / \
     5   6 7   8

after inserting 1:
           1
         /   \
        2     3
       / \   / \
      4   5 6   7
     / \
    8   .
If, instead, you insert the nodes left-to-right, top-to-bottom, you end up with a much simpler solution, requiring no pivoting, but instead some tree structure introspection. See the insertion order; at all times, the existing values remain where they were:
after inserting 1:
    1

after inserting 2:
      1
     / \
    2   .

after inserting 3:
      1
     / \
    2   3

after inserting 4:
        1
       / \
      2   3
     / \
    4   .

after inserting 5:
        1
       / \
      2   3
     / \
    4   5

after inserting 6:
          1
        /   \
       2     3
      / \   / \
     4   5 6   .

after inserting 7:
          1
        /   \
       2     3
      / \   / \
     4   5 6   7

after inserting 8:
           1
         /   \
        2     3
       / \   / \
      4   5 6   7
     / \
    8   .
This insertion scheme has an asymptotic time complexity on the order of O(n^2), where n is the number of nodes to insert, as you insert the nodes one by one and then iterate over the nodes already present in the tree.
As we insert left-to-right, the trick is to check whether the left sub-tree is complete:
if it is, and the right sub-tree is not complete, then recurse to the right.
if it is, and the right sub-tree is also complete, then recurse to the left (starting a new row).
if it is not, then recurse to the left.
Here is my (more generic) solution:
data Tree a = Leaf | Node a (Tree a) (Tree a)
    deriving (Eq, Show)

main :: IO ()
main = do
    let l1 = ["A1-Gate", "North-Region", "South-Region", "Convention Center",
              "Rectorate", "Academic Building1", "Academic Building2"]
    let l2 = [0.0, 0.5, 0.7, 0.3, 0.6, 1.2, 1.4, 1.2]
    print $ treeFromList $ zip l1 l2

mkNode :: a -> Tree a
mkNode x = Node x Leaf Leaf

insertValue :: Tree a -> a -> Tree a
insertValue Leaf y = mkNode y
insertValue (Node x left right) y
    | isComplete left && nodeCount left /= nodeCount right = Node x left (insertValue right y)
    | otherwise = Node x (insertValue left y) right
    where nodeCount Leaf = 0
          nodeCount (Node _ left right) = 1 + nodeCount left + nodeCount right
          depth Leaf = 0
          depth (Node _ left right) = 1 + max (depth left) (depth right)
          isComplete n = nodeCount n == 2 ^ (depth n) - 1

treeFromList :: (Show a) => [a] -> Tree a
treeFromList = foldl insertValue Leaf
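A quick GHCi check of the insertion order (the expected output here is derived by hand from the rules above):

> treeFromList [1..4 :: Int]
Node 1 (Node 2 (Node 4 Leaf Leaf) Leaf) (Node 3 Leaf Leaf)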
EDIT: more detailed explanation:
The idea is to remember in what order you insert nodes: left-to-right first, then top-to-bottom. I compressed the different cases in the actual function, but you can expand them into three:
Is the left side complete? If not, then insert to the left side.
Is the right side as complete as the left side, which is complete? If not, then insert to the right side.
Both sides are full, so we start a new level by inserting to the left side.
Because the function fills the nodes up from left-to-right and top-to-bottom, then we always know (it's an invariant) that the left side must fill up before the right side, and that the left side can never be more than one level deeper than the right side (nor can it be shallower than the right side).
By following the growth of the second set of example trees, you can see how the values are inserted following this invariant. This is enough to describe the process recursively, so it extrapolates to a list of any size (the recursion is the magic).
Now, how do we determine whether a tree is 'complete'? Well, it is complete if it is perfectly balanced, or if – visually – its values form a triangle. As we are working with binary trees, then the base of the triangle (when filled) must have a number of values equal to a power of two. More specifically, it must have 2^(depth-1) values. Count for yourself in the examples:
depth = 1 -> base = 1: 2^(1-1) = 1
depth = 2 -> base = 2: 2^(2-1) = 2
depth = 3 -> base = 4: 2^(3-1) = 4
depth = 4 -> base = 8: 2^(4-1) = 8
The total number of nodes above the base is one less than the width of the base: 2^(depth-1) - 1. The total number of nodes in the complete tree is therefore the number of nodes above the base, plus those of the base, so:

num nodes in complete tree = 2^(depth-1) - 1 + 2^(depth-1)
                           = 2 × 2^(depth-1) - 1
                           = 2^depth - 1
So now we can say that a tree is complete if it has exactly 2^depth - 1 non-empty nodes in it.
Because we go left-to-right, top-to-bottom, when the left side is complete, we move to the right, and when the right side is just as complete as the left side (meaning that it has the same number of nodes, which means that it is also complete because of the invariant), then we know that the whole tree is complete, and therefore a new row must be added.
I originally had three special cases in there: when both nodes are empty, when the left node is empty (and therefore so was the right) and when the right node is empty (and therefore the left could not be). These three special cases are superseded by the final case with the guards:
If both sides are empty, then nodeCount left == nodeCount right, so therefore we add another row (to the left).
If the left side is empty, then both sides are empty (see previous point).
If the right side is empty, then the left side must have depth 1 and node count 1, meaning that it is complete, and 1 /= 0, so we add to the right side.
Did I implement in-order tree traversal using tail recursion correctly?
inorder :: Tree a -> [a] -> [a]
inorder (Leaf n) temp = n : temp
inorder (Node (n, left, right)) temp = inorder left (n : inorder right temp)

Tree is declared as

data Tree a = Leaf a | Node (a, Tree a, Tree a) deriving Show

and it returns [2,1,3] on the call inorder three [], where three = Node (1, Leaf 2, Leaf 3).
This technically isn't tail recursive because you have a recursive call, inorder right temp, in a non-tail position. One way to fix this would be with continuations: you write a function which takes an accumulator like before, but rather than the accumulator being just a list, it's actually a function representing the work left to do in the computation. This means that instead of making a non-tail call and just returning, we can always tail call, because the context we need is saved in the continuation.
inorder :: Tree a -> [a]
inorder = go id
  where go :: ([a] -> r) -> Tree a -> r
        go k (Leaf n)         = k [n]
        go k (Node (n, l, r)) = go (\ls -> go (\rs -> k $ ls ++ n : rs) r) l
Here every call is a tail call, as required, but it's quite inefficient, because it requires a ++ operation at every level, pushing us into quadratic costs. A more efficient algorithm would avoid building up an explicit list and instead build a difference list, delaying the construction of the concrete structure and giving a more efficient algorithm:
type Diff a = [a] -> [a] -- A difference list is just a function

nil :: Diff a
nil xs = xs

cons :: a -> Diff a -> Diff a
cons a d = (:) a . d

append :: Diff a -> Diff a -> Diff a
append xs ys = xs . ys

toList :: Diff a -> [a]
toList xs = xs []
Note that all of these operations are O(1), except for toList, which is O(n) in the number of entries. The important point here is that difference lists are cheap and easy to append, so we'll construct these in our algorithm and build the concrete list only at the very end.
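For instance, a quick sanity check of these primitives (my example, not from the original answer):

> toList (cons 1 (append (cons 2 nil) (cons 3 nil)))
[1,2,3]

With these in hand, the more efficient inorder becomes: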
inorder :: Tree a -> [a]
inorder = go toList
  where go :: (Diff a -> r) -> Tree a -> r
        go k (Leaf n)         = k (cons n nil)
        go k (Node (n, l, r)) =
          go (\ls -> go (\rs -> k $ ls `append` cons n rs) r) l
And now, through gratuitous application of functions, we've gotten a completely unidiomatic Haskell program. You see, in Haskell we don't really care about tail calls, because we generally want to handle infinite structures correctly, and that's not really possible if we demand that everything be tail recursive. In fact, I would say that while not tail recursive, the code you originally had is the most idiomatic; that's even how it's implemented in Data.Set! It has the property that we can lazily consume the result of that toList and it will lazily process the tree. So in your implementation, something like
min :: Tree a -> Maybe a
min = listToMaybe . toList
is going to be pretty darn close to how you would implement it by hand, efficiency-wise! It will not traverse the whole tree first, like my version will have to. These sorts of compositional effects of laziness pay more dividends in real Haskell code than syntactically making our code use only tail calls (which does nothing to actually guarantee space usage anyway).
I am given the following tree definition and functions:
data Tree a = Node a [Tree a] deriving (Eq, Read, Show)
treeRoot :: Tree a -> a
treeRoot (Node a _) = a
treeSubtrees :: Tree a -> [Tree a]
treeSubtrees (Node _ subtrees) = subtrees
I need to create a function 'nodesAtLevel' that takes two arguments: a tree and an integer n >= 0. The function needs to return a list of all the nodes at the given level n. The root of the tree is level 0. I am very new to Haskell, and yes, this is part of a homework assignment, so if you could help walk me through the thought process of completing this, it would be greatly appreciated! I currently have the definition of the 'nodesAtLevel' function written as follows:
nodesAtLevel :: Int -> Tree a -> [a]
OK, let's do this step by step (leaving you with some holes for now):
nodes at level 0
as I understand it, you are supposed to only return the root's value here (in a list, of course)
so what goes in the ... here?
nodesAtLevel 0 (Node a subtrees) = ...
nodes at deeper levels
well the structure is nicely recursive so we probably want to use recursion.
So let's recurse one level deeper (decreasing the distance we have left to go, i.e. our n).
But there is a slight problem: nodesAtLevel acts on a single node, but we only have a list of nodes left (the subtrees)... so what to do?
Well, maybe we should do something for each of the subtrees/subnodes, so we need to find the right function f here (hint: you probably want to map and concatenate the results...):
nodesAtLevel n (Node a subtrees) = f (nodesAtLevel (n-1)) subtrees
what can that f be? Can you guess it? Maybe you can write it yourself if not?
Additional Hint:
maybe you should first find out what type f has, because then you might even be able to use Hoogle to find a good fit.
If you use a newer GHC version you can let the compiler do the work if you write
nodesAtLevel n (Node _ subtrees) = _f (nodesAtLevel (n-1)) subtrees
where _f is a hole - now if you load this in GHCi it will give you a
Found hole ‘_f’ with type: ....
error with lots of additional information ;)
so just find ... and f and this should work ;)
BTW you can use the functions you already have instead of pattern matching again as I did, but I feel it's easier to see the result this way; if you follow that path you don't need treeSubtrees or treeRoot.
solution
... = [a]
f = concatMap
so one possible way to do it is this:
nodesAtLevel :: Int -> Tree a -> [a]
nodesAtLevel 0 (Node a _) = [a]
nodesAtLevel n (Node _ subtrees) = concatMap (nodesAtLevel (n-1)) subtrees
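And a quick check in GHCi, with a small example tree of my own:

> nodesAtLevel 1 (Node 1 [Node 2 [], Node 3 [Node 4 []]])
[2,3]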
I have this data definition for a tree:
data Tree = Leaf Int | Node Tree Int Tree
and I have to make a function, nSatisfy, to check how many items of the tree check some predicate.
Here's what I've done:
nSatisfy :: (Int -> Bool) -> Tree -> Int
nSatisfy _ (Leaf _) = 0
nSatisfy y (Node left x right)
    | y x       = 1 + nSatisfy y (Node left x right)
    | otherwise = nSatisfy y (Node left x right)
Is this the right way to solve this problem?
In your nSatisfy function, you should add the counts of nodes satisfying the condition in both subtrees, with two recursive calls. The last two lines should look like this:
| y x       = 1 + nSatisfy y left + nSatisfy y right
| otherwise = nSatisfy y left + nSatisfy y right
This way, it no longer calls itself on the same node, but only on the subtrees.
Also, if a leaf contains an integer, as is implied in the data declaration, you should make it evaluate the condition for a leaf and return 1 if it is true, instead of always returning 0.
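Putting both fixes together, the whole function might look like this (my sketch of the corrected version):

nSatisfy :: (Int -> Bool) -> Tree -> Int
nSatisfy y (Leaf x) = if y x then 1 else 0  -- a leaf counts iff its value satisfies the predicate
nSatisfy y (Node left x right)
    | y x       = 1 + nSatisfy y left + nSatisfy y right
    | otherwise = nSatisfy y left + nSatisfy y right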
In addition to the main answer, I'd like to offer a slightly different way how to generalize your problem and solving it using existing libraries.
The operation you're seeking is common to many data structures: to go through all elements and perform some operation on them. Haskell defines the Foldable type class, which can be implemented by structures like yours.
First let's import some modules we'll need:
import Data.Foldable
import Data.Monoid
In order to use Foldable, we need to generalize the structure a bit, in particular parametrize its content:
data Tree a = Leaf a | Node (Tree a) a (Tree a)
In many cases this is a good idea as it separates the structure from its content and allows it to be easily reused.
Now let's define its Foldable instance. For tree-like structures it's easier to define it using foldMap, which maps each element into a monoid and then combines all values:
instance Foldable Tree where
foldMap f (Leaf x) = f x
foldMap f (Node lt x rt) = foldMap f lt <> f x <> foldMap f rt
This immediately gives us the whole library of functions in the Data.Foldable module, such as searching for an element, different kinds of folds, etc. While a function counting the number of values satisfying some predicate isn't defined there, we can easily define it for any Foldable. The idea is to use the Sum monoid:
nSatisfy :: (Foldable f) => (a -> Bool) -> f a -> Int
nSatisfy p = getSum . foldMap (\x -> Sum $ if p x then 1 else 0)
The idea behind this function is simple: Map each value to 1 if it satisfies the predicate, otherwise to 0. And then folding with the Sum monoid just adds all values up.
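For example, in GHCi (with the generalized Tree above):

> nSatisfy even (Node (Leaf 1) 2 (Leaf 4))
2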
This submission to Programming Praxis gives an O(n) function that "undoes" a preorder traversal of a binary search tree, converting a list back into a tree. Supplying the missing data declaration:
data Tree a = Leaf | Branch {value::a, left::Tree a, right:: Tree a}
              deriving (Eq, Show)

fromPreOrder :: Ord a => [a] -> Tree a
fromPreOrder [] = Leaf
fromPreOrder (a:as) = Branch a l (fromPreOrder bs)
  where
    (l, bs) = lessThan a as

lessThan n [] = (Leaf, [])
lessThan n all@(a:as)
    | a >= n    = (Leaf, all)
    | otherwise = (Branch a l r, cs)
    where (l, bs) = lessThan a as
          (r, cs) = lessThan n bs
It's obvious that one constructor is added to the tree in each recursive step, which is key to its efficiency.
The only "problem" is that the list is threaded through the computation manually, which is not a terribly Haskellian way to do it and makes it a little harder to see that it is actually consumed element by element in a single-threaded manner.
I attempted to correct this using a state monad (prettified on Codepad):
import Control.Monad.State

data Tree a = Leaf
            | Branch {root::a, left::Tree a, right::Tree a}
            deriving (Eq, Show)

-- 'state' (lower case) is the modern mtl spelling; the old 'State'
-- data constructor no longer exists.
peek = state peek' where
    peek' [] = (Nothing, [])
    peek' a@(x:_) = (Just x, a)

pop = state pop' where
    pop' [] = error "Tried to read past the end of the list"
    pop' (_:xs) = ((), xs)

prebuild' :: Ord a => State [a] (Tree a)
prebuild' = do
    next <- peek
    case next of
        Nothing -> return Leaf
        Just x -> do
            pop
            leftpart <- lessThan x
            rightpart <- prebuild'
            return (Branch x leftpart rightpart)

lessThan n = do
    next <- peek
    case next of
        Nothing -> return Leaf
        Just x ->
            if x < n
                then do
                    pop
                    leftpart <- lessThan x
                    rightpart <- lessThan n
                    return (Branch x leftpart rightpart)
                else
                    return Leaf

prebuild :: Ord a => [a] -> Tree a
prebuild = evalState prebuild'
Unfortunately, this just looks obscenely messy, and doesn't seem any easier to reason about.
One thought I haven't been able to get anywhere with yet (in part because I don't have a deep enough understanding of the underlying concepts, quite likely): could I use a left fold over the list that builds a continuation that ultimately produces the tree? Would that be possible? Also, would it be anything short of insane?
Another thought was to write this as a tree unfold, but I don't think it's possible to do that efficiently; the list will end up being traversed too many times and the program will be O(n^2).
Edit
Taking things from another direction, I have the nagging suspicion that it might be possible to come up with an algorithm that starts by splitting up the list into increasing segments and decreasing segments, but I haven't yet found something concrete to do with that idea.
I think the problem you're having with State is that your primitives (push, pop, peek) are not the right ones. I think a better one would be something like available_, which checks if the front of the stack matches a particular condition, and executes something different in each case:
available_ p f m = do
  s <- get
  case s of
    x:xs | p x -> put xs >> f x
    _          -> m
Actually, in our use case, we can specialize a bit: we will always want to return a Leaf when the head of our stack doesn't satisfy the condition, and we'll always want to recurse when it does.
available p m = available_ p
    (\x -> liftM2 (Branch x) (lessThan' x) m)
    (return Leaf)
(You could also just write available to begin with and skip available_ entirely; in my first iteration, that is what I did.) Now writing fromPreOrder and lessThan is a snap, and I think this also gives some insight into their behavior. I'll name them with primes so we can double-check that they do the right thing with QuickCheck.
fromPreOrder' = available (const True) fromPreOrder'
lessThan' n = available (<n) (lessThan' n)
And in ghci:
> quickCheck (\xs -> fromPreOrder (xs :: [Int]) == evalState fromPreOrder' xs)
+++ OK, passed 100 tests.
While I can't answer the question about continuation passing, I believe that the State monad based implementation can be written much more clearly. First, we can use notational convenience such as those from Control.Applicative to make it easier to read. Second, we can upgrade the effect stack to include Maybe in order to capture the notion of failure (a) from taking the head of an empty list and (b) from the (a >= n) comparison as an effect.
import Control.Monad.State
import Control.Applicative
The final code uses the backtracking-state monad transformer stack. This means that we wrap State around Maybe instead of Maybe around State. In some sense we can think of this as meaning that failure is the "primary" effect. In practice it means that if the algorithm fails there's no way to continue using potentially bad state and so it must backtrack to the last known good state.
type Preord a b = StateT [a] Maybe b
Since we keep taking the head of a list and want to capture that failure correctly, we'll use a "safe head" function (which is the natural destructor of a list anyway, despite not being in the base Haskell libraries)
-- Safe list destructor
uncons :: [a] -> Maybe (a, [a])
uncons [] = Nothing
uncons (a:as) = Just (a, as)
If we look at it cleverly we'll notice that this is already exactly the form of our monadic computation (StateT [a] Maybe b is isomorphic to [a] -> Maybe (b, [a])). We'll give it a more evocative name when lifted into the Monad.
-- Try to get the head or fail
getHead :: Preord a a
getHead = StateT uncons
A common feature of this algorithm is stopping local failures by providing a default value. I'll capture this in the certain combinator
-- Provides a default value for a failing computation
certain :: b -> Preord a b -> Preord a b
certain def p = p <|> return def
And now we can write the final algorithm very cleanly in our Preord monad.
fromPreOrder :: Ord a => Preord a (Tree a)
fromPreOrder = certain Leaf $ do
  a <- getHead
  Branch a <$> lessThan a <*> fromPreOrder

lessThan :: Ord a => a -> Preord a (Tree a)
lessThan n = certain Leaf $ do
  a <- getHead
  guard (a < n)
  Branch a <$> lessThan a <*> lessThan n
Note that Applicative style helps to indicate that we're building the components of the Branch constructor using further effectful (state consuming) computations. The guard short-circuits lessThan when the pivot is already the least element in the pre-order traversal. We also explicitly see how both fromPreOrder and lessThan default out to Leaf when they cannot compute a better result.
(Also note that fromPreOrder and lessThan are nearly identical now, a commonality Daniel Wagner exploited in his own answer when writing available.)
We finally would want to hide all the monadic noise since, to an outside user, this is just a pure algorithm.
rebuildTree :: Ord a => [a] -> Tree a
rebuildTree = fromMaybe Leaf . evalStateT fromPreOrder  -- fromMaybe is from Data.Maybe
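A quick usage sketch of my own, using the record selectors from the Tree declaration above:

> let t = rebuildTree [3,1,2,5,4]
> (value t, value (left t), value (right t))
(3,1,5)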
For a complete picture, here's the implementation of the algorithm using only the State monad. Note all the extra noise for handling failure! We've absorbed the entire popElse function into the effects of the backtracking state monad. We also lift the if up into the failure effect. Without that effect stack, our combinators are terrifically specific to the application instead of decomplected and useful elsewhere.
-- Try to take the head of the state list and return the default
-- if that's not possible.
popElse :: b -> (a -> State [a] b) -> State [a] b
popElse def go = do
  x <- get
  case x of
    []     -> return def
    (a:as) -> put as >> go a

push :: a -> State [a] ()
push a = modify (a:)

fromPreOrder :: Ord a => State [a] (Tree a)
fromPreOrder = popElse Leaf $ \a -> Branch a <$> lessThan a <*> fromPreOrder

lessThan :: Ord a => a -> State [a] (Tree a)
lessThan n =
  popElse Leaf $ \a ->
    if a >= n
      then push a >> return Leaf
      else Branch a <$> lessThan a <*> lessThan n
As you've said, the state monad doesn't really improve the situation, and I don't think it can be expected to, as it's both much too general in that it allows arbitrary access to the state, and annoying in that it enforces unnecessary sequencing.
At first glance, this looks quite like a foldr: we do one thing for the empty case, and in the (:) case we take the head off and make a recursive call based on the tail. However, as the recursive call isn't just using the tail directly, it isn't quite a foldr.
We could express it as a paramorphism but I don't think that really adds anything to the readability.
What I did notice is that the complicated recursion on the tail is all based on lessThan, which led me to the following idea for breaking down the algorithm:
lessThans [] = []
lessThans (a:as) = (a, l) : lessThans bs
  where (l, bs) = lessThan a as

fromPreOrder2 :: Ord a => [a] -> Tree a
fromPreOrder2 = foldr (\(a, l) r -> Branch a l r) Leaf . lessThans
I'm sure lessThans could have a better name but I'm not quite sure what!
The foldr can also be expressed as foldr (uncurry Branch) Leaf but I'm not sure if that's an improvement.
EDIT: also, lessThans is an unfoldr, leading to this version:
fromPreOrder3 :: Ord a => [a] -> Tree a
fromPreOrder3 = foldr (uncurry Branch) Leaf . unfoldr lessThanList  -- unfoldr comes from Data.List

lessThanList [] = Nothing
lessThanList (a:as) = Just ((a, l), bs)
  where (l, bs) = lessThan a as
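As a final sanity check (my own GHCi run), the refactored versions should agree with the original fromPreOrder:

> fromPreOrder2 [3,1,2,5,4] == fromPreOrder [3,1,2,5,4]
True
> fromPreOrder3 [3,1,2,5,4] == fromPreOrder [3,1,2,5,4]
True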