Find the max value of a node in a tree - Haskell

I have a problem.
I have to implement a function maxT in Haskell which returns the maximum value of a node from a binary tree.
data Tree a = Leaf a | Node (Tree a) a (Tree a)
This is given. What should I do next?
maxT :: (Tree Integer) -> Integer
maxT (Leaf a) = a
maxT (Node l a r) = max a (max (maxT l) (maxT r))
Is this right?

Let's see how hard this is to prove correct. Why? Because proving correctness is a great way to analyze programs for errors, especially recursive ones. We'll technically use induction, but it isn't so complex. The key is to realize that maxT t must always be the largest value in the tree t. This statement, "maxT t must always be the largest value in the tree t", is called an invariant, and we'll try to prove it.
First, let's assume t is a Leaf. In this case, you've defined maxT (Leaf a) = a and since there are literally no other values in this tree, a must be the largest. Thus, maxT upholds our invariant when passed a Leaf. This is the "base case".
Now we'll consider what happens when we let t = Node (Leaf a) b (Leaf c) for some Integers a, b, and c. This is a height-1 tree and forms what you might call an "example case" for induction. Let's try out maxT and see if the invariant holds.
maxT t
===
maxT (Node (Leaf a) b (Leaf c))
===
max b (max (maxT (Leaf a)) (maxT (Leaf c)))
At this point we'll use our base-case step and say that since the only applications of maxT in this expression are on Leafs, each one must uphold our invariant. This is kind of dumb, but that's because it's just an example case. We'll see the more general pattern later.
For now, let's evaluate our maxT (Leaf _) bits knowing that the result is the maximal value in each particular left- or right-subtree.
===
max b (max a c)
Now, I don't much want to dive into the definition of max, but based on its name I'm happy to assume that max a b returns whichever of a and b is larger. We could pick our way through the details here, but it's clear that max b (max a c) has been given all the relevant information about our Node for computing the maximum of the entire height-1 tree. I'd call this a successful proof that maxT works for both height-0 and height-1 trees (Leafs and Nodes containing only Leafs).
The next step is to generalize this example case.
So let's apply that same pattern generalizing on the height of the tree. We'll ask what happens if we fix some number, n, and assume that maxT t upholds our invariant for all t of height n or less. This is a little bizarre—we have only shown this works for n = 0 and n = 1. It'll be clear why this works a little later.
So what does that assumption do for us? Well, let's take any two Trees of height n or less (call them l and r), any integer x, and combine them to form a new tree t = Node l x r. What happens when we do maxT t?
maxT t
===
maxT (Node l x r)
===
max x (max (maxT l) (maxT r))
and we know, per our assumption, that maxT l and maxT r uphold our invariant. Then the chain of maxes continues to uphold our invariant, now for a tree t of height n+1. Furthermore (and this is really important) our process of assembling new Trees is general: we can make any height-(n+1) tree this way. This means that maxT works for any height-(n+1) tree.
Induction time! We now know that if we pick an n and believe (for some reason) that maxT works for any height-n tree, then it immediately must work for any height-(n+1) tree. Let's pick n = 0. We know by the "base case" that maxT works for Leafs, so suddenly we know that maxT works for height-1 trees. This was our "example case". Now, given that knowledge, we can immediately see maxT works for height-2 trees. And then height-3 trees. And then height-4. And so on and on and on.
This completes a proof* that maxT is correct.
*I have to leave a few caveats. We didn't really do the fiddly details to show that the max chains work out, though it makes sense. I also didn't really prove that the induction step works: what if there were more ways to make a height-(n+1) tree than just applying Node to trees of height n or less? The more powerful way is to "break apart" a height-(n+1) tree instead of building one up, but that's a little harder to see, I think. Finally, we would want to think hard about what happens if we pass in maxT (Leaf undefined) or other pathological values like that. These arise in Haskell because it's a (Turing-complete) programming language rather than pure math. Honestly, these little bits don't change a whole lot for your situation, though.
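If you want empirical evidence to go with the proof, the invariant is also easy to test mechanically. Below is a minimal sketch using QuickCheck; the flatten helper and the Arbitrary instance are my own additions for illustration, not part of the original post:

import Test.QuickCheck

data Tree a = Leaf a | Node (Tree a) a (Tree a) deriving Show

maxT :: Tree Integer -> Integer
maxT (Leaf a) = a
maxT (Node l a r) = max a (max (maxT l) (maxT r))

-- Collect every value in the tree (helper invented for the test).
flatten :: Tree a -> [a]
flatten (Leaf a) = [a]
flatten (Node l a r) = flatten l ++ [a] ++ flatten r

-- A simple generator for random trees (also invented for the test).
instance Arbitrary a => Arbitrary (Tree a) where
  arbitrary = sized go
    where
      go 0 = Leaf <$> arbitrary
      go n = oneof
        [ Leaf <$> arbitrary
        , Node <$> go (n `div` 2) <*> arbitrary <*> go (n `div` 2)
        ]

-- The invariant from the proof: maxT t is the largest value in t.
prop_maxT :: Tree Integer -> Bool
prop_maxT t = maxT t == maximum (flatten t)

Running quickCheck prop_maxT checks the invariant on a hundred random trees; not a proof, but a good complement to one.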

Related

Searching a Value in Binary tree haskell

I have just started learning Haskell and I am trying to write code that searches for a particular value in a binary tree, returning true if it is present and false otherwise.
This is what my tree structure looks like:
data Tree = Leaf Int | Node Tree Int Tree
I am not sure how to proceed with the function to traverse through the tree and return the value. I did try BFS and DFS, but I am not sure how to return once I have found my value.
An example of how my function should look:
search 5 (Node (Node (Leaf 1) 3 (Leaf 4)) 5 (Node (Leaf 6) 7 (Leaf 9)))
A binary search could be written as follows. The type can be more generic, as we only need the items to be orderable to store / search in a binary tree.
We visit each node and either return true, or search in one of the child nodes.
An example Node:

    5
   / \
  3   7
Let's search for 7. We first visit the root. Since 5 /= 7, we test a child node. Since 7 > 5, we search in the right child; 7 cannot appear in the left child, because all values in the left child are guaranteed to be lower than 5.
If we reach a leaf, we just check if it contains the search term.
-- the question's Tree type, generalized to any Ord instance
data BinaryTree a = Leaf a | Node (BinaryTree a) a (BinaryTree a)

search :: Ord a => a -> BinaryTree a -> Bool
search a (Leaf b) = compare a b == EQ
search a (Node left b right) =
  case compare a b of
    EQ -> True
    LT -> search a left
    GT -> search a right
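Trying it on the example tree from the question (values taken from there):

search 5 (Node (Node (Leaf 1) 3 (Leaf 4)) 5 (Node (Leaf 6) 7 (Leaf 9)))  -- True
search 8 (Node (Node (Leaf 1) 3 (Leaf 4)) 5 (Node (Leaf 6) 7 (Leaf 9)))  -- False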
I am not sure how to proceed with the function to traverse through the tree and return the value.
From that sentence, I understand you would have no problem writing a traversal yourself, but that there is a mental leap you need to take to understand how Haskell works.
You see, you never return anything in Haskell. Returning is fundamentally an imperative statement. Haskell is a declarative language, which means that writing programs is done by stating facts. That nuance can be discomforting, especially if you've been introduced to programming through learning imperative languages like C, Java, JavaScript, etc. But once you truly understand it, you will see how much more expressive and easy declarative programming is.
Because of its strong mathematical roots, in Haskell facts are stated in the form of equations, i.e. expressions where the = sign literally means the left- and right-hand side are equal (whereas in an imperative language, it would probably mean that you assign a value to a variable -- that does not exist in Haskell).
The program @Haleemur Ali wrote is in 1:1 correspondence with how you would write search using math notation:
search(x, t) = { x == y          if t = Leaf y
               , true            if t = Node l y r and x == y
               , search(x, l)    if t = Node l y r and x < y
               , search(x, r)    if t = Node l y r and x > y
               }
Indeed many times, at least as a beginner, writing Haskell is just a matter of translation, from math notation to Haskell notation. Another interpretation of Haskell programs is as proofs of theorems. Your search is a theorem saying that "if you have a tree and an integer, you can always tell if the integer is somewhere inside the tree". That's what you are telling the compiler when you write a function signature:
search :: Int -> Tree -> Bool
The compiler will only be happy if you write a proof for that theorem ... you probably guessed that the algorithm above is the proof.
An interesting observation is that the algorithm is almost dictated by the shape of the data type. Imagine you wanted to sum all the values in a tree instead:
sum(t) = { x                     if t = Leaf x
         , x + sum(l) + sum(r)   if t = Node l x r
         }
Every time you want to write an algorithm over a binary tree, you will write something like the above. That is fairly mechanical and repetitive. What if later on you expand your program to deal with rose trees? Tries? You don't want to write the same algorithms and take the risk of making a mistake. One would try to come up with a function that walks down a tree and combines its values (using Haskell notation from now on):
walk :: (Int -> b) -> (b -> b -> b) -> Tree -> b
walk f g (Leaf x) = f x
walk f g (Node l x r) =
  let a = walk f g l
      b = walk f g r
  in g (g (f x) a) b
With this function alone, you can write all manners of traversals on trees:
sum t = walk id (+) t
search x t = walk (== x) (||) t
walk is such a recurring pattern that it has been abstracted. All the data structures that expose the same pattern of recursion are said to be foldable, and the implementation is often so obvious that you can ask the compiler to write it for you, like so:
{-# LANGUAGE DeriveFoldable #-}
data Tree a = Leaf a | Node (Tree a) a (Tree a) deriving (Foldable)
There's even a definition of sum for any foldable data structure.
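For instance, reusing the example tree from the question, the derived instance already supplies sum, maximum, and elem (the latter being exactly this question's search); a small sketch:

{-# LANGUAGE DeriveFoldable #-}

data Tree a = Leaf a | Node (Tree a) a (Tree a) deriving (Foldable)

main :: IO ()
main = do
  let t = Node (Node (Leaf 1) 3 (Leaf 4)) 5 (Node (Leaf 6) 7 (Leaf 9))
  print (sum t)     -- 35
  print (maximum t) -- 9, the maxT from the first question
  print (elem 8 t)  -- False, this question's search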

number of leaves is one greater than number of nodes in Haskell

I have to prove that given a binary tree, the number of leaves is equal to the number of nodes plus one using induction in Haskell.
Given the following type called Tree:
data Tree = Leaf Int | Node Tree Tree
I defined two functions called leaves and nodes which return the number of leaves and nodes respectively:
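(The definitions weren't included in the post, but given the Tree type above they were presumably along these lines:)

-- Presumed definitions, reconstructed for reference:
leaves :: Tree -> Int
leaves (Leaf _) = 1
leaves (Node l r) = leaves l + leaves r

nodes :: Tree -> Int
nodes (Leaf _) = 0
nodes (Node l r) = 1 + nodes l + nodes r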
With induction, I know that I need to prove the base case, which is when the number of nodes is 0, and for the induction step I need to use the induction hypothesis. But what's throwing me off here is that there are two functions, and I don't really know how to proceed. In the base case, am I supposed to show that if the number of nodes is 0, the number of leaves is 1?
The tractable way to do this "by induction" is not using induction on natural numbers but rather using structural induction. The proof breaks down like this:
Base case
The base case is for Leaf x, where x is an Int. So you have to prove that for any x
leaves (Leaf x) = 1 + nodes (Leaf x)
Inductive step
In the inductive step, you assume two inductive hypotheses:
leaves t = 1 + nodes t
leaves u = 1 + nodes u
to prove that
leaves (Node t u) = 1 + nodes (Node t u)
I'll let you fill in the actual proofs.
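As a nudge, the base case is a short equational calculation, assuming definitions like the ones sketched under the question:

leaves (Leaf x)
  = 1                   -- definition of leaves
  = 1 + 0
  = 1 + nodes (Leaf x)  -- definition of nodes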
Side note:
Structural induction is a generalization of induction on natural numbers. In particular, you can define the natural numbers as
data Nat = Z | S Nat
You can now do induction with a base case of p Z, and an inductive step that assumes p n and proves p (S n).
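To make that correspondence concrete, the recursor for Nat packages exactly this base-case/step shape (foldNat is an illustrative name, not a standard library function):

data Nat = Z | S Nat

-- Consume a Nat given a base case z and a step s: induction, computationally.
foldNat :: b -> (b -> b) -> Nat -> b
foldNat z _ Z     = z
foldNat z s (S n) = s (foldNat z s n)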
Structural induction can itself be generalized further, to well-founded induction, which is the most general mathematical notion of induction of which I am aware. Note that the Wikipedia page is based on a classical notion of well-foundedness; nLab gives a constructive version that is more tightly tied to well-founded induction.

Haskell binary tree max int?

I'm trying to write a haskell function that will return the max int inside a binary tree of integers. My binary tree is defined as follows:
data Tree = Node Int Tree Tree | Leaf Int
deriving (Eq,Show)
The way I understand it, this declaration says that a value of the Tree data type is either a single leaf holding an Int, or a node holding an Int together with two subtrees.
So my maxInt function will look something like this (I think):
maxInt :: Tree -> Int --maxInt function receives Tree, returns int
maxInt --something to detect if the Tree received is empty
--if only one node, return that int
--look through all nodes, find largest
and so when my function is given something like maxInt (Node 5 (Leaf 7) (Leaf 2)), the correct value for maxInt to return would be 7.
I'm new to haskell and don't really know where to start with this problem, I would really appreciate some guidance. Thank you
Let me start it for you:
maxInt :: Tree -> Int
maxInt (Leaf x) = ?
maxInt (Node x l r) = ?
You may find it helpful to use the standard function max, which takes two arguments and returns their maximum:
max 3 17 = 17
To begin with, we have this datatype:
data Tree = Node Int Tree Tree | Leaf Int
deriving (Eq,Show)
That means, we have two constructors for things of type Tree: either we have a Leaf with a single Int value, or we have a Node which allows us to represent bigger trees in a recursive fashion.
So, for example we can have these trees:
Leaf 0
And more complex ones:
Node 3 (Leaf 0) (Leaf 4)
Recall that this tree representation has information both in the leaves and in the nodes, so our function will need to take that into account.
You guessed correctly the type of the function maxInt, so you are halfway through!
In order to define this function, given we have a custom defined datatype, we can be confident in using pattern-matching.
Pattern-matching is, putting it simply, a way to define our functions by equations: on the left side, a pattern for one constructor of our datatype (either Leaf or Node, in our case), and on the right side, the resulting value. I'd recommend you to learn more about pattern-matching here: pattern matching in Haskell
Hence, we start our function by its type, as you correctly guessed:
maxInt :: Tree -> Int
As we have seen earlier, we will use pattern-matching for this. What would be the first equation, that is, the first pattern-matching case for our function? The simplest tree we have given our datatype is Leaf value. So we start with:
maxInt (Leaf n) = n
Why n as a result? Because we don't have any other value than n in the tree and therefore it's the maximum.
What happens in a more complex case?
maxInt (Node n leftTree rightTree) = ...
Well... we can think of the maximum value for the tree (Node n leftTree rightTree) as the maximum among n, the maximum value of leftTree, and the maximum value of rightTree.
Would you be encouraged to write the second equation? I strongly recommend you to first read the chapter of the book I just linked above. Also, you might want to read about recursion in Haskell.
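(Once you've given it a try: the finished second equation has the same shape as the maxT answer to the first question above; a sketch for comparison:)

maxInt (Node n leftTree rightTree) =
  max n (max (maxInt leftTree) (maxInt rightTree))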

haskell create an unbalanced tree

My Tree definition is
data Tree = Leaf Integer | Node Tree Tree
This is a binary tree, with only values at the leaves.
I am given following definition for balanced trees:
We say that a tree is balanced if the number of leaves in the left and right subtree of every node differs by at most one, with leaves themselves being trivially balanced.
I try to create a balanced tree as follows:
t :: Tree
t = Node (Node (Node (Leaf 1) (Leaf 2)) (Node (Leaf 3) (Leaf 4))) (Node (Node (Leaf 5) (Leaf 6)) (Node (Leaf 7) (Leaf 8)))
Can you please let me know if t above is a balanced tree with values only at the leaves?
Another question, how do I create another tree with values only at the leaves and it is unbalanced as per above definition.
Thanks
Can you please let me know if t above is a balanced tree with values only at the leaves?
I can, but I won't. However, I hope I can guide you through the process of writing a function that will determine whether a given tree is balanced.
The following is certainly not the most efficient way to do it (see the bottom for a hint about that), but it is a very modular way. It's also a good example of the "computation by transformation" approach that functional programming (and especially lazy functional programming) encourages. It seems pretty clear to me that the first question to ask is "how many leaves descend from each node?" There's no way for us to write down the answers directly in the tree, but we can make a new tree that has the answers:
data CountedTree = CLeaf Integer | CNode Integer CountedTree CountedTree
Each node of a CountedTree has an integer field indicating how many leaves descend from it.
You should be able to write a function that reads off the total number of leaves from a CountedTree, whether it's a CLeaf or a CNode:
getSize :: CountedTree -> Integer
The next step is to determine whether a CountedTree is balanced. Here's a skeleton:
countedBalanced :: CountedTree -> Bool
countedBalanced (CLeaf _) = ?
countedBalanced (CNode _ left right)
  = ?? && ?? && abs (getSize left - getSize right) <= 1
I've left the first step for last: convert a Tree into a CountedTree:
countTree :: Tree -> CountedTree
And finally you can wrap it all up:
balanced :: Tree -> Bool
balanced t = ?? (?? t)
Now it turns out that you don't actually have to copy and annotate the tree to figure out whether or not it's balanced. You can do it much more directly. This is a much more efficient approach, but a somewhat less modular one. I'll give you the relevant types, and you can fill in the function.
-- The balance status of a tree. Either it's
-- unbalanced, or it's balanced and we store
-- its total number of leaves.
data Balance = Unbalanced | Balanced Integer
getBalance :: Tree -> Balance
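For completeness, here is one way the direct version might be filled in; this is only a sketch assuming the Tree and Balance types above, so try it yourself before reading:

getBalance :: Tree -> Balance
getBalance (Leaf _) = Balanced 1
getBalance (Node l r) =
  case (getBalance l, getBalance r) of
    (Balanced m, Balanced n)
      | abs (m - n) <= 1 -> Balanced (m + n)
    _ -> Unbalanced

balanced :: Tree -> Bool
balanced t = case getBalance t of
  Balanced _ -> True
  Unbalanced -> False

This also answers the second part of the question: skew a tree and the leaf counts diverge. For example, Node (Node (Node (Leaf 1) (Leaf 2)) (Leaf 3)) (Leaf 4) is unbalanced, because the root's left subtree has three leaves while its right has one.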

Is there a way to avoid copying the whole search path of a binary tree on insert?

I've just started working my way through Okasaki's Purely Functional Data Structures, but have been doing things in Haskell rather than Standard ML. However, I've come across an early exercise (2.5) that's left me a bit stumped on how to do things in Haskell:
Inserting an existing element into a binary search tree copies the entire search path
even though the copied nodes are indistinguishable from the originals. Rewrite insert using exceptions to avoid this copying. Establish only one handler per insertion rather than one handler per iteration.
Now, my understanding is that ML, being an impure language, gets by with a conventional approach to exception handling, not so different from, say, Java's, so you can accomplish it with something like this:
datatype Tree = E | T of Tree * int * Tree
exception ElementPresent
fun insert (x, t) =
  let fun go E = T (E, x, E)
        | go (T (l, y, r)) =
            if x < y then T (go l, y, r)
            else if y < x then T (l, y, go r)
            else raise ElementPresent
  in go t
  end
  handle ElementPresent => t
I don't have an ML implementation, so this may not be quite right in terms of the syntax.
My issue is that I have no idea how this can be done in Haskell, outside of doing everything in the IO monad, which seems like cheating (and even if it's not cheating, it would seriously limit the usefulness of a function which really doesn't do any mutation). I could use the Maybe monad:
data Tree a = Empty | Fork (Tree a) a (Tree a)
  deriving (Show)
insert :: (Ord a) => a -> Tree a -> Tree a
insert x t = maybe t id (go t)
  where go Empty = return (Fork Empty x Empty)
        go (Fork l y r)
          | x < y = do l' <- go l; return (Fork l' y r)
          | x > y = do r' <- go r; return (Fork l y r')
          | otherwise = Nothing
This means everything winds up wrapped in Just on the way back up when the element isn't found, which requires more heap allocation, and sort of defeats the purpose. Is this allocation just the price of purity?
EDIT to add: A lot of why I'm wondering about the suitability of the Maybe solution is that the optimization described only seems to save you all the constructor calls you would need in the case where the element already exists, which means heap allocations proportional to the length of the search path. The Maybe also avoids those constructor calls when the element already exists, but then you get a number of Just constructor calls equal to the length of the search path. I understand that a sufficiently smart compiler could elide all the Just allocations, but I don't know if, say, the current version of GHC is really that smart.
In terms of cost, the ML version is actually very similar to your Haskell version.
Every recursive call in the ML version results in a stack frame. The same is true in the
Haskell version. This is going to be proportional in size to the path that you traverse in
the tree. Also, both versions will of course allocate new nodes for the entire path if an insertion is actually performed.
In your Haskell version, every recursive call might also eventually result in the
allocation of a Just node. This will go on the minor heap, which is just a block of
memory with a bump pointer. For all practical purposes, GHC's minor heap is roughly equivalent in
cost to the stack. Since these are short-lived allocations, they won't normally end up
being moved to the major heap at all.
GHC generally cannot elide path copying in cases like that. However, there is a way to do it manually, without incurring any of the indirection/allocation costs of Maybe. Here it is:
{-# LANGUAGE MagicHash #-}
import GHC.Prim (reallyUnsafePtrEquality#)
data Tree a = Empty | Fork (Tree a) a (Tree a)
  deriving (Show)
insert :: (Ord a) => a -> Tree a -> Tree a
insert x Empty = Fork Empty x Empty
insert x node@(Fork l y r)
  | x < y = let l' = insert x l in
      case reallyUnsafePtrEquality# l l' of
        1# -> node
        _  -> Fork l' y r
  | x > y = let r' = insert x r in
      case reallyUnsafePtrEquality# r r' of
        1# -> node
        _  -> Fork l y r'
  | otherwise = node
The pointer equality function does exactly what's in the name. Here it is safe because even if the equality returns a false negative we only do a bit of extra copying, and nothing worse happens.
It's not the most idiomatic or prettiest Haskell, but the performance benefits can be significant. In fact, this trick is used very frequently in unordered-containers.
As fizruk indicates, the Maybe approach is not significantly different from what you'd get in Standard ML. Yes, the whole path is copied, but the new copy is discarded if it turns out not to be needed. The Just constructor itself may not even be allocated on the heap—it can't escape from insert, let alone the module, and you don't do anything weird with it, so the compiler is free to analyze it to death.
Edit
There are efficiency problems, now that I think of it. Your use of Maybe conceals the fact that you're actually making two passes—one down to find the insertion point and one up to build the tree. The solution to this is to drop Maybe Tree in favor of (Tree,Bool) and use strictness annotations, or to switch to continuation-passing style. Also, if you choose to stay with the three-way logic, you may want to use the three-way comparison function. Alternatively, you can go all the way to the bottom each time and check later if you hit a duplicate.
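A sketch of the (Tree, Bool) idea mentioned above, reusing the Tree type from the question (insertFlag is an illustrative name, not from the answer): the Bool reports whether the subtree actually changed, so unchanged nodes can be reused on the way back up.

insertFlag :: Ord a => a -> Tree a -> Tree a
insertFlag x t = fst (go t)
  where
    go Empty = (Fork Empty x Empty, True)
    go n@(Fork l y r)
      | x < y = let (l', changed) = go l
                in (if changed then Fork l' y r else n, changed)
      | x > y = let (r', changed) = go r
                in (if changed then Fork l y r' else n, changed)
      | otherwise = (n, False)

Strictness annotations (or the CPS version the answer mentions) would be needed to make the pair truly cheap; this only shows the shape.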
If you have a predicate that checks whether the key is already in the tree, you can look before you leap:
insert x t = if contains t x then t else insert' x t
This traverses the tree twice, of course. Whether that's as bad as it sounds should be determined empirically: it might just load the relevant part of the tree into the cache.
