Haskell 2-3-4 Tree

We've been asked to create a 2-3-4 tree in Haskell, as in write the data type, the insert function, and a display function.
I'm finding it very difficult to get information on this kind of tree, even in a language I'm comfortable with (Java, C++).
What I have so far -
data Tree t = Empty
            | Two t (Tree t) (Tree t)
            | Three t t (Tree t) (Tree t) (Tree t)
            | Four t t t (Tree t) (Tree t) (Tree t) (Tree t)
            deriving (Eq, Ord, Show)
leaf2 a = Two a Empty Empty
leaf3 a b = Three a b Empty Empty Empty
leaf4 a b c = Four a b c Empty Empty Empty Empty
addNode :: (Ord t) => t -> Tree t -> Tree t
addNode t Empty = leaf2 t
addNode x (Two t left right)
  | x < t     = Two t (addNode x left) right
  | otherwise = Two t left (addNode x right)
This compiles, but I'm not sure it's correct, and I don't know how to start writing the insert into a three node or four node.
The assignment also says that "deriving show" for the display function is not enough, that it should print out the tree in the format normally seen in diagrams. Again, unsure on the way to go with this.
Any help or direction appreciated.

I know nothing about 2-3-4 trees, but for the Three node, you would start with something like this:
addNode t (Three x y left mid right)
  | cond1 = expr1
  | cond2 = expr2
  (etc)
What cond1, cond2, expr1, and expr2 are, exactly, is dependent on the definition of what a 2-3-4 tree is.
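For what it's worth, here is one way the missing cases could be filled in. This is only a sketch, assuming the usual 2-3-4 insertion idea: a subtree that overflows hands a key up to its parent, and a four node is split before descending into it. The Result type and the ins and inorder helpers are names I made up for illustration; they are not part of the assignment. Duplicates go to the right, as in the question's Two case.

```haskell
-- Same node shapes as in the question.
data Tree t = Empty
            | Two t (Tree t) (Tree t)
            | Three t t (Tree t) (Tree t) (Tree t)
            | Four t t t (Tree t) (Tree t) (Tree t) (Tree t)
            deriving (Eq, Show)

-- Hypothetical helper type: the outcome of inserting into a subtree.
--   Done t      : the subtree kept its height.
--   Split l k r : the subtree split in two; key k must be absorbed by the parent.
data Result t = Done (Tree t) | Split (Tree t) t (Tree t)

addNode :: Ord t => t -> Tree t -> Tree t
addNode x t = case ins x t of
  Done t'     -> t'
  Split l k r -> Two k l r   -- the root split, so the tree grows by one level

ins :: Ord t => t -> Tree t -> Result t
ins x Empty = Split Empty x Empty
ins x (Two a l r)
  | x < a     = Done $ case ins x l of
      Done l'       -> Two a l' r
      Split l1 k l2 -> Three k a l1 l2 r
  | otherwise = Done $ case ins x r of
      Done r'       -> Two a l r'
      Split r1 k r2 -> Three a k l r1 r2
ins x (Three a b l m r)
  | x < a     = Done $ case ins x l of
      Done l'       -> Three a b l' m r
      Split l1 k l2 -> Four k a b l1 l2 m r
  | x < b     = Done $ case ins x m of
      Done m'       -> Three a b l m' r
      Split m1 k m2 -> Four a k b l m1 m2 r
  | otherwise = Done $ case ins x r of
      Done r'       -> Three a b l m r'
      Split r1 k r2 -> Four a b k l m r1 r2
-- A four node is split first; x then goes into one of the two halves.
-- Inserting into a Two never splits, so addNode is safe to use here.
ins x (Four a b c t1 t2 t3 t4)
  | x < b     = Split (addNode x left) b right
  | otherwise = Split left b (addNode x right)
  where
    left  = Two a t1 t2
    right = Two c t3 t4

-- In-order traversal, handy for checking that the keys stay sorted.
inorder :: Tree t -> [t]
inorder Empty = []
inorder (Two a l r) = inorder l ++ [a] ++ inorder r
inorder (Three a b l m r) = inorder l ++ [a] ++ inorder m ++ [b] ++ inorder r
inorder (Four a b c t1 t2 t3 t4) =
  inorder t1 ++ [a] ++ inorder t2 ++ [b] ++ inorder t3 ++ [c] ++ inorder t4
```

Building a tree is then a fold, for example foldr addNode Empty [5, 3, 8, 1], and inorder on the result should give the keys back in ascending order.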
As for a show method, the general outline would be this:
instance (Show t) => Show (Tree t) where
  show Empty = ...
  show (Two x l r) = ...show x...show l...show r...
  show (Three x y l m r) = ...
  show (Four x y z l m n r) = ...
The implementation depends on how you want it to look, but for the non-Empty cases, you will probably invoke show on all of the components of the tree being shown. If you want to indent the nested parts of the tree, then perhaps you should create a separate method:
instance (Show t) => Show (Tree t) where
  show = showTree 0
showTree :: Show t => Int -> Tree t -> String
showTree n = indent . go
  where indent = (replicate n ' ' ++)
        go Empty = "Empty"
        go (Two x l r) = (...show x...showTree (n+1) l...showTree (n+1) r...)
        (etc)
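To make the outline above concrete, here is a minimal sketch of an indented printer. It is an assumption about what "the format normally seen in diagrams" might mean: each node's keys go on one line and its children are indented two spaces below it. I kept it as a plain function rather than a Show instance, because a hand-written instance would clash with the deriving Show in the question's data declaration. The local helper node is hypothetical.

```haskell
data Tree t = Empty
            | Two t (Tree t) (Tree t)
            | Three t t (Tree t) (Tree t) (Tree t)
            | Four t t t (Tree t) (Tree t) (Tree t) (Tree t)

-- Render one node per line; n is the current indentation in spaces.
-- Empty subtrees are rendered as nothing at all.
showTree :: Show t => Int -> Tree t -> String
showTree n tree = case tree of
    Empty                  -> ""
    Two a l r              -> node [a] [l, r]
    Three a b l m r        -> node [a, b] [l, m, r]
    Four a b c t1 t2 t3 t4 -> node [a, b, c] [t1, t2, t3, t4]
  where
    -- keys of this node on one line, then every child two spaces deeper
    node keys kids =
      replicate n ' ' ++ unwords (map show keys) ++ "\n"
        ++ concatMap (showTree (n + 2)) kids

-- Convenience wrapper for use in GHCi.
printTree :: Show t => Tree t -> IO ()
printTree = putStr . showTree 0
```

For example, printTree (Two 5 (Two 3 Empty Empty) (Three 7 9 Empty Empty Empty)) prints 5 on the first line, then 3 and 7 9 each indented by two spaces.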

We've been asked to create a 2-3-4 tree
My condolences. I myself once had to implement one for homework. A 2-3-4 tree is a B-tree with all the disadvantages of the B-tree and none of the advantages, because writing the cases separately for each number of children as you do is as cumbersome as having a list of only 2-4 elements.
Point being: B-tree insertion algorithms should work, just fix the size. Cormen et al. have pseudocode for one in their book Introduction to Algorithms (heavy imperativeness warning!).
It might still be better to have lists of data elements and children instead of the four-case algebraic data type, even if the type wouldn't enforce the size of the nodes then. At least it would make it easier to expand the node size.
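A rough sketch of what that list-based representation could look like, with a lookup to show how the cases collapse into one. The names BTree, Nil, Node, member and wellFormed are my own placeholders. As noted, the type no longer enforces node sizes, so wellFormed checks them at run time (it checks sizes only, not key order or balance).

```haskell
-- Hypothetical list-based node: keys in ascending order, and one more
-- child than there are keys. Nothing in the type enforces either rule.
data BTree t = Nil | Node [t] [BTree t]

-- Lookup: walk keys and children in parallel.
member :: Ord t => t -> BTree t -> Bool
member _ Nil = False
member x (Node keys kids) = go keys kids
  where
    go (k : ks) (c : cs)
      | x < k     = member x c     -- x belongs in the child left of k
      | x == k    = True
      | otherwise = go ks cs       -- try the next key
    go [] [c] = member x c         -- larger than every key: last child
    go _  _   = False              -- malformed node

-- Run-time check of the 2-3-4 size constraints.
wellFormed :: BTree t -> Bool
wellFormed Nil = True
wellFormed (Node keys kids) =
  length keys `elem` [1, 2, 3]
    && length kids == length keys + 1
    && all wellFormed kids
```

Changing the node size then only means changing the list of allowed key counts in wellFormed (and in whatever insert you write), not adding constructors.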

Related

Searching for a value in a TriTree in Haskell

I recently just started learning Haskell and I am trying to implement a function for searching for a specific value in a tri tree which returns true if the value is present and false otherwise. This is how my type looks:
data TriTree a
= Empty
| NodeOne a (TriTree a) (TriTree a) (TriTree a)
| NodeTwo a a (TriTree a) (TriTree a) (TriTree a)
deriving (Show)
The tree is either empty or contains at least one internal node. Each internal node stores one or two data values and has at most three child nodes (left, middle, right).
It's not clear to me how to proceed with the search function to traverse through the tree and return the value.
Since you defined a recursive data structure, it makes sense to have a recursive function to traverse it. To do that I'd start with anchoring the recursion in the trivial case. Since an empty tree doesn't contain anything, the check will always be false:
elem' _ Empty = False
Now to the recursive part: In the case of a NodeOne, we need to check if the value is inside that node or in any of the subtrees of that node, so we check if
elem' x (NodeOne v a b c) = x == v || x `elem'` a || x `elem'` b || x `elem'` c
The remaining case is for NodeTwo, which I leave for you to figure out; it shouldn't be difficult, as it is just a generalization of the line above:
elem' x _ = undefined -- remaining case
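A different route, in case it helps to compare: flatten the tree into a list first and reuse the Prelude's elem. This is just a sketch; values and contains are names I made up, and it does not assume the tree is ordered in any way.

```haskell
data TriTree a
  = Empty
  | NodeOne a (TriTree a) (TriTree a) (TriTree a)
  | NodeTwo a a (TriTree a) (TriTree a) (TriTree a)
  deriving (Show)

-- Hypothetical helper: every value stored in the tree, node before children.
values :: TriTree a -> [a]
values Empty = []
values (NodeOne v a b c) = v : concatMap values [a, b, c]
values (NodeTwo v w a b c) = v : w : concatMap values [a, b, c]

-- Search by flattening; laziness stops the traversal at the first match.
contains :: Eq a => a -> TriTree a -> Bool
contains x = elem x . values
```

With t = NodeTwo 1 2 (NodeOne 3 Empty Empty Empty) Empty Empty, contains 3 t is True and contains 4 t is False.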

Can't get some simple Haskell code to compile, no matter how many different ways I try it

I'm trying to write a Haskell function that takes in a tree, and replaces every node with a pair containing the height of the subtree at that node, and the original node
Depending on where I place my parentheses in the last line of code, I get all kinds of different errors. I know my height function works because I've used it in other functions previously. I'm clearly not grouping things together correctly, because I've gotten everything from not giving enough arguments to max, to giving too many arguments to pairs. Please help! I'm really stuck here and making no progress because I'm just moving parentheses back and forth.
data Tree a = Tip | Bin (Tree a) a (Tree a) deriving (Show, Eq)
getHeight :: Tree a -> Integer
getHeight Tip = 0
getHeight (Bin l _ r) = (max (getHeight l) (getHeight r)) +1
pairs :: Tree a -> Tree (Integer, a)
pairs Tip = Tip
pairs (Bin l x r) = (Bin (pairs l) ((max (left right)) x) (pairs r))
  where left = (getHeight l)
        right = (getHeight r)
The way to call a function with two arguments is by separating them with whitespace, like this:
f x y
Or in your case, this would be:
max left right
The way to construct a pair is with parens and a comma, like this:
(42, "foo")
Or in your case, this would be:
(max left right, x)
Summing all of that up, the line should be:
pairs (Bin l x r) = Bin (pairs l) (max left right, x) (pairs r)
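Put together with the definitions from the question, the whole program would look roughly like this. One thing to be aware of (my reading, not something the question confirms): max left right is the height of the taller child, so a node with no children gets 0. If the pair should hold the height of the subtree rooted at the node, as getHeight computes it, you would use max left right + 1 instead.

```haskell
data Tree a = Tip | Bin (Tree a) a (Tree a) deriving (Show, Eq)

getHeight :: Tree a -> Integer
getHeight Tip = 0
getHeight (Bin l _ r) = max (getHeight l) (getHeight r) + 1

pairs :: Tree a -> Tree (Integer, a)
pairs Tip = Tip
pairs (Bin l x r) = Bin (pairs l) (max left right, x) (pairs r)
  where
    left  = getHeight l   -- height of the left child
    right = getHeight r   -- height of the right child
```

For example, pairs (Bin (Bin Tip 'a' Tip) 'b' Tip) gives Bin (Bin Tip (0,'a') Tip) (1,'b') Tip.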

Pattern matching warning with binary tree

I'm a beginner at Haskell and am having some trouble with understanding the warnings I get. I have implemented a binary tree,
data Tree a = Nil | Node a (Tree a) (Tree a) deriving (Eq, Show, Read)
and it works fine but I get incomplete patterns warning on this code
get :: Ord a => a -> Tree a -> Maybe a
get _ Nil = Nothing
get x (Node v lt rt)
  | x == v = Just x
  | x < v  = get x lt
  | x > v  = get x rt
The pattern it wants me to match is _ (Node _ _ _ ). I'm not sure what this pattern means?
There are two problems here. First of all, the datatype:
data Tree a = Nil | Node a (Tree left) (Tree right) deriving (Eq, Show, Read)
--                               ^ left?     ^ right?
In your data definition, you make use of left and right, but those are not defined in the head of the data definition, therefore these are not type parameters. You probably wanted to say:
data Tree a = Nil
            | Node { value :: a, left :: Tree a, right :: Tree a }
            deriving (Eq, Show, Read)
But now we still get a warning:
hs.hs:5:1: Warning:
    Pattern match(es) are non-exhaustive
    In an equation for ‘get’: Patterns not matched: _ (Node _ _ _)
Ok, modules loaded: Main.
The problem here is that Haskell does not know that two values can only be related by <, == or >.
If you write an instance of Ord, then you have a "contract" that you will define a total ordering. In other words, for any two values x and y, it holds that x < y, x > y or x == y. The problem, however, is that Haskell does not know that. For Haskell, any of the functions (<), (==) or (>) can result in True or False. Therefore, since a compiler is always conservative, it considers the case where there are two values such that all of x < y, x == y and x > y fail (if you had hypothetically written foo x y, bar x y and qux x y instead, this definitely could happen, since those are three black-box functions). You can resolve it by writing otherwise in the last case:
get :: Ord a => a -> Tree a -> Maybe a
get _ Nil = Nothing
get x (Node v lt rt)
  | x == v    = Just x
  | x < v     = get x lt
  | otherwise = get x rt
otherwise is an alias for True and therefore there is no possibility not to take that branch. So now the conservative compiler understands that, regardless of what the values of x and v are, it will always take some branch, because if it does not take the first two, it will certainly take the last one.
You may think that it is weird, but since the contracts are usually not specified in a formal language (only in the documentation, so a natural language), the compiler has no means to know that: you could as a programmer decide not to respect the contracts (but note that this is a very bad idea). Even if you write a formal contract usually as a programmer you still can decide not to respect it and furthermore a compiler cannot always do the required logical reasoning about the formal contracts.
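A concrete case where all three guards really do fail is a floating-point NaN: for Double, none of (<), (==) and (>) holds when either side is NaN. The sketch below uses the same guard structure as get on plain values; classify and nan are names I introduced for the illustration.

```haskell
-- NaN: a Double for which no comparison operator returns True.
nan :: Double
nan = 0 / 0

-- Same three guards as in get, plus a catch-all to show it is reachable.
classify :: Double -> Double -> String
classify x v
  | x == v    = "equal"
  | x <  v    = "less"
  | x >  v    = "greater"
  | otherwise = "none of the three"   -- taken when x or v is NaN
```

So with the original three-guard get on a Tree Double, looking up a NaN in a non-empty tree would fall through every guard and fail at run time, which is exactly the situation the warning is about.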
Willem Van Onsem has already explained the issue well. I only want to add that it is possible to perform a comparison between x and v in a very similar way to the posted code, whose branches are however found exhaustive by the compiler.
Instead of
get :: Ord a => a -> Tree a -> Maybe a
get _ Nil = Nothing
get x (Node v lt rt)
  | x == v = Just x
  | x < v  = get x lt
  | x > v  = get x rt
simply use
get :: Ord a => a -> Tree a -> Maybe a
get _ Nil = Nothing
get x (Node v lt rt) = case compare x v of
  EQ -> Just x
  LT -> get x lt
  GT -> get x rt
Indeed, compare is a function taking two arguments and returning a value in the enumerated type Ordering, which can only be EQ (equal), LT (less than), and GT (greater than). Since this is an algebraic type, GHC can see that all its constructors are handled by the case.
Further, depending on the actual type a, using compare can be more efficient. E.g., when comparing two potentially long strings, it's suboptimal to traverse them twice (if not three times, in the original code): compare does only a single pass to both strings and determines which order relation holds.

Splitting a BinTree with tail recursion in Haskell

So this week we learned about union types, tail recursion and binary trees in Haskell. We defined our tree data type like so:
data BinTree a = Empty
               | Node (BinTree a) a (BinTree a)
               deriving (Eq, Show)
leaf :: a -> BinTree a
leaf x = Node Empty x Empty
Now we were asked to write a function that finds the leftmost node, cuts it out, and returns it together with the remaining tree.
We did something like this, which worked quite well:
splitleftmost :: BinTree a -> Maybe (a, BinTree a)
splitleftmost Empty = Nothing
splitleftmost (Node l a r) = case splitleftmost l of
  Nothing      -> Just (a, r)
  Just (a',l') -> Just (a', Node l' a r)
Now I need to make this function tail recursive. I think I understood what tail recursion is about, but found it hard to apply it to this problem. I was told to write a function which calls the main function with the fitting arguments, but was still not able to solve this.
Since nodes do not have a parent link, one approach would be to maintain root-to-leaf path within a list. At the end the modified tree can be constructed using a left fold:
slm :: BinTree a -> Maybe (a, BinTree a)
slm = run []
  where
    run _ Empty = Nothing
    run t (Node Empty x r) = Just (x, foldl go r t)
      where go l (Node _ x r) = Node l x r
    run t n@(Node l _ _) = run (n:t) l
As others have hinted, there is no reason, in Haskell, to make this function tail-recursive. In fact, a tail-recursive solution will almost certainly be slower than the one you have devised! The main potential inefficiencies in the code you've provided involve allocation of pair and Just constructors. I believe GHC (with optimization enabled) will be able to figure out how to avoid these. My guess is that its ultimate code will probably look something like this:
{-# LANGUAGE UnboxedTuples #-}  -- needed for the (# ..., ... #) syntax
splitleftmost :: BinTree a -> Maybe (a, BinTree a)
splitleftmost Empty = Nothing
splitleftmost (Node l a r) =
  case slm l a r of
    (# hd, tl #) -> Just (hd, tl)
slm :: BinTree a -> a -> BinTree a
    -> (# a, BinTree a #)
slm Empty a r = (# a, r #)
slm (Node ll la lr) a r =
  case slm ll la lr of
    (# hd, tl' #) -> (# hd, Node tl' a r #)
Those funny-looking (# ..., ... #) things are unboxed pairs, which are handled pretty much like multiple return values. In particular, no actual tuple constructor is allocated until the end. By recognizing that every invocation of splitleftmost with a non-empty tree will produce a Just result, we (and thus almost certainly GHC) can separate the empty case from the rest to avoid allocating intermediate Just constructors. So this final code only allocates stack frames to handle the recursive results. Since some representation of such a stack is inherently necessary to solve this problem, using GHC's built-in one seems pretty likely to give the best results.
Here, not to spoil anything, are some "tail recursive" definitions of functions for summing along the left and right branches, at least as I understand "tail recursion":
sumLeftBranch tree = loop 0 tree where
  loop n Empty = n
  loop n (Node l a r) = loop (n+a) l
sumRightBranch tree = loop 0 tree where
  loop n Empty = n
  loop n (Node l a r) = loop (n+a) r
You can see that all the recursive uses of loop will have the same answer as the first call loop 0 tree - the arguments just keep getting put into better and better shape, until they are in the ideal shape, loop n Empty, which is n, the desired sum.
If this is the kind of thing that is wanted, the setup for splitleftmost would be
splitLeftMost tree = loop Nothing tree
  where
    loop m Empty = m
    loop Nothing (Node l a r) = loop ? ?
    loop (Just (a',r')) (Node l a r) = loop ? ?
Here, the first use of loop is in the form of loop Nothing tree, but that's the same as loop result Empty - when we come to it, namely result. It took me a couple of tries to get the missing arguments to loop ? ? right, but, as usual, they were obvious once I got them.

Working with Trees in Haskell

I have this data definition for a tree:
data Tree = Leaf Int | Node Tree Int Tree
and I have to make a function, nSatisfy, to count how many items in the tree satisfy some predicate.
Here's what I've done:
nSatisfy :: (Int->Bool) -> Tree -> Int
nSatisfy _ Leaf = 0
nSatisfy y (Node left x right)
  | y x = 1 + nSatisfy y (Node left x right)
  | otherwise = nSatisfy y (Node left x right)
Is this the right way to solve this problem?
In your nSatisfy function, you should add up the number of nodes satisfying the condition in both subtrees, using two recursive calls. The last two lines should look like this:
  | y x       = 1 + nSatisfy y left + nSatisfy y right
  | otherwise = nSatisfy y left + nSatisfy y right
This way, it will not call itself again on the same node, only on the subtrees.
Also, if a leaf contains an integer, as is implied in the data declaration, you should make it evaluate the condition for a leaf and return 1 if it is true, instead of always returning 0.
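Putting both points together, a complete version might look like the sketch below. The helper count is a name I introduced; note also that the pattern for a leaf has to be (Leaf x), since Leaf carries an Int in the data declaration.

```haskell
data Tree = Leaf Int | Node Tree Int Tree

-- Count the values in the tree (leaves included) that satisfy p.
nSatisfy :: (Int -> Bool) -> Tree -> Int
nSatisfy p (Leaf x) = count p x
nSatisfy p (Node left x right) =
  count p x + nSatisfy p left + nSatisfy p right

-- Hypothetical helper: 1 if the predicate holds for x, 0 otherwise.
count :: (Int -> Bool) -> Int -> Int
count p x = if p x then 1 else 0
```

For example, nSatisfy even (Node (Leaf 2) 3 (Node (Leaf 4) 6 (Leaf 7))) counts 2, 4 and 6 and so evaluates to 3.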
In addition to the main answer, I'd like to offer a slightly different approach: generalizing your problem and solving it using existing libraries.
The operation you're seeking is common to many data structures: going through all elements and performing some operation on them. Haskell defines the Foldable type class, which can be implemented by structures like yours.
First let's import some modules we'll need:
import Data.Foldable
import Data.Monoid
In order to use Foldable, we need to generalize the structure a bit, in particular parametrize its content:
data Tree a = Leaf a | Node (Tree a) a (Tree a)
In many cases this is a good idea as it separates the structure from its content and allows it to be easily reused.
Now let's define its Foldable instance. For tree-like structures it's easier to define it using foldMap, which maps each element into a monoid and then combines all values:
instance Foldable Tree where
  foldMap f (Leaf x) = f x
  foldMap f (Node lt x rt) = foldMap f lt <> f x <> foldMap f rt
This immediately gives us the whole library of functions in the Data.Foldable module, such as searching for an element, different kinds of folds, etc. While a function counting the number of values satisfying some predicate isn't defined there, we can easily define it for any Foldable. The idea is to use the Sum monoid:
nSatisfy :: (Foldable f) => (a -> Bool) -> f a -> Int
nSatisfy p = getSum . foldMap (\x -> Sum $ if p x then 1 else 0)
The idea behind this function is simple: Map each value to 1 if it satisfies the predicate, otherwise to 0. And then folding with the Sum monoid just adds all values up.
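For reference, here are the pieces above assembled into one file, as a sketch assuming a reasonably recent GHC where Foldable and (<>) are already exported by the Prelude. The example tree is made up for illustration.

```haskell
import Data.Monoid (Sum (..))

data Tree a = Leaf a | Node (Tree a) a (Tree a)

instance Foldable Tree where
  foldMap f (Leaf x)       = f x
  foldMap f (Node lt x rt) = foldMap f lt <> f x <> foldMap f rt

-- Works for any Foldable container, not just Tree.
nSatisfy :: Foldable f => (a -> Bool) -> f a -> Int
nSatisfy p = getSum . foldMap (\x -> Sum (if p x then 1 else 0))

-- Hypothetical sample tree holding 1..5 in order.
example :: Tree Int
example = Node (Leaf 1) 2 (Node (Leaf 3) 4 (Leaf 5))
```

Here nSatisfy even example is 2, and because the function is written against Foldable, the same call works on a plain list: nSatisfy even [1, 2, 3, 4] is also 2. The instance also gives you sum example, elem 3 example and the rest of the Foldable functions for free.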
