Searching for a value in a TriTree in Haskell - haskell

I recently just started learning Haskell and I am trying to implement a function for searching for a specific value in a tri tree which returns true if the value is present and false otherwise. This is how my type looks:
data TriTree a
  = Empty
  | NodeOne a (TriTree a) (TriTree a) (TriTree a)
  | NodeTwo a a (TriTree a) (TriTree a) (TriTree a)
  deriving (Show)
The tree is either empty or consists of internal nodes. Each internal node stores one or two data values and has up to three child nodes (left, middle, right).
It's not clear to me how to proceed with the search function to traverse through the tree and return the value.

Since you defined a recursive data structure, it makes sense to have a recursive function to parse it. To do that I'd start with anchoring the recursion in the trivial case. Since an empty tree doesn't contain anything, the check will always be false:
elem' _ Empty = False
Now to the recursive part: in the case of a NodeOne, we need to check whether the value is inside that node or in any of its subtrees:
elem' x (NodeOne v a b c) = x == v || x `elem'` a || x `elem'` b || x `elem'` c
The remaining case is NodeTwo, which I leave for you to figure out. It shouldn't be difficult, as it is just a generalization of the line above:
elem' x _ = undefined -- remaining case
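As an aside, and not a replacement for the exercise above: if writing the traversal by hand is not a requirement, GHC can derive a Foldable instance for the type, after which the Prelude's elem works on it directly. A minimal sketch, assuming the DeriveFoldable extension is acceptable; the names contains and sample are made up for illustration:

```haskell
{-# LANGUAGE DeriveFoldable #-}

data TriTree a
  = Empty
  | NodeOne a (TriTree a) (TriTree a) (TriTree a)
  | NodeTwo a a (TriTree a) (TriTree a) (TriTree a)
  deriving (Show, Foldable)

-- Hypothetical wrapper: membership test via the derived Foldable instance.
contains :: Eq a => a -> TriTree a -> Bool
contains = elem

-- Hypothetical sample tree with one two-value node and two one-value nodes.
sample :: TriTree Int
sample =
  NodeTwo 5 9
    (NodeOne 2 Empty Empty Empty)
    Empty
    (NodeOne 12 Empty Empty Empty)
```

Here contains 12 sample is True and contains 7 sample is False. The derived instance visits every node, just like the hand-written recursion, so nothing is gained in speed; it only saves the boilerplate.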

Related

Determine if binary tree is BST haskell

I'm trying to write a bool function to return True if a binary tree is a bst using recursion, and I need a little guidance on haskell syntax.
I understand that for a binary tree to be a bst, the left subtree must always contain only nodes less than the head. and the right subtree must always contain only nodes greater than the head. I was structuring my function as such:
isBST :: Tree -> Bool --receive Tree, return bool
isBST (Leaf i) = True --return true if its only one leaf in tree
isBST (Node h l r) = if (((isBST l) < h) && ((isBST r) > h)) then True else False
--return true if left subtree < head AND right subtree > head
But this code results in the error:
Couldn't match expected type ‘Bool’ with actual type ‘Int’
Referring to the < h and > h parts specifically. Is it something wrong with my haskell formatting? Thanks in advance
"Is it something wrong with my haskell formatting?"
No, it is a semantic error. You write:
(isBST l) < h
So this means you ask Haskell to determine whether l is a binary search tree, which is True or False, but you can not compare True or False with h. Even if you could (some languages see True as 1 and False as 0), then it would still be incorrect, since we want to know whether all nodes in the left subtree are less than h.
So we will somehow need to define bounds. A way to do this is to pass parameters through the recursion and perform checks. A problem with this is that the root of the tree, for example, has no bounds. We can fix this by using a Maybe Int as a boundary: if it is Nothing, the boundary is "inactive" so to speak; if it is Just b, then the boundary is "active" with value b.
In order to make this check more convenient, we can first write a way to check this:
checkBound :: (a -> a -> Bool) -> Maybe a -> a -> Bool
checkBound _ Nothing _ = True
checkBound f (Just b) x = f b x
So now we can make a "sandwich check" with:
sandwich :: Ord a => Maybe a -> Maybe a -> a -> Bool
sandwich low upp x = checkBound (<) low x && checkBound (>) upp x
So sandwich is given a lowerbound and an upperbound (both Maybe as), and a value, and checks the lower and upper bounds.
So we can write a function isBST' with:
isBST' :: Maybe Int -> Maybe Int -> Tree -> Bool
isBST' low upp ... = ....
There are two cases we need to take into account: the Leaf x case, in which the "sandwich constraint" should be satisfied, and the Node h l r case, in which h should satisfy the "sandwich constraint" and furthermore l and r should satisfy different sandwich constraints. For the Leaf x case it thus looks like:
isBST' low upp (Leaf x) = sandwich low upp x
For the node case, we first check the same constraint, and then enforce a sandwich between low and h for the left part l, and a sandwich between h and upp for the right part r, so:
isBST' low upp (Node h l r) = sandwich low upp h &&
                              isBST' low jh l &&
                              isBST' jh upp r
    where jh = Just h
Now the only problem we still have is to call isBST' with the root element: here we use Nothing as the initial bounds, so:
isBST :: Tree -> Bool
isBST = isBST' Nothing Nothing
There are of course other ways to enforce constraints, like passing and updating functions, or implementing four variants of the isBST' function that each check a subset of the constraints.
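For reference, the pieces above can be assembled into one file. This is only a sketch: the Tree declaration is an assumption inferred from the Leaf and Node patterns in the question (the question never shows it), and goodTree and badTree are made-up test values.

```haskell
-- Assumed tree shape, inferred from the patterns used in the question.
data Tree = Leaf Int | Node Int Tree Tree

-- An inactive bound (Nothing) always passes; an active one is tested with f.
checkBound :: (a -> a -> Bool) -> Maybe a -> a -> Bool
checkBound _ Nothing  _ = True
checkBound f (Just b) x = f b x

-- x must lie strictly between the lower and upper bound, where present.
sandwich :: Ord a => Maybe a -> Maybe a -> a -> Bool
sandwich low upp x = checkBound (<) low x && checkBound (>) upp x

isBST' :: Maybe Int -> Maybe Int -> Tree -> Bool
isBST' low upp (Leaf x) = sandwich low upp x
isBST' low upp (Node h l r) =
  sandwich low upp h && isBST' low jh l && isBST' jh upp r
  where jh = Just h

isBST :: Tree -> Bool
isBST = isBST' Nothing Nothing

-- Hypothetical test values.
goodTree, badTree :: Tree
goodTree = Node 5 (Node 3 (Leaf 1) (Leaf 4)) (Leaf 8)
badTree  = Node 5 (Node 3 (Leaf 1) (Leaf 6)) (Leaf 8)
```

badTree is the interesting case: the leaf 6 is larger than its parent 3, which looks fine locally, but it sits in the left subtree of 5. The upper bound inherited from the root is what makes isBST badTree return False.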
Martin, I'd recommend you to look at Willem's answer.
Another thing, you could also use your maxInt function that you asked in a previous question to define this function:
isBST (Node h l r) = ... (maxInt l) ... -- at some point we will need to use this
Taking your definition of BSTs:
I understand that for a binary tree to be a bst, the left subtree must
always contain only nodes less than the head. and the right subtree
must always contain only nodes greater than the head.
I'll add that also the subtrees of a node should be BSTs as well.
So we can define this requirement with:
isBST (Node h l r) =
  ((maxInt l) < h)     -- the left subtree must contain nodes less than the head
  && ((minInt r) > h)  -- the right must contain nodes greater than the head
  && (...)             -- the left subtree should be a BST
  && (...)             -- the right subtree should be a BST
Note that you will also need to define minInt :: Tree -> Int, which you probably already know how to do.
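One possible way to fill in the outline, as a sketch only. The Tree declaration is assumed from the question's Leaf and Node patterns, and the maxInt from the earlier question is not shown here, so both helper definitions below are guesses at what it might look like.

```haskell
-- Assumed tree shape; the question does not show the declaration.
data Tree = Leaf Int | Node Int Tree Tree

-- Guessed helpers: largest and smallest value stored anywhere in a tree.
maxInt :: Tree -> Int
maxInt (Leaf x)     = x
maxInt (Node h l r) = maximum [h, maxInt l, maxInt r]

minInt :: Tree -> Int
minInt (Leaf x)     = x
minInt (Node h l r) = minimum [h, minInt l, minInt r]

isBST :: Tree -> Bool
isBST (Leaf _)     = True
isBST (Node h l r) =
  maxInt l < h     -- everything on the left is less than the head
    && minInt r > h  -- everything on the right is greater than the head
    && isBST l       -- the left subtree is itself a BST
    && isBST r       -- the right subtree is itself a BST
```

This version recomputes maxInt and minInt at every level, so it walks each subtree repeatedly. For small trees that hardly matters, but the bounds-passing approach visits each node only once.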
I like Willem Van Onsem's pedagogical approach in his answer.
I was going to delete my answer, but am going to post a "correction" instead, at the risk of being wrong again:
data Tree = Empty | Node Int Tree Tree deriving Show
isBST :: Tree -> Bool
isBST Empty = True
isBST (Node h l r) = f (<=h) l && f (>=h) r && isBST l && isBST r
  where
    f _ Empty = True
    f c (Node h l r) = c h && f c l && f c r
Note that I'm using Wikipedia's definition of BST, that
the key in each node must be greater than or equal to any key stored
in the left sub-tree, and less than or equal to any key stored in the
right sub-tree.

Pattern matching warning with binary tree

I'm a beginner at Haskell and am having some trouble with understanding the warnings I get. I have implemented a binary tree,
data Tree a = Nil | Node a (Tree a) (Tree a) deriving (Eq, Show, Read)
and it works fine but I get incomplete patterns warning on this code
get :: Ord a => a -> Tree a -> Maybe a
get _ Nil = Nothing
get x (Node v lt rt)
    | x == v = Just x
    | x < v  = get x lt
    | x > v  = get x rt
The pattern it wants me to match is _ (Node _ _ _ ). I'm not sure what this pattern means?
There are two problems here. First of all, the datatype:
data Tree a = Nil | Node a (Tree left) (Tree right) deriving (Eq, Show, Read)
--                               ^ left?     ^ right?
In your data definition, you make use of left and right, but those are not defined in the head of the data definition, therefore these are not type parameters. You probably wanted to say:
data Tree a = Nil
            | Node { value :: a, left :: Tree a, right :: Tree a}
            deriving (Eq, Show, Read)
But now we still get an error:
hs.hs:5:1: Warning:
Pattern match(es) are non-exhaustive
In an equation for ‘get’: Patterns not matched: _ (Node _ _ _)
Ok, modules loaded: Main.
The problem here is that Haskell does not know that two values can only be <, == or >.
If you write an instance of Ord, then you have a "contract" that you will define a total ordering. In other words, for any two values x and y, it holds that x < y, x > y or x == y. The problem is however that Haskell does not know that. For Haskell any of the functions (<), (==) or (>) can result in True or False. Therefore, since a compiler is always conservative, it considers the case where there are two values such that all of x < y, x == y and x > y fail (say that you hypothetically would have written foo x y, bar x y and qux x y; then this definitely could happen, since those are three blackbox functions). You can resolve it by writing otherwise in the last case:
get :: Ord a => a -> Tree a -> Maybe a
get _ Nil = Nothing
get x (Node v lt rt)
    | x == v    = Just x
    | x < v     = get x lt
    | otherwise = get x rt
otherwise is an alias for True and therefore there is no possibility not to take that branch. So now the conservative compiler understands that, regardless of what the values of x and v are, it will always take some branch, because if it does not take the first two, it will certainly take the last one.
You may think that it is weird, but since the contracts are usually not specified in a formal language (only in the documentation, so a natural language), the compiler has no means to know that: you could as a programmer decide not to respect the contracts (but note that this is a very bad idea). Even if you write a formal contract usually as a programmer you still can decide not to respect it and furthermore a compiler cannot always do the required logical reasoning about the formal contracts.
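The compiler's caution is not purely theoretical. Double is a standard type with an Ord instance where the contract does not hold for NaN: all three comparisons return False. A small sketch (the names nan and noGuardMatches are made up for illustration):

```haskell
-- NaN, produced the usual way.
nan :: Double
nan = 0 / 0

-- True when none of the three guards from the original get would fire.
noGuardMatches :: Double -> Double -> Bool
noGuardMatches x v = not (x == v) && not (x < v) && not (x > v)
```

noGuardMatches nan 1 is True, so the three-guard version of get would fall through every guard for such a value and fail at run time with a pattern match error. The version ending in otherwise always picks a branch.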
Willem Van Onsem has already explained the issue well. I only want to add that it is possible to perform a comparison between x and v in a very similar way to the posted code, whose branches are however found exhaustive by the compiler.
Instead of
get :: Ord a => a -> Tree a -> Maybe a
get _ Nil = Nothing
get x (Node v lt rt)
    | x == v = Just x
    | x < v  = get x lt
    | x > v  = get x rt
simply use
get :: Ord a => a -> Tree a -> Maybe a
get _ Nil = Nothing
get x (Node v lt rt) = case compare x v of
    EQ -> Just x
    LT -> get x lt
    GT -> get x rt
Indeed, compare is a function taking two arguments and returning a value in the enumerated type Ordering, which can only be EQ (equal), LT (less than), and GT (greater than). Since this is an algebraic type, GHC can see that all its constructors are handled by the case.
Further, depending on the actual type a, using compare can be more efficient. E.g., when comparing two potentially long strings, it's suboptimal to traverse them twice (if not three times, in the original code): compare does only a single pass to both strings and determines which order relation holds.

Splitting a BinTree with tail recursion in Haskell

So this week we learned about union types, tail recursion and binary trees in Haskell. We defined our tree data type like so:
data BinTree a = Empty
               | Node (BinTree a) a (BinTree a)
               deriving (Eq, Show)
leaf :: a -> BinTree a
leaf x = Node Empty x Empty
Now we were asked to write a function that finds the leftmost node, cuts it out, and returns both its value and the remaining tree without that node.
We did something like this, which worked quite well:
splitleftmost :: BinTree a -> Maybe (a, BinTree a)
splitleftmost Empty = Nothing
splitleftmost (Node l a r) = case splitleftmost l of
    Nothing      -> Just (a, r)
    Just (a',l') -> Just (a', Node l' a r)
Now I need to make this function tail recursive. I think I understood what tail recursion is about, but found it hard to apply it to this problem. I was told to write a function which calls the main function with the fitting arguments, but was still not able to solve this.
Since nodes do not have a parent link, one approach would be to maintain root-to-leaf path within a list. At the end the modified tree can be constructed using a left fold:
slm :: BinTree a -> Maybe (a, BinTree a)
slm = run []
    where
    run _ Empty = Nothing
    run t (Node Empty x r) = Just (x, foldl go r t)
        where go l (Node _ x r) = Node l x r
    run t n@(Node l _ _) = run (n:t) l
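A self-contained restatement of the path-list idea, so that it can be checked against the original splitleftmost. The names splitLeftmostTR, rebuild and sample are made up for this sketch; the extra rebuild equation for Empty is never reached and is only there to keep the helper total.

```haskell
data BinTree a = Empty
               | Node (BinTree a) a (BinTree a)
               deriving (Eq, Show)

leaf :: a -> BinTree a
leaf x = Node Empty x Empty

-- Reference version from the question (not tail recursive).
splitleftmost :: BinTree a -> Maybe (a, BinTree a)
splitleftmost Empty = Nothing
splitleftmost (Node l a r) = case splitleftmost l of
  Nothing       -> Just (a, r)
  Just (a', l') -> Just (a', Node l' a r)

-- Path-list version: 'path' holds the ancestors seen so far, nearest first.
splitLeftmostTR :: BinTree a -> Maybe (a, BinTree a)
splitLeftmostTR = run []
  where
    run _ Empty = Nothing
    -- Leftmost node found: its right child replaces it, then the
    -- ancestors are rebuilt from the bottom up.
    run path (Node Empty x r) = Just (x, foldl rebuild r path)
    -- Otherwise remember this node and keep walking left (a tail call).
    run path n@(Node l _ _) = run (n : path) l

    rebuild newLeft (Node _ x r) = Node newLeft x r
    rebuild newLeft Empty        = newLeft  -- unreachable: path holds only Nodes

-- Hypothetical sample tree.
sample :: BinTree Int
sample = Node (Node (leaf 1) 2 (leaf 3)) 4 (leaf 5)
```

Both functions return Just (1, Node (Node Empty 2 (leaf 3)) 4 (leaf 5)) for sample. The explicit path list plays the role that the call stack plays in the original version.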
As others have hinted, there is no reason, in Haskell, to make this function tail-recursive. In fact, a tail-recursive solution will almost certainly be slower than the one you have devised! The main potential inefficiencies in the code you've provided involve allocation of pair and Just constructors. I believe GHC (with optimization enabled) will be able to figure out how to avoid these. My guess is that its ultimate code will probably look something like this:
splitleftmost :: BinTree a -> Maybe (a, BinTree a)
splitleftmost Empty = Nothing
splitleftmost (Node l a r) =
  case slm l a r of
    (# hd, tl #) -> Just (hd, tl)
slm :: BinTree a -> a -> BinTree a
    -> (# a, BinTree a #)
slm Empty a r = (# a, r #)
slm (Node ll la lr) a r =
  case slm ll la lr of
    (# hd, tl' #) -> (# hd, Node tl' a r #)
Those funny-looking (# ..., ... #) things are unboxed pairs, which are handled pretty much like multiple return values. In particular, no actual tuple constructor is allocated until the end. By recognizing that every invocation of splitleftmost with a non-empty tree will produce a Just result, we (and thus almost certainly GHC) can separate the empty case from the rest to avoid allocating intermediate Just constructors. So this final code only allocates stack frames to handle the recursive results. Since some representation of such a stack is inherently necessary to solve this problem, using GHC's built-in one seems pretty likely to give the best results.
Here, not to spoil anything, are some "tail recursive" definitions of functions for summing along the left and right branches, at least as I understand "tail recursion":
sumLeftBranch tree = loop 0 tree where
  loop n Empty = n
  loop n (Node l a r) = loop (n+a) l
sumRightBranch tree = loop 0 tree where
  loop n Empty = n
  loop n (Node l a r) = loop (n+a) r
You can see that all the recursive uses of loop will have the same answer as the first call loop 0 tree - the arguments just keep getting put into better and better shape, til they are in the ideal shape, loop n Empty, which is n, the desired sum.
If this is the kind of thing that is wanted, the setup for splitleftmost would be
splitLeftMost tree = loop Nothing tree
  where
    loop m Empty = m
    loop Nothing (Node l a r) = loop ? ?
    loop (Just (a',r')) (Node l a r) = loop ? ?
Here, the first use of loop is in the form of loop Nothing tree, but that's the same as loop result Empty - when we come to it, namely result. It took me a couple of tries to get the missing arguments to loop ? ? right, but, as usual, they were obvious once I got them.

If given a list of tuples representing ranges, how can you merge continuous ranges?

If given a list of tuples representing ranges like this:
[(0,10),(10,100),(1000,5000)]
I'd like to merge the tuples that represent contiguous ranges, so the result is this:
[(0,100),(1000,5000)]
Any elegant solutions?
Here's mine
import Data.List (sort)

mergeRanges :: [(Int, Int)] -> [(Int, Int)]
mergeRanges xs = foldr f [] (sort xs)
  where f new@(x,y) acc@((a,b):ys) =
            if y == a
              then (x,b):ys
              else new:acc
        f x acc = x:acc
EDIT: Ranges are non-overlapping
Unless this is a pattern that shows up more often in your program, I would just go for a direct recursion (untested code follows!):
mergeRanges ((lo1,hi1) : (lo2,hi2) : rest)
    | hi1 == lo2 = mergeRanges ((lo1,hi2) : rest)
    -- or (lo1,hi2) : mergeRanges rest, to merge only adjacent ranges
mergeRanges (interval:rest) = interval : mergeRanges rest
mergeRanges [] = []
(where you could optimize a bit by using @-patterns at the cost of clutter).
But if you really want to, you could use the following helper function
merge :: (a -> a -> Maybe a) -> [a] -> [a]
merge f [] = []
merge f [x] = [x]
merge f (x:y:xs) = case f x y of
    Nothing -> x : merge f (y:xs)
    Just z  -> merge f (z:xs) -- or z : merge f xs
and give as first argument
merge2Ranges (lo1, hi1) (lo2, hi2)
    | hi1 == lo2 = Just (lo1, hi2)
    | otherwise  = Nothing
I doubt that merge is in a library somewhere, since it's pretty specific to the problem at hand.
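Put together, the helper and its argument can be tried out as follows. This is a sketch: mergeAll is a made-up name, and sorting the input first is an added assumption so that unsorted lists also work (the question already states that the ranges do not overlap).

```haskell
import Data.List (sort)

-- Generic helper: repeatedly try to combine the first two elements.
merge :: (a -> a -> Maybe a) -> [a] -> [a]
merge _ []  = []
merge _ [x] = [x]
merge f (x:y:xs) = case f x y of
  Nothing -> x : merge f (y:xs)
  Just z  -> merge f (z:xs)  -- retry, so chains of ranges collapse

-- Two ranges combine when the first ends exactly where the second starts.
merge2Ranges :: (Int, Int) -> (Int, Int) -> Maybe (Int, Int)
merge2Ranges (lo1, hi1) (lo2, hi2)
  | hi1 == lo2 = Just (lo1, hi2)
  | otherwise  = Nothing

-- Hypothetical entry point: sort, then merge contiguous neighbours.
mergeAll :: [(Int, Int)] -> [(Int, Int)]
mergeAll = merge merge2Ranges . sort
```

With this, mergeAll [(1000,5000),(0,10),(10,100)] gives [(0,100),(1000,5000)], and a chain such as [(0,10),(10,20),(20,30)] collapses to [(0,30)] because the merged range is fed back into merge.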
Well, I think the best solutions in this space probably will involve specialized data structures that maintain the invariant in question. In Java-land, the Guava library has RangeSet, which does precisely this.
This is not a solution to your problem directly, but once I was playing around with this simple (too simple) implementation of "historical values" as a kind of binary search tree:
-- | A value that changes over time at discrete moments. @t@ is the timeline type,
-- @a@ is the value type.
data RangeMap t a = Leaf a
    -- Invariant: all @t@ values in the left branch must be less than
    -- the one in the parent.
    | Split t (RangeMap t a) (RangeMap t a)
valueAt :: Ord t => t -> RangeMap t a -> a
valueAt _ (Leaf a) = a
valueAt t (Split t' before since)
    | t < t'    = valueAt t before
    | otherwise = valueAt t since
The idea here is that Split t beforeT sinceT divides the timeline into two branches, one for values that held before t and a second for those that held since t.
So represented in terms of this type, your range set could be represented something like this:
example :: RangeMap Int Bool
example = Split 1000 (Split 100 (Split 0 (Leaf False) (Leaf True))
                                (Leaf False))
                     (Split 5000 (Leaf True) (Leaf False))
There are a few neat things about this, compared to the [(since, until, value)] representation that I've used in the past for similar applications:
The tree representation makes it impossible to have conflicting a values for the same time range. RangeMap is a true function from t to a.
The tree representation guarantees that some a is assigned to every t. Again, a RangeMap is a true function from t to a.
Since it's a tree and not a list, it supports log-time operations.
I did not go as far as working out a balanced tree representation for this or figuring out how to merge adjacent ranges with the same value, however...
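To make the lookup concrete, here is a compilable sketch of the same idea. It assumes the time argument comes first and that ranges are half-open (start included, end excluded); inRanges is a made-up encoding of [(0,100),(1000,5000)] under those assumptions.

```haskell
-- | A value that changes over time at discrete moments.
data RangeMap t a
  = Leaf a
  | Split t (RangeMap t a) (RangeMap t a)

-- Walk down the tree: left for times before the split point, right otherwise.
valueAt :: Ord t => t -> RangeMap t a -> a
valueAt _ (Leaf a) = a
valueAt t (Split t' before since)
  | t < t'    = valueAt t before
  | otherwise = valueAt t since

-- Hypothetical encoding of the range set [(0,100),(1000,5000)].
inRanges :: RangeMap Int Bool
inRanges =
  Split 1000
    (Split 100 (Split 0 (Leaf False) (Leaf True)) (Leaf False))
    (Split 5000 (Leaf True) (Leaf False))
```

For example, valueAt 50 inRanges and valueAt 2000 inRanges are True, while valueAt 500 inRanges and valueAt 5000 inRanges are False. Every time value gets exactly one answer, which is the "true function" property described above.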

Haskell 2-3-4 Tree

We've been asked to create a 2-3-4 tree in Haskell, as in write the data type, the insert function, and a display function.
I'm finding it very difficult to get information on this kind of tree, even in a language I'm comfortable with (Java, C++).
What I have so far -
data Tree t = Empty
    | Two t (Tree t)(Tree t)
    | Three t t (Tree t)(Tree t)(Tree t)
    | Four t t t (Tree t)(Tree t)(Tree t)(Tree t) deriving (Eq, Ord, Show)
leaf2 a = Two a Empty Empty
leaf3 a b = Three a b Empty Empty Empty
leaf4 a b c = Four a b c Empty Empty Empty Empty
addNode::(Ord t) => t -> Tree t -> Tree t
addNode t Empty = leaf2 t
addNode x (Two t left right)
    | x < t     = Two t (addNode x left) right
    | otherwise = Two t left (addNode x right)
This compiles, but I'm not sure it's correct, and I don't know how to start writing the insert into a three node or four node.
The assignment also says that "deriving show" for the display function is not enough, that it should print out the tree in the format normally seen in diagrams. Again, unsure on the way to go with this.
Any help or direction appreciated.
I know nothing about 2-3-4 trees, but for the Three node, you would start with something like this:
addNode t (Three x y left mid right)
  | cond1 = expr1
  | cond2 = expr2
  (etc)
What cond1, cond2, expr1, and expr2 are, exactly, is dependent on the definition of what a 2-3-4 tree is.
As for a show method, the general outline would be this:
instance (Show t) => Show (Tree t) where
  show Empty = ...
  show (Two x l r) = ...show x...show l...show r...
  show (Three x y l m r) = ...
  show (Four x y z l m n r) = ...
The implementation depends on how you want it to look, but for the non-Empty cases, you will probably invoke show on all of the components of the tree being shown. If you want to indent the nested parts of the tree, then perhaps you should create a separate method:
instance (Show t) => Show (Tree t) where
  show = showTree 0
showTree :: Show t => Int -> Tree t -> String
showTree n = indent . go
  where indent = (replicate n ' ' ++)
        go Empty = "Empty"
        go (Two x l r) = (...show x...showTree (n+1) l...showTree (n+1) r...)
        (etc)
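One way the indentation idea might be filled in, as a sketch rather than the required format: each node prints its keys on one line and its children two spaces deeper. Skipping Empty children entirely, the two-space step, and the names render and row are all choices made for this example, and the assignment's diagram format may well differ.

```haskell
data Tree t
  = Empty
  | Two t (Tree t) (Tree t)
  | Three t t (Tree t) (Tree t) (Tree t)
  | Four t t t (Tree t) (Tree t) (Tree t) (Tree t)

-- One line per node: the node's keys, indented by depth; children follow.
render :: Show t => Int -> Tree t -> String
render _ Empty = ""
render n (Two x l r) =
  row n [x] ++ concatMap (render (n + 2)) [l, r]
render n (Three x y l m r) =
  row n [x, y] ++ concatMap (render (n + 2)) [l, m, r]
render n (Four x y z a b c d) =
  row n [x, y, z] ++ concatMap (render (n + 2)) [a, b, c, d]

-- A single indented line holding the keys of one node.
row :: Show t => Int -> [t] -> String
row n keys = replicate n ' ' ++ unwords (map show keys) ++ "\n"
```

Calling putStr (render 0 tree) prints the outline. For a root 5 with children 2 and the three-node 7 9, the result is the line 5 followed by the indented lines 2 and 7 9.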
We've been asked to create a 2-3-4 tree
My condolences. I myself once had to implement one for homework. A 2-3-4 tree is a B-tree with all the disadvantages of the B-tree and none of the advantages, because writing the cases separately for each number of children as you do is as cumbersome as having a list of only 2-4 elements.
Point being: B-tree insertion algorithms should work, just fix the size. Cormen et al. have pseudocode for one in their book Introduction to Algorithms (heavy imperativeness warning!).
It might still be better to have lists of data elements and children instead of the four-case algebraic data type, even if the type wouldn't enforce the size of the nodes then. At least it would make it easier to expand the node size.
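A rough sketch of what that list-based representation could look like. Since the type no longer enforces the node size, a separate check is needed; the names NTree and wellFormed are made up here, and the bounds (1 to 3 keys, one more child than keys) are the usual 2-3-4 shape rather than anything taken from the assignment.

```haskell
-- Keys and children as plain lists instead of one constructor per node size.
data NTree t = Nil | NNode [t] [NTree t]

-- The shape invariant the type itself no longer guarantees:
-- 1 to 3 keys per node and exactly one more child than keys.
wellFormed :: NTree t -> Bool
wellFormed Nil = True
wellFormed (NNode keys kids) =
  nk >= 1 && nk <= 3 && length kids == nk + 1 && all wellFormed kids
  where nk = length keys
```

Changing the maximum node size then only means changing the bound in wellFormed, not adding constructors and extra cases to every function.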

Resources