Binary Trees Haskell - haskell

So i have this data structure:
class ordering a where
order :: a-> Int
And I want to create a search tree where every node is a list of elements, positioned by their own order number (the root is 1, the root of the left subtree is 2, the root of the right subtree is 3, and so on). Every type of data that is inserted in the tree has an "order" number associated with it that only matters for tree insertion purposes: if it is equal to 1, it stays in the root; if it is 2, it stays on the left side of the tree; and so on.
Here's my attempt at this:
data Tree a = EmptyTree
| Node a order a (Tree [a]) (Tree [a]) deriving (Show, Read, Eq)
What i've done, makes sense to me, but apparently is wrong, but honestly i have no idea why...
I'm new to Haskell, and i've been struggling to learn the language, so i appreciate any kind of help from you guys!

The ordering that you have defined is a type class, not a data structure. order is an operation, not a type. Putting the order operation in the Tree data structure makes no sense.
You also haven't shown us any code to actually insert data, so I'm unsure how this is supposed to work.

Let's start from the function actually. Apparently you want this:
insert :: Ord key => (key,val) -> Tree key val -> Tree key val
since your tree carries values that are to be inserted according to keys, this Tree type must enclose both of them:
data Tree key val = EmptyTree
                  | Node key val (Tree key val) (Tree key val)
Now it's easy to implement the insert function. Each tree of type Tree key val will be able to carry keys of type key and values of type val. To accommodate various concrete value types in one tree, you can use a tagged union type for the values:
data Myval = My_c1 | My_c2 | MyInt Int | MyInts [Int] | MyString String | ...
now a tree of type, e.g., Tree Int Myval will carry values tagged with Myval constructors, inserted according to the user supplied Int keys.
If you mean that every data type has its own key,
ordkey :: Myval -> Int
ordkey My_c1 = 1
ordkey My_c2 = 2
ordkey (MyInt _) = 3
....
then you won't use insert directly, but rather through an intermediary,
ordinsert val tree = insert (ordkey val,val) tree
This is of course a simple, unsophisticated way to go about it, maybe this is what you meant.
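For what it's worth, here is a minimal sketch of how insert could look for this Tree type. The Ord constraint lives on the function rather than on the data declaration. Replacing the value when the key already exists is my assumption (the thread doesn't say what should happen), and toList and insertOn are hypothetical helper names; insertOn plays the role of ordinsert with the key function passed in.

```haskell
-- Key/value search tree, as in the answer above.
data Tree key val
  = EmptyTree
  | Node key val (Tree key val) (Tree key val)
  deriving (Show, Eq)

-- Insert a (key, value) pair. If the key is already present,
-- its value is replaced (an assumed policy, not from the thread).
insert :: Ord key => (key, val) -> Tree key val -> Tree key val
insert (k, v) EmptyTree = Node k v EmptyTree EmptyTree
insert (k, v) (Node k' v' l r) =
  case compare k k' of
    LT -> Node k' v' (insert (k, v) l) r
    GT -> Node k' v' l (insert (k, v) r)
    EQ -> Node k v l r

-- Hypothetical helper: all pairs in ascending key order.
toList :: Tree key val -> [(key, val)]
toList EmptyTree = []
toList (Node k v l r) = toList l ++ [(k, v)] ++ toList r

-- Hypothetical generalisation of ordinsert: derive the key from the value.
insertOn :: Ord key => (val -> key) -> val -> Tree key val -> Tree key val
insertOn keyOf v = insert (keyOf v, v)
```

With a key function such as ordkey, insertOn ordkey behaves like the ordinsert intermediary described above.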

Related

Accessing values in haskell custom data type

I'm very new to haskell and need to use a specific data type for a problem I am working on.
data Tree a = Leaf a | Node [Tree a]
  deriving (Show, Eq)
So when I make a value of this type, e.g. Node [Leaf 1, Leaf 2, Leaf 3], how do I access the elements? It won't let me use head, tail, or indexing with !!.
You perform pattern matching. For example if you want the first child, you can use:
firstChild :: Tree a -> Maybe (Tree a)
firstChild (Node (h:_)) = Just h
firstChild _ = Nothing
Here we wrap the answer in a Maybe type, since it is possible that we process a Leaf x or a Node [], such that there is no first child.
Or we can for instance obtain the i-th item with:
iThChild :: Int -> Tree a -> Tree a
iThChild i (Node cs) = cs !! i
So here we unwrap the Node constructor, obtain the list of children cs, and then perform cs !! i to obtain the i-th child. Note however that (!!) :: [a] -> Int -> a is usually a bit of an anti-pattern: it is unsafe, since we have no guarantee that the list contains enough elements, and using length is an anti-pattern as well, since the list can have infinite length, so we cannot do such a bounds check.
Usually if one writes algorithms in Haskell, one tends to make use of linear access, and write total functions: functions that always return something.
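As a rough illustration of that advice, a total variant of the indexed lookup could walk the child list linearly and return a Maybe. The names nthChild and leaves are hypothetical, not from the answer above.

```haskell
data Tree a = Leaf a | Node [Tree a]
  deriving (Show, Eq)

-- Total indexed access: Nothing for a Leaf, for a negative index,
-- or for an index past the end of the child list.
nthChild :: Int -> Tree a -> Maybe (Tree a)
nthChild i (Node cs) | i >= 0 = go i cs
  where
    go _ []        = Nothing
    go 0 (c : _)   = Just c
    go n (_ : cs') = go (n - 1) cs'
nthChild _ _ = Nothing

-- Collect every leaf value, left to right, by pattern matching only.
leaves :: Tree a -> [a]
leaves (Leaf x)  = [x]
leaves (Node cs) = concatMap leaves cs
```

Because go stops as soon as it reaches the requested position, it never needs length and also works on an infinite list of children.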

Set-like Data Structure without `Ord`?

Given the following types:
import Data.Set as Set
-- http://json.org/
type Key = String
data Json = JObject Key (Set JValue)
          | JArray JArr
          deriving Show
data JObj = JObj Key JValue
          deriving Show
data JArr = Arr [JValue] deriving Show
data Null = Null deriving Show
data JValue = Num Double
            | S String
            | B Bool
            | J JObj
            | Array JArr
            | N Null
            deriving Show
I created a JObject Key (Set JValue) with a single element:
ghci> JObject "foo" (Set.singleton (B True))
JObject "foo" (fromList [B True])
But, when I tried to create a 2-element Set, I got a compile-time error:
ghci> JObject "foo" (Set.insert (Num 5.5) $ Set.singleton (B True))
<interactive>:159:16:
No instance for (Ord JValue) arising from a use of ‘insert’
In the expression: insert (Num 5.5)
In the second argument of ‘JObject’, namely
‘(insert (Num 5.5) $ singleton (B True))’
In the expression:
JObject "foo" (insert (Num 5.5) $ singleton (B True))
So I asked, "Why is it necessary for JValue to implement the Ord typeclass?"
The docs on Data.Set answer that question.
The implementation of Set is based on size balanced binary trees (or trees of bounded balance)
But, is there a Set-like, i.e. non-ordered, data structure that does not require Ord's implementation that I can use?
You will pretty much always need at least Eq to implement a set (or at least the ability to write an Eq instance, whether or not one exists). Having only Eq will give you a horrifyingly inefficient one. You can improve this with Ord or with Hashable.
One thing you might want to do here is use a trie, which will let you take advantage of the nested structure instead of constantly fighting it.
You can start by looking at generic-trie. This does not appear to offer anything for your Array pieces, so you may have to add some things.
Why Eq is not good enough
The simplest way to implement a set is using a list:
type Set a = [a]
member a [] = False
member a (x:xs) | a == x = True
                | otherwise = member a xs
insert a xs | member a xs = xs
            | otherwise = a:xs
This is no good (unless there are very few elements), because you may have to traverse the entire list to see if something is a member.
To improve matters, we need to use some sort of tree:
data Set a = Node a (Set a) (Set a) | Tip
There are a lot of different kinds of trees we can make, but in order to use them, we must be able, at each node, to decide which of the branches to take. If we only have Eq, there is no way to choose the right one. If we have Ord (or Hashable), that gives us a way to choose.
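To make the "which branch do I take" point concrete, here is a small sketch of member and insert for that tree type using Ord. It is an unbalanced binary search tree, so it only illustrates the idea (Data.Set additionally keeps the tree balanced). toAscList is a hypothetical helper name.

```haskell
-- Unbalanced binary search tree used as a set.
data Set a = Node a (Set a) (Set a) | Tip

-- compare tells us which branch to follow at each node.
member :: Ord a => a -> Set a -> Bool
member _ Tip = False
member a (Node x l r) =
  case compare a x of
    LT -> member a l
    GT -> member a r
    EQ -> True

insert :: Ord a => a -> Set a -> Set a
insert a Tip = Node a Tip Tip
insert a s@(Node x l r) =
  case compare a x of
    LT -> Node x (insert a l) r
    GT -> Node x l (insert a r)
    EQ -> s -- already present, nothing to do

-- Hypothetical helper: elements in ascending order.
toAscList :: Set a -> [a]
toAscList Tip = []
toAscList (Node x l r) = toAscList l ++ [x] ++ toAscList r
```

With only Eq there is nothing to put in place of compare, which is why the list version has to check every element.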
The trie approach structures the tree based on the structure of the data. When your type is deeply nested (a list of arrays of records of lists...), either hashing or comparison can be very expensive, so the trie will probably be better.
Side note on Ord
Although I don't think you should use the Ord approach here, it very often is the right one. In some cases, your particular type may not have a natural ordering, but there is some efficient way to order its elements. In this case you can play a trick with newtype:
newtype WrappedThing = Wrap Thing
instance Ord WrappedThing where
  ....
newtype ThingSet = ThingSet (Set WrappedThing)
insertThing thing (ThingSet s) = ThingSet (insert (Wrap thing) s)
memberThing thing (ThingSet s) = member (Wrap thing) s
...
Yet another approach, in some cases, is to define a "base type" that is an Ord instance, but only export a newtype wrapper around it; you can use the base type for all your internal functions, but the exported type is completely abstract (and not an Ord instance).
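Here is one way the newtype trick from this side note might look when filled in. Everything about Thing is made up for illustration: the record fields, and the assumption that comparing by thingId alone is an acceptable ordering (and notion of equality) for set purposes. emptyThings and sizeThings are hypothetical helpers.

```haskell
import qualified Data.Set as Set

-- Hypothetical type that has no Ord instance of its own.
data Thing = Thing { thingId :: Int, thingName :: String }
  deriving Show

newtype WrappedThing = Wrap Thing

-- Assumption: two Things with the same thingId count as the same element.
instance Eq WrappedThing where
  Wrap a == Wrap b = thingId a == thingId b

instance Ord WrappedThing where
  compare (Wrap a) (Wrap b) = compare (thingId a) (thingId b)

newtype ThingSet = ThingSet (Set.Set WrappedThing)

emptyThings :: ThingSet
emptyThings = ThingSet Set.empty

insertThing :: Thing -> ThingSet -> ThingSet
insertThing thing (ThingSet s) = ThingSet (Set.insert (Wrap thing) s)

memberThing :: Thing -> ThingSet -> Bool
memberThing thing (ThingSet s) = Set.member (Wrap thing) s

sizeThings :: ThingSet -> Int
sizeThings (ThingSet s) = Set.size s
```

The Ord instance stays on the wrapper, so Thing itself never claims to have a meaningful ordering.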

Binding together data, types and functions

I want to model a large tree (or forest) with a regular structure: the tree can be decomposed into a small tree (the irregular part) and, for example, a large list of params, where each param combined with each node of the small tree makes a node of the big tree.
So I want a data structure in which each node of the tree represents many nodes, and a real node has type (node, param).
For algorithms that work on this kind of tree, the type of the param does not matter; params are just placeholders. But it should be possible to extract some data from a plain param or from a combination of node and param, and all possible params should be iterable. All these kinds of data are known a priori; they reflect the semantics of the tree.
So the actual type, semantics and so on of param are up to the implementation of the tree.
I model it in C++ using nested typedefs for the param type, fixed method names for everything that should be available to the algorithm (these two together make a concept), and templates for the algorithm itself.
E.g. if I want to associate an integer with each node of the big tree, I would provide a function int data(const node& n, const param& p), where param is available as a nested typedef; the algorithm could then get the list of all available params and call data with the nodes of interest and each of the params.
I have some plain data type, i.e. tree data, like this
data Tree = Node [Tree] | Leaf
Now I want to package up:
concrete tree
some type
some values of that type
some functions operating on (that concrete) tree nodes and (that) values
So that one can write functions that use these packaged-up types and functions in a generic way.
How to achieve that?
With type families I came to
class PackagedUp t where
  type Value t
  tree :: Tree t
  values :: [Value t]
  f :: Tree t -> Value t -> Int
Tree now becomes Tree t, because the types of the class members have to depend on the type class argument.
Also, as in https://stackoverflow.com/a/16927632/1227578, the injectivity of type families will need to be dealt with.
With this I can
instance PackagedUp MyTree where
  type Value MyTree = (Int,Int)
  tree = Leaf
  values = [(0,0),(1,1)]
  f t v = fst v
And how do I write such a function now? I.e. a function that takes the root of a tree and all of the values, and makes a [Int] of f tree value for every value.
First of all, your tree type should be defined like this:
data Tree a = Node a [Tree a] | Leaf
The type above is polymorphic. As far as semantics go that resembles what we would call a generic type in OO parlance (in C# or Java we might write Tree<A> instead). A node of a Tree a holds a value of type a and a list of subtrees.
Next, we come to PackagedUp. Classes in Haskell have little to do with the OO concept of the same name; they are not meant to package data and behaviour together. Things are actually much simpler: all you need to do is define the appropriate functions for your tree type:
getRoot :: Tree a -> Maybe a
getRoot Leaf = Nothing
getRoot (Node x _) = Just x
(Returning Maybe a is a simple way to handle failure with type safety. Think of the Nothing value as a polite cousin of null that doesn't explode with null reference exceptions.)
One thing that type classes are good at is in expressing data structure algorithm interfaces such as the ones you allude to. One of the most common classes is Functor, which provides a general interface for mapping over data structures.
instance Functor Tree where
  fmap f Leaf = Leaf
  fmap f (Node x ts) = Node (f x) (fmap (fmap f) ts)
fmap has the following polymorphic type:
fmap :: Functor f => (a -> b) -> f a -> f b
With your tree, it specialises to
fmap :: (a -> b) -> Tree a -> Tree b
and with lists (as in the outer fmap of fmap (fmap f) ts) it becomes
fmap :: (a -> b) -> [a] -> [b]
Finally, the Data.Tree module provides a data structure which looks a lot like what you want to define.
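Pulling the pieces of this answer together, a minimal sketch might look like the following. applyAtRoot is a hypothetical name for the function the question asks about (apply a function to the root label and each of the params); I'm assuming the per-node data is simply the label stored in the node.

```haskell
data Tree a = Node a [Tree a] | Leaf
  deriving (Show, Eq)

instance Functor Tree where
  fmap _ Leaf        = Leaf
  -- the outer fmap maps over the list, the inner one over each subtree
  fmap f (Node x ts) = Node (f x) (fmap (fmap f) ts)

getRoot :: Tree a -> Maybe a
getRoot Leaf       = Nothing
getRoot (Node x _) = Just x

-- Hypothetical helper: combine the root label with every param.
-- An empty tree has no root, so it yields no results.
applyAtRoot :: (a -> p -> Int) -> Tree a -> [p] -> [Int]
applyAtRoot _ Leaf       _  = []
applyAtRoot f (Node x _) ps = map (f x) ps
```

No type class is needed for this: the function f and the list of params are ordinary arguments, which is usually the simplest way to "package" behaviour in Haskell.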

List containing different types

I am currently writing my own structure which can handle ints and strings at the same time:
Something like
data Collection = One Int | Two String | Three(Collection)(Collection)
However, I was trying to write a function which could convert my structure into a list.
Am I right in thinking this is impossible because, by default doing:
[1,2,"test"]
in the console doesn't work and therefore my function is bound to always fail?
You should probably just define
type Collection = [Either Int String]
Then, instead of doing
l = [1,2,"test"]
you can do
l :: Collection
l = [Left 1, Left 2, Right "test"]
If you want more than two types, you'll need to define your own member type. So you would do something like this:
data MemberType = MyInt Int | MyString String | MyFloat Float deriving Show
type Collection = [MemberType]
l :: Collection
l = [MyInt 1, MyInt 2, MyString "test", MyFloat 2.2]
The deriving Show isn't necessary, but it's nice to be able to simply do print l to print the list in a nice way.
Your data structure is basically a binary tree which stores either an Int or a String at each leaf. A traversal of this tree would naturally be a [Either Int String].
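A sketch of that traversal for the Collection type from the question could look like this. flatten is a hypothetical name; partitionEithers comes from Data.Either and is shown only as one way to split the result back into ints and strings.

```haskell
import Data.Either (partitionEithers)

data Collection = One Int | Two String | Three Collection Collection

-- Left-to-right traversal: every leaf becomes one list element.
flatten :: Collection -> [Either Int String]
flatten (One n)     = [Left n]
flatten (Two s)     = [Right s]
flatten (Three l r) = flatten l ++ flatten r

-- Split a flattened collection into its ints and its strings.
intsAndStrings :: Collection -> ([Int], [String])
intsAndStrings = partitionEithers . flatten
```

So the conversion to a list is possible; the list just has element type Either Int String instead of mixing Int and String directly.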
Lists in Haskell can only have one type. If you want a list to handle multiple types, you'll need to create a new wrapper type which can represent both of the types you want to put into it, along with functions to extract the original type and handle it. For example, you could use Either.

Adding a leaf to Binary Search Tree, Haskell

The type is defined as
data BST = MakeNode BST String BST
         | Empty
I'm trying to add a new leaf to the tree, but I don't really understand how to do it with recursion.
the function is set up like this
add :: String -> BST -> BST
The advantage of using binary trees is that you only need to look at the "current part" of the tree to know where to insert the node.
So, let's define the add function:
add :: String -> BST -> BST
If you insert something into an empty tree (Case #1), you just create a leaf directly:
add s Empty = MakeNode Empty s Empty
If you want to insert something into a node (Case #2), you have to decide which sub-node to insert the value in. You use comparisons to do this test:
add s t@(MakeNode l p r) -- left, pivot, right
  | s > p = MakeNode l p (add s r) -- Insert into right subtree
  | s < p = MakeNode (add s l) p r -- Insert into left subtree
  | otherwise = t -- The tree already contains the value, so just return it
Note that this will not rebalance the binary tree. Binary tree rebalancing algorithms can be very complicated and will require a lot of code. So, if you insert an already sorted list into the binary tree (e.g. ["a", "b", "c", "d"]), it will become very unbalanced, effectively a linked list; whether that matters depends on how often your input arrives in sorted order.
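For completeness, here are the two cases put together as a compilable sketch, along with a few helpers to try it out. fromList, inOrder and height are hypothetical names added for illustration; height makes the unbalanced case visible.

```haskell
data BST = MakeNode BST String BST
         | Empty
  deriving (Show, Eq)

add :: String -> BST -> BST
add s Empty = MakeNode Empty s Empty
add s t@(MakeNode l p r)
  | s > p     = MakeNode l p (add s r) -- insert into right subtree
  | s < p     = MakeNode (add s l) p r -- insert into left subtree
  | otherwise = t                      -- already present

-- Hypothetical helper: insert the strings one by one, left to right.
fromList :: [String] -> BST
fromList = foldl (flip add) Empty

-- Hypothetical helper: in-order traversal yields the strings sorted.
inOrder :: BST -> [String]
inOrder Empty            = []
inOrder (MakeNode l p r) = inOrder l ++ [p] ++ inOrder r

-- Hypothetical helper: number of nodes on the longest root-to-leaf path.
height :: BST -> Int
height Empty            = 0
height (MakeNode l _ r) = 1 + max (height l) (height r)
```

Comparing height (fromList ["a", "b", "c", "d"]) with height (fromList ["b", "a", "d", "c"]) shows the effect of insertion order: the sorted input produces a chain, the other order a shallower tree.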

Resources