What are the names used in computer science for some of the following tree data types? - haskell

Sometimes I find myself using different kinds of trees in Haskell, and I don't know what they are called, where to get more information on algorithms using them or class instances for them, or whether there is some pre-existing code or library on Hackage.
Examples:
Binary trees where the labels are on the leaves or the branches:
data BinTree1 a = Leaf
                | Branch { label :: a, leftChild :: BinTree1 a, rightChild :: BinTree1 a }
data BinTree2 a = Leaf { label :: a }
                | Branch { leftChild :: BinTree2 a, rightChild :: BinTree2 a }
Similarly, trees where each node carries its own label, or where each child carries a label instead:
data Tree1 a = Branch {label :: a, children :: [Tree1 a]}
data Tree2 a = Branch {labelledChildren :: [(a, Tree2 a)]}
Sometimes I start using Tree2 and somehow, in the course of development, it gets refactored into Tree1, which seems simpler to deal with, but I never gave it much thought. Is there some kind of duality here?
Also, if you can post some other different kinds of trees that you think are useful, please do.
In summary: everything you can tell me about those trees will be useful! :)
Thanks.
EDIT:
Clarification: this is not homework. It's just that I usually end up using these data types and creating instances (Functor, Monad, etc.), and maybe if I knew their names I would find libraries with this stuff already implemented and more theoretical information on them.
Usually, when a library on Hackage has Tree in its name, it implements BinTree1 or some version of a non-binary tree with labels on the branches, so it seems to me that Tree2 and BinTree2 may go by some other name or identifier.
Also I feel that there may be some kind of duality or isomorphism, or a way of turning code that uses Tree1 into code that uses Tree2 via some transformation. Is there? Maybe it's just an impression.

The names I've heard:
BinTree1 is a binary tree
BinTree2: I don't know a name for it, but you can use such a tree to represent a prefix-free code, Huffman coding for example
Tree1 is a Rose tree
Tree2 is isomorphic to [Tree1] (a forest of Tree1); another way to view it is as a Tree1 without a label for the root.
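To make that isomorphism concrete, here is a sketch of the two conversions (I've renamed the constructors to Node1 and Node2, my names, so both types can live in one module):

data Tree1 a = Node1 a [Tree1 a]          -- a label on every node (rose tree)
newtype Tree2 a = Node2 [(a, Tree2 a)]    -- a label on every child instead

-- Tree2 a is the same thing as a forest [Tree1 a]:
toForest :: Tree2 a -> [Tree1 a]
toForest (Node2 kids) = [ Node1 a (toForest t) | (a, t) <- kids ]

fromForest :: [Tree1 a] -> Tree2 a
fromForest ts = Node2 [ (a, fromForest cs) | Node1 a cs <- ts ]

The two functions are mutually inverse, which is the duality you noticed: refactoring from Tree2 to Tree1 just pushes each child's label down onto the node below it.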

A binary tree that only has labels in the leaves (BinTree2) is usually used for hash maps, because the tree structure itself doesn't offer any information other than the binary position of the leaves.
So, if you have 4 values with the following hash codes:
...000001 A
...000010 B
...000011 C
...000010 D
... you might store them in a binary tree (an implicit patricia trie) like so:
        +            <- Bit #1 (least significant bit) of hash code
       / \              0 = left, 1 = right
      /   \
  [B, D]   +         <- Bit #2
          / \
         /   \
       [A]   [C]
We see that since the hash codes of B and D "start" with 0, they are stored in the left child of the root. They have exactly the same hash code, so no more forks are necessary. The hash codes of A and C both "start" with 1, so another fork is necessary: A has 0 as bit #2, so it goes to the left, and C, with 1, goes to the right.
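As a minimal sketch of how lookup walks such a tree (the type and names here are mine, not from any particular library; real implementations such as Data.HashMap are considerably more involved):

import Data.Bits (testBit)

data HashTree a = Bucket [a]                     -- leaf: values whose hashes agree so far
                | Fork (HashTree a) (HashTree a) -- fork on the next bit of the hash

lookupHash :: Int -> HashTree a -> [a]
lookupHash h = go 0
  where
    go _   (Bucket xs) = xs
    go bit (Fork l r)
      | testBit h bit = go (bit + 1) r  -- bit set: go right
      | otherwise     = go (bit + 1) l  -- bit clear: go left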
This hash table implementation is kind of bad, because hashes might have to be recomputed when certain elements are inserted, but no matter.
BinTree1 is just an ordinary binary tree, and is used for fast order-based sets. Nothing more to say about it, really.
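For completeness, a textbook sketch of ordered insertion (my code, using the asker's BinTree1 with its Branch label in between the subtrees):

insert :: Ord a => a -> BinTree1 a -> BinTree1 a
insert x Leaf = Branch x Leaf Leaf
insert x t@(Branch y l r)
  | x < y     = Branch y (insert x l) r  -- smaller keys live in the left subtree
  | x > y     = Branch y l (insert x r)  -- larger keys live in the right subtree
  | otherwise = t                        -- already present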
The only difference between Tree1 and Tree2 is that Tree2 can't have a label on the root node. This means that if used as a prefix tree, it cannot contain the empty string. It has very limited use, and I haven't seen anything like it in practice. Tree1, however, obviously has a use as a non-binary prefix tree, as I said.

Related

How to define Type safe constrained rose trees

I am trying to define a data structure with these characteristics:
It is a rose tree
The nodes in the tree are of variable sort
The only difference between the sorts of node is a constraint on the number of children they may take
The complete set of constraints is: None; OneOnly; TwoOnly; AtLeastOne; AtLeastTwo
I want the relevant constraint to be type checkable and checked.
(E.g. when building or editing the tree, trying to add a second child to IamJustOne :: OneOnly is an error.)
I am having difficulty getting started defining this structure (especially points 3-5).
There is information on the web on the steps needed to define a rose tree.
There is information in Data.Tree.Rose sufficient to create a rose tree with variable nodes. (Though I am still not clear on the distinction in that module between Knuth trees, and Knuth forests.)
There are research level papers on heterogeneous containers well above my comprehension grade
My initial approach was to attempt to create subtypes of MyRose (not working code) as:
data MyRose sub = MyRose {label :: String, subtype :: sub, children :: [MyRose sub]}
type AtLeastOne a = snoc a [a]
type AtLeastTwo a = snoc a ( snoc a [a] )
...
instance MyRose AtLeastOne where children = AtLeastOne MyRose -- instances to provide defaults
...
instance None STree where children = Nothing
I have tried various approaches using data, newtype, class, type, and am now investigating type family and data family. None of my approaches have been productive.
Could you suggest pointers for defining this data structure? Baby's first steps would be perfectly useful; it is difficult to underestimate my level of knowledge on this topic.
Before you go the crazy advanced route, I recommend making sure that the simple route isn't Good Enough. The simple route looks like this:
data Tree = Node { label :: String, children :: Children }
data Children
  = Zero
  | One Tree
  | Two Tree Tree
  | Positive Tree [Tree]
  | Many Tree Tree [Tree]
Here's your criteria:
Is a rose tree -- uh, I guess?
Nodes in the tree are of variable sort -- check, the five Children constructors indicate the sort, and each Node may make a different choice of constructor
The only difference between sorts is a constraint on the number of children they may take -- check
The complete set of constraints -- check
Relevant constraint is type checkable and checked -- check, e.g. the application One child1 child2 does not typecheck
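For instance (my example, not from the original answer), here is a small tree built with this encoding; note that handing One a second child simply does not typecheck:

t1 :: Tree
t1 = Node "root"
       (Two (Node "left" Zero)
            (Node "right" (One (Node "grandchild" Zero))))

-- Rejected by the type checker, since One takes exactly one Tree:
-- bad = Node "root" (One (Node "a" Zero) (Node "b" Zero))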
Even if you could define it, a tree of this sort seems very difficult to use. The type of the tree will have to reflect its entire structure, and a client will have to carry that type around everywhere, since all operations on the tree will need to know this type in order to do anything. They won't be able to just have a Rose String or something, they will need to know the exact shape.
Let's imagine you've succeeded in your goal. Then, you may have some example tree t:
t :: OnlyTwo (AtLeastOne None)
indicating a top level with 2 nodes, each of whom has at least one child, each of which is empty. What on Earth should be the type of insert t "hello"? Of deleteMin t? You can't really know which levels of the tree may need to collapse if you delete a single node, or where you may need to grow a level if you insert one.
Maybe you have answers to these questions, and some obscure use case where this is the best solution. But since you ask for baby's first solution: I think if I were you, I would step back and ask why I really want this. What do you hope to achieve with this level of type detail? What do you want client code to look like when it consumes or builds such a tree? Answers to these questions would make for a much clearer problem.

Real World Haskell Chapter 3 exercise: Binary Tree with 1 value constructor - follow up

This question is not a duplicate
A question with the same title already exists, but the answer only partially addressed it, in my opinion, and I am also interested in what it left unanswered.
Foreword
Real World Haskell proposes, in Chapter 3, page 58, the following definition for a binary tree datatype,
data Tree a = Node a (Tree a) (Tree a)
            | Empty
            deriving (Show)
which provides two constructors (for empty and non-empty Trees).
On the other hand, at page 60, an exercise challenges the reader to define the Tree datatype by using a single constructor.
After several attempts, I came up with the same solution as the one linked above:
data Tree a = Node a (Maybe (Tree a)) (Maybe (Tree a)) deriving(Show)
What is unanswered in the linked question
The drawback of this definition is that it does not allow the instantiation of an empty Tree, although it allows the instantiation of a Tree with empty children through the following syntax:
Node 3 Nothing (Just (Node 2 Nothing Nothing))
I think there is not a much better solution than the above, if not having a "standalone" empty tree is acceptable and the requirement is to use only one constructor.
Having some comment on the above statement would be nice; however, my main question is how can I define Tree with one constructor such that I can instantiate an empty Tree?
Now that I've written the question, I think that one possible answer is the following, of which I am not at all sure:
If a child's being empty or not is encoded in whether it is constructed through Nothing or Just (Node ...), pretty much the same holds for the whole tree (or root node), which could itself be defined as Nothing or Just (Node ...); this is to say that with only one constructor, Nothing is the way to instantiate an empty tree. (In other words, I'm starting to think that this question is inherently "ill-formed". Nonetheless I'll post it, as I think I can learn something from your comments/answers.)
Does the above make any sense?
A possible answer
A comment in the original question proposes the following solution
data Tree a = Tree (Maybe (a,Tree a,Tree a))
which (as I understand it) allows one to instantiate an empty tree via Tree Nothing, or a non-empty tree via Tree (Just (value, child1, child2)).
Here's a hint: you can turn any n-ary constructor into a unary one using an n-tuple type.
For instance, your tree type is isomorphic to the following one:
data Tree a = Node (a, Tree a, Tree a)
            | Empty
I think you should now be able to turn this type into one which only involves one constructor.
The answer of #chi, alongside his comments, says it all; it can be done with Either as:
data Tree a = T (Either () (a, Tree a, Tree a)) deriving Show
And an example of a tree is:
node1 = T $ Right ("data node 1", node2, node3)
node2 = T $ Left ()
node3 = T $ Right ("data node 3", node2, node2)
$> node1
T (Right ("data node 1",T (Left ()),T (Right ("data node 3",T (Left ()),T (Left ())))))
But as everybody has already said, that can be replaced with Maybe, because Either () a is isomorphic to Maybe a.
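Spelling that out, here is the Maybe version with the same example tree (a sketch; this is just the obvious translation of the Either code above):

data Tree a = T (Maybe (a, Tree a, Tree a)) deriving Show

node1, node2, node3 :: Tree String
node1 = T (Just ("data node 1", node2, node3))
node2 = T Nothing                               -- the empty tree
node3 = T (Just ("data node 3", node2, node2))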

How does this binary tree code represent a tree?

I was reading through past papers for my University's Haskell exam and came across a question involving trees where the tree type was implemented as such:
data Tree a = Lf a              -- leaf
            | Tree a :+: Tree a -- branch
It then goes on to outline an example tree that could be used with various functions, for example:
((Lf 1 :+: Lf 2) :+: (Lf 3 :+: Lf 4))
My confusion with this code is how it can represent a tree without having some notion of a root element. My intuition would suggest the tree being represented here would look like this:
    / \
   /   \
  / \   / \
 1   2 3   4
but such a tree would possess no root elements, only leaves, which certainly seems wrong to me. How is a tree actually expressed in this code?
It is a tree with data stored only in the leaves. There is no requirement that a tree have data in the interior nodes, although of course many algorithms can only operate on a tree with data in its interior nodes (e.g. BST search).
You might ask in what situation such a structure is useful. Consider Huffman decoding, where no data is needed in the interior nodes. You simply traverse down the tree, moving left on 0 and right on 1, until you reach a leaf node, at which point you have decoded a character.
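As a sketch of that decoding loop (Bit and decodeOne are my names; the exam question only defines the tree type):

data Tree a = Lf a              -- leaf
            | Tree a :+: Tree a -- branch

data Bit = O | I

-- Walk down the code tree, consuming one bit per branch;
-- reaching a leaf yields the decoded symbol and the leftover bits.
decodeOne :: Tree a -> [Bit] -> (a, [Bit])
decodeOne (Lf x)    bits       = (x, bits)
decodeOne (l :+: _) (O : bits) = decodeOne l bits
decodeOne (_ :+: r) (I : bits) = decodeOne r bits
decodeOne _         []         = error "ran out of bits mid-codeword"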

Gather data about existing tree-like data

Let's say we have existing tree-like data and we would like to add information about depth of each node. How can we easily achieve that?
data Tree = Node Tree Tree | Leaf
For each node we would like to know, in constant time, how deep it is. We get the data from an external module, so all we have is the definition shown above. A real-life example would be an external HTML parser which just provides the XML tree, where we would like to gather data such as how many hyperlinks each node contains.
Functional languages are made for traversing trees and gathering data, so there should be an easy solution. The obvious solution would be to create a parallel structure. Can we do better?
The standard trick, which I learned from Chris Okasaki's wonderful Purely Functional Data Structures is to cache the results of expensive operations at each node. (Perhaps this trick was known before Okasaki's thesis; I don't know.) You can provide smart constructors to manage this information for you so that constructing the tree need not be painful. For example, when the expensive operation is depth, you might write:
module SizedTree (SizedTree, sizedTree, node, leaf, depth) where
data SizedTree = Node !Int SizedTree SizedTree | Leaf
node l r = Node (max (depth l) (depth r) + 1) l r
leaf = Leaf
depth (Node d _ _) = d
depth Leaf = 0
-- since we don't expose the constructors, we should
-- provide a replacement for pattern matching
sizedTree f v (Node _ l r) = f l r
sizedTree f v Leaf = v
Constructing SizedTrees costs O(1) extra work at each node (hence it is O(n) work to convert an n-node Tree to a SizedTree), but the payoff is that checking the depth of a SizedTree -- or of any subtree -- is an O(1) operation.
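For example (my code, assuming the question's data Tree = Node Tree Tree | Leaf is in scope alongside an import of the SizedTree module above), the conversion is a single traversal through the smart constructors:

fromTree :: Tree -> SizedTree
fromTree (Node l r) = node (fromTree l) (fromTree r)
fromTree Leaf       = leaf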
You do need some other place in the data to store these Ints, though. Define Tree as

data Tree a = Node (Tree a) a (Tree a) | Leaf a
and then write a function
annDepth :: Tree a -> Tree (Int, a)
Your original Tree is Tree () and with pattern synonyms you can recover nice constructors.
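A sketch of annDepth under that definition (the implementation is mine), annotating every node with its distance from the root:

annDepth :: Tree a -> Tree (Int, a)
annDepth = go 0
  where
    go d (Leaf a)     = Leaf (d, a)
    go d (Node l a r) = Node (go (d + 1) l) (d, a) (go (d + 1) r)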
If you want to preserve the original tree for some reason, you can define a view:
{-# LANGUAGE GADTs, DataKinds #-}

data Shape = SNode Shape Shape | SLeaf

data Tree a sh where
  Leaf :: a -> Tree a SLeaf
  Node :: Tree a lsh -> a -> Tree a rsh -> Tree a (SNode lsh rsh)
With this you have a guarantee that an annotated tree has the same shape as the unannotated one. But this doesn't work well without proper dependent types.
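To illustrate the guarantee (my sketch): an annotating traversal preserves the shape index, so the compiler itself checks that the output has exactly the input's shape:

annotate :: Int -> Tree a sh -> Tree (Int, a) sh
annotate d (Leaf a)     = Leaf (d, a)
annotate d (Node l a r) = Node (annotate (d + 1) l) (d, a) (annotate (d + 1) r)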
Also, have a look at the question Boilerplate-free annotation of ASTs in Haskell?
The standard solution is what #DanielWagner suggested: just extend the data structure. This can be somewhat inconvenient, but it can be managed with smart constructors for creating values and records for pattern matching.
Perhaps Data types a la carte could help, although I haven't used this approach myself. There is a library compdata based on that.
A completely different approach would be to efficiently memoize the values you need. I was trying to solve a similar problem and one of the solutions is provided by the library stable-memo. Note that this isn't a purely functional approach, as the library is internally based on object identity, but the interface is pure and works perfectly for the purpose.

Trees with values on the leaves only

A few years ago, during a C# course I learned to write a binary tree that looked more or less like this:
data Tree a = Branch a (Tree a) (Tree a) | Leaf
I saw the benefit of it: it had its values on the branches, which allowed for quick and easy lookup and insertion of values, because a search would encounter a value at the root of each branch all the way down until it hit a leaf, which held no value.
Ever since I started learning Haskell, however, I've seen numerous examples of trees that are defined like this:
data Tree a = Branch (Tree a) (Tree a) | Leaf a
That definition puzzles me. I can't see the usefulness of having data only on the elements that don't branch: all the values end up strung along the bottom of the tree, under branches that carry nothing. To me, that seems like a poorly designed alternative to a List. It also makes me question its lookup time, since such a tree can't assess which branch to go down to find a value; rather, it needs to go through every node to find what it's looking for.
So, can anyone shed some light on why the second version (value on leaves) is so much more prevalent in Haskell than the first version?
I think this depends on what you're trying to model and how you're trying to model it.
A tree where the internal nodes store values and the leaves are just leaves is essentially a standard binary tree (treat each leaf as NULL and you basically have an imperative-style binary tree). If the values are stored in sorted order, you now have a binary search tree. There are many specific advantages to storing data this way, most of which transfer directly over from imperative settings.
Trees where the leaves store the data and the internal nodes are just for structure do have their advantages. For example, red/black trees support two powerful operations called split and join that have advantages in some circumstances. split takes as input a key, then destructively modifies the tree to produce two trees, one of which contains all keys less than the specified input key and one containing the remaining keys. join is, in a sense, the opposite: it takes in two trees where one tree's values are all less than the other tree's values, then fuses them together into a single tree. These operations are particularly difficult to implement on most red/black trees, but are much simpler if all the data is stored in the leaves only rather than in the internal nodes. This paper detailing an imperative implementation of red/black trees mentions that some older implementations of red/black trees used this approach for this very reason.
As another potential advantage of storing keys in the leaves, suppose that you want to implement the concatenate operation, which joins two lists together. If you don't have data in the internal nodes, this is as simple as

concat first second = Branch first second

This works because no data needs to be stored in the new node. If the data is stored in the internal nodes, you need to somehow move a key from one of the trees up to the new concatenation node, which takes more time and is trickier to work with.
Finally, in some cases, you might want to store the data in the leaves because the leaves are fundamentally different from internal nodes. Consider a parse tree, for example, where the leaves store specific terminals from the parse and the internal nodes store all the nonterminals in the production. In this case, there really are two different types of nodes, so it doesn't make sense to store arbitrary data in the internal nodes.
Hope this helps!
You described a tree with data at the leaves as "a poorly designed alternative to a List."
I agree that this could be used as an alternative to a list, but it's not necessarily poorly designed! Consider the data type
data Tree t = Leaf t | Branch (Tree t) (Tree t)
You can define cons and snoc (append to end of list) operations -
cons :: t -> Tree t -> Tree t
cons t (Leaf s) = Branch (Leaf t) (Leaf s)
cons t (Branch l r) = Branch (cons t l) r
snoc :: Tree t -> t -> Tree t
snoc (Leaf s) t = Branch (Leaf s) (Leaf t)
snoc (Branch l r) t = Branch l (snoc r t)
These run (for roughly balanced lists) in O(log n) time where n is the length of the list. This contrasts with the standard linked list, which has O(1) cons and O(n) snoc operations. You can also define a constant-time append (as in templatetypedef's answer)
append :: Tree t -> Tree t -> Tree t
append l r = Branch l r
which is O(1) for two lists of any size, whereas the standard list is O(n) where n is the length of the left argument.
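To read the "list" back out, an in-order traversal suffices (toList is my name, written against the Tree t type above):

toList :: Tree t -> [t]
toList (Leaf t)     = [t]                   -- a single element
toList (Branch l r) = toList l ++ toList r  -- left part, then right part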
In practice you would want to define slightly smarter versions of these functions which attempt to keep the tree balanced. To do this it is often useful to have some additional information at the branches, which could be done by having multiple kinds of branch (as in a red-black tree which has "red" and "black" nodes) or explicitly include additional data at the branches, as in
data Tree b a = Leaf a | Branch b (Tree b a) (Tree b a)
For example, you can support an O(1) size operation by storing the total number of elements in both subtrees in the nodes. All of your operations on the tree become slightly more complicated since you need to correctly persist the information about subtree sizes -- in effect the work of computing the size of the tree is amortized over all the operations that construct the tree (and cleverly persisted, so that minimal work is done whenever you need to reconstruct a size later).
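Here is a sketch of that size-cached variant (size and the smart constructor branch are my names, building on the Tree b a definition above):

type SizedTree a = Tree Int a   -- b = the cached element count of the subtree

size :: SizedTree a -> Int
size (Leaf _)       = 1
size (Branch n _ _) = n

branch :: SizedTree a -> SizedTree a -> SizedTree a
branch l r = Branch (size l + size r) l r  -- O(1): reuses the children's cached sizes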
Sometimes more is better, sometimes worse. I'll explain just a couple of basic considerations to show why your intuition fails. The general idea, though, is that different data structures need different things.
Empty leaf nodes can actually be a space (and therefore time) problem in some contexts. If a node is represented by a bit of information and two pointers to its children, you'll end up with two null pointers per node whose children are both leaves. That's two machine words per leaf node, which can add up to quite a bit of space. Some structures avoid this by ensuring that each leaf holds at least one piece of information to justify its existence. In some cases (such as ropes), each leaf may have a fairly large and dense payload.
Making internal nodes bigger (by storing information in them) makes it more expensive to modify the tree. Changing a leaf in a balanced tree typically forces you to allocate replacements for O(log n) internal nodes. If each of those is larger, you've just allocated more space and spent extra time to copy more words. The extra size of the internal nodes also means that you can fit less of the tree structure into the CPU cache.
