Monad instance for binary tree - Haskell

I built a binary tree with:
data Tree a = Empty
            | Node a (Tree a) (Tree a)
            deriving (Eq, Ord, Read, Show)
How can I make a Monad type class instance for this tree? And is it possible at all?
I tried:
instance Monad Tree where
    return x = Node x Empty Empty
    Empty >>= f = Empty
    (Node x Empty Empty) >>= f = f x
But I can't write (>>=) for Node x left right.
Thank you.

There is no (good) monad for exactly the type you described. It would require rebalancing the tree and merging together the intermediate trees generated by the bind, and you can't rebalance based on any information in 'a' because you know nothing about it.
However, there is a similar tree structure
data Tree a = Tip a | Bin (Tree a) (Tree a)
which admits a monad
instance Monad Tree where
    return = Tip
    Tip a >>= f = f a
    Bin l r >>= f = Bin (l >>= f) (r >>= f)
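Note that on GHC 7.10 and later, a Monad instance also requires Functor and Applicative superclass instances. A minimal sketch of what they could look like for this leafy tree (an addition for completeness, not part of the original answer):
instance Functor Tree where
    fmap f (Tip a)   = Tip (f a)
    fmap f (Bin l r) = Bin (fmap f l) (fmap f r)

instance Applicative Tree where
    pure = Tip
    tf <*> tx = tf >>= \f -> fmap f tx

-- Bind substitutes a whole subtree for every Tip, e.g.
-- Tip 1 >>= \x -> Bin (Tip x) (Tip (x + 1))  ==  Bin (Tip 1) (Tip 2)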
I talked about this and other tree structures a year or two back at Boston Haskell as a lead-in to talking about finger trees. The slides there may be helpful in exploring the difference between leafy and traditional binary trees.
The reason I said there is no good monad is that any such monad would either have to put the tree into a canonical form for a given number of entries in order to pass the monad laws, or quotient out the balance concerns by not exposing the constructors to the end user; doing the former would require much more stringent reordering than you get, for instance, from an AVL or weighted tree.

Related

What is the correct definition of `unfold` for an untagged tree?

I've been thinking about how to implement the equivalent of unfold for the following type:
data Tree a = Node (Tree a) (Tree a) | Leaf a | Nil
It was not immediately obvious, since the standard unfold for lists returns a value and the next seed. For this datatype that doesn't make sense, since there is no "value" until you reach a leaf node. So it only really makes sense to return new seeds or to stop with a value. I'm using this definition:
data Drive s a = Stop | Unit a | Branch s s deriving Show

unfold :: (t -> Drive t a) -> t -> Tree a
unfold fn x = case fn x of
    Branch a b -> Node (unfold fn a) (unfold fn b)
    Unit a     -> Leaf a
    Stop       -> Nil

main = print $ unfold go 5 where
    go 0 = Stop
    go 1 = Unit 1
    go n = Branch (n - 1) (n - 2)
While this seems to work, I'm not sure this is how it is supposed to be. So, that is the question: what is the correct way to do it?
If you think of a datatype as the fixpoint of a functor then you can see that your definition is the sensible generalisation of the list case.
module Unfold where
Here we start by defining the fixpoint of a functor f: it's a layer of f followed by some more fixpoint:
newtype Fix f = InFix { outFix :: f (Fix f) }
To make things slightly clearer, here are the definitions of the functors corresponding to lists and trees. They have basically the same shape as the datatypes except that we have replaced the recursive calls by an extra parameter. In other words, they describe what one layer of list / tree looks like and are generic over the possible substructures r.
data ListF a r = LNil | LCons a r
data TreeF a r = TNil | TLeaf a | TBranch r r
Lists and trees are then respectively the fixpoints of ListF and TreeF:
type List a = Fix (ListF a)
type Tree a = Fix (TreeF a)
Anyway, hoping you now have a better intuition about this fixpoint business, we can see that there is a generic way of defining an unfold function for these.
Given an original seed as well as a function taking a seed and building one layer of f where the recursive structure are new seeds, we can build a whole structure:
unfoldFix :: Functor f => (s -> f s) -> s -> Fix f
unfoldFix node = go
  where go = InFix . fmap go . node
This definition specialises to the usual unfold on list or your definition for trees. In other words: your definition was indeed the right one.
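For instance, here is a small sketch (not from the original answer) of how unfoldFix specialises to trees, assuming a Functor instance for TreeF:
instance Functor (TreeF a) where
    fmap _ TNil          = TNil
    fmap _ (TLeaf a)     = TLeaf a
    fmap f (TBranch l r) = TBranch (f l) (f r)

-- The question's Drive s a is TreeF a s with different constructor names,
-- so the question's unfold is just unfoldFix at this type:
unfoldTree :: (s -> TreeF a s) -> s -> Tree a
unfoldTree = unfoldFix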

What is the most general way to compute the depth of a tree with something like a fold?

What minimal (most general) information is required to compute the depth of a Data.Tree? Is an instance of Data.Foldable sufficient?
I initially tried to fold a Tree and got stuck trying to find the right Monoid, similar to Max. Something tells me that since a Monoid (one that would compute depth) needs to be associative, it probably cannot be used to express any fold that needs to be aware of the structure (as in 1 + maxChildrenDepth), but I'm not certain.
I wonder what thought process would let me arrive at the right abstraction for such cases.
I can't say whether it's the minimal/most general amount of information. But one general solution works whenever the given structure has a catamorphism and the underlying functor of that catamorphism is Foldable, so that it's possible to enumerate the sub-terms.
Here is sample code using recursion-schemes.
{-# LANGUAGE TypeFamilies, FlexibleContexts #-}
import Data.Functor.Foldable
import Data.Semigroup
import Data.Tree

depth :: (Recursive f, Foldable (Base f)) => f -> Int
depth = cata ((+ 1) . maybe 0 getMax . getOption
              . foldMap (Option . Just . Max))

-- Necessary instances for Tree:
data TreeF a t = NodeF { rootLabel' :: a, subForest :: [t] }

type instance Base (Tree a) = TreeF a

instance Functor (TreeF a) where
    fmap f (NodeF x ts) = NodeF x (map f ts)

instance Foldable (TreeF a) where
    foldMap f (NodeF _ ts) = foldMap f ts

instance Recursive (Tree a) where
    project (Node x ts) = NodeF x ts
To answer the first question: Data.Foldable is not enough to compute the depth of the tree. The minimum complete definition of Foldable is foldr, which always has the following semantics:
foldr f z = Data.List.foldr f z . toList
In other words, a Foldable instance is fully characterized by how it behaves on a list projection of the input (i.e. toList), which throws away the depth information of a tree.
Other ways of verifying this idea involve the fact that Foldable depends on a Monoid instance, which has to be associative, or the fact that the various fold functions see the elements one by one in some particular order with no other information, which necessarily throws out the actual tree structure. (There is bound to be more than one tree with the same elements in the same relative order.)
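As a quick illustration (a sketch added here, not part of the original answer), two Data.Tree values with the same toList projection but different depths, so no function defined only in terms of Foldable can distinguish them:
import Data.Foldable (toList)
import Data.Tree

deepTree, wideTree :: Tree Int
deepTree = Node 1 [Node 2 [Node 3 []]]    -- depth 3
wideTree = Node 1 [Node 2 [], Node 3 []]  -- depth 2

-- toList deepTree == [1,2,3] == toList wideTree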
I'm not sure what the minimal abstraction would be for trees specifically, but I think the core of your question is actually a bit broader: it would be interesting to see what minimum amount of information is needed to compute arbitrary facts about a type with a fold-like function.
To do this, the actual helper function in the fold would have to take a different sort of argument for each sort of data structure. This naturally leads us to catamorphisms, which are generalized folds over different data types.
You can read more about these generalized folds in a different Stack Overflow question: What constitutes a fold for types other than list? (In the interest of disclosure/self-promotion, I wrote one of the answers there :P.)

What is the difference between value constructors and tuples?

It's written that Haskell tuples are simply a different syntax for algebraic data types. Similarly, there are examples of how to redefine value constructors with tuples.
For example, a Tree data type in Haskell might be written as
data Tree a = EmptyTree | Node a (Tree a) (Tree a)
which could be converted to "tuple form" like this:
data Tree a = EmptyTree | Node (a, Tree a, Tree a)
What is the difference between the Node value constructor in the first example, and the actual tuple in the second example? i.e. Node a (Tree a) (Tree a) vs. (a, Tree a, Tree a) (aside from just the syntax)?
Under the hood, is Node a (Tree a) (Tree a) just a different syntax for a 3-tuple of the appropriate types at each position?
I know that you can partially apply a value constructor, such as Node 5 which will have type: (Node 5) :: Num a => Tree a -> Tree a -> Tree a
You sort of can partially apply a tuple too, using (,,) as a function ... but this doesn't know about the potential types for the un-bound entries, such as:
Prelude> :t (,,) 5
(,,) 5 :: Num a => b -> c -> (a, b, c)
unless, I guess, you explicitly declare a type with ::.
Aside from syntactical specialties like this, plus this last example of the type scoping, is there a material difference between whatever a "value constructor" thing actually is in Haskell, versus a tuple used to store positional values of the same types as the value constructor's arguments?
Well, conceptually there indeed is no difference, and in fact other languages (OCaml, Elm) present tagged unions exactly that way - i.e., tags over tuples or first-class records (which Haskell lacks). I personally consider this to be a design flaw in Haskell.
There are some practical differences though:
Laziness. Haskell's tuples are lazy and you can't change that. You can however mark constructor fields as strict:
data Tree a = EmptyTree | Node !a !(Tree a) !(Tree a)
Memory footprint and performance. Circumventing intermediate types reduces the footprint and raises the performance. You can read more about it in this fine answer.
You can also mark the strict fields with the UNPACK pragma to reduce the footprint even further. Alternatively you can use the -funbox-strict-fields compiler option. Concerning the latter, I simply prefer to have it on by default in all my projects. See Hasql's Cabal file for example.
Considering the above, if it's a lazy type that you're looking for, then the following snippets should compile to the same thing:
data Tree a = EmptyTree | Node a (Tree a) (Tree a)
data Tree a = EmptyTree | Node {-# UNPACK #-} !(a, Tree a, Tree a)
So I guess you can say that it's possible to use tuples to store lazy fields of a constructor without a penalty. Though it should be mentioned that this pattern is rather unconventional in the Haskell community.
If it's the strict type and footprint reduction that you're after, then there's no other way than to denormalize your tuples directly into constructor fields.
They're what's called isomorphic, meaning "to have the same shape". You can write something like
data Option a = None | Some a
And this is isomorphic to
data Maybe a = Nothing | Just a
meaning that you can write two functions
f :: Maybe a -> Option a
g :: Option a -> Maybe a
Such that f . g == id == g . f for all possible inputs. We can then say that (,,) is a data constructor isomorphic to the constructor
data Triple a b c = Triple a b c
Because you can write
f :: (a, b, c) -> Triple a b c
f (a, b, c) = Triple a b c
g :: Triple a b c -> (a, b, c)
g (Triple a b c) = (a, b, c)
And Node as a constructor is a special case of Triple, namely Triple a (Tree a) (Tree a). In fact, you could even go so far as to say that your definition of Tree could be written as
newtype Tree' a = Tree' (Maybe (a, Tree' a, Tree' a))
The newtype is required since a type alias can't be recursive. All you have to do is say that EmptyTree corresponds to Tree' Nothing and Node a l r to Tree' (Just (a, l, r)). You could pretty simply write functions that convert between the two.
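For example, a small sketch of those conversion functions (the names toTree' and fromTree' are purely illustrative):
-- Tree and Tree' as defined above, repeated for completeness
data Tree a     = EmptyTree | Node a (Tree a) (Tree a)
newtype Tree' a = Tree' (Maybe (a, Tree' a, Tree' a))

toTree' :: Tree a -> Tree' a
toTree' EmptyTree    = Tree' Nothing
toTree' (Node x l r) = Tree' (Just (x, toTree' l, toTree' r))

fromTree' :: Tree' a -> Tree a
fromTree' (Tree' Nothing)          = EmptyTree
fromTree' (Tree' (Just (x, l, r))) = Node x (fromTree' l) (fromTree' r)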
Note that this is all from a mathematical point of view. The compiler can add extra metadata and other information to be able to identify a particular constructor making them behave slightly differently at runtime.

Catamorphism and tree-traversing in Haskell

I am impatient, looking forward to understanding catamorphisms, related to this SO question :)
I have only worked through the beginning of the Real World Haskell tutorial, so maybe I'm going to ask for way too much right now; if that's the case, just tell me the concepts I should learn.
Below, I quote the Wikipedia code sample for catamorphism.
I would like to know your opinion of foldTree below, a way of traversing a Tree, compared to this other SO question and answer, also dealing with n-ary tree traversal. (Independently of the tree being binary or not, I think the catamorphism below can be written so as to handle n-ary trees.)
I put what I understand in comments, and would be glad if you could correct me and clarify some things.
{- this is a binary tree definition -}
data Tree a = Leaf a
            | Branch (Tree a) (Tree a)

{- I don't understand the structure between {};
   however, it defines two morphisms, leaf and branch:
   leaf takes an a and returns an r, branch takes two r and returns an r -}
data TreeAlgebra a r = TreeAlgebra { leaf   :: a -> r
                                   , branch :: r -> r -> r }

{- foldTree is a morphism that takes: a TreeAlgebra for Tree a with result r, a Tree a,
   and returns an r -}
foldTree :: TreeAlgebra a r -> Tree a -> r
foldTree a@(TreeAlgebra {leaf = f})   (Leaf x)     = f x
foldTree a@(TreeAlgebra {branch = g}) (Branch l r) = g (foldTree a l) (foldTree a r)
At this point I am having many difficulties. I seem to gather that the morphism leaf will be applied to any Leaf.
But to use this code for real, foldTree needs to be fed a concrete TreeAlgebra, a TreeAlgebra with a defined morphism leaf so that it does something?
But in that case, in the foldTree code I would expect {f = leaf} and not the contrary.
Any clarification from you would be really welcome.
Not exactly sure what you're asking. But yeah, you feed a TreeAlgebra to foldTree corresponding to the computation you want to perform on the tree. For example, to sum all the elements in a tree of Ints you would use this algebra:
sumAlgebra :: TreeAlgebra Int Int
sumAlgebra = TreeAlgebra { leaf   = id
                         , branch = (+) }
Which means, to get the sum of a leaf, apply id (do nothing) to the value in the leaf. To get the sum of a branch, add together the sums of each of the children.
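For example, a quick usage sketch (the name total is just illustrative, not part of the original answer):
total :: Int
total = foldTree sumAlgebra (Branch (Leaf 1) (Branch (Leaf 2) (Leaf 3)))
-- total == 1 + (2 + 3) == 6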
The fact that we can say (+) for branch instead of, say, \x y -> sumTree x + sumTree y is the essential property of the catamorphism. It says that to compute some function f on some recursive data structure it suffices to have the values of f for its immediate children.
Haskell is a pretty unique language in that we can formalize the idea of catamorphism abstractly. Let's make a data type for a single node in your tree, parameterized over its children:
data TreeNode a child
    = Leaf a
    | Branch child child
See what we did there? We just replaced the recursive children with a type of our choosing. This is so that we can put the subtrees' sums there when we are folding.
Now for the really magical thing. I'm going to write this in pseudohaskell -- writing it in real Haskell is possible, but we have to add some annotations to help the typechecker which can be kind of confusing. We take the "fixed point" of a parameterized data type -- that is, constructing a data type T such that T = TreeNode a T. They call this operator Mu.
type Mu f = f (Mu f)
Look carefully here. The argument to Mu isn't a type, like Int or Foo -> Bar. It's a type constructor like Maybe or TreeNode Int -- the argument to Mu itself takes an argument. (The possibility of abstracting over type constructors is one of the things that makes Haskell's type system really stand out in its expressive power).
So the type Mu f is defined as taking f and filling in its type parameter with Mu f itself. I'm going to define a synonym to reduce some of the noise:
type IntNode = TreeNode Int
Expanding Mu IntNode, we get:
Mu IntNode = IntNode (Mu IntNode)
           = Leaf Int | Branch (Mu IntNode) (Mu IntNode)
Do you see how Mu IntNode is equivalent to your Tree Int? We have just torn the recursive structure apart and then used Mu to put it back together again. This gives us the advantage that we can talk about all Mu types at once. This gives us what we need to define a catamorphism.
Let's define:
type IntTree = Mu IntNode
I said the essential property of the catamorphism is that to compute some function f, it suffices to have the values of f for its immediate children. Let's call the type of the thing we are trying to compute r, and the data structure node (IntNode would be a possible instantiation of this). So to compute r on a particular node, we need the node with its children replaced with their rs. This computation has type node r -> r. So a catamorphism says that if we have one of these computations, then we can compute r for the entire recursive structure (remember recursion is denoted explicitly here with Mu):
cata :: (node r -> r) -> Mu node -> r
Making this concrete for our example, this looks like:
cata :: (IntNode r -> r) -> IntTree -> r
Restating, if we can take a node with rs for its children and compute an r, then we can compute an r for an entire tree.
In order to actually compute this, we need node to be a Functor -- that is we need to be able to map an arbitrary function over the children of a node.
fmap :: (a -> b) -> node a -> node b
This can be done straightforwardly for IntNode.
fmap f (Leaf x) = Leaf x -- has no children, so stays the same
fmap f (Branch l r) = Branch (f l) (f r) -- apply function to each child
Now, finally, we can give a definition for cata (the Functor node constraint just says that node has a suitable fmap):
cata :: (Functor node) => (node r -> r) -> Mu node -> r
cata f t = f (fmap (cata f) t)
I used the parameter name t for the mnemonic value of "tree". This is an abstract, dense definition, but it is really very simple. It says: recursively perform cata f -- the computation we are doing over the tree -- on each of t's children (which are themselves Mu nodes) to get a node r, and then pass that result to f to compute the result for t itself.
Tying this back to the beginning, the algebra you are defining is essentially a way of defining that node r -> r function. Indeed, given a TreeAlgebra, we can easily get the fold function:
foldFunction :: TreeAlgebra a r -> (TreeNode a r -> r)
foldFunction alg (Leaf a) = leaf alg a
foldFunction alg (Branch l r) = branch alg l r
Thus the tree catamorphism can be defined in terms of our generic one as follows:
type Tree a = Mu (TreeNode a)
treeCata :: TreeAlgebra a r -> (Tree a -> r)
treeCata alg = cata (foldFunction alg)
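For completeness, here is a standalone sketch (not part of the original answer) of the same construction in real Haskell, where Mu has to be a newtype because type synonyms cannot be recursive:
newtype Mu f = InMu { outMu :: f (Mu f) }

data TreeNode a child
    = Leaf a
    | Branch child child

instance Functor (TreeNode a) where
    fmap _ (Leaf x)     = Leaf x
    fmap f (Branch l r) = Branch (f l) (f r)

cata :: Functor node => (node r -> r) -> Mu node -> r
cata f = f . fmap (cata f) . outMu

type IntTree = Mu (TreeNode Int)

sumTree :: IntTree -> Int
sumTree = cata step
  where
    step (Leaf x)     = x
    step (Branch l r) = l + r

-- sumTree (InMu (Branch (InMu (Leaf 1)) (InMu (Leaf 2)))) == 3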
I'm out of time. I know that got really abstract really fast, but I hope it at least gave you a new viewpoint to help your learning. Good luck!
I think you were asking a question about the {}'s. There is an earlier question with a good discussion of {}'s; that is Haskell's record syntax. The other question is why construct the algebra. This is a typical functional-programming paradigm where you generalize data as functions.
The most famous example is Church's construction of the naturals, where f = (+ 1) and z = 0,
0 = z,
1 = f z,
2 = f (f z),
3 = f (f (f z)),
etc...
What you are seeing is essentially the same idea being applied to a tree. Work through the Church example and the tree will click.
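A tiny sketch of that construction in Haskell (the names zero, suc, and three are purely illustrative):
zero :: (a -> a) -> a -> a
zero _ z = z

suc :: ((a -> a) -> a -> a) -> (a -> a) -> a -> a
suc n f z = f (n f z)

three :: (a -> a) -> a -> a
three = suc (suc (suc zero))

-- three (+ 1) 0 == 3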

Converting rose trees to different binary tree types

I am completely lost on how to do some tree conversions in Haskell. I need to go from a rose tree defined as:
data Rose a = Node a [Rose a] deriving (Eq, Show, Ord)
to a binary tree which is defined as:
data Btree a = Empty | Fork a (Btree a) (Btree a) deriving (Eq, Show, Ord)
In my class I was given a function that is similar, but using a different definition of the binary tree. For that function the rose tree is defined the same and the binary tree is defined as:
data Btree a = Leaf a | Fork (Btree a) (Btree a)
with the function from rose tree to binary tree defined as:
toB :: Rose a -> Btree a
toB (Node x xts) = foldl Fork (Leaf x) (map toB xts)
toB (Node x []) = foldl Fork (Leaf x) []
I have the answer but I don't know how to convert it so that it works with the new definition of Btree.
When I did something like this, I considered the "left" subtree to be the first child, and the "right" subtree to be the siblings of the node. This means that you convert the tree like so:
     h                       h
    /|\                     /
   / | \                   /
  b  d  e   ==>           b -> d -> e
 / \   / \               /         /
a   c f   g             a -> c    f -> g
h is still the root, but in the second diagram / is the left subtree and -> is the right. Leaves have no left subtree, but might have siblings (right subtrees). The root has no right subtree, but might have children (left subtree). Internal nodes have both.
Does that help?
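If it helps, here is a minimal sketch of that left-child/right-sibling encoding for the types in the question (toBtree and its helpers are illustrative names, not from the answers):
-- Rose and Btree as defined in the question
data Rose a  = Node a [Rose a]                    deriving (Eq, Show, Ord)
data Btree a = Empty | Fork a (Btree a) (Btree a) deriving (Eq, Show, Ord)

toBtree :: Rose a -> Btree a
toBtree t = go t Empty
  where
    -- go converts a node, threading in the already-converted right siblings
    go (Node x kids) sib = Fork x (forest kids) sib
    -- forest converts a list of children into a chain of right siblings
    forest []       = Empty
    forest (k : ks) = go k (forest ks)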
Try to write a function converting from the first to the second definition of the binary tree. Then convB . toB is your new function! Now, systematically create a new function that acts directly as a fusion of the two, by inlining one into the other, and you'll get a straightforward and elegant solution.
