Catamorphism and tree-traversing in Haskell - haskell

I am impatient, looking forward to understanding catamorphism related to this SO question :)
I have only practiced the beginning of Real World Haskell tutorial. So, Maybe I'm gonna ask for way too much right now, if it was the case, just tell me the concepts I should learn.
Below, I quote the wikipedia code sample for catamorphism.
I would like to know your opinion about foldTree below, a way of traversing a Tree, compared to this other SO question and answer, also dealing with traversing a Tree n-ary tree traversal. (independantly from being binary or not, I think the catamorphism below can be written so as to manage n-ary tree)
I put in comment what I understand, and be glad if you could correct me, and clarify some things.
{-this is a binary tree definition-}
data Tree a = Leaf a
| Branch (Tree a) (Tree a)
{-I dont understand the structure between{}
however it defines two morphisms, leaf and branch
leaf take an a and returns an r, branch takes two r and returns an r-}
data TreeAlgebra a r = TreeAlgebra { leaf :: a -> r
, branch :: r -> r -> r }
{- foldTree is a morphism that takes: a TreeAlgebra for Tree a with result r, a Tree a
and returns an r -}
foldTree :: TreeAlgebra a r -> Tree a -> r
foldTree a#(TreeAlgebra {leaf = f}) (Leaf x ) = f x
foldTree a#(TreeAlgebra {branch = g}) (Branch l r) = g (foldTree a l) (foldTree a r)
at this point I am having many difficulties, I seem to guess that the morphism leaf
will be applied to any Leaf
But so as to use this code for real, foldTree needs to be fed a defined TreeAlgebra,
a TreeAlgebra that has a defined morphism leaf so as to do something ?
but in this case in the foldTree code I would expect {f = leaf} and not the contrary
Any clarification from you would be really welcome.

Not exactly sure what you're asking. But yeah, you feed a TreeAlgebra to foldTree corresponding to the computation you want to perform on the tree. For example, to sum all the elements in a tree of Ints you would use this algebra:
sumAlgebra :: TreeAlgebra Int Int
sumAlgebra = TreeAlgebra { leaf = id
, branch = (+) }
Which means, to get the sum of a leaf, apply id (do nothing) to the value in the leaf. To get the sum of a branch, add together the sums of each of the children.
The fact that we can say (+) for branch instead of, say, \x y -> sumTree x + sumTree y is the essential property of the catamorphism. It says that to compute some function f on some recursive data structure it suffices to have the values of f for its immediate children.
Haskell is a pretty unique language in that we can formalize the idea of catamorphism abstractly. Let's make a data type for a single node in your tree, parameterized over its children:
data TreeNode a child
= Leaf a
| Branch child child
See what we did there? We just replaced the recursive children with a type of our choosing. This is so that we can put the subtrees' sums there when we are folding.
Now for the really magical thing. I'm going to write this in pseudohaskell -- writing it in real Haskell is possible, but we have to add some annotations to help the typechecker which can be kind of confusing. We take the "fixed point" of a parameterized data type -- that is, constructing a data type T such that T = TreeNode a T. They call this operator Mu.
type Mu f = f (Mu f)
Look carefully here. The argument to Mu isn't a type, like Int or Foo -> Bar. It's a type constructor like Maybe or TreeNode Int -- the argument to Mu itself takes an argument. (The possibility of abstracting over type constructors is one of the things that makes Haskell's type system really stand out in its expressive power).
So the type Mu f is defined as taking f and filling in its type parameter with Mu f itself. I'm going to define a synonym to reduce some of the noise:
type IntNode = TreeNode Int
Expanding Mu IntNode, we get:
Mu IntNode = IntNode (Mu IntNode)
= Leaf Int | Branch (Mu IntNode) (Mu IntNode)
Do you see how Mu IntNode is equivalent to your Tree Int? We have just torn the recursive structure apart and then used Mu to put it back together again. This gives us the advantage that we can talk about all Mu types at once. This gives us what we need to define a catamorphism.
Let's define:
type IntTree = Mu IntNode
I said the essential property of the catamorphism is that to compute some function f, it suffices to have the values of f for its immediate children. Let's call the type of the thing we are trying to compute r, and the data structure node (IntNode would be a possible instantiation of this). So to compute r on a particular node, we need the node with its children replaced with their rs. This computation has type node r -> r. So a catamorphism says that if we have one of these computations, then we can compute r for the entire recursive structure (remember recursion is denoted explicitly here with Mu):
cata :: (node r -> r) -> Mu node -> r
Making this concrete for our example, this looks like:
cata :: (IntNode r -> r) -> IntTree -> r
Restating, if we can take a node with rs for its children and compute an r, then we can compute an r for an entire tree.
In order to actually compute this, we need node to be a Functor -- that is we need to be able to map an arbitrary function over the children of a node.
fmap :: (a -> b) -> node a -> node b
This can be done straightforwardly for IntNode.
fmap f (Leaf x) = Leaf x -- has no children, so stays the same
fmap f (Branch l r) = Branch (f l) (f r) -- apply function to each child
Now, finally, we can give a definition for cata (the Functor node constraint just says that node has a suitable fmap):
cata :: (Functor node) => (node r -> r) -> Mu node -> r
cata f t = f (fmap (cata f) t)
I used the parameter name t for the mnemonic value of "tree". This is an abstract, dense definition, but it is really very simple. It says: recursively perform cata f -- the computation we are doing over the tree -- on each of t's children (which are themselves Mu nodes) to get a node r, and then pass that result to f compute the result for t itself.
Tying this back to the beginning, the algebra you are defining is essentially a way of defining that node r -> r function. Indeed, given a TreeAlgebra, we can easily get the fold function:
foldFunction :: TreeAlgebra a r -> (TreeNode a r -> r)
foldFunction alg (Leaf a) = leaf alg a
foldFunction alg (Branch l r) = branch alg l r
Thus the tree catamorphism can be defined in terms of our generic one as follows:
type Tree a = Mu (TreeNode a)
treeCata :: TreeAlgebra a r -> (Tree a -> r)
treeCata alg = cata (foldFunction alg)
I'm out of time. I know that got really abstract really fast, but I hope it at least gave you a new viewpoint to help your learning. Good luck!

I think you were were asking a question about the {}'s. There is an earlier question with a good discussion of {}'s. Those are called Haskell's record syntax. The other question is why construct the algebra. This is a typical function paradigm where you generalize data as functions.
The most famous example is Church's construction of the Naturals, where f = + 1 and z = 0,
0 = z,
1 = f z,
2 = f (f z),
3 = f (f (f z)),
etc...
What you are seeing is essentially the same idea being applied to a tree. Work the church example and the tree will click.

Related

Attempting to construct trees in Haskell

I am trying to use an unfold function to build trees.
Tree t = Leaf | Node (Tree t) t (Tree t)
unfoldT :: (b -> Maybe (b, a, b)) -> b -> Tree a
unfoldT f b =
case f b of
Nothing -> Leaf
Just (lt, x, rt) -> Node (unfoldT f lt) x (unfoldT f rt)
The build function needs to create a tree that has a height equal to the number provided, as well as be numbered in an in-order fashion. The base case being build 0 = Leaf and the next being build 1 = Node (Leaf 0 Leaf).
build :: Integer -> Tree Integer
My attempt at solving it:
build n = unfoldT (\x -> Just x) [0..2^n-2]
I am not entirely sure how to go about constructing the tree here.
Would love it if somebody could point me in the right direction.
Edit 1:
If I was to use a 2-tuple, what would I combine? I need to be able to refer to the current node, its left subtree and its right subtree somehow right?
If I was to use a 2-tuple, what would I combine?
I would recommend to pass the remaining depth as well as the offset from the left:
build = unfoldT level . (0,)
where
level (_, 0) = Nothing
level (o, n) = let mid = 2^(n-1)
in ((o, n-1), o+mid-1, (o+mid, n-1))
If I was to use a 2-tuple, what would I combine?
That's the key question behind the state-passing paradigm in functional programming, expressed also with the State Monad. We won't be dealing with the latter here, but maybe use the former.
But before that, do we really need to generate all the numbers in a list, and then work off that list? Don't we know in advance what are the numbers we'll be working with?
Of course we do, because the tree we're building is totally balanced and fully populated.
So if we have a function like
-- build2 (depth, startNum)
build2 :: (Int, Int) -> Tree Int
we can use it just the same to construct both halves of e.g. the build [0..14] tree:
build [0..14] == build2 (4,0) == Node (build2 (3,0)) 7 (build2 (3,8))
Right?
But if we didn't want to mess with the direct calculations of all the numbers involved, we could arrange for the aforementioned state-passing, with the twist to build2's interface:
-- depth, startNum tree, nextNum
build3 :: (Int, Int) -> (Tree Int, Int)
and use it like
build :: Int -> Tree Int -- correct!
build depth = build3 (depth, 0) -- intentionally incorrect
build3 :: (Int, Int) -> (Tree Int, Int) -- correct!
build3 (depth, start) = Node lt n rt -- intentionally incorrect
where
(lt, n) = build3 (depth-1, start) -- n is returned
(rt, m) = build3 (depth-1, n+1) -- and used, next
You will need to tweak the above to make all the pieces fit together (follow the types!), implementing the missing pieces of course and taking care of the corner / base cases.
Formulating this as an unfold would be the next step.

Can I use the Bundle module of the vector library?

Simple question: can I use the API under Data.Vector.Fusion.Bundle? Is it considered public and stable? And if yes: is there more information available how to use it?
An example would be writing gathering operations in trees where data is stored in vectors in the leaves:
data Tree a = Empty | Node a (Tree a) (Tree a) | Leaf a
toBundle Empty = B.empty
toBundle (Leaf x) = B.singleton x
toBundle (Node x l r) = toBundle l B.++ B.singleton x B.++ toBundle r
toVector t = V.unfoldrN s (\b -> Just (B.head b, B.tail b)) b
where b = toBundle t
(Just s) = B.upperBound (B.size b)
(There are no vectors in the leaves ... just imagine them :) )
As I wrote this I found out there is no Bundle v a -> v a function converting the bundle back to a vector ... am I missing something?
I think what you're looking for is the function: unstream :: Vector v a => Bundle v a -> v a as implemented in the data.vector.generic.(new) module.
Refering to your toVector code example, I would rather suggest to avoid relying on the size hint given for the manually constructed stream/bundle and instead use a (possibly) more efficient method for the construction of the vector, e.g. by using the function: generateM :: Monad m => Int -> (Int -> m a) -> m (Vector a).
In addition, from a pure conceptual view, I'm wondering whether Leaf a as it stands is basically equivalent to Node a Empty Empty and therefore possibly redundant? How's this: type Leaf a = Node a Empty Empty?

What is the correct definition of `unfold` for an untagged tree?

I've been thinking in how to implement the equivalent of unfold for the following type:
data Tree a = Node (Tree a) (Tree a) | Leaf a | Nil
It was not immediately obvious since the standard unfold for lists returns a value and the next seed. For this datatype, it doesn't make sense, since there is no "value" until you reach a leaf node. This way, it only really makes sense to return new seeds or stop with a value. I'm using this definition:
data Drive s a = Stop | Unit a | Branch s s deriving Show
unfold :: (t -> Drive t a) -> t -> Tree a
unfold fn x = case fn x of
Branch a b -> Node (unfold fn a) (unfold fn b)
Unit a -> Leaf a
Stop -> Nil
main = print $ unfold go 5 where
go 0 = Stop
go 1 = Unit 1
go n = Branch (n - 1) (n - 2)
While this seems to work, I'm not sure this is how it is supposed to be. So, that is the question: what is the correct way to do it?
If you think of a datatype as the fixpoint of a functor then you can see that your definition is the sensible generalisation of the list case.
module Unfold where
Here we start by definition the fixpoint of a functor f: it's a layer of f followed by some more fixpoint:
newtype Fix f = InFix { outFix :: f (Fix f) }
To make things slightly clearer, here are the definitions of the functors corresponding to lists and trees. They have basically the same shape as the datatypes except that we have replace the recursive calls by an extra parameter. In other words, they describe what one layer of list / tree looks like and are generic over the possible substructures r.
data ListF a r = LNil | LCons a r
data TreeF a r = TNil | TLeaf a | TBranch r r
Lists and trees are then respectively the fixpoints of ListF and TreeF:
type List a = Fix (ListF a)
type Tree a = Fix (TreeF a)
Anyways, hopping you now have a better intuition about this fixpoint business, we can see that there is a generic way of defining an unfold function for these.
Given an original seed as well as a function taking a seed and building one layer of f where the recursive structure are new seeds, we can build a whole structure:
unfoldFix :: Functor f => (s -> f s) -> s -> Fix f
unfoldFix node = go
where go = InFix . fmap go . node
This definition specialises to the usual unfold on list or your definition for trees. In other words: your definition was indeed the right one.

Monads with Join() instead of Bind()

Monads are usually explained in turns of return and bind. However, I gather you can also implement bind in terms of join (and fmap?)
In programming languages lacking first-class functions, bind is excruciatingly awkward to use. join, on the other hand, looks quite easy.
I'm not completely sure I understand how join works, however. Obviously, it has the [Haskell] type
join :: Monad m => m (m x) -> m x
For the list monad, this is trivially and obviously concat. But for a general monad, what, operationally, does this method actually do? I see what it does to the type signatures, but I'm trying to figure out how I'd write something like this in, say, Java or similar.
(Actually, that's easy: I wouldn't. Because generics is broken. ;-) But in principle the question still stands...)
Oops. It looks like this has been asked before:
Monad join function
Could somebody sketch out some implementations of common monads using return, fmap and join? (I.e., not mentioning >>= at all.) I think perhaps that might help it to sink in to my dumb brain...
Without plumbing the depths of metaphor, might I suggest to read a typical monad m as "strategy to produce a", so the type m value is a first class "strategy to produce a value". Different notions of computation or external interaction require different types of strategy, but the general notion requires some regular structure to make sense:
if you already have a value, then you have a strategy to produce a value (return :: v -> m v) consisting of nothing other than producing the value that you have;
if you have a function which transforms one sort of value into another, you can lift it to strategies (fmap :: (v -> u) -> m v -> m u) just by waiting for the strategy to deliver its value, then transforming it;
if you have a strategy to produce a strategy to produce a value, then you can construct a strategy to produce a value (join :: m (m v) -> m v) which follows the outer strategy until it produces the inner strategy, then follows that inner strategy all the way to a value.
Let's have an example: leaf-labelled binary trees...
data Tree v = Leaf v | Node (Tree v) (Tree v)
...represent strategies to produce stuff by tossing a coin. If the strategy is Leaf v, there's your v; if the strategy is Node h t, you toss a coin and continue by strategy h if the coin shows "heads", t if it's "tails".
instance Monad Tree where
return = Leaf
A strategy-producing strategy is a tree with tree-labelled leaves: in place of each such leaf, we can just graft in the tree which labels it...
join (Leaf tree) = tree
join (Node h t) = Node (join h) (join t)
...and of course we have fmap which just relabels leaves.
instance Functor Tree where
fmap f (Leaf x) = Leaf (f x)
fmap f (Node h t) = Node (fmap f h) (fmap f t)
Here's an strategy to produce a strategy to produce an Int.
Toss a coin: if it's "heads", toss another coin to decide between two strategies (producing, respectively, "toss a coin for producing 0 or producing 1" or "produce 2"); if it's "tails" produce a third ("toss a coin for producing 3 or tossing a coin for 4 or 5").
That clearly joins up to make a strategy producing an Int.
What we're making use of is the fact that a "strategy to produce a value" can itself be seen as a value. In Haskell, the embedding of strategies as values is silent, but in English, I use quotation marks to distinguish using a strategy from just talking about it. The join operator expresses the strategy "somehow produce then follow a strategy", or "if you are told a strategy, you may then use it".
(Meta. I'm not sure whether this "strategy" approach is a suitably generic way to think about monads and the value/computation distinction, or whether it's just another crummy metaphor. I do find leaf-labelled tree-like types a useful source of intuition, which is perhaps not a surprise as they're the free monads, with just enough structure to be monads at all, but no more.)
PS The type of "bind"
(>>=) :: m v -> (v -> m w) -> m w
says "if you have a strategy to produce a v, and for each v a follow-on strategy to produce a w, then you have a strategy to produce a w". How can we capture that in terms of join?
mv >>= v2mw = join (fmap v2mw mv)
We can relabel our v-producing strategy by v2mw, producing instead of each v value the w-producing strategy which follows on from it — ready to join!
join = concat -- []
join f = \x -> f x x -- (e ->)
join f = \s -> let (f', s') = f s in f' s' -- State
join (Just (Just a)) = Just a; join _ = Nothing -- Maybe
join (Identity (Identity a)) = Identity a -- Identity
join (Right (Right a)) = Right a; join (Right (Left e)) = Left e;
join (Left e) = Left e -- Either
join ((a, m), m') = (a, m' `mappend` m) -- Writer
-- N.B. there is a non-newtype-wrapped Monad instance for tuples that
-- behaves like the Writer instance, but with the tuple order swapped
join f = \k -> f (\f' -> f' k) -- Cont
Calling fmap (f :: a -> m b) (x ::ma) produces values (y ::m(m b)) so it is a very natural thing to use join to get back values (z :: m b).
Then bind is defined simply as bind ma f = join (fmap f ma), thus achieving the Kleisly compositionality of functions of (:: a -> m b) variety, which is what it is really all about:
ma `bind` (f >=> g) = (ma `bind` f) `bind` g -- bind = (>>=)
= (`bind` g) . (`bind` f) $ ma
= join . fmap g . join . fmap f $ ma
And so, with flip bind = (=<<), we have
((g <=< f) =<<) = (g =<<) . (f =<<) = join . (g <$>) . join . (f <$>)
OK, so it's not really good form to answer your own question, but I'm going to note down my thinking in case it enlightens anybody else. (I doubt it...)
If a monad can be thought of as a "container", then both return and join have pretty obvious semantics. return generates a 1-element container, and join turns a container of containers into a single container. Nothing hard about that.
So let us focus on monads which are more naturally thought of as "actions". In that case, m x is some sort of action which yields a value of type x when you "execute" it. return x does nothing special, and then yields x. fmap f takes an action that yields an x, and constructs an action that computes x and then applies f to it, and returns the result. So far, so good.
It's fairly obvious that if f itself generates an action, then what you end up with is m (m x). That is, an action that computes another action. In a way, that's maybe even simpler to wrap your mind around than the >>= function which takes an action and a "function that produces an action" and so on.
So, logically speaking, it seems join would run the first action, take the action it produces, and then run that. (Or rather, join would return an action that does what I just described, if you want to split hairs.)
That seems to be the central idea. To implement join, you want to run an action, which then gives you another action, and then you run that. (Whatever "run" happens to mean for this particular monad.)
Given this insight, I can take a stab at writing some join implementations:
join Nothing = Nothing
join (Just mx) = mx
If the outer action is Nothing, return Nothing, else return the inner action. Then again, Maybe is more of a container than an action, so let's try something else...
newtype Reader s x = Reader (s -> x)
join (Reader f) = Reader (\ s -> let Reader g = f s in g s)
That was... painless. A Reader is really just a function that takes a global state and only then returns its result. So to unstack, you apply the global state to the outer action, which returns a new Reader. You then apply the state to this inner function as well.
In a way, it's perhaps easier than the usual way:
Reader f >>= g = Reader (\ s -> let x = f s in g x)
Now, which one is the reader function, and which one is the function that computes the next reader...?
Now let's try the good old State monad. Here every function takes an initial state as input but also returns a new state along with its output.
data State s x = State (s -> (s, x))
join (State f) = State (\ s0 -> let (s1, State g) = f s0 in g s1)
That wasn't too hard. It's basically run followed by run.
I'm going to stop typing now. Feel free to point out all the glitches and typos in my examples... :-/
I've found many explanations of monads that say "you don't have to know anything about category theory, really, just think of monads as burritos / space suits / whatever".
Really, the article that demystified monads for me just said what categories were, described monads (including join and bind) in terms of categories, and didn't bother with any bogus metaphors:
http://en.wikibooks.org/wiki/Haskell/Category_theory
I think the article is very readable without much math knowledge required.
Asking what a type signature in Haskell does is rather like asking what an interface in Java does.
It, in some literal sense, "doesn't". (Though, of course, you will typically have some sort of purpose associated with it, that's mostly in your mind, and mostly not in the implementation.)
In both cases you are declaring legal sequences of symbols in the language which will be used in later definitions.
Of course, in Java, I suppose you could say that an interface corresponds to a type signature which is going to be implemented literally in the VM. You can get some polymorphism this way -- you can define a name that accepts an interface, and you can provide a different definition for the name which accepts a different interface. Something similar happens in Haskell, where you can provide a declaration for a name which accepts one type and then another declaration for that name which treats a different type.
This is Monad explained in one picture. The 2 functions in the green category are not composable, when being mapped to the blue category with join . fmap (strictly speaking, they are one category), they become composable. Monad is about turning a function of type T -> Monad<U> into a function of type Monad<T> -> Monad<U>.

Monad instance for binary tree

I built binary tree with:
data Tree a = Empty
| Node a (Tree a) (Tree a)
deriving (Eq, Ord, Read, Show)
How can i make Monad type class instance for this tree? And can i make it on not?
i try:
instance Monad Tree where
return x = Node x Empty Empty
Empty >>= f = Empty
(Node x Empty Empty) >>= f = f x
But i can't make (>>=) for Node x left right.
Thank you.
There is no (good) monad for the type you just described, exactly. It would require rebalancing the tree and merging together the intermediate trees that are generated by the bind, and you can't rebalance based on any information in 'a' because you know nothing about it.
However, there is a similar tree structure
data Tree a = Tip a | Bin (Tree a) (Tree a)
which admits a monad
instance Monad Tree where
return = Tip
Tip a >>= f = f a
Bin l r >>= f = Bin (l >>= f) (r >>= f)
I talked about this and other tree structures a year or two back at Boston Haskell as a lead-in to talking about finger trees. The slides there may be helpful in exploring the difference between leafy and traditional binary trees.
The reason I said there is no good monad, is that any such monad would have to put the tree into a canonical form for a given number of entries to pass the monad laws or quotient out some balance concerns by not exposing the constructors to the end user, but doing the former would require much more stringent reordering than you get for instance from an AVL or weighted tree.

Resources