What is the correct definition of `unfold` for an untagged tree? - haskell

I've been thinking in how to implement the equivalent of unfold for the following type:
data Tree a = Node (Tree a) (Tree a) | Leaf a | Nil
It was not immediately obvious since the standard unfold for lists returns a value and the next seed. For this datatype, it doesn't make sense, since there is no "value" until you reach a leaf node. This way, it only really makes sense to return new seeds or stop with a value. I'm using this definition:
data Drive s a = Stop | Unit a | Branch s s deriving Show
unfold :: (t -> Drive t a) -> t -> Tree a
unfold fn x = case fn x of
Branch a b -> Node (unfold fn a) (unfold fn b)
Unit a -> Leaf a
Stop -> Nil
main = print $ unfold go 5 where
go 0 = Stop
go 1 = Unit 1
go n = Branch (n - 1) (n - 2)
While this seems to work, I'm not sure this is how it is supposed to be. So, that is the question: what is the correct way to do it?

If you think of a datatype as the fixpoint of a functor then you can see that your definition is the sensible generalisation of the list case.
module Unfold where
Here we start by definition the fixpoint of a functor f: it's a layer of f followed by some more fixpoint:
newtype Fix f = InFix { outFix :: f (Fix f) }
To make things slightly clearer, here are the definitions of the functors corresponding to lists and trees. They have basically the same shape as the datatypes except that we have replace the recursive calls by an extra parameter. In other words, they describe what one layer of list / tree looks like and are generic over the possible substructures r.
data ListF a r = LNil | LCons a r
data TreeF a r = TNil | TLeaf a | TBranch r r
Lists and trees are then respectively the fixpoints of ListF and TreeF:
type List a = Fix (ListF a)
type Tree a = Fix (TreeF a)
Anyways, hopping you now have a better intuition about this fixpoint business, we can see that there is a generic way of defining an unfold function for these.
Given an original seed as well as a function taking a seed and building one layer of f where the recursive structure are new seeds, we can build a whole structure:
unfoldFix :: Functor f => (s -> f s) -> s -> Fix f
unfoldFix node = go
where go = InFix . fmap go . node
This definition specialises to the usual unfold on list or your definition for trees. In other words: your definition was indeed the right one.

Related

Understanding the Fix datatype in Haskell

In this article about the Free Monads in Haskell we are given a Toy datatype defined by:
data Toy b next =
Output b next
| Bell next
| Done
Fix is defined as follows:
data Fix f = Fix (f (Fix f))
Which allows to nest Toy expressions by preserving a common type:
Fix (Output 'A' (Fix Done)) :: Fix (Toy Char)
Fix (Bell (Fix (Output 'A' (Fix Done)))) :: Fix (Toy Char)
I understand how fixed points work for regular functions but I'm failing to see how the types are reduced in here. Which are the steps the compiler follows to evaluate the type of the expressions?
I'll make a more familiar, simpler type using Fix to see if you'll understand it.
Here's the list type in a normal recursive definition:
data List a = Nil | Cons a (List a)
Now, thinking back at how we use fix for functions, we know that we have to pass the function to itself as an argument. In fact, since List is recursive, we can write a simpler nonrecursive datatype like so:
data Cons a recur = Nil | Cons a recur
Can you see how this is similar to, say, the function f a recur = 1 + recur a? In the same way that fix would pass f as an argument to itself, Fix passes Cons as an argument to itself. Let's inspect the definitions of fix and Fix side-by-side:
fix :: (p -> p) -> p
fix f = f (fix f)
-- Fix :: (* -> *) -> *
newtype Fix f = Fix {nextFix :: f (Fix f)}
If you ignore the fluff of the constructor names and so on, you'll see that these are essentially exactly the same definition!
For the example of the Toy datatype, one could just define it recursively like so:
data Toy a = Output a (Toy a) | Bell (Toy a) | Done
However, we could use Fix to pass itself into itself, replacing all instances of Toy a with a second type parameter:
data ToyStep a recur = OutputS a recur | BellS recur | DoneS
so, we can then just use Fix (ToyStep a), which will be equivalent to Toy a, albeit in a different form. In fact, let's demonstrate them to be equivalent:
toyToStep :: Toy a -> Fix (ToyStep a)
toyToStep (Output a next) = Fix (OutputS a (toyToStep next))
toyToStep (Bell next) = Fix (BellS (toyToStep next))
toyToStep Done = Fix DoneS
stepToToy :: Fix (ToyStep a) -> Toy a
stepToToy (Fix (OutputS a next)) = Output a (stepToToy next)
stepToToy (Fix (BellS next)) = Bell (stepToToy next)
stepToToy (Fix (DoneS)) = DoneS
You might be wondering, "Why do this?" Well usually, there's not much reason to do this. However, defining these sort of simplified versions of datatypes actually allow you to make quite expressive functions. Here's an example:
unwrap :: Functor f => (f k -> k) -> Fix f -> k
unwrap f n = f (fmap (unwrap f) n)
This is really an incredible function! It surprised me when I first saw it! Here's an example using the Cons datatype we made earlier, assuming we made a Functor instance:
getLength :: Cons a Int -> Int
getLength Nil = 0
getLength (Cons _ len) = len + 1
length :: Fix (Cons a) -> Int
length = unwrap getLength
This essentially is fix for free, given that we use Fix on whatever datatype we use!
Let's now imagine a function, given that ToyStep a is a functor instance, that simply collects all the OutputSs into a list, like so:
getOutputs :: ToyStep a [a] -> [a]
getOutputs (OutputS a as) = a : as
getOutputs (BellS as) = as
getOutputs DoneS = []
outputs :: Fix (ToyStep a) -> [a]
outputs = unwrap getOutputs
This is the power of using Fix rather than having your own datatype: generality.

Finding a linear path in a traversable structure

Given a list of steps:
>>> let path = ["item1", "item2", "item3", "item4", "item5"]
And a labeled Tree:
>>> import Data.Tree
>>> let tree = Node "item1" [Node "itemA" [], Node "item2" [Node "item3" []]]
I'd like a function that goes through the steps in path matching the labels in tree until it can't go any further because there are no more labels matching the steps. Concretely, here it falls when stepping into "item4" (for my use case I still need to specify the last matched step):
>>> trav path tree
["item3", "item4", "item5"]
If I allow [String] -> Tree String -> [String] as the type of trav I could write a recursive function that steps in both structures at the same time until there are no labels to match the step. But I was wondering if a more general type could be used, specifically for Tree. For example: Foldable t => [String] -> t String -> [String]. If this is possible, how trav could be implemented?
I suspect there could be a way to do it using lens.
First, please let's use type Label = String. String is not exactly descriptive and might not be ideal in the end...
Now. To use Traversable, you need to pick a suitable Applicative that can contain the information you need for deciding what to do in its "structure". You only need to pass back information after a match has failed. That sounds like some Either!
A guess would thus be Either [Label] (t Label) as the pre-result. That would mean, we use the instantiation
traverse :: Traversable t
=> (Label -> Either [Label] Label) -> t Label -> Either [Label] (t Label)
So what can we pass as the argument function?
travPt0 :: [Label] -> Label -> Either [Label] Label
travPt0 ls#(l0 : _) label
| l0 /= label = Left ls
| otherwise = Right label ?
The problem is, traverse will then fail immediately and completely if any node has a non-matching label. Traversable doesn't actually have a notion of "selectively" diving down into a data structure, it just passes through everything, always. Actually, we only want to match on the topmost node at first, only that one is mandatory to match at first.
One way to circumvent immediate deep-traversal is to first split up the tree into a tree of sub-trees. Ok, so... we need to extract the topmost label. We need to split the tree in subtrees. Reminds you of anything?
trav' :: (Traversable t, Comonad t) => [Label] -> t Label -> [Label]
trav' (l0 : ls) tree
| top <- extract tree
= if top /= l0 then l0 : ls
else let subtrees = duplicate tree
in ... ?
Now amongst those subtrees, we're basically interested only in the one that matches. This can be determined from the result of trav': if the second element is passed right back again, we have a failure. Unlike normal nomenclature with Either, this means we wish to go on, but not use that branch! So we need to return Either [Label] ().
else case ls of
[] -> [l0]
l1:ls' -> let subtrees = duplicate tree
in case traverse (trav' ls >>> \case
(l1':_)
| l1'==l1 -> Right ()
ls'' -> Left ls''
) subtrees of
Left ls'' -> ls''
Right _ -> l0 : ls -- no matches further down.
I have not tested this code!
We'll take as reference the following recursive model
import Data.List (minimumBy)
import Data.Ord (comparing)
import Data.Tree
-- | Follows a path into a 'Tree' returning steps in the path which
-- are not contained in the 'Tree'
treeTail :: Eq a => [a] -> Tree a -> [a]
treeTail [] _ = []
treeTail (a:as) (Node a' trees)
| a == a' = minimumBy (comparing length)
$ (a:as) : map (treeTail as) trees
| otherwise = as
which suggests that the mechanism here is less that we're traversing through the tree accumulating (which is what a Traversable instance might do) but more that we're stepping through the tree according to some state and searching for the deepest path.
We can characterize this "step" by a Prism if we like.
import Control.Lens
step :: Eq a => a -> Prism' (Tree a) (Forest a)
step a =
prism' (Node a)
(\n -> if rootLabel n == a
then Just (subForest n)
else Nothing)
This would allow us to write the algorithm as
treeTail :: Eq a => [a] -> Tree a -> [a]
treeTail [] _ = []
treeTail pth#(a:as) t =
maybe (a:as)
(minimumBy (comparing length) . (pth:) . map (treeTail as))
(t ^? step a)
but I'm not sure that's significantly more clear.

QuickCheck giving up investigating a recursive data structure (rose tree.)

Given an arbitrary tree, I can construct a subtype relation over that tree, using Schubert numbering:
constructH :: Tree a -> Tree (Type a)
where Type nests the original label, and additionally provides the data needed to perform child/parent (or subtype) checks. With Schubert Numbering, the two Int parameters are sufficient for that.
data Type a where !Int -> !Int -> a -> Type a
This leads to the binary predicate
subtypeOf :: Type a -> Type a -> Bool
I now want to test with QuickCheck that this does indeed do what I want it to do. The following property, however, does not work, because QuickCheck just gives up:
subtypeSanity ∷ Tree (Type ()) → Gen Prop
subtypeSanity Node { rootLabel = t, subForest = f } =
let subtypes = concatMap flatten f
in (not $ null subtypes) ==> conjoin
(forAll (elements subtypes) (\x → x `subtypeOf` t):(map subtypeSanity f))
If I leave out the recursive call to subtypeSanity, i.e. the tail of the list I'm passing to conjoin, the property runs fine, but tests just the root node of the tree! How can I descend into my data structure recursively without QuickCheck giving up on generating new test cases?
If needed, I could provide the code to construct the Schubert Hierarchy, and the Arbitrary instance for Tree (Type a), to provide a complete runnable example, but that would be quite a bit of code. I'm convinced that I'm just not "getting" QuickCheck, and using it in the wrong way here.
EDIT: unfortunately, the sized function does not seem to eliminate the problem here. It ends up with the same result (see comment to J. Abrahamson's answer.)
EDIT II: I ended up "fixing" my problem by avoiding the recursive step, and avoiding conjoin. We just make a list of all nodes in the tree, then test the single-node property (which worked fine from the beginning) on those.
allNodes ∷ Tree a → [Tree a]
allNodes n#(Node { subForest = f }) = n:(concatMap allNodes f)
subtypeSanity ∷ Tree (Type ()) → Gen Prop
subtypeSanity tree = forAll (elements $ allNodes tree)
(\(Node { rootLabel = t, subForest = f }) →
let subtypes = concatMap flatten f
in (not $ null subtypes) ==> forAll (elements subtypes) (\x → x `subtypeOf` t))
Tweaking the Arbitrary instance for trees did not work. Here is the arbitrary instance I'm still using:
instance (Arbitrary a, Eq a) ⇒ Arbitrary (Tree (Type a)) where
arbitrary = liftM (constructH) $ sized arbTree
arbTree ∷ Arbitrary a ⇒ Int → Gen (Tree a)
arbTree n = do
m ← choose (0,n)
if m == 0
then Node <$> arbitrary <*> (return [])
else do part ← randomPartition n m
Node <$> arbitrary <*> mapM arbTree part
-- this is a crude way to find a sufficiently random x1,..,xm,
-- such that x1 + .. + xm = n, for any n, m, with 0 < m.
randomPartition ∷ Int → Int → Gen [Int]
randomPartition n m' = do
let m = m' - 1
seed ← liftM ((++[n]) . sort) $ replicateM m (choose (0,n))
return $ zipWith (-) seed (0:seed)
I consider the problem "solved for now," but if someone could explain to me why the recursive step and/or conjoin made QuickCheck give up (after passing "only" 0 tests,) I would be more than grateful.
When generating Arbitrary recursive structures, QuickCheck is often a bit too eager and generates sprawling, enormous random examples. These are undesirable as they usually don't better check the properties of interest and can be very slow. Two solutions are
Use things like the size parameter (sized function) and frequency function to bias the generator toward small trees.
Use a small-type oriented generator like those in smallcheck. These try to exhaustively generate all "small" examples and thus help to keep the size of the tree down.
To clarify the sized and frequency method of controlling generation size, here's an example RoseTree
data Rose a = It a | Rose [Rose a]
instance Arbitrary a => Arbitrary (Rose a) where
arbitrary = frequency
[ (3, It <$> arbitrary) -- The 3-to-1 ratio is chosen, ah,
-- arbitrarily...
-- you'll want to tune it
, (1, Rose <$> children)
]
where children = sized $ \n -> vectorOf n arbitrary
It can be done even more simply with a different Rose formation by very carefully controlling the size of the child list
data Rose a = Rose a [Rose a]
instance Arbitrary a => Arbitrary (Rose a) where
arbitrary = Rose <$> arbitrary <*> sized (\n -> vectorOf (tuneUp n) arbitrary)
where tuneUp n = round $ fromIntegral n / 4.0
You could do this without referencing sized, but that gives the user of your Arbitrary instance a knob to ask for larger trees if needed.
In case it's useful for those stumbling across this issue: when QuickCheck "gives up", it's a sign that your pre-condition (using ==>) is too hard to satisfy.
QuickCheck uses a simple rejection sampling technique: pre-conditions have no effect on the generation of values. QuickCheck generates a bunch of random values like normal. After these are generated, they're sent through the pre-condition: if the result is True, the property is tested with that value; if it's False, that value is discarded. If your pre-condition rejects most of the values QuickCheck has generated, then QuickCheck will "give up" (better to give up completely, than to make statistically dubious pass/fail claims).
In particular, QuickCheck will not attempt to produce values which satisfy a given pre-condition. It's up to you to make sure that the generator you're using (arbitrary or otherwise) produces lots of values which pass your pre-condition.
Let's see how this is manifesting in your example:
subtypeSanity :: Tree (Type ()) -> Gen Prop
subtypeSanity Node { rootLabel = t, subForest = f } =
let subtypes = concatMap flatten f
in (not $ null subtypes) ==> conjoin
(forAll (elements subtypes) (`subtypeOf` t):(map subtypeSanity f))
There is only one occurance of ==>, so its precondition (not $ null subtypes) must be too hard to satisfy. This is due to the recursive call map subtypeSanity f: not only are you rejecting any Tree which has an empty subForest, you're also (due to the recursion) rejecting any Tree where the subForest contains Trees with empty subForests, and rejecting any Tree where the subForest contains Trees with subForests containing Trees with empty subForests, and so on.
According to your arbitrary instance, Trees are only nested to finite depth: eventually we will always reach an empty subForest, hence your recursive precondition will always fail, and QuickCheck will give up.

Is there an elegant way to have functions return functions of the same type (in a tuple)

I'm using haskell to implement a pattern involving functions that return a value, and themselves (or a function of the same type). Right now I've implemented this like so:
newtype R a = R (a , a -> R a)
-- some toy functions to demonstrate
alpha :: String -> R String
alpha str
| str == reverse str = R (str , omega)
| otherwise = R (reverse str , alpha)
omega :: String -> R String
omega (s:t:r)
| s == t = R (s:t:r , alpha)
| otherwise = R (s:s:t:r , omega)
The driving force for these types of functions is a function called cascade:
cascade :: (a -> R a) -> [a] -> [a]
cascade _ [] = []
cascade f (l:ls) = el : cascade g ls where
R (el , g) = f l
Which takes a seed function and a list, and returns a list created by applying the seed function to the first element of the list, applying the function returned by that to the second element of the list, and so on and so forth.
This works--however, in the process of using this for slightly more useful things, I noticed that a lot of times I had the basic units of which are functions that returned functions other than themselves only rarely; and explicitly declaring a function to return itself was becoming somewhat tedious. I'd rather be able to use something like a Monad's return function, however, I have no idea what bind would do for functions of these types, especially since I never intended these to be linked with anything other than the function they return in the first place.
Trying to shoehorn this into a Monad started worrying me about whether or not what I was doing was useful, so, in short, what I want to know is:
Is what I'm doing a Bad Thing? if not,
Has what I'm doing been done before/am I reinventing the wheel here? if not,
Is there an elegant way to do this, or have I already reached this and am being greedy by wanting some kind of return analogue?
(Incidentally, besides, 'functions that return themeselves' or 'recursive data structure (of functions)', I'm not quite sure what this kind of pattern is called, and has made trying to do effective research in it difficult--if anyone could give me a name for this pattern (if it indeed has one), that alone would be very helpful)
As a high-level consideration, I'd say that your type represents a stateful stream transformer. What's a bit confusing here is that your type is defined as
newtype R a = R (a , a -> R a)
instead of
newtype R a = R (a -> (R a, a))
which would be a bit more natural in the streaming context because you typically don't "produce" something if you haven't received anything yet. Your functions would then have simpler types too:
alpha, omage :: R String
cascade :: R a -> [a] -> [a]
If we try to generalize this idea of a stream transformer, we soon realize that the case where we transform a list of as into a list of as is just a special case. With the proper infrastructure in place we could just as well produce a list of bs. So we try to generalize the type R:
newtype R a b = R (a -> (R a b, b))
I've seen this kind of structure being called a Circuit, which happens to be a full-blown arrow. Arrows are a generalization of the concept of functions and are an even more powerful construct than monads. I can't pretend to understand the category-theoretical background, but it's definitely interesting to play with them. For example, the trivial transformation is just Cat.id:
import Control.Category
import Control.Arrow
import Prelude hiding ((.), id)
import qualified Data.List as L
-- ... Definition of Circuit and instances
cascade :: Circuit a b -> [a] -> [b]
cascade cir = snd . L.mapAccumL unCircuit cir
--
ghci> cascade (Cat.id) [1,2,3,4]
[1,2,3,4]
We can also simulate state by parameterizing the circuit we return as the continuation:
countingCircuit :: (a -> b) -> Circuit a (Int, b)
countingCircuit f = cir 0
where cir i = Circuit $ \x -> (cir (i+1), (i, f x))
--
ghci> cascade (countingCircuit (+5)) [10,3,2,11]
[(0,15),(1,8),(2,7),(3,16)]
And the fact that our circuit type is a category gives us a nice way to compose circuits:
ghci> cascade (countingCircuit (+5) . arr (*2)) [10,3,2,11]
[(0,25),(1,11),(2,9),(3,27)]
It looks like what you have is a simplified version of a stream. That is to
say, a representation of an infinite stream of values. I don't think you can
easily define this as a monad, because you use the same type for your seed as
for your elements, which makes defining fmap difficult (it seems that you
would need to invert the function provided to fmap so as to be able to
recover the seed). You can make this a monad by making the seed type
independent of the element type like so
{-# LANGUAGE ExistentialQuantification #-}
data Stream a = forall s. Stream a s (s -> Stream a)
This will allow you to define a Functor and Monad instance as follows
unfold :: (b -> (a, b)) -> b -> Stream a
unfold f b = Stream a b' (unfold f)
where (a, b') = f b
shead :: Stream a -> a
shead (Stream a _ _) = a
stail :: Stream a -> Stream a
stail (Stream _ b f) = f b
diag :: Stream (Stream a) -> Stream a
diag = unfold f
where f str = (shead $ shead str, stail $ fmap stail str)
sjoin :: Stream (Stream a) -> Stream a
sjoin = diag
instance Functor Stream where
fmap f (Stream a b g) = Stream (f a) b (fmap f . g)
instance Monad Stream where
return = unfold (\x -> (x, x))
xs >>= f = diag $ fmap f xs
Note that this only obeys the Monad laws when viewed as a set, as it does not
preserve element ordering.
This explanation
of the stream monad uses infinite lists, which works just as well in Haskell
since they can be generated in a lazy fashion. If you check out the
documentation for the Stream type in the vector library, you will
find a more complicated version, so that it can be used in efficient stream fusion.
I don't have much to add, except to note that your cascade function can be written as a left fold (and hence also as a right fold, though I haven't done the transformation.)
cascade f = reverse . fst . foldl func ([], f)
where
func (rs,g) s = let R (r,h) = g s in (r:rs,h)

Catamorphism and tree-traversing in Haskell

I am impatient, looking forward to understanding catamorphism related to this SO question :)
I have only practiced the beginning of Real World Haskell tutorial. So, Maybe I'm gonna ask for way too much right now, if it was the case, just tell me the concepts I should learn.
Below, I quote the wikipedia code sample for catamorphism.
I would like to know your opinion about foldTree below, a way of traversing a Tree, compared to this other SO question and answer, also dealing with traversing a Tree n-ary tree traversal. (independantly from being binary or not, I think the catamorphism below can be written so as to manage n-ary tree)
I put in comment what I understand, and be glad if you could correct me, and clarify some things.
{-this is a binary tree definition-}
data Tree a = Leaf a
| Branch (Tree a) (Tree a)
{-I dont understand the structure between{}
however it defines two morphisms, leaf and branch
leaf take an a and returns an r, branch takes two r and returns an r-}
data TreeAlgebra a r = TreeAlgebra { leaf :: a -> r
, branch :: r -> r -> r }
{- foldTree is a morphism that takes: a TreeAlgebra for Tree a with result r, a Tree a
and returns an r -}
foldTree :: TreeAlgebra a r -> Tree a -> r
foldTree a#(TreeAlgebra {leaf = f}) (Leaf x ) = f x
foldTree a#(TreeAlgebra {branch = g}) (Branch l r) = g (foldTree a l) (foldTree a r)
at this point I am having many difficulties, I seem to guess that the morphism leaf
will be applied to any Leaf
But so as to use this code for real, foldTree needs to be fed a defined TreeAlgebra,
a TreeAlgebra that has a defined morphism leaf so as to do something ?
but in this case in the foldTree code I would expect {f = leaf} and not the contrary
Any clarification from you would be really welcome.
Not exactly sure what you're asking. But yeah, you feed a TreeAlgebra to foldTree corresponding to the computation you want to perform on the tree. For example, to sum all the elements in a tree of Ints you would use this algebra:
sumAlgebra :: TreeAlgebra Int Int
sumAlgebra = TreeAlgebra { leaf = id
, branch = (+) }
Which means, to get the sum of a leaf, apply id (do nothing) to the value in the leaf. To get the sum of a branch, add together the sums of each of the children.
The fact that we can say (+) for branch instead of, say, \x y -> sumTree x + sumTree y is the essential property of the catamorphism. It says that to compute some function f on some recursive data structure it suffices to have the values of f for its immediate children.
Haskell is a pretty unique language in that we can formalize the idea of catamorphism abstractly. Let's make a data type for a single node in your tree, parameterized over its children:
data TreeNode a child
= Leaf a
| Branch child child
See what we did there? We just replaced the recursive children with a type of our choosing. This is so that we can put the subtrees' sums there when we are folding.
Now for the really magical thing. I'm going to write this in pseudohaskell -- writing it in real Haskell is possible, but we have to add some annotations to help the typechecker which can be kind of confusing. We take the "fixed point" of a parameterized data type -- that is, constructing a data type T such that T = TreeNode a T. They call this operator Mu.
type Mu f = f (Mu f)
Look carefully here. The argument to Mu isn't a type, like Int or Foo -> Bar. It's a type constructor like Maybe or TreeNode Int -- the argument to Mu itself takes an argument. (The possibility of abstracting over type constructors is one of the things that makes Haskell's type system really stand out in its expressive power).
So the type Mu f is defined as taking f and filling in its type parameter with Mu f itself. I'm going to define a synonym to reduce some of the noise:
type IntNode = TreeNode Int
Expanding Mu IntNode, we get:
Mu IntNode = IntNode (Mu IntNode)
= Leaf Int | Branch (Mu IntNode) (Mu IntNode)
Do you see how Mu IntNode is equivalent to your Tree Int? We have just torn the recursive structure apart and then used Mu to put it back together again. This gives us the advantage that we can talk about all Mu types at once. This gives us what we need to define a catamorphism.
Let's define:
type IntTree = Mu IntNode
I said the essential property of the catamorphism is that to compute some function f, it suffices to have the values of f for its immediate children. Let's call the type of the thing we are trying to compute r, and the data structure node (IntNode would be a possible instantiation of this). So to compute r on a particular node, we need the node with its children replaced with their rs. This computation has type node r -> r. So a catamorphism says that if we have one of these computations, then we can compute r for the entire recursive structure (remember recursion is denoted explicitly here with Mu):
cata :: (node r -> r) -> Mu node -> r
Making this concrete for our example, this looks like:
cata :: (IntNode r -> r) -> IntTree -> r
Restating, if we can take a node with rs for its children and compute an r, then we can compute an r for an entire tree.
In order to actually compute this, we need node to be a Functor -- that is we need to be able to map an arbitrary function over the children of a node.
fmap :: (a -> b) -> node a -> node b
This can be done straightforwardly for IntNode.
fmap f (Leaf x) = Leaf x -- has no children, so stays the same
fmap f (Branch l r) = Branch (f l) (f r) -- apply function to each child
Now, finally, we can give a definition for cata (the Functor node constraint just says that node has a suitable fmap):
cata :: (Functor node) => (node r -> r) -> Mu node -> r
cata f t = f (fmap (cata f) t)
I used the parameter name t for the mnemonic value of "tree". This is an abstract, dense definition, but it is really very simple. It says: recursively perform cata f -- the computation we are doing over the tree -- on each of t's children (which are themselves Mu nodes) to get a node r, and then pass that result to f compute the result for t itself.
Tying this back to the beginning, the algebra you are defining is essentially a way of defining that node r -> r function. Indeed, given a TreeAlgebra, we can easily get the fold function:
foldFunction :: TreeAlgebra a r -> (TreeNode a r -> r)
foldFunction alg (Leaf a) = leaf alg a
foldFunction alg (Branch l r) = branch alg l r
Thus the tree catamorphism can be defined in terms of our generic one as follows:
type Tree a = Mu (TreeNode a)
treeCata :: TreeAlgebra a r -> (Tree a -> r)
treeCata alg = cata (foldFunction alg)
I'm out of time. I know that got really abstract really fast, but I hope it at least gave you a new viewpoint to help your learning. Good luck!
I think you were were asking a question about the {}'s. There is an earlier question with a good discussion of {}'s. Those are called Haskell's record syntax. The other question is why construct the algebra. This is a typical function paradigm where you generalize data as functions.
The most famous example is Church's construction of the Naturals, where f = + 1 and z = 0,
0 = z,
1 = f z,
2 = f (f z),
3 = f (f (f z)),
etc...
What you are seeing is essentially the same idea being applied to a tree. Work the church example and the tree will click.

Resources