What is the most general way to compute the depth of a tree with something like a fold? - haskell

What minimal (most general) information is required to compute depth of a Data.Tree? Is instance of a Data.Foldable sufficient?
I initially tried to fold a Tree and got stuck trying to find right Monoid similar to Max. Something tells me that since Monoid (that would compute depth) needs to be associative, it probably cannot be used to express any fold that needs to be aware of the structure (as in 1 + maxChildrenDepth), but I'm not certain.
I wonder what thought process would let me arrive at right abstraction for such cases.

I can't say if it's a minimal/most general amount of information. But one general solution is that a given structure
is a catamorphism
underlying functor of the catamorphism is Foldable so that it's possible to enumerate sub-terms.
Here is sample code using recursion-schemes.
{-# LANGUAGE TypeFamilies, FlexibleContexts #-}
import Data.Functor.Foldable
import Data.Semigroup
import Data.Tree
depth :: (Recursive f, Foldable (Base f)) => f -> Int
depth = cata ((+ 1) . maybe 0 getMax . getOption
. foldMap (Option . Just . Max))
-- Necessary instances for Tree:
data TreeF a t = NodeF { rootLabel' :: a, subForest :: [t] }
type instance Base (Tree a) = TreeF a
instance Functor (TreeF a) where
fmap f (NodeF x ts) = NodeF x (map f ts)
instance Foldable (TreeF a) where
foldMap f (NodeF _ ts) = foldMap f ts
instance Recursive (Tree a) where
project (Node x ts) = NodeF x ts

To answer the first question: Data.Foldable is not enough to compute the depth of the tree. The minimum complete definition of Foldable is foldr, which always has the following semantics:
foldr f z = Data.List.foldr f z . toList
In other words, a Foldable instance is fully characterized by how it behaves on a list projection of the input (ie toList), which will throw away the depth information of a tree.
Other ways of verifying this idea involve the fact that Foldable depends on a monoid instance which has to be associative or the fact that the various fold functions see the elements one by one in some particular order with no other information, which necessarily throws out the actual tree structure. (There has to be more than one tree with the same set of elements in the same relative order.)
I'm not sure what the minimal abstraction would be for trees specifically, but I think the core of your question is actually a bit broader: it would be interesting to see what minimum amount of information is needed to compute arbitrary facts about a type with a fold-like function.
To do this, the actual helper function in the fold would have to take a different sort of argument for each sort of data structure. This naturally leads us to catamorphisms, which are generalized folds over different data types.
You can read more about these generalized folds on a different Stack Overflow question: What constitutes a fold for types other than list? (In the interest of disclosure/self-promotion, I wrote one of the answeres there :P.)

Related

The simplest way to generically traverse a tree in haskell

Suppose I used language-javascript library to build AST in Haskell. The AST has nodes of different types, and each node can have fields of those different types.
And each type can have numerous constructors. (All the types instantiate Data, Eq and Show).
I would like to count each type's constructor occurrence in the tree. I could use toConstr to get the constructor, and ideally I'd make a Tree -> [Constr] function fisrt (then counting is easy).
There are different ways to do that. Obviously pattern matching is too verbose (imagine around 3 types with 9-28 constructors).
So I'd like to use a generic traversal, and I tried to find the solution in SYB library.
There is an everywhere function, which doesn't suit my needs since I don't need a Tree -> Tree transformation.
There is gmapQ, which seems suitable in terms of its type, but as it turns out it's not recursive.
The most viable option so far is everywhereM. It still does the useless transformation, but I can use a Writer to collect toConstr results. Still, this way doesn't really feel right.
Is there an alternative that will not perform a useless (for this task) transformation and still deliver the list of constructors? (The order of their appearance in the tree doesn't matter for now)
Not sure if it's the simplest, but:
> data T = L | B T T deriving Data
> everything (++) (const [] `extQ` (\x -> [toConstr (x::T)])) (B L (B (B L L) L))
[B,L,B,B,L,L,L]
Here ++ says how to combine the results from subterms.
const [] is the base case for subterms who are not of type T. For those of type T, instead, we apply \x -> [toConstr (x::T)].
If you have multiple tree types, you'll need to extend the query using
const [] `extQ` (handleType1) `extQ` (handleType2) `extQ` ...
This is needed to identify the types for which we want to take the constructors. If there are a lot of types, probably this can be made shorter in some way.
Note that the code above is not very efficient on large trees since using ++ in this way can lead to quadratic complexity. It would be better, performance wise, to return a Data.Map.Map Constr Int. (Even if we do need to define some Ord Constr for that)
universe from the Data.Generics.Uniplate.Data module can give you a list of all the sub-trees of the same type. So using Ilya's example:
data T = L | B T T deriving (Data, Show)
tree :: T
tree = B L (B (B L L) L)
λ> import Data.Generics.Uniplate.Data
λ> universe tree
[B L (B (B L L) L),L,B (B L L) L,B L L,L,L,L]
λ> fmap toConstr $ universe tree
[B,L,B,B,L,L,L]

In Haskell, why is map a separate function to fmap? [duplicate]

Everywhere I've tried using map, fmap has worked as well. Why did the creators of Haskell feel the need for a map function? Couldn't it just be what is currently known as fmap and fmap could be removed from the language?
I would like to make an answer to draw attention to augustss's comment:
That's not actually how it happens. What happened was that the type of map was generalized to cover Functor in Haskell 1.3. I.e., in Haskell 1.3 fmap was called map. This change was then reverted in Haskell 1.4 and fmap was introduced. The reason for this change was pedagogical; when teaching Haskell to beginners the very general type of map made error messages more difficult to understand. In my opinion this wasn't the right way to solve the problem.
Haskell 98 is seen as a step backwards by some Haskellers (including me), previous versions having defined a more abstract and consistent library. Oh well.
Quoting from the Functor documentation at https://wiki.haskell.org/Typeclassopedia#Functor
You might ask why we need a separate map function. Why not just do
away with the current list-only map function, and rename fmap to map
instead? Well, that’s a good question. The usual argument is that
someone just learning Haskell, when using map incorrectly, would much
rather see an error about lists than about Functor.
They look the same on the application site but they're different, of course. When you apply either of those two functions, map or fmap, to a list of values they will produce the same result but that doesn't mean they're meant for the same purpose.
Run a GHCI session (the Glasgow Haskell Compiler Interactive) to query for information about those two functions, then have a look at their implementations and you will discover many differences.
map
Query GHCI for information about map
Prelude> :info map
map :: (a -> b) -> [a] -> [b] -- Defined in ‘GHC.Base’
and you'll see it defined as an high-order function applicable to a list of values of any type a yielding a list of values of any type b. Although polymorphic (the a and b in the above definition stand for any type) the map function is intended to be applied to a list of values which is just one possible data type amongst many others in Haskell. The map function could not be applied to something which is not a list of values.
As you can read from the GHC.Base source code, the map function is implemented as follows
map _ [] = []
map f (x:xs) = f x : map f xs
which makes use of pattern matching to pull the head (the x) off the tail (the xs) of the list, then constructs a new list by using the : (cons) value constructor so to prepend f x (read it as "f applied to x") to the recursion of map over the tail until the list is empty. It's worth noticing that the implementation of the mapfunction does not rely upon any other function but just on itself.
fmap
Now try to query for information about fmap and you'll see something quite different.
Prelude> :info fmap
class Functor (f :: * -> *) where
fmap :: (a -> b) -> f a -> f b
...
-- Defined in ‘GHC.Base’
This time fmap is defined as one of the functions whose implementations must be provided by those data types which wish to belong to the Functor type class. That means that there can be more than one data types, not only the "list of values" data type, able to provide an implementation for the fmap function. That makes fmap applicable to a much larger set of data types: the functors indeed!
As you can read from the GHC.Base source code, a possible implementation of the fmap function is the one provided by the Maybe data type:
instance Functor Maybe where
fmap _ Nothing = Nothing
fmap f (Just a) = Just (f a)
and another possible implementation is the one provided by the 2-tuple data type
instance Functor ((,) a) where
fmap f (x,y) = (x, f y)
and another possible implementation is the one provided by the list data type (of course!):
instance Functor [] where
fmap f xs = map f xs
which relies upon the map function.
Conclusion
The map function can be applied to nothing more than list of values (where values are of any type) whereas the fmap function can be applied much more data types: all of those which belongs to the functor class (e.g. maybes, tuples, lists, etc.). Since the "list of values" data type is also a functor (because it provides an implementation for it) then fmap can be applied to is as well producing the very same result as map.
map (+3) [1..5]
fmap (+3) (Just 15)
fmap (+3) (5, 7)

Are Functor instances unique?

I was wondering to what extent Functor instances in Haskell are determined (uniquely) by the functor laws.
Since ghc can derive Functor instances for at least "run-of-the-mill" data types, it seems that they must be unique at least in a wide variety of cases.
For convenience, the Functor definition and functor laws are:
class Functor f where
fmap :: (a -> b) -> f a -> f b
fmap id = id
fmap (g . h) = (fmap g) . (fmap h)
Questions:
Can one derive the definition of map starting from the assumption that it is a Functor instance for data List a = Nil | Cons a (List a)? If so, what assumptions have to be made in order to do this?
Are there any Haskell data types which have more than one Functor instances which satisfy the functor laws?
When can ghc derive a functor instance and when can't it?
Does all of this depend how we define equality? The Functor laws are expressed in terms of an equality of values, yet we don't require Functors to have Eq instances. So is there some choice here?
Regarding equality, there is certainly a notion of what I call "constructor equality" which allows us to reason that [a,a,a] is "equal" to [a,a,a] for any value of a of any type even if a does not have (==) defined for it. All other (useful) notions of equality are probably coarser that this equivalence relationship. But I suspect that the equality in the Functor laws are more of an "reasoning equality" relationship and can be application specific. Any thoughts on this?
See Brent Yorgey's Typeclassopedia:
Unlike some other type classes we will encounter, a given type has at most one valid instance of Functor. This can be proven via the free theorem for the type of fmap. In fact, GHC can automatically derive Functor instances for many data types.
"When can GHC derive a functor instance and when can't it?"
When we have intentional circular data structures. The type system does not allow us to express our intent of a forced circularity.
So, ghc can derive an instance, similar to that which we want, but not the same.
Circular data structures
are probably the only case where the Functor should be implemented a different way.
But then again, it would have the same semantic.
data HalfEdge a = HalfEdge { label :: a , companion :: HalfEdge a }
instance Functor HalfEdge where
fmap f (HalfEdge a (HalfEdge b _)) = fix $ HalfEdge (f a) . HalfEdge (f b)
EDIT:
HalfEdges are structures (known in computer graphics, 3d meshes...) that represent undirected Edges in a graph, where you can have a reference to either end. Usually they store more references to neighbour HalfEdges, Nodes and Faces.
newEdge :: a -> a -> HalfEdge a
newEdge a b = fix $ HalfEdge a . HalfEdge b
Semantically, there is no fix $ HalfEdge 0 . HalfEdge 1 . HalfEdge 2, because edges are always composed out of exactly two half edges.
EDIT 2:
In the haskell community, the quote "Tying the Knot" is known for this kind of data structure. It is about data structures that are semantically infinite because they cycle. They consume only finite memory. Example: given ones = 1:ones, we will have these semantically equivalent implementations of twos:
twos = fmap (+1) ones
twos = fix ((+1)(head ones) :)
If we traverse (first n elements of) twos and still have a reference to the begin of that list, these implementations differ in speed (evaluate 1+1 each time vs only once) and memory consumption (O(n) vs O(1)).

Can someone explain the traverse function in Haskell?

I am trying and failing to grok the traverse function from Data.Traversable. I am unable to see its point. Since I come from an imperative background, can someone please explain it to me in terms of an imperative loop? Pseudo-code would be much appreciated. Thanks.
traverse is the same as fmap, except that it also allows you to run effects while you're rebuilding the data structure.
Take a look at the example from the Data.Traversable documentation.
data Tree a = Empty | Leaf a | Node (Tree a) a (Tree a)
The Functor instance of Tree would be:
instance Functor Tree where
fmap f Empty = Empty
fmap f (Leaf x) = Leaf (f x)
fmap f (Node l k r) = Node (fmap f l) (f k) (fmap f r)
It rebuilds the entire tree, applying f to every value.
instance Traversable Tree where
traverse f Empty = pure Empty
traverse f (Leaf x) = Leaf <$> f x
traverse f (Node l k r) = Node <$> traverse f l <*> f k <*> traverse f r
The Traversable instance is almost the same, except the constructors are called in applicative style. This means that we can have (side-)effects while rebuilding the tree. Applicative is almost the same as monads, except that effects cannot depend on previous results. In this example it means that you could not do something different to the right branch of a node depending on the results of rebuilding the left branch for example.
For historical reasons, the Traversable class also contains a monadic version of traverse called mapM. For all intents and purposes mapM is the same as traverse - it exists as a separate method because Applicative only later became a superclass of Monad.
If you would implement this in an impure language, fmap would be the same as traverse, as there is no way to prevent side-effects. You can't implement it as a loop, as you have to traverse your data structure recursively. Here's a small example how I would do it in Javascript:
Node.prototype.traverse = function (f) {
return new Node(this.l.traverse(f), f(this.k), this.r.traverse(f));
}
Implementing it like this limits you to the effects that the language allows though. If you f.e. want non-determinism (which the list instance of Applicative models) and your language doesn't have it built-in, you're out of luck.
traverse turns things inside a Traversable into a Traversable of things "inside" an Applicative, given a function that makes Applicatives out of things.
Let's use Maybe as Applicative and list as Traversable. First we need the transformation function:
half x = if even x then Just (x `div` 2) else Nothing
So if a number is even, we get half of it (inside a Just), else we get Nothing. If everything goes "well", it looks like this:
traverse half [2,4..10]
--Just [1,2,3,4,5]
But...
traverse half [1..10]
-- Nothing
The reason is that the <*> function is used to build the result, and when one of the arguments is Nothing, we get Nothing back.
Another example:
rep x = replicate x x
This function generates a list of length x with the content x, e.g. rep 3 = [3,3,3]. What is the result of traverse rep [1..3]?
We get the partial results of [1], [2,2] and [3,3,3] using rep. Now the semantics of lists as Applicatives is "take all combinations", e.g. (+) <$> [10,20] <*> [3,4] is [13,14,23,24].
"All combinations" of [1] and [2,2] are two times [1,2]. All combinations of two times [1,2] and [3,3,3] are six times [1,2,3]. So we have:
traverse rep [1..3]
--[[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
I think it's easiest to understand in terms of sequenceA, as traverse can be defined as
follows.
traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
traverse f = sequenceA . fmap f
sequenceA sequences together the elements of a structure from left to right, returning a structure with the same shape containing the results.
sequenceA :: (Traversable t, Applicative f) => t (f a) -> f (t a)
sequenceA = traverse id
You can also think of sequenceA as reversing the order of two functors, e.g. going from a list of actions into an action returning a list of results.
So traverse takes some structure, and applies f to transform every element in the structure into some applicative, it then sequences up the effects of those applicatives from left to right, returning a structure with the same shape containing the results.
You can also compare it to Foldable, which defines the related function traverse_.
traverse_ :: (Foldable t, Applicative f) => (a -> f b) -> t a -> f ()
So you can see that the key difference between Foldable and Traversable is that the latter allows you to preserve the shape of the structure, whereas the former requires you to fold the result up into some other value.
A simple example of its usage is using a list as the traversable structure, and IO as the applicative:
λ> import Data.Traversable
λ> let qs = ["name", "quest", "favorite color"]
λ> traverse (\thing -> putStrLn ("What is your " ++ thing ++ "?") *> getLine) qs
What is your name?
Sir Lancelot
What is your quest?
to seek the holy grail
What is your favorite color?
blue
["Sir Lancelot","to seek the holy grail","blue"]
While this example is rather unexciting, things get more interesting when traverse is used on other types of containers, or using other applicatives.
It's kind of like fmap, except that you can run effects inside the mapper function, which also changes the result type.
Imagine a list of integers representing user IDs in a database: [1, 2, 3]. If you want to fmap these user IDs to usernames, you can't use a traditional fmap, because inside the function you need to access the database to read the usernames (which requires an effect -- in this case, using the IO monad).
The signature of traverse is:
traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
With traverse, you can do effects, therefore, your code for mapping user IDs to usernames looks like:
mapUserIDsToUsernames :: (Num -> IO String) -> [Num] -> IO [String]
mapUserIDsToUsernames fn ids = traverse fn ids
There's also a function called mapM:
mapM :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b)
Any use of mapM can be replaced with traverse, but not the other way around. mapM only works for monads, whereas traverse is more generic.
If you just want to achieve an effect and not return any useful value, there are traverse_ and mapM_ versions of these functions, both of which ignore the return value from the function and are slightly faster.
traverse is the loop. Its implementation depends on the data structure to be traversed. That might be a list, tree, Maybe, Seq(uence), or anything that has a generic way of being traversed via something like a for-loop or recursive function. An array would have a for-loop, a list a while-loop, a tree either something recursive or the combination of a stack with a while-loop; but in functional languages you do not need these cumbersome loop commands: you combine the inner part of the loop (in the shape of a function) with the data structure in a more directly manner and less verbose.
With the Traversable typeclass, you could probably write your algorithms more independent and versatile. But my experience says, that Traversable is usually only used to simply glue algorithms to existing data structures. It is quite nice not to need to write similar functions for different datatypes qualified, too.

What's the point of map in Haskell, when there is fmap?

Everywhere I've tried using map, fmap has worked as well. Why did the creators of Haskell feel the need for a map function? Couldn't it just be what is currently known as fmap and fmap could be removed from the language?
I would like to make an answer to draw attention to augustss's comment:
That's not actually how it happens. What happened was that the type of map was generalized to cover Functor in Haskell 1.3. I.e., in Haskell 1.3 fmap was called map. This change was then reverted in Haskell 1.4 and fmap was introduced. The reason for this change was pedagogical; when teaching Haskell to beginners the very general type of map made error messages more difficult to understand. In my opinion this wasn't the right way to solve the problem.
Haskell 98 is seen as a step backwards by some Haskellers (including me), previous versions having defined a more abstract and consistent library. Oh well.
Quoting from the Functor documentation at https://wiki.haskell.org/Typeclassopedia#Functor
You might ask why we need a separate map function. Why not just do
away with the current list-only map function, and rename fmap to map
instead? Well, that’s a good question. The usual argument is that
someone just learning Haskell, when using map incorrectly, would much
rather see an error about lists than about Functor.
They look the same on the application site but they're different, of course. When you apply either of those two functions, map or fmap, to a list of values they will produce the same result but that doesn't mean they're meant for the same purpose.
Run a GHCI session (the Glasgow Haskell Compiler Interactive) to query for information about those two functions, then have a look at their implementations and you will discover many differences.
map
Query GHCI for information about map
Prelude> :info map
map :: (a -> b) -> [a] -> [b] -- Defined in ‘GHC.Base’
and you'll see it defined as an high-order function applicable to a list of values of any type a yielding a list of values of any type b. Although polymorphic (the a and b in the above definition stand for any type) the map function is intended to be applied to a list of values which is just one possible data type amongst many others in Haskell. The map function could not be applied to something which is not a list of values.
As you can read from the GHC.Base source code, the map function is implemented as follows
map _ [] = []
map f (x:xs) = f x : map f xs
which makes use of pattern matching to pull the head (the x) off the tail (the xs) of the list, then constructs a new list by using the : (cons) value constructor so to prepend f x (read it as "f applied to x") to the recursion of map over the tail until the list is empty. It's worth noticing that the implementation of the mapfunction does not rely upon any other function but just on itself.
fmap
Now try to query for information about fmap and you'll see something quite different.
Prelude> :info fmap
class Functor (f :: * -> *) where
fmap :: (a -> b) -> f a -> f b
...
-- Defined in ‘GHC.Base’
This time fmap is defined as one of the functions whose implementations must be provided by those data types which wish to belong to the Functor type class. That means that there can be more than one data types, not only the "list of values" data type, able to provide an implementation for the fmap function. That makes fmap applicable to a much larger set of data types: the functors indeed!
As you can read from the GHC.Base source code, a possible implementation of the fmap function is the one provided by the Maybe data type:
instance Functor Maybe where
fmap _ Nothing = Nothing
fmap f (Just a) = Just (f a)
and another possible implementation is the one provided by the 2-tuple data type
instance Functor ((,) a) where
fmap f (x,y) = (x, f y)
and another possible implementation is the one provided by the list data type (of course!):
instance Functor [] where
fmap f xs = map f xs
which relies upon the map function.
Conclusion
The map function can be applied to nothing more than list of values (where values are of any type) whereas the fmap function can be applied much more data types: all of those which belongs to the functor class (e.g. maybes, tuples, lists, etc.). Since the "list of values" data type is also a functor (because it provides an implementation for it) then fmap can be applied to is as well producing the very same result as map.
map (+3) [1..5]
fmap (+3) (Just 15)
fmap (+3) (5, 7)

Resources