Can trees be generalized to allow any traversable sub-tree?

Data.Tree uses a list to represent the subtree rooted at a particular node. Is it possible to have two tree types, for example one which uses a list and another which uses a vector? I want to be able to write functions which don't care how the sub-tree is represented concretely, only that the subtree is traversable, as well as functions which take advantage of a particular subtree type, e.g. fast indexing into vectors.
It seems like type families would be the right tool for the job, though I've never used them before and I have no idea how to actually define the right family.
If it matters, I'm not using the containers library tree instance, but instead I have types
data Tree a b = Node a b [Tree a b] deriving (Show, Foldable, Generic)
and
data MassivTree a b = V a b (Array B Ix1 (MassivTree a b))
where the latter uses vectors from massiv.

You could use a typeclass - in fact the typeclass you need probably already exists.
Consider this:
data Tree t a = Tree a (t (Tree t a))
Argument t is a higher-kinded type which represents a container of as.
Now define a set of Tree operations, constrained on the appropriate typeclass (here Foldable), like so:
:: (Foldable t) => Tree t a -> b
And you can now create and manipulate trees that use any Foldable. You would need to choose the right typeclass for the set of operations you want - Functor may be enough, or you may want Traversable if you are doing anything with monadic actions. You can choose the typeclass on a per-function basis, depending on what it does.
You can now define Tree types like so:
type ListTree a = Tree [] a
type MassivTree r ix a = Tree (Array r ix) a
You can also define instance-specific functions, with access to a full range of functionality:
:: ListTree a -> b
-- or
:: Tree [] a -> b
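For concreteness, here is a minimal sketch of both styles, assuming the Tree defined above (size and leftmostChild are illustrative names, not part of any library):
-- Works for any Foldable container of subtrees,
-- including vector-backed ones, provided they have a Foldable instance.
size :: Foldable t => Tree t a -> Int
size (Tree _ ts) = 1 + foldr (\t acc -> size t + acc) 0 ts

-- List-specific: pattern matching on the concrete container is available here.
leftmostChild :: Tree [] a -> Maybe (Tree [] a)
leftmostChild (Tree _ [])      = Nothing
leftmostChild (Tree _ (t : _)) = Just t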
Happy Haskelling!

Related

True isomorphisms in Haskell

Are the following assertions true:
The only real isomorphism, accessible programmatically to the user, verified by the Haskell type system, and that the Haskell compiler is/can be made aware of, is between:
the set of values of a Haskell datatype
the set of values of the types required by its constructors
Even generic programming can't produce a "true" isomorphism, whose composition results at run time in an identity (hence staged-sop, and similarly in OCaml).
The only isomorphism Haskell itself produces is Coercible, but those isomorphisms are restricted to the identity isomorphism.
By "real isomorphism, accessible programmatically to the user, verified by the Haskell type system, and that the Haskell compiler is/can be made aware of" I mean a pair of functions u :: a -> b and v :: b -> a such that Haskell knows (by being informed or otherwise) that u . v = id and v . u = id. Just like it knows (at compile time) how to rewrite some code to do "fold fusion", which is akin to, at once, recognizing and applying it.
Look into Homotopy Type Theory/Cubical Agda, where "equality is isomorphism". I am not familiar enough with it to know what happens operationally; even if Agda knows isomorphic types are equal, I still think your "true isomorphism" (i.e. with a proof and fusion) is too tall an order.
In GHC it is possible to derive via "isomorphisms", but we need to wait for dependent types to properly verify isomorphisms in the type system. Even so, they can be used to produce bona fide code, even if you have to do some work operationally.
You already mentioned "representational equality" (Coercible), but it is worth discussing. It underpins the two coerce-based deriving strategies: GeneralizedNewtypeDeriving and DerivingVia, which generalizes GND.
GND is the simplest way to turn an isomorphism (Coercible USD Int) into code:
type USD :: Type
newtype USD = MkUSD Int
  deriving newtype (Eq, Ord, Show, Num)
Operationally, coerce is zero-cost, so the derived instances incur no cost at run time. This is the only way you will get what you want in Haskell.
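As a small illustration of that zero-cost conversion (totalUSD is a made-up helper, not part of any library):
import Data.Coerce (coerce)

-- Reinterpret the whole list at once: Coercible USD Int lifts through [],
-- so no traversal or re-boxing happens at run time.
totalUSD :: [USD] -> USD
totalUSD = MkUSD . sum . (coerce :: [USD] -> [Int])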
Isomorphisms can also be done through user-defined type classes.
An instance of Representable f means f is (naturally) isomorphic to functions from its representing object (Rep f ->). The newtype Co uses this isomorphism to derive function instances for representable functors. A Pair a of two values is represented by Bool, and is thus isomorphic to Bool -> a.
This isomorphism lets Pair derive Functor, Applicative and Monad by roundtripping through (Bool ->):
type Pair :: Type -> Type
data Pair a = a :# a
  deriving (Functor, Applicative, Monad)
    via Co Pair

instance Distributive Pair where
  distribute :: Functor f => f (Pair a) -> Pair (f a)
  distribute = distributeRep

instance Representable Pair where
  type Rep Pair = Bool

  index :: Pair a -> (Bool -> a)
  index (false :# true) = \case
    False -> false
    True  -> true

  tabulate :: (Bool -> a) -> Pair a
  tabulate make = make False :# make True
When you derive Generic/Generic1 the compiler generates an isomorphism between a generic type and its generic representation Rep/Rep1 (not to be confused with the representing object Rep from the above example).
The class laws state that to/from and to1/from1 witness that isomorphism. The type system does not enforce these laws but if you derive them they should hold.
They are the main way to define generic implementations in Haskell. I recently introduced two newtypes Generically and Generically1 to base, as standard names for generic behaviour (use generic-data until the next GHC release). You can derive a generic isomorphism and programmatically use it in the next line without leaving the data declaration:
type Lists :: Type -> Type
data Lists a = Lists [a] [a] [a]
  deriving
    stock (Show, Generic, Generic1)
  deriving (Semigroup, Monoid)
    via Generically (Lists a)
  deriving (Functor, Applicative, Alternative)
    via Generically1 Lists
>> mempty @(Lists _)
Lists [] [] []
>> empty @Lists
Lists [] [] []
>> Lists "a" "b" "c" <> Lists "!" "." "?"
Lists "a!" "b." "c?"
>> pure @Lists 'a'
Lists "a" "a" "a"
You will however have to pay the conversion cost; it's not as simple as adding {-# RULES "to/from" to . from = id #-} because the actual instances will appear with intermediate terms like to (from a <> from b). Even with your "true isomorphisms", GHC could not fuse away the conversion, since it's not of the form to . from.
There is also a library iso-deriving (blog) that allows deriving via arbitrary isomorphisms.

Haskell QuickCheck for testing n-ary tree eval [duplicate]

tl;dr: how do you write instances of Arbitrary that don't explode if your data type allows for way too much nesting? And how would you guarantee these instances produce truly random specimens of your data structure?
I want to generate random tree structures, then test certain properties of these structures after I've mangled them with my library code. (NB: I'm writing an implementation of a subtyping algorithm, i.e. given a hierarchy of types, is type A a subtype of type B. This can be made arbitrarily complex, by including multiple-inheritance and post-initialization updates to the hierarchy. The classical method that supports neither of these is Schubert Numbering, and the latest result known to me is Alavi et al. 2008.)
Let's take the example of rose-trees, following Data.Tree:
data Tree a = Node a (Forest a)
type Forest a = [Tree a]
A very simple (and don't-try-this-at-home) instance of Arbitrary would be:
instance (Arbitrary a) => Arbitrary (Tree a) where
  arbitrary = Node <$> arbitrary <*> arbitrary
Since a already has an Arbitrary instance as per the type constraint, and the Forest will have one, because [] is an instance, too, this seems straight-forward. It won't (typically) terminate for very obvious reasons: since the lists it generates are arbitrarily long, the structures become too large, and there's a good chance they won't fit into memory. Even a more conservative approach:
arbitrary = Node <$> arbitrary <*> oneof [arbitrary,return []]
won't work, again, for the same reason. One could tweak the size parameter, to keep the length of the lists down, but even that won't guarantee termination, since it's still multiple consecutive dice-rolls, and it can turn out quite badly (and I want the odd node with 100 children.)
Which means I need to limit the size of the entire tree. That is not so straight-forward. unordered-containers has it easy: just use fromList. This is not so easy here: How do you turn a list into a tree, randomly, and without incurring bias one way or the other (i.e. not favoring left-branches, or trees that are very left-leaning.)
Some sort of breadth-first construction (the functions provided by Data.Tree are all pre-order) from lists would be awesome, and I think I could write one, but it would turn out to be non-trivial. Since I'm using trees now, but will use even more complex stuff later on, I thought I might try to find a more general and less complex solution. Is there one, or will I have to resort to writing my own non-trivial Arbitrary generator? In the latter case, I might actually just resort to unit-tests, since this seems too much work.
Use sized:
instance Arbitrary a => Arbitrary (Tree a) where
  arbitrary = sized arbTree

arbTree :: Arbitrary a => Int -> Gen (Tree a)
arbTree 0 = do
  a <- arbitrary
  return $ Node a []
arbTree n = do
  (Positive m) <- arbitrary
  let n' = n `div` (m + 1)
  f <- replicateM m (arbTree n')
  a <- arbitrary
  return $ Node a f
(Adapted from the QuickCheck presentation).
P.S. Perhaps this will generate overly balanced trees...
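If you want to eyeball what this generator produces, something like the following works (previewTrees is just an illustrative name, and it assumes the instance above is in scope):
import Test.QuickCheck (Gen, arbitrary, sample)

-- Prints a handful of generated trees at increasing size parameters.
previewTrees :: IO ()
previewTrees = sample (arbitrary :: Gen (Tree Int))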
You might want to use the library presented in the paper "Feat: Functional Enumeration of Algebraic Types" at the Haskell Symposium 2012. It is on Hackage as testing-feat, and a video of the talk introducing it is available here: http://www.youtube.com/watch?v=HbX7pxYXsHg
As Janis mentioned, you can use the package testing-feat, which creates enumerations of arbitrary algebraic data types. This is the easiest way to create unbiased, uniformly distributed generators for all trees of up to a given size.
Here is how you would use it for rose trees:
import Test.Feat (Enumerable(..), uniform, consts, unary, funcurry)
import Test.Feat.Class (Constructor)
import Data.Tree (Tree(..))
import qualified Test.QuickCheck as QC

-- We make an enumerable instance by listing all constructors
-- for the type. In this case, we have one binary constructor:
-- Node :: a -> [Tree a] -> Tree a
instance Enumerable a => Enumerable (Tree a) where
  enumerate = consts [binary Node]
    where
      binary :: (a -> b -> c) -> Constructor c
      binary = unary . funcurry

-- Now we use the Enumerable instance to create an Arbitrary
-- instance with the help of the function:
-- uniform :: Enumerable a => Int -> QC.Gen a
instance Enumerable a => QC.Arbitrary (Tree a) where
  QC.arbitrary = QC.sized uniform
  -- QC.shrink = <some implementation>
The Enumerable instance can also be generated automatically with TemplateHaskell:
deriveEnumerable ''Tree

What is the difference between value constructors and tuples?

It's written that Haskell tuples are simply a different syntax for algebraic data types. Similarly, there are examples of how to redefine value constructors with tuples.
For example, a Tree data type in Haskell might be written as
data Tree a = EmptyTree | Node a (Tree a) (Tree a)
which could be converted to "tuple form" like this:
data Tree a = EmptyTree | Node (a, Tree a, Tree a)
What is the difference between the Node value constructor in the first example, and the actual tuple in the second example? i.e. Node a (Tree a) (Tree a) vs. (a, Tree a, Tree a) (aside from just the syntax)?
Under the hood, is Node a (Tree a) (Tree a) just a different syntax for a 3-tuple of the appropriate types at each position?
I know that you can partially apply a value constructor, such as Node 5 which will have type: (Node 5) :: Num a => Tree a -> Tree a -> Tree a
You sort of can partially apply a tuple too, using (,,) as a function ... but this doesn't know about the potential types for the un-bound entries, such as:
Prelude> :t (,,) 5
(,,) 5 :: Num a => b -> c -> (a, b, c)
unless, I guess, you explicitly declare a type with ::.
Aside from syntactical specialties like this, plus this last example of the type scoping, is there a material difference between whatever a "value constructor" thing actually is in Haskell, versus a tuple used to store positional values of the same types as the value constructor's arguments?
Well, conceptually there indeed is no difference, and in fact other languages (OCaml, Elm) present tagged unions exactly that way - i.e., tags over tuples or first-class records (which Haskell lacks). I personally consider this to be a design flaw in Haskell.
There are some practical differences though:
Laziness. Haskell's tuples are lazy and you can't change that. You can however mark constructor fields as strict:
data Tree a = EmptyTree | Node !a !(Tree a) !(Tree a)
Memory footprint and performance. Circumventing intermediate types reduces the footprint and raises the performance. You can read more about it in this fine answer.
You can also mark the strict fields with the UNPACK pragma to reduce the footprint even further. Alternatively you can use the -funbox-strict-fields compiler option. Concerning the last one, I simply prefer to have it on by default in all my projects. See Hasql's Cabal file for an example.
Considering the above, if it's a lazy type that you're looking for, then the following snippets should compile to the same thing:
data Tree a = EmptyTree | Node a (Tree a) (Tree a)
data Tree a = EmptyTree | Node {-# UNPACK #-} !(a, Tree a, Tree a)
So I guess you can say that it's possible to use tuples to store lazy fields of a constructor without a penalty. Though it should be mentioned that this pattern is kinda unconventional in the Haskell community.
If it's the strict type and footprint reduction that you're after, then there's no other way than to denormalize your tuples directly into constructor fields.
They're what's called isomorphic, meaning "to have the same shape". You can write something like
data Option a = None | Some a
And this is isomorphic to
data Maybe a = Nothing | Just a
meaning that you can write two functions
f :: Maybe a -> Option a
g :: Option a -> Maybe a
Such that f . g == id == g . f for all possible inputs. We can then say that (,,) is a data constructor isomorphic to the constructor
data Triple a b c = Triple a b c
Because you can write
f :: (a, b, c) -> Triple a b c
f (a, b, c) = Triple a b c
g :: Triple a b c -> (a, b, c)
g (Triple a b c) = (a, b, c)
And Node as a constructor is a special case of Triple, namely Triple a (Tree a) (Tree a). In fact, you could even go so far as to say that your definition of Tree could be written as
newtype Tree' a = Tree' (Maybe (a, Tree' a, Tree' a))
The newtype is required since you can't have a type alias be recursive. All you have to do is say that EmptyTree = Tree' Nothing and Node a l r = Tree' (Just (a, l, r)). You could pretty simply write functions that convert between the two.
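For the record, those conversions are a couple of lines each (toTree' and fromTree' are names invented here):
toTree' :: Tree a -> Tree' a
toTree' EmptyTree    = Tree' Nothing
toTree' (Node x l r) = Tree' (Just (x, toTree' l, toTree' r))

fromTree' :: Tree' a -> Tree a
fromTree' (Tree' Nothing)          = EmptyTree
fromTree' (Tree' (Just (x, l, r))) = Node x (fromTree' l) (fromTree' r)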
Note that this is all from a mathematical point of view. The compiler can add extra metadata and other information to be able to identify a particular constructor making them behave slightly differently at runtime.

Binding together data, types and functions

I want to model a large tree (or forest) with some regular structure: the tree can be decomposed into a small tree (the irregular part) and a large list of params; each param combined with each node of the small tree makes a node of the big tree.
So I want a data structure where each node in the tree represents many nodes, and a real node is of type (node, param).
For algorithms that work on this kind of tree, the type of the param does not matter; params are just placeholders. But it should be possible to extract some data from the plain param or from a combination of node and param, and all possible params should be iterable. All those kinds of data are known a priori; they reflect the semantics of that tree.
So the actual type and semantics of the param are up to the implementation of the tree.
In C++ I model this using nested typedefs for the param type, fixed method names for everything that should be available to the algorithm (these two together making a concept), and templates for the algorithm itself.
E.g. if I want to associate an integer with each node of the big tree, I would provide a function int data(const node& n, const param& p), where param is available as a nested typedef; the algorithm could then get the list of all available params and call data with the nodes of interest and each of the params.
I have some plain data type, i.e. tree data, like this
data Tree = Node [Tree] | Leaf
Now I want to package up:
concrete tree
some type
some values of that type
some functions operating on (that concrete) tree nodes and (that) values
so one can write functions that use these packaged-up types and functions in a generic way.
How to achieve that?
With type families I came to
class PackagedUp t where
  type Value t
  tree :: Tree t
  values :: [Value t]
  f :: Tree t -> Value t -> Int
Tree now becomes Tree t because the types of the class members have to depend on the typeclass argument.
Also, as in https://stackoverflow.com/a/16927632/1227578, type families to deal with injectivity will be needed.
With this I can
instance PackagedUp MyTree where
  type Value MyTree = (Int, Int)
  tree = Leaf
  values = [(0, 0), (1, 1)]
  f t v = fst v
And how do I write such a function now? I.e. a function that takes the root of a tree and all of the values, and produces an [Int] of all the f tree value results.
First of all, your tree type should be defined like this:
data Tree a = Node a [Tree a] | Leaf
The type above is polymorphic. As far as semantics go that resembles what we would call a generic type in OO parlance (in C# or Java we might write Tree<A> instead). A node of a Tree a holds a value of type a and a list of subtrees.
Next, we come to PackagedUp. Classes in Haskell have little to do with the OO concept of the same name; they are not meant to package data and behaviour together. Things are actually much simpler: all you need to do is define the appropriate functions for your tree type:
getRoot :: Tree a -> Maybe a
getRoot Leaf = Nothing
getRoot (Node x _) = Just x
(Returning Maybe a is a simple way to handle failure with type safety. Think of the Nothing value as a polite cousin of null that doesn't explode with null reference exceptions.)
One thing that type classes are good at is expressing interfaces to data-structure algorithms, such as the ones you allude to. One of the most common classes is Functor, which provides a general interface for mapping over data structures.
instance Functor Tree where
  fmap f Leaf = Leaf
  fmap f (Node x ts) = Node (f x) (fmap (fmap f) ts)
fmap has the following polymorphic type:
fmap :: Functor f => (a -> b) -> f a -> f b
With your tree, it specialises to
fmap :: (a -> b) -> Tree a -> Tree b
and with lists (as in the outer fmap of fmap (fmap f) ts) it becomes
fmap :: (a -> b) -> [a] -> [b]
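A quick usage sketch of that instance (the example tree and names here are made up):
example :: Tree Int
example = Node 1 [Node 2 [Leaf], Leaf]

scaled :: Tree Int
scaled = fmap (* 10) example   -- Node 10 [Node 20 [Leaf], Leaf]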
Finally, the Data.Tree module provides a data structure which looks a lot like what you want to define.

Relationship between TypeRep and "Type" GADT

In Scrap your boilerplate reloaded, the authors describe a new presentation of Scrap Your Boilerplate, which is supposed to be equivalent to the original.
However, one difference is that they assume a finite, closed set of "base" types, encoded with a GADT
data Type :: * -> * where
  Int  :: Type Int
  List :: Type a -> Type [a]
  ...
In the original SYB, type-safe cast is used, implemented using the Typeable class.
My questions are:
What is the relationship between these two approaches?
Why was the GADT representation chosen for the "SYB Reloaded" presentation?
[I am one of the authors of the "SYB Reloaded" paper.]
TL;DR We really just used it because it seemed more beautiful to us. The class-based Typeable approach is more practical. The Spine view can be combined with the Typeable class and does not depend on the Type GADT.
The paper states this in its conclusions:
Our implementation handles the two central ingredients of generic programming differently from the original SYB paper: we use overloaded functions with explicit type arguments instead of overloaded functions based on a type-safe cast [1] or a class-based extensible scheme [20]; and we use the explicit spine view rather than a combinator-based approach. Both changes are independent of each other, and have been made with clarity in mind: we think that the structure of the SYB approach is more visible in our setting, and that the relations to PolyP and Generic Haskell become clearer. We have revealed that while the spine view is limited in the class of generic functions that can be written, it is applicable to a very large class of data types, including GADTs.
Our approach cannot be used easily as a library, because the encoding of overloaded functions using explicit type arguments requires the extensibility of the Type data type and of functions such as toSpine. One can, however, incorporate Spine into the SYB library while still using the techniques of the SYB papers to encode overloaded functions.
So, the choice of using a GADT for type representation is one we made mainly for clarity. As Don states in his answer, there are some obvious advantages in this representation, namely that it maintains static information about what type a type representation is for, and that it allows us to implement cast without any further magic, and in particular without the use of unsafeCoerce. Type-indexed functions can also be implemented directly by using pattern matching on the type, and without falling back to various combinators such as mkQ or extQ.
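To make that concrete, here is a sketch of how cast falls out of pattern matching on the closed Type GADT from the question (the names equal and cast are mine; the paper's own definitions may differ slightly):
{-# LANGUAGE GADTs #-}
import Data.Type.Equality ((:~:)(Refl))

-- Decide equality of two type representations purely by pattern matching.
equal :: Type a -> Type b -> Maybe (a :~: b)
equal Int      Int      = Just Refl
equal (List a) (List b) = case equal a b of
  Just Refl -> Just Refl
  Nothing   -> Nothing
equal _        _        = Nothing

-- A type-safe cast, with no unsafeCoerce anywhere.
cast :: Type a -> Type b -> a -> Maybe b
cast ta tb x = case equal ta tb of
  Just Refl -> Just x
  Nothing   -> Nothing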
Fact is that I (and I think the co-authors) simply were not very fond of the Typeable class. (In fact, I'm still not, although it is finally becoming a bit more disciplined now in that GHC adds auto-deriving for Typeable, makes it kind-polymorphic, and will ultimately remove the possibility to define your own instances.) In addition, Typeable wasn't quite as established and widely known as it is perhaps now, so it seemed appealing to "explain" it by using the GADT encoding. And furthermore, this was the time when we were also thinking about adding open datatypes to Haskell, thereby alleviating the restriction that the GADT is closed.
So, to summarize: If you actually need dynamic type information only for a closed universe, I'd always go for the GADT, because you can use pattern matching to define type-indexed functions, and you do not have to rely on unsafeCoerce nor advanced compiler magic. If the universe is open, however, which is quite common, certainly for the generic programming setting, then the GADT approach might be instructive, but isn't practical, and using Typeable is the way to go.
However, as we also state in the conclusions of the paper, the choice of Type over Typeable isn't a prerequisite for the other choice we're making, namely to use the Spine view, which I think is more important and really the core of the paper.
The paper itself shows (in Section 8) a variation inspired by the "Scrap your Boilerplate with Class" paper, which uses a Spine view with a class constraint instead. But we can also do a more direct development, which I show in the following. For this, we'll use Typeable from Data.Typeable, but define our own Data class which, for simplicity, just contains the toSpine method:
class Typeable a => Data a where
  toSpine :: a -> Spine a
The Spine datatype now uses the Data constraint:
data Spine :: * -> * where
  Constr :: a -> Spine a
  (:<>:) :: (Data a) => Spine (a -> b) -> a -> Spine b
The function fromSpine is as trivial as with the other representation:
fromSpine :: Spine a -> a
fromSpine (Constr x) = x
fromSpine (c :<>: x) = fromSpine c x
Instances for Data are trivial for flat types such as Int:
instance Data Int where
  toSpine = Constr
And they're still entirely straightforward for structured types such as binary trees:
data Tree a = Empty | Node (Tree a) a (Tree a)
instance Data a => Data (Tree a) where
  toSpine Empty = Constr Empty
  toSpine (Node l x r) = Constr Node :<>: l :<>: x :<>: r
The paper then goes on and defines various generic functions, such as mapQ. These definitions hardly change. We only get class constraints for Data a => where the paper has function arguments of Type a ->:
-- (Query here is assumed to be the paper's synonym, with a Data constraint:
--  type Query r = forall a. Data a => a -> r)
mapQ :: Query r -> Query [r]
mapQ q = mapQ' q . toSpine

mapQ' :: Query r -> (forall a. Spine a -> [r])
mapQ' q (Constr c) = []
mapQ' q (f :<>: x) = mapQ' q f ++ [q x]
Higher-level functions such as everything also just lose their explicit type arguments (and then actually look exactly the same as in original SYB):
everything :: (r -> r -> r) -> Query r -> Query r
everything op q x = foldl op (q x) (mapQ (everything op q) x)
As I said above, if we now want to define a generic sum function summing up all Int occurrences, we cannot pattern match anymore and have to fall back to mkQ, but mkQ is defined purely in terms of Typeable and is completely independent of Spine:
mkQ :: (Typeable a, Typeable b) => r -> (b -> r) -> a -> r
(r `mkQ` br) a = maybe r br (cast a)
And then (again exactly as in original SYB):
sum :: Query Int
sum = everything (+) sumQ
sumQ :: Query Int
sumQ = mkQ 0 id
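As a sanity check, and assuming the Data instances above are in scope, the query can be run on a small tree (exampleTotal is an invented name; sum here is the Query Int defined above, shadowing Prelude.sum):
-- Adds up every Int in the structure via the Spine view: 1 + 2 + 4 = 7.
exampleTotal :: Int
exampleTotal = sum (Node (Node Empty 1 Empty) 2 (Node Empty 4 Empty))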
For some of the stuff later in the paper (e.g., adding constructor information), a bit more work is needed, but it can all be done. So using Spine really does not depend on using Type at all.
Well, obviously the Typeable approach is open -- new variants can be added after the fact, without modifying the original definitions.
The important difference, though, is that TypeRep is untyped. That is, there is no connection between the runtime type representation, TypeRep, and the static type it encodes. With the GADT approach we can encode the mapping between a type a and its Type, given by the GADT Type a.
We thus bake in evidence for the type rep being statically linked to its origin type, and can write statically typed dynamic application (for example) using Type a as evidence that we have a runtime a.
In the older TypeRep case, we have no such evidence and it comes down to runtime string equality, and a coerce and hope for the best through fromDynamic.
Compare the signatures:
toDyn :: Typeable a => a -> TypeRep -> Dynamic
versus GADT style:
toDyn :: a -> Type a -> Dynamic
I can't fake my type evidence, and I can use that later when reconstructing things, to e.g. lookup the type class instances for a when all I have is a Type a.
