The simplest way to generically traverse a tree in haskell - haskell

Suppose I used language-javascript library to build AST in Haskell. The AST has nodes of different types, and each node can have fields of those different types.
And each type can have numerous constructors. (All the types instantiate Data, Eq and Show).
I would like to count each type's constructor occurrence in the tree. I could use toConstr to get the constructor, and ideally I'd make a Tree -> [Constr] function fisrt (then counting is easy).
There are different ways to do that. Obviously pattern matching is too verbose (imagine around 3 types with 9-28 constructors).
So I'd like to use a generic traversal, and I tried to find the solution in SYB library.
There is an everywhere function, which doesn't suit my needs since I don't need a Tree -> Tree transformation.
There is gmapQ, which seems suitable in terms of its type, but as it turns out it's not recursive.
The most viable option so far is everywhereM. It still does the useless transformation, but I can use a Writer to collect toConstr results. Still, this way doesn't really feel right.
Is there an alternative that will not perform a useless (for this task) transformation and still deliver the list of constructors? (The order of their appearance in the tree doesn't matter for now)

Not sure if it's the simplest, but:
> data T = L | B T T deriving Data
> everything (++) (const [] `extQ` (\x -> [toConstr (x::T)])) (B L (B (B L L) L))
[B,L,B,B,L,L,L]
Here ++ says how to combine the results from subterms.
const [] is the base case for subterms who are not of type T. For those of type T, instead, we apply \x -> [toConstr (x::T)].
If you have multiple tree types, you'll need to extend the query using
const [] `extQ` (handleType1) `extQ` (handleType2) `extQ` ...
This is needed to identify the types for which we want to take the constructors. If there are a lot of types, probably this can be made shorter in some way.
Note that the code above is not very efficient on large trees since using ++ in this way can lead to quadratic complexity. It would be better, performance wise, to return a Data.Map.Map Constr Int. (Even if we do need to define some Ord Constr for that)

universe from the Data.Generics.Uniplate.Data module can give you a list of all the sub-trees of the same type. So using Ilya's example:
data T = L | B T T deriving (Data, Show)
tree :: T
tree = B L (B (B L L) L)
λ> import Data.Generics.Uniplate.Data
λ> universe tree
[B L (B (B L L) L),L,B (B L L) L,B L L,L,L,L]
λ> fmap toConstr $ universe tree
[B,L,B,B,L,L,L]

Related

Generate injective functions with QuickCheck?

I'm using QuickCheck to generate arbitrary functions, and I'd like to generate arbitrary injective functions (i.e. f a == f b if and only if a == b).
I thought I had it figured out:
newtype Injective = Injective (Fun Word Char) deriving Show
instance Arbitrary Injective where
arbitrary = fmap Injective fun
where
fun :: Gen (Fun Word Char)
fun = do
a <- arbitrary
b <- arbitrary
arbitrary `suchThat` \(Fn f) ->
(f a /= f b) || (a == b)
But I'm seeing cases where the generated function maps distinct inputs to the same output.
What I want:
f such that for all inputs a and b, either f a does not equal f b or a equals b.
What I think I have:
f such that there exist inputs a and b where either f a does not equal f b or a equals b.
How can I fix this?
You've correctly identified the problem: what you're generating is functions with the property ∃ a≠b. f a≠f b (which is readily true for most random functions anyway), whereas what you want is ∀ a≠b. f a≠f b. That is a much more difficult property to ensure, because you need to know about all the other function values for generating each individual one.
I don't think this is possible to ensure for general input types, however for word specifically what you can do is “fake” a function by precomputing all the output values sequentially, making sure that you don't repeat one that has already been done, and then just reading off from that predetermined chart. It requires a bit of laziness fu to actually get this working:
import qualified Data.Set as Set
newtype Injective = Injective ([Char] {- simply a list without duplicates -})
deriving Show
instance Arbitrary Injective where
arbitrary = Injective . lazyNub <$> arbitrary
lazyNub :: Ord a => [a] -> [a]
lazyNub = go Set.empty
where go _ [] = []
go forbidden (x:xs)
| x `Set.member` forbidden = go forbidden xs
| otherwise = x : go (Set.insert x forbidden) xs
This is not very efficient, and may well not be ok for your application, but it's probably the best you can do.
In practice, to actually use Injective as a function, you'll want to wrap the values in a suitable structure that has only O (log n) lookup time. Unfortunately, Data.Map.Lazy is not lazy enough, you may need to hand-bake something like a list of exponentially-growing maps.
There's also the concern that for some insufficiently big result types, it is just not possible to generate injective functions because there aren't enough values available. In fact as Joseph remarked, this is the case here. The lazyNub function will go into an infinite loop in this case. I'd say for a QuickCheck this is probably ok though.

Haskell QuickCheck for testing n-ary tree eval [duplicate]

tl;dr: how do you write instances of Arbitrary that don't explode if your data type allows for way too much nesting? And how would you guarantee these instances produce truly random specimens of your data structure?
I want to generate random tree structures, then test certain properties of these structures after I've mangled them with my library code. (NB: I'm writing an implementation of a subtyping algorithm, i.e. given a hierarchy of types, is type A a subtype of type B. This can be made arbitrarily complex, by including multiple-inheritance and post-initialization updates to the hierarchy. The classical method that supports neither of these is Schubert Numbering, and the latest result known to me is Alavi et al. 2008.)
Let's take the example of rose-trees, following Data.Tree:
data Tree a = Node a (Forest a)
type Forest a = [Tree a]
A very simple (and don't-try-this-at-home) instance of Arbitray would be:
instance (Arbitrary a) => Arbitrary (Tree a) where
arbitrary = Node <$> arbitrary <$> arbitrary
Since a already has an Arbitrary instance as per the type constraint, and the Forest will have one, because [] is an instance, too, this seems straight-forward. It won't (typically) terminate for very obvious reasons: since the lists it generates are arbitrarily long, the structures become too large, and there's a good chance they won't fit into memory. Even a more conservative approach:
arbitrary = Node <$> arbitrary <*> oneof [arbitrary,return []]
won't work, again, for the same reason. One could tweak the size parameter, to keep the length of the lists down, but even that won't guarantee termination, since it's still multiple consecutive dice-rolls, and it can turn out quite badly (and I want the odd node with 100 children.)
Which means I need to limit the size of the entire tree. That is not so straight-forward. unordered-containers has it easy: just use fromList. This is not so easy here: How do you turn a list into a tree, randomly, and without incurring bias one way or the other (i.e. not favoring left-branches, or trees that are very left-leaning.)
Some sort of breadth-first construction (the functions provided by Data.Tree are all pre-order) from lists would be awesome, and I think I could write one, but it would turn out to be non-trivial. Since I'm using trees now, but will use even more complex stuff later on, I thought I might try to find a more general and less complex solution. Is there one, or will I have to resort to writing my own non-trivial Arbitrary generator? In the latter case, I might actually just resort to unit-tests, since this seems too much work.
Use sized:
instance Arbitrary a => Arbitrary (Tree a) where
arbitrary = sized arbTree
arbTree :: Arbitrary a => Int -> Gen (Tree a)
arbTree 0 = do
a <- arbitrary
return $ Node a []
arbTree n = do
(Positive m) <- arbitrary
let n' = n `div` (m + 1)
f <- replicateM m (arbTree n')
a <- arbitrary
return $ Node a f
(Adapted from the QuickCheck presentation).
P.S. Perhaps this will generate overly balanced trees...
You might want to use the library presented in the paper "Feat: Functional Enumeration of Algebraic Types" at the Haskell Symposium 2012. It is on Hackage as testing-feat, and a video of the talk introducing it is available here: http://www.youtube.com/watch?v=HbX7pxYXsHg
As Janis mentioned, you can use the package testing-feat, which creates enumerations of arbitrary algebraic data types. This is the easiest way to create unbiased uniformly distributed generators
for all trees of up to a given size.
Here is how you would use it for rose trees:
import Test.Feat (Enumerable(..), uniform, consts, funcurry)
import Test.Feat.Class (Constructor)
import Data.Tree (Tree(..))
import qualified Test.QuickCheck as QC
-- We make an enumerable instance by listing all constructors
-- for the type. In this case, we have one binary constructor:
-- Node :: a -> [Tree a] -> Tree a
instance Enumerable a => Enumerable (Tree a) where
enumerate = consts [binary Node]
where
binary :: (a -> b -> c) -> Constructor c
binary = unary . funcurry
-- Now we use the Enumerable instance to create an Arbitrary
-- instance with the help of the function:
-- uniform :: Enumerable a => Int -> QC.Gen a
instance Enumerable a => QC.Arbitrary (Tree a) where
QC.arbitrary = QC.sized uniform
-- QC.shrink = <some implementation>
The Enumerable instance can also be generated automatically with TemplateHaskell:
deriveEnumerable ''Tree

What is the most general way to compute the depth of a tree with something like a fold?

What minimal (most general) information is required to compute depth of a Data.Tree? Is instance of a Data.Foldable sufficient?
I initially tried to fold a Tree and got stuck trying to find right Monoid similar to Max. Something tells me that since Monoid (that would compute depth) needs to be associative, it probably cannot be used to express any fold that needs to be aware of the structure (as in 1 + maxChildrenDepth), but I'm not certain.
I wonder what thought process would let me arrive at right abstraction for such cases.
I can't say if it's a minimal/most general amount of information. But one general solution is that a given structure
is a catamorphism
underlying functor of the catamorphism is Foldable so that it's possible to enumerate sub-terms.
Here is sample code using recursion-schemes.
{-# LANGUAGE TypeFamilies, FlexibleContexts #-}
import Data.Functor.Foldable
import Data.Semigroup
import Data.Tree
depth :: (Recursive f, Foldable (Base f)) => f -> Int
depth = cata ((+ 1) . maybe 0 getMax . getOption
. foldMap (Option . Just . Max))
-- Necessary instances for Tree:
data TreeF a t = NodeF { rootLabel' :: a, subForest :: [t] }
type instance Base (Tree a) = TreeF a
instance Functor (TreeF a) where
fmap f (NodeF x ts) = NodeF x (map f ts)
instance Foldable (TreeF a) where
foldMap f (NodeF _ ts) = foldMap f ts
instance Recursive (Tree a) where
project (Node x ts) = NodeF x ts
To answer the first question: Data.Foldable is not enough to compute the depth of the tree. The minimum complete definition of Foldable is foldr, which always has the following semantics:
foldr f z = Data.List.foldr f z . toList
In other words, a Foldable instance is fully characterized by how it behaves on a list projection of the input (ie toList), which will throw away the depth information of a tree.
Other ways of verifying this idea involve the fact that Foldable depends on a monoid instance which has to be associative or the fact that the various fold functions see the elements one by one in some particular order with no other information, which necessarily throws out the actual tree structure. (There has to be more than one tree with the same set of elements in the same relative order.)
I'm not sure what the minimal abstraction would be for trees specifically, but I think the core of your question is actually a bit broader: it would be interesting to see what minimum amount of information is needed to compute arbitrary facts about a type with a fold-like function.
To do this, the actual helper function in the fold would have to take a different sort of argument for each sort of data structure. This naturally leads us to catamorphisms, which are generalized folds over different data types.
You can read more about these generalized folds on a different Stack Overflow question: What constitutes a fold for types other than list? (In the interest of disclosure/self-promotion, I wrote one of the answeres there :P.)

What is the difference between value constructors and tuples?

It's written that Haskell tuples are simply a different syntax for algebraic data types. Similarly, there are examples of how to redefine value constructors with tuples.
For example, a Tree data type in Haskell might be written as
data Tree a = EmptyTree | Node a (Tree a) (Tree a)
which could be converted to "tuple form" like this:
data Tree a = EmptyTree | Node (a, Tree a, Tree a)
What is the difference between the Node value constructor in the first example, and the actual tuple in the second example? i.e. Node a (Tree a) (Tree a) vs. (a, Tree a, Tree a) (aside from just the syntax)?
Under the hood, is Node a (Tree a) (Tree a) just a different syntax for a 3-tuple of the appropriate types at each position?
I know that you can partially apply a value constructor, such as Node 5 which will have type: (Node 5) :: Num a => Tree a -> Tree a -> Tree a
You sort of can partially apply a tuple too, using (,,) as a function ... but this doesn't know about the potential types for the un-bound entries, such as:
Prelude> :t (,,) 5
(,,) 5 :: Num a => b -> c -> (a, b, c)
unless, I guess, you explicitly declare a type with ::.
Aside from syntactical specialties like this, plus this last example of the type scoping, is there a material difference between whatever a "value constructor" thing actually is in Haskell, versus a tuple used to store positional values of the same types are the value constructor's arguments?
Well, coneptually there indeed is no difference and in fact other languages (OCaml, Elm) present tagged unions exactly that way - i.e., tags over tuples or first class records (which Haskell lacks). I personally consider this to be a design flaw in Haskell.
There are some practical differences though:
Laziness. Haskell's tuples are lazy and you can't change that. You can however mark constructor fields as strict:
data Tree a = EmptyTree | Node !a !(Tree a) !(Tree a)
Memory footprint and performance. Circumventing intermediate types reduces the footprint and raises the performance. You can read more about it in this fine answer.
You can also mark the strict fields with the the UNPACK pragma to reduce the footprint even further. Alternatively you can use the -funbox-strict-fields compiler option. Concerning the last one, I simply prefer to have it on by default in all my projects. See the Hasql's Cabal file for example.
Considering the stated above, if it's a lazy type that you're looking for, then the following snippets should compile to the same thing:
data Tree a = EmptyTree | Node a (Tree a) (Tree a)
data Tree a = EmptyTree | Node {-# UNPACK #-} !(a, Tree a, Tree a)
So I guess you can say that it's possible to use tuples to store lazy fields of a constructor without a penalty. Though it should be mentioned that this pattern is kinda unconventional in the Haskell's community.
If it's the strict type and footprint reduction that you're after, then there's no other way than to denormalize your tuples directly into constructor fields.
They're what's called isomorphic, meaning "to have the same shape". You can write something like
data Option a = None | Some a
And this is isomorphic to
data Maybe a = Nothing | Just a
meaning that you can write two functions
f :: Maybe a -> Option a
g :: Option a -> Maybe a
Such that f . g == id == g . f for all possible inputs. We can then say that (,,) is a data constructor isomorphic to the constructor
data Triple a b c = Triple a b c
Because you can write
f :: (a, b, c) -> Triple a b c
f (a, b, c) = Triple a b c
g :: Triple a b c -> (a, b, c)
g (Triple a b c) = (a, b, c)
And Node as a constructor is a special case of Triple, namely Triple a (Tree a) (Tree a). In fact, you could even go so far as to say that your definition of Tree could be written as
newtype Tree' a = Tree' (Maybe (a, Tree' a, Tree' a))
The newtype is required since you can't have a type alias be recursive. All you have to do is say that EmptyLeaf == Tree' Nothing and Node a l r = Tree' (Just (a, l, r)). You could pretty simply write functions that convert between the two.
Note that this is all from a mathematical point of view. The compiler can add extra metadata and other information to be able to identify a particular constructor making them behave slightly differently at runtime.

Type-safe difference lists

A common idiom in Haskell, difference lists, is to represent a list xs as the value (xs ++). Then (.) becomes "(++)" and id becomes "[]" (in fact this works for any monoid or category). Since we can compose functions in constant time, this gives us a nice way to efficiently build up lists by appending.
Unfortunately the type [a] -> [a] is way bigger than the type of functions of the form (xs ++) -- most functions on lists do something other than prepend to their argument.
One approach around this (as used in dlist) is to make a special DList type with a smart constructor. Another approach (as used in ShowS) is to not enforce the constraint anywhere and hope for the best. But is there a nice way of keeping all the desired properties of difference lists while using a type that's exactly the right size?
Yes!
We can view [a] as a free monad instance Free ((,) a) ().
Thus we can apply the scheme described by Edward Kmett in Free Monads for Less.
The type we'll get is
newtype F a = F { runF :: forall r. (() -> r) -> ((a, r) -> r) -> r }
or simply
newtype F a = F { runF :: forall r. r -> (a -> r -> r) -> r }
So runF is nothing else than the foldr function for our list!
This is called the Boehm-Berarducci encoding and it's isomorphic to the original data type (list) — so this is as small as you can possibly get.
Will Ness says:
So this type is still too "wide", it allows more than just prefixing - doesn't constrain the g function argument.
If I understood his argument correctly, he points out that you can apply the foldr (or runF) function to something different from [] and (:).
But I never claimed that you can use foldr-encoding only for concatenation. Indeed, as this name implies, you can use it to calculate any fold — and that's what Will Ness demonstrated.
It may become more clear if you forget for a moment that there's one true list type, [a]. There may be lots of list types — e.g. I can define one by
data List a = Nil | Cons a (List a)
It's be different from, but isomorphic to [a].
The foldr-encoding above is just yet another encoding of lists, like List a or [a]. It is also isomorphic to [a], as evidenced by functions \l -> F (\a f -> foldr a f l) and \x -> runF [] (:) and the fact that their compositions (in either order) is identity. But you are not obliged to convert to [a] — you can convert to List directly, using \x -> runF x Nil Cons.
The important point is that F a doesn't contain an element that is not the foldr functions for some list — nor does it contain an element that is the foldr functions for more than one list (obviously).
Thus, it doesn't contain too few or too many elements — it contains precisely as many elements as needed to exactly represent all lists.
This is not true of the difference list encoding — for example, the reverse function is not an append operation for any list.

Resources