QuickCheck giving up investigating a recursive data structure (rose tree.) - haskell

Given an arbitrary tree, I can construct a subtype relation over that tree, using Schubert numbering:
constructH :: Tree a -> Tree (Type a)
where Type nests the original label, and additionally provides the data needed to perform child/parent (or subtype) checks. With Schubert Numbering, the two Int parameters are sufficient for that.
data Type a where !Int -> !Int -> a -> Type a
This leads to the binary predicate
subtypeOf :: Type a -> Type a -> Bool
I now want to test with QuickCheck that this does indeed do what I want it to do. The following property, however, does not work, because QuickCheck just gives up:
subtypeSanity ∷ Tree (Type ()) → Gen Prop
subtypeSanity Node { rootLabel = t, subForest = f } =
let subtypes = concatMap flatten f
in (not $ null subtypes) ==> conjoin
(forAll (elements subtypes) (\x → x `subtypeOf` t):(map subtypeSanity f))
If I leave out the recursive call to subtypeSanity, i.e. the tail of the list I'm passing to conjoin, the property runs fine, but tests just the root node of the tree! How can I descend into my data structure recursively without QuickCheck giving up on generating new test cases?
If needed, I could provide the code to construct the Schubert Hierarchy, and the Arbitrary instance for Tree (Type a), to provide a complete runnable example, but that would be quite a bit of code. I'm convinced that I'm just not "getting" QuickCheck, and using it in the wrong way here.
EDIT: unfortunately, the sized function does not seem to eliminate the problem here. It ends up with the same result (see comment to J. Abrahamson's answer.)
EDIT II: I ended up "fixing" my problem by avoiding the recursive step, and avoiding conjoin. We just make a list of all nodes in the tree, then test the single-node property (which worked fine from the beginning) on those.
allNodes ∷ Tree a → [Tree a]
allNodes n#(Node { subForest = f }) = n:(concatMap allNodes f)
subtypeSanity ∷ Tree (Type ()) → Gen Prop
subtypeSanity tree = forAll (elements $ allNodes tree)
(\(Node { rootLabel = t, subForest = f }) →
let subtypes = concatMap flatten f
in (not $ null subtypes) ==> forAll (elements subtypes) (\x → x `subtypeOf` t))
Tweaking the Arbitrary instance for trees did not work. Here is the arbitrary instance I'm still using:
instance (Arbitrary a, Eq a) ⇒ Arbitrary (Tree (Type a)) where
arbitrary = liftM (constructH) $ sized arbTree
arbTree ∷ Arbitrary a ⇒ Int → Gen (Tree a)
arbTree n = do
m ← choose (0,n)
if m == 0
then Node <$> arbitrary <*> (return [])
else do part ← randomPartition n m
Node <$> arbitrary <*> mapM arbTree part
-- this is a crude way to find a sufficiently random x1,..,xm,
-- such that x1 + .. + xm = n, for any n, m, with 0 < m.
randomPartition ∷ Int → Int → Gen [Int]
randomPartition n m' = do
let m = m' - 1
seed ← liftM ((++[n]) . sort) $ replicateM m (choose (0,n))
return $ zipWith (-) seed (0:seed)
I consider the problem "solved for now," but if someone could explain to me why the recursive step and/or conjoin made QuickCheck give up (after passing "only" 0 tests,) I would be more than grateful.

When generating Arbitrary recursive structures, QuickCheck is often a bit too eager and generates sprawling, enormous random examples. These are undesirable as they usually don't better check the properties of interest and can be very slow. Two solutions are
Use things like the size parameter (sized function) and frequency function to bias the generator toward small trees.
Use a small-type oriented generator like those in smallcheck. These try to exhaustively generate all "small" examples and thus help to keep the size of the tree down.
To clarify the sized and frequency method of controlling generation size, here's an example RoseTree
data Rose a = It a | Rose [Rose a]
instance Arbitrary a => Arbitrary (Rose a) where
arbitrary = frequency
[ (3, It <$> arbitrary) -- The 3-to-1 ratio is chosen, ah,
-- arbitrarily...
-- you'll want to tune it
, (1, Rose <$> children)
]
where children = sized $ \n -> vectorOf n arbitrary
It can be done even more simply with a different Rose formation by very carefully controlling the size of the child list
data Rose a = Rose a [Rose a]
instance Arbitrary a => Arbitrary (Rose a) where
arbitrary = Rose <$> arbitrary <*> sized (\n -> vectorOf (tuneUp n) arbitrary)
where tuneUp n = round $ fromIntegral n / 4.0
You could do this without referencing sized, but that gives the user of your Arbitrary instance a knob to ask for larger trees if needed.

In case it's useful for those stumbling across this issue: when QuickCheck "gives up", it's a sign that your pre-condition (using ==>) is too hard to satisfy.
QuickCheck uses a simple rejection sampling technique: pre-conditions have no effect on the generation of values. QuickCheck generates a bunch of random values like normal. After these are generated, they're sent through the pre-condition: if the result is True, the property is tested with that value; if it's False, that value is discarded. If your pre-condition rejects most of the values QuickCheck has generated, then QuickCheck will "give up" (better to give up completely, than to make statistically dubious pass/fail claims).
In particular, QuickCheck will not attempt to produce values which satisfy a given pre-condition. It's up to you to make sure that the generator you're using (arbitrary or otherwise) produces lots of values which pass your pre-condition.
Let's see how this is manifesting in your example:
subtypeSanity :: Tree (Type ()) -> Gen Prop
subtypeSanity Node { rootLabel = t, subForest = f } =
let subtypes = concatMap flatten f
in (not $ null subtypes) ==> conjoin
(forAll (elements subtypes) (`subtypeOf` t):(map subtypeSanity f))
There is only one occurance of ==>, so its precondition (not $ null subtypes) must be too hard to satisfy. This is due to the recursive call map subtypeSanity f: not only are you rejecting any Tree which has an empty subForest, you're also (due to the recursion) rejecting any Tree where the subForest contains Trees with empty subForests, and rejecting any Tree where the subForest contains Trees with subForests containing Trees with empty subForests, and so on.
According to your arbitrary instance, Trees are only nested to finite depth: eventually we will always reach an empty subForest, hence your recursive precondition will always fail, and QuickCheck will give up.

Related

The simplest way to generically traverse a tree in haskell

Suppose I used language-javascript library to build AST in Haskell. The AST has nodes of different types, and each node can have fields of those different types.
And each type can have numerous constructors. (All the types instantiate Data, Eq and Show).
I would like to count each type's constructor occurrence in the tree. I could use toConstr to get the constructor, and ideally I'd make a Tree -> [Constr] function fisrt (then counting is easy).
There are different ways to do that. Obviously pattern matching is too verbose (imagine around 3 types with 9-28 constructors).
So I'd like to use a generic traversal, and I tried to find the solution in SYB library.
There is an everywhere function, which doesn't suit my needs since I don't need a Tree -> Tree transformation.
There is gmapQ, which seems suitable in terms of its type, but as it turns out it's not recursive.
The most viable option so far is everywhereM. It still does the useless transformation, but I can use a Writer to collect toConstr results. Still, this way doesn't really feel right.
Is there an alternative that will not perform a useless (for this task) transformation and still deliver the list of constructors? (The order of their appearance in the tree doesn't matter for now)
Not sure if it's the simplest, but:
> data T = L | B T T deriving Data
> everything (++) (const [] `extQ` (\x -> [toConstr (x::T)])) (B L (B (B L L) L))
[B,L,B,B,L,L,L]
Here ++ says how to combine the results from subterms.
const [] is the base case for subterms who are not of type T. For those of type T, instead, we apply \x -> [toConstr (x::T)].
If you have multiple tree types, you'll need to extend the query using
const [] `extQ` (handleType1) `extQ` (handleType2) `extQ` ...
This is needed to identify the types for which we want to take the constructors. If there are a lot of types, probably this can be made shorter in some way.
Note that the code above is not very efficient on large trees since using ++ in this way can lead to quadratic complexity. It would be better, performance wise, to return a Data.Map.Map Constr Int. (Even if we do need to define some Ord Constr for that)
universe from the Data.Generics.Uniplate.Data module can give you a list of all the sub-trees of the same type. So using Ilya's example:
data T = L | B T T deriving (Data, Show)
tree :: T
tree = B L (B (B L L) L)
λ> import Data.Generics.Uniplate.Data
λ> universe tree
[B L (B (B L L) L),L,B (B L L) L,B L L,L,L,L]
λ> fmap toConstr $ universe tree
[B,L,B,B,L,L,L]

What is the correct definition of `unfold` for an untagged tree?

I've been thinking in how to implement the equivalent of unfold for the following type:
data Tree a = Node (Tree a) (Tree a) | Leaf a | Nil
It was not immediately obvious since the standard unfold for lists returns a value and the next seed. For this datatype, it doesn't make sense, since there is no "value" until you reach a leaf node. This way, it only really makes sense to return new seeds or stop with a value. I'm using this definition:
data Drive s a = Stop | Unit a | Branch s s deriving Show
unfold :: (t -> Drive t a) -> t -> Tree a
unfold fn x = case fn x of
Branch a b -> Node (unfold fn a) (unfold fn b)
Unit a -> Leaf a
Stop -> Nil
main = print $ unfold go 5 where
go 0 = Stop
go 1 = Unit 1
go n = Branch (n - 1) (n - 2)
While this seems to work, I'm not sure this is how it is supposed to be. So, that is the question: what is the correct way to do it?
If you think of a datatype as the fixpoint of a functor then you can see that your definition is the sensible generalisation of the list case.
module Unfold where
Here we start by definition the fixpoint of a functor f: it's a layer of f followed by some more fixpoint:
newtype Fix f = InFix { outFix :: f (Fix f) }
To make things slightly clearer, here are the definitions of the functors corresponding to lists and trees. They have basically the same shape as the datatypes except that we have replace the recursive calls by an extra parameter. In other words, they describe what one layer of list / tree looks like and are generic over the possible substructures r.
data ListF a r = LNil | LCons a r
data TreeF a r = TNil | TLeaf a | TBranch r r
Lists and trees are then respectively the fixpoints of ListF and TreeF:
type List a = Fix (ListF a)
type Tree a = Fix (TreeF a)
Anyways, hopping you now have a better intuition about this fixpoint business, we can see that there is a generic way of defining an unfold function for these.
Given an original seed as well as a function taking a seed and building one layer of f where the recursive structure are new seeds, we can build a whole structure:
unfoldFix :: Functor f => (s -> f s) -> s -> Fix f
unfoldFix node = go
where go = InFix . fmap go . node
This definition specialises to the usual unfold on list or your definition for trees. In other words: your definition was indeed the right one.

Finding a linear path in a traversable structure

Given a list of steps:
>>> let path = ["item1", "item2", "item3", "item4", "item5"]
And a labeled Tree:
>>> import Data.Tree
>>> let tree = Node "item1" [Node "itemA" [], Node "item2" [Node "item3" []]]
I'd like a function that goes through the steps in path matching the labels in tree until it can't go any further because there are no more labels matching the steps. Concretely, here it falls when stepping into "item4" (for my use case I still need to specify the last matched step):
>>> trav path tree
["item3", "item4", "item5"]
If I allow [String] -> Tree String -> [String] as the type of trav I could write a recursive function that steps in both structures at the same time until there are no labels to match the step. But I was wondering if a more general type could be used, specifically for Tree. For example: Foldable t => [String] -> t String -> [String]. If this is possible, how trav could be implemented?
I suspect there could be a way to do it using lens.
First, please let's use type Label = String. String is not exactly descriptive and might not be ideal in the end...
Now. To use Traversable, you need to pick a suitable Applicative that can contain the information you need for deciding what to do in its "structure". You only need to pass back information after a match has failed. That sounds like some Either!
A guess would thus be Either [Label] (t Label) as the pre-result. That would mean, we use the instantiation
traverse :: Traversable t
=> (Label -> Either [Label] Label) -> t Label -> Either [Label] (t Label)
So what can we pass as the argument function?
travPt0 :: [Label] -> Label -> Either [Label] Label
travPt0 ls#(l0 : _) label
| l0 /= label = Left ls
| otherwise = Right label ?
The problem is, traverse will then fail immediately and completely if any node has a non-matching label. Traversable doesn't actually have a notion of "selectively" diving down into a data structure, it just passes through everything, always. Actually, we only want to match on the topmost node at first, only that one is mandatory to match at first.
One way to circumvent immediate deep-traversal is to first split up the tree into a tree of sub-trees. Ok, so... we need to extract the topmost label. We need to split the tree in subtrees. Reminds you of anything?
trav' :: (Traversable t, Comonad t) => [Label] -> t Label -> [Label]
trav' (l0 : ls) tree
| top <- extract tree
= if top /= l0 then l0 : ls
else let subtrees = duplicate tree
in ... ?
Now amongst those subtrees, we're basically interested only in the one that matches. This can be determined from the result of trav': if the second element is passed right back again, we have a failure. Unlike normal nomenclature with Either, this means we wish to go on, but not use that branch! So we need to return Either [Label] ().
else case ls of
[] -> [l0]
l1:ls' -> let subtrees = duplicate tree
in case traverse (trav' ls >>> \case
(l1':_)
| l1'==l1 -> Right ()
ls'' -> Left ls''
) subtrees of
Left ls'' -> ls''
Right _ -> l0 : ls -- no matches further down.
I have not tested this code!
We'll take as reference the following recursive model
import Data.List (minimumBy)
import Data.Ord (comparing)
import Data.Tree
-- | Follows a path into a 'Tree' returning steps in the path which
-- are not contained in the 'Tree'
treeTail :: Eq a => [a] -> Tree a -> [a]
treeTail [] _ = []
treeTail (a:as) (Node a' trees)
| a == a' = minimumBy (comparing length)
$ (a:as) : map (treeTail as) trees
| otherwise = as
which suggests that the mechanism here is less that we're traversing through the tree accumulating (which is what a Traversable instance might do) but more that we're stepping through the tree according to some state and searching for the deepest path.
We can characterize this "step" by a Prism if we like.
import Control.Lens
step :: Eq a => a -> Prism' (Tree a) (Forest a)
step a =
prism' (Node a)
(\n -> if rootLabel n == a
then Just (subForest n)
else Nothing)
This would allow us to write the algorithm as
treeTail :: Eq a => [a] -> Tree a -> [a]
treeTail [] _ = []
treeTail pth#(a:as) t =
maybe (a:as)
(minimumBy (comparing length) . (pth:) . map (treeTail as))
(t ^? step a)
but I'm not sure that's significantly more clear.

"Zipping" a plain list with a nested list

I am looking for an elegant solution to the following problem. I have two lists of the following types:
[Float] and, [[Float]]
The first list contains an infinite amount of random values. The second list contains values I no longer care about. Its structure is finite and must be preserved. The values of the first list needs to be replacing those of the second.
Obviously, since the first list contains random values, I do not want to use them twice. Can anyone help me do this in a clear, concise, and terse way?
scramble :: [Float] -> [[Float]] -> [[Float]]
Give me your best shot
Using the split package for splitting:
import Data.List.Split (splitPlaces)
scramble x y = splitPlaces (map length y) x
Will this do?
flip . (evalState .) . traverse . traverse . const . state $ head &&& tail
EDIT: let me expand on the construction...
The essential centre of it is traverse . traverse. If you stare at the problem with sufficiently poor spectacles, you can see that it's "do something with the elements of a container of containers". For that sort of thing, traverse (from Data.Traversable) is a very useful gadget (ok, I'm biased).
traverse :: (Traversable f, Applicative a) => (s -> a t) -> f s -> a (f t)
or, if I change to longer but more suggestive type variables
traverse :: (Traversable containerOf, Applicative doingSomethingToGet) =>
(s -> doingSomethingToGet t) ->
containerOf s -> doingSomethingToGet (containerOf t)
Crucially, traverse preserves the structure of the container it operates on, whatever that might be. If you view traverse as a higher-order function, you can see that it gives back an operator on containers whose type fits with the type of operators on elements it demands. That's to say (traverse . traverse) makes sense, and gives you structure-preserving operations on two layers of container.
traverse . traverse ::
(Traversable g, Traversable f, Applicative a) => (s -> a t) -> g (f s) -> a (g (f t))
So we've got the key gadget for structure-preserving "do something" operations on lists of lists. The length and splitAt approach works fine for lists (the structure of a list is given by its length), but the essential characteristic of lists which enables that approach is already pretty much bottled by the Traversable class.
Now we need to figure out how to "do something". We want to replace the old elements with new things drawn successively from a supply stream. If we were allowed the side-effect of updating the supply, we could say what to do at each element: "return head of supply, updating supply with its tail". The State s monad (in Control.Monad.State which is an instance of Applicative, from Control.Applicative) lets us capture that idea. The type State s a represents computations which deliver a value of type a whilst mutating a state of type s. Typical such computations are made by this gadget.
state :: (s -> (a, s)) -> State s a
That's to say, given an initial state, just compute the value and the new state. In our case, s is a stream, head gets the value, tail gets the new state. The &&& operator (from Control.Arrow) is a nice way to glue two functions on the same data to get a function making a pair. So
head &&& tail :: [x] -> (x, [x])
which makes
state $ head &&& tail :: State [x] x
and thus
const . state $ head &&& tail :: u -> State [x] x
explains what to "do" with each element of the old container, namely ignore it and take a new element from the head of the supply stream.
Feeding that into (traverse . traverse) gives us a big mutatey traversal of type
f (g u) -> State [x] (f (g x))
where f and g are any Traversable structures (e.g. lists).
Now, to extract the function we want, taking the initial supply stream, we need to unpack the state-mutating computation as a function from initial state to final value. That's what this does:
evalState :: State s a -> s -> a
So we end up with something in
f (g u) -> [x] -> f (g x)
which had better get flipped if it's to match the original spec.
tl;dr The State [x] monad is a readymade tool for describing computations which read and update an input stream. The Traversable class captures a readymade notion of structure-preserving operation on containers. The rest is plumbing (and/or golf).
This is the obvious way to do it, but I take it this isn't terse enough?
scramble :: [a] -> [[a]] -> [[a]]
scramble _ [] = []
scramble xs (y : ys) = some : scramble rest ys
where (some, rest) = splitAt (length y) xs

Can someone explain the traverse function in Haskell?

I am trying and failing to grok the traverse function from Data.Traversable. I am unable to see its point. Since I come from an imperative background, can someone please explain it to me in terms of an imperative loop? Pseudo-code would be much appreciated. Thanks.
traverse is the same as fmap, except that it also allows you to run effects while you're rebuilding the data structure.
Take a look at the example from the Data.Traversable documentation.
data Tree a = Empty | Leaf a | Node (Tree a) a (Tree a)
The Functor instance of Tree would be:
instance Functor Tree where
fmap f Empty = Empty
fmap f (Leaf x) = Leaf (f x)
fmap f (Node l k r) = Node (fmap f l) (f k) (fmap f r)
It rebuilds the entire tree, applying f to every value.
instance Traversable Tree where
traverse f Empty = pure Empty
traverse f (Leaf x) = Leaf <$> f x
traverse f (Node l k r) = Node <$> traverse f l <*> f k <*> traverse f r
The Traversable instance is almost the same, except the constructors are called in applicative style. This means that we can have (side-)effects while rebuilding the tree. Applicative is almost the same as monads, except that effects cannot depend on previous results. In this example it means that you could not do something different to the right branch of a node depending on the results of rebuilding the left branch for example.
For historical reasons, the Traversable class also contains a monadic version of traverse called mapM. For all intents and purposes mapM is the same as traverse - it exists as a separate method because Applicative only later became a superclass of Monad.
If you would implement this in an impure language, fmap would be the same as traverse, as there is no way to prevent side-effects. You can't implement it as a loop, as you have to traverse your data structure recursively. Here's a small example how I would do it in Javascript:
Node.prototype.traverse = function (f) {
return new Node(this.l.traverse(f), f(this.k), this.r.traverse(f));
}
Implementing it like this limits you to the effects that the language allows though. If you f.e. want non-determinism (which the list instance of Applicative models) and your language doesn't have it built-in, you're out of luck.
traverse turns things inside a Traversable into a Traversable of things "inside" an Applicative, given a function that makes Applicatives out of things.
Let's use Maybe as Applicative and list as Traversable. First we need the transformation function:
half x = if even x then Just (x `div` 2) else Nothing
So if a number is even, we get half of it (inside a Just), else we get Nothing. If everything goes "well", it looks like this:
traverse half [2,4..10]
--Just [1,2,3,4,5]
But...
traverse half [1..10]
-- Nothing
The reason is that the <*> function is used to build the result, and when one of the arguments is Nothing, we get Nothing back.
Another example:
rep x = replicate x x
This function generates a list of length x with the content x, e.g. rep 3 = [3,3,3]. What is the result of traverse rep [1..3]?
We get the partial results of [1], [2,2] and [3,3,3] using rep. Now the semantics of lists as Applicatives is "take all combinations", e.g. (+) <$> [10,20] <*> [3,4] is [13,14,23,24].
"All combinations" of [1] and [2,2] are two times [1,2]. All combinations of two times [1,2] and [3,3,3] are six times [1,2,3]. So we have:
traverse rep [1..3]
--[[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
I think it's easiest to understand in terms of sequenceA, as traverse can be defined as
follows.
traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
traverse f = sequenceA . fmap f
sequenceA sequences together the elements of a structure from left to right, returning a structure with the same shape containing the results.
sequenceA :: (Traversable t, Applicative f) => t (f a) -> f (t a)
sequenceA = traverse id
You can also think of sequenceA as reversing the order of two functors, e.g. going from a list of actions into an action returning a list of results.
So traverse takes some structure, and applies f to transform every element in the structure into some applicative, it then sequences up the effects of those applicatives from left to right, returning a structure with the same shape containing the results.
You can also compare it to Foldable, which defines the related function traverse_.
traverse_ :: (Foldable t, Applicative f) => (a -> f b) -> t a -> f ()
So you can see that the key difference between Foldable and Traversable is that the latter allows you to preserve the shape of the structure, whereas the former requires you to fold the result up into some other value.
A simple example of its usage is using a list as the traversable structure, and IO as the applicative:
λ> import Data.Traversable
λ> let qs = ["name", "quest", "favorite color"]
λ> traverse (\thing -> putStrLn ("What is your " ++ thing ++ "?") *> getLine) qs
What is your name?
Sir Lancelot
What is your quest?
to seek the holy grail
What is your favorite color?
blue
["Sir Lancelot","to seek the holy grail","blue"]
While this example is rather unexciting, things get more interesting when traverse is used on other types of containers, or using other applicatives.
It's kind of like fmap, except that you can run effects inside the mapper function, which also changes the result type.
Imagine a list of integers representing user IDs in a database: [1, 2, 3]. If you want to fmap these user IDs to usernames, you can't use a traditional fmap, because inside the function you need to access the database to read the usernames (which requires an effect -- in this case, using the IO monad).
The signature of traverse is:
traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
With traverse, you can do effects, therefore, your code for mapping user IDs to usernames looks like:
mapUserIDsToUsernames :: (Num -> IO String) -> [Num] -> IO [String]
mapUserIDsToUsernames fn ids = traverse fn ids
There's also a function called mapM:
mapM :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b)
Any use of mapM can be replaced with traverse, but not the other way around. mapM only works for monads, whereas traverse is more generic.
If you just want to achieve an effect and not return any useful value, there are traverse_ and mapM_ versions of these functions, both of which ignore the return value from the function and are slightly faster.
traverse is the loop. Its implementation depends on the data structure to be traversed. That might be a list, tree, Maybe, Seq(uence), or anything that has a generic way of being traversed via something like a for-loop or recursive function. An array would have a for-loop, a list a while-loop, a tree either something recursive or the combination of a stack with a while-loop; but in functional languages you do not need these cumbersome loop commands: you combine the inner part of the loop (in the shape of a function) with the data structure in a more directly manner and less verbose.
With the Traversable typeclass, you could probably write your algorithms more independent and versatile. But my experience says, that Traversable is usually only used to simply glue algorithms to existing data structures. It is quite nice not to need to write similar functions for different datatypes qualified, too.

Resources