Finding a linear path in a traversable structure - haskell

Given a list of steps:
>>> let path = ["item1", "item2", "item3", "item4", "item5"]
And a labeled Tree:
>>> import Data.Tree
>>> let tree = Node "item1" [Node "itemA" [], Node "item2" [Node "item3" []]]
I'd like a function that goes through the steps in path matching the labels in tree until it can't go any further because there are no more labels matching the steps. Concretely, here it falls when stepping into "item4" (for my use case I still need to specify the last matched step):
>>> trav path tree
["item3", "item4", "item5"]
If I allow [String] -> Tree String -> [String] as the type of trav I could write a recursive function that steps in both structures at the same time until there are no labels to match the step. But I was wondering if a more general type could be used, specifically for Tree. For example: Foldable t => [String] -> t String -> [String]. If this is possible, how trav could be implemented?
I suspect there could be a way to do it using lens.

First, please let's use type Label = String. String is not exactly descriptive and might not be ideal in the end...
Now. To use Traversable, you need to pick a suitable Applicative that can contain the information you need for deciding what to do in its "structure". You only need to pass back information after a match has failed. That sounds like some Either!
A guess would thus be Either [Label] (t Label) as the pre-result. That would mean, we use the instantiation
traverse :: Traversable t
=> (Label -> Either [Label] Label) -> t Label -> Either [Label] (t Label)
So what can we pass as the argument function?
travPt0 :: [Label] -> Label -> Either [Label] Label
travPt0 ls#(l0 : _) label
| l0 /= label = Left ls
| otherwise = Right label ?
The problem is, traverse will then fail immediately and completely if any node has a non-matching label. Traversable doesn't actually have a notion of "selectively" diving down into a data structure, it just passes through everything, always. Actually, we only want to match on the topmost node at first, only that one is mandatory to match at first.
One way to circumvent immediate deep-traversal is to first split up the tree into a tree of sub-trees. Ok, so... we need to extract the topmost label. We need to split the tree in subtrees. Reminds you of anything?
trav' :: (Traversable t, Comonad t) => [Label] -> t Label -> [Label]
trav' (l0 : ls) tree
| top <- extract tree
= if top /= l0 then l0 : ls
else let subtrees = duplicate tree
in ... ?
Now amongst those subtrees, we're basically interested only in the one that matches. This can be determined from the result of trav': if the second element is passed right back again, we have a failure. Unlike normal nomenclature with Either, this means we wish to go on, but not use that branch! So we need to return Either [Label] ().
else case ls of
[] -> [l0]
l1:ls' -> let subtrees = duplicate tree
in case traverse (trav' ls >>> \case
(l1':_)
| l1'==l1 -> Right ()
ls'' -> Left ls''
) subtrees of
Left ls'' -> ls''
Right _ -> l0 : ls -- no matches further down.
I have not tested this code!

We'll take as reference the following recursive model
import Data.List (minimumBy)
import Data.Ord (comparing)
import Data.Tree
-- | Follows a path into a 'Tree' returning steps in the path which
-- are not contained in the 'Tree'
treeTail :: Eq a => [a] -> Tree a -> [a]
treeTail [] _ = []
treeTail (a:as) (Node a' trees)
| a == a' = minimumBy (comparing length)
$ (a:as) : map (treeTail as) trees
| otherwise = as
which suggests that the mechanism here is less that we're traversing through the tree accumulating (which is what a Traversable instance might do) but more that we're stepping through the tree according to some state and searching for the deepest path.
We can characterize this "step" by a Prism if we like.
import Control.Lens
step :: Eq a => a -> Prism' (Tree a) (Forest a)
step a =
prism' (Node a)
(\n -> if rootLabel n == a
then Just (subForest n)
else Nothing)
This would allow us to write the algorithm as
treeTail :: Eq a => [a] -> Tree a -> [a]
treeTail [] _ = []
treeTail pth#(a:as) t =
maybe (a:as)
(minimumBy (comparing length) . (pth:) . map (treeTail as))
(t ^? step a)
but I'm not sure that's significantly more clear.

Related

One-line implementation of split in Haskell

What I want is the following one (which I think should be included in prelude since it is very useful in text processing):
split :: Eq a => [a] -> [a] -> [[a]]
e.g:
split ";;" "hello;;world" = ["hello", "world"]
split from Data.List.Utils isn't in base. I feel there should be a short-and-sweet implementation by composing a few base functions, but I can't figure it out. Am I missing something?
Arguably, the best way to check how feasible a short-and-sweet splitOn (or split, as you and MissingH call it -- here I will stick to the name used by the split and extra packages) is trying to write it [note 1].
(By the way, I will use recursion-schemes functions and concepts in this answer, as I find systematising things a bit helps me think about this kind of problem. Do let me know if anything is unclear.)
The type of splitOn is [note 2]:
splitOn :: Eq a => [a] -> [a] -> [[a]]
One way to write a recursive function that builds one data structure from another, like splitOn does, begins by asking whether to do it by walking the original structure in a bottom-up or a top-down way (for lists, that amounts to right-to-left and left-to-right respectively). A bottom-up walk is more naturally expressed as some kind of fold:
foldr #[] :: (a -> b -> b) -> b -> [a] -> b
cata #[_] :: (ListF a b -> b) -> [a] -> b
(cata, short for catamorphism, is how recursion-schemes expresses a vanilla fold. The ListF a b -> b function, called an algebra in the jargon, specifies what happens in each fold step. data ListF a b = Nil | Cons a b, and so, in the case of lists, the algebra amounts to the two first arguments of foldr rolled into one -- the binary function corresponds to the Cons case, and the seed of the fold, to the Nil one.)
A top-down walk, on the other hand, lends itself to an unfold:
unfoldr :: (b -> Maybe (a, b)) -> b -> [a] -- found in Data.List
ana #[_] :: (b -> ListF a b) -> b -> [a]
(ana, short for anamorphism, is the vanilla unfold in recursion-schemes. The b -> ListF a b function is a coalgebra; it specifies what happens in each unfold step. For a list, the possibilities are either emitting a list element and an updated seed or generating an empty list and terminating the unfold.)
Should splitOn be bottom-up or top-down? To implement it, we need to, at any given position in the list, look ahead in order to check whether the current list segment starts with the delimiter. That being so, it makes sense to reach for a top-down solution i.e. an unfold/anamorphism.
Playing with ways to write splitOn as an unfold shows another thing to consider: you want each individual unfold step to generate a fully-formed list chunk. Not doing so will, at best, lead you to unnecessarily walk the original list twice [note 3]; at worst, catastrophic memory usage and stack overflows on long list chunks await [note 4]. One way to achieve that is through a breakOn function, like the one in Data.List.Extra...
breakOn :: Eq a => [a] -> [a] -> ([a], [a])
... which is like break from the Prelude, except that, instead of applying a predicate to each element, it checks whether the remaining list segment has the first argument as a prefix [note 5].
With breakOn at hand, we can write a proper splitOn implementation -- one that, compiled with optimisations, matches in performance the library ones mentioned at the beginning:
splitOnAtomic :: Eq a => [a] -> [a] -> [[a]]
splitOnAtomic delim
| null delim = error "splitOnAtomic: empty delimiter"
| otherwise = apo coalgSplit
where
delimLen = length delim
coalgSplit = \case
[] -> Cons [] (Left [])
li ->
let (ch, xs) = breakOn (delim `isPrefixOf`) li
in Cons ch (Right (drop delimLen xs))
(apo, short for apomorphism, is an unfold that can be short-circuited. That is done by emitting from an unfold step, rather than the usual updated seed -- signaled by Right -- a final result -- signaled by Left. Short-circuiting is needed here because, in the empty list case, we want neither to produce an empty list by returning Nil -- which would wrongly result in splitOn delim [] = [] -- nor to resort to Cons [] [] -- which would generate an infinite tail of []. This trick corresponds directly to the additional splitOn _ [] = [[]] case added to the Data.List.Extra implementation.)
After a few slight detours, we can now address your actual question. splitOn is tricky to write in a short way because, firstly, the recursion pattern it uses isn't entirely trivial; secondly, a good implementation requires a few details that are inconvenient for golfing; and thirdly, what appears to be the best implementation relies crucially on breakOn, which is not in base.
Notes:
[note 1]: Here are the imports and pragmas needed to run the snippets in this answer:
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TemplateHaskell #-}
import Data.Functor.Foldable
import Data.Functor.Foldable.TH
import Data.List
import Data.Maybe
[note 2]: An alternative type might be Eq a => NonEmpty a -> [a] -> NonEmpty [a], if one wants to put precision above all else. I won't bother with that here to avoid unnecessary distractions.
[note 3]: As in this rather neat implementation, which uses two unfolds -- the first one (ana coalgMark) replaces the delimiters with Nothing, so that the second one (apo coalgSplit) can split in a straightforward way:
splitOnMark :: Eq a => [a] -> [a] -> [[a]]
splitOnMark delim
| null delim = error "splitOnMark: empty delimiter"
| otherwise = apo coalgSplit . ana coalgMark
where
coalgMark = \case
[] -> Nil
li#(x:xs) -> case stripPrefix delim li of
Just ys -> Cons Nothing ys
Nothing -> Cons (Just x) xs
coalgSplit = \case
[] -> Cons [] (Left [])
mxs ->
let (mch, mys) = break isNothing mxs
in Cons (catMaybes mch) (Right (drop 1 mys))
(What apo is and what Left and Right are doing here will be covered a little further in the main body of the answer.)
This implementation has fairly acceptable performance, though with optimisations it is a slower than the one in the main body of the answer by a (modest) constant factor. It might be a little easier to golf this one, though...
[note 4]: As in this single unfold implementation, which uses a coalgebra that calls itself recursively to build each chunk as a (difference) list:
splitOnNaive :: Eq a => [a] -> [a] -> [[a]]
splitOnNaive delim
| null delim = error "splitOn: empty delimiter"
| otherwise = apo coalgSplit . (,) id
where
coalgSplit = \case
(ch, []) -> Cons (ch []) (Left [])
(ch, li#(x:xs)) -> case stripPrefix delim li of
Just ys -> Cons (ch []) (Right (id, ys))
Nothing -> coalg (ch . (x :), xs)
Having to decide at each element whether to grow the current chunk or to start a new one is in itself problematic, as it breaks laziness.
[note 5]: Here is how Data.List.Extra implements breakOn. If we want to achieve that using a recursion-schemes unfold, one good strategy is defining a data structure that encodes exactly what we are trying to build:
data BrokenList a = Broken [a] | Unbroken a (BrokenList a)
deriving (Eq, Show, Functor, Foldable, Traversable)
makeBaseFunctor ''BrokenList
A BrokenList is just like a list, except that the empty list is replaced by the (non-recursive) Broken constructor, which marks the break point and holds the remainder of the list. Once generated by an unfold, a BrokenList can be easily folded into a pair of lists: the elements in the Unbroken values are consed into one list, and the list in Broken becomes the other one:
breakOn :: ([a] -> Bool) -> [a] -> ([a], [a])
breakOn p = hylo algPair coalgBreak
where
coalgBreak = \case
[] -> BrokenF []
li#(x:xs)
| p li -> BrokenF li
| otherwise -> UnbrokenF x xs
algPair = \case
UnbrokenF x ~(xs, ys) -> (x : xs, ys)
BrokenF ys -> ([], ys)
(hylo, short for hylomorphism, is simply an ana followed by a cata, i.e. an unfold followed by a fold. hylo, as implemented in recursion-schemes, takes advantage of the fact that the intermediate data structure, created by the unfold and then immediately consumed by the fold, can be fused away, leading to significant performance gains.)
It is worth mentioning that the lazy pattern match in algPair is crucial to preserve laziness. The Data.List.Extra implementation linked to above achieves that by using first from Control.Arrow, which also matches the pair given to it lazily.

What is the correct definition of `unfold` for an untagged tree?

I've been thinking in how to implement the equivalent of unfold for the following type:
data Tree a = Node (Tree a) (Tree a) | Leaf a | Nil
It was not immediately obvious since the standard unfold for lists returns a value and the next seed. For this datatype, it doesn't make sense, since there is no "value" until you reach a leaf node. This way, it only really makes sense to return new seeds or stop with a value. I'm using this definition:
data Drive s a = Stop | Unit a | Branch s s deriving Show
unfold :: (t -> Drive t a) -> t -> Tree a
unfold fn x = case fn x of
Branch a b -> Node (unfold fn a) (unfold fn b)
Unit a -> Leaf a
Stop -> Nil
main = print $ unfold go 5 where
go 0 = Stop
go 1 = Unit 1
go n = Branch (n - 1) (n - 2)
While this seems to work, I'm not sure this is how it is supposed to be. So, that is the question: what is the correct way to do it?
If you think of a datatype as the fixpoint of a functor then you can see that your definition is the sensible generalisation of the list case.
module Unfold where
Here we start by definition the fixpoint of a functor f: it's a layer of f followed by some more fixpoint:
newtype Fix f = InFix { outFix :: f (Fix f) }
To make things slightly clearer, here are the definitions of the functors corresponding to lists and trees. They have basically the same shape as the datatypes except that we have replace the recursive calls by an extra parameter. In other words, they describe what one layer of list / tree looks like and are generic over the possible substructures r.
data ListF a r = LNil | LCons a r
data TreeF a r = TNil | TLeaf a | TBranch r r
Lists and trees are then respectively the fixpoints of ListF and TreeF:
type List a = Fix (ListF a)
type Tree a = Fix (TreeF a)
Anyways, hopping you now have a better intuition about this fixpoint business, we can see that there is a generic way of defining an unfold function for these.
Given an original seed as well as a function taking a seed and building one layer of f where the recursive structure are new seeds, we can build a whole structure:
unfoldFix :: Functor f => (s -> f s) -> s -> Fix f
unfoldFix node = go
where go = InFix . fmap go . node
This definition specialises to the usual unfold on list or your definition for trees. In other words: your definition was indeed the right one.

Haskell: associating a polymorphic function for each level in a tree

Good day!
I have a tree of elements:
data Tree a = Node [Tree a]
| Leaf a
I need to get to the leaf of that tree. The path from the root to the leaf is determined by a sequence of switch functions. Each layer in that tree corresponds to a specific switch function that takes something as a parameter and returns an index to the subtree.
class Switchable a where
getIndex :: a -> Int
data SwitchData = forall c. Switchable c => SwitchData c
The final goal is to provide a function that expects all necessary SwitchData and returns
the leaf in a tree.
But I see no way
to enforce the one-to-one mapping at type-level. My current implementation of switch
functions accepts a list of SwitchData instances:
switch :: SwitchData -> Tree a -> Tree a
switch (SwitchData c) (Node vec) = vec `V.unsafeIndex` getIndex c
switch _ _ = error "switch: performing switch on a Leaf of context tree"
switchOnData :: [SwitchData] -> Tree a -> a
switchOnData (x:xs) tree = switchOnData xs $ switch x tree
switchOnData [] (Leaf c) = c
switchOnData [] (Node _) = error "switchOnData: partial data"
The order and the exact types of Switchable instances
are not considered neither at compile time nor at runtime, the correctness is left for a programmer, which bothers me a lot. That gist reflects the current state of affair.
Could you suggest some ways of establishing that one-to-one mapping between layers in a context tree and particular instances of Switchable?
Your current solution is equivalent to simply switchOnIndices :: [Int] -> Tree a -> a (simply apply getIndex before storing in list!). Make this explicitly "partial" by wrapping the return in Maybe and this may actually be the ideal signature, simple and fine.
But apparently, your real use case is more complex; you want to have basically different dictionaries at each level. Then you actually need to link the multiple levels of types to those of tree depths. You're in for some crazy almost-dependent-typed hackery!
{-# LANGUAGE GADTs, TypeOperators, LambdaCase #-}
infixr 5 :&
data Nil = Nil
data s :& ss where
(:&) :: s -> ss -> s :& ss
data Tree switch a where
Node ::
(s -> Tree ss a) -- Such a function is equiv. to `Switchable c => (c, [Tree a])`.
-> Tree (s :& ss) a
Leaf :: a -> Tree Nil a
switch :: s -> Tree (s :& ss) a -> Tree ss a
switch s (Node f) = f s
switchOnData :: s -> Tree s a -> a
switchOnData sw (Node f) = switchOnData ss $ f s
where (s :& ss) = sw
switchOnData _ (Leaf a) = a
data Sign = P | M
signTree :: Tree (Sign :& Sign :& Nil) Char
signTree = Node $ \case P -> Node $ \case P -> Leaf 'a'
M -> Leaf 'b'
M -> Node $ \case P -> Leaf 'c'
M -> Leaf 'd'
testSwitch :: IO()
testSwitch = print $ switchOnData (P :& M :& Nil) signTree
main = testSwitch
Of course, this greatly limits the flexibility of the tree structure: each level has a fixed predetermined number of nodes. In fact, that makes the entire structure of e.g. signTree equivalent to simply (Sign, Sign) -> Char, so unless you really need a tree for some specific reason (e.g. extra information attached to the nodes), why not just use that!
Or, again, the way simpler [Int] -> Tree a -> a signature. But using existentials makes no sense to me at all here.
Let's assume that we're traversing a Tree at a particular Node and have our particular kind of desired polymorphic Switchable value available as well. In other words, we want to take a step
step :: Switchable -> Tree a -> Maybe (Tree a)
Here the output Tree is one of the children of the input Tree and we wrap it in a Maybe just in case something goes wrong. So let's try to write step
step s Leaf{} = Nothing
step s (Node children) = switch s (?extract children)
Here, the challenge arises out of defining ?extract as we need it to polymorphically produce the right kind of value for switch s to operate on. We can't meaningfully use polymorphism to get this to work:
-- won't work!
class Extractable b where
extract :: [Tree a] -> b
since now switch . extract is ambiguous. In fact, if Switchable operates by existentially quantifying a value that it needs passed in then there is no way (outside of Typeable) to build an extract that works properly. Instead, we need each Switchable to have its own individual extract with the proper type, a type that only exists inside of the context of the data type.
{-# LANGUAGE ExistentialQuantification #-}
data Switchable = forall pass . Switchable
{ extract :: [Tree a] -> pass
, switch :: pass -> Int
}
step s (Node children) = children `safeLookup` switch s (extract s children)
But due to the containment of this existential type, we know that there's absolutely no way to use a Switchable except to immediately compute the existentially hidden pass value and then consume it with switch. There's not even any reason to store pass since the only way we can get any information out of it is to use switch.
Altogether this leads to the intuition that Switchable ought to be this instead.
data Switchable = Switchable { switch :: [Tree a] -> Int }
step s (Node children) = children `safeLookup` switch s children
Or even
data Switchable = Switchable { switch :: [Tree a] -> Tree a }
step s (Node children) = Just (switch s children)

QuickCheck giving up investigating a recursive data structure (rose tree.)

Given an arbitrary tree, I can construct a subtype relation over that tree, using Schubert numbering:
constructH :: Tree a -> Tree (Type a)
where Type nests the original label, and additionally provides the data needed to perform child/parent (or subtype) checks. With Schubert Numbering, the two Int parameters are sufficient for that.
data Type a where !Int -> !Int -> a -> Type a
This leads to the binary predicate
subtypeOf :: Type a -> Type a -> Bool
I now want to test with QuickCheck that this does indeed do what I want it to do. The following property, however, does not work, because QuickCheck just gives up:
subtypeSanity ∷ Tree (Type ()) → Gen Prop
subtypeSanity Node { rootLabel = t, subForest = f } =
let subtypes = concatMap flatten f
in (not $ null subtypes) ==> conjoin
(forAll (elements subtypes) (\x → x `subtypeOf` t):(map subtypeSanity f))
If I leave out the recursive call to subtypeSanity, i.e. the tail of the list I'm passing to conjoin, the property runs fine, but tests just the root node of the tree! How can I descend into my data structure recursively without QuickCheck giving up on generating new test cases?
If needed, I could provide the code to construct the Schubert Hierarchy, and the Arbitrary instance for Tree (Type a), to provide a complete runnable example, but that would be quite a bit of code. I'm convinced that I'm just not "getting" QuickCheck, and using it in the wrong way here.
EDIT: unfortunately, the sized function does not seem to eliminate the problem here. It ends up with the same result (see comment to J. Abrahamson's answer.)
EDIT II: I ended up "fixing" my problem by avoiding the recursive step, and avoiding conjoin. We just make a list of all nodes in the tree, then test the single-node property (which worked fine from the beginning) on those.
allNodes ∷ Tree a → [Tree a]
allNodes n#(Node { subForest = f }) = n:(concatMap allNodes f)
subtypeSanity ∷ Tree (Type ()) → Gen Prop
subtypeSanity tree = forAll (elements $ allNodes tree)
(\(Node { rootLabel = t, subForest = f }) →
let subtypes = concatMap flatten f
in (not $ null subtypes) ==> forAll (elements subtypes) (\x → x `subtypeOf` t))
Tweaking the Arbitrary instance for trees did not work. Here is the arbitrary instance I'm still using:
instance (Arbitrary a, Eq a) ⇒ Arbitrary (Tree (Type a)) where
arbitrary = liftM (constructH) $ sized arbTree
arbTree ∷ Arbitrary a ⇒ Int → Gen (Tree a)
arbTree n = do
m ← choose (0,n)
if m == 0
then Node <$> arbitrary <*> (return [])
else do part ← randomPartition n m
Node <$> arbitrary <*> mapM arbTree part
-- this is a crude way to find a sufficiently random x1,..,xm,
-- such that x1 + .. + xm = n, for any n, m, with 0 < m.
randomPartition ∷ Int → Int → Gen [Int]
randomPartition n m' = do
let m = m' - 1
seed ← liftM ((++[n]) . sort) $ replicateM m (choose (0,n))
return $ zipWith (-) seed (0:seed)
I consider the problem "solved for now," but if someone could explain to me why the recursive step and/or conjoin made QuickCheck give up (after passing "only" 0 tests,) I would be more than grateful.
When generating Arbitrary recursive structures, QuickCheck is often a bit too eager and generates sprawling, enormous random examples. These are undesirable as they usually don't better check the properties of interest and can be very slow. Two solutions are
Use things like the size parameter (sized function) and frequency function to bias the generator toward small trees.
Use a small-type oriented generator like those in smallcheck. These try to exhaustively generate all "small" examples and thus help to keep the size of the tree down.
To clarify the sized and frequency method of controlling generation size, here's an example RoseTree
data Rose a = It a | Rose [Rose a]
instance Arbitrary a => Arbitrary (Rose a) where
arbitrary = frequency
[ (3, It <$> arbitrary) -- The 3-to-1 ratio is chosen, ah,
-- arbitrarily...
-- you'll want to tune it
, (1, Rose <$> children)
]
where children = sized $ \n -> vectorOf n arbitrary
It can be done even more simply with a different Rose formation by very carefully controlling the size of the child list
data Rose a = Rose a [Rose a]
instance Arbitrary a => Arbitrary (Rose a) where
arbitrary = Rose <$> arbitrary <*> sized (\n -> vectorOf (tuneUp n) arbitrary)
where tuneUp n = round $ fromIntegral n / 4.0
You could do this without referencing sized, but that gives the user of your Arbitrary instance a knob to ask for larger trees if needed.
In case it's useful for those stumbling across this issue: when QuickCheck "gives up", it's a sign that your pre-condition (using ==>) is too hard to satisfy.
QuickCheck uses a simple rejection sampling technique: pre-conditions have no effect on the generation of values. QuickCheck generates a bunch of random values like normal. After these are generated, they're sent through the pre-condition: if the result is True, the property is tested with that value; if it's False, that value is discarded. If your pre-condition rejects most of the values QuickCheck has generated, then QuickCheck will "give up" (better to give up completely, than to make statistically dubious pass/fail claims).
In particular, QuickCheck will not attempt to produce values which satisfy a given pre-condition. It's up to you to make sure that the generator you're using (arbitrary or otherwise) produces lots of values which pass your pre-condition.
Let's see how this is manifesting in your example:
subtypeSanity :: Tree (Type ()) -> Gen Prop
subtypeSanity Node { rootLabel = t, subForest = f } =
let subtypes = concatMap flatten f
in (not $ null subtypes) ==> conjoin
(forAll (elements subtypes) (`subtypeOf` t):(map subtypeSanity f))
There is only one occurance of ==>, so its precondition (not $ null subtypes) must be too hard to satisfy. This is due to the recursive call map subtypeSanity f: not only are you rejecting any Tree which has an empty subForest, you're also (due to the recursion) rejecting any Tree where the subForest contains Trees with empty subForests, and rejecting any Tree where the subForest contains Trees with subForests containing Trees with empty subForests, and so on.
According to your arbitrary instance, Trees are only nested to finite depth: eventually we will always reach an empty subForest, hence your recursive precondition will always fail, and QuickCheck will give up.

Nil Value for Tree a -> a in Haskell

So I have a tree defined as
data Tree a = Leaf | Node a (Tree a) (Tree a) deriving Show
I know I can define Leaf to be Leaf a. But I really just want my nodes to have values. My problem is that when I do a search I have a return value function of type
Tree a -> a
Since leafs have no value I am confused how to say if you encounter a leaf do nothing. I tried nil, " ", ' ', [] nothing seems to work.
Edit Code
data Tree a = Leaf | Node a (Tree a) (Tree a) deriving Show
breadthFirst :: Tree a -> [a]
breadthFirst x = _breadthFirst [x]
_breadthFirst :: [Tree a] -> [a]
_breadthFirst [] = []
_breadthFirst xs = map treeValue xs ++
_breadthFirst (concat (map immediateChildren xs))
immediateChildren :: Tree a -> [Tree a]
immediateChildren (Leaf) = []
immediateChildren (Node n left right) = [left, right]
treeValue :: Tree a -> a
treeValue (Leaf) = //this is where i need nil
treeValue (Node n left right) = n
test = breadthFirst (Node 1 (Node 2 (Node 4 Leaf Leaf) Leaf) (Node 3 Leaf (Node 5 Leaf Leaf)))
main =
do putStrLn $ show $ test
In Haskell, types do not have an "empty" or nil value by default. When you have something of type Integer, for example, you always have an actual number and never anything like nil, null or None.
Most of the time, this behavior is good. You can never run into null pointer exceptions when you don't expect them, because you can never have nulls when you don't expect them. However, sometimes we really need to have a Nothing value of some sort; your tree function is a perfect example: if we don't find a result in the tree, we have to signify that somehow.
The most obvious way to add a "null" value like this is to just wrap it in a type:
data Nullable a = Null | NotNull a
So if you want an Integer which could also be Null, you just use a Nullable Integer. You could easily add this type yourself; there's nothing special about it.
Happily, the Haskell standard library has a type like this already, just with a different name:
data Maybe a = Nothing | Just a
you can use this type in your tree function as follows:
treeValue :: Tree a -> Maybe a
treeValue (Node value _ _) = Just value
treeValue Leaf = Nothing
You can use a value wrapped in a Maybe by pattern-matching. So if you have a list of [Maybe a] and you want to get a [String] out, you could do this:
showMaybe (Just a) = show a
showMaybe Nothing = ""
myList = map showMaybe listOfMaybes
Finally, there are a bunch of useful functions defined in the Data.Maybe module. For example, there is mapMaybe which maps a list and throws out all the Nothing values. This is probably what you would want to use for your _breadthFirst function, for example.
So my solution in this case would be to use Maybe and mapMaybe. To put it simply, you'd change treeValue to
treeValue :: Tree a -> Maybe a
treeValue (Leaf) = Nothing
treeValue (Node n left right) = Just n
Then instead of using map to combine this, use mapMaybe (from Data.Maybe) which will automatically strip away the Just and ignore it if it's Nothing.
mapMaybe treeValue xs
Voila!
Maybe is Haskell's way of saying "Something might not have a value" and is just defined like this:
data Maybe a = Just a | Nothing
It's the moral equivalent of having a Nullable type. Haskell just makes you acknowledge the fact that you'll have to handle the case where it is "null". When you need them, Data.Maybe has tons of useful functions, like mapMaybe available.
In this case, you can simply use a list comprehension instead of map and get rid of treeValue:
_breadthFirst xs = [n | Node n _ _ <- xs] ++ ...
This works because using a pattern on the left hand side of <- in a list comprehension skips items that don't match the pattern.
This function treeValue :: Tree a -> a can't be a total function, because not all Tree a values actually contain an a for you to return! A Leaf is analogous to the empty list [], which is still of type [a] but doesn't actually contain an a.
The head function from the standard library has the type [a] -> a, and it also can't work all of the time:
*Main> head []
*** Exception: Prelude.head: empty list
You could write treeValue to behave similarly:
treeValue :: Tree a -> a
treeValue Leaf = error "empty tree"
treeValue (Node n _ _) = n
But this won't actually help you, because now map treeValue xs will throw an error if any of the xs are Leaf values.
Experienced Haskellers usually try to avoid using head and functions like it for this very reason. Sure, there's no way to get an a from any given [a] or Tree a, but maybe you don't need to. In your case, you're really trying to get a list of a from a list of Tree, and you're happy for Leaf to simply contribute nothing to the list rather than throw an error. But treeValue :: Tree a -> a doesn't help you build that. It's the wrong function to help you solve your problem.
A function that helps you do whatever you need would be Tree a -> Maybe a, as explained very well in some other answers. That allows you to later decide what to do about the "missing" value. When you go to use the Maybe a, if there's really nothing else to do, you can call error then (or use fromJust which does exactly that), or you can decide what else to do. But a treeValue that claims to be able to return an a from any Tree a and then calls error when it can't denies any caller the ability to decide what to do if there's no a.

Resources