Indexing into containers: the mathematical underpinnings - haskell

When you want to pull an element out of a data structure, you have to give its index. But the meaning of index depends on the data structure itself.
class Indexed f where
  type Ix f
  (!) :: f a -> Ix f -> Maybe a  -- indices can be out of bounds
For example...
Elements in a list have numeric positions.
data Nat = Z | S Nat
instance Indexed [] where
  type Ix [] = Nat
  [] ! _ = Nothing
  (x:_) ! Z = Just x
  (_:xs) ! (S n) = xs ! n
Elements in a binary tree are identified by a sequence of directions.
data Tree a = Leaf | Node (Tree a) a (Tree a)
data TreeIx = Stop | GoL TreeIx | GoR TreeIx -- equivalently [Bool]
instance Indexed Tree where
  type Ix Tree = TreeIx
  Leaf ! _ = Nothing
  Node l x r ! Stop = Just x
  Node l x r ! GoL i = l ! i
  Node l x r ! GoR j = r ! j
Looking for something in a rose tree entails stepping down the levels one at a time by selecting a tree from the forest at each level.
data Rose a = Rose a [Rose a] -- I don't even like rosé
data RoseIx = Top | Down Nat RoseIx -- equivalently [Nat]
instance Indexed Rose where
  type Ix Rose = RoseIx
  Rose x ts ! Top = Just x
  Rose x ts ! Down i j = ts ! i >>= (! j)
It seems that the index of a product type is a sum (telling you which arm of the product to look at), the index of an element is the unit type, and the index of a nested type is a product (telling you where to look in the nested type). Sums seem to be the only one which aren't somehow linked to the derivative. The index of a sum is also a sum - it tells you which part of the sum the user is hoping to find, and if that expectation is violated you're left with a handful of Nothing.
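The observations above can be sketched directly against functor building blocks. This is only an illustration: the choice of Identity, Product, Sum and Compose from base is an assumption of this sketch, not something taken from the generic implementation mentioned below, and the class is repeated so the block stands alone.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Functor.Compose (Compose (..))
import Data.Functor.Identity (Identity (..))
import Data.Functor.Product (Product (..))
import Data.Functor.Sum (Sum (..))
import Data.Kind (Type)

class Indexed f where
  type Ix f :: Type
  (!) :: f a -> Ix f -> Maybe a

-- A single element: the index is the unit type.
instance Indexed Identity where
  type Ix Identity = ()
  Identity x ! () = Just x

-- A product: the index is a sum saying which component to look in.
instance (Indexed f, Indexed g) => Indexed (Product f g) where
  type Ix (Product f g) = Either (Ix f) (Ix g)
  Pair l _ ! Left i = l ! i
  Pair _ r ! Right j = r ! j

-- A sum: the index guesses which summand is present and may be wrong.
instance (Indexed f, Indexed g) => Indexed (Sum f g) where
  type Ix (Sum f g) = Either (Ix f) (Ix g)
  InL l ! Left i = l ! i
  InR r ! Right j = r ! j
  _ ! _ = Nothing

-- Nesting: the index is a product of an outer and an inner index.
instance (Indexed f, Indexed g) => Indexed (Compose f g) where
  type Ix (Compose f g) = (Ix f, Ix g)
  Compose fgx ! (i, j) = fgx ! i >>= (! j)
```

Here the Sum instance is the only one that can fail on a well-formed index, which matches the "handful of Nothing" remark.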
In fact I had some success implementing ! generically for functors defined as the fixed point of a polynomial bifunctor. I won't go into detail, but Fix f can be made an instance of Indexed when f is an instance of Indexed2...
class Indexed2 f where
  type IxA f
  type IxB f
  ixA :: f a b -> IxA f -> Maybe a
  ixB :: f a b -> IxB f -> Maybe b
... and it turns out you can define an instance of Indexed2 for each of the bifunctor building blocks.
But what's really going on? What is the underlying relationship between a functor and its index? How does it relate to the functor's derivative? Does one need to understand the theory of containers (which I don't, really) to answer this question?

It seems like the index into the type is an index into the set of constructors, followed by an index into the product representing that constructor. This can be implemented quite naturally with e.g. generics-sop.
First you need a datatype to represent possible indices into a single element of the product. This could be an index pointing to an element of type a,
or an index pointing to something of type g b - which requires an index pointing into g and an index pointing to an element of type a in b. This is encoded with the following type:
import Generics.SOP
data ArgIx f x x' where
  Here :: ArgIx f x x
  There :: (Generic (g x')) => Ix g -> ArgIx f x x' -> ArgIx f x (g x')
newtype Ix f = ...
The index itself is just a sum (implemented by NS for n-ary sum) of sums over the generic representation of the type (choice of constructor, choice of constructor element):
newtype Ix f = MkIx (forall x . NS (NS (ArgIx f x)) (Code (f x)))
You can write smart constructors for various indices:
listIx :: Natural -> Ix []
listIx 0 = MkIx $ S $ Z $ Z Here
listIx k = MkIx $ S $ Z $ S $ Z $ There (listIx (k-1)) Here
treeIx :: [Bool] -> Ix Tree
treeIx [] = MkIx $ S $ Z $ S $ Z Here
treeIx (b:bs) =
  case b of
    True -> MkIx $ S $ Z $ Z $ There (treeIx bs) Here
    False -> MkIx $ S $ Z $ S $ S $ Z $ There (treeIx bs) Here
roseIx :: [Natural] -> Ix Rose
roseIx [] = MkIx $ Z $ Z Here
roseIx (k:ks) = MkIx $ Z $ S $ Z $ There (listIx k) (There (roseIx ks) Here)
Note that e.g. in the list case, you cannot construct a (non-bottom) index pointing to the [] constructor - likewise for Tree and Empty, or constructors containing values whose type is not a or something containing some values of type a. The quantification in MkIx prevents the construction of bad things like an index pointing to the first Int in data X x = X Int x where x is instantiated to Int.
The implementation of the index function is fairly straightforward, even if the types are scary:
(!) :: (Generic (f x)) => f x -> Ix f -> Maybe x
(!) arg (MkIx ix) = go (unSOP $ from arg) ix where
  atIx :: a -> ArgIx f x a -> Maybe x
  atIx a Here = Just a
  atIx a (There ix0 ix1) = a ! ix0 >>= flip atIx ix1
  go :: (All SListI xss) => NS (NP I) xss -> NS (NS (ArgIx f x)) xss -> Maybe x
  go (Z a) (Z b) = hcollapse $ hzipWith (\(I x) -> K . atIx x) a b
  go (S x) (S x') = go x x'
  go Z{} S{} = Nothing
  go S{} Z{} = Nothing
The go function compares the constructor pointed to by the index and the actual constructor used by the type. If the constructors don't match, the indexing returns Nothing. If they do, the actual indexing is done - which is trivial in the case that the index points exactly Here, and in the case of some substructure, both indexing operations must succeed one after the other, which is handled by >>=.
And a simple test:
>map (("hello" !) . listIx) [0..5]
[Just 'h',Just 'e',Just 'l',Just 'l',Just 'o',Nothing]

Related

How does repmin place values in the tree in Haskell?

I really like the repmin problem:
Write down repmin :: Tree Int -> Tree Int, which replaces all the numbers in the tree by their minimum in a single pass.
If I were writing something like this in python, I would go for passing values by their reference (let's say one-element lists instead of numbers is good enough):
def repmin(tree, wrapped_min_link=None):
    x, subforest = tree
    if wrapped_min_link is None:
        wrapped_min_link = [x]
    else:
        [m] = wrapped_min_link
        wrapped_min_link = [min(m, x)]
    n = len(subforest)
    subforest_min = [None] * n
    for i in range(n):
        if subforest[i]:
            subforest_min[i] = repmin(subforest[i], wrapped_min_link)
    return (wrapped_min_link, subforest_min)
It seems to me like a fitting way to wrap one's head around the knot-tying solution in Haskell (I wrote this one for rose trees from Data.Tree):
copyRose :: Tree Int -> Int -> (Tree Int, Int)
copyRose (Node x []) m = (Node m [], x)
copyRose (Node x fo) m =
  let
    unzipIdMinimum =
      foldr (\ ~(a, b) ~(as, bmin) -> (a:as, b `min` bmin)) ([], maxBound :: Int)
    (fo', y) = unzipIdMinimum . map (flip copyRose m) $ fo
  in (Node m fo', x `min` y)
repmin :: Tree Int -> Tree Int
repmin = (loop . uncurry) copyRose
Yet, I reckon the solutions to work very differently. Here is my understanding of the latter one:
Let us rewrite loop for (->) a bit:
loop f b = let cd = f (b, snd cd) in fst cd
I reckon it to be loop for (->)'s workalike as snd gives the same degree of laziness as pattern-matching within let.
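To see the knot-tying in isolation, here is a minimal sketch on plain lists rather than trees. The names step and repminList are made up for this illustration; the only library function used is loop from Control.Arrow. The second argument of step is the final minimum, which is not yet known while the list is being traversed, so it is only ever stored and never inspected.

```haskell
import Control.Arrow (loop)

-- One traversal: build the output list out of the (still unevaluated)
-- final minimum m, while computing the real minimum on the way back.
step :: [Int] -> Int -> ([Int], Int)
step [] _ = ([], maxBound)
step (x : xs) m =
  let (ys, m') = step xs m -- lazy pattern: m is never forced here
   in (m : ys, min x m')

-- loop feeds the second component of the result back in as the
-- second argument, exactly as in the let-based definition above.
repminList :: [Int] -> [Int]
repminList = loop (uncurry step)
```

For example, repminList [3, 1, 2] evaluates to [1, 1, 1]. It only terminates because step never pattern-matches on m.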
So, when repmin traverses through the tree, it is:
- Building up the minimum of the tree, to be returned as the second element of the pair.
- Leaving snd $ copyRose (tree, m) behind in every node.
Thus, when the traversal comes to an end, the programme knows the value of snd $ copyRose (tree, m) (that is, the minimum in the tree) and is able to show it whenever some node of the tree is being computed.
Do I understand repmin in Haskell correctly?
This is more an extended comment than an answer, but I don't really think of your implementation as single-pass. It looks like it traverses the tree once, producing a new, lazily-generated, tree and the global minimum, but it actually produces a lazily generated tree and an enormous tree of thunks that will eventually calculate the minimum. To avoid this, you can get closer to the Python code by generating the tree eagerly, keeping track of the minimum as you go.
You'll note that I've generalized the type from Int to an arbitrary Ord type. You'll also note that I've used two different type variables to refer to the type of elements in the given tree and the type of the minimum passed in to generate a new tree, which lets the type system tell me if I mix them up.
{-# LANGUAGE BangPatterns #-}
import Control.Arrow (loop)
import Data.Tree (Tree (..))
-- loop forces both type variables of copyRose to agree, hence Ord a here
repmin :: Ord a => Tree a -> Tree a
repmin = (loop . uncurry) copyRose
copyRose :: Ord a => Tree a -> b -> (Tree b, a)
copyRose (Node x ts) final_min
  | (ts', m) <- copyForest x ts final_min
  = (Node final_min ts', m)
-- The bang keeps the running minimum evaluated as the forest is walked
copyForest :: Ord a => a -> [Tree a] -> b -> ([Tree b], a)
copyForest !m [] _final_min = ([], m)
copyForest !m (t : ts) final_min
  | (t', m') <- copyTree m t final_min
  , (ts', m'') <- copyForest m' ts final_min
  = (t' : ts', m'')
copyTree :: Ord a => a -> Tree a -> b -> (Tree b, a)
copyTree !m (Node x ts) final_min
  | (ts', m') <- copyForest (min m x) ts final_min
  = (Node final_min ts', m')
Exercise: rewrite this in monadic style using ReaderT to pass the global minimum and State to keep track of the minimum so far.

Haskell Function for checking if element is in Tree, returning Depth

I am currently doing an assignment for a class in which I have to implement a function which checks if an element is in a tree.
It is supposed to return Nothing when the element is not in the tree and Just (depth at which it was found) when it is.
An example:
sample1
      1
   3     2
  7 5   6 4
- contains 6 sample1 returns Just 2
- contains 1 sample1 returns Just 0
- contains 2 sample1 returns Just 1
- contains 8 sample1 returns Nothing
Here is what we are given:
Heap functional data structure:
module Fdata.Heap where
-- A signature for min-heaps
data Heap e t = Heap {
  empty :: t e,
  insert :: e -> t e -> t e,
  findMin :: t e -> Maybe e,
  deleteMin :: t e -> Maybe (t e),
  merge :: t e -> t e -> t e,
  contains :: e -> t e -> Maybe Int
}
An implementation of self-adjusting heaps:
import Fdata.Heap
import Fdata.Tree
-- An implementation of self-adjusting heaps
heap :: (Eq e, Ord e) => Heap e Tree
heap = Heap {
  empty = Empty,
  insert = \x t -> merge' (Node x Empty Empty) t,
  findMin = \t -> case t of
    Empty -> Nothing
    (Node x _ _) -> Just x,
  deleteMin = \t -> case t of
    Empty -> Nothing
    (Node _ l r) -> Just (merge' r l),
  merge = \l r -> case (l, r) of
    (Empty, t) -> t
    (t, Empty) -> t
    (t1@(Node x1 l1 r1), t2@(Node x2 l2 r2)) ->
      if x1 <= x2
        then Node x1 (merge' t2 r1) l1
        else Node x2 (merge' t1 r2) l2,
  contains = \x t -> case (x,t) of
    (x,Empty)-> Nothing
    (x,tx@(Node x1 l1 r1) ->
      |x==x1 = Just 0
      |x>x1 = (1+ (contains x l)
      |x<x1 = (1+ (contains x r)
}
  where
    merge' = merge heap
The tree implementation
module Fdata.Tree where
import Fdata.Heap
data Tree x
  = Empty
  | Node x (Tree x) (Tree x)
  deriving (Eq, Show)
leaf x = Node x Empty Empty
-- Convert a list to a heap
list2heap :: Heap x t -> [x] -> t x
list2heap i = foldl f z
  where
    f = flip $ insert i
    z = empty i
-- Convert a heap to a list
heap2list :: Heap x t -> t x -> [x]
heap2list i t
  = case (findMin i t, deleteMin i t) of
      (Nothing, Nothing) -> []
      (Just x, Just t') -> x : heap2list i t'
I am supposed to implement the contains function in the implementation for self-adjusting heaps.
I am not allowed to use any helper functions and I am supposed to use the maybe function.
My current implementation:
contains = \x t -> case (x,t) of
    (x,Empty) -> Nothing
    (x,tx@(Node x1 l1 r1))
        |x==x1 -> Just 0
        |x>x1 -> (1+ (contains x l1)
        |x<x1 -> (1+ (contains x r1)
This does not work, since I get a parse error on input |.
I really don't know how to fix this, since I did use 4 spaces instead of tabs and according to this: https://wiki.haskell.org/Case
the syntax is correct...
I once managed to fix this, but I got a type error about (1+ (contains x l), so this probably is not correct.
Any hint would be appreciated.
EDIT:
Thanks to everyone who answered!
Really appreciate that everyone took the time to explain their answers in great detail.
First of all:
there were some smaller mistakes, as pointed out by some of you in the comments:
I missed one closing parenthesis and accidentally named one argument l1 and another r1 and afterwards used r and l.
Fixed both mistakes.
Someone wrote that I do not need to use a lambda function. The problem is when I use something like:
contains _ Empty = Nothing
I get the error:
parse Error on input '_'.
However, lambda functions do not give me any errors about the input arguments.
Currently the only function that works without any errors is:
contains = \e t -> case (e,t) of
    (_,Empty) -> Nothing
    (e , Node x t1 t2) ->
        if e == (head (heap2list heap (Node x t1 t2)))
            then Just 0
            else if (fmap (+1) (contains heap e t1))== Nothing
                then (fmap (+1) (contains heap e t2))
                else (fmap (+1) (contains heap e t1))
Found at:
Counting/Getting "Level" of a hierarchical data
Found by:Krom
One way of structuring contains :: Eq a => a -> Tree a -> Maybe Integer is to first label each element in your tree with its depth, using something like this, then fold the tree to find the element you're looking for, pulling its depth out with it. You can do this without very much code!
Jumping right in where this answer left off, here's contains.
contains :: Eq a => a -> Tree a -> Maybe Integer
contains x = fmap fst . find ((== x) . snd) . labelDepths
That's the whole function! This is classic functional programming style: rather than hand-crank a bespoke recursive tree traversal function I've structured the code as a pipeline of reusable operations. In Haskell pipelines are constructed using the composition operator (.) and are read from right to left. The result of labelDepths is passed to find ((== x) . snd), whose result is then passed to fmap fst.
labelDepths :: Tree a -> Tree (Integer, a), which I've explained in detail in the answer I linked above, attaches an Integer depth to each element of the input tree.
find :: Foldable t => (a -> Bool) -> t a -> Maybe a is a standard function which extracts the first element of a container (like a tree, or a list) that satisfies a predicate. In this instance, the Foldable structure in question is a Tree, so t ~ Tree and find :: (a -> Bool) -> Tree a -> Maybe a. The predicate I've given to find is ((== x) . snd), which returns True if the second element of its input tuple equals x: find ((== x) . snd) :: Tree (Integer, a) -> Maybe (Integer, a). find works by folding the input structure - testing its elements one at a time until it finds one that matches the predicate. The order in which elements are processed is defined by the container's Foldable instance, of which more below.
fmap :: Functor f => (a -> b) -> f a -> f b is another standard function. It applies a mapping function uniformly to each element of a container, transforming its elements from type a to type b. This time the container in question is the return value of find, which is a Maybe, so fmap :: (a -> b) -> Maybe a -> Maybe b. The mapping function I've supplied is fst, which extracts the first element of a tuple: fmap fst :: Maybe (Integer, a) -> Maybe Integer.
So putting it all together, you can see that this is a fairly direct implementation of my English description of the process above. First we label every element in the tree with its depth, then we find an element which matches the item we're looking for, then we extract the depth with which the element was previously labelled.
I mentioned above that Tree is a Foldable container. In fact, this isn't the case quite yet - there's no instance of Foldable for Tree. The easiest way to get a Foldable instance for Tree is to turn on the DeriveFoldable GHC extension and utter the magic words deriving Foldable.
{-# LANGUAGE DeriveFoldable #-}
data Tree x = Empty | Node x (Tree x) (Tree x) deriving Foldable
This automatically-implemented instance of Foldable will perform a preorder traversal, processing the tree in a top-down fashion. (x is considered to be "to the left of" l and r in the expression Node x l r.) You can adjust the derived traversal order by adjusting the layout of the Node constructor.
That said, I'm guessing that this is an assignment and you're not allowed to modify the definition of Tree or apply any language extensions. So you'll need to hand-write your own instance of Foldable, following the template at the bottom of this post. Here's an implementation of foldr which performs a preorder traversal.
instance Foldable Tree where
  foldr f z Empty = z
  foldr f z (Node x l r) = f x (foldr f (foldr f z r) l)
The Node case is the interesting one. We fold the tree from right to left (since this is a foldr) and from bottom to top. First we fold the right subtree, placing z at the rightmost leaf. Then we use the aggregated result of the right subtree as the seed for folding the left subtree. Finally we use the result of folding all of the Node's children as the aggregator to apply to f x.
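Putting the pieces of this answer together gives something like the following sketch, which needs no language extensions. The labelDepths here is a hypothetical stand-in for the one in the linked answer (which is not reproduced in this post), and sample1 is one reading of the tree drawn in the question.

```haskell
import Data.Foldable (find)

data Tree x = Empty | Node x (Tree x) (Tree x)

-- Hand-written preorder fold, as described above.
instance Foldable Tree where
  foldr _ z Empty = z
  foldr f z (Node x l r) = f x (foldr f (foldr f z r) l)

-- Hypothetical labelDepths: pair every element with its depth,
-- counting the root as depth 0.
labelDepths :: Tree a -> Tree (Integer, a)
labelDepths = go 0
  where
    go _ Empty = Empty
    go d (Node x l r) = Node (d, x) (go (d + 1) l) (go (d + 1) r)

contains :: Eq a => a -> Tree a -> Maybe Integer
contains x = fmap fst . find ((== x) . snd) . labelDepths

-- The example tree from the question (left/right order is a guess).
sample1 :: Tree Int
sample1 =
  Node 1
    (Node 3 (Node 7 Empty Empty) (Node 5 Empty Empty))
    (Node 2 (Node 6 Empty Empty) (Node 4 Empty Empty))
```

With these definitions, contains 6 sample1 gives Just 2 and contains 8 sample1 gives Nothing. Note that this version visits the whole tree in the worst case and does not exploit the heap ordering.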
Hopefully you didn't find this answer too advanced! (Happy to answer any questions you have.) While the other answers do a good job of showcasing how to write recursive tree traversal functions, I really wanted to give you a glimpse of the real power of functional programming. When you think at a higher level - breaking down a problem into its component parts, structuring operations as pipelines, and learning to spot common patterns like zipping, folding and mapping - you can be very productive and solve problems with very little code.
An instance of Foldable for a binary tree
To instantiate Foldable you need to provide a definition for at least foldMap or foldr.
data Tree a = Leaf
            | Node (Tree a) a (Tree a)
instance Foldable Tree where
  foldMap f Leaf = mempty
  foldMap f (Node l x r) = foldMap f l `mappend` f x `mappend` foldMap f r
  foldr f acc Leaf = acc
  foldr f acc (Node l x r) = foldr f (f x (foldr f acc r)) l
This implementation performs an in-order traversal of the tree.
ghci> let myTree = Node (Node Leaf 'a' Leaf) 'b' (Node Leaf 'c' Leaf)
-- +--'b'--+
-- | |
-- +-'a'-+ +-'c'-+
-- | | | |
-- * * * *
ghci> toList myTree
"abc"
The DeriveFoldable extension allows GHC to generate Foldable instances based on the structure of the type. We can vary the order of the machine-written traversal by adjusting the layout of the Node constructor.
data Inorder a = ILeaf
               | INode (Inorder a) a (Inorder a) -- as before
               deriving Foldable
data Preorder a = PrLeaf
                | PrNode a (Preorder a) (Preorder a)
                deriving Foldable
data Postorder a = PoLeaf
                 | PoNode (Postorder a) (Postorder a) a
                 deriving Foldable
-- injections from the earlier Tree type
inorder :: Tree a -> Inorder a
inorder Leaf = ILeaf
inorder (Node l x r) = INode (inorder l) x (inorder r)
preorder :: Tree a -> Preorder a
preorder Leaf = PrLeaf
preorder (Node l x r) = PrNode x (preorder l) (preorder r)
postorder :: Tree a -> Postorder a
postorder Leaf = PoLeaf
postorder (Node l x r) = PoNode (postorder l) (postorder r) x
ghci> toList (inorder myTree)
"abc"
ghci> toList (preorder myTree)
"bac"
ghci> toList (postorder myTree)
"acb"
This function doesn't need to be a lambda:
contains x t =
Adding x to the case serves no purpose, since you only match it back to x. You can instead use pattern matching in the function head:
contains _ Empty = Nothing
The Node case has three sub-cases, where the value being searched for is less-than, greater-than, or equal to the value in the Node. If you order them that way, you get a symmetry from the less-than and greater-than tests, and can handle the equal case with an otherwise.
When recursing, you are going to get a Maybe Int, to which you want to add one. You can't do that directly because the Int is inside the Maybe. Normally, you would lift the addition, but I suspect that this is where the required call to maybe should go (however unnatural it may seem):
contains x (Node x' l r)
  | x < x' = maybe Nothing (Just . (+1)) $ contains x l
  | x > x' = maybe Nothing (Just . (+1)) $ contains x r
  | otherwise = Just 0
Instead of using maybe, the (+1) could have been lifted into the Maybe with fmap (or <$>):
... = fmap (+1) $ contains ...
Using maybe is unnatural because it has to explicitly pass the Nothing, and also re-wrap the Just.
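As a small side-by-side sketch of the two spellings (bumpWithMaybe and bumpWithFmap are names invented here for illustration), both functions below behave identically on every input:

```haskell
-- Explicitly handle both cases: keep Nothing, re-wrap the Just.
bumpWithMaybe :: Maybe Int -> Maybe Int
bumpWithMaybe = maybe Nothing (Just . (+ 1))

-- Let the Functor instance for Maybe do the same work.
bumpWithFmap :: Maybe Int -> Maybe Int
bumpWithFmap = fmap (+ 1)
```

Both map Just 1 to Just 2 and leave Nothing as Nothing.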
This does not work, since I get a parse error on input |
Your previous line misses a closing parenthesis.
I got a type error about (1+ (contains x l)), so this probably is not correct.
The idea is totally correct, the issue is that contains x l returns a Maybe Int instead of an Int so you cannot directly add to that. You can only add to the result when it's a Just. There's a helper function that does exactly that, do something to Justs and keep Nothings: fmap (from Functor).
contains = \x t -> case (x,t) of
    (x,Empty) -> Nothing
    (x,tx@(Node x1 l1 r1))
        |x==x1 -> Just 0
        |x>x1 -> fmap (1+) (contains x l1)
        |x<x1 -> fmap (1+) (contains x r1)
Btw, I'd write this as
contains x Empty = Nothing
contains x (Node v l r) = if x == v
  then Just 0
  else fmap (+1) $ contains x $ if x > v then l else r

Mapping while showing intermediate states

I need a function that does this:
>>> func (+1) [1,2,3]
[[2,2,3],[2,3,3],[2,3,4]]
My real case is more complex, but this example shows the gist of the problem. The main difference is that in reality using indexes would be infeasible. The List should be a Traversable or Foldable.
EDIT: This should be the signature of the function:
func :: Traversable t => (a -> a) -> t a -> [t a]
And closer to what I really want is the same signature to traverse but can't figure out the function I have to use, to get the desired result.
func :: (Traversable t, Applicative f) => (a -> f a) -> t a -> f (t a)
It looks like @Benjamin Hodgson misread your question and thought you wanted f applied to a single element in each partial result. Because of this, you've ended up thinking his approach doesn't apply to your problem, but I think it does. Consider the following variation:
import Control.Monad.State
indexed :: (Traversable t) => t a -> (t (Int, a), Int)
indexed t = runState (traverse addIndex t) 0
  where addIndex x = state (\k -> ((k, x), k+1))
scanMap :: (Traversable t) => (a -> a) -> t a -> [t a]
scanMap f t =
  let (ti, n) = indexed (fmap (\x -> (x, f x)) t)
      partial i = fmap (\(k, (x, y)) -> if k < i then y else x) ti
  in map partial [1..n]
Here, indexed operates in the state monad to add an incrementing index to elements of a traversable object (and gets the length "for free", whatever that means):
> indexed ['a','b','c']
([(0,'a'),(1,'b'),(2,'c')],3)
and, again, as Ben pointed out, it could also be written using mapAccumL:
indexed = swap . mapAccumL (\k x -> (k+1, (k, x))) 0
Then, scanMap takes the traversable object, fmaps it to a similar structure of before/after pairs, uses indexed to index it, and applies a sequence of partial functions, where partial i selects "afters" for the first i elements and "befores" for the rest.
> scanMap (*2) [1,2,3]
[[2,2,3],[2,4,3],[2,4,6]]
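For anyone who wants to try this without a dependency on the state monad, here is the same idea as a self-contained sketch that uses only base, with indexed written via mapAccumL as mentioned above. The restructuring into a where clause is mine; the logic is unchanged.

```haskell
import Data.Traversable (mapAccumL)
import Data.Tuple (swap)

-- Number the elements left to right and return the count as well.
indexed :: Traversable t => t a -> (t (Int, a), Int)
indexed = swap . mapAccumL (\k x -> (k + 1, (k, x))) 0

-- The i-th result has f applied to the first i elements only.
scanMap :: Traversable t => (a -> a) -> t a -> [t a]
scanMap f t = map partial [1 .. n]
  where
    (ti, n) = indexed (fmap (\x -> (x, f x)) t)
    partial i = fmap (\(k, (x, y)) -> if k < i then y else x) ti
```

With this, scanMap (+1) [1,2,3] gives [[2,2,3],[2,3,3],[2,3,4]], the output asked for in the question.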
As for generalizing this from lists to something else, I can't figure out exactly what you're trying to do with your second signature:
func :: (Traversable t, Applicative f) => (a -> f a) -> t a -> f (t a)
because if you specialize this to a list you get:
func' :: (Traversable t) => (a -> [a]) -> t a -> [t a]
and it's not at all clear what you'd want this to do here.
On lists, I'd use the following. Feel free to discard the first element, if not wanted.
> let mymap f [] = [[]] ; mymap f ys@(x:xs) = ys : map (f x:) (mymap f xs)
> mymap (+1) [1,2,3]
[[1,2,3],[2,2,3],[2,3,3],[2,3,4]]
This can also work on Foldable, of course, after one uses toList to convert the foldable to a list. One might still want a better implementation that would avoid that step, though, especially if we want to preserve the original foldable type, and not just obtain a list.
I just called it func, per your question, because I couldn't think of a better name.
import Control.Monad.State
func f t = [evalState (traverse update t) n | n <- [0..length t - 1]]
  where update x = do
          n <- get
          let y = if n == 0 then f x else x
          put (n-1)
          return y
The idea is that update counts down from n, and when it reaches 0 we apply f. We keep n in the state monad so that traverse can plumb n through as you walk across the traversable.
ghci> func (+1) [1,1,1]
[[2,1,1],[1,2,1],[1,1,2]]
You could probably save a few keystrokes using mapAccumL, a HOF which captures the pattern of traversing in the state monad.
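A possible mapAccumL version, as a sketch: the accumulator plays the role of the state, counting down from n, and f is applied at the element where the counter reads 0. The type signature is my guess at what the original func would infer.

```haskell
import Data.Traversable (mapAccumL)

func :: Traversable t => (a -> a) -> t a -> [t a]
func f t = [snd (mapAccumL update n t) | n <- [0 .. length t - 1]]
  where
    -- k is the countdown; only the element seen at k == 0 is changed.
    update k x = (k - 1, if k == 0 then f x else x)
```

As before, func (+1) [1,1,1] gives [[2,1,1],[1,2,1],[1,1,2]].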
This sounds a little like a zipper without a focus; maybe something like this:
data Zippy a b = Zippy { accum :: [b] -> [b], rest :: [a] }
mapZippy :: (a -> b) -> [a] -> [Zippy a b]
mapZippy f = go id where
  go a [] = []
  go a (x:xs) = Zippy b xs : go b xs where
    b = a . (f x :)
instance (Show a, Show b) => Show (Zippy a b) where
  show (Zippy xs ys) = show (xs [], ys)
mapZippy succ [1,2,3]
-- [([2],[2,3]),([2,3],[3]),([2,3,4],[])]
(using difference lists here for efficiency's sake)
To convert to a fold looks a little like a paramorphism:
para :: (a -> [a] -> b -> b) -> b -> [a] -> b
para f b [] = b
para f b (x:xs) = f x xs (para f b xs)
mapZippy :: (a -> b) -> [a] -> [Zippy a b]
mapZippy f xs = para g (const []) xs id where
  g e zs r d = Zippy nd zs : r nd where
    nd = d . (f e:)
For arbitrary traversals, there's a cool time-travelling state transformer called Tardis that lets you pass state forwards and backwards:
mapZippy :: Traversable t => (a -> b) -> t a -> t (Zippy a b)
mapZippy f = flip evalTardis ([],id) . traverse g where
  g x = do
    modifyBackwards (x:)
    modifyForwards (. (f x:))
    Zippy <$> getPast <*> getFuture

Recursion scheme in Haskell for repeatedly breaking datatypes into "head" and "tail" and yielding a structure of results

In Haskell, I recently found the following function useful:
listCase :: (a -> [a] -> b) -> [a] -> [b]
listCase f [] = []
listCase f (x:xs) = f x xs : listCase f xs
I used it to generate sliding windows of size 3 from a list, like this:
*Main> listCase (\_ -> take 3) [1..5]
[[2,3,4],[3,4,5],[4,5],[5],[]]
Is there a more general recursion scheme which captures this pattern? More specifically, one that allows you to generate some structure of results by repeatedly breaking data into a "head" and "tail"?
What you are asking for is a comonad. This may sound scarier than monad, but is a simpler concept (YMMV).
Comonads are Functors with additional structure:
class Functor w => Comonad w where
  extract :: w a -> a
  duplicate :: w a -> w (w a)
  extend :: (w a -> b) -> w a -> w b
(extend and duplicate can be defined in terms of each other)
and laws similar to the monad laws:
extract . duplicate = id
fmap extract . duplicate = id
duplicate . duplicate = fmap duplicate . duplicate
Specifically, the signature (a -> [a] -> b) takes non-empty Lists of type a. The usual type [a] is not an instance of a comonad, but the non-empty lists are:
data NE a = T a | a :. NE a deriving Functor
instance Comonad NE where
  extract (T x) = x
  extract (x :. _) = x
  duplicate z@(T _) = T z
  duplicate z@(_ :. xs) = z :. duplicate xs
The comonad laws allow only this instance for non-empty lists (actually a second one).
Your function then becomes
extend (take 3 . drop 1 . toList)
Where toList :: NE a -> [a] is obvious.
This is worse than the original, but extend can be written as =>> which is simpler if applied repeatedly.
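For experimenting without installing the comonad package, the pieces above can be assembled into one file. This is a sketch under a few assumptions of mine: the class is declared locally with default methods, the fixity of (:.) is chosen so that values can be written without parentheses, and windows is a name invented for the example.

```haskell
{-# LANGUAGE DeriveFunctor #-}

class Functor w => Comonad w where
  extract :: w a -> a
  duplicate :: w a -> w (w a)
  duplicate = extend id
  extend :: (w a -> b) -> w a -> w b
  extend f = fmap f . duplicate
  {-# MINIMAL extract, (duplicate | extend) #-}

infixr 5 :.

-- Non-empty lists: T is the last element.
data NE a = T a | a :. NE a
  deriving (Show, Functor)

instance Comonad NE where
  extract (T x) = x
  extract (x :. _) = x
  duplicate z@(T _) = T z
  duplicate z@(_ :. xs) = z :. duplicate xs

toList :: NE a -> [a]
toList (T x) = [x]
toList (x :. xs) = x : toList xs

-- The sliding-window example from the question, via extend.
windows :: NE a -> [[a]]
windows = toList . extend (take 3 . drop 1 . toList)
```

Here windows (1 :. 2 :. 3 :. 4 :. T 5) gives [[2,3,4],[3,4,5],[4,5],[5],[]], matching the listCase output in the question.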
For further information, you may start at What is the Comonad typeclass in Haskell?.
This looks like a special case of a (jargon here but it can help with googling) paramorphism, a generalisation of primitive recursion to all initial algebras.
Reimplementing ListCase
Let's have a look at how to reimplement your function using such a combinator. First we define the notion of paramorphism: a recursion principle where not only the result of the recursive call is available but also the entire substructure this call was performed on:
The type of paraList tells me that in the (:) case, I will have access to the head, the tail and the value of the recursive call on the tail and that I need to provide a value for the base case.
module ListCase where
paraList :: (a -> [a] -> b -> b) -- cons
         -> b                    -- nil
         -> [a] -> b             -- resulting function on lists
paraList c n [] = n
paraList c n (x : xs) = c x xs $ paraList c n xs
We can now give an alternative definition of listCase:
listCase' :: (a -> [a] -> b) -> [a] -> [b]
listCase' c = paraList (\ x xs tl -> c x xs : tl) []
Considering the general case
In the general case, we are interested in building a definition of paramorphism for all data structures defined as the fixpoint of a (strictly positive) functor. We use the traditional fixpoint operator:
newtype Fix f = Fix { unFix :: f (Fix f) }
This builds an inductive structure layer by layer. The layers have an f shape which may be better grasped by recalling the definition of List using this formalism. A layer is either Nothing (we're done!) or Just (head, tail):
newtype ListF a as = ListF { unListF :: Maybe (a, as) }
type List a = Fix (ListF a)
nil :: List a
nil = Fix $ ListF $ Nothing
cons :: a -> List a -> List a
cons = curry $ Fix . ListF . Just
Now that we have this general framework, we can define para generically for all Fix f where f is a functor:
para :: Functor f => (f (Fix f, b) -> b) -> Fix f -> b
para alg = alg . fmap (\ rec -> (rec, para alg rec)) . unFix
Of course, ListF a is a functor. Meaning we could use para to reimplement paraList and listCase.
instance Functor (ListF a) where fmap f = ListF . fmap (fmap f) . unListF
paraList' :: (a -> List a -> b -> b) -> b -> List a -> b
paraList' c n = para $ maybe n (\ (a, (as, b)) -> c a as b) . unListF
listCase'' :: (a -> List a -> b) -> List a -> List b
listCase'' c = paraList' (\ x xs tl -> cons (c x xs) tl) nil
You can implement a simple bijection toList, fromList to test it if you want. I could not be bothered to reimplement take so it's pretty ugly:
toList :: [a] -> List a
toList = foldr cons nil
fromList :: List a -> [a]
fromList = paraList' (\ x _ tl -> x : tl) []
*ListCase> fmap fromList . fromList . listCase'' (\ _ as -> toList $ take 3 $ fromList as). toList $ [1..5]
[[2,3,4],[3,4,5],[4,5],[5],[]]

How can I check if a BST is valid?

How can I check if a BST is a valid one, given its definition and using a generalized version of fold for BST?
data (Ord a, Show a, Read a) => BST a = Void | Node {
  val :: a,
  left, right :: BST a
} deriving (Eq, Ord, Read, Show)
fold :: (Read a, Show a, Ord a) => (a -> b -> b -> b) -> b -> BST a -> b
fold _ z Void = z
fold f z (Node x l r) = f x (fold f z l) (fold f z r)
The idea is to check that a node's value is greater than all values in its left subtree and smaller than all values in its right subtree. This must be True for all nodes in the tree. A function bstList simply outputs the list of (ordered) values in the BST.
Of course something like this won't work:
--isBST :: (Read a, Show a, Ord a) => BST a -> Bool
isBST t = fold (\x l r -> all (<x) (bstList l) && all (>x) (bstList r)) (True) t
because, for example, applying the fold function to the node 19 ends up evaluating all (<19) (bstList True) && all (>19) (bstList True).
Your problem seems to be that you lose information because your function only returns a boolean when it examines the left and right subtrees. So change it to also return the minimum and maximum values of the subtrees. (This is probably more efficient as well, since you no longer need to use bstList to check all elements.)
And make a wrapper function to ignore these "auxiliary" values after you are done, of course.
(Please don't put typeclass constraints on the data type.)
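One way that suggestion might look, as a sketch: the fold returns Nothing for an invalid subtree and otherwise the subtree's bounds, if it has any. The Summary type, the summarize helper, and the positional Node constructor without class constraints are assumptions made for this example, not the asker's definitions.

```haskell
data BST a = Void | Node a (BST a) (BST a)

fold :: (a -> b -> b -> b) -> b -> BST a -> b
fold _ z Void = z
fold f z (Node x l r) = f x (fold f z l) (fold f z r)

-- Nothing              : the subtree is not a valid BST
-- Just Nothing         : valid and empty
-- Just (Just (lo, hi)) : valid, with its minimum and maximum
type Summary a = Maybe (Maybe (a, a))

summarize :: Ord a => BST a -> Summary a
summarize = fold step (Just Nothing)
  where
    step x ml mr = do
      l <- ml -- fail immediately if either subtree is invalid
      r <- mr
      let okL = maybe True (\(_, hi) -> hi < x) l
          okR = maybe True (\(lo, _) -> x < lo) r
      if okL && okR
        then Just (Just (maybe x fst l, maybe x snd r))
        else Nothing

-- The wrapper that throws the auxiliary bounds away.
isBST :: Ord a => BST a -> Bool
isBST t = case summarize t of
  Nothing -> False
  Just _ -> True
```

Each node is inspected once, and only the two bounds of each subtree are compared against the node's value.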
A BST is valid iff an in-order traversal is monotonically increasing.
flatten tree = fold (\a l r -> l . (a:) . r) id tree []
ordered list@(_:rest) = and $ zipWith (<) list rest
ordered _ = True
isBST = ordered . flatten
A nice way of encoding this is to lean on the traversal provided by Data.Foldable.
{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}
import Data.Foldable
import Data.Monoid
We can derive an instance of it automatically using an extension, but we need to reorder the fields of the Node constructor to provide us an in-order traversal.
While we're at it, we should eliminate the constraints on the data type itself. They actually provide no benefit, and GHC has deprecated them: since GHC 7.2 they are only accepted with the DatatypeContexts extension. (When you want to use such constraints you should put them on instances of classes, not on the data type.)
data BST a
  = Void
  | Node
    { left :: BST a
    , val :: a
    , right :: BST a
    } deriving (Eq, Ord, Read, Show, Foldable)
First we define what it means for a list to be strictly sorted.
sorted :: Ord a => [a] -> Bool
sorted [] = True
sorted [x] = True
sorted (x:xs) = x < head xs && sorted xs
-- head is safe because of the preceding match.
Then we can use the toList method provided by Data.Foldable and the above helper.
isBST :: Ord a => BST a -> Bool
isBST = sorted . toList
We can also implement this more directly, like you asked. Since we removed the spurious constraints on the data type, we can simplify the definition of your fold.
cata :: (b -> a -> b -> b) -> b -> BST a -> b
cata _ z Void = z
cata f z (Node l x r) = f (cata f z l) x (cata f z r)
Now we need a data type to model the result of our catamorphism, which is that we either have no nodes (Z), or a range of strictly increasing nodes (T) or have failed (X)
data T a = Z | T a a | X deriving Eq
And we can then implement isBST directly
isBST' :: Ord a => BST a -> Bool
isBST' b = cata phi Z b /= X where
  phi X _ _ = X
  phi _ _ X = X
  phi Z a Z = T a a
  phi Z a (T b c) = if a < b then T a c else X
  phi (T a b) c Z = if b < c then T a c else X
  phi (T a b) c (T d e) = if b < c && c < d then T a e else X
This is a bit tedious, so perhaps it would be better to decompose the way we compose the interim states a bit:
cons :: Ord a => a -> T a -> T a
cons _ X = X
cons a Z = T a a
cons a (T b c) = if a < b then T a c else X
-- Since GHC 8.4, Monoid requires a Semigroup instance; mappend defaults to (<>).
instance Ord a => Semigroup (T a) where
  Z <> a = a
  a <> Z = a
  X <> _ = X
  _ <> X = X
  T a b <> T c d = if b < c then T a d else X
instance Ord a => Monoid (T a) where
  mempty = Z
isBST'' :: Ord a => BST a -> Bool
isBST'' b = cata phi Z b /= X where
  phi l a r = l `mappend` cons a r
Personally, I'd probably just use the Foldable instance.
If you don't insist on using a fold you can do it like this:
ord Void = True
ord (Node v l r) = every (< v) l && every (> v) r && ord l && ord r where
  every p Void = True
  every p (Node v l r) = p v && every p l && every p r

Resources