How can I check if a BST is valid? - haskell

How can I check if a BST is a valid one, given its definition and using a generalized version of fold for BST?
data(Ord a, Show a, Read a) => BST a = Void | Node {
val :: a,
left, right :: BST a
} deriving (Eq, Ord, Read, Show)
fold :: (Read a, Show a, Ord a) => (a -> b -> b -> b) -> b -> BST a -> b
fold _ z Void = z
fold f z (Node x l r) = f x (fold f z l) (fold f z r)
The idea is to check that a node value is greater then all values in left-subtree and smaller than all values in its right-subtree. This must be True for all nodes in the tree. A function bstList simply output the list of (ordered) values in the BST.
Of course something like this won't work:
--isBST :: (Read a, Show a, Ord a) => BST a -> Bool
isBST t = fold (\x l r -> all (<x) (bstList l) && all (>x) (bstList r)) (True) t
because, for example, applying the fold function to the node 19 ends up all (<19) (bstList True) && all (>19) (bstList True).

Your problem seems to be that you lose information because your function only returns a boolean when it examines the left and right subtrees. So change it to also return the minimum and maximum values of the subtrees. (This is probably more efficient as well, since you don't need to used bslist to check all elements anymore)
And make a wrapper function to ignore these "auxiliary" values after you are done, of course.

(Please don't put typeclass constraints on the data type.)
A BST is valid iff an in-order traversal is monotonically increasing.
flatten tree = fold (\a l r -> l . (a:) . r) id tree []
ordered list#(_:rest) = and $ zipWith (<) list rest
ordered _ = True
isBST = ordered . flatten

A nice way of encoding this is to lean on the traversal provided by Data.Foldable.
{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}
import Data.Foldable
import Data.Monoid
We can derive an instance of it automatically using an extension, but we need to reorder the fields of the Node constructor to provide us an in-order traversal.
While we're at it, we should eliminate the constraints on the data type itself. They actually provide no benefit, and has been removed from the language as of Haskell 2011. (When you want to use such constraints you should put them on instances of classes, not on the data type.)
data BST a
= Void
| Node
{ left :: BST a
, val :: a
, right :: BST a
} deriving (Eq, Ord, Read, Show, Foldable)
First we define what it means for a list to be strictly sorted.
sorted :: Ord a => [a] -> Bool
sorted [] = True
sorted [x] = True
sorted (x:xs) = x < head xs && sorted xs
-- head is safe because of the preceeding match.
Then we can use the toList method provided by Data.Foldable and the above helper.
isBST :: Ord a => BST a -> Bool
isBST = sorted . toList
We can also implement this more directly, like you asked. Since we removed the spurious constraints on the data type, we can simplify the definition of your fold.
cata :: (b -> a -> b -> b) -> b -> BST a -> b
cata _ z Void = z
cata f z (Node l x r) = f (cata f z l) x (cata f z r)
Now we need a data type to model the result of our catamorphism, which is that we either have no nodes (Z), or a range of strictly increasing nodes (T) or have failed (X)
data T a = Z | T a a | X deriving Eq
And we can then implement isBST directly
isBST' :: Ord a => BST a -> Bool
isBST' b = cata phi Z b /= X where
phi X _ _ = X
phi _ _ X = X
phi Z a Z = T a a
phi Z a (T b c) = if a < b then T a c else X
phi (T a b) c Z = if b < c then T a c else X
phi (T a b) c (T d e) = if b < c && c < d then T a e else X
This is a bit tedious, so perhaps it would be better to decompose the way we compose the interim states a bit:
cons :: Ord a => a -> T a -> T a
cons _ X = X
cons a Z = T a a
cons a (T b c) = if a < b then T a c else X
instance Ord a => Monoid (T a) where
mempty = Z
Z `mappend` a = a
a `mappend` Z = a
X `mappend` _ = X
_ `mappend` X = X
T a b `mappend` T c d = if b < c then T a d else X
isBST'' :: Ord a => BST a -> Bool
isBST'' b = cata phi Z b /= X where
phi l a r = l `mappend` cons a r
Personally, I'd probably just use the Foldable instance.

If you don't insist on using a fold you can do it like this:
ord Void = True
ord (Node v l r) = every (< v) l && every (> v) r && ord l && ord r where
every p Void = True
every p (Node v l r) = p v && every p l && every p r

Related

Selectively recurse into left or right subtree of a binary tree using a catamorphism (or any recursion scheme)

I'm trying to implement a binary search tree (or set) using fixed points of functors. I've defined my fixed point as follows:
newtype Fix f = In (f (Fix f))
out :: Fix f -> f (Fix f)
out (In f) = f
-- Catamorphism
type Algebra f a = f a -> a
cata :: (Functor f) => Algebra f a -> Fix f -> a
cata f = f . fmap (cata f) . out
To make the binary tree, I'm using a red-black tree like so:
data NodeColor = Red | Black deriving (Eq, Show)
data RedBlackTreeF a r = EmptyRedBlackTreeF | RedBlackTreeNodeF NodeColor r a r deriving (Eq, Show)
instance Functor (RedBlackTreeF a) where
fmap _ EmptyRedBlackTreeF = EmptyRedBlackTreeF
fmap f (RedBlackTreeNodeF color r1 a r2) =
RedBlackTreeNodeF color (f r1) a (f r2)
type RedBlackTreeF' a = Fix (RedBlackTreeF a)
The traditional benefit of a binary tree is being able to cut down search time by choosing whether to search further in the left or right subtree like so (in psuedocode):
fun member (x, E) = false
| member (x, T (_, a, y, b)) =
if x < y then member (x, a)
else if x > y then member (x, b)
else true
The function member will go left if the element that is being searched for is less than the current element and right if the opposite is true. It therefore improves search time to O(logn).
However, in a recursion scheme, the algebra is recursively applied to the entire data structure. I've written an member algebra here:
memberPAlg :: Ord a => a -> RedBlackTreeF a Bool -> Bool
memberPAlg _ EmptyRedBlackTreeF = False
memberPAlg elem (RedBlackTreeNodeF _ left cur right) =
(elem == cur) || (left || right)
But it seems to be O(nlogn) rather than O(logn). Is there any way to selectively recurse using a recursion scheme to save time complexity? Am I thinking about this the wrong way?
Because of laziness, left and right are evaluated only if you ask for them. So, just compare the current node with the value being searched for to decide which subtree to go into:
memberPAlg :: Ord a => a -> RedBlackTreeF a Bool -> Bool
memberPAlg _ EmptyRedBlackTreeF = False
memberPAlg elem (RedBlackTreeNodeF _ left cur right) =
case compare elem cur of
EQ -> True
LT -> left
GT -> right

Map-like container with intervals as keys and zip-like combining operation

I'm looking for a Haskell container type like Data.Map that uses intervals as keys, where the left-most and right-most keys may also be unbounded intervals, but are otherwise non-overlapping. Additionally, the container should support a function similar to zipWith that allows to merge two containers into a new one, using the intersection of both key sets as the new key set and the argument function for a pointwise combination of both value sets.
There already are several packages that provide interval-based maps. I've had a look at IntervalMap, fingertree and SegmentTree, but none of these packages seem to provide the desired combination function. They all seem to use intervals for the intersection functions, that are equal in both maps, while I need a version that breaks intervals down into smaller ones if necessary.
The container should basically provide an efficient and storable mapping for key/value series of the form Ord k => k -> Maybe a, i.e. functions only defined on specific intervals or having larger intervals mapping to the same value.
Here is a small example to demonstrate the issue:
... -4 -3 -2 -1 0 1 2 3 4 ... -- key set
-----------------------------------
... -1 -1 -1 -1 0 1 1 1 1 ... -- series corresponding to signum
... 5 5 5 5 5 5 5 5 5 ... -- series corresponding to const 5
The first series could be efficiently expressed by a mapping [-infinity, -1] -> -1; [0, 0] -> 0; [1, infinity] -> 1 and the second one by [-infinity, infinity] -> 5. Now applying a combination function with (*) as arument function should give a new series
... -4 -3 -2 -1 0 1 2 3 4 ... -- key set
-----------------------------------
... -5 -5 -5 -5 0 5 5 5 5 ... -- combined series
The crucial point here—and all of the afore-mentioned packages don't seem to be able to do that—is that, when combining the key sets for these two series, you have to take the different values also into account. Both series span the full range of [-infinity, infinity] but it's necessary to break it into three parts for the final series.
There are also packages for working with intervals, e.g. the range package, which also provides an intersection operation on lists of intervals. However, I didn't found a way to use that in combination with one of the Map variants because it collapses adjacents intervals when doing calculations with them.
NB: Such a container is somewhat similar to a ZipList that extends to both sides, which is why I think it should also be possible to define a lawful Applicative instance for it, where <*> corresponds to the above-mentioned combining function.
To cut a long story short, is there already a package that provides such a container? Or is there an easy way to use the existing packages to build one?
The best suggestion from the comments above seems to be the step-function package, as suggested by B. Mehta. I haven't tried that package yet, but it looks like building a wrapper around that SF type is what I was looking for.
Meanwhile, I implemented another solution which I'd like to share. The code for the combining function (combineAscListWith in the code below) is a bit clumsy as it's more general than for just getting the intersection of both maps, so I'll sketch the idea:
First we need an Interval type with an Ord instance which stores pairs of Val a values which can either be -infinity, some value x or +infinity. Form that we can build an IntervalMap which is just a normal Map that maps these intervals to the final values.
When combining two such IntervalMaps by intersection, we first convert the maps into lists of key/value pairs. Next we traverse both lists in parallel to zip both lists into another one which corresponds to the final intersection map. There are two main cases when combining the list elements:
Both left-most intervals start at the same value. In that case we found an interval that actually overlaps/intersects. We clip the longer interval to the shorter one, and use the values associated with the two intervals to get the result value, which now—together with the shorter interval—goes into the result list. The rest of the longer interval goes back to the input lists.
One of the intervals starts at a smaller value than the other, which means we found a part of the two series that do not overlap. So for the intersection, all of the non-overlapping part of the interval (or even the whole interval) can be discared. The rest (if any) goes back to the input list.
For completeness, here's the full example code. Again, the code is rather clumsy; a step-function-based implementation would certainly be more elegant.
import Control.Applicative
import Data.List
import qualified Data.Map as Map
data Val a = NegInf | Val a | Inf deriving (Show, Read, Eq, Ord)
instance Enum a => Enum (Val a) where
succ v = case v of
NegInf -> NegInf
Val x -> Val $ succ x
Inf -> Inf
pred v = case v of
NegInf -> NegInf
Val x -> Val $ pred x
Inf -> Inf
toEnum = Val . toEnum
fromEnum (Val x) = fromEnum x
data Interval a = Interval { lowerBound :: Val a, upperBound :: Val a } deriving (Show, Read, Eq)
instance Ord a => Ord (Interval a) where
compare ia ib = let (a, a') = (lowerBound ia, upperBound ia)
(b, b') = (lowerBound ib, upperBound ib)
in case () of
_ | a' < b -> LT
_ | b' < a -> GT
_ | a == b && a' == b' -> EQ
_ -> error "Ord.Interval.compare: undefined for overlapping intervals"
newtype IntervalMap i a = IntervalMap { unIntervalMap :: Map.Map (Interval i) a }
deriving (Show, Read)
instance Functor (IntervalMap i) where
fmap f = IntervalMap . fmap f . unIntervalMap
instance (Ord i, Enum i) => Applicative (IntervalMap i) where
pure = IntervalMap . Map.singleton (Interval NegInf Inf)
(<*>) = intersectionWith ($)
intersectionWith :: (Ord i, Enum i) => (a -> b -> c)
-> IntervalMap i a -> IntervalMap i b -> IntervalMap i c
intersectionWith f = combineWith (liftA2 f)
combineWith :: (Ord i, Enum i) => (Maybe a -> Maybe b -> Maybe c)
-> IntervalMap i a -> IntervalMap i b -> IntervalMap i c
combineWith f (IntervalMap mpA) (IntervalMap mpB) =
let cs = combineAscListWith f (Map.toAscList mpA) (Map.toAscList mpB)
in IntervalMap $ Map.fromList [ (i, v) | (i, Just v) <- cs ]
combineAscListWith :: (Ord i, Enum i) => (Maybe a -> Maybe b -> c)
-> [(Interval i, a)] -> [(Interval i, b)] -> [(Interval i, c)]
combineAscListWith f as bs = case (as, bs) of
([], _) -> map (\(i, v) -> (i, f Nothing (Just v))) bs
(_, []) -> map (\(i, v) -> (i, f (Just v) Nothing)) as
((Interval a a', va) : as', (Interval b b', vb) : bs')
| a == b -> case () of
_ | a' == b' -> (Interval a a', f (Just va) (Just vb)) : combineAscListWith f as' bs'
_ | a' < b' -> (Interval a a', f (Just va) (Just vb)) : combineAscListWith f as' ((Interval (succ a') b', vb) : bs')
_ | a' > b' -> (Interval a b', f (Just va) (Just vb)) : combineAscListWith f ((Interval (succ b') a', va) : as') bs'
| a < b -> case () of
_ | a' < b -> ((Interval a a', f (Just va) Nothing)) :
(if succ a' == b then id else ((Interval (succ a') (pred b), f Nothing Nothing) :)) (combineAscListWith f as' bs)
_ | True -> (Interval a (pred b), f (Just va) Nothing) : combineAscListWith f ((Interval b a', va) : as') bs
| a > b -> case () of
_ | b' < a -> ((Interval b b', f Nothing (Just vb))) :
(if succ b' == a then id else ((Interval (succ b') (pred a), f Nothing Nothing) :)) (combineAscListWith f as bs')
_ | True -> (Interval b (pred a), f Nothing (Just vb)) : combineAscListWith f as ((Interval a b', vb) : bs')
showIntervalMap :: (Show i, Show a, Eq i) => IntervalMap i a -> String
showIntervalMap = intercalate "; " . map (\(i, v) -> showInterval i ++ " -> " ++ show v)
. Map.toAscList . unIntervalMap
where
showInterval (Interval (Val a) (Val b)) | a == b = "[" ++ show a ++ "]"
showInterval (Interval a b) = "[" ++ showVal a ++ " .. " ++ showVal b ++ "]"
showVal NegInf = "-inf"
showVal (Val x) = show x
showVal Inf = "inf"
main :: IO ()
main = do
let signumMap = IntervalMap $ Map.fromList [(Interval NegInf (Val $ -1), -1),
(Interval (Val 0) (Val 0), 0), (Interval (Val 1) Inf, 1)]
putStrLn $ showIntervalMap $ (*) <$> signumMap <*> pure 5

Find tree height using folde function in Haskell

One of the assignments I am working on leading up to exams had me create
data Exp = T | F | And Exp Exp | Or Exp Exp | Not Exp deriving (Eq, Show, Ord, Read)
Then it asked to make
folde :: a -> a -> (a -> a -> a) -> (a -> a -> a) -> (a -> a) -> Exp -> a
This is what I came up with
folde :: a -> a -> (a -> a -> a) -> (a -> a -> a) -> (a -> a) -> Exp -> a
folde t f a o n T = t
folde t f a o n F = f
folde t f a o n (And x y) = a (folde t f a o n x) (folde t f a o n y)
folde t f a o n (Or x y) = o (folde t f a o n x) (folde t f a o n y)
folde t f a o n (Not x) = n (folde t f a o n x)
The assignment asks for evb, evi and evh.
They are all supposed to work with one single call to folde using correct parameters.
Evb evaluates boolean expressions.
evb :: Exp -> Bool
evb = folde True False (&&) (||) not
Evi evaluates to an integer, treating T as Int 1, F as Int 5, And as +, Or as * and Not as negate.
evi :: Exp -> Int
evi = folde 1 5 (+) (*) negate
So far so good, it all works. I'll be happy for any feedback on this as well.
However, I can't seem to understand how to solve the evh.
evh is supposed to calculate the heigh of the tree.
It should be evh :: Exp -> Int
The assignment says it should treat T and F as height 1.
It goes on that Not x should evaluate to height x + 1. And and Or has the height of its tallest subtree + 1.
I can't seem to figure out what I should pass on to my folde function
The assignment says it should treat T and F as height 1. It goes on that Not x should evaluate to height x + 1. And and Or has the height of its tallest subtree + 1.
You can write this pretty directly with explicit recursion:
height T = 1
height F = 1
height (Not x) = height x + 1
height (And x y) = max (height x) (height y) + 1
height (Or x y) = max (height x) (height y) + 1
Now, how do you write this with folde? The key thing about recursive folding is that folde gives each of your functions the result of folding all the subtrees. When you folde on And l r, it folds both subtrees first, and then passes those results into the argument to folde. So, instead of you manually calling height x, folde is going to calculate that for you and pass it as an argument, so your own work ends up something like \x y -> max x y + 1. Essentially, split height into 5 definitions, one per constructor, and instead of destructuring and recursing down subtrees, take the heights of the subtrees as arguments:
heightT = 1 -- height T = 1
heightF = 1 -- height F = 1
heightN x = x + 1 -- height (Not x) = height x + 1
heightA l r = max l r + 1 -- height (And l r) = max (height l) (height r) + 1
heightO l r = max l r + 1 -- height (Or l r) = max (height l) (height r) + 1
Feed them to folde, and simplify
height = folde 1 1 -- T F
ao -- And
ao -- Or
(+1) -- Not
where ao x y = max x y + 1
And now for something new! Take this definition:
data ExpF a = T | F | Not a | And a a | Or a a
deriving (Functor, Foldable, Traversable)
This looks like your Exp, except instead of recursion it's got a type parameter and a bunch of holes for values of that type. Now, take a look at the types of expressions under ExpF:
T :: forall a. ExpF a
Not F :: forall a. ExpF (ExpF a)
And F (Not T) :: forall a. ExpF (ExpF (ExpF a))
If you set a = ExpF (ExpF (ExpF (ExpF (ExpF ...)))) (on to infinity) in each of the above, you find that they can all be made to have the same type:
T :: ExpF (ExpF (ExpF ...))
Not F :: ExpF (ExpF (ExpF ...))
And F (Not T) :: ExpF (ExpF (ExpF ...))
Infinity is fun! We can encode this infinitely recursive type with Fix
newtype Fix f = Fix { unFix :: f (Fix f) }
-- Compare
-- Type level: Fix f = f (Fix f)
-- Value level: fix f = f (fix f)
-- Fix ExpF = ExpF (ExpF (ExpF ...))
-- fix (1:) = 1:( 1:( 1: ...))
-- Recover original Exp
type Exp = Fix ExpF
-- Sprinkle Fix everywhere to make it work
Fix T :: Exp
Fix $ And (Fix T) (Fix $ Not $ Fix F) :: Exp
-- can also use pattern synonyms
pattern T' = Fix T
pattern F' = Fix F
pattern Not' t = Fix (Not t)
pattern And' l r = Fix (And l r)
pattern Or' l r = Fix (Or l r)
T' :: Exp
And' T' (Not' F') :: Exp
And now here's the nice part: one definition of fold to rule them all:
fold :: Functor f => (f a -> a) -> Fix f -> a
fold alg (Fix ffix) = alg $ fold alg <$> ffix
-- ffix :: f (Fix f)
-- fold alg :: Fix f -> a
-- fold alg <$> ffix :: f a
-- ^ Hey, remember when I said folds fold the subtrees first?
-- Here you can see it very literally
Here's a monomorphic height
height = fold $ \case -- LambdaCase extension: \case ... ~=> \fresh -> case fresh of ...
T -> 1
F -> 1
Not x -> x + 1
And x y -> max x y + 1
Or x y -> max x y + 1
And now a very polymorphic height (in your case it's off by one; oh well).
height = fold $ option 0 (+1) . fmap getMax . foldMap (Option . Just . Max)
height $ Fix T -- 0
height $ Fix $ And (Fix T) (Fix $ Not $ Fix F) -- 2
See the recursion-schemes package to learn these dark arts. It also makes this work for base types like [] with a type family, and removes the need to Fix everything with said trickery + some TH.

Is there any type-algebra function mapping an ADT to the set of elements of that ADT?

For some simple ADTs, we can obtain the ADT of sets of that ADT as data Set = Full Set (*repeated N times) | Empty Set(*repeated N times), where N is the number of non terminal constructors of that ADT. For a more solid example, take Nat:
data Nat = Succ Nat | Zero
We can obtain a type for sets of nats as follows:
data NatSet = Full NatSet | Empty NatSet
So, for example,
empty :: NatSet
empty = Empty empty
insert :: Nat -> NatSet -> NatSet
insert Zero (Empty natverse) = Full natverse
insert Zero (Full natverse) = Full natverse
insert (Succ nat) (Empty natverse) = Empty (insert nat natverse)
insert (Succ nat) (Full natverse) = Full (insert nat natverse)
member :: Nat -> NatSet -> Bool
member Zero (Full natverse) = True
member Zero (Empty natverse) = False
member (Succ nat) (Empty natverse) = member nat natverse
member (Succ nat) (Full natverse) = member nat natverse
main = do
let set = foldr insert empty [Zero, Succ (Succ Zero), Succ (Succ (Succ (Succ (Succ Zero))))]
print $ member Zero set
print $ member (Succ Zero) set
print $ member (Succ (Succ Zero)) set
print $ member (Succ (Succ (Succ Zero))) set
print $ member (Succ (Succ (Succ (Succ Zero)))) set
print $ member (Succ (Succ (Succ (Succ (Succ Zero))))) set
For other types, it is equally simple:
data Bin = A Bin | B Bin | C
data BinSet = Empty BinSet BinSet | Full BinSet BinSet
But what about types that branch? It isn't obvious to me:
data Tree = Branch Tree Tree | Tip
data TreeSet = ???
Is there any type-algebraic argument that maps ADTs to the ADTs of sets of that type?
Let's take another look at your set type.
data NatSet = Full NatSet | Empty NatSet
There's always another NatSet inside this one; the two branches of the sum type indicate whether the current Nat is present in the set. This is a structural representation of sets. As you observed, the structure of the set depends on the structure of the values.
It's equivalent to a stream of Booleans: we test for membership by indexing into the stream.
data NatSet = NatSet Bool NatSet
empty = NatSet False empty
insert Z (NatSet _ bs) = NatSet True bs
insert (S n) (NatSet b bs) = NatSet b (insert n bs)
(NatSet b _) `contains` Z = b
(NatSet _ bs) `contains` (S n) = bs `contains` n
Based on the insight that set membership is like indexing into a collection of Booleans, let's pursue a generic implementation of sets of values of types formed as the fixed point of a polynomial functor (what's the problem?).
First things first. As you observed, the set's structure depends on the type of the things inside it. Here's the class of things which can be elements of a set,
class El a where
data Set a
empty :: Set a
full :: Set a
insert :: a -> Set a -> Set a
remove :: a -> Set a -> Set a
contains :: Set a -> a -> Bool
For a first example, I'll modify NatSet from above to suit this format.
instance El Nat where
data Set Nat = NatSet Bool (Set Nat)
empty = NatSet False empty
full = NatSet True empty
insert Z (NatSet _ bs) = NatSet True bs
insert (S n) (NatSet b bs) = NatSet b (insert n bs)
remove Z (NatSet _ bs) = NatSet False bs
remove (S n) (NatSet b bs) = NatSet b (remove n bs)
(NatSet b _) `contains` Z = b
(NatSet _ bs) `contains` (S n) = bs `contains` n
Another simple El instance which we'll need later is (). A set of ()s is either full or empty, with nothing in between.
instance El () where
newtype Set () = UnitSet Bool
empty = UnitSet False
full = UnitSet True
insert () _ = UnitSet True
remove () _ = UnitSet False
(UnitSet b) `contains` () = b
The fixed point of a functor f,
newtype Fix f = Fix (f (Fix f))
in an instance of El whenever f is an instance of the following El1 class.
class El1 f where
data Set1 f a
empty1 :: El a => Set1 f (Set a)
full1 :: El a => Set1 f (Set a)
insert1 :: El a => f a -> Set1 f (Set a) -> Set1 f (Set a)
remove1 :: El a => f a -> Set1 f (Set a) -> Set1 f (Set a)
contains1 :: El a => Set1 f (Set a) -> f a -> Bool
instance El1 f => El (Fix f) where
newtype Set (Fix f) = FixSet (Set1 f (Set (Fix f)))
empty = FixSet empty1
full = FixSet full1
insert (Fix x) (FixSet s) = FixSet (insert1 x s)
remove (Fix x) (FixSet s) = FixSet (remove1 x s)
(FixSet s) `contains` (Fix x) = s `contains1` x
As usual we'll be composing bits of functors into bigger functors, before taking the resulting functor's fixed point to produce a concrete type.
newtype I a = I a
newtype K c a = K c
data (f :* g) a = f a :* g a
data (f :+ g) a = L (f a) | R (g a)
type N = K ()
For example,
type Nat = Fix (N :+ N)
type Tree = Fix (N :+ (I :* I))
Let's make sets of these things.
I (for identity) is the location in the polynomial where Fix will plug in the recursive substructure. We can just delegate its implementation of El_ to Fix's.
instance El1 I where
newtype Set1 I a = ISet1 a
empty1 = ISet1 empty
full1 = ISet1 full
insert1 (I x) (ISet1 s) = ISet1 (insert x s)
remove1 (I x) (ISet1 s) = ISet1 (remove x s)
(ISet1 s) `contains1` (I x) = s `contains` x
The constant functor K c features no recursive substructure, but it does feature a value of type c. If c can be put in a set, K c can be too.
instance El c => El1 (K c) where
newtype Set1 (K c) a = KSet1 (Set c)
empty1 = KSet1 empty
full1 = KSet_ full
insert1 (K x) (KSet1 s) = KSet1 (insert x s)
remove1 (K x) (KSet1 s) = KSet1 (remove x s)
(KSet1 s) `contains1` (K x) = s `contains` x
Note that this definition makes Set1 N isomorphic to Bool.
For sums, let's use our intuition that testing for membership is like indexing. When you index into a tuple, you choose between the left-hand and right-hand members of the tuple.
instance (El1 f, El1 g) => El1 (f :+ g) where
data Set1 (f :+ g) a = SumSet1 (Set1 f a) (Set1 g a)
empty1 = SumSet1 empty1 empty1
full1 = SumSet1 full1 full1
insert1 (L x) (SumSet1 l r) = SumSet1 (insert1 x l) r
insert1 (R y) (SumSet1 l r) = SumSet1 l (insert1 y r)
remove1 (L x) (SumSet1 l r) = SumSet1 (remove1 x l) r
remove1 (R y) (SumSet1 l r) = SumSet1 l (remove1 y r)
(SumSet1 l r) `contains1` (L x) = l `contains1` x
(SumSet1 l r) `contains1` (R y) = r `contains1` y
This is now enough to do sets of natural numbers. With appropriate instances of Show, you can do this:
ghci> let z = Fix (L (K ()))
ghci> let s n = Fix (R (I n))
ghci> insert (s z) empty `contains` z
False
ghci> insert (s z) empty `contains` (s z)
True
ghci> empty :: Set (Fix (N :+ I))
FixSet (SumSet1 (KSet1 (UnitSet False)) (ISet1 (FixSet ( -- to infinity and beyond
I still haven't answered your original question, which was how should this work for product types? I can think of a few strategies, but none of them actually work.
We could make Set1 (f :* g) a a sum type. This has a pleasing symmetry: sets of sums are products, and sets of products are sums. In the context of the indexing idea, this is like saying "in order to get a Bool out of the set, you have to give an index to handle each of the two possible cases", rather like Either a b's elimination rule (a -> c) -> (b -> c) -> Either a b -> c. However, you get stuck when you try to come up with meaningful values for empty1 and full1:
instance (El1 f, El1 g) => El1 (f :* g) where
data Set1 (f :* g) a = LSet1 (Set1 f a) | RSet1 (Set1 g a)
insert1 (l :* r) (LSet1 x) = LSet1 (insert1 l x)
insert1 (l :* r) (RSet1 y) = RSet1 (insert1 r y)
remove1 (l :* r) (LSet1 x) = LSet1 (remove1 l x)
remove1 (l :* r) (RSet1 y) = RSet1 (remove1 r y)
(LSet1 x) `contains1` (l :* r) = x `contains1` l
(RSet1 y) `contains1` (l :* r) = y `contains1` r
empty1 = _
full1 = _
You could try adding hacky Empty and Full constructors to Set1 (f :* g) but then you'd struggle to implement insert1 and remove1.
You could interpret a set of a product type as a pair of the sets of the two halves of the product. An item is in the set if both of its halves are in their respective sets. Like a sort of generalised intersection.
instance (El1 f, El1 g) => El1 (f :* g) where
data Set1 (f :* g) a = ProdSet1 (Set1 f a) (Set1 g a)
insert1 (l :* r) (ProdSet1 s t) = ProdSet1 (insert1 l s) (insert1 r t)
remove1 (l :* r) (ProdSet1 s t) = ProdSet1 (remove1 l s) (remove1 r t)
(ProdSet1 s t) `contains1` (l :* r) = (s `contains1` l) && (t `contains1` r)
empty1 = ProdSet1 empty1 empty1
full1 = ProdSet1 full1 full1
But this implementation doesn't work correctly. Observe:
ghci> let tip = Fix (L (K ()))
ghci> let fork l r = Fix (R (I l :* I r))
ghci> let s1 = insert (fork tip tip) empty
ghci> let s2 = remove (fork tip (fork tip tip)) s1
ghci> s2 `contains` (fork tip tip)
False
Removing fork tip (fork tip tip) also removed fork tip tip. tip got removed from the left-hand half of the set, which meant any tree whose left-hand branch is tip got removed with it. We removed more items than we intended. (However, if you don't need a remove operation on your sets, this implementation works - though that's just another disappointing asymmetry.) You could implement contains with || instead of && but then you'll end up inserting more items than you intended.
Finally, I also thought about treating a set of products as a set of sets.
data Set1 (f :* g) a = ProdSet1 (Set1 f (Set1 g a))
This doesn't work - try implementing any of El1's methods and you'll get stuck right away.
So at the end of the day your intuition was right. Product types are the problem. There's no meaningful structural representation of sets for product types.
Of course, this is all academic really. Fixed points of polynomial functors have a well-defined notion of equality and ordering, so you can do what everyone else does and use Data.Set.Set to represent sets, even for product types. It won't have such good asymptotics, though: equality and ordering tests on such values are O(n) (where n is the size of the value), making membership operations on such sets O(n log m) (where m is the size of the set) because the set itself is represented as a balanced tree. Contrast with our generic structural sets, where membership operations are O(n).

Indexing into containers: the mathematical underpinnings

When you want to pull an element out of a data structure, you have to give its index. But the meaning of index depends on the data structure itself.
class Indexed f where
type Ix f
(!) :: f a -> Ix f -> Maybe a -- indices can be out of bounds
For example...
Elements in a list have numeric positions.
data Nat = Z | S Nat
instance Indexed [] where
type Ix [] = Nat
[] ! _ = Nothing
(x:_) ! Z = Just x
(_:xs) ! (S n) = xs ! n
Elements in a binary tree are identified by a sequence of directions.
data Tree a = Leaf | Node (Tree a) a (Tree a)
data TreeIx = Stop | GoL TreeIx | GoR TreeIx -- equivalently [Bool]
instance Indexed Tree where
type Ix Tree = TreeIx
Leaf ! _ = Nothing
Node l x r ! Stop = Just x
Node l x r ! GoL i = l ! i
Node l x r ! GoR j = r ! j
Looking for something in a rose tree entails stepping down the levels one at a time by selecting a tree from the forest at each level.
data Rose a = Rose a [Rose a] -- I don't even like rosé
data RoseIx = Top | Down Nat RoseIx -- equivalently [Nat]
instance Indexed Rose where
type Ix Rose = RoseIx
Rose x ts ! Top = Just x
Rose x ts ! Down i j = ts ! i >>= (! j)
It seems that the index of a product type is a sum (telling you which arm of the product to look at), the index of an element is the unit type, and the index of a nested type is a product (telling you where to look in the nested type). Sums seem to be the only one which aren't somehow linked to the derivative. The index of a sum is also a sum - it tells you which part of the sum the user is hoping to find, and if that expectation is violated you're left with a handful of Nothing.
In fact I had some success implementing ! generically for functors defined as the fixed point of a polynomial bifunctor. I won't go into detail, but Fix f can be made an instance of Indexed when f is an instance of Indexed2...
class Indexed2 f where
type IxA f
type IxB f
ixA :: f a b -> IxA f -> Maybe a
ixB :: f a b -> IxB f -> Maybe b
... and it turns out you can define an instance of Indexed2 for each of the bifunctor building blocks.
But what's really going on? What is the underlying relationship between a functor and its index? How does it relate to the functor's derivative? Does one need to understand the theory of containers (which I don't, really) to answer this question?
It seems like the index into the type is an index into the set of constructors, following by an index into the product representing that constructor. This can be implemented quite naturally with e.g. generics-sop.
First you need a datatype to represent possible indices into a single element of the product. This could be an index pointing to an element of type a,
or an index pointing to something of type g b - which requires an index pointing into g and an index pointing to an element of type a in b. This is encoded with the following type:
import Generics.SOP
data ArgIx f x x' where
Here :: ArgIx f x x
There :: (Generic (g x')) => Ix g -> ArgIx f x x' -> ArgIx f x (g x')
newtype Ix f = ...
The index itself is just a sum (implemented by NS for n-ary sum) of sums over the generic representation of the type (choice of constructor, choice of constructor element):
newtype Ix f = MkIx (forall x . NS (NS (ArgIx f x)) (Code (f x)))
You can write smart constructors for various indices:
listIx :: Natural -> Ix []
listIx 0 = MkIx $ S $ Z $ Z Here
listIx k = MkIx $ S $ Z $ S $ Z $ There (listIx (k-1)) Here
treeIx :: [Bool] -> Ix Tree
treeIx [] = MkIx $ S $ Z $ S $ Z Here
treeIx (b:bs) =
case b of
True -> MkIx $ S $ Z $ Z $ There (treeIx bs) Here
False -> MkIx $ S $ Z $ S $ S $ Z $ There (treeIx bs) Here
roseIx :: [Natural] -> Ix Rose
roseIx [] = MkIx $ Z $ Z Here
roseIx (k:ks) = MkIx $ Z $ S $ Z $ There (listIx k) (There (roseIx ks) Here)
Note that e.g. in the list case, you cannot construct an (non-bottom) index pointing to the [] constructor - likewise for Tree and Empty, or constructors containing values whose type is not a or something containing some values of type a. The quantification in MkIx prevents the construction bad things like an index pointing to the first Int in data X x = X Int x where x is instantiated to Int.
The implementation of the index function is fairly straightforward, even if the types are scary:
(!) :: (Generic (f x)) => f x -> Ix f -> Maybe x
(!) arg (MkIx ix) = go (unSOP $ from arg) ix where
atIx :: a -> ArgIx f x a -> Maybe x
atIx a Here = Just a
atIx a (There ix0 ix1) = a ! ix0 >>= flip atIx ix1
go :: (All SListI xss) => NS (NP I) xss -> NS (NS (ArgIx f x)) xss -> Maybe x
go (Z a) (Z b) = hcollapse $ hzipWith (\(I x) -> K . atIx x) a b
go (S x) (S x') = go x x'
go Z{} S{} = Nothing
go S{} Z{} = Nothing
The go function compares the constructor pointed to by the index and the actual constructor used by the type. If the constructors don't match, the indexing returns Nothing. If they do, the actual indexing is done - which is trivial in the case that the index points exactly Here, and in the case of some substructure, both indexing operations must succeed one after the other, which is handled by >>=.
And a simple test:
>map (("hello" !) . listIx) [0..5]
[Just 'h',Just 'e',Just 'l',Just 'l',Just 'o',Nothing]

Resources