I'm trying to implement a kind of zipper for length-indexed lists which would return each item of the list paired with a list where that element is removed. E.g. for ordinary lists:
zipper :: [a] -> [(a, [a])]
zipper = go [] where
go _ [] = []
go prev (x:xs) = (x, prev ++ xs) : go (prev ++ [x]) xs
So that
> zipper [1..5]
[(1,[2,3,4,5]), (2,[1,3,4,5]), (3,[1,2,4,5]), (4,[1,2,3,5]), (5,[1,2,3,4])]
My current attempt at implementing the same thing for length-indexed lists:
{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE TypeFamilies #-}
data Nat = Zero | Succ Nat
type One = Succ Zero
type family (+) (a :: Nat) (b :: Nat) :: Nat
type instance (+) Zero n = n
type instance (+) (Succ n) m = Succ (n + m)
data List :: Nat -> * -> * where
Nil :: List Zero a
Cons :: a -> List size a -> List (Succ size) a
single :: a -> List One a
single a = Cons a Nil
cat :: List a i -> List b i -> List (a + b) i
cat Nil ys = ys
cat (Cons x xs) ys = Cons x (xs `cat` ys)
zipper :: List (Succ n) a -> List (Succ n) (a, List n a)
zipper = go Nil where
go :: (p + Zero) ~ p
=> List p a -> List (Succ q) a -> List (Succ q) (a, List (p + q) a)
go prev (Cons x Nil) = single (x, prev)
go prev (Cons x xs) = (x, prev `cat` xs) `Cons` go (prev `cat` single x) xs
This feels like it should be rather straightforward, but as there doesn't seem to be any way to convey to GHC that e.g. + is commutative and associative or that zero is the identity, I'm running into lots of problems where the type checker (understandably) complains that it cannot determine that a + b ~ b + a or that a + Zero ~ a.
Do I need to add some sort of proof objects (data Refl a b where Refl :: Refl a a et al.) or is there some way to make this work with just adding more explicit type signatures?
Alignment
Dependently typed programming is like doing two jigsaws which some rogue has glued together. Less metaphorically, we express simultaneous computations at the value level and at the type level, and we must ensure their compatibility. Of course, we are each our own rogue, so if we can arrange for the jigsaws to be glued in alignment, we shall have an easier time of it. When you see proof obligations for type repair, you might be tempted to ask
Do I need to add some sort of proof objects (data Refl a b where Refl :: Refl a a et al.) or is there some way to make this work with just adding more explicit type signatures?
But you might first consider in what way the value- and type-level computations are out of alignment, and whether there is any hope to bring them closer.
A Solution
The question here is how to compute the vector (length-indexed list) of selections from a vector. So we'd like something with type
List (Succ n) a -> List (Succ n) (a, List n a)
where the element in each input position gets decorated with the one-shorter vector of its siblings. The proposed method is to scan left-to-right, accumulating the elder siblings in a list which grows on the right, then concatenate with the younger siblings at each position. Growing lists on the right is always a worry, especially when the Succ for the length is aligned to the Cons on the left. The need for concatenation necessitates type-level addition, but the arithmetic resulting from right-ended activity is out of alignment with the computation rules for addition. I'll get back to this style in a bit, but let's try thinking it out again.
Before we get into any accumulator-based solution, let's just try bog standard structural recursion. We have the "one" case and the "more" case.
picks (Cons x xs#Nil) = Cons (x, xs) Nil
picks (Cons x xs#(Cons _ _)) = Cons (x, xs) (undefined (picks xs))
In both cases, we put the first decomposition at the front. In the second case, we have checked that the tail is nonempty, so we can ask for its selections. We have
x :: a
xs :: List (Succ n) a
picks xs :: List (Succ n) (a, List n a)
and we want
Cons (x, xs) (undefined (picks xs)) :: List (Succ (Succ n)) (a, List (Succ n) a)
undefined (picks xs) :: List (Succ n) (a, List (Succ n) a)
so the undefined needs to be a function which grows all the sibling lists by reattaching x at the left end (and left-endedness is good). So, I define the Functor instance for List n
instance Functor (List n) where
fmap f Nil = Nil
fmap f (Cons x xs) = Cons (f x) (fmap f xs)
and I curse the Prelude and
import Control.Arrow((***))
so that I can write
picks (Cons x xs#Nil) = Cons (x, xs) Nil
picks (Cons x xs#(Cons _ _)) = Cons (x, xs) (fmap (id *** Cons x) (picks xs))
which does the job with not a hint of addition, let alone a proof about it.
Variations
I got annoyed about doing the same thing in both lines, so I tried to wriggle out of it:
picks :: m ~ Succ n => List m a -> List m (a, List n a) -- DOESN'T TYPECHECK
picks Nil = Nil
picks (Cons x xs) = Cons (x, xs) (fmap (id *** (Cons x)) (picks xs))
But GHC solves the constraint aggressively and refuses to allow Nil as a pattern. And it's correct to do so: we really shouldn't be computing in a situation where we know statically that Zero ~ Succ n, as we can easily construct some segfaulting thing. The trouble is just that I put my constraint in a place with too global a scope.
Instead, I can declare a wrapper for the result type.
data Pick :: Nat -> * -> * where
Pick :: {unpick :: (a, List n a)} -> Pick (Succ n) a
The Succ n return index means the nonemptiness constraint is local to a Pick. A helper function does the left-end extension,
pCons :: a -> Pick n a -> Pick (Succ n) a
pCons b (Pick (a, as)) = Pick (a, Cons b as)
leaving us with
picks' :: List m a -> List m (Pick m a)
picks' Nil = Nil
picks' (Cons x xs) = Cons (Pick (x, xs)) (fmap (pCons x) (picks' xs))
and if we want
picks = fmap unpick . picks'
That's perhaps overkill, but it might be worth it if we want to separate older and younger siblings, splitting lists in three, like this:
data Pick3 :: Nat -> * -> * where
Pick3 :: List m a -> a -> List n a -> Pick3 (Succ (m + n)) a
pCons3 :: a -> Pick3 n a -> Pick3 (Succ n) a
pCons3 b (Pick3 bs x as) = Pick3 (Cons b bs) x as
picks3 :: List m a -> List m (Pick3 m a)
picks3 Nil = Nil
picks3 (Cons x xs) = Cons (Pick3 Nil x xs) (fmap (pCons3 x) (picks3 xs))
Again, all the action is left-ended, so we're fitting nicely with the computational behaviour of +.
Accumulating
If we want to keep the style of the original attempt, accumulating the elder siblings as we go, we could do worse than to keep them zipper-style, storing the closest element in the most accessible place. That is, we can store the elder siblings in reverse order, so that at each step we need only Cons, rather than concatenating. When we want to build the full sibling list in each place, we need to use reverse-concatenation (really, plugging a sublist into a list zipper). You can type revCat easily for vectors if you deploy the abacus-style addition:
type family (+/) (a :: Nat) (b :: Nat) :: Nat
type instance (+/) Zero n = n
type instance (+/) (Succ m) n = m +/ Succ n
That's the addition which is in alignment with the value-level computation in revCat, defined thus:
revCat :: List m a -> List n a -> List (m +/ n) a
revCat Nil ys = ys
revCat (Cons x xs) ys = revCat xs (Cons x ys)
We acquire a zipperized go version
picksr :: List (Succ n) a -> List (Succ n) (a, List n a)
picksr = go Nil where
go :: List p a -> List (Succ q) a -> List (Succ q) (a, List (p +/ q) a)
go p (Cons x xs#Nil) = Cons (x, revCat p xs) Nil
go p (Cons x xs#(Cons _ _)) = Cons (x, revCat p xs) (go (Cons x p) xs)
and nobody proved anything.
Conclusion
Leopold Kronecker should have said
God made the natural numbers to perplex us: all the rest is the work of man.
One Succ looks very like another, so it is very easy to write down expressions which give the size of things in a way which is out of alignment with their structure. Of course, we can and should (and are about to) equip GHC's constraint solver with improved kit for type-level numerical reasoning. But before that kicks in, it's worth just conspiring to align the Conses with the Succs.
Related
As part of my journey in understanding singletons I have tried to bridge the gap between compile time safety, and lifting runtime values into that dependent type safety.
I think though that a minimal example of "runtime" values is a function that takes an unbounded list, and converts it to a size-indexed vector. The following skeleton provides length-indexed vectors, but I can't quite determine how to write fromList.
I have considered making the function take a size parameter, but I suspect it's possible to keep that implicit.
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeInType #-}
{-# LANGUAGE UndecidableInstances #-}
import Data.Singletons
import Data.Singletons.TH
$(singletons
[d|
data Nat = Z | S Nat deriving (Show)
|])
data Vec a n where
Nil :: Vec a Z
Cons :: a -> Vec a n -> Vec a (S n)
instance Show a => Show (Vec a n) where
show Nil = "Nil"
show (Cons x xs) = show x ++ " :< " ++ show xs
fromListExplicit :: forall (n :: Nat) a. SNat n -> [a] -> Vec a n
fromListExplicit SZ _ = Nil
fromListExplicit (SS n) (x : xs) = Cons x (fromListExplicit n xs)
ex1 = fromListExplicit (SS (SS (SS SZ))) [1..99]
-- 1 :< 2 :< 3 :< Nil
fromListImplicit :: (?????) => [a] -> Vec a n
fromListImplicit = ?????
main :: IO ()
main = do
xs <- readLn :: IO [Int]
print $ fromListImplicit xs
This is not possible using Haskell because Haskell does not yet have full dependent types (although GHC might in the future). Notice that
fromList :: [a] -> Vec a n
Has both a and n quantified universally, which means that the user should be able to pick their n and get back a Vec of the right size. That makes no sense! The trick is that n is not really for the user to choose - it has to be the length of the input list. (For the same reason, fromList :: Integer -> [a] -> Vec a n would not be any more useful - the size hint has to be something type-level.)
Looking to a dependently typed language like Idris, you can define
fromList : (l : List elem) -> Vec (length l) elem
And in fact they define this in the standard library.
So, what can you do? Short of saying that Vec has the length equal to the size of the input list (which requires lifting "length of the input list" to the type level), you can say it has some length.
data SomeVec a where { SomeVec :: Vec a n -> SomeVec a }
list2SomeVec :: [a] -> SomeVec a
list2SomeVec [] = SomeVec Nil
list2SomeVec (x:xs) = case list2SomeVec xs of
SomeVec ys -> SomeVec (x `Cons` ys)
That isn't spectacularly useful, but it is better than nothing.
You can do this with an existentially quantified type variable, as in #Alec’s answer, or equivalently by rewriting in continuation-passing style. The trick is to give fromList a continuation (function) that’s polymorphic in the size of the Vec; then, within the continuation, you have access to a type variable representing the size:
data Vec n a where
Nil :: Vec Z a
Cons :: a -> Vec n a -> Vec (S n) a
deriving instance (Show a) => Show (Vec n a)
fromList :: [a] -> (forall n. Vec n a -> r) -> r
fromList [] k = k Nil
fromList (x : xs) k = fromList xs $ \ xs' -> k (Cons x xs')
-- fromList [1, 2, 3] show == "Cons 1 (Cons 2 (Cons 3 Nil))"
You can’t know the actual value of n, because it’s not available at compile time.
If you replace your Nat with the one from GHC.TypeLits, I think you can get a KnownNat constraint for n by creating a SomeNat from the runtime length using fromJust (someNatVal (fromIntegral (length xs))), then get back the actual length value at runtime with natVal. I’m not really familiar with how to do it, and it might require the ghc-typelits-natnormalise plugin, but it might be a starting point.
In Haskell, difference lists, in the sense of
[a] representation of a list with an efficient concatenation operation
seem to be implemented in terms of function composition.
Functions and (dynamic) function compositions, though, must also be represented somehow in the computer's memory using data structures, which raises the question of how dlists could be implemented in Haskell without using function compositions, but, rather, through some basic purely-functional node-based data structures. How could this be done with the same performance guarantees as those of function composition?
(++)'s notoriously bad asymptotics come about when you use it in a left-associative manner - that is, when (++)'s left argument is the result of another call to (++). Right-associative expressions run efficiently, though.
To put it more concretely: evaluating a left-nested append of m lists like
((ws ++ xs) ++ ys) ++ zs -- m = 3 in this example
to WHNF requires you to force m thunks, because (++) is strict in its left argument.
case (
case (
case ws of { [] -> xs ; (w:ws) -> w:(ws ++ xs) }
) of { [] -> ys ; (x:xs) -> x:(xs ++ ys) }
) of { [] -> zs ; (y:ys) -> y:(ys ++ zs) }
In general, to fully evaluate n elements of such a list, this'll require forcing that stack of m thunks n times, for O(m*n) complexity. When the entire list is built from appends of singleton lists (ie (([w] ++ [x]) ++ [y]) ++ [z]), m = n so the cost is O(n2).
Evaluating a right-nested append like
ws ++ (xs ++ (ys ++ zs))
to WHNF is much easier (O(1)):
case ws of
[] -> xs ++ (ys ++ zs)
(w:ws) -> w:(ws ++ (xs ++ (ys ++ zs)))
Evaluating n elements just requires evaluating n thunks, which is about as good as you can expect to do.
Difference lists work by paying a small (O(m)) up-front cost to automatically re-associate calls to (++).
newtype DList a = DList ([a] -> [a])
fromList xs = DList (xs ++)
toList (DList f) = f []
instance Monoid (DList a) where
mempty = fromList []
DList f `mappend` DList g = DList (f . g)
Now a left-nested expression,
toList (((fromList ws <> fromList xs) <> fromList ys) <> fromList zs)
gets evaluated in a right-associative manner:
((((ws ++) . (xs ++)) . (ys ++)) . (zs ++)) []
-- expand innermost (.)
(((\l0 -> ws ++ (xs ++ l0)) . (ys ++)) . (zs ++)) []
-- expand innermost (.)
((\l1 -> (\l0 -> ws ++ (xs ++ l0)) (ys ++ l1)) . (zs ++)) []
-- beta reduce
((\l1 -> ws ++ (xs ++ (ys ++ l1))) . (zs ++)) []
-- expand innermost (.)
(\l2 -> (\l1 -> ws ++ (xs ++ (ys ++ l1))) (zs ++ l2)) []
-- beta reduce
(\l2 -> ws ++ (xs ++ (ys ++ (zs ++ l2)))) []
-- beta reduce
ws ++ (xs ++ (ys ++ (zs ++ [])))
You perform O(m) steps to evaluate the composed function, then O(n) steps to evaluate the resulting expression, for a total complexity of O(m+n), which is asymptotically better than O(m*n). When the list is made up entirely of appends of singleton lists, m = n and you get O(2n) ~ O(n), which is asymptotically better than O(n2).
This trick works for any Monoid.
newtype RMonoid m = RMonoid (m -> m) -- "right-associative monoid"
toRM m = RMonoid (m <>)
fromRM (RMonoid f) = f mempty
instance Monoid m => Monoid (RMonoid m):
mempty = toRM mempty
RMonoid f `mappend` RMonoid g = RMonoid (f . g)
See also, for example, the Codensity monad, which applies this idea to monadic expressions built using (>>=) (rather than monoidal expressions built using (<>)).
I hope I have convinced you that (++) only causes a problem when used in a left-associative fashion. Now, you can easily write a list-like data structure for which append is strict in its right argument, and left-associative appends are therefore not a problem.
data Snoc a = Nil | Snoc (Snoc a) a
xs +++ Nil = xs
xs +++ (Snoc ys y) = Snoc (xs +++ ys) y
We recover O(1) WHNF of left-nested appends,
((ws +++ xs) +++ ys) +++ zs
case zs of
Nil -> (ws +++ xs) +++ ys
Snoc zs z -> Snoc ((ws +++ xs) +++ ys) +++ zs) z
but at the expense of slow right-nested appends.
ws +++ (xs +++ (ys +++ zs))
case (
case (
case zs of { Nil -> ys ; (Snoc zs z) -> Snoc (ys +++ zs) z }
) of { Nil -> xs ; (Snoc ys y) -> Snoc (xs +++ ys) y }
) of { Nil -> ws ; (Snoc xs x) -> Snoc (ws +++ xs) y }
Then, of course, you end up writing a new type of difference list which reassociates appends to the left!
newtype LMonoid m = LMonoid (m -> m) -- "left-associative monoid"
toLM m = LMonoid (<> m)
fromLM (LMonoid f) = f mempty
instance Monoid m => Monoid (LMonoid m):
mempty = toLM mempty
LMonoid f `mappend` LMonoid g = LMonoid (g . f)
Carl hit it in his comment. We can write
data TList a = Nil | Single a | Node !(TList a) (TList a)
singleton :: a -> TList a
singleton = Single
instance Monoid (TList a) where
mempty = Nil
mappend = Node
We could get toList by just deriving Foldable, but let's write it out instead to see just what's going on.
instance Foldable TList where
foldMap _ Nil = mempty
foldMap f (Single a) = f a
foldMap f (Node t u) = foldMap f t <> foldMap f u
toList as0 = go as0 [] where
go Nil k = k
go (Single a) k = a : k
go (Node l r) k = go l (go r k)
toList is O(n), where n is the total number of internal nodes (i.e., the total number of mappend operations used to form the TList). This should be pretty clear: each Node is inspected exactly once. mempty, mappend, and singleton are each obviously O(1).
This is exactly the same as for a DList:
newtype DList a = DList ([a] -> [a])
singletonD :: a -> DList a
singletonD a = DList (a:)
instance Monoid (DList a) where
mempty = DList id
mappend (DList f) (DList g) = DList (f . g)
instance Foldable DList where
foldr c n xs = foldr c n (toList xs)
toList (DList f) = f []
Why, operationally, is this the same? Because, as you indicate in your question, the functions are represented in memory as trees. And they're represented as trees that look a lot like TLists! singletonD x produces a closure containing a (boring) (:) and an exciting x. When applied, it does O(1) work. mempty just produces the id function, which when applied does O(1) work. mappend as bs produces a closure that, when applied, does O(1) work of its own, plus O(length as + length bs) work in its children.
The shapes of the trees produced for TList and DList are actually the same. You should be able to convince yourself that they also have identical asymptotic performance when used incrementally: in each case, the program has to walk down the left spine of the tree to get to the first list element.
Both DList and TList are equally okay when built up and then used only once. They're equally lousy when built once and converted to lists multiple times.
As Will Ness showed with a similar type, the explicit tree representation is better if you want to add support for deconstructing the representation, as you can actually get your hands on the structure. TList can support a reasonably efficient uncons operation (that improves the structure as it works). To get efficient unsnoc as well, you'll need to use a fancier representation (a catenable deque). This implementation also has potentially wretched cache performance. You can switch to a cache-oblivious data structure, but it is practically guaranteed to be complicated.
As shown in this answer the trick is the rearrangement of (.) tree into the ($) list on access.
We can emulate this with
data Dlist a = List [a]
| Append (Dlist a) (Dlist a)
which will juggle the Append nodes, rotating the tree to the right, to push the leftmost node up so it becomes the uppermost-left, on the first access, after which the next tail(**) operation becomes O(1) (*) :
let x = (List [1..10] `Append` List [11..20]) `Append` List [21..30]
and tail x(**) is to produce
List [2..10] `Append` (List [11..20] `Append` List [21..30])
tail(**) should be trivial to implement. Of course if will only pattern match the leftmost List node's list (with (x:xs)) when that node is finally discovered, and won't touch contents of anything else inside the Append nodes as they are juggled. Laziness is thus naturally preserved.
(**) 2020 edit: what this actually means is to have one uncons :: Dlist a -> (a, Dlist a) operation producing the head and the new, rotated tail, simultaneously, so that uncons on the new tail is O(1).(*)
(*) edit: O(1) in case the leftmost List node's list isn't empty. Overall, taking into account the possible nested-on-the-left Append nodes that will have to be rearranged as they come to the fore after the first leftmost List node is exhausted, the access to all n elements of the result will be O(n+m), where m is the number of empty lists.
update: A historical note: this is actually quite similar (if not exactly the same) as the efficient tree-fringe enumeration problem, dealt with in 1977 by John McCarthy, whose gopher function did exactly the same kind of nodes re-arrangement (tree rotations to the right) as the proposed here tail(**) would.
Suppose a list L, with length n, is interleaved in list J, with length n + 1.
We'd like to know, for each element of J, which of its neighbors from L is the greater.
The following function takes L as its input, and produces a list K, also of length
n + 1, such that the ith element of K is the desired neighbor of the ith element of J.
aux [] prev acc = prev:acc
aux (hd:tl) prev acc = aux tl hd ((max hd prev):acc)
expand row = reverse (aux row 0 [])
I can prove to myself, informally, that the length of the result of this function (which I
originally wrote in Ocaml) is one greater than the length of the input. But I
hopped over to Haskell (a new language for me) because I got interested in being
able to prove via the type system that this invariant holds. With the help
of this previous answer, I was
able to get as far as the following:
{-# LANGUAGE GADTs, TypeOperators, TypeFamilies #-}
data Z
data S n
type family (:+:) a b :: *
type instance (:+:) Z n = n
type instance (:+:) (S m) n = S (m :+: n)
-- A List of length 'n' holding values of type 'a'
data List a n where
Nil :: List a Z
Cons :: a -> List a m -> List a (S m)
aux :: List a n -> a -> List a m -> List a (n :+: (S m))
aux Nil prev acc = Cons prev acc
aux (Cons hd tl) prev acc = aux tl hd (Cons (max hd prev) acc)
However, the last line produces the following error:
* Could not deduce: (m1 :+: S (S m)) ~ S (m1 :+: S m)
from the context: n ~ S m1
bound by a pattern with constructor:
Cons :: forall a m. a -> List a m -> List a (S m),
in an equation for `aux'
at pyramid.hs:23:6-15
Expected type: List a (n :+: S m)
Actual type: List a (m1 :+: S (S m))
* In the expression: aux tl hd (Cons (max hd prev) acc)
In an equation for `aux':
aux (Cons hd tl) prev acc = aux tl hd (Cons (max hd prev) acc)
* Relevant bindings include
acc :: List a m (bound at pyramid.hs:23:23)
tl :: List a m1 (bound at pyramid.hs:23:14)
aux :: List a n -> a -> List a m -> List a (n :+: S m)
(bound at pyramid.hs:22:1)
It seems that what I need to do is teach the compiler that (x :+: (S y)) ~ S (x :+: y). Is this possible?
Alternatively, are there better tools for this problem than the type system?
First, some imports and language extensions:
{-# LANGUAGE GADTs, TypeInType, RankNTypes, TypeOperators, TypeFamilies, TypeApplications, AllowAmbiguousTypes #-}
import Data.Type.Equality
We now have DataKinds (or TypeInType) which allows us to promote any data to the type level (with its own kind), so the type level naturals really deserve to be defined as a regular data (heck, this is exactly the motivating examples the previous link to the GHC docs give!). Nothing changes with your List type, but (:+:) really should be a closed type family (now over things of kind Nat).
-- A natural number type (that can be promoted to the type level)
data Nat = Z | S Nat
-- A List of length 'n' holding values of type 'a'
data List a n where
Nil :: List a Z
Cons :: a -> List a m -> List a (S m)
type family (+) (a :: Nat) (b :: Nat) :: Nat where
Z + n = n
S m + n = S (m + n)
Now, in order to make the proofs work for aux, it is useful to define singleton types for natural numbers.
-- A singleton type for `Nat`
data SNat n where
SZero :: SNat Z
SSucc :: SNat n -> SNat (S n)
-- Utility for taking the predecessor of an `SNat`
sub1 :: SNat (S n) -> SNat n
sub1 (SSucc x) = x
-- Find the size of a list
size :: List a n -> SNat n
size Nil = SZero
size (Cons _ xs) = SSucc (size xs)
Now, we are in shape to start proving some stuff. From Data.Type.Equality, a :~: b represents a proof that a ~ b. We need to prove one simple thing about arithmetic.
-- Proof that n + (S m) == S (n + m)
plusSucc :: SNat n -> SNat m -> (n + S m) :~: S (n + m)
plusSucc SZero _ = Refl
plusSucc (SSucc n) m = gcastWith (plusSucc n m) Refl
Finally, we can use gcastWith to use this proof in aux. Oh and you were missing the Ord a constraint. :)
aux :: Ord a => List a n -> a -> List a m -> List a (n + S m)
aux Nil prev acc = Cons prev acc
aux (Cons hd tl) prev acc = gcastWith (plusSucc (size tl) (SSucc (size acc)))
aux tl hd (Cons (max hd prev) acc)
-- append to a list
(|>) :: List a n -> a -> List a (S n)
Nil |> y = Cons y Nil
(Cons x xs) |> y = Cons x (xs |> y)
-- reverse 'List'
rev :: List a n -> List a n
rev Nil = Nil
rev (Cons x xs) = rev xs |> x
Let me know if this answers your question - getting started with this sort of thing involves a lot of new stuff.
I have a type synonym type Entity = ([Feature], Body) for whatever Feature and Body mean. Objects of Entity type are to be grouped together:
type Bunch = [Entity]
and the assumption, crucial for the algorithm working with Bunch, is that any two entities in the same bunch have the equal number of features.
If I were to implement this constraint in an OOP language, I would add the corresponding check to the method encapsulating the addition of entities into a bunch.
Is there a better way to do it in Haskell? Preferably, on the definition level. (If the definition of Entity also needs to be changed, no problem.)
Using type-level length annotations
So here's the deal. Haskell does have type-level natural numbers and you can annotate with types using "phantom types". However you do it, the types will look like this:
data Z
data S n
data LAList x len = LAList [x] -- length-annotated list
Then you can add some construction functions for convenience:
lalist1 :: x -> LAList x (S Z)
lalist1 x = LAList [x]
lalist2 :: x -> x -> LAList x (S (S Z))
lalist2 x y = LAList [x, y]
-- ...
And then you've got more generic methods:
(~:) :: x -> LAList x n -> LAList x (S n)
x ~: LAList xs = LAList (x : xs)
infixr 5 ~:
nil :: LAList x Z
nil = LAList []
lahead :: LAList x (S n) -> x
lahead (LAList xs) = head xs
latail :: LAList x (S n) -> LAList x n
latail (LAList xs) = tail xs
but by itself the List definition doesn't have any of this because it's complicated. You may be interested in the Data.FixedList package for a somewhat different approach, too. Basically every approach is going to start off looking a little weird with some data type that has no constructor, but it starts to look normal after a little bit.
You might also be able to get a typeclass so that all of the lalist1, lalist2 operators above can be replaced with
class FixedLength t where
la :: t x -> LAList x n
but you will probably need the -XTypeSynonymInstances flag to do this, as you want to do something like
type Pair x = (x, x)
instance FixedLength Pair where
la :: Pair x -> LAList [x] (S (S Z))
la (a, b) = LAList [a, b]
(it's a kind mismatch when you go from (a, b) to Pair a).
Using runtime checking
You can very easily take a different approach and encapsulate all of this as a runtime error or explicitly model the error in your code:
-- this may change if you change your definition of the Bunch type
features :: Entity -> [Feature]
features = fst
-- we also assume a runBunch :: [Entity] -> Something function
-- that you're trying to run on this Bunch.
allTheSame :: (Eq x) => [x] -> Bool
allTheSame (x : xs) = all (x ==) xs
allTheSame [] = True
permissiveBunch :: [Entity] -> Maybe Something
permissiveBunch es
| allTheSame (map (length . features) es) = Just (runBunch es)
| otherwise = Nothing
strictBunch :: [Entity] -> Something
strictBunch es
| allTheSame (map (length . features) es) = runBunch es
| otherwise = error ("runBunch requires all feature lists to be the same length; saw instead " ++ show (map (length . features) es))
Then your runBunch can just assume that all the lengths are the same and it's explicitly checked for above. You can get around pattern-matching weirdnesses with, say, the zip :: [a] -> [b] -> [(a, b)] function in the Prelude, if you need to pair up the features next to each other. (The goal here would be an error in an algorithm due to pattern-matching for both runBunch' (x:xs) (y:ys) and runBunch' [] [] but then Haskell warns that there are 2 patterns which you've not considered in the match.)
Using tuples and type classes
One final way to do it which is a compromise between the two (but makes for pretty good Haskell code) involves making Entity parametrized over all features:
type Entity x = (x, Body)
and then including a function which can zip different entities of different lengths together:
class ZippableFeatures z where
fzip :: z -> z -> [(Feature, Feature)]
instance ZippableFeatures () where
fzip () () = []
instance ZippableFeatures Feature where
fzip f1 f2 = [(f1, f2)]
instance ZippableFeatures (Feature, Feature) where
fzip (a1, a2) (b1, b2) = [(a1, b1), (a2, b2)]
Then you can use tuples for your feature lists, as long as they don't get any larger than the maximum tuple length (which is 15 on my GHC). If you go larger than that, of course, you can always define your own data types, but it's not going to be as general as type-annotated lists.
If you do this, your type signature for runBunch will simply look like:
runBunch :: (ZippableFeatures z) => [Entity z] -> Something
When you run it on things with the wrong number of features you'll get compiler errors that it can't unify the type (a, b) with (a, b, c).
There are various ways to enforce length constraints like that; here's one:
{-# LANGUAGE DataKinds, KindSignatures, GADTs, TypeFamilies #-}
import Prelude hiding (foldr)
import Data.Foldable
import Data.Monoid
import Data.Traversable
import Control.Applicative
data Feature -- Whatever that really is
data Body -- Whatever that really is
data Nat = Z | S Nat -- Natural numbers
type family Plus (m::Nat) (n::Nat) where -- Type level natural number addition
Plus Z n = n
Plus (S m) n = S (Plus m n)
data LList (n :: Nat) a where -- Lists tagged with their length at the type level
Nil :: LList Z a
Cons :: a -> LList n a -> LList (S n) a
Some functions on these lists:
llHead :: LList (S n) a -> a
llHead (Cons x _) = x
llTail :: LList (S n) a -> LList n a
llTail (Cons _ xs) = xs
llAppend :: LList m a -> LList n a -> LList (Plus m n) a
llAppend Nil ys = ys
llAppend (Cons x xs) ys = Cons x (llAppend xs ys)
data Entity n = Entity (LList n Feature) Body
data Bunch where
Bunch :: [Entity n] -> Bunch
Some instances:
instance Functor (LList n) where
fmap f Nil = Nil
fmap f (Cons x xs) = Cons (f x) (fmap f xs)
instance Foldable (LList n) where
foldMap f Nil = mempty
foldMap f (Cons x xs) = f x `mappend` foldMap f xs
instance Traversable (LList n) where
traverse f Nil = pure Nil
traverse f (Cons x xs) = Cons <$> f x <*> traverse f xs
And so on. Note that n in the definition of Bunch is existential. It can be anything, and what it actually is doesn't affect the type—all bunches have the same type. This limits what you can do with bunches to a certain extent. Alternatively, you can tag the bunch with the length of its feature lists. It all depends what you need to do with this stuff in the end.
Imagine I have the following data types and type classes (with proper language extensions):
data Zero=Zero
data Succ n = Succ n
class Countable t where
count :: t -> Int
instance Countable Zero where
count Zero = 0
instance (Countable n) => Countable (Succ n) where
count (Succ n) = 1 + count n
Would it be possible to write a function that gives me , from an Integer, an instance of the proper typeclass instance? Basically, a function f so that count (f n) = n
My attempts have been variants of the following, this gives me some type errors at compile time though:
f::(Countable k)=> Int -> k
f 0 = Zero
f n = Succ $ f (n-1)
I've run into discussions on dependent types a lot while looking for a solution, but I have yet to be able to map those discussions to my use-case.
Context: Because I realise that this will get a lot of "why would you want to do this" type of questions...
I'm currently using the Data.HList package to work with heterogeneous lists. In this library, I would like to build a function shuffle which , when given an integer n, would shift the nth element of the list to the end.
For example, if I had l=1:"Hello":32:500 , I'd want shuffle 1 l to give 1:32:500:"Hello".
I've been able to write the specialised function shuffle0 that would answer the usecase for shuffle 0:
shuffle0 ::(HAppend rest (HCons fst HNil) l')=>HCons fst rest -> l'
shuffle0 (HCons fst rest) = hAppend rest (HCons fst HNil)
I've also written this function next that would "wrap" a function , such that next (shuffle n) = shuffle (n+1):
next :: (forall l l'. l -> l') -> (forall e l l'.(HCons e l) -> (HCons e l'))
next f = \(HCons e l)-> HCons e $ (f l)
I feel like my type signature might not help, namely there isn't length encoding (which is where problems could appear):
shuffle 0=shuffle0
shuffle n= next (shuffle (n-1))
GHC complains about not being able to deduce the type of shuffle.
This doesn't really surprise me as the type is probably not very well founded.
Intuitively I feel like there should be a mention of the length of the lists. I can get the length of a specific HList type through the HLength type function, and with some nicely chosen constraints rewrite shuffle so that it's sound (at least I think).
The problem is that I still need to get the type-level version of my chosen length, so that I can use it in my call. I'm not even sure if with that this can work but I feel I'll have a better chance.
To answer your initial question, you cannot write such an f from Int to a type-level inductive representation of integers without a full dependent-type system (which Haskell does not have). However, the problem you describe in the 'context' does not require such a function in Haskell.
I believe the following is roughly what you are looking for:
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts, FunctionalDependencies, UndecidableInstances #-}
import Data.HList
data Z = Z
data S n = S n
class Nat t
instance Nat Z
instance Nat n => Nat (S n)
class (Nat n, HList l, HList l') => Shuffle n l l' | n l -> l' where
shuffle :: n -> l -> l'
instance Shuffle Z HNil HNil where
shuffle Z HNil = HNil
instance (HAppend xs (HCons x HNil) ys, HList xs, HList ys) => Shuffle Z (HCons x xs) ys where
shuffle Z (HCons x xs) = hAppend xs (HCons x HNil)
instance (Shuffle n xs ys) => Shuffle (S n) (HCons x xs) (HCons x ys) where
shuffle (S n) (HCons x xs) = HCons x (shuffle n xs)
e.g.
*Main> shuffle (S Z) (HCons 1 (HCons "Hello" (HCons 32 (HCons 500 HNil))))
HCons 1 (HCons 32 (HCons 500 (HCons "Hello" HNil)))
The general technique behind this definition is to first think about how to write the non-dependent-typed version (e.g. here, how to shuffle an element to the end of a list) and to then convert this to the type-level (constraint) version. Note, the recursive structure of shuffle is exactly mirrored by the recursive structure of constraints in the type class instances.
Does this solve what you are trying to do?