What does data Vector :: * -> Nat -> * where mean in Haskell? - haskell

I'm looking at https://wiki.haskell.org/GHC/Kinds and I found this:
data Nat = Zero | Succ Nat
data Vector :: * -> Nat -> * where
VNil :: Vector a Zero
VCons :: a -> Vector a n -> Vector a (Succ n)
I tried looking up what where does in Haskell, but I could not find information about it used together with data. I have an idea of what a kind is. A constructor that takes no parameters has kind * and a constructor that takes one parameter has kind * -> *; it's the 'type' of the constructor. I think it's defining the kind of Vector as * -> Nat -> *, which means something that takes any type and a natural number and returns a Vector?
And more importantly: what would someone use this stuff for?

The data ... where ... syntax is GADT syntax. (The wiki you were looking at also has a page about GADTs, linked from the page you were reading)
GADT syntax is enabled with the GADTs language extension. It gives you an alternative to the standard data declaration syntax: instead of listing the data constructors as if they were applied to the types of their fields (thus implicitly defining the overall type of each constructor), you write an explicit type signature for each constructor.
For example, the standard Maybe type is defined like this in traditional syntax:
-- Maybe type constructor takes an argument a;
-- all data constructors implicitly return Maybe a
data Maybe a
  = Just a   -- the Just data constructor takes an argument of type a, not a itself
  | Nothing  -- the Nothing data constructor takes no arguments
And like this in GADT syntax:
-- Maybe type constructor takes an argument a
data Maybe a where
  Just :: a -> Maybe a -- Just takes an argument of type a to return a Maybe a
  Nothing :: Maybe a   -- Nothing is simply of type Maybe a
This syntax is more verbose. In complex cases it can arguably be clearer, since we explicitly write the type of the constructors in ordinary type expression syntax, rather than defining them in a weird special-purpose syntax where we pseudo-apply term-level data constructors to the types of their arguments.
But more importantly, GADT syntax[1] opens up the door to new features that simply cannot be expressed in the original syntax. That is what is happening in your example.
Primarily the new features come from control over the return type of each constructor. If we wanted to try to define Vector using the traditional data syntax, we would have to do something like this:
data Vector a n
  = VNil
  | VCons a (Vector a n)
The n parameter is supposed to represent the vector's number of elements at the type level. The idea is that we can do things like zip two vectors together with the compiler enforcing that the two vectors have the same length, rather than doing something like Data.List.zip, which simply gives up and silently discards any remaining elements from one list when the other runs out of elements.
But what we've written above can't do that. VNil always returns a value of type Vector a n, and it doesn't have any fields of type n (or a) so any VNil can be used with any n type parameter at all; n isn't going to say anything about the size of the vector! And similarly VCons has a field with a Vector a n in it, but is also going to end up constructing a value of type Vector a n, where the n parameter is the same, when we want it to indicate that the size is one larger than the tail-vector's size.
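To make that concrete, here is a small sketch (the names are just for illustration) of what the traditional declaration lets us get away with:
claimsWeirdIndex :: Vector Char Bool   -- the index doesn't even have to be a Nat-like type
claimsWeirdIndex = VNil

keepsSameIndex :: Vector Char n -> Vector Char n
keepsSameIndex v = VCons 'x' v         -- adds an element, yet the index n stays the same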
GADT syntax allows us to fix those problems:
{-# LANGUAGE GADTs #-}
data Vector a n where
  VNil :: Vector a Zero
  VCons :: a -> Vector a n -> Vector a (Succ n)
Now we can explicitly say that the VNil constructor is a vector of size Zero; it's not like Nothing :: Maybe a where it is always polymorphic in that type variable. (Our VNil is still polymorphic in the type variable a, of course, since having no elements means we shouldn't constrain what the element type will be). And VCons takes an a and a Vector a n to produce a Vector a (Succ n)[2]; specifically a vector that is "one item larger" than the vector inside it.
We missed a step though, which is actually defining Zero and Succ anywhere. The way to do that is simply:
data Nat = Zero | Succ Nat
This says: A natural number[2] is either Zero, or it is the Succ of some other Nat. We can pattern match a Nat repeatedly and we'll eventually[3] hit Zero; if we wanted to convert this to a "normal" number we'd do that and count the Succ constructors.
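For instance, a minimal conversion along those lines (illustrative, not part of the wiki's code) just counts the Succ constructors:
natToInt :: Nat -> Int
natToInt Zero     = 0
natToInt (Succ n) = 1 + natToInt n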
But that has only given us Zero and Succ as data constructors. We were wanting to use them in the types of our VNil and VCons. For that we'll need another extension: DataKinds. It simply allows us to use any data declaration both to define a (type-level) type constructor and its associated (term-level) data constructors, and to define a (kind-level) "kind constructor" and its associated type constructors (which in turn do not contain any values at the term level; they're purely for type manipulation).
Note that the wiki page you were looking at appears to be describing a not-yet-implemented language extension called Kinds, which is clearly the same as what is now DataKinds. As such, that wiki page is extremely old, so you should probably just ignore it. If you were looking for reading material about kinds in general (rather than the DataKinds extension specifically), you probably need to continue your search.
But using DataKinds we can do this (actually compiles now):
{-# LANGUAGE GADTs, DataKinds #-}
data Nat = Zero | Succ Nat
data Vector a n where
  VNil :: Vector a Zero
  VCons :: a -> Vector a n -> Vector a (Succ n)
Here we defined Nat (at the type level) and Zero & Succ (at the term level) as normal. But then in the definition of Vector we use Zero and Succ at the type level. This is enough for GHC to infer that in Vector a n the n must be of kind Nat (because it is used with type constructors of that kind). But we can also use yet another extension, KindSignatures, and be more explicit about the kind of the type constructor Vector, like so:
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}
data Nat = Zero | Succ Nat
data Vector :: * -> Nat -> * where -- this line is where the difference is
  VNil :: Vector a Zero
  VCons :: a -> Vector a n -> Vector a (Succ n)
Before we were just saying data Vector a n and trusting both the compiler and the reader to figure out from the usage below that a must be of kind * (it's used as the type of an actual value in the VCons constructor) and n is of kind Nat (because that position in the Vector type constructor is filled by type-level Zero and Succ n in the constructors). Now we can explicitly give the compiler and the reader more information up front with data Vector :: * -> Nat -> *; Vector takes a type parameter of kind *, another type parameter of kind Nat, and results in a type of kind * (which means it's a type that can actually have values at the term level).
It can be a little confusing with DataKinds whether a token like Zero refers to the term-level Zero (of type Nat, which is of kind *) or the type-level Zero (of kind Nat). Most of the time the compiler can tell perfectly well which is which, because it keeps very careful track of whether a given expression is a term expression or a type expression. But DataKinds gives you a way of being more explicit: if you prefix a constructor name with a single quote (like 'Zero) then it definitively means the data constructor promoted to the type level. Some people consider it good style to always use this explicit marking.[4]
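For example (just a sketch; V and V' are illustrative names), these two synonyms name exactly the same type, the second merely marks the promotion explicitly:
type V  = Vector Int Zero
type V' = Vector Int 'Zero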
So that is pretty much everything that is going on in your example.
As for what it's all for... at the most basic level it allows you to do things like keeping track of the length of lists[5], and then have the compiler enforce that multiple lists have the same size (or have sizes with some other specific relationship, like being larger, or smaller, or twice as large, etc). People use that capability for a huge number of things that I can't possibly sum up in a single post (and not just with sizes of things; there are countless ways to use DataKinds and GADTs to reflect information at the type level so that the compiler can enforce things for you); it's a bit like asking "what are functions for". But here are a couple of example functions that do interesting things with the type-level length:
vzipWith :: (a -> b -> c) -> Vector a n -> Vector b n -> Vector c n
vzipWith _ VNil VNil = VNil
vzipWith f (a `VCons` as) (b `VCons` bs)
  = f a b `VCons` vzipWith f as bs
This is the zip that enforces that the vectors have the same length, as I mentioned earlier. It guarantees that you can't have a bug where one list was accidentally shorter and you simply ignored elements of the other; instead the compiler will complain. Here's vzipWith at work in GHCi:
λ vzipWith replicate (1 `VCons` (2 `VCons` VNil)) ('a' `VCons` ('b' `VCons` VNil))
VCons "a" (VCons "bb" VNil)
it :: Vector [Char] ('Succ ('Succ 'Zero))
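Here's one more tiny example (not part of the original answer) of the kind of guarantee the length index gives you: a head function that simply cannot be applied to an empty vector, because VNil never has a type of the shape Vector a (Succ n):
vhead :: Vector a (Succ n) -> a
vhead (VCons x _) = x   -- no VNil case needed; GHC knows it is impossible here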
With a bit more work we can define how to add type level Nats (using even more extensions, which I won't explain in detail here). Then we can append vectors while keeping track of the combined length:
type Plus :: Nat -> Nat -> Nat
type family Plus n m where
  Zero `Plus` n = n
  Succ n `Plus` m = Succ (n `Plus` m)
vappend :: Vector a n -> Vector a m -> Vector a (n `Plus` m)
vappend VNil ys = ys
vappend (x `VCons` xs) ys = x `VCons` vappend xs ys
And at work:
λ vappend (1 `VCons` (2 `VCons` VNil)) (3 `VCons` (4 `VCons` (5 `VCons` VNil)))
VCons 1 (VCons 2 (VCons 3 (VCons 4 (VCons 5 VNil))))
it ::
  Num a => Vector a ('Succ ('Succ ('Succ ('Succ ('Succ 'Zero)))))
If you want to play around with it in GHCI, put all of this in a file and load it:
{-# LANGUAGE DataKinds, GADTs, KindSignatures, StandaloneDeriving, TypeFamilies, TypeOperators, StandaloneKindSignatures #-}
data Nat = Zero | Succ Nat
data Vector :: * -> Nat -> * where
  VNil :: Vector a 'Zero
  VCons :: a -> Vector a n -> Vector a ('Succ n)
deriving instance Show a => Show (Vector a n)
vzipWith :: (a -> b -> c) -> Vector a n -> Vector b n -> Vector c n
vzipWith _ VNil VNil = VNil
vzipWith f (a `VCons` as) (b `VCons` bs)
  = f a b `VCons` vzipWith f as bs
type Plus :: Nat -> Nat -> Nat
type family Plus n m where
  Zero `Plus` n = n
  Succ n `Plus` m = Succ (n `Plus` m)
vappend :: Vector a n -> Vector a m -> Vector a (n `Plus` m)
vappend VNil ys = ys
vappend (x `VCons` xs) ys = x `VCons` vappend xs ys
Here I have also added yet another extension StandaloneDeriving so we can derive a Show instance for Vector, so you can play around in the interpreter and see what you get.
[1] There is in fact an extension GADTSyntax that enables just the new syntax, but doesn't allow you to define any types that you couldn't have defined in the old syntax. Hardly anyone uses this extension as far as I know; GADTs are well-regarded and anyone who bothers to learn the new syntax only does so in the context of learning about GADTs, so if they like the new syntax and want to use it everywhere they're probably just enabling GADTs everywhere.
2 "Succ" is short for "successor". The standard way of defining natural numbers from first principles is to assume that there exists the first natural number (zero), and that for any natural number has a successor which is a different natural number (and is not the successor of any other number). It's a fancy way of saying you can start at zero and count up from there as far as you want to go. But this inductive structure of natural numbers happens to map very nicely into Haskell's type logic, making it very easy to count things at type level using numbers defined this way, which is why it's being used here.
[3] Unless it's infinite, which means our attempt to "count the succs" will never terminate, which is another way to produce bottom/undefined in Haskell.
4 This "tick" syntax is necessary in some cases, for disambiguation. For example, with DataKinds we can have type level lists of types, like [Bool, Char, Maybe Integer], because we can promote lists to operate at type-level instead of term-level. [Bool, Char, Maybe Integer] is a type-level list of kind [*] (a list of things of kind *, i.e. a list of types). The problem arises when we consider things like [Bool]. This is definitely a type expression, but is it the list type constructor applied to Bool (meaning, the type of terms that are a list of boolean values), or is it a singleton list-of-types (i.e. the list data constructors : and [], promoted to type level), whose single value happens to be the type Bool? One is of kind *, and the other is of kind [*]. There's no way to tell the programmer's intent from looking at it, so the rules of DataKinds say we prioritise the pre-DataKinds interpretation; [Bool] is definitely the type of lists of boolean values. We can use the tick mark to explicitly choose the other interpretation: '[Bool] is a singleton list of types.
[5] Lists whose type reflects the length of the list are traditionally called vectors, to distinguish them from ordinary lists whose type says nothing about the length. Perhaps this is not ideal, since there are several other things that are also called "vectors" to distinguish them from ordinary lists. But it's established terminology.

Related

Heterogeneous sized vectors where the types "work elsewhere"

Suppose I have a function that works on a vector with size known at compile-time (these are provided by the vector-sized package):
{-# LANGUAGE DataKinds, GADTs #-}
module Test where
import Data.Vector.Sized
-- Processes vectors known at compile time to have size 4.
processVector :: Vector 4 Int -> String
processVector = undefined
Fine, but what if I don't want to process a vector of ints, but a vector of vectors?
-- Same thing but has subvectors of size 3.
processVector2 :: Vector 4 (Vector 3 Int) -> String
processVector2 = undefined
Fine, but there each sub-vector is of a fixed size. I want a function where the subvectors can each be of a different size but still known at compile time.
We can do this with existential quantifications:
data InnerVector = forall n. InnerVector (Vector n Int)
processVector3 :: Vector 4 InnerVector -> String
processVector3 = undefined
Fine, but what if I want to return not a String but a vector of the same dimensions?
processVector4 :: Vector 4 InnerVector -> Vector 4 InnerVector
processVector4 = undefined
This does not work because the second vector might have differently sized subvectors from the input subvectors! I want them known to be same at compile time. (So the subvectors at index 0 have same size, subvectors at index 1 have the same size, and so on.)
Is this possible to achieve? If not, do you know of (or can you create) a data structure that makes this possible?
I am avoiding tuples because:
My vectors will have size over 100.
Vectors make general processing easy (using 0-based indexes), so my processing functions continue to work even if I add more items to my vector.
I do indeed only want values of one type within the inner vectors (Int in the example).
By using existential quantification you effectively hide the sizes of the inner vectors. But if you want to write code with types that convey that you are preserving those sizes, you don't want them hidden. Instead you want your types to be loud and clear about them.
So, let's define some types that broadcast these inner sizes. Essentially, you need your "vector-of-vectors" type to be a type of heterogeneous lists that restricts the elements of these lists to be vectors. For sure, there are some libraries out there that can help you put together such a type, but here we'll roll our own. Just because it's more fun to do so.
Let's start with enabling some language extensions and then writing some types for the inner vectors and their sizes:
{-# LANGUAGE DataKinds, GADTs, InstanceSigs, KindSignatures, TypeOperators #-}
data Nat = Zero | Succ Nat
data Vector :: Nat -> * -> * where
  VNil :: Vector Zero a
  VCons :: a -> Vector n a -> Vector (Succ n) a
instance Functor (Vector n) where
  fmap f VNil = VNil
  fmap f (VCons x xs) = VCons (f x) (fmap f xs)
Next up is our type of "jagged" matrices (i.e., a vector of variable-size vectors). As said, this is just a specific type of heterogeneous lists:
data JaggedMatrix :: [Nat] -> * -> * where
  MNil :: JaggedMatrix '[] a
  MCons :: Vector n a -> JaggedMatrix ns a -> JaggedMatrix (n ': ns) a
There, that's it. The type of jagged matrices is indexed by a list that contains the sizes of the inner vectors. The outer dimension is not explicated in the type, but can simply be derived from the length of the inner-dimensions list.
Let's put it to work and write a dimension-preserving function. Here's an obvious one:
instance Functor (JaggedMatrix ns) where
  fmap :: (a -> b) -> JaggedMatrix ns a -> JaggedMatrix ns b
  fmap f MNil = MNil
  fmap f (MCons xs xss) = MCons (fmap f xs) (fmap f xss)
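And here's a quick usage sketch (not from the original answer; example and example10 are made-up names): a two-row jagged matrix whose row lengths show up in its type, and an fmap over it that necessarily preserves every dimension:
example :: JaggedMatrix '[ 'Succ ('Succ 'Zero), 'Succ 'Zero ] Int
example = MCons (VCons 1 (VCons 2 VNil)) (MCons (VCons 3 VNil) MNil)

example10 :: JaggedMatrix '[ 'Succ ('Succ 'Zero), 'Succ 'Zero ] Int
example10 = fmap (* 10) example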

How to "iterate" over a function whose type changes among iteration but the formal definition is the same

I have just started learning Haskell and I came across the following problem. I try to "iterate" the function \x->[x]. I expect to get the result [[8]] by
foldr1 (.) (replicate 2 (\x->[x])) $ (8 :: Int)
This does not work, and gives the following error message:
Occurs check: cannot construct the infinite type: a ~ [a]
Expected type: [a -> a]
Actual type: [a -> [a]]
I can understand why it doesn't work. It is because foldr1 has the type signature foldr1 :: Foldable t => (a -> a -> a) -> t a -> a, and takes a -> a -> a as the type of its first parameter, not a -> a -> b.
Neither does this, for the same reason:
((!! 2) $ iterate ((\x->[x]) .) id) (8 :: Int)
However, this works:
(\x->[x]) $ (\x->[x]) $ (8 :: Int)
and I understand that the first (\x->[x]) and the second one are of different type (namely [Int]->[[Int]] and Int->[Int]), although formally they look the same.
Now say that I need to change the 2 to a large number, say 100.
My question is, is there a way to construct such a list? Do I have to resort to meta-programming techniques such as Template Haskell? If I have to resort to meta-programming, how can I do it?
As a side node, I have also tried to construct the string representation of such a list and read it. Although the string is much easier to construct, I don't know how to read such a string. For example,
read "[[[[[8]]]]]" :: ??
I don't know how to construct the ?? part when the number of nested layers is not known a priori. The only way I can think of is resorting to meta-programming.
The question above may not seem interesting enough, and I have a "real-life" case. Consider the following function:
natSucc x = [Left x,Right [x]]
This is the succ function used in the formal definition of natural numbers. Again, I cannot simply foldr1-replicate or !!-iterate it.
Any help will be appreciated. Suggestions on code styles are also welcome.
Edit:
After viewing the 3 answers given so far (again, thank you all very much for your time and efforts) I realized this is a more general problem that is not limited to lists. A similar type of problem can be composed for each valid type of functor (what if I want to get Just (Just (Just 8)), although that may not make much sense on its own?).
You'll certainly agree that 2 :: Int and 4 :: Int have the same type. Because Haskell is not dependently typed†, that means foldr1 (.) (replicate 2 (\x->[x])) (8 :: Int) and foldr1 (.) (replicate 4 (\x->[x])) (8 :: Int) must have the same type, in contradiction with your idea that the former should give [[8]] :: [[Int]] and the latter [[[[8]]]] :: [[[[Int]]]]. In particular, it should be possible to put both of these expressions in a single list (Haskell lists need to have the same type for all their elements). But this just doesn't work.
The point is that you don't really want a Haskell list type: you want to be able to have different-depth branches in a single structure. Well, you can have that, and it doesn't require any clever type system hacks – we just need to be clear that this is not a list, but a tree. Something like this:
data Tree a = Leaf a | Rose [Tree a] deriving (Show)
Then you can do
Prelude> foldr1 (.) (replicate 2 (\x->Rose [x])) $ Leaf (8 :: Int)
Rose [Rose [Leaf 8]]
Prelude> foldr1 (.) (replicate 4 (\x->Rose [x])) $ Leaf (8 :: Int)
Rose [Rose [Rose [Rose [Leaf 8]]]]
†Actually, modern GHC Haskell has quite a bunch of dependently-typed features (see DaniDiaz' answer), but these are still quite clearly separated from the value-level language.
I'd like to propose a very simple alternative which doesn't require any extensions or trickery: don't use different types.
Here is a type which can hold lists with any number of nestings, provided you say how many up front:
data NestList a = Zero a | Succ (NestList [a]) deriving Show
instance Functor NestList where
  fmap f (Zero a) = Zero (f a)
  fmap f (Succ as) = Succ (fmap (map f) as)
A value of this type is a church numeral indicating how many layers of nesting there are, followed by a value with that many layers of nesting; for example,
Succ (Succ (Zero [['a']])) :: NestList Char
It's now easy-cheesy to write your \x -> [x] iteration; since we want one more layer of nesting, we add one Succ.
> iterate (\x -> Succ (fmap (:[]) x)) (Zero 8) !! 5
Succ (Succ (Succ (Succ (Succ (Zero [[[[[8]]]]])))))
Your proposal for how to implement natural numbers can be modified similarly to use a simple recursive type. But the standard way is even cleaner: just take the above NestList and drop all the arguments.
data Nat = Zero | Succ Nat
This problem indeed requires somewhat advanced type-level programming.
I followed chi's suggestion in the comments, and searched for a library that provided inductive type-level naturals with their corresponding singletons. I found the fin library, which is used in the answer.
The usual extensions for type-level trickery:
{-# language DataKinds, PolyKinds, KindSignatures, ScopedTypeVariables, TypeFamilies #-}
Here's a type family that maps a type-level natural and an element type to the type of the corresponding nested list:
import Data.Type.Nat
type family Nested (n::Nat) a where
  Nested Z a = [a]
  Nested (S n) a = [Nested n a]
For example, we can test from ghci that
*Main> :kind! Nested Nat3 Int
Nested Nat3 Int :: *
= [[[[Int]]]]
(Nat3 is a convenient alias defined in Data.Type.Nat.)
And here's a newtype that wraps the function we want to construct. It uses the type family to express the level of nesting:
newtype Iterate (n::Nat) a = Iterate { runIterate :: (a -> [a]) -> a -> Nested n a }
The fin library provides a really nifty induction1 function that lets us compute a result by induction on Nat. We can use it to compute the Iterate that corresponds to every Nat. The Nat is passed implicitly, as a constraint:
iterate' :: forall n a. SNatI n => Iterate (n::Nat) a
iterate' =
    let step :: forall m. SNatI m => Iterate m a -> Iterate (S m) a
        step (Iterate recN) = Iterate (\f a -> [recN f a])
    in induction1 (Iterate id) step
Testing the function in ghci (using -XTypeApplications to supply the Nat):
*Main> runIterate (iterate' @Nat3) pure True
[[[[True]]]]

Haskell singletons: What do we gain with SNat

I'm trying to grok Haskell singletons.
In the paper Dependently Typed Programming with Singletons and in his blog post singletons v0.9 Released!, Richard Eisenberg defines the data type Nat, which defines natural numbers with the Peano axioms:
data Nat = Zero | Succ Nat
By using the language extension DataKinds this data type is promoted to the type level.
The data constructors Zero and Succ are promoted to the type constructors 'Zero and 'Succ.
With this we get for every natural number a single and unique corresponding type on the type level. E.g. for 3 we get 'Succ ('Succ ('Succ 'Zero)).
So we have now Natural numbers as types.
He then defines the function plus on the value level and the type family Plus on the type level, to have the addition operation available.
With the promote function/quasiquoter of the singletons library we can automatically create the Plus type family from the plus function, so we can avoid writing the type family ourselves.
So far so good!
With GADT syntax he also defines a data type SNat:
data SNat :: Nat -> * where
  SZero :: SNat Zero
  SSucc :: SNat n -> SNat (Succ n)
Basically he only wraps the Nat type into a SNat constructor.
Why is this necessary? What do we gain?
Are the data types Nat and SNat not isomorphic? Why is SNat a singleton, and why is Nat not a singleton? In both cases every type is inhabited by one single value, the corresponding natural number.
What do we gain? Hmm. The status of singletons is that of an awkward but currently necessary workaround, and the sooner we can do away with them, the better.
Let me see if I can clarify the picture. We have a data type Nat:
data Nat = Zero | Suc Nat
(wars have been started over even more trivial issues than the number of 'c's in Suc)
The type Nat has run-time values which are indistinguishable at the type level. The Haskell type system currently has the replacement property, which means that in any well typed program, you may replace any well typed subexpression by an alternative subexpression with the same scope and type, and the program will continue to be well typed. For example, you can rewrite every occurrence of
if <b> then <t> else <e>
to
if <b> then <e> else <t>
and you can be sure that nothing will go wrong...with the outcome of checking your program's type.
The replacement property is an embarrassment. It's clear proof that your type system gives up at the very moment that meaning starts to matter.
Now, by being a data type for run-time values, Nat also becomes a type of type-level values 'Zero and 'Suc. The latter live only in Haskell's type language and have no run-time presence at all. Please note that although 'Zero and 'Suc exist at the type level, it is unhelpful to refer to them as "types" and the people who currently do that should desist. They do not have type * and can thus not classify values which is what types worthy of the name do.
There is no direct means of exchange between run-time and type-level Nats, which can be a nuisance. The paradigmatic example concerns a key operation on vectors:
data Vec :: Nat -> * -> * where
  VNil :: Vec 'Zero x
  VCons :: x -> Vec n x -> Vec ('Suc n) x
We might like to compute a vector of copies of a given element (perhaps as part of an Applicative instance). It might look like a good idea to give the type
vec :: forall (n :: Nat) (x :: *). x -> Vec n x
but can that possibly work? In order to make n copies of something, we need to know n at run time: a program has to decide whether to deploy VNil and stop or to deploy VCons and keep going, and it needs some data to do that. A good clue is the forall quantifier, which is parametric: it indicates that the quantified information is available only to types and is erased by run time.
Haskell currently enforces an entirely spurious coincidence between dependent quantification (what forall does) and erasure for run time. It does not support a dependent but not erased quantifier, which we often call pi. The type and implementation of vec should be something like
vec :: pi (n :: Nat) -> forall (x :: *). x -> Vec n x
vec 'Zero x = VNil
vec ('Suc n) x = VCons x (vec n x)
where arguments in pi-positions are written in the type language, but the data are available at run time.
So what do we do instead? We use singletons to capture indirectly what it means to be a run-time copy of type-level data.
data SNat :: Nat -> * where
  SZero :: SNat Zero
  SSuc :: SNat n -> SNat (Suc n)
Now, SZero and SSuc make run-time data. SNat is not isomorphic to Nat: the former has kind Nat -> *, while the latter has kind *, so it is a kind error to try to make them isomorphic. There are many run-time values in Nat, and the type system does not distinguish them; there is exactly one run-time value (worth speaking of) in each different SNat n, so the fact that the type system cannot distinguish them is beside the point. The point is that each SNat n is a different type for each different n, and that GADT pattern matching (where a pattern can be of a more specific instance of the GADT type it is known to be matching) can refine our knowledge of n.
We may now write
vec :: forall (n :: Nat). SNat n -> forall (x :: *). x -> Vec n x
vec SZero x = VNil
vec (SSuc n) x = VCons x (vec n x)
Singletons allow us to bridge the gap between run time and type-level data, by exploiting the only form of run-time analysis that allows the refinement of type information. It's quite sensible to wonder if they're really necessary, and they presently are, only because that gap has not yet been eliminated.
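To make the bridging concrete, here is a minimal sketch (not from the original answer; forget, SomeNat, and promote are illustrative names) of traffic in both directions: forgetting the index is easy, while promoting a run-time Nat has to hide the index existentially, precisely because a plain Nat does not determine n at compile time.
forget :: SNat n -> Nat
forget SZero    = Zero
forget (SSuc n) = Suc (forget n)

data SomeNat where
  SomeNat :: SNat n -> SomeNat

promote :: Nat -> SomeNat
promote Zero    = SomeNat SZero
promote (Suc n) = case promote n of
  SomeNat sn -> SomeNat (SSuc sn)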

Can a Haskell type constructor have non-type parameters?

A type constructor produces a type given a type. For example, the Maybe constructor
data Maybe a = Nothing | Just a
could be given a concrete type, like Char, and give back a concrete type, like Maybe Char. In terms of kinds, one has
GHCI> :k Maybe
Maybe :: * -> *
My question: Is it possible to define a type constructor that yields a concrete type given a Char, say? Put another way, is it possible to mix kinds and types in the type signature of a type constructor? Something like
GHCI> :k my_type
my_type :: Char -> * -> *
Can a Haskell type constructor have non-type parameters?
Let's unpack what you mean by type parameter. The word type has (at least) two potential meanings: do you mean type in the narrow sense of things of kind *, or in the broader sense of things at the type level? We can't (yet) use values in types, but modern GHC features a very rich kind language, allowing us to use a wide range of things other than concrete types as type parameters.
Higher-Kinded Types
Type constructors in Haskell have always admitted non-* parameters. For example, the encoding of the fixed point of a functor works in plain old Haskell 98:
newtype Fix f = Fix { unFix :: f (Fix f) }
ghci> :k Fix
Fix :: (* -> *) -> *
Fix is parameterised by a functor of kind * -> *, not a type of kind *.
Beyond * and ->
The DataKinds extension enriches GHC's kind system with user-declared kinds, so kinds may be built of pieces other than * and ->. It works by promoting all data declarations to the kind level. That is to say, a data declaration like
data Nat = Z | S Nat -- natural numbers
introduces a kind Nat and type constructors Z :: Nat and S :: Nat -> Nat, as well as the usual type and value constructors. This allows you to write datatypes parameterised by type-level data, such as the customary vector type, which is a linked list indexed by its length.
data Vec n a where
  Nil :: Vec Z a
  (:>) :: a -> Vec n a -> Vec (S n) a
ghci> :k Vec
Vec :: Nat -> * -> *
There's a related extension called ConstraintKinds, which frees constraints like Ord a from the yoke of the "fat arrow" =>, allowing them to roam across the landscape of the type system as nature intended. Kmett has used this power to build a category of constraints, with the newtype (:-) :: Constraint -> Constraint -> * denoting "entailment": a value of type c :- d is a proof that if c holds then d also holds. For example, we can prove that Ord a implies Eq [a] for all a:
ordToEqList :: Ord a :- Eq [a]
ordToEqList = Sub Dict
Life after forall
However, Haskell currently maintains a strict separation between the type level and the value level. Things at the type level are always erased before the program runs, (almost) always inferrable, invisible in expressions, and (dependently) quantified by forall. If your application requires something more flexible, such as dependent quantification over runtime data, then you have to manually simulate it using a singleton encoding.
For example, the specification of split says it chops a vector at a certain length according to its (runtime!) argument. The type of the output vector depends on the value of split's argument. We'd like to write this...
split :: (n :: Nat) -> Vec (n :+: m) a -> (Vec n a, Vec m a)
... where I'm using the type function (:+:) :: Nat -> Nat -> Nat, which stands for addition of type-level naturals, to ensure that the input vector is at least as long as n...
type family n :+: m where
  Z :+: m = m
  S n :+: m = S (n :+: m)
... but Haskell won't allow that declaration of split! There aren't any values of type Z or S n; only types of kind * contain values. We can't access n at runtime directly, but we can use a GADT which we can pattern-match on to learn what the type-level n is:
data Natty n where
  Zy :: Natty Z
  Sy :: Natty n -> Natty (S n)
ghci> :k Natty
Natty :: Nat -> *
Natty is called a singleton, because for a given (well-defined) n there is only one (well-defined) value of type Natty n. We can use Natty n as a run-time stand-in for n.
split :: Natty n -> Vec (n :+: m) a -> (Vec n a, Vec m a)
split Zy xs = (Nil, xs)
split (Sy n) (x :> xs) =
  let (ys, zs) = split n xs
  in (x :> ys, zs)
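A quick usage sketch (not in the original answer; splitExample is a made-up name): splitting a three-element vector after its first two elements, with both result lengths tracked in the type:
splitExample :: (Vec (S (S Z)) Char, Vec (S Z) Char)
splitExample = split (Sy (Sy Zy)) ('a' :> ('b' :> ('c' :> Nil)))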
Anyway, the point is that values - runtime data - can't appear in types. It's pretty tedious to duplicate the definition of Nat in singleton form (and things get worse if you want the compiler to infer such values); dependently-typed languages like Agda, Idris, or a future Haskell escape the tyranny of strictly separating types from values and give us a range of expressive quantifiers. You're able to use an honest-to-goodness Nat as split's runtime argument and mention its value dependently in the return type.
pigworker has written extensively about the unsuitability of Haskell's strict separation between types and values for modern dependently-typed programming. See, for example, the Hasochism paper, or his talk on the unexamined assumptions that have been drummed into us by four decades of Hindley-Milner-style programming.
Dependent Kinds
Finally, for what it's worth, with TypeInType modern GHC unifies types and kinds, allowing us to talk about kind variables using the same tools that we use to talk about type variables. In a previous post about session types I made use of TypeInType to define a kind for tagged type-level sequences of types:
infixr 5 :!, :?
data Session = Type :! Session -- Type is a synonym for *
             | Type :? Session
             | E
I'd recommend Benjamin Hodgson's answer and the references he gives to see how to make this sort of thing useful. But, to answer your question more directly, using several extensions (DataKinds, KindSignatures, and GADTs), you can define types that are parameterized on (certain) concrete types.
For example, here's one parameterized on the concrete Bool datatype:
{-# LANGUAGE DataKinds, KindSignatures, GADTs #-}
{-# LANGUAGE FlexibleInstances #-}
module FlaggedType where
-- The single quotes below are optional. They serve to notify
-- GHC that we are using the type-level constructors lifted from
-- data constructors rather than types of the same name (and are
-- only necessary where there's some kind of ambiguity otherwise).
data Flagged :: Bool -> * -> * where
  Truish :: a -> Flagged 'True a
  Falsish :: a -> Flagged 'False a
-- separate instances, just as if they were different types
-- (which they are)
instance (Show a) => Show (Flagged 'False a) where
  show (Falsish x) = show x
instance (Show a) => Show (Flagged 'True a) where
  show (Truish x) = show x ++ "*"
-- these lists have types as indicated
x = [Truish 1, Truish 2, Truish 3]          -- :: [Flagged 'True Integer]
y = [Falsish "a", Falsish "b", Falsish "c"] -- :: [Flagged 'False String]
-- this won't typecheck: it's just like [1,2,"abc"]
z = [Truish 1, Truish 2, Falsish 3] -- won't typecheck
Note that this isn't much different from defining two completely separate types:
data FlaggedTrue a = Truish a
data FlaggedFalse a = Falsish a
In fact, I'm hard pressed to think of any advantage Flagged has over defining two separate types, except if you have a bar bet with someone that you can write useful Haskell code without type classes. For example, you can write:
getInt :: Flagged a Int -> Int
getInt (Truish z) = z -- same polymorphic function...
getInt (Falsish z) = z -- ...defined on two separate types
Maybe someone else can think of some other advantages.
Anyway, I believe that parameterizing types with concrete values really only becomes useful when the concrete type is sufficiently "rich" that you can use it to leverage the type checker, as in Benjamin's examples.
As user2407038 noted, most interesting primitive types, like Ints, Chars, Strings and so on, can't be used this way. Interestingly enough, though, you can use literal positive integers and strings as type parameters, but they are treated as Nats and Symbols (as defined in GHC.TypeLits) respectively.
So something like this is possible:
import GHC.TypeLits
data Tagged :: Symbol -> Nat -> * -> * where
  One :: a -> Tagged "one" 1 a
  Two :: a -> Tagged "two" 2 a
  Three :: a -> Tagged "three" 3 a
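A quick usage sketch (two is just an illustrative name); the string and the number in the type are a GHC.TypeLits Symbol and Nat rather than ordinary run-time values:
two :: Tagged "two" 2 Bool
two = Two True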
Look at using Generalized Algebraic Data Types (GADTs), which enable you to define concrete outputs based on the input type, e.g.
data CustomMaybe a where
  MaybeChar :: Maybe a -> CustomMaybe Char
  MaybeString :: Maybe a -> CustomMaybe String
  MaybeBool :: Maybe a -> CustomMaybe Bool
exampleFunction :: CustomMaybe a -> a
exampleFunction (MaybeChar maybe) = 'e'
exampleFunction (MaybeString maybe) = True -- Compile error
main = do
  print $ exampleFunction (MaybeChar $ Just 10)
To a similar effect, RankNTypes can allow the implementation of similar behaviour:
exampleFunctionOne :: a -> a
exampleFunctionOne el = el
type PolyType = forall a. a -> a
exampleFunctionTwo :: PolyType -> Int
exampleFunctionTwo func = func 20
exampleFunctionTwo func = func "Hello" -- Compiler error, PolyType being forced to return 'Int'
main = do
  print $ exampleFunctionTwo exampleFunctionOne
The PolyType definition allows you to insert the polymorphic function within exampleFunctionTwo and force its output to be 'Int'.
No. Haskell doesn't have dependent types (yet). See https://typesandkinds.wordpress.com/2016/07/24/dependent-types-in-haskell-progress-report/ for some discussion of when it may.
In the meantime, you can get behavior like this in Agda, Idris, and Cayenne.

Real world use of GADT

How do I make use of Generalized Algebraic Data Type?
The example given in the Haskell wikibook is too short to give me an insight into the real possibilities of GADTs.
GADTs are weak approximations of inductive families from dependently typed languages—so let's begin there instead.
Inductive families are the core datatype introduction method in a dependently typed language. For instance, in Agda you define the natural numbers like this
data Nat : Set where
  zero : Nat
  succ : Nat -> Nat
which isn't very fancy, it's essentially just the same thing as the Haskell definition
data Nat = Zero | Succ Nat
and indeed in GADT syntax the Haskell form is even more similar
{-# LANGUAGE GADTs #-}
data Nat where
  Zero :: Nat
  Succ :: Nat -> Nat
So, at first blush you might think GADTs are just neat extra syntax. That's just the very tip of the iceberg though.
Agda has the capacity to represent all kinds of types unfamiliar and strange to a Haskell programmer. A simple one is the type of finite sets. This type is written like Fin 3 and represents the set of numbers {0, 1, 2}. Likewise, Fin 5 represents the set of numbers {0,1,2,3,4}.
This should be quite bizarre at this point. First, we're referring to a type which has a regular number as a "type" parameter. Second, it's not clear what it means for Fin n to represent the set {0,1...n}. In real Agda we'd do something more powerful, but it suffices to say that we can define a contains function
contains : Nat -> Fin n -> Bool
contains i f = ?
Now this is strange again because the "natural" definition of contains would be something like i < n, but n is a value that only exists in the type Fin n and we shouldn't be able to cross that divide so easily. While it turns out that the definition is not nearly so straightforward, this is exactly the power that inductive families have in dependently typed languages—they introduce values that depend on their types and types that depend on their values.
We can examine what it is about Fin that gives it that property by looking at its definition.
data Fin : Nat -> Set where
  zerof : (n : Nat) -> Fin (succ n)
  succf : (n : Nat) -> (i : Fin n) -> Fin (succ n)
This takes a little work to understand, so as an example let's try constructing a value of the type Fin 2. There are a few ways to do this (in fact, we'll find that there are exactly 2)
zerof 1 : Fin 2
zerof 2 : Fin 3 -- nope!
zerof 0 : Fin 1 -- nope!
succf 1 (zerof 0) : Fin 2
This lets us see that there are two inhabitants and also demonstrates a little bit of how type computation happens. In particular, the (n : Nat) bit in the type of zerof reflects the actual value n up into the type allowing us to form Fin (n+1) for any n : Nat. After that we use repeated applications of succf to increment our Fin values up into the correct type family index (natural number that indexes the Fin).
What provides these abilities? In all honesty there are many differences in between a dependently typed inductive family and a regular Haskell ADT, but we can focus on the exact one that is most relevant to understanding GADTs.
In GADTs and inductive families you get an opportunity to specify the exact type of your constructors. This might be boring
data Nat where
  Zero :: Nat
  Succ :: Nat -> Nat
Or, if we have a more flexible, indexed type we can choose different, more interesting return types
data Typed t where
  TyInt :: Int -> Typed Int
  TyChar :: Char -> Typed Char
  TyUnit :: Typed ()
  TyProd :: Typed a -> Typed b -> Typed (a, b)
  ...
In particular, we're abusing the ability to modify the return type based on the particular value constructor used. This allows us to reflect some value information up into the type and produce more finely specified (fibered) types.
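As a small sketch of what those refined return types buy us (not part of the original answer, and assuming we stop at the four constructors listed above), we can write an evaluator whose result type follows the index:
eval :: Typed t -> t
eval (TyInt n)    = n
eval (TyChar c)   = c
eval TyUnit       = ()
eval (TyProd a b) = (eval a, eval b)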
So what can we do with them? Well, with a little bit of elbow grease we can produce Fin in Haskell. Succinctly it requires that we define a notion of naturals in types
data Z
data S a = S a
> undefined :: S (S (S Z)) -- 3
... then a GADT to reflect values up into those types...
data Nat n where
  Zero :: Nat Z
  Succ :: Nat n -> Nat (S n)
... then we can use these to build Fin much like we did in Agda...
data Fin n where
  ZeroF :: Nat n -> Fin (S n)
  SuccF :: Nat n -> Fin n -> Fin (S n)
And finally we can construct exactly two values of Fin (S (S Z))
*Fin> :t ZeroF (Succ Zero)
ZeroF (Succ Zero) :: Fin (S (S Z))
*Fin> :t SuccF (Succ Zero) (ZeroF Zero)
SuccF (Succ Zero) (ZeroF Zero) :: Fin (S (S Z))
But notice that we've lost a lot of convenience over the inductive families. For instance, we can't use regular numeric literals in our types (though that's technically just a trick in Agda anyway), we need to create a separate "type nat" and "value nat" and use the GADT to link them together, and we'd also find, in time, that while type level mathematics is painful in Agda it can be done. In Haskell it's incredibly painful and often cannot.
For instance, it's possible to define a weaken notion in Agda's Fin type
weaken : (n <= m) -> Fin n -> Fin m
weaken = ...
where we provide a very interesting first value, a proof that n <= m which allows us to embed "a value less than n" into the set of "values less than m". We can do the same in Haskell, technically, but it requires heavy abuse of type class prolog.
So, GADTs are a resemblance of inductive families in dependently typed languages that are weaker and clumsier. Why do we want them in Haskell in the first place?
Basically because not all type invariants require the full power of inductive families to express and GADTs pick a particular compromise between expressiveness, implementability in Haskell, and type inference.
Some examples of useful GADTs expressions are Red-Black Trees which cannot have the Red-Black property invalidated or simply-typed lambda calculus embedded as HOAS piggy-backing off the Haskell type system.
In practice, you also often see GADTs used for their implicit existential context. For instance, the type
data Foo where
  Bar :: a -> Foo
implicitly hides the a type variable using existential quantification
> :t Bar 4 :: Foo
in a way that is sometimes convenient. If you look carefully the HOAS example from Wikipedia uses this for the a type parameter in the App constructor. To express that statement without GADTs would be a mess of existential contexts, but the GADT syntax makes it natural.
GADTs can give you stronger type enforced guarantees than regular ADTs. For example, you can force a binary tree to be balanced on the type system level, like in this implementation of 2-3 trees:
{-# LANGUAGE GADTs #-}
data Zero
data Succ s = Succ s
data Node s a where
  Leaf2 :: a -> Node Zero a
  Leaf3 :: a -> a -> Node Zero a
  Node2 :: Node s a -> a -> Node s a -> Node (Succ s) a
  Node3 :: Node s a -> a -> Node s a -> a -> Node s a -> Node (Succ s) a
Each node has a type-encoded depth where all its leaves reside. A tree is then either an empty tree, a singleton value, or a node of unspecified depth, again using GADTs.
data BTree a where
  Root0 :: BTree a
  Root1 :: a -> BTree a
  RootN :: Node s a -> BTree a
The type system guarantees you that only balanced nodes can be constructed. This means that when implementing operations like insert on such trees, your code type-checks only if its result is always a balanced tree.
I have found the "Prompt" monad (from the "MonadPrompt" package) a very useful tool in several places (along with the equivalent "Program" monad from the "operational" package. Combined with GADTs (which is how it was intended to be used), it allows you to make embedded languages very cheaply and very flexibly. There was a pretty good article in the Monad Reader issue 15 called "Adventures in Three Monads" that had a good introduction to the Prompt monad along with some realistic GADTs.
I like the example in the GHC manual. It's a quick demo of a core GADT idea: that you can embed the type system of a language you're manipulating into Haskell's type system. This lets your Haskell functions assume, and forces them to preserve, that the syntax trees correspond to well-typed programs.
When we define Term, it doesn't matter what types we choose. We could write
data Term a where
  ...
  IsZero :: Term Char -> Term Char
or
  ...
  IsZero :: Term a -> Term b
and the definition of Term would still go through.
It's only once we want to compute on Term, such as in defining eval, that the types matter. We need to have
  ...
  IsZero :: Term Int -> Term Bool
because we need our recursive call to eval to return an Int, and we want to in turn return a Bool.
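Here is a cut-down sketch of the kind of Term and eval pair being described (the GHC manual's version has more constructors; this is just the shape of the idea):
data Term a where
  Lit    :: Int -> Term Int
  IsZero :: Term Int -> Term Bool
  If     :: Term Bool -> Term a -> Term a -> Term a

eval :: Term a -> a
eval (Lit n)    = n
eval (IsZero t) = eval t == 0
eval (If c t e) = if eval c then eval t else eval e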
This is a short answer, but consult the Haskell Wikibook. It walks you though a GADT for a well-typed expression tree, which is a fairly canonical example: http://en.wikibooks.org/wiki/Haskell/GADT
GADTs are also used for implementing type equality: http://hackage.haskell.org/package/type-equality. I can't find the right paper to reference for this offhand -- this technique has made its way well into folklore by now. It is used quite well, however, in Oleg's typed tagless stuff. See, e.g. the section on typed compilation into GADTs. http://okmij.org/ftp/tagless-final/#tc-GADT

Resources