Parameterizing types by integers in Haskell

I am trying to make some Haskell types which are parametrized not by types but by elements of a type, specifically integers. For instance, a (linear-algebra) vector in R^2 and a vector in R^3 are differently typed objects. Concretely, I am writing a K-D tree in Haskell and I want to parametrize my data structure by a positive integer so that a 3-D tree and a 4-D tree have different types.
I've tried to parametrize my tree by tuples, but it didn't seem to be going anywhere. It seems somewhat unlikely this can be pushed through, especially since triples and anything bigger don't even seem to be functors (and I don't know any way to say something like instance HomogeneousTuple a => Functor a). I want to do something like this:
data (TupleOfDoubles a) => KDTree a b = ... ---so in a 3DTree a is (Double,Double,Double)
that would be nice, or something like this would be equally good
data KDTree Int a = ... -- The Int is k, so KDTree has kind Int -> * -> *
Does anybody know if either of these approaches is workable or reasonable?

There's a GHC extension being worked on called TypeNats, which would be exactly what you want. However, the milestone for that is currently set to 7.4.1 according to the ticket, so that'll be a bit of a wait still.
Until that extension is available, the only thing you can do is encode the dimension using types. For example something along these lines might work:
{-# LANGUAGE ScopedTypeVariables #-}

import Prelude hiding (toInteger) -- our toInteger would clash with the Prelude's

class MyTypeNat a where
  toInteger :: a -> Integer

data Zero
data Succ a

instance MyTypeNat Zero where
  toInteger _ = 0

instance MyTypeNat a => MyTypeNat (Succ a) where
  toInteger _ = toInteger (undefined :: a) + 1

data KDTree a b = KDTree -- constructors elided

dimension :: forall a b. MyTypeNat a => KDTree a b -> Integer
dimension _ = toInteger (undefined :: a)
The downside of an approach like this is of course that you have to write something like KDTree (Succ (Succ (Succ Zero))) Foo instead of KDTree 3 Foo.
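For readers on a modern GHC: the TypeNats work did eventually land (as GHC.TypeLits, together with the DataKinds extension), making the KDTree 3 Foo spelling possible. A minimal sketch, with a placeholder tree body:

{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}

import GHC.TypeLits (Nat, KnownNat, natVal)
import Data.Proxy (Proxy (..))

data KDTree (k :: Nat) a = KDTree -- placeholder; real constructors elided

dimension :: forall k a. KnownNat k => KDTree k a -> Integer
dimension _ = natVal (Proxy :: Proxy k)

-- dimension (KDTree :: KDTree 3 Double) == 3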

sepp2k's answer shows the basic approach to doing this. In fact, a lot of the work has already been done.
Type-level number packages
natural-number and type-level-natural-number
numtype
type-level
Stuff using type-level encodings of natural numbers (examples)
Vec
Dimensional
llvm
ForSyDe
Unfortunately something like this:
data KDTree Int a = ...
isn't really possible. The final type (constructed by KDTree) depends on the value of the Int, which requires a feature called dependent types. Languages like Agda and Epigram support this, but not Haskell.

Related

How to create a generic Complex type in Haskell?

I want to create a Complex type to represent complex numbers.
Following works:
Prelude> data Complex = Complex Int Int
Prelude> :t Complex
Complex :: Int -> Int -> Complex
How can I change this to accept any Num type, instead of just Int?
I tried following:
Prelude> data Complex a = Num a => Complex a a
but got this:
* Data constructor `Complex' has existential type variables, a context, or a specialised result type
    Complex :: forall a. Num a => a -> a -> Complex a
    (Use ExistentialQuantification or GADTs to allow this)
* In the definition of data constructor `Complex'
  In the data type declaration for `Complex'
I'm not really sure what to make of this error. Any help is appreciated.
Traditional data in Haskell is just that: data. It doesn't need to know anything about the properties of its fields, it just needs to be able to store them. Hence there's no real need to constrain the fields at that point; just make it
data Complex a = Complex !a !a
(! because strict fields are better for performance).
Of course when you then implement the Num instance, you will need a constraint:
instance (Num a) => Num (Complex a) where
  fromInteger = (`Complex`0) . fromInteger
  Complex r i + Complex ρ ι = Complex (r+ρ) (i+ι)
  ...
...in fact, you need the much stronger constraint RealFloat a to implement abs, at least that's how the standard version does it. (Which means, Complex Int is actually not usable, not with the standard Num hierarchy; you need e.g. Complex Double.)
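For concreteness, here is a sketch of a complete instance in the style of base's Data.Complex, using the RealFloat constraint for exactly that reason (the abs below uses the naive magnitude formula; base's version is more careful about intermediate overflow):

instance RealFloat a => Num (Complex a) where
  fromInteger n               = Complex (fromInteger n) 0
  Complex r i + Complex r' i' = Complex (r + r') (i + i')
  Complex r i - Complex r' i' = Complex (r - r') (i - i')
  Complex r i * Complex r' i' = Complex (r*r' - i*i') (r*i' + i*r')
  negate (Complex r i)        = Complex (negate r) (negate i)
  abs (Complex r i)           = Complex (sqrt (r*r + i*i)) 0
  signum (Complex 0 0)        = Complex 0 0
  signum (Complex r i)        = let m = sqrt (r*r + i*i)
                                in Complex (r/m) (i/m)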
That said, it is also possible to bake the constraint into the data type itself. The existential-quantification syntax you tried is highly limiting though, and not suitable for this; what you want instead is the GADT
{-# LANGUAGE GADTs #-}

data Complex a where
  Complex :: Num a => a -> a -> Complex a
With that in place, you could then implement e.g. addition without mentioning any constraint in the signature
cplxAdd :: Complex a -> Complex a -> Complex a
cplxAdd (Complex r i) (Complex ρ ι) = Complex (r+ρ) (i+ι)
You would now need to fulfill Num whenever you try to construct a Complex value though. That means you'd still need an explicit constraint in the Num instance.
Also, this version is potentially much slower, because the Num dictionary actually needs to be stored in the runtime representation.
Type constructors cannot be constrained in standard Haskell, only functions can. So the intended approach is to declare
data Complex a = Complex a a
and then constrain functions, like
conjugate :: (Num a) => Complex a -> Complex a
conjugate (Complex x y) = Complex x (-y)
In fact, the type and constraint for conjugate can be derived by the compiler, so you can just define the implementation:
conjugate (Complex x y) = Complex x (-y)
However, if you really wish to constrain the type constructor Complex, you can turn on some extensions that enable it, namely ExistentialQuantification or GADTs, as the compiler suggests. To do this, add this line to the very beginning of your file:
{-# LANGUAGE ExistentialQuantification #-}
or
{-# LANGUAGE GADTs #-}
Those are called pragmas.
While you could, as the compiler message instructs, use ExistentialQuantification, you could also define the type like this:
data Complex a = Complex a a deriving (Show, Eq)
It's a completely unconstrained type, so perhaps another name would be more appropriate; a type like this is often called Pair.
When you write functions, however, you can constrain the values contained in the type:
myFunction :: Num a => Complex a -> a
myFunction (Complex x y) = x + y

Do newtypes incur no cost even when you cannot pattern-match on them?

Context
Most Haskell tutorials I know (e.g. LYAH) introduce newtypes as a cost-free idiom that allows enforcing more type safety. For instance, this code will type-check:
type Speed = Double
type Length = Double
computeTime :: Speed -> Length -> Double
computeTime v l = l / v
but this won't:
newtype Speed = Speed { getSpeed :: Double }
newtype Length = Length { getLength :: Double }
-- wrong!
computeTime :: Speed -> Length -> Double
computeTime v l = l / v
and this will:
-- right
computeTime :: Speed -> Length -> Double
computeTime (Speed v) (Length l) = l / v
In this particular example, the compiler knows that Speed is just a Double, so the pattern-matching is moot and will not generate any executable code.
Question
Are newtypes still cost-free when they appear as arguments of parametric types? For instance, consider a list of newtypes:
computeTimes :: [Speed] -> Length -> [Double]
computeTimes vs l = map (\v -> getSpeed v / getLength l) vs
I could also pattern-match on speed in the lambda:
computeTimes' :: [Speed] -> Length -> [Double]
computeTimes' vs l = map (\(Speed v) -> v / getLength l) vs
In either case, for some reason, I feel that real work is getting done! I start to feel even more uncomfortable when the newtype is buried within a deep tree of nested parametric datatypes, e.g. Map Speed [Set Speed]; in this situation, it may be difficult or impossible to pattern-match on the newtype, and one would have to resort to accessors like getSpeed.
TL;DR
Will the use of a newtype never ever incur a cost, even when the newtype appears as a (possibly deeply-buried) argument of another parametric type?
On their own, newtypes are cost-free. Applying their constructor or pattern matching on them has zero cost.
When used as a parameter for other types, e.g. [T], the representation of [T] is precisely the same as that of [T'] if T is a newtype for T'. So there's no loss in performance.
However, there are two main caveats I can see.
newtypes and instances
First, newtype is frequently used to introduce new instances of type classes. Clearly, when these are user-defined, there's no guarantee that they have the same cost as the original instances. E.g., when using
newtype Op a = Op a deriving Eq -- Eq is required as Ord's superclass

instance Ord a => Ord (Op a) where
  compare (Op x) (Op y) = compare y x
comparing two Op Int will cost slightly more than comparing Int, since the arguments need to be swapped. (I am neglecting optimizations here, which might make this cost free when they trigger.)
newtypes used as type arguments
The second point is more subtle. Consider the following two implementations of the identity [Int] -> [Int]
id1, id2 :: [Int] -> [Int]
id1 xs = xs
id2 xs = map (\x->x) xs
The first one has constant cost. The second has a linear cost (assuming no optimization triggers). A smart programmer should prefer the first implementation, which is also simpler to write.
Suppose now we introduce newtypes on the argument type, only:
id1, id2 :: [Op Int] -> [Int]
id1 xs = xs -- error!
id2 xs = map (\(Op x)->x) xs
We can no longer use the constant cost implementation because of a type error. The linear cost implementation still works, and is the only option.
Now, this is quite bad. The input representation for [Op Int] is exactly, bit by bit, the same as for [Int]. Yet the type system forbids us from performing the identity in an efficient way!
To overcome this issue, safe coercions were introduced in Haskell.
import Data.Coerce (coerce)

id3 :: [Op Int] -> [Int]
id3 = coerce
The magic coerce function, under certain hypotheses, removes or inserts newtypes as needed to make the types match, even inside other types, as for [Op Int] above. Further, it is a zero-cost function.
Note that coerce works only under certain conditions (the compiler checks for them). One of these is that the newtype constructor must be visible: if a module does not export Op :: a -> Op a you cannot coerce Op Int to Int or vice versa. Indeed, if a module exports the type but not the constructor, it would be wrong to make the constructor accessible anyway through coerce. This makes the "smart constructors" idiom still safe: modules can still enforce complex invariants through opaque types.
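Tying this back to the question: coerce also reaches under nested type constructors whose parameters have representational roles, so the deeply buried case is covered. A small sketch, using the question's Speed newtype:

import Data.Coerce (coerce)

newtype Speed = Speed { getSpeed :: Double }

-- Unwraps every Speed in the nested structure, at zero runtime cost.
speedsToDoubles :: [[Speed]] -> [[Double]]
speedsToDoubles = coerce

One caveat: containers declares the key parameter of Map (and the element of Set) with a nominal role, precisely because coercing it could silently change which Ord instance the structure was built with. So coercing Map Speed v to Map Double v is rejected, which is the safe behaviour.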
It doesn't matter how deeply buried a newtype is in a stack of (fully) parametric types. At runtime, the values v :: Speed and w :: Double are completely indistinguishable – the wrapper is erased by the compiler, so even v is really just a pointer to a single 64-bit floating-point number in memory. Whether that pointer is stored in a list or tree or whatever doesn't make a difference either. getSpeed is a no-op and will not appear at runtime in any way at all.
So what do I mean by “fully parametric”? The thing is, newtypes can obviously make a difference at compile time, via the type system. In particular, they can guide instance resolution, so a newtype that invokes a different class method may certainly have worse (or, just as easily, better!) performance than the wrapped type. For example,
class Integral n => Fibonacci n where
  fib :: n -> Integer

instance Fibonacci Int where
  fib = (fibs !!)
   where fibs = [ if i<2 then 1
                  else fib (i-2) + fib (i-1)
                | i <- [0::Int ..] ]
this implementation is pretty slow, because it uses a lazy list (and performs lookups in it over and over again) for memoisation. On the other hand,
import qualified Data.Vector as Arr

-- | A number between 0 and 753
newtype SmallInt = SmallInt { getSmallInt :: Int }
  -- (an Integral instance for SmallInt, e.g. via GeneralizedNewtypeDeriving,
  -- is also needed to satisfy Fibonacci's superclass)

instance Fibonacci SmallInt where
  fib = (fibs Arr.!) . getSmallInt
   where fibs = Arr.generate 754 $
           \i -> if i<2 then 1
                 else fib (SmallInt $ i-2) + fib (SmallInt $ i-1)
This fib is much faster, because thanks to the input being limited to a small range, it is feasible to strictly allocate all of the results and store them in a fast O (1) lookup array, not needing the spine-laziness.
This of course applies again regardless of what structure you store the numbers in. But the different performance only comes about because different method instantiations are called – at runtime this simply means completely different functions.
Now, a fully parametric type constructor must be able to store values of any type. In particular, it cannot impose any class restrictions on the contained data, and hence also cannot call any class methods. Therefore this kind of performance difference cannot happen if you're just dealing with generic [a] lists or Map Int a maps. It can, however, occur when you're dealing with GADTs. In this case, even the actual memory layout might be completely different, for instance with
{-# LANGUAGE GADTs #-}
import qualified Data.Vector as Arr
import qualified Data.Vector.Unboxed as UArr
data Array a where
  BoxedArray :: Arr.Vector a -> Array a
  UnboxArray :: UArr.Unbox a => UArr.Vector a -> Array a
might allow you to store Double values more efficiently than Speed values, because the former can be stored in a cache-optimised unboxed array. This is only possible because the UnboxArray constructor is not fully parametric.

Practical applications of Rank 2 polymorphism?

I'm covering polymorphism and I'm trying to see the practical uses of such a feature.
My basic understanding of Rank 2 is:
type MyType = ∀ a. a -> a
subFunction :: a -> a
subFunction el = el
mainFunction :: MyType -> Int
mainFunction func = func 3
I understand that this is allowing the user to use a polymorphic function (subFunction) inside mainFunction and strictly specify its output (Int). This seems very similar to GADTs:
data Example a where
  ExampleInt  :: Int -> Example Int
  ExampleBool :: Bool -> Example Bool
1) Given the above, is my understanding of Rank 2 polymorphism correct?
2) What are the general situations where Rank 2 polymorphism can be used, as opposed to GADTs, for example?
If you pass a polymorphic function as an argument to a Rank2-polymorphic function, you're essentially passing not just one function but a whole family of functions – for all possible types that fulfill the constraints.
Typically, those forall quantifiers come with a class constraint. For example, I might wish to do number arithmetic with two different types simultaneously (for comparing precision or whatever).
data FloatCompare = FloatCompare {
    singlePrecision :: Float
  , doublePrecision :: Double
  }
Now I might want to modify those numbers through some maths operation. Something like
modifyFloat :: (Num -> Num) -> FloatCompare -> FloatCompare
But Num is not a type, only a type class. I could of course pass a function that would modify any particular number type, but I couldn't use that to modify both a Float and a Double value, at least not without some ugly (and possibly lossy) converting back and forth.
Solution: Rank-2 polymorphism!
modifyFloat :: (∀ n . Num n => n -> n) -> FloatCompare -> FloatCompare
modifyFloat f (FloatCompare single double)
  = FloatCompare (f single) (f double)
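A single polymorphic argument now serves both fields. For instance (a usage sketch, assuming the definitions above):

double :: FloatCompare -> FloatCompare
double = modifyFloat (*2) -- (*2) is instantiated at Float and at Double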
The best single example of how this is useful in practice is probably lenses. A lens is a “smart accessor function” to a field in some larger data structure. It allows you to access fields, update them, gather results... while at the same time composing in a very simple way. How it works: Rank2-polymorphism; every lens is polymorphic, with the different instantiations corresponding to the “getter” / “setter” aspects, respectively.
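For reference, the lens type itself is exactly such a rank-2 quantification; this is the van Laarhoven encoding used by the lens package:

{-# LANGUAGE RankNTypes #-}

type Lens s t a b = forall f. Functor f => (a -> f b) -> s -> f t

Instantiating f at Const a recovers the getter; instantiating it at Identity recovers the setter.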
The go-to example of an application of rank-2 types is runST as Benjamin Hodgson mentioned in the comments. This is a rather good example and there are a variety of examples using the same trick. For example, branding to maintain abstract data type invariants across multiple types, avoiding confusion of differentials in ad, a region-based version of ST.
But I'd actually like to talk about how Haskell programmers are implicitly using rank-2 types all the time. Every type class whose methods have universally quantified types desugars to a dictionary with a field with a rank-2 type. In practice, this is virtually always a higher-kinded type class* like Functor or Monad. I'll use a simplified version of Alternative as an example. The class declaration is:
class Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a
The dictionary representing this class would be:
data AlternativeDict f = AlternativeDict {
  empty :: forall a. f a,
  (<|>) :: forall a. f a -> f a -> f a }
Sometimes such an encoding is nice as it allows one to use different "instances" for the same type, perhaps only locally. For example, Maybe has two obvious instances of Alternative depending on whether Just a <|> Just b is Just a or Just b. Languages without type classes, such as Scala, do indeed use this encoding.
To connect to leftaroundabout's reference to lenses, you can view the hierarchy there as a hierarchy of type classes and the lens combinators as simply tools for explicitly building the relevant type class dictionaries. Of course, the reason it isn't actually a hierarchy of type classes is that we usually will have multiple "instances" for the same type. E.g. _head and _head . _tail are both "instances" of Traversal' s a.
* A higher-kinded type class doesn't necessarily lead to this, and it can happen for a type class of kind *. For example:
-- Higher-kinded but doesn't require universal quantification.
class Sum c where
  sum :: c Int -> Int

-- Not higher-kinded but does require universal quantification.
class Length l where
  length :: [a] -> l
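For contrast, desugaring Length the same way (the field name lengthD is mine) yields a dictionary whose field has a rank-2 type even though l has kind *:

{-# LANGUAGE RankNTypes #-}

data LengthDict l = LengthDict { lengthD :: forall a. [a] -> l }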
If you are using modules in Haskell, you are already using Rank-2 types. Theoretically speaking, modules are records with rank-2 type properties.
For example, the Foo module below in Haskell ...
-- Foo.hs
{-# LANGUAGE ExplicitForAll #-}
module Foo (id) where

id :: forall a. a -> a
id x = x

-- Main.hs
import qualified Foo

main = do
  putStrLn (Foo.id "hello")
  return ()
... can actually be thought of as a record as follows:
{-# LANGUAGE RankNTypes #-}

data FooType = FooType
  { id :: forall a. a -> a
  }

foo :: FooType
foo = FooType
  { id = \x -> x
  }
P.S. (unrelated to this question): from a language-design perspective, if you are going to support a module system, then you might as well support higher-rank types (i.e. allow arbitrary quantification of type variables at any level) to reduce duplication of effort (i.e. type checking a module should be almost the same as type checking a record with higher-rank types).

Use of 'unsafeCoerce'

In Haskell, there is a function called unsafeCoerce that turns anything into any other type of thing. What exactly is it used for? Why would you want to transform things into each other in such an "unsafe" way?
Provide an example of a way that unsafeCoerce is actually used. A link to Hackage would help; example code in someone's question would not.
unsafeCoerce lets you convince the type system of whatever property you like. It's thus only "safe" exactly when you can be completely certain that the property you're declaring is true. So, for instance:
unsafeCoerce True :: Int
is a violation and can lead to wonky, bad runtime behavior.
unsafeCoerce (3 :: Int) :: Int
is (obviously) fine and will not lead to runtime misbehavior.
So what's a non-trivial use of unsafeCoerce? Let's say we've got a typeclass-bound existential type
{-# LANGUAGE ExistentialQuantification #-}

module MyClass ( SomethingMyClass (..), intSomething ) where

import Unsafe.Coerce (unsafeCoerce)

class MyClass x where {}
instance MyClass Int where {}

data SomethingMyClass = forall a. MyClass a => SomethingMyClass a
Let's also say, as noted here, that the typeclass MyClass is not exported and thus nobody else can ever create instances of it. Indeed, Int is the only thing that instantiates it and the only thing that ever will.
Now when we pattern match to destructure a value of SomethingMyClass we'll be able to pull a "something" out from inside
foo :: SomethingMyClass -> ...
foo (SomethingMyClass a) =
  -- here we have a value `a` with type `exists a . MyClass a => a`
  --
  -- this is totally useless since `MyClass` doesn't even have any
  -- methods for us to use!
  ...
Now, at this point, as the comment suggests, the value we've pulled out has no type information—it's been "forgotten" by the existential context. It could be absolutely anything which instantiates MyClass.
Of course, in this very particular situation we know that the only thing implementing MyClass is Int. So our value a must actually have type Int. We could never convince the typechecker that this is true, but due to an outside proof we know that it is.
Therefore, we can (very carefully)
intSomething :: SomethingMyClass -> Int
intSomething (SomethingMyClass a) = unsafeCoerce a -- shudder!
Now, hopefully I've suggested that this is a terrible, dangerous idea, but it also may give a taste of what kind of information we can take advantage of in order to know things that the typechecker cannot.
In non-pathological situations, this is rare. Even rarer is a situation where using something we know and the typechecker doesn't isn't itself pathological. In the above example, we must be completely certain that nobody ever extends our MyClass module to instantiate more types to MyClass otherwise our use of unsafeCoerce becomes instantly unsafe.
> instance MyClass Bool where {}
> intSomething (SomethingMyClass True)
6917529027658597398
Looks like our compiler internals are leaking!
A more common example where this sort of behavior might be valuable is when using newtype wrappers. It's a fairly common idea that we might wrap a type in a newtype wrapper in order to specialize its instance definitions.
For example, Int does not have a Monoid definition because there are two natural monoids over Ints: sums and products. Instead, we use newtype wrappers to be more explicit.
newtype Sum a = Sum { getSum :: a }

instance Num a => Monoid (Sum a) where
  mempty = Sum 0
  mappend (Sum a) (Sum b) = Sum (a+b)
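The real wrapper lives in Data.Monoid. A quick usage example, picking the sum monoid explicitly:

import Data.Monoid (Sum (..))

total :: Int
total = getSum (foldMap Sum [1, 2, 3]) -- 6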
Now, normally the compiler is pretty smart and recognizes that it can eliminate all of those Sum constructors in order to produce more efficient code. Sadly, there are times when it cannot, especially in highly polymorphic situations.
If you (a) know that some type a is actually just a newtype-wrapped b and (b) know that the compiler is incapable of deducing this itself, then you might want to do
unsafeCoerce (x :: a) :: b
for a slight efficiency gain. This, for instance, occurs frequently in lens and is expressed in the Data.Profunctor.Unsafe module of profunctors, a dependency of lens.
But let me again suggest that you really need to know what's going on before using unsafeCoerce like this; it is anything but safe.
One final thing to compare is the "typesafe cast" available in Data.Typeable. This function looks a bit like unsafeCoerce, but with much more ceremony.
unsafeCoerce :: a -> b
cast :: (Typeable a, Typeable b) => a -> Maybe b
You might think of cast as being implemented using unsafeCoerce and a function typeOf :: Typeable a => a -> TypeRep, where TypeRep values are unforgeable, runtime tokens which reflect the type of a value. Then we have
cast :: (Typeable a, Typeable b) => a -> Maybe b
cast a = if typeOf a == typeOf b then Just b else Nothing
  where b = unsafeCoerce a
Thus, cast is able to ensure that the types of a and b really are the same at runtime, and it can decide to return Nothing if they are not. As an example:
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE ExistentialQuantification #-}

import Data.Typeable (Typeable, cast)

data A = A deriving (Show, Typeable)
data B = B deriving (Show, Typeable)

data Forget = forall a . Typeable a => Forget a

getAnA :: Forget -> Maybe A
getAnA (Forget something) = cast something
which we can run as follows
> getAnA (Forget A)
Just A
> getAnA (Forget B)
Nothing
So if we compare this usage of cast with unsafeCoerce we see that it can achieve some of the same functionality. In particular, it allows us to rediscover information that may have been forgotten by ExistentialQuantification. However, cast manually checks the types at runtime to ensure that they are truly the same and thus cannot be used unsafely. To do this, it demands that both the source and target types allow for runtime reflection of their types via the Typeable class.
The only time I ever felt compelled to use unsafeCoerce was on finite natural numbers.
{-# LANGUAGE DataKinds, GADTs, TypeFamilies, StandaloneDeriving #-}

data Nat = Z | S Nat deriving (Eq, Show)

data Fin (n :: Nat) :: * where
  FZ :: Fin (S n)
  FS :: Fin n -> Fin (S n)

deriving instance Show (Fin n)
Fin n is a singly linked data structure that is statically ensured to be smaller than the n type level natural number by which it is parametrized.
-- OK, 1 < 2
validFin :: Fin (S (S Z))
validFin = FS FZ
-- type error, 2 < 2 is false
invalidFin :: Fin (S (S Z))
invalidFin = FS (FS FZ)
Fin can be used to safely index into various data structures. It's pretty standard in dependently typed languages, though not in Haskell.
Sometimes we want to convert a value of Fin n to Fin m where m is greater than n.
relaxFin :: Fin n -> Fin (S n)
relaxFin FZ = FZ
relaxFin (FS n) = FS (relaxFin n)
relaxFin is a no-op by definition, but traversing the value is still required for the types to check out. So we might just use unsafeCoerce instead of relaxFin. More pronounced gains in speed can result from coercing larger data structures that contain Fin-s (for example, you could have lambda terms with Fin-s as bound variables).
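A sketch of that replacement; it is safe only on the strength of the outside argument that a Fin n value is, bit for bit, also a valid Fin (S n) value:

import Unsafe.Coerce (unsafeCoerce)

-- O(1) replacement for relaxFin; correctness rests entirely on the
-- representation argument above, not on the type checker.
relaxFin' :: Fin n -> Fin (S n)
relaxFin' = unsafeCoerce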
This is an admittedly exotic example, but I find it interesting in the sense that it's pretty safe: I can't really think of ways for external libraries or safe user code to mess this up. I might be wrong though and I'd be eager to hear about potential safety issues.
There is no use of unsafeCoerce I can really recommend, but I can see that in some cases such a thing might be useful.
The first use that springs to mind is the implementation of the Typeable-related routines. In particular cast :: (Typeable a, Typeable b) => a -> Maybe b achieves a type-safe behaviour, so it is safe to use, yet it has to play dirty tricks in its implementation.
Maybe unsafeCoerce can find some use when importing FFI subroutines to force types to match. After all, FFI already allows you to import impure C functions as pure ones, so it is intrinsically unsafe. Note that "unsafe" does not mean impossible to use, but just "putting the burden of proof on the programmer".
Finally, pretend that sortBy did not exist. Consider then this example:
import Data.List (sort)

-- Like Int, but using the opposite ordering
newtype Rev = Rev { unRev :: Int } deriving Eq -- Eq is required as Ord's superclass

instance Ord Rev where compare (Rev x) (Rev y) = compare y x

sortDescending :: [Int] -> [Int]
sortDescending = map unRev . sort . map Rev
The code above works, but feels silly IMHO. We perform two maps using functions such as Rev and unRev, which we know to be no-ops at runtime. So we just scan the list twice for no reason other than convincing the compiler to use the right Ord instance.
The performance impact of these maps should be small since we also sort the list. Yet it is tempting to rewrite map Rev as unsafeCoerce :: [Int]->[Rev] and save some time.
Note that having a coercing function
castNewtype :: IsNewtype t1 t2 => f t2 -> f t1
where the constraint means that t1 is a newtype for t2 would help, but it would be quite dangerous. Consider
castNewtype :: Set Int -> Set Rev
The above would cause the data structure invariant to break, since we are changing the ordering underneath! Since Data.Set is implemented as a binary search tree, it would cause quite a lot of damage.
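For completeness: modern GHC ships a safe, restricted version of this idea in Data.Coerce. Its role system permits the list rewrite but rejects the Data.Set one above, since Set declares its parameter with a nominal role for exactly the invariant-breaking reason just described. A sketch, assuming the Rev newtype from before:

import Data.Coerce (coerce)
import Data.List (sort)

sortDescending' :: [Int] -> [Int]
sortDescending' = coerce . (sort :: [Rev] -> [Rev]) . coerce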

Real world use of GADT

How do I make use of Generalized Algebraic Data Type?
The example given in the Haskell wikibook is too short to give me an insight into the real possibilities of GADTs.
GADTs are weak approximations of inductive families from dependently typed languages—so let's begin there instead.
Inductive families are the core datatype introduction method in a dependently typed language. For instance, in Agda you define the natural numbers like this
data Nat : Set where
  zero : Nat
  succ : Nat -> Nat
which isn't very fancy; it's essentially just the same thing as the Haskell definition
data Nat = Zero | Succ Nat
and indeed in GADT syntax the Haskell form is even more similar
{-# LANGUAGE GADTs #-}

data Nat where
  Zero :: Nat
  Succ :: Nat -> Nat
So, at first blush you might think GADTs are just neat extra syntax. That's just the very tip of the iceberg though.
Agda has capacity to represent all kinds of types unfamiliar and strange to a Haskell programmer. A simple one is the type of finite sets. This type is written like Fin 3 and represents the set of numbers {0, 1, 2}. Likewise, Fin 5 represents the set of numbers {0,1,2,3,4}.
This should be quite bizarre at this point. First, we're referring to a type which has a regular number as a "type" parameter. Second, it's not clear what it means for Fin n to represent the set {0, 1, ..., n-1}. In real Agda we'd do something more powerful, but it suffices to say that we can define a contains function
contains : Nat -> Fin n -> Bool
contains i f = ?
Now this is strange again because the "natural" definition of contains would be something like i < n, but n is a value that only exists in the type Fin n and we shouldn't be able to cross that divide so easily. While it turns out that the definition is not nearly so straightforward, this is exactly the power that inductive families have in dependently typed languages—they introduce values that depend on their types and types that depend on their values.
We can examine what it is about Fin that gives it that property by looking at its definition.
data Fin : Nat -> Set where
  zerof : (n : Nat) -> Fin (succ n)
  succf : (n : Nat) -> (i : Fin n) -> Fin (succ n)
this takes a little work to understand, so as an example let's try constructing a value of the type Fin 2. There are a few ways to do this (in fact, we'll find that there are exactly 2)
zerof 1 : Fin 2
zerof 2 : Fin 3 -- nope!
zerof 0 : Fin 1 -- nope!
succf 1 (zerof 0) : Fin 2
This lets us see that there are two inhabitants and also demonstrates a little bit of how type computation happens. In particular, the (n : Nat) bit in the type of zerof reflects the actual value n up into the type allowing us to form Fin (n+1) for any n : Nat. After that we use repeated applications of succf to increment our Fin values up into the correct type family index (natural number that indexes the Fin).
What provides these abilities? In all honesty there are many differences in between a dependently typed inductive family and a regular Haskell ADT, but we can focus on the exact one that is most relevant to understanding GADTs.
In GADTs and inductive families you get an opportunity to specify the exact type of your constructors. This might be boring
data Nat where
  Zero :: Nat
  Succ :: Nat -> Nat
Or, if we have a more flexible, indexed type we can choose different, more interesting return types
data Typed t where
  TyInt  :: Int -> Typed Int
  TyChar :: Char -> Typed Char
  TyUnit :: Typed ()
  TyProd :: Typed a -> Typed b -> Typed (a, b)
  ...
In particular, we're abusing the ability to modify the return type based on the particular value constructor used. This allows us to reflect some value information up into the type and produce more finely specified (fibered) types.
So what can we do with them? Well, with a little bit of elbow grease we can produce Fin in Haskell. Succinctly, it requires that we define a notion of naturals in types
data Z
data S a = S a
> undefined :: S (S (S Z)) -- 3
... then a GADT to reflect values up into those types...
data Nat n where
  Zero :: Nat Z
  Succ :: Nat n -> Nat (S n)
... then we can use these to build Fin much like we did in Agda...
data Fin n where
  ZeroF :: Nat n -> Fin (S n)
  SuccF :: Nat n -> Fin n -> Fin (S n)
And finally we can construct exactly two values of Fin (S (S Z))
*Fin> :t ZeroF (Succ Zero)
ZeroF (Succ Zero) :: Fin (S (S Z))
*Fin> :t SuccF (Succ Zero) (ZeroF Zero)
SuccF (Succ Zero) (ZeroF Zero) :: Fin (S (S Z))
But notice that we've lost a lot of convenience compared to the inductive families. For instance, we can't use regular numeric literals in our types (though that's technically just a trick in Agda anyway), we need to create a separate "type nat" and "value nat" and use the GADT to link them together, and we'd also find, in time, that while type-level mathematics is painful in Agda, it can be done. In Haskell it's incredibly painful and often cannot be done at all.
For instance, it's possible to define a weaken notion in Agda's Fin type
weaken : (n <= m) -> Fin n -> Fin m
weaken = ...
where we provide a very interesting first value, a proof that n <= m which allows us to embed "a value less than n" into the set of "values less than m". We can do the same in Haskell, technically, but it requires heavy abuse of type-class Prolog.
So, GADTs are a weaker and clumsier resemblance of inductive families from dependently typed languages. Why do we want them in Haskell in the first place?
Basically because not all type invariants require the full power of inductive families to express and GADTs pick a particular compromise between expressiveness, implementability in Haskell, and type inference.
Some examples of useful GADT expressions are red-black trees which cannot have the red-black property invalidated, or the simply-typed lambda calculus embedded as HOAS piggy-backing off the Haskell type system.
In practice, you also often see GADTs use for their implicit existential context. For instance, the type
data Foo where
  Bar :: a -> Foo
implicitly hides the a type variable using existential quantification
> :t Bar 4 :: Foo
in a way that is sometimes convenient. If you look carefully the HOAS example from Wikipedia uses this for the a type parameter in the App constructor. To express that statement without GADTs would be a mess of existential contexts, but the GADT syntax makes it natural.
GADTs can give you stronger type enforced guarantees than regular ADTs. For example, you can force a binary tree to be balanced on the type system level, like in this implementation of 2-3 trees:
{-# LANGUAGE GADTs #-}

data Zero
data Succ s = Succ s

data Node s a where
  Leaf2 :: a -> Node Zero a
  Leaf3 :: a -> a -> Node Zero a
  Node2 :: Node s a -> a -> Node s a -> Node (Succ s) a
  Node3 :: Node s a -> a -> Node s a -> a -> Node s a -> Node (Succ s) a
Each node has a type-encoded depth where all its leaves reside. A tree is then either an empty tree, a singleton value, or a node of unspecified depth, again using GADTs.
data BTree a where
  Root0 :: BTree a
  Root1 :: a -> BTree a
  RootN :: Node s a -> BTree a
The type system guarantees you that only balanced nodes can be constructed. This means that when implementing operations like insert on such trees, your code type-checks only if its result is always a balanced tree.
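As a quick check of that claim, here is a well-typed tree of depth one, together with an unbalanced one the compiler rejects (a small sketch using the definitions above):

-- Accepted: both children are Node Zero a.
balanced :: BTree Int
balanced = RootN (Node2 (Leaf2 1) 2 (Leaf2 3))

-- Rejected with a type error: the children have different depths.
-- unbalanced = RootN (Node2 (Leaf2 1) 2 (Node2 (Leaf2 3) 4 (Leaf2 5)))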
I have found the "Prompt" monad (from the "MonadPrompt" package) a very useful tool in several places, along with the equivalent "Program" monad from the "operational" package. Combined with GADTs (which is how it was intended to be used), it allows you to make embedded languages very cheaply and very flexibly. There was a pretty good article in the Monad Reader issue 15 called "Adventures in Three Monads" that had a good introduction to the Prompt monad along with some realistic GADTs.
I like the example in the GHC manual. It's a quick demo of a core GADT idea: that you can embed the type system of a language you're manipulating into Haskell's type system. This lets your Haskell functions assume, and forces them to preserve, that the syntax trees correspond to well-typed programs.
When we define Term, it doesn't matter what types we choose. We could write
data Term a where
  ...
  IsZero :: Term Char -> Term Char
or
  ...
  IsZero :: Term a -> Term b
and the definition of Term would still go through.
It's only once we want to compute on Term, such as in defining eval, that the types matter. We need to have
  ...
  IsZero :: Term Int -> Term Bool
because we need our recursive call to eval to return an Int, and we want to in turn return a Bool.
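Put together, a condensed version of the manual's example looks like this; matching on each constructor refines a, which is what lets eval return a plain a:

{-# LANGUAGE GADTs #-}

data Term a where
  Lit    :: Int -> Term Int
  IsZero :: Term Int -> Term Bool
  If     :: Term Bool -> Term a -> Term a -> Term a

eval :: Term a -> a
eval (Lit n)    = n
eval (IsZero t) = eval t == 0
eval (If c t e) = if eval c then eval t else eval e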
This is a short answer, but consult the Haskell Wikibook. It walks you through a GADT for a well-typed expression tree, which is a fairly canonical example: http://en.wikibooks.org/wiki/Haskell/GADT
GADTs are also used for implementing type equality: http://hackage.haskell.org/package/type-equality. I can't find the right paper to reference for this offhand -- this technique has made its way well into folklore by now. It is used quite well, however, in Oleg's typed tagless stuff. See, e.g. the section on typed compilation into GADTs. http://okmij.org/ftp/tagless-final/#tc-GADT
