Can a Haskell type constructor have non-type parameters? - haskell

A type constructor produces a type given a type. For example, the Maybe constructor
data Maybe a = Nothing | Just a
could be a given a concrete type, like Char, and give a concrete type, like Maybe Char. In terms of kinds, one has
GHCI> :k Maybe
Maybe :: * -> *
My question: Is it possible to define a type constructor that yields a concrete type given a Char, say? Put another way, is it possible to mix kinds and types in the type signature of a type constructor? Something like
GHCI> :k my_type
my_type :: Char -> * -> *

Can a Haskell type constructor have non-type parameters?
Let's unpack what you mean by type parameter. The word type has (at least) two potential meanings: do you mean type in the narrow sense of things of kind *, or in the broader sense of things at the type level? We can't (yet) use values in types, but modern GHC features a very rich kind language, allowing us to use a wide range of things other than concrete types as type parameters.
Higher-Kinded Types
Type constructors in Haskell have always admitted non-* parameters. For example, the encoding of the fixed point of a functor works in plain old Haskell 98:
newtype Fix f = Fix { unFix :: f (Fix f) }
ghci> :k Fix
Fix :: (* -> *) -> *
Fix is parameterised by a functor of kind * -> *, not a type of kind *.
Beyond * and ->
The DataKinds extension enriches GHC's kind system with user-declared kinds, so kinds may be built of pieces other than * and ->. It works by promoting all data declarations to the kind level. That is to say, a data declaration like
data Nat = Z | S Nat -- natural numbers
introduces a kind Nat and type constructors Z :: Nat and S :: Nat -> Nat, as well as the usual type and value constructors. This allows you to write datatypes parameterised by type-level data, such as the customary vector type, which is a linked list indexed by its length.
data Vec n a where
Nil :: Vec Z a
(:>) :: a -> Vec n a -> Vec (S n) a
ghci> :k Vec
Vec :: Nat -> * -> *
There's a related extension called ConstraintKinds, which frees constraints like Ord a from the yoke of the "fat arrow" =>, allowing them to roam across the landscape of the type system as nature intended. Kmett has used this power to build a category of constraints, with the newtype (:-) :: Constraint -> Constraint -> * denoting "entailment": a value of type c :- d is a proof that if c holds then d also holds. For example, we can prove that Ord a implies Eq [a] for all a:
ordToEqList :: Ord a :- Eq [a]
ordToEqList = Sub Dict
Life after forall
However, Haskell currently maintains a strict separation between the type level and the value level. Things at the type level are always erased before the program runs, (almost) always inferrable, invisible in expressions, and (dependently) quantified by forall. If your application requires something more flexible, such as dependent quantification over runtime data, then you have to manually simulate it using a singleton encoding.
For example, the specification of split says it chops a vector at a certain length according to its (runtime!) argument. The type of the output vector depends on the value of split's argument. We'd like to write this...
split :: (n :: Nat) -> Vec (n :+: m) a -> (Vec n a, Vec m a)
... where I'm using the type function (:+:) :: Nat -> Nat -> Nat, which stands for addition of type-level naturals, to ensure that the input vector is at least as long as n...
type family n :+: m where
Z :+: m = m
S n :+: m = S (n :+: m)
... but Haskell won't allow that declaration of split! There aren't any values of type Z or S n; only types of kind * contain values. We can't access n at runtime directly, but we can use a GADT which we can pattern-match on to learn what the type-level n is:
data Natty n where
Zy :: Natty Z
Sy :: Natty n -> Natty (S n)
ghci> :k Natty
Natty :: Nat -> *
Natty is called a singleton, because for a given (well-defined) n there is only one (well-defined) value of type Natty n. We can use Natty n as a run-time stand-in for n.
split :: Natty n -> Vec (n :+: m) a -> (Vec n a, Vec m a)
split Zy xs = (Nil, xs)
split (Sy n) (x :> xs) =
let (ys, zs) = split n xs
in (x :> ys, zs)
Anyway, the point is that values - runtime data - can't appear in types. It's pretty tedious to duplicate the definition of Nat in singleton form (and things get worse if you want the compiler to infer such values); dependently-typed languages like Agda, Idris, or a future Haskell escape the tyranny of strictly separating types from values and give us a range of expressive quantifiers. You're able to use an honest-to-goodness Nat as split's runtime argument and mention its value dependently in the return type.
#pigworker has written extensively about the unsuitability of Haskell's strict separation between types and values for modern dependently-typed programming. See, for example, the Hasochism paper, or his talk on the unexamined assumptions that have been drummed into us by four decades of Hindley-Milner-style programming.
Dependent Kinds
Finally, for what it's worth, with TypeInType modern GHC unifies types and kinds, allowing us to talk about kind variables using the same tools that we use to talk about type variables. In a previous post about session types I made use of TypeInType to define a kind for tagged type-level sequences of types:
infixr 5 :!, :?
data Session = Type :! Session -- Type is a synonym for *
| Type :? Session
| E

I'd recommend #Benjamin Hodgson's answer and the references he gives to see how to make this sort of thing useful. But, to answer your question more directly, using several extensions (DataKinds, KindSignatures, and GADTs), you can define types that are parameterized on (certain) concrete types.
For example, here's one parameterized on the concrete Bool datatype:
{-# LANGUAGE DataKinds, KindSignatures, GADTs #-}
{-# LANGUAGE FlexibleInstances #-}
module FlaggedType where
-- The single quotes below are optional. They serve to notify
-- GHC that we are using the type-level constructors lifted from
-- data constructors rather than types of the same name (and are
-- only necessary where there's some kind of ambiguity otherwise).
data Flagged :: Bool -> * -> * where
Truish :: a -> Flagged 'True a
Falsish :: a -> Flagged 'False a
-- separate instances, just as if they were different types
-- (which they are)
instance (Show a) => Show (Flagged 'False a) where
show (Falsish x) = show x
instance (Show a) => Show (Flagged 'True a) where
show (Truish x) = show x ++ "*"
-- these lists have types as indicated
x = [Truish 1, Truish 2, Truish 3] -- :: Flagged 'True Integer
y = [Falsish "a", Falsish "b", Falsish "c"] -- :: Flagged 'False String
-- this won't typecheck: it's just like [1,2,"abc"]
z = [Truish 1, Truish 2, Falsish 3] -- won't typecheck
Note that this isn't much different from defining two completely separate types:
data FlaggedTrue a = Truish a
data FlaggedFalse a = Falsish a
In fact, I'm hard pressed to think of any advantage Flagged has over defining two separate types, except if you have a bar bet with someone that you can write useful Haskell code without type classes. For example, you can write:
getInt :: Flagged a Int -> Int
getInt (Truish z) = z -- same polymorphic function...
getInt (Falsish z) = z -- ...defined on two separate types
Maybe someone else can think of some other advantages.
Anyway, I believe that parameterizing types with concrete values really only becomes useful when the concrete type is sufficient "rich" that you can use it to leverage the type checker, as in Benjamin's examples.
As #user2407038 noted, most interesting primitive types, like Ints, Chars, Strings and so on can't be used this way. Interestingly enough, though, you can use literal positive integers and strings as type parameters, but they are treated as Nats and Symbols (as defined in GHC.TypeLits) respectively.
So something like this is possible:
import GHC.TypeLits
data Tagged :: Symbol -> Nat -> * -> * where
One :: a -> Tagged "one" 1 a
Two :: a -> Tagged "two" 2 a
Three :: a -> Tagged "three" 3 a

Look at using Generalized Algebraic Data Types (GADTS), which enable you to define concrete outputs based on input type, e.g.
data CustomMaybe a where
MaybeChar :: Maybe a -> CustomMaybe Char
MaybeString :: Maybe a > CustomMaybe String
MaybeBool :: Maybe a -> CustomMaybe Bool
exampleFunction :: CustomMaybe a -> a
exampleFunction (MaybeChar maybe) = 'e'
exampleFunction (MaybeString maybe) = True //Compile error
main = do
print $ exampleFunction (MaybeChar $ Just 10)
To a similar effect, RankNTypes can allow the implementation of similar behaviour:
exampleFunctionOne :: a -> a
exampleFunctionOne el = el
type PolyType = forall a. a -> a
exampleFuntionTwo :: PolyType -> Int
exampleFunctionTwo func = func 20
exampleFunctionTwo func = func "Hello" --Compiler error, PolyType being forced to return 'Int'
main = do
print $ exampleFunctionTwo exampleFunctionOne
The PolyType definition allows you to insert the polymorphic function within exampleFunctionTwo and force its output to be 'Int'.

No. Haskell doesn't have dependent types (yet). See https://typesandkinds.wordpress.com/2016/07/24/dependent-types-in-haskell-progress-report/ for some discussion of when it may.
In the meantime, you can get behavior like this in Agda, Idris, and Cayenne.

Related

What does data Vector :: * -> Nat -> * where mean in Haskell?

I'm looking at https://wiki.haskell.org/GHC/Kinds and I found this:
data Nat = Zero | Succ Nat
data Vector :: * -> Nat -> * where
VNil :: Vector a Zero
VCons :: a -> Vector a n -> Vector a (Succ n)
I tried looking about what does the where does in Haskell but I could not find info about it together with data. I have an idea of what a kind is. A constructor that takes no parameters has kind * and a constructor that takes one parameter has kind * -> *, it's the 'type' of the constructor. I think it's defining the kind of Vector as being * -> Nat -> * which means something that can take anything, a natural number and return a Vector?
And more important: why would someone use this stuff for?
The data ... where ... syntax is GADT syntax. (The wiki you were looking at also has a page about GADTs, linked from the page you were reading)
GADT syntax is enabled with the GADTs language extension. It gives you an alternative to the standard data declaration syntax, where instead of listing the data constructors as if they were applied to the types of their fields (thus implicitly defining the overall type of the constructor), you instead write an explicit type signature for each constructor.
For example, the standard Maybe type is defined like this in traditional syntax:
-- Maybe type constructor takes an argument a;
-- all data constructors implicitly return Maybe a
data Maybe a
= Just a -- Just data constructor takes an argument **of type** a, not a itself
| Nothing -- Nothing data constructor takes no arguments
And like this in GADT syntax:
-- Maybe type constructor takes an argument a
data Maybe a where
Just :: a -> Maybe a -- Just takes an argument of type a to return a Maybe a
Nothing :: Maybe a -- Nothing is simply of type Maybe a
This syntax is more verbose. In complex cases it can arguably be clearer, since we explicitly write the type of the constructors in ordinary type expression syntax, rather than defining them in a weird special-purpose syntax where we pseudo-apply term-level data constructors to the types of their arguments.
But more importantly, GADT syntax1 opens up the door to new features that simply cannot be expressed in the original syntax. That is happening in your example.
Primarily the new features come from control over the return type of each constructor. If we wanted to try to define Vector using the traditional data syntax, we would have to do something like this:
data Vector a n
= VNil
| VCons a (Vector a n)
The n parameter is supposed to represent the vector's number of elements at the type level. The idea is so that we can do things like zip two vectors together with the compiler enforcing that the two vectors have the same length, rather than doing something like Data.List.zip where it simply gives up and silently discards any remaining elements from one list when the other runs out of elements.
But what we've written above can't do that. VNil always returns a value of type Vector a n, and it doesn't have any fields of type n (or a) so any VNil can be used with any n type parameter at all; n isn't going to say anything about the size of the vector! And similarly VCons has a field with a Vector a n in it, but is also going to end up constructing a value of type Vector a n, where the n parameter is the same, when we want it to indicate that the size is one larger than the tail-vector's size.
GADT syntax allows us to fix those problems:
{-# LANGUAGE GADTs #-}
data Vector a n where
VNil :: Vector a Zero
VCons :: a -> Vector a n -> Vector a (Succ n)
Now we can explicitly say that the VNil constructor is a vector of size Zero; it's not like Nothing :: Maybe a where it is always polymorphic in that type variable. (Our VNil is still polymorphic in the type variable a, of course, since having no elements means we shouldn't constrain what the element type will be). And VCons takes an a and a Vector a n to produce a Vector a (Succ n)2; specifically a vector that is "one item larger" than the vector inside it.
We missed a step though, which is actually defining Zero and Succ anywhere. The way to do that is simply:
data Nat = Zero | Succ Nat
This says: A natural number2 is either Zero, or it is the Succ of some other Nat. We can pattern match a Nat repeatedly and we'll eventually3 hit Zero; if we wanted to convert this to a "normal" number we'd do that and count the Succ constructors.
But that has only given us Zero and Succ as data constructors. We were wanting to use them in the types of our VNil and VCons. For that we'll need another extension: DataKinds. It just allows use to use any data declaration both to define a (type-level) type constructor and its associated (term-level) data constructors and to define a (kind-level) "kind constructor" and its associated type constructors (which in turn do not contain any values at the term-level; they're purely for type manipulation).
Note that the wiki page you were looking at appears to be describing a not-yet-implemented language extension called Kinds, which is clearly the same as what is now DataKinds. As such, that wiki page is extremely old, so you should probably just ignore it. If you were looking for reading material about kinds in general (rather than the DataKinds extension specifically), you probably need to continue your search.
But using DataKinds we can do this (actually compiles now):
{-# LANGUAGE GADTs, DataKinds #-}
data Nat = Zero | Succ Nat
data Vector a n where
VNil :: Vector a Zero
VCons :: a -> Vector a n -> Vector a (Succ n)
Here we defined Nat (at the type level) and Zero & Succ (at the term level) as normal. But then in the definition of Vector we use Zero and Succ at the type level. This is enough for GHC to infer that in Vector a n the n must be of kind Nat (because it is used with type constructors in that kind). But we can also use yet-another extension KindSignatures and be more explicit about the kind of the type constructor Vector, like so:
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}
data Nat = Zero | Succ Nat
data Vector :: * -> Nat -> * where -- this line is where the difference is
VNil :: Vector a Zero
VCons :: a -> Vector a n -> Vector a (Succ n)
Before we were just saying data Vector a n and trusting both the compiler and the reader to figure out from the usage below that a must be of kind * (it's used as the type of an actual value in the VCons constructor) and n is of kind Nat (because that position in the Vector type constructor is filled by type-level Zero and Succ n in the constructors). Now we can explicitly give the compiler and the reader more information up front with data Vector :: * -> Nat -> *; Vector takes a type parameter of kind *, another type parameter of kind Nat, and results in a type of kind * (which means it's a type that can actually have values at the term level).
It can be a little confusing with DataKinds whether a token like Zero refers to the term-level Zero (of type Nat, which is of kind *) or the type-level Zero (of kind Nat). Most of the time the compiler can perfectly tell which is which because it keeps very strong track of whether a given expression is a term expression or a type expression. But DataKinds gives you a way of being more explicit; if you prepend a type constructor with a single quote (like 'Zero) then it definitively means the data constructor promoted to type level. Some people consider it good style to always use this explicit marking.4
So that is pretty much everything in that is going on in your example.
As for what it's all for... at the most basic level it allows you do things like keeping track of the length of lists5, and then have the compiler enforce that multiple lists have the same size (or have sizes that have some specific other relationship, like being larger, or smaller, or twice as large, etc). People use that capability for a huge number of things that I can't possibly sum up in a single post (and not just with sizes of things; there are countless ways to use DataKinds and GADTs to reflect information at the type level so that the compiler can enforce things for you); it's a bit like asking "what are functions for". But here's a couple of example functions that do interesting things with the type-level length:
vzipWith :: (a -> b -> c) -> Vector a n -> Vector b n -> Vector c n
vzipWith _ VNil VNil = VNil
vzipWith f (a `VCons` as) (b `VCons` bs)
= f a b `VCons` vzipWith f as bs
The zip that enforces the vectors have the same length, as I mentioned earlier. Guarantees that you can't have a bug because one list was accidentally shorter and you simply ignored elements of the other, instead the compiler will complain. Here's vzipWith at work in GHCI:
λ vzipWith replicate (1 `VCons` (2 `VCons` VNil)) ('a' `VCons` ('b' `VCons` VNil))
VCons "a" (VCons "bb" VNil)
it :: Vector [Char] ('Succ ('Succ 'Zero))
With a bit more work we can define how to add type level Nats (using even more extensions, which I won't explain in detail here). Then we can append vectors while keeping track of the combined length:
type Plus :: Nat -> Nat -> Nat
type family Plus n m
where Zero `Plus` n = n
Succ n `Plus` m = Succ (n `Plus` m)
vappend :: Vector a n -> Vector a m -> Vector a (n `Plus` m)
vappend VNil ys = ys
vappend (x `VCons` xs) ys = x `VCons` vappend xs ys
And at work:
λ vappend (1 `VCons` (2 `VCons` VNil)) (3 `VCons` (4 `VCons` (5 `VCons` VNil)))
VCons 1 (VCons 2 (VCons 3 (VCons 4 (VCons 5 VNil))))
it ::
Num a => Vector a ('Succ ('Succ ('Succ ('Succ ('Succ 'Zero)))))
If you want to play around with it in GHCI, put all of this in a file and load it:
{-# LANGUAGE DataKinds, GADTs, KindSignatures, StandaloneDeriving, TypeFamilies, TypeOperators, StandaloneKindSignatures #-}
data Nat = Zero | Succ Nat
data Vector :: * -> Nat -> * where
VNil :: Vector a 'Zero
VCons :: a -> Vector a n -> Vector a ('Succ n)
deriving instance Show a => Show (Vector a n)
vzipWith :: (a -> b -> c) -> Vector a n -> Vector b n -> Vector c n
vzipWith _ VNil VNil = VNil
vzipWith f (a `VCons` as) (b `VCons` bs)
= f a b `VCons` vzipWith f as bs
type Plus :: Nat -> Nat -> Nat
type family Plus n m
where Zero `Plus` n = n
Succ n `Plus` m = Succ (n `Plus` m)
vappend :: Vector a n -> Vector a m -> Vector a (n `Plus` m)
vappend VNil ys = ys
vappend (x `VCons` xs) ys = x `VCons` vappend xs ys
Here I have also added yet another extension StandaloneDeriving so we can derive a Show instance for Vector, so you can play around in the interpreter and see what you get.
1 There is in fact an extension GADTSyntax that enables just the new syntax, but doesn't allow you to define any types that you couldn't have defined in the old syntax. Hardly anyone uses this extension as far as I know; GADTs are well-regarded and anyone who bothers to learn the new syntax only does so in the context of learning about GADTs, so if they like the new syntax and want to use it everywhere they're probably just enabling GADTs everywhere.
2 "Succ" is short for "successor". The standard way of defining natural numbers from first principles is to assume that there exists the first natural number (zero), and that for any natural number has a successor which is a different natural number (and is not the successor of any other number). It's a fancy way of saying you can start at zero and count up from there as far as you want to go. But this inductive structure of natural numbers happens to map very nicely into Haskell's type logic, making it very easy to count things at type level using numbers defined this way, which is why it's being used here.
3 Unless it's infinite, which means our attempt to "count the succs" will never terminate, which is another way to produce bottom/undefined in Haskell.
4 This "tick" syntax is necessary in some cases, for disambiguation. For example, with DataKinds we can have type level lists of types, like [Bool, Char, Maybe Integer], because we can promote lists to operate at type-level instead of term-level. [Bool, Char, Maybe Integer] is a type-level list of kind [*] (a list of things of kind *, i.e. a list of types). The problem arises when we consider things like [Bool]. This is definitely a type expression, but is it the list type constructor applied to Bool (meaning, the type of terms that are a list of boolean values), or is it a singleton list-of-types (i.e. the list data constructors : and [], promoted to type level), whose single value happens to be the type Bool? One is of kind *, and the other is of kind [*]. There's no way to tell the programmer's intent from looking at it, so the rules of DataKinds say we prioritise the pre-DataKinds interpretation; [Bool] is definitely the type of lists of boolean values. We can use the tick mark to explicitly choose the other interpretation: '[Bool] is a singleton list of types.
5 Lists whose type reflects the length of the list are traditionally called vectors, to distinguish them from ordinary lists whose type says nothing about the length. Perhaps this is not ideal, since there are several other things that are also called "vectors" to distinguish them from ordinary lists. But it's established terminology.

Weird Error When Defining a Type Synonym of a Type Synonym (GHC)

Background
I have written the following code in Haskell (GHC):
{-# LANGUAGE
NoImplicitPrelude,
TypeInType, PolyKinds, DataKinds,
ScopedTypeVariables,
TypeFamilies
#-}
import Data.Kind(Type)
data PolyType k (t :: k)
type Wrap (t :: k) = PolyType k t
type Unwrap pt = (GetType pt :: GetKind pt)
type family GetType (pt :: Type) :: k where
GetType (PolyType k t) = t
type family GetKind (pt :: Type) :: Type where
GetKind (PolyType k t) = k
The intention of this code is to allow me to wrap a type of an arbitrary kind into a type (namely PolyType) of a single kind (namely Type) and then reverse the process (i.e. unwrap it) later.
Problem
I would like to define a type synonym for Unwrap akin to the following:
type UnwrapSynonym pt = Unwrap pt
However, the above definition produces the following error (GHC 8.4.3):
* Invalid declaration for `UnwrapSynonym'; you must explicitly
declare which variables are dependent on which others.
Inferred variable kinds: pt :: *
* In the type synonym declaration for `UnwrapSynonym'
What does this error mean? Is there a way I can get around it in order to define UnwrapSynonym?
What I Have Been Doing
My approach to this problem thus far is to essentially manually inline Uwrap in any high-order type synonyms I wish to define, but that feels icky, and I am hoping there is a better way.
Unfortunately, I am not experienced enough with the inner workings of GHC even understand exactly what the problem is much less figure out how to fix it.
I believe I have a decent understanding of how the extensions I'm using (ex. TypeInType and PolyKinds) work, but it is apparently not deep enough to understand this error. Furthermore, I have not been able to find an resources that address a similar problem. This is partly because I don't know how to succinctly describe it, which also made it hard to come up with a good title for this question.
The error is quite obtuse, but what I think it's trying to say is that GHC has noticed that the kind of UnwrapSynonym is dependent, forall (pt :: Type) -> GetKind pt, and it wants you to explicitly annotate the dependency:
type UnwrapSynonym pt = (Unwrap pt :: GetKind pt)
The reason it's talking about telling it "which variables are dependent on which others" is because this error can also pop up in e.g. this situation:
data Nat = Z | S Nat
data Fin :: Nat -> Type where
FZ :: Fin (S n)
FS :: Fin n -> Fin (S n)
type family ToNat (n :: Nat) (x :: Fin n) :: Nat where
ToNat (S n) FZ = Z
ToNat (S n) (FS x) = S (ToNat n x)
type ToNatAgain n x = ToNat n x -- similar error
ToNatAgain should have kind forall (n :: Nat) -> Fin n -> Nat, but the type of the variable x depends on the type of the variable n. GHC wants that explicitly annotated, so it tells me to tell it which variables depend on which other variables, and it gives me what it inferred as their kinds to help do that.
type ToNatAgain (n :: Nat) (x :: Fin n) = ToNat n x
In your case, the dependency is between the return kind and the argument kind. The underlying reason is the same, but the error message was apparently not designed with your case in mind and doesn't fit. You should submit a bug report.
As an aside, do you really need separate Unwrap and GetType? Why not make GetType dependent?
type family GetType (pt :: Type) :: GetKind pt where
GetType (PolyType k t) = t

differences: GADT, data family, data family that is a GADT

What/why are the differences between those three? Is a GADT (and regular data types) just a shorthand for a data family? Specifically what's the difference between:
data GADT a where
MkGADT :: Int -> GADT Int
data family FGADT a
data instance FGADT a where -- note not FGADT Int
MkFGADT :: Int -> FGADT Int
data family DF a
data instance DF Int where -- using GADT syntax, but not a GADT
MkDF :: Int -> DF Int
(Are those examples over-simplified, so I'm not seeing the subtleties of the differences?)
Data families are extensible, but GADTs are not. OTOH data family instances must not overlap. So I couldn't declare another instance/any other constructors for FGADT; just like I can't declare any other constructors for GADT. I can declare other instances for DF.
With pattern matching on those constructors, the rhs of the equation does 'know' that the payload is Int.
For class instances (I was surprised to find) I can write overlapping instances to consume GADTs:
instance C (GADT a) ...
instance {-# OVERLAPPING #-} C (GADT Int) ...
and similarly for (FGADT a), (FGADT Int). But not for (DF a): it must be for (DF Int) -- that makes sense; there's no data instance DF a, and if there were it would overlap.
ADDIT: to clarify #kabuhr's answer (thank you)
contrary to what I think you're claiming in part of your question, for a plain data family, matching on a constructor does not perform any inference
These types are tricky, so I expect I'd need explicit signatures to work with them. In that case the plain data family is easiest
inferDF (MkDF x) = x -- works without a signature
The inferred type inferDF :: DF Int -> Int makes sense. Giving it a signature inferDF :: DF a -> a doesn't make sense: there is no declaration for a data instance DF a .... Similarly with foodouble :: Foo Char a -> a there is no data instance Foo Char a ....
GADTs are awkward, I already know. So neither of these work without an explicit signature
inferGADT (MkGADT x) = x
inferFGADT (MkFGADT x) = x
Mysterious "untouchable" message, as you say. What I meant in my "matching on those constructors" comment was: the compiler 'knows' on rhs of an equation that the payload is Int (for all three constructors), so you'd better get any signatures consistent with that.
Then I'm thinking data GADT a where ... is as if data instance GADT a where .... I can give a signature inferGADT :: GADT a -> a or inferGADT :: GADT Int -> Int (likewise for inferFGADT). That makes sense: there is a data instance GADT a ... or I can give a signature at a more specific type.
So in some ways data families are generalisations of GADTs. I also see as you say
So, in some ways, GADTs are generalizations of data families.
Hmm. (The reason behind the question is that GHC Haskell has got to the stage of feature bloat: there's too many similar-but-different extensions. I was trying to prune it down to a smaller number of underlying abstractions. Then #HTNW's approach of explaining in terms of yet further extensions is opposite to what would help a learner. IMO existentials in data types should be chucked out: use GADTs instead. PatternSynonyms should be explained in terms of data types and mapping functions between them, not the other way round. Oh, and there's some DataKinds stuff, which I skipped over on first reading.)
As a start, you should think of a data family as a collection of independent ADTs that happen to be indexed by a type, while a GADT is a single data type with an inferrable type parameter where constraints on that parameter (typically, equality constraints like a ~ Int) can be brought into scope by pattern matching.
This means that the biggest difference is that, contrary to what I think you're claiming in part of your question, for a plain data family, matching on a constructor does not perform any inference on the type parameter. In particular, this typechecks:
inferGADT :: GADT a -> a
inferGADT (MkGADT n) = n
but this does not:
inferDF :: DF a -> a
inferDF (MkDF n) = n
and without type signatures, the first would fail to type check (with a mysterious "untouchable" message) while the second would be inferred as DF Int -> Int.
The situation becomes quite a bit more confusing for something like your FGADT type that combines data families with GADTs, and I confess I haven't really thought about how this works in detail. But, as an interesting example, consider:
data family Foo a b
data instance Foo Int a where
Bar :: Double -> Foo Int Double
Baz :: String -> Foo Int String
data instance Foo Char Double where
Quux :: Double -> Foo Char Double
data instance Foo Char String where
Zlorf :: String -> Foo Char String
In this case, Foo Int a is a GADT with an inferrable a parameter:
fooint :: Foo Int a -> a
fooint (Bar x) = x + 1.0
fooint (Baz x) = x ++ "ish"
but Foo Char a is just a collection of separate ADTs, so this won't typecheck:
foodouble :: Foo Char a -> a
foodouble (Quux x) = x
for the same reason inferDF won't typecheck above.
Now, getting back to your plain DF and GADT types, you can largely emulate DFs just using GADTs. For example, if you have a DF:
data family MyDF a
data instance MyDF Int where
IntLit :: Int -> MyDF Int
IntAdd :: MyDF Int -> MyDF Int -> MyDF Int
data instance MyDF Bool where
Positive :: MyDF Int -> MyDF Bool
you can write it as a GADT just by writing separate blocks of constructors:
data MyGADT a where
-- MyGADT Int
IntLit' :: Int -> MyGADT Int
IntAdd' :: MyGADT Int -> MyGADT Int -> MyGADT Int
-- MyGADT Bool
Positive' :: MyGADT Int -> MyGADT Bool
So, in some ways, GADTs are generalizations of data families. However, a major use case for data families is defining associated data types for classes:
class MyClass a where
data family MyRep a
instance MyClass Int where
data instance MyRep Int = ...
instance MyClass String where
data instance MyRep String = ...
where the "open" nature of data families is needed (and where the pattern-based inference methods of GADTs aren't helpful).
I think the difference becomes clear if we use PatternSynonyms-style type signatures for data constructors. Lets start with Haskell 98
data D a = D a a
You get a pattern type:
pattern D :: forall a. a -> a -> D a
it can be read in two directions. D, in "forward" or expression contexts, says, "forall a, you can give me 2 as and I'll give you a D a". "Backwards", as a pattern, it says, "forall a, you can give me a D a and I'll give you 2 as".
Now, the things you write in a GADT definition are not pattern types. What are they? Lies. Lies lies lies. Give them attention only insofar as the alternative is writing them out manually with ExistentialQuantification. Let's use this one
data GD a where
GD :: Int -> GD Int
You get
-- vv ignore
pattern GD :: forall a. () => (a ~ Int) => Int -> GD a
This says: forall a, you can give me a GD a, and I can give you a proof that a ~ Int, plus an Int.
Important observation: The return/match type of a GADT constructor is always the "data type head". I defined data GD a where ...; I got GD :: forall a. ... GD a. This is also true for Haskell 98 constructors, and also data family constructors, though it's a bit more subtle.
If I have a GD a, and I don't know what a is, I can pass into GD anyway, even though I wrote GD :: Int -> GD Int, which seems to say I can only match it with GD Ints. This is why I say GADT constructors lie. The pattern type never lies. It clearly states that, forall a, I can match a GD a with the GD constructor and get evidence for a ~ Int and a value of Int.
Ok, data familys. Lets not mix them with GADTs yet.
data Nat = Z | S Nat
data Vect (n :: Nat) (a :: Type) :: Type where
VNil :: Vect Z a
VCons :: a -> Vect n a -> Vect (S n) a -- try finding the pattern types for these btw
data family Rect (ns :: [Nat]) (a :: Type) :: Type
newtype instance Rect '[] a = RectNil a
newtype instance Rect (n : ns) a = RectCons (Vect n (Rect ns a))
There are actually two data type heads now. As #K.A.Buhr says, the different data instances act like different data types that just happen to share a name. The pattern types are
pattern RectNil :: forall a. a -> Rect '[] a
pattern RectCons :: forall n ns a. Vect n (Rect ns a) -> Rect (n : ns) a
If I have a Rect ns a, and I don't know what ns is, I cannot match on it. RectNil only takes Rect '[] as, RectCons only takes Rect (n : ns) as. You might ask: "why would I want a reduction in power?" #KABuhr has given one: GADTs are closed (and for good reason; stay tuned), families are open. This doesn't hold in Rect's case, as these instances already fill up the entire [Nat] * Type space. The reason is actually newtype.
Here's a GADT RectG:
data RectG :: [Nat] -> Type -> Type where
RectGN :: a -> RectG '[] a
RectGC :: Vect n (RectG ns a) -> RectG (n : ns) a
I get
-- it's fine if you don't get these
pattern RectGN :: forall ns a. () => (ns ~ '[]) => a -> RectG ns a
pattern RectGC :: forall ns' a. forall n ns. (ns' ~ (n : ns)) =>
Vect n (RectG ns a) -> RectG ns' a
-- just note that they both have the same matched type
-- which means there needs to be a runtime distinguishment
If I have a RectG ns a and don't know what ns is, I can still match on it just fine. The compiler has to preserve this information with a data constructor. So, if I had a RectG [1000, 1000] Int, I would incur an overhead of one million RectGN constructors that all "preserve" the same "information". Rect [1000, 1000] Int is fine, though, as I do not have the ability to match and tell whether a Rect is RectNil or RectCons. This allows the constructor to be newtype, as it holds no information. I would instead use a different GADT, somewhat like
data SingListNat :: [Nat] -> Type where
SLNN :: SingListNat '[]
SLNCZ :: SingListNat ns -> SingListNat (Z : ns)
SLNCS :: SingListNat (n : ns) -> SingListNat (S n : ns)
that stores the dimensions of a Rect in O(sum ns) space instead of O(product ns) space (I think those are right). This is also why GADTs are closed and families are open. A GADT is just like a normal data type except it has equality evidence and existentials. It doesn't make sense to add constructors to a GADT any more than it makes sense to add constructors to a Haskell 98 type, because any code that doesn't know about one of the constructors is in for a very bad time. It's fine for families though, because, as you noticed, once you define a branch of a family, you cannot add more constructors in that branch. Once you know what branch you're in, you know the constructors, and no one can break that. You're not allowed to use any constructors if you don't know which branch to use.
Your examples don't really mix GADTs and data families. Pattern types are nifty in that they normalize away superficial differences in data definitions, so let's take a look.
data family FGADT a
data instance FGADT a where
MkFGADT :: Int -> FGADT Int
Gives you
pattern MkFGADT :: forall a. () => (a ~ Int) => Int -> FGADT a
-- no different from a GADT; data family does nothing
But
data family DF a
data instance DF Int where
MkDF :: Int -> DF Int
gives
pattern MkDF :: Int -> DF Int
-- GADT syntax did nothing
Here's a proper mixing
data family Map k :: Type -> Type
data instance Map Word8 :: Type -> Type where
MW8BitSet :: BitSet8 -> Map Word8 Bool
MW8General :: GeneralMap Word8 a -> Map Word8 a
Which gives patterns
pattern MW8BitSet :: forall a. () => (a ~ Bool) => BitSet8 -> Map Word8 a
pattern MW8General :: forall a. GeneralMap Word8 a -> Map Word8 a
If I have a Map k v and I don't know what k is, I can't match it against MW8General or MW8BitSet, because those only want Map Word8s. This is the data family's influence. If I have a Map Word8 v and I don't know what v is, matching on the constructors can reveal to me whether it's known to be Bool or is something else.

What can type families do that multi param type classes and functional dependencies cannot

I have played around with TypeFamilies, FunctionalDependencies, and MultiParamTypeClasses. And it seems to me as though TypeFamilies doesn't add any concrete functionality over the other two. (But not vice versa). But I know type families are pretty well liked so I feel like I am missing something:
"open" relation between types, such as a conversion function, which does not seem possible with TypeFamilies. Done with MultiParamTypeClasses:
class Convert a b where
convert :: a -> b
instance Convert Foo Bar where
convert = foo2Bar
instance Convert Foo Baz where
convert = foo2Baz
instance Convert Bar Baz where
convert = bar2Baz
Surjective relation between types, such as a sort of type safe pseudo-duck typing mechanism, that would normally be done with a standard type family. Done with MultiParamTypeClasses and FunctionalDependencies:
class HasLength a b | a -> b where
getLength :: a -> b
instance HasLength [a] Int where
getLength = length
instance HasLength (Set a) Int where
getLength = S.size
instance HasLength Event DateDiff where
getLength = dateDiff (start event) (end event)
Bijective relation between types, such as for an unboxed container, which could be done through TypeFamilies with a data family, although then you have to declare a new data type for every contained type, such as with a newtype. Either that or with an injective type family, which I think is not available prior to GHC 8. Done with MultiParamTypeClasses and FunctionalDependencies:
class Unboxed a b | a -> b, b -> a where
toList :: a -> [b]
fromList :: [b] -> a
instance Unboxed FooVector Foo where
toList = fooVector2List
fromList = list2FooVector
instance Unboxed BarVector Bar where
toList = barVector2List
fromList = list2BarVector
And lastly a surjective relations between two types and a third type, such as python2 or java style division function, which can be done with TypeFamilies by also using MultiParamTypeClasses. Done with MultiParamTypeClasses and FunctionalDependencies:
class Divide a b c | a b -> c where
divide :: a -> b -> c
instance Divide Int Int Int where
divide = div
instance Divide Int Double Double where
divide = (/) . fromIntegral
instance Divide Double Int Double where
divide = (. fromIntegral) . (/)
instance Divide Double Double Double where
divide = (/)
One other thing I should also add is that it seems like FunctionalDependencies and MultiParamTypeClasses are also quite a bit more concise (for the examples above anyway) as you only have to write the type once, and you don't have to come up with a dummy type name which you then have to type for every instance like you do with TypeFamilies:
instance FooBar LongTypeName LongerTypeName where
FooBarResult LongTypeName LongerTypeName = LongestTypeName
fooBar = someFunction
vs:
instance FooBar LongTypeName LongerTypeName LongestTypeName where
fooBar = someFunction
So unless I am convinced otherwise it really seems like I should just not bother with TypeFamilies and use solely FunctionalDependencies and MultiParamTypeClasses. Because as far as I can tell it will make my code more concise, more consistent (one less extension to care about), and will also give me more flexibility such as with open type relationships or bijective relations (potentially the latter is solver by GHC 8).
Here's an example of where TypeFamilies really shines compared to MultiParamClasses with FunctionalDependencies. In fact, I challenge you to come up with an equivalent MultiParamClasses solution, even one that uses FlexibleInstances, OverlappingInstance, etc.
Consider the problem of type level substitution (I ran across a specific variant of this in Quipper in QData.hs). Essentially what you want to do is recursively substitute one type for another. For example, I want to be able to
substitute Int for Bool in Either [Int] String and get Either [Bool] String,
substitute [Int] for Bool in Either [Int] String and get Either Bool String,
substitute [Int] for [Bool] in Either [Int] String and get Either [Bool] String.
All in all, I want the usual notion of type level substitution. With a closed type family, I can do this for any types (albeit I need an extra line for each higher-kinded type constructor - I stopped at * -> * -> * -> * -> *).
{-# LANGUAGE TypeFamilies #-}
-- Subsitute type `x` for type `y` in type `a`
type family Substitute x y a where
Substitute x y x = y
Substitute x y (k a b c d) = k (Substitute x y a) (Substitute x y b) (Substitute x y c) (Substitute x y d)
Substitute x y (k a b c) = k (Substitute x y a) (Substitute x y b) (Substitute x y c)
Substitute x y (k a b) = k (Substitute x y a) (Substitute x y b)
Substitute x y (k a) = k (Substitute x y a)
Substitute x y a = a
And trying at ghci I get the desired output:
> :t undefined :: Substitute Int Bool (Either [Int] String)
undefined :: Either [Bool] [Char]
> :t undefined :: Substitute [Int] Bool (Either [Int] String)
undefined :: Either Bool [Char]
> :t undefined :: Substitute [Int] [Bool] (Either [Int] String)
undefined :: Either [Bool] [Char]
With that said, maybe you should be asking yourself why am I using MultiParamClasses and not TypeFamilies. Of the examples you gave above, all except Convert translate to type families (albeit you will need an extra line per instance for the type declaration).
Then again, for Convert, I am not convinced it is a good idea to define such a thing. The natural extension to Convert would be instances such as
instance (Convert a b, Convert b c) => Convert a c where
convert = convert . convert
instance Convert a a where
convert = id
which are as unresolvable for GHC as they are elegant to write...
To be clear, I am not saying there are no uses of MultiParamClasses, just that when possible you should be using TypeFamilies - they let you think about type-level functions instead of just relations.
This old HaskellWiki page does an OK job of comparing the two.
EDIT
Some more contrasting and history I stumbled upon from augustss blog
Type families grew out of the need to have type classes with
associated types. The latter is not strictly necessary since it can be
emulated with multi-parameter type classes, but it gives a much nicer
notation in many cases. The same is true for type families; they can
also be emulated by multi-parameter type classes. But MPTC gives a
very logic programming style of doing type computation; whereas type
families (which are just type functions that can pattern match on the
arguments) is like functional programming.
Using closed type families
adds some extra strength that cannot be achieved by type classes. To
get the same power from type classes we would need to add closed type
classes. Which would be quite useful; this is what instance chains
gives you.
Functional dependencies only affect the process of constraint solving, while type families introduced the notion of non-syntactic type equality, represented in GHC's intermediate form by coercions. This means type families interact better with GADTs. See this question for the canonical example of how functional dependencies fail here.

Haskell get type of algebraic parameter

I have a type
class IntegerAsType a where
value :: a -> Integer
data T5
instance IntegerAsType T5 where value _ = 5
newtype (IntegerAsType q) => Zq q = Zq Integer deriving (Eq)
newtype (Num a, IntegerAsType n) => PolyRing a n = PolyRing [a]
I'm trying to make a nice "show" for the PolyRing type. In particular, I want the "show" to print out the type 'a'. Is there a function that returns the type of an algebraic parameter (a 'show' for types)?
The other way I'm trying to do it is using pattern matching, but I'm running into problems with built-in types and the algebraic type.
I want a different result for each of Integer, Int and Zq q.
(toy example:)
test :: (Num a, IntegerAsType q) => a -> a
(Int x) = x+1
(Integer x) = x+2
(Zq x) = x+3
There are at least two different problems here.
1) Int and Integer are not data constructors for the 'Int' and 'Integer' types. Are there data constructors for these types/how do I pattern match with them?
2) Although not shown in my code, Zq IS an instance of Num. The problem I'm getting is:
Ambiguous constraint `IntegerAsType q'
At least one of the forall'd type variables mentioned by the constraint
must be reachable from the type after the '=>'
In the type signature for `test':
test :: (Num a, IntegerAsType q) => a -> a
I kind of see why it is complaining, but I don't know how to get around that.
Thanks
EDIT:
A better example of what I'm trying to do with the test function:
test :: (Num a) => a -> a
test (Integer x) = x+2
test (Int x) = x+1
test (Zq x) = x
Even if we ignore the fact that I can't construct Integers and Ints this way (still want to know how!) this 'test' doesn't compile because:
Could not deduce (a ~ Zq t0) from the context (Num a)
My next try at this function was with the type signature:
test :: (Num a, IntegerAsType q) => a -> a
which leads to the new error
Ambiguous constraint `IntegerAsType q'
At least one of the forall'd type variables mentioned by the constraint
must be reachable from the type after the '=>'
I hope that makes my question a little clearer....
I'm not sure what you're driving at with that test function, but you can do something like this if you like:
{-# LANGUAGE ScopedTypeVariables #-}
class NamedType a where
name :: a -> String
instance NamedType Int where
name _ = "Int"
instance NamedType Integer where
name _ = "Integer"
instance NamedType q => NamedType (Zq q) where
name _ = "Zq (" ++ name (undefined :: q) ++ ")"
I would not be doing my Stack Overflow duty if I did not follow up this answer with a warning: what you are asking for is very, very strange. You are probably doing something in a very unidiomatic way, and will be fighting the language the whole way. I strongly recommend that your next question be a much broader design question, so that we can help guide you to a more idiomatic solution.
Edit
There is another half to your question, namely, how to write a test function that "pattern matches" on the input to check whether it's an Int, an Integer, a Zq type, etc. You provide this suggestive code snippet:
test :: (Num a) => a -> a
test (Integer x) = x+2
test (Int x) = x+1
test (Zq x) = x
There are a couple of things to clear up here.
Haskell has three levels of objects: the value level, the type level, and the kind level. Some examples of things at the value level include "Hello, world!", 42, the function \a -> a, or fix (\xs -> 0:1:zipWith (+) xs (tail xs)). Some examples of things at the type level include Bool, Int, Maybe, Maybe Int, and Monad m => m (). Some examples of things at the kind level include * and (* -> *) -> *.
The levels are in order; value level objects are classified by type level objects, and type level objects are classified by kind level objects. We write the classification relationship using ::, so for example, 32 :: Int or "Hello, world!" :: [Char]. (The kind level isn't too interesting for this discussion, but * classifies types, and arrow kinds classify type constructors. For example, Int :: * and [Int] :: *, but [] :: * -> *.)
Now, one of the most basic properties of Haskell is that each level is completely isolated. You will never see a string like "Hello, world!" in a type; similarly, value-level objects don't pass around or operate on types. Moreover, there are separate namespaces for values and types. Take the example of Maybe:
data Maybe a = Nothing | Just a
This declaration creates a new name Maybe :: * -> * at the type level, and two new names Nothing :: Maybe a and Just :: a -> Maybe a at the value level. One common pattern is to use the same name for a type constructor and for its value constructor, if there's only one; for example, you might see
newtype Wrapped a = Wrapped a
which declares a new name Wrapped :: * -> * at the type level, and simultaneously declares a distinct name Wrapped :: a -> Wrapped a at the value level. Some particularly common (and confusing examples) include (), which is both a value-level object (of type ()) and a type-level object (of kind *), and [], which is both a value-level object (of type [a]) and a type-level object (of kind * -> *). Note that the fact that the value-level and type-level objects happen to be spelled the same in your source is just a coincidence! If you wanted to confuse your readers, you could perfectly well write
newtype Huey a = Louie a
newtype Louie a = Dewey a
newtype Dewey a = Huey a
where none of these three declarations are related to each other at all!
Now, we can finally tackle what goes wrong with test above: Integer and Int are not value constructors, so they can't be used in patterns. Remember -- the value level and type level are isolated, so you can't put type names in value definitions! By now, you might wish you had written test' instead:
test' :: Num a => a -> a
test' (x :: Integer) = x + 2
test' (x :: Int) = x + 1
test' (Zq x :: Zq a) = x
...but alas, it doesn't quite work like that. Value-level things aren't allowed to depend on type-level things. What you can do is to write separate functions at each of the Int, Integer, and Zq a types:
testInteger :: Integer -> Integer
testInteger x = x + 2
testInt :: Int -> Int
testInt x = x + 1
testZq :: Num a => Zq a -> Zq a
testZq (Zq x) = Zq x
Then we can call the appropriate one of these functions when we want to do a test. Since we're in a statically-typed language, exactly one of these functions is going to be applicable to any particular variable.
Now, it's a bit onerous to remember to call the right function, so Haskell offers a slight convenience: you can let the compiler choose one of these functions for you at compile time. This mechanism is the big idea behind classes. It looks like this:
class Testable a where test :: a -> a
instance Testable Integer where test = testInteger
instance Testable Int where test = testInt
instance Num a => Testable (Zq a) where test = testZq
Now, it looks like there's a single function called test which can handle any of Int, Integer, or numeric Zq's -- but in fact there are three functions, and the compiler is transparently choosing one for you. And that's an important insight. The type of test:
test :: Testable a => a -> a
...looks at first blush like it is a function that takes a value that could be any Testable type. But in fact, it's a function that can be specialized to any Testable type -- and then only takes values of that type! This difference explains yet another reason the original test function didn't work. You can't have multiple patterns with variables at different types, because the function only ever works on a single type at a time.
The ideas behind the classes NamedType and Testable above can be generalized a bit; if you do, you get the Typeable class suggested by hammar above.
I think now I've rambled more than enough, and likely confused more things than I've clarified, but leave me a comment saying which parts were unclear, and I'll do my best.
Is there a function that returns the type of an algebraic parameter (a 'show' for types)?
I think Data.Typeable may be what you're looking for.
Prelude> :m + Data.Typeable
Prelude Data.Typeable> typeOf (1 :: Int)
Int
Prelude Data.Typeable> typeOf (1 :: Integer)
Integer
Note that this will not work on any type, just those which have a Typeable instance.
Using the extension DeriveDataTypeable, you can have the compiler automatically derive these for your own types:
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Typeable
data Foo = Bar
deriving Typeable
*Main> typeOf Bar
Main.Foo
I didn't quite get what you're trying to do in the second half of your question, but hopefully this should be of some help.

Resources