'type family' vs 'data family', in brief? - haskell

I'm confused about how to choose between data family and type family. The wiki page on TypeFamilies goes into a lot of detail. Occasionally it informally refers to Haskell's data family as a "type family" in prose, but of course there is also type family in Haskell.
There is a simple example that shows where two versions of code are shown, differing only on whether a data family or a type family is being declared:
-- BAD: f is too ambiguous, due to non-injectivity
-- type family F a
-- OK
data family F a
f :: F a -> F a
f = undefined
g :: F Int -> F Int
g x = f x
type and data here have the same meaning, but the type family version fails to type-check, while the data family version is fine, because data family "creates new types and are therefore injective" (says the wiki page).
My takeaway from all of this is "try data family for simple cases, and, if it isn't powerful enough, then try type family". Which is fine, but I'd like to understand it better. Is there a Venn diagram or a decision tree I can follow to distinguish when to use which?

(Boosting useful information from comments into an answer.)
Standalone vs In-Class declaration
Two syntactically different ways to declare a type family and/or data family, that are semantically equivalent:
standalone:
type family Foo
data family Bar
or as part of a typeclass:
class C where
type Foo
data Bar
both declare a type family, but inside a typeclass the family part is implied by the class context, so GHC/Haskell abbreviates the declaration.
"New type" vs "Type Synonym" / "Type Alias"
data family F creates a new type, similar to how data F = ... creates a new type.
type family F does not create a new type, similar to how type F = Bar Baz doesn't create a new type (it just creates an alias/synonym to an existing type).
Example of non-injectivity of type family
An example (slightly modified) from Data.MonoTraversable.Element:
import Data.ByteString as S
import Data.ByteString.Lazy as L
-- Declare a family of type synonyms, called `Element`
-- `Element` has kind `* -> *`; it takes one parameter, which we call `container`
type family Element container
-- ByteString is a container for Word8, so...
-- The Element of a `S.ByteString` is a `Word8`
type instance Element S.ByteString = Word8
-- and the Element of a `L.ByteString` is also `Word8`
type instance Element L.ByteString = Word8
In a type family, the right-side of equations Word8 names an existing type; the things are the left-side creates new synonyms: Element S.ByteString and Element L.ByteString
Having a synonym means we can interchange Element Data.ByteString with Word8:
-- `w` is a Word8....
>let w = 0::Word8
-- ... and also an `Element L.ByteString`
>:t w :: Element L.ByteString
w :: Element L.ByteString :: Word8
-- ... and also an `Element S.ByteString`
>:t w :: Element S.ByteString
w :: Element S.ByteString :: Word8
-- but not an `Int`
>:t w :: Int
Couldn't match expected type `Int' with actual type `Word8'
These type synonyms are "non-injective" ("one-way"), and therefore non-invertible.
-- As before, `Word8` matches `Element L.ByteString` ...
>(0::Word8)::(Element L.ByteString)
-- .. but GHC can't infer which `a` is the `Element` of (`L.ByteString` or `S.ByteString` ?):
>(w)::(Element a)
Couldn't match expected type `Element a'
with actual type `Element a0'
NB: `Element' is a type function, and may not be injective
The type variable `a0' is ambiguous
Worse, GHC can't even solve non-ambiguous cases!:
type instance Element Int = Bool
> True::(Element a)
> NB: `Element' is a type function, and may not be injective
Note the use of "may not be"! I think GHC is being conservative, and refusing to check whether the Element truly is injective. (Perhaps because a programmer could add another type instance later, after importing a pre-compiled module, adding ambiguity.
Example of injectivity of data family
In contrast: In a data family, each right-hand side contains a unique constructor , so the definitions are injective ("reversible") equations.
-- Declare a list-like data family
data family XList a
-- Declare a list-like instance for Char
data instance XList Char = XCons Char (XList Char) | XNil
-- Declare a number-like instance for ()
data instance XList () = XListUnit Int
-- ERROR: "Multiple declarations of `XListUnit'"
data instance XList () = XListUnit Bool
-- (Note: GHCI accepts this; the new declaration just replaces the previous one.)
With data family, seeing the constructor name on the right (XCons, or XListUnit)
is enough to let the type-inferencer know we must be working with XList () not an XList Char. Since constructor names are unique, these defintions are injective/reversible.
If type "just" declares a synonym, why is it semantically useful?
Usually, type synonyms are just abbreviations, but type family synonyms have added power: They can make a simple type (kind *) become a synonym of a "type with kind * -> * applied to an argument":
type instance F A = B
makes B match F a. This is used, for example, in Data.MonoTraversable to make a simple type Word8 match functions of the type Element a -> a (Element is defined above).
For example, (a bit silly), suppose we have a version of const that only works with "related" types:
> class Const a where constE :: (Element a) -> a -> (Element a)
> instance Const S.ByteString where constE = const
> constE (0::Word8) undefined
ERROR: Couldn't match expected type `Word8' with actual type `Element a0'
-- By providing a match `a = S.ByteString`, `Word8` matches `(Element S.ByteString)`
> constE (0::Word8) (undefined::S.ByteString)
0
-- impossible, since `Char` cannot match `Element a` under our current definitions.
> constE 'x' undefined

I don't think any decision tree or Venn diagram will exist because the applications for type and data families are pretty wide.
Generally you've already highlighted the key design differences and I would agree with your takeaway to first see if you can get away with data family.
For me the key point is that each instance of a data family creates a new type, which does substantially limit the power because you can't do what is often the most natural thing and make an existing type be the instance.
For example the GMapKey example on the Haskell wiki page on "indexed types" is a reasonably natural fit for data families:
class GMapKey k where
data GMap k :: * -> *
empty :: GMap k v
lookup :: k -> GMap k v -> Maybe v
insert :: k -> v -> GMap k v -> GMap k v
The key type of the map k is the argument to the data family and the actual map type is the result of the data family (GMap k) . As a user of a GMapKey instance you're probably quite happy for the GMap k type to be abstract to you and just manipulate it via the generic map operations in the type class.
In contrast the Collects example on the same wiki page is sort of the opposite:
class Collects ce where
type Elem ce
empty :: ce
insert :: Elem ce -> ce -> ce
member :: Elem ce -> ce -> Bool
toList :: ce -> [Elem ce]
The argument type is the collection and the result type is the element of the collection. In general a user is going to want to operate on those elements directly using the normal operations on that type. For example the collection might be IntSet and the element might be Int. Having the Int be wrapped up in some other type would be quite inconvenient.
Note - these two examples are with type classes and therefore don't need the family keyword as declaring a type inside a type class implies it must be a family. Exactly the same considerations apply as for standalone families though, it's just a question of how the abstraction is organised.

Related

data family instances: newtype, data

With Haskell 98 decls, the whole datatype must be either newtype or data. But data families can have a mix of newtype instance, data instance. Does that mean that being newtype is a property of the data constructor rather than the datatype? Could there be a Haskell with something like:
data Foo a = MkFoo1 Int a
| MkFoo2 Bool a
| newtype MkFoo3 a
I know I can't write the following, but why/what goes wrong?:
data family Bar a
newtype instance Bar (Maybe Int) = MkBar1 (Maybe Int)
-- MkBar1 :: (Maybe Int) -> Bar (Maybe Int), see below
newtype instance Bar [Char] = MkBar2 [Char]
data instance Bar [Bool] where
MkBar3 :: Int -> Bool -> Bar [Bool]
-- can't be a newtype because of the existential Int
-- we're OK up to here mixing newtypes and data/GADT
data instance Bar [a] where
MkBar4 :: Num a => a -> Bar [a]
-- can't be a newtype because of the Num a =>
I can't write that because the instance head Bar [a] overlaps the two heads for MkBar2, MkBar3. Then I could fix that by moving those two constructor decls inside the where ... for Bar [a]. But then MkBar2 becomes a GADT (because it's result type is not Bar [a]), so can't be a newtype.
Then is being a newtype a property of the result type, rather than of the constructor? But consider the type inferred for newtype instance MkBar1 above. I can't write a top-level newtype with the same type
newtype Baz a where
MkBaz :: (Maybe Int) -> Baz (Maybe Int)
-- Error: A newtype constructor must have a return type of form T a1 ... an
Huh? MkBar1 is a newtype constructor whose type is not of that form.
If possible, please explain without diving into talk of roles: I've tried to understand them; it just makes my head hurt. Talk in terms of constructing and pattern-matching on those constructors.
You can think of data families as stronger versions of (injective and open) type families. As you can see from this question I asked a while back, data families can almost be faked with injective type families.
Does that mean that being newtype is a property of the data constructor rather than the datatype? Could there be a Haskell with something like:
data Foo a = MkFoo1 Int a
| MkFoo2 Bool a
| newtype MkFoo3 a
No. Being a newtype is definitively a property of the type constructor, not of the data constructor. A data family, quite like a type family, has two levels of type constructors:
the type constructor data/type family FamTyCon a
the type constructors for the instances data/type instance FamInstTyCon a = ...
Different instances of the same data/type family constructor are still fundamentally different types - they happen to be unified under one type constructor (and that type constructor is going to be injective and generative - see the linked question for more on that).
By the type family analogy, you wouldn't expect to be able to pattern match on something of type TyFam a, right? Because you could have type instance TyFam Int = Bool and type instance TyFam () = Int, and you wouldn't be able to statically know if you were looking at a Bool or an Int!
These are equivalent:
newtype NT a b = MkNT (Int, b)
data family DF a b
newtype instance DF a b = MkDF (Int, b)
-- inferred MkNT :: (Int, b) -> NT a b
-- MkDF :: (Int, b) -> DF a b
Furthermore you can't declare any other instance/constructor within family DF, because their instance heads would overlap DF a b.
The error message in the q re standalone newtype Baz is misleading
-- Error: A newtype constructor must have a return type of form T a1 ... an
(and presumably dates from before there were data families). A newtype constructor must have a return type exactly as the instance head (modulo alpha renaming if it's using GADT syntax). For a standalone newtype, "instance head" means the newtype head.
For non-newtype data instances, the various constructors might have return types strictly more specific than the instance head (that makes them GADTs).
The type declared in a data/newtype instance head (in the literature variously called a 'type scheme', 'monotype' -- because no constraints, 'Function-free type' -- in the 2008 paper 'Type Checking with Open Type Functions') is the principal type that all the constructors' return types must unify with. (There might not be any constructors with exactly that return type.)
If your instance be a newtype, the rules are much tighter: there can be only one constructor, and its return type must be exactly the principal type. So to answer the original q
Does that mean that being newtype is a property of the data constructor rather than the datatype?
... but ... Then is being a newtype a property of the result type, rather than of the constructor?
No, not of the constructor; yes more like of the result: being a newtype is the property of a data family instance/its principal type. It's just that for standalone newtypes, there can in effect be only one instance, and its principal type is the most general type for that type constructor. Derivatively, we can say the newtype's principal type/return type uniquely identifies a data constructor. That's crucial for type safety, because values of that type share their representation with the type inside the newtype's data constructor, i.e. with no wrapper -- as #AlexisKing's comment points out. Then pattern matching doesn't need to go looking for the constructor: the match is irrefutable/the constructor is virtual.
It strikes me that in the interests of more precise typing/better documentation, you might want to declare a newtype as a data family with only one instance, whose head is more specific than what you must put for a standalone newtype.

differences: GADT, data family, data family that is a GADT

What/why are the differences between those three? Is a GADT (and regular data types) just a shorthand for a data family? Specifically what's the difference between:
data GADT a where
MkGADT :: Int -> GADT Int
data family FGADT a
data instance FGADT a where -- note not FGADT Int
MkFGADT :: Int -> FGADT Int
data family DF a
data instance DF Int where -- using GADT syntax, but not a GADT
MkDF :: Int -> DF Int
(Are those examples over-simplified, so I'm not seeing the subtleties of the differences?)
Data families are extensible, but GADTs are not. OTOH data family instances must not overlap. So I couldn't declare another instance/any other constructors for FGADT; just like I can't declare any other constructors for GADT. I can declare other instances for DF.
With pattern matching on those constructors, the rhs of the equation does 'know' that the payload is Int.
For class instances (I was surprised to find) I can write overlapping instances to consume GADTs:
instance C (GADT a) ...
instance {-# OVERLAPPING #-} C (GADT Int) ...
and similarly for (FGADT a), (FGADT Int). But not for (DF a): it must be for (DF Int) -- that makes sense; there's no data instance DF a, and if there were it would overlap.
ADDIT: to clarify #kabuhr's answer (thank you)
contrary to what I think you're claiming in part of your question, for a plain data family, matching on a constructor does not perform any inference
These types are tricky, so I expect I'd need explicit signatures to work with them. In that case the plain data family is easiest
inferDF (MkDF x) = x -- works without a signature
The inferred type inferDF :: DF Int -> Int makes sense. Giving it a signature inferDF :: DF a -> a doesn't make sense: there is no declaration for a data instance DF a .... Similarly with foodouble :: Foo Char a -> a there is no data instance Foo Char a ....
GADTs are awkward, I already know. So neither of these work without an explicit signature
inferGADT (MkGADT x) = x
inferFGADT (MkFGADT x) = x
Mysterious "untouchable" message, as you say. What I meant in my "matching on those constructors" comment was: the compiler 'knows' on rhs of an equation that the payload is Int (for all three constructors), so you'd better get any signatures consistent with that.
Then I'm thinking data GADT a where ... is as if data instance GADT a where .... I can give a signature inferGADT :: GADT a -> a or inferGADT :: GADT Int -> Int (likewise for inferFGADT). That makes sense: there is a data instance GADT a ... or I can give a signature at a more specific type.
So in some ways data families are generalisations of GADTs. I also see as you say
So, in some ways, GADTs are generalizations of data families.
Hmm. (The reason behind the question is that GHC Haskell has got to the stage of feature bloat: there's too many similar-but-different extensions. I was trying to prune it down to a smaller number of underlying abstractions. Then #HTNW's approach of explaining in terms of yet further extensions is opposite to what would help a learner. IMO existentials in data types should be chucked out: use GADTs instead. PatternSynonyms should be explained in terms of data types and mapping functions between them, not the other way round. Oh, and there's some DataKinds stuff, which I skipped over on first reading.)
As a start, you should think of a data family as a collection of independent ADTs that happen to be indexed by a type, while a GADT is a single data type with an inferrable type parameter where constraints on that parameter (typically, equality constraints like a ~ Int) can be brought into scope by pattern matching.
This means that the biggest difference is that, contrary to what I think you're claiming in part of your question, for a plain data family, matching on a constructor does not perform any inference on the type parameter. In particular, this typechecks:
inferGADT :: GADT a -> a
inferGADT (MkGADT n) = n
but this does not:
inferDF :: DF a -> a
inferDF (MkDF n) = n
and without type signatures, the first would fail to type check (with a mysterious "untouchable" message) while the second would be inferred as DF Int -> Int.
The situation becomes quite a bit more confusing for something like your FGADT type that combines data families with GADTs, and I confess I haven't really thought about how this works in detail. But, as an interesting example, consider:
data family Foo a b
data instance Foo Int a where
Bar :: Double -> Foo Int Double
Baz :: String -> Foo Int String
data instance Foo Char Double where
Quux :: Double -> Foo Char Double
data instance Foo Char String where
Zlorf :: String -> Foo Char String
In this case, Foo Int a is a GADT with an inferrable a parameter:
fooint :: Foo Int a -> a
fooint (Bar x) = x + 1.0
fooint (Baz x) = x ++ "ish"
but Foo Char a is just a collection of separate ADTs, so this won't typecheck:
foodouble :: Foo Char a -> a
foodouble (Quux x) = x
for the same reason inferDF won't typecheck above.
Now, getting back to your plain DF and GADT types, you can largely emulate DFs just using GADTs. For example, if you have a DF:
data family MyDF a
data instance MyDF Int where
IntLit :: Int -> MyDF Int
IntAdd :: MyDF Int -> MyDF Int -> MyDF Int
data instance MyDF Bool where
Positive :: MyDF Int -> MyDF Bool
you can write it as a GADT just by writing separate blocks of constructors:
data MyGADT a where
-- MyGADT Int
IntLit' :: Int -> MyGADT Int
IntAdd' :: MyGADT Int -> MyGADT Int -> MyGADT Int
-- MyGADT Bool
Positive' :: MyGADT Int -> MyGADT Bool
So, in some ways, GADTs are generalizations of data families. However, a major use case for data families is defining associated data types for classes:
class MyClass a where
data family MyRep a
instance MyClass Int where
data instance MyRep Int = ...
instance MyClass String where
data instance MyRep String = ...
where the "open" nature of data families is needed (and where the pattern-based inference methods of GADTs aren't helpful).
I think the difference becomes clear if we use PatternSynonyms-style type signatures for data constructors. Lets start with Haskell 98
data D a = D a a
You get a pattern type:
pattern D :: forall a. a -> a -> D a
it can be read in two directions. D, in "forward" or expression contexts, says, "forall a, you can give me 2 as and I'll give you a D a". "Backwards", as a pattern, it says, "forall a, you can give me a D a and I'll give you 2 as".
Now, the things you write in a GADT definition are not pattern types. What are they? Lies. Lies lies lies. Give them attention only insofar as the alternative is writing them out manually with ExistentialQuantification. Let's use this one
data GD a where
GD :: Int -> GD Int
You get
-- vv ignore
pattern GD :: forall a. () => (a ~ Int) => Int -> GD a
This says: forall a, you can give me a GD a, and I can give you a proof that a ~ Int, plus an Int.
Important observation: The return/match type of a GADT constructor is always the "data type head". I defined data GD a where ...; I got GD :: forall a. ... GD a. This is also true for Haskell 98 constructors, and also data family constructors, though it's a bit more subtle.
If I have a GD a, and I don't know what a is, I can pass into GD anyway, even though I wrote GD :: Int -> GD Int, which seems to say I can only match it with GD Ints. This is why I say GADT constructors lie. The pattern type never lies. It clearly states that, forall a, I can match a GD a with the GD constructor and get evidence for a ~ Int and a value of Int.
Ok, data familys. Lets not mix them with GADTs yet.
data Nat = Z | S Nat
data Vect (n :: Nat) (a :: Type) :: Type where
VNil :: Vect Z a
VCons :: a -> Vect n a -> Vect (S n) a -- try finding the pattern types for these btw
data family Rect (ns :: [Nat]) (a :: Type) :: Type
newtype instance Rect '[] a = RectNil a
newtype instance Rect (n : ns) a = RectCons (Vect n (Rect ns a))
There are actually two data type heads now. As #K.A.Buhr says, the different data instances act like different data types that just happen to share a name. The pattern types are
pattern RectNil :: forall a. a -> Rect '[] a
pattern RectCons :: forall n ns a. Vect n (Rect ns a) -> Rect (n : ns) a
If I have a Rect ns a, and I don't know what ns is, I cannot match on it. RectNil only takes Rect '[] as, RectCons only takes Rect (n : ns) as. You might ask: "why would I want a reduction in power?" #KABuhr has given one: GADTs are closed (and for good reason; stay tuned), families are open. This doesn't hold in Rect's case, as these instances already fill up the entire [Nat] * Type space. The reason is actually newtype.
Here's a GADT RectG:
data RectG :: [Nat] -> Type -> Type where
RectGN :: a -> RectG '[] a
RectGC :: Vect n (RectG ns a) -> RectG (n : ns) a
I get
-- it's fine if you don't get these
pattern RectGN :: forall ns a. () => (ns ~ '[]) => a -> RectG ns a
pattern RectGC :: forall ns' a. forall n ns. (ns' ~ (n : ns)) =>
Vect n (RectG ns a) -> RectG ns' a
-- just note that they both have the same matched type
-- which means there needs to be a runtime distinguishment
If I have a RectG ns a and don't know what ns is, I can still match on it just fine. The compiler has to preserve this information with a data constructor. So, if I had a RectG [1000, 1000] Int, I would incur an overhead of one million RectGN constructors that all "preserve" the same "information". Rect [1000, 1000] Int is fine, though, as I do not have the ability to match and tell whether a Rect is RectNil or RectCons. This allows the constructor to be newtype, as it holds no information. I would instead use a different GADT, somewhat like
data SingListNat :: [Nat] -> Type where
SLNN :: SingListNat '[]
SLNCZ :: SingListNat ns -> SingListNat (Z : ns)
SLNCS :: SingListNat (n : ns) -> SingListNat (S n : ns)
that stores the dimensions of a Rect in O(sum ns) space instead of O(product ns) space (I think those are right). This is also why GADTs are closed and families are open. A GADT is just like a normal data type except it has equality evidence and existentials. It doesn't make sense to add constructors to a GADT any more than it makes sense to add constructors to a Haskell 98 type, because any code that doesn't know about one of the constructors is in for a very bad time. It's fine for families though, because, as you noticed, once you define a branch of a family, you cannot add more constructors in that branch. Once you know what branch you're in, you know the constructors, and no one can break that. You're not allowed to use any constructors if you don't know which branch to use.
Your examples don't really mix GADTs and data families. Pattern types are nifty in that they normalize away superficial differences in data definitions, so let's take a look.
data family FGADT a
data instance FGADT a where
MkFGADT :: Int -> FGADT Int
Gives you
pattern MkFGADT :: forall a. () => (a ~ Int) => Int -> FGADT a
-- no different from a GADT; data family does nothing
But
data family DF a
data instance DF Int where
MkDF :: Int -> DF Int
gives
pattern MkDF :: Int -> DF Int
-- GADT syntax did nothing
Here's a proper mixing
data family Map k :: Type -> Type
data instance Map Word8 :: Type -> Type where
MW8BitSet :: BitSet8 -> Map Word8 Bool
MW8General :: GeneralMap Word8 a -> Map Word8 a
Which gives patterns
pattern MW8BitSet :: forall a. () => (a ~ Bool) => BitSet8 -> Map Word8 a
pattern MW8General :: forall a. GeneralMap Word8 a -> Map Word8 a
If I have a Map k v and I don't know what k is, I can't match it against MW8General or MW8BitSet, because those only want Map Word8s. This is the data family's influence. If I have a Map Word8 v and I don't know what v is, matching on the constructors can reveal to me whether it's known to be Bool or is something else.

Difference between type family and partial newtype? (and partial data?)

I've had to interface two libraries where metadata is represented as a type parameter in one and as a record field in the other. I wrote an adaptor using a GADT. Here's a distilled version:
{-# LANGUAGE GADTs #-}
newtype TFId a = MkId a
data TFDup a = MkDup !a !a
data GADT tf where
ConstructorId :: GADT TFId
ConstructorDup :: GADT TFDup
main = do
f ConstructorId
f ConstructorDup
f :: GADT tf -> IO ()
f = _
This works. (May not be perfect; comments welcome, but that's not the question.)
It took me some time to get to this working state. My initial intuition was to use a type family for TFId, figuring: “GADT has kind (* -> *) -> *; in ConstructorDup TFDup has kind * -> *; so for ConstructorId I can use the following * -> * type family:”
{-# LANGUAGE TypeFamilies #-}
type family TFId a where TFId a = a
The type constructor does have the same kind * -> *, but GHC apparently won't have it in the same place:
error: …
The type family ‘TFId’ should have 1 argument, but has been given none
In the definition of data constructor ‘ConstructorId’
In the data type declaration for ‘GADT’
Well, if it says so…
I'm no sure I understand why it would make such a difference. No using type family stems without applying them? What's going on? Any other (better) way to do?
Injectivity.
type family F :: * -> *
type instance F Int = Bool
type instance F Char = Bool
here F Int ~ F Char. However,
data G (a :: *) = ...
will never cause G Int ~ G Char. These are guaranteed to be distinct types.
In universal quantifications like
foo :: forall f a. f a -> a
f is allowed to be G (injective) but not allowed to be F (not injective).
This is to make inference work. foo (... :: G Int) can be inferred to have type Int. foo (... :: F Int) is equivalent to foo (... :: Bool) which may have type Int, or type Char -- it's an ambiguous type.
Also consider foo True. We can't expect GHC to choose f ~ F, a ~ Int (or Char) for us. This would involve looking at all type families and see if Bool can be produced by any on them -- essentially, we would need to invert all the type families. Even if this were feasible, it would generate a huge amount of possible solutions, so it would be ambiguous.

Can a Haskell type constructor have non-type parameters?

A type constructor produces a type given a type. For example, the Maybe constructor
data Maybe a = Nothing | Just a
could be a given a concrete type, like Char, and give a concrete type, like Maybe Char. In terms of kinds, one has
GHCI> :k Maybe
Maybe :: * -> *
My question: Is it possible to define a type constructor that yields a concrete type given a Char, say? Put another way, is it possible to mix kinds and types in the type signature of a type constructor? Something like
GHCI> :k my_type
my_type :: Char -> * -> *
Can a Haskell type constructor have non-type parameters?
Let's unpack what you mean by type parameter. The word type has (at least) two potential meanings: do you mean type in the narrow sense of things of kind *, or in the broader sense of things at the type level? We can't (yet) use values in types, but modern GHC features a very rich kind language, allowing us to use a wide range of things other than concrete types as type parameters.
Higher-Kinded Types
Type constructors in Haskell have always admitted non-* parameters. For example, the encoding of the fixed point of a functor works in plain old Haskell 98:
newtype Fix f = Fix { unFix :: f (Fix f) }
ghci> :k Fix
Fix :: (* -> *) -> *
Fix is parameterised by a functor of kind * -> *, not a type of kind *.
Beyond * and ->
The DataKinds extension enriches GHC's kind system with user-declared kinds, so kinds may be built of pieces other than * and ->. It works by promoting all data declarations to the kind level. That is to say, a data declaration like
data Nat = Z | S Nat -- natural numbers
introduces a kind Nat and type constructors Z :: Nat and S :: Nat -> Nat, as well as the usual type and value constructors. This allows you to write datatypes parameterised by type-level data, such as the customary vector type, which is a linked list indexed by its length.
data Vec n a where
Nil :: Vec Z a
(:>) :: a -> Vec n a -> Vec (S n) a
ghci> :k Vec
Vec :: Nat -> * -> *
There's a related extension called ConstraintKinds, which frees constraints like Ord a from the yoke of the "fat arrow" =>, allowing them to roam across the landscape of the type system as nature intended. Kmett has used this power to build a category of constraints, with the newtype (:-) :: Constraint -> Constraint -> * denoting "entailment": a value of type c :- d is a proof that if c holds then d also holds. For example, we can prove that Ord a implies Eq [a] for all a:
ordToEqList :: Ord a :- Eq [a]
ordToEqList = Sub Dict
Life after forall
However, Haskell currently maintains a strict separation between the type level and the value level. Things at the type level are always erased before the program runs, (almost) always inferrable, invisible in expressions, and (dependently) quantified by forall. If your application requires something more flexible, such as dependent quantification over runtime data, then you have to manually simulate it using a singleton encoding.
For example, the specification of split says it chops a vector at a certain length according to its (runtime!) argument. The type of the output vector depends on the value of split's argument. We'd like to write this...
split :: (n :: Nat) -> Vec (n :+: m) a -> (Vec n a, Vec m a)
... where I'm using the type function (:+:) :: Nat -> Nat -> Nat, which stands for addition of type-level naturals, to ensure that the input vector is at least as long as n...
type family n :+: m where
Z :+: m = m
S n :+: m = S (n :+: m)
... but Haskell won't allow that declaration of split! There aren't any values of type Z or S n; only types of kind * contain values. We can't access n at runtime directly, but we can use a GADT which we can pattern-match on to learn what the type-level n is:
data Natty n where
Zy :: Natty Z
Sy :: Natty n -> Natty (S n)
ghci> :k Natty
Natty :: Nat -> *
Natty is called a singleton, because for a given (well-defined) n there is only one (well-defined) value of type Natty n. We can use Natty n as a run-time stand-in for n.
split :: Natty n -> Vec (n :+: m) a -> (Vec n a, Vec m a)
split Zy xs = (Nil, xs)
split (Sy n) (x :> xs) =
let (ys, zs) = split n xs
in (x :> ys, zs)
Anyway, the point is that values - runtime data - can't appear in types. It's pretty tedious to duplicate the definition of Nat in singleton form (and things get worse if you want the compiler to infer such values); dependently-typed languages like Agda, Idris, or a future Haskell escape the tyranny of strictly separating types from values and give us a range of expressive quantifiers. You're able to use an honest-to-goodness Nat as split's runtime argument and mention its value dependently in the return type.
#pigworker has written extensively about the unsuitability of Haskell's strict separation between types and values for modern dependently-typed programming. See, for example, the Hasochism paper, or his talk on the unexamined assumptions that have been drummed into us by four decades of Hindley-Milner-style programming.
Dependent Kinds
Finally, for what it's worth, with TypeInType modern GHC unifies types and kinds, allowing us to talk about kind variables using the same tools that we use to talk about type variables. In a previous post about session types I made use of TypeInType to define a kind for tagged type-level sequences of types:
infixr 5 :!, :?
data Session = Type :! Session -- Type is a synonym for *
| Type :? Session
| E
I'd recommend #Benjamin Hodgson's answer and the references he gives to see how to make this sort of thing useful. But, to answer your question more directly, using several extensions (DataKinds, KindSignatures, and GADTs), you can define types that are parameterized on (certain) concrete types.
For example, here's one parameterized on the concrete Bool datatype:
{-# LANGUAGE DataKinds, KindSignatures, GADTs #-}
{-# LANGUAGE FlexibleInstances #-}
module FlaggedType where
-- The single quotes below are optional. They serve to notify
-- GHC that we are using the type-level constructors lifted from
-- data constructors rather than types of the same name (and are
-- only necessary where there's some kind of ambiguity otherwise).
data Flagged :: Bool -> * -> * where
Truish :: a -> Flagged 'True a
Falsish :: a -> Flagged 'False a
-- separate instances, just as if they were different types
-- (which they are)
instance (Show a) => Show (Flagged 'False a) where
show (Falsish x) = show x
instance (Show a) => Show (Flagged 'True a) where
show (Truish x) = show x ++ "*"
-- these lists have types as indicated
x = [Truish 1, Truish 2, Truish 3] -- :: Flagged 'True Integer
y = [Falsish "a", Falsish "b", Falsish "c"] -- :: Flagged 'False String
-- this won't typecheck: it's just like [1,2,"abc"]
z = [Truish 1, Truish 2, Falsish 3] -- won't typecheck
Note that this isn't much different from defining two completely separate types:
data FlaggedTrue a = Truish a
data FlaggedFalse a = Falsish a
In fact, I'm hard pressed to think of any advantage Flagged has over defining two separate types, except if you have a bar bet with someone that you can write useful Haskell code without type classes. For example, you can write:
getInt :: Flagged a Int -> Int
getInt (Truish z) = z -- same polymorphic function...
getInt (Falsish z) = z -- ...defined on two separate types
Maybe someone else can think of some other advantages.
Anyway, I believe that parameterizing types with concrete values really only becomes useful when the concrete type is sufficient "rich" that you can use it to leverage the type checker, as in Benjamin's examples.
As #user2407038 noted, most interesting primitive types, like Ints, Chars, Strings and so on can't be used this way. Interestingly enough, though, you can use literal positive integers and strings as type parameters, but they are treated as Nats and Symbols (as defined in GHC.TypeLits) respectively.
So something like this is possible:
import GHC.TypeLits
data Tagged :: Symbol -> Nat -> * -> * where
One :: a -> Tagged "one" 1 a
Two :: a -> Tagged "two" 2 a
Three :: a -> Tagged "three" 3 a
Look at using Generalized Algebraic Data Types (GADTS), which enable you to define concrete outputs based on input type, e.g.
data CustomMaybe a where
MaybeChar :: Maybe a -> CustomMaybe Char
MaybeString :: Maybe a > CustomMaybe String
MaybeBool :: Maybe a -> CustomMaybe Bool
exampleFunction :: CustomMaybe a -> a
exampleFunction (MaybeChar maybe) = 'e'
exampleFunction (MaybeString maybe) = True //Compile error
main = do
print $ exampleFunction (MaybeChar $ Just 10)
To a similar effect, RankNTypes can allow the implementation of similar behaviour:
exampleFunctionOne :: a -> a
exampleFunctionOne el = el
type PolyType = forall a. a -> a
exampleFuntionTwo :: PolyType -> Int
exampleFunctionTwo func = func 20
exampleFunctionTwo func = func "Hello" --Compiler error, PolyType being forced to return 'Int'
main = do
print $ exampleFunctionTwo exampleFunctionOne
The PolyType definition allows you to insert the polymorphic function within exampleFunctionTwo and force its output to be 'Int'.
No. Haskell doesn't have dependent types (yet). See https://typesandkinds.wordpress.com/2016/07/24/dependent-types-in-haskell-progress-report/ for some discussion of when it may.
In the meantime, you can get behavior like this in Agda, Idris, and Cayenne.

How do I understand the set of valid inputs to a Haskell type constructor?

Warning: very beginner question.
I'm currently mired in the section on algebraic types in the Haskell book I'm reading, and I've come across the following example:
data Id a =
MkId a deriving (Eq, Show)
idInt :: Id Integer
idInt = MkId 10
idIdentity :: Id (a -> a)
idIdentity = MkId $ \x -> x
OK, hold on. I don't fully understand the idIdentity example. The explanation in the book is that:
This is a little odd. The type Id takes an argument and the data
constructor MkId takes an argument of the corresponding polymorphic
type. So, in order to have a value of type Id Integer, we need to
apply a -> Id a to an Integer value. This binds the a type variable to
Integer and applies away the (->) in the type constructor, giving us
Id Integer. We can also construct a MkId value that is an identity
function by binding the a to a polymorphic function in both the type
and the term level.
But wait. Why only fully polymorphic functions? My previous understanding was that a can be any type. But apparently constrained polymorphic type doesn't work: (Num a) => a -> a won't work here, and the GHC error suggests that only completely polymorphic types or "qualified types" (not sure what those are) are valid:
f :: (Num a) => a -> a
f = undefined
idConsPoly :: Id (Num a) => a -> a
idConsPoly = MkId undefined
Illegal polymorphic or qualified type: Num a => a -> a
Perhaps you intended to use ImpredicativeTypes
In the type signature for ‘idIdentity’:
idIdentity :: Id (Num a => a -> a)
EDIT: I'm a bonehead. I wrote the type signature below incorrectly, as pointed out by #chepner in his answer below. This also resolves my confusion in the next sentence below...
In retrospect, this behavior makes sense because I haven't defined a Num instance for Id. But then what explains me being able to apply a type like Integer in idInt :: Id Integer?
So in generality, I guess my question is: What specifically is the set of valid inputs to type constructors? Only fully polymorphic types? What are "qualified types" then? Etc...
You just have the type constructor in the wrong place. The following is fine:
idConsPoly :: Num a => Id (a -> a)
idConsPoly = MkId undefined
The type constructor Id here has kind * -> *, which means you can give it any value that has kind * (which includes all "ordinary" types) and returns a new value of kind *. In general, you are more concerned with arrow-kinded functions(?), of which type constructors are just one example.
TypeProd is a ternary type constructor whose first two arguments have kind * -> *:
-- Based on :*: from Control.Compose
newtype TypeProd f g a = Prod { unProd :: (f a, g a) }
Either Int is an expression whose value has kind * -> * but is not a type constructor, being the partial application of the type constructor Either to the nullary type constructor Int.
Also contributing to your confusion is that you've misinterpreted the error message from GHC. It means "Num a => a -> a is a polymorphic or qualified type, and therefore illegal (in this context)".
The last sentence that you quoted from the book is not very well worded, and maybe contributed to that misunderstanding. It's important to realize that in Id (a -> a) the argument a -> a is not a polymorphic type, but just an ordinary type that happens to mention a type variable. The thing which is polymorphic is idIdentity, which can have the type Id (a -> a) for any type a.
In standard Haskell polymorphism and qualification can only appear at the outermost level of the type in a type signature.
The type signature is almost correct
idConsPoly :: (Num a) => Id (a -> a)
Should be right, though i have no ghc on my phone to test this.
Also i think your question is quite broad, thus i deliberately answer only the concrete problem here.

Resources