data family instances: newtype, data - haskell

With Haskell 98 decls, the whole datatype must be either newtype or data. But data families can have a mix of newtype instance, data instance. Does that mean that being newtype is a property of the data constructor rather than the datatype? Could there be a Haskell with something like:
data Foo a = MkFoo1 Int a
| MkFoo2 Bool a
| newtype MkFoo3 a
I know I can't write the following, but why/what goes wrong?:
data family Bar a
newtype instance Bar (Maybe Int) = MkBar1 (Maybe Int)
-- MkBar1 :: (Maybe Int) -> Bar (Maybe Int), see below
newtype instance Bar [Char] = MkBar2 [Char]
data instance Bar [Bool] where
MkBar3 :: Int -> Bool -> Bar [Bool]
-- can't be a newtype because of the existential Int
-- we're OK up to here mixing newtypes and data/GADT
data instance Bar [a] where
MkBar4 :: Num a => a -> Bar [a]
-- can't be a newtype because of the Num a =>
I can't write that because the instance head Bar [a] overlaps the two heads for MkBar2, MkBar3. Then I could fix that by moving those two constructor decls inside the where ... for Bar [a]. But then MkBar2 becomes a GADT (because it's result type is not Bar [a]), so can't be a newtype.
Then is being a newtype a property of the result type, rather than of the constructor? But consider the type inferred for newtype instance MkBar1 above. I can't write a top-level newtype with the same type
newtype Baz a where
MkBaz :: (Maybe Int) -> Baz (Maybe Int)
-- Error: A newtype constructor must have a return type of form T a1 ... an
Huh? MkBar1 is a newtype constructor whose type is not of that form.
If possible, please explain without diving into talk of roles: I've tried to understand them; it just makes my head hurt. Talk in terms of constructing and pattern-matching on those constructors.

You can think of data families as stronger versions of (injective and open) type families. As you can see from this question I asked a while back, data families can almost be faked with injective type families.
Does that mean that being newtype is a property of the data constructor rather than the datatype? Could there be a Haskell with something like:
data Foo a = MkFoo1 Int a
| MkFoo2 Bool a
| newtype MkFoo3 a
No. Being a newtype is definitively a property of the type constructor, not of the data constructor. A data family, quite like a type family, has two levels of type constructors:
the type constructor data/type family FamTyCon a
the type constructors for the instances data/type instance FamInstTyCon a = ...
Different instances of the same data/type family constructor are still fundamentally different types - they happen to be unified under one type constructor (and that type constructor is going to be injective and generative - see the linked question for more on that).
By the type family analogy, you wouldn't expect to be able to pattern match on something of type TyFam a, right? Because you could have type instance TyFam Int = Bool and type instance TyFam () = Int, and you wouldn't be able to statically know if you were looking at a Bool or an Int!

These are equivalent:
newtype NT a b = MkNT (Int, b)
data family DF a b
newtype instance DF a b = MkDF (Int, b)
-- inferred MkNT :: (Int, b) -> NT a b
-- MkDF :: (Int, b) -> DF a b
Furthermore you can't declare any other instance/constructor within family DF, because their instance heads would overlap DF a b.
The error message in the q re standalone newtype Baz is misleading
-- Error: A newtype constructor must have a return type of form T a1 ... an
(and presumably dates from before there were data families). A newtype constructor must have a return type exactly as the instance head (modulo alpha renaming if it's using GADT syntax). For a standalone newtype, "instance head" means the newtype head.
For non-newtype data instances, the various constructors might have return types strictly more specific than the instance head (that makes them GADTs).
The type declared in a data/newtype instance head (in the literature variously called a 'type scheme', 'monotype' -- because no constraints, 'Function-free type' -- in the 2008 paper 'Type Checking with Open Type Functions') is the principal type that all the constructors' return types must unify with. (There might not be any constructors with exactly that return type.)
If your instance be a newtype, the rules are much tighter: there can be only one constructor, and its return type must be exactly the principal type. So to answer the original q
Does that mean that being newtype is a property of the data constructor rather than the datatype?
... but ... Then is being a newtype a property of the result type, rather than of the constructor?
No, not of the constructor; yes more like of the result: being a newtype is the property of a data family instance/its principal type. It's just that for standalone newtypes, there can in effect be only one instance, and its principal type is the most general type for that type constructor. Derivatively, we can say the newtype's principal type/return type uniquely identifies a data constructor. That's crucial for type safety, because values of that type share their representation with the type inside the newtype's data constructor, i.e. with no wrapper -- as #AlexisKing's comment points out. Then pattern matching doesn't need to go looking for the constructor: the match is irrefutable/the constructor is virtual.
It strikes me that in the interests of more precise typing/better documentation, you might want to declare a newtype as a data family with only one instance, whose head is more specific than what you must put for a standalone newtype.

Related

Clarification of Terms around Haskell Type system

Type system in haskell seem to be very Important and I wanted to clarify some terms revolving around haskell type system.
Some type classes
Functor
Applicative
Monad
After using :info I found that Functor is a type class, Applicative is a type class with => (deriving?) Functor and Monad deriving Applicative type class.
I've read that Maybe is a Monad, does that mean Maybe is also Applicative and Functor?
-> operator
When i define a type
data Maybe = Just a | Nothing
and check :t Just I get Just :: a -> Maybe a. How to read this -> operator?
It confuses me with the function where a -> b means it evaluates a to b (sort of returns a maybe) – I tend to think lhs to rhs association but it turns when defining types?
The term type is used in ambiguous ways, Type, Type Class, Type Constructor, Concrete Type etc... I would like to know what they mean to be exact
Indeed the word “type” is used in somewhat ambiguous ways.
The perhaps most practical way to look at it is that a type is just a set of values. For example, Bool is the finite set containing the values True and False.Mathematically, there are subtle differences between the concepts of set and type, but they aren't really important for a programmer to worry about. But you should in general consider the sets to be infinite, for example Integer contains arbitrarily big numbers.
The most obvious way to define a type is with a data declaration, which in the simplest case just lists all the values:
data Colour = Red | Green | Blue
There we have a type which, as a set, contains three values.
Concrete type is basically what we say to make it clear that we mean the above: a particular type that corresponds to a set of values. Bool is a concrete type, that can easily be understood as a data definition, but also String, Maybe Integer and Double -> IO String are concrete types, though they don't correspond to any single data declaration.
What a concrete type can't have is type variables†, nor can it be an incompletely applied type constructor. For example, Maybe is not a concrete type.
So what is a type constructor? It's the type-level analogue to value constructors. What we mean mathematically by “constructor” in Haskell is an injective function, i.e. a function f where if you're given f(x) you can clearly identify what was x. Furthermore, any different constructors are assumed to have disjoint ranges, which means you can also identify f.‡
Just is an example of a value constructor, but it complicates the discussion that it also has a type parameter. Let's consider a simplified version:
data MaybeInt = JustI Int | NothingI
Now we have
JustI :: Int -> MaybeInt
That's how JustI is a function. Like any function of the same signature, it can be applied to argument values of the right type, like, you can write JustI 5.What it means for this function to be injective is that I can define a variable, say,
quoxy :: MaybeInt
quoxy = JustI 9328
and then I can pattern match with the JustI constructor:
> case quoxy of { JustI n -> print n }
9328
This would not be possible with a general function of the same signature:
foo :: Int -> MaybeInt
foo i = JustI $ negate i
> case quoxy of { foo n -> print n }
<interactive>:5:17: error: Parse error in pattern: foo
Note that constructors can be nullary, in which case the injective property is meaningless because there is no contained data / arguments of the injective function. Nothing and True are examples of nullary constructors.
Type constructors are the same idea as value constructors: type-level functions that can be pattern-matched. Any type-name defined with data is a type constructor, for example Bool, Colour and Maybe are all type constructors. Bool and Colour are nullary, but Maybe is a unary type constructor: it takes a type argument and only the result is then a concrete type.
So unlike value-level functions, type-level functions are kind of by default type constructors. There are also type-level functions that aren't constructors, but they require -XTypeFamilies.
A type class may be understood as a set of types, in the same vein as a type can be seen as a set of values. This is not quite accurate, it's closer to true to say a class is a set of type constructors but again it's not as useful to ponder the mathematical details – better to look at examples.
There are two main differences between type-as-set-of-values and class-as-set-of-types:
How you define the “elements”: when writing a data declaration, you need to immediately describe what values are allowed. By contrast, a class is defined “empty”, and then the instances are defined later on, possibly in a different module.
How the elements are used. A data type basically enumerates all the values so they can be identified again. Classes meanwhile aren't generally concerned with identifying types, rather they specify properties that the element-types fulfill. These properties come in the form of methods of a class. For example, the instances of the Num class are types that have the property that you can add elements together.
You could say, Haskell is statically typed on the value level (fixed sets of values in each type), but duck-typed on the type level (classes just require that somebody somewhere implements the necessary methods).
A simplified version of the Num example:
class Num a where
(+) :: a -> a -> a
instance Num Int where
0 + x = x
x + y = ...
If the + operator weren't already defined in the prelude, you would now be able to use it with Int numbers. Then later on, perhaps in a different module, you could also make it usable with new, custom number types:
data MyNumberType = BinDigits [Bool]
instance Num MyNumberType where
BinDigits [] + BinDigits l = BinDigits l
BinDigits (False:ds) + BinDigits (False:es)
= BinDigits (False : ...)
Unlike Num, the Functor...Monad type classes are not classes of types, but of 1-ary type constructors. I.e. every functor is a type constructor taking one argument to make it a concrete type. For instance, recall that Maybe is a 1-ary type constructor.
class Functor f where
fmap :: (a->b) -> f a -> f b
instance Functor Maybe where
fmap f (Just a) = Just (f a)
fmap _ Nothing = Nothing
As you have concluded yourself, Applicative is a subclass of Functor. D being a subclass of C means basically that D is a subset of the set of type constructors in C. Therefore, yes, if Maybe is an instance of Monad it also is an instance of Functor.
†That's not quite true: if you consider the _universal quantor_ explicitly as part of the type, then a concrete type can contain variables. This is a bit of an advanced subject though.
‡This is not guaranteed to be true if the -XPatternSynonyms extension is used.

Data families vs Injective type families

Now that we have injective type families, is there any remaining use case for using data families over type families?
Looking at past StackOverflow questions about data families, there is this question from a couple years ago discussing the difference between type families and data families, and this answer about use cases of data families. Both say that the injectivity of data families is their greatest strength.
Looking at the docs on data families, I see reason not to rewrite all uses of data families using injective type families.
For example, say I have a data family (I've merged some examples from the docs to try to squeeze in all the features of data families)
data family G a b
data instance G Int Bool = G11 Int | G12 Bool deriving (Eq)
newtype instance G () a = G21 a
data instance G [a] b where
G31 :: c -> G [Int] b
G32 :: G [a] Bool
I might as well rewrite it as
type family G a b = g | g -> a b
type instance G Int Bool = G_Int_Bool
type instance G () a = G_Unit_a a
type instance G [a] b = G_lal_b a b
data G_Int_Bool = G11 Int | G12 Bool deriving (Eq)
newtype G_Unit_a a = G21 a
data G_lal_b a b where
G31 :: c -> G_lal_b [Int] b
G32 :: G_lal_b [a] Bool
It goes without saying that associated instances for data families correspond to associated instances with type families in the same way. Then is the only remaining difference that we have less things in the type-namespace?
As a followup, is there any benefit to having less things in the type-namespace? All I can think of is that this will become debugging hell for someone playing with this on ghci - the types of the constructors all seem to indicate that the constructors are all under one GADT...
type family T a = r | r -> a
data family D a
An injective type family T satisfies the injectivity axiom
if T a ~ T b then a ~ b
But a data family satisfies the much stronger generativity axiom
if D a ~ g b then D ~ g and a ~ b
(If you like: Because the instances of D define new types that are different from any existing types.)
In fact D itself is a legitimate type in the type system, unlike a type family like T, which can only ever appear in a fully saturated application like T a. This means
D can be the argument to another type constructor, like MaybeT D. (MaybeT T is illegal.)
You can define instances for D, like instance Functor D. (You can't define instances for a type family Functor T, and it would be unusable anyway because instance selection for, e.g., map :: Functor f => (a -> b) -> f a -> f b relies on the fact that from the type f a you can determine both f and a; for this to work f cannot be allowed to vary over type families, even injective ones.)
You're missing one other detail - data families create new types. Type families can only refer to other types. In particular, every instance of a data family declares new constructors. And it's nicely generic. You can create a data instance with newtype instance if you want newtype semantics. Your instance can be a record. It can have multiple constructors. It can even be a GADT if you want.
It's exactly the difference between the type and data/newtype keywords. Injective type families don't give you new types, rendering them useless in the case where you need that.
I understand where you're coming from. I had this same issue with the difference initially. Then I finally ran into a use case where they're useful, even without a type class getting involved.
I wanted to write an api for dealing with mutable cells in a few different contexts, without using classes. I knew I wanted to do it with a free monad with interpreters in IO, ST, and maybe some horrible hacks with unsafeCoerce to even go so far as shoehorning it into State. This wasn't for any practical purpose, of course - I was just exploring API designs.
So I had something like this:
data MutableEnv (s :: k) a ...
newRef :: a -> MutableEnv s (Ref s a)
readRef :: Ref s a -> MutableEnv s a
writeRef :: Ref s a -> a -> MutableEnv s ()
The definition of MutableEnv wasn't important. Just standard free/operational monad stuff with constructors matching the three functions in the api.
But I was stuck on what to define Ref as. I didn't want some sort of class, I wanted it to be a concrete type as far as the type system was concerned.
Then late one night I was out for a walk and it hit me - what I essentially want is a type whose constructors are indexed by an argument type. But it had to be open, unlike a GADT - new interpreters could be added at will. And then it hit me. That's exactly what a data family is. An open, type-indexed family of data values. I could complete the api with just the following:
data family Ref (s :: k) :: * -> *
Then, dealing with the underlying representation for a Ref was no big deal. Just create a data instance (or newtype instance, more likely) whenever an interpreter for MutableEnv is defined.
This exact example isn't really useful. But it clearly illustrates something data families can do that injective type families can't.
The answer by Reid Barton explains the distinction between my two examples perfectly. It has reminded me of something I read in Richard Eisenberg's thesis about adding dependent types to Haskell and I thought that since the heart of this question is injectivity and generativity, it would be worth mentioning how DependentHaskell will deal with this (when it eventually gets implemented, and if the quantifiers proposed now are the ones eventually implemented).
What follows is based on pages 56 and 57 (4.3.4 Matchability) of the aforementioned thesis:
Definition (Generativity). If f and g are generative, then f a ~ g b implies f ~ g
Definition (Injectivity). If f is injective, then f a ~ f b implies a ~ b
Definition (Matchability). A function f is matchable iff it is generative and injective
In Haskell as we know it now (8.0.1) the matchable (type-level) functions consist exactly of newtype, data, and data family type constructors. In the future, under DependentHaskell, one of the new quantifiers we will get will be '-> and this will be used to denote matchable functions. In other words, there will be a way to inform the compiler a type-level function is generative (which currently can only be done by making sure that function is a type constructor).

How do I understand the set of valid inputs to a Haskell type constructor?

Warning: very beginner question.
I'm currently mired in the section on algebraic types in the Haskell book I'm reading, and I've come across the following example:
data Id a =
MkId a deriving (Eq, Show)
idInt :: Id Integer
idInt = MkId 10
idIdentity :: Id (a -> a)
idIdentity = MkId $ \x -> x
OK, hold on. I don't fully understand the idIdentity example. The explanation in the book is that:
This is a little odd. The type Id takes an argument and the data
constructor MkId takes an argument of the corresponding polymorphic
type. So, in order to have a value of type Id Integer, we need to
apply a -> Id a to an Integer value. This binds the a type variable to
Integer and applies away the (->) in the type constructor, giving us
Id Integer. We can also construct a MkId value that is an identity
function by binding the a to a polymorphic function in both the type
and the term level.
But wait. Why only fully polymorphic functions? My previous understanding was that a can be any type. But apparently constrained polymorphic type doesn't work: (Num a) => a -> a won't work here, and the GHC error suggests that only completely polymorphic types or "qualified types" (not sure what those are) are valid:
f :: (Num a) => a -> a
f = undefined
idConsPoly :: Id (Num a) => a -> a
idConsPoly = MkId undefined
Illegal polymorphic or qualified type: Num a => a -> a
Perhaps you intended to use ImpredicativeTypes
In the type signature for ‘idIdentity’:
idIdentity :: Id (Num a => a -> a)
EDIT: I'm a bonehead. I wrote the type signature below incorrectly, as pointed out by #chepner in his answer below. This also resolves my confusion in the next sentence below...
In retrospect, this behavior makes sense because I haven't defined a Num instance for Id. But then what explains me being able to apply a type like Integer in idInt :: Id Integer?
So in generality, I guess my question is: What specifically is the set of valid inputs to type constructors? Only fully polymorphic types? What are "qualified types" then? Etc...
You just have the type constructor in the wrong place. The following is fine:
idConsPoly :: Num a => Id (a -> a)
idConsPoly = MkId undefined
The type constructor Id here has kind * -> *, which means you can give it any value that has kind * (which includes all "ordinary" types) and returns a new value of kind *. In general, you are more concerned with arrow-kinded functions(?), of which type constructors are just one example.
TypeProd is a ternary type constructor whose first two arguments have kind * -> *:
-- Based on :*: from Control.Compose
newtype TypeProd f g a = Prod { unProd :: (f a, g a) }
Either Int is an expression whose value has kind * -> * but is not a type constructor, being the partial application of the type constructor Either to the nullary type constructor Int.
Also contributing to your confusion is that you've misinterpreted the error message from GHC. It means "Num a => a -> a is a polymorphic or qualified type, and therefore illegal (in this context)".
The last sentence that you quoted from the book is not very well worded, and maybe contributed to that misunderstanding. It's important to realize that in Id (a -> a) the argument a -> a is not a polymorphic type, but just an ordinary type that happens to mention a type variable. The thing which is polymorphic is idIdentity, which can have the type Id (a -> a) for any type a.
In standard Haskell polymorphism and qualification can only appear at the outermost level of the type in a type signature.
The type signature is almost correct
idConsPoly :: (Num a) => Id (a -> a)
Should be right, though i have no ghc on my phone to test this.
Also i think your question is quite broad, thus i deliberately answer only the concrete problem here.

Are GHC's Type Famlies An Example of System F-omega?

I'm reading up about the Lambda-Cube, and I'm particularly interested in System F-omega, which allows for "type operators" i.e. types depending on types. This sounds a lot like GHC's type families. For example
type family Foo a
type instance Foo Int = Int
type instance Foo Float = ...
...
where the actual type depends on the type parameter a. Am I right in thinking that type families are an example of the type operators ala system F-omega? Or am I out in left field?
System F-omega allows universal quantification, abstraction and application at higher kinds, so not only over types (at kind *), but also at kinds k1 -> k2, where k1 and k2 are themselves kinds generated from * and ->. Hence, the type level itself becomes a simply typed lambda-calculus.
Haskell delivers slightly less than F-omega, in that the type system allows quantification and application at higher kinds, but not abstraction. Quantification at higher kinds is how we have types like
fmap :: forall f, s, t. Functor f => (s -> t) -> f s -> f t
with f :: * -> *. Correspondingly, variables like f can be instantiated with higher-kinded type expressions, such as Either String. The lack of abstraction makes it possible to solve unification problems in type expressions by the standard first-order techniques which underpin the Hindley-Milner type system.
However, type families are not really another means to introduce higher-kinded types, nor a replacement for the missing type-level lambda. Crucially, they must be fully applied. So your example,
type family Foo a
type instance Foo Int = Int
type instance Foo Float = ...
....
should not be considered as introducing some
Foo :: * -> * -- this is not what's happening
because Foo on its own is not a meaningful type expression. We have only the weaker rule that Foo t :: * whenever t :: *.
Type families do, however, act as a distinct type-level computation mechanism beyond F-omega, in that they introduce equations between type expressions. The extension of System F with equations is what gives us the "System Fc" which GHC uses today. Equations s ~ t between type expressions of kind * induce coercions transporting values from s to t. Computation is done by deducing equations from the rules you give when you define type families.
Moreover, you can give type families a higher-kinded return type, as in
type family Hoo a
type instance Hoo Int = Maybe
type instance Hoo Float = IO
...
so that Hoo t :: * -> * whenever t :: *, but still we cannot let Hoo stand alone.
The trick we sometimes use to get around this restriction is newtype wrapping:
newtype Noo i = TheNoo {theNoo :: Foo i}
which does indeed give us
Noo :: * -> *
but means that we have to apply the projection to make computation happen, so Noo Int and Int are provably distinct types, but
theNoo :: Noo Int -> Int
So it's a bit clunky, but we can kind of compensate for the fact that type families do not directly correspond to type operators in the F-omega sense.

'type family' vs 'data family', in brief?

I'm confused about how to choose between data family and type family. The wiki page on TypeFamilies goes into a lot of detail. Occasionally it informally refers to Haskell's data family as a "type family" in prose, but of course there is also type family in Haskell.
There is a simple example that shows where two versions of code are shown, differing only on whether a data family or a type family is being declared:
-- BAD: f is too ambiguous, due to non-injectivity
-- type family F a
-- OK
data family F a
f :: F a -> F a
f = undefined
g :: F Int -> F Int
g x = f x
type and data here have the same meaning, but the type family version fails to type-check, while the data family version is fine, because data family "creates new types and are therefore injective" (says the wiki page).
My takeaway from all of this is "try data family for simple cases, and, if it isn't powerful enough, then try type family". Which is fine, but I'd like to understand it better. Is there a Venn diagram or a decision tree I can follow to distinguish when to use which?
(Boosting useful information from comments into an answer.)
Standalone vs In-Class declaration
Two syntactically different ways to declare a type family and/or data family, that are semantically equivalent:
standalone:
type family Foo
data family Bar
or as part of a typeclass:
class C where
type Foo
data Bar
both declare a type family, but inside a typeclass the family part is implied by the class context, so GHC/Haskell abbreviates the declaration.
"New type" vs "Type Synonym" / "Type Alias"
data family F creates a new type, similar to how data F = ... creates a new type.
type family F does not create a new type, similar to how type F = Bar Baz doesn't create a new type (it just creates an alias/synonym to an existing type).
Example of non-injectivity of type family
An example (slightly modified) from Data.MonoTraversable.Element:
import Data.ByteString as S
import Data.ByteString.Lazy as L
-- Declare a family of type synonyms, called `Element`
-- `Element` has kind `* -> *`; it takes one parameter, which we call `container`
type family Element container
-- ByteString is a container for Word8, so...
-- The Element of a `S.ByteString` is a `Word8`
type instance Element S.ByteString = Word8
-- and the Element of a `L.ByteString` is also `Word8`
type instance Element L.ByteString = Word8
In a type family, the right-side of equations Word8 names an existing type; the things are the left-side creates new synonyms: Element S.ByteString and Element L.ByteString
Having a synonym means we can interchange Element Data.ByteString with Word8:
-- `w` is a Word8....
>let w = 0::Word8
-- ... and also an `Element L.ByteString`
>:t w :: Element L.ByteString
w :: Element L.ByteString :: Word8
-- ... and also an `Element S.ByteString`
>:t w :: Element S.ByteString
w :: Element S.ByteString :: Word8
-- but not an `Int`
>:t w :: Int
Couldn't match expected type `Int' with actual type `Word8'
These type synonyms are "non-injective" ("one-way"), and therefore non-invertible.
-- As before, `Word8` matches `Element L.ByteString` ...
>(0::Word8)::(Element L.ByteString)
-- .. but GHC can't infer which `a` is the `Element` of (`L.ByteString` or `S.ByteString` ?):
>(w)::(Element a)
Couldn't match expected type `Element a'
with actual type `Element a0'
NB: `Element' is a type function, and may not be injective
The type variable `a0' is ambiguous
Worse, GHC can't even solve non-ambiguous cases!:
type instance Element Int = Bool
> True::(Element a)
> NB: `Element' is a type function, and may not be injective
Note the use of "may not be"! I think GHC is being conservative, and refusing to check whether the Element truly is injective. (Perhaps because a programmer could add another type instance later, after importing a pre-compiled module, adding ambiguity.
Example of injectivity of data family
In contrast: In a data family, each right-hand side contains a unique constructor , so the definitions are injective ("reversible") equations.
-- Declare a list-like data family
data family XList a
-- Declare a list-like instance for Char
data instance XList Char = XCons Char (XList Char) | XNil
-- Declare a number-like instance for ()
data instance XList () = XListUnit Int
-- ERROR: "Multiple declarations of `XListUnit'"
data instance XList () = XListUnit Bool
-- (Note: GHCI accepts this; the new declaration just replaces the previous one.)
With data family, seeing the constructor name on the right (XCons, or XListUnit)
is enough to let the type-inferencer know we must be working with XList () not an XList Char. Since constructor names are unique, these defintions are injective/reversible.
If type "just" declares a synonym, why is it semantically useful?
Usually, type synonyms are just abbreviations, but type family synonyms have added power: They can make a simple type (kind *) become a synonym of a "type with kind * -> * applied to an argument":
type instance F A = B
makes B match F a. This is used, for example, in Data.MonoTraversable to make a simple type Word8 match functions of the type Element a -> a (Element is defined above).
For example, (a bit silly), suppose we have a version of const that only works with "related" types:
> class Const a where constE :: (Element a) -> a -> (Element a)
> instance Const S.ByteString where constE = const
> constE (0::Word8) undefined
ERROR: Couldn't match expected type `Word8' with actual type `Element a0'
-- By providing a match `a = S.ByteString`, `Word8` matches `(Element S.ByteString)`
> constE (0::Word8) (undefined::S.ByteString)
0
-- impossible, since `Char` cannot match `Element a` under our current definitions.
> constE 'x' undefined
I don't think any decision tree or Venn diagram will exist because the applications for type and data families are pretty wide.
Generally you've already highlighted the key design differences and I would agree with your takeaway to first see if you can get away with data family.
For me the key point is that each instance of a data family creates a new type, which does substantially limit the power because you can't do what is often the most natural thing and make an existing type be the instance.
For example the GMapKey example on the Haskell wiki page on "indexed types" is a reasonably natural fit for data families:
class GMapKey k where
data GMap k :: * -> *
empty :: GMap k v
lookup :: k -> GMap k v -> Maybe v
insert :: k -> v -> GMap k v -> GMap k v
The key type of the map k is the argument to the data family and the actual map type is the result of the data family (GMap k) . As a user of a GMapKey instance you're probably quite happy for the GMap k type to be abstract to you and just manipulate it via the generic map operations in the type class.
In contrast the Collects example on the same wiki page is sort of the opposite:
class Collects ce where
type Elem ce
empty :: ce
insert :: Elem ce -> ce -> ce
member :: Elem ce -> ce -> Bool
toList :: ce -> [Elem ce]
The argument type is the collection and the result type is the element of the collection. In general a user is going to want to operate on those elements directly using the normal operations on that type. For example the collection might be IntSet and the element might be Int. Having the Int be wrapped up in some other type would be quite inconvenient.
Note - these two examples are with type classes and therefore don't need the family keyword as declaring a type inside a type class implies it must be a family. Exactly the same considerations apply as for standalone families though, it's just a question of how the abstraction is organised.

Resources