Haskell type checking and determinism - haskell

According to the Haskell 2010 language report, its type checker is based on Hindley-Milner. So consider a function f of this type,
f :: forall a. [a] -> Int
It could be the length function for instance. According to Hindley-Milner, f [] type checks to Int. We can prove this by instantiating the type of f to [Int] -> Int, and the type of [] to [Int], then conclude that the application ([Int] -> Int) [Int] is of type Int.
In this proof, I chose to instantiate types forall a. [a] -> Int and forall a. [a] by substituting Int to a. I can substitute Bool instead, the proof works too. Isn't it strange in Hindley-Milner that we can apply a polymorphic type to another, without specifying which instances we use ?
More specifically, what in Haskell prevents me from using the type a in the implementation of f ? I could imagine that f is a function that equals 18 on any [Bool], and equals the usual length function on all other types of lists. In this case, would f [] be 18 or 0 ? The Haskell report says "the kernel is not formally specified", so it's hard to tell.

During type inference, such type variables can indeed instantiated to any type. This may be seen as a highly non deterministic step.
GHC, for what it is worth, uses the internal Any type in such cases. For instance, compiling
{-# NOINLINE myLength #-}
myLength :: [a] -> Int
myLength = length
test :: Int
test = myLength []
results in the following Core:
-- RHS size: {terms: 3, types: 4, coercions: 0}
myLength [InlPrag=NOINLINE] :: forall a_aw2. [a_aw2] -> Int
[GblId, Str=DmdType]
myLength =
\ (# a_aP5) -> length # [] Data.Foldable.$fFoldable[] # a_aP5
-- RHS size: {terms: 2, types: 6, coercions: 0}
test :: Int
[GblId, Str=DmdType]
test = myLength # GHC.Prim.Any (GHC.Types.[] # GHC.Prim.Any)
where GHC.Prim.Any occurs in the last line.
Now, is that really not deterministic? Well, it does involve a kind of non deterministic step "in the middle" of the algorithm, but the final resulting (most general) type is Int, and deterministically so. It does not matter what type we choose for a, we always get type Int at the end.
Of course, getting the same type is not enough: we also want to get the same Int value. I conjecture that it can be proven that, given
f :: forall a. T a
g :: forall a. T a -> U
then
g # V (f # V) :: U
is the same value whatever type V is. This should be a consequence of parametricity applied to those polymorphic types.

To follow-up on Chi's answer, here is the proof that f [] cannot depend on the type instances of f and []. According to Theorems for free (the last article here),
for any types a,a' and any function g :: a -> a', then
f_a = f_a' . map g
where f_a is the instantiation of f on type a, for example in Haskell
f_Bool :: [Bool] -> Int
f_Bool = f
Then if you evaluate the previous equality on []_a, it yields f_a []_a = f_a' []_a'. In the case of the original question, f_Int []_Int = f_Bool []_Bool.
Some references for parametricity in Haskell would be useful too, because Haskell looks stronger than the polymorphic lambda calculus described in Walder's paper. In particular, this wiki page says parametricity can be broken in Haskell by using the seq function.
The wiki page also says that my type-depending function exists (in other languages than Haskell), it is called ad-hoc polymorphism.

Related

SystemT Compiler and dealing with Infinite Types in Haskell

I'm following this blog post: http://semantic-domain.blogspot.com/2012/12/total-functional-programming-in-partial.html
It shows a small OCaml compiler program for System T (a simple total functional language).
The entire pipeline takes OCaml syntax (via Camlp4 metaprogramming) transforms them to OCaml AST that is translated to SystemT Lambda Calculus (see: module Term) and then finally SystemT Combinator Calculus (see:
module Goedel). The final step is also wrapped with OCaml metaprogramming Ast.expr type.
I'm attempting to translate it to Haskell without the use of Template Haskell.
For the SystemT Combinator form, I've written it as
{-# LANGUAGE GADTs #-}
data TNat = Zero | Succ TNat
data THom a b where
Id :: THom a a
Unit :: THom a ()
ZeroH :: THom () TNat
SuccH :: THom TNat TNat
Compose :: THom a b -> THom b c -> THom a c
Pair :: THom a b -> THom a c -> THom a (b, c)
Fst :: THom (a, b) a
Snd :: THom (a, b) b
Curry :: THom (a, b) c -> THom a (b -> c)
Eval :: THom ((a -> b), a) b -- (A = B) * A -> B
Iter :: THom a b -> THom (a, b) b -> THom (a, TNat) b
Note that Compose is forward composition, which differs from (.).
During the translation of SystemT Lambda Calculus to SystemT Combinator Calculus, the Elaborate.synth function tries to convert SystemT Lambda calculus variables into series of composed projection expressions (related to De Brujin Indices). This is because combinator calculus doesn't have variables/variable names. This is done with the Elaborate.lookup which uses the Quote.find function.
The problem is that with my encoding of the combinator calculus as the GADT THom a b. Translating the Quote.find function:
let rec find x = function
| [] -> raise Not_found
| (x', t) :: ctx' when x = x' -> <:expr< Goedel.snd >>
| (x', t) :: ctx' -> <:expr< Goedel.compose Goedel.fst $find x ctx'$ >>
Into Haskell:
find :: TVar -> Context -> _
find tvar [] = error "Not Found"
find tvar ((tvar', ttype):ctx)
| tvar == tvar' = Snd
| otherwise = Compose Fst (find tvar ctx)
Results in an infinite type error.
• Occurs check: cannot construct the infinite type: a ~ (a, c)
Expected type: THom (a, c) c
Actual type: THom ((a, c), c) c
The problem stems from the fact that using Compose and Fst and Snd from the THom a b GADT can result in infinite variations of the type signature.
The article doesn't have this problem because it appears to use the Ast.expr OCaml thing to wrap the underlying expressions.
I'm not sure how to resolve in Haskell. Should I be using an existentially quantified type like
data TExpr = forall a b. TExpr (THom a b)
Or some sort of type-level Fix to adapt the infinite type problem. However I'm unsure how this changes the find or lookup functions.
This answer will have to be a bit high-level, because there are three entirely different families of possible designs to fix that problem. What you’re doing seems closer to approach three, but the approaches are ordered by increasing complexity.
The approach in the original post
Translating the original post requires Template Haskell and partiality; find would return a Q.Exp representing some Hom a b, avoiding this problem just like the original post. Just like in the original post, a type error in the original code would be caught when typechecking the output of all the Template Haskell functions. So, type errors are still caught before runtime, but you will still need to write tests to find cases where your macros output ill-typed expressions. One can give stronger guarantees, at the cost of a significant increase in complexity.
Dependent typing/GADTs in input and output
If you want to diverge from the post, one alternative is to use “dependent typing” throughout and make the input dependently-typed. I use the term loosely to include both actually dependently-typed languages, actual Dependent Haskell (when it lands), and ways to fake dependent typing in Haskell today (via GADTs, singletons, and what not).
However, you lose the ability to write your own typechecker and choose which type system to use; typically, you end up embedding a simply-typed lambda calculus, and can simulate polymorphism via polymorphic Haskell functions that can generate terms at a given type. This is easier in dependently-typed languages, but possible at all in Haskell.
But honestly, in this road it’s easier to use higher-order abstract syntax and Haskell functions, with something like:
data Exp a where
Abs :: (Exp a -> Exp b) -> Exp (a -> b)
App :: Exp (a -> b) -> Exp a -> Exp b
Var :: String -> Exp a —- only use for free variables
exampleId :: Exp (a -> a)
exampleId = Abs (\x -> x)
If you can use this approach (details omitted here), you get high assurance from GADTs with limited complexity. However, this approach is too inflexible for many scenarios, for instance because the typing contexts are only visible to the Haskell compiler and not in your types or terms.
From untyped to typed terms
A third option is go dependently-typed and to still make your program turn weakly-typed input to strongly typed output. In this case, your typechecker overall transforms terms of some type Expr into terms of a GADT TExp gamma t, Hom a b, or such. Since you don’t know statically what gamma and t (or a and b) are, you’ll indeed need some sort of existential.
But a plain existential is too weak: to build bigger well-typed expression, you’ll need to check that the produced types allow it. For instance, to build a TExpr containing a Compose expression out of two smaller TExpr, you'll need to check (at runtime) that their types match. And with a plain existential, you can't. So you’ll need to have a representation of types also at the value level.
What's more existentials are annoying to deal with (as usual), since you can’t ever expose the hidden type variables in your return type, or project those out (unlike dependent records/sigma-types).
I am honestly not entirely sure this could eventually be made to work. Here is a possible partial sketch in Haskell, up to one case of find.
data Type t where
VNat :: Type Nat
VString :: Type String
VArrow :: Type a -> Type b -> Type (a -> b)
VPair :: Type a -> Type b -> Type (a, b)
VUnit :: Type ()
data SomeType = forall t. SomeType (Type t)
data SomeHom = forall a b. SomeHom (Type a) (Type b) (THom a b)
type Context = [(TVar, SomeType)]
getType :: Context -> SomeType
getType [] = SomeType VUnit
getType ((_, SomeType ttyp) :: gamma) =
case getType gamma of
SomeType ctxT -> SomeType (VPair ttyp
find :: TVar -> Context -> SomeHom
find tvar ((tvar’, ttyp) :: gamma)
| tvar == tvar’ =
case (ttyp, getType gamma) of
(SomeType t, SomeType ctxT) ->
SomeHom (VPair t ctxT) t Fst

differences: GADT, data family, data family that is a GADT

What/why are the differences between those three? Is a GADT (and regular data types) just a shorthand for a data family? Specifically what's the difference between:
data GADT a where
MkGADT :: Int -> GADT Int
data family FGADT a
data instance FGADT a where -- note not FGADT Int
MkFGADT :: Int -> FGADT Int
data family DF a
data instance DF Int where -- using GADT syntax, but not a GADT
MkDF :: Int -> DF Int
(Are those examples over-simplified, so I'm not seeing the subtleties of the differences?)
Data families are extensible, but GADTs are not. OTOH data family instances must not overlap. So I couldn't declare another instance/any other constructors for FGADT; just like I can't declare any other constructors for GADT. I can declare other instances for DF.
With pattern matching on those constructors, the rhs of the equation does 'know' that the payload is Int.
For class instances (I was surprised to find) I can write overlapping instances to consume GADTs:
instance C (GADT a) ...
instance {-# OVERLAPPING #-} C (GADT Int) ...
and similarly for (FGADT a), (FGADT Int). But not for (DF a): it must be for (DF Int) -- that makes sense; there's no data instance DF a, and if there were it would overlap.
ADDIT: to clarify #kabuhr's answer (thank you)
contrary to what I think you're claiming in part of your question, for a plain data family, matching on a constructor does not perform any inference
These types are tricky, so I expect I'd need explicit signatures to work with them. In that case the plain data family is easiest
inferDF (MkDF x) = x -- works without a signature
The inferred type inferDF :: DF Int -> Int makes sense. Giving it a signature inferDF :: DF a -> a doesn't make sense: there is no declaration for a data instance DF a .... Similarly with foodouble :: Foo Char a -> a there is no data instance Foo Char a ....
GADTs are awkward, I already know. So neither of these work without an explicit signature
inferGADT (MkGADT x) = x
inferFGADT (MkFGADT x) = x
Mysterious "untouchable" message, as you say. What I meant in my "matching on those constructors" comment was: the compiler 'knows' on rhs of an equation that the payload is Int (for all three constructors), so you'd better get any signatures consistent with that.
Then I'm thinking data GADT a where ... is as if data instance GADT a where .... I can give a signature inferGADT :: GADT a -> a or inferGADT :: GADT Int -> Int (likewise for inferFGADT). That makes sense: there is a data instance GADT a ... or I can give a signature at a more specific type.
So in some ways data families are generalisations of GADTs. I also see as you say
So, in some ways, GADTs are generalizations of data families.
Hmm. (The reason behind the question is that GHC Haskell has got to the stage of feature bloat: there's too many similar-but-different extensions. I was trying to prune it down to a smaller number of underlying abstractions. Then #HTNW's approach of explaining in terms of yet further extensions is opposite to what would help a learner. IMO existentials in data types should be chucked out: use GADTs instead. PatternSynonyms should be explained in terms of data types and mapping functions between them, not the other way round. Oh, and there's some DataKinds stuff, which I skipped over on first reading.)
As a start, you should think of a data family as a collection of independent ADTs that happen to be indexed by a type, while a GADT is a single data type with an inferrable type parameter where constraints on that parameter (typically, equality constraints like a ~ Int) can be brought into scope by pattern matching.
This means that the biggest difference is that, contrary to what I think you're claiming in part of your question, for a plain data family, matching on a constructor does not perform any inference on the type parameter. In particular, this typechecks:
inferGADT :: GADT a -> a
inferGADT (MkGADT n) = n
but this does not:
inferDF :: DF a -> a
inferDF (MkDF n) = n
and without type signatures, the first would fail to type check (with a mysterious "untouchable" message) while the second would be inferred as DF Int -> Int.
The situation becomes quite a bit more confusing for something like your FGADT type that combines data families with GADTs, and I confess I haven't really thought about how this works in detail. But, as an interesting example, consider:
data family Foo a b
data instance Foo Int a where
Bar :: Double -> Foo Int Double
Baz :: String -> Foo Int String
data instance Foo Char Double where
Quux :: Double -> Foo Char Double
data instance Foo Char String where
Zlorf :: String -> Foo Char String
In this case, Foo Int a is a GADT with an inferrable a parameter:
fooint :: Foo Int a -> a
fooint (Bar x) = x + 1.0
fooint (Baz x) = x ++ "ish"
but Foo Char a is just a collection of separate ADTs, so this won't typecheck:
foodouble :: Foo Char a -> a
foodouble (Quux x) = x
for the same reason inferDF won't typecheck above.
Now, getting back to your plain DF and GADT types, you can largely emulate DFs just using GADTs. For example, if you have a DF:
data family MyDF a
data instance MyDF Int where
IntLit :: Int -> MyDF Int
IntAdd :: MyDF Int -> MyDF Int -> MyDF Int
data instance MyDF Bool where
Positive :: MyDF Int -> MyDF Bool
you can write it as a GADT just by writing separate blocks of constructors:
data MyGADT a where
-- MyGADT Int
IntLit' :: Int -> MyGADT Int
IntAdd' :: MyGADT Int -> MyGADT Int -> MyGADT Int
-- MyGADT Bool
Positive' :: MyGADT Int -> MyGADT Bool
So, in some ways, GADTs are generalizations of data families. However, a major use case for data families is defining associated data types for classes:
class MyClass a where
data family MyRep a
instance MyClass Int where
data instance MyRep Int = ...
instance MyClass String where
data instance MyRep String = ...
where the "open" nature of data families is needed (and where the pattern-based inference methods of GADTs aren't helpful).
I think the difference becomes clear if we use PatternSynonyms-style type signatures for data constructors. Lets start with Haskell 98
data D a = D a a
You get a pattern type:
pattern D :: forall a. a -> a -> D a
it can be read in two directions. D, in "forward" or expression contexts, says, "forall a, you can give me 2 as and I'll give you a D a". "Backwards", as a pattern, it says, "forall a, you can give me a D a and I'll give you 2 as".
Now, the things you write in a GADT definition are not pattern types. What are they? Lies. Lies lies lies. Give them attention only insofar as the alternative is writing them out manually with ExistentialQuantification. Let's use this one
data GD a where
GD :: Int -> GD Int
You get
-- vv ignore
pattern GD :: forall a. () => (a ~ Int) => Int -> GD a
This says: forall a, you can give me a GD a, and I can give you a proof that a ~ Int, plus an Int.
Important observation: The return/match type of a GADT constructor is always the "data type head". I defined data GD a where ...; I got GD :: forall a. ... GD a. This is also true for Haskell 98 constructors, and also data family constructors, though it's a bit more subtle.
If I have a GD a, and I don't know what a is, I can pass into GD anyway, even though I wrote GD :: Int -> GD Int, which seems to say I can only match it with GD Ints. This is why I say GADT constructors lie. The pattern type never lies. It clearly states that, forall a, I can match a GD a with the GD constructor and get evidence for a ~ Int and a value of Int.
Ok, data familys. Lets not mix them with GADTs yet.
data Nat = Z | S Nat
data Vect (n :: Nat) (a :: Type) :: Type where
VNil :: Vect Z a
VCons :: a -> Vect n a -> Vect (S n) a -- try finding the pattern types for these btw
data family Rect (ns :: [Nat]) (a :: Type) :: Type
newtype instance Rect '[] a = RectNil a
newtype instance Rect (n : ns) a = RectCons (Vect n (Rect ns a))
There are actually two data type heads now. As #K.A.Buhr says, the different data instances act like different data types that just happen to share a name. The pattern types are
pattern RectNil :: forall a. a -> Rect '[] a
pattern RectCons :: forall n ns a. Vect n (Rect ns a) -> Rect (n : ns) a
If I have a Rect ns a, and I don't know what ns is, I cannot match on it. RectNil only takes Rect '[] as, RectCons only takes Rect (n : ns) as. You might ask: "why would I want a reduction in power?" #KABuhr has given one: GADTs are closed (and for good reason; stay tuned), families are open. This doesn't hold in Rect's case, as these instances already fill up the entire [Nat] * Type space. The reason is actually newtype.
Here's a GADT RectG:
data RectG :: [Nat] -> Type -> Type where
RectGN :: a -> RectG '[] a
RectGC :: Vect n (RectG ns a) -> RectG (n : ns) a
I get
-- it's fine if you don't get these
pattern RectGN :: forall ns a. () => (ns ~ '[]) => a -> RectG ns a
pattern RectGC :: forall ns' a. forall n ns. (ns' ~ (n : ns)) =>
Vect n (RectG ns a) -> RectG ns' a
-- just note that they both have the same matched type
-- which means there needs to be a runtime distinguishment
If I have a RectG ns a and don't know what ns is, I can still match on it just fine. The compiler has to preserve this information with a data constructor. So, if I had a RectG [1000, 1000] Int, I would incur an overhead of one million RectGN constructors that all "preserve" the same "information". Rect [1000, 1000] Int is fine, though, as I do not have the ability to match and tell whether a Rect is RectNil or RectCons. This allows the constructor to be newtype, as it holds no information. I would instead use a different GADT, somewhat like
data SingListNat :: [Nat] -> Type where
SLNN :: SingListNat '[]
SLNCZ :: SingListNat ns -> SingListNat (Z : ns)
SLNCS :: SingListNat (n : ns) -> SingListNat (S n : ns)
that stores the dimensions of a Rect in O(sum ns) space instead of O(product ns) space (I think those are right). This is also why GADTs are closed and families are open. A GADT is just like a normal data type except it has equality evidence and existentials. It doesn't make sense to add constructors to a GADT any more than it makes sense to add constructors to a Haskell 98 type, because any code that doesn't know about one of the constructors is in for a very bad time. It's fine for families though, because, as you noticed, once you define a branch of a family, you cannot add more constructors in that branch. Once you know what branch you're in, you know the constructors, and no one can break that. You're not allowed to use any constructors if you don't know which branch to use.
Your examples don't really mix GADTs and data families. Pattern types are nifty in that they normalize away superficial differences in data definitions, so let's take a look.
data family FGADT a
data instance FGADT a where
MkFGADT :: Int -> FGADT Int
Gives you
pattern MkFGADT :: forall a. () => (a ~ Int) => Int -> FGADT a
-- no different from a GADT; data family does nothing
But
data family DF a
data instance DF Int where
MkDF :: Int -> DF Int
gives
pattern MkDF :: Int -> DF Int
-- GADT syntax did nothing
Here's a proper mixing
data family Map k :: Type -> Type
data instance Map Word8 :: Type -> Type where
MW8BitSet :: BitSet8 -> Map Word8 Bool
MW8General :: GeneralMap Word8 a -> Map Word8 a
Which gives patterns
pattern MW8BitSet :: forall a. () => (a ~ Bool) => BitSet8 -> Map Word8 a
pattern MW8General :: forall a. GeneralMap Word8 a -> Map Word8 a
If I have a Map k v and I don't know what k is, I can't match it against MW8General or MW8BitSet, because those only want Map Word8s. This is the data family's influence. If I have a Map Word8 v and I don't know what v is, matching on the constructors can reveal to me whether it's known to be Bool or is something else.

Why is a function type required to be "wrapped" for the type checker to be satisfied?

The following program type-checks:
{-# LANGUAGE RankNTypes #-}
import Numeric.AD (grad)
newtype Fun = Fun (forall a. Num a => [a] -> a)
test1 [u, v] = (v - (u * u * u))
test2 [u, v] = ((u * u) + (v * v) - 1)
main = print $ fmap (\(Fun f) -> grad f [1,1]) [Fun test1, Fun test2]
But this program fails:
main = print $ fmap (\f -> grad f [1,1]) [test1, test2]
With the type error:
Grad.hs:13:33: error:
• Couldn't match type ‘Integer’
with ‘Numeric.AD.Internal.Reverse.Reverse s Integer’
Expected type: [Numeric.AD.Internal.Reverse.Reverse s Integer]
-> Numeric.AD.Internal.Reverse.Reverse s Integer
Actual type: [Integer] -> Integer
• In the first argument of ‘grad’, namely ‘f’
In the expression: grad f [1, 1]
In the first argument of ‘fmap’, namely ‘(\ f -> grad f [1, 1])’
Intuitively, the latter program looks correct. After all, the
following, seemingly equivalent program does work:
main = print $ [grad test1 [1,1], grad test2 [1,1]]
It looks like a limitation in GHC's type system. I would like to know
what causes the failure, why this limitation exists, and any possible
workarounds besides wrapping the function (per Fun above).
(Note: this is not caused by the monomorphism restriction; compiling
with NoMonomorphismRestriction does not help.)
This is an issue with GHC's type system. It is really GHC's type system by the way; the original type system for Haskell/ML like languages don't support higher rank polymorphism, let alone impredicative polymorphism which is what we're using here.
The issue is that in order to type check this we need to support foralls at any position in a type. Not only bunched all the way at the front of the type (the normal restriction which allows for type inference). Once you leave this area type inference becomes undecidable in general (for rank n polymorphism and beyond). In our case, the type of [test1, test2] would need to be [forall a. Num a => a -> a] which is a problem considering that it doesn't fit into the scheme discussed above. It would require us to use impredicative polymorphism, so called because a ranges over types with foralls in them and so a could be replaced with the type in which it's being used.
So, therefore there's going to be some cases that misbehave just because the problem is not fully solvable. GHC does have some support for rank n polymorphism and a bit of support for impredicative polymorphism but it's generally better to just use newtype wrappers to get reliable behavior. To the best of my knowledge, GHC also discourages using this feature precisely because it's so hard to figure out exactly what the type inference algorithm will handle.
In summary, math says that there will be flaky cases and newtype wrappers are the best, if somewhat dissatisfying way, to cope with it.
The type inference algorithm will not infer higher rank types (those with forall at the left of ->). If I remember correctly, it becomes undecidable. Anyway, consider this code
foo f = (f True, f 'a')
what should its type be? We could have
foo :: (forall a. a -> a) -> (Bool, Char)
but we could also have
foo :: (forall a. a -> Int) -> (Int, Int)
or, for any type constructor F :: * -> *
foo :: (forall a. a -> F a) -> (F Bool, F Char)
Here, as far as I can see, we can not find a principal type -- a type which is the most general type we can assign to foo.
If a principal type does not exist, the type inference machinery can only pick a suboptimal type for foo, which can cause type errors later on. This is bad. Instead, GHC relies on a Hindley-Milner style type inference engine, which was greatly extended so to cover more advanced Haskell types. This mechanism, unlike plain Hindley-Milner, will assign f a polymorphic type provided the user explicitly required that, e.g. by giving foo a signature.
Using a wrapper newtype like Fun also instructs GHC in a similar way, providing the polymorphic type for f.

Can a Haskell type constructor have non-type parameters?

A type constructor produces a type given a type. For example, the Maybe constructor
data Maybe a = Nothing | Just a
could be a given a concrete type, like Char, and give a concrete type, like Maybe Char. In terms of kinds, one has
GHCI> :k Maybe
Maybe :: * -> *
My question: Is it possible to define a type constructor that yields a concrete type given a Char, say? Put another way, is it possible to mix kinds and types in the type signature of a type constructor? Something like
GHCI> :k my_type
my_type :: Char -> * -> *
Can a Haskell type constructor have non-type parameters?
Let's unpack what you mean by type parameter. The word type has (at least) two potential meanings: do you mean type in the narrow sense of things of kind *, or in the broader sense of things at the type level? We can't (yet) use values in types, but modern GHC features a very rich kind language, allowing us to use a wide range of things other than concrete types as type parameters.
Higher-Kinded Types
Type constructors in Haskell have always admitted non-* parameters. For example, the encoding of the fixed point of a functor works in plain old Haskell 98:
newtype Fix f = Fix { unFix :: f (Fix f) }
ghci> :k Fix
Fix :: (* -> *) -> *
Fix is parameterised by a functor of kind * -> *, not a type of kind *.
Beyond * and ->
The DataKinds extension enriches GHC's kind system with user-declared kinds, so kinds may be built of pieces other than * and ->. It works by promoting all data declarations to the kind level. That is to say, a data declaration like
data Nat = Z | S Nat -- natural numbers
introduces a kind Nat and type constructors Z :: Nat and S :: Nat -> Nat, as well as the usual type and value constructors. This allows you to write datatypes parameterised by type-level data, such as the customary vector type, which is a linked list indexed by its length.
data Vec n a where
Nil :: Vec Z a
(:>) :: a -> Vec n a -> Vec (S n) a
ghci> :k Vec
Vec :: Nat -> * -> *
There's a related extension called ConstraintKinds, which frees constraints like Ord a from the yoke of the "fat arrow" =>, allowing them to roam across the landscape of the type system as nature intended. Kmett has used this power to build a category of constraints, with the newtype (:-) :: Constraint -> Constraint -> * denoting "entailment": a value of type c :- d is a proof that if c holds then d also holds. For example, we can prove that Ord a implies Eq [a] for all a:
ordToEqList :: Ord a :- Eq [a]
ordToEqList = Sub Dict
Life after forall
However, Haskell currently maintains a strict separation between the type level and the value level. Things at the type level are always erased before the program runs, (almost) always inferrable, invisible in expressions, and (dependently) quantified by forall. If your application requires something more flexible, such as dependent quantification over runtime data, then you have to manually simulate it using a singleton encoding.
For example, the specification of split says it chops a vector at a certain length according to its (runtime!) argument. The type of the output vector depends on the value of split's argument. We'd like to write this...
split :: (n :: Nat) -> Vec (n :+: m) a -> (Vec n a, Vec m a)
... where I'm using the type function (:+:) :: Nat -> Nat -> Nat, which stands for addition of type-level naturals, to ensure that the input vector is at least as long as n...
type family n :+: m where
Z :+: m = m
S n :+: m = S (n :+: m)
... but Haskell won't allow that declaration of split! There aren't any values of type Z or S n; only types of kind * contain values. We can't access n at runtime directly, but we can use a GADT which we can pattern-match on to learn what the type-level n is:
data Natty n where
Zy :: Natty Z
Sy :: Natty n -> Natty (S n)
ghci> :k Natty
Natty :: Nat -> *
Natty is called a singleton, because for a given (well-defined) n there is only one (well-defined) value of type Natty n. We can use Natty n as a run-time stand-in for n.
split :: Natty n -> Vec (n :+: m) a -> (Vec n a, Vec m a)
split Zy xs = (Nil, xs)
split (Sy n) (x :> xs) =
let (ys, zs) = split n xs
in (x :> ys, zs)
Anyway, the point is that values - runtime data - can't appear in types. It's pretty tedious to duplicate the definition of Nat in singleton form (and things get worse if you want the compiler to infer such values); dependently-typed languages like Agda, Idris, or a future Haskell escape the tyranny of strictly separating types from values and give us a range of expressive quantifiers. You're able to use an honest-to-goodness Nat as split's runtime argument and mention its value dependently in the return type.
#pigworker has written extensively about the unsuitability of Haskell's strict separation between types and values for modern dependently-typed programming. See, for example, the Hasochism paper, or his talk on the unexamined assumptions that have been drummed into us by four decades of Hindley-Milner-style programming.
Dependent Kinds
Finally, for what it's worth, with TypeInType modern GHC unifies types and kinds, allowing us to talk about kind variables using the same tools that we use to talk about type variables. In a previous post about session types I made use of TypeInType to define a kind for tagged type-level sequences of types:
infixr 5 :!, :?
data Session = Type :! Session -- Type is a synonym for *
| Type :? Session
| E
I'd recommend #Benjamin Hodgson's answer and the references he gives to see how to make this sort of thing useful. But, to answer your question more directly, using several extensions (DataKinds, KindSignatures, and GADTs), you can define types that are parameterized on (certain) concrete types.
For example, here's one parameterized on the concrete Bool datatype:
{-# LANGUAGE DataKinds, KindSignatures, GADTs #-}
{-# LANGUAGE FlexibleInstances #-}
module FlaggedType where
-- The single quotes below are optional. They serve to notify
-- GHC that we are using the type-level constructors lifted from
-- data constructors rather than types of the same name (and are
-- only necessary where there's some kind of ambiguity otherwise).
data Flagged :: Bool -> * -> * where
Truish :: a -> Flagged 'True a
Falsish :: a -> Flagged 'False a
-- separate instances, just as if they were different types
-- (which they are)
instance (Show a) => Show (Flagged 'False a) where
show (Falsish x) = show x
instance (Show a) => Show (Flagged 'True a) where
show (Truish x) = show x ++ "*"
-- these lists have types as indicated
x = [Truish 1, Truish 2, Truish 3] -- :: Flagged 'True Integer
y = [Falsish "a", Falsish "b", Falsish "c"] -- :: Flagged 'False String
-- this won't typecheck: it's just like [1,2,"abc"]
z = [Truish 1, Truish 2, Falsish 3] -- won't typecheck
Note that this isn't much different from defining two completely separate types:
data FlaggedTrue a = Truish a
data FlaggedFalse a = Falsish a
In fact, I'm hard pressed to think of any advantage Flagged has over defining two separate types, except if you have a bar bet with someone that you can write useful Haskell code without type classes. For example, you can write:
getInt :: Flagged a Int -> Int
getInt (Truish z) = z -- same polymorphic function...
getInt (Falsish z) = z -- ...defined on two separate types
Maybe someone else can think of some other advantages.
Anyway, I believe that parameterizing types with concrete values really only becomes useful when the concrete type is sufficient "rich" that you can use it to leverage the type checker, as in Benjamin's examples.
As #user2407038 noted, most interesting primitive types, like Ints, Chars, Strings and so on can't be used this way. Interestingly enough, though, you can use literal positive integers and strings as type parameters, but they are treated as Nats and Symbols (as defined in GHC.TypeLits) respectively.
So something like this is possible:
import GHC.TypeLits
data Tagged :: Symbol -> Nat -> * -> * where
One :: a -> Tagged "one" 1 a
Two :: a -> Tagged "two" 2 a
Three :: a -> Tagged "three" 3 a
Look at using Generalized Algebraic Data Types (GADTS), which enable you to define concrete outputs based on input type, e.g.
data CustomMaybe a where
MaybeChar :: Maybe a -> CustomMaybe Char
MaybeString :: Maybe a > CustomMaybe String
MaybeBool :: Maybe a -> CustomMaybe Bool
exampleFunction :: CustomMaybe a -> a
exampleFunction (MaybeChar maybe) = 'e'
exampleFunction (MaybeString maybe) = True //Compile error
main = do
print $ exampleFunction (MaybeChar $ Just 10)
To a similar effect, RankNTypes can allow the implementation of similar behaviour:
exampleFunctionOne :: a -> a
exampleFunctionOne el = el
type PolyType = forall a. a -> a
exampleFuntionTwo :: PolyType -> Int
exampleFunctionTwo func = func 20
exampleFunctionTwo func = func "Hello" --Compiler error, PolyType being forced to return 'Int'
main = do
print $ exampleFunctionTwo exampleFunctionOne
The PolyType definition allows you to insert the polymorphic function within exampleFunctionTwo and force its output to be 'Int'.
No. Haskell doesn't have dependent types (yet). See https://typesandkinds.wordpress.com/2016/07/24/dependent-types-in-haskell-progress-report/ for some discussion of when it may.
In the meantime, you can get behavior like this in Agda, Idris, and Cayenne.

When are type signatures necessary in Haskell?

Many introductory texts will tell you that in Haskell type signatures are "almost always" optional. Can anybody quantify the "almost" part?
As far as I can tell, the only time you need an explicit signature is to disambiguate type classes. (The canonical example being read . show.) Are there other cases I haven't thought of, or is this it?
(I'm aware that if you go beyond Haskell 2010 there are plenty for exceptions. For example, GHC will never infer rank-N types. But rank-N types are a language extension, not part of the official standard [yet].)
Polymorphic recursion needs type annotations, in general.
f :: (a -> a) -> (a -> b) -> Int -> a -> b
f f1 g n x =
if n == (0 :: Int)
then g x
else f f1 (\z h -> g (h z)) (n-1) x f1
(Credit: Patrick Cousot)
Note how the recursive call looks badly typed (!): it calls itself with five arguments, despite f having only four! Then remember that b can be instantiated with c -> d, which causes an extra argument to appear.
The above contrived example computes
f f1 g n x = g (f1 (f1 (f1 ... (f1 x))))
where f1 is applied n times. Of course, there is a much simpler way to write an equivalent program.
Monomorphism restriction
If you have MonomorphismRestriction enabled, then sometimes you will need to add a type signature to get the most general type:
{-# LANGUAGE MonomorphismRestriction #-}
-- myPrint :: Show a => a -> IO ()
myPrint = print
main = do
myPrint ()
myPrint "hello"
This will fail because myPrint is monomorphic. You would need to uncomment the type signature to make it work, or disable MonomorphismRestriction.
Phantom constraints
When you put a polymorphic value with a constraint into a tuple, the tuple itself becomes polymorphic and has the same constraint:
myValue :: Read a => a
myValue = read "0"
myTuple :: Read a => (a, String)
myTuple = (myValue, "hello")
We know that the constraint affects the first part of the tuple but does not affect the second part. The type system doesn't know that, unfortunately, and will complain if you try to do this:
myString = snd myTuple
Even though intuitively one would expect myString to be just a String, the type checker needs to specialize the type variable a and figure out whether the constraint is actually satisfied. In order to make this expression work, one would need to annotate the type of either snd or myTuple:
myString = snd (myTuple :: ((), String))
In Haskell, as I'm sure you know, types are inferred. In other words, the compiler works out what type you want.
However, in Haskell, there are also polymorphic typeclasses, with functions that act in different ways depending on the return type. Here's an example of the Monad class, though I haven't defined everything:
class Monad m where
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
fail :: String -> m a
We're given a lot of functions with just type signatures. Our job is to make instance declarations for different types that can be treated as Monads, like Maybe t or [t].
Have a look at this code - it won't work in the way we might expect:
return 7
That's a function from the Monad class, but because there's more than one Monad, we have to specify what return value/type we want, or it automatically becomes an IO Monad. So:
return 7 :: Maybe Int
-- Will return...
Just 7
return 6 :: [Int]
-- Will return...
[6]
This is because [t] and Maybe have both been defined in the Monad type class.
Here's another example, this time from the random typeclass. This code throws an error:
random (mkStdGen 100)
Because random returns something in the Random class, we'll have to define what type we want to return, with a StdGen object tupelo with whatever value we want:
random (mkStdGen 100) :: (Int, StdGen)
-- Returns...
(-3650871090684229393,693699796 2103410263)
random (mkStdGen 100) :: (Bool, StdGen)
-- Returns...
(True,4041414 40692)
This can all be found at learn you a Haskell online, though you'll have to do some long reading. This, I'm pretty much 100% certain, it the only time when types are necessary.

Resources