Type-Level Peano Numbers and UndecidableInstances

Following this tutorial, I have the following code:
{-# LANGUAGE DataKinds, TypeFamilies #-}
data Nat = Z | S Nat
type family Plus (n :: Nat) (m :: Nat) :: Nat
type instance Plus Z m = m
type instance Plus (S n) m = S (Plus n m)
So far so good. I know that DataKinds automatically promotes types to kinds, and TypeFamilies enables type-level functions.
Then I add:
type family Mul (m :: Nat) (n :: Nat) :: Nat
type instance Mul Z m = Z
type instance Mul (S n) m = Plus m (Mul n m)
Compiling gives me:
Nested type family application
in the type family application: Plus m (Mul n m)
(Use UndecidableInstances to permit this)
In the type instance declaration for ‘Mul’
The name of this extension is scary. Some people say to avoid it, and the explanations of it that I've tried reading don't make sense to me.
I'm looking for a bit of help in understanding the error: why it shows up in this example, what "undecidable" means in this context, and in which cases this extension is as scary as it sounds.

It's not really scary. It disables the (rather weak) GHC checks that try to ascertain that type family and instance definitions terminate. So, its worst case behavior is non-terminating type checking. However, most often we get error messages instead of actual looping, because loops often exceed stack limits in the type checker. We don't get unsafety, incoherence or any other property that may adversely affect program robustness or inference.
"Undecidable" is probably a stronger expression than warranted, since it only denotes that GHC can't decide whether a definition is terminating. In many cases it's GHC that's too weak for the job, and other compilers for dependent languages would be able to check termination without problem. With GHC, UndecidableInstances is often required for obviously terminating definitions too, in which cases its usage is quite justified.

UndecidableInstances is OK to use. Granted, you should always consider for a moment whether it is really needed, or whether it would be more idiomatic to choose another path.
Luke makes the point that UndecidableInstances should not be used to implement superclassing. Well, that's true, but it doesn't really have much to do with undecidable instances – superclassing is just always a hack and pretty much backwards to the way abstractions should be designed.
It is IME not true, at least not anymore, that superclassing is the most common use of UndecidableInstances. There are plenty of things you can do with this extension enabled but not without it. And it certainly isn't possible to impose a proof-of-halting requirement if you actually want to do Turing-complete dependent typing at compile time, which is basically where all the DataKinds fuss is heading.
So, don't worry too much; indeed you probably won't get around UndecidableInstances. As said by András Kovács, this extension is perfectly safe WRT correctness, the worst that can happen is a loop / error.
If that's too scary for your tastes, use a language that's designed up front as dependently-typed and total, e.g. Agda.

Related

Programmatic proofs for monad laws

In Haskell, users do not have to prove that their monads satisfy the monad laws.
return a >>= k = k a
m >>= return = m
m >>= (\x -> k x >>= h) = (m >>= k) >>= h
If I understand correctly, even if they wanted to, there's no way the compiler could check such a proof.
Questions
What technology is missing from Haskell for there to be a way to write such a proof so that the compiler can check it?
Which functional language supports such functionality (i.e. can check whether a claimed monad satisfies the monad laws)?
In practice, laws usually aren't proven in Haskell, but they may very well be tested. If you throw lots of random inputs at the expressions on both sides of the equation for your monad, and the result always comes out the same on both sides, that doesn't guarantee anything, but it does make it quite likely that any law-violating behaviour would be caught, provided you generate the inputs in a sufficiently representative way. QuickCheck is usually pretty good at this.
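As a minimal sketch of such a test suite (assuming the QuickCheck package; the property names and the specialisation to Maybe Int are mine), with Fun generating random Kleisli arrows:
import Test.QuickCheck

prop_leftIdentity :: Int -> Fun Int (Maybe Int) -> Bool
prop_leftIdentity a k = (return a >>= applyFun k) == applyFun k a

prop_rightIdentity :: Maybe Int -> Bool
prop_rightIdentity m = (m >>= return) == m

prop_assoc :: Maybe Int -> Fun Int (Maybe Int) -> Fun Int (Maybe Int) -> Bool
prop_assoc m k h =
  (m >>= (\x -> applyFun k x >>= applyFun h)) == ((m >>= applyFun k) >>= applyFun h)

main :: IO ()
main = do
  quickCheck prop_leftIdentity
  quickCheck prop_rightIdentity
  quickCheck prop_assoc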
If you do want to prove laws then, well, Haskell isn't really the right tool. You'd want the proof to be checked at compile time, but Haskell makes it rather difficult to express complicated values at the type level. If you do it at runtime instead, then first of all: no good if the deployed executable crashes because of a mistake. But more importantly, since Haskell isn't total you could “prove” any proposition by just giving undefined as the result – or some other ⊥ value; more typically this might be some subtle infinite loop.
The right tool is a dependently typed language. The most popular are Coq and Lean, which resemble ML more than Haskell, and Agda. These are primarily intended as proof assistants; Idris goes further in the direction of a general programming language in which you can also formulate theorems.
All that said, modern Haskell is now also somewhat capable of dependently-typed programming. The key technique is to express your functions as type families, use singletons to get value-level stand-ins for type-level values, and then use either GADTs or constraint-CPS to pass the proofs around.
It's still really awkward to use this to specify laws for a type class, but it can be used quite nicely to Curry-Howard-express concrete theorems. The singletons-base package contains a lot of standard functions in type-lifted form, thus suitable for proving stuff about. For example, here's how you could formulate that the list concatenation operator is associative:
{-# LANGUAGE TypeFamilies, DataKinds, KindSignatures, PolyKinds,
             TypeOperators, GADTs, RankNTypes, UnicodeSyntax #-}
import Data.Singletons
import Data.List.Singletons
listConcatAssoc :: ∀ k l m ρ . Sing k -> Sing l -> Sing m
    -> (((k++l)++m ~ k++(l++m)) => ρ) -> ρ
listConcatAssoc SNil SNil SNil φ = φ
...
The complete proof will be quite annoying to write, but TBH proofs are annoying to write even in Coq, though that is specifically its job. Coq does make it a lot nicer to really express typeclasses with laws etc., though.
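If singletons-base feels heavyweight, concrete theorems can also be stated with plain GADTs and Data.Type.Equality. Here is a self-contained sketch (reusing the Peano Nat and Plus from the first question) proving type-level addition associative; Refl only typechecks when both sides reduce to the same type, so GHC really is checking the proof:
{-# LANGUAGE DataKinds, GADTs, TypeFamilies, TypeOperators #-}
import Data.Type.Equality ((:~:)(Refl))

data Nat = Z | S Nat

type family Plus (n :: Nat) (m :: Nat) :: Nat where
  Plus Z m = m
  Plus (S n) m = S (Plus n m)

-- Run-time witnesses for type-level Nats, so we can recurse on the first one.
data Natty (n :: Nat) where
  Zy :: Natty Z
  Sy :: Natty n -> Natty (S n)

plusAssoc :: Natty a -> Natty b -> Natty c
          -> Plus (Plus a b) c :~: Plus a (Plus b c)
plusAssoc Zy _ _ = Refl
plusAssoc (Sy a) b c = case plusAssoc a b c of Refl -> Refl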

Avoiding class constraint on type level naturals

I am using the Nat type from the GHC.TypeLits module, which admittedly says the programmer interface should be defined in a separate library. In any case, GHC.TypeLits has a class KnownNat with a class method natVal, which converts a compile-time Nat into a runtime Integer. There's also a type function (+) which adds compile-time Nats.
The problem is that given (KnownNat n1, KnownNat n2), GHC can't derive KnownNat (n1 + n2).
This results in an explosion of constraints that need to be added whenever I work with type-level naturals.
One alternative would be to define Natural numbers myself like so:
data Nat = Zero | Succ Nat
Or perhaps use a library like type-natural. But it seems silly not to use the Nats which are built into GHC, which also let you use literal numbers in types (e.g. 0, 1) instead of having to define:
type N0 = Zero
type N1 = Succ N0
etc...
Is there any way around this issue of GHC KnownNat constraints being required all over the place? Or should I just ignore the GHC.TypeLits module for my problem?
This GHC type checker plugin does exactly that (derives "complex" KnownNat constraints from other ones already available): https://hackage.haskell.org/package/ghc-typelits-knownnat
If "type checker plugin" sounds a little intimidating (it did to me at first), it's actually very simple to use. Simply add it as a dependency in your package file (or cabal install it) like any other package, then either add:
{-# OPTIONS_GHC -fplugin GHC.TypeLits.KnownNat.Solver #-}
to the start of your source files (much like a LANGUAGE pragma), or add it as an option globally in your package file.
There's also another plugin by the same author that's very useful as well for working with typelit Nats: https://hackage.haskell.org/package/ghc-typelits-natnormalise. This one is able to infer equality of Nat type expressions that GHC on its own gives up on: things like n + (m + 1) ~ (n + 1) + m that come up all the time when GHC is trying to prove "expected" and "actual" types match.
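A sketch of what the first plugin buys you (the function is a made-up example): without the plugin line, GHC insists on an explicit KnownNat (n + m) constraint; with it, that constraint is derived from the two already in scope.
{-# OPTIONS_GHC -fplugin GHC.TypeLits.KnownNat.Solver #-}
{-# LANGUAGE DataKinds, ScopedTypeVariables, TypeOperators #-}
import Data.Proxy (Proxy (..))
import GHC.TypeLits

-- natVal needs KnownNat (n + m); the plugin derives it from n and m.
addNats :: forall n m. (KnownNat n, KnownNat m) => Proxy n -> Proxy m -> Integer
addNats _ _ = natVal (Proxy :: Proxy (n + m))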

Why not be dependently typed?

I have seen several sources echo the opinion that "Haskell is gradually becoming a dependently-typed language". The implication seems to be that with more and more language extensions, Haskell is drifting in that general direction, but isn't there yet.
There are basically two things I would like to know. The first is, quite simply, what does "being a dependently-typed language" actually mean? (Hopefully without being too technical about it.)
The second question is... what's the drawback? I mean, people know we're heading that way, so there must be some advantage to it. And yet, we're not there yet, so there must be some downside stopping people going all the way. I get the impression that the problem is a steep increase in complexity. But, not really understanding what dependent typing is, I don't know for sure.
What I do know is that every time I start reading about a dependently-typed programming language, the text is utterly incomprehensible... Presumably that's the problem. (?)
Dependently Typed Haskell, Now?
Haskell is, to a small extent, a dependently typed language. There is a notion of type-level data, now more sensibly typed thanks to DataKinds, and there is some means (GADTs) to give a run-time
representation to type-level data. Hence, values of run-time stuff effectively show up in types, which is what it means for a language to be dependently typed.
Simple datatypes are promoted to the kind level, so that the values
they contain can be used in types. Hence the archetypal example
data Nat = Z | S Nat
data Vec :: Nat -> * -> * where
  VNil :: Vec Z x
  VCons :: x -> Vec n x -> Vec (S n) x
becomes possible, and with it, definitions such as
vApply :: Vec n (s -> t) -> Vec n s -> Vec n t
vApply VNil VNil = VNil
vApply (VCons f fs) (VCons s ss) = VCons (f s) (vApply fs ss)
which is nice. Note that the length n is a purely static thing in
that function, ensuring that the input and output vectors have the
same length, even though that length plays no role in the execution of
vApply. By contrast, it's much trickier (i.e., impossible) to
implement the function which makes n copies of a given x (which
would be the pure to vApply's <*>)
vReplicate :: x -> Vec n x
because it's vital to know how many copies to make at run-time. Enter
singletons.
data Natty :: Nat -> * where
  Zy :: Natty Z
  Sy :: Natty n -> Natty (S n)
For any promotable type, we can build the singleton family, indexed
over the promoted type, inhabited by run-time duplicates of its
values. Natty n is the type of run-time copies of the type-level n
:: Nat. We can now write
vReplicate :: Natty n -> x -> Vec n x
vReplicate Zy x = VNil
vReplicate (Sy n) x = VCons x (vReplicate n x)
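A quick (hypothetical) GHCi check, assuming a Show instance for Vec, shows the singleton pinning down the length in the result type:
-- ghci> vReplicate (Sy (Sy Zy)) 'x'
-- VCons 'x' (VCons 'x' VNil)
-- ghci> :type vReplicate (Sy (Sy Zy)) 'x'
-- vReplicate (Sy (Sy Zy)) 'x' :: Vec ('S ('S 'Z)) Char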
So there you have a type-level value yoked to a run-time value:
inspecting the run-time copy refines static knowledge of the
type-level value. Even though terms and types are separated, we can
work in a dependently typed way by using the singleton construction as
a kind of epoxy resin, creating bonds between the phases. That's a
long way from allowing arbitrary run-time expressions in types, but it ain't nothing.
What's Nasty? What's Missing?
Let's put a bit of pressure on this technology and see what starts
wobbling. We might get the idea that singletons should be manageable a
bit more implicitly
class Nattily (n :: Nat) where
  natty :: Natty n
instance Nattily Z where
  natty = Zy
instance Nattily n => Nattily (S n) where
  natty = Sy natty
allowing us to write, say,
instance Nattily n => Applicative (Vec n) where
  pure = vReplicate natty
  (<*>) = vApply
That works, but it now means that our original Nat type has spawned
three copies: a kind, a singleton family and a singleton class. We
have a rather clunky process for exchanging explicit Natty n values
and Nattily n dictionaries. Moreover, Natty is not Nat: we have
some sort of dependency on run-time values, but not at the type we
first thought of. No fully dependently typed language makes dependent
types this complicated!
Meanwhile, although Nat can be promoted, Vec cannot. You can't
index by an indexed type. Full on dependently typed languages impose
no such restriction, and in my career as a dependently typed show-off,
I've learned to include examples of two-layer indexing in my talks,
just to teach folks who've made one-layer indexing
difficult-but-possible not to expect me to fold up like a house of
cards. What's the problem? Equality. GADTs work by translating the
constraints you achieve implicitly when you give a constructor a
specific return type into explicit equational demands. Like this.
data Vec (n :: Nat) (x :: *)
  = n ~ Z => VNil
  | forall m. n ~ S m => VCons x (Vec m x)
In each of our two equations, both sides have kind Nat.
Now try the same translation for something indexed over vectors.
data InVec :: x -> Vec n x -> * where
  Here :: InVec z (VCons z zs)
  After :: InVec z ys -> InVec z (VCons y ys)
becomes
data InVec (a :: x) (as :: Vec n x)
  = forall m z (zs :: Vec m x). (n ~ S m, as ~ VCons z zs) => Here
  | forall m y z (ys :: Vec m x). (n ~ S m, as ~ VCons y ys) => After (InVec z ys)
and now we form equational constraints between as :: Vec n x and
VCons z zs :: Vec (S m) x where the two sides have syntactically
distinct (but provably equal) kinds. GHC core is not currently
equipped for such a concept!
What else is missing? Well, most of Haskell is missing from the type
level. The language of terms which you can promote has just variables
and non-GADT constructors, really. Once you have those, the type family machinery allows you to write type-level programs: some of
those might be quite like functions you would consider writing at the
term level (e.g., equipping Nat with addition, so you can give a
good type to append for Vec), but that's just a coincidence!
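For instance, here is a sketch of that append, with the Plus family from the first question giving the length of the result:
vAppend :: Vec m x -> Vec n x -> Vec (Plus m n) x
vAppend VNil ys = ys
vAppend (VCons x xs) ys = VCons x (vAppend xs ys)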
Another thing missing, in practice, is a library which makes
use of our new abilities to index types by values. What do Functor
and Monad become in this brave new world? I'm thinking about it, but
there's a lot still to do.
Running Type-Level Programs
Haskell, like most dependently typed programming languages, has two
operational semanticses. There's the way the run-time system runs
programs (closed expressions only, after type erasure, highly
optimised) and then there's the way the typechecker runs programs
(your type families, your "type class Prolog", with open expressions). For Haskell, you don't normally mix
the two up, because the programs being executed are in different
languages. Dependently typed languages have separate run-time and
static execution models for the same language of programs, but don't
worry, the run-time model still lets you do type erasure and, indeed,
proof erasure: that's what Coq's extraction mechanism gives you;
that's at least what Edwin Brady's compiler does (although Edwin
erases unnecessarily duplicated values, as well as types and
proofs). The phase distinction may not be a distinction of syntactic category
any longer, but it's alive and well.
Dependently typed languages, being total, allow the typechecker to run
programs free from the fear of anything worse than a long wait. As
Haskell becomes more dependently typed, we face the question of what
its static execution model should be. One approach might be to
restrict static execution to total functions, which would allow us the
same freedom to run, but might force us to make distinctions (at least
for type-level code) between data and codata, so that we can tell
whether to enforce termination or productivity. But that's not the only
approach. We are free to choose a much weaker execution model which is
reluctant to run programs, at the cost of making fewer equations come
out just by computation. And in effect, that's what GHC actually
does. The typing rules for GHC core make no mention of running
programs, but only of checking evidence for equations. When
translating to the core, GHC's constraint solver tries to run your type-level programs,
generating a little silvery trail of evidence that a given expression
equals its normal form. This evidence-generation method is a little
unpredictable and inevitably incomplete: it fights shy of
scary-looking recursion, for example, and that's probably wise. One
thing we don't need to worry about is the execution of IO
computations in the typechecker: remember that the typechecker doesn't have to give
launchMissiles the same meaning that the run-time system does!
Hindley-Milner Culture
The Hindley-Milner type system achieves the truly awesome coincidence
of four distinct distinctions, with the unfortunate cultural
side-effect that many people cannot see the distinction between the
distinctions and assume the coincidence is inevitable! What am I
talking about?
terms vs types
explicitly written things vs implicitly written things
presence at run-time vs erasure before run-time
non-dependent abstraction vs dependent quantification
We're used to writing terms and leaving types to be inferred...and
then erased. We're used to quantifying over type variables with the
corresponding type abstraction and application happening silently and
statically.
You don't have to veer too far from vanilla Hindley-Milner
before these distinctions come out of alignment, and that's no bad thing. For a start, we can have more interesting types if we're willing to write them in a few
places. Meanwhile, we don't have to write type class dictionaries when
we use overloaded functions, but those dictionaries are certainly
present (or inlined) at run-time. In dependently typed languages, we
expect to erase more than just types at run-time, but (as with type
classes) that some implicitly inferred values will not be
erased. E.g., vReplicate's numeric argument is often inferable from the type of the desired vector, but we still need to know it at run-time.
Which language design choices should we review because these
coincidences no longer hold? E.g., is it right that Haskell provides
no way to instantiate a forall x. t quantifier explicitly? If the
typechecker can't guess x by unifying t, we have no other way to
say what x must be.
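As it happens, GHC's TypeApplications extension (GHC 8.0+) provides exactly this kind of explicit instantiation; a small sketch:
{-# LANGUAGE TypeApplications #-}
-- read :: forall a. Read a => String -> a; the @Int instantiates the
-- forall visibly instead of leaving the unifier to guess it.
answer :: Int
answer = read @Int "42"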
More broadly, we cannot treat "type inference" as a monolithic concept
that we have either all or nothing of. For a start, we need to split
off the "generalisation" aspect (Milner's "let" rule), which relies heavily on
restricting which types exist to ensure that a stupid machine can
guess one, from the "specialisation" aspect (Milner's "var" rule)
which is as effective as your constraint solver. We can expect that
top-level types will become harder to infer, but that internal type
information will remain fairly easy to propagate.
Next Steps For Haskell
We're seeing the type and kind levels grow very similar (and they
already share an internal representation in GHC). We might as well
merge them. It would be fun to take * :: * if we can: we lost
logical soundness long ago, when we allowed bottom, but type
soundness is usually a weaker requirement. We must check. If we must have
distinct type, kind, etc levels, we can at least make sure everything
at the type level and above can always be promoted. It would be great
just to re-use the polymorphism we already have for types, rather than
re-inventing polymorphism at the kind level.
We should simplify and generalise the current system of constraints by
allowing heterogeneous equations a ~ b where the kinds of a and
b are not syntactically identical (but can be proven equal). It's an
old technique (in my thesis, last century) which makes dependency much
easier to cope with. We'd be able to express constraints on
expressions in GADTs, and thus relax restrictions on what can be
promoted.
We should eliminate the need for the singleton construction by
introducing a dependent function type, pi x :: s -> t. A function
with such a type could be applied explicitly to any expression of type s which
lives in the intersection of the type and term languages (so,
variables, constructors, with more to come later). The corresponding
lambda and application would not be erased at run-time, so we'd be
able to write
vReplicate :: pi n :: Nat -> x -> Vec n x
vReplicate Z x = VNil
vReplicate (S n) x = VCons x (vReplicate n x)
without replacing Nat by Natty. The domain of pi can be any
promotable type, so if GADTs can be promoted, we can write dependent
quantifier sequences (or "telescopes" as de Bruijn called them)
pi n :: Nat -> pi xs :: Vec n x -> ...
to whatever length we need.
The point of these steps is to eliminate complexity by working directly with more general tools, instead of making do with weak tools and clunky encodings. The current partial buy-in makes the benefits of Haskell's sort-of dependent types more expensive than they need to be.
Too Hard?
Dependent types make a lot of people nervous. They make me nervous,
but I like being nervous, or at least I find it hard not to be nervous
anyway. But it doesn't help that there's quite such a fog of ignorance
around the topic. Some of that's due to the fact that we all still
have a lot to learn. But proponents of less radical approaches have
been known to stoke fear of dependent types without always making sure
the facts are wholly with them. I won't name names. The "undecidable typechecking", "Turing incomplete", "no phase distinction", "no type erasure", "proofs everywhere", etc., myths persist, even though they're rubbish.
It's certainly not the case that dependently typed programs must
always be proven correct. One can improve the basic hygiene of one's
programs, enforcing additional invariants in types without going all
the way to a full specification. Small steps in this direction quite
often result in much stronger guarantees with few or no additional
proof obligations. It is not true that dependently typed programs are
inevitably full of proofs, indeed I usually take the presence of any
proofs in my code as the cue to question my definitions.
For, as with any increase in articulacy, we become free to say foul
new things as well as fair. E.g., there are plenty of crummy ways to
define binary search trees, but that doesn't mean there isn't a good way. It's important not to presume that bad experiences cannot be
bettered, even if it dents the ego to admit it. Design of dependent
definitions is a new skill which takes learning, and being a Haskell
programmer does not automatically make you an expert! And even if some
programs are foul, why would you deny others the freedom to be fair?
Why Still Bother With Haskell?
I really enjoy dependent types, but most of my hacking projects are
still in Haskell. Why? Haskell has type classes. Haskell has useful
libraries. Haskell has a workable (although far from ideal) treatment
of programming with effects. Haskell has an industrial strength
compiler. The dependently typed languages are at a much earlier stage
in growing community and infrastructure, but we'll get there, with a
real generational shift in what's possible, e.g., by way of
metaprogramming and datatype generics. But you just have to look
around at what people are doing as a result of Haskell's steps towards
dependent types to see that there's a lot of benefit to be gained by
pushing the present generation of languages forwards, too.
Dependent typing is really just the unification of the value and type levels, so you can parametrize values on types (already possible with type classes and parametric polymorphism in Haskell) and you can parametrize types on values (not, strictly speaking, possible yet in Haskell, although DataKinds gets very close).
Edit: Apparently, from this point forward, I was wrong (see pigworker's comment). I'll preserve the rest of this as a record of the myths I've been fed. :P
The issue with moving to full dependent typing, from what I've heard, is that it would break the phase restriction between the type and value levels that allows Haskell to be compiled to efficient machine code with erased types. With our current level of technology, a dependently typed language must go through an interpreter at some point (either immediately, or after being compiled to dependently-typed bytecode or similar).
This is not necessarily a fundamental restriction, but I'm not personally aware of any current research that looks promising in this regard but that has not already made it into GHC. If anyone else knows more, I would be happy to be corrected.
John, that's another common misconception about dependent types: that they don't work when data is only available at run-time. Here's how you can do the getLine example:
data Some :: (k -> *) -> * where
  Like :: p x -> Some p
fromInt :: Int -> Some Natty
fromInt 0 = Like Zy
fromInt n = case fromInt (n - 1) of
  Like m -> Like (Sy m)
withZeroes :: (forall n. Vec n Int -> IO a) -> IO a
withZeroes k = do
  Like n <- fmap (fromInt . read) getLine
  k (vReplicate n 0)
*Main> withZeroes print
5
VCons 0 (VCons 0 (VCons 0 (VCons 0 (VCons 0 VNil))))
Edit: Hm, that was supposed to be a comment to pigworker's answer. I clearly fail at SO.
pigworker gives an excellent discussion of why we should be headed towards dependent types: (a) they're awesome; (b) they would actually simplify a lot of what Haskell already does.
As for the "why not?" question, there are a couple points I think. The first point is that while the basic notion behind dependent types is easy (allow types to depend on values), the ramifications of that basic notion are both subtle and profound. For example, the distinction between values and types is still alive and well; but discussing the difference between them becomes far more nuanced than in yer Hindley--Milner or System F. To some extent this is due to the fact that dependent types are fundamentally hard (e.g., first-order logic is undecidable). But I think the bigger problem is really that we lack a good vocabulary for capturing and explaining what's going on. As more and more people learn about dependent types, we'll develop a better vocabulary and so things will become easier to understand, even if the underlying problems are still hard.
The second point has to do with the fact that Haskell is growing towards dependent types. Because we're making incremental progress towards that goal, but without actually making it there, we're stuck with a language that has incremental patches on top of incremental patches. The same sort of thing has happened in other languages as new ideas became popular. Java didn't use to have (parametric) polymorphism; and when they finally added it, it was obviously an incremental improvement with some abstraction leaks and crippled power. Turns out, mixing subtyping and polymorphism is inherently hard; but that's not the reason why Java Generics work the way they do. They work the way they do because of the constraint to be an incremental improvement on older versions of Java. Ditto, for further back in the day when OOP was invented and people started writing "objective" C (not to be confused with Objective-C), etc. Remember, C++ started out under the guise of being a strict superset of C. Adding new paradigms always requires defining the language anew, or else ending up with some complicated mess. My point in all of this is that adding true dependent types to Haskell is going to require a certain amount of gutting and restructuring of the language, if we're going to do it right. But it's really hard to commit to that kind of an overhaul, whereas the incremental progress we've been making seems cheaper in the short term. Really, there aren't that many people who hack on GHC, but there's a goodly amount of legacy code to keep alive. This is part of the reason why there are so many spinoff languages like DDC, Cayenne, Idris, etc.

Are there type signatures which Haskell can't verify?

This paper establishes that type inference (called "typability" in the paper) in System F is undecidable. What I've never heard mentioned elsewhere is the second result of the paper, namely that "type checking" in F is also undecidable. Here the "type checking" question means: given a term t, type T and typing environment A, is the judgment A ⊢ t : T derivable? That this question is undecidable (and that it's equivalent to the question of typability) is surprising to me, because it seems intuitively like it should be an easier question to answer.
But in any case, given that Haskell is based on System F (or F-omega, even), the result about type checking would seem to suggest that there is a Haskell term t and type T such that the compiler would be unable to decide whether t :: T is valid. If that's the case, I'm curious what such a term and type are... if it's not the case, what am I misunderstanding?
Presumably comprehending the paper would lead to a constructive answer, but I'm a little out of my depth :)
Type checking can be made decidable by enriching the syntax appropriately. For example, in the paper, we have lambdas written as \x -> e; to type-check this, you must guess the type of x. However, with a suitably enriched syntax, this can be written as \x :: t -> e instead, which takes the guess-work out of the process. Similarly, in the paper, they allow type-level lambdas to be implicit; that is, if e :: t, then also e :: forall a. t. To do typechecking, you have to guess when and how many forall's to add, and when to eliminate them. As before, you can make this more deterministic by adding syntax: we add two new expression forms /\a. e and e [t], and two new typing rules that say: if e :: t, then /\a. e :: forall a. t; and if e :: forall a. t, then e [t'] :: t [t' / a] (where the brackets in t [t' / a] are substitution brackets). Then the syntax tells us when and how many foralls to add, and when to eliminate them as well.
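To make the annotated discipline concrete, here is a sketch of the two enriched forms in modern GHC surface syntax, where a pattern type signature plays the role of \x :: t -> e and visible type application plays the role of e [t]:
{-# LANGUAGE ScopedTypeVariables, TypeApplications #-}
idInt :: Int -> Int
idInt = \(x :: Int) -> x   -- \x :: t -> e: the binder carries its type

five :: Int
five = id @Int 5           -- e [t]: the forall is eliminated explicitly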
So the question is: can we go from Haskell to sufficiently-annotated System F terms? And the answer is yes, thanks to a few critical restrictions placed by the Haskell type system. The most critical is that all types are rank one*. Without going into too much detail, "rank" is related to how many times you have to go to the left of an -> constructor to find a forall.
Int -> Bool -- rank 0 (no foralls)
forall a. (a -> a) -- rank 1
(forall a. a -> a) -> (forall a. a -> a) -- rank 2
In particular, this restricts polymorphism a bit. We can't type something like this with rank one types:
foo :: (forall a. a -> a) -> (String, Bool) -- a rank-2 type
foo polymorphicId = (polymorphicId "hey", polymorphicId True)
The next most critical restriction is that type variables can only be replaced by monomorphic types. (This includes other type variables, like a, but not polymorphic types like forall a. a.) This ensures in part that type substitution preserves rank-one-ness.
It turns out that if you make these two restrictions, then not only is type inference decidable, but you also get principal (most general) types.
If we turn from Haskell to GHC, then we can talk not only about what is typable, but how the inference algorithm looks. In particular, in GHC, there are extensions that relax the above two restrictions; how does GHC do inference in that setting? Well, the answer is that it simply doesn't even try. If you want to write terms using those features, then you must add the typing annotations we talked about all the way back in paragraph one: you must explicitly annotate where foralls get introduced and eliminated. So, can we write a term that GHC's type-checker rejects? Yes, it's easy: simply use un-annotated rank-two (or higher) types or impredicativity. For example, the following doesn't type-check, even though it has an explicit type annotation and is typable with rank-two types:
{-# LANGUAGE Rank2Types #-}
foo :: (String, Bool)
foo = (\f -> (f "hey", f True)) id
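For contrast, a sketch of a version GHC does accept: give the rank-2 part its own annotated binding, so the typechecker never has to guess where the forall goes.
{-# LANGUAGE Rank2Types #-}
foo :: (String, Bool)
foo = helper id
  where
    -- The annotation tells GHC that helper's argument is polymorphic.
    helper :: (forall a. a -> a) -> (String, Bool)
    helper f = (f "hey", f True)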
* Actually, restricting to rank two is enough to make it decidable, but the algorithm for rank one types can be more efficient. Rank three types already give the programmer enough rope to make the inference problem undecidable. I'm not sure whether these facts were known at the time that the committee chose to restrict Haskell to rank-one types.
Here is an example for a type level implementation of the SKI calculus in Scala: http://michid.wordpress.com/2010/01/29/scala-type-level-encoding-of-the-ski-calculus/
The last example shows an unbounded iteration. If you do the same in Haskell (and I'm pretty sure you can), you have an example of an "untypeable expression".

Are Rank2Types/RankNTypes practical without polytype variables?

Since type variables cannot hold poly-types, it seems that with Rank*Types we cannot re-use existing functions because of their monotype restriction.
For example, we cannot use the function (.) when the intermediate type is a polytype. We are forced to re-implement (.) on the spot. This is of course trivial for (.) but a problem for more substantial bodies of code.
I also think making ((f . g) x) not equivalent to (f (g x)) is a severe blow to referential transparency and its benefits.
It seems to me to be a show-stopper issue, one that makes the Rank*Types extensions almost impractical for widespread use.
Am I missing something? Is there a plan to make Rank*Types interact better with the rest of the type-system?
EDIT: How can you make the types of (runST . forever) work out?
The most recent proposal for Rank-N types is Don's linked FPH paper. In my opinion it's also the nicest of the bunch. The main goal of all these systems is to require as few type annotations as possible. The problem is that when going from Hindley/Milner to System F we lose principal types and type inference becomes undecidable – hence the need for type annotations.
The basic idea of the "boxy types" work is to propagate type annotations as far as possible. The type checker switches between type checking and type inference mode and hopefully no more annotations are required. The downside here is that whether or not a type annotation is required is hard to explain because it depends on implementation details.
Rémy's MLF system is so far the nicest proposal; it requires the least amount of type annotations and is stable under many code transformations. The problem is that it extends the type system. The following standard example illustrates this:
choose :: forall a. a -> a -> a
id :: forall b. b -> b
choose id :: forall c. (c -> c) -> (c -> c)
choose id :: (forall c. c -> c) -> (forall c. c -> c)
Both of the above types are admissible in System F. The first one is the standard Hindley/Milner type and uses predicative instantiation; the second one uses impredicative instantiation. Neither type is more general than the other, so type inference would have to guess which one the user wants, and that is usually a bad idea.
MLF instead extends System F with bounded quantification. The principal (= most general) type for the above example would be:
choose id :: forall (a < forall b. b -> b). a -> a
You can read this as "choose id has type a to a where a must be an instance of forall b. b -> b".
Interestingly, this alone is no more powerful than standard Hindley/Milner. MLF therefore also allows rigid quantification. The following two types are equivalent:
(forall b. b -> b) -> (forall b. b -> b)
forall (a = forall b. b -> b). a -> a
Rigid quantification is introduced by type annotations and the technical details are indeed quite complicated. The upside is that MLF only needs very few type annotations and there is a simple rule for when they are needed. The downsides are:
Types can become harder to read, because the right hand side of '<' can contain further nested quantifications.
Until recently no explicitly typed variant of MLF existed. This is important for typed compiler transformations (like GHC does). Part 3 of Boris Yakobowski's PhD thesis has a first attempt at such a variant. (Parts 1 & 2 are also interesting; they describe a more intuitive representation of MLF via "Graphical Types".)
Coming back to FPH, its basic idea is to use MLF techniques internally, but to require type annotations on let bindings. If you only want the Hindley/Milner type, then no annotations are necessary. If you want a higher-rank type, you need to specify the requested type, but only at the let (or top-level) binding.
FPH (like MLF) supports impredicative instantiation, so I don't think your issue applies. It should therefore have no issue typing your f . g expression above. However, FPH hasn't been implemented in GHC yet and most likely won't be. The difficulties come from the interaction with equality coercions (and possibly type class constraints). I'm not sure what the latest status is, but I heard that SPJ wants to move away from impredicativity. All that expressive power comes at a cost, and so far no affordable and all-accompanying solution has been found.
Is there a plan to make Rank*Types interact better with the rest of the type-system?
Given how common the ST monad is, at least Rank2 types are common enough to be evidence to the contrary. However, you might look at the "sexy/boxy types" series of papers, for how approaches to making arbitrary rank polymorphism play better with others.
FPH : First-class Polymorphism for Haskell, Dimitrios Vytiniotis, Stephanie Weirich, and Simon Peyton Jones, submitted to ICFP 2008.
See also -XImpredicativeTypes -- which interestingly, is slated for deprecation!
About ImpredicativeTypes: that doesn't actually make a difference (I'm relatively sure) to peaker's question. That extension has to do with datatypes. For instance, GHC will tell you that:
Maybe :: * -> *
(forall a. a -> a) :: *
However, this is sort of a lie. It's true in an impredicative system, and in such a system, you can write:
Maybe (forall a. a -> a) :: *
and it will work fine. That is what ImpredicativeTypes enables. Without the extension, the appropriate way to think about this is:
Maybe :: *m -> *m
(forall a :: *m. a -> a) :: *p
and thus there is a kind mismatch when you try to form the application above.
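A sketch of the difference in practice (with GHC 9.2's Quick Look implementation of the extension): the application of Maybe to a polytype is accepted once the extension is on, and rejected with the kind-mismatch reading above when it's off.
{-# LANGUAGE ImpredicativeTypes, RankNTypes #-}
-- Maybe applied to a polytype: only legal impredicatively.
polyId :: Maybe (forall a. a -> a)
polyId = Just id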
GHC is fairly inconsistent on the impredicativity front, though. For instance, the type for id I gave above would be:
id :: (forall a :: *m. a -> a)
but GHC will gladly accept the annotation (with RankNTypes enabled, but not ImpredicativeTypes):
id :: (forall a. a -> a) -> (forall a. a -> a)
even though forall a. a -> a is not a monotype. So, it will allow impredicative instantiation of quantified variables that are used only with (->) if you annotate as such. But it won't do it itself, I guess, which leads to the runST $ ... problems. That used to be solved with an ad-hoc instantiation rule (the details of which I was never particularly clear on), but that rule was removed not long after it was added.
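The practical upshot for ST (a sketch): apply runST directly to a syntactic argument and everything works; trouble only starts when the polymorphic argument has to flow through a quantified variable of ($) or (.).
import Control.Monad.ST
import Data.STRef

sumST :: [Int] -> Int
sumST xs = runST (do                -- direct application: the do-block is
  ref <- newSTRef 0                 -- checked against runST's rank-2
  mapM_ (modifySTRef ref . (+)) xs  -- argument type directly
  readSTRef ref)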
