What is Haskell missing for totality checking? - haskell

A total (functional) language is one in which everything can be shown to terminate. Obviously, there are lots of places where I don't want this - throwing exceptions is sometimes handy, a web-server isn't supposed to terminate, etc. But sometimes, I would like a local totality check to enable certain optimizations. For example, if I have a provably-total function
commutativity :: forall (n :: Nat) (m :: Nat). n + m :~: m + n
commutativity = ...
then, since :~: has exactly one inhabitant (Refl), GHC could optimize
gcastWith (commutativity #n #m) someExpression
==>
someExpression
And my proof of commutativity goes from having an O(n) runtime cost to being free. So, now for my question:
What are some of the subtle difficulties in making a totality checker for Haskell?
Obviously, such a checker is conservative so whenever GHC isn't sure something is total (or is to lazy to check) it could assume it isn't... Seems to me it might not be too difficult to cobble together a not-so-smart checker that would still be very useful (at least it should be straightforward to eliminate all my arithmetic proofs). Yet, I can't seem to find any efforts to build such a thing into GHC, so obviously I'm missing some pretty big constraints. Go ahead SO, crush my dreams. :)
Relevant but not recent: Unfailing Haskell by Neil Mitchell, 2005.

Liquid Haskell has totality checking: https://github.com/ucsd-progsys/liquidhaskell#termination-check
By default a termination check is performed on all recursive functions.
Use the no-termination option to disable the check
liquid --no-termination test.hs
In recursive functions the first algebraic or integer argument should be decreasing.
The default decreasing measure for lists is length and Integers its value.
(I included screenshot and quote for posterity.)
Similar to in Agda or other languages with totality checking, the arguments to the function must become structurally smaller over time to reach the base case. Combined with a totality checker, this makes for a reliable check for a number of functions. LH also supports helping the checker along by indicating how things decrease, which you could do with an abstract data type that's opaque or from an FFI. It's really quite practical.

Related

How to make a partial function?

I was thinking about how I could save myself from undefinition, and one idea I had was to enumerate all possible sources of partiality. At least I would know what of to beware. I found three yet:
Incomplete pattern matches or guards.
Recursion. (Optionally excluding structural recursion on algebraic types.)
If a function is unsafe, any use of that function infects the user code. (Should I be saying "partiality is transitive"?)
I have heard of other ways to obtain a logical contradiction, for instance by using negative types, but I am not sure if anything of that sort applies to Haskell. There are many logical paradoxes out there, and some of them can be encoded in Haskell, but may it be true that any logical paradox requires the use of recursion, and is therefore covered by the point 2 above?
For instance, if it were proven that a Haskell expression free of recursion can always be evaluated to normal form, then the three points I give would be a complete list. I fuzzily remember seeing something like a proof of this in one of Simon Peyton Jones's books, but that was written like 30 years ago, so even if I remember correctly and it used to apply to a prototype Haskell back then, it may be false today, seeing how many a language extension we have. Possibly some of them enable other ways to undefine a program?
And then, if it were so easy to detect expressions that cannot be partial, why do we not do that? How easier would life be!
This is a partial answer (pun intended), where I'll only list a few arguably non obvious ways one can achieve non termination.
First, I'll confirm that negative-recursive types can indeed cause non termination. Indeed, it is known that allowing a recursive type such as
data R a = R (R a -> a)
allows one to define fix, and obtain non termination from there.
{-# LANGUAGE ScopedTypeVariables #-}
{-# OPTIONS -Wall #-}
data R a = R (R a -> a)
selfApply :: R a -> a
selfApply t#(R x) = x t
-- Church's fixed point combinator Y
-- fix f = (\x. f (x x))(\x. f (x x))
fix :: forall a. (a -> a) -> a
fix f = selfApply (R (\x -> f (selfApply x)))
Total languages like Coq or Agda prohibit this by requiring recursive types to use only strictly-positive recursion.
Another potential source of non-termination is that Haskell allows Type :: Type. As far as I can see, that makes it possible to encode System U in Haskell, where Girard's paradox can be used to cause a logical inconsistency, constructing a term of type Void. That term (as far as I understand) would be non terminating.
Girard's paradox is unfortunately rather complex to fully describe, and I have not completely studied it yet. I only know it is related to the so-called hypergame, a game where the first move is to choose a finite game to play. A finite game is one which causes every match to terminate after finitely many moves. The next moves after that would correspond to a match according to the chosen finite game at step one. Here's the paradox: since the chosen game must be finite, no matter what it is, the whole hypergame match will always terminate after a finite amount of moves. This makes hypergame itself a finite game, making the infinite sequence of moves "I choose hypergame, I choose hypergame, ..." a valid play, in turn proving that hypergame is not finite.
Apparently, this argument can be encoded in a rich enough pure type system like System U, and Type :: Type allows to embed the same argument.

Getting value out of context or leaving them in?

Assume code below. Is there a quicker way to get the contextual values out of findSerial rather than writing a function like outOfContext?
The underlying question is: does one usually stick within context and use Functors, Applicatives, Monoids and Monads to get the job done, or is it better to take it out of context and apply the usual non-contextual computation methods. In brief: don't want to learn Haskell all wrong, since it takes time enough as it does.
import qualified Data.Map as Map
type SerialNumber = (String, Int)
serialList :: Map.Map String SerialNumber
serialList = Map.fromList [("belt drive",("BD",0001))
,("chain drive",("CD",0002))
,("drive pulley",("DP",0003))
,("drive sprocket",("DS",0004))
]
findSerial :: Ord k => k -> Map.Map k a -> Maybe a
findSerial input = Map.lookup input
outOfContext (Just (a, b)) = (a, b)
Assuming I understand it correctly, I think your question essentially boils down to “Is it idiomatic in Haskell to write and use partial functions?” (which your outOfContext function is, since it’s just a specialized form of the built-in partial function fromJust). The answer to that question is a resounding no. Partial functions are avoided whenever possible, and code that uses them can usually be refactored into code that doesn’t.
The reason partial functions are avoided is that they voluntarily compromise the effectiveness of the type system. In Haskell, when a function has type X -> Y, it is generally assumed that providing it an X will actually produce a Y, and that it will not do something else entirely (i.e. crash). If you have a function that doesn’t always succeed, reflecting that information in the type by writing X -> Maybe Y forces the caller to somehow handle the Nothing case, and it can either handle it directly or defer the failure further to its caller (by also producing a Maybe). This is great, since it means that programs that typecheck won’t crash at runtime. The program might still have logical errors, but knowing before even running the program that it won’t blow up is still pretty nice.
Partial functions throw this guarantee out the window. Any program that uses a partial function will crash at runtime if the function’s preconditions are accidentally violated, and since those preconditions are not reflected in the type system, the compiler cannot statically enforce them. A program might be logically correct at the time of its writing, but without enforcing that correctness with the type system, further modification, extension, or refactoring could easily introduce a bug by mistake.
For example, a programmer might write the expression
if isJust n then fromJust n else 0
which will certainly never crash at runtime, since fromJust’s precondition is always checked before it is called. However, the type system cannot enforce this, and a further refactoring might swap the branches of the if, or it might move the fromJust n to a different part of the program entirely and accidentally omit the isJust check. The program will still compile, but it may fail at runtime.
In contrast, if the programmer avoids partial functions, using explicit pattern-matching with case or total functions like maybe and fromMaybe, they can replace the tricky conditional above with something like
fromMaybe 0 n
which is not only clearer, but ensures any accidental misuse will simply fail to typecheck, and the potential bug will be detected much earlier.
For some concrete examples of how the type system can be a powerful ally if you stick exclusively to total functions, as well as some good food for thought about different ways to encode type safety for your domain into Haskell’s type system, I highly recommend reading Matt Parsons’s wonderful blog post, Type Safety Back and Forth, which explores these ideas in more depth. It additionally highlights how using Maybe as a catch-all representation of failure can be awkward, and it shows how the type system can be used to enforce preconditions to avoid needing to propagate Maybe throughout an entire system.

Can compilers deduce/prove mathematically?

I'm starting to learn functional programming language like Haskell, ML and most of the exercises will show off things like:
foldr (+) 0 [ 1 ..10]
which is equivalent to
sum = 0
for( i in [1..10] )
sum += i
So that leads me to think why can't compiler know that this is Arithmetic Progression and use O(1) formula to calculate?
Especially for pure FP languages without side effect?
The same applies for
sum reverse list == sum list
Given a + b = b + a
and definition of reverse, can compilers/languages prove it automatically?
Compilers generally don't try to prove this kind of thing automatically, because it's hard to implement.
As well as adding the logic to the compiler to transform one fragment of code into another, you have to be very careful that it only tries to do it when it's actually safe - i.e. there are often lots of "side conditions" to worry about. For example in your example above, someone might have written an instance of the type class Num (and hence the (+) operator) where the a + b is not b + a.
However, GHC does have rewrite rules which you can add to your own source code and could be used to cover some relatively simple cases like the ones you list above, particularly if you're not too bothered about the side conditions.
For example, and I haven't tested this, you might use the following rule for one of your examples above:
{-# RULES
"sum/reverse" forall list . sum (reverse list) = sum list
#-}
Note the parentheses around reverse list - what you've written in your question actually means (sum reverse) list and wouldn't typecheck.
EDIT:
As you're looking for official sources and pointers to research, I've listed a few.
Obviously it's hard to prove a negative but the fact that no-one has given an example of a general-purpose compiler that does this kind of thing routinely is probably quite strong evidence in itself.
As others have pointed out, even simple arithmetic optimisations are surprisingly dangerous, particularly on floating point numbers, and compilers generally have flags to turn them off - for example Visual C++, gcc. Even integer arithmetic isn't always clear-cut and people occasionally have big arguments about how to deal with things like overflow.
As Joachim noted, integer variables in loops are one place where slightly more sophisticated optimisations are applied because there are actually significant wins to be had. Muchnick's book is probably the best general source on the topic but it's not that cheap. The wikipedia page on strength reduction is probably as good an introduction as any to one of the standard optimisations of this kind, and has some references to the relevant literature.
FFTW is an example of a library that does all kinds of mathematical optimization internally. Some of its code is generated by a customised compiler the authors wrote specifically for the purpose. It's worthwhile because the authors have domain-specific knowledge of optimizations that in the specific context of the library are both worth the effort and safe
People sometimes use template metaprogramming to write "self-optimising libraries" that again might rely on arithmetic identities, see for example Blitz++. Todd Veldhuizen's PhD dissertation has a good overview.
If you descend into the realms of toy and academic compilers all sorts of things go. For example my own PhD dissertation is about writing inefficient functional programs along with little scripts that explain how to optimise them. Many of the examples (see Chapter 6) rely on applying arithmetic rules to justify the underlying optimisations.
Also, it's worth emphasising that the last few examples are of specialised optimisations being applied only to certain parts of the code (e.g. calls to specific libraries) where it is expected to be worthwhile. As other answers have pointed out, it's simply too expensive for a compiler to go searching for all possible places in an entire program where an optimisation might apply. The GHC rewrite rules that I mentioned above are a great example of a compiler exposing a generic mechanism for individual libraries to use in a way that's most appropriate for them.
The answer
No, compilers don’t do that kind of stuff.
One reason why
And for your examples, it would even be wrong: Since you did not give type annotations, the Haskell compiler will infer the most general type, which would be
foldr (+) 0 [ 1 ..10] :: Num a => a
and similar
(\list -> sum (reverse list)) :: Num a => [a] -> a
and the Num instance for the type that is being used might well not fulfil the mathematical laws required for the transformation you suggest. The compiler should, before everything else, avoid to change the meaning (i.e. the semantics) of your program.
More pragmatically: The cases where the compiler could detect such large-scale transformations rarely occur in practice, so it would not be worth it to implement them.
An exception
Note notable exceptions are linear transformations in loops. Most compilers will rewrite
for (int i = 0; i < n; i++) {
... 200 + 4 * i ...
}
to
for (int i = 0, j = 200; i < n; i++, j += 4) {
... j ...
}
or something similar, as that pattern does often occur in code working on array.
The optimizations you have in mind will probably not be done even in the presence of monomorphic types, because there are so many possibilities and so much knowledge required. For example, in this example:
sum list == sum (reverse list)
The compiler would need to know or take into account the following facts:
sum = foldl (+) 0
(+) is commutative
reverse list is a permutation of list
foldl x c l, where x is commutative and c is a constant, yields the same result for all permutations of l.
This all seems trivial. Sure, the compiler can most probably look up the definition of sumand inline it. It could be required that (+) be commutative, but remember that +is just another symbol without attached meaning to the compiler. The third point would require the compiler to prove some non trivial properties about reverse.
But the point is:
You don't want to perform the compiler to do those calculations with each and every expression. Remember, to make this really useful, you'd have to heap up a lot of knowledge about many, many standard functions and operators.
You still can't replace the expression above with True unless you can rule out the possibility that list or some list element is bottom. Usually, one cannot do this. You can't even do the following "trivial" optimization of f x == f x in all cases
f x `seq` True
For, consider
f x = (undefined :: Bool, x)
then
f x `seq` True ==> True
f x == f x ==> undefined
That being said, regarding your first example slightly modified for monomorphism:
f n = n * foldl (+) 0 [1..10] :: Int
it is imaginable to optimize the program by moving the expression out of its context and replace it with the name of a constant, like so:
const1 = foldl (+) 0 [1..10] :: Int
f n = n * const1
This is because the compiler can see that the expression must be constant.
What you're describing looks like super-compilation. In your case, if the expression had a monomorphic type like Int (as opposed to polymorphic Num a => a), the compiler could infer that the expression foldr (+) 0 [1 ..10] has no external dependencies, therefore it could be evaluated at compile time and replaced by 55. However, AFAIK no mainstream compiler currently does this kind of optimization.
(In functional programming "proving" is usually associated with something different. In languages with dependent types types are powerful enough to express complex proposition and then through the Curry-Howard correspondence programs become proofs of such propositions.)
As others have noted, it's unclear that your simplifications even hold in Haskell. For instance, I can define
newtype NInt = N Int
instance Num NInt where
N a + _ = N a
N b * _ = N b
... -- etc
and now sum . reverse :: Num [a] -> a does not equal sum :: Num [a] -> a since I can specialize each to [NInt] -> NInt where sum . reverse == sum clearly does not hold.
This is one general tension that exists around optimizing "complex" operations—you actually need quite a lot of information in order to successfully prove that it's okay to optimize something. This is why the syntax-level compiler optimization which do exist are usually monomorphic and related to the structure of programs---it's usually such a simplified domain that there's "no way" for the optimization to go wrong. Even that is often unsafe because the domain is never quite so simplified and well-known to the compiler.
As an example, a very popular "high-level" syntactic optimization is stream fusion. In this case the compiler is given enough information to know that stream fusion can occur and is basically safe, but even in this canonical example we have to skirt around notions of non-termination.
So what does it take to have \x -> sum [0..x] get replaced by \x -> x*(x + 1)/2? The compiler would need a theory of numbers and algebra built-in. This is not possible in Haskell or ML, but becomes possible in dependently typed languages like Coq, Agda, or Idris. There you could specify things like
revCommute :: (_+_ :: a -> a -> a)
-> Commutative _+_
-> foldr _+_ z (reverse as) == foldr _+_ z as
and then, theoretically, tell the compiler to rewrite according to revCommute. This would still be difficult and finicky, but at least we'd have enough information around. To be clear, I'm writing something very strange above, a dependent type. The type not only depends on the ability to introduce both a type and a name for the argument inline, but also the existence of the entire syntax of your language "at the type level".
There are a lot of differences between what I just wrote and what you'd do in Haskell, though. First, in order to form a basis where such promises can be taken seriously, we must throw away general recursion (and thus we already don't have to worry about questions of non-termination like stream-fusion does). We also must have enough structure around to create something like the promise Commutative _+_---this likely depends upon there being an entire theory of operators and mathematics built into the language's standard library else you would need to create that yourself. Finally, the richness of type system required to even express these kinds of theories adds a lot of complexity to the entire system and tosses out type inference as you know it today.
But, given all that structure, I'd never be able to create an obligation Commutative _+_ for the _+_ defined to work on NInts and so we could be certain that foldr (+) 0 . reverse == foldr (+) 0 actually does hold.
But now we'd need to tell the compiler how to actually perform that optimization. For stream-fusion, the compiler rules only kick in when we write something in exactly the right syntactic form to be "clearly" an optimization redex. The same kinds of restrictions would apply to our sum . reverse rule. In fact, already we're sunk because
foldr (+) 0 . reverse
foldr (+) 0 (reverse as)
don't match. They're "obviously" the same due to some rules we could prove about (.), but that means that now the compiler must invoke two built-in rules in order to perform our optimization.
At the end of the day, you need a very smart optimization search over the sets of known laws in order to achieve the kinds of automatic optimizations you're talking about.
So not only do we add a lot of complexity to the entire system, require a lot of base work to build-in some useful algebraic theories, and lose Turing completeness (which might not be the worst thing), we also only get a finicky promise that our rule would even fire unless we perform an exponentially painful search during compilation.
Blech.
The compromise that exists today tends to be that sometimes we have enough control over what's being written to be mostly certain that a certain obvious optimization can be performed. This is the regime of stream fusion and it requires a lot of hidden types, carefully written proofs, exploitations of parametricity, and hand-waving before it's something the community trusts enough to run on their code.
And it doesn't even always fire. For an example of battling that problem take a look at the source of Vector for all of the RULES pragmas that specify all of the common circumstances where Vector's stream-fusion optimizations should kick in.
All of this is not at all a critique of compiler optimizations or dependent type theories. Both are really incredible. Instead it's just an amplification of the tradeoffs involved in introducing such an optimization. It's not to be done lightly.
Fun fact: Given two arbitrary formulas, do they both give the same output for the same inputs? The answer to this trivial question is not computable! In other words, it is mathematically impossible to write a computer program that always gives the correct answer in finite time.
Given this fact, it's perhaps not surprising that nobody has a compiler that can magically transform every possible computation into its most efficient form.
Also, isn't this the programmer's job? If you want the sum of an arithmetic sequence commonly enough that it's a performance bottleneck, why not just write some more efficient code yourself? Similarly, if you really want Fibonacci numbers (why?), use the O(1) algorithm.

Why not be dependently typed?

I have seen several sources echo the opinion that "Haskell is gradually becoming a dependently-typed language". The implication seems to be that with more and more language extensions, Haskell is drifting in that general direction, but isn't there yet.
There are basically two things I would like to know. The first is, quite simply, what does "being a dependently-typed language" actually mean? (Hopefully without being too technical about it.)
The second question is... what's the drawback? I mean, people know we're heading that way, so there must be some advantage to it. And yet, we're not there yet, so there must be some downside stopping people going all the way. I get the impression that the problem is a steep increase in complexity. But, not really understanding what dependent typing is, I don't know for sure.
What I do know is that every time I start reading about a dependently-typed programming language, the text is utterly incomprehensible... Presumably that's the problem. (?)
Dependently Typed Haskell, Now?
Haskell is, to a small extent, a dependently typed language. There is a notion of type-level data, now more sensibly typed thanks to DataKinds, and there is some means (GADTs) to give a run-time
representation to type-level data. Hence, values of run-time stuff effectively show up in types, which is what it means for a language to be dependently typed.
Simple datatypes are promoted to the kind level, so that the values
they contain can be used in types. Hence the archetypal example
data Nat = Z | S Nat
data Vec :: Nat -> * -> * where
VNil :: Vec Z x
VCons :: x -> Vec n x -> Vec (S n) x
becomes possible, and with it, definitions such as
vApply :: Vec n (s -> t) -> Vec n s -> Vec n t
vApply VNil VNil = VNil
vApply (VCons f fs) (VCons s ss) = VCons (f s) (vApply fs ss)
which is nice. Note that the length n is a purely static thing in
that function, ensuring that the input and output vectors have the
same length, even though that length plays no role in the execution of
vApply. By contrast, it's much trickier (i.e., impossible) to
implement the function which makes n copies of a given x (which
would be pure to vApply's <*>)
vReplicate :: x -> Vec n x
because it's vital to know how many copies to make at run-time. Enter
singletons.
data Natty :: Nat -> * where
Zy :: Natty Z
Sy :: Natty n -> Natty (S n)
For any promotable type, we can build the singleton family, indexed
over the promoted type, inhabited by run-time duplicates of its
values. Natty n is the type of run-time copies of the type-level n
:: Nat. We can now write
vReplicate :: Natty n -> x -> Vec n x
vReplicate Zy x = VNil
vReplicate (Sy n) x = VCons x (vReplicate n x)
So there you have a type-level value yoked to a run-time value:
inspecting the run-time copy refines static knowledge of the
type-level value. Even though terms and types are separated, we can
work in a dependently typed way by using the singleton construction as
a kind of epoxy resin, creating bonds between the phases. That's a
long way from allowing arbitrary run-time expressions in types, but it ain't nothing.
What's Nasty? What's Missing?
Let's put a bit of pressure on this technology and see what starts
wobbling. We might get the idea that singletons should be manageable a
bit more implicitly
class Nattily (n :: Nat) where
natty :: Natty n
instance Nattily Z where
natty = Zy
instance Nattily n => Nattily (S n) where
natty = Sy natty
allowing us to write, say,
instance Nattily n => Applicative (Vec n) where
pure = vReplicate natty
(<*>) = vApply
That works, but it now means that our original Nat type has spawned
three copies: a kind, a singleton family and a singleton class. We
have a rather clunky process for exchanging explicit Natty n values
and Nattily n dictionaries. Moreover, Natty is not Nat: we have
some sort of dependency on run-time values, but not at the type we
first thought of. No fully dependently typed language makes dependent
types this complicated!
Meanwhile, although Nat can be promoted, Vec cannot. You can't
index by an indexed type. Full on dependently typed languages impose
no such restriction, and in my career as a dependently typed show-off,
I've learned to include examples of two-layer indexing in my talks,
just to teach folks who've made one-layer indexing
difficult-but-possible not to expect me to fold up like a house of
cards. What's the problem? Equality. GADTs work by translating the
constraints you achieve implicitly when you give a constructor a
specific return type into explicit equational demands. Like this.
data Vec (n :: Nat) (x :: *)
= n ~ Z => VNil
| forall m. n ~ S m => VCons x (Vec m x)
In each of our two equations, both sides have kind Nat.
Now try the same translation for something indexed over vectors.
data InVec :: x -> Vec n x -> * where
Here :: InVec z (VCons z zs)
After :: InVec z ys -> InVec z (VCons y ys)
becomes
data InVec (a :: x) (as :: Vec n x)
= forall m z (zs :: Vec x m). (n ~ S m, as ~ VCons z zs) => Here
| forall m y z (ys :: Vec x m). (n ~ S m, as ~ VCons y ys) => After (InVec z ys)
and now we form equational constraints between as :: Vec n x and
VCons z zs :: Vec (S m) x where the two sides have syntactically
distinct (but provably equal) kinds. GHC core is not currently
equipped for such a concept!
What else is missing? Well, most of Haskell is missing from the type
level. The language of terms which you can promote has just variables
and non-GADT constructors, really. Once you have those, the type family machinery allows you to write type-level programs: some of
those might be quite like functions you would consider writing at the
term level (e.g., equipping Nat with addition, so you can give a
good type to append for Vec), but that's just a coincidence!
Another thing missing, in practice, is a library which makes
use of our new abilities to index types by values. What do Functor
and Monad become in this brave new world? I'm thinking about it, but
there's a lot still to do.
Running Type-Level Programs
Haskell, like most dependently typed programming languages, has two
operational semanticses. There's the way the run-time system runs
programs (closed expressions only, after type erasure, highly
optimised) and then there's the way the typechecker runs programs
(your type families, your "type class Prolog", with open expressions). For Haskell, you don't normally mix
the two up, because the programs being executed are in different
languages. Dependently typed languages have separate run-time and
static execution models for the same language of programs, but don't
worry, the run-time model still lets you do type erasure and, indeed,
proof erasure: that's what Coq's extraction mechanism gives you;
that's at least what Edwin Brady's compiler does (although Edwin
erases unnecessarily duplicated values, as well as types and
proofs). The phase distinction may not be a distinction of syntactic category
any longer, but it's alive and well.
Dependently typed languages, being total, allow the typechecker to run
programs free from the fear of anything worse than a long wait. As
Haskell becomes more dependently typed, we face the question of what
its static execution model should be? One approach might be to
restrict static execution to total functions, which would allow us the
same freedom to run, but might force us to make distinctions (at least
for type-level code) between data and codata, so that we can tell
whether to enforce termination or productivity. But that's not the only
approach. We are free to choose a much weaker execution model which is
reluctant to run programs, at the cost of making fewer equations come
out just by computation. And in effect, that's what GHC actually
does. The typing rules for GHC core make no mention of running
programs, but only for checking evidence for equations. When
translating to the core, GHC's constraint solver tries to run your type-level programs,
generating a little silvery trail of evidence that a given expression
equals its normal form. This evidence-generation method is a little
unpredictable and inevitably incomplete: it fights shy of
scary-looking recursion, for example, and that's probably wise. One
thing we don't need to worry about is the execution of IO
computations in the typechecker: remember that the typechecker doesn't have to give
launchMissiles the same meaning that the run-time system does!
Hindley-Milner Culture
The Hindley-Milner type system achieves the truly awesome coincidence
of four distinct distinctions, with the unfortunate cultural
side-effect that many people cannot see the distinction between the
distinctions and assume the coincidence is inevitable! What am I
talking about?
terms vs types
explicitly written things vs implicitly written things
presence at run-time vs erasure before run-time
non-dependent abstraction vs dependent quantification
We're used to writing terms and leaving types to be inferred...and
then erased. We're used to quantifying over type variables with the
corresponding type abstraction and application happening silently and
statically.
You don't have to veer too far from vanilla Hindley-Milner
before these distinctions come out of alignment, and that's no bad thing. For a start, we can have more interesting types if we're willing to write them in a few
places. Meanwhile, we don't have to write type class dictionaries when
we use overloaded functions, but those dictionaries are certainly
present (or inlined) at run-time. In dependently typed languages, we
expect to erase more than just types at run-time, but (as with type
classes) that some implicitly inferred values will not be
erased. E.g., vReplicate's numeric argument is often inferable from the type of the desired vector, but we still need to know it at run-time.
Which language design choices should we review because these
coincidences no longer hold? E.g., is it right that Haskell provides
no way to instantiate a forall x. t quantifier explicitly? If the
typechecker can't guess x by unifiying t, we have no other way to
say what x must be.
More broadly, we cannot treat "type inference" as a monolithic concept
that we have either all or nothing of. For a start, we need to split
off the "generalisation" aspect (Milner's "let" rule), which relies heavily on
restricting which types exist to ensure that a stupid machine can
guess one, from the "specialisation" aspect (Milner's "var" rule)
which is as effective as your constraint solver. We can expect that
top-level types will become harder to infer, but that internal type
information will remain fairly easy to propagate.
Next Steps For Haskell
We're seeing the type and kind levels grow very similar (and they
already share an internal representation in GHC). We might as well
merge them. It would be fun to take * :: * if we can: we lost
logical soundness long ago, when we allowed bottom, but type
soundness is usually a weaker requirement. We must check. If we must have
distinct type, kind, etc levels, we can at least make sure everything
at the type level and above can always be promoted. It would be great
just to re-use the polymorphism we already have for types, rather than
re-inventing polymorphism at the kind level.
We should simplify and generalise the current system of constraints by
allowing heterogeneous equations a ~ b where the kinds of a and
b are not syntactically identical (but can be proven equal). It's an
old technique (in my thesis, last century) which makes dependency much
easier to cope with. We'd be able to express constraints on
expressions in GADTs, and thus relax restrictions on what can be
promoted.
We should eliminate the need for the singleton construction by
introducing a dependent function type, pi x :: s -> t. A function
with such a type could be applied explicitly to any expression of type s which
lives in the intersection of the type and term languages (so,
variables, constructors, with more to come later). The corresponding
lambda and application would not be erased at run-time, so we'd be
able to write
vReplicate :: pi n :: Nat -> x -> Vec n x
vReplicate Z x = VNil
vReplicate (S n) x = VCons x (vReplicate n x)
without replacing Nat by Natty. The domain of pi can be any
promotable type, so if GADTs can be promoted, we can write dependent
quantifier sequences (or "telescopes" as de Briuijn called them)
pi n :: Nat -> pi xs :: Vec n x -> ...
to whatever length we need.
The point of these steps is to eliminate complexity by working directly with more general tools, instead of making do with weak tools and clunky encodings. The current partial buy-in makes the benefits of Haskell's sort-of dependent types more expensive than they need to be.
Too Hard?
Dependent types make a lot of people nervous. They make me nervous,
but I like being nervous, or at least I find it hard not to be nervous
anyway. But it doesn't help that there's quite such a fog of ignorance
around the topic. Some of that's due to the fact that we all still
have a lot to learn. But proponents of less radical approaches have
been known to stoke fear of dependent types without always making sure
the facts are wholly with them. I won't name names. These "undecidable typechecking", "Turing incomplete", "no phase distinction", "no type erasure", "proofs everywhere", etc, myths persist, even though they're rubbish.
It's certainly not the case that dependently typed programs must
always be proven correct. One can improve the basic hygiene of one's
programs, enforcing additional invariants in types without going all
the way to a full specification. Small steps in this direction quite
often result in much stronger guarantees with few or no additional
proof obligations. It is not true that dependently typed programs are
inevitably full of proofs, indeed I usually take the presence of any
proofs in my code as the cue to question my definitions.
For, as with any increase in articulacy, we become free to say foul
new things as well as fair. E.g., there are plenty of crummy ways to
define binary search trees, but that doesn't mean there isn't a good way. It's important not to presume that bad experiences cannot be
bettered, even if it dents the ego to admit it. Design of dependent
definitions is a new skill which takes learning, and being a Haskell
programmer does not automatically make you an expert! And even if some
programs are foul, why would you deny others the freedom to be fair?
Why Still Bother With Haskell?
I really enjoy dependent types, but most of my hacking projects are
still in Haskell. Why? Haskell has type classes. Haskell has useful
libraries. Haskell has a workable (although far from ideal) treatment
of programming with effects. Haskell has an industrial strength
compiler. The dependently typed languages are at a much earlier stage
in growing community and infrastructure, but we'll get there, with a
real generational shift in what's possible, e.g., by way of
metaprogramming and datatype generics. But you just have to look
around at what people are doing as a result of Haskell's steps towards
dependent types to see that there's a lot of benefit to be gained by
pushing the present generation of languages forwards, too.
Dependent typing is really just the unification of the value and type levels, so you can parametrize values on types (already possible with type classes and parametric polymorphism in Haskell) and you can parametrize types on values (not, strictly speaking, possible yet in Haskell, although DataKinds gets very close).
Edit: Apparently, from this point forward, I was wrong (see #pigworker's comment). I'll preserve the rest of this as a record of the myths I've been fed. :P
The issue with moving to full dependent typing, from what I've heard, is that it would break the phase restriction between the type and value levels that allows Haskell to be compiled to efficient machine code with erased types. With our current level of technology, a dependently typed language must go through an interpreter at some point (either immediately, or after being compiled to dependently-typed bytecode or similar).
This is not necessarily a fundamental restriction, but I'm not personally aware of any current research that looks promising in this regard but that has not already made it into GHC. If anyone else knows more, I would be happy to be corrected.
John that's another common misconception about dependent types: that they don't work when data is only available at run-time. Here's how you can do the getLine example:
data Some :: (k -> *) -> * where
Like :: p x -> Some p
fromInt :: Int -> Some Natty
fromInt 0 = Like Zy
fromInt n = case fromInt (n - 1) of
Like n -> Like (Sy n)
withZeroes :: (forall n. Vec n Int -> IO a) -> IO a
withZeroes k = do
Like n <- fmap (fromInt . read) getLine
k (vReplicate n 0)
*Main> withZeroes print
5
VCons 0 (VCons 0 (VCons 0 (VCons 0 (VCons 0 VNil))))
Edit: Hm, that was supposed to be a comment to pigworker's answer. I clearly fail at SO.
pigworker gives an excellent discussion of why we should be headed towards dependent types: (a) they're awesome; (b) they would actually simplify a lot of what Haskell already does.
As for the "why not?" question, there are a couple points I think. The first point is that while the basic notion behind dependent types is easy (allow types to depend on values), the ramifications of that basic notion are both subtle and profound. For example, the distinction between values and types is still alive and well; but discussing the difference between them becomes far more nuanced than in yer Hindley--Milner or System F. To some extent this is due to the fact that dependent types are fundamentally hard (e.g., first-order logic is undecidable). But I think the bigger problem is really that we lack a good vocabulary for capturing and explaining what's going on. As more and more people learn about dependent types, we'll develop a better vocabulary and so things will become easier to understand, even if the underlying problems are still hard.
The second point has to do with the fact that Haskell is growing towards dependent types. Because we're making incremental progress towards that goal, but without actually making it there, we're stuck with a language that has incremental patches on top of incremental patches. The same sort of thing has happened in other languages as new ideas became popular. Java didn't use to have (parametric) polymorphism; and when they finally added it, it was obviously an incremental improvement with some abstraction leaks and crippled power. Turns out, mixing subtyping and polymorphism is inherently hard; but that's not the reason why Java Generics work the way they do. They work the way they do because of the constraint to be an incremental improvement to older versions of Java. Ditto, for further back in the day when OOP was invented and people started writing "objective" C (not to be confused with Objective-C), etc. Remember, C++ started out under the guise of being a strict superset of C. Adding new paradigms always requires defining the language anew, or else ending up with some complicated mess. My point in all of this is that, adding true dependent types to Haskell is going to require a certain amount of gutting and restructuring the language--- if we're going to do it right. But it's really hard to commit to that kind of an overhaul, whereas the incremental progress we've been making seems cheaper in the short term. Really, there aren't that many people who hack on GHC, but there's a goodly amount of legacy code to keep alive. This is part of the reason why there are so many spinoff languages like DDC, Cayenne, Idris, etc.

What are super combinators and constant applicative forms?

I'm struggling with what Super Combinators are:
A supercombinator is either a constant, or a combinator which contains only supercombinators as subexpressions.
And also with what Constant Applicative Forms are:
Any super combinator which is not a lambda abstraction. This includes truly constant expressions such as 12, ((+) 1 2), [1,2,3] as well as partially applied functions such as ((+) 4). Note that this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
This is just not making any sense to me! Isn't ((+) 4) just as "truly constant" as 12? CAFs sound like values to my simple mind.
These Haskell wiki pages you reference are old, and I think unfortunately written. Particularly unfortunate is that they mix up CAFs and supercombinators. Supercombinators are interesting but unrelated to GHC. CAFs are still very much a part of GHC, and can be understood without reference to supercombinators.
So let's start with supercombinators. Combinators derive from combinatory logic, and, in the usage here, consist of functions which only apply the values passed in to one another in one or another form -- i.e. they combine their arguments. The most famous set of combinators are S, K, and I, which taken together are Turing-complete. Supercombinators, in this context, are functions built only of values passed in, combinators, and other supercombinators. Hence any supercombinator can be expanded, through substitution, into a plain old combinator.
Some compilers for functional languages (not GHC!) use combinators and supercombinators as intermediate steps in compilation. As with any similar compiler technology, the reason for doing this is to admit optimization analysis that is more easily performed in such a simplified, minimal language. One such core language built on supercombinators is Edwin Brady's epic.
Constant Applicative Forms are something else entirely. They're a bit more subtle, and have a few gotchas. The way to think of them is as an aspect of compiler implementation with no separate semantic meaning but with a potentially profound effect on runtime performance. The following may not be a perfect description of a CAF, but it'll try to convey my intuition of what one is, since I haven't seen a really good description anywhere else for me to crib from. The clean "authoritative" description in the GHC Commentary Wiki reads as follows:
Constant Applicative Forms, or CAFs for short, are top-level values
defined in a program. Essentially, they are objects that are not
allocated dynamically at run-time but, instead, are part of the static
data of the program.
That's a good start. Pure, functional, lazy languages can be thought of in some sense as a graph reduction machine. The first time you demand the value of a node, that forces its evaluation, which in turn can demand the values of subnodes, etc. One a node is evaluated, the resultant value sticks around (although it does not have to stick around -- since this is a pure language we could always keep the subnodes live and recalculate with no semantic effect). A CAF is indeed just a value. But, in the context, a special kind of value -- one which the compiler can determine has a meaning entirely dependent on its subnodes. That is to say:
foo x = ...
where thisIsACaf = [1..10::Int]
thisIsNotACaf = [1..x::Int]
thisIsAlsoNotACaf :: Num a => [a]
thisIsAlsoNotACaf = [1..10] -- oops, polymorphic! the "num" dictionary is implicitly a parameter.
thisCouldBeACaf = const [1..10::Int] x -- requires a sufficiently smart compiler
thisAlsoCouldBeACaf _ = [1..10::Int] -- also requires a sufficiently smart compiler
So why do we care if things are CAFs? Basically because sometimes we really really don't want to recompute something (for example, a memotable!) and so want to make sure it is shared properly. Other times we really do want to recompute something (e.g. a huge boring easy to generate list -- such as the naturals -- which we're just walking over) and not have it stick around in memory forever. A combination of naming things and binding them under lets or writing them inline, etc. typically lets us specify these sorts of things in a natural, intuitive way. Occasionally, however, the compiler is smarter or dumber than we expect, and something we think should only be computed once is always recomputed, or something we don't want to hang on to gets lifted out as a CAF. Then, we need to think things through more carefully. See this discussion to get an idea about some of the trickiness involved: A good way to avoid "sharing"?
[By the way, I don't feel up to it, but anyone that wants to should feel free to take as much of this answer as they want to try and integrate it with the existing Haskell Wiki pages and improve/update them]
Matt is right in that the definition is confusing. It is even contradictory. A CAF is defined as:
Any super combinator which is not a lambda abstraction. This includes
truly constant expressions such as 12, ((+) 1 2), [1,2,3] as
well as partially applied functions such as ((+) 4).
Hence, ((+) 4) is seen as a CAF. But in the very next sentence we're told it is equivalent to something that is not a CAF:
this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
It would be cleaner to rule out partially applied functions on the ground that they are equivalent to lambda abstractions.

Resources