How did Haskell add Turing-completeness to System F? - haskell

I've been reading up on various type systems and lambda calculi, and i see that all of the typed lambda calculi in the lambda cube are strongly normalizing rather than Turing equivalent. This includes System F, the simply typed lambda calculus plus polymorphism.
This leads me to the following questions, for which I've been unable to find any comprehensible answer:
How does the formalism of (e.g.) Haskell differ from the calculus on which it is ostensibly based?
What language features in Haskell don't fall within System F formalism?
What's the minimum change necessary to allow Turing complete computation?
Thank you so much to whomever helps me understand this.

In a word, general recursion.
Haskell allows for arbitrary recursion while System F has no form of recursion. The lack of infinite types means fix isn't expressible as a closed term.
There is no primitive notion of names and recursion. In fact, pure System F has no notion of any such thing as definitions!
So in Haskell this single definition is what adds turing completeness
fix :: (a -> a) -> a
fix f = let x = f x in x
Really this function is indicative of a more general idea, by having fully recursive bindings, we get turing completeness. Notice that this applies to types, not just values.
data Rec a = Rec {unrec :: Rec a -> a}
y :: (a -> a) -> a
y f = u (Rec u)
where u x = f $ unrec x x
With infinite types we can write the Y combinator (modulo some unfolding) and through it general recursion!
In pure System F, we often have some informal notion of definitions, but these are simply shorthands that are to be mentally inlined fully. This isn't possible in Haskell as this would create infinite terms.
The kernel of Haskell terms without any notion of let, where or = is strongly normalizing, since we don't have infinite types. Even this core term calculus isn't really System F. System F has "big lambdas" or type abstraction. The full term for id in System F is
id := /\ A -> \(x : A) -> x
This is because type inference for System F is undecidable! We explicitly notate wherever and whenever we expect polymorphism. In Haskell such a property would be annoying, so we limit the power of Haskell. In particular, we never infer a polymorphic type for a Haskell lambda argument without annotation (terms and conditions may apply). This is why in ML and Haskell
let x = exp in foo
isn't the same as
(\x -> foo) exp
even when exp isn't recursive! This is the crux of HM type inference and algorithm W, called "let generalization".

Related

Why aren't Haskell variables polymorphic when bound by pattern matching?

Consider a variable introduced in a pattern, such as f in this Haskell example:
case (\x -> x) of f -> (f True, f 'c')
This code results in a type error ("Couldn't match expected type ‘Bool’ with actual type ‘Char’"), because of the two different uses of f. It shows that the inferred type of f is not polymorphic in Haskell.
But why shouldn't f be polymorphic?
I have two points of comparison: OCaml and "textbook" Hindley-Milner. Both suggest that f ought to be polymorphic.
In OCaml, the analogous code is not an error:
match (fun x -> x) with f -> (f true, f 'c')
This evaluates to (true, 'c') with type bool * char. So it looks like OCaml gets along fine with assigning f a polymorphic type.
We can gain clarity by stripping things down to the fundamentals of Hindley-Milner - lambda calculus with "let" - which both Haskell and OCaml are based on. When reduce to this core system, of course, there is no such thing as pattern matching. We can draw parallels though. Between "let" and "lambda", case expr1 of f -> expr2 is much closer to let f = expr1 in expr2 than to (lambda f. expr2) expr1. "Case", like "let", syntactically restricts f to be bound to expr1, while a function lambda f. expr2 doesn't know what f will be bound to since the function has no such restriction on where in the program it will be called. This was the reason why let-bound variables are generalized in Hindley-Milner and lambda-bound variables are not. It appears that the same reasoning that allows let-bound variables to be generalized shows that variables introduced by pattern matching could be generalized too.
The examples above are minimal for clarity, so they only show a trivial pattern f in the pattern matching, but all the same logic extends to arbitrarily complex patterns like Just (a:b:(x,y):_), which can introduce multiple variables that would all be generalized.
Is my analysis correct? In Haskell specifically - recognizing that it's not just plain Hindley-Milner and not OCaml - why don't we generalize the type of f in the first example?
Was this an explicit language design decision, and if so, what were the reasons? (I note that some in the community think that not even "let" should be generalized, but I would imagine the design decision pre-dates that paper.)
If variables introduced in a pattern were made polymorphic similar to "let", would that break compatibility with other aspects of Haskell in a significant way?
If we assign a polymorphic type (forall x. t) to a case scrutinee, then it matches no non-trivial pattern, so there's no point for having case.
Could we generalize in some other useful way? Not really, because of GHC's lack of support for "impredicative" instantiation. In your example of Just (a:b:(x,y):_), not a single bound variable can have polymorphic type, since Maybe, (,), and [] cannot be instantiated with such types.
One thing works, as mentioned in the comments: data types with polymorphic fields, such as data Endo = Endo (forall a. a -> a). However, type checking for polymorphic fields doesn't technically involve a generalization step, nor does it behave like let-generalization.
In principle, generalization could be performed at many points, for example even at arbitrary function arguments (e.g. in f (\x -> x)). However, too much generalization clogs up type inference by introducing untractable higher-rank types; this can be also understood as eliminating useful type dependencies between different parts of the program by removing unsolved metavariables. Although there are systems which can handle higher-rank inference much better than GHC, most notably MLF, they're also much more complicated and haven't seen much practical use. I personally prefer to not have silent let-generalization at all.
One first issue is that with type classes, generalization is not always free. Consider show :: forall a. Show a => a -> String and this expression:
case show of
f -> ...
If you generalize f to f :: forall a. Show a => a -> String, then GHC will pass a Show dictionary at every call of f, instead of once at the single occurrence of show. In case there are multiple calls all at the same type, this duplicates work compared to not generalizing.
It is also not actually a generalization of the current type inference algorithm when combined with type classes: it can cause existing programs to no longer typecheck. For example,
case show of
f -> f () ++ f mempty
By not generalizing f, we can infer that mempty has type (). On the other hand, generalizing f :: forall a. Show a => a -> String will lose that connection, and the type of mempty in that expression will be ambiguous.
It is true though that these are minor issues, and maybe things would be mostly fine with some monomorphism restrictions, even if not entirely backwards compatible.
In addition to the other answers, there's a reason for how type variables are treated in pattern matches in terms of the interaction with existential types. Let's take a look at a couple definitions from Data.Functor.Coyoneda:
{-# LANGUAGE GADTs #-}
data Coyoneda f a where
Coyoneda :: (b -> a) -> f b -> Coyoneda f a
lowerCoyoneda :: Functor f => Coyoneda f a -> f a
lowerCoyoneda (Coyoneda g x) = fmap g x
Coyoneda has an existential type variable used by both arguments to the constructor. If GHC doesn't pin that type down, there's no way for the fmap in lowerCoyoneda to type-check. GHC needs to know that g and x have the appropriate relation in their types, and that requires fixing the type variable in the pattern match.

How to think about the lack of laws

I am a mathematician who works a lot with category theory, and I've been using Haskell for a while to perform certain computations etc., but I am definitely not a programmer. I really love Haskell and want to become much more fluent in it, and the type system is something that I find especially great to have in place when writing programs.
However, I've recently been trying to implement category theoretic things, and am running into problems concerning the fact that you seemingly can't have class method laws in Haskell. In case my terminology here is wrong, what I mean is that I can write
class Monoid c where
id :: c -> c
m :: c -> c -> c
but I can't write some law along the lines of
m (m x y) z == m x $ m y z
From what I gather, this is due to the lack of dependent types in Haskell, but I'm not sure how exactly this is the case (having now read a bit about dependent types). It also seems that the convention is just to include laws like this in comments and hope that you don't accidentally cook up some instance that doesn't satisfy them.
How should I change my approach to Haskell to deal with this problem? Is there a nice mathematical/type-theoretic solution (for example, require the existence of an associator that is an isomorphism (though then the question is, how do we encode isomorphisms without a law?)); is there some 'hack' (using extensions such as DataKinds); should I be drastic and switch to using something like Idris instead; or is the best response to just change the way I think about using Haskell (i.e. accept that these laws can't be implemented in a Haskelly way)?
(bonus) How exactly does the lack of laws come from not supporting dependent types?
You want to require that:
m (m x y) z = m x (m y z) -- (1)
But to require this you need a way to check it. So you, or your compiler (or proof assistant), need to construct a proof of this. And the question is, what type is a proof of (1)?
One could imagine some Proof type but then maybe you could just construct a proof that 0 = 0 instead of a proof of (1) and both would have type Proof. So you’d need a more general type. I can’t decide how to break up the rest of the question so I’ll go for a super brief explanation of the Curry-Howard isomorphism followed by an explanation of how to prove two things are equal and then how dependent types are relevant.
The Curry-Howard isomorphism says that propositions are isomorphic to types and proofs are isomorphic to programs: a type corresponds to a proposition and a proof of that proposition corresponds to a program constructing a value inhabiting that type. Ignoring how many propositions might be expressed as types, an example would be that the type A * B (written (A, B) in Haskell) corresponds to the proposition “A and B,” while the type A + B (written Either A B in Haskell) corresponds to the proposition “A or B.” Finally the type A -> B corresponds to “A implies B,” as a proof of this is a program which takes evidence of A and gives you evidence of B. One should note that there isn’t a way to express not A but one could imagine adding a type Not A with builtins of type Either a (Not a) for the law of the excluded middle as well as Not (Not a) -> a, and a * Not a -> Void (where Void is a type which cannot be inhabited and therefore corresponds to false), but then one can’t really run these programs to get constructivist proofs.
Now we will ignore some realities of Haskell and imagine that there aren’t ways round these rules (in particular undefined :: a says everything is true, and unsafeCoerce :: a -> b says that anything implies anything else, or just other functions that don’t return where their existence does not imply the corresponding proof).
So we know how to combine propositions but what might a proposition be? Well one could be to say that two types are equal. In Haskell this corresponds to the GADT
data Eq a b where Refl :: Eq c c
Where this constructor corresponds to the reflexive property of equality.
[side note: if you’re still interested so far, you may be interested to look up Voevodsky’s univalent foundations, depending on how much the idea of “Homotopy type theory” interests you]
So can we prove something now? How about the transitive property of equality:
trans :: Eq a b -> Eq b c -> Eq a c
trans x y =
case x of
Refl -> -- by this match being successful, the compiler now knows that a = b
case y of
Refl -> -- and now b = c and so the compiler knows a = c
Refl -- the compiler knows that this is of type Eq d d, and as it knows a = c, this typechecks as Eq a c
This feels like one hasn’t really proven anything (especially as this mainly relies on the compiler knowing the transitive and symmetric properties), but one gets a similar feeling when proving simple things in logic as well.
So now how might you prove the original proposition (1)? Well let’s imagine we want a type c to be a monoid then we should also prove that $\forall x,y,z:c, m (m x y) z = m x (m y z).$ So we need a way to express m (m x y) z as a type. Strictly speaking this isn’t dependent types (this can be done with DataKinds to promote values and type families instead of functions). But you do need dependent types to have types depend on values. Specifically if you have a type Nat of natural numbers and a type family Vec :: Nat -> * (* is the kind (read type) of all types) of fixed length vectors, you could define a dependently typed function mkVec :: (n::Nat) -> Vec n. Observe how the type of the output depends on the value of the input.
So your law needs to have functions promoted to type level (skipping the questions about how one defines type equality and value equality), as well as dependent types (made up syntax):
class Monoid c where
e :: c
(*) :: c -> c -> c
idl :: (x::c) -> Eq x (e * x)
idr :: (x::c) -> Eq x (x * e)
assoc :: (x::c) -> (y::c) -> (z::c) -> Eq ((x * y) * z) (x * (y * z))
Observe how types tend to become large with dependent types and proofs. In a language missing typeclasses one could put such values into a record.
Final note on the theory of dependent types and how these correspond to the curry Howard isomorphism.
Dependent types can be considered an answer to the question: what types correspond to the propositions $\forall x\in S\quad P(x)$ and $\exists y\in T\quad Q(y)?$
The answer is that you create new ways to make types: the dependent product and the dependent sum (coproduct). The dependent product expresses “for all values $x$ of type $S,$ there is a value of type $P(x).$” A normal product would be a dependent product with $S=2,$ a type inhabited by two values. A dependent product might be written (x:T) -> P x. A dependent sum says “some value $y$ of type $T$, paired with a value of type $Q(y).$” this might be written (y:T) * Q y.
One can think of these as a generalisation of arbitrarily indexed (co)products from Set to general categories, where one might sensibly write e.g. $\prod_\Lambda X(\lambda),$ and sometimes such notation is used in type theory.

Can any recursive definition be rewritten using foldr?

Say I have a general recursive definition in haskell like this:
foo a0 a1 ... = base_case
foo b0 b1 ...
| cond1 = recursive_case_1
| cond2 = recursive_case_2
...
Can it always rewritten using foldr? Can it be proved?
If we interpret your question literally, we can write const value foldr to achieve any value, as #DanielWagner pointed out in a comment.
A more interesting question is whether we can instead forbid general recursion from Haskell, and "recurse" only through the eliminators/catamorphisms associated to each user-defined data type, which are the natural generalization of foldr to inductively defined data types. This is, essentially, (higher-order) primitive recursion.
When this restriction is performed, we can only compose terminating functions (the eliminators) together. This means that we can no longer define non terminating functions.
As a first example, we lose the trivial recursion
f x = f x
-- or even
a = a
since, as said, the language becomes total.
More interestingly, the general fixed point operator is lost.
fix :: (a -> a) -> a
fix f = f (fix f)
A more intriguing question is: what about the total functions we can express in Haskell? We do lose all the non-total functions, but do we lose any of the total ones?
Computability theory states that, since the language becomes total (no more non termination), we lose expressiveness even on the total fragment.
The proof is a standard diagonalization argument. Fix any enumeration of programs in the total fragment so that we can speak of "the i-th program".
Then, let eval i x be the result of running the i-th program on the natural x as input (for simplicity, assume this is well typed, and that the result is a natural). Note that, since the language is total, then a result must exist. Moreover, eval can be implemented in the unrestricted Haskell language, since we can write an interpreter of Haskell in Haskell (left as an exercise :-P), and that would work as fine for the fragment. Then, we simply take
f n = succ $ eval n n
The above is a total function (a composition of total functions) which can be expressed in Haskell, but not in the fragment. Indeed, otherwise there would be a program to compute it, say the i-th program. In such case we would have
eval i x = f x
for all x. But then,
eval i i = f i = succ $ eval i i
which is impossible -- contradiction. QED.
In type theory, it is indeed the case that you can elaborate all definitions by dependent pattern-matching into ones only using eliminators (a more strongly-typed version of folds, the generalisation of lists' foldr).
See e.g. Eliminating Dependent Pattern Matching (pdf)

what is the meaning of "let x = x in x" and "data Float#" in GHC.Prim in Haskell

I looked at the module of GHC.Prim and found that it seems that all datas in GHC.Prim are defined as data Float# without something like =A|B, and all functions in GHC.Prim is defined as gtFloat# = let x = x in x.
My question is whether these definations make sense and what they mean.
I checked the header of GHC.Prim like below
{-
This is a generated file (generated by genprimopcode).
It is not code to actually be used. Its only purpose is to be
consumed by haddock.
-}
I guess it may have some relations with the questions and who could please explain that to me.
It's magic :)
These are the "primitive operators and operations". They are hardwired into the compiler, hence there are no data constructors for primitives and all functions are bottom since they are necessarily not expressable in pure haskell.
(Bottom represents a "hole" in a haskell program, an infinite loop or undefined are examples of bottom)
To put it another way
These data declarations/functions are to provide access to the raw compiler internals. GHC.Prim exists to export these primitives, it doesn't actually implement them or anything (eg its code isn't actually useful). All of that is done in the compiler.
It's meant for code that needs to be extremely optimized. If you think you might need it, some useful reading about the primitives in GHC
A brief expansion of jozefg's answer ...
Primops are precisely those operations that are supplied by the runtime because they can't be defined within the language (or shouldn't be, for reasons of efficiency, say). The true purpose of GHC.Prim is not to define anything, but merely to export some operations so that Haddock can document their existence.
The construction let x = x in x is used at this point in GHC's codebase because the value undefined has not yet been, um, "defined". (That waits until the Prelude.) But notice that the circular let construction, just like undefined, is both syntactically correct and can have any type. That is, it's an infinite loop with the semantics of ⊥, just as undefined is.
... and an aside
Also note that in general the Haskell expression let x = z in y means "change the variable x to the expression z wherever x occurs in the expression y". If you're familiar with the lambda calculus, you should recognize this as the reduction rule for the application of the lambda abstraction \x -> y to the term z. So is the Haskell expression let x = x in x nothing more than some syntax on top of the pure lambda calculus? Let's take a look.
First, we need to account for the recursiveness of Haskell's let expressions. The lambda calculus does not admit recursive definitions, but given a primitive fixed-point operator fix,1 we can encode recursiveness explicitly. For example, the Haskell expression let x = x in x has the same meaning as (fix \r x -> r x) z.2 (I've renamed the x on the right side of the application to z to emphasize that it has no implicit relation to the x inside the lambda).
Applying the usual definition of a fixed-point operator, fix f = f (fix f), our translation of let x = x in x reduces (or rather doesn't) like this:
(fix \r x -> r x) z ==>
(\s y -> s y) (fix \r x -> r x) z ==>
(\y -> (fix \r x -> r x) y) z ==>
(fix \r x -> r x) z ==> ...
So at this point in the development of the language, we've introduced the semantics of ⊥ from the foundation of the (typed) lambda calculus with a built-in fixed-point operator. Lovely!
We need a primitive fixed-point operation (that is, one that is built into the language) because it's impossible to define a fixed-point combinator in the simply typed lambda calculus and its close cousins. (The definition of fix in Haskell's Prelude doesn't contradict this—it's defined recursively, but we need a fixed-point operator to implement recursion.)
If you haven't seen this before, you should read up on fixed-point recursion in the lambda calculus. A text on the lambda calculus is best (there are some free ones online), but some Googling should get you going. The basic idea is that we can convert a recursive definition into a non-recursive one by abstracting over the recursive call, then use a fixed-point combinator to pass our function (lambda abstraction) to itself. The base-case of a well-defined recursive definition corresponds to a fixed point of our function, so the function executes, calling itself over and over again until it hits a fixed point, at which point the function returns its result. Pretty damn neat, huh?

Is Milner let polymorphism a rank 2 feature?

let a = b in c can be thought as a syntactic sugar for (\a -> c) b, but in a typed setting in general it's not the case. For example, in the Milner calculus let a = \x -> x in (a True, a 1) is typable, but seemingly equivalent (\a -> (a True, a 1)) (\x -> x) is not.
However, the latter is typable in System F with a rank 2 type for the first lambda.
My questions are:
Is let polymorphism a rank 2 feature that sneaked secretly in the otherwise rank 1 world of Milner calculus?
The purpose of having of separate let construct seems to specify which types should be generalized by type checker, and which are not. Does it serve any other purposes? Are there any reasons to extend more powerful systems e.g. System F with separate let which is not sugar? Are there any papers on the rationale behind the design of the Milner calculus which no longer seems obvious to me?
Is there the most general type for \a -> (a True, a 1) in System F?
Are there type systems closed under beta expansion? I.e. if P is typable and M N = P then M is typable?
Some clarifications:
By equivalence I mean equivalence modulo type annotations. Is 'System F a la Church' the correct term for that?
I know that in general the principal typing property doesn't hold in F, but a principal type could exist for my particular term.
By let I mean the non-recursive flavour of let. Extension of system F with recursive let is obviously useful as it allows for non-termination.
W.r.t. to the four questions asked:
A key insight in this matter is that rather than just typing a
lambda-abstraction with a potentially polymorphic argument type, we
are typing a (sugared) abstraction that is (1) applied exactly once
and, moreover, that is (2) applied to a statically known argument.
That is, we can first subject the "argument" (i.e. the definiens of
the local definition) to type reconstruction to find its
(polymorphic) type; then assign the found type to the "parameter"
(the definiendum); and then, finally, type the body in the extended
type context.
Note that that is considerably more easy than general rank-2 type
inference.
Note that, strictly speaking, let .. = .. in .. is only syntactic sugar in System F if you demand that the definiendum carries a type annotation: let .. : .. = .. in .. .
Here are two solutions for T in (\a :: T -> (a True, a 1)) in System F: forall b. (forall a. a -> b) -> (b, b) and forall c d. (forall a b. a -> b) -> (c, d). Note that neither one of them is more general than the other. In general, System F does not admit principal types.
I suppose this holds for the simply typed lambda-calculus?
Types are not preserved under beta-expansion in any calculus that can express the concept of "dead code". You can probably figure out how to write something similar to this in any usable language:
if True then something typable else utter nonsense
For example, let M = (\x y -> x) (something typable) and N = (utter nonsense) and P = (something typable), so that M N = P, and P is typable, but M N isn't.
...rereading your question, I see that you only demand that M be typable, but that seems like a very strange meaning to give to "preserved under beta-expansion" to me. Anyway, I don't see why some argument like the above couldn't apply: simply let M have some untypable dead code in it.
You could type (\a -> (a True, a 1)) (\x -> x) if instead of generalizing only let expressions, you generalized all lambda abstractions. Having done so, one also needs to instantiate type schemas at every use point, not simply at the point where the binder which refers to them is actually used. I'm don't think there's any problem with this actually, outside of the fact that its vastly less efficient. I recall some discussion of this in TAPL, in fact, making similar points.
I recall many years ago seeing in a book about lambda calculus (possibly Barendregt) a type system preserved by beta expansion. It had no quantification, but it had disjunction to express that a term needed to be of more than one type simultaneously, as well as a special type omega which every term inhabited. As I recall, the latter avoids Daniel Wagner's dead code objection. While every expression was well-typed, restricting the position of omega in the type allowed you to charactize which expressions had (weak?) head normal forms.
Also if I recall correctly, fully normal form expressions had principal types, which did not contain omega.
For example the principal type of \f x -> f (f x) (the Church numeral 2) would be something like ((A -> B) /\ (B -> C)) -> A -> C
Not able to answer all your very specialized questions, but no, its not a rank 2 feature. As you write, it's just that let definitions are being quantified which yields a fully polymorphic rank-1 type unless the definition depends on some monomorphic value ina nouter scope.
Please also note that Haskell let is known as let rec in other languages and allows definition of mutually recursive functions and values. This is something you would not want to code manually with lambda-expressions and Y-combinators.

Resources