How to think about the lack of laws

How to think about the lack of laws - haskell

I am a mathematician who works a lot with category theory, and I've been using Haskell for a while to perform certain computations etc., but I am definitely not a programmer. I really love Haskell and want to become much more fluent in it, and the type system is something that I find especially great to have in place when writing programs.
However, I've recently been trying to implement category theoretic things, and am running into problems concerning the fact that you seemingly can't have class method laws in Haskell. In case my terminology here is wrong, what I mean is that I can write
class Monoid c where
id :: c -> c
m :: c -> c -> c
but I can't write some law along the lines of
m (m x y) z == m x $ m y z
From what I gather, this is due to the lack of dependent types in Haskell, but I'm not sure how exactly this is the case (having now read a bit about dependent types). It also seems that the convention is just to include laws like this in comments and hope that you don't accidentally cook up some instance that doesn't satisfy them.
How should I change my approach to Haskell to deal with this problem? Is there a nice mathematical/type-theoretic solution (for example, require the existence of an associator that is an isomorphism (though then the question is, how do we encode isomorphisms without a law?)); is there some 'hack' (using extensions such as DataKinds); should I be drastic and switch to using something like Idris instead; or is the best response to just change the way I think about using Haskell (i.e. accept that these laws can't be implemented in a Haskelly way)?
(bonus) How exactly does the lack of laws come from not supporting dependent types?

You want to require that:
m (m x y) z = m x (m y z) -- (1)
But to require this you need a way to check it. So you, or your compiler (or proof assistant), need to construct a proof of this. And the question is, what type is a proof of (1)?
One could imagine some Proof type but then maybe you could just construct a proof that 0 = 0 instead of a proof of (1) and both would have type Proof. So you’d need a more general type. I can’t decide how to break up the rest of the question so I’ll go for a super brief explanation of the Curry-Howard isomorphism followed by an explanation of how to prove two things are equal and then how dependent types are relevant.
The Curry-Howard isomorphism says that propositions are isomorphic to types and proofs are isomorphic to programs: a type corresponds to a proposition and a proof of that proposition corresponds to a program constructing a value inhabiting that type. Ignoring how many propositions might be expressed as types, an example would be that the type A * B (written (A, B) in Haskell) corresponds to the proposition “A and B,” while the type A + B (written Either A B in Haskell) corresponds to the proposition “A or B.” Finally the type A -> B corresponds to “A implies B,” as a proof of this is a program which takes evidence of A and gives you evidence of B. One should note that there isn’t a way to express not A but one could imagine adding a type Not A with builtins of type Either a (Not a) for the law of the excluded middle as well as Not (Not a) -> a, and a * Not a -> Void (where Void is a type which cannot be inhabited and therefore corresponds to false), but then one can’t really run these programs to get constructivist proofs.
Now we will ignore some realities of Haskell and imagine that there aren’t ways round these rules (in particular undefined :: a says everything is true, and unsafeCoerce :: a -> b says that anything implies anything else, or just other functions that don’t return where their existence does not imply the corresponding proof).
So we know how to combine propositions but what might a proposition be? Well one could be to say that two types are equal. In Haskell this corresponds to the GADT
data Eq a b where Refl :: Eq c c
Where this constructor corresponds to the reflexive property of equality.
[side note: if you’re still interested so far, you may be interested to look up Voevodsky’s univalent foundations, depending on how much the idea of “Homotopy type theory” interests you]
So can we prove something now? How about the transitive property of equality:
trans :: Eq a b -> Eq b c -> Eq a c
trans x y =
case x of
Refl -> -- by this match being successful, the compiler now knows that a = b
case y of
Refl -> -- and now b = c and so the compiler knows a = c
Refl -- the compiler knows that this is of type Eq d d, and as it knows a = c, this typechecks as Eq a c
This feels like one hasn’t really proven anything (especially as this mainly relies on the compiler knowing the transitive and symmetric properties), but one gets a similar feeling when proving simple things in logic as well.
So now how might you prove the original proposition (1)? Well let’s imagine we want a type c to be a monoid then we should also prove that $\forall x,y,z:c, m (m x y) z = m x (m y z).$ So we need a way to express m (m x y) z as a type. Strictly speaking this isn’t dependent types (this can be done with DataKinds to promote values and type families instead of functions). But you do need dependent types to have types depend on values. Specifically if you have a type Nat of natural numbers and a type family Vec :: Nat -> * (* is the kind (read type) of all types) of fixed length vectors, you could define a dependently typed function mkVec :: (n::Nat) -> Vec n. Observe how the type of the output depends on the value of the input.
So your law needs to have functions promoted to type level (skipping the questions about how one defines type equality and value equality), as well as dependent types (made up syntax):
class Monoid c where
e :: c
(*) :: c -> c -> c
idl :: (x::c) -> Eq x (e * x)
idr :: (x::c) -> Eq x (x * e)
assoc :: (x::c) -> (y::c) -> (z::c) -> Eq ((x * y) * z) (x * (y * z))
Observe how types tend to become large with dependent types and proofs. In a language missing typeclasses one could put such values into a record.
Final note on the theory of dependent types and how these correspond to the curry Howard isomorphism.
Dependent types can be considered an answer to the question: what types correspond to the propositions $\forall x\in S\quad P(x)$ and $\exists y\in T\quad Q(y)?$
The answer is that you create new ways to make types: the dependent product and the dependent sum (coproduct). The dependent product expresses “for all values $x$ of type $S,$ there is a value of type $P(x).$” A normal product would be a dependent product with $S=2,$ a type inhabited by two values. A dependent product might be written (x:T) -> P x. A dependent sum says “some value $y$ of type $T$, paired with a value of type $Q(y).$” this might be written (y:T) * Q y.
One can think of these as a generalisation of arbitrarily indexed (co)products from Set to general categories, where one might sensibly write e.g. $\prod_\Lambda X(\lambda),$ and sometimes such notation is used in type theory.

Related

Can I verify whether a given function type signature has a potential implementation?

In case of explicit type annotations Haskell checks whether the inferred type is at least as polymorphic as its signature, or in other words, whether the inferred type is a subtype of the explicit one. Hence, the following functions are ill-typed:
foo :: a -> b
foo x = x
bar :: (a -> b) -> a -> c
bar f x = f x
In my scenario, however, I only have a function signature and need to verify, whether it is "inhabited" by a potential implementation - hopefully, this explanation makes sense at all!
Due to the parametricity property I'd assume that for both foo and bar there don't exist an implementation and consequently, both should be rejected. But I don't know how to conclude this programmatically.
Goal is to sort out all or at least a subset of invalid type signatures like ones above. I am grateful for every hint.

Goal is to sort out all or at least a subset of invalid type signatures like ones above. I am grateful for every hint.
You might want to have a look at the Curry-Howard correspondence.
Basically, types in functional programs correspond to logical formulas.
Just replace -> with implication, (,) with conjunction (AND), and Either with disjunction (OR). Inhabited types are exactly those having a corresponding formula which is a tautology in intuitionistic logic.
There are algorithms which can decide provability in intuitionistic logic (e.g. exploiting cut-elimination in Gentzen's sequents), but the problem is PSPACE-complete, so in general we can't work with very large types. For medium-sized types, though, the cut-elimination algorithm works fine.
If you only want a subset of not-inhabited types, you can restrict to those having a corresponding formula which is NOT a tautology in classical logic. This is correct, since intuitionistic tautologies are also classical ones. Checking whether a formula P is not a classical tautology can be done by asking whether not P is a satisfiable formula. So, the problem is in NP. Not much, but better than PSPACE-complete.
For instance, both the above-mentioned types
a -> b
(a -> b) -> a -> c
are clearly NOT tautologies! Hence they are not inhabited.
Finally, note that in Haskell undefined :: T and let x = x in x :: T for any type T, so technically every type is inhabited. Once one restricts to terminating programs which are free from runtime errors, we get a more meaningful notion of "inhabited", which is the one addressed by the Curry-Howard correspondence.

Here's a recent implementation of that as a GHC plugin that unfortunately requires GHC HEAD currently.
It consists of a type class with a single method
class JustDoIt a where
justDoIt :: a
Such that justDoIt typechecks whenever the plugin can find an inhabitant of its inferred type.
foo :: (r -> Either e a) -> (a -> (r -> Either e b)) -> (r -> Either e (a,b))
foo = justDoIt
For more information, read Joachim Breitner's blogpost, which also mentions a few other options: djinn (already in other comments here), exference, curryhoward for Scala, hezarfen for Idris.

Why doesn't Haskell have a stronger alternative to Eq?

The reason why Set is not a functor is given here. It seems to boil down to the fact that a == b && f a /= f b is possible. So, why doesn't Haskell have as standard an alternative to Eq, something like
class Eq a => StrongEq a where
(===) :: a -> a -> Bool
(/==) :: a -> a -> Bool
x /== y = not (x === y)
x === y = not (x /== y)
for which instances are supposed to obey the laws
∀a,b,f. not (a === b) || (f a === f b)
∀a. a === a
∀a,b. (a === b) == (b === a)
and maybe some others? Then we could have:
instance StrongEq a => Functor (Set a) where
-- ...
Or am I missing something?
Edit: my problem is not “Why are there types without an Eq instance?”, like some of you seem to have answered. It's the opposite: “Why are there instances of Eq that aren't extensionally equal? Why are there too many Eq instances?”, combined with “If a == b does imply extensional equality, why is Set not an instance of Functor?”.
Also, my instance declaration is rubbish (thanks #n.m.). I should have said:
newtype StrongSet a = StrongSet (Set a)
instance Functor StrongSet where
fmap :: (StrongEq a, StrongEq b) => (a -> b) -> StrongSet a -> StrongSet b
fmap (StrongSet s) = StrongSet (map s)

instance StrongEq a => Functor (Set a) where
This makes sense neither in Haskell nor in the grand mathematical/categorical scheme of things, regardless of what StrongEq means.
In Haskell, Functor requires a type constructor of kind * -> *. The arrow reflects the fact that in category theory, a functor is a kind of mapping. [] and (the hypothetical) Set are such type constructors. [a] and Set a have kind * and cannot be functors.
In Haskell, it is hard to define Set such that it can be made into a Functor because equality cannot be sensibly defined for some types no matter what. You cannot compare two things of type Integer->Integer, for example.
Let's suppose there is a function
goedel :: Integer -> Integer -> Integer
goedel x y = -- compute the result of a function with
-- Goedel number x, applied to y
Suppose you have a value s :: Set Integer. What fmap goedel s should look like? How do you eliminate duplicates?
In your typical set theory equality is magically defined for everything, including functions, so Set (or Powerset to be precise) is a functor, no problem with that.

Since I'm not a category theorist, I'll try to write a more concrete/practical explanation (i.e., one I can understand):
The key point is the one that #leftaroundabout made in a comment:
== is supposed to
witness "equivalent by all observable means" (that doesn't necessarily
require a == b must hold only for identical implementations; but
anything you can "officially" do with a and b should again yield
equivalent results. So unAlwaysEq should never be exposed in the first
place). If you can't ensure this for some type, you shouldn't give it
an Eq instance.
That is, there should be no need for your StrongEq because that's what Eq is supposed to be already.
Haskell values are often intended to represent some sort of mathematical or "real-life" value. Many times, this representation is one-to-one. For example, consider the type
data PlatonicSolid = Tetrahedron | Cube |
Octahedron | Dodecahedron | Icosahedron
This type contains exactly one representation of each Platonic solid. We can take advantage of this by adding deriving Eq to the declaration, and it will produce the correct instance.
In many cases, however, the same abstract value may be represented by more than one Haskell value. For example, the red-black trees Node B (Node R Leaf 1 Leaf) 2 Leaf and Node B Leaf 1 (Node R Leaf 2 Leaf) can both represent the set {1,2}. If we added deriving Eq to our declaration, we would get an instance of Eq that distinguishes things we want to be considered the same (outside of the implementation of the set operations).
It's important to make sure that types are only made instances of Eq (and Ord) when appropriate! It's very tempting to make something an instance of Ord just so you can stick it in a data structure that requires ordering, but if the ordering is not truly a total ordering of the abstract values, all manner of breakage may ensue. Unless the documentation absolutely guarantees it, for example, a function called sort :: Ord a => [a] -> [a] may not only be an unstable sort, but may not even produce a list containing all the Haskell values that go into it. sort [Bad 1 "Bob", Bad 1 "James"] can reasonably produce [Bad 1 "Bob", Bad 1 "James"], [Bad 1 "James", Bad 1 "Bob"], [Bad 1 "James", Bad 1 "James"], or [Bad 1 "Bob", Bad 1 "Bob"]. All of these are perfectly legitimate. A function that uses unsafePerformIO in the back room to implement a Las Vegas-style randomized algorithm or to race threads against each other to get an answer from the fastest may even give different results different times, as long as they're == to each other.
tl;dr: Making something an instance of Eq is a way of making a very strong statement to the world; don't make that statement if you don't mean it.

Your second Functor instance also doesn't make any sense. The biggest reason why Set can't be a Functor in Haskell is fmap can't have constraints. Inventing different notions of equality as StrongEq doesn't change the fact that you can't write those constraints on fmap in your Set instance.
fmap in general shouldn't have the constraints you need. It makes perfect sense to have functors of functions, for example (without it the whole notion of using Applicative to apply functions inside a functor breaks down), and functions can't be members of Eq or your StrongEq in general.
fmap can't have extra constraints on only some instances, because of code like this:
fmapBoth :: (Functor f, Functor g) => (a -> b, c -> d) -> (f a, g c) -> (f b, g d)
fmapBoth (h, j) (x, y) = (fmap h x, fmap j y)
This code claims to work regardless of the functors f and g, and regardless of the functions h and j. It has no way of checking whether one of the functors is a special one that has extra constraints on fmap, nor any way of checking whether one of the functions it's applying would violate those constraints.
Saying that Set is a Functor in Haskell, is saying that there is a (lawful) operation fmap :: (a -> b) -> Set a -> Set b, with that exact type. That is precisely what Functor means. fmap :: (Eq a -> Eq b) => (a -> b) -> Set a -> Set b is not an example of such an operation.
It is possible, I understand, to use the ConstraintKinds GHC extendsion to write a different Functor class that permits constraints on the values which vary by Functor (and what you actually need is an Ord constraint, not just Eq). This blog post talks about doing so to make a new Monad class which can have an instance for Set. I've never played around with code like this, so I don't know much more than that the technique exists. It wouldn't help you hand off Sets to existing code that needs Functors, but you should be able to use it instead of Functor in your own code if you wish.

This notion of StrongEq is tough. In general, equality is a place where computer science becomes significantly more rigorous than typical mathematics in the kind of way which makes things challenging.
In particular, typical mathematics likes to talk about objects as though they exist in a set and can be uniquely identified. Computer programs usually deal with types which are not always computable (as a simple counterexample, tell me what the set corresponding to the type data U = U (U -> U) is). This means that it may be undecidable as to whether two values are identifiable.
This becomes an enormous topic in dependently typed languages since typechecking requires identifying like types and dependently typed languages may have arbitrary values in their types and thus need a way to project equality.
So, StrongEq could be defined over a restricted part of Haskell containing only the types which can be decidably compared for equality. We can consider this a category with the arrows as computable functions and then see Set as an endofunctor from types to the type of sets of values of that type. Unfortunately, these restrictions have taken us far from standard Haskell and make defining StrongEq or Functor (Set a) a little less than practical.

What are type quantifiers?

Many statically typed languages have parametric polymorphism. For example in C# one can define:
T Foo<T>(T x){ return x; }
In a call site you can do:
int y = Foo<int>(3);
These types are also sometimes written like this:
Foo :: forall T. T -> T
I have heard people say "forall is like lambda-abstraction at the type level". So Foo is a function that takes a type (for example int), and produces a value (for example a function of type int -> int). Many languages infer the type parameter, so that you can write Foo(3) instead of Foo<int>(3).
Suppose we have an object f of type forall T. T -> T. What we can do with this object is first pass it a type Q by writing f<Q>. Then we get back a value with type Q -> Q. However, certain f's are invalid. For example this f:
f<int> = (x => x+1)
f<T> = (x => x)
So if we "call" f<int> then we get back a value with type int -> int, and in general if we "call" f<Q> then we get back a value with type Q -> Q, so that's good. However, it is generally understood that this f is not a valid thing of type forall T. T -> T, because it does something different depending on which type you pass it. The idea of forall is that this is explicitly not allowed. Also, if forall is lambda for the type level, then what is exists? (i.e. existential quantification). For these reasons it seems that forall and exists are not really "lambda at the type level". But then what are they? I realize this question is rather vague, but can somebody clear this up for me?
A possible explanation is the following:
If we look at logic, quantifiers and lambda are two different things. An example of a quantified expression is:
forall n in Integers: P(n)
So there are two parts to forall: a set to quantify over (e.g. Integers), and a predicate (e.g. P). Forall can be viewed as a higher order function:
forall n in Integers: P(n) == forall(Integers,P)
With type:
forall :: Set<T> -> (T -> bool) -> bool
Exists has the same type. Forall is like an infinite conjunction, where S[n] is the n-th elemen to of the set S:
forall(S,P) = P(S[0]) ∧ P(S[1]) ∧ P(S[2]) ...
Exists is like an infinite disjunction:
exists(S,P) = P(S[0]) ∨ P(S[1]) ∨ P(S[2]) ...
If we do an analogy with types, we could say that the type analogue of ∧ is computing the intersection type ∩, and the type analogue of ∨ computing the union type ∪. We could then define forall and exists on types as follows:
forall(S,P) = P(S[0]) ∩ P(S[1]) ∩ P(S[2]) ...
exists(S,P) = P(S[0]) ∪ P(S[1]) ∪ P(S[2]) ...
So forall is an infinite intersection, and exists is an infinite union. Their types would be:
forall, exists :: Set<T> -> (T -> Type) -> Type
For example the type of the polymorphic identity function. Here Types is the set of all types, and -> is the type constructor for functions and => is lambda abstraction:
forall(Types, t => (t -> t))
Now a thing of type forall T:Type. T -> T is a value, not a function from types to values. It is a value whose type is the intersection of all types T -> T where T ranges over all types. When we use such a value, we do not have to apply it to a type. Instead, we use a subtype judgement:
id :: forall T:Type. T -> T
id = (x => x)
id2 = id :: int -> int
This downcasts id to have type int -> int. This is valid because int -> int also appears in the infinite intersection.
This works out nicely I think, and it clearly explains what forall is and how it is different from lambda, but this model is incompatible with what I have seen in languages like ML, F#, C#, etc. For example in F# you do id<int> to get the identity function on ints, which does not make sense in this model: id is a function on values, not a function on types that returns a function on values.
Can somebody with knowledge of type theory explain what exactly are forall and exists? And to what extent is it true that "forall is lambda at the type level"?

Let me address your questions separately.
Calling forall "a lambda at the type level" is inaccurate for two reasons. First, it is the type of a lambda, not the lambda itself. Second, that lambda lives on the term level, even though it abstracts over types (lambdas on the type level exist as well, they provide what is often called generic types).
Universal quantification does not necessarily imply "same behaviour" for all instantiations. That is a particular property called "parametricity" that may or may not be present. The plain polymorphic lambda calculus is parametric, because you simply cannot express any non-parametric behaviour. But if you add constructs like typecase (a.k.a. intensional type analysis) or checked casts as a weaker form of that, then you loose parametricity. Parametricity implies nice properties, e.g. it allows a language to be implemented without any runtime representation of types. And it induces very strong reasoning principles, see e.g. Wadler's paper "Theorems for free!". But it's a trade-off, sometimes you want dispatch on types.
Existential types essentially denote pairs of a type (the so-called witness) and a term, sometimes called packages. One common way to view these is as implementation of abstract data types. Here is a simple example:
pack (Int, (λx. x, λx. x)) : ∃ T. (Int → T) × (T → Int)
This is a simple ADT whose representation is Int and that only provides two operations (as a nested tuple), for converting ints in and out of the abstract type T. This is the basis of type theories for modules, for example.
In summary, universal quantification provides client-side data abstraction, while existential types dually provides implementor-side data abstraction.
As an additional remark, in the so-called lambda cube, forall and arrow are generalised to the unified notion of Π-type (where T1→T2 = Π(x:T1).T2 and ∀A.T = Π(A:&ast;).T) and likewise exists and tupling can be generalised to Σ-types (where T1×T2 = Σ(x:T1).T2 and ∃A.T = Σ(A:&ast;).T). Here, the type &ast; is the "type of types".

A few remarks to complement the two already-excellent answers.
First, one cannot say that forall is lambda at the type-level because there already is a notion of lambda at the type level, and it is different from forall. It appears in system F_omega, an extension of System F with type-level computation, that is useful to explain ML modules systems for example (F-ing modules, by Andreas Rossberg, Claudio Russo and Derek Dreyer, 2010).
In (a syntax for) System F_omega you can write for example:
type prod =
lambda (a : *). lambda (b : *).
forall (c : *). (a -> b -> c) -> c
This is a definition of the "type constructor" prod, such as prod a b is the type of the church-encoding of the product type (a, b). If there is computation at the type level, then you need to control it if you want to ensure termination of type-checking (otherwise you could define the type (lambda t. t t) (lambda t. t t). This is done by using a "type system at the type level", or a kind system. prod would be of kind * -> * -> *. Only the types at kind * can be inhabited by values, types at higher-kind can only be applied at the type level. lambda (c : k) . .... is a type-level abstraction that cannot be the type of a value, and may live at any kind of the form k -> ..., while forall (c : k) . .... classify values that are polymorphic in some type c : k and is necessarily of ground kind *.
Second, there is an important difference between the forall of System F and the Pi-types of Martin-Löf type theory. In System F, polymorphic values do the same thing on all types. As a first approximation, you could say that a value of type forall a . a -> a will (implicitly) take a type t as input and return a value of type t -> t. But that suggest that there may be some computation happening in the process, which is not the case. Morally, when you instantiate a value of type forall a. a -> a into a value of type t -> t, the value does not change. There are three (related) ways to think about it:
System F quantification has type erasure, you can forget about the types and you will still know what the dynamic semantic of the program is. When we use ML type inference to leave the polymorphism abstraction and instantiation implicit in our programs, we don't really let the inference engine "fill holes in our program", if you think of "program" as the dynamic object that will be run and compute.
A forall a . foo is not a something that "produces an instance of foo for each type a, but a single type foo that is "generic in an unknown type a".
You can explain universal quantification as an infinite conjunction, but there is an uniformity condition that all conjuncts have the same structure, and in particular that their proofs are all alike.
By contrast, Pi-types in Martin-Löf type theory are really more like function types that take something and return something. That's one of the reason why they can easily be used not only to depend on types, but also to depend on terms (dependent types).
This has very important implications once you're concerned about the soundness of those formal theories. System F is impredicative (a forall-quantified type quantifies on all types, itself included), and the reason why it's still sound is this uniformity of universal quantification. While introducing non-parametric constructs is reasonable from a programmer's point of view (and we can still reason about parametricity in an generally-non-parametric language), it very quickly destroys the logical consistency of the underlying static reasoning system. Martin-Löf predicative theory is much simpler to prove correct and to extend in correct way.
For a high-level description of this uniformity/genericity aspect of System F, see Fruchart and Longo's 97 article Carnap's remarks on Impredicative Definitions and the Genericity Theorem. For a more technical study of System F failure in presence of non-parametric constructs, see Parametricity and variants of Girard's J operator by Robert Harper and John Mitchell (1999). Finally, for a description, from a language design point of view, on how to abandon global parametricity to introduce non-parametric constructs but still be able to locally discuss parametricity, see Non-Parametric Parametricity by George Neis, Derek Dreyer and Andreas Rossberg, 2011.
This discussion of the difference between "computational abstraction" and "uniform abstract" has been revived by the large amount of work on representing variable binders. A binding construction feels like an abstraction (and can be modeled by a lambda-abstraction in HOAS style) but has an uniform structure that makes it rather like a data skeleton than a family of results. This has been much discussed, for example in the LF community, "representational arrows" in Twelf, "positive arrows" in Licata&Harper's work, etc.
Recently there have been several people working on the related notion of "irrelevance" (lambda-abstractions where the result "does not depend" on the argument), but it's still not totally clear how closely this is related to parametric polymorphism. One example is the work of Nathan Mishra-Linger with Tim Sheard (eg. Erasure and Polymorphism in Pure Type Systems).

if forall is lambda ..., then what is exists
Why, tuple of course!
In Martin-Löf type theory you have Π types, corresponding to functions/universal quantification and Σ-types, corresponding to tuples/existential quantification.
Their types are very similar to what you have proposed (I am using Agda notation here):
Π : (A : Set) -> (A -> Set) -> Set
Σ : (A : Set) -> (A -> Set) -> Set
Indeed, Π is an infinite product and Σ is infinite sum. Note that they are not "intersection" and "union" though, as you proposed because you can't do that without additionally defining where the types intersect. (which values of one type correspond to which values of the other type)
From these two type constructors you can have all of normal, polymorphic and dependent functions, normal and dependent tuples, as well as existentially and universally-quantified statements:
-- Normal function, corresponding to "Integer -> Integer" in Haskell
factorial : Π ℕ (λ _ → ℕ)
-- Polymorphic function corresponding to "forall a . a -> a"
id : Π Set (λ A -> Π A (λ _ → A))
-- A universally-quantified logical statement: all natural numbers n are equal to themselves
refl : Π ℕ (λ n → n ≡ n)
-- (Integer, Integer)
twoNats : Σ ℕ (λ _ → ℕ)
-- exists a. Show a => a
someShowable : Σ Set (λ A → Σ A (λ _ → Showable A))
-- There are prime numbers
aPrime : Σ ℕ IsPrime
However, this does not address parametricity at all and AFAIK parametricity and Martin-Löf type theory are independent.
For parametricity, people usually refer to the Philip Wadler's work.

Functions don't just have types: They ARE Types. And Kinds. And Sorts. Help put a blown mind back together

I was doing my usual "Read a chapter of LYAH before bed" routine, feeling like my brain was expanding with every code sample. At this point I was convinced that I understood the core awesomeness of Haskell, and now just had to understand the standard libraries and type classes so that I could start writing real software.
So I was reading the chapter about applicative functors when all of a sudden the book claimed that functions don't merely have types, they are types, and can be treated as such (For example, by making them instances of type classes). (->) is a type constructor like any other.
My mind was blown yet again, and I immediately jumped out of bed, booted up the computer, went to GHCi and discovered the following:
Prelude> :k (->)
(->) :: ?? -> ? -> *
What on earth does it mean?
If (->) is a type constructor, what are the value constructors? I can take a guess, but would have no idea how define it in traditional data (->) ... = ... | ... | ... format. It's easy enough to do this with any other type constructor: data Either a b = Left a | Right b. I suspect my inability to express it in this form is related to the extremly weird type signature.
What have I just stumbled upon? Higher kinded types have kind signatures like * -> * -> *. Come to think of it... (->) appears in kind signatures too! Does this mean that not only is it a type constructor, but also a kind constructor? Is this related to the question marks in the type signature?
I have read somewhere (wish I could find it again, Google fails me) about being able to extend type systems arbitrarily by going from Values, to Types of Values, to Kinds of Types, to Sorts of Kinds, to something else of Sorts, to something else of something elses, and so on forever. Is this reflected in the kind signature for (->)? Because I've also run into the notion of the Lambda cube and the calculus of constructions without taking the time to really investigate them, and if I remember correctly it is possible to define functions that take types and return types, take values and return values, take types and return values, and take values which return types.
If I had to take a guess at the type signature for a function which takes a value and returns a type, I would probably express it like this:
a -> ?
or possibly
a -> *
Although I see no fundamental immutable reason why the second example couldn't easily be interpreted as a function from a value of type a to a value of type *, where * is just a type synonym for string or something.
The first example better expresses a function whose type transcends a type signature in my mind: "a function which takes a value of type a and returns something which cannot be expressed as a type."

You touch so many interesting points in your question, so I am
afraid this is going to be a long answer :)
Kind of (->)
The kind of (->) is * -> * -> *, if we disregard the boxity GHC
inserts. But there is no circularity going on, the ->s in the
kind of (->) are kind arrows, not function arrows. Indeed, to
distinguish them kind arrows could be written as (=>), and then
the kind of (->) is * => * => *.
We can regard (->) as a type constructor, or maybe rather a type
operator. Similarly, (=>) could be seen as a kind operator, and
as you suggest in your question we need to go one 'level' up. We
return to this later in the section Beyond Kinds, but first:
How the situation looks in a dependently typed language
You ask how the type signature would look for a function that takes a
value and returns a type. This is impossible to do in Haskell:
functions cannot return types! You can simulate this behaviour using
type classes and type families, but let us for illustration change
language to the dependently typed language
Agda. This is a
language with similar syntax as Haskell where juggling types together
with values is second nature.
To have something to work with, we define a data type of natural
numbers, for convenience in unary representation as in
Peano Arithmetic.
Data types are written in
GADT style:
data Nat : Set where
Zero : Nat
Succ : Nat -> Nat
Set is equivalent to * in Haskell, the "type" of all (small) types,
such as Natural numbers. This tells us that the type of Nat is
Set, whereas in Haskell, Nat would not have a type, it would have
a kind, namely *. In Agda there are no kinds, but everything has
a type.
We can now write a function that takes a value and returns a type.
Below is a the function which takes a natural number n and a type,
and makes iterates the List constructor n applied to this
type. (In Agda, [a] is usually written List a)
listOfLists : Nat -> Set -> Set
listOfLists Zero a = a
listOfLists (Succ n) a = List (listOfLists n a)
Some examples:
listOfLists Zero Bool = Bool
listOfLists (Succ Zero) Bool = List Bool
listOfLists (Succ (Succ Zero)) Bool = List (List Bool)
We can now make a map function that operates on listsOfLists.
We need to take a natural number that is the number of iterations
of the list constructor. The base cases are when the number is
Zero, then listOfList is just the identity and we apply the function.
The other is the empty list, and the empty list is returned.
The step case is a bit move involving: we apply mapN to the head
of the list, but this has one layer less of nesting, and mapN
to the rest of the list.
mapN : {a b : Set} -> (a -> b) -> (n : Nat) ->
listOfLists n a -> listOfLists n b
mapN f Zero x = f x
mapN f (Succ n) [] = []
mapN f (Succ n) (x :: xs) = mapN f n x :: mapN f (Succ n) xs
In the type of mapN, the Nat argument is named n, so the rest of
the type can depend on it. So this is an example of a type that
depends on a value.
As a side note, there are also two other named variables here,
namely the first arguments, a and b, of type Set. Type
variables are implicitly universally quantified in Haskell, but
here we need to spell them out, and specify their type, namely
Set. The brackets are there to make them invisible in the
definition, as they are always inferable from the other arguments.
Set is abstract
You ask what the constructors of (->) are. One thing to point out
is that Set (as well as * in Haskell) is abstract: you cannot
pattern match on it. So this is illegal Agda:
cheating : Set -> Bool
cheating Nat = True
cheating _ = False
Again, you can simulate pattern matching on types constructors in
Haskell using type families, one canoical example is given on
Brent Yorgey's blog.
Can we define -> in the Agda? Since we can return types from
functions, we can define an own version of -> as follows:
_=>_ : Set -> Set -> Set
a => b = a -> b
(infix operators are written _=>_ rather than (=>)) This
definition has very little content, and is very similar to doing a
type synonym in Haskell:
type Fun a b = a -> b
Beyond kinds: Turtles all the way down
As promised above, everything in Agda has a type, but then
the type of _=>_ must have a type! This touches your point
about sorts, which is, so to speak, one layer above Set (the kinds).
In Agda this is called Set1:
FunType : Set1
FunType = Set -> Set -> Set
And in fact, there is a whole hierarchy of them! Set is the type of
"small" types: data types in haskell. But then we have Set1,
Set2, Set3, and so on. Set1 is the type of types which mentions
Set. This hierarchy is to avoid inconsistencies such as Girard's
paradox.
As noticed in your question, -> is used for types and kinds in
Haskell, and the same notation is used for function space at all
levels in Agda. This must be regarded as a built in type operator,
and the constructors are lambda abstraction (or function
definitions). This hierarchy of types is similar to the setting in
System F omega, and more
information can be found in the later chapters of
Pierce's Types and Programming Languages.
Pure type systems
In Agda, types can depend on values, and functions can return types,
as illustrated above, and we also had an hierarchy of
types. Systematic investigation of different systems of the lambda
calculi is investigated in more detail in Pure Type Systems. A good
reference is
Lambda Calculi with Types by Barendregt,
where PTS are introduced on page 96, and many examples on page 99 and onwards.
You can also read more about the lambda cube there.

Firstly, the ?? -> ? -> * kind is a GHC-specific extension. The ? and ?? are just there to deal with unboxed types, which behave differently from just * (which has to be boxed, as far as I know). So ?? can be any normal type or an unboxed type (e.g. Int#); ? can be either of those or an unboxed tuple. There is more information here: Haskell Weird Kinds: Kind of (->) is ?? -> ? -> *
I think a function can't return an unboxed type because functions are lazy. Since a lazy value is either a value or a thunk, it has to be boxed. Boxed just means it is a pointer rather than just a value: it's like Integer() vs int in Java.
Since you are probably not going to be using unboxed types in LYAH-level code, you can imagine that the kind of -> is just * -> * -> *.
Since the ? and ?? are basically just more general version of *, they do not have anything to do with sorts or anything like that.
However, since -> is just a type constructor, you can actually partially apply it; for example, (->) e is an instance of Functor and Monad. Figuring out how to write these instances is a good mind-stretching exercise.
As far as value constructors go, they would have to just be lambdas (\ x ->) or function declarations. Since functions are so fundamental to the language, they get their own syntax.

Is Milner let polymorphism a rank 2 feature?

let a = b in c can be thought as a syntactic sugar for (\a -> c) b, but in a typed setting in general it's not the case. For example, in the Milner calculus let a = \x -> x in (a True, a 1) is typable, but seemingly equivalent (\a -> (a True, a 1)) (\x -> x) is not.
However, the latter is typable in System F with a rank 2 type for the first lambda.
My questions are:
Is let polymorphism a rank 2 feature that sneaked secretly in the otherwise rank 1 world of Milner calculus?
The purpose of having of separate let construct seems to specify which types should be generalized by type checker, and which are not. Does it serve any other purposes? Are there any reasons to extend more powerful systems e.g. System F with separate let which is not sugar? Are there any papers on the rationale behind the design of the Milner calculus which no longer seems obvious to me?
Is there the most general type for \a -> (a True, a 1) in System F?
Are there type systems closed under beta expansion? I.e. if P is typable and M N = P then M is typable?
Some clarifications:
By equivalence I mean equivalence modulo type annotations. Is 'System F a la Church' the correct term for that?
I know that in general the principal typing property doesn't hold in F, but a principal type could exist for my particular term.
By let I mean the non-recursive flavour of let. Extension of system F with recursive let is obviously useful as it allows for non-termination.

W.r.t. to the four questions asked:
A key insight in this matter is that rather than just typing a
lambda-abstraction with a potentially polymorphic argument type, we
are typing a (sugared) abstraction that is (1) applied exactly once
and, moreover, that is (2) applied to a statically known argument.
That is, we can first subject the "argument" (i.e. the definiens of
the local definition) to type reconstruction to find its
(polymorphic) type; then assign the found type to the "parameter"
(the definiendum); and then, finally, type the body in the extended
type context.
Note that that is considerably more easy than general rank-2 type
inference.
Note that, strictly speaking, let .. = .. in .. is only syntactic sugar in System F if you demand that the definiendum carries a type annotation: let .. : .. = .. in .. .
Here are two solutions for T in (\a :: T -> (a True, a 1)) in System F: forall b. (forall a. a -> b) -> (b, b) and forall c d. (forall a b. a -> b) -> (c, d). Note that neither one of them is more general than the other. In general, System F does not admit principal types.
I suppose this holds for the simply typed lambda-calculus?

Types are not preserved under beta-expansion in any calculus that can express the concept of "dead code". You can probably figure out how to write something similar to this in any usable language:
if True then something typable else utter nonsense
For example, let M = (\x y -> x) (something typable) and N = (utter nonsense) and P = (something typable), so that M N = P, and P is typable, but M N isn't.
...rereading your question, I see that you only demand that M be typable, but that seems like a very strange meaning to give to "preserved under beta-expansion" to me. Anyway, I don't see why some argument like the above couldn't apply: simply let M have some untypable dead code in it.

You could type (\a -> (a True, a 1)) (\x -> x) if instead of generalizing only let expressions, you generalized all lambda abstractions. Having done so, one also needs to instantiate type schemas at every use point, not simply at the point where the binder which refers to them is actually used. I'm don't think there's any problem with this actually, outside of the fact that its vastly less efficient. I recall some discussion of this in TAPL, in fact, making similar points.

I recall many years ago seeing in a book about lambda calculus (possibly Barendregt) a type system preserved by beta expansion. It had no quantification, but it had disjunction to express that a term needed to be of more than one type simultaneously, as well as a special type omega which every term inhabited. As I recall, the latter avoids Daniel Wagner's dead code objection. While every expression was well-typed, restricting the position of omega in the type allowed you to charactize which expressions had (weak?) head normal forms.
Also if I recall correctly, fully normal form expressions had principal types, which did not contain omega.
For example the principal type of \f x -> f (f x) (the Church numeral 2) would be something like ((A -> B) /\ (B -> C)) -> A -> C

Not able to answer all your very specialized questions, but no, its not a rank 2 feature. As you write, it's just that let definitions are being quantified which yields a fully polymorphic rank-1 type unless the definition depends on some monomorphic value ina nouter scope.
Please also note that Haskell let is known as let rec in other languages and allows definition of mutually recursive functions and values. This is something you would not want to code manually with lambda-expressions and Y-combinators.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string