How are tail-position contexts in the GHC join points paper formed? - haskell

Compiling without Continuations describes a way to extend ANF System F with join points. GHC itself has join points in Core (an intermediate representation) rather than exposing join points directly in the surface language (Haskell). Out of curiosity, I started trying to write a language that simply extends System F with join points. That is, the join points are user facing. However, there's something about the typing rules in the paper that I don't understand. Here are the parts that I do understand:
There are two environments, one for ordinary values/functions and one that only has join points.
The rationale for ∆ being ε in several of the rules. In the expression let x:σ = u in ..., u cannot reference any join points (VBIND) because join points cannot return to arbitrary locations.
The strange typing rule for JBIND. The paper does a good job explaining this.
Here's what I don't get. The paper introduces a notation that I will call the "overhead arrow", but the paper itself does not explicitly give it a name or explain it. Visually, it looks like an arrow pointing to the right, and it goes above an expression. Roughly, this seems to indicate a "tail context" (the paper does use this term). In the paper, these overhead arrows can be applied to terms, types, data constructors, and even environments. They can be nested as well. Here's the main difficulty I'm having. There are several rules with premises that include type environments under an overhead arrow. JUMP, CASE, RVBIND, and RJBIND all include premises with such a type environment (Figure 2 in the paper). However, none of the typing rules have a conclusion where the type environment is under an overhead arrow. So, I cannot see how JUMP, CASE, etc. can ever be used, since their premises cannot be derived by any of the other rules.
That's the question, but if anyone has any supplementary material that provides more context on the overhead-arrow convention, or if anyone is aware of an implementation of the System-F-with-join-points type system (other than in GHC's IR), that would be helpful too.

In this paper, x⃗ means “A sequence of x, separated by appropriate delimiters”.
A few examples:
If x is a variable, λx⃗. e is an abbreviation for λx1. λx2. … λxn. e. In other words, many nested 1-argument lambdas, or a many-argument lambda.
If σ and τ are types, σ⃗ → τ is an abbreviation for σ1 → σ2 → … → σn → τ. In other words, a function type with many parameter types.
If a is a type variable and σ is a type, ∀a⃗. σ is an abbreviation for ∀a1. ∀a2. … ∀an. σ. In other words, many nested polymorphic functions, or a polymorphic function with many type parameters.
In Figure 1 of the paper, the syntax of a jump expression is defined as:
e, u, v ⩴ … | jump j ϕ⃗ e⃗ τ
If this declaration were translated into a Haskell data type, it might look like this:
data Term
-- | A jump expression has a label that it jumps to, a list of type argument
-- applications, a list of term argument applications, and the return type
-- of the overall `jump`-expression.
= Jump LabelVar [Type] [Term] Type
| ... -- Other syntactic forms.
That is, a data constructor that takes a label variable j, a sequence of type arguments ϕ⃗, a sequence of term arguments e⃗, and a return type τ.
“Zipping” things together:
Sometimes, multiple uses of the overhead arrow place an implicit constraint that their sequences have the same length. One place that this occurs is with substitutions.
{ϕ/⃗a} means “replace a1 with ϕ1, replace a2 with ϕ2, …, replace an with ϕn”, implicitly asserting that both a⃗ and ϕ⃗ have the same length, n.
Worked example: the JUMP rule:
The JUMP rule is interesting because it provides several uses of sequencing, and even a sequence of premises. Here’s the rule again:
(j : ∀a⃗. σ⃗ → ∀r. r) ∈ Δ
(Γ; ε ⊢⃗ u : σ {ϕ/⃗a})
Γ; Δ ⊢ jump j ϕ⃗ u⃗ τ : τ
The first premise should be fairly straightforward, now: lookup j in the label context Δ, and check that the type of j starts with a bunch of ∀s, followed by a bunch of function types, ending with a ∀r. r.
The second “premise” is actually a sequence of premises. What is it looping over? So far, the sequences we have in scope are ϕ⃗, σ⃗, a⃗, and u⃗.
ϕ⃗ and a⃗ are used in a nested sequence, so probably not those two.
On the other hand, u⃗ and σ⃗ seem quite plausible if you consider what they mean.
σ⃗ is the list of argument types expected by the label j, and u⃗ is the list of argument terms provided to the label j, and it makes sense that you might want to iterate over argument types and argument terms together.
So this “premise” actually means something like this:
for each pair of σ and u:
Γ; ε ⊢ u : σ {ϕ/⃗a}
Pseudo-Haskell implementation
Finally, here’s a somewhat-complete code sample illustrating what this typing rule might look like in an actual implementation. x⃗ is implemented as a list of x values, and some monad M is used to signal failure when a premise is not satisfied.
data LabelVar

data Type
  = ...

data Term
  = Jump LabelVar [Type] [Term] Type
  | ...
typecheck :: TermContext -> LabelContext -> Term -> M Type
typecheck gamma delta (Jump j phis us tau) = do
  -- Look up `j` in the label context. If it's not there, throw an error.
  typeOfJ <- lookupLabel j delta
  -- Check that the type of `j` has the right shape: a bunch of `forall`s,
  -- followed by a bunch of function types, ending with `forall r. r`. If it
  -- has the correct shape, split it into a list of `a`s, a list of `sigma`s
  -- and the return type, `forall r. r`.
  (as, sigmas, ret) <- splitLabelType typeOfJ
  -- Build the substitution {phi/a}. exactZip is a helper function that
  -- "zips" two sequences together: if they have the same length, it produces
  -- a list of pairs of corresponding elements; if not, it raises an error.
  subst <- exactZip as phis
  -- One premise per argument: pair each argument term `u` with its expected
  -- argument type `sigma`. (forM_ is from Control.Monad.)
  pairs <- exactZip us sigmas
  forM_ pairs $ \(u, sigma) -> do
    -- Type-check the argument `u` in a context without any join points
    -- (an empty label context), and assert that its type has the correct form.
    sigma' <- typecheck gamma emptyLabelContext u
    assert (applySubst subst sigma == sigma')
  -- After all the premises have been satisfied, the type of the `jump`
  -- expression is just its return type.
  return tau
-- Other syntactic forms
typecheck gamma delta u = ...
-- Auxiliary definitions
type M = ...
instance Monad M
lookupLabel :: LabelVar -> LabelContext -> M Type
splitLabelType :: Type -> M ([TypeVar], [Type], Type)
exactZip :: [a] -> [b] -> M [(a, b)]
applySubst :: [(TypeVar, Type)] -> Type -> Type
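For concreteness, exactZip might be implemented along these lines (just a sketch; typeError is a hypothetical way of signalling a failed premise in M):

exactZip :: [a] -> [b] -> M [(a, b)]
exactZip []     []     = pure []
exactZip (x:xs) (y:ys) = ((x, y) :) <$> exactZip xs ys
exactZip _      _      = typeError "sequence length mismatch"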

As far as I know SPJ's style of notation (and this does align with what I see in the paper), it simply means "0 or more". E.g. you can replace a⃗ with a1, …, an, where n >= 0.
It may be "1 or more" in some cases, but it shouldn't be hard to figure out which of the two is meant.


If Either can be either Left or Right but not both, then why does it correspond to OR instead of XOR in Curry-Howard correspondence?

When I asked this question, one of the answers, now deleted, was suggesting that the type Either corresponds to XOR, rather than OR, in the Curry-Howard correspondence, because it cannot be Left and Right at the same time.
Where is the truth?
If you have a value of type P and a value of type Q (that is, you have both a proof of P and a proof of Q), then you are still able to provide a value of type Either P Q.
Consider
x :: P
y :: Q
...
z :: Either P Q
z = Left x -- Another possible proof would be `Right y`
While Either does not have a specific case that explicitly represents this situation (unlike These), it does not do anything to exclude it (as in exclusive OR).
This third case where both have proofs is a bit different than the other two cases where only one has a proof, which reflects the fact that "not excluding" something is a bit different than "including" something in intuitionistic logic, since Either does not provide a particular witness for this fact. However Either is not an XOR in the way that XOR would typically work since, as I said, it does not exclude the case where both parts have proofs. What Daniel Wagner proposes in this answer, on the other hand, is much closer to an XOR.
Either is kind of like an exclusive OR in terms of what its possible witnesses are. On the other hand, it is like an inclusive OR when you consider whether or not you can actually create a witness in four possible scenarios: having a proof of P and a refutation of Q, having a proof of Q and a refutation of P, having a proof of both or having a refutation of both.[1] While you can construct a value of type Either P Q when you have a proof of both P and Q (similar to an inclusive OR), you cannot distinguish this situation from the situation where only P has a proof or only Q has a proof using only a value of type Either P Q (kind of similar to an exclusive OR). Daniel Wagner's solution, on the other hand, is similar to exclusive OR on both construction and deconstruction.
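For reference, the These type discussed below (from the these package) has a separate constructor for the "both" case:

data These a b
  = This a
  | That b
  | These a b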
It is also worth mentioning that These more explicitly represents the possibility of both having proofs. These is similar to inclusive OR on both construction and deconstruction. However, it's also worth noting that there is nothing preventing you from using an "incorrect" constructor when you have a proof of both P and Q. You could extend These to be even more representative of an inclusive OR in this regard with a bit of additional complexity:
data IOR a b
  = OnlyFirst a (Not b)
  | OnlySecond (Not a) b
  | Both a b

type Not a = a -> Void
The potential "wrong constructor" issue of These (and the lack of a "both" witness in Either) doesn't really matter if you are only interested in a proof irrelevant logical system (meaning that there is no way to distinguish between any two proofs of the same proposition), but it might matter in cases where you want more computational relevance in the logic.[2]
In the practical situation of writing computer programs that are actually meant to be executed, computational relevance is often extremely important. Even though 0 and 23 are both proofs that the Int type is inhabited, we certainly like to distinguish between the two values in programs, in general!
Regarding "construction" and "destruction"
Essentially, I just mean "creating values of a type" by construction and "pattern matching" by destruction (sometimes people use the words "introduction" and "elimination" here, particularly in the context of logic).
In the case of Daniel Wagner's solutions:
Construction: When you construct a value of type Xor A B, you must provide a proof of exactly one of A or B and a refutation of the other one. This is similar to exclusive OR. It is not possible to construct a value of this type unless you have a refutation of either A or B and a proof of the other one. A particularly significant fact is that you cannot construct a value of this type if you have a proof of both A and B and you don't have a refutation of either of them (unlike inclusive OR).
Destruction: When you pattern match on a value of type Xor A B, you always have a proof of one of the types and a refutation of the other. It will never give you a proof of both of them. This follows from its definition.
In the case of IOR:
Construction: When you create a value of type IOR A B, you must do exactly one of the following: (1) provide only a proof of A and a refutation of B, (2) provide a proof of B and a refutation of A, (3) provide a proof of both A and B. This is like inclusive OR. These three possibilities correspond exactly to each of the three constructors of IOR, with no overlap. Note that, unlike the situation with These, you cannot use the "incorrect constructor" in the case where you have a proof of both A and B: the only way to make a value of type IOR A B in this case is to use Both (since you would otherwise need to provide a refutation of either A or B).
Destruction: Since the three possible situations where you have a proof of at least one of A and B are exactly represented by IOR, with a separate constructor for each (and no overlap between the constructors), you will always know exactly which of A and B are true and which is false (if applicable) by pattern matching on it.
Pattern matching on IOR
Pattern matching on IOR works exactly like pattern matching on any other algebraic datatype. Here is an example:
x :: IOR Char Int
x = Both 'c' 3
y :: IOR Char Void
y = OnlyFirst 'a' (\v -> v)
f :: Not p -> IOR p Int
f np = OnlySecond np 7
z :: IOR Void Int
z = f notVoid
g :: IOR p Int -> Int
g w =
  case w of
    OnlyFirst p q -> -1
    OnlySecond p q -> q
    Both p q -> q
-- We can show that the proposition represented by "Void" is indeed false:
notVoid :: Not Void
notVoid = \v -> v
Then a sample GHCi session, with the above code loaded:
ghci> g x
3
ghci> g z
7
[1] This gets a bit more complex when you consider that some statements are undecidable and therefore you cannot construct a proof or a refutation for them.
[2] Homotopy type theory would be one example of a proof relevant system, but this is reaching the limit of my knowledge as of now.
The confusion stems from the Boolean truth-table exposition of logic. In particular, when both arguments are True, OR is True, whereas XOR is False. Logically it means that to prove OR it's enough to provide the proof of one of the arguments; but it's okay if the other one is True as well--we just don't care.
In Curry-Howard interpretation, if somebody gives you an element of Either a b, and you were able to extract the value of a from it, you still know nothing about b. It could be inhabited or not.
On the other hand, to prove XOR, you not only need the proof of one argument, you must also provide the proof of the falsehood of the other argument.
So, with Curry-Howard interpretation, if somebody gives you an element of Xor a b and you were able to extract the value of a from it, you would conclude that b is uninhabited (that is, isomorphic to Void). Conversely, if you were able to extract the value of b, then you'd know that a was uninhabited.
The proof of falsehood of a is a function a->Void. Such a function would be able to produce a value of Void, given a value of a, which is clearly impossible. So there can be no values of a. (There is only one function that returns Void, and that's the identity on Void.)
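A small illustration of that last point, using absurd from Data.Void (the name contradiction is mine): holding both a proof of a and a refutation of a lets you conclude anything.

import Data.Void (Void, absurd)

-- Applying the refutation to the proof yields a Void,
-- and absurd :: Void -> b then gives us whatever we want.
contradiction :: a -> (a -> Void) -> b
contradiction x notX = absurd (notX x)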
Perhaps try replacing “proof” in the Curry-Howard isomorphism with “evidence”.
Below I will use italics for propositions and proofs (which I will also call evidence), the mathematical side of the isomorphism, and I will use code for types and values.
The question is: suppose I know the type for [values corresponding to] evidence that P is true (I will call this type P), and I know the type for evidence that Q is true (I call this type Q), then what is the type for evidence of the proposition R = P OR Q?
Well there are two ways to prove R: we can prove P, or we can prove Q. We could prove both but that would be more work than necessary.
Now ask what the type should be? It is the type for things which are either evidence of P or evidence of Q. I.e. values which are either things of type P or things of type Q. The type Either P Q contains precisely those values.
What if you have evidence of P AND Q? Well this is just a value of type (P, Q), and we can write a simple function:
f :: (p,q) -> Either p q
f (a,b) = Left a
And this gives us a way to prove P OR Q if we can prove P AND Q. Therefore Either cannot correspond to xor.
What is the type for P XOR Q?
At this point I will say that negations are a bit annoying in this sort of constructive logic.
Let’s convert the question to things we understand, and a simpler thing we don’t:
P XOR Q = (P AND (NOT Q)) OR (Q AND (NOT P))
Ask now: what is the type for evidence of NOT P?
I don’t have an intuitive explanation for why this is the simplest type but if NOT P were true then evidence of P being true would be a contradiction, which we say as proving FALSE, the unprovable thing (aka BOTTOM or BOT). That is, NOT P may be written in simpler terms as: P IMPLIES FALSE. The type for FALSE is called Void (in haskell). It is a type which no values inhabit because there are no proofs of it. Therefore if you could construct a value of that type you would have problems. IMPLIES corresponds to functions and so the type corresponding to NOT P is P -> Void.
We put this with what we know and get the following equivalence in the language of propositions:
P XOR Q = (P AND (NOT Q)) OR (Q AND (NOT P)) = (P AND (Q IMPLIES FALSE)) OR ((P IMPLIES FALSE) AND Q)
The type is then:
type Xor p q = Either (p, q -> Void) (p -> Void, q)
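As a quick sketch of the "destruction" story from the other answer, an eliminator for this Xor always hands you a proof of one side together with a refutation of the other (elimXor is a name I made up):

import Data.Void (Void)

type Xor p q = Either (p, q -> Void) (p -> Void, q)

elimXor :: (p -> (q -> Void) -> r)      -- what to do with a proof of p
        -> ((p -> Void) -> q -> r)      -- what to do with a proof of q
        -> Xor p q -> r
elimXor onP _ (Left (p, notQ))  = onP p notQ
elimXor _ onQ (Right (notP, q)) = onQ notP q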

How do we overcome the compile time and runtime gap when programming in a Dependently Typed Language?

I'm told that in a dependent type system, "types" and "values" are mixed, and we can treat both of them as "terms" instead.
But there is something I can't understand: in a strongly typed programming language without dependent types (like Haskell), types are decided (inferred or checked) at compile time, but values are decided (computed or input) at runtime.
I think there must be a gap between these two stages. Just think: if a value is interactively read from STDIN, how can we reference this value in a type which must be decided AOT?
e.g. There is a natural number n and a list of natural numbers xs (which contains n elements) which I need to read from STDIN. How can I put them into a data structure Vect n Nat?
Suppose we input n :: Int at runtime from STDIN. We then read n strings, and store them into vn :: Vect n String (pretend for the moment this can be done).
Similarly, we can read m :: Int and vm :: Vect m String. Finally, we concatenate the two vectors: vn ++ vm (simplifying a bit here). This can be type checked, and will have type Vect (n+m) String.
Now it is true that the type checker runs at compile time, before the values n,m are known, and also before vn,vm are known. But this does not matter: we can still reason symbolically on the unknowns n,m and argue that vn ++ vm has that type, involving n+m, even if we do not yet know what n+m actually is.
It is not that different from doing math, where we manipulate symbolic expressions involving unknown variables according to some rules, even if we do not know the values of the variables. We don't need to know what number is n to see that n+n = 2*n.
Similarly, the type checker can type check
-- pseudocode
readNStrings :: (n :: Int) -> IO (Vect n String)
readNStrings O = return Vect.empty
readNStrings (S p) = do
  s <- getLine
  vp <- readNStrings p
  return (Vect.cons s vp)
(Well, actually some more help from the programmer could be needed to typecheck this, since it involves dependent matching and recursion. But I'll neglect this.)
Importantly, the type checker can check that without knowing what n is.
Note that the same issue actually already arises with polymorphic functions.
fst :: forall a b. (a, b) -> a
fst (x, y) = x
test1 = fst # Int # Float (2, 3.5)
test2 = fst # String # Bool ("hi!", True)
...
One might wonder "how can the typechecker check fst without knowing what types a and b will be at runtime?". Again, by reasoning symbolically.
With type arguments this is arguably more obvious, since we usually run programs after type erasure, unlike value parameters like our n :: Int above, which cannot be erased. Still, there is some similarity between universally quantifying over types and over Int.
It seems to me that there are two questions here:
Given that some values are unknown during compile-time (e.g., values read from STDIN), how can we make use of them in types? (Note that chi has already given an excellent answer to this.)
Some operations (e.g., getLine) seem to make absolutely no sense at compile-time; how could we possibly talk about them in types?
The answer to (1), as chi has said, is symbolic or abstract reasoning. You can read in a number n, and then have a procedure that builds a Vect n Nat by reading from the command line n times, making use of arithmetic properties such as the fact that 1+(n-1) = n for nonzero natural numbers.
The answer to (2) is a bit more subtle. Naively, you might want to say "this function returns a vector of length n, where n is read from the command line". There are two types you might try to give this (apologies if I'm getting Haskell notation wrong)
unsafePerformIO (do n <- getLine; return (IO (Vect (read n :: Int) Nat)))
or (in pseudo-Coq notation, since I'm not sure what Haskell's notation for existential types is)
IO (exists n, Vect n Nat)
These two types can actually both be made sense of, and say different things. The first type, to me, says "at compile time, read n from the command line, and return a function which, at runtime, gives a vector of length n by performing IO". The second type says "at runtime, perform IO to get a natural number n and a vector of length n".
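Here is a rough Haskell rendering of that second type, using a GADT-indexed vector and an existential wrapper to hide the length (SomeVect, readVect and the surrounding names are hypothetical):

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
import Data.Kind (Type)

data Nat = Z | S Nat

data Vect (n :: Nat) (a :: Type) where
  VNil  :: Vect 'Z a
  VCons :: a -> Vect n a -> Vect ('S n) a

-- "exists n, Vect n String": the length is only learned at runtime.
data SomeVect a where
  SomeVect :: Vect n a -> SomeVect a

readVect :: IO (SomeVect String)
readVect = do
  n <- readLn :: IO Int
  go n
  where
    go :: Int -> IO (SomeVect String)
    go k
      | k <= 0    = pure (SomeVect VNil)
      | otherwise = do
          s <- getLine
          rest <- go (k - 1)
          case rest of
            SomeVect v -> pure (SomeVect (VCons s v))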
The way I like looking at this is that all side effects (other than, perhaps, non-termination) are monad transformers, and there is only one monad: the "real-world" monad. Monad transformers work just as well at the type level as at the term level; the one thing which is special is run :: M a -> a which executes the monad (or stack of monad transformers) in the "real world". There are two points in time at which you can invoke run: one is at compile time, where you invoke any instance of run which shows up at the type level. Another is at runtime, where you invoke any instance of run which shows up at the value level. Note that run only makes sense if you specify an evaluation order; if your language does not specify whether it is call-by-value or call-by-name (or call-by-push-value or call-by-need or call-by-something-else), you can get incoherencies when you try to compute a type.

forall as an intersection over those sets

I have been reading the existential section on Wikibooks and this is what is stated there:
Firstly, forall really does mean 'for all'. One way of thinking about
types is as sets of values with that type, for example, Bool is the
set {True, False, ⊥} (remember that bottom, ⊥, is a member of every
type!), Integer is the set of integers (and bottom), String is the set
of all possible strings (and bottom), and so on. forall serves as an
intersection over those sets. For example, forall a. a is the intersection over all types, which must be {⊥}, that is, the type (i.e. set) whose only value (i.e. element) is bottom.
How does forall serve as an intersection over those sets?
forall in formal logic means that it can be any value from the universe of discourse. How does it get translated to an intersection in Haskell?
Haskell's forall-s can be viewed as restricted dependent function types, which I think is the conceptually most enlightening approach and also most amenable to set-theoretic or logical interpretations.
In a dependent language one can bind the values of arguments in function types, and mention those values in the return types.
-- Idris
id : (a : Type) -> (a -> a)
id _ x = x
-- Can also leave arguments implicit (to be inferred)
id : a -> a
id x = x
-- Generally, an Idris function type has the form "(x : A) -> F x"
-- where A is a type (or kind/sort, or any level really) and F is
-- a function of type "A -> Type"
-- Haskell
id :: forall (a :: *). (a -> a)
id x = x
The crucial difference is that Haskell can only bind types, lifted kinds, and type constructors, using forall, while dependent languages can bind anything.
In the literature dependent functions are called dependent products. Why call them that, when they are, well, functions? It turns out that we can implement Haskell's algebraic product types using only dependent functions.
Generally, any function a -> b can be viewed as a lookup function for some product, where the keys have type a and the elements have type b. Bool -> Int can be interpreted as a pair of Int-s. This interpretation is not very interesting for non-dependent functions, since all the product fields must be of the same type. With dependent functions, our pair can be properly polymorphic:
Pair : Type -> Type -> Type
Pair a b = (index : Bool) -> (if index then a else b)
fst : Pair a b -> a
fst pair = pair True
snd : Pair a b -> b
snd pair = pair False
setFst : c -> Pair a b -> Pair c b
setFst c pair = \index -> if index then c else pair False
setSnd : c -> Pair a b -> Pair a c
setSnd c pair = \index -> if index then pair True else c
We have recovered all the essential functionality of pairs here. Also, using Pair we can build up products of arbitrary arity.
So, how does this tie in to the interpretation of forall-s? Well, we can interpret ordinary products and build up some intuition for them, and then try to transfer that to forall-s.
So, let's look a bit first at the algebra of ordinary products. Algebraic types are called algebraic because we can determine the number of their values by algebra. Link to detailed explanation. If A has |A| number of values and B has |B| number of values, then Pair A B has |A| * |B| number of possible values. With sum types we sum the number of inhabitants. Since A -> B can be viewed as a product with |A| fields, all having type B, the number of the inhabitants of A -> B is |A| number of |B|-s multiplied together, which equals |B|^|A|. Hence the name "exponential type" that is sometimes used to denote functions. When the function is dependent, we fall back to the "product over some number of different types" interpretation, since the exponential formula no longer fits.
Armed with this understanding, we can interpret forall (a :: *). t as a product type with indices of type * and fields having type t, where a might be mentioned inside t, and thus the field types may vary depending on the choice of a. We can look up the fields by making Haskell infer some particular type for the forall, effectively applying the function to the type argument.
Note that this product has as many fields as there are values of the index type, which is pretty much infinite here, considering the potential number of Haskell types.
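To make "looking up the fields" concrete, here is a tiny sketch: instantiating a polymorphic value at a particular type is like projecting one component out of that infinite product (TypeApplications is only used to make the instantiation explicit; the names are mine):

{-# LANGUAGE ExplicitForAll, TypeApplications #-}

polyId :: forall a. a -> a
polyId x = x

-- "Project out" two of the infinitely many fields of the product.
intField :: Int -> Int
intField = polyId @Int

boolField :: Bool -> Bool
boolField = polyId @Bool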
You have to view types in either a negative or a positive context, i.e. either in the process of construction or the process of use (have/receive; this is all probably best understood in terms of game semantics, but I am not familiar with them).
If I "give you" a type forall a . a then you know I must have constructed it somehow. The only way for a particular constructed value to have the type forall a . a is that it could be a stand-in "for all" types in the universe of discourse—which is, of course, the intersection of their functionality. In sane languages no such value exists (Void), but in Haskell we have bottom.
bottom :: forall a . a
bottom = let a = a in a
On the other hand, if I somehow magically have a value of forall a . a and I attempt to use it then we get the opposite effect—I can treat it as anything in the union of all types in the universe of discourse (which is what you were looking for) and thus I have
absurd :: (forall a . a) -> b
absurd a = a
How does forall serve as an intersection over those sets ?
Here you may benefit from starting to read a bit about the Curry-Howard correspondence. To make a long story short, you can think of a type as a logical proposition, language expressions as proofs of their types, and values as normal form proofs (proofs that cannot be simplified any further). So for example, "Hello world!" :: String would be read as ""Hello world!" is a proof of the proposition String."
So now think of forall a. a as a proposition. Intuitively, think of this as a second-order quantified statement over a propositional variable: "For all statements a, a." It's basically asserting all propositions. This means that if x is a proof of forall a. a, then for any proposition P, x is also a proof of P. So, since the proofs of forall a. a are the proofs that prove any propositions, then it must follow that the proofs of forall a. a must be the same as what you'd get if you mapped each proposition to the set of its proofs and took their intersection. And the only normal-form proof (i.e. "value") that is common to all those sets is bottom.
Another informal way to look at it is that universal quantification is like an infinite conjunction (∀x.P(x) is like P(c0) ∧ P(c1) ∧ ...). Conjunction, seen from a model-theoretical view, is set intersection; the set of evaluation environments where A ∧ B is true is the intersection of the environments where A is true and the ones where B is true.

Functions don't just have types: They ARE Types. And Kinds. And Sorts. Help put a blown mind back together

I was doing my usual "Read a chapter of LYAH before bed" routine, feeling like my brain was expanding with every code sample. At this point I was convinced that I understood the core awesomeness of Haskell, and now just had to understand the standard libraries and type classes so that I could start writing real software.
So I was reading the chapter about applicative functors when all of a sudden the book claimed that functions don't merely have types, they are types, and can be treated as such (For example, by making them instances of type classes). (->) is a type constructor like any other.
My mind was blown yet again, and I immediately jumped out of bed, booted up the computer, went to GHCi and discovered the following:
Prelude> :k (->)
(->) :: ?? -> ? -> *
What on earth does it mean?
If (->) is a type constructor, what are the value constructors? I can take a guess, but would have no idea how to define it in traditional data (->) ... = ... | ... | ... format. It's easy enough to do this with any other type constructor: data Either a b = Left a | Right b. I suspect my inability to express it in this form is related to the extremely weird type signature.
What have I just stumbled upon? Higher kinded types have kind signatures like * -> * -> *. Come to think of it... (->) appears in kind signatures too! Does this mean that not only is it a type constructor, but also a kind constructor? Is this related to the question marks in the type signature?
I have read somewhere (wish I could find it again, Google fails me) about being able to extend type systems arbitrarily by going from Values, to Types of Values, to Kinds of Types, to Sorts of Kinds, to something else of Sorts, to something else of something elses, and so on forever. Is this reflected in the kind signature for (->)? Because I've also run into the notion of the Lambda cube and the calculus of constructions without taking the time to really investigate them, and if I remember correctly it is possible to define functions that take types and return types, take values and return values, take types and return values, and take values and return types.
If I had to take a guess at the type signature for a function which takes a value and returns a type, I would probably express it like this:
a -> ?
or possibly
a -> *
Although I see no fundamental immutable reason why the second example couldn't easily be interpreted as a function from a value of type a to a value of type *, where * is just a type synonym for string or something.
The first example better expresses a function whose type transcends a type signature in my mind: "a function which takes a value of type a and returns something which cannot be expressed as a type."
You touch on so many interesting points in your question that I am
afraid this is going to be a long answer :)
Kind of (->)
The kind of (->) is * -> * -> *, if we disregard the boxity GHC
inserts. But there is no circularity going on, the ->s in the
kind of (->) are kind arrows, not function arrows. Indeed, to
distinguish them kind arrows could be written as (=>), and then
the kind of (->) is * => * => *.
We can regard (->) as a type constructor, or maybe rather a type
operator. Similarly, (=>) could be seen as a kind operator, and
as you suggest in your question we need to go one 'level' up. We
return to this later in the section Beyond Kinds, but first:
How the situation looks in a dependently typed language
You ask how the type signature would look for a function that takes a
value and returns a type. This is impossible to do in Haskell:
functions cannot return types! You can simulate this behaviour using
type classes and type families, but let us for illustration change
language to the dependently typed language
Agda. This is a
language with similar syntax as Haskell where juggling types together
with values is second nature.
To have something to work with, we define a data type of natural
numbers, for convenience in unary representation as in
Peano Arithmetic.
Data types are written in
GADT style:
data Nat : Set where
  Zero : Nat
  Succ : Nat -> Nat
Set is equivalent to * in Haskell, the "type" of all (small) types,
such as Natural numbers. This tells us that the type of Nat is
Set, whereas in Haskell, Nat would not have a type, it would have
a kind, namely *. In Agda there are no kinds, but everything has
a type.
We can now write a function that takes a value and returns a type.
Below is the function which takes a natural number n and a type,
and iterates the List constructor n times, applied to this
type. (In Agda, [a] is usually written List a.)
listOfLists : Nat -> Set -> Set
listOfLists Zero a = a
listOfLists (Succ n) a = List (listOfLists n a)
Some examples:
listOfLists Zero Bool = Bool
listOfLists (Succ Zero) Bool = List Bool
listOfLists (Succ (Succ Zero)) Bool = List (List Bool)
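(As an aside, the type-family simulation in Haskell mentioned earlier might look roughly like this; a sketch only, with a promoted Nat kind and names of my choosing:)

{-# LANGUAGE DataKinds, TypeFamilies, KindSignatures #-}
import Data.Kind (Type)

data Nat = Zero | Succ Nat

type family ListOfLists (n :: Nat) (a :: Type) :: Type where
  ListOfLists 'Zero     a = a
  ListOfLists ('Succ n) a = [ListOfLists n a]

-- ListOfLists ('Succ ('Succ 'Zero)) Bool reduces to [[Bool]]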
We can now make a map function that operates on listOfLists.
We need to take a natural number that is the number of iterations
of the list constructor. There are two base cases: when the number is
Zero, listOfLists is just the identity and we apply the function;
and when the list is empty, the empty list is returned.
The step case is a bit more involved: we apply mapN to the head
of the list (which has one layer less of nesting), and mapN
to the rest of the list.
mapN : {a b : Set} -> (a -> b) -> (n : Nat) ->
       listOfLists n a -> listOfLists n b
mapN f Zero x = f x
mapN f (Succ n) [] = []
mapN f (Succ n) (x :: xs) = mapN f n x :: mapN f (Succ n) xs
In the type of mapN, the Nat argument is named n, so the rest of
the type can depend on it. So this is an example of a type that
depends on a value.
As a side note, there are also two other named variables here,
namely the first arguments, a and b, of type Set. Type
variables are implicitly universally quantified in Haskell, but
here we need to spell them out, and specify their type, namely
Set. The brackets are there to make them invisible in the
definition, as they are always inferable from the other arguments.
Set is abstract
You ask what the constructors of (->) are. One thing to point out
is that Set (as well as * in Haskell) is abstract: you cannot
pattern match on it. So this is illegal Agda:
cheating : Set -> Bool
cheating Nat = True
cheating _ = False
Again, you can simulate pattern matching on type constructors in
Haskell using type families; one canonical example is given on
Brent Yorgey's blog.
Can we define -> in Agda? Since we can return types from
functions, we can define our own version of -> as follows:
_=>_ : Set -> Set -> Set
a => b = a -> b
(infix operators are written _=>_ rather than (=>)) This
definition has very little content, and is very similar to doing a
type synonym in Haskell:
type Fun a b = a -> b
Beyond kinds: Turtles all the way down
As promised above, everything in Agda has a type, but then
the type of _=>_ must itself have a type! This touches on your point
about sorts, which are, so to speak, one layer above Set (the kinds).
In Agda this is called Set1:
FunType : Set1
FunType = Set -> Set -> Set
And in fact, there is a whole hierarchy of them! Set is the type of
"small" types: data types in haskell. But then we have Set1,
Set2, Set3, and so on. Set1 is the type of types which mention
Set. This hierarchy is to avoid inconsistencies such as Girard's
paradox.
As noticed in your question, -> is used for types and kinds in
Haskell, and the same notation is used for function space at all
levels in Agda. This must be regarded as a built in type operator,
and the constructors are lambda abstraction (or function
definitions). This hierarchy of types is similar to the setting in
System F omega, and more
information can be found in the later chapters of
Pierce's Types and Programming Languages.
Pure type systems
In Agda, types can depend on values, and functions can return types,
as illustrated above, and we also had a hierarchy of
types. Systematic investigation of different systems of the lambda
calculi is investigated in more detail in Pure Type Systems. A good
reference is
Lambda Calculi with Types by Barendregt,
where PTS are introduced on page 96, and many examples on page 99 and onwards.
You can also read more about the lambda cube there.
Firstly, the ?? -> ? -> * kind is a GHC-specific extension. The ? and ?? are just there to deal with unboxed types, which behave differently from just * (which has to be boxed, as far as I know). So ?? can be any normal type or an unboxed type (e.g. Int#); ? can be either of those or an unboxed tuple. There is more information here: Haskell Weird Kinds: Kind of (->) is ?? -> ? -> *
I think a function can't return an unboxed type because functions are lazy. Since a lazy value is either a value or a thunk, it has to be boxed. Boxed just means it is a pointer rather than just a value: it's like Integer vs. int in Java.
Since you are probably not going to be using unboxed types in LYAH-level code, you can imagine that the kind of -> is just * -> * -> *.
Since the ? and ?? are basically just more general version of *, they do not have anything to do with sorts or anything like that.
However, since -> is just a type constructor, you can actually partially apply it; for example, (->) e is an instance of Functor and Monad. Figuring out how to write these instances is a good mind-stretching exercise.
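If you want to check your answer to that exercise, here is one sketch, wrapped in a newtype (I'll call it Arr) so that it does not clash with the instances that already exist in base:

newtype Arr e a = Arr { runArr :: e -> a }

instance Functor (Arr e) where
  fmap f (Arr g) = Arr (f . g)

instance Applicative (Arr e) where
  pure x = Arr (const x)
  Arr f <*> Arr g = Arr (\e -> f e (g e))

instance Monad (Arr e) where
  Arr f >>= k = Arr (\e -> runArr (k (f e)) e)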
As far as value constructors go, they would have to just be lambdas (\ x ->) or function declarations. Since functions are so fundamental to the language, they get their own syntax.

Abusing the algebra of algebraic data types - why does this work?

The 'algebraic' expression for algebraic data types looks very suggestive to someone with a background in mathematics. Let me try to explain what I mean.
Having defined the basic types
Product •
Union +
Singleton X
Unit 1
and using the shorthand X² for X•X and 2X for X+X et cetera, we can then define algebraic expressions for e.g. linked lists
data List a = Nil | Cons a (List a) ↔ L = 1 + X • L
and binary trees:
data Tree a = Nil | Branch a (Tree a) (Tree a) ↔ T = 1 + X • T²
Now, my first instinct as a mathematician is to go nuts with these expressions, and try to solve for L and T. I could do this through repeated substitution, but it seems much easier to abuse the notation horrifically and pretend I can rearrange it at will. For example, for a linked list:
L = 1 + X • L
(1 - X) • L = 1
L = 1 / (1 - X) = 1 + X + X² + X³ + ...
where I've used the power series expansion of 1 / (1 - X) in a totally unjustified way to derive an interesting result, namely that an L type is either Nil, or it contains 1 element, or it contains 2 elements, or 3, etc.
It gets more interesting if we do it for binary trees:
T = 1 + X • T²
X • T² - T + 1 = 0
T = (1 - √(1 - 4 • X)) / (2 • X)
T = 1 + X + 2 • X² + 5 • X³ + 14 • X⁴ + ...
again, using the power series expansion (done with Wolfram Alpha). This expresses the non-obvious (to me) fact that there is only one binary tree with 1 element, 2 binary trees with two elements (the second element can be on the left or the right branch), 5 binary trees with three elements etc.
So my question is - what am I doing here? These operations seem unjustified (what exactly is the square root of an algebraic data type, anyway?) but they lead to sensible results. Does the quotient of two algebraic data types have any meaning in computer science, or is it just notational trickery?
And, perhaps more interestingly, is it possible to extend these ideas? Is there a theory of the algebra of types that allows, for example, arbitrary functions on types, or do types require a power series representation? If you can define a class of functions, then does composition of functions have any meaning?
Disclaimer: A lot of this doesn't really work quite right when you account for ⊥, so I'm going to blatantly disregard that for the sake of simplicity.
A few initial points:
Note that "union" is probably not the best term for A+B here--that's specifically a disjoint union of the two types, because the two sides are distinguished even if their types are the same. For what it's worth, the more common term is simply "sum type".
Singleton types are, effectively, all unit types. They behave identically under algebraic manipulations and, more importantly, the amount of information present is still preserved.
You probably want a zero type as well. Haskell provides that as Void. There are no values whose type is zero, just as there is one value whose type is one.
There's still one major operation missing here but I'll get back to that in a moment.
As you've probably noticed, Haskell tends to borrow concepts from Category Theory, and all of the above has a very straightforward interpretation as such:
Given objects A and B in Hask, their product A×B is the unique (up to isomorphism) type that allows two projections fst : A×B → A and snd : A×B → B, where given any type C and functions f : C → A, g : C → B you can define the pairing f &&& g : C → A×B such that fst ∘ (f &&& g) = f and likewise for g. Parametricity guarantees the universal properties automatically and my less-than-subtle choice of names should give you the idea. The (&&&) operator is defined in Control.Arrow, by the way.
The dual of the above is the coproduct A+B with injections inl : A → A+B and inr : B → A+B, where given any type C and functions f : A → C, g : B → C, you can define the copairing f ||| g : A+B → C such that the obvious equivalences hold. Again, parametricity guarantees most of the tricky parts automatically. In this case, the standard injections are simply Left and Right and the copairing is the function either.
Many of the properties of product and sum types can be derived from the above. Note that any singleton type is a terminal object of Hask and any empty type is an initial object.
Returning to the aforementioned missing operation, in a cartesian closed category you have exponential objects that correspond to arrows of the category. Our arrows are functions, our objects are types with kind *, and the type A -> B indeed behaves as Bᴬ in the context of algebraic manipulation of types. If it's not obvious why this should hold, consider the type Bool -> A. With only two possible inputs, a function of that type is isomorphic to two values of type A, i.e. (A, A). For Maybe Bool -> A we have three possible inputs, and so on. Also, observe that if we rephrase the copairing definition above to use algebraic notation, we get the identity Cᴬ × Cᴮ = Cᴬ⁺ᴮ.
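A quick sketch of the Bool -> A example as an explicit isomorphism (the names toPair and fromPair are mine):

-- A function out of Bool carries the same information as a pair.
toPair :: (Bool -> a) -> (a, a)
toPair f = (f True, f False)

fromPair :: (a, a) -> (Bool -> a)
fromPair (x, y) b = if b then x else y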
As for why this all makes sense--and in particular why your use of the power series expansion is justified--note that much of the above refers to the "inhabitants" of a type (i.e., distinct values having that type) in order to demonstrate the algebraic behavior. To make that perspective explicit:
The product type (A, B) represents a value each from A and B, taken independently. So for any fixed value a :: A, there is one value of type (A, B) for each inhabitant of B. This is of course the cartesian product, and the number of inhabitants of the product type is the product of the number of inhabitants of the factors.
The sum type Either A B represents a value from either A or B, with the left and right branches distinguished. As mentioned earlier, this is a disjoint union, and the number of inhabitants of the sum type is the sum of the number of inhabitants of the summands.
The exponential type B -> A represents a mapping from values of type B to values of type A. For any fixed argument b :: B, any value of A can be assigned to it; a value of type B -> A picks one such mapping for each input, which is equivalent to a product of as many copies of A as B has inhabitants, hence the exponentiation.
While it's tempting at first to treat types as sets, that doesn't actually work very well in this context--we have disjoint union rather than the standard union of sets, there's no obvious interpretation of intersection or many other set operations, and we don't usually care about set membership (leaving that to the type checker).
On the other hand, the constructions above spend a lot of time talking about counting inhabitants, and enumerating the possible values of a type is a useful concept here. That quickly leads us to enumerative combinatorics, and if you consult the linked Wikipedia article you'll find that one of the first things it does is define "pairs" and "unions" in exactly the same sense as product and sum types by way of generating functions, then does the same for "sequences" that are identical to Haskell's lists using exactly the same technique you did.
Edit: Oh, and here's a quick bonus that I think demonstrates the point strikingly. You mentioned in a comment that for a tree type T = 1 + T^2 you can derive the identity T^6 = 1, which is clearly wrong. However, T^7 = T does hold, and a bijection between trees and seven-tuples of trees can be constructed directly, cf. Andreas Blass's "Seven Trees in One".
Edit×2: On the subject of the "derivative of a type" construction mentioned in other answers, you might also enjoy this paper from the same author which builds on the idea further, including notions of division and other interesting whatnot.
Binary trees are defined by the equation T=1+XT^2 in the semiring of types. By construction, T=(1-sqrt(1-4X))/(2X) is defined by the same equation in the semiring of complex numbers. So given that we're solving the same equation in the same class of algebraic structure it actually shouldn't be surprising that we see some similarities.
The catch is that when we reason about polynomials in the semiring of complex numbers we typically use the fact that the complex numbers form a ring or even a field so we find ourselves using operations such as subtraction that don't apply to semirings. But we can often eliminate subtractions from our arguments if we have a rule that allows us to cancel from both sides of an equation. This is the kind of thing proved by Fiore and Leinster showing that many arguments about rings can be transferred to semirings.
This means that lots of your mathematical knowledge about rings can be reliably transferred to types. As a result, some arguments involving complex numbers or power series (in the ring of formal power series) can carry over to types in a completely rigorous way.
However there's more to the story than this. It's one thing to prove two types are equal (say) by showing two power series are equal. But you can also deduce information about types by inspecting the terms in the power series. I'm not sure of what the formal statement here should be. (I recommend Brent Yorgey's paper on combinatorial species for some work that's closely related but species are not the same as types.)
What I find utterly mind blowing is that what you've discovered can be extended to calculus. Theorems about calculus can be transferred over to the semiring of types. In fact, even arguments about finite differences can be transferred over and you find that classical theorems from numerical analysis have interpretations in type theory.
Have fun!
Calculus and Maclaurin series with types
Here is another minor addition - a combinatorial insight into why the coefficients in a series expansion should 'work', in particular focusing on series which can be derived using the Taylor-Maclaurin approach from calculus. NB: the example series expansion you give of the manipulated list type is a Maclaurin series.
Since other answers and comments deal with the behaviour of algebraic type expressions (sums, products and exponents), this answer will elide that detail and focus on type 'calculus'.
You may notice inverted commas doing some heavy lifting in this answer. There are two reasons:
we are in the business of giving interpretations from one domain to entities from another and it seems appropriate to delimit such foreign notions in this way.
some notions will be able to be formalised more rigorously, but the shape and ideas seem more important (and take less space to write) than the details.
Definition of Maclaurin series
The Maclaurin series of a function f : ℝ → ℝ is defined as
f(0) + f'(0)X + (1/2)f''(0)X² + ... + (1/n!)f⁽ⁿ⁾(0)Xⁿ + ...
where f⁽ⁿ⁾ means the nth derivative of f.
To be able to make sense of the Maclaurin series as interpreted with types, we need to understand how we can interpret three things in a type context:
a (possibly multiple) derivative
applying a function to 0
terms like (1/n!)
and it turns out that these concepts from analysis have suitable counterparts in the type world.
What do I mean by a 'suitable counterpart'? It should have the flavour of an isomorphism - if we can preserve truth in both directions, facts derivable in one context can be transferred to the other.
Calculus with types
So what does the derivative of a type expression mean? It turns out that for a large and well-behaved ('differentiable') class of type expressions and functors, there is a natural operation which behaves similarly enough to be a suitable interpretation!
To spoil the punchline, the operation analogous to differentiation is that of making 'one-hole contexts'. There is much more to say about this particular point, but the basic concept of a one-hole context (da/dx) is that it represents the result of extracting a single subitem of a particular type (x) from a term (of type a), preserving all other information, including that necessary to determine the original location of the subitem. For example, one way to represent a one-hole context for a list is with two lists: one for items which came before the extracted one, and one for items which came after.
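That list example might be rendered in Haskell like this (ListHole and plug are names I'm introducing): the one-hole context stores the items before and after the hole, and plugging an element back in recovers a list.

-- d(List x)/dx, concretely: the elements before the hole and those after it.
data ListHole a = ListHole [a] [a]

plug :: a -> ListHole a -> [a]
plug x (ListHole before after) = before ++ [x] ++ after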
The motivation for identifying this operation with differentiation comes from the following observations. We write da/dx to mean the type of one-hole contexts for type a with hole of type x.
d1/dx = 0
dx/dx = 1
d(a + b)/dx = da/dx + db/dx
d(a × b)/dx = a × db/dx + b × da/dx
d(g(f(x)))/dx = (d(g(y))/dy)[f(x)/y] × df(x)/dx
Here, 1 and 0 represent types with exactly one and exactly zero inhabitants, respectively, and + and × represent sum and product types as usual. f and g are used to represent type functions, or type expression formers, and [f(x)/y] means the operation of substituting f(x) for every y in the preceding expression.
This may be written in a point-free style, writing f' to mean the derivative function of type function f, thus:
(x ↦ 1)' = x ↦ 0
(x ↦ x)' = x ↦ 1
(f + g)' = f' + g'
(f × g)' = f × g' + g × f'
(g ∘ f)' = (g' ∘ f) × f'
which may be preferable.
NB the equalities can be made rigorous and exact if we define derivatives using isomorphism classes of types and functors.
Now, we notice in particular that the rules in calculus pertaining to the algebraic operations of addition, multiplication and composition (often called the Sum, Product and Chain rules) are reflected exactly by the operation of 'making a hole'. Further, the base cases of 'making a hole' in a constant expression or in the term x itself also behave as differentiation, so by induction we get differentiation-like behaviour for all algebraic type expressions.
Now that we can interpret differentiation, what does the nth 'derivative' of a type expression, dⁿe/dxⁿ, mean? It is a type representing n-place contexts: terms which, when 'filled' with n terms of type x, yield an e. There is another key observation related to '(1/n!)' coming later.
The invariant part of a type functor: applying a function to 0
We already have an interpretation for 0 in the type world: an empty type with no members. What does it mean, from a combinatorial point of view, to apply a type function to it? In more concrete terms, supposing f is a type function, what does f(0) look like? Well, we certainly don't have access to anything of type 0, so any constructions of f(x) which require an x are unavailable. What is left is those terms which are accessible in their absence, which we can call the 'invariant' or 'constant' part of the type.
For an explicit example, take the Maybe functor, which can be represented algebraically as x ↦ 1 + x. When we apply this to 0, we get 1 + 0 - it's just like 1: the only possible value is the None value. For a list, similarly, we get just the term corresponding to the empty list.
When we bring it back and interpret the type f(0) as a number it can be thought of as the count of how many terms of type f(x) (for any x) can be obtained without access to an x: that is, the number of 'empty-like' terms.
Putting it together: complete interpretation of a Maclaurin series
I'm afraid I can't think of an appropriate direct interpretation of (1/n!) as a type.
If we consider, though, the type f⁽ⁿ⁾(0) in light of the above, we see that it can be interpreted as the type of n-place contexts for a term of type f(x) which do not already contain an x - that is, when we 'integrate' them n times, the resulting term has exactly n xs, no more, no less. Then the interpretation of the type f⁽ⁿ⁾(0) as a number (as in the coefficients of the Maclaurin series of f) is simply a count of how many such empty n-place contexts there are. We are nearly there!
But where does (1/n!) end up? Examining the process of type 'differentiation' shows us that, when applied multiple times, it preserves the 'order' in which subterms are extracted. For example, consider the term (x₀, x₁) of type x × x and the operation of 'making a hole' in it twice. We get both sequences
(x₀, x₁) ↝ (_₀, x₁) ↝ (_₀, _₁)
(x₀, x₁) ↝ (x₀, _₀) ↝ (_₁, _₀)
(where _ represents a 'hole')
even though both come from the same term, because there are 2! = 2 ways to take two elements from two, preserving order. In general, there are n! ways to take n elements from n. So in order to get a count of the number of configurations of a functor type which have n elements, we have to count the type f⁽ⁿ⁾(0) and divide by n!, exactly as in the coefficients of the Maclaurin series.
So dividing by n! turns out to be interpretable simply as itself.
Final thoughts: 'recursive' definitions and analyticity
First, some observations:
if a function f : ℝ → ℝ has a derivative, this derivative is unique
similarly, if a function f : ℝ → ℝ is analytic, it has exactly one corresponding polynomial series
Since we have the chain rule, we can use implicit differentiation, if we formalise type derivatives as isomorphism classes. But implicit differentiation doesn't require any alien manoeuvres like subtraction or division! So we can use it to analyse recursive type definitions. To take your list example, we have
L(X) ≅ 1 + X × L(X)
L'(X) = X × L'(X) + L(X)
and then we can evaluate
L'(0) = L(0) = 1
to obtain the coefficient of X¹ in the Maclaurin series.
But since we are confident that these expressions are indeed strictly 'differentiable', if only implicitly, and since we have the correspondence with functions ℝ → ℝ, where derivatives are certainly unique, we can rest assured that even if we obtain the values using 'illegal' operations, the result is valid.
Now, similarly, to use the second observation, due to the correspondence (is it a homomorphism?) with functions ℝ → ℝ, we know that, provided we are satisfied that a function has a Maclaurin series, if we can find any series at all, the principles outlined above can be applied to make it rigorous.
As for your question about composition of functions, I suppose the chain rule provides a partial answer.
I'm not certain how many Haskell-style ADTs this applies to, but I suspect it is many if not all. I have discovered a truly marvellous proof of this fact, but this margin is too small to contain it...
Now, certainly this is only one way to work out what is going on here and there are probably many other ways.
Summary: TL;DR
type 'differentiation' corresponds to 'making a hole'.
applying a functor to 0 gets us the 'empty-like' terms for that functor.
Maclaurin power series therefore (somewhat) rigorously correspond to enumerating the number of members of a functor type with a certain number of elements.
implicit differentiation makes this more watertight.
uniqueness of derivatives and uniqueness of power series mean we can fudge the details and it works.
It seems that all you're doing is expanding the recurrence relation.
L = 1 + X • L
L = 1 + X • (1 + X • (1 + X • (1 + X • ...)))
= 1 + X + X^2 + X^3 + X^4 ...
T = 1 + X • T^2
T = 1 + X • (1 + X • (1 + X • (1 + X • ...^2)^2)^2)^2
= 1 + X + 2 • X^2 + 5 • X^3 + 14 • X^4 + ...
And since the rules for the operations on the types work like the rules for arithmetic operations, you can use algebraic means to help you figure out how to expand the recurrence relation (since it is not obvious).
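As a check that the expansion really is counting structures, here is a small Haskell sketch (treesOfSize is my own name, not from the answer) that enumerates binary trees by the number of internal nodes and recovers the coefficients 1, 1, 2, 5, 14 above; lists, by comparison, have exactly one shape of each length, giving the all-ones series.
-- T = 1 + X • T², where X marks an internal node.
data Tree = Leaf | Node Tree Tree
-- All trees with exactly n internal nodes.
treesOfSize :: Int -> [Tree]
treesOfSize 0 = [Leaf]
treesOfSize n = [ Node l r
                | k <- [0 .. n - 1]
                , l <- treesOfSize k
                , r <- treesOfSize (n - 1 - k) ]
main :: IO ()
main = print (map (length . treesOfSize) [0 .. 4])   -- prints [1,1,2,5,14]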
I don't have a complete answer, but these manipulations tend to 'just work'. A relevant paper might be Objects of Categories as Complex Numbers by Fiore and Leinster - I came across that one while reading sigfpe's blog on a related subject; the rest of that blog is a goldmine for similar ideas and is worth checking out!
You can also differentiate datatypes, by the way - that will get you the appropriate Zipper for the datatype!
The Algebra of Communicating Processes (ACP) deals with similar kinds of expressions for processes.
It offers addition and multiplication as operators for choice and sequence, with associated neutral elements.
Based on these there are operators for other constructs, such as parallelism and disruption.
See http://en.wikipedia.org/wiki/Algebra_of_Communicating_Processes. There is also a paper online named "A Brief History of Process Algebra".
I am working on extending programming languages with ACP. Last April I presented a research paper at Scala Days 2012, available at http://code.google.com/p/subscript/
At the conference I demonstrated a debugger running a parallel recursive specification of a bag:
Bag = A; (Bag&a)
where A and a stand for input and output actions; the semicolon and ampersand stand for sequence and parallelism. See the video at SkillsMatter, reachable from the previous link.
A bag specification more comparable to
L = 1 + X•L
would be
B = 1 + X&B
ACP defines parallelism in terms of choice and sequence using axioms; see the Wikipedia article. I wonder what the bag analogy would be for
L = 1 / (1-X)
ACP style programming is handy for text parsers and GUI controllers. Specifications such as
searchCommand = clicked(searchButton) + key(Enter)
cancelCommand = clicked(cancelButton) + key(Escape)
may be written down more concisely by making the two refinements "clicked" and "key" implicit (like what Scala allows with functions). Hence we can write:
searchCommand = searchButton + Enter
cancelCommand = cancelButton + Escape
The right-hand sides now contain operands that are data rather than processes. At this level it is not necessary to know which implicit refinements will turn these operands into processes; they would not necessarily refine into input actions; output actions would also apply, e.g. in the specification of a test robot.
In this way processes get data as companions; hence I coin the term "item algebra".
Dependent type theory and 'arbitrary' type functions
My first answer to this question was high on concepts and low on details and reflected on the subquestion, 'what is going on?'; this answer will be the same but focused on the subquestion, 'can we get arbitrary type functions?'.
One extension of the algebraic operations of sum and product is the so-called 'large operators', which represent the sum and product of a sequence (or, more generally, of a function over a domain), usually written Σ and Π respectively. See Sigma Notation.
So the sum
a₀ + a₁X + a₂X² + ...
might be written
Σ[i ∈ ℕ]aᵢXⁱ
where a is some sequence of real numbers, for example. The product would be represented similarly with Π instead of Σ.
When you look from a distance, this kind of expression looks a lot like an 'arbitrary' function in X; we are limited of course to expressible series, and their associated analytic functions. Is this a candidate for a representation in a type theory? Definitely!
The class of type theories which have immediate representations of these expressions is the class of 'dependent' type theories: theories with dependent types. Naturally we have terms dependent on terms, and in languages like Haskell with type functions and type quantification, terms and types depending on types. In a dependent setting, we additionally have types depending on terms. Haskell is not a dependently typed language, although many features of dependent types can be simulated by torturing the language a bit.
Curry-Howard and dependent types
The 'Curry-Howard isomorphism' started life as an observation that the terms and type-judging rules of simply-typed lambda calculus correspond exactly to natural deduction (as formulated by Gentzen) applied to intuitionistic propositional logic, with types taking the place of propositions, and terms taking the place of proofs, despite the two being independently invented/discovered. Since then, it has been a huge source of inspiration for type theorists. One of the most obvious things to consider is whether, and how, this correspondence for propositional logic can be extended to predicate or higher order logics. Dependent type theories initially arose from this avenue of exploration.
For an introduction to the Curry-Howard isomorphism for simply-typed lambda calculus, see here. As an example, if we want to prove A ∧ B we must prove A and prove B; a combined proof is simply a pair of proofs: one for each conjunct.
In natural deduction:
Γ ⊢ A    Γ ⊢ B
--------------
Γ ⊢ A ∧ B
and in simply-typed lambda calculus:
Γ ⊢ a : A    Γ ⊢ b : B
----------------------
Γ ⊢ (a, b) : A × B
Similar correspondences exist for ∨ and sum types, → and function types, and the various elimination rules.
An unprovable (intuitionistically false) proposition corresponds to an uninhabited type.
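As a quick illustration in ordinary Haskell (these particular function names are mine), Curry-Howard reads everyday definitions as proofs: a pair proves a conjunction, case analysis on Either eliminates a disjunction, and the uninhabited Void stands for the false proposition.
import Data.Void (Void, absurd)
-- Conjunction introduction: a proof of A ∧ B is a pair of proofs.
conjIntro :: a -> b -> (a, b)
conjIntro a b = (a, b)
-- Disjunction elimination: to use a proof of A ∨ B, handle both cases.
disjElim :: (a -> c) -> (b -> c) -> Either a b -> c
disjElim f _ (Left  a) = f a
disjElim _ g (Right b) = g b
-- Void is uninhabited, so it plays the role of an unprovable proposition;
-- from it, anything follows.
exFalso :: Void -> a
exFalso = absurd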
With the analogy of types as logical propositions in mind, we can start to consider how to model predicates in the type-world. There are many ways in which this has been formalised (see this introduction to Martin-Löf's Intuitionistic Type Theory for a widely-used standard) but the abstract approach usually observes that a predicate is like a proposition with free term variables, or, alternatively, a function taking terms to propositions. If we allow type expressions to contain terms, then a treatment in lambda calculus style immediately presents itself as a possibility!
Considering only constructive proofs, what constitutes a proof of ∀x ∈ X.P(x)? We can think of it as a proof function, taking terms (x) to proofs of their corresponding propositions (P(x)). So members (proofs) of the type (proposition) ∀x : X.P(x) are 'dependent functions', which for each x in X give a term of type P(x).
What about ∃x ∈ X.P(x)? We need any member of X, x, together with a proof of P(x). So members (proofs) of the type (proposition) ∃x : X.P(x) are 'dependent pairs': a distinguished term x in X, together with a term of type P(x).
Notation:
I will use
∀x ∈ X...
for actual statements about members of the class X, and
∀x : X...
for type expressions corresponding to universal quantification over type X. Likewise for ∃.
Combinatorial considerations: products and sums
As well as the Curry-Howard correspondence of types with propositions, we have the combinatorial correspondence of algebraic types with numbers and functions, which is the main point of this question. Happily, this can be extended to the dependent types outlined above!
I will use the modulus notation
|A|
to represent the 'size' of a type A, to make explicit the correspondence outlined in the question, between types and numbers. Note that this is a concept outside of the theory; I do not claim that there need be any such operator within the language.
Let us count the possible (fully reduced, canonical) members of type
∀x : X.P(x)
which is the type of dependent functions taking terms x of type X to terms of type P(x). Each such function must have an output for every term of X, and this output must be of a particular type. For each x in X, then, this gives |P(x)| 'choices' of output.
The punchline is
|∀x : X.P(x)| = Π[x : X]|P(x)|
which of course doesn't make a great deal of sense if X is IO (), but is applicable to algebraic types.
Similarly, the type
∃x : X.P(x)
is the type of pairs (x, p) with p : P(x); given any x in X, we can complete such a pair with any member of P(x), giving |P(x)| 'choices' for that x.
Hence,
|∃x : X.P(x)| = Σ[x : X]|P(x)|
with the same caveats.
This justifies the common notation for dependent types in theories using the symbols Π and Σ, and indeed many theories blur the distinction between 'for all' and 'product' and between 'there is' and 'sum', due to the above-mentioned correspondences.
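For a concrete (if modest) check, here is a sketch in GHC Haskell using DataKinds; all the names (SBool, P, depFn, DepPair) are illustrative inventions. Taking X = Bool with |P(True)| = 2 and |P(False)| = 3, the dependent function type has 2 × 3 = 6 inhabitants and the dependent pair type has 2 + 3 = 5, as the Π and Σ formulas predict.
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}
-- A singleton for Bool, so terms can drive the type-level index.
data SBool (b :: Bool) where
  STrue  :: SBool 'True
  SFalse :: SBool 'False
-- A "predicate" P on Bool, chosen so that |P True| = 2 and |P False| = 3.
type family P (b :: Bool) where
  P 'True  = Bool
  P 'False = Ordering
-- A dependent function ∀b : Bool. P b must pick one of 2 outputs at True
-- and one of 3 at False: 2 × 3 = 6 possible functions in total.
depFn :: SBool b -> P b
depFn STrue  = True
depFn SFalse = LT
-- A dependent pair ∃b : Bool. P b is a tag plus a value of the matching
-- type: 2 + 3 = 5 possible values in total.
data DepPair where
  MkDepPair :: SBool b -> P b -> DepPair
example :: DepPair
example = MkDepPair SFalse GT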
We are getting close!
Vectors: representing dependent tuples
Can we now encode numerical expressions like
Σ[n ∈ ℕ]Xⁿ
as type expressions?
Not quite. While we can informally consider the meaning of expressions like Xⁿ in Haskell, where X is a type and n a natural number, it's an abuse of notation; this is a type expression containing a number: distinctly not a valid expression.
On the other hand, with dependent types in the picture, types containing numbers is precisely the point; in fact, dependent tuples or 'vectors' are a very commonly-cited example of how dependent types can provide pragmatic type-level safety for operations like list access. A vector is just a list along with type-level information regarding its length: precisely what we are after for type expressions like Xⁿ.
For the duration of this answer, let
Vec X n
be the type of length-n vectors of X-type values.
Technically, n here is not an actual natural number but a representation of one within the system. We can represent natural numbers (Nat) in Peano style as either zero (0) or the successor (S) of another natural number, and for n ∈ ℕ I write ˻n˼ to mean the term in Nat which represents n. For example, ˻3˼ is S (S (S 0)).
Then we have
|Vec X ˻n˼| = |X|ⁿ
for any n ∈ ℕ.
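For concreteness, Vec can be rendered in GHC Haskell roughly as follows (Z stands in for the 0 of the Peano representation above, since Haskell constructors cannot be named 0); this is a sketch, not the notation the rest of this answer uses.
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
data Nat = Z | S Nat                    -- Peano naturals, promoted to the type level
data Vec x (n :: Nat) where
  Nil  :: Vec x 'Z
  Cons :: x -> Vec x n -> Vec x ('S n)
-- |Vec Bool ˻3˼| = 2³ = 8: three independent Bool choices.
example :: Vec Bool ('S ('S ('S 'Z)))
example = Cons True (Cons False (Cons True Nil))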
Nat types: promoting ℕ terms to types
Now we can encode expressions like
Σ[n ∈ ℕ]Xⁿ
as types. This particular expression would give rise to a type which is of course isomorphic to the type of lists of X, as identified in the question. (Not only that, but from a category-theoretic point of view, the type function - which is a functor - taking X to the above type is naturally isomorphic to the List functor.)
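A hedged sketch of that isomorphism (SomeVec, toList and fromList are names I have invented; Nat and Vec are repeated from the previous sketch): packaging a length together with a vector of that length is interconvertible with an ordinary list.
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
data Nat = Z | S Nat
data Vec x (n :: Nat) where
  Nil  :: Vec x 'Z
  Cons :: x -> Vec x n -> Vec x ('S n)
-- Σ[n ∈ ℕ] Xⁿ: some length n, existentially hidden, with a vector of that length.
data SomeVec x where
  SomeVec :: Vec x n -> SomeVec x
toList :: SomeVec x -> [x]
toList (SomeVec Nil)         = []
toList (SomeVec (Cons y ys)) = y : toList (SomeVec ys)
fromList :: [x] -> SomeVec x
fromList []       = SomeVec Nil
fromList (y : ys) = case fromList ys of SomeVec v -> SomeVec (Cons y v)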
One final piece of the puzzle for 'arbitrary' functions is how to encode, for
f : ℕ → ℕ
expressions like
Σ[n ∈ ℕ]f(n)Xⁿ
so that we can apply arbitrary coefficients to a power series.
We already understand the correspondence of algebraic types with numbers, allowing us to map from types to numbers and type functions to numerical functions. We can also go the other way! - taking a natural number, there is obviously a definable algebraic type with that many term members, whether or not we have dependent types. We can easily prove this outside of the type theory by induction. What we need is a way to map from natural numbers to types, inside the system.
A pleasing realisation is that, once we have dependent types, proof by induction and construction by recursion become intimately similar - indeed they are the very same thing in many theories. Since we can prove by induction that types exist which fulfil our needs, should we not be able to construct them?
There are several ways to represent types at the term level. I will use here an imaginary Haskellish notation with * for the universe of types, itself usually considered a type in a dependent setting.¹
Likewise, there are also at least as many ways to notate 'ℕ-elimination' as there are dependent type theories. I will use a Haskellish pattern-matching notation.
We need a mapping, α from Nat to *, with the property
∀n ∈ ℕ.|α ˻n˼| = n.
The following pseudodefinition suffices.
data Zero -- empty type
data Successor a = Z | Suc a -- Successor ≅ Maybe
α : Nat -> *
α 0 = Zero
α (S n) = Successor (α n)
So we see that the action of α mirrors the behaviour of the successor S, making it a kind of homomorphism. Successor is a type function which 'adds one' to the number of members of a type; that is, |Successor a| = 1 + |a| for any a with a defined size.
For example α ˻4˼ (which is α (S (S (S (S 0))))), is
Successor (Successor (Successor (Successor Zero)))
and the terms of this type are
Z
Suc Z
Suc (Suc Z)
Suc (Suc (Suc Z))
giving us exactly four elements: |α ˻4˼| = 4.
Likewise, for any n ∈ ℕ, we have
|α ˻n˼| = n
as required.
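In actual GHC Haskell the pseudodefinition above can be approximated with DataKinds and a closed type family; this is only a sketch, and Alpha is a type-level function rather than a term-level one, since GHC does not let us pattern-match on a term of type * the way the imaginary notation does.
{-# LANGUAGE DataKinds, TypeFamilies #-}
data Nat = Z | S Nat            -- promoted by DataKinds; Z plays the role of 0
data Zero                       -- no constructors, so |Zero| = 0
type Successor = Maybe          -- Successor ≅ Maybe, so |Successor a| = 1 + |a|
type family Alpha (n :: Nat) where
  Alpha 'Z     = Zero
  Alpha ('S n) = Successor (Alpha n)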
¹ Many theories require that the members of * are mere representatives of types, and an operation is provided as an explicit mapping from terms of type * to their associated types. Other theories permit the literal types themselves to be term-level entities.
'Arbitrary' functions?
Now we have the apparatus to express a fully general power series as a type!
The series
Σ[n ∈ ℕ]f(n)Xⁿ
becomes the type
∃n : Nat.α (˻f˼ n) × (Vec X n)
where ˻f˼ : Nat → Nat is some suitable representation within the language of the function f. We can see this as follows.
|∃n : Nat.α (˻f˼ n) × (Vec X n)|
= Σ[n : Nat]|α (˻f˼ n) × (Vec X n)| (property of ∃ types)
= Σ[n ∈ ℕ]|α (˻f˼ ˻n˼) × (Vec X ˻n˼)| (switching Nat for ℕ)
= Σ[n ∈ ℕ]|α ˻f(n)˼ × (Vec X ˻n˼)| (applying ˻f˼ to ˻n˼)
= Σ[n ∈ ℕ]|α ˻f(n)˼||Vec X ˻n˼| (splitting product)
= Σ[n ∈ ℕ]f(n)|X|ⁿ (properties of α and Vec)
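Putting the pieces together in GHC Haskell, a hedged sketch of the whole construction might look like this (Series, MkSeries and F are invented names, and F is fixed to the example f(n) = n + 1 just so a value can be written down):
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}
data Nat = Z | S Nat
data Vec x (n :: Nat) where
  Nil  :: Vec x 'Z
  Cons :: x -> Vec x n -> Vec x ('S n)
data Zero                               -- |Zero| = 0
type family Alpha (n :: Nat) where      -- Alpha n has exactly n inhabitants
  Alpha 'Z     = Zero
  Alpha ('S m) = Maybe (Alpha m)
type family F (n :: Nat) :: Nat where   -- stand-in for ˻f˼; here f(n) = n + 1
  F n = 'S n
-- Σ[n ∈ ℕ] f(n)·Xⁿ as a dependent pair, with n existentially quantified.
data Series x where
  MkSeries :: Alpha (F n) -> Vec x n -> Series x
-- A member with n = 1: one of the f(1) = 2 possible coefficients,
-- paired with a vector of length 1.
sample :: Series Bool
sample = MkSeries (Just Nothing) (Cons True Nil)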
Just how 'arbitrary' is this? This method limits us not merely to integer coefficients but to natural numbers. Apart from that, f can be anything at all: given a Turing-complete language with dependent types, we can represent any analytic function with natural-number coefficients.
I haven't investigated the interaction of this with, for example, the case provided in the question of List X ≅ 1/(1 - X) or what possible sense such negative and non-integer 'types' might have in this context.
Hopefully this answer goes some way to exploring how far we can go with arbitrary type functions.

Resources