Can a pure function have free variables? - haskell

For example, a referentially transparent function with no free variables:
g op x y = x `op` y
A now now a function with the free (from the point-of-view of f) variables op and x:
x = 1
op = (+)
f y = x `op` y
f is also referentially transparent. But is it a pure function?
If it's not a pure function, what is the name for a function that is referentially transparent, but makes use of 1 or more variables bound in an enclosing scope?
Motivation for this question:
It's not clear to me from Wikipedia's article:
The result value need not depend on all (or any) of the argument values. However, it must depend on nothing other than the argument values.
(emphasis mine)
nor from Google searches whether a pure function can depend on free (in the sense of being bound in an enclosing scope, and not being bound in the scope of the function) variables.
Also, this book says:
If functions without free variables are pure, are closures impure?
The function function (y) { return x } is interesting. It contains a
free variable, x. A free variable is one that is not bound within
the function. Up to now, we’ve only seen one way to “bind” a variable,
namely by passing in an argument with the same name. Since the
function function (y) { return x } doesn’t have an argument named x,
the variable x isn’t bound in this function, which makes it “free.”
Now that we know that variables used in a function are either bound or
free, we can bifurcate functions into those with free variables and
those without:
Functions containing no free variables are called pure functions.
Functions containing one or more free variables are called closures.
So what is the definition of a "pure function"?

To the best of my understanding "purity" is defined at the level of semantics while "referentially transparent" can take meaning both syntactically and embedded in lambda calculus substitution rules. Defining either one also leads to a bit of a challenge in that we need to have a robust notion of equality of programs which can be challenging. Finally, it's important to note that the idea of a free variable is entirely syntactic—once you've gone to a value domain you can no longer have expressions with free variables—they must be bound else that's a syntax error.
But let's dive in and see if this becomes more clear.
Quinian Referential Transparency
We can define referential transparency very broadly as a property of a syntactic context. Per the original definition, this would be built from a sentence like
New York is an American city.
of which we've poked a hole
_ is an American city.
Such a holey-sentence, a "context", is said to be referentially transparent if, given two sentence fragments which both "refer" to the same thing, filling the context with either of those two does not change its meaning.
To be clear, two fragments with the same reference we can pick would be "New York" and "The Big Apple". Injecting those fragments we write
New York is an American city.
The Big Apple is an American city.
suggesting that
_ is an American city.
is referentially transparent. To demonstrate the quintessential counterexample, we might write
"The Big Apple" is an apple-themed epithet referring to New York.
and consider the context
"_" is an apple-themed epithet referring to New York.
and now when we inject the two referentially identical phrases we get one valid and one invalid sentence
"The Big Apple" is an apple-themed epithet referring to New York.
"New York" is an apple-themed epithet referring to New York.
In other words, quotations break referential transparency. We can see how this occurs by causing the sentence to refer to a syntactic construct instead of purely the meaning of that construct. This notion will return later.
Syntax v Semantics
There's something confusing going on in that this definition of referential transparency above applies directly to English sentences of which we build contexts by literally stripping words out. While we can do that in a programming language and consider whether such a context is referentially transparent, we also might recognize that this idea of "substitution" is critical to the very notion of a computer language.
So, let's be clear: there are two kinds of referential transparency we can consider over lambda calculus—the syntactic one and the semantic one. The syntactic one requires we define "contexts" as holes in the literal words written in a programming language. That lets us consider holes like
let x = 3 in _
and fill it in with things like "x". We'll leave the analysis of that replacement for later. At the semantic level we use lambda terms to denote contexts
\x -> x + 3 -- similar to the context "_ + 3"
and are restricted to filling in the hole not with syntax fragments but instead only valid semantic values, the action of that being performed by application
(\x -> x + 3) 5
==>
5 + 3
==>
8
So, when someone refers to referential transparency in Haskell it's important to figure out what kind of referential transparency they're referring to.
Which kind is being referred to in this question? Since it's about the notion of an expression containing a free variable, I'm going to suggest that it's syntactic. There are two major thrusts for my reasoning here. Firstly, in order to convert a syntax to a semantics requires that the syntax be valid. In the case of Haskell this means both syntactic validity and a successfully type check. However, we'll note that a program fragment like
x + 3
is actually a syntax error since x is simply unknown, unbound leaving us unable to consider the semantics of it as a Haskell program. Secondly, the very notion of a variable such as one that can be let-bound (and consider the difference between "variable" as it refers to a "slot" such as an IORef) is entirely a syntactic construct—there's no way to even talk about them from inside the semantics of a Haskell program.
So let's refine the question to be:
Can an expression containing free variables be (syntactically) referentially transparent?
and the answer is, uninterestingly, no. Referential transparency is a property of "contexts", not expressions. So let's explore the notion of free variables in contexts instead.
Free variable contexts
How can a context meaningfully have a free variable? It could be beside the hole
E1 ... x ... _ ... E2
and so long as we cannot insert something into that syntactic hole which "reaches over" and affects x syntactically then we're fine. So, for instance, if we fill that hole with something like
E1 ... x ... let x = 3 in E ... E2
then we haven't "captured" the x and thus can perhaps consider that syntactic hole to be referentially transparent. However, we're being nice to our syntax. Let's consider a more dangerous example
do x <- foo
let x = 3
_
return x
Now we see that the hole we've provided in some sense has dominion over the later phrase "return x". In fact, if we inject a fragment like "let x = 4" then it indeed changes the meaning of the whole. In that sense, the syntax here is no referentially transparent.
Another interesting interaction between referential transparency and free variables is the notion of an assigning context like
let x = 3 in _
where, from an outside perspective, both phrases "x" and "y" are reference the same thing, some named variable, but
let x = 3 in x ==/== let x = 3 in y
Progression from thorniness around equality and context
Now, hopefully the previous section explained a few ways for referential transparency to break under various kinds of syntactic contexts. It's worth asking harder questions about what kinds of contexts are valid and what kinds of expressions are equivalent. For instance, we might desugar our do notation in a previous example and end up noticing that we weren't working with a genuine context, but instead sort of a higher-order context
foo >>= \x -> (let x = 3 in ____(return x)_____)
Is this a valid notion of context? It depends a lot on what kind of meaning we're giving the program. The notion of desugaring the syntax already implies that the syntax must be well-defined enough to allow for such desugaring.
As a general rule, we must be very careful with defining both contexts and notions of equality. Further, the more meaning we demand the fragments of our language to take on the greater the ways they can be equal and the fewer the valid contexts we can build.
Ultimately, this leads us all the way to what I called "semantic referential transparency" earlier where we can only substitute proper values into a proper, closed lambda expression and we take the resulting equality to be "equality as programs".
What this ends up meaning is that as we impute more and more meaning on our language, as we begin to accept fewer and fewer things as valid, we get stronger and stronger guarantees about referential transparency.
Purity
And so this finally leads to the notion of a pure function. My understanding here is (even) less complete, but it's worth noting that purity, as a concept, does not much exist until we've moved to a very rich semantic space—that of Haskell semantics as a category over lifted Complete Partial Orders.
If that doesn't make much sense, then just imagine purity is a concept that only exists when talking about Haskell values as functions and equality of programs. In particular, we examine the collection of Haskell functions
trivial :: a -> ()
trivial x = x `seq` ()
where we have a trivial function for every choice of a. We'll notate the specific choice using an underscore
trivial_Int :: Int -> ()
trivial_Int x = x `seq` ()
Now we can define a (very strictly) pure function to be a function f :: a -> b such that
trivial_b . f = trivial_a
In other words, if we throw out the result of computing our function, the b, then we may as well have never computed it in the first place.
Again, there's no notion of purity without having Haskell values and no notion of Haskell values when your expressions contain free variables (since it's a syntax error).
So what's the answer?
Ultimately, the answer is that you can't talk about purity around free variables and you can break referential transparency in lots of ways whenever you are talking about syntax. At some point as you convert your syntactic representation to its semantic denotation you must forget the notion and names of free variables in order to have them represent the reduction semantics of lambda terms and by this point we've begun to have referential transparency.
Finally, purity is something even more stringent than referential transparency having to do with even the reduction characteristics of your (referentially transparent) lambda terms.
By the definition of purity given above, most of Haskell isn't pure itself as Haskell may represent non-termination. Many feel that this is a better definition of purity, however, as non-termination can be considered a side effect of computation instead of a meaningful resultant value.

The Wikipedia definition is incomplete, insofar a pure function may use constants to compute its answer.
When we look at
increment n = 1+n
this is obvious. Perhaps it was not mentioned because it is that obvious.
Now the trick in Haskell is that not only top level values and functions are constants, but inside a closure also the variables(!) closed over:
add x = (\y -> x+y)
Here x stands for the value we applied add to - we call it variable not because it could change within the right hand side of add, but because it can be different each time we apply add. And yet, from the point of view of the lambda, x is a constant.
It follows that free variables always name constant values at the point where they are used and hence do not impact purity.

Short answer is YES f is pure
In Haskell map is defined with foldr. Would you agree that map is functional? If so did it matter that it had global function foldr that wasn't supplied to map as an argument?
In map foldr is a free variable. It's not doubt about it. It makes no difference that it's a function or something that evaluates to a value. It's the same.
Free variables, like the functions foldl and +, are essential for functional languages to exist. Without it you wouldn't have abstraction and the languages would be worse off than the Fortran.

Related

Why is Haskell fully declarative?

I'm not very good in understanding the difference between the imperative and declarative programming paradigms. I read about Haskell being a declarative language. To an extend I would says yes but there is a something that bothers me in respect to the definition of imperative.
When I have a datastruct and use Haskell's functions to transform that I actually just told WHAT to transform. So I give the datastruct as argument to a function and am happy with the result.
But what if there is no function that actually satisfies my needs?
I would start writing an own function which expects the datastruct as an argument. After that I would start writing how the datastruct should be processed. Since I only can call native Haskell functions I'm still with the declarative paradigm right? But what when I start using an "if statement". Wouldn't that end the declarative nature since I'm about to tell the program HOW to do stuff from that point?
Perhaps this is a matter of perspective. The way I see it, there is nothing imperative about defining things in terms of other things because we can always replace something with its definition (as long as the definition is pure). That is, if we have f x = x + 1, then any place we see f z we can replace with z + 1. So pure functions shouldn't really be considered instructions; they should be considered definitions.
A lot of Haskell code is considered declarative for this reason. We simply define things as (pure) functions of other things.
It is possible to write imperative-style code in Haskell as well. Sometimes we really do want to say something like "do A, then do B, then do C". This adds a new dimension to the simple world of function application: we need a notion of 'happens before'. Haskell has embraced the Monad concept to describe computations that have an order of evaluation. This turns out to be very handy because it can encapsulate effects such as changing state.
There is no if statement in Haskell, only an if expression if a then b else c, which is really equivalent to ternary expressions like a ? b : c in C-like languages or b if a else c in Python. You cannot leave out the else, and it can be used anywhere an arbitrary expression can be used. It is further constrained that a must have type Boolean, and b and c must have the same type. It is syntactic sugar for
case a of
True -> b
False -> c
Don't get too hung up on the label "declarative"; it's just a style of programming that the language supports. Consider, for example, a typical declarative definition of the factorial function:
fact 0 = 1
fact n = n * fact (n - 1)
This is also just syntactic sugar for something you would see as more imperative:
fact n = case n of
0 -> 1
otherwise -> n * fact (n - 1)
Frankly, the term "declarative programming" is more of a marketing term than anything else. The informal definition that it means "specifying what to do, not how to do it" is vague and open to interpretation and definitely far from a black-and-white boundary. In practice, it seems to be applied to anything that broadly falls into the categories of functional programming, logic programming or the use of Domain-Specific Languages (DSLs).
Consequently (and I realise this probably doesn't qualify as an answer to your question, but still :)), I recommend you do not waste your time wondering whether something is still declarative or not. The terms imperative vs. functional vs. logic programming are already a bit more meaningful, so perhaps it's more useful to reflect on those.
Declarative languages are constructed from expressions whereas imperative languages are constructed from statements.
The usual explanation is what to do versus how to do it. You've already found the confusion in this. If you take it this way, declarative is using definitions (by name) and imperative is writing definitions. What is a declarative language then? One which only names definitions? If so then the only Haskell program you could write is precisely main!
There is a declarative way and an imperative way to have branching, and it follows directly from the definition. A declarative branch is an expression and an imperative branch is a statement. Haskell has case … of … (an expression) and C has if (…) {…} else {…} (a statement).
What is the difference between expressions and statements? Expressions have a value whereas statements have effects. What is the difference between values and effects?
For expressions, there is a function μ which maps any expression e to its value μ(e). This is also called the semantic, or meaning, and is ideally a well-defined mathematical object. This way of defining values is called denotational semantics. There are other methods.
For statements, there is a state P immediately before a statement S and a state Q immediately after. The effect of S is the delta from P to Q. This way of defining effects is called Hoare logic. There are other methods.
You can't use an "if statement" in Haskell, because there is none. You can use an "if-expression"
if c then a else b
but this is just syntactic sugar for something like
let f True a b = a; f False a b = b in f c a b
which is again fuly declarative.

Binding a name to a value vs. assigning a value to a variable

Reading through Bartosz Milewski's piece at fpcomplete where he writes, "In Haskell you never assign to a variable, instead you bind a name to a value."
Could someone explain to me what this means, as well as the practical ramifications of this in the functional programming world?
In some sense, the only real difference is that variables in imperative languages can change whereas bound names in functional languages can't. But I think it's important to understand the higher-level semantic distinction between the two and how we think about them differently.
In imperative languages, a variable is a thing on its own which happens to include a value. They're often compared to boxes that contain something. The contents of a box can change and, semantically, the box has an identity of its own.
Haskell names, on the other hand, are just labels for a value. You can use one or the other completely interchangeably. Crucially, they can't change.
Compare binding in Haskell to function names in languages like Java¹. These can't change either, and you don't think of them on their own; they're just names for the method they're attached to.
Here's a contrived example of the difference: imagine a function f that closes over an imperative variable x:
var x = 7;
function foo() {
console.log(x);
}
Effectively, x is just a name for 7... until x changes. What you've closed over is the variable x, not its value 7, so if it changes so will the behavior of foo.
In Haskell, on the other hand, if you bind 7 to the name x and close over it, that's the same as just closing over 7.
let x = 7 in \ () -> x
does the same thing as
\ () -> 7
(which, itself, doesn't do anything more than just 7 by itself, ignoring strictness issues).
¹ ignoring reflection and similar shenanigans

Haskell pattern match "diverge" and ⊥

I'm trying to understand the Haskell 2010 Report section 3.17.2 "Informal Semantics of Pattern Matching". Most of it, relating to a pattern match succeeding or failing seems straightforward, however I'm having difficulty understanding the case which is described as the pattern match "diverging".
I'm semi-persuaded it means that the match algorithm does not "converge" to an answer (hence the match function never returns). But if doesn't return, then, how can it return a value, as suggested by the parenthetical "i.e. return ⊥"? And what does it mean to "return ⊥" anyway? How one handle that outcome?
Item 5 has the particularly confusing (to me) point "If the value is ⊥, the match diverges". Is this just saying that a value of ⊥ produces a match result of ⊥? (Setting aside that I don't know what that outcome means!)
Any illumination, possibly with an example, would be appreciated!
Addendum after a couple of lengthy answers:
Thanks Tikhon and all for your efforts.
It seems my confusion comes from there being two different realms of explanation: The realm of Haskell features and behaviors, and the realm of mathematics/semantics, and in Haskell literature these two are intermingled in an attempt to explain the former in terms of the latter, without sufficient signposts (to me) as to which elements belong to which.
Evidently "bottom" ⊥ is in the semantics domain, and does not exist as a value within Haskell (ie: you can't type it in, you never get a result that prints out as " ⊥").
So, where the explanation says a function "returns ⊥", this refers to a function that does any of a number of inconvenient things, like not terminate, throw an exception, or return "undefined". Is that right?
Further, those who commented that ⊥ actually is a value that can be passed around, are really thinking of bindings to ordinary functions that haven't yet actually been called upon to evaluate ("unexploded bombs" so to speak) and might never, due to laziness, right?
The value is ⊥, usually pronounced "bottom". It is a value in the semantic sense--it is not a normal Haskell value per se. It represents computations that do not produce a normal Haskell value: exceptions and infinite loops, for example.
Semantics is about defining the "meaning" of a program. In Haskell, we usually talk about denotational semantics, where the value is a mathematical object of some sort. The most trivial example would be that the expression 10 (but also the expression 9 + 1) have denotations of the number 10 (rather than the Haskell value 10). We usually write that ⟦9 + 1⟧ = 10 meaning that the denotation of the Haskell expression 9 + 1 is the number 10.
However, what do we do with an expression like let x = x in x? There is no Haskell value for this expression. If you tried to evaluate it, it would simply never finish. Moreover, it is not obvious what mathematical object this corresponds to. However, in order to reason about programs, we need to give some denotation for it. So, essentially, we just make up a value for all these computations, and we call the value ⊥ (bottom).
So ⊥ is just a way to define what a computation that doesn't return "means".
We also define other computations like undefined and error "some message" as ⊥ because they also do not have obvious normal values. So throwing an exception corresponds to ⊥. This is exactly what happens with a failed pattern match.
The usual way of thinking about this is that every Haskell type is "lifted"--it contains ⊥. That is, Bool corresponds to {⊥, True, False} rather than just {True, False}. This represents the fact that Haskell programs are not guaranteed to terminate and can have exceptions. This is also true when you define your own type--the type contains every value you defined for it as well as ⊥.
Interestingly, since Haskell is non-strict, ⊥ can exist in normal code. So you could have a value like Just ⊥, and if you never evaluate it, everything will work fine. A good example of this is const: const 1 ⊥ evaluates to 1. This works for failed pattern matches as well:
const 1 (let Just x = Nothing in x) -- 1
You should read the section on denotational semantics in the Haskell WikiBook. It's a very approachable introduction to the subject, which I personally find very fascinating.
Denotational semantics
So, briefly denotational semantics, which is where ⊥ lives, is a mapping from Haskell values to some other space of values. You do this to give meaning to programs in a more formal manner than just talking about what programs should do—you say that they must respect their denotational semantics.
So for Haskell, you often think about how Haskell expressions denote mathematical values. You often see Strachey brackets ⟦·⟧ to denote the "semantic mapping" from Haskell to Math. Finally, we want our semantic brackets to be compatible with semantic operations. For instance
⟦x + y⟧ = ⟦x⟧ + ⟦y⟧
where on the left side + is the Haskell function (+) :: Num a => a -> a -> a and on the right side it's the binary operation in a commutative group. While is cool, because then we know that we can use the properties from the semantic map to know how our Haskell functions should work. To wit, let's write the commutative property "in Math"
⟦x⟧ + ⟦y⟧ == ⟦y⟧ + ⟦x⟧
= ⟦x + y⟧ == ⟦y + x⟧
= ⟦x + y == y + x⟧
where the third step also indicates that the Haskell (==) :: Eq a => a -> a -> a ought to have the properties of a mathematical equivalence relationship.
Well, except...
Anyway, that's all well and good until we remember that computers are finite things and Maths don't much care about that (unless you're using intuitionistic logic, and then you get Coq). So, we have to take note of places where our semantics don't follow Math quite right. Here are three examples
⟦undefined⟧ = ??
⟦error "undefined"⟧ = ??
⟦let x = x in x⟧ = ??
This is where ⊥ comes into play. We just assert that so far as the denotational semantics of Haskell are concerned each of those examples might as well mean (the newly introduced Mathematical/semantic concept of) ⊥. What are the Mathematical properties of ⊥? Well, this is where we start to really dive into what the semantic domain is and start talking about monotonicity of functions and CPOs and the like. Essentially, though, ⊥ is a mathematical object which plays roughly the same game as non-termination does. To the point of view of the semantic model, ⊥ is toxic and it infects expressions with its toxic-nondeterminism.
But it's not a Haskell-the-language concept, just a Semantic-domain-of-the-language-Haskell thing. In Haskell we have undefined, error and infinite looping. This is important.
Extra-semantic behavior (side note)
So the semantics of ⟦undefined⟧ = ⟦error "undefined"⟧ = ⟦let x = x in x⟧ = ⊥ are clear once we understand the mathematical meanings of ⊥, but it's also clear that those each have different effects "in reality". This is sort of like "undefined behavior" of C... it's behavior that's undefined so far as the semantic domain is concerned. You might call it semantically unobservable.
So how does pattern matching return ⊥?
So what does it mean "semantically" to return ⊥? Well, ⊥ is a perfectly valid semantic value which has the infection property which models non-determinism (or asynchronous error throwing). From the semantic point of view it's a perfectly valid value which can be returned as is.
From the implementation point of view, you have a number of choices, each of which map to the same semantic value. undefined isn't quite right, nor is entering an infinite loop, so if you're going to pick a semantically undefined behavior you might as well pick one that's useful and throw an error
*** Exception: <interactive>:2:5-14: Non-exhaustive patterns in function cheers

What are super combinators and constant applicative forms?

I'm struggling with what Super Combinators are:
A supercombinator is either a constant, or a combinator which contains only supercombinators as subexpressions.
And also with what Constant Applicative Forms are:
Any super combinator which is not a lambda abstraction. This includes truly constant expressions such as 12, ((+) 1 2), [1,2,3] as well as partially applied functions such as ((+) 4). Note that this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
This is just not making any sense to me! Isn't ((+) 4) just as "truly constant" as 12? CAFs sound like values to my simple mind.
These Haskell wiki pages you reference are old, and I think unfortunately written. Particularly unfortunate is that they mix up CAFs and supercombinators. Supercombinators are interesting but unrelated to GHC. CAFs are still very much a part of GHC, and can be understood without reference to supercombinators.
So let's start with supercombinators. Combinators derive from combinatory logic, and, in the usage here, consist of functions which only apply the values passed in to one another in one or another form -- i.e. they combine their arguments. The most famous set of combinators are S, K, and I, which taken together are Turing-complete. Supercombinators, in this context, are functions built only of values passed in, combinators, and other supercombinators. Hence any supercombinator can be expanded, through substitution, into a plain old combinator.
Some compilers for functional languages (not GHC!) use combinators and supercombinators as intermediate steps in compilation. As with any similar compiler technology, the reason for doing this is to admit optimization analysis that is more easily performed in such a simplified, minimal language. One such core language built on supercombinators is Edwin Brady's epic.
Constant Applicative Forms are something else entirely. They're a bit more subtle, and have a few gotchas. The way to think of them is as an aspect of compiler implementation with no separate semantic meaning but with a potentially profound effect on runtime performance. The following may not be a perfect description of a CAF, but it'll try to convey my intuition of what one is, since I haven't seen a really good description anywhere else for me to crib from. The clean "authoritative" description in the GHC Commentary Wiki reads as follows:
Constant Applicative Forms, or CAFs for short, are top-level values
defined in a program. Essentially, they are objects that are not
allocated dynamically at run-time but, instead, are part of the static
data of the program.
That's a good start. Pure, functional, lazy languages can be thought of in some sense as a graph reduction machine. The first time you demand the value of a node, that forces its evaluation, which in turn can demand the values of subnodes, etc. One a node is evaluated, the resultant value sticks around (although it does not have to stick around -- since this is a pure language we could always keep the subnodes live and recalculate with no semantic effect). A CAF is indeed just a value. But, in the context, a special kind of value -- one which the compiler can determine has a meaning entirely dependent on its subnodes. That is to say:
foo x = ...
where thisIsACaf = [1..10::Int]
thisIsNotACaf = [1..x::Int]
thisIsAlsoNotACaf :: Num a => [a]
thisIsAlsoNotACaf = [1..10] -- oops, polymorphic! the "num" dictionary is implicitly a parameter.
thisCouldBeACaf = const [1..10::Int] x -- requires a sufficiently smart compiler
thisAlsoCouldBeACaf _ = [1..10::Int] -- also requires a sufficiently smart compiler
So why do we care if things are CAFs? Basically because sometimes we really really don't want to recompute something (for example, a memotable!) and so want to make sure it is shared properly. Other times we really do want to recompute something (e.g. a huge boring easy to generate list -- such as the naturals -- which we're just walking over) and not have it stick around in memory forever. A combination of naming things and binding them under lets or writing them inline, etc. typically lets us specify these sorts of things in a natural, intuitive way. Occasionally, however, the compiler is smarter or dumber than we expect, and something we think should only be computed once is always recomputed, or something we don't want to hang on to gets lifted out as a CAF. Then, we need to think things through more carefully. See this discussion to get an idea about some of the trickiness involved: A good way to avoid "sharing"?
[By the way, I don't feel up to it, but anyone that wants to should feel free to take as much of this answer as they want to try and integrate it with the existing Haskell Wiki pages and improve/update them]
Matt is right in that the definition is confusing. It is even contradictory. A CAF is defined as:
Any super combinator which is not a lambda abstraction. This includes
truly constant expressions such as 12, ((+) 1 2), [1,2,3] as
well as partially applied functions such as ((+) 4).
Hence, ((+) 4) is seen as a CAF. But in the very next sentence we're told it is equivalent to something that is not a CAF:
this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
It would be cleaner to rule out partially applied functions on the ground that they are equivalent to lambda abstractions.

Why Haskell doesn't have a single element tuple?

I'm wondering why Haskell doesn't have a single element tuple. Is it just because nobody needed it so far, or any rational reasons? I found an interesting thread in a comment at the Real World Haskell's website http://book.realworldhaskell.org/read/types-and-functions.html#funcstypes.composite, and people guessed various reasons like:
No good syntax sugar.
It is useless.
You can think that a normal value like (1) is actually a single element tuple.
But does anyone know the reason except a guess?
There's a lib for that!
http://hackage.haskell.org/packages/archive/OneTuple/0.2.1/doc/html/Data-Tuple-OneTuple.html
Actually, we have a OneTuple we use all the time. It's called Identity, and is now used as the base of standard pure monads in the new mtl:
http://hackage.haskell.org/packages/archive/transformers/0.2.2.0/doc/html/Data-Functor-Identity.html
And it has an important use! By virtue of providing a type constructor of kind * -> *, it can be made an instance (a trival one, granted, though not the most trivial) of Monad, Functor, etc., which lets us use it as a base for transformer stacks.
The exact reason is because it's totally unnecessary. Why would you need a one-tuple if you can just have its value?
The syntax also tends to be a bit clunky. In Python, you can have one-tuples, but you need a trailing comma to distinguish it from a parenthesized expression:
onetuple = (3,)
All in all, there's no reason for it. I'm sure there's no "official" reason because the designers of Haskell probably never even considered a single element tuple because it has no use.
I don't know if you were looking for some reasons beyond the obvious, but in this case the obvious answer is the right one.
My answer is not exactly about Haskell semantics, but about the theoretical mathematical elegance of making a value the same as its one-tuple. (So this answer should not be taken as an explanation of the standard behavior expected of a Haskell implementation, because it isn't intended as such.)
In programming languages and computation models where all functions are curried, such as lambda-calculus and combinatory logic, every function has exactly one input argument and one output/return value. No more, no less.
When we want a particular function f to have more than one input argument – say 3 –, we simulate it under this curried regime by creating a 1-argument function that returns a 2-argument function. Thus, f x y z = ((f x) y) z, and f x would return a 2-argument function.
Likewise, sometimes we might want to return more than one value from a function. It is not literally possible under this semantics, but we can simulate it by returning a tuple. We can generalize this.
If, for uniformity, we constrain the only return value of any function to be an (n-)tuple, we are able to harmonize some interesting features of the unit value and of supposedly non-tuple return values with the features of tuples in general, as follows.
Let's adopt as the general syntax of n-tuples the following schema, where ci is the component with the index i:
Notice that n-tuples have delimiting parentheses in this syntax.
Under this schema, how would we represent a 0-tuple? Since it has no components, this degenerate case would be represented like this: ( ). This syntax precisely coincides with the syntax we adopt to represent the unit value. So, we are tempted to make the unit value the same as a 0-tuple.
What about a 1-tuple? It would have this representation: . Here a syntactical ambiguity would immediately arise: parentheses would be used in the language both as 1-tuple delimiters and as mere grouping of values or expressions. So, in a context where (v) appears, a compiler or interpreter would be unsure whether this is a 1-tuple with a component whose value is v, or just an isolated value v inside superfluous parentheses.
A way to solve this ambiguity is to force a value to be the same as the 1-tuple that would have it as its only component. Not much would be sacrificed, since the only non-empty projection we can perform on a 1-tuple is to obtain its only value.
For this to be consistently enforced, the syntactical consequence is that we would have to relax a bit our former requirement that delimiting parentheses are mandatory for all n-tuples: now they would be optional for 1-tuples, and mandatory for all other values of n. (Or we could require all values to be delimited by parentheses, but this would be inconvenient for practical use.)
In summary, under the interpretation that a 1-tuple is the same as its only component value, we could, by making syntactic puns with parentheses, consider all return values of functions in our programming language or computing model as n-tuples: the 0-tuple in the case of the unit type, 1-tuples in the case of ordinary/"atomic" values which we usually don't think of as tuples, and pairs/triples/quadruples/... for other kinds of tuples. This heterodox interpretation is mathematically parsimonious and uniform, is expressive enough to simulate functions with multiple input arguments and multiple return values, and is not incompatible with Haskell (in the sense that no harm is done if the programmer assumes this unofficial interpretation).
This was an argument by syntactic puns. Whether you are satisfied or not with it, we can do even better. A more principled argument can be taken from the mathematical theory of relations, by exploring the Cartesian product operation.
An (n-adic) relation is extensionally defined as a uniform set of (n-)tuples. (This characterization is fundamental to relational database theory and is therefore important knowledge for professional computer programmers.)
A dyadic relation – a set of pairs (2-tuples) – is a subset of the Cartesian product of 2 sets:
. For a homogeneous relation: .
A triadic relation – a set of triples (3-tuples) – is a subset of the Cartesian product of 3 sets:
. For a homogeneous relation: .
A monadic relation – a set of monuples (1-tuples) – is a subset of the Cartesian product of 1 set:
(by the usual mathematical convention).
As we can see, a monadic relation is just a set of atomic values! This means that a set of 1-tuples is a set of atomic values. Therefore, it is convenient to consider that 1-tuples and atomic values are the same thing.

Resources