Precedence of function application - haskell

To illustrate that function application has the highest precedence in Haskell, the following example was provided (by School of Haskell):
sq b = b * b
main = print $
sq 3+1
The result here is 10.
What puzzles me is that the argument constitutes a function application too. Consider the operator + being shorthand for a function. So when the argument is taken, I would expect that its function application now takes precedence over the original one.
Written that way it delivers the expected result:
sq b = b * b
main = print $
sq ((+) 3 1)
Any explanation?

What puzzles me is that the argument constitutes a function application too. Consider the operator "+" being shorthand for a function.
I think this is the heart of the misunderstanding (actually, very good understanding!) involved here. It is true that 3 + 1 is an expression denoting the application of the (+) function to 3 and 1, you have understood that correctly. However, Haskell has two kinds of function application syntax, prefix and infix. So the more precise version of "function application has the highest precedence" would be something like "syntactically prefix function application has higher precedence than any syntactically infix function application".
You can also convert back and forth between the two forms. Each function has a "natural" position: names with only symbols are naturally syntactically infix and names with letters and numbers and the like are naturally syntactically prefix. You can turn a naturally prefix function to infix with backticks, and a naturally infix function to prefix with parentheses. So, if we define
plus = (+)
then all of the following really mean the same thing:
3 + 1
3 `plus` 1
(+) 3 1
plus 3 1
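A quick sketch confirming that all four spellings evaluate identically (with plus defined as above):

```haskell
plus :: Int -> Int -> Int
plus = (+)

-- All four forms apply the same underlying function to 3 and 1.
forms :: [Int]
forms = [3 + 1, 3 `plus` 1, (+) 3 1, plus 3 1]  -- every element is 4
```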
Returning to your example:
sq 3+1
Because sq is naturally prefix, and + is naturally infix, the sq application takes precedence.

So when the argument is taken, I would expect that its function application now takes precedence over the original one.
The Haskell grammar [Haskell report] specifies:
exp10 → …
      | …
      | fexp
fexp  → [fexp] aexp    (function application)
This means that function application syntax has precedence 10 (the superscript on the exp10 part).
This means that, when your expression is parsed, the empty space (juxtaposition) in sq 3 binds more tightly than the + in 3+1, and thus sq 3+1 is interpreted as (sq 3) + 1. Semantically, it squares 3 first and then adds 1 to the result, and will thus produce 10.
If you write it as sq (3 + 1) or in canonical form as sq ((+) 3 1) it will first sum up 3 and 1 and then determine the square which will produce 16.
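A minimal sketch putting both parses side by side:

```haskell
sq :: Int -> Int
sq b = b * b

highPrecedence, grouped :: Int
highPrecedence = sq 3 + 1    -- parsed as (sq 3) + 1, evaluates to 10
grouped        = sq (3 + 1)  -- parentheses group first, evaluates to 16
```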

The addition operator is syntactically different from a function application, and that is what determines its operator precedence.
If you rewrite your addition (3 + 1) as a function application ((+) 3 1), the parenthesized operator (+) follows special sectioning rules inside its own parentheses, but outside those parentheses it's just another parenthesized expression.
Note that your "expected result" is not really parallel to your original example:
sq 3 + 1 -- original example, parsed `(sq 3) + 1`
sq ((+) 3 1) -- "expected result", has added parentheses to force your parse
sq (3 + 1) -- this is the operator version of "expected result"
In Haskell, the parentheses are not part of function application -- they are used solely for grouping!
That is to say: just as (+) is just another parenthesized expression, so is (3 + 1).

I think your confusion is just the result of slightly imprecise language (on the part of both the OP and the School of Haskell page linked).
When we say things like "function application has higher precedence than any operator", the term "function application" there is not actually a phrase meaning "applying a function". It's a name for the specific syntactic form func arg (where you just write two terms next to each other in order to apply the first one to the second). We are trying to draw a distinction between "normal" prefix syntax for applying a function and the infix operator syntax for applying a function. So with this specific usage, sq 3 is "function application" and 3 + 1 is not. But this isn't claiming that sq is a function and + is not!
Whereas in many other contexts "function application" does just mean "applying a function"; there it isn't a single term, but just the ordinary meaning of the words "function" and "application" that happen to be used together. In that sense, both sq 3 and 3 + 1 are examples of "function application".
These two senses of the term arise because there are two different contexts we use when thinking about the code¹: logical and syntactic. Consider if we define:
add = (+)
In the "logical" view where we think about the idealised mathematical objects represented by our code, both add and (+) are simply functions (the same function in fact). They are even exactly the same object (we defined one by saying it was equal to the other). This underlying mathematical function exists independently of any name, and has exactly the same properties no matter how we choose to refer to it. In particular, the function can be applied (since that is basically the sole defining feature of a function).
But at the syntactic level, the language simply has different rules about how you can use the names add and + (regardless of what underlying objects those names refer to). One of these names is an operator, and the other is not. We have special syntactic rules for how you need to write the application of an operator, which differs from how you need to write the application of any non-operator term (including but not limited to non-operator names like sq²). So when talking about syntax we need to be able to draw a distinction between these two cases. But it's important to remember that this is a distinction about names and syntax, and has nothing to do with the underlying functions being referred to (proved by the fact that the exact same function can have many names in different parts of the program, some of them operators and some of them not).
There isn't really an agreed upon common term for "any term that isn't an operator"; if there was we would probably say "non-operator application has higher precedence than any operator", since that would be clearer. But for better or worse the "application by simple adjacency" syntactic feature is frequently referred to as "function application", even though that term could also mean other things in other contexts³.
So (un?)fortunately there isn't really anything deeper going on here than the phrase "function application" meaning different things in different contexts.
¹ Okay, there are way more than two contexts we might use to think about our code. There are two relevant to the point I'm making.
² For an example of other non-operator terms that can be applied, we can also have arbitrary expressions in parentheses. For example (compare `on` fst) (1, ()) is the non-operator application of (compare `on` fst) to (1, ()); the expression being applied is itself the operator-application of on to compare and fst.
³ For yet another usage, $ is often deemed to be the "function application operator"; this is perhaps ironic when considered alongside usages that are trying to use the phrase "function application" specifically to exclude operators!
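To make the distinction concrete, here is a small sketch (the name add is hypothetical) showing one underlying function applied through operator syntax, backtick infix syntax, plain prefix syntax, and the $ operator:

```haskell
add :: Int -> Int -> Int
add = (+)   -- the same underlying function, under a non-operator name

viaOperator, viaBackticks, viaPrefix, viaDollar :: Int
viaOperator  = 2 + 3       -- infix operator syntax
viaBackticks = 2 `add` 3   -- non-operator name used infix via backticks
viaPrefix    = add 2 3     -- "function application" in the narrow syntactic sense
viaDollar    = add 2 $ 3   -- applying via the "function application operator"
```

All four bindings evaluate to the same value, because the syntax never changes which function is being applied.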

Related

Does an operator (such as +) behave more like a curried function or a function with an argument of a pair tuple type?

How can I find out the type of operator "+"? says operator + isn't a function.
Which does an operator such as + behave more like,
a curried function, or
a function whose argument has a pair tuple type?
By an operator, say #, I mean # not (#). (#) is a curried function, and if # behaves like a curried function, I guess there is no need to have (#), is there?
Thanks.
According to the Haskell 2010 report:
An operator symbol [not starting with a colon] is an ordinary identifier.¹
Hence, + is an identifier. Identifiers can identify different things, and in the case of +, it identifies a function from the Num typeclass. That function behaves like any other Haskell function, except for different parsing rules of expressions involving said functions.
Specifically, this is described in 3.2.
An operator is a function that can be applied using
infix syntax (Section 3.4), or partially applied using a section (Section 3.5).
In order to determine what constitutes an operator, we can read further:
An operator is either an operator symbol, such as + or $$, or is an ordinary identifier enclosed in grave accents (backquotes), such as `op`.²
So, to sum up:
+ is an identifier. It's an operator symbol, because it doesn't start with a colon and only uses non-alphanumeric characters. When used in an expression (parsed as a part of an expression), we treat it as an operator, which in turn means it's a function.
To answer your question specifically, everything you said involves just syntax. The need for () in expressions is merely to enable prefix application. In all other cases where such disambiguation isn't necessary (such as :type), it might have simply been easier to implement that way, because you can then just expect ordinary identifiers and push the burden to provide one to the user.
¹ For what it's worth, I think the report is actually misleading here. This specific statement seems to be in conflict with other parts, crucially this:
Dually, an operator symbol can be converted to an ordinary identifier by enclosing it in parentheses.
My understanding is that in the first quote, the context for "ordinary" means that it's not a type constructor operator, but a value category operator; hence "value category" means "ordinary". In other quotes, "ordinary" is used for identifiers which are not operator identifiers, the difference being obviously the application. That would corroborate the fact that by enclosing an operator identifier in parens, we turn it into an ordinary identifier for the purposes of prefix application. Phew, at least I didn't write that report ;)
² I'd like to point out one additional thing here. Neither `(+)` nor (`add`) actually parses. The second one is understandable: since the report specifically says that enclosing in parens only works for operator symbols, one can see that `add`, while being an operator, isn't an operator symbol like +.
The first case is actually a bit more tricky for me. Since we can obtain an operator by enclosing an ordinary identifier in backticks, (+) isn't exactly "as ordinary" as add. The language, or at least GHC parser that I tested this with, seems to differentiate between "ordinary ordinary" identifiers, and ordinary identifiers obtained by enclosing operator symbols with parens. Whether this actually contradicts the spec or is another case of mixed naming is beyond my knowledge at this point.
(+) and + refer to the same thing, a function of type Num a => a -> a -> a. We spell it as (+) when we want to use it prefix, and + when we want to write it infix. There is no essential difference; it is just a matter of syntax.
So + (without section) behaves just like a function which requires a pair argument [because a section is needed for partial application]. () works like currying.
While this is not unreasonable from a syntactical point of view, I wouldn't describe it like that. Uncurried functions in Haskell take a tuple as an argument (say, (a, a) -> a as opposed to a -> a -> a), and there isn't an actual tuple anywhere here. 2 + 3 is better thought of as an abbreviation for (+) 2 3, the key point being that the function being used is the same in both cases.
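A short sketch of the curried view described above; sections and partial applications of the prefix form produce the same one-argument functions (the names addTwo and addTwo' are made up):

```haskell
addTwo, addTwo' :: Int -> Int
addTwo  = (+) 2     -- partial application of the prefix form
addTwo' = (2 +)     -- the equivalent left section

sumPrefix :: Int
sumPrefix = (+) 2 3  -- the prefix spelling of 2 + 3
```

There is no tuple anywhere here: (+) consumes its two arguments one at a time, which is exactly what makes the partial applications above well-typed.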

Haskell analog of lisp backquoting and splicing

In some lisps (e.g. elisp, common lisp) there is a feature called backquoting.
It lets you construct a list while evaluating or splicing some elements into it. For example:
`(1 2 (3 (+ 4 5)))
⇒ (1 2 (3 (+ 4 5))) ; just quoted unevaluated list
`(1 2 (3 ,(+ 4 5)))
⇒ (1 2 (3 9)) ; (+ 4 5) has been evaluated
`(1 2 ,#(list 3 (+ 4 5)))
⇒ (1 2 3 9) ; (3 9) has been spliced into the list
I guess, in Haskell some subset of backquoting could look like this:
[backquote| 1, 2, #$(replicate 2 3), 2 + 2 |]
⇒ [1, 2, 3, 3, 4]
I wonder, if splicing into list like this is possible and if it has been implemented.
It seems like the discussion in the comments kind of went off the rails. Anyway, I have a different take on this, so let me offer an answer.
I would say that Haskell already has a feature analogous to backquoting, and you've probably used it extensively in your own Haskell programming without realizing it.
You've drawn a parallel between Lisp lists and Haskell lists, but in Lisp, S-expressions (i.e., "pairs with atoms", especially with atomic symbols) are a flexible and ubiquitous data structure, used not only for representing Lisp code, but also as the go-to representation that's at least the first consideration for any complex, structured data. As such, most Lisp programs spend a lot of time generating and manipulating these structures, and so S-expression "literals" are common in Lisp code. And S-expression "almost literals", where a few sub-expressions need to be calculated, are more conveniently written using the backquoting mechanism than by trying to build up the expression from smaller literal and evaluated pieces using functions like cons, list, append, etc.
Contrast that with Haskell -- Haskell lists are certainly popular in Haskell code, and are a go-to structure for representing homogeneous sequences, but they provide only a small fraction of the flexibility of S-expressions. Instead, the corresponding ubiquitous data structure in Haskell is the Algebraic Data Type (ADT).
Well, just like Lisp with its S-expressions, Haskell programs spend a lot of time generating and manipulating ADTs, and Haskell also has a convenient syntax for ADT literals and "almost literals". They are unified into a single "function application" syntax with literal and evaluated parts differentiated by the use of constructors (identifiers with an initial uppercase letter or infix operators starting with a colon) versus non-constructors (identifiers with an initial lowercase letter or infix operators without an initial colon). There is, of course, some additional syntax for certain constructors (lists and tuples).
For example, compare the following backquoted expressions in Lisp and Haskell:
;; Lisp
(setq baz `(node ,id
                 (node ,(+ id 1) ,left-tree leaf)
                 (node ,(+ id 2) leaf ,right-tree)))
-- Haskell
baz = Node id (Node (id + 1) left_tree Leaf) (Node (id + 2) Leaf right_tree)
In the Haskell version of this "almost literal", the Node and Leaf constructors represent the quoted parts; the left_tree, right_tree, and + infix expressions represent the evaluated parts, and they are syntactically distinguishable by the usual rules for constructors and non-constructors.
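To make the Haskell side runnable, here is a minimal sketch of the ADT the example assumes (the type Tree and the shape of its constructors are inferred from the snippet, not given in the original):

```haskell
-- A plausible definition matching the Node/Leaf "almost literal" above.
data Tree = Leaf | Node Int Tree Tree
  deriving Show

-- The example, wrapped in a function so that the evaluated parts
-- (the id, left_tree, and right_tree variables) are in scope:
baz :: Int -> Tree -> Tree -> Tree
baz i left_tree right_tree =
  Node i (Node (i + 1) left_tree Leaf) (Node (i + 2) Leaf right_tree)
```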
Of course, completely separate from this, there's a Template Haskell mechanism that directly manipulates snippets of Haskell code at compile time. While the code is represented as an ADT that could, in principle, be written using the same "almost literal" syntax used for other ADTs, the ADT in question is quite cumbersome and looks nothing like the underlying Haskell code. So, Template Haskell provides a more classic sort of backquoting syntax.

Haskell: Get a prefix operator that works without parentheses

One big reason prefix operators are nice is that they can avoid the need for parentheses, so that + - 10 1 2 unambiguously means (10 - 1) + 2. The infix expression becomes ambiguous if parens are dropped, which you can work around by having certain precedence rules, but that's messy, blah, blah, blah.
I'd like to make Haskell use prefix operations but the only way I've seen to do that sort of trades away the gains made by getting rid of parentheses.
Prelude> (-) 10 1
uses two parens.
It gets even worse when you try to compose functions because
Prelude> (+) (-) 10 1 2
yields an error, I believe because it's trying to feed the minus operation into the plus operation rather than first evaluating the minus and then feeding it in--so now you need even more parens!
Is there a way to make Haskell intelligently evaluate prefix notation? I think if I made functions like
Prelude> let p x y = x+y
Prelude> let m x y = x-y
I would recover the initial gains on fewer parens but function composition would still be a problem. If there's a clever way to join this with $ notation to make it behave at least close to how I want, I'm not seeing it. If there's a totally different strategy available I'd appreciate hearing it.
I tried reproducing what the accepted answer did here:
Haskell: get rid of parentheses in liftM2
but in both a Prelude console and a Haskell script, the import command didn't work. And besides, this is more advanced Haskell than I'm able to understand, so I was hoping there might be some other simpler solution anyway before I do the heavy lifting to investigate whatever this is doing.
It gets even worse when you try to compose functions because
Prelude> (+) (-) 10 1 2
yields an error, I believe because it's
trying to feed the minus operation into the plus operations rather
than first evaluating the minus and then feeding--so now you need even
more parens!
Here you raise exactly the key issue that's a blocker for getting what you want in Haskell.
The prefix notation you're talking about is unambiguous for basic arithmetic operations (more generally, for any set of functions of statically known arity). But you have to know that + and - each accept 2 arguments for + - 10 1 2 to be unambiguously resolved as +(-(10, 1), 2) (where I've used explicit argument lists to denote every call).
But ignoring the specific meaning of + and -, the first function taking the second function as an argument is a perfectly reasonable interpretation! For Haskell rather than arithmetic we need to support higher-order functions like map. You would want not not x to turn into not(not(x)), but map not x to turn into map(not, x).
And what if I had f g x? How is that supposed to work? Do I need to know what f and g are bound to so that I know whether it's a case like not not x or a case like map not x, just to know how to parse the call structure? Even assuming I have all the code available to inspect, how am I supposed to figure out what things are bound to if I can't know what the call structure of any expression is?
You'd end up needing to invent disambiguation syntax like map (not) x, wrapping not in parentheses to disable its ability to act like an arity-1 function (much like Haskell's actual syntax lets you wrap operators in parentheses to disable their ability to act like an infix operator). Or use the fact that all Haskell functions are arity-1, but then you have to write (map not) x and your arithmetic example has to look like (+ ((- 10) 1)) 2. Back to the parentheses!
The truth is that the prefix notation you're proposing isn't unambiguous. Haskell's normal function syntax (without operators) is; the rule is you always interpret a sequence of terms like foo bar baz qux etc as ((((foo) bar) baz) qux) etc (where each of foo, bar, etc can be an identifier or a sub-term in parentheses). You use parentheses not to disambiguate that rule, but to group terms to impose a different call structure than that hard rule would give you.
Infix operators do complicate that rule, and they are ambiguous without knowing something about the operators involved (their precedence and associativity, which unlike arity is associated with the name not the actual value referred to). Those complications were added to help make the code easier to understand; particularly for the arithmetic conventions most programmers are already familiar with (that + is lower precedence than *, etc).
If you don't like the additional burden of having to memorise the precedence and associativity of operators (not an unreasonable position), you are free to use a notation that is unambiguous without needing precedence rules, but it has to be Haskell's prefix notation, not Polish prefix notation. And whatever syntactic convention you're using, in any language, you'll always have to use something like parentheses to indicate grouping where the call structure you need is different from what the standard convention would indicate. So:
(+) ((-) 10 1) 2
Or:
plus (minus 10 1) 2
if you define non-operator function names.
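A sketch of that last option, using the hypothetical non-operator names from the question:

```haskell
plus, minus :: Int -> Int -> Int
plus  = (+)
minus = (-)

result :: Int
result = plus (minus 10 1) 2   -- (10 - 1) + 2 = 11
```

Note that the parentheses here are purely for grouping; Haskell's juxtaposition rule would otherwise parse plus minus 10 1 2 as ((((plus minus) 10) 1) 2), which is ill-typed.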

Can a pure function have free variables?

For example, a referentially transparent function with no free variables:
g op x y = x `op` y
And now a function with the free (from the point of view of f) variables op and x:
x = 1
op = (+)
f y = x `op` y
f is also referentially transparent. But is it a pure function?
If it's not a pure function, what is the name for a function that is referentially transparent, but makes use of 1 or more variables bound in an enclosing scope?
Motivation for this question:
It's not clear to me from Wikipedia's article:
The result value need not depend on all (or any) of the argument values. However, it must depend on nothing other than the argument values.
(emphasis mine)
nor from Google searches whether a pure function can depend on free (in the sense of being bound in an enclosing scope, and not being bound in the scope of the function) variables.
Also, this book says:
If functions without free variables are pure, are closures impure?
The function function (y) { return x } is interesting. It contains a
free variable, x. A free variable is one that is not bound within
the function. Up to now, we’ve only seen one way to “bind” a variable,
namely by passing in an argument with the same name. Since the
function function (y) { return x } doesn’t have an argument named x,
the variable x isn’t bound in this function, which makes it “free.”
Now that we know that variables used in a function are either bound or
free, we can bifurcate functions into those with free variables and
those without:
Functions containing no free variables are called pure functions.
Functions containing one or more free variables are called closures.
So what is the definition of a "pure function"?
To the best of my understanding "purity" is defined at the level of semantics while "referentially transparent" can take meaning both syntactically and embedded in lambda calculus substitution rules. Defining either one also leads to a bit of a challenge in that we need to have a robust notion of equality of programs which can be challenging. Finally, it's important to note that the idea of a free variable is entirely syntactic—once you've gone to a value domain you can no longer have expressions with free variables—they must be bound else that's a syntax error.
But let's dive in and see if this becomes more clear.
Quinian Referential Transparency
We can define referential transparency very broadly as a property of a syntactic context. Per the original definition, this would be built from a sentence like
New York is an American city.
of which we've poked a hole
_ is an American city.
Such a holey-sentence, a "context", is said to be referentially transparent if, given two sentence fragments which both "refer" to the same thing, filling the context with either of those two does not change its meaning.
To be clear, two fragments with the same reference we can pick would be "New York" and "The Big Apple". Injecting those fragments we write
New York is an American city.
The Big Apple is an American city.
suggesting that
_ is an American city.
is referentially transparent. To demonstrate the quintessential counterexample, we might write
"The Big Apple" is an apple-themed epithet referring to New York.
and consider the context
"_" is an apple-themed epithet referring to New York.
and now when we inject the two referentially identical phrases we get one valid and one invalid sentence
"The Big Apple" is an apple-themed epithet referring to New York.
"New York" is an apple-themed epithet referring to New York.
In other words, quotations break referential transparency. We can see how this occurs by causing the sentence to refer to a syntactic construct instead of purely the meaning of that construct. This notion will return later.
Syntax v Semantics
There's something confusing going on in that this definition of referential transparency above applies directly to English sentences of which we build contexts by literally stripping words out. While we can do that in a programming language and consider whether such a context is referentially transparent, we also might recognize that this idea of "substitution" is critical to the very notion of a computer language.
So, let's be clear: there are two kinds of referential transparency we can consider over lambda calculus—the syntactic one and the semantic one. The syntactic one requires we define "contexts" as holes in the literal words written in a programming language. That lets us consider holes like
let x = 3 in _
and fill it in with things like "x". We'll leave the analysis of that replacement for later. At the semantic level we use lambda terms to denote contexts
\x -> x + 3 -- similar to the context "_ + 3"
and are restricted to filling in the hole not with syntax fragments but instead only valid semantic values, the action of that being performed by application
(\x -> x + 3) 5
==>
5 + 3
==>
8
So, when someone refers to referential transparency in Haskell it's important to figure out what kind of referential transparency they're referring to.
Which kind is being referred to in this question? Since it's about the notion of an expression containing a free variable, I'm going to suggest that it's syntactic. There are two major thrusts for my reasoning here. Firstly, converting a syntax to a semantics requires that the syntax be valid. In the case of Haskell this means both syntactic validity and a successful type check. However, we'll note that a program fragment like
x + 3
is actually a syntax error since x is simply unknown, unbound, leaving us unable to consider its semantics as a Haskell program. Secondly, the very notion of a variable such as one that can be let-bound (and consider the difference from "variable" as it refers to a "slot", such as an IORef) is entirely a syntactic construct—there's no way to even talk about them from inside the semantics of a Haskell program.
So let's refine the question to be:
Can an expression containing free variables be (syntactically) referentially transparent?
and the answer is, uninterestingly, no. Referential transparency is a property of "contexts", not expressions. So let's explore the notion of free variables in contexts instead.
Free variable contexts
How can a context meaningfully have a free variable? It could be beside the hole
E1 ... x ... _ ... E2
and so long as we cannot insert something into that syntactic hole which "reaches over" and affects x syntactically then we're fine. So, for instance, if we fill that hole with something like
E1 ... x ... let x = 3 in E ... E2
then we haven't "captured" the x and thus can perhaps consider that syntactic hole to be referentially transparent. However, we're being nice to our syntax. Let's consider a more dangerous example
do x <- foo
let x = 3
_
return x
Now we see that the hole we've provided in some sense has dominion over the later phrase "return x". In fact, if we inject a fragment like "let x = 4" then it indeed changes the meaning of the whole. In that sense, the syntax here is not referentially transparent.
Another interesting interaction between referential transparency and free variables is the notion of an assigning context like
let x = 3 in _
where, from an outside perspective, both phrases "x" and "y" refer to the same thing, some named variable, but
let x = 3 in x ==/== let x = 3 in y
Progression from thorniness around equality and context
Now, hopefully the previous section explained a few ways for referential transparency to break under various kinds of syntactic contexts. It's worth asking harder questions about what kinds of contexts are valid and what kinds of expressions are equivalent. For instance, we might desugar our do notation in a previous example and end up noticing that we weren't working with a genuine context, but instead sort of a higher-order context
foo >>= \x -> (let x = 3 in ____(return x)_____)
Is this a valid notion of context? It depends a lot on what kind of meaning we're giving the program. The notion of desugaring the syntax already implies that the syntax must be well-defined enough to allow for such desugaring.
As a general rule, we must be very careful with defining both contexts and notions of equality. Further, the more meaning we demand the fragments of our language to take on the greater the ways they can be equal and the fewer the valid contexts we can build.
Ultimately, this leads us all the way to what I called "semantic referential transparency" earlier where we can only substitute proper values into a proper, closed lambda expression and we take the resulting equality to be "equality as programs".
What this ends up meaning is that as we impute more and more meaning on our language, as we begin to accept fewer and fewer things as valid, we get stronger and stronger guarantees about referential transparency.
Purity
And so this finally leads to the notion of a pure function. My understanding here is (even) less complete, but it's worth noting that purity, as a concept, does not much exist until we've moved to a very rich semantic space—that of Haskell semantics as a category over lifted Complete Partial Orders.
If that doesn't make much sense, then just imagine purity is a concept that only exists when talking about Haskell values as functions and equality of programs. In particular, we examine the collection of Haskell functions
trivial :: a -> ()
trivial x = x `seq` ()
where we have a trivial function for every choice of a. We'll notate the specific choice using an underscore
trivial_Int :: Int -> ()
trivial_Int x = x `seq` ()
Now we can define a (very strictly) pure function to be a function f :: a -> b such that
trivial_b . f = trivial_a
In other words, if we throw out the result of computing our function, the b, then we may as well have never computed it in the first place.
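A sketch of this check at two concrete types (the names trivialInt, trivialBool, and checksOutAt are mine); for a pure f, composing trivial with f on the result side should be indistinguishable from trivial on the argument side:

```haskell
trivialInt :: Int -> ()
trivialInt x = x `seq` ()

trivialBool :: Bool -> ()
trivialBool x = x `seq` ()

-- A candidate pure function:
f :: Int -> Bool
f = even

-- For every defined n, (trivialBool . f) n should equal trivialInt n:
-- throwing away the result of f is the same as never computing it.
checksOutAt :: Int -> Bool
checksOutAt n = (trivialBool . f) n == trivialInt n
```

This is of course only a pointwise test at defined arguments, not a proof of the equation trivial_b . f = trivial_a, which also quantifies over ⊥.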
Again, there's no notion of purity without having Haskell values and no notion of Haskell values when your expressions contain free variables (since it's a syntax error).
So what's the answer?
Ultimately, the answer is that you can't talk about purity around free variables and you can break referential transparency in lots of ways whenever you are talking about syntax. At some point as you convert your syntactic representation to its semantic denotation you must forget the notion and names of free variables in order to have them represent the reduction semantics of lambda terms and by this point we've begun to have referential transparency.
Finally, purity is something even more stringent than referential transparency having to do with even the reduction characteristics of your (referentially transparent) lambda terms.
By the definition of purity given above, most of Haskell isn't pure itself as Haskell may represent non-termination. Many feel that this is a better definition of purity, however, as non-termination can be considered a side effect of computation instead of a meaningful resultant value.
The Wikipedia definition is incomplete, insofar as a pure function may use constants to compute its answer.
When we look at
increment n = 1+n
this is obvious. Perhaps it was not mentioned because it is that obvious.
Now the trick in Haskell is that not only top-level values and functions are constants; inside a closure, the variables(!) closed over are constants too:
add x = (\y -> x+y)
Here x stands for the value we applied add to - we call it variable not because it could change within the right hand side of add, but because it can be different each time we apply add. And yet, from the point of view of the lambda, x is a constant.
It follows that free variables always name constant values at the point where they are used and hence do not impact purity.
Short answer: YES, f is pure.
In Haskell, map can be defined with foldr. Would you agree that map is functional? If so, did it matter that it used the global function foldr, which wasn't supplied to map as an argument?
In map, foldr is a free variable. There's no doubt about it. It makes no difference whether it's a function or something that evaluates to a value. It's the same.
Free variables, like the functions foldr and +, are essential for functional languages to exist. Without them you wouldn't have abstraction, and the languages would be worse off than Fortran.
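For concreteness, here is a sketch of map written in terms of foldr (named `myMap` here only to avoid clashing with the Prelude). Both `foldr` and `(:)` occur as free variables in its body, and yet the function is plainly pure:

```haskell
-- map written in terms of foldr; foldr and (:) are free variables
-- in this definition, yet myMap is pure.
myMap :: (a -> b) -> [a] -> [b]
myMap g = foldr (\x acc -> g x : acc) []

main :: IO ()
main = print (myMap (+ 1) [1, 2, 3])  -- [2,3,4]
```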

Haskell pattern match "diverge" and ⊥

I'm trying to understand the Haskell 2010 Report section 3.17.2 "Informal Semantics of Pattern Matching". Most of it, relating to a pattern match succeeding or failing seems straightforward, however I'm having difficulty understanding the case which is described as the pattern match "diverging".
I'm semi-persuaded it means that the match algorithm does not "converge" to an answer (hence the match function never returns). But if it doesn't return, then how can it return a value, as suggested by the parenthetical "i.e. return ⊥"? And what does it mean to "return ⊥" anyway? How would one handle that outcome?
Item 5 has the particularly confusing (to me) point "If the value is ⊥, the match diverges". Is this just saying that a value of ⊥ produces a match result of ⊥? (Setting aside that I don't know what that outcome means!)
Any illumination, possibly with an example, would be appreciated!
Addendum after a couple of lengthy answers:
Thanks Tikhon and all for your efforts.
It seems my confusion comes from there being two different realms of explanation: The realm of Haskell features and behaviors, and the realm of mathematics/semantics, and in Haskell literature these two are intermingled in an attempt to explain the former in terms of the latter, without sufficient signposts (to me) as to which elements belong to which.
Evidently "bottom" ⊥ is in the semantics domain, and does not exist as a value within Haskell (i.e., you can't type it in, and you never get a result that prints out as "⊥").
So, where the explanation says a function "returns ⊥", this refers to a function that does any of a number of inconvenient things, like not terminating, throwing an exception, or being undefined. Is that right?
Further, those who commented that ⊥ actually is a value that can be passed around are really thinking of bindings to expressions that haven't yet actually been evaluated ("unexploded bombs", so to speak) and might never be, due to laziness, right?
The value is ⊥, usually pronounced "bottom". It is a value in the semantic sense--it is not a normal Haskell value per se. It represents computations that do not produce a normal Haskell value: exceptions and infinite loops, for example.
Semantics is about defining the "meaning" of a program. In Haskell, we usually talk about denotational semantics, where the value is a mathematical object of some sort. The most trivial example would be that the expressions 10 and 9 + 1 both have the number 10 as their denotation (rather than the Haskell value 10). We usually write ⟦9 + 1⟧ = 10, meaning that the denotation of the Haskell expression 9 + 1 is the number 10.
However, what do we do with an expression like let x = x in x? There is no Haskell value for this expression. If you tried to evaluate it, it would simply never finish. Moreover, it is not obvious what mathematical object this corresponds to. However, in order to reason about programs, we need to give some denotation for it. So, essentially, we just make up a value for all these computations, and we call the value ⊥ (bottom).
So ⊥ is just a way to define what a computation that doesn't return "means".
We also define other computations like undefined and error "some message" as ⊥ because they also do not have obvious normal values. So throwing an exception corresponds to ⊥. This is exactly what happens with a failed pattern match.
The usual way of thinking about this is that every Haskell type is "lifted"--it contains ⊥. That is, Bool corresponds to {⊥, True, False} rather than just {True, False}. This represents the fact that Haskell programs are not guaranteed to terminate and can have exceptions. This is also true when you define your own type--the type contains every value you defined for it as well as ⊥.
Interestingly, since Haskell is non-strict, ⊥ can exist in normal code. So you could have a value like Just ⊥, and if you never evaluate it, everything will work fine. A good example of this is const: const 1 ⊥ evaluates to 1. This works for failed pattern matches as well:
const 1 (let Just x = Nothing in x) -- 1
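As a runnable version of the example above (the type annotation is my addition, needed to fix the element type):

```haskell
-- The failed pattern match denotes ⊥, but const never forces its
-- second argument, so the program simply prints 1.
main :: IO ()
main = print (const 1 (let Just x = (Nothing :: Maybe Int) in x))  -- 1
```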
You should read the section on denotational semantics in the Haskell WikiBook. It's a very approachable introduction to the subject, which I personally find very fascinating.
Denotational semantics
So, briefly, denotational semantics, which is where ⊥ lives, is a mapping from Haskell terms to some other space of values. You do this to give meaning to programs in a more formal manner than just talking about what programs should do—you say that they must respect their denotational semantics.
So for Haskell, you often think about how Haskell expressions denote mathematical values. You often see Strachey brackets ⟦·⟧ to denote the "semantic mapping" from Haskell to Math. Finally, we want our semantic brackets to be compatible with semantic operations. For instance
⟦x + y⟧ = ⟦x⟧ + ⟦y⟧
where on the left side + is the Haskell function (+) :: Num a => a -> a -> a and on the right side it's the binary operation in a commutative group. This is cool, because then we know that we can use the properties from the semantic map to know how our Haskell functions should work. To wit, let's write the commutative property "in Math":
⟦x⟧ + ⟦y⟧ == ⟦y⟧ + ⟦x⟧
= ⟦x + y⟧ == ⟦y + x⟧
= ⟦x + y == y + x⟧
where the third step also indicates that the Haskell (==) :: Eq a => a -> a -> a ought to have the properties of a mathematical equivalence relation.
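As a small sanity check (my own sketch), we can observe that the Haskell (+) and (==) behave as the semantic equations demand, at least on a few sample points:

```haskell
-- Commutativity of (+) observed through (==), mirroring the claim
-- that the Haskell expression x + y == y + x holds in the semantic
-- domain for these inputs.
commutes :: Int -> Int -> Bool
commutes x y = (x + y) == (y + x)

main :: IO ()
main = print (and [commutes x y | x <- [-3 .. 3], y <- [-3 .. 3]])  -- True
```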
Well, except...
Anyway, that's all well and good until we remember that computers are finite things and Maths don't much care about that (unless you're using intuitionistic logic, and then you get Coq). So, we have to take note of places where our semantics don't follow Math quite right. Here are three examples
⟦undefined⟧ = ??
⟦error "undefined"⟧ = ??
⟦let x = x in x⟧ = ??
This is where ⊥ comes into play. We just assert that, so far as the denotational semantics of Haskell are concerned, each of those examples might as well mean (the newly introduced Mathematical/semantic concept of) ⊥. What are the Mathematical properties of ⊥? Well, this is where we start to really dive into what the semantic domain is and start talking about monotonicity of functions and CPOs and the like. Essentially, though, ⊥ is a mathematical object which plays roughly the same game as non-termination does. From the point of view of the semantic model, ⊥ is toxic: it infects the expressions that touch it with its non-termination.
But it's not a Haskell-the-language concept, just a Semantic-domain-of-the-language-Haskell thing. In Haskell we have undefined, error and infinite looping. This is important.
Extra-semantic behavior (side note)
So the semantics of ⟦undefined⟧ = ⟦error "undefined"⟧ = ⟦let x = x in x⟧ = ⊥ are clear once we understand the mathematical meanings of ⊥, but it's also clear that those each have different effects "in reality". This is sort of like "undefined behavior" of C... it's behavior that's undefined so far as the semantic domain is concerned. You might call it semantically unobservable.
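A sketch of these "semantically equal but operationally different" bottoms, using GHC's Control.Exception (note that observing or catching a bottom is itself extra-semantic, only possible from IO):

```haskell
import Control.Exception (SomeException, evaluate, try)

main :: IO ()
main = do
  -- length never forces the list elements, so these ⊥s are harmless:
  print (length [undefined, error "boom"])  -- 2
  -- Forcing a ⊥ itself surfaces the extra-semantic behavior; here it
  -- is an exception we can observe and catch from IO:
  r <- try (evaluate (undefined :: Int)) :: IO (Either SomeException Int)
  putStrLn (either (const "caught a bottom") show r)
```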
So how does pattern matching return ⊥?
So what does it mean "semantically" to return ⊥? Well, ⊥ is a perfectly valid semantic value which has the infection property and which models non-termination (or asynchronous error throwing). From the semantic point of view it's a perfectly valid value which can be returned as is.
From the implementation point of view, you have a number of choices, each of which maps to the same semantic value. undefined isn't quite right, nor is entering an infinite loop, so if you're going to pick a behavior that's undefined so far as the semantics are concerned, you might as well pick one that's useful and throw an error:
*** Exception: <interactive>:2:5-14: Non-exhaustive patterns in function cheers
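The error message above presumably came from a partial definition along these lines (the body of `cheers` is my reconstruction; only the name is taken from the message):

```haskell
-- Hypothetical partial function: there is no equation for Nothing,
-- so `cheers Nothing` denotes ⊥ and throws "Non-exhaustive patterns
-- in function cheers" at runtime.
cheers :: Maybe String -> String
cheers (Just name) = "Cheers, " ++ name ++ "!"

main :: IO ()
main = putStrLn (cheers (Just "world"))  -- Cheers, world!
```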

Resources