What are super combinators and constant applicative forms?

What are super combinators and constant applicative forms? - haskell

I'm struggling with what Super Combinators are:
A supercombinator is either a constant, or a combinator which contains only supercombinators as subexpressions.
And also with what Constant Applicative Forms are:
Any super combinator which is not a lambda abstraction. This includes truly constant expressions such as 12, ((+) 1 2), [1,2,3] as well as partially applied functions such as ((+) 4). Note that this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
This is just not making any sense to me! Isn't ((+) 4) just as "truly constant" as 12? CAFs sound like values to my simple mind.

These Haskell wiki pages you reference are old, and I think unfortunately written. Particularly unfortunate is that they mix up CAFs and supercombinators. Supercombinators are interesting but unrelated to GHC. CAFs are still very much a part of GHC, and can be understood without reference to supercombinators.
So let's start with supercombinators. Combinators derive from combinatory logic, and, in the usage here, consist of functions which only apply the values passed in to one another in one or another form -- i.e. they combine their arguments. The most famous set of combinators are S, K, and I, which taken together are Turing-complete. Supercombinators, in this context, are functions built only of values passed in, combinators, and other supercombinators. Hence any supercombinator can be expanded, through substitution, into a plain old combinator.
Some compilers for functional languages (not GHC!) use combinators and supercombinators as intermediate steps in compilation. As with any similar compiler technology, the reason for doing this is to admit optimization analysis that is more easily performed in such a simplified, minimal language. One such core language built on supercombinators is Edwin Brady's epic.
Constant Applicative Forms are something else entirely. They're a bit more subtle, and have a few gotchas. The way to think of them is as an aspect of compiler implementation with no separate semantic meaning but with a potentially profound effect on runtime performance. The following may not be a perfect description of a CAF, but it'll try to convey my intuition of what one is, since I haven't seen a really good description anywhere else for me to crib from. The clean "authoritative" description in the GHC Commentary Wiki reads as follows:
Constant Applicative Forms, or CAFs for short, are top-level values
defined in a program. Essentially, they are objects that are not
allocated dynamically at run-time but, instead, are part of the static
data of the program.
That's a good start. Pure, functional, lazy languages can be thought of in some sense as a graph reduction machine. The first time you demand the value of a node, that forces its evaluation, which in turn can demand the values of subnodes, etc. One a node is evaluated, the resultant value sticks around (although it does not have to stick around -- since this is a pure language we could always keep the subnodes live and recalculate with no semantic effect). A CAF is indeed just a value. But, in the context, a special kind of value -- one which the compiler can determine has a meaning entirely dependent on its subnodes. That is to say:
foo x = ...
where thisIsACaf = [1..10::Int]
thisIsNotACaf = [1..x::Int]
thisIsAlsoNotACaf :: Num a => [a]
thisIsAlsoNotACaf = [1..10] -- oops, polymorphic! the "num" dictionary is implicitly a parameter.
thisCouldBeACaf = const [1..10::Int] x -- requires a sufficiently smart compiler
thisAlsoCouldBeACaf _ = [1..10::Int] -- also requires a sufficiently smart compiler
So why do we care if things are CAFs? Basically because sometimes we really really don't want to recompute something (for example, a memotable!) and so want to make sure it is shared properly. Other times we really do want to recompute something (e.g. a huge boring easy to generate list -- such as the naturals -- which we're just walking over) and not have it stick around in memory forever. A combination of naming things and binding them under lets or writing them inline, etc. typically lets us specify these sorts of things in a natural, intuitive way. Occasionally, however, the compiler is smarter or dumber than we expect, and something we think should only be computed once is always recomputed, or something we don't want to hang on to gets lifted out as a CAF. Then, we need to think things through more carefully. See this discussion to get an idea about some of the trickiness involved: A good way to avoid "sharing"?
[By the way, I don't feel up to it, but anyone that wants to should feel free to take as much of this answer as they want to try and integrate it with the existing Haskell Wiki pages and improve/update them]

Matt is right in that the definition is confusing. It is even contradictory. A CAF is defined as:
Any super combinator which is not a lambda abstraction. This includes
truly constant expressions such as 12, ((+) 1 2), [1,2,3] as
well as partially applied functions such as ((+) 4).
Hence, ((+) 4) is seen as a CAF. But in the very next sentence we're told it is equivalent to something that is not a CAF:
this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
It would be cleaner to rule out partially applied functions on the ground that they are equivalent to lambda abstractions.

Related

Is there a way to "preserve" results in Haskell?

I'm new to Haskell and understand that it is (basically) a pure functional language, which has the advantage that results to functions will not change across multiple evaluations. Given this, I'm puzzled by why I can't easily mark a function in such a way that its remembers the results of its first evaluation, and does not have to be evaluated again each time its value is required.
In Mathematica, for example, there is a simple idiom for accomplishing this:
f[x_]:=f[x]= ...
but in Haskell, the closest things I've found is something like
f' = (map f [0 ..] !!)
where f 0 = ...
f n = f' ...
which in addition to being far less clear (and apparently limited to Int arguments?) does not (seem to) preserve results within an interactive session.
Admittedly (and clearly), I don't understand exactly what's going on here; but naively, it seems like Haskel should have some way, at the function definition level, of
taking advantage of the fact that its functions are functions and skipping re-computation of their results once they have been computed, and
indicating a desire to do this at the function definition level with a simple and clean idiom.
Is there a way to accomplish this in Haskell that I'm missing? I understand (sort of) that Haskell can't store the evaluations as "state", but why can't it simply (in effect) redefine evaluated functions to be their computed value?
This grows out of this question, in which lack of this feature results in terrible performance.

Use a suitable library, such as MemoTrie.
import Data.MemoTrie
f' = memo f
where f 0 = ...
f n = f' ...
That's hardly less nice than the Mathematica version, is it?
Regarding
“why can't it simply (in effect) redefine evaluated functions to be their computed value?”
Well, it's not so easy in general. These values have to be stored somewhere. Even for an Int-valued function, you can't just allocate an array with all possible values – it wouldn't fit in memory. The list solution only works because Haskell is lazy and therefore allows infinite lists, but that's not particularly satisfying since lookup is O(n).
For other types it's simply hopeless – you'd need to somehow diagonalise an over-countably infinite domain.
You need some cleverer organisation. I don't know how Mathematica does this, but it probably uses a lot of “proprietary magic”. I wouldn't be so sure that it does really work the way you'd like, for any inputs.
Haskell fortunately has type classes, and these allow you to express exactly what a type needs in order to be quickly memoisable. HasTrie is such a class.

How do experienced Haskell developers approach laziness at design time?

I'm an intermediate Haskell programmer with tons of experience in strict FP and non-FP languages. Most of my Haskell code analyzes moderately large datasets (10^6..10^9 things), so laziness is always lurking. I have a reasonably good understanding of thunks, WHNF, pattern matching, and sharing, and I've been able to fix leaks with bang patterns and seq, but this profile-and-pray approach feels sordid and wrong.
I want to know how experienced Haskell programmers approach laziness at design time. I'm not asking about easy items like Data.ByteString.Lazy or foldl'; rather, I want to know how you think about the lower-level lazy machinery that causes runtime memory problems and tricky debugging.
How do you think about thunks, pattern matching, and sharing during design time?
What design patterns and idioms do you use to avoid leaks?
How did you learn these patterns and idioms, and do you have some good refs?
How do you avoid premature optimization of non-leaking non-problems?
(Amended 2014-05-15 for time budgeting):
Do you budget substantial project time for finding and fixing memory problems?
Or, do your design skills typically circumvent memory problems, and you get the expected memory consumption very early in the development cycle?

I think most of the trouble with "strictness leaks" happens because people don't have a good conceptual model. Haskellers without a good conceptual model tend to have and propagate the superstition that stricter is better. Perhaps this intuition comes from their results from toying with small examples & tight loops. But it is incorrect. It's just as important to be lazy at the right times as to be strict at the right times.
There are two camps of data types, usually referred to as "data" and "codata". It is essential to respect the patterns of each one.
Operations which produce "data" (Int, ByteString, ...) must be forced close to where they occur. If I add a number to an accumulator, I am careful to make sure that it will be forced before I add another one. A good understanding of laziness is very important here, especially its conditional nature (i.e. strictness propositions don't take the form "X gets evaluated" but rather "when Y is evaluated, so is X").
Operations which produce and consume "codata" (lists most of the time, trees, most other recursive types) must do so incrementally. Usually codata -> codata transformation should produce some information for each bit of information they consume (modulo skipping like filter). Another important piece for codata is that you use it linearly whenever possible -- i.e. use the tail of a list exactly once; use each branch of a tree exactly once. This ensures that the GC can collect pieces as they are consumed.
Things take a special amount of care when you have codata that contains data. E.g. iterate (+1) 0 !! 1000 will end up producing a size-1000 thunk before evaluating it. You need to think about conditional strictness again -- the way to prevent this case is to ensure that when a cons of the list is consumed, the addition of its element occurs. iterate violates this, so we need a better version.
iterate' :: (a -> a) -> a -> [a]
iterate' f x = x : (x `seq` iterate' f (f x))
As you start composing things, of course it gets harder to tell when bad cases happen. In general it is hard to make efficient data structures / functions that work equally well on data and codata, and it's important to keep in mind which is which (even in a polymorphic setting where it's not guaranteed, you should have one in mind and try to respect it).
Sharing is tricky, and I think I approach it mostly on a case-by-case basis. Because it's tricky, I try to keep it localized, choosing not to expose large data structures to module users in general. This can usually be done by exposing combinators for generating the thing in question, and then producing and consuming it all in one go (the codensity transformation on monads is an example of this).
My design goal is to get every function to be respectful of the data / codata patterns of my types. I can usually hit it (though sometimes it requires some heavy thought -- it has become natural over the years), and I seldom have leak problems when I do. But I don't claim that it's easy -- it requires experience with the canonical libraries and patterns of the language. These decisions are not made in isolation, and everything has to be right at once for it to work well. One poorly tuned instrument can ruin the whole concert (which is why "optimization by random perturbation" almost never works for these kinds of issues).
Apfelmus's Space Invariants article is helpful for developing your space/thunk intuition further. Also see Edward Kmett's comment below.

Can a pure function have free variables?

For example, a referentially transparent function with no free variables:
g op x y = x `op` y
A now now a function with the free (from the point-of-view of f) variables op and x:
x = 1
op = (+)
f y = x `op` y
f is also referentially transparent. But is it a pure function?
If it's not a pure function, what is the name for a function that is referentially transparent, but makes use of 1 or more variables bound in an enclosing scope?
Motivation for this question:
It's not clear to me from Wikipedia's article:
The result value need not depend on all (or any) of the argument values. However, it must depend on nothing other than the argument values.
(emphasis mine)
nor from Google searches whether a pure function can depend on free (in the sense of being bound in an enclosing scope, and not being bound in the scope of the function) variables.
Also, this book says:
If functions without free variables are pure, are closures impure?
The function function (y) { return x } is interesting. It contains a
free variable, x. A free variable is one that is not bound within
the function. Up to now, we’ve only seen one way to “bind” a variable,
namely by passing in an argument with the same name. Since the
function function (y) { return x } doesn’t have an argument named x,
the variable x isn’t bound in this function, which makes it “free.”
Now that we know that variables used in a function are either bound or
free, we can bifurcate functions into those with free variables and
those without:
Functions containing no free variables are called pure functions.
Functions containing one or more free variables are called closures.
So what is the definition of a "pure function"?

To the best of my understanding "purity" is defined at the level of semantics while "referentially transparent" can take meaning both syntactically and embedded in lambda calculus substitution rules. Defining either one also leads to a bit of a challenge in that we need to have a robust notion of equality of programs which can be challenging. Finally, it's important to note that the idea of a free variable is entirely syntactic—once you've gone to a value domain you can no longer have expressions with free variables—they must be bound else that's a syntax error.
But let's dive in and see if this becomes more clear.
Quinian Referential Transparency
We can define referential transparency very broadly as a property of a syntactic context. Per the original definition, this would be built from a sentence like
New York is an American city.
of which we've poked a hole
_ is an American city.
Such a holey-sentence, a "context", is said to be referentially transparent if, given two sentence fragments which both "refer" to the same thing, filling the context with either of those two does not change its meaning.
To be clear, two fragments with the same reference we can pick would be "New York" and "The Big Apple". Injecting those fragments we write
New York is an American city.
The Big Apple is an American city.
suggesting that
_ is an American city.
is referentially transparent. To demonstrate the quintessential counterexample, we might write
"The Big Apple" is an apple-themed epithet referring to New York.
and consider the context
"_" is an apple-themed epithet referring to New York.
and now when we inject the two referentially identical phrases we get one valid and one invalid sentence
"The Big Apple" is an apple-themed epithet referring to New York.
"New York" is an apple-themed epithet referring to New York.
In other words, quotations break referential transparency. We can see how this occurs by causing the sentence to refer to a syntactic construct instead of purely the meaning of that construct. This notion will return later.
Syntax v Semantics
There's something confusing going on in that this definition of referential transparency above applies directly to English sentences of which we build contexts by literally stripping words out. While we can do that in a programming language and consider whether such a context is referentially transparent, we also might recognize that this idea of "substitution" is critical to the very notion of a computer language.
So, let's be clear: there are two kinds of referential transparency we can consider over lambda calculus—the syntactic one and the semantic one. The syntactic one requires we define "contexts" as holes in the literal words written in a programming language. That lets us consider holes like
let x = 3 in _
and fill it in with things like "x". We'll leave the analysis of that replacement for later. At the semantic level we use lambda terms to denote contexts
\x -> x + 3 -- similar to the context "_ + 3"
and are restricted to filling in the hole not with syntax fragments but instead only valid semantic values, the action of that being performed by application
(\x -> x + 3) 5
==>
5 + 3
==>
8
So, when someone refers to referential transparency in Haskell it's important to figure out what kind of referential transparency they're referring to.
Which kind is being referred to in this question? Since it's about the notion of an expression containing a free variable, I'm going to suggest that it's syntactic. There are two major thrusts for my reasoning here. Firstly, in order to convert a syntax to a semantics requires that the syntax be valid. In the case of Haskell this means both syntactic validity and a successfully type check. However, we'll note that a program fragment like
x + 3
is actually a syntax error since x is simply unknown, unbound leaving us unable to consider the semantics of it as a Haskell program. Secondly, the very notion of a variable such as one that can be let-bound (and consider the difference between "variable" as it refers to a "slot" such as an IORef) is entirely a syntactic construct—there's no way to even talk about them from inside the semantics of a Haskell program.
So let's refine the question to be:
Can an expression containing free variables be (syntactically) referentially transparent?
and the answer is, uninterestingly, no. Referential transparency is a property of "contexts", not expressions. So let's explore the notion of free variables in contexts instead.
Free variable contexts
How can a context meaningfully have a free variable? It could be beside the hole
E1 ... x ... _ ... E2
and so long as we cannot insert something into that syntactic hole which "reaches over" and affects x syntactically then we're fine. So, for instance, if we fill that hole with something like
E1 ... x ... let x = 3 in E ... E2
then we haven't "captured" the x and thus can perhaps consider that syntactic hole to be referentially transparent. However, we're being nice to our syntax. Let's consider a more dangerous example
do x <- foo
let x = 3
_
return x
Now we see that the hole we've provided in some sense has dominion over the later phrase "return x". In fact, if we inject a fragment like "let x = 4" then it indeed changes the meaning of the whole. In that sense, the syntax here is no referentially transparent.
Another interesting interaction between referential transparency and free variables is the notion of an assigning context like
let x = 3 in _
where, from an outside perspective, both phrases "x" and "y" are reference the same thing, some named variable, but
let x = 3 in x ==/== let x = 3 in y
Progression from thorniness around equality and context
Now, hopefully the previous section explained a few ways for referential transparency to break under various kinds of syntactic contexts. It's worth asking harder questions about what kinds of contexts are valid and what kinds of expressions are equivalent. For instance, we might desugar our do notation in a previous example and end up noticing that we weren't working with a genuine context, but instead sort of a higher-order context
foo >>= \x -> (let x = 3 in ____(return x)_____)
Is this a valid notion of context? It depends a lot on what kind of meaning we're giving the program. The notion of desugaring the syntax already implies that the syntax must be well-defined enough to allow for such desugaring.
As a general rule, we must be very careful with defining both contexts and notions of equality. Further, the more meaning we demand the fragments of our language to take on the greater the ways they can be equal and the fewer the valid contexts we can build.
Ultimately, this leads us all the way to what I called "semantic referential transparency" earlier where we can only substitute proper values into a proper, closed lambda expression and we take the resulting equality to be "equality as programs".
What this ends up meaning is that as we impute more and more meaning on our language, as we begin to accept fewer and fewer things as valid, we get stronger and stronger guarantees about referential transparency.
Purity
And so this finally leads to the notion of a pure function. My understanding here is (even) less complete, but it's worth noting that purity, as a concept, does not much exist until we've moved to a very rich semantic space—that of Haskell semantics as a category over lifted Complete Partial Orders.
If that doesn't make much sense, then just imagine purity is a concept that only exists when talking about Haskell values as functions and equality of programs. In particular, we examine the collection of Haskell functions
trivial :: a -> ()
trivial x = x `seq` ()
where we have a trivial function for every choice of a. We'll notate the specific choice using an underscore
trivial_Int :: Int -> ()
trivial_Int x = x `seq` ()
Now we can define a (very strictly) pure function to be a function f :: a -> b such that
trivial_b . f = trivial_a
In other words, if we throw out the result of computing our function, the b, then we may as well have never computed it in the first place.
Again, there's no notion of purity without having Haskell values and no notion of Haskell values when your expressions contain free variables (since it's a syntax error).
So what's the answer?
Ultimately, the answer is that you can't talk about purity around free variables and you can break referential transparency in lots of ways whenever you are talking about syntax. At some point as you convert your syntactic representation to its semantic denotation you must forget the notion and names of free variables in order to have them represent the reduction semantics of lambda terms and by this point we've begun to have referential transparency.
Finally, purity is something even more stringent than referential transparency having to do with even the reduction characteristics of your (referentially transparent) lambda terms.
By the definition of purity given above, most of Haskell isn't pure itself as Haskell may represent non-termination. Many feel that this is a better definition of purity, however, as non-termination can be considered a side effect of computation instead of a meaningful resultant value.

The Wikipedia definition is incomplete, insofar a pure function may use constants to compute its answer.
When we look at
increment n = 1+n
this is obvious. Perhaps it was not mentioned because it is that obvious.
Now the trick in Haskell is that not only top level values and functions are constants, but inside a closure also the variables(!) closed over:
add x = (\y -> x+y)
Here x stands for the value we applied add to - we call it variable not because it could change within the right hand side of add, but because it can be different each time we apply add. And yet, from the point of view of the lambda, x is a constant.
It follows that free variables always name constant values at the point where they are used and hence do not impact purity.

Short answer is YES f is pure
In Haskell map is defined with foldr. Would you agree that map is functional? If so did it matter that it had global function foldr that wasn't supplied to map as an argument?
In map foldr is a free variable. It's not doubt about it. It makes no difference that it's a function or something that evaluates to a value. It's the same.
Free variables, like the functions foldl and +, are essential for functional languages to exist. Without it you wouldn't have abstraction and the languages would be worse off than the Fortran.

Can compilers deduce/prove mathematically?

I'm starting to learn functional programming language like Haskell, ML and most of the exercises will show off things like:
foldr (+) 0 [ 1 ..10]
which is equivalent to
sum = 0
for( i in [1..10] )
sum += i
So that leads me to think why can't compiler know that this is Arithmetic Progression and use O(1) formula to calculate?
Especially for pure FP languages without side effect?
The same applies for
sum reverse list == sum list
Given a + b = b + a
and definition of reverse, can compilers/languages prove it automatically?

Compilers generally don't try to prove this kind of thing automatically, because it's hard to implement.
As well as adding the logic to the compiler to transform one fragment of code into another, you have to be very careful that it only tries to do it when it's actually safe - i.e. there are often lots of "side conditions" to worry about. For example in your example above, someone might have written an instance of the type class Num (and hence the (+) operator) where the a + b is not b + a.
However, GHC does have rewrite rules which you can add to your own source code and could be used to cover some relatively simple cases like the ones you list above, particularly if you're not too bothered about the side conditions.
For example, and I haven't tested this, you might use the following rule for one of your examples above:
{-# RULES
"sum/reverse" forall list . sum (reverse list) = sum list
#-}
Note the parentheses around reverse list - what you've written in your question actually means (sum reverse) list and wouldn't typecheck.
EDIT:
As you're looking for official sources and pointers to research, I've listed a few.
Obviously it's hard to prove a negative but the fact that no-one has given an example of a general-purpose compiler that does this kind of thing routinely is probably quite strong evidence in itself.
As others have pointed out, even simple arithmetic optimisations are surprisingly dangerous, particularly on floating point numbers, and compilers generally have flags to turn them off - for example Visual C++, gcc. Even integer arithmetic isn't always clear-cut and people occasionally have big arguments about how to deal with things like overflow.
As Joachim noted, integer variables in loops are one place where slightly more sophisticated optimisations are applied because there are actually significant wins to be had. Muchnick's book is probably the best general source on the topic but it's not that cheap. The wikipedia page on strength reduction is probably as good an introduction as any to one of the standard optimisations of this kind, and has some references to the relevant literature.
FFTW is an example of a library that does all kinds of mathematical optimization internally. Some of its code is generated by a customised compiler the authors wrote specifically for the purpose. It's worthwhile because the authors have domain-specific knowledge of optimizations that in the specific context of the library are both worth the effort and safe
People sometimes use template metaprogramming to write "self-optimising libraries" that again might rely on arithmetic identities, see for example Blitz++. Todd Veldhuizen's PhD dissertation has a good overview.
If you descend into the realms of toy and academic compilers all sorts of things go. For example my own PhD dissertation is about writing inefficient functional programs along with little scripts that explain how to optimise them. Many of the examples (see Chapter 6) rely on applying arithmetic rules to justify the underlying optimisations.
Also, it's worth emphasising that the last few examples are of specialised optimisations being applied only to certain parts of the code (e.g. calls to specific libraries) where it is expected to be worthwhile. As other answers have pointed out, it's simply too expensive for a compiler to go searching for all possible places in an entire program where an optimisation might apply. The GHC rewrite rules that I mentioned above are a great example of a compiler exposing a generic mechanism for individual libraries to use in a way that's most appropriate for them.

The answer
No, compilers don’t do that kind of stuff.
One reason why
And for your examples, it would even be wrong: Since you did not give type annotations, the Haskell compiler will infer the most general type, which would be
foldr (+) 0 [ 1 ..10] :: Num a => a
and similar
(\list -> sum (reverse list)) :: Num a => [a] -> a
and the Num instance for the type that is being used might well not fulfil the mathematical laws required for the transformation you suggest. The compiler should, before everything else, avoid to change the meaning (i.e. the semantics) of your program.
More pragmatically: The cases where the compiler could detect such large-scale transformations rarely occur in practice, so it would not be worth it to implement them.
An exception
Note notable exceptions are linear transformations in loops. Most compilers will rewrite
for (int i = 0; i < n; i++) {
... 200 + 4 * i ...
}
to
for (int i = 0, j = 200; i < n; i++, j += 4) {
... j ...
}
or something similar, as that pattern does often occur in code working on array.

The optimizations you have in mind will probably not be done even in the presence of monomorphic types, because there are so many possibilities and so much knowledge required. For example, in this example:
sum list == sum (reverse list)
The compiler would need to know or take into account the following facts:
sum = foldl (+) 0
(+) is commutative
reverse list is a permutation of list
foldl x c l, where x is commutative and c is a constant, yields the same result for all permutations of l.
This all seems trivial. Sure, the compiler can most probably look up the definition of sumand inline it. It could be required that (+) be commutative, but remember that +is just another symbol without attached meaning to the compiler. The third point would require the compiler to prove some non trivial properties about reverse.
But the point is:
You don't want to perform the compiler to do those calculations with each and every expression. Remember, to make this really useful, you'd have to heap up a lot of knowledge about many, many standard functions and operators.
You still can't replace the expression above with True unless you can rule out the possibility that list or some list element is bottom. Usually, one cannot do this. You can't even do the following "trivial" optimization of f x == f x in all cases
f x `seq` True
For, consider
f x = (undefined :: Bool, x)
then
f x `seq` True ==> True
f x == f x ==> undefined
That being said, regarding your first example slightly modified for monomorphism:
f n = n * foldl (+) 0 [1..10] :: Int
it is imaginable to optimize the program by moving the expression out of its context and replace it with the name of a constant, like so:
const1 = foldl (+) 0 [1..10] :: Int
f n = n * const1
This is because the compiler can see that the expression must be constant.

What you're describing looks like super-compilation. In your case, if the expression had a monomorphic type like Int (as opposed to polymorphic Num a => a), the compiler could infer that the expression foldr (+) 0 [1 ..10] has no external dependencies, therefore it could be evaluated at compile time and replaced by 55. However, AFAIK no mainstream compiler currently does this kind of optimization.
(In functional programming "proving" is usually associated with something different. In languages with dependent types types are powerful enough to express complex proposition and then through the Curry-Howard correspondence programs become proofs of such propositions.)

As others have noted, it's unclear that your simplifications even hold in Haskell. For instance, I can define
newtype NInt = N Int
instance Num NInt where
N a + _ = N a
N b * _ = N b
... -- etc
and now sum . reverse :: Num [a] -> a does not equal sum :: Num [a] -> a since I can specialize each to [NInt] -> NInt where sum . reverse == sum clearly does not hold.
This is one general tension that exists around optimizing "complex" operations—you actually need quite a lot of information in order to successfully prove that it's okay to optimize something. This is why the syntax-level compiler optimization which do exist are usually monomorphic and related to the structure of programs---it's usually such a simplified domain that there's "no way" for the optimization to go wrong. Even that is often unsafe because the domain is never quite so simplified and well-known to the compiler.
As an example, a very popular "high-level" syntactic optimization is stream fusion. In this case the compiler is given enough information to know that stream fusion can occur and is basically safe, but even in this canonical example we have to skirt around notions of non-termination.
So what does it take to have \x -> sum [0..x] get replaced by \x -> x*(x + 1)/2? The compiler would need a theory of numbers and algebra built-in. This is not possible in Haskell or ML, but becomes possible in dependently typed languages like Coq, Agda, or Idris. There you could specify things like
revCommute :: (_+_ :: a -> a -> a)
-> Commutative _+_
-> foldr _+_ z (reverse as) == foldr _+_ z as
and then, theoretically, tell the compiler to rewrite according to revCommute. This would still be difficult and finicky, but at least we'd have enough information around. To be clear, I'm writing something very strange above, a dependent type. The type not only depends on the ability to introduce both a type and a name for the argument inline, but also the existence of the entire syntax of your language "at the type level".
There are a lot of differences between what I just wrote and what you'd do in Haskell, though. First, in order to form a basis where such promises can be taken seriously, we must throw away general recursion (and thus we already don't have to worry about questions of non-termination like stream-fusion does). We also must have enough structure around to create something like the promise Commutative _+_---this likely depends upon there being an entire theory of operators and mathematics built into the language's standard library else you would need to create that yourself. Finally, the richness of type system required to even express these kinds of theories adds a lot of complexity to the entire system and tosses out type inference as you know it today.
But, given all that structure, I'd never be able to create an obligation Commutative _+_ for the _+_ defined to work on NInts and so we could be certain that foldr (+) 0 . reverse == foldr (+) 0 actually does hold.
But now we'd need to tell the compiler how to actually perform that optimization. For stream-fusion, the compiler rules only kick in when we write something in exactly the right syntactic form to be "clearly" an optimization redex. The same kinds of restrictions would apply to our sum . reverse rule. In fact, already we're sunk because
foldr (+) 0 . reverse
foldr (+) 0 (reverse as)
don't match. They're "obviously" the same due to some rules we could prove about (.), but that means that now the compiler must invoke two built-in rules in order to perform our optimization.
At the end of the day, you need a very smart optimization search over the sets of known laws in order to achieve the kinds of automatic optimizations you're talking about.
So not only do we add a lot of complexity to the entire system, require a lot of base work to build-in some useful algebraic theories, and lose Turing completeness (which might not be the worst thing), we also only get a finicky promise that our rule would even fire unless we perform an exponentially painful search during compilation.
Blech.
The compromise that exists today tends to be that sometimes we have enough control over what's being written to be mostly certain that a certain obvious optimization can be performed. This is the regime of stream fusion and it requires a lot of hidden types, carefully written proofs, exploitations of parametricity, and hand-waving before it's something the community trusts enough to run on their code.
And it doesn't even always fire. For an example of battling that problem take a look at the source of Vector for all of the RULES pragmas that specify all of the common circumstances where Vector's stream-fusion optimizations should kick in.
All of this is not at all a critique of compiler optimizations or dependent type theories. Both are really incredible. Instead it's just an amplification of the tradeoffs involved in introducing such an optimization. It's not to be done lightly.

Fun fact: Given two arbitrary formulas, do they both give the same output for the same inputs? The answer to this trivial question is not computable! In other words, it is mathematically impossible to write a computer program that always gives the correct answer in finite time.
Given this fact, it's perhaps not surprising that nobody has a compiler that can magically transform every possible computation into its most efficient form.
Also, isn't this the programmer's job? If you want the sum of an arithmetic sequence commonly enough that it's a performance bottleneck, why not just write some more efficient code yourself? Similarly, if you really want Fibonacci numbers (why?), use the O(1) algorithm.

Does functional programming mandate new naming conventions?

I recently started studying functional programming using Haskell and came upon this article on the official Haskell wiki: How to read Haskell.
The article claims that short variable names such as x, xs, and f are fitting for Haskell code, because of conciseness and abstraction. In essence, it claims that functional programming is such a distinct paradigm that the naming conventions from other paradigms don't apply.
What are your thoughts on this?

In a functional programming paradigm, people usually construct abstractions not only top-down, but also bottom-up. That means you basically enhance the host language. In this kind of situations I see terse naming as appropriate. The Haskell language is already terse and expressive, so you should be kind of used to it.
However, when trying to model a certain domain, I don't believe succinct names are good, even when the function bodies are small. Domain knowledge should reflect in naming.
Just my opinion.
In response to your comment
I'll take two code snippets from Real World Haskell, both from chapter 3.
In the section named "A more controlled approach", the authors present a function that returns the second element of a list. Their final version is this:
tidySecond :: [a] -> Maybe a
tidySecond (_:x:_) = Just x
tidySecond _ = Nothing
The function is generic enough, due to the type parameter a and the fact we're acting on a built in type, so that we don't really care what the second element actually is. I believe x is enough in this case. Just like in a little mathematical equation.
On the other hand, in the section named "Introducing local variables", they're writing an example function that tries to model a small piece of the banking domain:
lend amount balance = let reserve = 100
newBalance = balance - amount
in if balance < reserve
then Nothing
else Just newBalance
Using short variable name here is certainly not recommended. We actually do care what those amounts represent.

I think if the semantics of the arguments are clear within the context of the code then you can get away with short variable names. I often use these in C# lambdas for the same reason. However if it is ambiguous, you should be more explicit with naming.
map :: (a->b) -> [a] -> [b]
map f [] = []
map f (x:xs) = f x : map f xs
To someone who hasn't had any exposure to Haskell, that might seem like ugly, unmaintainable code. But most Haskell programmers will understand this right away. So it gets the job done.
var list = new int[] { 1, 2, 3, 4, 5 };
int countEven = list.Count(n => n % 2 == 0)
In that case, short variable name seems appropriate.
list.Aggregate(0, (total, value) => total += value);
But in this case it seems more appropriate to name the variables, because it isn't immediately apparent what the Aggregate is doing.
Basically, I believe not to worry too much about convention unless it's absolutely necessary to keep people from screwing up. If you have any choice in the matter, use what makes sense in the context (language, team, block of code) you are working, and will be understandable by someone else reading it hours, weeks or years later. Anything else is just time-wasting OCD.

I think scoping is the #1 reason for this. In imperative languages, dynamic variables, especially global ones need to be named properly, as they're used in several functions. With lexical scoping, it's clear what the symbol is bound to at compile time.
Immutability also contributes to this to some extent- in traditional languages like C/ C++/ Java, a variable can represent different data at different points in time. Therefore, it needs to be given a name to give the programmer an idea of its functionality.
Personally, I feel that features features like first-class functions make symbol names pretty redundant. In traditional languages, it's easier to relate to a symbol; based on its usage, we can tell if it's data or a function.

I'm studying Haskell now, but I don't feel that its naming conventions is so very different. Of course, in Java you're hardly to find a names like xs. But it is easy to find names like x in some mathematical functions, i, j for counters etc. I consider such names to be perfectly appropriate in right context. xs in Haskell is appropriate only generic functions over lists. There's a lot of them in Haskell, so this name is wide-spread. Java doesn't provide easy way to handle such a generic abstractions, that's why names for lists (and lists themselves) are usually much more specific, e.g. lists or users.

I just attended a number of talks on Haskell with lots of code samples. As longs as the code dealt with x, i and f the naming didn't bother me. However, as soon as we got into heavy duty list manipulation and the like I found the three letters or so names to be a lot less readable than I prefer.
To be fair a significant part of the naming followed a set of conventions, so I assume that once you get into the lingo it will be a little easier.
Fortunately, nothing prevents us from using meaningful names, but I don't agree that the language itself somehow makes three letter identifiers meaningful to the majority of people.

When in Rome, do as the Romans do
(Or as they say in my town: "Donde fueres, haz lo que vieres")

Anything that aids readability is a good thing - meaningful names are therefore a good thing in any language.
I use short variable names in many languages but they're reserved for things that aren't important in the overall meaning of the code or where the meaning is clear in the context.
I'd be careful how far I took the advice about Haskell names

My Haskell practice is only of mediocre level, thus, I dare to try to reply only the second, more general part of Your question:
"In essence, it claims that functional programming is such a distinct paradigm that the naming conventions from other paradigms don't apply."
I suspect, the answer is "yes", but my motivation behind this opinion is restricted only on experience in just one single functional language. Still, it may be interesting, because this is an extremely minimalistic one, thus, theoretically very "pure", and underlying a lot of practical functional languages.
I was curios how easy it is to write practical programs on such an "extremely" minimalistic functional programming language like combinatory logic.
Of course, functional programming languages lack mutable variables, but combinatory logic "goes further one step more" and it lacks even formal parameters. It lacks any syntactic sugar, it lacks any predefined datatypes, even booleans or numbers. Everything must be mimicked by combinators, and traced back to the applications of just two basic combinators.
Despite of such extreme minimalism, there are still practical methods for "programming" combinatory logic in a neat and pleasant way. I have written a quine in it in a modular and reusable way, and it would not be nasty even to bootstrap a self-interpreter on it.
For summary, I felt the following features in using this extremely minimalistic functional programming language:
There is a need to invent a lot of auxiliary functions. In Haskell, there is a lot of syntactic sugar (pattern matching, formal parameters). You can write quite complicated functions in few lines. But in combinatory logic, a task that could be expressed in Haskell by a single function, must be replaced with well-chosen auxiliary functions. The burden of replacing Haskell syntactic sugar is taken by cleverly chosen auxiliary functions in combinatory logic. As for replying Your original question: it is worth of inventing meaningful and catchy names for these legions of auxiliary functions, because they can be quite powerful and reusable in many further contexts, sometimes in an unexpected way.
Moreover, a programmer of combinatory logic is not only forced to find catchy names of a bunch of cleverly chosen auxiliary functions, but even more, he is forced to (re)invent whole new theories. For example, for mimicking lists, the programmer is forced to mimick them with their fold functions, basically, he has to (re)invent catamorphisms, deep algebraic and category theory concepts.
I conjecture, several differences can be traced back to the fact that functional languages have a powerful "glue".

In Haskell, meaning is conveyed less with variable names than with types. Being purely functional has the advantage of being able to ask for the type of any expression, regardless of context.

I agree with a lot of the points made here about argument naming but a quick 'find on page' shows that no one has mentioned Tacit programming (aka pointfree / pointless). Whether this is easier to read may be debatable so it's up to you & your team, but definitely worth a thorough consideration.
No named arguments = No argument naming conventions.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string