How is (==) defined in Haskell? - haskell

I'm writing a small functional programming language in Haskell, but I can't find a definition of how (==) is implemented; it seems to be quite tricky.

Haskell uses the concept of a "typeclass". The actual definition is something like this:
class Eq a where
  (==) :: a -> a -> Bool
  (/=) :: a -> a -> Bool
  -- Each has a default definition in terms of the other, so an instance
  -- only needs to provide one of them.
Then you can define it for your own types. For example:
-- Eq can't be automatically derived, because of the function
data Foo = Foo Int (Char -> Bool)

-- So define it here
instance Eq Foo where
  (Foo x _) == (Foo y _) = x == y
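A quick usage sketch (assuming Data.Char's isUpper and isLower as example values for the function field): since the instance above ignores the function component, two Foo values are equal whenever their Int fields are. For ordinary data types without function fields you can instead let the compiler derive the instance:

-- ghci> import Data.Char (isUpper, isLower)
-- ghci> Foo 1 isUpper == Foo 1 isLower
-- True
-- ghci> Foo 1 isUpper == Foo 2 isUpper
-- False

data Colour = Red | Green | Blue deriving (Eq, Show)
-- ghci> Red == Blue
-- False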

I think your question is very interesting. If you also want to know the theoretical roots behind it, then I think we can abstract away from Haskell and investigate the question in more general terms of the theory of algorithms. As for Haskell, I think the following two facts matter:
Functions are first-class citizens in Haskell
Haskell is Turing-complete
although I have not yet worked out exactly how the expressive strength of the language matters here.
A possibility for specific cases, but a no-go theorem for the comprehensive case
I think two theorems of computer science provide the answer at the root. If we want to abstract away from technical details, we can investigate your question in lambda calculus (or in combinatory logic): can equality be defined there? So let us first restrict ourselves to the field of lambda calculus and combinatory logic.
It must be noted that both of these models of computation are very minimalistic. There are no "predefined" datatypes in them, not even numbers, booleans, or lists. But you can mimic all of them in clever ways:
Instead of booleans, you can use projection (selector) functions (Church booleans).
Instead of C unions (or C++ class inheritance), you can use continuations. More precisely, it is case analysis that you can implement in a concise and straightforward way.
You can mimic natural numbers with functions that iterate function composition (Church numerals).
You can implement lists and trees with sophisticated algebraic methods (catamorphisms).
Thus, you can mimic all meaningful datatypes even in such minimalistic "functional languages" as lambda calculus and combinatory logic, by using lambda functions (or combinators) in a clever arrangement that mimics the datatype you want.
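For concreteness, here is a minimal Haskell sketch of two of these encodings (Church booleans as two-argument selectors, Church numerals as function iterators); it is only an illustration of the idea, not a definition from any library:

-- Church booleans: select the first or the second argument.
true, false :: a -> a -> a
true  t _ = t
false _ f = f

-- Church numerals: n f x applies f to x exactly n times.
zero, one, two :: (a -> a) -> a -> a
zero _ x = x
one  f x = f x
two  f x = f (f x)

-- "If-then-else" is just application of a Church boolean:
-- true  "yes" "no"  ==>  "yes"
-- false "yes" "no"  ==>  "no"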
Now let us try to answer your question in these minimalistic functional languages first, to see whether the answer is Haskell-specific or rather the mere consequence of some more general theorems.
Böhm's theorem provides the positive part: for any two different, previously given expressions (which halt and do not freeze the computer), a suitable testing function can always be written that decides correctly whether the two given expressions are semantically the same (Csörnyei 2007: 132, = Th 7.2.2). In most practical cases (lists, trees, booleans, numbers), Böhm's theorem guarantees that a suitable specific equality function can always be written. See an example for lists in Tromp 1999: Sec 2.
The Scott-Curry undecidability theorem excludes the possibility that an all-general equality function could ever be written, one meaningful for every possible setting (Csörnyei 2007: 140, = Th 7.4.1).
A go-theorem
After You have "implemented" a datatype, you can write its corresponding equality function for that. For most practical cases (lists, numbers, case analysis selections), there is no mystical "datatype" for which a corresponding equality function would lack. This positive answer is provided by Böhm's theorem.
You can write a Church-numeral-equality function that takes two Church numerals, and answers whether they equal. You can write another lambda-function/combinator that takes two (Church)-booleans and answers whether they equal or not. Moreover, You can implement lists in pure lambda calculus/CL (a proposed way is to use the notion of catamorphisms), and then, You can define a function that answers equality for lists of booleans. You can write another function that answers equality for lists of Church numerals. You can implement trees too, and thereafter, You can write a function that answers equality for trees (on booleans, and another, on Church numerals).
You can automatize some of this job, but not all. You can derive automatically some (but not all) equality functions automatically. If You have already the specific map functions for trees and lists, and equality functions for booleans and numbers, then You can derive equality functions also for boolean trees, boolean lists, number lists, number trees automatically.
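For instance, once an equality function for the element type is given, equality for lists follows mechanically; in ordinary Haskell notation (a sketch of the general pattern, independent of any particular encoding):

-- Equality for lists, parameterised by equality for the elements.
eqList :: (a -> a -> Bool) -> [a] -> [a] -> Bool
eqList _  []     []     = True
eqList eq (x:xs) (y:ys) = eq x y && eqList eq xs ys
eqList _  _      _      = False

-- eqList (==) then works for lists of booleans, lists of numbers, and so on.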
A no-go theorem
But there is no way to define a single, fully automatic equality function working for all possible "datatypes". If you "implement" a concrete, given datatype in lambda calculus, you usually have to design its specific equality function for that setting.
Moreover, there is no way to define a lambda function that would take two lambda terms and answer whether the two terms behave the same way when reduced. Even more, there is no way to define a lambda function that would take the representations (quotations) of two lambda terms and answer whether the two original terms behave the same way when reduced (Csörnyei 2007: 141, Conseq 7.4.3). That no-go answer is provided by the Scott-Curry undecidability theorem (Csörnyei 2007: 140, Th 7.4.1).
In other algorithm approaches
I think the two answers above are not restricted to lambda calculus and combinatory logic; a similar possibility and restriction applies to other models of computation. For example, there is no recursive function that would take the Gödel numbers of two unary functions and decide whether the encoded functions behave the same extensionally (Monk 1976: 84, = Cor 5.18). This is a consequence of Rice's theorem (Monk 1976: 84, = Th 5.17). I feel Rice's theorem sounds formally very similar to the Scott-Curry undecidability theorem, but I have not yet thought this through.
Comprehensive equality in a very restricted sense
If I wanted to write a combinatory logic interpreter that provides comprehensive equality testing (restricted to halting terms that have normal forms), then I would implement it like this:
I'd reduce both combinatory logic terms under consideration to their normal forms,
and see whether they are identical as terms.
If so, then their unreduced original forms must have been equivalent semantically too.
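As a rough illustration of this procedure, here is a toy Haskell sketch of such an interpreter for plain S/K/I terms (my own simplification, not the actual implementation behind the quine mentioned below). normalForm loops forever on terms that have no normal form, which is exactly the restriction discussed next:

data CL = S | K | I | App CL CL
  deriving (Eq, Show)

-- One leftmost-outermost reduction step, if any redex exists.
step :: CL -> Maybe CL
step (App I x)                 = Just x
step (App (App K x) _)         = Just x
step (App (App (App S f) g) x) = Just (App (App f x) (App g x))
step (App f x) =
  case step f of
    Just f' -> Just (App f' x)
    Nothing -> App f <$> step x
step _ = Nothing

-- Reduce to normal form (diverges on terms, such as Ω, that have none).
normalForm :: CL -> CL
normalForm t = maybe t normalForm (step t)

-- The restricted equality: reduce both terms, compare the results syntactically.
eqCL :: CL -> CL -> Bool
eqCL a b = normalForm a == normalForm b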
But this works only with serious restrictions, although the method works well for several practical goals. We can perform operations on numbers, lists, trees etc., and check whether we get the expected result. My quine (written in pure combinatory logic) uses this restricted concept of equality, and it suffices, despite the fact that the quine requires very sophisticated constructs (term trees implemented in combinatory logic itself).
I do not yet know what the limits of this restricted equality concept are, but I suspect it is very restricted compared to the correct definition of equality. The motivation behind its use is that it is computable at all, unlike the unrestricted concept of equality.
The restrictions can also be seen from the fact that this restricted equality concept works only for combinators that have normal forms. For a counterexample: the restricted equality concept cannot check whether I Ω = Ω, although we know well that the two terms can be converted into each other.
I must still consider how the existence of this restricted concept of equality relates to the negative results claimed by the Scott-Curry undecidability theorem and Rice's theorem. Both theorems deal with partial functions, but I do not yet know exactly how this matters.
Extensionality
But there are further limitations of the restricted equality concept: it cannot deal with the concept of extensionality. For example, it does not notice that S K is related to K I in any way, despite the fact that S K behaves the same as K I when applied to at least two arguments.
The latter example must be explained in more detail. We know that S K and K I are not identical as terms: S K ≢ K I. But if we apply both, respectively, to any two arguments X and Y, we see a relatedness:
S K X Y ⊳ K Y (X Y) ⊳ Y
K I X Y ⊳ I Y ⊳ Y
and of course Y ≡ Y, for any Y.
Of course, we cannot "try" such relatedness for each possible X and Y argument instances, because there can be can be infinitely many such CL-term instances to be substituted into these metavariables. But we do not have to stuck in this problem of infinity. If we augment our object language (combinatory logic) with (free) variables:
K is a term
S is a term
Any (free) variable is a term (this new clause is the modification!)
If both X and Y are terms, then also (X Y) is a term
Terms cannot be obtained in any other way
and we define the reduction rules accordingly, in a proper way, then we can state an extensional definition of equality in a "finite" way, without relying on metavariables with infinitely many possible instances.
Thus, if free variables are allowed in combinatory logic terms (the object language is augmented with its own object variables), then extensionality can be implemented to some degree. I have not yet worked this out in detail. As for the example above, we can use the notation
S K =2 K I
(Curry & Feys & Craig 1958: 162, = 5C 5), based on the fact that S K x y and K I x y can be proven to be equal (already without resorting to extensionality). Here, x and y are not metavariables for infinitely many possible instances of CL-terms in equation schemes, but are first-class citizens of the object language itself. Thus, this equation is no more an equation scheme, but one single equation.
As for the theoretical study, we can mean = by the "union" of =n instances for all n.
Alternatively, equality can be defined so that its inductive definition also takes extensionality into consideration. We add one further rule of inference dealing with extensionality (Csörnyei 2007: 158):
If E and F are combinators, x is an (object) variable, and x occurs in neither E nor F, then from E x = F x we can infer E = F.
The constraint about non-containment is important, as the following counterexample shows: K x ≠ I, despite K x x = I x holding. The "roles" of the two (incidentally identical) occurrences of the variable differ entirely. Excluding such coincidences is the motivation for the constraint.
The use of this new rule of inference can be exemplified by showing how the theorem S K = K I can be proven:
S K = K I is regarded to hold because S K x = K I x has already been proven to hold, see proof below:
S K x = K I x is regarded to hold because S K x y = K I x y has already been proven to hold, see below:
S K x y = K I x y can be proven without resorting to extensionality; we need only the familiar conversion rules.
What are these remaining rules of inference? Here they are (Csörnyei 2007: 157):
Conversion axiom schemes:
"K E F = E" is deducible (K-axiom scheme)
"S F G H = F H (G H)" is deducible (S-axiom scheme)
Equality axiom schemes and rules of inference
"E = E" is deducible (Reflexivity axiom scheme)
If "E = F" is deducible, then "F = E" is also deducible (Symmetry rule of inference)
If "E = F" is deducible, and "F = G" is deducible too, then "E = G" is also deducible (Transitivity rule)
If "E = F" is deducible, then "E G = F G" is also deducible (Leibniz rule I)
If "E = F" is deducible, then "G E = G F" is also deducible (Leibniz rule II)
References
Csörnyei, Zoltán (2007): Lambda-kalkulus. A funkcionális programozás alapjai. Budapest: Typotex. ISBN-978-963-9664-46-3.
Curry, Haskell B. & Feys, Robert & Craig, William (1958). Combinatory Logic. Vol. I. Amsterdam: North-Holland Publishing Company.
Madore, David (2003). The Unlambda Programming Language. Unlambda: Your Functional Programming Language Nightmares Come True.
Monk, J. Donald (1976). Mathematical Logic. Graduate Texts in Mathematics. New York • Heidelberg • Berlin: Springer-Verlag.
Tromp, John (1999). Binary Lambda Calculus and Combinatory Logic. Downloadable in PDF and PostScript from the author's Lambda Calculus and Combinatory Logic Playground.
Appendix
Böhm's theorem
I have not yet clearly explained how Böhm's theorem is related to the fact that, in most practical cases, a suitable equality-testing function can surely be written for a meaningful datatype (even in such minimalistic functional languages as pure lambda calculus or combinatory logic).
Statement
Let E and F be two different, closed terms of lambda calculus,
and let both of them have normal forms.
Then, the theorem claims, there is a suitable way to distinguish them by applying them to a suitable series of arguments. In other words: there exists a natural number n and a series of closed lambda terms G1, G2, G3, ..., Gn such that applying E and F to this series of arguments reduces them to false and true, respectively:
E G1 G2 G3... Gn ⊳ false
F G1 G2 G3... Gn ⊳ true
where true and false are the two well-known, tame, easily manageable and distinguishable lambda terms:
true ≡ λ x y . x
false ≡ λ x y . y
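As a concrete instance, reusing one and two from the earlier Church-encoding sketch and specialising the discriminating result to Haskell's Bool for readability (so Haskell's False/True stand in for the Church terms false/true):

-- For E = one and F = two, the argument series G1 = not, G2 = True suffices:
distinguish :: ((Bool -> Bool) -> Bool -> Bool) -> Bool
distinguish n = n not True

-- distinguish one  ==>  False   (the role of E G1 G2 ⊳ false)
-- distinguish two  ==>  True    (the role of F G1 G2 ⊳ true)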
Application
How can this theorem be exploited for implementing practical datatypes in pure lambda calculus? An implicit application of this theorem is exemplified by the way linked lists can be defined in combinatory logic (Tromp 1999: Sec 2).

(==) is part of the type class Eq. A separate implementation is provided by each type that is an instance of Eq. So to find the implementation you should usually look at where your type is defined.

Smells like homework to me. Elaborate on why you find it tricky.
You might look at how ML and various Lisps attempt to solve the problem.
You might also look in the source code of other languages' interpreters/compilers; some are written with study in mind.

Related

Under what notion of equality are typeclass laws written?

Haskell typeclasses often come with laws; for instance, instances of Monoid are expected to observe that x <> mempty = mempty <> x = x.
Typeclass laws are often written with single-equals (=) rather than double-equals (==). This suggests that the notion of equality used in typeclass laws is something other than that of Eq (which makes sense, since Eq is not a superclass of Monoid).
Searching around, I was unable to find any authoritative statement on the meaning of = in typeclass laws. For instance:
The Haskell 2010 report does not even contain the word "law" in it
Speaking with other Haskell users, most people seem to believe that = usually means extensional equality or substitution but is fundamentally context-dependent. Nobody provided any authoritative source for this claim.
The Haskell wiki article on monad laws states that = is extensional, but, again, fails to provide a source, and I wasn't able to track down any way to contact the author of the relevant edit.
The question, then: Is there any authoritative source on or standard for the semantics for = in typeclass laws? If so, what is it? Additionally, are there examples where the intended meaning of = is particularly exotic?
(As a side note, treating = extensionally can get tricky. For instance, there is a Monoid (IO a) instance, but it's not really clear what extensional equality of IO values looks like.)
I suspect most folks use = to mean "moral equality" as from Fast and Loose Reasoning is Morally Correct, which you can think of as extensional equality up to defined-ness.
But there's no hard-and-fast rule here. There are a lot of libraries, and a lot of authors, and if you take any two authors they probably disagree on some minor detail about =.
Typeclass laws are not part of the Haskell language, so they are not subject to the same kind of language-theoretic semantic analysis as the language itself.
Instead, these laws are typically presented as an informal mathematical notation. Most presentations do not need a more detailed mathematical exposition, so they do not provide one.
I agree with comingstorm that the equality in those laws is that of a mathematical language. But I would also say that it corresponds to the operator ==.
Why? Because == is supposed to implement mathematical equality.
For example, look at fractions (rational numbers). They can be implemented as pairs of integers with some rules. The pair (a, b) represents the fraction a/b. The pairs (a, b) and (c, d) represent the same rational number if a*d == b*c. The two pairs are then said to be equivalent, and we talk about an equivalence relation. In mathematics we let a rational number be an equivalence class of pairs under this equivalence. In programming we instead define the operator == to tell if two pairs are equivalent, i.e. if they represent the same fraction.
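A minimal Haskell rendering of that idea (my own sketch; a real library would also normalise the representation or hide the constructor, and denominators are assumed nonzero):

-- A fraction as a pair of integers: Frac numerator denominator.
data Frac = Frac Integer Integer

-- Two fractions are "equal" exactly when they represent the same rational number.
instance Eq Frac where
  Frac a b == Frac c d = a * d == b * c

-- Frac 1 2 == Frac 2 4  ==>  True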

How to use category theory diagrams with polyary functions?

So, there's a lot of buzz about categories all around the Haskell ecosystem. But I feel one piece is missing from the common sense I have so far absorbed by osmosis. (I did read the first few pages of Mac Lane's famous introduction as well, but I don't believe I have enough mathematical maturity to carry the wisdom from this text to actual programming I have at hand.) I will now follow with a real world example involving a binary function that I have trouble depicting in categorical terms.
So, I have this function chain that allows me to go S -> A, where A is a type synonym for a function, akin to a -> b. Now, I want to depict a process that does S -> a -> b, but I end up with an arrow pointing to another arrow rather than to an object. How do I deal with such a predicament?
I did overhear someone talking about a thing called n-category but I don't know if I should even try to understand what it is and how it's useful.
Though I believe my abstraction is accurate, the actual functions are parsePath >>> either error id >>> toAxis :: String -> Text.XML.Cursor.Axis from selectors and Axis = Text.XML.Cursor.Cursor -> [Text.XML.Cursor.Cursor] from xml-conduit.
There are two approaches to modelling binary functions as morphisms in category theory (n-ary functions are dealt with similarly -- no new machinery is needed). One is to consider the uncurried version:
(A * B) -> C
where we take the product of the types A and B as the starting object. For that we need the category to contain such products. (In Haskell, products are written (A, B). Well, technically in Haskell this is not exactly the product as in categories, but let's ignore that.)
Another is to consider the result type (B -> C) as an object in the category. Usually, this is called an exponential object, written as C^B. Assuming our category has such objects, we can write
A -> C^B
These two representations of binary functions are isomorphic: using curry and uncurry we can transform each one into the other.
Indeed, when there is such a (natural) isomorphism, we get a so-called cartesian closed category, which is the simplest form of category that can describe a simply typed lambda calculus -- the core of every typed functional language.
This isomorphism is often cited as an adjunction between two functors
(- * B) -| (- ^ B)
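In Haskell these two presentations and the isomorphism between them can be written out directly; below is a small sketch using abstract stand-ins for the asker's types (s for String, c for Cursor, so that c -> [c] plays the role of Axis):

-- From the curried form A -> C^B to the uncurried form (A × B) -> C ...
toUncurried :: (s -> c -> [c]) -> ((s, c) -> [c])
toUncurried = uncurry

-- ... and back again.
toCurried :: ((s, c) -> [c]) -> (s -> c -> [c])
toCurried = curry

-- toCurried . toUncurried and toUncurried . toCurried are both identities,
-- which is the isomorphism witnessing the adjunction above.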
I can use tuple projections to depict this situation, as follows (the diagram, and its version in actual Haskell terms, are not reproduced here):
This diagram features backwards fst & snd arrows in place of a binary function that constructs the tuple from its constituents, and that I can in no way depict directly. The caveat is that, while in this diagram Cursor has only one incoming arrow, I should remember that in actual code some real arrows X -> Axis & Y -> Cursor should go to both of the projections of the tuple, not just the symbolic projecting functions. The flow will then be uniformly left to right.
Pragmatically speaking, I traded an arrow with two sources (that constructs a tuple and isn't a morphism) for two reversed arrows (the tuple's projections that are legal morphisms in all regards).

Type constraints on dimensionality of vectors in F# and Haskell (Dependent Types)

I'm new to F# and Haskell and am implementing a project in order to determine which language I would prefer to devote more time to.
I have numerous situations where I expect a given numerical type to have given dimensions based on parameters given to a top-level function (i.e., at runtime). For example, in this F# snippet, I have
type DataStreamItem = LinearAlgebra.Vector<float32>

type Ball =
    { R : float32;
      X : DataStreamItem }
and I expect all instances of type DataStreamItem to have D dimensions.
My question is in the interests of algorithm development and debugging, since such shape-mismatch bugs can be a headache to pin down but should be a non-issue when the algorithm is up and running:
Is there a way, in either F# or Haskell, to constrain DataStreamItem and / or Ball to have dimensions of D? Or do I need to resort to pattern matching on every calculation?
If the latter is the case, are there any good, light-weight paradigms to catch such constraint violations as soon as they occur (and that can be removed when performance is critical)?
Edit:
To clarify the sense in which D is constrained:
D is defined such that if you expressed the algorithm of the function main(DataStream) as a computation graph, all of the intermediate calculations would depend on the dimension of D for the execution of main(DataStream). The simplest example I can think of would be a dot-product of M with DataStreamItem: the dimension of DataStream would determine the creation of dimension parameters of M
Another Edit:
A week later, I found the following blog post outlining precisely what I was looking for in dependent types in Haskell:
https://blog.jle.im/entry/practical-dependent-types-in-haskell-1.html
And Another:
This Reddit thread contains some discussion on dependent types in Haskell and links to the quite interesting dissertation proposal of R. Eisenberg.
Neither Haskell's nor F#'s type system is rich enough to (directly) express statements of the sort "N nested instances of a recursive type T, where N is between 2 and 6" or "a string of characters exactly 6 long". Not in those exact terms, at least.
I mean, sure, you can always express such a 6-long string type as type String6 = String6 of char*char*char*char*char*char or some variant of the sort (which technically should be enough for your particular example with vectors, unless you're not telling us the whole example), but you can't say something like type String6 = s:string{s.Length=6} and, more importantly, you can't define functions of the form concat: String<n> -> String<m> -> String<n+m>, where n and m represent string lengths.
But you're not the first person asking this question. This research direction does exist, and is called "dependent types", and I can express the gist of it most generally as "having higher-order, more powerful operations on types" (as opposed to just union and intersection, as we have in ML languages) - notice how in the example above I parametrize the type String with a number, not another type, and then do arithmetic on that number.
The most prominent language prototypes (that I know of) in this direction are Agda, Idris, F*, and Coq (not really the full deal AFAIK). Check them out, but beware: this is kind of the edge of tomorrow, and I wouldn't advise starting a big project based on those languages.
(edit: apparently you can do certain tricks in Haskell to simulate dependent types, but it's not very convenient, and you have to enable UndecidableInstances)
Alternatively, you could go with a weaker solution of doing the checks at runtime. The general gist is: wrap your vector types in a plain wrapper, don't allow direct construction of it, but provide constructor functions instead, and make those constructor functions ensure the desired property (i.e. length). Something like:
type Stream4 = private Stream4 of DataStreamItem
    with
        static member create (item: DataStreamItem) =
            if item.Length = 4 then Some (Stream4 item)
            else None

        // Alternatively, fail fast instead of returning an option:
        //     if item.Length <> 4 then failwith "Expected a 4-long vector."
        //     Stream4 item
Here is a fuller explanation of the approach from Scott Wlaschin: constrained strings.
So if I understood correctly, you're actually not doing any type-level arithmetic, you just have a “length tag” that's shared in a chain of function calls.
This has long been possible to do in Haskell; one way that I consider quite elegant is to annotate your arrays with a standard fixed-length type of the desired length (in the snippets below, VU abbreviates Data.Vector.Unboxed, and VectorSpace / Scalar are as in e.g. the vector-space package):
newtype FixVect v s = FixVect { getFixVect :: VU.Vector s }
To ensure the correct length, you only provide (polymorphic) smart constructors that construct from the fixed-length type – perfectly safe, though the actual dimension number is nowhere mentioned!
class VectorSpace v => FiniteDimensional v where
  asFixVect :: v -> FixVect v (Scalar v)

instance FiniteDimensional Float where
  asFixVect s = FixVect $ VU.singleton s

instance (FiniteDimensional a, FiniteDimensional b, Scalar a ~ Scalar b) => FiniteDimensional (a, b) where
  asFixVect (a, b) = case (asFixVect a, asFixVect b) of
    (FixVect av, FixVect bv) -> FixVect $ av <> bv
This construction from unboxed tuples is really inefficient; however, this doesn't mean you can't write efficient programs with this paradigm – if the dimension always stays constant, you only need to wrap and unwrap once and can do all the critical operations through safe yet runtime-unchecked zips, folds and LA combinations.
Regardless, this approach isn't really widely used. Perhaps the single constant dimension is in fact too limiting for most relevant operations, and if you need to unwrap to tuples often it's way too inefficient. Another approach that is taking off these days is to actually tag the vectors with type-level numbers. Such numbers have become available in a usable form with the introduction of data kinds in GHC-7.4. Up until now, they're still rather unwieldy and not fit for proper arithmetic, but the upcoming 8.0 will greatly improve many aspects of this dependently-typed programming in Haskell.
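As a small sketch of that type-level-number approach with today's GHC (the names Vec, fromList and dot below are my own placeholders, not from a particular library):

{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}

import           Data.Proxy          (Proxy (..))
import           GHC.TypeLits        (KnownNat, Nat, natVal)
import qualified Data.Vector.Unboxed as VU

-- A vector tagged with its length at the type level.
newtype Vec (n :: Nat) a = Vec (VU.Vector a)

-- Smart constructor: the length is checked once, at the boundary.
fromList :: forall n a. (KnownNat n, VU.Unbox a) => [a] -> Maybe (Vec n a)
fromList xs
  | length xs == fromIntegral (natVal (Proxy :: Proxy n)) = Just (Vec (VU.fromList xs))
  | otherwise                                             = Nothing

-- Inside, operations may rely on the tag and skip further runtime checks.
dot :: (VU.Unbox a, Num a) => Vec n a -> Vec n a -> a
dot (Vec u) (Vec v) = VU.sum (VU.zipWith (*) u v)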
A library that offers efficient length-indexed arrays is linear.

Can a pure function have free variables?

For example, a referentially transparent function with no free variables:
g op x y = x `op` y
And now a function with the variables op and x free (from the point of view of f):
x = 1
op = (+)
f y = x `op` y
f is also referentially transparent. But is it a pure function?
If it's not a pure function, what is the name for a function that is referentially transparent, but makes use of 1 or more variables bound in an enclosing scope?
Motivation for this question:
It's not clear to me from Wikipedia's article:
The result value need not depend on all (or any) of the argument values. However, it must depend on nothing other than the argument values.
(emphasis mine)
nor from Google searches whether a pure function can depend on free (in the sense of being bound in an enclosing scope, and not being bound in the scope of the function) variables.
Also, this book says:
If functions without free variables are pure, are closures impure?
The function function (y) { return x } is interesting. It contains a
free variable, x. A free variable is one that is not bound within
the function. Up to now, we’ve only seen one way to “bind” a variable,
namely by passing in an argument with the same name. Since the
function function (y) { return x } doesn’t have an argument named x,
the variable x isn’t bound in this function, which makes it “free.”
Now that we know that variables used in a function are either bound or
free, we can bifurcate functions into those with free variables and
those without:
Functions containing no free variables are called pure functions.
Functions containing one or more free variables are called closures.
So what is the definition of a "pure function"?
To the best of my understanding "purity" is defined at the level of semantics while "referentially transparent" can take meaning both syntactically and embedded in lambda calculus substitution rules. Defining either one also leads to a bit of a challenge in that we need to have a robust notion of equality of programs which can be challenging. Finally, it's important to note that the idea of a free variable is entirely syntactic—once you've gone to a value domain you can no longer have expressions with free variables—they must be bound else that's a syntax error.
But let's dive in and see if this becomes more clear.
Quinian Referential Transparency
We can define referential transparency very broadly as a property of a syntactic context. Per the original definition, this would be built from a sentence like
New York is an American city.
of which we've poked a hole
_ is an American city.
Such a holey-sentence, a "context", is said to be referentially transparent if, given two sentence fragments which both "refer" to the same thing, filling the context with either of those two does not change its meaning.
To be clear, two fragments with the same reference we can pick would be "New York" and "The Big Apple". Injecting those fragments we write
New York is an American city.
The Big Apple is an American city.
suggesting that
_ is an American city.
is referentially transparent. To demonstrate the quintessential counterexample, we might write
"The Big Apple" is an apple-themed epithet referring to New York.
and consider the context
"_" is an apple-themed epithet referring to New York.
and now when we inject the two referentially identical phrases we get one valid and one invalid sentence
"The Big Apple" is an apple-themed epithet referring to New York.
"New York" is an apple-themed epithet referring to New York.
In other words, quotations break referential transparency. We can see how this occurs by causing the sentence to refer to a syntactic construct instead of purely the meaning of that construct. This notion will return later.
Syntax v Semantics
There's something confusing going on in that this definition of referential transparency above applies directly to English sentences of which we build contexts by literally stripping words out. While we can do that in a programming language and consider whether such a context is referentially transparent, we also might recognize that this idea of "substitution" is critical to the very notion of a computer language.
So, let's be clear: there are two kinds of referential transparency we can consider over lambda calculus—the syntactic one and the semantic one. The syntactic one requires we define "contexts" as holes in the literal words written in a programming language. That lets us consider holes like
let x = 3 in _
and fill it in with things like "x". We'll leave the analysis of that replacement for later. At the semantic level we use lambda terms to denote contexts
\x -> x + 3 -- similar to the context "_ + 3"
and are restricted to filling in the hole not with syntax fragments but instead only valid semantic values, the action of that being performed by application
(\x -> x + 3) 5
==>
5 + 3
==>
8
So, when someone refers to referential transparency in Haskell it's important to figure out what kind of referential transparency they're referring to.
Which kind is being referred to in this question? Since it's about the notion of an expression containing a free variable, I'm going to suggest that it's syntactic. There are two major thrusts to my reasoning here. Firstly, converting a syntax to a semantics requires that the syntax be valid; in the case of Haskell this means both syntactic validity and a successful type check. However, we'll note that a program fragment like
x + 3
is actually a syntax error since x is simply unknown, unbound leaving us unable to consider the semantics of it as a Haskell program. Secondly, the very notion of a variable such as one that can be let-bound (and consider the difference between "variable" as it refers to a "slot" such as an IORef) is entirely a syntactic construct—there's no way to even talk about them from inside the semantics of a Haskell program.
So let's refine the question to be:
Can an expression containing free variables be (syntactically) referentially transparent?
and the answer is, uninterestingly, no. Referential transparency is a property of "contexts", not expressions. So let's explore the notion of free variables in contexts instead.
Free variable contexts
How can a context meaningfully have a free variable? It could be beside the hole
E1 ... x ... _ ... E2
and so long as we cannot insert something into that syntactic hole which "reaches over" and affects x syntactically then we're fine. So, for instance, if we fill that hole with something like
E1 ... x ... let x = 3 in E ... E2
then we haven't "captured" the x and thus can perhaps consider that syntactic hole to be referentially transparent. However, we're being nice to our syntax. Let's consider a more dangerous example
do x <- foo
let x = 3
_
return x
Now we see that the hole we've provided in some sense has dominion over the later phrase "return x". In fact, if we inject a fragment like "let x = 4" then it indeed changes the meaning of the whole. In that sense, the syntax here is not referentially transparent.
Another interesting interaction between referential transparency and free variables is the notion of an assigning context like
let x = 3 in _
where, from an outside perspective, both phrases "x" and "y" refer to the same thing, some named variable, but
let x = 3 in x ==/== let x = 3 in y
Progression from thorniness around equality and context
Now, hopefully the previous section explained a few ways for referential transparency to break under various kinds of syntactic contexts. It's worth asking harder questions about what kinds of contexts are valid and what kinds of expressions are equivalent. For instance, we might desugar our do notation in a previous example and end up noticing that we weren't working with a genuine context, but instead sort of a higher-order context
foo >>= \x -> (let x = 3 in ____(return x)_____)
Is this a valid notion of context? It depends a lot on what kind of meaning we're giving the program. The notion of desugaring the syntax already implies that the syntax must be well-defined enough to allow for such desugaring.
As a general rule, we must be very careful with defining both contexts and notions of equality. Further, the more meaning we demand the fragments of our language to take on the greater the ways they can be equal and the fewer the valid contexts we can build.
Ultimately, this leads us all the way to what I called "semantic referential transparency" earlier where we can only substitute proper values into a proper, closed lambda expression and we take the resulting equality to be "equality as programs".
What this ends up meaning is that as we impute more and more meaning on our language, as we begin to accept fewer and fewer things as valid, we get stronger and stronger guarantees about referential transparency.
Purity
And so this finally leads to the notion of a pure function. My understanding here is (even) less complete, but it's worth noting that purity, as a concept, does not much exist until we've moved to a very rich semantic space—that of Haskell semantics as a category over lifted Complete Partial Orders.
If that doesn't make much sense, then just imagine purity is a concept that only exists when talking about Haskell values as functions and equality of programs. In particular, we examine the collection of Haskell functions
trivial :: a -> ()
trivial x = x `seq` ()
where we have a trivial function for every choice of a. We'll notate the specific choice using an underscore
trivial_Int :: Int -> ()
trivial_Int x = x `seq` ()
Now we can define a (very strictly) pure function to be a function f :: a -> b such that
trivial_b . f = trivial_a
In other words, if we throw out the result of computing our function, the b, then we may as well have never computed it in the first place.
Again, there's no notion of purity without having Haskell values and no notion of Haskell values when your expressions contain free variables (since it's a syntax error).
So what's the answer?
Ultimately, the answer is that you can't talk about purity around free variables and you can break referential transparency in lots of ways whenever you are talking about syntax. At some point as you convert your syntactic representation to its semantic denotation you must forget the notion and names of free variables in order to have them represent the reduction semantics of lambda terms and by this point we've begun to have referential transparency.
Finally, purity is something even more stringent than referential transparency having to do with even the reduction characteristics of your (referentially transparent) lambda terms.
By the definition of purity given above, most of Haskell isn't pure itself as Haskell may represent non-termination. Many feel that this is a better definition of purity, however, as non-termination can be considered a side effect of computation instead of a meaningful resultant value.
The Wikipedia definition is incomplete, insofar as a pure function may use constants to compute its answer.
When we look at
increment n = 1+n
this is obvious. Perhaps it was not mentioned because it is that obvious.
Now the trick in Haskell is that not only top-level values and functions are constants; inside a closure, the variables(!) closed over are constants as well:
add x = (\y -> x+y)
Here x stands for the value we applied add to - we call it a variable not because it could change within the right-hand side of add, but because it can be different each time we apply add. And yet, from the point of view of the lambda, x is a constant.
It follows that free variables always name constant values at the point where they are used and hence do not impact purity.
Short answer: yes, f is pure.
In Haskell, map can be defined with foldr. Would you agree that map is functional? If so, does it matter that it uses the global function foldr, which wasn't supplied to map as an argument?
In such a map, foldr is a free variable; there is no doubt about it. It makes no difference whether it is a function or something else that evaluates to a value: it is the same.
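For concreteness, this is the kind of definition the answer has in mind; foldr occurs free in the body of mapViaFoldr exactly as x and op occur free in f above (the name mapViaFoldr is mine, chosen to avoid shadowing the Prelude's map):

mapViaFoldr :: (a -> b) -> [a] -> [b]
mapViaFoldr f = foldr (\x acc -> f x : acc) []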
Free variables, like the functions foldl and +, are essential for functional languages to exist. Without them you wouldn't have abstraction, and the languages would be worse off than Fortran.

Can compilers deduce/prove mathematically?

I'm starting to learn functional programming languages like Haskell and ML, and most of the exercises show off things like:
foldr (+) 0 [ 1 ..10]
which is equivalent to
sum = 0
for (i in [1..10])
    sum += i
So that leads me to wonder: why can't the compiler recognize that this is an arithmetic progression and use the O(1) formula to calculate it?
Especially for pure FP languages without side effect?
The same applies for
sum reverse list == sum list
Given a + b = b + a
and definition of reverse, can compilers/languages prove it automatically?
Compilers generally don't try to prove this kind of thing automatically, because it's hard to implement.
As well as adding the logic to the compiler to transform one fragment of code into another, you have to be very careful that it only tries to do it when it's actually safe - i.e. there are often lots of "side conditions" to worry about. For example, in your example above, someone might have written an instance of the type class Num (and hence the (+) operator) where a + b is not b + a.
However, GHC does have rewrite rules which you can add to your own source code and could be used to cover some relatively simple cases like the ones you list above, particularly if you're not too bothered about the side conditions.
For example, and I haven't tested this, you might use the following rule for one of your examples above:
{-# RULES
"sum/reverse" forall list . sum (reverse list) = sum list
#-}
Note the parentheses around reverse list - what you've written in your question actually means (sum reverse) list and wouldn't typecheck.
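If you want to check whether such a rule actually fires, you can ask GHC to report each rule application by compiling with optimisation enabled and the -ddump-rule-firings flag (the module name below is a placeholder); whether a rule fires can also depend on inlining, so results may vary between optimisation levels:

-- ghc -O -ddump-rule-firings MyModule.hs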
EDIT:
As you're looking for official sources and pointers to research, I've listed a few.
Obviously it's hard to prove a negative but the fact that no-one has given an example of a general-purpose compiler that does this kind of thing routinely is probably quite strong evidence in itself.
As others have pointed out, even simple arithmetic optimisations are surprisingly dangerous, particularly on floating point numbers, and compilers generally have flags to turn them off - for example Visual C++, gcc. Even integer arithmetic isn't always clear-cut and people occasionally have big arguments about how to deal with things like overflow.
As Joachim noted, integer variables in loops are one place where slightly more sophisticated optimisations are applied because there are actually significant wins to be had. Muchnick's book is probably the best general source on the topic but it's not that cheap. The wikipedia page on strength reduction is probably as good an introduction as any to one of the standard optimisations of this kind, and has some references to the relevant literature.
FFTW is an example of a library that does all kinds of mathematical optimization internally. Some of its code is generated by a customised compiler the authors wrote specifically for the purpose. It's worthwhile because the authors have domain-specific knowledge of optimizations that, in the specific context of the library, are both worth the effort and safe.
People sometimes use template metaprogramming to write "self-optimising libraries" that again might rely on arithmetic identities, see for example Blitz++. Todd Veldhuizen's PhD dissertation has a good overview.
If you descend into the realms of toy and academic compilers all sorts of things go. For example my own PhD dissertation is about writing inefficient functional programs along with little scripts that explain how to optimise them. Many of the examples (see Chapter 6) rely on applying arithmetic rules to justify the underlying optimisations.
Also, it's worth emphasising that the last few examples are of specialised optimisations being applied only to certain parts of the code (e.g. calls to specific libraries) where it is expected to be worthwhile. As other answers have pointed out, it's simply too expensive for a compiler to go searching for all possible places in an entire program where an optimisation might apply. The GHC rewrite rules that I mentioned above are a great example of a compiler exposing a generic mechanism for individual libraries to use in a way that's most appropriate for them.
The answer
No, compilers don’t do that kind of stuff.
One reason why
And for your examples, it would even be wrong: Since you did not give type annotations, the Haskell compiler will infer the most general type, which would be
foldr (+) 0 [ 1 ..10] :: Num a => a
and similar
(\list -> sum (reverse list)) :: Num a => [a] -> a
and the Num instance for the type that is being used might well not fulfil the mathematical laws required for the transformation you suggest. The compiler should, before everything else, avoid changing the meaning (i.e. the semantics) of your program.
More pragmatically: The cases where the compiler could detect such large-scale transformations rarely occur in practice, so it would not be worth it to implement them.
An exception
Notable exceptions are linear transformations in loops. Most compilers will rewrite
for (int i = 0; i < n; i++) {
... 200 + 4 * i ...
}
to
for (int i = 0, j = 200; i < n; i++, j += 4) {
... j ...
}
or something similar, as that pattern often occurs in code working on arrays.
The optimizations you have in mind will probably not be done even in the presence of monomorphic types, because there are so many possibilities and so much knowledge required. For example, in this example:
sum list == sum (reverse list)
The compiler would need to know or take into account the following facts:
sum = foldl (+) 0
(+) is commutative
reverse list is a permutation of list
foldl x c l, where x is commutative and c is a constant, yields the same result for all permutations of l.
This all seems trivial. Sure, the compiler can most probably look up the definition of sum and inline it. It could be required that (+) be commutative, but remember that + is just another symbol without attached meaning to the compiler. The third point would require the compiler to prove some non-trivial properties about reverse.
But the point is:
You don't want the compiler to do those calculations with each and every expression. Remember, to make this really useful, you'd have to heap up a lot of knowledge about many, many standard functions and operators.
You still can't replace the expression above with True unless you can rule out the possibility that list or some list element is bottom. Usually, one cannot do this. You can't even do the following "trivial" optimization of f x == f x in all cases
f x `seq` True
For, consider
f x = (undefined :: Bool, x)
then
f x `seq` True ==> True
f x == f x ==> undefined
That being said, regarding your first example slightly modified for monomorphism:
f n = n * foldl (+) 0 [1..10] :: Int
it is imaginable to optimize the program by moving the expression out of its context and replace it with the name of a constant, like so:
const1 = foldl (+) 0 [1..10] :: Int
f n = n * const1
This is because the compiler can see that the expression must be constant.
What you're describing looks like super-compilation. In your case, if the expression had a monomorphic type like Int (as opposed to polymorphic Num a => a), the compiler could infer that the expression foldr (+) 0 [1 ..10] has no external dependencies, therefore it could be evaluated at compile time and replaced by 55. However, AFAIK no mainstream compiler currently does this kind of optimization.
(In functional programming "proving" is usually associated with something different. In languages with dependent types types are powerful enough to express complex proposition and then through the Curry-Howard correspondence programs become proofs of such propositions.)
As others have noted, it's unclear that your simplifications even hold in Haskell. For instance, I can define
newtype NInt = N Int

instance Num NInt where
  N a + _ = N a
  N b * _ = N b
  ... -- etc
and now sum . reverse :: Num a => [a] -> a does not equal sum :: Num a => [a] -> a, since I can specialize each to [NInt] -> NInt, where sum . reverse == sum clearly does not hold.
This is one general tension that exists around optimizing "complex" operations: you actually need quite a lot of information in order to successfully prove that it's okay to optimize something. This is why the syntax-level compiler optimizations which do exist are usually monomorphic and related to the structure of programs; it's usually such a simplified domain that there's "no way" for the optimization to go wrong. Even that is often unsafe because the domain is never quite so simplified and well-known to the compiler.
As an example, a very popular "high-level" syntactic optimization is stream fusion. In this case the compiler is given enough information to know that stream fusion can occur and is basically safe, but even in this canonical example we have to skirt around notions of non-termination.
So what does it take to have \x -> sum [0..x] get replaced by \x -> x*(x + 1)/2? The compiler would need a theory of numbers and algebra built-in. This is not possible in Haskell or ML, but becomes possible in dependently typed languages like Coq, Agda, or Idris. There you could specify things like
revCommute :: (_+_ :: a -> a -> a)
           -> Commutative _+_
           -> foldr _+_ z (reverse as) == foldr _+_ z as
and then, theoretically, tell the compiler to rewrite according to revCommute. This would still be difficult and finicky, but at least we'd have enough information around. To be clear, I'm writing something very strange above, a dependent type. The type not only depends on the ability to introduce both a type and a name for the argument inline, but also the existence of the entire syntax of your language "at the type level".
There are a lot of differences between what I just wrote and what you'd do in Haskell, though. First, in order to form a basis where such promises can be taken seriously, we must throw away general recursion (and thus we already don't have to worry about questions of non-termination like stream fusion does). We also must have enough structure around to create something like the promise Commutative _+_; this likely depends upon there being an entire theory of operators and mathematics built into the language's standard library, or else you would need to create that yourself. Finally, the richness of the type system required to even express these kinds of theories adds a lot of complexity to the entire system and tosses out type inference as you know it today.
But, given all that structure, I'd never be able to create an obligation Commutative _+_ for the _+_ defined to work on NInts and so we could be certain that foldr (+) 0 . reverse == foldr (+) 0 actually does hold.
But now we'd need to tell the compiler how to actually perform that optimization. For stream-fusion, the compiler rules only kick in when we write something in exactly the right syntactic form to be "clearly" an optimization redex. The same kinds of restrictions would apply to our sum . reverse rule. In fact, already we're sunk because
foldr (+) 0 . reverse
foldr (+) 0 (reverse as)
don't match. They're "obviously" the same due to some rules we could prove about (.), but that means that now the compiler must invoke two built-in rules in order to perform our optimization.
At the end of the day, you need a very smart optimization search over the sets of known laws in order to achieve the kinds of automatic optimizations you're talking about.
So not only do we add a lot of complexity to the entire system, require a lot of base work to build-in some useful algebraic theories, and lose Turing completeness (which might not be the worst thing), we also only get a finicky promise that our rule would even fire unless we perform an exponentially painful search during compilation.
Blech.
The compromise that exists today tends to be that sometimes we have enough control over what's being written to be mostly certain that a certain obvious optimization can be performed. This is the regime of stream fusion and it requires a lot of hidden types, carefully written proofs, exploitations of parametricity, and hand-waving before it's something the community trusts enough to run on their code.
And it doesn't even always fire. For an example of battling that problem take a look at the source of Vector for all of the RULES pragmas that specify all of the common circumstances where Vector's stream-fusion optimizations should kick in.
All of this is not at all a critique of compiler optimizations or dependent type theories. Both are really incredible. Instead it's just an amplification of the tradeoffs involved in introducing such an optimization. It's not to be done lightly.
Fun fact: Given two arbitrary formulas, do they both give the same output for the same inputs? The answer to this trivial question is not computable! In other words, it is mathematically impossible to write a computer program that always gives the correct answer in finite time.
Given this fact, it's perhaps not surprising that nobody has a compiler that can magically transform every possible computation into its most efficient form.
Also, isn't this the programmer's job? If you want the sum of an arithmetic sequence commonly enough that it's a performance bottleneck, why not just write some more efficient code yourself? Similarly, if you really want Fibonacci numbers (why?), use the O(1) algorithm.
