Can compilers deduce/prove mathematically? - haskell

I'm starting to learn functional programming language like Haskell, ML and most of the exercises will show off things like:
foldr (+) 0 [ 1 ..10]
which is equivalent to
sum = 0
for( i in [1..10] )
sum += i
So that leads me to think why can't compiler know that this is Arithmetic Progression and use O(1) formula to calculate?
Especially for pure FP languages without side effect?
The same applies for
sum reverse list == sum list
Given a + b = b + a
and definition of reverse, can compilers/languages prove it automatically?

Compilers generally don't try to prove this kind of thing automatically, because it's hard to implement.
As well as adding the logic to the compiler to transform one fragment of code into another, you have to be very careful that it only tries to do it when it's actually safe - i.e. there are often lots of "side conditions" to worry about. For example in your example above, someone might have written an instance of the type class Num (and hence the (+) operator) where the a + b is not b + a.
However, GHC does have rewrite rules which you can add to your own source code and could be used to cover some relatively simple cases like the ones you list above, particularly if you're not too bothered about the side conditions.
For example, and I haven't tested this, you might use the following rule for one of your examples above:
{-# RULES
"sum/reverse" forall list . sum (reverse list) = sum list
#-}
Note the parentheses around reverse list - what you've written in your question actually means (sum reverse) list and wouldn't typecheck.
EDIT:
As you're looking for official sources and pointers to research, I've listed a few.
Obviously it's hard to prove a negative but the fact that no-one has given an example of a general-purpose compiler that does this kind of thing routinely is probably quite strong evidence in itself.
As others have pointed out, even simple arithmetic optimisations are surprisingly dangerous, particularly on floating point numbers, and compilers generally have flags to turn them off - for example Visual C++, gcc. Even integer arithmetic isn't always clear-cut and people occasionally have big arguments about how to deal with things like overflow.
As Joachim noted, integer variables in loops are one place where slightly more sophisticated optimisations are applied because there are actually significant wins to be had. Muchnick's book is probably the best general source on the topic but it's not that cheap. The wikipedia page on strength reduction is probably as good an introduction as any to one of the standard optimisations of this kind, and has some references to the relevant literature.
FFTW is an example of a library that does all kinds of mathematical optimization internally. Some of its code is generated by a customised compiler the authors wrote specifically for the purpose. It's worthwhile because the authors have domain-specific knowledge of optimizations that in the specific context of the library are both worth the effort and safe
People sometimes use template metaprogramming to write "self-optimising libraries" that again might rely on arithmetic identities, see for example Blitz++. Todd Veldhuizen's PhD dissertation has a good overview.
If you descend into the realms of toy and academic compilers all sorts of things go. For example my own PhD dissertation is about writing inefficient functional programs along with little scripts that explain how to optimise them. Many of the examples (see Chapter 6) rely on applying arithmetic rules to justify the underlying optimisations.
Also, it's worth emphasising that the last few examples are of specialised optimisations being applied only to certain parts of the code (e.g. calls to specific libraries) where it is expected to be worthwhile. As other answers have pointed out, it's simply too expensive for a compiler to go searching for all possible places in an entire program where an optimisation might apply. The GHC rewrite rules that I mentioned above are a great example of a compiler exposing a generic mechanism for individual libraries to use in a way that's most appropriate for them.

The answer
No, compilers don’t do that kind of stuff.
One reason why
And for your examples, it would even be wrong: Since you did not give type annotations, the Haskell compiler will infer the most general type, which would be
foldr (+) 0 [ 1 ..10] :: Num a => a
and similar
(\list -> sum (reverse list)) :: Num a => [a] -> a
and the Num instance for the type that is being used might well not fulfil the mathematical laws required for the transformation you suggest. The compiler should, before everything else, avoid to change the meaning (i.e. the semantics) of your program.
More pragmatically: The cases where the compiler could detect such large-scale transformations rarely occur in practice, so it would not be worth it to implement them.
An exception
Note notable exceptions are linear transformations in loops. Most compilers will rewrite
for (int i = 0; i < n; i++) {
... 200 + 4 * i ...
}
to
for (int i = 0, j = 200; i < n; i++, j += 4) {
... j ...
}
or something similar, as that pattern does often occur in code working on array.

The optimizations you have in mind will probably not be done even in the presence of monomorphic types, because there are so many possibilities and so much knowledge required. For example, in this example:
sum list == sum (reverse list)
The compiler would need to know or take into account the following facts:
sum = foldl (+) 0
(+) is commutative
reverse list is a permutation of list
foldl x c l, where x is commutative and c is a constant, yields the same result for all permutations of l.
This all seems trivial. Sure, the compiler can most probably look up the definition of sumand inline it. It could be required that (+) be commutative, but remember that +is just another symbol without attached meaning to the compiler. The third point would require the compiler to prove some non trivial properties about reverse.
But the point is:
You don't want to perform the compiler to do those calculations with each and every expression. Remember, to make this really useful, you'd have to heap up a lot of knowledge about many, many standard functions and operators.
You still can't replace the expression above with True unless you can rule out the possibility that list or some list element is bottom. Usually, one cannot do this. You can't even do the following "trivial" optimization of f x == f x in all cases
f x `seq` True
For, consider
f x = (undefined :: Bool, x)
then
f x `seq` True ==> True
f x == f x ==> undefined
That being said, regarding your first example slightly modified for monomorphism:
f n = n * foldl (+) 0 [1..10] :: Int
it is imaginable to optimize the program by moving the expression out of its context and replace it with the name of a constant, like so:
const1 = foldl (+) 0 [1..10] :: Int
f n = n * const1
This is because the compiler can see that the expression must be constant.

What you're describing looks like super-compilation. In your case, if the expression had a monomorphic type like Int (as opposed to polymorphic Num a => a), the compiler could infer that the expression foldr (+) 0 [1 ..10] has no external dependencies, therefore it could be evaluated at compile time and replaced by 55. However, AFAIK no mainstream compiler currently does this kind of optimization.
(In functional programming "proving" is usually associated with something different. In languages with dependent types types are powerful enough to express complex proposition and then through the Curry-Howard correspondence programs become proofs of such propositions.)

As others have noted, it's unclear that your simplifications even hold in Haskell. For instance, I can define
newtype NInt = N Int
instance Num NInt where
N a + _ = N a
N b * _ = N b
... -- etc
and now sum . reverse :: Num [a] -> a does not equal sum :: Num [a] -> a since I can specialize each to [NInt] -> NInt where sum . reverse == sum clearly does not hold.
This is one general tension that exists around optimizing "complex" operations—you actually need quite a lot of information in order to successfully prove that it's okay to optimize something. This is why the syntax-level compiler optimization which do exist are usually monomorphic and related to the structure of programs---it's usually such a simplified domain that there's "no way" for the optimization to go wrong. Even that is often unsafe because the domain is never quite so simplified and well-known to the compiler.
As an example, a very popular "high-level" syntactic optimization is stream fusion. In this case the compiler is given enough information to know that stream fusion can occur and is basically safe, but even in this canonical example we have to skirt around notions of non-termination.
So what does it take to have \x -> sum [0..x] get replaced by \x -> x*(x + 1)/2? The compiler would need a theory of numbers and algebra built-in. This is not possible in Haskell or ML, but becomes possible in dependently typed languages like Coq, Agda, or Idris. There you could specify things like
revCommute :: (_+_ :: a -> a -> a)
-> Commutative _+_
-> foldr _+_ z (reverse as) == foldr _+_ z as
and then, theoretically, tell the compiler to rewrite according to revCommute. This would still be difficult and finicky, but at least we'd have enough information around. To be clear, I'm writing something very strange above, a dependent type. The type not only depends on the ability to introduce both a type and a name for the argument inline, but also the existence of the entire syntax of your language "at the type level".
There are a lot of differences between what I just wrote and what you'd do in Haskell, though. First, in order to form a basis where such promises can be taken seriously, we must throw away general recursion (and thus we already don't have to worry about questions of non-termination like stream-fusion does). We also must have enough structure around to create something like the promise Commutative _+_---this likely depends upon there being an entire theory of operators and mathematics built into the language's standard library else you would need to create that yourself. Finally, the richness of type system required to even express these kinds of theories adds a lot of complexity to the entire system and tosses out type inference as you know it today.
But, given all that structure, I'd never be able to create an obligation Commutative _+_ for the _+_ defined to work on NInts and so we could be certain that foldr (+) 0 . reverse == foldr (+) 0 actually does hold.
But now we'd need to tell the compiler how to actually perform that optimization. For stream-fusion, the compiler rules only kick in when we write something in exactly the right syntactic form to be "clearly" an optimization redex. The same kinds of restrictions would apply to our sum . reverse rule. In fact, already we're sunk because
foldr (+) 0 . reverse
foldr (+) 0 (reverse as)
don't match. They're "obviously" the same due to some rules we could prove about (.), but that means that now the compiler must invoke two built-in rules in order to perform our optimization.
At the end of the day, you need a very smart optimization search over the sets of known laws in order to achieve the kinds of automatic optimizations you're talking about.
So not only do we add a lot of complexity to the entire system, require a lot of base work to build-in some useful algebraic theories, and lose Turing completeness (which might not be the worst thing), we also only get a finicky promise that our rule would even fire unless we perform an exponentially painful search during compilation.
Blech.
The compromise that exists today tends to be that sometimes we have enough control over what's being written to be mostly certain that a certain obvious optimization can be performed. This is the regime of stream fusion and it requires a lot of hidden types, carefully written proofs, exploitations of parametricity, and hand-waving before it's something the community trusts enough to run on their code.
And it doesn't even always fire. For an example of battling that problem take a look at the source of Vector for all of the RULES pragmas that specify all of the common circumstances where Vector's stream-fusion optimizations should kick in.
All of this is not at all a critique of compiler optimizations or dependent type theories. Both are really incredible. Instead it's just an amplification of the tradeoffs involved in introducing such an optimization. It's not to be done lightly.

Fun fact: Given two arbitrary formulas, do they both give the same output for the same inputs? The answer to this trivial question is not computable! In other words, it is mathematically impossible to write a computer program that always gives the correct answer in finite time.
Given this fact, it's perhaps not surprising that nobody has a compiler that can magically transform every possible computation into its most efficient form.
Also, isn't this the programmer's job? If you want the sum of an arithmetic sequence commonly enough that it's a performance bottleneck, why not just write some more efficient code yourself? Similarly, if you really want Fibonacci numbers (why?), use the O(1) algorithm.

Related

how lazy is haskell when dealing with complicated program flow

I was wondering how smart/lazy Haskell is. Can I always be sure that Haskell will only do what is necessary to generate a certain output?
No.
Haskell specifies a denotational semantics for its core lambda-like calculus, and you can rely on that semantics. Additionally, there is a metatheory proof that a particular reduction order -- known colloquially as "lazy evaluation" -- realizes that semantics; and so many people use that as their mental model of how Haskell programs behave.
There are two broad categories of ways that a Haskell program may end up evaluating more than necessary:
The Haskell implementation may choose to evaluate more. GHC uses lazy evaluation in most places, but I believe it will use other evaluation orders for efficiency in some cases. You could also look at the Eager Haskell project, which is attempting to use another implementation strategy. In principle, an implementation would be within its rights to choose to speculatively fork some computations to another thread (and then throw away the results if they weren't needed). And so on and so forth -- there are many possible variations on this theme.
The denotational semantics specified may demand more evaluation than "necessary". For example, one that occasionally trips up beginners:
primes :: [Int]
primes = 2 : filter prime [3,5..]
prime :: Int -> Bool
prime x = and [x `mod` p /= 0 | p <- primes, p < x]
When checking whether 3 should be in the list primes, it is in fact not necessary to check any of the elements of primes past 2, because the sequence is strictly monotonically increasing. But Haskell is not (does not try to be) smart enough to notice that; it will go straight on trying to check the rest of the primes and end up in an infinite loop instead of giving the list of primes.
An even smaller example: you could think that x && False is always False, but x will typically be evaluated anyway, because the semantics says this should be an infinite loop if x is. (Contrast False && x, which typically does not result in evaluating x.)
That said, when you say "complex structure", one thing that comes to mind is: does Haskell do the laziness thing even with custom data types that I define? That is, do complex structures like hash maps and balanced trees and k-d trees and so forth get treated lazily? The answer there is yes, at least for GHC; there is nothing fundamentally special about any of the types in the Prelude except IO. Lists, booleans, Maybes, and so forth are lazy not because the compiler knows special things about them, but simply as a consequence of the lazy evaluation reduction rules specified by Haskell and their declarations as types.
Of course, there are ways to opt-out of laziness. Some of the structures you will see on Hackage do that in various ways; but don't worry, usually this will be declared in their documentation.

Automatically inserting laziness in Haskell

Haskell pattern matching is often head strict, for example,f (x:xs) = ...
requires input list to be evaluated to (thunk : thunk). But sometimes such evaluation is not needed and function can afford to be non-strict on some arguments, for example f (x:xs) = 3.
Ideally, in such situations we could avoid evaluating arguments to get the behaviour of const 3, which could be done with irrefutable pattern: f ~(x:xs) = 3. This gives us performance benefits and greater error tolerance.
My question is: Does GHC already implement such transformations via some kind of strictness analysis? Appreciate it if you could also point me to some readings on it.
As far as I know, GHC will never make something more lazy than specified by the programmer, even if it can prove that doesn't change the semantics of the term. I don't think there is any fundamental reason to avoid changing the laziness of a term when we can prove the semantics don't change; I suspect it's more of an empirical observation that we don't know of any situations where that would be a really great idea. (And if a transformation would change the semantics, I would consider it a bug for GHC to make that change.)
There is only one possible exception that comes to mind, the so-called "full laziness" transformation, described well on the wiki. In short, GHC will translate
\a b -> let c = {- something that doesn't mention b -} in d
to
\a -> let c = {- same thing as before -} in \b -> d
to avoid recomputing c each time the argument is applied to a new b. But it seems to me that this transformation is more about memoization than about laziness: the two terms above appear to me to have the same (denotational) semantics wrt laziness/strictness, and are only operationally different.

What is Haskell missing for totality checking?

A total (functional) language is one in which everything can be shown to terminate. Obviously, there are lots of places where I don't want this - throwing exceptions is sometimes handy, a web-server isn't supposed to terminate, etc. But sometimes, I would like a local totality check to enable certain optimizations. For example, if I have a provably-total function
commutativity :: forall (n :: Nat) (m :: Nat). n + m :~: m + n
commutativity = ...
then, since :~: has exactly one inhabitant (Refl), GHC could optimize
gcastWith (commutativity #n #m) someExpression
==>
someExpression
And my proof of commutativity goes from having an O(n) runtime cost to being free. So, now for my question:
What are some of the subtle difficulties in making a totality checker for Haskell?
Obviously, such a checker is conservative so whenever GHC isn't sure something is total (or is to lazy to check) it could assume it isn't... Seems to me it might not be too difficult to cobble together a not-so-smart checker that would still be very useful (at least it should be straightforward to eliminate all my arithmetic proofs). Yet, I can't seem to find any efforts to build such a thing into GHC, so obviously I'm missing some pretty big constraints. Go ahead SO, crush my dreams. :)
Relevant but not recent: Unfailing Haskell by Neil Mitchell, 2005.
Liquid Haskell has totality checking: https://github.com/ucsd-progsys/liquidhaskell#termination-check
By default a termination check is performed on all recursive functions.
Use the no-termination option to disable the check
liquid --no-termination test.hs
In recursive functions the first algebraic or integer argument should be decreasing.
The default decreasing measure for lists is length and Integers its value.
(I included screenshot and quote for posterity.)
Similar to in Agda or other languages with totality checking, the arguments to the function must become structurally smaller over time to reach the base case. Combined with a totality checker, this makes for a reliable check for a number of functions. LH also supports helping the checker along by indicating how things decrease, which you could do with an abstract data type that's opaque or from an FFI. It's really quite practical.

Monotonic sequence type in haskell

I would like to define a type for infinite number sequence in haskell. My idea is:
type MySeq = Natural -> Ratio Integer
However, I would also like to be able to define some properties of the sequence on the type level. A simple example would be a non-decreasing sequence like this. Is this possible to do this with current dependent-type capabilities of GHC?
EDIT: I came up with the following idea:
type PositiveSeq = Natural -> Ratio Natural
data IncreasingSeq = IncreasingSeq {
start :: Ratio Natural,
diff :: PositiveSeq}
type IKnowItsIncreasing = [Ratio Natural]
getSeq :: IncreasingSeq -> IKnowItsIncreasing
getSeq s = scanl (+) (start s) [diff s i | i <- [1..]]
Of course, it's basically a hack and not actually type safe at all.
This isn't doing anything very fancy with types, but you could change how you interpret a sequence of naturals to get essentially the same guarantee.
I think you are thinking along the right lines in your edit to the question. Consider
data IncreasingSeq = IncreasingSeq (Integer -> Ratio Natural)
where each ratio represents how much it has increased from the previous number (starting with 0).
Then you can provide a single function
applyToIncreasing :: ([Ratio Natural] -> r) -> IncreasingSeq -> r
applyToIncreasing f (IncreasingSeq s) = f . drop 1 $ scanl (+) 0 (map (s $) [0..])
This should let you deconstruct it in any way, without allowing the function to inspect the real structure.
You just need a way to construct it: probably a fromList that just sorts it and an insert that performs a standard ordered insertion.
It pains part of me to say this, but I don't think you'd gain anything over this using fancy type tricks: there are only three functions that could ever possibly go wrong, and they are fairly simple to correctly implement. The implementation is hidden so anything that uses those is correct as a result of those functions being correct. Just don't export the data constructor for IncreasingSeq.
I would also suggest considering making [Ratio Natural] be the underlying representation. It simplifies things and guarantees that there are no "gaps" in the sequence (so it is guaranteed to be a sequence).
If you want more safety and can take the performance hit, you can use data Nat = Z | S Nat instead of Natural.
I will say that if this was Coq, or a similar language, instead of Haskell I would be more likely to suggest doing some fancier type-level stuff (depending on what you are trying to accomplish) for a couple reasons:
In systems like Coq, you are usually proving theorems about the code. Because of this, it can be useful to have a type-level proof that a certain property holds. Since Haskell doesn't really have a builtin way to prove those sorts of theorems, the utility diminishes.
On the other hand, we can (sometimes) construct data types that essentially must have the properties we want using a small number of trusted functions and a hidden implementation. In the context of a system with more theorem proving capability, like Coq, this might be harder to convince theorem prover of the property than if we used a dependent type (possibly, at least). In Haskell, however, we don't have that issue in the first place.

What are super combinators and constant applicative forms?

I'm struggling with what Super Combinators are:
A supercombinator is either a constant, or a combinator which contains only supercombinators as subexpressions.
And also with what Constant Applicative Forms are:
Any super combinator which is not a lambda abstraction. This includes truly constant expressions such as 12, ((+) 1 2), [1,2,3] as well as partially applied functions such as ((+) 4). Note that this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
This is just not making any sense to me! Isn't ((+) 4) just as "truly constant" as 12? CAFs sound like values to my simple mind.
These Haskell wiki pages you reference are old, and I think unfortunately written. Particularly unfortunate is that they mix up CAFs and supercombinators. Supercombinators are interesting but unrelated to GHC. CAFs are still very much a part of GHC, and can be understood without reference to supercombinators.
So let's start with supercombinators. Combinators derive from combinatory logic, and, in the usage here, consist of functions which only apply the values passed in to one another in one or another form -- i.e. they combine their arguments. The most famous set of combinators are S, K, and I, which taken together are Turing-complete. Supercombinators, in this context, are functions built only of values passed in, combinators, and other supercombinators. Hence any supercombinator can be expanded, through substitution, into a plain old combinator.
Some compilers for functional languages (not GHC!) use combinators and supercombinators as intermediate steps in compilation. As with any similar compiler technology, the reason for doing this is to admit optimization analysis that is more easily performed in such a simplified, minimal language. One such core language built on supercombinators is Edwin Brady's epic.
Constant Applicative Forms are something else entirely. They're a bit more subtle, and have a few gotchas. The way to think of them is as an aspect of compiler implementation with no separate semantic meaning but with a potentially profound effect on runtime performance. The following may not be a perfect description of a CAF, but it'll try to convey my intuition of what one is, since I haven't seen a really good description anywhere else for me to crib from. The clean "authoritative" description in the GHC Commentary Wiki reads as follows:
Constant Applicative Forms, or CAFs for short, are top-level values
defined in a program. Essentially, they are objects that are not
allocated dynamically at run-time but, instead, are part of the static
data of the program.
That's a good start. Pure, functional, lazy languages can be thought of in some sense as a graph reduction machine. The first time you demand the value of a node, that forces its evaluation, which in turn can demand the values of subnodes, etc. One a node is evaluated, the resultant value sticks around (although it does not have to stick around -- since this is a pure language we could always keep the subnodes live and recalculate with no semantic effect). A CAF is indeed just a value. But, in the context, a special kind of value -- one which the compiler can determine has a meaning entirely dependent on its subnodes. That is to say:
foo x = ...
where thisIsACaf = [1..10::Int]
thisIsNotACaf = [1..x::Int]
thisIsAlsoNotACaf :: Num a => [a]
thisIsAlsoNotACaf = [1..10] -- oops, polymorphic! the "num" dictionary is implicitly a parameter.
thisCouldBeACaf = const [1..10::Int] x -- requires a sufficiently smart compiler
thisAlsoCouldBeACaf _ = [1..10::Int] -- also requires a sufficiently smart compiler
So why do we care if things are CAFs? Basically because sometimes we really really don't want to recompute something (for example, a memotable!) and so want to make sure it is shared properly. Other times we really do want to recompute something (e.g. a huge boring easy to generate list -- such as the naturals -- which we're just walking over) and not have it stick around in memory forever. A combination of naming things and binding them under lets or writing them inline, etc. typically lets us specify these sorts of things in a natural, intuitive way. Occasionally, however, the compiler is smarter or dumber than we expect, and something we think should only be computed once is always recomputed, or something we don't want to hang on to gets lifted out as a CAF. Then, we need to think things through more carefully. See this discussion to get an idea about some of the trickiness involved: A good way to avoid "sharing"?
[By the way, I don't feel up to it, but anyone that wants to should feel free to take as much of this answer as they want to try and integrate it with the existing Haskell Wiki pages and improve/update them]
Matt is right in that the definition is confusing. It is even contradictory. A CAF is defined as:
Any super combinator which is not a lambda abstraction. This includes
truly constant expressions such as 12, ((+) 1 2), [1,2,3] as
well as partially applied functions such as ((+) 4).
Hence, ((+) 4) is seen as a CAF. But in the very next sentence we're told it is equivalent to something that is not a CAF:
this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
It would be cleaner to rule out partially applied functions on the ground that they are equivalent to lambda abstractions.

Resources