How to prove this lemma about eta expansion? - variadic-functions

(Here is a gist of my work so far.)
Coq comes with a rule about eta reduction, allowing us to prove something like the following:
Variables X Y Z : Type.
Variable f : X -> Y -> Z.
Lemma eta1 : (fun x => f x) = f.
Proof.
auto.
Qed.
The eta rule is part of Coq's definitional equality, not merely a propositional rewrite, so we can use it beneath binders as well:
Lemma eta2 : (fun x y => f x y) = f.
Proof.
auto.
Qed.
Of course, one would expect that you could generalize this to an f of arbitrary arity. Here is my naïve attempt:
Require Import Coq.Lists.List.
Import ListNotations.
Fixpoint toType (ts : list Type) (t : Type) : Type :=
  match ts with
  | [] => t
  | u :: vs => u -> toType vs t
  end.
Compute toType [X;Y] Z.
(*
= X -> Y -> Z
: Type
*)
Fixpoint etaexpand (ts : list Type) : forall t : Type, toType ts t -> toType ts t :=
  match ts as ts return forall t, toType ts t -> toType ts t with
  | [] => fun t x => x
  | u :: vs => fun t f (x : u) => etaexpand vs t (f x)
  end.
Compute etaexpand [X;Y] Z f.
(*
= fun (x : X) (x0 : Y) => f x x0
: toType [X; Y] Z
*)
Lemma etaexpand_id : forall ts t f, etaexpand ts t f = f.
Proof.
  induction ts; intros.
  auto.
  simpl.
  (* stuck here *)
I get stuck on the inductive step of the above lemma. Naturally, I want to rewrite using the inductive hypothesis, but I cannot since the relevant subterm occurs beneath a binder. Of course, I myself know that the inductive hypothesis should be usable beneath binders, since it's just a chain of eta rewrites. I'm wondering then if there's a clever way to state and convince Coq of this fact.

In case anyone's curious, here's the solution I came up with after some thought.
The key is to simultaneously prove the following "niceness" property for etaexpand ts t:
Definition nice {X Y} (F : Y -> Y) :=
  (forall y, F y = y) -> forall f : X -> Y, (fun x => F (f x)) = f.

How to do cases on a function between finite types?

I'm just starting out with Lean.
Say I want to prove something about functions between two finite types. For example
example (f : Bool -> Bool) : (∀ x : Bool, f (f (f x)) = f x) := sorry
Since there are just a few possibilities for f, ideally I'd want something like cases x <;> cases f <;> rfl, but I can't do cases f.
Currently, I don't know how to prove this at all. I'd think to do cases/match on (f false) and (f true), but Lean doesn't seem to remember this information about the function once it goes into the case.
Here's a proof along your original lines (not dec_trivial); note we use the h : syntax to name the case hypothesis:
example (f : Bool -> Bool) : (∀ x : Bool, f (f (f x)) = f x) := by
  intro x
  cases x <;>
    cases h : f true <;>
    cases h2 : f false <;>
    simp [h, h2]

How to understand nested lambda functions in Haskell

I am trying to understand the meaning of the following two lambda expressions in Haskell:
f = \x -> x (\y -> x y)
g = \x -> (\y -> y) x
I tried to convert them, and I got this:
f x y = x x y
g x y = y x
Is this correct? I assumed the arguments of both functions have to be x and y, as they are both found in a lambda expression in the function description. I basically understood it this way: f(x) = x f(y) and f(y) = y x. And for g, g(x) = g(y) x and g(y) = y. But as I am new to Haskell, I'm not very confident with these types of conversion. If not correct, what would be a correct conversion?
Neither is correct. Your solution uses the functions
f x y = x x y
g x y = y x
which actually mean
f = \x -> (\y -> x x y)
g = \x -> (\y -> y x)
and those differ from the original expressions
f = \x -> x (\y -> x y)
g = \x -> (\y -> y) x
The above two equations can be rewritten as
f x = x (\y -> x y)
g x = (\y -> y) x
But from here, there is no way to turn the remaining lambdas into more arguments for f or g. At best, we can simplify them using beta/eta conversion and get
f x = x x -- eta (\y -> x y) = x
g x = x -- beta (\y -> y) x = x
(Also see the comment below by Will Ness, who points out that through an additional eta expansion in f we could reach the OP's definition. Still, that is incidental.)
Finally, note that Haskell will not accept f x = x x, since that cannot be typed unless we use rank-2 types and explicitly provide a type annotation like f :: (forall a. a) -> b. The original code f = \x -> x (\y -> x y) suffers from the same issue. Both would be fine in untyped languages, e.g. the untyped lambda calculus of programming language theory.
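For the record, here's a sketch of the rank-2 flavor (selfApply and its type are my illustration, not the OP's code):
{-# LANGUAGE RankNTypes #-}

-- With a rank-2 annotation, self-application typechecks: the first x is
-- instantiated at b -> b, the second at b.
selfApply :: (forall a. a -> a) -> (b -> b)
selfApply x = x x
-- e.g. selfApply id 5 == 5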
The :type command at the GHCi prompt is your friend. Let's take your second example first:
λ> :type let g = \x -> (\y -> y) x in g
let g = \x -> (\y -> y) x in g :: p -> p
So g is well-typed and is a convoluted way to write an identity function :: p -> p. Specifically, g takes some x and applies an identity function (\y -> y) to x, resulting in x. In giving the type, GHCi uses a fresh type name p to avoid confusion. No, your g x y = ... is not equivalent. (Check it with :type.)
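For example (the exact type-variable names may differ between GHC versions):
λ> :t let g x y = y x in g
let g x y = y x in g :: t1 -> (t1 -> t2) -> t2
Two arguments instead of one, so this is plainly a different function from g :: p -> p.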
You can abbreviate :type to just :t. Then let's take your first example.
λ> :t let f = \x -> x (\y -> x y) in f
* Occurs check: cannot construct the infinite type: t2 ~ t2 -> t3
* In the first argument of `x', namely `(\ y -> x y)'
In the expression: x (\ y -> x y)
In the expression: \ x -> x (\ y -> x y)
* Relevant bindings include
x :: t2 -> t3 (bound at <interactive>:1:10)
f :: (t2 -> t3) -> t3 (bound at <interactive>:1:5)
Errk. Is your suggested f the same as that?
λ> :t let f x y = x x y in f
* Occurs check: cannot construct the infinite type:
t3 ~ t3 -> t4 -> t5
* In the first argument of `x', namely `x'
It at least looks like a similar error message. What are these t2, t3, t4, t5? Again it's GHCi using fresh names for the types, to avoid confusion.
Looking at the let f = ..., GHCi sees x is applied to something, so it gives x :: t2 -> t3 where t2 is the type of its argument, t3 is the return type. It also sees f = \x -> x (blah). So the return type of f must be whatever x returns, i.e. t3, and the argument to f is x. So f :: (t2 -> t3) -> t3.
Inside the (blah), there's x applied to something. So the something (i.e. y) must have the type of x's argument, and the result must have x's return type. I.e. (\y -> x y) :: t2 -> t3. Errk: then x's argument type must be the same as that, because x is applied to it. And the way we write 'same as' is with ~.
Then the error message tells you GHCi is trying to make sense of t2 ~ (t2 -> t3). (-> binds tighter than ~.) And if you try to substitute that equivalence for t2 into the RHS you'll get t2 ~ (((... -> t3) -> t3) -> t3) ad infinitum.
Your suggested f x y = x x y is not equivalent (the message/typing is a little different), but it also requires an infinite type, so it is likewise not allowed.

Reusing patterns in pattern guards or case expressions

My Haskell project includes an expression evaluator, which for the purposes of this question can be simplified to:
{-# LANGUAGE GADTs #-}

data Expression a where
  I   :: Int -> Expression Int
  B   :: Bool -> Expression Bool
  Add :: Expression Int -> Expression Int -> Expression Int
  Mul :: Expression Int -> Expression Int -> Expression Int
  Eq  :: Expression Int -> Expression Int -> Expression Bool
  And :: Expression Bool -> Expression Bool -> Expression Bool
  Or  :: Expression Bool -> Expression Bool -> Expression Bool
  If  :: Expression Bool -> Expression a -> Expression a -> Expression a

-- Reduces an Expression down to the simplest representation.
reduce :: Expression a -> Expression a
-- ... implementation ...
The straightforward approach to implementing this is to write a case expression to recursively evaluate and pattern match, like so:
reduce (Add x y) = case (reduce x, reduce y) of
  (I x', I y') -> I $ x' + y'
  (x', y')     -> Add x' y'
reduce (Mul x y) = case (reduce x, reduce y) of
  (I x', I y') -> I $ x' * y'
  (x', y')     -> Mul x' y'
reduce (And x y) = case (reduce x, reduce y) of
  (B x', B y') -> B $ x' && y'
  (x', y')     -> And x' y'
-- ... and similarly for other cases.
To me, that definition looks somewhat awkward, so I then rewrote the definition using pattern guards, like so:
reduce (Add x y) | I x' <- reduce x
                 , I y' <- reduce y
                 = I $ x' + y'
I think this definition looks cleaner compared to the case expression, but when defining multiple patterns for different constructors, the pattern is repeated multiple times.
reduce (Add x y) | I x' <- reduce x
                 , I y' <- reduce y
                 = I $ x' + y'
reduce (Mul x y) | I x' <- reduce x
                 , I y' <- reduce y
                 = I $ x' * y'
Noting these repeated patterns, I was hoping there would be some syntax or structure that could cut down on the repetition in the pattern matching. Is there a generally accepted method to simplify these definitions?
Edit: after reviewing the pattern guards, I've realised they don't work as a drop-in replacement here. Although they provide the same result when x and y can be reduced to I _, they do not reduce any values when the pattern guards do not match. I would still like reduce to simplify subexpressions of Add et al.
One partial solution, which I've used in a similar situation, is to extract the logic into a "lifting" function that takes a normal Haskell operation and applies it to your language's values. This abstracts over the wrapping/unwrapping and resulting error handling.
The idea is to create two typeclasses for going to and from your custom type, with appropriate error handling. Then you can use these to create a liftOp function that could look like this:
liftOp :: (Extract a, Extract b, Pack c)
       => (Expression a -> Expression b -> Expression c)  -- fallback when not fully reduced
       -> (a -> b -> c)
       -> Expression a -> Expression b -> Expression c
liftOp err op a b = case res of
    Nothing   -> err a' b'
    Just res' -> pack res'
  where
    a'  = reduce a
    b'  = reduce b
    res = do x <- extract a'
             y <- extract b'
             return $ x `op` y
Then each specific case looks like this:
Mul x y -> liftOp Mul (*) x y
Which isn't too bad: it isn't overly redundant. It encompasses the information that matters: Mul gets mapped to *, and in the error case we just apply Mul again.
You would also need instances for packing and unpacking, but these are useful anyhow. One neat trick is that these can also let you embed functions in your DSL automatically, with an instance of the form (Extract a, Pack b) => Pack (a -> b).
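For concreteness, the two classes and the Int instances might look like this (a sketch; the answer uses extract and pack but leaves their signatures open):
class Extract a where
  extract :: Expression a -> Maybe a   -- Nothing if not reduced to a literal

class Pack a where
  pack :: a -> Expression a

instance Extract Int where
  extract (I n) = Just n
  extract _     = Nothing

instance Pack Int where
  pack = I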
I'm not sure this will work exactly for your example, but I hope it gives you a good starting point. You might want to wire additional error handling through the whole thing, but the good news is that most of that gets folded into the definition of pack, unpack and liftOp, so it's still pretty centralized.
I wrote up a similar solution for a related (but somewhat different) problem. It's also a way to handle going back and forth between native Haskell values and an interpreter, but the interpreter is structured differently. Some of the same ideas should still apply though!
This answer is inspired by rampion's follow-up question, which suggests the following function:
step :: Expression a -> Expression a
step x = case x of
  Add (I x) (I y) -> I $ x + y
  Mul (I x) (I y) -> I $ x * y
  Eq  (I x) (I y) -> B $ x == y
  And (B x) (B y) -> B $ x && y
  Or  (B x) (B y) -> B $ x || y
  If  (B b) x y   -> if b then x else y
  z               -> z
step looks at a single term, and reduces it if everything needed to reduce it is present. Equipped with step, we only need a way to replace a term everywhere in the expression tree. We can start by defining a way to apply a function inside every term.
{-# LANGUAGE RankNTypes #-}
emap :: (forall a. Expression a -> Expression a) -> Expression x -> Expression x
emap f x = case x of
  I a      -> I a
  B a      -> B a
  Add x y  -> Add (f x) (f y)
  Mul x y  -> Mul (f x) (f y)
  Eq x y   -> Eq (f x) (f y)
  And x y  -> And (f x) (f y)
  Or x y   -> Or (f x) (f y)
  If x y z -> If (f x) (f y) (f z)
Now, we need to apply a function everywhere, both to the term and everywhere inside the term. There are two basic possibilities, we could apply the function to the term before applying it inside or we could apply the function afterwards.
premap :: (forall a. Expression a -> Expression a) -> Expression x -> Expression x
premap f = emap (premap f) . f
postmap :: (forall a. Expression a -> Expression a) -> Expression x -> Expression x
postmap f = f . emap (postmap f)
This gives us two possibilities for how to use step, which I will call shorten and reduce.
shorten = premap step
reduce = postmap step
These behave a little differently. shorten removes the innermost level of terms, replacing them with literals, shortening the height of the expression tree by one. reduce completely evaluates the expression tree to a literal. Here's the result of iterating each of these on the same input:
"shorten"
If (And (B True) (Or (B False) (B True))) (Add (I 1) (Mul (I 2) (I 3))) (I 0)
If (And (B True) (B True)) (Add (I 1) (I 6)) (I 0)
If (B True) (I 7) (I 0)
I 7
"reduce"
If (And (B True) (Or (B False) (B True))) (Add (I 1) (Mul (I 2) (I 3))) (I 0)
I 7
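For reference, here is that input written out as a value (a Show instance for Expression would be needed to actually print the traces):
example1 :: Expression Int
example1 =
  If (And (B True) (Or (B False) (B True)))
     (Add (I 1) (Mul (I 2) (I 3)))
     (I 0)

-- shorten example1 ==> If (And (B True) (B True)) (Add (I 1) (I 6)) (I 0)
-- reduce example1  ==> I 7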
Partial reduction
Your question implies that you sometimes expect that expressions can't be reduced completely. I'll extend your example to include something to demonstrate this case, by adding a variable, Var.
data Expression a where
  Var :: Expression Int
  ...
We will need to add support for Var to emap:
emap f x = case x of
  Var -> Var
  ...
bind will replace the variable, and evaluateFor performs a complete evaluation, traversing the expression only once.
bind :: Int -> Expression a -> Expression a
bind a x = case x of
  Var -> I a
  z   -> z
evaluateFor :: Int -> Expression a -> Expression a
evaluateFor a = postmap (step . bind a)
Now, iterating reduce on an example containing a variable produces the following output:
"reduce"
If (And (B True) (Or (B False) (B True))) (Add (I 1) (Mul Var (I 3))) (I 0)
Add (I 1) (Mul Var (I 3))
If the output expression from the reduction is evaluated for a specific value of Var, we can reduce the expression all the way to a literal.
"evaluateFor 5"
Add (I 1) (Mul Var (I 3))
I 16
Applicative
emap can instead be written in terms of an Applicative Functor, and postmap can be made into a generic piece of code suitable for other data types than expressions. How to do so is described in this answer to rampion's follow-up question.
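Here is one way such a traversal might look (my sketch of the idea, not the linked answer's exact code); emap is then the special case where the Applicative is Identity:
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Identity (Identity (..))

-- Traverse the immediate children of a term in an arbitrary Applicative
-- (including the Var constructor added above).
etraverse :: Applicative f
          => (forall a. Expression a -> f (Expression a))
          -> Expression x -> f (Expression x)
etraverse f x = case x of
  I a      -> pure (I a)
  B a      -> pure (B a)
  Var      -> pure Var
  Add a b  -> Add <$> f a <*> f b
  Mul a b  -> Mul <$> f a <*> f b
  Eq a b   -> Eq <$> f a <*> f b
  And a b  -> And <$> f a <*> f b
  Or a b   -> Or <$> f a <*> f b
  If a b c -> If <$> f a <*> f b <*> f c

-- emap recovered as the Identity special case:
emap' :: (forall a. Expression a -> Expression a) -> Expression x -> Expression x
emap' f = runIdentity . etraverse (Identity . f)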

Why is the type of this function (a -> a) -> a?

Why is the type of this function (a -> a) -> a?
Prelude> let y f = f (y f)
Prelude> :t y
y :: (t -> t) -> t
Shouldn't it be an infinite/recursive type?
I was going to try and put into words what I think it's type should be, but I just can't do it for some reason.
y :: (t -> t) -> ?WTFIsGoingOnOnTheRHS?
I don't get how f (y f) resolves to a value. The following makes a little more sense to me:
Prelude> let y f x = f (y f) x
Prelude> :t y
y :: ((a -> b) -> a -> b) -> a -> b
But it's still ridiculously confusing. What's going on?
Well, y has to be of type (a -> b) -> c, for some a, b and c we don't know yet; after all, it takes a function, f, and applies it to an argument, so it must be a function taking a function.
Since y f = f x (again, for some x), we know that the return type of y must be the return type of f itself. So, we can refine the type of y a bit: it must be (a -> b) -> b for some a and b we don't know yet.
To figure out what a is, we just have to look at the type of the value passed to f. It's y f, which is the expression we're trying to figure out the type of right now. We're saying that the type of y is (a -> b) -> b (for some a, b, etc.), so we can say that this application of y f must be of type b itself.
So, the type of the argument to f is b. Put it all back together, and we get (b -> b) -> b — which is, of course, the same thing as (a -> a) -> a.
Here's a more intuitive, but less precise view of things: we're saying that y f = f (y f), which we can expand to the equivalent y f = f (f (y f)), y f = f (f (f (y f))), and so on. So, we know that we can always apply another f around the whole thing, and since the "whole thing" in question is the result of applying f to an argument, f has to have the type a -> a; and since we just concluded that the whole thing is the result of applying f to an argument, the return type of y must be that of f itself — coming together, again, as (a -> a) -> a.
Just two points to add to other people's answers.
The function you're defining is usually called fix, and it is a fixed-point combinator: a function that computes the fixed point of another function. In mathematics, the fixed point of a function f is an argument x such that f x = x. This already allows you to infer that the type of fix has to be (a -> a) -> a; "function that takes a function from a to a, and returns an a."
You've called your function y, which seems to be named after the Y combinator, but this is an inaccurate name: the Y combinator is one specific fixed-point combinator, and it is not the same as the one you've defined here.
I don't get how f (y f) resolves to a value.
Well, the trick is that Haskell is a non-strict (a.k.a. "lazy") language. The calculation of f (y f) can terminate if f doesn't need to evaluate its y f argument in all cases. So, if you're defining factorial (as John L illustrates), fac (y fac) 1 evaluates to 1 without evaluating y fac.
Strict languages can't do this, so in those languages you cannot define a fixed-point combinator in this way. In those languages, the textbook fixed-point combinator is the applicative-order variant of Y (often called the Z combinator), whose extra eta-expansion delays the recursion behind a lambda.
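To make the contrast concrete (a sketch; fixZ is my name for the eta-expanded variant):
-- Fine in Haskell, but loops forever under strict evaluation:
fixLazy :: (a -> a) -> a
fixLazy f = f (fixLazy f)

-- Eta-expanded variant for functions: the recursive call is hidden
-- behind a lambda, so even a strict language only unfolds it on demand.
fixZ :: ((a -> b) -> a -> b) -> a -> b
fixZ f x = f (fixZ f) x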
#ehird's done a good job of explaining the type, so I'd like to show how it can resolve to a value with some examples.
f1 :: Int -> Int
f1 _ = 5
-- expansion of y applied to f1
y f1
f1 (y f1) -- definition of y
5 -- definition of f1 (the argument is ignored)
-- here's an example that uses the argument, a factorial function
fac :: (Int -> Int) -> (Int -> Int)
fac next 1 = 1
fac next n = n * next (n-1)
y fac :: Int -> Int
fac (y fac) -- def. of y
-- at this point, further evaluation requires the next argument
-- so let's try 3
fac (y fac) 3 :: Int
3 * ((y fac) 2)           -- def. of fac
3 * (fac (y fac) 2)       -- def. of y
3 * (2 * ((y fac) 1))     -- def. of fac
3 * (2 * (fac (y fac) 1)) -- def. of y
3 * (2 * 1)               -- def. of fac
You can follow the same steps with any function you like to see what will happen. Both of these examples converge to values, but that doesn't always happen.
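If you want to replay these traces yourself, here is a self-contained version of the definitions above:
y :: (a -> a) -> a
y f = f (y f)

f1 :: Int -> Int
f1 _ = 5

fac :: (Int -> Int) -> (Int -> Int)
fac next 1 = 1
fac next n = n * next (n - 1)

main :: IO ()
main = do
  print (y f1)    -- 5
  print (y fac 3) -- 6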
Let me tell you about a combinator. It's called the "fixpoint combinator" and it has the following property:
The Property: the "fixpoint combinator" takes a function f :: (a -> a) and discovers a "fixed point" x :: a of that function such that f x == x. Some implementations of the fixpoint combinator might be better or worse at "discovering", but assuming it terminates, it will produce a fixed point of the input function. Any function that satisfies The Property can be called a "fixpoint combinator".
Call this "fixpoint combinator" y. Based on what we just said, the following are true:
-- as we said, y's input is f :: a -> a, and its output is x :: a, therefore
y :: (a -> a) -> a
-- let x be the fixed point discovered by applying y to f
y f == x -- because y discovers x, a fixed point of f, per The Property
f x == x -- the behavior of a fixed point, per The Property
-- now, per substitution of "x" with "f x" in "y f == x"
y f == f x
-- again, per substitution of "x" with "y f" in the previous line
y f == f (y f)
So there you go. You have defined y in terms of the essential property of the fixpoint combinator:
y f == f (y f). Instead of assuming that y f discovers x, you can assume that x represents a divergent computation, and still come to the same conclusion (if I'm not mistaken).
Since your function satisfies The Property, we can conclude that it is a fixpoint combinator, and that the other properties we have stated, including the type, are applicable to your function.
This isn't exactly a solid proof, but I hope it provides additional insight.

Y Combinator in Haskell

Is it possible to write the Y Combinator in Haskell?
It seems like it would have an infinitely recursive type.
Y :: f -> b -> c
where f :: (f -> b -> c)
or something. Even a simple slightly factored factorial
factMaker _ 0 = 1
factMaker fn n = n * ((fn fn) (n - 1))
{- to be called as
(factMaker factMaker) 5
-}
fails with "Occurs check: cannot construct the infinite type: t = t -> t2 -> t1"
(The Y combinator looks like this
(define Y
(lambda (X)
((lambda (procedure)
(X (lambda (arg) ((procedure procedure) arg))))
(lambda (procedure)
(X (lambda (arg) ((procedure procedure) arg)))))))
in Scheme.)
Or, more succinctly (if you prefer short variable names) as
(λ (f) ((λ (x) (f (λ (a) ((x x) a))))
(λ (x) (f (λ (a) ((x x) a))))))
for the applicative order, and
(λ (f) ((λ (x) (f (x x)))
(λ (x) (f (x x)))))
which is just an eta contraction away, for the lazy version.
Here's a non-recursive definition of the y-combinator in Haskell:
newtype Mu a = Mu (Mu a -> a)
y f = (\h -> h $ Mu h) (\x -> f . (\(Mu g) -> g) x $ x)
hat tip
The Y combinator can't be typed using Hindley-Milner types, the polymorphic lambda calculus on which Haskell's type system is based. You can prove this by appeal to the rules of the type system.
I don't know if it's possible to type the Y combinator by giving it a higher-rank type. It would surprise me, but I don't have a proof that it's not possible. (The key would be to identify a suitably polymorphic type for the lambda-bound x.)
If you want a fixed-point operator in Haskell, you can define one very easily because in Haskell, let-binding has fixed-point semantics:
fix :: (a -> a) -> a
fix f = f (fix f)
You can use this in the usual way to define functions and even some finite or infinite data structures.
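For instance (standard examples, not from the report):
-- A function defined with fix: rec stands for the function being defined.
factorial :: Integer -> Integer
factorial = fix (\rec n -> if n <= 1 then 1 else n * rec (n - 1))

-- An infinite data structure defined with fix: the list that is 1
-- followed by itself.
ones :: [Integer]
ones = fix (1 :)

-- factorial 5 == 120, take 3 ones == [1,1,1]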
It is also possible to use functions on recursive types to implement fixed points.
If you're interested in programming with fixed points, you want to read Bruce McAdam's technical report That About Wraps it Up.
The canonical definition of the Y combinator is as follows:
y = \f -> (\x -> f (x x)) (\x -> f (x x))
But it doesn't type check in Haskell because of the x x, since it would require an infinite type:
x :: a -> b -- x is a function
x :: a -- x is applied to x
--------------------------------
a = a -> b -- infinite type
If the type checker allowed such recursive types, its unification step could loop forever; the occurs check is what rules them out.
But the Y combinator will work if you force it to typecheck, e.g. by using unsafeCoerce :: a -> b:
import Unsafe.Coerce
y :: (a -> a) -> a
y = \f -> (\x -> f (unsafeCoerce x x)) (\x -> f (unsafeCoerce x x))
main = putStrLn $ y ("circular reasoning works because " ++)
This is unsafe (obviously). rampion's answer demonstrates a safer way to write a fixpoint combinator in Haskell without using recursion.
Oh, this wiki page and this Stack Overflow answer seem to answer my question. I will write up more of an explanation later.
Now, I've found something interesting about that Mu type. Consider S = Mu Bool.
data S = S (S -> Bool)
If one treats S as a set and that equals sign as isomorphism, then the equation becomes
S ⇋ S -> Bool ⇋ Powerset(S)
So S is the set of sets that are isomorphic to their powerset!
But we know from Cantor's diagonal argument that the cardinality of Powerset(S) is always strictly greater than the cardinality of S, so they are never isomorphic.
I think this is why you can now define a fixed-point operator, even though you can't define one without such a type.
Just to make rampion's code more readable:
-- Mu :: (Mu a -> a) -> Mu a
newtype Mu a = Mu (Mu a -> a)
w :: (Mu a -> a) -> a
w h = h (Mu h)
y :: (a -> a) -> a
y f = w (\(Mu x) -> f (w x))
-- y f = f (y f)
in which w stands for the omega combinator w = \x -> x x, and y stands for the Y combinator y = \f -> w (f . w).
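As a quick sanity check that this y behaves like the recursive fix (using the definitions above):
main :: IO ()
main = do
    print (take 5 (y (1 :)))  -- [1,1,1,1,1]
    print (y fac 5)           -- 120
  where
    fac rec n = if n <= 1 then 1 else n * rec (n - 1 :: Int)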
