Type signatures that never make sense - haskell

Consider
(a->a) -> [a] -> Bool
Is there any meaningful definition for this signature? That is, a definition that does not simply ignore the argument?
x -> [a] -> Bool
It seems there are many such signatures that can be ruled out immediately.

Carsten König suggested in a comment to use the free theorem. Let's try that.
Prime the cannon
We start by generating the free theorem corresponding to the type (a->a) -> [a] -> Bool. This is a property that every function with that type must satisfy, as established by Wadler's famous paper Theorems for free!.
forall t1,t2 in TYPES, R in REL(t1,t2).
forall p :: t1 -> t1.
forall q :: t2 -> t2.
(forall (x, y) in R. (p x, q y) in R)
==> (forall (z, v) in lift{[]}(R). f_{t1} p z = f_{t2} q v)
lift{[]}(R)
= {([], [])}
u {(x : xs, y : ys) | ((x, y) in R) && ((xs, ys) in lift{[]}(R))}
An example
To better understand the theorem above, let's run over a concrete example. To use the theorem, we need to take any two types t1,t2, so we can pick t1=Bool and t2=Int.
Then we need to choose a function p :: Bool -> Bool (say p=not), and a function q :: Int -> Int (say q = \x -> 1-x).
Now, we need to define a relation R between Bools and Ints. Let's take the standard boolean <-> integer correspondence, i.e.:
R = {(False,0),(True,1)}
(the above is a one-one correspondence, but it does not have to be, in general).
Now we need to check that (forall (x, y) in R. (p x, q y) in R). We only have two cases to check for (x,y) in R:
Case (x,y) = (False,0): we verify that (not False, 1-0) = (True, 1) in R (ok!)
Case (x,y) = (True ,1): we verify that (not True , 1-1) = (False,0) in R (ok!)
So far so good. Now we need to "lift" the relation so that it works on lists: e.g.
[True,False,False,False] is in relation with [1,0,0,0]
This extended relation is the one named lift{[]}(R) above.
Finally, the theorem states that, for any function f :: (a->a) -> [a] -> Bool we must have
f_Bool not [True,False,False,False] = f_Int (\x->1-x) [1,0,0,0]
where above f_Bool simply makes it explicit that f is used in the specialised case in which a=Bool.
The power of this lies in that we do not know what the code of f actually is. We are deducing what f must satisfy by only looking at its polymorphic type.
Since we get types from type inference, and we can turn types into theorems, we really get "theorems for free!".
Back to the original goal
We want to prove that f does not use its first argument, and that it does not care about its second list argument, either, except for its length.
For this, take R to be the universally true relation. Then, lift{[]}(R) is a relation which relates two lists iff they have the same length.
The theorem then implies:
forall t1,t2 in TYPES.
forall p :: t1 -> t1.
forall q :: t2 -> t2.
forall z :: [t1].
forall v :: [t2].
length z = length v ==> f_{t1} p z = f_{t2} q v
Hence, f ignores the first argument and only cares about the length of the second one.
QED
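As a sanity check of this conclusion, here is a minimal sketch. The names sample, lhs and rhs are hypothetical and not part of the question; sample is just one possible inhabitant of the type, and the theorem predicts that it must agree on any two lists of equal length, whatever the first argument is.

```haskell
-- A hypothetical inhabitant of the type: it can only look at the list's shape.
sample :: (a -> a) -> [a] -> Bool
sample _ xs = even (length xs)

-- The instance from the worked example: t1 = Bool, t2 = Int.
lhs, rhs :: Bool
lhs = sample not [True, False, False, False]
rhs = sample (\x -> 1 - x) [1, 0, 0, 0 :: Int]
```

Both lists have length 4, so lhs and rhs are forced to be equal, as the free theorem states.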

You can't do anything interesting with x on its own.
You can do stuff with [x]; for example, you can count how many nodes are in the list. So, for example,
foo :: (a -> a) -> [a] -> Bool
foo _ [] = True
foo _ (_:_) = False
bar :: x -> [a] -> Bool
bar _ [] = True
bar _ (_:_) = False
If you have an x and a function that turns an x into something else, you can do interesting stuff:
big :: (x -> Int) -> x -> Bool
big f n = f n > 10
If x belongs to some type class, then you can use all the methods of that class on it. (This is really a special-case of the previous one.)
double :: Num x => x -> x
double = (2*)
On the other hand, there are plenty of type signatures for which no valid functions exist:
magic :: x -> y
magic = -- erm... good luck with that!
I read somewhere that the type signatures involving only variables for which a real function exists are exactly the logical theorems that are true. (I don't know the name for this property, but it's quite interesting.)
f1 :: (x -> y) -> x -> y
-- Given that X implies Y, and given that X is true, then Y is true.
-- Well, duh.
f2 :: Either (x -> y) (x -> z) -> x -> Either y z
-- Given that X implies Y or X implies Z, and given X, then either Y or Z is true.
-- Again, duh.
f3 :: x -> y
-- Given that X is true, then any Y is true.
-- Erm, no. Just... no.
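This link between types and propositions is commonly known as the Curry-Howard correspondence. As a small sketch of it, the two provable signatures above can be given total definitions, and the definitions read like the proofs. The bodies below are my own; only the signatures come from the text. No total definition of f3 can be written.

```haskell
-- Modus ponens: apply the implication to the evidence for X.
f1 :: (x -> y) -> x -> y
f1 g a = g a

-- Case analysis on which implication we were handed.
f2 :: Either (x -> y) (x -> z) -> x -> Either y z
f2 (Left g)  a = Left (g a)
f2 (Right h) a = Right (h a)
```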

Related

Why `f x = x x` and `g x = x x x x x` have the same type

I'm playing with Rank-N types and trying to type x x. But I find it counterintuitive that these two functions can be typed in the same way.
f :: (forall a b. a -> b) -> c
f x = x x
g :: (forall a b. a -> b) -> c
g x = x x x x x
I have also noticed that something like f x = x x ... x x (many xs) still has the same type.
Can anyone explain why it is the case?
The key is that x :: a -> b is a function that can provide a value of any type, no matter what argument is given. That means x can be applied to itself, and the result can be applied to x again, and so on and so on.
At least, that's what it promises the type checker it can do. The type checker isn't concerned about whether or not any such value exists, only that the types align. Neither f nor g can actually be called, because no such value of type a -> b exists (ignoring bottom and unsafeCoerce).
A simpler example
This is a phenomenon that can be observed whenever we use a variable which has a polymorphic type (like your x). The identity function id is perhaps the most famous example.
id :: forall a . a -> a
Here, all these expressions type check, and have type Int -> Int:
id :: Int -> Int
id id :: Int -> Int
id id id :: Int -> Int
id id id id :: Int -> Int
...
How is that possible? Well, the crux is that each time we write id we actually mean "the identity function on some unknown type a that should be inferred from the context". Crucially, each use of id has its own a.
Let's write id #T to mean the specific identity function on type T.
Writing
id :: Int -> Int
actually means
id #Int :: Int -> Int
which is straightforward. Instead, writing
id id :: Int -> Int
actually means
id #(Int -> Int) (id #Int) :: Int -> Int
where the first id now refers to the function space Int -> Int! And, of course,
id id id :: Int -> Int
means
(id #((Int -> Int) -> (Int -> Int))) (id #(Int -> Int)) (id #Int) :: Int -> Int
And so on. We do not realize that types get that messy since Haskell infers those for us.
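The id #T notation used here is not Haskell syntax, but GHC's TypeApplications extension offers something close, written id @T. A minimal sketch that spells out the instantiations by hand (idTwice and idThrice are illustrative names):

```haskell
{-# LANGUAGE TypeApplications #-}

-- id id, with each instantiation made explicit.
idTwice :: Int -> Int
idTwice = id @(Int -> Int) (id @Int)

-- id id id, with each instantiation made explicit.
idThrice :: Int -> Int
idThrice = id @((Int -> Int) -> (Int -> Int)) (id @(Int -> Int)) (id @Int)
```

Removing the @ annotations leaves the compiler to infer exactly these types.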
The specific case
In your specific case,
g :: (forall a b. a -> b) -> c
g x = x x x x x
we can make that type check in many ways. A possible way is to define A ~ Int, B ~ Bool, T ~ (A -> B) and then infer:
g x = x #T #(T -> T -> T -> c) (x #A #B) (x #A #B) (x #A #B) (x #A #B)
I suggest spending some time convincing yourself that everything type checks. (Moreover our choices of A and B are completely arbitrary, and we could use any other types there. We could even use distinct As and Bs for each x, as long as the first x is suitably instantiated!)
It is then obvious that such inference is also possible even when x x x ... is a longer sequence.
This shouldn't really be any more surprising than the fact that
m :: (∀ a . a) -> (∀ a . a) -> (Int, Bool)
m p q = (p, q)
has the same type as
n :: (∀ a . a) -> (∀ a . a) -> (Int, Bool)
n p q = (q, p)
Much like in your example, this works because the universally-quantified argument can be used in lots of different ways, with the compiler in each case choosing an appropriate type and making the argument behave as if it had that type.
This is actually a rather contrived situation because types like ∀ a . a or ∀ a b . a->b are uninhabited (modulo ⊥), so you would never actually be able to use a RankN function with such an argument; pragmatically you wouldn't even write it either then!
Practically useful RankN functions usually impose some extra structure or typeclass constraint in their arguments, like
foo :: (∀ a . [a] -> [a]) -> ...
or
qua :: (∀ n . Num n => Int -> n -> n) -> ...
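As an illustration of a rank-2 argument that is actually inhabited, here is a minimal sketch. The function applyToBoth is a hypothetical example, not taken from the answers; it uses its argument at two different element types, which is only possible because the argument is polymorphic.

```haskell
{-# LANGUAGE RankNTypes #-}

-- The argument h must work for every element type, so it can be
-- applied to a list of Ints and to a String in the same body.
applyToBoth :: (forall a. [a] -> [a]) -> ([Int], String) -> ([Int], String)
applyToBoth h (xs, s) = (h xs, h s)
```

For example, applyToBoth reverse ([1,2,3], "abc") reverses both components.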

Two functions seem equal but different in Haskell

I'm trying to implement boolean without Prelude in Haskell.
When the expression beq true true "TRUE" "FALSE" is evaluated, it's okay. But when I try to evaluate beq' true true "TRUE" "FALSE", it fails with a mismatch between the expected type and the actual type.
This is the code.
import qualified Prelude as P
i = \x -> x
k = \x y -> x
ki = k i
true = k
false = ki
not = \p -> p false true
beq = \p q -> p (q true false) (q false true)
beq' = \p q -> p q (not q)
So I checked the inferred types of these.
*Main> :type beq
beq
:: (t1 -> t1 -> t2)
-> ((p1 -> p1 -> p1) -> (p2 -> p2 -> p2) -> t1) -> t2
*Main> :type beq'
beq'
:: (((p1 -> p2 -> p2) -> (p3 -> p4 -> p3) -> t1) -> t1 -> t2)
-> ((p1 -> p2 -> p2) -> (p3 -> p4 -> p3) -> t1) -> t2
And it was not equal.
Here are the questions.
I thought they would have the same type signature, because beq and beq' seemingly produce the same result once unfolded and substituted, just as there are many ways to implement one function. But they don't. Are there some secret rules and syntax in Haskell?
If I want to write beq with the function not, how can I make it work?
How to fix the encoding
Church encodings work very well in an untyped calculus.
When types are added, things get more complicated. With simple types, for instance, the encodings are lost. With polymorphism they can be recovered, if higher ranks are supported. Note that type inference can't work well with higher-rank types, so some explicit type annotation is needed.
For example, your not should be written as:
{-# LANGUAGE RankNTypes #-}
type ChBool = forall a. a -> a -> a
not :: ChBool -> ChBool
not f x y = f y x
It is important that boolean values are modeled as polymorphic functions, since otherwise they can only be used on a single type, making many examples fail. For instance, consider
foo :: ChBool -> (Int, String)
foo b = (b 3 2, b "aa" "bb")
Here b needs to be used twice, once on Ints and once on Strings. If ChBool is not polymorphic this will fail.
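Putting these pieces together, here is a minimal sketch of how beq' could be written with not once the booleans carry an explicit polymorphic type. The helper toBool and the exact shape of the definitions are my own assumptions; only the names true, false, not and beq' come from the question.

```haskell
{-# LANGUAGE RankNTypes #-}
import Prelude hiding (not)

type ChBool = forall a. a -> a -> a

true, false :: ChBool
true  t _ = t
false _ f = f

not :: ChBool -> ChBool
not b = \t f -> b f t

-- Equality of booleans: if p then q else (not q).
beq' :: ChBool -> ChBool -> ChBool
beq' p q = p q (not q)

-- Hypothetical helper to observe a Church boolean as a Prelude Bool.
toBool :: ChBool -> Bool
toBool b = b True False
```

With the annotations in place, beq' true true "TRUE" "FALSE" is accepted, because q is no longer forced to a single instantiation.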
Why beta-reduction changes the inferred type
Further, you seem to be convinced that you can beta-reduce Haskell expressions and the inferred type before and after the reduction must be the same. That's not true, in general, as you discovered in your experiments. To see why, here's a simple example:
id1 x = x
The inferred type here is id1 :: forall a. a -> a, obviously. Consider instead this variant:
id2 x = (\ _ -> x) e
Note that id2 beta-reduces to id1, whatever e might be. By choosing e carefully, though, we can restrict the type of x. E.g. let's choose e = x "hello"
id2 x = (\ _ -> x) (x "hello")
Now, the inferred type is id2 :: forall b. (String -> b) -> String -> b since x can only be a String-accepting function. It does not matter that e will not be evaluated, the type inference algorithm will make that well-typed anyway. This makes the inferred type of id2 differ from the one of id1.
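The two definitions can be placed side by side with their inferred types written out as signatures. This is only a restatement of the example above in compilable form:

```haskell
id1 :: a -> a
id1 x = x

-- Beta-reduces to id1, yet the unused argument (x "hello")
-- still constrains x to be a function on Strings.
id2 :: (String -> b) -> String -> b
id2 x = (\_ -> x) (x "hello")
```

Giving id2 the signature a -> a instead is rejected by the type checker, even though the two functions compute the same thing.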

How do I use the Church encoding for Free Monads?

I've been using the Free datatype in Control.Monad.Free from the free package. Now I'm trying to convert it to use F in Control.Monad.Free.Church but can't figure out how to map the functions.
For example, a simple pattern matching function using Free would look like this -
-- Pattern match Free
matchFree
  :: (a -> r)
  -> (f (Free f a) -> r)
  -> Free f a
  -> r
matchFree kp _ (Pure a) = kp a
matchFree _ kf (Free f) = kf f
I can easily convert it to a function that uses F by converting to/from Free -
-- Pattern match F (using toF and fromF)
matchF
  :: Functor f
  => (a -> r)
  -> (f (F f a) -> r)
  -> F f a
  -> r
matchF kp kf = matchF' . fromF
  where
    matchF' (Pure a) = kp a
    matchF' (Free f) = kf (fmap toF f)
However I can't figure out how to get it done without using toF and fromF -
-- Pattern match F (without using toF)???
-- Doesn't compile
matchF
  :: Functor f
  => (a -> r)
  -> (f (F f a) -> r)
  -> F f a
  -> r
matchF kp kf f = f kp kf
There must be a general pattern I am missing. Can you help me figure it out?
You asked for the "general pattern you are missing". Let me give my own attempt at explaining it, though Petr Pudlák's answer is also pretty good. As user3237465 says, there are two encodings that we can use, Church and Scott, and you're using Scott rather than Church. So here's the general review.
How encodings work
By continuation passing, we can describe any value of type x by some unique function of type
data Identity x = Id { runId :: x }
{- ~ - equivalent to - ~ -}
newtype IdentityFn x = IdFn { runIdFn :: forall z. (x -> z) -> z }
The "forall" here is very important, it says that this type leaves z as an unspecified parameter. The bijection is that Id . ($ id) . runIdFn goes from IdentityFn to Identity while IdFn . flip ($) . runId goes the other way. The equivalence comes because there is essentially nothing one can do with the type forall z. z, no manipulations are sufficiently universal. We can equivalently state that newtype UnitFn = UnitFn { runUnitFn :: forall z. z -> z } has only one element, namely UnitFn id, which means that it corresponds to the unit type data Unit = Unit in a similar way.
Now the currying observation that (x, y) -> z is isomorphic to x -> y -> z is the tip of a continuation-passing iceberg which allows us to represent data structures in terms of pure functions, with no data structures, because clearly the type Identity (x, y) is equivalent therefore to forall z. (x -> y -> z) -> z. So "gluing" together two items is the same as creating a value of this type, which just uses pure functions as "glue".
To see this equivalence, we have to just handle two other properties.
The first is sum-type constructors, in the form of Either x y -> z. A function of type Either x y -> z is the same thing as a pair of handlers (x -> z, y -> z), so Either x y itself is isomorphic to
newtype EitherFn x y = EitherFn { runEitherFn :: forall z. (x -> z) -> (y -> z) -> z }
from which we get the basic idea of the pattern:
Take a fresh type variable z that does not appear in the body of the expression.
For each constructor of the data type, create a function-type which takes all of its type-arguments as parameters, and returns a z. Call these "handlers" corresponding to the constructors. So the handler for (x, y) is (x, y) -> z which we curry to x -> y -> z, and the handlers for Left x | Right y are x -> z and y -> z. If there are no parameters, you can just take a value z as your function rather than the more cumbersome () -> z.
Take all of those handlers as parameters to an expression forall z. Handler1 -> Handler2 -> ... -> HandlerN -> z.
One half of the isomorphism is basically just to hand the constructors in as the desired handlers; the other pattern-matches on the constructors and applies the corresponding handlers.
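For Either, the two halves of the isomorphism can be sketched as follows. The function names toEitherFn and fromEitherFn are mine; the newtype is the one given above.

```haskell
{-# LANGUAGE RankNTypes #-}

newtype EitherFn x y = EitherFn
  { runEitherFn :: forall z. (x -> z) -> (y -> z) -> z }

-- Pattern-match on the constructor and apply the corresponding handler.
toEitherFn :: Either x y -> EitherFn x y
toEitherFn (Left x)  = EitherFn (\onLeft _ -> onLeft x)
toEitherFn (Right y) = EitherFn (\_ onRight -> onRight y)

-- Hand the real constructors in as the handlers.
fromEitherFn :: EitherFn x y -> Either x y
fromEitherFn e = runEitherFn e Left Right
```

Note that runEitherFn applied to two handlers behaves like the Prelude function either with its arguments reordered.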
Subtle missing things
Again, it's fun to apply these rules to various things; for example as I noted above, if you apply this to data Unit = Unit you find that any unit type is the identity function forall z. z -> z, and if you apply this to data Bool = False | True you find the logic functions forall z. z -> z -> z where false = const while true = const id. But if you do play with it you will notice that something's missing still. Hint: if we look at
data List x = Nil | Cons x (List x)
we see that the pattern should look like:
data ListFn x = ListFn { runListFn :: forall z. z -> (x -> ??? -> z) -> z }
for some ???. The above rules don't pin down what goes there.
There are two good options: either we use the power of the newtype to its fullest to put ListFn x there (the "Scott" encoding), or we can preemptively reduce it with the functions we've been given, in which case it becomes a z using the functions that we already have (the "Church" encoding). Now since the recursion is already being performed for us up-front, the Church encoding is only perfectly equivalent for finite data structures; the Scott encoding can handle infinite lists and such. It can also be hard to understand how to encode mutual recursion in the Church form whereas the Scott form is usually a little more straightforward.
Anyway, the Church encoding is a little harder to think about, but a little more magical because we get to approach it with wishful thinking: "assume that this z is already whatever you're trying to accomplish with tail list, then combine it with head list in the appropriate way." And this wishful thinking is precisely why people have trouble understanding foldr, as the one side of this bijection is precisely the foldr of the list.
There are some other problems like "what if, like Int or Integer, the number of constructors is big or infinite?". The answer to this particular question is to use the functions
data IntFn = IntFn { runIntFn :: forall z. (z -> z) -> z -> z }
What is this, you ask? Well, a smart person (Church) has worked out that this is a way to represent integers as the repetition of composition:
zero f x = x
one f x = f x
two f x = f (f x)
{- ~ - increment an `n` to `n + 1` - ~ -}
succ n f = f . n f
Actually on this account m . n is the product of the two. But I mention this because it is not too hard to insert a () and flip arguments around to find that this is actually forall z. z -> (() -> z -> z) -> z which is the list type [()], with values given by length and addition given by ++ and multiplication given by >>.
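A small sketch of these claims, using the IntFn type from above. The helpers fromInt, toInt, add, mul and asUnits are names I introduce for illustration, and fromInt assumes a non-negative argument.

```haskell
{-# LANGUAGE RankNTypes #-}

newtype IntFn = IntFn { runIntFn :: forall z. (z -> z) -> z -> z }

-- Assumes n >= 0: apply f exactly n times.
fromInt :: Int -> IntFn
fromInt n = IntFn (\f x -> iterate f x !! n)

toInt :: IntFn -> Int
toInt n = runIntFn n (+ 1) 0

-- Addition: run n's repetitions first, then m's.
add :: IntFn -> IntFn -> IntFn
add m n = IntFn (\f x -> runIntFn m f (runIntFn n f x))

-- Multiplication is composition: repeat "n repetitions of f" m times.
mul :: IntFn -> IntFn -> IntFn
mul m n = IntFn (\f -> runIntFn m (runIntFn n f))

-- The same number viewed as a list of units.
asUnits :: IntFn -> [()]
asUnits n = runIntFn n (() :) []
```

On the list side, length (asUnits m ++ asUnits n) matches add and length (asUnits m >> asUnits n) matches mul.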
For greater efficiency, you might Church-encode data PosNeg x = Neg x | Zero | Pos x and use the Church encoding (keeping it finite!) of [Bool] to form the Church encoding of PosNeg [Bool] where each [Bool] implicitly ends with an unstated True at its most-significant bit at the end, so that [Bool] represents the numbers from +1 to infinity.
An extended example: BinLeaf / BL
One more nontrivial example, we might think about the binary tree which stores all of its information in leaves, but also contains annotations on the internal nodes: data BinLeaf a x = Leaf x | Bin a (BinLeaf a x) (BinLeaf a x). Following the recipe for Church encoding we do:
newtype BL a x = BL { runBL :: forall z. (x -> z) -> (a -> z -> z -> z) -> z}
Now instead of Bin "Hello" (Leaf 3) (Bin "What's up?" (Leaf 4) (Leaf 5)) we construct instances in lowercase:
BL $ \leaf bin -> bin "Hello" (leaf 3) (bin "What's up?" (leaf 4) (leaf 5))
The isomorphism is thus very easy one way: binleafFromBL f = runBL f Leaf Bin. The other side has a case dispatch, but is not too bad.
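Both directions can be sketched like this. binleafToBL and sampleTree are names I made up for the example; the other definitions follow the text.

```haskell
{-# LANGUAGE RankNTypes #-}

data BinLeaf a x = Leaf x | Bin a (BinLeaf a x) (BinLeaf a x)
  deriving (Eq, Show)

newtype BL a x = BL { runBL :: forall z. (x -> z) -> (a -> z -> z -> z) -> z }

-- Easy direction: hand the real constructors in as handlers.
binleafFromBL :: BL a x -> BinLeaf a x
binleafFromBL f = runBL f Leaf Bin

-- Other direction: case dispatch, folding the subtrees with the same handlers.
binleafToBL :: BinLeaf a x -> BL a x
binleafToBL t = BL (\leaf bin ->
    let go (Leaf x)    = leaf x
        go (Bin a l r) = bin a (go l) (go r)
    in  go t)

sampleTree :: BinLeaf String Int
sampleTree = Bin "Hello" (Leaf 3) (Bin "What's up?" (Leaf 4) (Leaf 5))
```

Once a tree is in BL form, a fold such as counting leaves is just runBL t (const 1) (\_ l r -> l + r).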
What about recursive algorithms on the recursive data? This is where it gets magical: foldr and runBL of Church encoding have both run whatever our functions were on the subtrees before we get to the trees themselves. Suppose for example that we want to emulate this function:
sumAnnotate :: (Num n) => BinLeaf a n -> BinLeaf (n, a) n
sumAnnotate (Leaf n) = Leaf n
sumAnnotate (Bin a x y) = Bin (getn x' + getn y', a) x' y'
  where x' = sumAnnotate x
        y' = sumAnnotate y
        getn (Leaf n) = n
        getn (Bin (n, _) _ _) = n
What do we have to do?
-- pseudo-constructors for BL a x.
makeLeaf :: x -> BL a x
makeLeaf x = BL $ \leaf _ -> leaf x
makeBin :: a -> BL a x -> BL a x -> BL a x
makeBin a l r = BL $ \leaf bin -> bin a (runBL l leaf bin) (runBL r leaf bin)
-- actual function
sumAnnotate' :: (Num n) => BL a n -> BL (n, a) n
sumAnnotate' f = runBL f makeLeaf (\a x y -> makeBin (getn x + getn y, a) x y) where
  getn t = runBL t id (\(n, _) _ _ -> n)
We pass in a function \a x y -> ... :: (Num n) => a -> BL (n, a) n -> BL (n, a) n -> BL (n, a) n. Notice that the two "arguments" are of the same type as the "output" here. With Church encoding, we have to program as if we've already succeeded -- a discipline called "wishful thinking".
The Church encoding for the Free monad
The Free monad has normal form
data Free f x = Pure x | Roll (f (Free f x))
and our Church encoding procedure says that this becomes:
newtype Fr f x = Fr {runFr :: forall z. (x -> z) -> (f z -> z) -> z}
Your function
matchFree p _ (Pure x) = p x
matchFree _ f (Free x) = f x
becomes simply
matchFree' p f fr = runFr fr p f
Let me describe the difference for a simpler scenario - lists. Let's focus on how one can consume lists:
By a catamorphism, which essentially means that we can express it using
foldr :: (a -> r -> r) -> r -> [a] -> r
As we can see, the folding functions never get hold of the list tail, only its processed value.
By pattern matching we can do somewhat more, in particular we can construct a generalized fold of type
foldrGen :: (a -> [a] -> r) -> r -> [a] -> r
It's easy to see that one can express foldr using foldrGen. However, as foldrGen isn't recursive, this expression involves recursion.
To generalize both concepts, we can introduce
foldrPara :: (a -> ([a], r) -> r) -> r -> [a] -> r
which gives the consuming function even more power: Both the reduced value of the tail, as well as the tail itself. Clearly this is more generic than both previous ones. This corresponds to a paramorphism which “eats its argument and keeps it too”.
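To make the remark about foldrGen concrete, here is a minimal sketch. The definition of foldrGen is an assumption consistent with its stated type, and foldrViaGen is a name I introduce. It shows that recovering foldr from the non-recursive foldrGen requires writing the recursion explicitly.

```haskell
-- One-step case analysis: the handler sees the head and the untouched tail.
foldrGen :: (a -> [a] -> r) -> r -> [a] -> r
foldrGen _ z []       = z
foldrGen f _ (x : xs) = f x xs

-- foldr expressed through foldrGen; the recursion lives in `go`.
foldrViaGen :: (a -> r -> r) -> r -> [a] -> r
foldrViaGen f z = go
  where
    go = foldrGen (\x xs -> f x (go xs)) z
```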
But it's also possible to do it the other way round. Even though paramorphisms are more general, they can be expressed using catamorphisms (at some overhead cost) by re-creating the original structure on the way:
foldrPara :: (a -> ([a], r) -> r) -> r -> [a] -> r
foldrPara f z = snd . foldr f' ([], z)
  where
    f' x t@(xs, r) = (x : xs, f x t)
Now Church-encoded data structures encode the catamorphism pattern, for lists it's everything that can be constructed using foldr:
newtype List a = L (forall r . r -> (a -> r -> r) -> r)
nil :: List a
nil = L $ \n _ -> n
cons :: a -> List a -> List a
cons x (L xs) = L $ \n c -> c x (xs n c)
fromL :: List a -> [a]
fromL (L f) = f [] (:)
toL :: [a] -> List a
toL xs = L (\n c -> foldr c n xs)
In order to see the sub-lists, we have to take the same approach: re-create them on the way:
foldrParaL :: (a -> (List a, r) -> r) -> r -> List a -> r
foldrParaL f z (L l) = snd $ l (nil, z) f'
  where
    f' x t@(xs, r) = (x `cons` xs, f x t)
This applies generally to Church-encoded data structures, like to the encoded free monad. They express catamorphisms, that is folding without seeing the parts of the structure, only with the recursive results. To get hold of sub-structures during the process, we need to recreate them on the way.
Your
matchF
  :: Functor f
  => (a -> r)
  -> (f (F f a) -> r)
  -> F f a
  -> r
looks like the Scott-encoded Free monad. The Church-encoded version is just
matchF
  :: Functor f
  => (a -> r)
  -> (f r -> r)
  -> F f a
  -> r
matchF kp kf f = runF f kp kf
Here are Church- and Scott-encoded lists for comparison:
newtype Church a = Church { runChurch :: forall r. (a -> r -> r) -> r -> r }
newtype Scott a = Scott { runScott :: forall r. (a -> Scott a -> r) -> r -> r }
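To see what the Scott encoding buys, here is a minimal sketch in which taking the tail is a single case analysis with no traversal. The helpers nilS, consS, tailS and toListS are illustrative names, not from any library.

```haskell
{-# LANGUAGE RankNTypes #-}

newtype Scott a = Scott { runScott :: forall r. (a -> Scott a -> r) -> r -> r }

nilS :: Scott a
nilS = Scott (\_ n -> n)

consS :: a -> Scott a -> Scott a
consS x xs = Scott (\c _ -> c x xs)

-- The cons handler receives the untouched tail, so this is one step.
tailS :: Scott a -> Maybe (Scott a)
tailS xs = runScott xs (\_ rest -> Just rest) Nothing

-- Any fold over a Scott list needs explicit recursion.
toListS :: Scott a -> [a]
toListS xs = runScott xs (\x rest -> x : toListS rest) []
```

The trade-off is the mirror image of the Church case: pattern matching is direct, but folding has to be written recursively by hand.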
It's a bit of a nasty one. This problem is a more general version of a puzzle everyone struggles with the first time they're exposed to it: defining the predecessor of a natural number encoded as a Church numeral (think: Nat ~ Free Id ()).
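For reference, one common way to solve the predecessor puzzle is sketched below. It is not taken from this answer; CNat, fromInt, toInt and predC are names I chose, and fromInt assumes a non-negative argument. The fold carries a pair holding the current value and the value one step behind.

```haskell
{-# LANGUAGE RankNTypes #-}

newtype CNat = CNat { runCNat :: forall r. (r -> r) -> r -> r }

-- Assumes n >= 0.
fromInt :: Int -> CNat
fromInt n = CNat (\s z -> iterate s z !! n)

toInt :: CNat -> Int
toInt n = runCNat n (+ 1) 0

-- Fold over pairs (current, previous); the predecessor of zero is zero.
predC :: CNat -> CNat
predC n = CNat (\s z -> snd (runCNat n (\(cur, _) -> (s cur, cur)) (z, z)))
```

The whole numeral is traversed just to drop one successor, which is the cost the rest of this answer runs into as well.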
I've split my module into a lot of intermediate definitions to highlight the solution's structure. I've also uploaded a self-contained gist for ease of use.
I start with nothing exciting: redefining F given that I don't have this package installed at the moment.
{-# LANGUAGE Rank2Types #-}
module MatchFree where
newtype F f a = F { runF :: forall r. (a -> r) -> (f r -> r) -> r }
Now, even before considering pattern-matching, we can start by defining the counterpart of the usual datatype's constructors:
pureF :: a -> F f a
pureF a = F $ const . ($ a)
freeF :: Functor f => f (F f a) -> F f a
freeF f = F $ \ pr fr -> fr $ fmap (\ inner -> runF inner pr fr) f
Next, I'm introducing two types: Open and Close. Close is simply the F type but Open corresponds to having observed the content of an element of F f a: it's Either a pure a or an f (F f a).
type Open f a = Either a (f (F f a))
type Close f a = F f a
As hinted by my hand-wavy description, these two types are actually equivalent and we can indeed write functions converting back and forth between them:
close :: Functor f => Open f a -> Close f a
close = either pureF freeF
open :: Functor f => Close f a -> Open f a
open f = runF f Left (Right . fmap close)
Now, we can come back to your problem and the course of action should be pretty clear: open the F f a and then apply either kp or kf depending on what we got. And it indeed works:
matchF
  :: Functor f
  => (a -> r)
  -> (f (F f a) -> r)
  -> F f a
  -> r
matchF kp kf = either kp kf . open
Coming back to the original comment about natural numbers: predecessor implemented using Church numerals is linear in the size of the natural number when we could reasonably expect a simple case analysis to be constant time. Well, just like for natural numbers, this case analysis is pretty expensive because, as shown by the use of runF in the definition of open, the whole structure is traversed.

Representing Integers as Functions (Church Numerals?)

Given the following function definition and assuming similar definitions for all positive integers give the type definition and code for a function called plus that will take as arguments two such functions representing integers and return a function that represents the sum of the two input integers. E.g. (plus one two) should evaluate to a function that takes two arguments f x and returns (f(f(f x))).
one f x = f x
two f x = f (f x)
three f x = f (f (f x))
etc.
I am new to functional programming and I can't get my head around this. I firstly don't know how I can define the functions for all the positive integers without writing them out (which is obviously impossible). As in, if I have plus(sixty, forty), how can my function recognize that sixty is f applied 60 times to x?
I am meant to be writing this in Miranda, but I am more familiar with Haskell, so help for either is welcome.
Apply equational reasoning [1], and abstraction. You have
one f x = f x -- :: (a -> b) -> a -> b
two f x = f (f x) -- = f (one f x) -- :: (a -> a) -> a -> a
three f x = f (f (f x)) -- = f (two f x) -- :: (a -> a) -> a -> a
-- ~~~~~~~~~~~
Thus, a successor function next is naturally defined, so that three = next two. Yes, it is as simple as writing next two instead of three in the equation above:
next :: ((b -> c) -> a -> b) -> (b -> c) -> a -> c
-- three f x = next two f x = f (two f x) -- `two` is a formal parameter
-- ~~~~~~~~~~~
next num f x = f (num f x) -- generic name `num`
zero :: t -> a -> a
zero f x = x
This captures the pattern of succession. f will be used as a successor function, and x as zero value. The rest follows. For instance,
plus :: (t -> b -> c) -> (t -> a -> b) -> t -> a -> c
plus two one f x = two f (one f x) -- formal parameters two, one
-- = f (f (one f x)) -- an example substitution
-- = f (f (f x)) -- uses the global definitions
-- = three f x -- for one, two, three
i.e. one f x will be used as a zero value by two (instead of the "usual" x), thus representing three. A "number" n represents a succession of n "+1" operations.
The above, again, actually defines the general plus operation because two and one are just two formal function parameters:
Prelude> plus three two succ 0 -- built-in `succ :: Enum a => a -> a`
5
Prelude> :t plus three two
plus three two :: (a -> a) -> a -> a
Prelude> plus three two (1:) [0]
[1,1,1,1,1,0]
The key thing to grasp here is that a function is an object that will produce a value, when called. In itself it's an opaque object. The "observer" arguments that we apply to it supply the "meaning" for what it means to be zero, or to find a successor, and thus define what result is produced when we make an observation of a number's value.
[1] i.e. replace freely in any expression the LHS with the RHS of a definition, or the RHS with the LHS, as you see fit (up to the renaming of variables of course, so as not to capture/shadow the existing free variables).
To convert a number to a numeral you can use something like:
type Numeral = forall a . (a -> a) -> (a -> a)
toChurch :: Int -> Numeral
toChurch 0 _ x = x
toChurch n f x = f $ toChurch (pred n) f x
fromChurch :: Numeral -> Int
fromChurch numeral = numeral succ 0
You don't need to recognize how many times the function is calling f. For example, to implement succ, which adds 1 to a Church numeral, you can do something like this:
succ n f x = f (n f x)
Then you first use n to apply f however many times it needs to, and then you do the final f yourself. You could also do it the other way round, and first apply f once yourself and then let n do the rest.
succ n f x = n f (f x)
You can use a similar technique to implement plus.
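One way the similar technique could look for plus is sketched below. It assumes the RankNTypes extension and restates Numeral, toChurch and fromChurch so that the block stands alone; toChurch is written with iterate here and assumes a non-negative argument.

```haskell
{-# LANGUAGE RankNTypes #-}

type Numeral = forall a. (a -> a) -> (a -> a)

-- Assumes n >= 0: apply f exactly n times.
toChurch :: Int -> Numeral
toChurch n = \f x -> iterate f x !! n

fromChurch :: Numeral -> Int
fromChurch numeral = numeral (+ 1) 0

-- Let n apply f its number of times, then let m continue from there.
plus :: Numeral -> Numeral -> Numeral
plus m n = \f x -> m f (n f x)
```

So plus never counts anything: with sixty and forty it simply chains the two numerals, and fromChurch (plus (toChurch 60) (toChurch 40)) observes the result as an Int.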

Why do we use folds to encode datatypes as functions?

Or to be specific, why do we use foldr to encode lists and iteration to encode numbers?
Sorry for the long-winded introduction, but I don't really know how to name the things I want to ask about so I'll need to give some exposition first. This draws heavily from this C.A.McCann post, which doesn't quite satisfy my curiosity, and I'll also be handwaving the issues with rank-n-types and infinite lazy things.
One way to encode datatypes as functions is to create a "pattern matching" function that receives one argument for each case, each argument being a function that receives the values corresponding to that constructor and all arguments returning a same result type.
This all works out as expected for non-recursive types
--encoding data Bool = True | False
type Bool r = r -> r -> r
true :: Bool r
true = \ct cf -> ct
false :: Bool r
false = \ct cf -> cf
--encoding data Either a b = Left a | Right b
type Either a b r = (a -> r) -> (b -> r) -> r
left :: a -> Either a b r
left x = \cl cr -> cl x
right :: b -> Either a b r
right y = \cl cr -> cr y
However, the nice analogy with pattern matching breaks down with recursive types. We might be tempted to do something like
--encoding data Nat = Z | S Nat
type RecNat r = r -> (RecNat -> r) -> r
zero = \cz cs -> cz
succ n = \cz cs -> cs n
-- encoding data List a = Nil | Cons a (List a)
type RecListType a r = r -> (a -> RecListType -> r) -> r
nil = \cnil ccons -> cnil
cons x xs = \cnil ccons -> ccons x xs
but we can't write those recursive type definitions in Haskell! The usual solution is to force the callback of the cons/succ case to be applied to all levels of recursion instead of just the first one (ie, writing a fold/iterator). In this version we use the return type r where the recursive type would be:
--encoding data Nat = Z | S Nat
type Nat r = r -> (r -> r) -> r
zero = \cz cf -> cz
succ n = \cz cf -> cf (n cz cf)
-- encoding data List a = Nil | Cons a (List a)
type List a r = r -> (a -> r -> r) -> r
nil = \z f -> z
cons x xs = \z f -> f x (xs z f)
While this version works, it makes defining some functions much harder. For example, writing a "tail" function for lists or a "predecessor" function for numbers is trivial if you can use pattern matching but gets tricky if you need to use the folds instead.
So onto my real questions:
How can we be sure that the encoding using folds is as powerful as the hypothetical "pattern matching encoding"? Is there a way to take an arbitrary function definition via pattern matching and mechanically convert it to one using only folds instead? (If so, this would also help make tricky definitions such as tail or foldl in terms of foldr less magical)
Why doesn't the Haskell type system allow for the recursive types needed in the "pattern matching" encoding? Is there a reason for only allowing recursive types in datatypes defined via data? Is pattern matching the only way to consume recursive algebraic datatypes directly? Does it have to do with the type inference algorithm?
Given some inductive data type
data Nat = Succ Nat | Zero
we can consider how we pattern match on this data
case n of
  Succ n' -> f n'
  Zero -> g
it should be obvious that every function of type Nat -> a can be defined by giving an appropriate f and g, and that the only way to make a Nat (barring bottom) is by using one of the two constructors.
EDIT: Think about f for a moment. If we are defining a function foo :: Nat -> a by giving the appropriate f and g such that f recursively calls foo, then we can redefine f as f' n' (foo n') such that f' is not recursive. If the type a = (a',Nat) then we can instead write f' (foo n'). So, without loss of generality
foo n = h $ case n of
  Succ n' -> f (foo n')
  Zero -> g
this is the formulation that makes the rest of my post make sense:
So, we can thus think about the case statement as applying a "destructor dictionary"
data NatDict a = NatDict {
    onSucc :: a -> a,
    onZero :: a
  }
now our case statement from before can become
h $ case n of
  Succ n' -> onSucc (NatDict f g) (foo n')
  Zero -> onZero (NatDict f g)
given this we can derive
newtype NatBB = NatBB {cataNat :: forall a. NatDict a -> a}
we can then define two functions
fromBB :: NatBB -> Nat
fromBB n = cataNat n (NatDict Succ Zero)
and
toBB :: Nat -> NatBB
toBB Zero = NatBB $ \dict -> onZero dict
toBB (Succ n) = NatBB $ \dict -> onSucc dict (cataNat (toBB n) dict)
we can prove these two functions are witness to an isomorphism (up to fast and loose reasoning) and thus show that
newtype NatAsFold = NatByFold (forall a. (a -> a) -> a -> a)
(which is just the same as NatBB) is isomorphic to Nat
We can use the same construction with other types, and prove that the resulting function types are what we want just by proving that the underlying types are isomorphic with algebraic reasoning (and induction).
As to your second question, Haskell's type system is based on iso-recursive, not equi-recursive, types. This is probably because the theory and type inference are easier to work out with iso-recursive types, and they have all the power; they just impose a little more work on the programmer's part. I like to claim that you can get your iso-recursive types without any overhead
newtype RecListType a r = RecListType (r -> (a -> RecListType a r -> r) -> r)
but apparently GHC's optimizer chokes on those sometimes :(.
The Wikipedia page on Scott encoding has some useful insights. The short version is, what you're referring to is the Church encoding, and your "hypothetical pattern-match encoding" is the Scott encoding. Both are sensible ways of doing things, but the Church encoding requires lighter type machinery to use (in particular, it does not require recursive types).
The proof that the two are equivalent uses the following idea:
churchfold :: (a -> b -> b) -> b -> [a] -> b
churchfold _ z [] = z
churchfold f z (x:xs) = f x (churchfold f z xs)
scottfold :: (a -> [a] -> b) -> b -> [a] -> b
scottfold _ z [] = z
scottfold f _ (x:xs) = f x xs
scottFromChurch :: (a -> [a] -> b) -> b -> [a] -> b
scottFromChurch f z xs = fst (churchfold g (z, []) xs)
  where
    g x ~(_, xs) = (f x xs, x : xs)
The idea is that since churchfold (:) [] is the identity on lists, we can use a Church fold that produces the list argument it is given as well as the result it is supposed to produce. Then in the chain x1 `f` (x2 `f` (... `f` xn) ... ) the outermost f receives a pair (y, x2 : ... : xn : []) (for some y we don't care about), so returns f x1 (x2 : ... : xn : []). Of course, it also has to return x1 : ... : xn : [] so that any more applications of f could also work.
(This is actually a little similar to the proof of the mathematical principle of strong (or complete) induction, from the "weak" or usual principle of induction).
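As a usage sketch, the same idea gives a tail function written with nothing but a fold. Here the Prelude's foldr stands in for churchfold, since the two have the same definition, and safeTail is a name I introduce for the example.

```haskell
-- Scott-style one-step case analysis, implemented with a fold that
-- rebuilds the list alongside the result.
scottFromChurch :: (a -> [a] -> b) -> b -> [a] -> b
scottFromChurch f z xs = fst (foldr g (z, []) xs)
  where
    g x ~(_, rest) = (f x rest, x : rest)

-- tail, defined without pattern matching on the list itself.
safeTail :: [a] -> Maybe [a]
safeTail = scottFromChurch (\_ rest -> Just rest) Nothing
```

The handler passed to scottFromChurch sees the head and the rebuilt tail, exactly as a pattern match on (x:xs) would.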
By the way, your Bool r type is a bit too big for real Church booleans – e.g. (+) :: Bool Integer, but (+) isn't really a Church boolean. If you enable RankNTypes then you can use a more precise type: type Bool = forall r. r -> r -> r. Now it is forced to be polymorphic, so genuinely only contains two (ignoring seq and bottom) inhabitants – \t _ -> t and \_ f -> f. Similar ideas apply to your other Church types, too.
