Lambda calculus beta reduction specific steps and why - haskell

From a book I'm recently reading:
First:
𝜆𝑧.(𝜆𝑚.𝜆𝑛.𝑚)(𝑧)((𝜆𝑝.𝑝)𝑧)
The outermost lambda binding 𝑧 is, at this point, irreducible because it has no argument to apply to. What remains is to go inside the terms one layer at a time until we find something reducible.
Next:
𝜆𝑧.(𝜆𝑛.𝑧)((𝜆𝑝.𝑝)𝑧)
We can apply the lambda binding 𝑚 to the argument 𝑧. We keep searching for terms we can apply. The next thing we can apply is the lambda binding 𝑛 to the lambda term ((𝜆𝑝.𝑝)𝑧).
I don't get it. In the first section, it says 𝜆𝑧 has no argument to apply to, that I can probably understand, but then at the Next section I think the z can be bound to ((𝜆𝑝.𝑝)𝑧) because as you can see, 𝜆𝑧.(𝜆𝑛.𝑧), the body of 𝜆𝑧 clearly has a z argument which can be bound. But the book just ignored the head 𝜆𝑧 and directly bound n to ((𝜆𝑝.𝑝)𝑧). I mean 𝜆𝑛.𝑧 doesn't have an n argument, why does it get to be bound?
Can somebody explain to me about this?

Using normal order evaluation, you can get the answer in two beta reductions
// beta reduction 1
λz.(λm.λn.m)(z)((λp.p)z) →β (λn.m) [m := z]
λz.(λm.λn.z)(z)((λp.p)z)
// beta reduction 2
λz.(λn.z)((λp.p)z) →β z [n := ((λp.p)z)]
λz.(λn.z)((λp.p)z)
// result
λz.z
The second reduction might seem tricky because n is bound to ((λp.p)z) but the expression is just z, so n is thrown away.
Using applicative order evaluation, it takes one extra step
// beta reduction 1
λz.(λm.λn.m)(z)((λp.p)z) →β p [p := z]
λz.(λm.λn.m)(z)((λp.z)z)
// beta reduction 2
λz.(λm.λn.m)(z)(z) →β (λn.m) [m := z]
λz.(λm.λn.z)(z)(z)
// beta reduction 3
λz.(λn.z)(z) →β z [n := z]
λz.(λn.z)(z)
// result
λz.z
In this scenario, whether we use normal order evaluation or applicative order evaluation, the result is the same. Differing evaluation strategies sometimes evaluate to different results.
An important note, the reduction steps we've done above will not take place until λz is applied (depending on the implementation). In the example code you have provided, λz is never applied, therefore simplifying λz's term is just for exercise, in this case.
All we've done is demonstrate lambda equivalence (under two different evaluation strategies)
λz.(λm.λn.m)(z)((λp.p)z) <=> λz.z

Lambda notation is a bit weird to parse. IMO Haskell syntax is clearer:
\z -> (\m -> \n -> m) z ((\p -> p) z)
or, even more explicit
\z -> (
(
( \m -> (\n -> m) )
z
)
( (\p -> p) z )
)
The first reduction step is
\z -> (
(
(\n -> z)
)
( (\p -> p) z )
)
or
\z -> (
(\n -> z)
( (\p -> p) z )
)
then you can indeed bind ((\p -> p) z) – not to z but to n! (Which isn't actually used at all though.)
\z -> (
(z)
)
or simply \z -> z. So, we still have that z lambda, which as the book said is irredicible. We just don't have anything else!
...I'm not sure if that was actually your question. If it was rather: why must we not first see if we can reduce ((\p -> p) z), then the answer is, I think, lambda calculus as such doesn't define this at all (it just defines what transformation you can apply, not in which order you should do it.Actually I'm not sure about this, correct me if I'm wrong). In a strict language like Scheme, you would indeed first reduce ((\p -> p) z); Haskell wouldn't do that since there's no need. Either way, it doesn't really matter, because the result is discarded anyway.
( \z -> (\n -> z) ((\p -> p) z) )
≡ ( \z -> (\n -> z) z )
≡ ( \z -> (\n -> z) foo )
≡ ( \z -> z )

The key is that n is never used in the abstraction. n is still bound to ((λp.p)z), but it is immediately discarded.
λz.(λm.λn.m)(z)((λp.p)z) = λz.(λn.z)((λp.p)z) Replacing m with z
= λz.z Replacing n with ((λp.p)z)

Related

Doing greatest common divisor function with Haskell using until

I need help proggraming a function in Haskell to calculate the greatest common divisor. The problem is I need to use the until function which I'm not being able to implement. I tried using the following code (that didn't work):
mygcd a b = until (==0) (`mod` (a b)) b
Thank for your help!
I would try something like this (pseudocode)
mygcd a b = finalize (until endState nextState (a,b))
where
finalize (x,y) = ...
endState (x,y) = some condition here
nextState (x,y) = (x',y') computed in some way using mod
mygcd a b = until (==0) (`mod` (a b)) b
> :t until
until :: (a -> Bool) -> (a -> a) -> a -> a
So we have our predicate (==0), our function (`mod`(a b)), and starting value b. My first issue with this, however: this won't terminate until the predicate holds true, so it can never produce anything but 0. What does that tell us? We also see that a must be of type Integral a => a -> a since it's applied to b to get something we can use in b `mod` x. And repeating that won't yield a different result, so the expression either produces 0 immediately or after one mod, or never finishes.
I think, to get something useful out of until, you need a more complicated predicate than an equality check. Perhaps a question like "what's the lowest multiple of 13 over 100", or you might make a a tuple and check one part of it. It looks a bit to me like:
until p f v = head (dropWhile (not . p) (iterate f v))

Why does foldr use a helper function?

In explaining foldr to Haskell newbies, the canonical definition is
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
But in GHC.Base, foldr is defined as
foldr k z = go
where
go [] = z
go (y:ys) = y `k` go ys
It seems this definition is an optimization for speed, but I don't see why using the helper function go would make it faster. The source comments (see here) mention inlining, but I also don't see how this definition would improve inlining.
I can add some important details about GHC's optimization system.
The naive definition of foldr passes around a function. There's an inherent overhead in calling a function - especially when the function isn't known at compile time. It'd be really nice to able to inline the definition of the function if it's known at compile time.
There are tricks available to perform that inlining in GHC - and this is an example of them. First, foldr needs to be inlined (I'll get to why later). foldr's naive implementation is recursive, so cannot be inlined. So a worker/wrapper transformation is applied to the definition. The worker is recursive, but the wrapper is not. This allows foldr to be inlined, despite the recursion over the structure of the list.
When foldr is inlined, it creates a copy of all of its local bindings, too. It's more or less a direct textual inlining (modulo some renaming, and happening after the desugaring pass). This is where things get interesting. go is a local binding, and the optimizer gets to look inside it. It notices that it calls a function in the local scope, which it names k. GHC will often remove the k variable entirely, and will just replace it with the expression k reduces to. And then afterwards, if the function application is amenable to inlining, it can be inlined at this time - removing the overhead of calling a first-class function entirely.
Let's look at a simple, concrete example. This program will echo a line of input with all trailing 'x' characters removed:
dropR :: Char -> String -> String
dropR x r = if x == 'x' && null r then "" else x : r
main :: IO ()
main = do
s <- getLine
putStrLn $ foldr dropR "" s
First, the optimizer will inline foldr's definition and simplify, resulting in code that looks something like this:
main :: IO ()
main = do
s <- getLine
-- I'm changing the where clause to a let expression for the sake of readability
putStrLn $ let { go [] = ""; go (x:xs) = dropR x (go xs) } in go s
And that's the thing the worker-wrapper transformation allows.. I'm going to skip the remaining steps, but it should be obvious that GHC can now inline the definition of dropR, eliminating the function call overhead. This is where the big performance win comes from.
GHC cannot inline recursive functions, so
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
cannot be inlined. But
foldr k z = go
where
go [] = z
go (y:ys) = y `k` go ys
is not a recursive function. It is a non-recursive function with a local recursive definition!
This means that, as #bheklilr writes, in map (foldr (+) 0) the foldr can be inlined and hence f and z replaced by (+) and 0 in the new go, and great things can happen, such as unboxing of the intermediate value.
As the comments say:
-- Inline only in the final stage, after the foldr/cons rule has had a chance
-- Also note that we inline it when it has *two* parameters, which are the
-- ones we are keen about specialising!
In particular, note the "we inline it when it has two parameters, which are the ones we are keen about specialising!"
What this is saying is that when foldr gets inlined, it's getting inlined only for the specific choice of f and z, not for the choice of the list getting folded. I'm not expert, but it would seem it would make it possible to inline it in situations like
map (foldr (+) 0) some_list
so that the inline happens in this line and not after map has been applied. This makes it optimizable in more situations and more easily. All the helper function does is mask the 3rd argument so {-# INLINE #-} can do its thing.
One tiny important detail not mentioned in other answers is that GHC, given a function definition like
f x y z w q = ...
cannot inline f until all of the arguments x, y, z, w, and q are applied. This means that it's often advantageous to use the worker/wrapper transformation to expose a minimal set of function arguments which must be applied before inlining can occur.

Is the following code really currying in haskell?

I am trying to understand currying by reading various blogs and stack over flow answers and I think I understood some what. In Haskell, every function is curried, it means, when you have a function like f x y = x + y
it really is ((f x) y)
in this, the function initially take the first parameter 'x' as the parameter and partially applies it to function f which in turn returns a function for y. where it takes just y a single parameter and applies the function. In both cases the function takes only one parameter and also the process of reducing a function to take single parameter is called 'currying'. Correct me if my understanding wrong here.
So if it is correct, could you please tell me if the functions 'two' and 'three' are curried functions?
three x y z = x + y + z
two = three 1
same = two 1
In this case, I have two specialized functions, 'two' and 'same' which are reduced to take only one parameter so is it curried?
Let's look at two first.
It has a signature of
two :: Num a => a -> a -> a
forget the Num a for now (it's only a constraint on a - you can read Int here).
Surely this too is a curried function.
The next one is more interesting:
same :: Num a => a -> a
(btw: nice name - it's the same but not exactly id ^^)
TBH: I don't know for sure.
The best definition I know of a curried function is this:
A curried function is a function of N arguments returning another function of (N-1) arguments.
(if you want you can extent this to fully curried functions of course)
This will only fit if you define constants as functions with 0 parameters - which you surely can.
So I would say yes(?) this too is a curried function but only in a mathy borderline way (like the sum of 0 numbers is defined to be 0)
Best just think about this equationally. The following are all equivalent definitions:
f x y z = x+y+z
f x y = \z -> x+y+z
f x = \y -> (\z -> x+y+z)
f = \x -> (\y -> (\z -> x+y+z))
Partial application is only tangentially relevant here. Most often you don't want the actual partial application to be performed and the actual lambda object to be created in memory - hoping instead that the compiler will employ - and optimize better - the full definition at the final point of full application.
The presence of the functions curry/uncurry is yet another confusing issue. Both f (x,y) = ... and f x y = ... are curried in Haskell, of course, but in our heads we tend to think about the first as a function of two arguments, so the functions translating between the two forms are named curry and uncurry, as a mnemonic.
You could think that three function with anonymous functions is:
three = \x -> (\y -> (\z -> x + y + z)))

Implementing Iota in Haskell

Iota is a ridiculously small "programming language" using only one combinator. I'm interested in understanding how it works, but it would be helpful to see the implementation in a language I'm familiar with.
I found an implementation of the Iota programming language written in Scheme. I've been having a little trouble translating it to Haskell though. It's rather simple, but I'm relatively new to both Haskell and Scheme.
How would you write an equivalent Iota implementation in Haskell?
(let iota ()
(if (eq? #\* (read-char)) ((iota)(iota))
(lambda (c) ((c (lambda (x) (lambda (y) (lambda (z) ((x z)(y z))))))
(lambda (x) (lambda (y) x))))))
I've been teaching myself some of this stuff, so I sure hope I get the following right...
As n.m. mentions, the fact that Haskell is typed is of enormous importance to this question; type systems restrict what expressions can be formed, and in particular the most basic type systems for the lambda calculus forbid self-application, which ends up giving you a non-Turing complete language. Turing completeness is added on top of the basic type system as an extra feature to the language (either a fix :: (a -> a) -> a operator or recursive types).
This doesn't mean you can't implement this at all in Haskell, but rather that such an implementation is not going to have just one operator.
Approach #1: implement the second example one-point combinatory logic basis from here, and add a fix function:
iota' :: ((t1 -> t2 -> t1)
-> ((t5 -> t4 -> t3) -> (t5 -> t4) -> t5 -> t3)
-> (t6 -> t7 -> t6)
-> t)
-> t
iota' x = x k s k
where k x y = x
s x y z = x z (y z)
fix :: (a -> a) -> a
fix f = let result = f result in result
Now you can write any program in terms of iota' and fix. Explaining how this works is a bit involved. (EDIT: note that this iota' is not the same as the λx.x S K in the original question; it's λx.x K S K, which is also Turing-complete. It is the case that iota' programs are going to be different from iota programs. I've tried the iota = λx.x S K definition in Haskell; it typechecks, but when you try k = iota (iota (iota iota)) and s = iota (iota (iota (iota iota))) you get type errors.)
Approach #2: Untyped lambda calculus denotations can be embedded into Haskell using this recursive type:
newtype D = In { out :: D -> D }
D is basically a type whose elements are functions from D to D. We have In :: (D -> D) -> D to convert a D -> D function into a plain D, and out :: D -> (D -> D) to do the opposite. So if we have x :: D, we can self-apply it by doing out x x :: D.
Give that, now we can write:
iota :: D
iota = In $ \x -> out (out x s) k
where k = In $ \x -> In $ \y -> x
s = In $ \x -> In $ \y -> In $ \z -> out (out x z) (out y z)
This requires some "noise" from the In and out; Haskell still forbids you to apply a D to a D, but we can use In and out to get around this. You can't actually do anything useful with values of type D, but you could design a useful type around the same pattern.
EDIT: iota is basically λx.x S K, where K = λx.λy.x and S = λx.λy.λz.x z (y z). I.e., iota takes a two-argument function and applies it to S and K; so by passing a function that returns its first argument you get S, and by passing a function that returns its second argument you get K. So if you can write the "return first argument" and the "return second argument" with iota, you can write S and K with iota. But S and K are enough to get Turing completeness, so you also get Turing completeness in the bargain. It does turn out that you can write the requisite selector functions with iota, so iota is enough for Turing completeness.
So this reduces the problem of understanding iota to understanding the SK calculus.

CPS in curried languages

How does CPS in curried languages like lambda calculus or Ocaml even make sense? Technically, all function have one argument. So say we have a CPS version of addition in one such language:
cps-add k n m = k ((+) n m)
And we call it like
(cps-add random-continuation 1 2)
This is then the same as:
(((cps-add random-continuation) 1) 2)
I already see two calls there that aren't tail calls and in reality a complexly nested expression, the (cps-add random-continuation) returns a value, namely a function that consumes a number, and then returns a function which consumes another number and then delivers the sum of both to that random-continuation. But we can't work around this value returning by simply translating this into CPS again, because we can only give each function one argument. We need to have at least two to make room for the continuation and the 'actual' argument.
Or am I missing something completely?
Since you've tagged this with Haskell, I'll answer in that regard: In Haskell, the equivalent of doing a CPS transform is working in the Cont monad, which transforms a value x into a higher-order function that takes one argument and applies it to x.
So, to start with, here's 1 + 2 in regular Haskell: (1 + 2) And here it is in the continuation monad:
contAdd x y = do x' <- x
y' <- y
return $ x' + y'
...not terribly informative. To see what's going on, let's disassemble the monad. First, removing the do notation:
contAdd x y = x >>= (\x' -> y >>= (\y' -> return $ x' + y'))
The return function lifts a value into the monad, and in this case is implemented as \x k -> k x, or using an infix operator section as \x -> ($ x).
contAdd x y = x >>= (\x' -> y >>= (\y' -> ($ x' + y')))
The (>>=) operator (read "bind") chains together computations in the monad, and in this case is implemented as \m f k -> m (\x -> f x k). Changing the bind function to prefix form and substituting in the lambda, plus some renaming for clarity:
contAdd x y = (\m1 f1 k1 -> m1 (\a1 -> f1 a1 k1)) x (\x' -> (\m2 f2 k2 -> m2 (\a2 -> f2 a2 k2)) y (\y' -> ($ x' + y')))
Reducing some function applications:
contAdd x y = (\k1 -> x (\a1 -> (\x' -> (\k2 -> y (\a2 -> (\y' -> ($ x' + y')) a2 k2))) a1 k1))
contAdd x y = (\k1 -> x (\a1 -> y (\a2 -> ($ a1 + a2) k1)))
And a bit of final rearranging and renaming:
contAdd x y = \k -> x (\x' -> y (\y' -> k $ x' + y'))
In other words: The arguments to the function have been changed from numbers, into functions that take a number and return the final result of the entire expression, just as you'd expect.
Edit: A commenter points out that contAdd itself still takes two arguments in curried style. This is sensible because it doesn't use the continuation directly, but not necessary. To do otherwise, you'd need to first break the function apart between the arguments:
contAdd x = x >>= (\x' -> return (\y -> y >>= (\y' -> return $ x' + y')))
And then use it like this:
foo = do f <- contAdd (return 1)
r <- f (return 2)
return r
Note that this is really no different from the earlier version; it's simply packaging the result of each partial application as taking a continuation, not just the final result. Since functions are first-class values, there's no significant difference between a CPS expression holding a number vs. one holding a function.
Keep in mind that I'm writing things out in very verbose fashion here to make explicit all the steps where something is in continuation-passing style.
Addendum: You may notice that the final expression looks very similar to the de-sugared version of the monadic expression. This is not a coincidence, as the inward-nesting nature of monadic expressions that lets them change the structure of the computation based on previous values is closely related to continuation-passing style; in both cases, you have in some sense reified a notion of causality.
Short answer : of course it makes sense, you can apply a CPS-transform directly, you will only have lots of cruft because each argument will have, as you noticed, its own attached continuation
In your example, I will consider that there is a +(x,y) uncurried primitive, and that you're asking what is the translation of
let add x y = +(x,y)
(This add faithfully represent OCaml's (+) operator)
add is syntaxically equivalent to
let add = fun x -> (fun y -> +(x, y))
So you apply a CPS transform¹ and get
let add_cps = fun x kx -> kx (fun y ky -> ky +(x,y))
If you want a translated code that looks more like something you could have willingly written, you can devise a finer transformation that actually considers known-arity function as non-curried functions, and tream all parameters as a whole (as you have in non-curried languages, and as functional compilers already do for obvious performance reasons).
¹: I wrote "a CPS transform" because there is no "one true CPS translation". Different translations have been devised, producing more or less continuation-related garbage. The formal CPS translations are usually defined directly on lambda-calculus, so I suppose you're having a less formal, more hand-made CPS transform in mind.
The good properties of CPS (as a style that program respect, and not a specific transformation into this style) are that the order of evaluation is completely explicit, and that all calls are tail-calls. As long as you respect those, you're relatively free in what you can do. Handling curryfied functions specifically is thus perfectly fine.
Remark : Your (cps-add k 1 2) version can also be considered tail-recursive if you assume the compiler detect and optimize that cps-add actually always take 3 arguments, and don't build intermediate closures. That may seem far-fetched, but it's the exact same assumption we use when reasoning about tail-calls in non-CPS programs, in those languages.
yes, technically all functions can be decomposed into functions with one method, however, when you want to use CPS the only thing you are doing is saying is that at a certain point of computation, run the continuation method.
Using your example, lets have a look. To make things a little easier, let's deconstruct cps-add into its normal form where it is a function only taking one argument.
(cps-add k) -> n -> m = k ((+) n m)
Note at this point that the continuation, k, is not being evaluated (Could this be the point of confusion for you?).
Here we have a method called cps-add k that receives a function as an argument and then returns a function that takes another argument, n.
((cps-add k) n) -> m = k ((+) n m)
Now we have a function that takes an argument, m.
So I suppose what I am trying to point out is that currying does not get in the way of CPS style programming. Hope that helps in some way.

Resources