How does CPS in curried languages like lambda calculus or OCaml even make sense? Technically, all functions have one argument. So say we have a CPS version of addition in one such language:
cps-add k n m = k ((+) n m)
And we call it like
(cps-add random-continuation 1 2)
This is then the same as:
(((cps-add random-continuation) 1) 2)
I already see two calls there that aren't tail calls, and in reality a complexly nested expression: (cps-add random-continuation) returns a value, namely a function that consumes a number, which in turn returns a function that consumes another number and then delivers the sum of both to random-continuation. But we can't work around this value-returning by simply translating it into CPS again, because we can only give each function one argument. We need at least two to make room for the continuation and the 'actual' argument.
Or am I missing something completely?
Since you've tagged this with Haskell, I'll answer in that regard: In Haskell, the equivalent of doing a CPS transform is working in the Cont monad, which transforms a value x into a higher-order function that takes one argument and applies it to x.
So, to start with, here's 1 + 2 in regular Haskell: (1 + 2). And here it is in the continuation monad:
contAdd x y = do x' <- x
                 y' <- y
                 return $ x' + y'
...not terribly informative. To see what's going on, let's disassemble the monad. First, removing the do notation:
contAdd x y = x >>= (\x' -> y >>= (\y' -> return $ x' + y'))
The return function lifts a value into the monad, and in this case is implemented as \x k -> k x, or using an infix operator section as \x -> ($ x).
contAdd x y = x >>= (\x' -> y >>= (\y' -> ($ x' + y')))
The (>>=) operator (read "bind") chains together computations in the monad, and in this case is implemented as \m f k -> m (\x -> f x k). Changing the bind function to prefix form and substituting in the lambda, plus some renaming for clarity:
contAdd x y = (\m1 f1 k1 -> m1 (\a1 -> f1 a1 k1)) x (\x' -> (\m2 f2 k2 -> m2 (\a2 -> f2 a2 k2)) y (\y' -> ($ x' + y')))
Reducing some function applications:
contAdd x y = (\k1 -> x (\a1 -> (\x' -> (\k2 -> y (\a2 -> (\y' -> ($ x' + y')) a2 k2))) a1 k1))
contAdd x y = (\k1 -> x (\a1 -> y (\a2 -> ($ a1 + a2) k1)))
And a bit of final rearranging and renaming:
contAdd x y = \k -> x (\x' -> y (\y' -> k $ x' + y'))
In other words: The arguments to the function have been changed from numbers, into functions that take a number and return the final result of the entire expression, just as you'd expect.
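Here is that final version as a self-contained sketch you can load into GHCi (lifting the inputs with ($ 1) and running the whole thing with id are my additions):

contAdd :: Num a => ((a -> r) -> r) -> ((a -> r) -> r) -> (a -> r) -> r
contAdd x y = \k -> x (\x' -> y (\y' -> k (x' + y')))

-- ($ 1) is "1 lifted into CPS"; id as the final continuation extracts the result
result :: Integer
result = contAdd ($ 1) ($ 2) id   -- 3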
Edit: A commenter points out that contAdd itself still takes two arguments in curried style. This is sensible, because it doesn't use the continuation directly, but it's not necessary. To do otherwise, you'd need to first break the function apart between the arguments:
contAdd x = x >>= (\x' -> return (\y -> y >>= (\y' -> return $ x' + y')))
And then use it like this:
foo = do f <- contAdd (return 1)
         r <- f (return 2)
         return r
Note that this is really no different from the earlier version; it's simply packaging the result of each partial application as taking a continuation, not just the final result. Since functions are first-class values, there's no significant difference between a CPS expression holding a number vs. one holding a function.
Keep in mind that I'm writing things out in very verbose fashion here to make explicit all the steps where something is in continuation-passing style.
Addendum: You may notice that the final expression looks very similar to the de-sugared version of the monadic expression. This is not a coincidence, as the inward-nesting nature of monadic expressions that lets them change the structure of the computation based on previous values is closely related to continuation-passing style; in both cases, you have in some sense reified a notion of causality.
Short answer: of course it makes sense; you can apply a CPS transform directly. You will only have lots of cruft, because each argument will have, as you noticed, its own attached continuation.
In your example, I will consider that there is a +(x,y) uncurried primitive, and that you're asking what is the translation of
let add x y = +(x,y)
(This add faithfully represents OCaml's (+) operator.)
add is syntactically equivalent to
let add = fun x -> (fun y -> +(x, y))
So you apply a CPS transform¹ and get
let add_cps = fun x kx -> kx (fun y ky -> ky (+(x, y)))
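Rendered in Haskell, under the same assumption of an uncurried addition primitive (addP is a made-up stand-in for +(x,y)), the transform looks like this sketch:

-- hypothetical uncurried primitive standing in for +(x,y)
addP :: (Int, Int) -> Int
addP (x, y) = x + y

-- each curried argument comes with its own attached continuation
addCps :: Int -> ((Int -> (Int -> r) -> r) -> r) -> r
addCps x kx = kx (\y ky -> ky (addP (x, y)))

-- running it: feed 1, receive the inner function, feed 2, finish with id
three :: Int
three = addCps 1 (\f -> f 2 id)   -- 3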
If you want translated code that looks more like something you could have willingly written, you can devise a finer transformation that actually considers known-arity functions as non-curried functions, and treats all parameters as a whole (as you have in non-curried languages, and as functional compilers already do for obvious performance reasons).
¹: I wrote "a CPS transform" because there is no "one true CPS translation". Different translations have been devised, producing more or less continuation-related garbage. The formal CPS translations are usually defined directly on lambda-calculus, so I suppose you're having a less formal, more hand-made CPS transform in mind.
The good properties of CPS (as a style that programs respect, not a specific transformation into this style) are that the order of evaluation is completely explicit, and that all calls are tail calls. As long as you respect those, you're relatively free in what you can do. Handling curried functions specially is thus perfectly fine.
Remark: Your (cps-add k 1 2) version can also be considered tail-recursive if you assume the compiler detects and optimizes that cps-add actually always takes 3 arguments, and doesn't build intermediate closures. That may seem far-fetched, but it's the exact same assumption we use when reasoning about tail calls in non-CPS programs in those languages.
Yes, technically all functions can be decomposed into functions taking one argument. However, when you want to use CPS, the only thing you are saying is that at a certain point of the computation, the continuation gets run.
Using your example, let's have a look. To make things a little easier, let's deconstruct cps-add into its normal form, where it is a function only taking one argument.
(cps-add k) -> n -> m = k ((+) n m)
Note at this point that the continuation, k, is not being evaluated (Could this be the point of confusion for you?).
Here we have a function, cps-add, that receives the continuation k as an argument and then returns a function that takes another argument, n.
((cps-add k) n) -> m = k ((+) n m)
Now we have a function that takes an argument, m.
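The same idea in Haskell (a sketch; cpsAdd is my spelling of your cps-add): each partial application just builds a closure, and the continuation k only runs at the very end.

cpsAdd :: Num a => (a -> r) -> a -> a -> r
cpsAdd k n m = k (n + m)

demo :: IO ()
demo = cpsAdd print 1 2   -- prints 3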
So I suppose what I am trying to point out is that currying does not get in the way of CPS style programming. Hope that helps in some way.
Related
In Programming in Haskell by Hutton
In general, if # is an operator, then expressions of the form (#), (x #), and (# y) for arguments x and y are called sections, whose meaning as functions can be formalised using lambda expressions as follows:
(#) = \x -> (\y -> x # y)
(x #) = \y -> x # y
(# y) = \x -> x # y
What are the difference and relation between "section" and "currying"?
Is a section the result of applying the currying operation to a multi-argument function?
Thanks.
A section is just special syntax for applying an infix operator to a single argument. (# y) is the more useful of the two, as (x #) is equivalent to (#) x (which is just applying the infix operator as a function to a single argument in the usual fashion).
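A quick illustration of both kinds of section:

squares = map (^ 2) [1, 2, 3]   -- [1, 4, 9]: (^ 2) squares its argument
powers  = map (2 ^) [1, 2, 3]   -- [2, 4, 8]: (2 ^) raises 2 to its argument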
curry f x y = f (x,y). uncurry g (x,y) = g x y.
(+ 3) 4 = (+) 4 3 = 4 + 3. (4 +) 3 = (+) 4 3 = 4 + 3.
A section is a result of partial application of a curried function: (+ 3) = flip (+) 3, (4 +) = (+) 4.
A curried function (like g or (+)) expects its arguments one at a time. An uncurried function (like f) expects its arguments in a tuple.
To partially apply an uncurried function we have first to turn it into a curried function, with curry. To partially apply a curried function we don't need to do anything, just apply it to an argument.
curry :: ((a, b) -> c ) -> ( a -> (b -> c))
uncurry :: (a -> (b -> c)) -> ((a, b) -> c )
x :: a
g :: a -> (b -> c)
--------------------
g x :: b -> c
x :: a
f :: (a, b) -> c
---------------------------
curry f :: a -> (b -> c)
curry f x :: b -> c
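To make the derivation concrete (addPair is a made-up example):

addPair :: (Int, Int) -> Int
addPair (x, y) = x + y       -- uncurried: expects a tuple

addFive :: Int -> Int
addFive = curry addPair 5    -- curry first, then partially apply

-- addFive 3 == 8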
Left sections and right sections are syntactical devices for partially applying an infix operator to a single argument (see also chepner's answer). For the sake of accuracy, we should note that currying is not the same thing as partial application:
Currying is converting a function that takes N arguments into a function that takes a single argument and returns a function that takes N-1 arguments.
Partial application is making a function that takes N-1 arguments out of a function that takes N arguments by supplying one of the arguments.
In Haskell, it happens that everything is curried; all functions take just one argument (even uncurried functions in Haskell take a tuple, which is, strictly speaking, a single argument -- you might want to play with the curry and uncurry functions to see how that works). Still, we very often think informally of functions that return functions as functions of multiple arguments. From that vantage point, a nice consequence of currying by default is that partial application of a function to its first argument becomes trivial: while, for instance, elem takes a value and a container and tests if the value is an element of the container, elem "apple" takes a container (of strings) and tests if "apple" is an element of it.
As for operators, when we write, for instance...
5 / 2
... we are applying the operator / to the arguments 5 and 2. The operator can also be used in prefix form, rather than infix:
(/) 5 2
In prefix form, the operator can be partially applied in the usual way:
(/) 5
That, however, arguably looks a little awkward -- after all, 5 here is the numerator, and not the denominator. I'd say left section syntax is easier on the eye in this case:
(5 /)
Furthermore, partial application to the second argument is not quite as straightforward to write, requiring a lambda, or flip. In the case of operators, a right section can help with that:
(/ 2)
Note that sections also work with functions made into operators through backtick syntax, so this...
(`elem` ["apple", "grape", "orange"])
... takes a string and tests whether it can be found in ["apple", "grape", "orange"].
I am able to understand the basics of point-free functions in Haskell:
addOne x = 1 + x
As we see x on both sides of the equation, we simplify it:
addOne = (+ 1)
Incredibly it turns out that functions where the same argument is used twice in different parts can be written point-free!
Let me take as a basic example the average function written as:
average xs = realToFrac (sum xs) / genericLength xs
It may seem impossible to simplify xs, but http://pointfree.io/ comes out with:
average = ap ((/) . realToFrac . sum) genericLength
That works.
As far as I understand, this states that average is the same as calling ap on two functions: the composition (/) . realToFrac . sum, and genericLength.
Unfortunately the ap function makes no sense whatsoever to me, the docs http://hackage.haskell.org/package/base-4.8.1.0/docs/Control-Monad.html#v:ap state:
ap :: Monad m => m (a -> b) -> m a -> m b
In many situations, the liftM operations can be replaced by uses of ap,
which promotes function application.
return f `ap` x1 `ap` ... `ap` xn
is equivalent to
liftMn f x1 x2 ... xn
But writing:
let average = liftM2 ((/) . realToFrac . sum) genericLength
does not work (it gives a very long type error message; ask and I'll include it), so I do not understand what the docs are trying to say.
How does the expression ap ((/) . realToFrac . sum) genericLength work? Could you explain ap in simpler terms than the docs?
Any lambda term can be rewritten to an equivalent term that uses just a set of suitable combinators and no lambda abstractions. This process is called abstraction elimination. During the process you want to remove lambda abstractions from the inside out. So at one step you have λx.M where M is already free of lambda abstractions, and you want to get rid of x.
If M is x, you replace λx.x with id (id is usually denoted by I in combinatory logic).
If M doesn't contain x, you replace the term with const M (const is usually denoted by K in combinatory logic).
If M is PQ, that is the term is λx.PQ, you want to "push" x inside both parts of the function application so that you can recursively process both parts. This is accomplished by using the S combinator defined as λfgx.(fx)(gx), that is, it takes two functions and passes x to both of them, and applies the results together. You can easily verify that that λx.PQ is equivalent to S(λx.P)(λx.Q), and we can recursively process both subterms.
As described in the other answers, the S combinator is available in Haskell as ap (or <*>) specialized to the reader monad.
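In Haskell the three combinators look like this (the lowercase names are mine, to avoid clashing with the Prelude):

i :: a -> a
i x = x                      -- I, i.e. id

k :: a -> b -> a
k x _ = x                    -- K, i.e. const

s :: (r -> a -> b) -> (r -> a) -> r -> b
s f g x = f x (g x)          -- S, i.e. ap/<*> at the reader type (->) r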
The appearance of the reader monad isn't accidental: solving the task of replacing λx.M with an equivalent function is basically lifting M :: a to the reader monad r -> a (actually the reader Applicative part is enough), where r is the type of x. If we revise the process above:
The only case that is actually connected with the reader monad is when M is x. Then we "lift" x to id, to get rid of the variable. The other cases below are just mechanical applications of lifting an expression to an applicative functor:
The other case λx.M where M doesn't contain x, it's just lifting M to the reader applicative, which is pure M. Indeed, for (->) r, pure is equivalent to const.
In the last case, <*> :: f (a -> b) -> f a -> f b is function application lifted to a monad/applicative. And this is exactly what we do: We lift both parts P and Q to the reader applicative and then use <*> to bind them together.
The process can be further improved by adding more combinators, which allows the resulting term to be shorter. Most often, the combinators B and C are used, which in Haskell correspond to (.) and flip. And again, (.) is just fmap/<$> for the reader applicative. (I'm not aware of such a built-in function for expressing flip, but it could be viewed as a specialization of f (a -> b) -> a -> f b for the reader applicative.)
Some time ago I wrote a short article about this: The Monad Reader Issue 17, The Reader Monad and Abstraction Elimination.
When the monad m is (->) a, as in your case, you can define ap as follows:
ap f g = \x -> f x (g x)
We can see that this indeed "works" in your pointfree example.
average = ap ((/) . realToFrac . sum) genericLength
average = \x -> ((/) . realToFrac . sum) x (genericLength x)
average = \x -> (/) (realToFrac (sum x)) (genericLength x)
average = \x -> realToFrac (sum x) / genericLength x
We can also derive ap from the general law
ap f g = do ff <- f ; gg <- g ; return (ff gg)
that is, desugaring the do-notation
ap f g = f >>= \ff -> g >>= \gg -> return (ff gg)
If we substitute the definitions of the monad methods
m >>= f = \x -> f (m x) x
return x = \_ -> x
we get the previous definition of ap back (for our specific monad (->) a). Indeed:
ap f g
= f >>= \ff -> g >>= \gg -> return (ff gg)
= f >>= \ff -> g >>= \gg -> \_ -> ff gg
= f >>= \ff -> g >>= \gg _ -> ff gg
= f >>= \ff -> \x -> (\gg _ -> ff gg) (g x) x
= f >>= \ff -> \x -> (\_ -> ff (g x)) x
= f >>= \ff -> \x -> ff (g x)
= f >>= \ff x -> ff (g x)
= \y -> (\ff x -> ff (g x)) (f y) y
= \y -> (\x -> f y (g x)) y
= \y -> f y (g y)
The Simple Bit: fixing liftM2
The problem in the original example is that ap works a bit differently from the liftM functions. ap takes a function wrapped up in a monad, and applies it to an argument wrapped up in a monad. But the liftMn functions take a "normal" function (one which is not wrapped up in a monad) and apply it to argument(s) that are wrapped up in monads.
I'll explain more about what that means below, but the upshot is that if you want to use liftM2, then you have to pull (/) out and make it a separate argument at the beginning. (So in this case (/) is the "normal" function.)
let average = liftM2 ((/) . realToFrac . sum) genericLength -- does not work
let average = liftM2 (/) (realToFrac . sum) genericLength -- works
As posted in the original question, calling liftM2 should involve three arguments: liftM2 f x1 x2. Here f is (/), x1 is (realToFrac . sum) and x2 is genericLength.
The version posted in the question (the one which doesn't work) was trying to call liftM2 with only two arguments.
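A quick GHCi check of the fixed version:

Prelude> import Control.Monad (liftM2)
Prelude> import Data.List (genericLength)
Prelude> let average = liftM2 (/) (realToFrac . sum) genericLength
Prelude> average [1,2,3,4]
2.5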
The explanation
I'll build this up in a few stages. I'll start with some specific values, and build up to a function that can take any set of values. Jump to the last section for the TL;DR.
In this example, let's assume the list of numbers is [1,2,3,4]. The sum of these numbers is 10, and the length of the list is 4. The average is 10/4, or 2.5.
To shoe-horn this into the right form for ap, we're going to break this into a function, an input, and a result.
ourFunction = (10/) -- "divide 10 by"
ourInput = 4
ourResult = 2.5
Three kinds of Function Application
ap and liftM both involve monads. At this point in the explanation, you can think of a monad as something that a value can be "wrapped up in". I'll give a better definition below.
Normal function application applies a normal function to a normal input. liftM applies a normal function to an input wrapped in a monad, and ap applies a function wrapped in a monad to an input wrapped in a monad.
(10/) 4 -- returns 2.5
liftM (10/) monad(4) -- returns monad(2.5)
ap monad(10/) monad(4) -- returns monad(2.5)
(Note that this is pseudocode. monad(4) is not actually valid Haskell).
(Note that liftM is a different function from liftM2, which was used earlier. liftM takes a function and only one argument, which is a better fit for the pattern I'm describing.)
In the average function defined above, the monads were functions, but "functions-as-monads" can be hard to talk about, so I'll start with simpler examples.
So what's a monad?
A better description of a monad is "something which contains a value, or produces a value, or which you can somehow extract a value from, but which also has something more complicated going on."
That's a really vague description, but it kind of has to be, because the "something more complicated" can be a lot of different things.
Monads can be confusing, but the point of them is that when you use monad operations (like ap and liftM) they will take care of the "something more complicated" for you, so you can just concentrate on the values.
That's probably still not very clear, so let's do some examples:
The Maybe monad
ap (Just (10/)) (Just 4) -- result is (Just 2.5)
One of the simplest monads is 'Maybe'. The value is whatever is contained inside a Just. So if we call ap and give it (Just ourFunction) and (Just ourInput) then we get back (Just ourResult).
The "something more complicated" is the fact that there might not be a value there at all, and you have to allow for the Nothing case.
As mentioned, the point of using a function like ap is that it takes care of these extra complications for us. With the Maybe monad, ap handles this by returning Nothing if either the Maybe-function or the Maybe-input were Nothing.
ap (Just (10/)) Nothing -- result is Nothing
ap Nothing (Just 4) -- result is Nothing
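Written out, ap for Maybe behaves like this (equivalent to the real instance):

apMaybe :: Maybe (a -> b) -> Maybe a -> Maybe b
apMaybe (Just f) (Just x) = Just (f x)   -- both present: apply
apMaybe _        _        = Nothing      -- either missing: give up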
The List Monad
ap [(10/)] [4] -- result is [2.5]
With the list monad, the value is whatever is inside the list. So ap [ourFunction] [ourInput] returns [ourResult].
The "something more complicated" is that there may be more than one thing inside the list (or exactly one thing, or nothing at all).
With lists, that means ap takes a list of zero or more functions, and a list of zero or more inputs. It handles that by returning a list of zero or more results: one result for every possible combination of function and input.
ap [(10/), (100/)] [5,4,2] -- result is [2.0, 2.5, 5.0, 20.0, 25.0, 50.0]
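For lists, ap behaves like a comprehension over every function/input combination (equivalent to the real instance):

apList :: [a -> b] -> [a] -> [b]
apList fs xs = [f x | f <- fs, x <- xs]   -- one result per combination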
Functions as Monads
A function like genericLength is considered a Monad because it has a value (the function's output), and it has a "something more complicated" (the fact that you have to supply an input before you can get the value).
This is where it gets a little confusing, because we're dealing with multiple functions, multiple inputs, and multiple results. It is all well defined, it's just hard to describe, so we have to be careful with our terminology.
Let's start with the list [1,2,3,4], and call that our "original input". That's the list we're trying to find the average of. It's the xs argument in the original average function.
If we give our original input ([1,2,3,4]) to genericLength then we get a value of '4'.
Our other function is ((/) . realToFrac . sum). It takes our list [1,2,3,4] and finds the sum (10), turns that into a fractional value, and then feeds it as the first argument to (/). The result is an incomplete division function that is waiting for another argument; i.e. it takes [1,2,3,4] as an input, and produces (10/) as its output.
This all fits with the way ap is defined for functions. With functions, ap takes two things. The first is a function that reads the original input and produces a new function. The second is a function that reads the original input and produces a new input. The final result is a function that takes the original input, and returns the same thing you would get if you applied the new function to the new input.
You might have to read that a few times to make sense of it. Alternatively, here it is in pseudocode:
average =
    ap
        (functionThatTakes [1,2,3,4] and returns "(10/)" )
        (functionThatTakes [1,2,3,4] and returns " 4 " )
-- which means:
average =
    (functionThatTakes [1,2,3,4] and returns "2.5" )
If you compare this to the simpler examples above, you'll see that it still has our function (10/), our input 4 and our result 2.5. And each of them is once again wrapped up in the "something more complicated". In this case, the "something more complicated" is the "function that takes [1,2,3,4] and returns...".
Of course, since they're functions, they don't have to take [1,2,3,4] as their input. If they took a different list of integers (eg [1,2,3,4,5]) then we would get different results (e.g. new function: (15/), new input 5 and new value 3).
Other examples
minPlusMax = ap ((+) . minimum) maximum
-- a function that adds the minimum element of a list, to the maximum element
upperAndLower = ap ((,) . toUpper) toLower
-- a function that takes a Char and returns a tuple, with the upper case and lower case versions of a character
These could all also be defined using liftM2.
average = liftM2 (/) sum genericLength
minPlusMax = liftM2 (+) minimum maximum
upperAndLower = liftM2 (,) toUpper toLower
In explaining foldr to Haskell newbies, the canonical definition is
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
But in GHC.Base, foldr is defined as
foldr k z = go
  where
    go []     = z
    go (y:ys) = y `k` go ys
It seems this definition is an optimization for speed, but I don't see why using the helper function go would make it faster. The source comments (see here) mention inlining, but I also don't see how this definition would improve inlining.
I can add some important details about GHC's optimization system.
The naive definition of foldr passes around a function. There's an inherent overhead in calling a function - especially when the function isn't known at compile time. It'd be really nice to be able to inline the definition of the function if it's known at compile time.
There are tricks available to perform that inlining in GHC - and this is an example of them. First, foldr needs to be inlined (I'll get to why later). foldr's naive implementation is recursive, so it cannot be inlined. So a worker/wrapper transformation is applied to the definition. The worker is recursive, but the wrapper is not. This allows foldr to be inlined, despite the recursion over the structure of the list.
When foldr is inlined, it creates a copy of all of its local bindings, too. It's more or less a direct textual inlining (modulo some renaming, and happening after the desugaring pass). This is where things get interesting. go is a local binding, and the optimizer gets to look inside it. It notices that it calls a function in the local scope, which it names k. GHC will often remove the k variable entirely, and will just replace it with the expression k reduces to. And then afterwards, if the function application is amenable to inlining, it can be inlined at this time - removing the overhead of calling a first-class function entirely.
Let's look at a simple, concrete example. This program will echo a line of input with all trailing 'x' characters removed:
dropR :: Char -> String -> String
dropR x r = if x == 'x' && null r then "" else x : r
main :: IO ()
main = do
    s <- getLine
    putStrLn $ foldr dropR "" s
First, the optimizer will inline foldr's definition and simplify, resulting in code that looks something like this:
main :: IO ()
main = do
    s <- getLine
    -- I'm changing the where clause to a let expression for the sake of readability
    putStrLn $ let { go [] = ""; go (x:xs) = dropR x (go xs) } in go s
And that's the thing the worker-wrapper transformation allows. I'm going to skip the remaining steps, but it should be obvious that GHC can now inline the definition of dropR, eliminating the function-call overhead. This is where the big performance win comes from.
GHC cannot inline recursive functions, so
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
cannot be inlined. But
foldr k z = go
  where
    go []     = z
    go (y:ys) = y `k` go ys
is not a recursive function. It is a non-recursive function with a local recursive definition!
This means that, as #bheklilr writes, in map (foldr (+) 0) the foldr can be inlined and hence f and z replaced by (+) and 0 in the new go, and great things can happen, such as unboxing of the intermediate value.
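For instance, once foldr is inlined at a call site like foldr (+) 0, GHC effectively sees something like this sketch:

-- after inlining and substituting k = (+), z = 0
sumInts :: [Int] -> Int
sumInts = go
  where
    go []     = 0
    go (y:ys) = y + go ys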
As the comments say:
-- Inline only in the final stage, after the foldr/cons rule has had a chance
-- Also note that we inline it when it has *two* parameters, which are the
-- ones we are keen about specialising!
In particular, note the "we inline it when it has two parameters, which are the ones we are keen about specialising!"
What this is saying is that when foldr gets inlined, it's inlined only for the specific choice of f and z, not for the choice of the list getting folded. I'm no expert, but it would seem this makes it possible to inline it in situations like
map (foldr (+) 0) some_list
so that the inlining happens at this point, and not after map has been applied. This makes it optimizable in more situations and more easily. All the helper function does is mask the third argument so {-# INLINE #-} can do its thing.
One tiny important detail not mentioned in other answers is that GHC, given a function definition like
f x y z w q = ...
cannot inline f until all of the arguments x, y, z, w, and q are applied. This means that it's often advantageous to use the worker/wrapper transformation to expose a minimal set of function arguments which must be applied before inlining can occur.
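Putting the pieces together, a hand-rolled version mirroring the GHC.Base shape might look like this (myFoldr is a made-up name):

myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr f z = go             -- wrapper: non-recursive, needs only f and z
  where
    go []     = z            -- worker: carries the recursion
    go (x:xs) = f x (go xs)
{-# INLINE myFoldr #-}       -- the wrapper can be inlined once f and z are supplied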
I am trying to understand currying by reading various blogs and Stack Overflow answers, and I think I have understood it somewhat. In Haskell, every function is curried; that means, when you have a function like f x y = x + y
it really is ((f x) y)
Here, the function initially takes the first parameter x and partially applies it to f, which in turn returns a function for y, which takes just the single parameter y and applies the function. In both cases the function takes only one parameter, and the process of reducing a function to take a single parameter is called 'currying'. Correct me if my understanding is wrong here.
So if it is correct, could you please tell me if the functions 'two' and 'three' are curried functions?
three x y z = x + y + z
two = three 1
same = two 1
In this case, I have two specialized functions, 'two' and 'same', which are reduced to take only one parameter, so are they curried?
Let's look at two first.
It has a signature of
two :: Num a => a -> a -> a
forget the Num a for now (it's only a constraint on a - you can read Int here).
Surely this too is a curried function.
The next one is more interesting:
same :: Num a => a -> a
(btw: nice name - it's the same but not exactly id ^^)
TBH: I don't know for sure.
The best definition I know of a curried function is this:
A curried function is a function of N arguments returning another function of (N-1) arguments.
(if you want, you can extend this to fully curried functions, of course)
This will only fit if you define constants as functions with 0 parameters - which you surely can.
So I would say yes(?) this too is a curried function but only in a mathy borderline way (like the sum of 0 numbers is defined to be 0)
Best just think about this equationally. The following are all equivalent definitions:
f x y z = x+y+z
f x y = \z -> x+y+z
f x = \y -> (\z -> x+y+z)
f = \x -> (\y -> (\z -> x+y+z))
Partial application is only tangentially relevant here. Most often you don't want the actual partial application to be performed and the actual lambda object to be created in memory - hoping instead that the compiler will employ - and optimize better - the full definition at the final point of full application.
The presence of the functions curry/uncurry is yet another confusing issue. Both f (x,y) = ... and f x y = ... are curried in Haskell, of course, but in our heads we tend to think about the first as a function of two arguments, so the functions translating between the two forms are named curry and uncurry, as a mnemonic.
You could think of the three function, written with anonymous functions, as:
three = \x -> (\y -> (\z -> x + y + z))
I'm very new to Haskell and FP in general. I've read many of the writings that describe what currying is, but I haven't found an explanation of how it actually works.
Here is a function: (+) :: a -> (a -> a)
If I do (+) 4 7, the function takes 4 and returns a function that takes 7 and returns 11. But what happens to 4 ? What does that first function do with 4? What does (a -> a) do with 7?
Things get more confusing when I think about a more complicated function:
max' :: Int -> (Int -> Int)
max' m n | m > n     = m
         | otherwise = n
What does (Int -> Int) compare its parameter to? It only takes one parameter, but it needs two to do m > n.
Understanding higher-order functions
Haskell, as a functional language, supports higher-order functions (HOFs). In mathematics HOFs are called functionals, but you don't need any mathematics to understand them. In usual imperative programming, like in Java, functions can accept values, like integers and strings, do something with them, and return back a value of some other type.
But what if functions themselves were no different from values, and you could accept a function as an argument or return it from another function? f a b c = a + b - c is a boring function; it sums a and b and then subtracts c. But the function could be more interesting if we could generalize it: what if sometimes we'd want to sum a and b, but sometimes multiply? Or divide by c instead of subtracting?
Remember, (+) is just a function of 2 numbers that returns a number; there's nothing special about it, so any function of 2 numbers that returns a number could be in its place. Writing g a b c = a * b - c, h a b c = a + b / c and so on just doesn't cut it for us; we need a general solution, we are programmers after all! Here's how it is done in Haskell:
let f g h a b c = a `g` b `h` c in f (*) (/) 2 3 4 -- returns 1.5
And you can return functions too. Below we create a function that accepts a function and an argument and returns another function, which accepts a parameter and returns a result.
let g f n = (\m -> m `f` n); f = g (+) 2 in f 10 -- returns 12
A (\m -> m `f` n) construct is an anonymous function of 1 argument m that applies f to that m and n. Basically, when we call g (+) 2 we create a function of one argument, that just adds 2 to whatever it receives. So let f = g (+) 2 in f 10 equals 12 and let f = g (*) 5 in f 5 equals 25.
(See also my explanation of HOFs using Scheme as an example.)
Understanding currying
Currying is a technique that transforms a function of several arguments to a function of 1 argument that returns a function of 1 argument that returns a function of 1 argument... until it returns a value. It's easier than it sounds, for example we have a function of 2 arguments, like (+).
Now imagine that you could give only 1 argument to it, and it would return a function? You could use this function later to add this 1st argument, now encased in this new function, to something else. E.g.:
f n = (\m -> n - m)
g = f 10
g 8 -- would return 2
g 4 -- would return 6
Guess what, Haskell curries all functions by default. Technically speaking, there are no functions of multiple arguments in Haskell, only functions of one argument, some of which may return new functions of one argument.
It's evident from the types. Write :t (++) in the interpreter, where (++) is a function that concatenates 2 strings together; it will return (++) :: [a] -> [a] -> [a]. The type is not [a],[a] -> [a], but [a] -> [a] -> [a], meaning that (++) accepts one list and returns a function of type [a] -> [a]. This new function can accept yet another list, and it will finally return a new list of type [a].
That's why function application syntax in Haskell has no parentheses and commas, compare Haskell's f a b c with Python's or Java's f(a, b, c). It's not some weird aesthetic decision, in Haskell function application goes from left to right, so f a b c is actually (((f a) b) c), which makes complete sense, once you know that f is curried by default.
In types, however, the association is from right to left, so [a] -> [a] -> [a] is equivalent to [a] -> ([a] -> [a]). They are the same thing in Haskell, Haskell treats them exactly the same. Which makes sense, because when you apply only one argument, you get back a function of type [a] -> [a].
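You can see this directly in the interpreter:

Prelude> :t (++)
(++) :: [a] -> [a] -> [a]
Prelude> :t (++) "foo"
(++) "foo" :: [Char] -> [Char]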
On the other hand, check the type of map: (a -> b) -> [a] -> [b], it receives a function as its first argument, and that's why it has parentheses.
To really hammer down the concept of currying, try to find the types of the following expressions in the interpreter:
(+)
(+) 2
(+) 2 3
map
map (\x -> head x)
map (\x -> head x) ["conscience", "do", "cost"]
map head
map head ["conscience", "do", "cost"]
Partial application and sections
Now that you understand HOFs and currying, Haskell gives you some syntax to make code shorter. When you call a function with 1 or multiple arguments to get back a function that still accepts arguments, it's called partial application.
You already understand that instead of creating anonymous functions you can just partially apply a function, so instead of writing (\x -> replicate 3 x) you can just write (replicate 3). But what if you want a divide (/) operator instead of replicate? For infix functions, Haskell allows you to partially apply using either argument.
These are called sections: (2/) is equivalent to (\x -> 2 / x) and (/2) is equivalent to (\x -> x / 2). With backticks you can take a section of any binary function: (2`elem`) is equivalent to (\xs -> 2 `elem` xs).
But remember, any function is curried by default in Haskell and therefore always accepts one argument, so sections can actually be used with any function: let (+^) be some weird function that sums 4 arguments; then let (+^) a b c d = a + b + c + d in (2+^) 3 4 5 returns 14.
Compositions
Other handy tools for writing concise and flexible code are the composition and application operators. The composition operator (.) chains functions together. The application operator ($) just applies the function on the left side to the argument on the right side, so f $ x is equivalent to f x. However, ($) has the lowest precedence of all operators, so we can use it to get rid of parentheses: f (g x y) is equivalent to f $ g x y.
It is also helpful when we need to apply multiple functions to the same argument: map ($2) [(2+), (10-), (20/)] would yield [4,8,10]. (f . g . h) (x + y + z), f (g (h (x + y + z))), f $ g $ h $ x + y + z and f . g . h $ x + y + z are equivalent, but (.) and ($) are different things, so read Haskell: difference between . (dot) and $ (dollar sign) and parts from Learn You a Haskell to understand the difference.
You can think of it as the function storing the argument and returning a new function that just demands the other argument(s). The new function already knows the first argument, as it is stored together with the function. This is handled internally by the compiler. If you want to know exactly how this works, you may be interested in this page, although it may be a bit complicated if you are new to Haskell.
If a function call is fully saturated (so all arguments are passed at the same time), most compilers use an ordinary calling scheme, like in C.
Does this help?
max' = \m -> \n -> if (m > n)
                   then m
                   else n
Written as lambdas: max' is a lambda that, given some m, returns another lambda, which in turn returns the value.
Hence max' 4 is
max' 4 = \n -> if (4 > n)
               then 4
               else n
Something that may help is to think about how you could implement curry as a higher order function if Haskell didn't have built in support for it. Here is a Haskell implementation that works for a function on two arguments.
curry :: (a -> b -> c) -> a -> (b -> c)
curry f a = \b -> f a b
Now you can pass curry a function on two arguments and the first argument and it will return a function on one argument (this is an example of a closure.)
In ghci:
Prelude> let curry f a = \b -> f a b
Prelude> let g = curry (+) 5
Prelude> g 10
15
Prelude> g 15
20
Prelude>
Fortunately we don't have to do this in Haskell (you do in Lisp if you want currying) because support is built into the language.
If you come from C-like languages, their syntax might help you understand it. For example, here is how the add function could be implemented in PHP:
function add($a) {
    return function($b) use($a) {
        return $a + $b;
    };
}
Haskell is based on Lambda calculus. Internally what happens is that everything gets converted into a function. So your compiler evaluates (+) as follows
(+) :: Num a => a -> a -> a
(+) = \x -> (\y -> x + y)
That is, (+) :: a -> a -> a is essentially the same as (+) :: a -> (a -> a). Hope this helps.