Breakpoints in the argument-passing scheme of OCaml - haskell

Today, I was going through the source code of Jane Street's Core_kernel module and I came across the compose function:
(* The typical use case for these functions is to pass in functional arguments
and get functions as a result. For this reason, we tell the compiler where
to insert breakpoints in the argument-passing scheme. *)
let compose f g = (); fun x -> f (g x)
I would have defined the compose function as:
let compose f g x = f (g x)
The reason they give for defining compose the way they did is “because compose is a function which takes functions f and g as arguments and returns the function fun x -> f (g x) as a result, they defined compose the way they did to tell the compiler to insert a breakpoint after f and g but before x in the argument-passing scheme.”
So I have two questions:
Why do we need breakpoints in the argument-passing scheme?
What difference would it make if we defined compose the normal way?
Coming from Haskell, this convention doesn't make any sense to me.

This is an efficiency hack to avoid the cost of a partial application in the expected use case indicated in the comment.
OCaml compiles curried functions into fixed-arity constructs, using a closure to partially apply them where necessary. This means that calls of that arity are efficient - there's no closure construction, just a function call.
There will be a closure construction within compose for fun x -> f (g x), but this will be more efficient than the partial application. Closures generated by partial application go through a wrapper caml_curryN which exists to ensure that effects occur at the correct time (if that closure is itself partially applied).
The fixed arity that the compiler chooses is based on a simple syntactic analysis - essentially, how many arguments are taken in a row without anything in between. The Jane St. programmers have used this to select the arity that they desire by injecting () "in between" arguments.
In short, let compose f g x = f (g x) is a less desirable definition because it would result in the common two-argument case of compose f g being a more expensive partial application.
Semantically, of course, there is no difference at all.

It's worth noting that compilation of partial application has improved in OCaml, and this performance hack is no longer necessary.

Related

Sharing vs. non-sharing fixed-point combinator

This is the usual definition of the fixed-point combinator in Haskell:
fix :: (a -> a) -> a
fix f = let x = f x in x
On https://wiki.haskell.org/Prime_numbers, they define a different fixed-point combinator:
_Y :: (t -> t) -> t
_Y g = g (_Y g) -- multistage, non-sharing, g (g (g (g ...)))
-- g (let x = g x in x) -- two g stages, sharing
_Y is a non-sharing fixpoint combinator, here arranging for a recursive "telescoping" multistage primes production (a tower of producers).
What exactly does this mean? What is "sharing" vs. "non-sharing" in that context? How does _Y differ from fix?
"Sharing" means f x re-uses the x that it creates; but with _Y g = g . g . g . g . ..., each g calculates its output anew (cf. this and this).
In that context, the sharing version has much worse memory usage and leads to a space leak.1
The definition of _Y mirrors the usual lambda calculus definition's effect for the Y combinator, which emulates recursion by duplication, while true recursion refers to the same (hence, shared) entity.
In
x = f x
(_Y g) = g (_Y g)
both xs refer to the same entity, but each of the (_Y g)s refers to an equivalent, but separate, entity. That's the intention of it, anyway.
Of course thanks to referential transparency there's no guarantee in Haskell the language for any of this. But GHC the compiler does behave this way.
_Y g is a common sub-expression and it could be "eliminated" by a compiler by giving it a name and reusing that named entity, subverting the whole purpose of it. That's why GHC has the "no common sub-expression elimination" -fno-cse flag, which prevents this explicitly. It used to be that you had to use this flag to achieve the desired behaviour here, but not anymore: more recent versions of GHC (read: for several years now) are no longer as aggressive at common sub-expression elimination.
disclaimer: I'm the author of that part of the page you're referring to. Was hoping for the back-and-forth that's usual on wiki pages, but it never came, so my work didn't get reviewed like that. Either no-one bothered, or it is passable (lacking major errors). The wiki seems to be largely abandoned for many years now.
1 The g function involved,
(3:) . minus [5,7..] . foldr (\(x:xs) -> (x:) . union xs) []
. map (\p -> [p*p, p*p + 2*p ..])
produces an increasing stream of all odd primes given an increasing stream of all odd primes. To produce a prime N in value, it consumes its input stream up to the first prime above sqrt(N) in value, at least. Thus the production points are given roughly by repeated squaring, and there are ~ log (log N) of such g functions in total in the chain (or "tower") of these primes producers, each immediately garbage collectible, the lowest one producing its primes given just the first odd prime, 3, known a priori.
And with the two-staged _Y2 g = g x where { x = g x } there would be only two of them in the chain, but only the top one would be immediately garbage collectible, as discussed at the referenced link above.
_Y is translated to the following STG:
_Y f = let x = _Y f in f x
fix is translated identically to the Haskell source:
fix f = let x = f x in x
So fix f sets up a recursive thunk x and returns it, while _Y is a recursive function, and importantly it’s not tail-recursive. Forcing _Y f enters f, passing a new call to _Y f as an argument, so each recursive call sets up a new thunk; forcing the x returned by fix f enters f, passing x itself as an argument, so each recursive call is into the same thunk—this is what’s meant by “sharing”.
The sharing version usually has better memory usage, and also lets the GHC RTS detect some kinds of infinite loop. When a thunk is forced, before evaluation starts, it’s replaced with a “black hole”; if at any point during evaluation of a thunk a black hole is reached from the same thread, then we know we have an infinite loop and can throw an exception (which you may have seen displayed as Exception: <<loop>>).
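The difference between the two definitions can be sketched side by side. This is a minimal illustration, assuming GHC's usual evaluation strategy; factStep, onesShared and onesUnshared are hypothetical names introduced only for this example. Both combinators compute the same values, and the difference is only in how the result is represented in memory.

```haskell
-- Sharing fixpoint: x is a single thunk that refers to itself.
fix :: (a -> a) -> a
fix f = let x = f x in x

-- Non-sharing fixpoint: every unfolding makes a fresh call to _Y g.
_Y :: (t -> t) -> t
_Y g = g (_Y g)

-- Hypothetical step function: factorial with the recursive call abstracted out.
factStep :: (Integer -> Integer) -> Integer -> Integer
factStep self n = if n <= 1 then 1 else n * self (n - 1)

-- Under GHC, the first is typically one cons cell pointing back at itself,
-- while the second allocates new cons cells as the list is consumed.
onesShared, onesUnshared :: [Int]
onesShared   = fix (1 :)
onesUnshared = _Y (1 :)
```

Since the results are indistinguishable by value, choosing between the two is purely a question of memory behaviour, as discussed above.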
I think you already received excellent answers, from a GHC/Haskell perspective. I just wanted to chime in and add a few historical/theoretical notes.
The correspondence between unfolding and cyclic views of recursion is rigorously studied in Hasegawa's PhD thesis: https://www.springer.com/us/book/9781447112211
(Here's a shorter paper that you can read without paying Springer: https://link.springer.com/content/pdf/10.1007%2F3-540-62688-3_37.pdf)
Hasegawa assumes a traced monoidal category, a requirement that is much less stringent than the usual PCPO assumption of domain theory, which forms the basis of how we think about Haskell in general. What Hasegawa showed was that one can define these "sharing" fixed point operators in such a setting, and established that they correspond to the usual unfolding view of fixed points from Church's lambda-calculus. That is, there is no way to tell them apart by making them produce different answers.
Hasegawa's correspondence holds for what's known as central arrows; i.e., when there are no "effects" involved. Later on, Benton and Hyland extended this work and showed that the correspondence holds for certain cases when the underlying arrow can perform "mild" monadic effects as well: https://pdfs.semanticscholar.org/7b5c/8ed42a65dbd37355088df9dde122efc9653d.pdf
Unfortunately, Benton and Hyland only allow effects that are quite "mild": Effects like the state and environment monads fit the bill, but not general effects like exceptions, lists, or IO. (The fixed point operators for these effectful computations are known as mfix in Haskell, with the type signature (a -> m a) -> m a, and they form the basis of the recursive-do notation.)
It's still an open question how to extend this work to cover arbitrary monadic effects. Though it doesn't seem to be receiving much attention these days. (Would make a great PhD topic for those interested in the correspondence between lambda-calculus, monadic effects, and graph-based computations.)

Can any recursive definition be rewritten using foldr?

Say I have a general recursive definition in Haskell like this:
foo a0 a1 ... = base_case
foo b0 b1 ...
| cond1 = recursive_case_1
| cond2 = recursive_case_2
...
Can it always be rewritten using foldr? Can it be proved?
If we interpret your question literally, we can write const value foldr to achieve any value, as @DanielWagner pointed out in a comment.
A more interesting question is whether we can instead forbid general recursion from Haskell, and "recurse" only through the eliminators/catamorphisms associated to each user-defined data type, which are the natural generalization of foldr to inductively defined data types. This is, essentially, (higher-order) primitive recursion.
When this restriction is imposed, we can only compose terminating functions (the eliminators) together. This means that we can no longer define non-terminating functions.
As a first example, we lose the trivial recursion
f x = f x
-- or even
a = a
since, as said, the language becomes total.
More interestingly, the general fixed point operator is lost.
fix :: (a -> a) -> a
fix f = f (fix f)
A more intriguing question is: what about the total functions we can express in Haskell? We do lose all the non-total functions, but do we lose any of the total ones?
Computability theory states that, since the language becomes total (no more non termination), we lose expressiveness even on the total fragment.
The proof is a standard diagonalization argument. Fix any enumeration of programs in the total fragment so that we can speak of "the i-th program".
Then, let eval i x be the result of running the i-th program on the natural x as input (for simplicity, assume this is well typed, and that the result is a natural). Note that, since the language is total, a result must exist. Moreover, eval can be implemented in the unrestricted Haskell language, since we can write an interpreter of Haskell in Haskell (left as an exercise :-P), and that would work just as well for the fragment. Then, we simply take
f n = succ $ eval n n
The above is a total function (a composition of total functions) which can be expressed in Haskell, but not in the fragment. Indeed, otherwise there would be a program to compute it, say the i-th program. In such case we would have
eval i x = f x
for all x. But then,
eval i i = f i = succ $ eval i i
which is impossible -- contradiction. QED.
In type theory, it is indeed the case that you can elaborate all definitions by dependent pattern-matching into ones only using eliminators (a more strongly-typed version of folds, the generalisation of lists' foldr).
See e.g. Eliminating Dependent Pattern Matching (pdf)
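As a rough sketch of what "recursing only through eliminators" looks like in practice, here are a few functions written without explicit recursion at the use site. The names Nat, foldNat, add, toInt, length' and map' are introduced for illustration only; foldNat plays the role of the eliminator that a total language would provide as a primitive.

```haskell
-- Peano naturals and their eliminator (the analogue of foldr for Nat).
data Nat = Zero | Succ Nat

-- In a total language this would be supplied by the language itself;
-- it is the only place where recursion is spelled out here.
foldNat :: r -> (r -> r) -> Nat -> r
foldNat z _ Zero     = z
foldNat z s (Succ n) = s (foldNat z s n)

-- add m n applies Succ n times to m.
add :: Nat -> Nat -> Nat
add m = foldNat m Succ

toInt :: Nat -> Int
toInt = foldNat 0 (+ 1)

-- List functions expressed through foldr, the list eliminator.
length' :: [a] -> Int
length' = foldr (\_ acc -> acc + 1) 0

map' :: (a -> b) -> [a] -> [b]
map' f = foldr (\x acc -> f x : acc) []
```

Every function built this way terminates on finite input, which is exactly why the diagonal function f above cannot be written in the fragment.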

What are the benefits of currying?

I don't think I quite understand currying, since I'm unable to see any massive benefit it could provide. Perhaps someone could enlighten me with an example demonstrating why it is so useful. Does it truly have benefits and applications, or is it just an over-appreciated concept?
(There is a slight difference between currying and partial application, although they're closely related; since they're often mixed together, I'll deal with both terms.)
The place where I realized the benefits first was when I saw sliced operators:
incElems = map (+1)
--non-curried equivalent: incElems = (\elems -> map (\i -> (+) 1 i) elems)
IMO, this is totally easy to read. Now, if the type of (+) was (Int,Int) -> Int *, which is the uncurried version, it would (counter-intuitively) result in an error -- but curried, it works as expected, and has type [Int] -> [Int].
You mentioned C# lambdas in a comment. In C#, you could have written incElems like so, given a function plus:
var incElems = xs => xs.Select(x => plus(1,x))
If you're used to point-free style, you'll see that the x here is redundant. Logically, that code could be reduced to
var incElems = xs => xs.Select(curry(plus)(1))
which is awful due to the lack of automatic partial application with C# lambdas. And that's the crucial point to decide where currying is actually useful: mostly when it happens implicitly. For me, map (+1) is the easiest to read, then comes .Select(x => plus(1,x)), and the version with curry should probably be avoided, if there is no really good reason.
Now, if readable, the benefits sum up to shorter, more readable and less cluttered code -- unless point-free style is being abused (I do love (.).(.), but it is... special).
Also, lambda calculus would be impossible without curried functions, since it has only single-argument (but therefore higher-order) functions.
* Of course it's actually in Num, but it's more readable like this for the moment.
Update: how currying actually works.
Look at the type of plus in C#:
int plus(int a, int b) {..}
You have to give it a tuple of values -- not in C# terms, but mathematically speaking; you can't just leave out the second value. In Haskell terms, that's
plus :: (Int,Int) -> Int,
which could be used like
incElem = map (\x -> plus (1, x)) -- equal to .Select (x => plus (1, x))
That's way too many characters to type. Suppose you want to do this more often in the future. Here's a little helper:
curry f = \x -> (\y -> f (x,y))
plus' = curry plus
which gives
incElem = map (plus' 1)
Let's apply this to a concrete value.
incElem [1]
= (map (plus' 1)) [1]
= [plus' 1 1]
= [(curry plus) 1 1]
= [(\x -> (\y -> plus (x,y))) 1 1]
= [plus (1,1)]
= [2]
Here you can see curry at work. It turns a standard Haskell-style function application (plus' 1 1) into a call to a "tupled" function -- or, viewed at a higher level, transforms the "tupled" into the "untupled" version.
Fortunately, most of the time, you don't have to worry about this, as there is automatic partial application.
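The walkthrough above can be collected into one compilable sketch. The helper is named curry' here only to avoid clashing with the Prelude's own curry, which behaves the same way; the body of plus is an assumption, since the original leaves it elided.

```haskell
-- An uncurried ("tupled") function, like the C# plus above.
-- The body is assumed; the original only gives the signature.
plus :: (Int, Int) -> Int
plus (a, b) = a + b

-- Hand-written version of the helper; the Prelude's curry does the same.
curry' :: ((a, b) -> c) -> a -> b -> c
curry' f = \x -> \y -> f (x, y)

plus' :: Int -> Int -> Int
plus' = curry' plus

-- Partial application of the curried version needs no lambda.
incElem :: [Int] -> [Int]
incElem = map (plus' 1)
```

The Prelude also provides uncurry for going in the other direction, so uncurry plus' is again a function on pairs.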
It's not the best thing since sliced bread, but if you're using lambdas anyway, it's easier to use higher-order functions without using lambda syntax. Compare:
map (max 4) [0,6,9,3] --[4,6,9,4]
map (\i -> max 4 i) [0,6,9,3] --[4,6,9,4]
These kinds of constructs come up often enough when you're using functional programming, that it's a nice shortcut to have and lets you think about the problem from a slightly higher level--you're mapping against the "max 4" function, not some random function that happens to be defined as (\i -> max 4 i). It lets you start to think in higher levels of indirection more easily:
let numOr4 = map $ max 4
let numOr4' = (\xs -> map (\i -> max 4 i) xs)
numOr4 [0,6,9,3] --ends up being [4,6,9,4] either way;
--which do you think is easier to understand?
That said, it's not a panacea; sometimes your function's parameters will be in the wrong order for what you're trying to do with currying, so you'll have to resort to a lambda anyway. However, once you get used to this style, you start to learn how to design your functions to work well with it, and once those neurons start to connect inside your brain, previously complicated constructs can start to seem obvious in comparison.
One benefit of currying is that it allows partial application of functions without the need of any special syntax/operator. A simple example:
mapLength = map length
mapLength ["ab", "cde", "f"]
>>> [2, 3, 1]
mapLength ["x", "yz", "www"]
>>> [1, 2, 3]
map :: (a -> b) -> [a] -> [b]
length :: [a] -> Int
mapLength :: [[a]] -> [Int]
The map function can be considered to have type (a -> b) -> ([a] -> [b]) because of currying, so when length is applied as its first argument, it yields the function mapLength of type [[a]] -> [Int].
Currying has the convenience features mentioned in other answers, but it also often serves to simplify reasoning about the language or to implement some code much more easily than would otherwise be possible. For example, currying means that any function at all has a type that's compatible with a -> b. If you write some code whose type involves a -> b, that code can be made to work with any function at all, no matter how many arguments it takes.
The best known example of this is the Applicative class:
class Functor f => Applicative f where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b
And an example use:
-- All possible products of numbers taken from [1..5] and [1..10]
example = pure (*) <*> [1..5] <*> [1..10]
In this context, pure and <*> adapt any function of type a -> b to work with lists of type [a]. Because of partial application, this means you can also adapt functions of type a -> b -> c to work with [a] and [b], or a -> b -> c -> d with [a], [b] and [c], and so on.
The reason this works is because a -> b -> c is the same thing as a -> (b -> c):
(+) :: Num a => a -> a -> a
pure (+) :: (Applicative f, Num a) => f (a -> a -> a)
[1..5], [1..10] :: Num a => [a]
pure (+) <*> [1..5] :: Num a => [a -> a]
pure (+) <*> [1..5] <*> [1..10] :: Num a => [a]
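To make the "any number of arguments" point concrete, here is a small sketch that uses the same two operators with a two-argument and a three-argument function, and with a different Applicative (Maybe). The names products, volume, volumes and sumMaybe are hypothetical and introduced only for this illustration.

```haskell
-- Two-argument function lifted over two lists.
products :: [Int]
products = pure (*) <*> [1 .. 5] <*> [1 .. 10]

-- A hypothetical three-argument function; no new machinery is needed,
-- because volume x :: Int -> Int -> Int is itself just a function.
volume :: Int -> Int -> Int -> Int
volume x y z = x * y * z

volumes :: [Int]
volumes = volume <$> [1, 2] <*> [3, 4] <*> [5, 6]

-- The same pattern works for any Applicative, for example Maybe.
sumMaybe :: Maybe Int
sumMaybe = pure (+) <*> Just 1 <*> Just 2
```

Note that f <$> xs is the same as pure f <*> xs, so both spellings appear in practice.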
Another, different use of currying is that Haskell allows you to partially apply type constructors. E.g., if you have this type:
data Foo a b = Foo a b
...it actually makes sense to write Foo a in many contexts, for example:
instance Functor (Foo a) where
fmap f (Foo a b) = Foo a (f b)
I.e., Foo is a two-parameter type constructor with kind * -> * -> *; Foo a, the partial application of Foo to just one type, is a type constructor with kind * -> *. Functor is a type class that can only be instantiated for type constructors of kind * -> *. Since Foo a is of this kind, you can make a Functor instance for it.
The "no-currying" form of partial application works like this:
We have a function f : (A ✕ B) → C
We'd like to apply it partially to some a : A
To do this, we build a closure out of a and f (we don't evaluate f at all, for the time being)
Then some time later, we receive the second argument b : B
Now that we have both the A and B argument, we can evaluate f in its original form...
So we recall a from the closure, and evaluate f(a,b).
A bit complicated, isn't it?
When f is curried in the first place, it's rather simpler:
We have a function f : A → B → C
We'd like to apply it partially to some a : A – which we can just do: f a
Then some time later, we receive the second argument b : B
We apply the already evaluated f a to b.
So far so nice, but more important than being simple, this also gives us extra possibilities for implementing our function: we may be able to do some calculations as soon as the a argument is received, and these calculations won't need to be done later, even if the function is evaluated with multiple different b arguments!
To give an example, consider this audio filter, an infinite impulse response filter. It works like this: for each audio sample, you feed an "accumulator function" (f) with some state parameter (in this case, a simple number, 0 at the beginning) and the audio sample. The function then does some magic, and spits out the new internal state1 and the output sample.
Now here's the crucial bit – what kind of magic the function does depends on the coefficient2 λ, which is not quite a constant: it depends both on what cutoff frequency we'd like the filter to have (this governs "how the filter will sound") and on what sample rate we're processing at. Unfortunately, the calculation of λ is a bit more complicated (lp1stCoeff $ 2*pi * (νᵥ ~*% δs)) than the rest of the magic, so we wouldn't like having to do this for every single sample, all over again. Quite annoying, because νᵥ and δs are almost constant: they change very seldom, certainly not at each audio sample.
But currying saves the day! We simply calculate λ as soon as we have the necessary parameters. Then, at each of the many many audio samples to come, we only need to perform the remaining, very easy magic: yⱼ = yⱼ₋₁ + λ ⋅ (xⱼ - yⱼ₋₁). So we're being efficient, and still keeping a nice safe referentially transparent purely-functional interface.
1 Note that this kind of state-passing can generally be done more nicely with the State or ST monad, that's just not particularly beneficial in this example
2 Yes, this is a lambda symbol. I hope I'm not confusing anybody – fortunately, in Haskell it's clear that lambda functions are written with \, not with λ.
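A minimal sketch of the "compute λ once, then reuse it" idea follows. The body of lp1stCoeff is an assumption (a common one-pole coefficient formula), not the original author's definition, and plain Doubles named cutoffHz and sampleRateHz stand in for the unit-aware νᵥ and δs; lowpassStep and runFilter are hypothetical names.

```haskell
-- Assumed coefficient formula for a first-order low-pass filter;
-- the original code's definition is not shown, so this is a stand-in.
lp1stCoeff :: Double -> Double
lp1stCoeff w = 1 - exp (negate w)

-- Supplying the two "almost constant" parameters yields the per-sample
-- update function. lambda is bound outside the inner lambda, so it is
-- evaluated at most once per partially applied step function.
lowpassStep :: Double -> Double -> (Double -> Double -> Double)
lowpassStep cutoffHz sampleRateHz =
  let lambda = lp1stCoeff (2 * pi * cutoffHz / sampleRateHz)
  in \yPrev x -> yPrev + lambda * (x - yPrev)

-- Run the filter over a list of samples, starting from state 0.
runFilter :: Double -> Double -> [Double] -> [Double]
runFilter cutoffHz sampleRateHz = drop 1 . scanl step 0
  where
    step = lowpassStep cutoffHz sampleRateHz
```

For instance, runFilter 1000 44100 applied to a constant input of 1 produces a sequence that rises gradually toward 1, with the exponential evaluated once rather than once per sample.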
It's somewhat dubious to ask what the benefits of currying are without specifying the context in which you're asking the question:
In some cases, like functional languages, currying will merely be seen as something that has a more local change, where you could replace things with explicit tupled domains. However, this isn't to say that currying is useless in these languages. In some sense, programming with curried functions makes you "feel" like you're programming in a more functional style, because you more typically face situations where you're dealing with higher order functions. Certainly, most of the time, you will "fill in" all of the arguments to a function, but in the cases where you want to use the function in its partially applied form, this is a bit simpler to do in curried form. We typically tell our beginning programmers to use this when learning a functional language just because it feels like better style and reminds them they're programming in more than just C. Having things like curry and uncurry also helps with certain conveniences within functional programming languages; I can think of arrows within Haskell as a specific example of where you would use curry and uncurry a bit to apply things to different pieces of an arrow, etc...
In some cases, you want to think about more than functional programs: you can present currying / uncurrying as a way to state the elimination and introduction rules for "and" in constructive logic, which provides a connection to a more elegant motivation for why it exists.
In some cases, for example, in Coq, using curried functions versus tupled functions can produce different induction schemes, which may be easier or harder to work with, depending on your applications.
I used to think that currying was simple syntax sugar that saves you a bit of typing. For example, instead of writing
(\ x -> x + 1)
I can merely write
(+1)
The latter is instantly more readable, and less typing to boot.
So if it's just a convenient short cut, why all the fuss?
Well, it turns out that because function types are curried, you can write code which is polymorphic in the number of arguments a function has.
For example, the QuickCheck framework lets you test functions by feeding them randomly-generated test data. It works on any function whose input type can be auto-generated. But, because of currying, the authors were able to rig it so this works with any number of arguments. Were functions not curried, there would be a different testing function for each number of arguments - and that would just be tedious.
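The trick can be sketched with a toy checker. This is not QuickCheck's actual implementation: instead of random generation it exhaustively enumerates a small set of sample values, and the class names Arg and Testable, the method names samples and check, and the example properties are all assumptions made for this illustration.

```haskell
-- Types whose values we can enumerate (a stand-in for random generation).
class Arg a where
  samples :: [a]

instance Arg Bool where
  samples = [False, True]

-- Things that can be checked: a plain Bool, or a function returning
-- something that can itself be checked.
class Testable p where
  check :: p -> Bool

instance Testable Bool where
  check = id

-- Because a -> b -> c is really a -> (b -> c), this single instance
-- covers functions of every arity.
instance (Arg a, Testable b) => Testable (a -> b) where
  check f = all (check . f) samples

-- Hypothetical properties with two and three arguments.
deMorgan :: Bool -> Bool -> Bool
deMorgan a b = not (a && b) == (not a || not b)

bogus :: Bool -> Bool -> Bool -> Bool
bogus a b c = (a && b) == c
```

Here check deMorgan and check bogus go through the same function even though the properties take different numbers of arguments.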

Removing common haskell piping boilerplate

I have some pretty common Haskell boilerplate that shows up in a lot of places. It looks something like this (when instantiating classes):
a <= b = (modify a) <= (modify b)
like this (with normal functions):
fn x y z = fn (foo x) (foo y) (foo z)
and sometimes even with tuples, as in:
mod (x, y) = (alt x, alt y)
It seems like there should be an easy way to reduce all of this boilerplate and not have to repeat myself quite so much. (These are simple examples, but it does get annoying). I imagine that there are abstractions created for removing such boilerplate, but I'm not sure what they're called nor where to look. Can any haskellites point me in the right direction?
For the (<=) case, consider defining compare instead; you can then use Data.Ord.comparing, like so:
instance Ord Foo where
compare = comparing modify
Note that comparing can simply be defined as comparing f = compare `on` f, using Data.Function.on.
For your fn example, it's not clear. There's no way to simplify this type of definition in general. However, I don't think the boilerplate is too bad in this instance.
For mod:
mod = alt *** alt
using Control.Arrow.(***) — read a b c as b -> c in the type signature; arrows are just a general abstraction (like functors or monads) of which functions are an instance. You might like to define both = join (***) (which is itself shorthand for both f = f *** f); I know at least one other person who uses this alias, and I think it should be in Control.Arrow proper.
So, in general, the answer is: combinators, combinators, combinators! This ties directly in with point-free style. It can be overdone, but when the combinators exist for your situation, such code can not only be cleaner and shorter, it can be easier to read: you only have to learn an abstraction once, and can then apply it everywhere when reading code.
I suggest using Hoogle to find these combinators; when you think you see a general pattern underlying a definition, try abstracting out what you think the common parts are, taking the type of the result, and searching for it on Hoogle. You might find a combinator that does just what you want.
So, for instance, for your mod case, you could abstract out alt, yielding \f (a,b) -> (f a, f b), then search for its type, (a -> b) -> (a, a) -> (b, b) — there's an exact match, but it's in the fgl graph library, which you don't want to depend on. Still, you can see how the ability to search by type can be very valuable indeed!
There's also a command-line version of Hoogle with GHCi integration; see its HaskellWiki page for more information.
(There's also Hayoo, which searches the entirety of Hackage, but is slightly less clever with types; which one you use is up to personal preference.)
For some of the boilerplate, Data.Function.on can be helpful, although in these examples, it doesn't gain much:
instance Ord Foo where
(<=) = (<=) `on` modify -- maybe
mod = uncurry ((,) `on` alt) -- Not really
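The combinators mentioned in this thread can be tried out together in one small module. Foo, modify and alt are hypothetical stand-ins for the asker's own definitions, and modPair is used instead of mod to avoid clashing with the Prelude; treat this as a sketch rather than a recommended layout.

```haskell
import Control.Arrow ((***))
import Control.Monad (join)
import Data.Function (on)
import Data.Ord (comparing)

-- Hypothetical type and projection standing in for the asker's code.
newtype Foo = Foo String

modify :: Foo -> Int
modify (Foo s) = length s

-- Equality and ordering both defer to modify.
instance Eq Foo where
  (==) = (==) `on` modify

instance Ord Foo where
  compare = comparing modify

-- Apply the same function to both components of a pair.
-- For functions, join f x = f x x, so join (***) f = f *** f.
both :: (a -> b) -> (a, a) -> (b, b)
both = join (***)

alt :: Int -> Int
alt = (* 2)

modPair :: (Int, Int) -> (Int, Int)
modPair = both alt
```

Defining compare rather than (<=) gives all the other Ord methods consistent behaviour for free.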

Practical use of curried functions?

There are tons of tutorials on how to curry functions, and as many questions here at stackoverflow. However, after reading The Little Schemer, several books, tutorials, blog posts, and stackoverflow threads I still don't know the answer to the simple question: "What's the point of currying?" I do understand how to curry a function, just not the "why?" behind it.
Could someone please explain to me the practical uses of curried functions (outside of languages that only allow one argument per function, where the necessity of using currying is of course quite evident.)
edit: Taking into account some examples from TLS, what's the benefit of
(define (action kind)
(lambda (a b)
(kind a b)))
as opposed to
(define (action kind a b)
(kind a b))
I can only see more code and no added flexibility...
One effective use of curried functions is reducing the amount of code.
Consider three functions, two of which are almost identical:
(define (add a b)
(action + a b))
(define (mul a b)
(action * a b))
(define (action kind a b)
(kind a b))
If your code invokes add, it in turn calls action with kind +. The same with mul.
You defined these functions the way you would in many popular imperative languages (some of which have been adding lambdas, currying and other features usually found in the functional world, because all of it is terribly handy).
All add and mul do is wrap the call to action with the appropriate kind. Now, consider curried definitions of these functions:
(define add-curried
((curry action) +))
(define mul-curried
((curry action) *))
They've become considerably shorter. We just curried the function action by passing it only one argument, the kind, and got a curried function which accepts the remaining two arguments.
This approach allows you to write less code, with a high level of maintainability.
Just imagine that the function action is suddenly rewritten to accept 3 more arguments. Without currying you would have to rewrite your implementations of add and mul:
(define (action kind a b c d e)
(kind a b c d e))
(define (add a b c d e)
(action + a b c d e))
(define (mul a b c d e)
(action * a b c d e))
But currying saves you from that nasty and error-prone work; you don't have to rewrite even a symbol in the functions add-curried and mul-curried, because the calling function provides the necessary number of arguments to pass to action.
They can make code easier to read. Consider the following two Haskell snippets:
lengths :: [[a]] -> [Int]
lengths xs = map length xs
lengths' :: [[a]] -> [Int]
lengths' = map length
Why give a name to a variable you're not going to use?
Curried functions also help in situations like this:
doubleAndSum ys = map (\xs -> sum (map (*2) xs)) ys
doubleAndSum' = map (sum . map (*2))
Removing those extra variables makes the code easier to read and there's no need for you to mentally keep clear what xs is and what ys is.
HTH.
You can see currying as a specialization. Pick some defaults and leave the user (maybe yourself) with a specialized, more expressive, function.
I think that currying is a traditional way to handle general n-ary functions provided that the only ones you can define are unary.
For example, in lambda calculus (from which functional programming languages stem), there are only one-variable abstractions (which translates to unary functions in FPLs). Regarding lambda calculus, I think it's easier to prove things about such a formalism since you don't actually need to handle the case of n-ary functions (since you can represent any n-ary function with a number of unary ones through currying).
(Others have already covered some of the practical implications of this decision so I'll stop here.)
Using all :: (a -> Bool) -> [a] -> Bool with a curried predicate.
all (`elem` [1,2,3]) [0,3,4,5]
Haskell infix operators can be curried on either side, so you can easily curry the needle or the container side of the elem function (is-element-of).
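A short sketch of fixing either side of elem with an operator section; allSmall and threeIn are hypothetical names chosen for this example.

```haskell
-- Container fixed, needle supplied later: is every element one of 1, 2, 3?
allSmall :: [Int] -> Bool
allSmall = all (`elem` [1, 2, 3])

-- Needle fixed, container supplied later: which lists contain a 3?
threeIn :: [[Int]] -> [Bool]
threeIn = map (3 `elem`)
```

With the list from the answer, allSmall [0,3,4,5] is False, because 0, 4 and 5 are not in [1,2,3].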
We cannot directly compose functions that take multiple parameters, yet function composition is one of the key concepts in functional programming. By using the currying technique, we can compose functions that take multiple parameters.
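As a small illustration of that point: filter and map each take two arguments, but once partially applied they become one-argument functions that fit into a composition pipeline. The function names below are hypothetical.

```haskell
-- filter even :: [Int] -> [Int] is a one-argument function after
-- partial application, so it composes directly with length.
countEvens :: [Int] -> Int
countEvens = length . filter even

-- Each stage is a multi-argument function with all but its last
-- argument already supplied.
sumOfDoubledAbove :: Int -> [Int] -> Int
sumOfDoubledAbove n = sum . map (* 2) . filter (> n)
```

Without currying, each stage would need an explicit lambda to adapt it to a single argument before it could be composed.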
I would like to add an example to @Francesco's answer.
So you don't have to increase boilerplate with a little lambda.
It is very easy to create closures. From time to time I use SRFI-26. It is really cute.
In and of itself currying is syntactic sugar. Syntactic sugar is all about what you want to make easy. C, for example, wants to make instructions that are "cheap" in assembly language, like incrementing, easy, and so it has syntactic sugar for incrementation, the ++ notation.
t = x + y
x = x + 1
is replaced by t = x++ + y
Functional languages could just as easily have stuff like.
f(x,y,z) = abc
g(r,s)(z) = f(r,s,z).
h(r)(s)(z) = f(r,s,z)
but instead it's all automatic. And that allows for a g bound by a particular r0, s0 (i.e. specific values) to be passed as a one variable function.
Take for example Perl's sort function, which takes
sort sub list
where sub is a function of two variables that evaluates to a boolean and
list is an arbitrary list.
You would naturally want to use comparison operators (<=>) in Perl and have
sortordinal = sort (<=>)
where sortordinal works on lists. To do this you would need sort to be a curried function.
And in fact
sort of a list is defined in precisely this way in Perl.
In short: currying is sugar to make first class functions more natural.
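The sortordinal idea, written in the pseudo-notation above, can be expressed in Haskell, where sortBy from Data.List is already curried. This is a sketch in a different language from the Perl being discussed; the function names are hypothetical.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Supplying only the comparison yields a function on lists.
sortAscending :: [Int] -> [Int]
sortAscending = sortBy compare

sortDescending :: [Int] -> [Int]
sortDescending = sortBy (flip compare)

-- The comparison can itself be built by partial application.
sortByLength :: [String] -> [String]
sortByLength = sortBy (comparing length)
```

Each of these can be passed around as an ordinary one-argument function, which is the "more natural first class functions" point made above.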
