Why does foldr use a helper function? - haskell

In explaining foldr to Haskell newbies, the canonical definition is
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
But in GHC.Base, foldr is defined as
foldr k z = go
where
go [] = z
go (y:ys) = y `k` go ys
It seems this definition is an optimization for speed, but I don't see why using the helper function go would make it faster. The source comments (see here) mention inlining, but I also don't see how this definition would improve inlining.

I can add some important details about GHC's optimization system.
The naive definition of foldr passes around a function. There's an inherent overhead in calling a function - especially when the function isn't known at compile time. It'd be really nice to able to inline the definition of the function if it's known at compile time.
There are tricks available to perform that inlining in GHC - and this is an example of them. First, foldr needs to be inlined (I'll get to why later). foldr's naive implementation is recursive, so cannot be inlined. So a worker/wrapper transformation is applied to the definition. The worker is recursive, but the wrapper is not. This allows foldr to be inlined, despite the recursion over the structure of the list.
When foldr is inlined, it creates a copy of all of its local bindings, too. It's more or less a direct textual inlining (modulo some renaming, and happening after the desugaring pass). This is where things get interesting. go is a local binding, and the optimizer gets to look inside it. It notices that it calls a function in the local scope, which it names k. GHC will often remove the k variable entirely, and will just replace it with the expression k reduces to. And then afterwards, if the function application is amenable to inlining, it can be inlined at this time - removing the overhead of calling a first-class function entirely.
Let's look at a simple, concrete example. This program will echo a line of input with all trailing 'x' characters removed:
dropR :: Char -> String -> String
dropR x r = if x == 'x' && null r then "" else x : r
main :: IO ()
main = do
s <- getLine
putStrLn $ foldr dropR "" s
First, the optimizer will inline foldr's definition and simplify, resulting in code that looks something like this:
main :: IO ()
main = do
s <- getLine
-- I'm changing the where clause to a let expression for the sake of readability
putStrLn $ let { go [] = ""; go (x:xs) = dropR x (go xs) } in go s
And that's the thing the worker-wrapper transformation allows.. I'm going to skip the remaining steps, but it should be obvious that GHC can now inline the definition of dropR, eliminating the function call overhead. This is where the big performance win comes from.

GHC cannot inline recursive functions, so
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
cannot be inlined. But
foldr k z = go
where
go [] = z
go (y:ys) = y `k` go ys
is not a recursive function. It is a non-recursive function with a local recursive definition!
This means that, as #bheklilr writes, in map (foldr (+) 0) the foldr can be inlined and hence f and z replaced by (+) and 0 in the new go, and great things can happen, such as unboxing of the intermediate value.

As the comments say:
-- Inline only in the final stage, after the foldr/cons rule has had a chance
-- Also note that we inline it when it has *two* parameters, which are the
-- ones we are keen about specialising!
In particular, note the "we inline it when it has two parameters, which are the ones we are keen about specialising!"
What this is saying is that when foldr gets inlined, it's getting inlined only for the specific choice of f and z, not for the choice of the list getting folded. I'm not expert, but it would seem it would make it possible to inline it in situations like
map (foldr (+) 0) some_list
so that the inline happens in this line and not after map has been applied. This makes it optimizable in more situations and more easily. All the helper function does is mask the 3rd argument so {-# INLINE #-} can do its thing.

One tiny important detail not mentioned in other answers is that GHC, given a function definition like
f x y z w q = ...
cannot inline f until all of the arguments x, y, z, w, and q are applied. This means that it's often advantageous to use the worker/wrapper transformation to expose a minimal set of function arguments which must be applied before inlining can occur.

Related

Are these premises about folds and recursion right?

When using foldr, the recursion occours inside the function, so,
when the given function doesn't strictly evaluate both sides, and
can return based on the first one, foldr must be a good solution,
because it will work on infinity lists
findInt :: Int -> [Int] -> Bool
findInt z [] = False
-- The recursion occours inside de given function
findInt z (x:xs)
| z == x = True
| otherwise = findInt z xs
equivalent to:
findInt' :: Int -> [Int] -> Bool
findInt' z = foldr (\x r -> if z == x then True else r) False
-- Where False is the "default value" (when it finds [], ex: findInt z [] = False)
A situation when foldr is not appropriate:
addAll :: Int -> [Int] -> Int
addAll z [] = z
-- The recursion occours outside the given function (+)
addAll z (x:xs) = addAll (z + x) xs
In this case, because + is strict (needs to evaluate both sides to return)
it would be greately useful if we applied it in some way which we could
have a redex (reducible expression), to make it possible to avoid thunks
and (when forced to run with previous evaluation, not lazy) in constant
space and without pushing to much onto the stack
(similar to the advantages of a for loop in imperative algorithms)
addAll' :: Int -> [Int] -> Int
addAll' z [] = z
addAll' z (x:xs) = let z' = z + x
in seq z' $ addAll' z' xs
equivalent to:
addAll'' :: Int -> [Int] -> Int
addAll'' z = foldl' (+) z
In this little case, using foldr (inside recursion) doesn't make sense
because it wouldn't make redexes.
It would be like this:
addAll''' :: Int -> [Int] -> Int
addAll''' z [] = z
addAll''' z (x:xs) = (+) x $ addAll''' z xs
The main objective of this question is first, know whether my premises are
right or where they could be better and second, help to make it more clear
for others who are also learning Haskell the differences between inside and
outside recursion, among the approaches, to have it clear in mind which one
could be more appropriated to a given situation
Helpful links:
Haskell Wiki
Stackoverflow - Implications of foldr vs. foldl (or foldl')
Aside from the fact that foldr is the natural catamorphism of a list, while foldl and foldl' are not, a few guidelines for their use:
you are correct on that foldr will always return, even on infinite lists, as long as the function is non-strict in its second argument, since the elements of the list are made available to the first argument of the function immediately (as opposed to foldl and foldl', where the elements of the list are not available to the first argument of the function until the list has been entirely consumed);
foldl' will be a better choice for non-infinite lists if you want to ensure constant space, since it's tail recursive, but it will always parse the entire list regardless of the strictness in the evaluation of the arguments to the function passed to it;
in general, foldr is equivalent to recursion, while foldl and foldl' are analogous to loops;
because of the fact that foldr is the natural catamorphism, if your function needs to recreate the list (for example, if your function is just the list constructor ':'), foldr would be more adequate;
with respect to foldl vs. foldl', foldl' is usually preferable because it will not build a huge thunk but, if the function passed to it is non strict in its first argument and the list is not infinite, foldl may return while foldl' may give an error (there is a good example in the Haskell wiki).
As a side note, I believe that you are using the term "inside recursion" to define foldr and "outside recursion" for foldl and foldl', but I haven't seen these terms before in the literature. More commonly these functions are just referred to as folding from the right and folding from the left respectively, terms that while may not be exactly correct, they give a good notion of the order in which the elements of the list are passed to the function.

Are there ways to call two functions (one just after another) in purely functional language? (in non-io mode)

I'm trying to understand order of execution in purely functional language.
I know that in purely functional languages, there is no necessary execution order.
So my question is:
Suppose there are two functions.
I would like to know all ways in which I can call one function after another (except nested call of one function from another) (and except io-mode).
I would like to see examples in Haskell or pseudo-code.
There is no way to do what you describe, if the functions are totally independent and you don't use the result of one when you call the other.
This is because there is no reason to do this. In a side effect free setting, calling a function and then ignoring its result is exactly the same as doing nothing for the amount of time it takes to call that function (setting aside memory usage).
It is possible that seq x y will evaluate x and then y, and then give you y as its result, but this evaluation order isn't guaranteed.
Now, if we do have side effects, such as if we are working inside a Monad or Applicative, this could be useful, but we aren't truly ignoring the result since there is context being passed implicitly. For instance, you can do
main :: IO ()
main = putStrLn "Hello, " >> putStrLn "world"
in the IO Monad. Another example would be the list Monad (which could be thought of as representing a nondeterministic computation):
biggerThanTen :: Int -> Bool
biggerThanTen n = n > 10
example :: String
example = filter biggerThanTen [1..15] >> return 'a' -- This evaluates to "aaaaa"
Note that even here we aren't really ignoring the result. We ignore the specific values, but we use the structure of the result (in the second example, the structure would be the fact that the resulting list from filter biggerThanTen [1..15] has 5 elements).
I should point out, though, that things that are sequenced in this way aren't necessarily evaluated in the order that they are written. You can sort of see this with the list Monad example. This becomes more apparent with bigger examples though:
example2 :: [Int]
example2 =
[1,2,3] >>=
(\x -> [10,100,1000] >>=
(\y -> return (x * y))) -- ==> [10,100,1000,20,200,2000,30,300,3000]
The main takeaway here is that evaluation order (in the absence of side effects like IO and ignoring bottoms) doesn't affect the ultimate meaning of code in Haskell (other than possible differences in efficiency, but that is another topic). As a result, there is never a reason to call two functions "one after another" in the fashion described in the question (that is, where the calls are totally independent from each other).
Do notation
Do notation is actually exactly equivalent to using >>= and >> (there is actually one other thing involved that takes care of pattern match failures, but that is irrelevant to the discussion at hand). The compiler actually takes things written in do notation and converts them to >>= and >> through a process called "desugaring" (since it removes the syntactic sugar). Here are the three examples from above written with do notation:
IO Example
main :: IO ()
main = do
putStrLn "Hello, "
putStrLn "World"
First list example
biggerThanTen :: Int -> Bool
biggerThanTen n = n > 10
example :: String -- String is a synonym for [Char], by the way
example = do
filter biggerThanTen [1..15]
return 'a'
Second list example
example2 :: [Int]
example2 = do
x <- [1,2,3]
y <- [10,100,1000]
return (x * y)
Here is a side-by-side comparison of the conversions:
do --
m -- m >> n
n --
do --
x <- m -- m >>= (\x ->
... -- ...)
The best way to understand do notation is to first understand >>= and return since, as I said, that's what the compiler transforms do notation into.
As a side-note, >> is just the same as >>=, it just ignores the "result" of it's left argument (although it preserves the "context" or "structure"). So all definitions of >> must be equivalent to m >> n = m >>= (\_ -> n).
Expanding the >>= in the second list example
To help drive home the point that Monads are not usually impure, lets expand the >>= calls in the second list example, using the Monad definition for lists. The definition is:
instance Monad [] where
return x = [x]
xs >>= f = concatMap f xs
and we can convert example2 into:
Step 0 (what we already have)
example2 :: [Int]
example2 =
[1,2,3] >>=
(\x -> [10,100,1000] >>=
(\y -> return (x * y)))
Step 1 (converting the first >>=)
example2 =
concatMap
(\x -> [10,100,1000] >>=
(\y -> return (x * y)))
[1,2,3]
Step 2
example2 =
concatMap
(\x -> concatMap
(\y -> return (x * y))
[10,100,1000])
[1,2,3]
Step 3
example2 =
concatMap
(\x -> concatMap
(\y -> [x * y])
[10,100,1000])
[1,2,3]
So, there is no magic going on here, just normal function calls.
You can write a function whose arguments depend on the evaluation of another function:
-- Ads the first two elements of a list together
myFunc :: [Int] -> Int
myFunc xs = (head xs) + (head $ tail xs)
If that's what you mean. In this case, you can't get the output of myFunc xs without evaluating head xs, head $ tail xs and (+). There is an order here. However, the compiler can choose which order to execute head xs and head $ tail xs in since they are not dependent on each other, but it can't do the addition without having both of the other results. It could even choose to evaluate them in parallel, or on different machines. The point is that pure functions, because they have no side effects, don't have to be evaluated in a given order until their results are interdependent.
Another way to look at the above function is as a graph:
myFunc
|
(+)
/ \
/ \
head head
\ |
\ tail
\ /
xs
In order to evaluate a node, all nodes below it have to be evaluated first, but different branches can be evaluated in parallel. First xs must be evaluated, at least partially, but after that the two branches can be evaluated in parallel. There are some nuances due to lazy evaluation, but this is essentially how the compiler constructs evaluation trees.
If you really want to force one function call before the other, you can use the seq function. It takes two arguments, forces the first to be evaluated, then returns the second, e.g.
myFunc2 :: [Int] -> Int
myFunc2 xs = hxs + (hxs `seq` (head $ tail xs))
where hxs = head xs
This will force head xs to evaluate before head $ tail xs, but this is more dealing with strictness than sequencing functions.
Here is an easy way:
case f x of
result1 -> case g y of
result2 -> ....
Still, unless g y uses something from result1 and the subsequent calculations something from result2, or the pattern is such that the result must be evaluated, there is no guarantee that either of f or g are actually called, nor in what order.
Still, you wanted a way to call one function after another, and this is such a way.

Why is this tail-recursive Haskell function slower ?

I was trying to implement a Haskell function that takes as input an array of integers A
and produces another array B = [A[0], A[0]+A[1], A[0]+A[1]+A[2] ,... ]. I know that scanl from Data.List can be used for this with the function (+). I wrote the second implementation
(which performs faster) after seeing the source code of scanl. I want to know why the first implementation is slower compared to the second one, despite being tail-recursive?
-- This function works slow.
ps s x [] = x
ps s x y = ps s' x' y'
where
s' = s + head y
x' = x ++ [s']
y' = tail y
-- This function works fast.
ps' s [] = []
ps' s y = [s'] ++ (ps' s' y')
where
s' = s + head y
y' = tail y
Some details about the above code:
Implementation 1 : It should be called as
ps 0 [] a
where 'a' is your array.
Implementation 2: It should be called as
ps' 0 a
where 'a' is your array.
You are changing the way that ++ associates. In your first function you are computing ((([a0] ++ [a1]) ++ [a2]) ++ ...) whereas in the second function you are computing [a0] ++ ([a1] ++ ([a2] ++ ..)). Appending a few elements to the start of the list is O(1), whereas appending a few elements to the end of a list is O(n) in the length of the list. This leads to a linear versus quadratic algorithm overall.
You can fix the first example by building the list up in reverse order, and then reversing again at the end, or by using something like dlist. However the second will still be better for most purposes. While tail calls do exist and can be important in Haskell, if you are familiar with a strict functional language like Scheme or ML your intuition about how and when to use them is completely wrong.
The second example is better, in large part, because it's incremental; it immediately starts returning data that the consumer might be interested in. If you just fixed the first example using the double-reverse or dlist tricks, your function will traverse the entire list before it returns anything at all.
I would like to mention that your function can be more easily expressed as
drop 1 . scanl (+) 0
Usually, it is a good idea to use predefined combinators like scanl in favour of writing your own recursion schemes; it improves readability and makes it less likely that you needlessly squander performance.
However, in this case, both my scanl version and your original ps and ps' can sometimes lead to stack overflows due to lazy evaluation: Haskell does not necessarily immediately evaluate the additions (depends on strictness analysis).
One case where you can see this is if you do last (ps' 0 [1..100000000]). That leads to a stack overflow. You can solve that problem by forcing Haskell to evaluate the additions immediately, for instance by defining your own, strict scanl:
myscanl :: (b -> a -> b) -> b -> [a] -> [b]
myscanl f q [] = []
myscanl f q (x:xs) = q `seq` let q' = f q x in q' : myscanl f q' xs
ps' = myscanl (+) 0
Then, calling last (ps' [1..100000000]) works.

Lazy Evaluation and Strict Evaluation Haskell

I understand what lazy evaluation is, and how it works and the advantages it has, but could you explain me what strict evaluation really is in Haskell? I can't seem to find much info about it, since lazy evaluation is the most known.
What are the benefit of each of them over the other. When is strict evaluation actually used?
Strictness happens in a few ways in Haskell,
First, a definition. A function is strict if and only if when its argument a doesn't terminate, neither does f a. Nonstrict (sometimes called lazy) is just the opposite of this.
You can be strict in an argument, either using pattern matching
-- strict
foo True = 1
foo False = 1
-- vs
foo _ = 1
Since we don't need to evaluate the argument, we could pass something like foo (let x = x in x) and it'd still just return 1. With the first one however, the function needs to see what value the input is so it can run the appropriate branch, thus it is strict.
If we can't pattern match for whatever reason, then we can use a magic function called seq :: a -> b -> b. seq basically stipulates that whenever it is evaluated, it will evaluated a to what's called weak head normal form.
You may wonder why it's worth it. Let's consider a case study, foldl vs foldl'. foldl is lazy in it's accumulator so it's implemented something like
foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f accum [] = acuum
foldl f accum (x:xs) = foldl (f accum x) xs
Notice that since we're never strict in accum, we'll build up a huge series of thunks, f (f (f (f (f (f ... accum)))..)))
Not a happy prospect since this will lead to memory issues, indeed
*> foldl (+) 0 [1..500000000]
*** error: stack overflow
Now what'd be better is if we forced evaluation at each step, using seq
foldl' :: (a -> b -> a) -> a -> [b] -> a
foldl' f accum [] = accum
foldl' f accum (x:xs) = let accum' = f accum x
in accum' `seq` foldl' f accum' xs
Now we force the evaluation of accum at each step making it much faster. This will make foldl' run in constant space and not stackoverflow like foldl.
Now seq only evaluates it values to weak head normal form, sometimes we want them to be evaluated fully, to normal form. For that we can use a library/type class
import Control.DeepSeq -- a library on hackage
deepseq :: NFData a => a -> b -> a
This forces a to be fully evaluated so,
*> [1, 2, error "Explode"] `seq` 1
1
*> [1, 2, error "Explode"] `deepseq` 1
error: Explode
*> undefined `seq` 1
error: undefined
*> undefined `deepseq` 1
error undefined
So this fully evaluates its arguments. This is very useful for parallel programming for example, where you want to fully evaluate something on one core before it's sent back to the main thread, otherwise you'd just create a thunk and all the actual computation would still be sequential.

CPS in curried languages

How does CPS in curried languages like lambda calculus or Ocaml even make sense? Technically, all function have one argument. So say we have a CPS version of addition in one such language:
cps-add k n m = k ((+) n m)
And we call it like
(cps-add random-continuation 1 2)
This is then the same as:
(((cps-add random-continuation) 1) 2)
I already see two calls there that aren't tail calls and in reality a complexly nested expression, the (cps-add random-continuation) returns a value, namely a function that consumes a number, and then returns a function which consumes another number and then delivers the sum of both to that random-continuation. But we can't work around this value returning by simply translating this into CPS again, because we can only give each function one argument. We need to have at least two to make room for the continuation and the 'actual' argument.
Or am I missing something completely?
Since you've tagged this with Haskell, I'll answer in that regard: In Haskell, the equivalent of doing a CPS transform is working in the Cont monad, which transforms a value x into a higher-order function that takes one argument and applies it to x.
So, to start with, here's 1 + 2 in regular Haskell: (1 + 2) And here it is in the continuation monad:
contAdd x y = do x' <- x
y' <- y
return $ x' + y'
...not terribly informative. To see what's going on, let's disassemble the monad. First, removing the do notation:
contAdd x y = x >>= (\x' -> y >>= (\y' -> return $ x' + y'))
The return function lifts a value into the monad, and in this case is implemented as \x k -> k x, or using an infix operator section as \x -> ($ x).
contAdd x y = x >>= (\x' -> y >>= (\y' -> ($ x' + y')))
The (>>=) operator (read "bind") chains together computations in the monad, and in this case is implemented as \m f k -> m (\x -> f x k). Changing the bind function to prefix form and substituting in the lambda, plus some renaming for clarity:
contAdd x y = (\m1 f1 k1 -> m1 (\a1 -> f1 a1 k1)) x (\x' -> (\m2 f2 k2 -> m2 (\a2 -> f2 a2 k2)) y (\y' -> ($ x' + y')))
Reducing some function applications:
contAdd x y = (\k1 -> x (\a1 -> (\x' -> (\k2 -> y (\a2 -> (\y' -> ($ x' + y')) a2 k2))) a1 k1))
contAdd x y = (\k1 -> x (\a1 -> y (\a2 -> ($ a1 + a2) k1)))
And a bit of final rearranging and renaming:
contAdd x y = \k -> x (\x' -> y (\y' -> k $ x' + y'))
In other words: The arguments to the function have been changed from numbers, into functions that take a number and return the final result of the entire expression, just as you'd expect.
Edit: A commenter points out that contAdd itself still takes two arguments in curried style. This is sensible because it doesn't use the continuation directly, but not necessary. To do otherwise, you'd need to first break the function apart between the arguments:
contAdd x = x >>= (\x' -> return (\y -> y >>= (\y' -> return $ x' + y')))
And then use it like this:
foo = do f <- contAdd (return 1)
r <- f (return 2)
return r
Note that this is really no different from the earlier version; it's simply packaging the result of each partial application as taking a continuation, not just the final result. Since functions are first-class values, there's no significant difference between a CPS expression holding a number vs. one holding a function.
Keep in mind that I'm writing things out in very verbose fashion here to make explicit all the steps where something is in continuation-passing style.
Addendum: You may notice that the final expression looks very similar to the de-sugared version of the monadic expression. This is not a coincidence, as the inward-nesting nature of monadic expressions that lets them change the structure of the computation based on previous values is closely related to continuation-passing style; in both cases, you have in some sense reified a notion of causality.
Short answer : of course it makes sense, you can apply a CPS-transform directly, you will only have lots of cruft because each argument will have, as you noticed, its own attached continuation
In your example, I will consider that there is a +(x,y) uncurried primitive, and that you're asking what is the translation of
let add x y = +(x,y)
(This add faithfully represent OCaml's (+) operator)
add is syntaxically equivalent to
let add = fun x -> (fun y -> +(x, y))
So you apply a CPS transform¹ and get
let add_cps = fun x kx -> kx (fun y ky -> ky +(x,y))
If you want a translated code that looks more like something you could have willingly written, you can devise a finer transformation that actually considers known-arity function as non-curried functions, and tream all parameters as a whole (as you have in non-curried languages, and as functional compilers already do for obvious performance reasons).
¹: I wrote "a CPS transform" because there is no "one true CPS translation". Different translations have been devised, producing more or less continuation-related garbage. The formal CPS translations are usually defined directly on lambda-calculus, so I suppose you're having a less formal, more hand-made CPS transform in mind.
The good properties of CPS (as a style that program respect, and not a specific transformation into this style) are that the order of evaluation is completely explicit, and that all calls are tail-calls. As long as you respect those, you're relatively free in what you can do. Handling curryfied functions specifically is thus perfectly fine.
Remark : Your (cps-add k 1 2) version can also be considered tail-recursive if you assume the compiler detect and optimize that cps-add actually always take 3 arguments, and don't build intermediate closures. That may seem far-fetched, but it's the exact same assumption we use when reasoning about tail-calls in non-CPS programs, in those languages.
yes, technically all functions can be decomposed into functions with one method, however, when you want to use CPS the only thing you are doing is saying is that at a certain point of computation, run the continuation method.
Using your example, lets have a look. To make things a little easier, let's deconstruct cps-add into its normal form where it is a function only taking one argument.
(cps-add k) -> n -> m = k ((+) n m)
Note at this point that the continuation, k, is not being evaluated (Could this be the point of confusion for you?).
Here we have a method called cps-add k that receives a function as an argument and then returns a function that takes another argument, n.
((cps-add k) n) -> m = k ((+) n m)
Now we have a function that takes an argument, m.
So I suppose what I am trying to point out is that currying does not get in the way of CPS style programming. Hope that helps in some way.

Resources