Accumulator in foldr - haskell

In the Haskell Wikibook, foldr is implemented as follows:
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f acc [] = acc
foldr f acc (x:xs) = f x (foldr f acc xs)
It is stated that the initial value of the accumulator is set as an argument. But as I understand it, acc is the identity value for the operation (e.g. 0 for sum or 1 for product) and its value does not change during the execution of the function. Why then is it referred to here and in other texts as an accumulator, implying that it changes or accumulates a value step by step?
I can see that an accumulator is relevant in a left fold such as foldl, but is the Wikibook's explanation incorrect for foldr, using the name only for symmetry with foldl?

Consider the evaluation of a simple foldr expression based on the (correct) definition you provided:
foldr (+) 0 [1,2,3,4]
= 1 + foldr (+) 0 [2,3,4]
= 1 + (2 + foldr (+) 0 [3,4])
= 1 + (2 + (3 + foldr (+) 0 [4]))
= 1 + (2 + (3 + (4 + foldr (+) 0 [])))
= 1 + (2 + (3 + (4 + 0)))
= 10
So you are right: acc doesn't really "accumulate" anything. It never takes on a value other than 0.
Why is it called "acc" if it isn't an accumulator? Similarity to foldl? Hysterical raisins? A lie to children? I'm not sure.
Edit: I'll also point out that the GHC implementation of foldr uses z (presumably for zero) rather than acc.
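One way to see where the "accumulation" in foldr actually happens, as a sketch: scanr from the Prelude lists every intermediate result of a right fold, and scanl does the same for a left fold. The names rightSteps and leftSteps are illustrative only; load the block into GHCi to inspect them.

```haskell
-- Intermediate results of a right fold: each entry is foldr (+) 0 applied
-- to a suffix of the list. These are the values that (+) receives as its
-- *second* argument on the way back out of the recursion.
rightSteps :: [Int]
rightSteps = scanr (+) 0 [1, 2, 3, 4]

-- Intermediate results of a left fold: here the seed really is threaded
-- through and updated at every step.
leftSteps :: [Int]
leftSteps = scanl (+) 0 [1, 2, 3, 4]
```

rightSteps is [10,9,7,4,0] and leftSteps is [0,1,3,6,10]. In the right fold the seed 0 only appears at the end of the list; the growing totals 4, 7 and 9 come from the recursive calls, not from the z argument.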

As has been pointed out, acc doesn't really accumulate anything in the case of foldr.
I'd add that without it, it's not clear what should happen when the input is an empty list.
Dropping it also changes the type signature of f, limiting the functions that can be used.
E.g.:
foldr' :: (a -> a -> a) -> [a] -> a
foldr' f [] = error "empty list???"
foldr' f (x:[]) = x
foldr' f (x:xs) = f x (foldr' f xs)
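To illustrate the type restriction, a small sketch: the Prelude already provides this seedless right fold as foldr1, whose function argument has type a -> a -> a. The names lengthR and maxR are hypothetical. A length function needs a result type (Int) that differs from the element type, so it can be written with foldr but not with foldr1.

```haskell
-- With a seed, the result type b can differ from the element type a.
lengthR :: [a] -> Int
lengthR = foldr (\_ n -> 1 + n) 0

-- Without a seed, the result must have the element type,
-- and the empty list has no sensible answer.
maxR :: [Int] -> Int
maxR = foldr1 max
```

lengthR "abc" is 3 and maxR [3, 1, 4] is 4, while maxR [] throws an exception because foldr1 has nothing to return.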


Haskell dependent, independent variables in lambda function as applied to foldr

Given
> foldr (+) 5 [1,2,3,4]
15
this second version
foldr (\x n -> x + n) 5 [1,2,3,4]
also returns 15. The first thing I don't understand about the second version is how foldr knows which variable is associated with the accumulator-seed 5 and which with the list variable's elements [1,2,3,4]. In the lambda calculus way, x would seem to be the dependent variable and n the independent variable. So if this
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
is foldr and these
:type foldr
foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b
:t +d foldr
foldr :: (a -> b -> b) -> b -> [a] -> b
its type declarations, can I glean or deduce the answer to "which is dependent and which is independent" from the type declaration itself? It would seem both examples of foldr above must be doing this
(+) 1 ((+) 2 ((+) 3 ((+) 4 5)))
I simply guessed the second, lambda function version above, but I don't really understand how it works, whereas the first version with (+) breaks down as shown directly above.
Another example would be this
length' = foldr (const (1+)) 0
where, again, const seems to know to "throw out" the incoming list elements and simply increment, starting with the initial accumulator value. This is the same as
length' = foldr (\_ acc -> 1 + acc) 0
where, again, Haskell knows which of foldr's second and third arguments -- accumulator and list -- to treat as the dependent and independent variable, seemingly by magic. But no, I'm sure the answer lies in the type declaration (which I can't decipher, hence, this post), as well as the lore of lambda calculus, of which I'm a beginner.
Update
I've found this
reverse = foldl (flip (:)) []
and then applying to a list
foldl (flip (:)) [] [1,2,3]
foldl (flip (:)) (1:[]) [2,3]
foldl (flip (:)) (2:1:[]) [3]
foldl (flip (:)) (3:2:1:[]) []
. . .
Here it's obvious that the order is "accumulator" and then list, and flip swaps the first and second arguments before passing them to (:). Again, this
reverse = foldl (\acc x -> x : acc) []
foldl (\acc x -> x : acc) [] [1,2,3]
foldl (\acc x -> x : acc) (1:[]) [2,3]
. . .
seems also to rely on order, but in the example from further above
length' = foldr (\_ acc -> 1 + acc) 0
foldr (\_ acc -> 1 + acc) 0 [1,2,3]
how does it know 0 is the accumulator and is bound to acc and not the first (ghost) variable? So as I understand (the first five pages of) lambda calculus, any variable that is "lambda'd," e.g., \x is a dependent variable, and all other non-lambda'd variables are independent. Above, the \_ is associated with [1,2,3] and the acc, ostensibly the independent variable, is 0; hence, order is not dictating assignment. It's as if acc was some keyword that when used always binds to the accumulator, while x is always talking about the incoming list members.
Also, what is the "algebra" in the type definition where t a is transformed to [a]? Is this something from category theory? I see
Data.Foldable.toList :: t a -> [a]
in the Foldable definition. Is that all it is?
By "dependent" you most probably mean bound variable.
By "independent" you most probably mean free (i.e. not bound) variable.
There are no free variables in (\x n -> x + n). Both x and n appear to the left of the arrow, ->, so they are named parameters of this lambda function, bound inside its body, to the right of the arrow. Being bound means that each reference to n, say, in the function's body is replaced with the reference to the corresponding argument when this lambda function is indeed applied to its argument(s).
Similarly both _ and acc are bound in (\_ acc -> 1 + acc)'s body. The fact that the wildcard is used here is immaterial. We could just have written _we_dont_care_ all the same.
The parameters in a lambda function definition get "assigned" (also called "bound") the values of the arguments in an application, purely positionally. The first argument will be bound / assigned to the first parameter, the second argument to the second parameter. Then the lambda function's body will be entered and further reduced according to the rules.
This can be seen a bit differently stating that actually in lambda calculus all functions have only one parameter, and multi-parameter functions are actually nested uni-parameter lambda functions; and that the application is left-associative i.e. nested to the left.
What this actually means is quite simply
(\ x n -> x + n) 5 0
=
(\ x -> (\ n -> x + n)) 5 0
=
((\ x -> (\ n -> x + n)) 5) 0
=
(\ n -> 5 + n) 0
=
5 + 0
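The same reduction can be checked in Haskell, where \x n -> x + n is syntactic sugar for the nested \x -> \n -> x + n. A sketch; addTwo, addTwo' and addFive are illustrative names.

```haskell
-- Two spellings of the same curried function.
addTwo, addTwo' :: Int -> Int -> Int
addTwo  = \x n -> x + n
addTwo' = \x -> (\n -> x + n)

-- Supplying one argument binds x positionally and
-- returns the one-parameter function \n -> 5 + n.
addFive :: Int -> Int
addFive = addTwo 5
```

addFive 0, addTwo 5 0 and addTwo' 5 0 all evaluate to 5.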
As to how Haskell knows which is which from the type signatures: again, the types in a function type are also positional, with the first type corresponding to the type of the first expected argument, the second to the second expected argument's type, and so on.
It is all purely positional.
Thus, as a matter of purely mechanical and careful substitution, since by the definition of foldr it holds that foldr g 0 [1,2,3] = g 1 (foldr g 0 [2,3]) = ... = g 1 (g 2 (g 3 0)), we have
foldr (\x n -> x + n) 0 [1,2,3]
=
(\x n -> x + n) 1 ( (\x n -> x + n) 2 ( (\x n -> x + n) 3 0 ))
=
(\x -> (\n -> x + n)) 1 ( (\x n -> x + n) 2 ( (\x n -> x + n) 3 0 ))
=
(\n -> 1 + n) ( (\x n -> x + n) 2 ( (\x n -> x + n) 3 0 ))
=
1 + ( (\x n -> x + n) 2 ( (\x n -> x + n) 3 0 ))
=
1 + ( (\x -> (\n -> x + n)) 2 ( (\x n -> x + n) 3 0 ))
=
1 + (\n -> 2 + n) ( (\x n -> x + n) 3 0 )
=
1 + (2 + (\x n -> x + n) 3 0 )
=
1 + (2 + (\x -> (\n -> x + n)) 3 0 )
=
1 + (2 + (\n -> 3 + n) 0 )
=
1 + (2 + ( 3 + 0))
In other words, there is absolutely no difference between (\x n -> x + n) and (+).
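To make the positional binding visible, a sketch: fold with a function that renders its two arguments as a string instead of adding them. The name showFold is hypothetical. The first parameter always receives a list element, and the second always receives the result of folding the rest of the list, which bottoms out at the seed.

```haskell
-- x is bound to a list element, n to the already-rendered rest.
showFold :: [Int] -> String
showFold = foldr (\x n -> "(" ++ show x ++ " + " ++ n ++ ")") "0"

-- The lambda and the operator itself give identical results.
viaLambda, viaPlus :: Int
viaLambda = foldr (\x n -> x + n) 0 [1, 2, 3]
viaPlus   = foldr (+) 0 [1, 2, 3]
```

showFold [1, 2, 3] is "(1 + (2 + (3 + 0)))", and both viaLambda and viaPlus are 6.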
As for that t in foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b, what that means is that given a certain type T a, if instance Foldable T exists, then the type becomes foldr :: (a -> b -> b) -> b -> T a -> b, when it's used with a value of type T a.
One example is Maybe a and thus foldr (g :: a -> b -> b) (z :: b) :: Maybe a -> b.
Another example is [] a and thus foldr (g :: a -> b -> b) (z :: b) :: [a] -> b.
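For instance, with Maybe as the Foldable: a Just holds one element and Nothing holds none, so foldr behaves like folding a one-element or empty list. toList from Data.Foldable makes that correspondence explicit. A sketch with illustrative names:

```haskell
import Data.Foldable (toList)

-- foldr instantiated (by inference) at t = Maybe.
sumJust, sumNothing :: Int
sumJust    = foldr (+) 10 (Just 5)
sumNothing = foldr (+) 10 Nothing

-- The same fold via an explicit conversion to a list.
viaList :: Maybe Int -> Int
viaList m = foldr (+) 10 (toList m)
```

sumJust is 15 and sumNothing is 10, matching viaList (Just 5) and viaList Nothing.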
(edit:) So let's focus on lists. What does it mean for a function foo to have that type,
foo :: (a -> b -> b) -> b -> [a] -> b
? It means that it expects an argument of type a -> b -> b, i.e. a function, let's call it g, so that
foo :: (a -> b -> b) -> b -> [a] -> b
g :: a -> b -> b
-------------------------------------
foo g :: b -> [a] -> b
which is itself a function, expecting some argument z of type b, so that
foo :: (a -> b -> b) -> b -> [a] -> b
g :: a -> b -> b
z :: b
-------------------------------------
foo g z :: [a] -> b
which is itself a function, expecting some argument xs of type [a], so that
foo :: (a -> b -> b) -> b -> [a] -> b
g :: a -> b -> b
z :: b
xs :: [a]
-------------------------------------
foo g z xs :: b
And what could such function foo g z do, given a list, say, [x] (i.e. x :: a, [x] :: [a])?
foo g z [x] = b where
We need to produce a b value, but how? Well, g :: a -> b -> b produces a function b -> b given a value of type a. Wait, we have that!
f = g x -- f :: b -> b
and how does it help us? Well, we have z :: b, so
b = f z
And what if it's [] we're given? Then we don't have any a values at all, but we do have a value of type b, z, so instead of the above we'd just define
b = z
And what if it's [x,y] we're given? We'll do the same f-building trick, twice:
f1 = g x -- f1 :: b -> b
f2 = g y -- f2 :: b -> b
and to produce b we have many options now: it's z! or maybe, it's f1 z!? or f2 z? But the most general thing we can do, making use of all the data we have access to, is
b = f1 (f2 z)
for a right-fold (...... or,
b = f2 (f1 z)
for a left).
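Taking g = (:) and z = [] makes the difference concrete. A sketch: the right-fold shape f1 (f2 z) rebuilds the list in order, while the left-fold shape f2 (f1 z), which foldl produces once the arguments of g are flipped, reverses it. The names are illustrative.

```haskell
-- g x (g y z) with g = (:) and z = []: the list is rebuilt as-is.
rightShape :: [Int]
rightShape = foldr (:) [] [1, 2]

-- g y (g x z): foldl's step takes the accumulator first, so (:) is flipped.
leftShape :: [Int]
leftShape = foldl (flip (:)) [] [1, 2]
```

rightShape is [1,2] and leftShape is [2,1].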
And if we substitute and simplify, we get
foldr g z [] = z
foldr g z [x] = g x z -- = g x (foldr g z [])
foldr g z [x,y] = g x (g y z) -- = g x (foldr g z [y])
foldr g z [x,y,w] = g x (g y (g w z)) -- = g x (foldr g z [y,w])
A pattern emerges.
Etc., etc., etc.
A sidenote: b is a bad naming choice, as is usual in Haskell. r would be much much better -- a mnemonic for "recursive result".
Another mnemonic is the order of g's arguments: a -> r -> r suggests, nay dictates, that a list's element a comes as a first argument; r the recursive result comes second (the Result of Recursively processing the Rest of the input list -- recursively, thus in the same manner); and the overall result is then produced by this "step"-function, g.
And that's the essence of recursion: recursively process self-similar sub-part(s) of the input structure, and complete the processing by a simple single step:
     a                 a
     :                `g`
    [a]                r
-------------    -------------
    [a]                r

      [a]
    a   [a]
   ---------
   (x : xs)  -> r
        xs   -> r
   ----------------------
   ( x , r ) -> r     --- or, equivalently, x -> r -> r
Well, foldr itself knows this by definition. It was defined in such a way that its function argument accepts the accumulator as its 2nd argument.
Just like when you write a div x y = ... function, you are free to use y as the dividend.
Maybe you got confused by the fact that foldr and foldl have swapped argument orders in their accumulating functions?
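The swapped order shows up clearly with a non-commutative operator. A sketch: foldr passes the element first and the accumulated result second, foldl the other way round, so (-) nests differently. The names are illustrative.

```haskell
-- foldr: 1 - (2 - (3 - 0))
rightSub :: Int
rightSub = foldr (-) 0 [1, 2, 3]

-- foldl: ((0 - 1) - 2) - 3
leftSub :: Int
leftSub = foldl (-) 0 [1, 2, 3]

-- Naming the parameters makes each fold's argument order explicit.
rightNamed, leftNamed :: Int
rightNamed = foldr (\x acc -> x - acc) 0 [1, 2, 3]
leftNamed  = foldl (\acc x -> acc - x) 0 [1, 2, 3]
```

rightSub and rightNamed are 2, while leftSub and leftNamed are -6.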
As Steven Leiva says here, a foldr (1) takes a list and replaces the cons operators (:) with the given function and (2) replaces the last empty list [] with the accumulator-seed, which is what the definition of foldr says it will do
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
So de-sugared [1,2,3] is
(:) 1 ((:) 2 ((:) 3 []))
and the recursion is in effect replacing the (:) with f, and as we see in foldr f z (x:xs) = f x (foldr f z xs), the z seed value is going along for the ride until the base case where it is substituted for the [], fulfilling (1) and (2) above.
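The constructor-replacement view can be checked directly. A sketch: desugared is [1,2,3] written with explicit (:), and replaced is the same expression with (:) swapped for (+) and [] swapped for 0, which is exactly what foldr (+) 0 computes. The names are illustrative.

```haskell
-- [1,2,3] with the sugar removed.
desugared :: [Int]
desugared = (:) 1 ((:) 2 ((:) 3 []))

-- Every (:) replaced with (+), and the final [] with the seed 0.
replaced :: Int
replaced = (+) 1 ((+) 2 ((+) 3 0))

-- foldr performs that replacement for us.
folded :: Int
folded = foldr (+) 0 desugared
```

desugared equals [1,2,3], and both replaced and folded are 6.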
My first confusion was seeing this
foldr (\x n -> x + n) 0 [1,2,3]
and not understanding it would be expanded out, per definition above, to
(\x n -> x + n) 1 ((\x n -> x + n) 2 ((\x n -> x + n) 3 0 ))
Next, due to a weak understanding of how the actual beta reduction would progress, I didn't understand the second-to-third step below
(\x -> (\n -> x + n)) 1 ...
(\n -> 1 + n) ...
1 + ...
That second-to-third step is lambda calculus being bizarre all right, but is at the root of why (+) and (\x n -> x + n) are the same thing. I don't think it's pure lambda calculus addition, but it (verbosely) mimics addition in recursion. I probably need to jump back into lambda calculus to really grasp why (\n -> 1 + n) applied to the rest turns into 1 + ...
My worse mental block was thinking I was looking at some sort of eager evaluation inside the parentheses first
foldr ((\x n -> x + n) 0 [1,2,3,4])
where the three arguments to foldr would interact first, i.e., 0 would be bound to the x and the list member to the n
(\x n -> x + n) 0 [1,2,3,4]
0 + 1
. . . then I didn't know what to think. Totally wrong-headed, even though, as Will Ness points out above, beta reduction is positional in binding arguments to variables. But, of course, I left out the fact that Haskell currying means we follow the expansion of foldr first.
I still don't fully understand the type definition
foldr :: (a -> b -> b) -> b -> [a] -> b
other than to comment/guess that the first a and the [a] mean a is of the type of the members of the incoming list and that the (a -> b -> b) is a prelim-microcosm of what foldr will do, i.e., it will take an argument of the incoming list's type (in our case the elements of the list?) then another object of type b and produce an object b. So the seed argument is of type b and the whole process will finally produce something of type b, also the given function argument will take an a and ultimately give back an object b which actually might be of type a as well, and in fact is in the above example with integers... IOW, I don't really have a firm grasp of the type definition...
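A sketch that may help read the signature: choose a and b to be different types so they cannot be confused. Here the list elements are Char (so a = Char) and the seed and result are Int (so b = Int). The names step and countVowels are illustrative, not from the original.

```haskell
-- a = Char (the element type), b = Int (the seed and result type).
step :: Char -> Int -> Int
step c n = if c `elem` "aeiou" then n + 1 else n

-- Here foldr is used at type (Char -> Int -> Int) -> Int -> [Char] -> Int.
countVowels :: String -> Int
countVowels = foldr step 0
```

countVowels "banana" is 3. The function argument takes an a and a b and returns a b; the seed is a b; the result is a b. In the (+) examples a and b just happen to be the same numeric type.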

Using foldr with only two parameters

I've got a few exercises to prepare for the exam in Haskell/Prolog.
One Haskell task is to rewrite the function below:
original :: [Integer] -> Integer
original [] = 0
original (x:xs) | x < 20 = 5 * x - 3 + original xs
| otherwise = original xs
But the condition is that I am only allowed to replace the two "undefined"s in the scheme below:
alternative :: [Integer] -> Integer
alternative = foldr undefined undefined
My problem is that I don't know how this could match the normal foldr structure with 3 parameters (function, "start value" or whatever it is called, list).
Maybe an equivalent example would be helpful, not the full solution please!
Furthermore I am not allowed to use "let" or "where".
Thank you for any help!
Sooo... I just followed the idea from #hugo to first complete the task in the "normal" way, which works but is not accepted by our university's correction tool:
alternative :: [Integer] -> Integer
alternative list = foldr (\ x y -> if x < 20 then 5*x -3 + y else y) 0 list
And after some trial and error I got the solution:
alternative :: [Integer] -> Integer
alternative = foldr (\ x y -> if x < 20 then 5*x -3 + y else y) 0
A list like [1,4,2,5] is syntactical sugar for (:) 1 ((:) 4 ((:) 2 ((:) 5 []))). foldr f z basically replaces the (:) data constructor with f, and the empty list data constructor [] with z. So foldr f z will result in f 1 (f 4 (f 2 (f 5 z))).
Since you write original [] = 0, this means that for z we can use 0. For f we can use \x -> if x < 20 then (+) (5*x-3) else id, since in case x < 20 we add 5*x-3 to the value, and otherwise we do nothing with the recursively computed value.
We can thus make an alternative implementation that looks like:
alternative :: (Foldable f, Num a, Ord a) => f a -> a
alternative = foldr f 0
where f x ys | x < 20 = 5*x - 3 + ys
| otherwise = ys
or without the where clause with an inline lambda expression:
alternative :: (Foldable f, Num a, Ord a) => f a -> a
alternative = foldr (\x -> if x < 20 then (+) (5*x-3) else id) 0
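A quick way to check such a rewrite, as a sketch, is to compare it against the original on a few inputs. This assumes the list-typed signatures from the question and uses the where-free lambda version for alternative; agree is an illustrative helper.

```haskell
-- The original, explicitly recursive definition from the question.
original :: [Integer] -> Integer
original [] = 0
original (x:xs)
  | x < 20    = 5 * x - 3 + original xs
  | otherwise = original xs

-- Point-free foldr version: the list argument is left implicit,
-- so foldr is given only its function and its seed.
alternative :: [Integer] -> Integer
alternative = foldr (\x -> if x < 20 then (+) (5 * x - 3) else id) 0

-- True when both definitions agree on every sample input.
agree :: [[Integer]] -> Bool
agree = all (\xs -> original xs == alternative xs)
```

For example, alternative [1,4,2,5] is 2 + 17 + 7 + 22 = 48.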

How does fold work for an empty list?

When we fold a list with one or more elements inside as done below:
foldr (+) 0 [1,2,3]
We get:
foldr (+) 0 (1 : 2 : 3 : [])
1 + (2 + (3 + 0)) -- 6
Now when the list is empty:
foldr (+) 0 []
Result: foldr (+) 0 ([])
Since (+) is a binary operator, it needs two arguments, but here we seem to end up with (+) 0. How does this result in 0 rather than an error about a partially applied function?
Short answer: you get the initial value z.
If you give foldl or foldr an empty list, then it returns the initial value. foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b works like:
foldr f z [x1, x2, ..., xn] == x1 `f` (x2 `f` ... (xn `f` z)...)
So since there are no x1, ..., xn the function is never applied, and z is returned.
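Because the function argument is never applied to an empty list, it is never even evaluated. A sketch: passing undefined as the function still returns the seed, which shows that no (+) 0 is ever formed. The names are illustrative.

```haskell
-- The step function is never called, so undefined is never forced.
emptyFold :: Int
emptyFold = foldr undefined 0 ([] :: [Int])

-- The usual case, for comparison.
emptySum :: Int
emptySum = foldr (+) 0 []
```

Both evaluate to 0.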
We can also inspect the source code:
foldr :: (a -> b -> b) -> b -> [a] -> b
-- foldr _ z [] = z
-- foldr f z (x:xs) = f x (foldr f z xs)
{-# INLINE [0] foldr #-}
-- Inline only in the final stage, after the foldr/cons rule has had a chance
-- Also note that we inline it when it has *two* parameters, which are the
-- ones we are keen about specialising!
foldr k z = go
where
go [] = z
go (y:ys) = y `k` go ys
So if we give foldr an empty list, then go will immediately work on that empty list, and return z, the initial value.
A cleaner syntax (and a bit less efficient, as is written in the comment of the function) would thus be:
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
Note that, depending on the implementation of f, it is possible to foldr over infinite lists: if at some point f returns a value without inspecting its second argument (the result of folding the rest of the list), then the recursive part is never evaluated.
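As a sketch of that laziness: (||) does not look at its second argument once the first is True, so the fold below stops at the first element greater than k even though the list is infinite. anyAbove and firstElem are illustrative names.

```haskell
-- (||) short-circuits, so the rest of the infinite list is never folded.
anyAbove :: Int -> [Int] -> Bool
anyAbove k = foldr (\x rest -> x > k || rest) False

-- The first element of an infinite list, via a step that ignores the rest.
firstElem :: Int
firstElem = foldr (\x _ -> x) 0 [1 ..]
```

anyAbove 10 [1 ..] returns True and firstElem is 1, both in finite time.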

Haskell Length function implementation

I am learning Haskell programming, and I am trying to understand how lists work, hence I attempted writing two possible length functions:
myLength :: [a] -> Integer
myLength = foldr (\x -> (+) 1) 0
myLength1 :: [a] -> Integer
myLength1 [] = 0
myLength1 (x:xs) = (+1) (myLength1 xs)
Which one is better?
From my point of view, myLength1 is much easier to understand, and looks natural for operating on lists.
On the other hand, myLength is shorter and it does not use recursion; does this imply myLength runs faster than myLength1?
Keep in mind this "pseudo implementation" of foldr:
foldr :: (a -> b -> b) -> b -> [a] -> b -- function -> initializer -> list -> result
foldr _ i [] = i
foldr f i (x:xs) = x `f` (foldr f i xs)
Now we have your code
myLength :: [a] -> Integer
myLength = foldr (\x -> (+) 1) 0
myLength1 :: [a] -> Integer
myLength1 [] = 0
myLength1 (x:xs) = (+1) (myLength1 xs)
Since foldr is itself recursive, your myLength and myLength1 will be almost the same; in myLength the recursive call is simply done by foldr instead of explicitly by you. They should run in about the same time.
Both functions do the same thing: foldr uses recursion and will end up executing similarly to your directly recursive function. It could be argued that the foldr version is cleaner (once you're accustomed to them, higher-order functions are often more readable than direct recursion).
But those two functions are pretty bad: they'll both end up building a big thunk (an unevaluated value) 1 + (1 + (1 + ... (1 + 0)...)), which takes O(n) space and slows evaluation down. To avoid that you should start adding the 1s from the beginning of the list, like so:
betterLength xs = go 0 xs
where
go n [] = n
go n (_:xs) = n `seq` go (n+1) xs
The seq ensures that n is evaluated before go is called recursively, so no chain of +1s accumulates. With the BangPatterns extension, you can write this:
{-# LANGUAGE BangPatterns #-}
betterLength xs = go 0 xs
where
go n [] = n
go !n (_:xs) = go (n+1) xs
It is also possible to write this version with a fold:
import Data.List (foldl')
betterLength = foldl' (\n _ -> n + 1) 0
where foldl' starts from the left and is strict in its accumulator (the ' marks the strict variant).
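A small comparison, as a sketch: all three definitions compute the same length, but only the foldl' one keeps a single evaluated counter instead of building a chain of pending additions. This checks results only; it does not measure memory use. The names mirror those used in this thread.

```haskell
import Data.List (foldl')

-- foldr version from the question: builds 1 + (1 + (... + 0)).
myLength :: [a] -> Integer
myLength = foldr (\_ -> (+) 1) 0

-- Explicit recursion from the question: the same shape of thunk.
myLength1 :: [a] -> Integer
myLength1 [] = 0
myLength1 (_:xs) = (+1) (myLength1 xs)

-- Strict left fold: the counter is forced at every step.
betterLength :: [a] -> Integer
betterLength = foldl' (\n _ -> n + 1) 0
```

For example, all three return 1000 for [1 .. 1000].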
Using foldr, it can be implemented as:
length' xs = foldr (\_ n -> 1 + n) 0 xs
Explanation:
The lambda function (\_ n -> 1 + n) adds one to the accumulated count for every element. For instance:
length' [1..4]
will be applied as:
1 + ( 1 + ( 1 + ( 1 + 0)))
Recall that foldr is defined like this:
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f v [] = v
foldr f v (x:xs) = f x (foldr f v xs)

Haskell Fold with anonymous function

I have a problem with one of the Haskell basics: Fold + anonymous functions
I'm developing a bin2dec program with foldl.
The solution looks like this:
bin2dec :: String -> Int
bin2dec = foldl (\x y -> if y=='1' then x*2 + 1 else x*2) 0
I understand the basic idea of foldl / foldr but I can't understand what the parameters x and y stand for.
See the type of foldl
foldl :: (a -> b -> a) -> a -> [b] -> a
Consider foldl f z list
So foldl basically works incrementally on the list (or anything Foldable), taking one element at a time from the left and applying f z element to get the new accumulator value for the next step, while folding over the rest of the elements. A simple definition of foldl might help in understanding it.
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
The diagram from the Haskell wiki might help build a better intuition.
Consider your function f = (\x y -> if y=='1' then x*2 + 1 else x*2) and try to write the trace for foldl f 0 "11". Here "11" is the same as ['1','1']
foldl f 0 ['1','1']
= foldl f (f 0 '1') ['1']
Now f is a function which takes 2 arguments, first an integer and second a character, and returns an integer.
So in this case x=0 and y='1', so f x y = 0*2 + 1 = 1
= foldl f 1 ['1']
= foldl f (f 1 '1') []
Now again applying f 1 '1'. Here x=1 and y='1' so f x y = 1*2 + 1 = 3.
= foldl f 3 []
Using the first definition of foldl for empty list.
= 3
Which is the decimal representation of "11".
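The same trace can be produced mechanically with scanl, which lists every accumulator value that foldl passes through. A sketch, with the step function given a name for readability (step and bin2decSteps are illustrative names):

```haskell
-- x: the number built so far (the accumulator); y: the next character.
step :: Int -> Char -> Int
step x y = if y == '1' then x * 2 + 1 else x * 2

bin2dec :: String -> Int
bin2dec = foldl step 0

-- Every intermediate accumulator, from the seed to the final result.
bin2decSteps :: String -> [Int]
bin2decSteps = scanl step 0
```

bin2decSteps "11" is [0,1,3], matching the trace above, and bin2decSteps "1011" is [0,1,2,5,11], so bin2dec "1011" is 11.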
Use the types! You can type :t in GHCi followed by any function or value to see its type. Here's what happens if we ask for the type of foldl
Prelude> :t foldl
foldl :: (a -> b -> a) -> a -> [b] -> a
The input list is of type [b], so it's a list of bs. The output type is a, which is what we're going to produce. You also have to supply an initial value for the fold, also of type a. The function is of type
a -> b -> a
The first parameter (a) is the value of the fold computed so far. The second parameter (b) is the next element of the list. So in your example
\x y -> if y == '1' then x * 2 + 1 else x * 2
the parameter x is the binary number you've computed so far, and y is the next character in the list (either a '1' or a '0').
