Definition of length using foldr - haskell

I'm trying to understand a part in the lecture notes of a class I'm taking. It defines the length function as:
length = foldr (\_ n -> 1 + n) 0
Can someone explain to me how this works? I can't wrap my mind around it.

First, type of foldr: (a -> b -> b) -> b -> [a] -> b
Taking the usage into context, foldr takes three arguments: a function (which takes an element of the list and an accumulator, and returns the new accumulator), the starting value of the accumulator, and a list. foldr returns the final value of the accumulator after applying the function across the list.
As for this piece of code:
length = foldr (\_ n -> 1 + n) 0
As you can see, it is missing the list - so the return value of the right hand side is a function that will take in a list and produce an Int (same type as 0). Type: [a] -> Int.
As for what the right hand side means: (\_ n -> 1 + n) 0
\ introduces an anonymous (unnamed) function.
_ means ignore the element from the list (it corresponds to a in the type of foldr). foldr goes through the list and applies the function to each element; this is the element passed into the function. We have no use for it in a length function, so we mark it as ignored.
n is the parameter for the Int passed in as the accumulator.
-> separates the parameters from the function body, whose value is what the function returns.
1 + n increments the accumulator. You can imagine that the return value is passed back to foldr, which saves it to pass into the next call to the function (\_ n -> 1 + n).
The 0 outside the parentheses is the starting value of the counter.
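As a small sketch of the "missing list" point, the two definitions below should behave identically; the names lengthPF and lengthExplicit are made up for this illustration so they don't clash with the Prelude's length.

```haskell
-- Point-free form, as in the lecture notes: the list argument is left out,
-- so the right-hand side is itself a function of type [a] -> Int.
lengthPF :: [a] -> Int
lengthPF = foldr (\_ n -> 1 + n) 0

-- The same function with the list argument written out explicitly.
lengthExplicit :: [a] -> Int
lengthExplicit xs = foldr (\_ n -> 1 + n) 0 xs
```

In GHCi, lengthPF "abc" and lengthExplicit "abc" should both give 3.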

foldr folds the list with a right-associative operator. This is easy to see if you use the operator (+), in which case the fold behaves the same as sum:
foldr (+) 0 [1,2,3,4,5] = 1+(2+(3+(4+(5+0))))
For your length function, it is equivalent to:
foldr (\_ n -> 1 + n) 0 [1,2,3,4,5] = 1+(1+(1+(1+(1+0))))
That is what foldr is for.

There are several equivalent ways to understand it. First one: foldr f z [1, 2, 3, 4, ..., n] computes the following value:
f 1 (f 2 (f 3 (f 4 (f ... (f n z)))))
So in your case:
length [1,2,3,4] = foldr (\_ n -> 1 + n) 0 [1,2,3,4]
= (\_ n -> 1 + n) 1 ((\_ n -> 1 + n) 2 ((\_ n -> 1 + n) 3 ((\_ n -> 1 + n) 4 0)))
= (\_ n -> 1 + n) 1 ((\_ n -> 1 + n) 2 ((\_ n -> 1 + n) 3 (1 + 0)))
= (\_ n -> 1 + n) 1 ((\_ n -> 1 + n) 2 (1 + (1 + 0)))
= (\_ n -> 1 + n) 1 (1 + (1 + (1 + 0)))
= 1 + (1 + (1 + (1 + 0)))
= 1 + (1 + (1 + 1))
= 1 + (1 + 2)
= 1 + 3
= 4
Another one is to start from this function, which copies a list:
listCopy :: [a] -> [a]
listCopy [] = []
listCopy (x:xs) = x : listCopy xs
That may look like a trivial function, but foldr is basically just that, except that instead of hardcoding the empty list [] and the pair constructor : into the right hand side, we use some arbitrary constant and function supplied as arguments. I sometimes like to call these arguments fakeCons and fakeNil (cons and nil are the names of the : operator and [] constant in the Lisp language), because in a sense you're "copying" the list but using fake constructors:
foldr fakeCons fakeNil [] = fakeNil
foldr fakeCons fakeNil (x:xs) = fakeCons x (subfold xs)
  where subfold = foldr fakeCons fakeNil
So under this interpretation, your length function is "copying" a list, except that instead of the empty list it's using 0, and instead of : it's discarding the elements and adding 1 to the running total.
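To make the "fake constructors" reading concrete, here is a minimal sketch; copyViaFoldr and mapViaFoldr are illustrative names, not part of any library. Passing the real constructors gives back the same list, and passing a cons that transforms each element first gives map.

```haskell
-- Real constructors as arguments: the fold rebuilds the list unchanged.
copyViaFoldr :: [a] -> [a]
copyViaFoldr = foldr (:) []

-- A "fake cons" that applies g to each element before consing: this is map.
mapViaFoldr :: (a -> b) -> [a] -> [b]
mapViaFoldr g = foldr (\x rest -> g x : rest) []
```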
And here's yet a third interpretation of foldr f z xs:
z is the solution of your problem when the list is empty.
f is a function that takes two arguments: an element of the list, and a partial solution, that is, the solution to your problem for the list of elements that appear to the right of the element that's passed to f. f then produces a solution that's "one element bigger."
So in the case of length:
The length of an empty list is 0, so that's why you use 0 as the second argument to foldr.
If the length of xs is n, then the length of x:xs is n+1. That's what your first argument to foldr, \_ n -> n + 1, is doing: it's computing the length of a list, given as arguments the first element of the list (which in this case we ignore) and the length of the rest of the list (n).
This way of thinking about foldr is very powerful, and should not be underestimated. Basically, in the function that you pass as the first argument to foldr, you're allowed to assume that the problem you're trying to solve has already been solved for all lists shorter than the one you're dealing with. All your argument function has to do, then, is to compute an answer for a list that's one element longer.
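The same reasoning can be applied to a different problem. The sketch below counts the even numbers in a list; countEvens and step are hypothetical names chosen for this example. The answer for the empty list is 0, and step extends an answer for the rest of the list by one element.

```haskell
-- Count the even numbers in a list, following the recipe above.
countEvens :: [Int] -> Int
countEvens = foldr step 0          -- 0 is the answer for the empty list
  where
    -- x is the current element; n is the answer for everything to its right.
    step x n
      | even x    = n + 1
      | otherwise = n
```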


Haskell naive duplicate filtering

I do not understand a sample solution for the following problem: given a list of elements, remove the duplicates. Then count the unique digits of a number. No explicit recursion may be used for either problem.
My code:
removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = foldr (\x ys -> x:(filter (x /=) ys)) []
differentDigits :: Int -> Int
differentDigits xs = length (removeDuplicates (show xs))
The solution I am trying to understand has a different definition for differentDigits, namely
differentDigits xs = foldr (\ _ x -> x + 1) 0 ( removeDuplicates ( filter (/= '_') ( show xs )))
Both approaches work, but I cannot grasp the sample solution. To break my question down into subquestions,
How does the first argument to filter work? I mean
(/= '_')
How does the lambda for foldr work? In
foldr (\ _ x -> x + 1)
           ^
the variable x should still be the Char list? How does Haskell figure out that actually 0 should be incremented?
filter (/= '_') is, I'm pretty sure, redundant. It filters out underscore characters, which shouldn't be present in the result of show xs, assuming xs is a number of some sort.
foldr (\ _ x -> x + 1) 0 is equivalent to length. The way foldr works, it takes the second argument (which in your example is zero) as the starting point, then applies the first argument (in your example, lambda) to it over and over for every element of the input list. The element of the input list is passed into the lambda as first argument (denoted _ in your example), and the running sum is passed as second argument (denoted x). Since the lambda just returns a "plus one" number on every pass, the result will be a number representing how many times the lambda was called - which is the length of the list.
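As a quick check of both claims, the sketch below puts the sample solution next to a version without the filter; the names differentDigitsA and differentDigitsB are made up for the comparison. For non-negative Int inputs the two should agree, since show never produces an underscore.

```haskell
removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = foldr (\x ys -> x : filter (x /=) ys) []

-- Sample-solution version. (/= '_') is the same as \c -> c /= '_'.
differentDigitsA :: Int -> Int
differentDigitsA n =
  foldr (\_ x -> x + 1) 0 (removeDuplicates (filter (/= '_') (show n)))

-- Version without the filter, using length directly.
differentDigitsB :: Int -> Int
differentDigitsB n = length (removeDuplicates (show n))
```

For a negative input, show adds a '-' character, which both versions would count as if it were a digit.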
First, note that (2) is written in so-called point-free style, leaving out the third argument of foldr.
https://en.wikipedia.org/wiki/Tacit_programming#Functional_programming
Also, the underscore in \_ x -> x + 1 is a wild card, that simply marks the place of a parameter but that does not give it a name (a wild card works as a nameless parameter).
Second, (2) is really nothing other than a simple recursive function that folds to the right. foldr is a compact way to write such recursive functions (in your case length):
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
If we write
foldr f c ls
ls is the list over which our recursive function should recur (a is the type of the elements).
c is the result in the base case (when the recursive function is applied to an empty list).
f computes the result in the general case (when the recursive function is applied on a non-empty list). f takes two arguments:
The head of the list and
the result of the recursive call on the tail of the list.
So, given f and c, foldr will go through the list ls recursively.
A first example
The Wikipedia page about point free style gives the example of how we can compute the sum of all elements in a list using foldr:
Instead of writing
sum [] = 0
sum (x:xs) = x + sum xs
we can write
sum = foldr (+) 0
Here (+) is the addition operator written in prefix form, a 2-argument function that adds its arguments. The expression
sum [1,2,3,4]
is computed as
1 + (2 + (3 + (4 + 0)))
(hence "folding to the right").
Example: Multiplying all elements.
Instead of
prod [] = 1
prod (x:xs) = x * prod xs
we can write
prod = foldr (*) 1
Example: Remove all occurrences of a value from a list.
Instead of
remove _ [] = []
remove v (x:xs) = if x==v then remove v xs else x:remove v xs
we can write
remove v = foldr (\x r -> if x==v then r else x:r) []
Your case, (2)
We can now fully understand that
length = foldr (\ _ x -> x + 1) 0
in fact is the same as
length [] = 0
length (x:xs) = length xs + 1
that is, the length function.
Hope this recursive view on foldr helped you understand the code.

Haskell's foldr/foldl definitions trip newbie? For foldl Actual function takes f (default case) x while for foldr function takes f x (default case)?

First thing, I understand (almost) fold functions. Given the function I can work out easily what will happen and how to use it.
The question is about the way it is implemented, which leads to a slight difference in the function definitions that took me some time to understand. To make matters worse, most examples for folds use the same type for the list elements and the default case, which does not help understanding, as these can be different.
Usage:
foldr f a xs
foldl f a xs
where a is the default case
definition:
foldr: (a -> b -> b) -> b -> [a] -> b
foldl: (a -> b -> a) -> a -> [b] -> a
In the signatures, I understand a to be the type of the first argument passed to the function and b the type of the second.
Eventually I understood that this is happening due to the fact that when f finally gets evaluated in foldr it is implemented as f x a (i.e. default case is passed as second parameter). But for foldl it is implemented as f a x (i.e. default case is passed as first parameter).
Wouldn't the two signatures be the same if the default case were passed in the same position (either first or second) in both cases? Was there any particular reason for this choice?
To make things a little clearer, I will rename a couple type variables in your foldl signature...
foldr: (a -> b -> b) -> b -> [a] -> b
foldl: (b -> a -> b) -> b -> [a] -> b
... so that in both cases a stands for the type of the list elements, and b for that of the fold results.
The key difference between foldr and foldl can be seen by expanding their recursive definitions. The applications of f in foldr associate to the right, and the initial value shows up to the right of the elements:
foldr f a [x,y,z] = x `f` (y `f` (z `f` a))
With foldl, it is the other way around: the association is to the left, and the initial value shows up to the left (as Silvio Mayolo emphasises in his answer, that's how it has to be so that the initial value is in the innermost sub-expression):
foldl f a [x,y,z] = ((a `f` x) `f` y) `f` z
That explains why the list element is the first argument to the function given to foldr, and the second to the one given to foldl. (One might, of course, give foldl the same signature of foldr and then use flip f instead of f when defining it, but that would achieve nothing but confusion.)
P.S.: Here is a good, simple example of folds with the types a and b different from each other:
foldr (:) [] -- id
foldl (flip (:)) [] -- reverse
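Another small sketch with distinct types, here with Int elements (a) and a String result (b); joinR and joinL are hypothetical names. The type checker forces the element to be the first argument of the function given to foldr and the second argument of the one given to foldl.

```haskell
-- foldr: the step function has type Int -> String -> String.
joinR :: [Int] -> String
joinR = foldr (\x acc -> show x ++ acc) ""

-- foldl: the step function has type String -> Int -> String.
joinL :: [Int] -> String
joinL = foldl (\acc x -> acc ++ show x) ""
```

Both joinR [1,2,3] and joinL [1,2,3] should produce "123"; swapping the lambda's parameters in either definition is a type error.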
A fold is a type of catamorphism, or a way of "tearing down" a data structure into a scalar. In our case, we "tear down" a list. Now, when working with a catamorphism, we need to have a case for each data constructor. Haskell lists have two data constructors.
[] :: [a]
(:) :: a -> [a] -> [a]
That is, [] is a constructor which takes no arguments and produces a list (the empty list). (:) is a constructor which takes two arguments and makes a list, prepending the first argument onto the second. So we need to have two cases in our fold. foldr is the direct example of a catamorphism.
foldr :: (a -> b -> b) -> b -> [a] -> b
The first function will be called if we encounter the (:) constructor. It will be passed the first element (the first argument to (:)) and the result of the recursive call (calling foldr on the second argument of (:)). The second argument, the "default case" as you call it, is for when we encounter the [] constructor, in which case we simply use the default value itself. So it ends up looking like this
foldr (+) 4 [1, 2, 3]
1 + (2 + (3 + 4))
Now, could we have designed foldl the same way? Sure. foldl isn't (exactly) a catamorphism, but it behaves like one in spirit. In foldr, the default case is the innermost value; it's only used at the "last step" of the recursion, when we've run out of list elements. In foldl, we do the same thing for consistency.
foldl (+) 4 [1, 2, 3]
((4 + 1) + 2) + 3
Let's break that down in more detail. foldl can be thought of as using an accumulator to get the answer efficiently.
foldl (+) 4 [1, 2, 3]
foldl (+) (4 + 1) [2, 3]
foldl (+) ((4 + 1) + 2) [3]
foldl (+) (((4 + 1) + 2) + 3) []
-- Here, we've run out of elements, so we use the "default" value.
((4 + 1) + 2) + 3
So I suppose the short answer to your question is that it's more consistent (and more useful), mathematically speaking, to make sure the base case is always at the innermost position in the recursive call, rather than focusing on it being on the left or the right all the time.
Consider the calls foldl (+) 0 [1,2,3,4] and foldr (+) 0 [1,2,3,4] and try to visualize what they do:
foldl (+) 0 [1,2,3,4] = ((((0 + 1) + 2) + 3) + 4)
foldr (+) 0 [1,2,3,4] = (1 + (2 + (3 + (4 + 0))))
Now, let's try to swap the arguments to the call to (+) in each step:
foldl (flip (+)) 0 [1,2,3,4] = (4 + (3 + (2 + (1 + 0))))
Note that despite the symmetry this is not the same as the previous foldr. We are still accumulating from the left of the list, I've just changed the order of operands.
In this case, because addition is commutative, we get the same result, but if you try to fold over some non-commutative function, e.g. string concatenation, the result is different. Folding over ["foo", "bar", "baz"], you would obtain "foobarbaz" or "bazbarfoo" (while a foldr would result in "foobarbaz" as well because string concatenation is associative).
In other words, the two definitions as they are make the two functions have the same result for commutative and associative binary operations (like common arithmetic addition/multiplication). Swapping the arguments to the accumulating function breaks this symmetry and forces you to use flip to recover the symmetric behavior.
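The string concatenation case can be written out as follows. This is only a sketch; the top-level names wordsList, leftFold, leftFlipped and rightFold are invented for the illustration.

```haskell
wordsList :: [String]
wordsList = ["foo", "bar", "baz"]

-- (("" ++ "foo") ++ "bar") ++ "baz"
leftFold :: String
leftFold = foldl (++) "" wordsList

-- Operands swapped at each step: "baz" ++ ("bar" ++ ("foo" ++ ""))
leftFlipped :: String
leftFlipped = foldl (flip (++)) "" wordsList

-- "foo" ++ ("bar" ++ ("baz" ++ ""))
rightFold :: String
rightFold = foldr (++) "" wordsList
```

Here leftFold and rightFold both evaluate to "foobarbaz", while leftFlipped evaluates to "bazbarfoo".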
The two folds can yield different results because they associate in opposite directions. The base value always shows up within the innermost parentheses. List traversal happens the same way for both folds.
right fold with (+) using the prefix notation
foldr (+) 10 [1,2,3]
=> + 1 (+ 2 (+ 3 10))
=> + 1 (+ 2 13)
=> + 1 15
=> 16
foldl (+) 10 [1,2,3]
=> + (+ (+ 10 1) 2) 3
=> + (+ 11 2) 3
=> + 13 3
=> 16
both folds evaluate to the same result because (+) is associative as well as commutative, where commutative means
+ a b == + b a
let's see what happens when the function is not commutative, e.g. division or exponentiation
foldl (/) 1 [1, 2, 3]
=> / (/ (/ 1 1) 2) 3
=> / (/ 1 2) 3
=> / 0.5 3
=> 0.16666667
foldr (/) 1 [1, 2, 3]
=> / 1 (/ 2 (/ 3 1))
=> / 1 (/ 2 3)
=> / 1 0.666666667
=> 1.5
now, let's evaluate foldr with function flip (/)
let f = flip (/)
foldr f 1 [1, 2, 3]
=> f 1 (f 2 (f 3 1))
=> f 1 (f 2 0.3333333)
=> f 1 0.16666667
=> 0.16666667
similarly, let's evaluate foldl with f
foldl f 1 [1, 2, 3]
=> f (f (f 1 1) 2) 3
=> f (f 1 2) 3
=> f 2 3
=> 1.5
So, in this case, flipping the order of the arguments of the folding function can make left fold return the same value as a right fold and vice versa. But that is not guaranteed. Example:
foldr (^) 1 [1, 2, 3] = 1
foldl (^) 1 [1, 2, 3] = 1
foldr (flip (^)) 1 [1,2,3] = 1
foldl (flip (^)) 1 [1,2,3] = 9 -- this is the odd case
foldl (flip (^)) 1 $ reverse [1,2,3] = 1
-- we again get 1 when we reverse this list
incidentally, reverse is equivalent to
foldl (flip (:)) []
but try defining reverse using foldr
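For reference, two possible ways to do that are sketched below; reverseR and reverseR' are made-up names. The first is short but quadratic because of the repeated (++). The second folds the list into a function that takes an accumulator, which keeps it linear.

```haskell
-- Simple version: put each element after the reversed rest of the list.
reverseR :: [a] -> [a]
reverseR = foldr (\x acc -> acc ++ [x]) []

-- foldr builds a function of type [a] -> [a]; applying it to [] runs it.
-- Each step pushes its element onto the accumulator, then continues with k.
reverseR' :: [a] -> [a]
reverseR' xs = foldr (\x k acc -> k (x : acc)) id xs []
```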

Using fold* to grow a list in Haskell

I'm trying to solve the following problem in Haskell: given an integer, return the list of its digits. The constraint is that I may use only one of the fold* functions (* = {r,l,1,l1}).
Without such constraint, the code is simple:
list_digits :: Int -> [Int]
list_digits 0 = []
list_digits n = list_digits r ++ [n-10*r]
  where
    r = div n 10
But how do I use fold* to, essentially grow a list of digits from an empty list?
Thanks in advance.
Is this a homework assignment? It's pretty strange for the assignment to require you to use foldr, because this is a natural use for unfoldr, not foldr. unfoldr :: (b -> Maybe (a, b)) -> b -> [a] builds a list, whereas foldr :: (a -> b -> b) -> b -> [a] -> b consumes a list. An implementation of this function using foldr would be horribly contorted.
import Data.List (unfoldr)
listDigits :: Int -> [Int]
listDigits = unfoldr digRem
  where digRem x
          | x <= 0 = Nothing
          | otherwise = Just (x `mod` 10, x `div` 10)
In the language of imperative programming, this is basically a while loop. Each iteration of the loop appends x `mod` 10 to the output list and passes x `div` 10 to the next iteration. In, say, Python, this'd be written as
def list_digits(x):
    output = []
    while x > 0:
        output.append(x % 10)
        x = x // 10
    return output
But unfoldr allows us to express the loop at a much higher level. unfoldr captures the pattern of "building a list one item at a time" and makes it explicit. You don't have to think through the sequential behaviour of the loop and realise that the list is being built one element at a time, as you do with the Python code; you just have to know what unfoldr does. Granted, programming with folds and unfolds takes a little getting used to, but it's worth it for the greater expressiveness.
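One detail worth checking: both the unfoldr version and the Python loop emit the least significant digit first, whereas list_digits in the question puts the most significant digit first. A minimal sketch of both orders follows, with digitsLsf and digitsMsf as hypothetical names.

```haskell
import Data.List (unfoldr)

-- One step: emit the last digit and continue with the remaining number.
digRem :: Int -> Maybe (Int, Int)
digRem x
  | x <= 0    = Nothing
  | otherwise = Just (x `mod` 10, x `div` 10)

-- Digits in the order unfoldr produces them: least significant first.
digitsLsf :: Int -> [Int]
digitsLsf = unfoldr digRem

-- Reversed, to match the order of list_digits in the question.
digitsMsf :: Int -> [Int]
digitsMsf = reverse . digitsLsf
```

With these definitions, digitsLsf 123 gives [3,2,1] and digitsMsf 123 gives [1,2,3].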
If your assignment is marked by machine and it really does require you to type the word foldr into your program text, (you should ask your teacher why they did that and) you can play a sneaky trick with the following "id[]-as-foldr" function:
obfuscatedId = foldr (:) []
listDigits = obfuscatedId . unfoldr digRem
Though unfoldr is probably what the assignment meant, you can write this using foldr if you use foldr as a hylomorphism, that is, building up one list while it tears another down.
digits :: Int -> [Int]
digits n = snd $ foldr go (n, []) places where
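  places = replicate num_digits ()
  -- Caution: logBase is floating point; in GHC, logBase 10 1000 evaluates to
  -- 2.9999999999999996, so floor can undercount digits at exact powers of 10.
  num_digits | n > 0 = 1 + floor (logBase 10 $ fromIntegral n)
             | otherwise = 0
  go () (n, ds) = let (q,r) = n `quotRem` 10 in (q, r : ds)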
Effectively, what we're doing here is using foldr as "map-with-state". We know ahead of time
how many digits we need to output (using log10) just not what those digits are, so we use
unit (()) values as stand-ins for those digits.
If your teacher's a stickler for just having a foldr at the top-level, you can get
away with making go partial:
digits' :: Int -> [Int]
digits' n = foldr go [n] places where
  places = replicate num_digits ()
  num_digits | n > 0 = floor (logBase 10 $ fromIntegral n)
             | otherwise = 0
  go () (n:ds) = let (q,r) = n `quotRem` 10 in (q:r:ds)
This has slightly different behaviour on non-positive numbers:
>>> digits 1234567890
[1,2,3,4,5,6,7,8,9,0]
>>> digits' 1234567890
[1,2,3,4,5,6,7,8,9,0]
>>> digits 0
[]
>>> digits' 0
[0]
>>> digits (negate 1234567890)
[]
>>> digits' (negate 1234567890)
[-1234567890]

Meaning of backslash in Haskell code?

So I have this haskell code, and I understand half of it, but I can't get my head around this \x -> here:
testDB :: Catalogue
testDB = fromList [
("0265090316581", ("The Macannihav'nmor Highland Single Malt", "75ml bottle")),
("0903900739533", ("Bagpipes of Glory", "6-CD Box")),
("9780201342758", ("Thompson - \"Haskell: The Craft of Functional Programming\"", "Book")),
("0042400212509", ("Universal deep-frying pan", "pc"))
]
-- Exercise 1
longestProductLen :: [(Barcode, Item)] -> Int
longestProductLen = maximum . map (\(x, y) -> length $ fst y)
formatLine :: Int -> (Barcode, Item) -> String
formatLine k (x, (y1, y2)) = x ++ "..." ++ y1 ++ (take (k - length y1) (repeat '.')) ++ "..." ++ y2
showCatalogue :: Catalogue -> String
showCatalogue c = foldr (++) "" $ map (\x -> (formatLine (longestProductLen (toList testDB)) x) ++ "\n") $ toList c
I understand that longestProductLen returns an integer, the length of the longest title in testDB, and that this integer is then matched to k in formatLine, but I can't understand how it matches (Barcode, Item). I guess it has something to do with \x ->; if it does, can you please explain how?
Thank you!
The syntax
function x y = <body>
is equivalent to
function = \x y -> <body>
The second form uses a lambda, or anonymous function. The compiler actually turns all your function definitions into the second form, since a definition just gives a name to a function value (functions are values in Haskell).
If you see it given as an argument to another function like map:
map (\x -> x + 1) [1, 2, 3]
This is semantically equivalent to
map add1 [1, 2, 3] where add1 x = x + 1
Lambdas can pattern match on their arguments, too (with a single pattern per argument). Also, if you have a definition like
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)
This is equivalent to
fib = \n -> case n of
  0 -> 1
  1 -> 1
  n -> fib (n - 1) + fib (n - 2)
Because the compiler will first translate the multiple pattern-matching clauses into a case expression, then bind the resulting lambda to a name (in this case fib).
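The equivalences described in this answer can be tried out with a small sketch like the one below; add1' and swapPair are illustrative names added here.

```haskell
-- Named function, written with its argument on the left.
add1 :: Int -> Int
add1 x = x + 1

-- The same function as a lambda bound to a name.
add1' :: Int -> Int
add1' = \x -> x + 1

-- A lambda can also pattern match on its argument, here on a pair.
swapPair :: (a, b) -> (b, a)
swapPair = \(x, y) -> (y, x)
```

map add1 [1, 2, 3] and map add1' [1, 2, 3] should both give [2,3,4].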
That's Haskell's syntax for lambda abstraction. The Haskell
\x -> e
corresponds to the mathematical
λx.e
In this case,
\(x, y) -> length $ fst y
is a function that takes a pair (x,y) and returns the length of the first component of the pair y. This is a slightly odd way to write the expression; it would be better to write it as
\(x, (y1, y2)) -> length y1
or as
length . fst . snd
for consistency.
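The three spellings can be compared side by side. In this sketch the type synonyms Barcode and Item are assumptions inferred from the shape of testDB (the question does not show their definitions), and nameLen1 to nameLen3 and sampleEntry are made-up names.

```haskell
-- Assumed definitions; the question's actual synonyms are not shown.
type Barcode = String
type Item = (String, String)

nameLen1, nameLen2, nameLen3 :: (Barcode, Item) -> Int
nameLen1 = \(x, y) -> length $ fst y        -- the form used in the question
nameLen2 = \(_, (y1, _)) -> length y1       -- nested pattern, unused parts ignored
nameLen3 = length . fst . snd               -- point-free composition

-- One entry copied from testDB, for trying the functions out.
sampleEntry :: (Barcode, Item)
sampleEntry = ("0903900739533", ("Bagpipes of Glory", "6-CD Box"))
```

All three should return 17 for sampleEntry, the length of "Bagpipes of Glory".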

explain how to use this specific function of foldl

sum :: (Num a) => [a] -> a
sum xs = foldl (\acc x -> acc + x) 0 xs
foldl folds the list up from the left side. So first we get acc = 0 and feed the list xs to x, then apply the function \acc x -> acc + x. After the calculation, we get the new acc, which is equal to acc + x. But why is that? I would have thought the result of acc + x is the new value of x, based on the function x -> acc + x.
Let's take a look at your definition of sum
sum :: (Num a) => [a] -> a
sum xs = foldl (\acc x -> acc + x) 0 xs
Let's also take a peek at foldl's signature:
foldl :: (a -> b -> a) -> a -> [b] -> a
Hmm, ok, what do we have to feed foldl in order to get the value at the very, very end (->a)?
It needs a curried function (a->b->a). Although it's not strictly accurate, for brevity's sake we'll say it's a function that takes two arguments (but you and I know that really, it takes one argument and returns another function that takes one argument).
It needs a value of type a. Notice that our curried function from Step 1. takes something of type a and returns something of type a. Interesting...hmmm...
It needs a list of type b. Notice our curried function from Step 1 takes, as well as something of type a, something of type b.
So, do we give it what it wants?
We give it (\acc x -> acc + x). This is an anonymous function, or lambda, that takes two arguments (remember, it's curried, though), acc and x, and returns their sum.
We give it 0 as our starting value
We give it xs as the list to fold.
Ok dokie. So, let's just let foldl work its Haskell magic. Let's imagine we called sum [1,2,3]
foldl calls our function (\acc x -> acc + x), using 0 for acc and the first value of xs, 1.
0 + 1
This result does not get stored away in acc or x, since they are just arguments in our little lambda function. foldl is going to use that value (see SanSS's answer for the specific implementation).
Remember that the result of our lambda function is the same type as the first parameter? foldl can use that previous sum and pass it back to the lambda function, along with the second element.
(0 + 1) + 2
And again until it has done this for all the elements:
((0 + 1) + 2) + 3
6
As pointed out by Dan, this is the same as if you had written:
sum xs = foldl (+) 0 xs
You can tell more easily with this function that we aren't just 'setting' some variable and adding onto it.
Hope this helps.
Side note:
For your definition of sum, you don't have to explicitly state that sum takes xs. You could leave it as:
sum = foldl (\acc x -> acc + x) 0
This takes advantage of currying, because if we provide foldl just its first two arguments -- a curried function like (a->b->a) and a value of type a -- what do we get?
[b] -> a
A function that takes a list of type b and returns a value of type a! This is called pointfree style. Just something to consider :-)
You should look at the definition of foldl:
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
foldl receives a function which takes 2 arguments, a value (the "starter value" or accumulator) and a list.
If the list is empty, it returns the current accumulator.
If the list is not empty, it calls itself recursively with the same function, a new accumulator, and the tail of the list. The new accumulator is the result of applying the function to the old accumulator (first argument) and the first element of the list (second argument).
So the lambda function used in sum becomes quite clear: it takes acc as its first argument and an element of the list as its second argument, and returns the sum of both.
The result of the invocations for:
sum [1,2,3] = ((0 + 1) + 2) + 3 = 6
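One way to watch the accumulator change is scanl from the Prelude, which takes the same arguments as foldl but returns every intermediate accumulator instead of only the last one. In this sketch, sumL and sumSteps are hypothetical names.

```haskell
-- The sum from the question, under a name that doesn't clash with the Prelude.
sumL :: Num a => [a] -> a
sumL = foldl (\acc x -> acc + x) 0

-- Same step function and starting value, but all accumulator values are kept.
sumSteps :: Num a => [a] -> [a]
sumSteps = scanl (\acc x -> acc + x) 0
```

sumSteps [1,2,3] should give [0,1,3,6]; the last element is the value sumL [1,2,3] returns.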
From your question, it sounds like you don't understand how the lambda function (\acc x -> acc + x) works here.
The function is not x->acc+x, but acc x->acc + x. In fact, you could rewrite the "sum" equation as
sum xs = foldl (+) 0 xs
Since (\acc x -> acc + x) is the same as (+)
I suggest you (re)read http://learnyouahaskell.com/higher-order-functions#lambdas
