Haskell thunks - foldl vs foldr

Learning Haskell, I came across the fact that foldl creates thunks and can overflow the stack, so it's better to use foldl' from Data.List. Why is it just foldl, and not, for example, foldr?
Thanks

There is no need for foldr' because you can achieve the same effect yourself.
Here is why: Consider foldl f 0 [1,2,3]. This expands to f (f (f 0 1) 2) 3, so by the time you get anything back to work with, thunks for (f 0 1) and (f (f 0 1) 2) have to be created. If you want to avoid this (by evaluating these subexpressions before continuing), you have to instruct foldl to do it for you – that is foldl'.
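As a minimal sketch of the difference (the names sumLazy and sumStrict are mine, not from the question):

import Data.List (foldl')

-- builds the whole thunk chain ((0 + 1) + 2) + ... before anything is evaluated
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- forces the accumulator to weak head normal form at every step
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

On a large input such as [1..10000000], sumLazy first allocates ten million thunks, while sumStrict runs in constant space.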
With foldr, things are different. What you get back from foldr f 0 [1, 2, 3] is f 1 (foldr f 0 [2, 3]) (where the expression in parentheses is a thunk). If you want to evaluate (parts of) the outer application of f, you can do that now, without a linear number of thunks being created first.
But in general, you are using foldr with lazy functions for f that can already do something (e.g. produce list constructors) before looking at the second argument.
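For instance, map can be written as a foldr, and because (:) is lazy in its tail it even works on infinite lists (a sketch; mapR is my name for it):

-- each cons cell is produced before the rest of the fold is demanded
mapR :: (a -> b) -> [a] -> [b]
mapR f = foldr (\x acc -> f x : acc) []

-- take 3 (mapR (*2) [1..])  ==>  [2,4,6]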
Using foldr with a strict f (e.g. (+)) has the unwanted effect of putting all applications on the stack until the end of the list is reached; clearly not what you want, and not a situation where any hypothetical foldr' could help.

Related

Understanding foldr and foldl functions

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f v [] = v
foldr f v (x:xs) = f x (foldr f v xs)
foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f v [] = v
foldl f v (x:xs) = foldl f (f v x) xs
I am trying to wrap my head around these two functions. I have two questions. One is regarding the function f. In general,
foldr f v xs
f has access to the first element of xs and the recursively processed tail. Here:
foldl f v xs
f has access to the last element of xs and the recursively processed tail.
Is this a useful (and correct) way to think about it?
My second question is about folding "right" or "left". In many places, they say that foldr "starts from the right". For example, if I expand the expression
foldr (+) 0 [1,2,3]
I get
(+) 1 (foldr (+) 0 [2,3])
So, I see it is "starting from the left" of the list. The first element and the recursively processed tail are the arguments to the function. Could someone shed some light on this issue?
EDIT: One of my question focuses is on the function f passed to fold; the linked answer doesn't address that point.
"Starting from the right" is good basic intuition, but it can also mislead, as you've already just discovered. The truth of the matter is that lists in Haskell are singly linked and we only have access to one side directly, so in some sense every list operation in Haskell "starts" from the left. But what it does from there is what's important. Let's finish expanding your foldr example.
foldr (+) 0 [1, 2, 3]
1 + foldr (+) 0 [2, 3]
1 + (2 + foldr (+) 0 [3])
1 + (2 + (3 + foldr (+) 0 []))
1 + (2 + (3 + 0))
Now the same for foldl.
foldl (+) 0 [1, 2, 3]
foldl (+) (0 + 1) [2, 3]
foldl (+) ((0 + 1) + 2) [3]
foldl (+) (((0 + 1) + 2) + 3) []
((0 + 1) + 2) + 3
In the foldr case, we make our recursive call directly, so we take the head and make it an argument to our accumulating function, and then we make the other argument our recursive call.
In the foldl case, we make our recursive call by changing the accumulator argument. Since we're changing the argument rather than the result, the order of evaluation gets flipped around.
The difference is in the way the parentheses "associate". In the foldr case, the parentheses associate to the right, while in the foldl case they associate to the left. Likewise, the "initial" value is on the right for foldr and on the left for foldl.
The general advice for the use of folds on lists in Haskell is this.
Use foldr if you want lazy evaluation that respects the list structure. foldr does its recursion inside the function call, so if the folding function happens to be guarded (i.e. by a data constructor), then our foldr call is guarded. For instance, we can use foldr to efficiently construct an infinite list out of another infinite list (see the sketch after this list).
Use foldl' (note the ' at the end), the strict left fold, for situations where you want the operation to be strict. foldl' forces each step of the fold to weak head normal form before continuing, preventing thunks from building up. So whereas foldl will build up the entire internal expression and then potentially evaluate it at the end, foldl' will do the work as we go, which saves a ton of memory on large lists.
Don't use foldl on lists. The laziness gained by foldl is almost never useful: the only way to get anything useful out of a left fold is to force the whole fold anyway, so building up the thunks internally gains nothing.
For other data structures which are not right-biased, the rules may be different. All of this is running on the assumption that your Foldable is a Haskell list.
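Here is a small sketch of the first piece of advice (the name doubles is mine): because each step's result is guarded by (:), foldr can lazily turn one infinite list into another.

-- interleave each element with its double; the data constructor guards the recursion
doubles :: [Integer] -> [Integer]
doubles = foldr (\x rest -> x : 2 * x : rest) []

-- take 6 (doubles [1..])  ==>  [1,2,2,4,3,6], without touching the rest of [1..]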

foldr1 and infinite lists in Haskell

Reading about folds in this wonderful book, I have a question regarding foldr1 and the head' implementation proposed there. The code in question is:
head' = foldr1 (\x _ -> x)
This code works on infinite lists, whereas foldl1 doesn't. A good visual explanation of why is this answer.
I do not quite understand though why does it work, considering that foldr1 is using the last element as accumulator. For example:
foldr1 (\x _ -> x) [1..]
This works, I think, because of lazy evaluation: even though foldr starts from the last element of the list (which is infinite), I'm assuming that because the function does not make use of any intermediate result, it just returns the first element.
So, is the compiler smart enough to know that, because inside the lambda function only x is being used, it can just return the first element of the list, even though it should start from the end?
On the contrary, doing
scanr1 (\x _ -> x) [1..]
will print all elements of the infinite list without ending, which I suppose is what foldr is doing; the compiler is just smart enough not to evaluate it and to return the head.
Thanks in advance.
Update
I found a really good answer that helped me understand how foldr works more deeply:
https://stackoverflow.com/a/63177677/1612432
foldr1 is using the last element as an initial accumulator value, but the combining function (\x _ -> x) is lazy in its second argument.
So provided the list is non-empty (let alone infinite), the "accumulator" value is never needed, thus never demanded.
foldr does not mean it should start from the right, just that the operations are grouped / associated / parenthesized on the right. If the combining function is strict in its 2nd argument that will entail indeed starting the calculations from the right, but if not -- then not.
So no, this is not about the compiler being smart; it is Haskell's lazy semantics that demand this. foldr is defined so that
foldr g z [x1,x2,...,xn] = g x1 (foldr g z [x2,...,xn])
and
foldr1 g xs = foldr g (last xs) (init xs)
and that's that.
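To make that concrete, here is a sketch of the reduction for the head' example from the question; the second argument of the combining function is never demanded:

head' = foldr1 (\x _ -> x)

-- head' [1..]
--   = foldr1 (\x _ -> x) (1 : [2..])
--   = (\x _ -> x) 1 (foldr1 (\x _ -> x) [2..])   -- second argument: an unevaluated thunk
--   = 1                                          -- the thunk is discarded, never forced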

How does GHC know how to cache one function but not the others?

I'm reading Learn You a Haskell (loving it so far) and it teaches how to implement elem in terms of foldl, using a lambda. The lambda solution seemed a bit ugly to me so I tried to think of alternative implementations (all using foldl):
import qualified Data.Set as Set
import qualified Data.List as List
-- LYAH implementation
elem1 :: (Eq a) => a -> [a] -> Bool
y `elem1` ys =
    foldl (\acc x -> if x == y then True else acc) False ys
-- When I thought about stripping duplicates from a list
-- the first thing that came to my mind was the mathematical set
elem2 :: (Eq a) => a -> [a] -> Bool
y `elem2` ys =
    head $ Set.toList $ Set.fromList $ filter (==True) $ map (==y) ys
-- Then I discovered `nub` which seems to be highly optimized:
elem3 :: (Eq a) => a -> [a] -> Bool
y `elem3` ys =
    head $ List.nub $ filter (==True) $ map (==y) ys
I loaded these functions in GHCi and did :set +s and then evaluated a small benchmark:
3 `elem1` [1..1000000] -- => (0.24 secs, 160,075,192 bytes)
3 `elem2` [1..1000000] -- => (0.51 secs, 168,078,424 bytes)
3 `elem3` [1..1000000] -- => (0.01 secs, 77,272 bytes)
I then tried to do the same on a (much) bigger list:
3 `elem3` [1..10000000000000000000000000000000000000000000000000000000000000000000000000]
elem1 and elem2 took a very long time, while elem3 was instantaneous (almost identical to the first benchmark).
I think this is because GHC knows that 3 is a member of [1..1000000], and the big number I used in the second benchmark is bigger than 1000000, hence 3 is also a member of [1..bigNumber] and GHC doesn't have to compute the expression at all.
But how is it able to automatically cache (or memoize, a term that Land of Lisp taught me) elem3 but not the two other ones?
Short answer: this has nothing to do with caching, but with the fact that in the first two implementations you force Haskell to iterate over all elements.
No: it is because foldl works left to right, but it keeps iterating until the list is exhausted.
Therefore you'd better use foldr: there, from the moment it finds a 3 in the list, it can cut off the search.
This is because foldr is defined as:
foldr f z [x1, x2, x3] = f x1 (f x2 (f x3 z))
whereas foldl is implemented as:
foldl f z [x1, x2, x3] = f (f (f z x1) x2) x3
Note that the outer f binds with x3, so even if, due to laziness, you do not evaluate the first operand, you still need to iterate to the end of the list first.
If we implement the foldl and foldr versions, we get:
y `elem1l` ys = foldl (\acc x -> if x == y then True else acc) False ys
y `elem1r` ys = foldr (\x acc -> if x == y then True else acc) False ys
We then get:
Prelude> 3 `elem1l` [1..1000000]
True
(0.25 secs, 112,067,000 bytes)
Prelude> 3 `elem1r` [1..1000000]
True
(0.03 secs, 68,128 bytes)
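The short-circuiting becomes visible if you write the equivalent fold with (||) and expand it for y = 3 (a sketch; elemShort is my name):

elemShort :: Eq a => a -> [a] -> Bool
elemShort y = foldr (\x acc -> x == y || acc) False

-- elemShort 3 [1..]  ==>  True, even on an infinite list:
-- (1 == 3) || ((2 == 3) || ((3 == 3) || <rest of the fold, never demanded>))
-- = True after three comparisons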
Stripping the duplicates from the list will not improve the efficiency. What improves the efficiency here is that you use map, which works left-to-right. Note furthermore that nub works lazily, so nub is here a no-op: since you are only interested in the head, Haskell does not need to perform membership checks on the already seen elements.
The performance is almost identical:
Prelude List> 3 `elem3` [1..1000000]
True
(0.03 secs, 68,296 bytes)
If you work with a Set, however, you do not enforce uniqueness lazily: Set.fromList first fetches all the elements of the list, so again you will iterate over all the elements and not cut off the search after the first hit.
Explanation
foldl combines the initial value with the first element of the list, then combines that result with the next element, and so on, nesting the applications:
foldl f z [x1, x2, ..., xn] == (...((z `f` x1) `f` x2) `f`...) `f` xn
So in order to produce the result, it has to traverse the whole list.
Conversely, in your function elem3, as everything is lazy, nothing gets computed at all until you call head.
But in order to compute that value, you only need the first value of the (filtered) list, so you only need to go as far as the 3 is encountered in your big list, which is very soon; the list is never fully traversed. If you asked for the 1000000th element, elem3 would probably perform as badly as the other ones.
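You can see this laziness directly in GHCi: the elem3 pipeline terminates even on an infinite list, because head only demands the first match (a sketch, reusing the question's qualified List import):

Prelude List> head $ List.nub $ filter (== True) $ map (== 3) [1..]
True

Asking for a value that never appears (e.g. map (== 0) [1..]) would diverge, since filter would never produce a first element.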
Laziness
Laziness ensures that your language is always composable: breaking a function into subfunctions does not change what is done.
What you are seeing (which, pushed too far, can lead to a space leak) is really about how control flow works in a lazy language. Both in strict and in lazy languages, your code decides what gets evaluated, but with a subtle difference:
In a strict language, the builder of the function will choose, as it forces evaluation of its arguments: whoever is called is in charge.
In a lazy language, the consumer of the function chooses: whoever calls is in charge. It may choose to evaluate only the first element (by calling head), or every element. All that provided its own caller chooses to evaluate its own computation as well: there is a whole chain of command deciding what to do.
In that reading, your foldl-based elem function uses that "inversion of control" in an essential way: elem gets asked to produce a value; foldl goes deep inside the list; if the element being examined is y, it returns the trivial computation True; if not, it forwards the request to the computation acc. In other words, what you read as the values acc, x or even True are really placeholders for computations, which you receive and yield back. And indeed, acc may be some unbelievably complex computation (or a divergent one, like undefined); as long as you transfer control to the computation True, your caller will never see the existence of acc.
foldr vs foldl vs foldl' vs speed
As suggested in another answer, foldr might best match your intent on how to traverse the list, and will shield you from space leaks (foldl' will prevent space leaks as well, if you really want to traverse the other way; plain foldl can lead to a buildup of complex computations, which can nevertheless be very useful for circular computations, for instance).
But the speed issue is really an algorithmic one. There might be a better data structure for set membership, if and only if you know beforehand that you have a certain pattern of usage.
For instance, it might be useful to pay some upfront cost to build a Set and then have fast membership queries, but that is only useful if you know that you will have such a pattern, where you have a few sets and lots of queries on those sets. Other data structures are optimal for other patterns, and it's interesting to note that, from an API/specification/interface point of view, they usually look the same to the consumer. That's a general phenomenon in any language, and why many people love abstract data types/modules in programming.
Using foldr and expecting it to be faster really encodes the assumption that, given your static knowledge of your future access pattern, the values you are likely to test membership of will sit at the beginning. Using foldl would be fine if you expect your values to be at the end of the list.
Note that using foldl, while you might traverse the entire list, you do not construct the values themselves until you need them, for instance to test for equality, as long as you have not found the searched element.
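As an illustration of the Set trade-off mentioned above (a sketch; the names table and query are mine): pay the build cost once, then query many times cheaply.

import qualified Data.Set as Set

-- pay O(n log n) once to build the set...
table :: Set.Set Int
table = Set.fromList [1 .. 1000000]

-- ...then every membership query is O(log n)
query :: Int -> Bool
query x = x `Set.member` table

This only wins when one set serves many queries; for a single query, the lazy foldr-based elem is the better fit.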

foldr - further explanation and example with a map function

I've looked at different folds and folding in general as well as a few others and they explain it fairly well.
I'm still having trouble seeing how it would work in this case.
length :: [t] -> Int
length list = foldr (+) 0 (map (\x -> 1) list)
Could someone go through that step by step and explain it to me?
And also, how would foldl work here?
(map (\x -> 1) list) takes the list and turns it into a list of 1 values:
(map (\x -> 1) ["a", "b", "c"]) == [1, 1, 1]
Now, if you substitute that in the original foldr, it looks like this:
foldr (+) 0 [1, 1, 1]
The starting point is 0 and the aggregation function is (+). As it steps through each element in the list, you are basically adding up all the 1 values, and that's how you end up returning the length.
foldr starts from the right and works back to the head of the list. foldl starts from the left and works through the list. Because the aggregation function is (+) :: Num a => a -> a -> a, the ordering of the left and right arguments in (+) is logically inconsequential (with the caveat that foldl has stack overflow problems with large lists because of lazy evaluation)
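Spelled out step by step on the [1, 1, 1] example (a sketch):

foldr (+) 0 [1, 1, 1]
1 + (1 + (1 + 0))   -- grouped to the right
3

foldl (+) 0 [1, 1, 1]
((0 + 1) + 1) + 1   -- grouped to the left
3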

Lazy Evaluation and Strict Evaluation Haskell

I understand what lazy evaluation is, how it works, and the advantages it has, but could you explain to me what strict evaluation really is in Haskell? I can't seem to find much info about it, since lazy evaluation is the better known.
What are the benefits of each over the other? When is strict evaluation actually used?
Strictness happens in a few ways in Haskell.
First, a definition: a function f is strict if and only if, when its argument a doesn't terminate, neither does f a. Nonstrict (sometimes called lazy) is just the opposite of this.
You can be strict in an argument by using pattern matching:
-- strict
foo True = 1
foo False = 1
-- vs
foo _ = 1
Since we don't need to evaluate the argument, we could pass something like foo (let x = x in x) and it'd still just return 1. With the first one however, the function needs to see what value the input is so it can run the appropriate branch, thus it is strict.
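A minimal GHCi sketch (I've renamed the two versions fooStrict and fooLazy to keep them apart):

fooStrict :: Bool -> Int
fooStrict True  = 1
fooStrict False = 1

fooLazy :: Bool -> Int
fooLazy _ = 1

*> fooLazy (let x = x in x)
1
*> fooStrict (let x = x in x)
-- hangs: the pattern match must force the argument first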
If we can't pattern match for whatever reason, then we can use a magic function called seq :: a -> b -> b. seq basically stipulates that whenever it is evaluated, it will evaluate a to what's called weak head normal form.
You may wonder why it's worth it. Let's consider a case study: foldl vs foldl'. foldl is lazy in its accumulator, so it's implemented something like
foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f accum [] = accum
foldl f accum (x:xs) = foldl f (f accum x) xs
Notice that since we're never strict in accum, we'll build up a huge chain of thunks: f (... (f (f accum x1) x2) ...) xn.
Not a happy prospect since this will lead to memory issues, indeed
*> foldl (+) 0 [1..500000000]
*** Exception: stack overflow
Now what'd be better is if we forced evaluation at each step, using seq
foldl' :: (a -> b -> a) -> a -> [b] -> a
foldl' f accum [] = accum
foldl' f accum (x:xs) = let accum' = f accum x
                        in accum' `seq` foldl' f accum' xs
Now we force the evaluation of accum at each step, making it much faster. This makes foldl' run in constant space instead of overflowing the stack like foldl.
Now, seq only evaluates its value to weak head normal form; sometimes we want things to be evaluated fully, to normal form. For that we can use a library/type class:
import Control.DeepSeq -- a library on hackage
deepseq :: NFData a => a -> b -> b
This forces a to be fully evaluated, so:
*> [1, 2, error "Explode"] `seq` 1
1
*> [1, 2, error "Explode"] `deepseq` 1
error: Explode
*> undefined `seq` 1
error: undefined
*> undefined `deepseq` 1
error: undefined
So deepseq fully evaluates its argument. This is very useful for parallel programming, for example, where you want to fully evaluate something on one core before it's sent back to the main thread; otherwise you'd just send back a thunk and all the actual computation would still be sequential.
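A common idiom for that (a sketch; evalFully is my name) pairs force from Control.DeepSeq with evaluate from Control.Exception, so the value is in normal form before it leaves the thread:

import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)

-- fully evaluate a value on the current thread before handing it off
evalFully :: NFData a => a -> IO a
evalFully = evaluate . force

Here evalFully [1, 2, error "Explode"] would throw Explode at this point, rather than later in whichever thread finally consumes the list.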
