Is using fold less efficient than standard recursion - haskell

I'm going through the Learn You a Haskell book right now and I'm curious about how this particular example works. The book first demonstrates an implementation of findKey using traditional recursion:
findKey :: (Eq k) => k -> [(k,v)] -> Maybe v
findKey key [] = Nothing
findKey key ((k,v):xs) = if key == k
                            then Just v
                            else findKey key xs
The book then follows up with a shorter implementation using foldr
findKey :: (Eq k) => k -> [(k,v)] -> Maybe v
findKey key = foldr (\(k,v) acc -> if key == k then Just v else acc) Nothing
With the standard recursion, the function should immediately return once it hits the first element with the provided key. If I understand the foldr implementation correctly, it will iterate over the entire list every time, even if it matched the first element it came across. That doesn't seem like a very efficient way to handle the problem.
Is there something I'm not getting about how the foldr implementation works? Or is there some kind of magic within Haskell that makes this implementation not quite as inefficient as I think it is?

foldr is written using standard recursion.
The recursive call to foldr is hidden inside of acc. If your code doesn't use acc, it will never be computed (because Haskell is lazy). So the foldr version is efficient and will also return early.
Here's an example demonstrating this:
Prelude> foldr (\x z -> "done") "acc" [0 ..]
"done"
This expression returns "done" immediately, even though the input list is infinitely long.
If foldr is defined as:
foldr f z (x : xs) = f x (foldr f z xs)
foldr _ z [] = z
then evaluation goes via
f x (foldr f z xs)
where
f = \x z -> "done"
x = 0
z = "acc"
xs = ... -- unevaluated, but is [1 ..]
which is
(\x z -> "done") 0 (foldr (\x z -> "done") "acc" [1 ..])
which turns into "done" because the first function doesn't use z, so the recursive call is never needed.
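The same reduction can be checked against the question's own findKey. What follows is a minimal sketch, assuming a hand-rolled myFoldr (a hypothetical name that mirrors the two-equation definition above) in place of the Prelude's foldr, and an infinite association list made up for illustration:

```haskell
-- myFoldr is a hypothetical stand-in for the Prelude's foldr,
-- written with plain recursion as in the definition above.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []       = z
myFoldr f z (x : xs) = f x (myFoldr f z xs)

findKey :: Eq k => k -> [(k, v)] -> Maybe v
findKey key = myFoldr (\(k, v) acc -> if key == k then Just v else acc) Nothing

-- The association list is infinite; the lookup still terminates
-- because acc is never demanded once the key matches.
example :: Maybe Char
example = findKey (3 :: Int) (zip [1 ..] (cycle "abc"))
```

Evaluating example yields Just 'c'. Searching for a key that never occurs in an infinite list would, of course, not terminate.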

If I understand the foldr implementation correctly, it will iterate over the entire list every time, even if it matched the first element it came across.
This is wrong. foldr will evaluate the list only as much as needed.
E.g.
foldr (&&) True [True, False, error "unreached code here"]
returns False since the error is never evaluated, precisely as in
(True && (False && (error "unreached code here" && True)))
Indeed, since the end of the list is never reached, we can also write
foldr (&&) (error "end") [True, False, error "unreached code here"]
and still obtain False.
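The same short-circuiting carries over to predicates. As a rough sketch (anyVia and allVia are hypothetical names; the Prelude already provides any and all), both can be written with foldr and stop as soon as the answer is known:

```haskell
-- Hypothetical re-implementations of any/all, for illustration only.
anyVia :: (a -> Bool) -> [a] -> Bool
anyVia p = foldr (\x acc -> p x || acc) False

allVia :: (a -> Bool) -> [a] -> Bool
allVia p = foldr (\x acc -> p x && acc) True

-- Terminates on an infinite list because (||) ignores acc once p x is True.
foundBig :: Bool
foundBig = anyVia (> 100) [1 :: Integer ..]

-- The undefined tail is never reached: 5 is odd, so (&&) ignores acc.
allEven :: Bool
allEven = allVia even (2 : 4 : 5 : undefined :: [Integer])
```

Here foundBig is True and allEven is False. If no element satisfies the predicate, anyVia on an infinite list does not terminate.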

Here is code which demonstrates that foldr does indeed "short-circuit" the evaluation of findKey:
import Debug.Trace
findKey :: (Eq k) => k -> [(k,v)] -> Maybe v
findKey key = foldr (\(k,v) acc -> if key == k then Just v else acc) Nothing
tr x = trace msg x
  where msg = "=== at: " ++ show x
thelist = [ tr (1,'a'), tr (2,'b'), tr (3, 'c'), tr (4, 'd') ]
An example of running findKey in ghci:
*Main> findKey 2 thelist
=== at: (1,'a')
=== at: (2,'b')
Just 'b'
*Main>

Think of foldr using the following definition (using standard recursion):
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f e [] = e
foldr f e (x:xs) = f x (foldr f e xs)
The third line shows that the second implementation for findKey will return upon finding the first match.
As a sidenote: assume you had the following definition (which does not have identical functionality) for findKey (as an exercise you might want to rewrite the definition using foldr):
findKey :: (Eq k) => k -> [(k,v)] -> [v]
findKey key [] = []
findKey key ((kHead, vHead):rest) = if (key == kHead) then vHead:(findKey key rest) else findKey key rest
Now you might think that this would iterate through the whole input list. Depending on how you invoke this function, it could be the case that it iterates through the whole list, but at the same time this can give you the first match efficiently too. Due to Haskell's lazy evaluation the following code:
head (findKey key li)
will give you the first match (assuming that there's one) with the same efficiency as your first example.
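One possible answer to the exercise, offered as a sketch rather than the only way to do it. The names findKeys and firstMatch are hypothetical, chosen to avoid clashing with the findKey definitions above:

```haskell
-- All values stored under the key, produced lazily from left to right.
findKeys :: Eq k => k -> [(k, v)] -> [v]
findKeys key = foldr (\(k, v) rest -> if key == k then v : rest else rest) []

-- Only the first match; the remaining matches are never computed.
firstMatch :: Eq k => k -> [(k, v)] -> Maybe v
firstMatch key kvs = case findKeys key kvs of
  []      -> Nothing
  (v : _) -> Just v
```

Because the combining function emits v before looking at rest, firstMatch works even when the association list is infinite, as long as the key does occur.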

foldr f z [a,b,c,...,n] ==
a `f` (b `f` (c `f` (... (n `f` z) ...))) ==
f a (foldr f z [b,c,...,n]) ==
f a acc where acc = foldr f z [b,c,...,n]
So if your f returns before forcing acc, acc remains not forced, i.e. no part of the list argument beyond its head element a is accessed, like e.g. when you have
f a acc = ...
If, on the other hand, your f does force its second argument, e.g. if it's defined as
f a (x:xs) = ...
then the acc is forced before f starts its work, and the list will be accessed in whole before the processing begins -- in whole, because acc = f b acc2 and that invocation of f must force its second argument, acc2, so its value, acc, can be forced (pattern-matched with (x:xs), that is); and so forth.
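The difference can be observed with a list whose tail is undefined. This is a sketch under assumptions: headLazy, headStrict, partial and strictVersionFails are hypothetical names, and the exception check relies on try and evaluate from Control.Exception:

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Lazy in acc: returns after looking at the head element only.
headLazy :: [a] -> [a]
headLazy = foldr (\a _acc -> [a]) []

-- Strict in acc: the case forces acc to a constructor before answering,
-- even though both branches return the same thing.
headStrict :: [a] -> [a]
headStrict = foldr f []
  where
    f a acc = case acc of
      []      -> [a]
      (_ : _) -> [a]

-- A list with two known cells followed by an undefined tail.
partial :: [Int]
partial = 1 : 2 : undefined

-- True when forcing headStrict on the partial list raises an exception.
strictVersionFails :: IO Bool
strictVersionFails = do
  r <- try (evaluate (headStrict partial)) :: IO (Either SomeException [Int])
  pure (either (const True) (const False) r)
```

headLazy partial is [1], since the tail is never touched. headStrict partial has to walk to the end of the list before the outermost f can answer, so it hits the undefined tail.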

Related

Why is it sometimes possible to fold an infinite list from the right?

I have been going through the excellent CIS 194 course when I got stuck on Part 5 of Homework 6. It revolves around implementing the ruler function without any divisibility testing.
I found that it is possible to build the ruler function by continuously interspersing an accumulator with values from an infinite list.
nats = [0,1,2,3,..]
[3]
[2,3,2]
[1,2,1,3,1,2,1]
[0,1,0,2,0,1,0,3,0,1,0,2,0]
Then I tried implementing this algorithm for Stream datatype which is a list without nil
data Stream a = Cons a (Stream a)
streamToList :: Stream a -> [a]
streamToList (Cons x xs) = x : streamToList xs
instance Show a => Show (Stream a) where
  show = show . take 20 . streamToList
streamFromSeed :: (a -> a) -> a -> Stream a
streamFromSeed f x = Cons x (streamFromSeed f (f x))
nats :: Stream Integer
nats = streamFromSeed succ 0
interleave x (Cons y ys) = Cons x (Cons y (interleave x ys))
foldStream f (Cons x xs) = f x (foldStream f xs)
ruler = foldStream interleave nats
As expected, I got a stack overflow error since I was trying to fold from the right. However, I was surprised to see the same algorithm work for normal infinite lists.
import Data.List
interleave x list = [x] ++ (intersperse x list) ++ [x]
ruler = take 20 (foldr interleave [] [0..])
What am I missing? Why does one implementation work while the other doesn't?
Your interleave is insufficiently lazy. The magic thing that right folds must do to work on infinite structures is to not inspect the result of the folded value too closely before they do the first bit of computation. So:
interleave x stream = Cons x $ case stream of
  Cons y ys -> Cons y (interleave x ys)
This produces Cons x _ before inspecting stream; in contrast, your version requires stream to be evaluated a bit before it can pass to the right hand side of the equation, which essentially forces the entire fold to happen before any constructor gets produced.
You can also see this in your list version of interleave:
interleave x list = [x] ++ intersperse x list ++ [x]
The first element of the returned list (x) is known before intersperse starts pattern matching on list.
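For completeness, here is a self-contained sketch of the Stream version with a sufficiently lazy interleave. It uses an irrefutable pattern (~) as an alternative spelling of the case-based fix above; the type signatures are added for illustration and are not part of the question:

```haskell
data Stream a = Cons a (Stream a)

streamToList :: Stream a -> [a]
streamToList (Cons x xs) = x : streamToList xs

streamFromSeed :: (a -> a) -> a -> Stream a
streamFromSeed f x = Cons x (streamFromSeed f (f x))

nats :: Stream Integer
nats = streamFromSeed succ 0

-- The ~ makes the pattern irrefutable: the second argument is only
-- inspected when y or ys is actually demanded, so Cons x is produced first.
interleave :: a -> Stream a -> Stream a
interleave x ~(Cons y ys) = Cons x (Cons y (interleave x ys))

foldStream :: (a -> b -> b) -> Stream a -> b
foldStream f (Cons x xs) = f x (foldStream f xs)

ruler :: Stream Integer
ruler = foldStream interleave nats
```

With this definition, take 16 (streamToList ruler) gives [0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4] instead of overflowing the stack.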
We can inspect the source code of foldr. A less noisy version looks like:
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
Haskell does not evaluate eagerly. This means that, unless the result of (foldr f z xs) is needed, the accumulator is not evaluated. So if f does not need its second parameter, for example because the first item x has a certain value, the rest of the fold is never computed.
For example if we implement takeWhileNeq:
takeWhileNeq a = foldr f []
  where f x xs = if x == a then [] else (x:xs)
If we run this on a list, say takeWhileNeq 2 [1,4,2,5], nothing is evaluated until the result is demanded. If we print the result, evaluation proceeds as:
f 1 (foldr f [] [4,2,5])
and f checks whether 1 == 2; since that is not the case, it returns (x:xs), so:
-> 1 : foldr f [] [4,2,5]
Next it evaluates 4 == 2, and because this is False, the expression becomes:
-> 1 : (4 : foldr f [] [2,5])
Now we evaluate 2 == 2, and since this is True, the function returns the empty list and ignores the accumulator, so it never looks at foldr f [] [5]:
-> 1 : (4 : [])
For an infinite list, the fold likewise ends the result with the empty list at the first match and never folds the rest of the list.
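A compilable version of the walkthrough, as a small sketch; the type signature is an addition for illustration:

```haskell
-- Keep elements up to, but not including, the first one equal to a.
takeWhileNeq :: Eq a => a -> [a] -> [a]
takeWhileNeq a = foldr f []
  where
    -- xs is the (lazy) result of folding the rest of the list;
    -- it is dropped, unevaluated, when x == a.
    f x xs = if x == a then [] else x : xs
```

takeWhileNeq 2 [1,4,2,5] evaluates to [1,4], and takeWhileNeq 100 [1 ..] terminates with the 99 numbers below 100 even though the input is infinite.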

Is there any terminating fold in Haskell?

I need some kind of fold which can terminate if I already have the data I want.
For example I need to find first 3 numbers which are greater than 5. I decided to use Either for termination and my code looks like this:
terminatingFold :: ([b] -> a -> Either [b] [b]) -> [a] -> [b]
terminatingFold f l = reverse $ either id id $ fold [] l
  where fold acc [] = Right acc
        fold acc (x:xs) = f acc x >>= flip fold xs
first3NumsGreater5 acc x =
  if length acc >= 3
    then Left acc
    else Right (if x > 5 then (x : acc) else acc)
Are there some more clever/generic approaches?
The result of your function is a list, and it would be desirable if it were produced lazily, that is, extracting one item from the result should only require evaluating the input list up until the item is found there.
Unfolds are under-appreciated for these kinds of tasks. Instead of focusing on "consuming" the input list, let's think of it as a seed from which (paired with some internal accumulator) we can produce the result, element by element.
Let's define a Seed type that contains a generic accumulator paired with the as-yet unconsumed parts of the input:
{-# LANGUAGE NamedFieldPuns #-}
import Data.List (unfoldr)
data Seed acc input = Seed {acc :: acc, pending :: [input]}
Now let's reformulate first3NumsGreater5 as a function that either produces the next output element from the Seed, or signals that there aren't any more elements:
type Counter = Int
first3NumsGreater5 :: Seed Counter Int -> Maybe (Int, Seed Counter Int)
first3NumsGreater5 (Seed {acc, pending})
  | acc >= 3 =
      Nothing
  | otherwise =
      case dropWhile (<= 5) pending of
        [] -> Nothing
        x : xs -> Just (x, Seed {acc = succ acc, pending = xs})
Now our main function can be written in terms of unfoldr:
unfoldFromList ::
  (Seed acc input -> Maybe (output, Seed acc input)) ->
  acc ->
  [input] ->
  [output]
unfoldFromList next acc pending = unfoldr next (Seed {acc, pending})
Putting it to work:
main :: IO ()
main = print $ unfoldFromList first3NumsGreater5 0 [0, 6, 2, 7, 9, 10, 11]
-- [6,7,9]
Normally an early termination-capable fold is foldr with the combining function which is non-strict in its second argument. But, its information flow is right-to-left (if any), while you want it left-to-right.
A possible solution is to make foldr function as a left fold, which can then be made to stop early:
foldlWhile :: Foldable t
           => (a -> Bool) -> (r -> a -> r) -> r
           -> t a -> r
foldlWhile t f a xs = foldr cons (\acc -> acc) xs a
  where
    cons x r acc | t x = r (f acc x)
                 | otherwise = acc
You will need to tweak this for t to test the acc instead of x, to fit your purposes.
This function is foldlWhile from https://wiki.haskell.org/Foldl_as_foldr_alternative, re-written a little. foldl'Breaking from there might fit the bill a bit better.
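To see the left-to-right accumulation and the early exit together, here is a small usage sketch. The foldlWhile definition is repeated from above (with id for the identity continuation); sumBelowTen and firstWordReversed are hypothetical examples:

```haskell
foldlWhile :: Foldable t => (a -> Bool) -> (r -> a -> r) -> r -> t a -> r
foldlWhile t f a xs = foldr cons id xs a
  where
    -- r is the continuation for the rest of the list; acc flows left to right.
    cons x r acc
      | t x       = r (f acc x)
      | otherwise = acc

-- Sums 1..9 and stops at 10, even though the input is infinite.
sumBelowTen :: Integer
sumBelowTen = foldlWhile (< 10) (+) 0 [1 ..]

-- Accumulating with flip (:) shows the left-fold order: the prefix comes out reversed.
firstWordReversed :: String
firstWordReversed = foldlWhile (/= ' ') (flip (:)) [] "hello world"
```

sumBelowTen is 45 and firstWordReversed is "olleh". Note that the accumulator is built up lazily here; a strict variant would force f acc x before continuing.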
foldr with the lazy reducer function can express corecursion perfectly fine just like unfoldr does.
And your code is already lazy: terminatingFold (\acc x -> Left acc) [1..] => []. That's why I'm not sure if this answer is "more clever", as you've requested.
edit: following a comment by @danidiaz, to make it properly lazy you'd have to code it as e.g.
first3above5 :: (Foldable t, Ord a, Num a)
             => t a -> [a]
first3above5 xs = foldr cons (const []) xs 0
  where
    cons x r i | x > 5 = if i==2 then [x]
                                 else x : r (i+1)
               | otherwise = r i
This can be generalized further by abstracting the test and the count.
Of course it's just reimplementing take 3 . filter (> 5), but shows how to do it in general with foldr.
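The generalization mentioned above might look like the following sketch. firstNWhere is a hypothetical name, and the guard for non-positive counts is an added assumption about the desired behaviour:

```haskell
-- First n elements satisfying p, produced lazily; equivalent in spirit
-- to take n . filter p, but written as a single foldr.
firstNWhere :: Foldable t => Int -> (a -> Bool) -> t a -> [a]
firstNWhere n p xs
  | n <= 0    = []
  | otherwise = foldr step (const []) xs n
  where
    -- r continues the fold over the rest of the input;
    -- remaining counts how many matches are still wanted.
    step x r remaining
      | p x       = if remaining == 1 then [x] else x : r (remaining - 1)
      | otherwise = r remaining
```

firstNWhere 3 (> 5) [0, 6, 2, 7, 9, 10, 11] gives [6,7,9], and the same call on [1 ..] gives [6,7,8] without traversing the rest of the infinite list.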

Haskell map until first condition met

I want to map a conditional function only on the first item that passes.
map (>5) [1,2,3,4,5,6,7,8,9]
would result in
[False,False,False,False,False,True,True,True,True]
I'm looking for something that would result in
[False,False,False,False,False,True,False,False,False]
So only the first occurrence of being greater than 5 results in True.
I tried scanl, various folds and tried to roll my own mapUntil kind of thing.
Seems like a simple problem but I'm drawing a blank.
break specifically separates the list in 2 parts where the first part is all False, the opposite of span.
break (>5) [1,2,3,8,2,5,1,7,9]
>>> ([1,2,3],[8,2,5,1,7,9])
Then it's just what chi did:
oneTrue f lst = map (const False) a ++ rest b
  where (a,b) = break f lst
        rest [] = []
        rest (x:xs) = True : map (const False) xs
A basic solution:
mapUntil p = onlyOne . map p
  where
    onlyOne [] = []
    onlyOne (x:xs)
      | x = True : map (const False) xs
      | otherwise = False : onlyOne xs
With library helpers:
mapUntil p = snd . mapAccumL (\x y -> (x||y, not x && y)) False . map p
Above x is a boolean standing for "have seen a true before?", as a kind-of state. y is the list element. x||y is the new state, while not x && y is the new list element.
Alternatively (using Control.Arrow.second):
mapUntil p = uncurry (++) . second go . break id . map p
  where
    go [] = []
    go (x:xs) = x : map (const False) xs
I would use mapAccumL like this:
λ> Data.List.mapAccumL (\b n -> if b then (b, (not b)) else (n > 5, n > 5)) False [1,2,3,4,5,6,7,8,9]
(True,[False,False,False,False,False,True,False,False,False])
Here we carry b as the state of our interim calculations and at every step decide according to its previous state. Obviously you need the snd part of the final result.
Edit: After reading the new comment from @Gord under his question, I decided to extend my answer to cover his actual problem.
Rephrasing the case event of branch that starts with pointerPress (x,y) into...
To start with, you never use x or y from the pattern match (x,y) so lets call it c. Then...
PointerPress c -> State circleCoords circleColors circleDraggeds c
  where
    bools = fmap checkMouseOverlaps $ (,) <$> circleCoords <*> [c]
    circleDraggeds = snd $ mapAccumL (\a b -> if a then (a, not a)
                                              else (b,b)) False bools
What's happening:
(,) <$> circleCoords <*> [c]
circleCoords is a list of coordinates like [c0,c1,c2] and we fmap (the infix version (<$>) here) (,) function to it and it becomes an applicative of coordinates like [(c0,),(c1,),(c2,)]. Then we apply it to [c] aka [(x,y)] to turn it into [(c0,c),(c1,c),(c2,c)].
fmap checkMouseOverlaps $ toAbove
yields
[checkMouseOverlaps (c0,c), checkMouseOverlaps (c1,c), checkMouseOverlaps (c2,c)]
which is bools :: [Bool].
The rest follows the logic explained at the top of my answer.
circleDraggeds = snd $ mapAccumL (\a b -> if a then (a, not a)
                                          else (b,b)) False bools
This can be solved directly with recursion. Similar to chi's solution but without function composition:
mapUntil :: (a -> Bool) -> [a] -> [Bool]
mapUntil _ [] = []
mapUntil f (x:xs) =
  let b = f x                          -- calculate f x
  in if b                              -- if true
       then b : map (const False) xs   -- prepend to the solution and map False to the rest of the list (b is True)
       else b : mapUntil f xs          -- keep applying mapUntil (b is False)
>>> mapUntil (>5) [1,2,3,4,5,6,7,8,9]
[False,False,False,False,False,True,False,False,False]
Map the condition over the list, then zip the result with the False prefix of the result concatenated with a True followed by an infinite list of Falses:
{-# LANGUAGE BlockArguments, ApplicativeDo, ViewPatterns #-}
import Control.Applicative (ZipList(..))
f :: (a -> Bool) -> [a] -> [Bool]
f cond (map cond -> bs) = getZipList do
  r <- ZipList $ takeWhile not bs ++ [True] ++ repeat False
  _ <- ZipList $ bs
  pure r
or, equivalently:
f' :: (a -> Bool) -> [a] -> [Bool]
f' cond (map cond -> bs) = zipWith const (takeWhile not bs ++ [True] ++ repeat False) bs

Why foldr works on infinity list?

This function may work on infinite association lists, and it is easy to see why:
findKey :: (Eq k) => k -> [(k,v)] -> Maybe v
findKey key [] = Nothing
findKey key ((k,v):xs) = if key == k
                            then Just v
                            else findKey key xs
When it finds the key, it returns Just v, stopping the recursion.
Now look at this other implementation:
findKey' :: (Eq k) => k -> [(k,v)] -> Maybe v
findKey' key = foldr (\(k,v) acc -> if key == k then Just v else acc) Nothing
How does the compiler/interpreter know that when the key matches k, it can return it?
*Main> findKey' 1 $ zip [1..] [1..]
returns Just 1
When it finds that key == k, it returns Just v. Why does the recursion stop there, allowing us to do such things with infinite association lists?
Because the function passed to foldr does not always evaluate the acc parameter, i.e. it is lazy in that parameter.
For example,
(\(k,v) acc -> if 1 == k then Just v else acc) (1,"one") (error "here be dragons!")
will return Just "one" without even attempting to evaluate the error expression.
Moreover, foldr by definition satisfies:
foldr f a (x:xs) = f x (foldr f a xs)
If x:xs is infinite, but f does not use its second argument, then foldr can return immediately.
In your example, f evaluates its second argument if and only if the first argument is not the wanted association. This means that the association list will be evaluated only far enough to find the key's association.
If you like to experiment, try this instead:
foldr (\(k,v) acc -> case acc of
         Nothing -> if key == k then Just v else acc
         Just y  -> if key == k then Just v else acc) Nothing
The case looks redundant, since the function returns the same thing in both branches. However, this demands the evaluation of acc breaking the code on infinite lists.
Another thing you might want to try
foldr (:) [] [0..]
This basically rebuilds the infinite list as it is.
foldr (\x xs -> x*10 : xs) [] [0..]
This multiplies everything by 10, and is equivalent to map (*10) [0..].
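Both experiments can be bounded with take so that they terminate. A minimal sketch, where mapVia is a hypothetical name for map written with foldr:

```haskell
-- map expressed with foldr: the head g x is available before rest is demanded.
mapVia :: (a -> b) -> [a] -> [b]
mapVia g = foldr (\x rest -> g x : rest) []

-- Only the first five cells of the infinite result are ever built.
firstTens :: [Integer]
firstTens = take 5 (mapVia (* 10) [0 ..])

-- foldr (:) [] rebuilds the list lazily, cell by cell.
rebuiltPrefix :: [Integer]
rebuiltPrefix = take 3 (foldr (:) [] [0 ..])
```

firstTens is [0,10,20,30,40] and rebuiltPrefix is [0,1,2].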
The non-empty case of foldr can be defined as foldr f init (x:xs) = f x (foldr f init xs). In your case f is (\(k,v) acc -> if key == k then Just v else acc), so (k,v) stands for the current element in the list and acc stands for (foldr f init xs). That is, acc stands for the recursive call. In the then-case, you do not use acc, therefore the recursive call does not happen since Haskell is lazy meaning arguments aren't evaluated until (and unless) used.

Find the K'th element of a list using foldr

I am trying to implement my own safe lookup of a list element by index.
I think my function has to have this signature:
safe_search :: [a] -> Int -> Maybe a
safe_search xs n = foldr iteration init_val xs n
iteration = undefined
init_val = undefined
I have a problem with the implementation of iteration. I think it has to look like this:
safe_search :: [a] -> Int -> Maybe a
safe_search xs n = foldr iteration init_val xs n
  where
    iteration :: a -> (Int -> [a]) -> Int -> a
    iteration x g 0 = []
    iteration x g n = x (n - 1)
    init_val :: Int -> a
    init_val = const 0
But it has too many errors. My intuition about Haskell is wrong.
you have
safe_search :: [a] -> Int -> Maybe a
safe_search xs n = foldr iteration init_val xs n
if null xs holds, foldr iteration init_val [] => init_val, so
init_val n
must make sense. Nothing to return, so
= Nothing
is all we can do here, to fit the return type.
So init_val is a function, :: Int -> Maybe a. By the definition of foldr, this is also what the "recursive" argument to the combining function is, "coming from the right":
iteration x r
but then this call must also return just such a function itself (again, by the definition of foldr, foldr f z [a,b,c,...,n] == f a (f b (f c (...(f n z)...))), f :: a -> b -> b i.e. it must return a value of the same type as it gets in its 2nd argument ), so
n | n==0 = Just x
That was easy, 0-th element is the one at hand, x; what if n > 0?
| n>0 = ... (n-1)
Right? Just one more step left for you to do on your own... :) It's not x (the list's element) that goes on the dots there; it must be a function. We've already received such a function, as an argument...
To see what's going on here, it might help to check the case when the input is a one-element list, first,
safe_search [x] n = foldr iteration init_val [x] n
= iteration x init_val n
and with two elements,
[x1, x2] n = iteration x1 (iteration x2 init_val) n
-- iteration x r n
Hope it is clear now.
edit: So, this resembles the usual foldr-based implementation of zip fused with the descending enumeration from n down, indeed encoding the higher-level definition of
foo xs n = ($ zip xs [n,n-1..]) $
dropWhile ((>0) . snd) >>>
map fst >>>
take 1 >>> listToMaybe
= drop n >>> take 1 >>> listToMaybe $ xs
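For readers who want to check their own attempt, here is one way the hints above could be completed. It is a sketch, not necessarily what the answerer had in mind: safeIndex is a hypothetical name, and returning Nothing for negative indices is an added assumption:

```haskell
safeIndex :: [a] -> Int -> Maybe a
safeIndex xs n = foldr iteration initVal xs n
  where
    -- Reached the end of the list: nothing to return, whatever the index.
    initVal :: Int -> Maybe b
    initVal _ = Nothing

    -- r is the function for the rest of the list, handed in by foldr.
    iteration :: b -> (Int -> Maybe b) -> Int -> Maybe b
    iteration x r k
      | k == 0    = Just x
      | k > 0     = r (k - 1)
      | otherwise = Nothing
```

Since r is only called when k > 0, the fold stops at the requested position, so safeIndex [0 ..] 5 returns Just 5 on an infinite list.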
Think about a few things.
What type should init_val have?
What do you need to do with g? g is the trickiest part of this code. If you've ever learned about continuation-passing style, you should probably think of both init_val and g as continuations.
What does x represent? What will you need to do with it?
I wrote up an explanation some time ago about how the definition of foldl in terms of foldr works. You may find it helpful.
I suggest to use standard foldr pattern, because it is easier to read and understand the code, when you use standard functions:
foldr has the type foldr :: (a -> b -> b) -> b -> [a] -> b,
where the second argument b is the initial value of the accumulator acc, which the fold combines with the elements of your list [a].
You need to stop adding elements of your list [a] to acc once you have added the desired element. Then you take the head of the resulting accumulator (itself a list in this solution) and thus get the desired element of the list [a].
To get n'th element of the list xs, you need to add length xs - n elements of xs to the accumulator acc, counting from the end of the list.
But where to use an iterator if we want to use the standard foldr function to improve the readability of our code? We can use it in our accumulator, representing it as a tuple (acc, iterator). We subtract 1 from the iterator each turn we add element from our initial list xs to the acc and stop to add elements of xs to the acc when our iterator is equal 0.
Then we apply head . fst to the result of our foldr function to get the desired element of the initial list xs and wrap it with Just constructor.
Of course, if length - 1 of our initial list xs is less than the index of desired element n, the result of the whole function safeSearch will be Nothing.
Here is the code of the function safeSearch:
safeSearch :: Int -> [a] -> Maybe a
safeSearch n xs
  | (length xs - 1) < n = Nothing
  | otherwise = return $ findElem n' xs
  where findElem num =
          head .
          fst .
          foldr (\x (acc,iterator) ->
                   if iterator /= 0
                     then (x : acc,iterator - 1)
                     else (acc,iterator))
                ([],num)
        n' = length xs - n
