I just started Haskell 2 days ago so I'm not yet sure about how to optimise my code.
As an exercise, I have rewritten foldl and foldr (I will give foldl here, but foldr is the same, replacing last with head and init with tail).
The code is:
module Main where
myFoldl :: ( a -> ( b -> a ) ) -> a -> ( [b] -> a )
myFoldl func = ( \x -> (\theList
    -> if (length theList == 0) then
        x
    else
        myFoldl func (func x (last theList) ) (init theList)
    ) )
My only concern is that I suspect Haskell can't apply tail call optimisation here because the recursive call is not made at the end of the function.
How can I make this tail call optimised? Is Haskell's built-in implementation of foldl implemented differently to mine?
Your use of parentheses in your code sample and your emphasis on tail recursion suggest you're coming to Haskell from Lisp or Scheme. If you're coming to Haskell from an eager language like Scheme, be warned: tail calls are not nearly as predictive of performance in Haskell as they are in an eager language. You can have tail-recursive functions that execute in linear space because of laziness, and you can have non-tail-recursive functions that execute in constant space because of laziness. (Confused already?)
The first flaw in your definition is the use of length theList == 0. This forces evaluation of the whole spine of the list and takes O(n) time on every call; last and init are likewise O(n) each, so the whole fold ends up quadratic. It's better to use pattern matching, like in this naïve foldl definition in Haskell:
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
This, however, performs infamously badly in Haskell, because we don't actually compute the f z x part until the caller of foldl demands the result; so this foldl accumulates unevaluated thunks in memory for each list element, and gains no benefit from being tail recursive. In fact, despite being tail-recursive, this naïve foldl over a long list can lead to a stack overflow! (The Data.List module has a foldl' function that doesn't have this problem.)
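To make the contrast concrete, here is a minimal sketch of a strict left fold. The name myFoldl' is hypothetical and the definition is only an illustration of the idea (force the accumulator before recursing); the actual library definition of Data.List.foldl' may differ.

```haskell
import Data.List (foldl')

-- Hypothetical strict left fold: the new accumulator is forced to weak
-- head normal form with seq before the recursive call, so no chain of
-- unevaluated (f z x) thunks builds up.
myFoldl' :: (b -> a -> b) -> b -> [a] -> b
myFoldl' _ z []     = z
myFoldl' f z (x:xs) = let z' = f z x
                      in z' `seq` myFoldl' f z' xs

-- The library function is used exactly like foldl.
sumStrict :: [Integer] -> Integer
sumStrict = foldl' (+) 0
```

Note that seq only evaluates to weak head normal form, so an accumulator such as a tuple can still hide thunks inside its components.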
As a converse to this, many Haskell non-tail recursive functions perform very well. For example, take this definition of find, based on the accompanying non-tail recursive definition of foldr:
find :: (a -> Bool) -> [a] -> Maybe a
find pred xs = foldr find' Nothing xs
  where find' elem rest = if pred elem then Just elem else rest
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z [] = z
foldr f z (x:xs) = f x (subfold xs)
  where subfold = foldr f z
This actually executes in linear time and constant space, thanks to laziness. Why?
Once you find an element that satisfies the predicate, there is no need to traverse the rest of the list to compute the value of rest.
Once you look at an element and decide that it doesn't match, there's no need to keep any data about that element.
The lesson I'd impart right now is: don't bring in your performance assumptions from eager languages into Haskell. You're just two days in; concentrate first on understanding the syntax and semantics of the language, and don't contort yourself into writing optimized versions of functions just yet. You're going to get hit with the foldl-style stack overflow from time to time at first, but you'll master it in time.
EDIT [9/5/2012]: Simpler demonstration that lazy find runs in constant space despite not being tail recursive. First, simplified definitions:
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
find :: (a -> Bool) -> [a] -> Maybe a
find p xs = let step x rest = if p x then Just x else rest
            in foldr step Nothing xs
Now, using equational reasoning (i.e., substituting equals with equals, based on the definition above), and evaluating in a lazy order (outermost first), let's calculate find (==400) [1..]:
find (==400) [1..]
-- Definition of `find`:
=> let step x rest = if x == 400 then Just x else rest
   in foldr step Nothing [1..]
-- `[x, y, ...]` is the same as `x:[y, ...]`:
=> let step x rest = if x == 400 then Just x else rest
   in foldr step Nothing (1:[2..])
-- Using the second equation in the definition of `foldr`:
=> let step x rest = if x == 400 then Just x else rest
   in step 1 (foldr step Nothing [2..])
-- Applying `step`:
=> let step x rest = if x == 400 then Just x else rest
   in if 1 == 400 then Just 1 else foldr step Nothing [2..]
-- `1 == 400` is `False`
=> let step x rest = if x == 400 then Just x else rest
   in if False then Just 1 else foldr step Nothing [2..]
-- `if False then a else b` is the same as `b`
=> let step x rest = if x == 400 then Just x else rest
   in foldr step Nothing [2..]
-- Repeat the same reasoning steps as above
=> let step x rest = if x == 400 then Just x else rest
   in foldr step Nothing (2:[3..])
=> let step x rest = if x == 400 then Just x else rest
   in step 2 (foldr step Nothing [3..])
=> let step x rest = if x == 400 then Just x else rest
   in if 2 == 400 then Just 2 else foldr step Nothing [3..]
=> let step x rest = if x == 400 then Just x else rest
   in if False then Just 2 else foldr step Nothing [3..]
=> let step x rest = if x == 400 then Just x else rest
   in foldr step Nothing [3..]
.
.
.
=> let step x rest = if x == 400 then Just x else rest
   in foldr step Nothing [400..]
=> let step x rest = if x == 400 then Just x else rest
   in foldr step Nothing (400:[401..])
=> let step x rest = if x == 400 then Just x else rest
   in step 400 (foldr step Nothing [401..])
=> let step x rest = if x == 400 then Just x else rest
   in if 400 == 400 then Just 400 else foldr step Nothing [401..]
=> let step x rest = if x == 400 then Just x else rest
   in if True then Just 400 else foldr step Nothing [401..]
-- `if True then a else b` is the same as `a`
=> let step x rest = if x == 400 then Just x else rest
   in Just 400
-- We can eliminate the `let ... in ...` here:
=> Just 400
Note that the expressions in the successive evaluation steps don't get progressively more complex or longer as we proceed through the list; the length or depth of the expression at step n is not proportional to n, it's basically fixed. This in fact demonstrates how find (==400) [1..] can be lazily executed in constant space.
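If you want to run this yourself rather than trace it by hand, the simplified definitions can be loaded as follows. This is only a sketch: the names myFoldr, findFirst and firstMatch are hypothetical and were chosen to avoid clashing with the Prelude's foldr and Data.List's find.

```haskell
-- Hypothetical names, otherwise the same shape as the simplified
-- definitions used in the derivation.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []     = z
myFoldr f z (x:xs) = f x (myFoldr f z xs)

findFirst :: (a -> Bool) -> [a] -> Maybe a
findFirst p = myFoldr step Nothing
  where
    -- rest is only evaluated when p x is False, so the fold stops
    -- at the first match.
    step x rest = if p x then Just x else rest

-- Searching an infinite list terminates because of that laziness.
firstMatch :: Maybe Integer
firstMatch = findFirst (== 400) [1 ..]
```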
Idiomatic Haskell looks very different to this, eschewing if-then-else, nested lambdas, parentheses, and destructuring functions (head, tail). Instead, you'd write it something like:
foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f z0 xs0 = go z0 xs0
  where
    go z [] = z
    go z (x:xs) = go (f z x) xs
This relies instead on pattern matching, a where clause, and tail recursion.
I am a newbie to Haskell and I wrote the following code
module RememberMap (rememberMap) where
rememberMap :: (a -> b -> (a, c)) -> a -> [b] -> [c]
rememberMap f acc xs = go acc xs []
  where
    go acc [x] carry = carry <> [step]
      where
        (_, step) = f acc x
    go acc (x:xs) carry = go accStep xs (carry <> [step])
      where
        (accStep, step) = f acc x
I wrote this combinator to help with the most common difficulty I have when writing Haskell code: I repeatedly find myself wanting to map over something (especially in CodeWarrior's katas) where the mapping requires knowledge of the elements before it. The problem is that it is non-streaming, so it does not let me use Haskell's laziness with it. I would therefore like to know (a) whether there is already a solution to this problem (preferably using Arrows), or (b) how to make it lazy.
To make the function stream you need to have the cons operator outside the recursive call, so a caller can see the first element without the whole recursion needing to happen. So you expect it to look something like:
rememberMap f acc (x:xs) = element : ... recursion ...
Once you understand this there is not much more to do:
rememberMap _ _ [] = []
rememberMap f acc (x:xs) = y : rememberMap f acc' xs
  where
    (acc', y) = f acc x
You can make an auxiliary function to avoid passing f around if you want, but there's no reason for it to have the extra list that you called carry.
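A sketch of that auxiliary-function variant might look like this. The helper name go and the runningSums example are hypothetical, added only to show that the result streams.

```haskell
-- Same streaming rememberMap, but f is captured by the local helper
-- instead of being passed along on every recursive call.
rememberMap :: (a -> b -> (a, c)) -> a -> [b] -> [c]
rememberMap f = go
  where
    go _   []     = []
    go acc (x:xs) = y : go acc' xs   -- the cons is outside the recursion
      where (acc', y) = f acc x

-- Hypothetical usage: each output is the sum of all inputs so far.
runningSums :: [Integer] -> [Integer]
runningSums = rememberMap (\acc x -> (acc + x, acc + x)) 0
```

Because each element is produced before the recursive call is demanded, take 3 (runningSums [1 ..]) only looks at the first three inputs.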
This already exists as mapAccumL (from Data.List); another option is traverse with the lazy State monad.
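For the mapAccumL route, a minimal sketch could look like the following. The names rememberMap' and labelWithPrevious are hypothetical; only mapAccumL itself comes from base. The State-monad alternative is not shown because it needs a package outside base.

```haskell
import Data.List (mapAccumL)

-- mapAccumL returns the final accumulator together with the mapped
-- list; rememberMap only wants the list, so the accumulator is dropped.
rememberMap' :: (a -> b -> (a, c)) -> a -> [b] -> [c]
rememberMap' f acc = snd . mapAccumL f acc

-- Hypothetical usage: pair every element with its predecessor, if any.
labelWithPrevious :: String -> [(Maybe Char, Char)]
labelWithPrevious = rememberMap' (\prev x -> (Just x, (prev, x))) Nothing
```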
I do not understand a sample solution for the following problem: given a list of elements, remove the duplicates. Then count the unique digits of a number. No explicit recursion may be used for either problem.
My code:
removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = foldr (\x ys -> x:(filter (x /=) ys)) []
differentDigits :: Int -> Int
differentDigits xs = length (removeDuplicates (show xs))
The solution I am trying to understand has a different definition for differentDigits, namely
differentDigits xs = foldr (\ _ x -> x + 1) 0 ( removeDuplicates ( filter (/= '_') ( show xs )))
Both approaches work, but I cannot grasp the sample solution. To break my question down into subquestions,
How does the first argument to filter work? I mean
(/= '_')
How does the lambda for foldr work? In
foldr (\ _ x -> x + 1)
^
the variable x should still be the Char list? How does Haskell figure out that actually 0 should be incremented?
filter (/= '_') is, I'm pretty sure, redundant. It filters out underscore characters, which shouldn't be present in the result of show xs, assuming xs is a number of some sort.
foldr (\ _ x -> x + 1) 0 is equivalent to length. The way foldr works, it takes the second argument (which in your example is zero) as the starting point, then applies the first argument (in your example, lambda) to it over and over for every element of the input list. The element of the input list is passed into the lambda as first argument (denoted _ in your example), and the running sum is passed as second argument (denoted x). Since the lambda just returns a "plus one" number on every pass, the result will be a number representing how many times the lambda was called - which is the length of the list.
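To see the counting fold in isolation, here is a small sketch. The name countElems is hypothetical; removeDuplicates is the definition from the question, and the redundant filter is left out.

```haskell
removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = foldr (\x ys -> x : filter (x /=) ys) []

-- The lambda ignores the list element (the wildcard) and adds 1 to the
-- count accumulated so far, so the second parameter is a number, not
-- a Char list:
--   countElems "ab"
--     = (\_ n -> n + 1) 'a' ((\_ n -> n + 1) 'b' 0)
--     = (0 + 1) + 1
--     = 2
countElems :: [a] -> Int
countElems = foldr (\_ n -> n + 1) 0

differentDigits :: Int -> Int
differentDigits n = countElems (removeDuplicates (show n))
```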
First, note that (2) is written in so-called point-free style, leaving out the third argument of foldr.
https://en.wikipedia.org/wiki/Tacit_programming#Functional_programming
Also, the underscore in \_ x -> x + 1 is a wild card, that simply marks the place of a parameter but that does not give it a name (a wild card works as a nameless parameter).
Second, (2) is really nothing other than a simple recursive function that folds to the right. foldr is a compact way to write such recursive functions (in your case length):
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
If we write
foldr f c ls
ls is the list over which our recursive function should recur (a is the type of the elements).
c is the result in the base case (when the recursive function is applied to an empty list).
f computes the result in the general case (when the recursive function is applied on a non-empty list). f takes two arguments:
The head of the list and
the result of the recursive call on the tail of the list.
So, given f and c, foldr will go through the list ls recursively.
A first example
The Wikipedia page about point free style gives the example of how we can compute the sum of all elements in a list using foldr:
Instead of writing
sum [] = 0
sum (x:xs) = x + sum xs
we can write
sum = foldr (+) 0
The parenthesized operator (+) is a two-argument function that adds its arguments. The expression
sum [1,2,3,4]
is computed as
1 + (2 + (3 + (4 + 0)))
(hence "folding to the right").
Example: Multiplying all elements.
Instead of
prod [] = 1
prod (x:xs) = x * prod xs
we can write
prod = foldr (*) 1
Example: Remove all occurrences of a value from a list.
Instead of
remove _ [] = []
remove v (x:xs) = if x==v then remove v xs else x:remove v xs
we can write
remove v = foldr (\x r -> if x==v then r else x:r) []
Your case, (2)
We can now fully understand that
length = foldr (\ _ x -> x + 1) 0
in fact is the same as
length [] = 0
length (x:xs) = length xs + 1
that is, the length function.
Hope this recursive view on foldr helped you understand the code.
Why does computing the following expression terminate?
foldr (\x t -> if x > 5 then Just x else t) Nothing $ [1..]
Is there anything special about Maybe (or one of the type classes it implements) that causes evaluation to stop after the lambda returns a Just?
Maybe, Just, and Nothing play no active role here. What we see is just laziness at work. Indeed, for any (total) function f and value a, this would also terminate:
foldr (\x t -> if x > 5 then f x else t) a $ [1..]
This is perfectly equivalent to the plain recursion
foo [] = a
foo (x:xs) = if x > 5 then f x else foo xs
when called as foo [1..]. Eventually, x becomes 6, f 6 is returned, and no more recursive calls are made.
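As a quick check that Maybe is not what stops the evaluation, here is a sketch that plugs different choices of f and a into the same fold. The names firstAbove5, viaMaybe and viaString are hypothetical.

```haskell
-- The fold from above with f and a as parameters.
firstAbove5 :: (Integer -> b) -> b -> [Integer] -> b
firstAbove5 f a = foldr (\x t -> if x > 5 then f x else t) a

-- The original question: f = Just, a = Nothing.
viaMaybe :: Maybe Integer
viaMaybe = firstAbove5 Just Nothing [1 ..]

-- No Maybe involved at all; this terminates on the infinite list too,
-- because t is never demanded once x > 5 holds.
viaString :: String
viaString = firstAbove5 show "none" [1 ..]
```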
Currently I am using
takeWhile (\x -> x /= 1 && x /= 89) l
to get the elements from a list up to either a 1 or 89. However, the result doesn't include these sentinel values. Does Haskell have a standard function that provides this variation on takeWhile that includes the sentinel in the result? My searches with Hoogle have been unfruitful so far.
Since you were asking about standard functions: no. As far as I know there also isn't a package containing a takeWhileInclusive, but it is really simple to write:
takeWhileInclusive :: (a -> Bool) -> [a] -> [a]
takeWhileInclusive _ [] = []
takeWhileInclusive p (x:xs) = x : if p x then takeWhileInclusive p xs
                                         else []
The only thing you need to do is take the value regardless of whether the predicate returns True, and use the predicate only to decide whether to continue:
*Main> takeWhileInclusive (\x -> x /= 20) [10..]
[10,11,12,13,14,15,16,17,18,19,20]
Is span what you want?
(matching, rest) = span (\x -> x /= 1 && x /= 89) l
then look at the head of rest.
The shortest way I found to achieve that is to use span and then apply a function to its result that appends the head of the tuple's second component to its first component.
The whole expression would look something like this:
(\(f,s) -> f ++ [head s]) $ span (\x -> x /= 1 && x /= 89) [82..140]
The result of this expression is
[82,83,84,85,86,87,88,89]
The first element of the tuple returned by span is the list that takeWhile would return for those parameters, and the second element is the list with the remaining values, so we just add the head from the second list to our first list.
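One caveat with the span-based approaches: head fails on an empty list, which is what you get when no sentinel occurs in the input. A possible variation, sketched here under the hypothetical name takeWhileInclusive', uses take 1 instead of head so that case simply returns the whole list.

```haskell
-- span splits the list at the first element that fails the predicate;
-- take 1 then adds that element if there is one, and nothing otherwise.
takeWhileInclusive' :: (a -> Bool) -> [a] -> [a]
takeWhileInclusive' p xs = before ++ take 1 rest
  where (before, rest) = span p xs
```

With the predicate from the question, takeWhileInclusive' (\x -> x /= 1 && x /= 89) [82..140] gives the same list as the expression above, and a list without 1 or 89 is returned unchanged.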
This is my take version using foldr:
myTake n list = foldr step [] list
  where step x y | (length y) < n = x : y
                 | otherwise = y
main = do print $ myTake 2 [1,2,3,4]
The output is not what I expect:
[3,4]
I then tried to debug by inserting the length of y into itself and the result was:
[3,2,1,0]
I don't understand why the lengths are inserted in decreasing order. Perhaps something obvious I missed?
If you want to implement take using foldr you need to simulate traversing the list from left to right. The point is to make the folding function depend on an extra argument which encodes the logic you want and not only depend on the folded tail of the list.
take :: Int -> [a] -> [a]
take n xs = foldr step (const []) xs n
  where
    step x g 0 = []
    step x g n = x:g (n-1)
Here, foldr returns a function which takes a numeric argument and traverses the list from left to right taking from it the amount required. This will also work on infinite lists due to laziness. As soon as the extra argument reaches zero, foldr will short-circuit and return an empty list.
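Here is a version of the same idea that can be loaded alongside the Prelude. This is a sketch with two deviations from the code above, both my own assumptions: the function is renamed to the hypothetical takeF to avoid the clash with Prelude's take, and the zero check is written as a guard on k <= 0 so that a negative count also stops immediately.

```haskell
-- foldr builds a function from the remaining count to the result list;
-- the count is threaded left to right through the list.
takeF :: Int -> [a] -> [a]
takeF n xs = foldr step (const []) xs n
  where
    step x g k
      | k <= 0    = []              -- g is never called: the fold stops
      | otherwise = x : g (k - 1)   -- keep x, continue with a smaller count
```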
foldr will apply the function step starting from the last element*. That is,
foldr step [] [1,2,3,4] == 1 `step` (2 `step` (3 `step` (4 `step` [])))
                        == 1 `step` (2 `step` (3 `step` (4:[])))
                        == 1 `step` (2 `step` (3:4:[]))     -- length y == 2 here
                        == 1 `step` (3:4:[])
                        == 3:4:[]
                        == [3, 4]
The lengths are "inserted" in decreasing order because : is a prepending operation. The longer lengths are added to the beginning of the list.
(Image taken from http://en.wikipedia.org/wiki/Fold_%28higher-order_function%29)
*: For simplicity, we assume every operation is strict, which is true in OP's step implementation.
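Both observations from the question can be reproduced with a short sketch. myTake is the asker's function with a type signature added; lengthsSeen is a hypothetical name for my guess at the debugging experiment (consing length y onto y at every step).

```haskell
-- The asker's version: y is the already-folded tail, so elements are
-- kept starting from the end of the list.
myTake :: Int -> [a] -> [a]
myTake n = foldr step []
  where
    step x y
      | length y < n = x : y
      | otherwise    = y

-- Assumed form of the debugging experiment: record the length of the
-- folded tail at each step. The innermost step sees [], so 0 ends up
-- last and the longest tail is recorded first.
lengthsSeen :: [a] -> [Int]
lengthsSeen = foldr (\_ y -> length y : y) []
```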
The other answers so far are making it much too complicated, because they seem excessively wedded to the notion that foldr works "from right to left." There is a sense in which it does, but Haskell is a lazy language, so a "right to left" computation that uses a lazy fold step will actually be executed from left to right, as the result is consumed.
Study this code:
take :: Int -> [a] -> [a]
take n xs = foldr step [] (tagFrom 1 xs)
  where step (a, i) rest
          | i > n = []
          | otherwise = a:rest
tagFrom :: Enum i => i -> [a] -> [(a, i)]
tagFrom i xs = zip xs [i..]
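To check the left-to-right claim, the same idea can be run on an infinite list. In this sketch the function is renamed to the hypothetical takeTagged to avoid clashing with Prelude's take, and tagFrom is inlined as a zip.

```haskell
-- Each element is paired with its 1-based position. The step never
-- looks at rest once the position exceeds n, so the fold stops there.
takeTagged :: Int -> [a] -> [a]
takeTagged n xs = foldr step [] (zip xs [1 ..])
  where
    step (a, i) rest
      | i > n     = []
      | otherwise = a : rest
```

For example, takeTagged 3 [10, 20 ..] only inspects the first four tagged elements before returning.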