haskell - is it my partitions lazy? - haskell

For example
partitions [1,2,3] =
[([],[1,2,3])
,([1],[2,3])
,([1,2],[3])
,([1,2,3],[])]
partitions :: [a] -> [([a], [a])]
partitions (x:xs) = ([], x:xs):[(x:ys, rs) | (ys, rs) <- partitions xs]
I wonder is it lazy solution. For example partitions [1..] is infinite. Additionally, take 5 $ partitions [1..] is also infinite. I think that obvious given the fact that result of this function is infinite. However, I am not sure if it is lazy, if I correctly understand laziness.

There are different degrees of laziness.
One might say that your function is strict since partitions undefined triggers an exception, but that would be too pedantic.
Chances are that by "lazy" you actually mean "it will produce a part the output after having accessed only a part of the input". Several degrees of laziness then arise, depending on how much input is needed for each part of the output.
In your case, the shape of the function is as follows:
foo [] = (some constant value)
foo (x:xs) = C expression1 ... expressionN
where C is a value constructor. More precisely, C = (:) and N=2. Since Haskell constructors are lazy (unless bang annotations are involved), the result of foo (x:xs) will always be non-bottom: consuming an element in the input list is enough to produce an element of the output list.
You might be confused by the output of partitions [1..] being an infinite list of pairs (xs, ys) where each ys is an infinite list. This makes the notion of laziness much more complex, since you might now wonder, e.g., "how much input is accessed for me to take the 100th pair of the output, and then access the 500th element of its second component?". Such questions are perfectly legitimate, and tricky to answer in general.
Still, your code will never demand the full input list to output a finite portion of the output. This makes it "lazy".
For completeness, let me show a non lazy function:
partitions (x:xs) = case partitions xs of
[] -> expression0
(y:ys) -> expression1 : expression2
Above, the result of the recursive call is demanded before the head of the output list is produced. That will demand the whole input before any part of the output is generated.

Related

Efficient string swapping in Haskell

I'm trying to solve a problem for a functional programming exercise in Haskell. I have to implement a function such that, given a string with an even number of characters, the function returns the same string with character pairs swapped.
Like this:
"helloworld" -> "ehllworodl"
This is my current implementation:
swap :: String -> String
swap s = swapRec s ""
where
swapRec :: String -> String -> String
swapRec [] result = result
swapRec (x:y:xs) result = swapRec xs (result++[y]++[x])
My function returns the correct results, however the programming exercise is timed, and It seems like my code is running too slowly.
Is there something I could do to make my code run faster, or I am following the wrong approach to the problem ?
Yes. If you use (++) :: [a] -> [a] -> [a], then this takes linear time in the number of elements of the first list you want to concatenate. Since result can be large, this will result in a ineffeciency: the algorithm is then O(n2).
You however do not need to construct the result with an accumulator. You can return a list, and do the processing of the remaining elements with a recursive call, like:
swap :: [a] -> [a]
swap [] = []
swap [x] = [x]
swap (x:y:xs) = y : x : swap xs
The above also uncovered a problem with the implementation: if the list had an odd length, then the function would have crashed. Here in the second case, we handle a list with one element by returning that list (perhaps you need to modify this according to the specifications).
Furthermore here we can benefit of Haskell's laziness: if we have a large list, want to pass it through the swap function, but are only interested in the first five elements, then we will not calculate the entire list.
We can also process all kinds of list with the above function: a list of numbers, of strings, etc.
Note that (++) itself is not inherently bad: if you need to concatenate, it is of course the most efficient way to do this. The problem is that you here in every recursive step will concatenate again, and the left list is growing each time.
Affixing something at the end of the accumulator passed into a recursive call
swapRec (x:y:xs) resultSoFar = swapRec xs
(resultSoFar ++ [y] ++ [x])
is the same as prepending it at the start of the result returned from the recursive call:
swapRec (x:y:xs) = [y] ++ [x] ++ swapRec xs
You will have to amend your function accordingly throughout.
This is known as guarded recursion. What you were using is known as tail recursion (a left fold).
The added benefit is that it will now be on-line (i.e., taking O(1) time per each processed element). You were creating the (++) nesting on the left which leads to quadratic behaviour, as discussed e.g. here.

How does GHC know how to cache one function but not the others?

I'm reading Learn You a Haskell (loving it so far) and it teaches how to implement elem in terms of foldl, using a lambda. The lambda solution seemed a bit ugly to me so I tried to think of alternative implementations (all using foldl):
import qualified Data.Set as Set
import qualified Data.List as List
-- LYAH implementation
elem1 :: (Eq a) => a -> [a] -> Bool
y `elem1` ys =
foldl (\acc x -> if x == y then True else acc) False ys
-- When I thought about stripping duplicates from a list
-- the first thing that came to my mind was the mathematical set
elem2 :: (Eq a) => a -> [a] -> Bool
y `elem2` ys =
head $ Set.toList $ Set.fromList $ filter (==True) $ map (==y) ys
-- Then I discovered `nub` which seems to be highly optimized:
elem3 :: (Eq a) => a -> [a] -> Bool
y `elem3` ys =
head $ List.nub $ filter (==True) $ map (==y) ys
I loaded these functions in GHCi and did :set +s and then evaluated a small benchmark:
3 `elem1` [1..1000000] -- => (0.24 secs, 160,075,192 bytes)
3 `elem2` [1..1000000] -- => (0.51 secs, 168,078,424 bytes)
3 `elem3` [1..1000000] -- => (0.01 secs, 77,272 bytes)
I then tried to do the same on a (much) bigger list:
3 `elem3` [1..10000000000000000000000000000000000000000000000000000000000000000000000000]
elem1 and elem2 took a very long time, while elem3 was instantaneous (almost identical to the first benchmark).
I think this is because GHC knows that 3 is a member of [1..1000000], and the big number I used in the second benchmark is bigger than 1000000, hence 3 is also a member of [1..bigNumber] and GHC doesn't have to compute the expression at all.
But how is it able to automatically cache (or memoize, a term that Land of Lisp taught me) elem3 but not the two other ones?
Short answer: this has nothing to do with caching, but the fact that you force Haskell in the first two implementations, to iterate over all elements.
No, this is because foldl works left to right, but it will thus keep iterating over the list until the list is exhausted.
Therefore you better use foldr. Here from the moment it finds a 3 it in the list, it will cut off the search.
This is because foldris defined as:
foldr f z [x1, x2, x3] = f x1 (f x2 (f x3 z))
whereas foldl is implemented as:
foldl f z [x1, x2, x3] = f (f (f (f z) x1) x2) x3
Note that the outer f thus binds with x3, so that means foldl first so if due to laziness you do not evaluate the first operand, you still need to iterate to the end of the list.
If we implement the foldl and foldr version, we get:
y `elem1l` ys = foldl (\acc x -> if x == y then True else acc) False ys
y `elem1r` ys = foldr (\x acc -> if x == y then True else acc) False ys
We then get:
Prelude> 3 `elem1l` [1..1000000]
True
(0.25 secs, 112,067,000 bytes)
Prelude> 3 `elem1r` [1..1000000]
True
(0.03 secs, 68,128 bytes)
Stripping the duplicates from the list will not imrpove the efficiency. What here improves the efficiency is that you use map. map works left-to-right. Note furthermore that nub works lazy, so nub is here a no op, since you are only interested in the head, so Haskell does not need to perform memberchecks on the already seen elements.
The performance is almost identical:
Prelude List> 3 `elem3` [1..1000000]
True
(0.03 secs, 68,296 bytes)
In case you work with a Set however, you do not perform uniqueness lazily: you first fetch all the elements into the list, so again, you will iterate over all the elements, and not cut of the search after the first hit.
Explanation
foldl goes to the innermost element of the list, applies the computation, and does so again recursively to the result and the next innermost value of the list, and so on.
foldl f z [x1, x2, ..., xn] == (...((z `f` x1) `f` x2) `f`...) `f` xn
So in order to produce the result, it has to traverse all the list.
Conversely, in your function elem3 as everything is lazy, nothing gets computed at all, until you call head.
But in order to compute that value, you just the first value of the (filtered) list, so you just need to go as far as 3 is encountered in your big list. which is very soon, so the list is not traversed. if you asked for the 1000000th element, eleme3 would probably perform as badly as the other ones.
Lazyness
Lazyness ensure that your language is always composable : breaking a function into subfunction does not changes what is done.
What you are seeing can lead to a space leak which is really about how control flow works in a lazy language. both in strict and in lazy, your code will decide what gets evaluated, but with a subtle difference :
In a strict language, the builder of the function will choose, as it forces evaluation of its arguments: whoever is called is in charge.
In a lazy language, the consumer of the function chooses. whoever called is in charge. It may choose to only evaluate the first element (by calling head), or every other element. All that provided its own caller choose to evaluate his own computation as well. there is a whole chain of command deciding what to do.
In that reading, your foldl based elem function uses that "inversion of control" in an essential way : elem gets asked to produce a value. foldl goes deep inside the list. if the first element if y then it return the trivial computation True. if not, it forwards the requests to the computation acc. In other words, what you read as values acc, x or even True, are really placeholders for computations, which you receive and yield back. And indeed, acc may be some unbelievably complex computation (or divergent one like undefined), as long as you transfer control to the computation True, your caller will never see the existence of acc.
foldr vs foldl vs foldl' VS speed
As suggested in another answer, foldr might best your intent on how to traverse the list, and will shield you away from space leaks (whereas foldl' will prevent space leaks as well if you really want to traverse the other way, which can lead to buildup of complex computations ... and can be very useful for circular computation for instance).
But the speed issue is really an algorithmic one. There might be better data structure for set membership if and only if you know beforehand that you have a certain pattern of usage.
For instance, it might be useful to pay some upfront cost to have a Set, then have fast membership queries, but that is only useful if you know that you will have such a pattern where you have a few sets and lots of queries to those sets. Other data structure are optimal for other patterns, and it's interesting to note that from a API/specification/interface point of view, they are usually the same to the consumer. That's a general phenomena in any languages, and why many people love abstract data types/modules in programming.
Using foldr and expecting to be faster really encodes the assumption that, given your static knowledge of your future access pattern, the values you are likely to test membership of will sit at the beginning. Using foldl would be fine if you expect your values to be at the end of it.
Note that using foldl, you might construct the entire list, you do not construct the values themselves, until you need it of course, for instance to test for equality, as long as you have not found the searched element.

Why the `foldr`, `foldr1`, `scanr` and `scanr1` functions haven't a _problem with productivity when they are applied to big lists?

I read the old Russian translate of the Learn You a Haskell for Great Good! book. I see that the current English version (online) is newer, therefore I look it time of time also.
The quote:
When you put together two lists (even if you append a singleton list
to a list, for instance: [1,2,3] ++ [4]), internally, Haskell has to
walk through the whole list on the left side of ++. That's not a
problem when dealing with lists that aren't too big. But putting
something at the end of a list that's fifty million entries long is
going to take a while. However, putting something at the beginning of
a list using the : operator (also called the cons operator) is
instantaneous.
I assumed that Haskell has to walk through the whole list to get the last item of the list for the foldr, foldr1, scanr and scanr1 functions. Also I assumed that Haskell will do the same for getting a previous element (and so on for each item).
But I see I was mistaken:
UPD
I try this code and I see the similar time of processing for both cases:
data' = [1 .. 10000000]
sum'r = foldr1 (\x acc -> x + acc ) data'
sum'l = foldl1 (\acc x -> x + acc ) data'
Is each list of Haskell bidirectional? I assume that for getting last item of list Haskell at first are to iterate each item and to remember the necessary item (last item for example) for getting (later) the previous item of bidirectional list (for lazy computation). Am I right?
It's tricky since Haskell is lazy.
Evaluating head ([1..1000000]++[1..1000000]) will return immediately, with 1. The lists will never be fully created in memory: only the first element of the first list will be.
If you instead demand the full list [1..1000000]++[1..1000000] then ++ will indeed have to create a two-million long list.
foldr may or may not evaluate the full list. It depends on whether the function we use is lazy. For example, here's map f xs written using foldr:
foldr (\y ys -> f y : ys) [] xs
This is efficient as map f xs is: lists cells are produced on demand, in a streaming fashion. If we need only the first ten elements of the resulting list, then we indeed create only the first ten cells -- foldr will not be applied to the rest of the list. If we need the full resulting list, then foldr will be run over the full list.
Also note that xs++ys can be defined similarly in terms of foldr:
foldr (:) ys xs
and has similar performance properties.
By comparison, foldl instead always runs over the whole list.
In the example you mention we have longList ++ [something], appending to the end of the list. This only costs constant time if all we demand is the first element of the resulting list. But if we really need the last element we added, then appending will need to run over the whole list. This is why appending at the end is considered O(n) instead of O(1).
In the last update, the question speaks about computing the sum with foldr vs foldl, using the (+) operator. In such case, since (+) is strict (it needs both arguments to compute result) then both folds witll need to scan the whole list. The performance in such cases can be comparable. Indeed, they would compute, respectively
1 + (2 + (3 + (4 + ..... -- foldr
(...(((1 + 2) + 3) +4) + .... -- foldl
By comparison foldl' would be more memory efficient, since it starts reducing the above sum before building the above giant expression. That is, it would compute 1+2 first (3), then 3+3 (6), then 6 + 4 (10),... keeping in memory only the last result (a single integer) while the list is being scanned.
To the OP: the topic of laziness is not easy to grasp the first time. It is quite vast -- you just met a ton of different examples which have subtle but significant performance differences. It's hard to explain everything succinctly -- it's just too broad. I'd recommend to focus on small examples and start digesting those first.

Problems with enforcing strictness in haskell

If want to pretend that Haskell is strict and I have an algorithm in mind that does not exploit laziness (so for instance it does not use infinite lists), what problems can occur if I used only strict data types and annotated any function that I use, to be strict in its arguments? Will there be a performance penalty, if so how bad; can worse problems occur? I know it is dirty, pointless and ugly to mindlessly make every function and data type strict, and I do not intend to do so in practice but I only want to understand if by doing so, Haskell becomes strict by default?
Secondly, if I tone down the paranoia, and only make the data structures strict: will I have to worry about space leaks brought about by a lazy implementation only when I am using some form of accumulation? In other words, assume that the algorithm would not exhibit a space leak in a strict language. Also assume that I implemented it in Haskell using only strict data structures, but was careful to use seq to evaluate any variable that was being passed on in a recursion, or used functions which internally are careful to do that (like fold'), would I avoid any space leaks? Remember that I am assuming that in a strict language, the same algorithm does not lead to a space leak. So it is a question about the implementation difference between lazy and strict.
The reason I ask the second question is because apart from cases where one is trying to take advantage of laziness by using a lazy data structure, or a spine strict one, all the examples of space leaks that I have seen until now, only involve thunks developing in an accumulator because it was not the function that was recursively called did not evaluate the accumulator before applying itself on it. I am aware that if one wants to take advantage of laziness then one has to be extra careful, but that caution would be needed in a strict by default language too.
Thank you.
Laziness speeding things up
You could be worse off. The naive definition of ++ is:
xs ++ ys = case xs of (x:xs) -> x : (xs ++ ys)
[] -> ys
Laziness makes this O(1), though it may also add O(1) processing to extract the cons. Without laziness, the ++ needs to be evaluated immediately causing an O(n) operation. (If you've never seen the O(.) notation, it is something computer science has stolen from engineers: given a function f the set O( f(n) ) is the set of all algorithms which are eventually at-worst-proportional to f(n), where n is the number of bits of input fed to the function. [Formally, there exists a k and N such that for all n > N the algorithm takes time less than k * f(n).] So I'm saying that laziness makes the above operation O(1) or eventually constant-time, but adds a constant overhead to each extraction, whereas strictness makes the operation O(n) or eventually linear in the number of list elements, assuming that those elements have a fixed size.
There are some practical examples here but the O(1) added processing time can potentially also "stack up" into an O(n) dependency, so the most obvious examples are O(n2) both ways. Still there can be a difference in these examples. For example, one situation that doesn't work well is using a stack (last-in first-out, which is the style of Haskell lists) for a queue (first-in first-out).
So here's a quick library consisting of strict left-folds; I've used case statements so that each line can be pasted into GHCi (with a let):
data SL a = Nil | Cons a !(SL a) deriving (Ord, Eq, Show)
slfoldl' f acc xs = case xs of Nil -> acc; Cons x xs' -> let acc' = f acc x in acc' `seq` slfoldl' f acc' xs'
foldl' f acc xs = case xs of [] -> acc; x : xs' -> let acc' = f acc x in acc' `seq` foldl' f acc' xs'
slappend xs ys = case xs of Nil -> ys; Cons x xs' -> Cons x (slappend xs' ys)
sl_test n = foldr Cons Nil [1..n]
test n = [1..n]
sl_enqueue xs x = slappend xs (Cons x Nil)
sl_queue = slfoldl' sl_enqueue Nil
enqueue xs x = xs ++ [x]
queue = foldl' enqueue []
The trick here is that both queue and sl_queue follow the xs ++ [x] pattern to append an element to the end of the list, which takes a list and builds up an exact copy of that list. GHCi can then run some simple tests. First we make two items and force their thunks to prove that this operation itself is quite fast and not too prohibitively expensive in memory:
*Main> :set +s
*Main> let vec = test 10000; slvec = sl_test 10000
(0.02 secs, 0 bytes)
*Main> [foldl' (+) 0 vec, slfoldl' (+) 0 slvec]
[50005000,50005000]
(0.02 secs, 8604632 bytes)
Now we do the actual tests: summing the queue-versions:
*Main> slfoldl' (+) 0 $ sl_queue slvec
50005000
(22.67 secs, 13427484144 bytes)
*Main> foldl' (+) 0 $ queue vec
50005000
(1.90 secs, 4442813784 bytes)
Notice that both of these suck in terms of memory-performance (the list-append stuff is still secretly O(n2)) where they eventually occupy gigabytes of space, but the strict version nevertheless occupies three times the space and takes ten times the time.
Sometimes the data structures should be changed
If you really want a strict queue, there are a couple options. One is finger trees as in Data.Sequence -- the viewr way they do things is a little complicated but works to get the rightmost elements. However that is a bit heavy and one common solution is O(1) amortized: define the structure
data Queue x = Queue !(SL x) !(SL x)
where the SL terms are the strict stacks above. Define a strict reverse, let's call it slreverse, the obvious way, then consider:
enqueue :: Queue x -> x -> Queue x
enqueue (Queue xs ys) el = Queue xs (Cons el ys)
dequeue :: Queue x -> Maybe (x, Queue x)
dequeue (Queue Nil Nil) = Nothing
dequeue (Queue Nil (Cons x xs)) = Just (x, Queue (slreverse xs) Nil)
dequeue (Queue (Cons x xs ys)) = Just (x, Queue xs ys)
This is "amortized O(1)": each time that a dequeue reverses the list, costing O(k) steps for some k, we ensure that we are creating a structure which won't have to pay these costs for k more steps.
Laziness hides errors
Another interesting point comes from the data/codata distinction, where data are finite structures traversed by recursion on subunits (that is, every data expression halts) while codata are the rest of the structures -- strict lists vs. streams. It turns out that when you properly make this distinction, there is no formal difference between strict data and lazy data -- the only formal difference between strict and lazy is how they handle terms within themselves which loop infinitely: strict will explore the loop and hence will also loop infinitely, while lazy will simply hand the infinite-loop onwards without descending into it.
As such you will find that x = slhead (Cons x undefined) will fail where head (x : undefined) succeeds. So you may "uncover" hidden infinite loops or bugs when you do this.
Caution when making "everything strict"
Not everything necessarily becomes strict when you use strict data structures in your language: notice that I made a point above to define strict foldl, not foldl, for both lists and strict-lists. Common data structures in Haskell will be lazy -- lists, tuples, stuff in popular libraries -- and explicit calls to seq still help when building up a complicated expression.

Complexity of two cumulative sum (cumsum) functions in Haskell

Consider the following two cumulative sum (cumsum) functions:
cumsum :: Num a => [a] -> [a]
cumsum [] = []
cumsum [x] = [x]
cumsum (x:y:ys) = x : (cumsum $ (x+y) : ys)
and
cumsum' :: Num a => [a] -> [a]
cumsum' x = [sum $ take k x | k <- [1..length x]]
Of course, I prefer the definition of cumsum to that of cumsum' and I understand that the former has linear complexity.
But just why does cumsum' also have linear complexity? take itself has linear complexity in the length of its argument and k runs from 1 to length x. Therefore I'd have expected quadratic complexity for cumsum'.
Moreover, the constant of cumsum' is lower than that of cumsum. Is that due to the recursive list appending of the latter?
NOTE: welcoming any smart definition of a cumulative sum.
EDIT: I'm measuring execution times using (after enabling :set +s in GHCi):
last $ cumsum [1..n]
This is a measurement error caused by laziness.
Every value in Haskell is lazy: it isn't evaluated until necessary. This includes sub-structure of values - so for example when we see a pattern (x:xs) this only forces evaluation of the list far enough to identify that the list is non-empty, but it doesn't force the head x or the tail xs.
The definition of last is something like:
last [x] = x
last (x:xs) = last xs
So when last is applied to the result of cumsum', it inspects the list comprehension recursively, but only enough to track down the last entry. It doesn't force any of the entries, but it does return the last one.
When this last entry is printed in ghci or whatever, then it is forced which takes linear time as expected. But the other entries are never calculated so we don't see the "expected" quadratic behaviour.
Using maximum instead of last does demonstrate that cumnorm' is quadratic whereas cumnorm is linear.
[Note: this explanation is somewhat hand-wavy: really evaluation is entirely driven by what's needed for the final result, so even last is only evaluted at all because its result is needed. Search for things like "Haskell evaluation order" and "Weak Head Normal Form" to get a more precise explanation.]

Resources