infinite lists, lazy evaluation and length - haskell

Haskell noob here: I'm still trying to understand the mechanics of the language, so if my question is plain stupid, forgive me and point me to some link I can learn from (I've searched a while in similar topics here on Stack Overflow, but I still can't get this).
I came out with this function:
chunks :: Int -> [a] -> [[a]]
chunks n xs
  | length xs <= n = [xs]
  | otherwise      = let (ch, rest) = splitAt n xs in ch : chunks n rest
so that
ghci> chunks 4 "abracadabra"
["abra","cada","bra"]
ghci>
ghci> chunks 3 [1..6]
[[1,2,3],[4,5,6]]
I was pretty satisfied with that, and then I thought "there's lazy evaluation! I can use this even on an infinite sequence!". So I tried take 4 $ chunks 3 [1..]. I was hoping the lazy Haskell magic would produce [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]; instead, it seems laziness can't help me this time: the computation never finishes (is it walking all the way to the end of [1..]?)
I think the problem is in the length xs part: GHCi also seems to get stuck on a simple length [1..]. So I'm asking: does length actually iterate over the whole list to give an answer? If so, I guess length is to be avoided whenever I implement something meant to work under lazy evaluation, so is there some alternative?
(for instance, how can I improve my example to work with infinite lists?)

is length actually iterating the whole list to give a response?
Yes.
I guess length is to be avoided every time I try to implement something working well with the lazy evaluation
Not just that: it also gives you bad runtimes when laziness isn't a factor (being O(n) in cases where an O(1) check often suffices¹), so you should avoid it most of the time in general.
how can I improve my example to work with infinite lists?
You don't need to check whether the length of the list is less than n; you just need to check whether the list is empty, and that you can do with a simple pattern match, as in the sketch below.
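For instance, a minimal sketch (note that, unlike your version, this returns [] rather than [[]] for an empty input):
chunks :: Int -> [a] -> [[a]]
chunks _ [] = []
chunks n xs = ch : chunks n rest
  where (ch, rest) = splitAt n xs
Now take 4 $ chunks 3 [1..] yields [[1,2,3],[4,5,6],[7,8,9],[10,11,12]], because no clause ever needs to measure the whole list.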
¹ For example, something like f xs | length xs >= 2 = ..., which is O(n), can be replaced with f (x1 : x2 : xs) = ..., which is O(1).

Another trick you can do (which I've seen in Data.Text, but am surprised is not in Prelude for lists in general) is to make length short-circuit as soon as possible by returning an Ordering rather than a Bool.
compareLength :: [a] -> Int -> Ordering
compareLength []       n = compare 0 n
compareLength _        0 = GT
compareLength (_ : xs) n = compareLength xs (n - 1)
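Note that, unlike length, this stops after at most n steps, so it is safe to use on infinite lists:
*Main> compareLength [1..] 5
GT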
Then you can use it in chunks.
chunks :: Int -> [a] -> [[a]]
chunks n xs = case compareLength xs n of
  GT -> let (ch, rest) = splitAt n xs in ch : chunks n rest
  _  -> [xs]
And this works fine.
*Main> take 4 $ chunks 3 [1..]
[[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
For this particular case, other implementations might be more idiomatic, but hopefully this is a nice trick to know.

is length actually iterating the whole list to give a response?
Yes, absolutely.
length is to be avoided every time I try to implement something working well with the lazy evaluation
Yes, absolutely.
so there is some alternative?
Yes: solve the problem without referring to length. There are no general methods of problem solving, so you need to work out each specific case.
how can I improve my example to work with infinite lists
You are a railroad worker. A huge train of cars begins where you are standing and stretches over the horizon. You have no idea where it ends, if it ever does. Your job is to separate it into small trains of three cars each. How do you proceed?

Related

Haskell - why would I use infinite data structures?

In Haskell, it is possible to define infinite lists like so:
[1..]
I found many articles which describe how to implement infinite lists, and I understand how this works.
However, I can't think of any reason to use the concept of infinite data structures.
Can someone give me an example of a problem which can be solved more easily (or perhaps only) with an infinite list in Haskell?
The basic advantage of lists in Haskell is that they’re a control structure that looks like a data structure. You can write code that operates incrementally on streams of data, but it looks like simple operations on lists. This is in contrast to other languages that require the use of an explicitly incremental structure, like iterators (Python’s itertools), coroutines (C# IEnumerable), or ranges (D).
For example, a sort function can be written in such a way that it sorts as few elements as possible before it starts to produce results. While sorting the entire list takes O(n log n) / linearithmic time in the length of the list, minimum xs = head (sort xs) only takes O(n) / linear time, because head will only examine the first constructor of the list, like x : _, and leave the tail as an unevaluated thunk that represents the remainder of the sorting operation.
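For instance (an illustration of mine, not part of the original answer), the textbook lazy quicksort behaves exactly this way: head (qsort xs) does only O(n) work on average, because the unsorted right-hand partitions are left as unevaluated thunks.
qsort :: Ord a => [a] -> [a]
qsort [] = []
qsort (p : xs) = qsort [x | x <- xs, x < p]   -- only this side is forced by head
             ++ (p : qsort [x | x <- xs, x >= p])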
This means that performance is compositional: for example, if you have a long chain of operations on a stream of data, like sum . map (* 2) . filter (< 5), it looks like it would first filter all the elements, then map a function over them, then take the sum, producing a full intermediate list at every step. But what happens is that each element is only processed one at a time: given [1, 2, 6], this basically proceeds as follows, with all the steps happening incrementally:
total = 0
1 < 5 is true
1 * 2 == 2
total = 0 + 2 = 2
2 < 5 is true
2 * 2 == 4
total = 2 + 4 = 6
6 < 5 is false
result = 6
This is exactly how you would write a fast loop in an imperative language (pseudocode):
total = 0;
for x in xs {
    if (x < 5) {
        total = total + x * 2;
    }
}
The key point is that, thanks to laziness, this code has constant memory usage while processing the list. And there is nothing special inside map or filter that makes this happen: they can be entirely independent.
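Here is the pipeline traced above as a self-contained program (a minimal sketch; process and main are my own names):
-- Laziness interleaves the three stages, so no intermediate
-- list is ever built up in memory.
process :: [Int] -> Int
process = sum . map (* 2) . filter (< 5)

main :: IO ()
main = print (process [1, 2, 6])  -- prints 6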
For another example, the function and in the standard library computes the logical AND of a list, e.g. and [a, b, c] == a && b && c, and it's implemented simply as a fold: and = foldr (&&) True. The moment it reaches a False element in the input, it stops evaluating, simply because && is lazy in its right argument. Laziness gives you composition!
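For instance, this returns immediately even though the list is infinite:
ghci> and (True : False : repeat True)
False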
For a great paper on all this, read the famous Why Functional Programming Matters by John Hughes, which goes over the advantages of lazy functional programming (in Miranda, a forebear of Haskell) far better than I could.
Annotating a list with its indices temporarily uses an infinite list of indices:
zip [0..] ['a','b','c','d'] = [(0,'a'), (1,'b'), (2,'c'), (3,'d')]
Memoizing functions while maintaining purity (in this case this transformation causes an exponential speed increase, because the memo table is used recursively):
fib = (memo !!)
  where
    memo = map fib' [0..]  -- cache of *all* Fibonacci numbers (evaluated on demand)
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n-1) + fib (n-2)
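For example, this is instantaneous, where the naive doubly recursive version would take ages:
ghci> fib 50
12586269025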
Purely mocking programs with side-effects (free monads)
data IO a = Return a
          | GetChar (Char -> IO a)
          | PutChar Char (IO a)
Potentially non-terminating programs are represented with infinite IO structures; e.g. forever (putChar 'y') = PutChar 'y' (PutChar 'y' (PutChar 'y' ...))
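To make "purely mocking" concrete, here is a minimal sketch of an interpreter for this mock type (the name run and its exact semantics are my own, and in a real module you would have to hide Prelude's IO): it feeds the program a fixed input string and collects everything it prints.
run :: String -> IO a -> (Maybe a, String)
run _      (Return x)    = (Just x, "")   -- program finished
run []     (GetChar _)   = (Nothing, "")  -- ran out of input
run (c:cs) (GetChar k)   = run cs (k c)   -- supply the next input character
run input  (PutChar c k) = let (r, out) = run input k
                           in (r, c : out)  -- record the printed character
Because the output string is produced lazily, even the infinite program above can be observed: take 3 (snd (run "" (let ys = PutChar 'y' ys in ys))) yields "yyy".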
Tries: if you define a type approximately like the following:
data Trie a = Trie a (Trie a) (Trie a)
it can represent an infinite collection of values of type a, indexed by the naturals. Note that there is no base case for the recursion, so every Trie is infinite. But the element at index n can be accessed in O(log n) time, which means you can do something like this (using some functions in the inttrie library):
findIndices :: [Integer] -> Trie [Integer]
findIndices = foldr (\(i,x) -> modify x (i:)) (pure []) . zip [0..]
this builds an efficient "reverse lookup table", which, given any value in the list, can tell you at which indices it occurs; it both caches results and streams information as soon as it's available:
-- N.B. findIndices [0, 0,1, 0,1,2, 0,1,2,3, 0,1,2,3,4...]
> table = findIndices (concat [ [0..n] | n <- [0..] ])
> table `apply` 0
[0,1,3,6,10,15,21,28,36,45,55,66,78,91,...
all from a one-line infinite fold.
I'm sure there are more examples, there are so many cool things you can do.

Lazy Catalan Numbers in Haskell

How might I go about efficiently generating an infinite list of Catalan numbers? What I have now works reasonably quickly, but it seems to me that there should be a better way.
c 1 = [1]
c n = sum (zipWith (*) xs (reverse xs)) : xs
  where xs = c (n-1)
catalan = map (head . c) [1..]
I made an attempt at using fix instead, but the lambda isn't lazy enough for the computation to terminate:
catalan = fix (\xs -> xs ++ [sum (zipWith (*) xs (reverse xs))])
I realize (++) isn't ideal
Does such a better way exist? Can that function be made sufficiently lazy? There's an explicit formula for the nth, I know, but I'd rather avoid it.
The Catalan numbers [wiki] can be defined inductively with:
C₀ = 1 and Cₙ₊₁ = ((4n+2) × Cₙ) / (n+2).
So we can implement this as:
catalan :: Integral i => [i]
catalan = xs
  where xs = 1 : zipWith f [0..] xs
        f n cn = div ((4*n+2) * cn) (n+2)
For example:
Prelude> take 10 catalan
[1,1,2,5,14,42,132,429,1430,4862]
I'm guessing you're looking for a lazy, infinite, self-referential list of all the Catalan numbers using one of the basic recurrence relations. That's a common thing to do with the Fibonacci numbers after all. But it would help to specify the recurrence relation you mean, if you want answers to your specific question. I'm guessing this is the one you mean:
cat :: Integer -> Integer
cat 1 = 1
cat n = sum [ cat i * cat (n - i) | i <- [1 .. n - 1] ]
If so, the conversion to a self-referential form looks like this:
import Data.List (inits)
cats :: [Integer]
cats = 1 : [ sum (zipWith (*) pre (reverse pre)) | pre <- tail (inits cats) ]
This is quite a lot more complex than the fibonacci examples, because the recurrence refers to all previous entries in the list, not just a fixed small number of the most recent. Using inits from Data.List is the easiest way to get the prefix at each position. I used tail there because its first result is the empty list, and that's not helpful here. The rest is a straightforward rewrite of the recurrence relation that I don't have much to say about. Except...
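For reference, a quick check in GHCi:
*Main> take 10 cats
[1,1,2,5,14,42,132,429,1430,4862]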
It's going to perform pretty badly. I mean, it's better than the exponential recursive calls of my cat function, but there's a lot of list manipulation going on that's allocating and then throwing away a lot of memory cells. That recurrence relation is not a very good fit for the recursive structure of the list data type. You can explore a lot of ways to make it more efficient, but they'll all be pretty bad in the end. For this particular case, going to a closed-form solution is the way to go if you want performance.
Apparently, what you wanted is
> cats = 1 : unfoldr (\ fx -> let x = sum $ zipWith (*) fx cats in Just (x, x:fx)) [1]
> take 10 cats
[1,1,2,5,14,42,132,429,1430,4862]
This avoids the repeated reversing of the prefixes (as in the inits-based answer above) by unfolding with the reversed prefix as the state, consing each new element onto the state as well as producing it.
We don't have to maintain the non-reversed prefix at all: zipping the reversed prefix against the cats list itself takes care of limiting the zip to the prefix's length.
You did say you wanted to avoid the direct formula.
The Catalan numbers are best understood by their generating function, which satisfies the relation
f(t) = 1 + t f(t)^2
This can be expressed in Haskell as
f :: [Int]
f = 1 : convolve f f
for a suitable definition of convolve. It is helpful to factor out convolve, for many other counting problems take this form. For example, a generalized Catalan number enumerates ternary trees, and its generating function satisfies the relation
g(t) = 1 + t g(t)^3
which can be expressed in Haskell as
g :: [Int]
g = 1 : convolve g (convolve g g)
convolve can be written using Haskell primitives as
convolve :: [Int] -> [Int] -> [Int]
convolve xs = map (sum . zipWith (*) xs) . tail . scanl (flip (:)) []
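As a quick sanity check that f really enumerates the Catalan numbers:
*Main> take 10 f
[1,1,2,5,14,42,132,429,1430,4862]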
For these two examples and many other special cases, there are formulas that are quicker to evaluate. convolve is however more general, and cognitively more efficient. In a typical scenario, one has understood a counting problem in terms of a polynomial relation on its generating function, and now wants to compute some numbers in order to look them up in The On-Line Encyclopedia of Integer Sequences. One wants to get in and out, indifferent to complexity. What language will be least fuss?
If one has seen the iconic Haskell definition for the Fibonacci numbers
fibs :: [Int]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
then one imagines there must be a similar idiom for products of generating functions. That search is what brought me here.

Interesting puzzle in Haskell

Hello. While studying Haskell, I came across an exercise on the web that asks you to create a list, given an integer, in the way described below:
for example, if the integer is 3, then a list should be generated that contains the following:
[[3],[1,2],[2,1],[1,1,1]]
note
3=3
1+2=3
2+1=3
1+1+1=3
if the integer is 2, it would be:
[[2],[1,1]]
I cannot think of a way of implementing this, so can you provide me with any hints? I believe I must use a list comprehension, but I cannot think any further than that.
Always start with a type signature:
sums :: Int -> [[Int]]
Now, let's think about the recursion.
What is the base case? Can you think of a number for which the answer is trivial?
Let's say you've implemented your function and it works for all numbers under 10, so sums 9 for example returns the right answer. How would you implement sums 10?
Don't bother yourself with implementation details (e.g. list comprehension vs. filter and map) until you've answered these questions.
And another tip: Haskell programmers love to show off and write tiny point-free functions, but don't let that confuse you. Getting things to work is the important thing. It's better to have a working yet somewhat "ugly" solution than to stare at the screen looking for an elegant one.
Good luck!
Looks a bit like partitioning a list. A bit of googling turns up this
http://www.haskell.org/pipermail/beginners/2011-April/006832.html
partitions [] = [[]]
partitions (x:xs) = [[x] : p | p <- partitions xs]
                 ++ [(x:ys) : yss | (ys:yss) <- partitions xs]
which produces something like this
*Main> partitions "abc"
[["a","b","c"],["a","bc"],["ab","c"],["abc"]]
now all you have to do is take the lengths of the inner lists
*Main> map (map length) (partitions "abc")
[[1,1,1],[1,2],[2,1],[3]]
you can also change partitions to give you the result directly
partitions' 0 = [[]]
partitions' n = [1 : p | p <- partitions' (n-1)]
             ++ [(1+ys) : yss | (ys:yss) <- partitions' (n-1)]
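For example (the compositions come out in the opposite order to the question's example, but it's the same set):
*Main> partitions' 3
[[1,1,1],[1,2],[2,1],[3]]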

Functionally solving questions: how to use Haskell?

I am trying to solve one of the problems in H99:
Split a list into two parts; the length of the first part is given.
Do not use any predefined predicates.
Example:
> (split '(a b c d e f g h i k) 3)
( (A B C) (D E F G H I K))
And I can quickly come up with a solution:
split' :: [a] -> Int -> Int -> [a] -> [[a]]
split' [] _ _ _ = []
split' (x:xs) y z w = if y == z then [w, x:xs] else split' xs y (z+1) (w ++ [x])

split :: [a] -> Int -> [[a]]
split x y = split' x y 0 []
My question is: what I am doing seems like just rewriting the loop version in recursive form. Is this the right way to do things in Haskell? Isn't it just the same as imperative programming?
EDIT: Also, how do you generally avoid the extra function here?
It's convenient that you can often convert an imperative solution to Haskell, but you're right, you do usually want to find a more natural recursive statement. For this one in particular, reasoning in terms of base case and inductive case can be very helpful. So what's your base case? Why, when the split location is 0:
split x 0 = ([], x)
The inductive case can be built on that by prepending the first element of the list onto the result of splitting with n-1:
split (x:xs) n = (x : left, right)
  where (left, right) = split xs (n-1)
This may not perform wonderfully (it's probably not as bad as you'd think) but it illustrates my thought process when I first encounter a problem and want to approach it functionally.
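Note that because the pair is produced lazily, this version even copes with infinite lists when only the first part is demanded:
*Main> fst (split [1..] 3)
[1,2,3]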
Edit: Another solution relying more heavily on the Prelude might be:
split l n = (take n l, drop n l)
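(The Prelude in fact bundles exactly this pair as splitAt: split l n = splitAt n l.)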
It's not really the same as imperative programming: each function call avoids side effects; they're just simple expressions. But I do have a suggestion for your code.
split :: Int -> [a] -> ([a], [a])
split p xs = go p ([], xs)
  where go 0 (xs, ys) = (reverse xs, ys)
        go n (xs, y:ys) = go (n-1) (y : xs, ys)
Note how we've declared that we're returning exactly two things, ([a], [a]), rather than a list of lists (which is a bit misleading), and that we've confined our tail-recursive helper to local scope.
I'm also using pattern matching, which is the more idiomatic way to write recursive functions in Haskell: when go is called with a zero, the first case runs. It's generally more pleasant to write recursive functions that count down rather than up, since you can use pattern matching rather than if statements.
Finally, this is more efficient, since ++ is linear in the length of its first argument, which means the complexity of your original function is quadratic rather than linear. This method is also tail recursive, unlike Daniel's solution, which matters when processing large lists in full.
TLDR: Both versions are functional style, avoiding mutation, using recursion instead of loops. But the version I've presented is a little more Haskell-ish and slightly faster.
A word on tail recursion
This solution uses tail recursion which isn't always essential in Haskell but in this case is helpful when you use the resulting lists, but at other times is actually a bad thing. For example, map isn't tail recursive, but if it was you couldn't use it over infinite lists!
In this case, we can use tail recursion, since an integer is always finite. But, if we only use the first element of the list, Daniel's solution is much faster, since it produces the list lazily. On the other hand, if we use the whole list, my solution is much faster.
split' :: [a] -> Int -> ([a], [a])
split' [] _ = ([], [])
split' xs 0 = ([], xs)
split' (x:xs) n = (x : fst splitResult, snd splitResult)
  where splitResult = split' xs (n-1)
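This version produces its result lazily, so it also copes with infinite input when only part of the result is demanded, e.g.:
*Main> fst (split' (repeat 'x') 3)
"xxx"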
It seems you have already shown an example of a better solution.
I would recommend that you read SICP. You will come to see that the extra function is normal; there is also a widely used approach of hiding helper functions in a local scope (a let or where clause). The book may seem slow at first, but in its early chapters you will get used to the functional approach to problem solving.
There are tasks for which recursion is genuinely necessary. But notice that if you use tail recursion (which is so often praised without cause), what you get is just ordinary iteration, often with an "extra function" that hides the iteration variable (well, "variable" is not quite the right word; "argument" is more accurate).

Haskell lazy evaluation

If I call the following Haskell code
find_first_occurrence :: (Eq a) => a -> [a] -> Int
find_first_occurrence elem list = (snd . head) [x | x <- zip list [0..], fst x == elem]
with the arguments
'X' "abcdXkjdkljklfjdlfksjdljjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj"
how much of the zipped list [('a',0), ('b',1), ...] is going to be built?
UPDATE:
I tried to run
find_first_occurrence 10 [1..]
and it returns 9 almost instantly, so I guess it does use lazy evaluation, at least in simple cases? The answer is also computed "instantly" when I run
let f n = 100 - n
find_first_occurrence 10 (map f [1..])
Short answer: it will be built only up to the element you're searching for. This means you'll need to build the whole list only in the worst case, i.e. when no element satisfies the condition.
Long answer: let me explain why with a pair of examples:
ghci> head [a | (a,b) <- zip [1..] [1..], a > 10]
11
In this case, zip would produce an infinite list, but laziness lets Haskell build it only up to (11,11): as you can see, the execution does not diverge and actually gives us the correct answer.
Now, let me consider another issue:
ghci> find_first_occurrence 1 [0, 0, 1 `div` 0, 1]
*** Exception: divide by zero
ghci> find_first_occurrence 1 [0, 1, 1 `div` 0, 0]
1
it :: Int
(0.02 secs, 1577136 bytes)
Since the whole zipped list is not built, Haskell obviously will not evaluate every expression occurring in the list either, so when the element occurs before the 1 `div` 0, the function is evaluated correctly without raising exceptions: the division by zero never happens.
All of it.
Since StackOverflow won't let me post such a short answer: you can't get away with doing less work than looking through the whole list if the thing you're looking for isn't there.
Edit: The question now asks something much more interesting. The short answer is that we will build the list:
('a',0):('b',1):('c',2):('d',3):('X',4):<thunk>
(Actually, this answer is just the slightest bit subtle. Your type signature uses the monomorphic return type Int, which is strict in basically all operations, so all the numbers in the tuples above will be fully evaluated. There are certainly implementations of Num for which you would get something with more thunks, though.)
You can easily answer such a question by introducing undefineds here and there. In our case it is sufficient to change our inputs:
find_first_occurrence 'X' ("abcdX" ++ undefined)
You can see that it produces the result, which means it does not even look beyond the 'X' it found (otherwise it would have thrown an exception). Obviously, the zipped list cannot be built without looking at the original list.
Another (possibly less reliable) way to analyse laziness is to use the trace function from Debug.Trace:
> let find_first_occurrence elem list = (snd . head) [x | x <- map (\i -> trace (show i) i) $ zip list [0..], fst x == elem]
> find_first_occurrence 'X' "abcdXkjdkljklfjdlfksjdljjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj"
Prints
('a',0)
('b',1)
('c',2)
('d',3)
('X',4)
4
