Understanding the Limitations of Lazy Evaluation (Sieve of Eratosthenes)

In the Haskell Wiki article on prime numbers, the following implementation of the Sieve of Eratosthenes is described:
primes = 2 : 3 : minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- tail primes])
When doing...
primes !! 2
... how does Haskell know, in this specific situation, not to try every p in the tail of primes (i.e. 3, 5, 7, ...), and instead to subtract only the multiples of 3?
In other words: how does Haskell know that none of the other p's (or their multiples) can match 5, which is the eventual answer? Is there a good rule of thumb for when the compiler is smart enough to handle infinite cases?

(!!) only demands that primes be evaluated enough to find out that there are at least 3 elements, and what the third element is. To get that third element, we need to start evaluating the call to minus.
minus assumes that both its arguments are sorted. This is described in the docs, and is satisfied in the definition of primes. The first comparison minus performs is between 5 and 9, and this shows that 5 is the first element of the result. In the definition of minus this is the case LT -> x : loop xs (y:ys).
(!!) now has the third element of primes, so evaluation does not continue in primes or minus or unionAll. This back-and-forth between evaluation of subexpressions and pattern matching in the outer expressions is lazy evaluation.
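To try this out, here is a minimal sketch. It assumes simplified, hand-written versions of minus, union and unionAll for ordered lists; the real ones come from Data.List.Ordered in the data-ordlist package and are more elaborate, so treat these definitions as stand-ins for illustration rather than the library's code.

```haskell
-- Simplified stand-ins for the ordered-list helpers (assumptions for
-- illustration, not the data-ordlist implementations).

-- Elements of the first ordered list that do not occur in the second.
minus :: Ord a => [a] -> [a] -> [a]
minus (x:xs) (y:ys) = case compare x y of
  LT -> x : minus xs (y:ys)   -- x is smaller than everything left on the right
  EQ ->     minus xs ys
  GT ->     minus (x:xs) ys
minus xs _ = xs

-- Merge two ordered lists, dropping duplicates.
union :: Ord a => [a] -> [a] -> [a]
union (x:xs) (y:ys) = case compare x y of
  LT -> x : union xs (y:ys)
  EQ -> x : union xs ys
  GT -> y : union (x:xs) ys
union xs [] = xs
union [] ys = ys

-- Union of ordered lists whose heads are non-decreasing: the first head
-- is emitted without inspecting any of the other lists.
unionAll :: Ord a => [[a]] -> [a]
unionAll ((x:xs):t) = x : union xs (unionAll t)
unionAll ([]:t)     = unionAll t
unionAll []         = []

primes :: [Integer]
primes = 2 : 3 : minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- tail primes])
```

Loaded into GHCi, primes !! 2 evaluates to 5 after a single comparison of 5 with 9, and take 10 primes yields the first ten primes.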

Actually, the crux is in the implementation of unionAll. minus just pulls elements from its right argument one by one, unaware of how they are produced (it assumes, of course, that both its arguments are non-decreasing).
First, let's re-write it as
primes = 2 : ps
ps = 3 : t
t = minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- ps])
-- primes !! 2 == ps !! 1 == head t
= minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- (3 : t)])
= minus [5,7..] (unionAll ([9, 15..] : [[p*p, p*p+2*p..] | p <- t]))
Now unionAll is smart: it relies on the assumption (here, fact) that in unionAll xs it holds that (map head xs) are non-decreasing.
As such, it knows it does not have to compare 9 with anything! So it just produces it unconditionally (you can consult its definition to check it for yourself):
= minus [5,7..]
        (9 : union [15, 21..] (unionAll ........))
Thus minus has something to compare the 5 and the 7 with, and produces:
= 5 : 7 : minus [9,11..]
                (9 : union [15, 21..] (unionAll ........))
All this from knowing just the first odd prime, 3.
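The claim that only the head 9 is needed can be checked directly. The sketch below assumes a simplified minus for ordered lists (a stand-in, not the data-ordlist source) and gives it a right argument whose tail is undefined, so any attempt to look past the 9 would crash.

```haskell
-- A simplified minus on ordered lists (an assumption for illustration).
minus :: Ord a => [a] -> [a] -> [a]
minus (x:xs) (y:ys) = case compare x y of
  LT -> x : minus xs (y:ys)
  EQ ->     minus xs ys
  GT ->     minus (x:xs) ys
minus xs _ = xs

-- The right argument is known only up to its head, 9. Its tail is
-- deliberately undefined and would raise an error if it were forced.
firstTwo :: [Integer]
firstTwo = take 2 (minus [5,7..] (9 : undefined))
```

Evaluating firstTwo gives [5,7]: both comparisons are against 9, and the undefined tail is never touched.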

Related

In Sieve of Eratosthenes implementation with Haskell, why are the multiples of 3,5,7.. not removed from the list?

I'm currently self-studying the book The Haskell Road to Logic, Maths and Programming by Doets and Eijck, and I'm in chapter 3.
In this chapter, the authors provide Haskell code implementing the Sieve of Eratosthenes. I did not like their implementation, so I tried to write my own; however, my version only removes the multiples of 2, and I couldn't figure out why. Here is the code:
sieve :: [Int] -> [Int]
sieve (0:xs) = sieve xs
sieve (x:xs) = x : sieve (mark x 2 xs)
  where
    mark :: Int -> Int -> [Int] -> [Int]
    mark n k (y:ys)
      | y == n*k  = 0 : (mark n (k+1) ys)
      | otherwise = y : (mark n (k) ys)
and the output is
*Ch3> sieve [2..]
[2,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,...
So, why does the code not remove the multiples of the other numbers, such as 3, 5, 7, ...?

Short answer: The counter k in mark doesn't increment for n > 2.
mark 2 2 [3..] correctly marks the evens (each is replaced by 0, which sieve then skips), so the next step is effectively sieve [3,5..], which amounts to 3:sieve (mark 3 2 [5,7..]), so let's see what happens here.
mark 3 2 [5,7..] (presumably) attempts to remove all the multiples of 3 from the list, but it does this step by step, first attempting to remove 6. However, as the list only contains odd numbers, 6 never shows up, so the first guard always fails and k is never incremented. The code keeps checking against 6 and never moves on to 9.
Similarly, 25 is never removed, since the code only tries to remove 2*5 from the list.
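One possible repair, sketched below, keeps the structure of the question's code and adds two guards to mark: one that passes over entries already crossed out (0), and one that advances k when the list has already moved past n*k. This is only an illustration of the fix the answer implies, not code from the book or from the answer; the extra empty-list clauses are added for totality.

```haskell
sieve :: [Int] -> [Int]
sieve []     = []
sieve (0:xs) = sieve xs                      -- skip crossed-out entries
sieve (x:xs) = x : sieve (mark x 2 xs)
  where
    mark :: Int -> Int -> [Int] -> [Int]
    mark _ _ [] = []
    mark n k (y:ys)
      | y == 0    = 0 : mark n k ys          -- already crossed out earlier
      | y >  n*k  = mark n (k+1) (y:ys)      -- n*k is behind us; try the next multiple
      | y == n*k  = 0 : mark n (k+1) ys      -- cross out this multiple
      | otherwise = y : mark n k ys          -- y < n*k: keep y and wait
```

With this change, take 10 (sieve [2..]) yields [2,3,5,7,11,13,17,19,23,29].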

Haskell Does Not Evaluate Lazily takeWhile

isqrt :: Integer -> Integer
isqrt = floor . sqrt . fromIntegral
primes :: [Integer]
primes = sieve [2..] where
  sieve (p:ps) = p : sieve [x | x <- ps, x `mod` p > 0]
primeFactors :: Integer -> [Integer]
primeFactors n = takeWhile (< n) [x | x <- primes, n `mod` x == 0]
Here is my code. I think you can guess what I am trying to do: produce the list of prime factors of a given number using an infinite list of primes. But this code does not evaluate lazily.
When I load it in ghci with :l mycode.hs and enter primeFactors 24, the output is [2, 3 (with the cursor blinking after it), and the Prelude> prompt never comes back. What am I doing wrong?
Thanks.

takeWhile never terminates for composite arguments. If n is composite, it has no prime factors >= n, so takeWhile never sees an element that fails the (< n) test and will just sit there.
Apply takeWhile to the primes list and then filter the result with n `mod` x, like this:
primeFactors n = [x | x <- takeWhile (<= n) primes, n `mod` x == 0]
(<= is used instead of < for maximum correctness, so that prime factors of a prime number would consist of that number).
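Put together with the primes list from the question, the suggestion can be sketched as a complete module. The empty-list clause in sieve is an addition for totality and is never reached on the infinite input.

```haskell
primes :: [Integer]
primes = sieve [2..]
  where
    sieve (p:ps) = p : sieve [x | x <- ps, x `mod` p > 0]
    sieve []     = []

-- Bound the *input* (the primes) first, then keep the divisors of n.
-- The takeWhile stops at the first prime greater than n, so the result is finite.
primeFactors :: Integer -> [Integer]
primeFactors n = [x | x <- takeWhile (<= n) primes, n `mod` x == 0]
```

Here primeFactors 24 returns [2,3] and primeFactors 13 returns [13]. Note that this lists each distinct prime factor once, as in the question's version.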

Have an illustration of what happens:
http://sketchtoy.com/67338195

Your problem isn't directly takeWhile, but rather the list comprehension.
[x | x <- primes, n `mod` x == 0]
For n = 24, we get 24 `mod` 2 == 0 and 24 `mod` 3 == 0, so the value of this list comprehension starts with 2 : 3 : .... But consider the ... part.
The list comprehension has to keep pulling values from primes and checking 24 `mod` x == 0. Since there are no more prime factors of 24 nothing will ever pass that test and get emitted as the third value of the list comprehension. But since there's always another prime to test, it will never stop and conclude that the remaining tail of the list is empty.
Because this is lazily evaluated, if you only ever ask for the first two elements of this list then you're fine. But if your program ever needs the third one (or even just to know whether or not there is a third element), then the list comprehension will just spin forever trying to come up with one.
takeWhile (< 24) keeps pulling elements from its argument until it finds one that is not < 24. 2 and 3 both pass that test, so takeWhile (< 24) does need to know what the third element of the list comprehension is.
But it's not really a problem with takeWhile; the problem is that you've written a list comprehension to find all of the prime factors (and nothing else), and are then trying to apply takeWhile to its results to cut off the infinite exploration of all the higher primes that can't possibly be factors. That doesn't really make sense if you stop to think about it; by definition anything that isn't a prime factor can't be an element of that list, so you can't filter out the non-factors larger than n from that list. Instead you need to restrict the input to that list comprehension so that it doesn't try to explore an infinite space, as n.m's answer shows.
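The difference between demanding two elements and demanding a third can be probed with a small sketch. The name factorsUnbounded is hypothetical; it is just the question's comprehension given a name.

```haskell
primes :: [Integer]
primes = sieve [2..]
  where
    sieve (p:ps) = p : sieve [x | x <- ps, x `mod` p > 0]
    sieve []     = []

-- The comprehension from the question: conceptually finite, but nothing
-- ever tells the evaluator that no further prime can divide n.
factorsUnbounded :: Integer -> [Integer]
factorsUnbounded n = [x | x <- primes, n `mod` x == 0]

-- Safe: only the first two elements are demanded.
firstTwoFactors :: [Integer]
firstTwoFactors = take 2 (factorsUnbounded 24)

-- Also safe: the second element, 3, already fails (< 3), so takeWhile stops.
belowThree :: [Integer]
belowThree = takeWhile (< 3) (factorsUnbounded 24)
```

By contrast, takeWhile (< 24) (factorsUnbounded 24) has to ask for a third element after 2 and 3 both pass, and that request never returns.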

Project Euler 2: Haskell implementation confusion

I am using GHCi to try to solve problem 2 on Project Euler.
http://projecteuler.net/problem=2
I defined the infinite list fibs as:
Prelude> let fibs = 1 : 2 : zipWith (+) fibs (tail fibs)
I tried using list comprehensions in the following manner:
Prelude> [x | x<-fibs, x `mod` 2 == 0, x<4000000]
[1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765,10946,17711,28657,46368,75025,121393,196418,317811,514229,832040,1346269,2178309,3524578
Prelude> sum $ [x | x <- [1..], x `mod` 2 == 0, x<4000000]
But the shell hangs on the second command. I am confused as to why the list comprehension can build the list but the sum function cannot process it.
I found that a working solution is
Prelude> sum $ filter even $ takeWhile (<= 4000000) fibs
But once again I am confused as to why that works when the list comprehension method doesn't.

When evaluating
[x | x<-fibs, x `mod` 2 == 0, x<4000000]
your program/the Haskell compiler does not know that this list is actually finite. When you call a function that consumes a list entirely such as sum, it just keeps generating Fibonacci numbers, testing them for x<4000000 (and failing every time after a certain point).
In fact, the Haskell compiler cannot know in the general case whether a comprehension expression denotes a finite list; the problem is undecidable.
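A small sketch makes the point concrete. The comprehension below matches exactly 11 Fibonacci numbers, so a consumer that asks for at most 11 elements terminates, while one that needs to reach the end of the list (such as sum or length) does not. The names evenFibs and firstEleven are hypothetical.

```haskell
fibs :: [Integer]
fibs = 1 : 2 : zipWith (+) fibs (tail fibs)

-- Looks finite to a human, but the evaluator keeps testing ever larger
-- Fibonacci numbers against x < 4000000 once the matches run out.
evenFibs :: [Integer]
evenFibs = [x | x <- fibs, x `mod` 2 == 0, x < 4000000]

-- Demanding only the 11 elements that exist is fine.
firstEleven :: [Integer]
firstEleven = take 11 evenFibs
```

sum firstEleven is 4613732, whereas sum evenFibs (or even take 12 evenFibs) would run forever.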

It hangs on the first command as well, doesn't it? :)
A list comprehension with a test is equivalent to a filter expression, not a takeWhile expression:
sum $ [x | x <- [1..], x `mod` 2 == 0, x<4000000] ===
sum $ filter (<4000000) $ filter even $ [1..]
This clearly describes a non-terminating computation.
Haskell comprehensions have no equivalent of Prolog's cut (i.e. "!"). In Prolog, we are able to stop a computation "from within the test":
sum(I,Acc,Res):- I >= 4000000, !, Res = Acc ;
Acc2 is Acc+I, sum(I+1,Acc2,Res).
but in Haskell we must emulate the cut by using takeWhile, or take.
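As a sketch of that emulation: the bound moves out of the comprehension's guards and into a takeWhile on the generator, so the input itself is finite. The names answer, viaComprehension and viaFilters are hypothetical, and the second pair of definitions only checks the filter equivalence on a finite list.

```haskell
fibs :: [Integer]
fibs = 1 : 2 : zipWith (+) fibs (tail fibs)

-- takeWhile plays the role of the cut: it ends the generator at the
-- first Fibonacci number that is not below the limit.
answer :: Integer
answer = sum [x | x <- takeWhile (< 4000000) fibs, even x]

-- On a finite input, guards in a comprehension behave like nested filters.
viaComprehension, viaFilters :: [Integer]
viaComprehension = [x | x <- [1..30], even x, x < 20]
viaFilters       = filter (< 20) (filter even [1..30])
```

Here answer evaluates to 4613732, and the two finite lists are equal.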

Replace [1..] with fibs, and your second variant will be equivalent to the first.
EDIT: oh sorry, it won’t. But it will be “less” wrong :) I should be paying more attention…

Lazy evaluation of terms in an infinite list in Haskell

I am curious about the runtime performance of an infinite list like
the one below:
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
This will create an infinite list of the fibonacci sequence.
My question is that if I do the following:
takeWhile (<5) fibs
how many times does fibs evaluate each term in the list? It seems
that since takeWhile checks the predicate function for each item in
the list, the fibs list will evaluate each term multiple times. The
first 2 terms are given for free. When takeWhile wants to evaluate
(<5) on the 3rd element, we will get:
1 : 1 : zipWith (+) [(1, 1)] => 1 : 1 : 2
Now, once takeWhile wants to evaluate (<5) on the 4th element: the
recursive nature of fibs will construct the list again like the
following:
1 : 1 : zipWith (+) [(1, 1), (1, 2)] => 1 : 1 : 2 : 3
It would seem that the 3rd element needs to be computed again when we
want to evaluate the value of the 4th element. Furthermore, if the
predicate in takeWhile is large, it would indicate the function is
doing more work than is needed since it is evaluating each preceding
element in the list multiple times. Is my analysis here correct or is
Haskell doing some caching to prevent multiple evaluations here?

This is a self-referential, lazy data structure, where "later" parts of the structure refer to earlier parts by name.
Initially, the structure is just a computation with unevaluated pointers back to itself. As it is unfolded, values are created in the structure. Later references to already-computed parts of the structure are able to find the value already there waiting for them. No need to re-evaluate the pieces, and no extra work to do!
The structure in memory begins as just an unevaluated pointer. Once we look at the first value, it looks like this:
> take 2 fibs
(a pointer to a cons cell, pointing at '1', with a tail holding the second '1' and a pointer to a function that holds references back to fibs and to the tail of fibs)
Evaluating one more step expands the structure and slides the references along.
And so we go unfolding the structure, each time yielding a new unevaluated tail, which is a closure holding references back to 1st and 2nd elements of the last step. This process can continue infinitely :)
And because we're referring to prior values by name, GHC happily retains them in memory for us, so each item is evaluated only once.

Illustration:
module TraceFibs where
import Debug.Trace
fibs :: [Integer]
fibs = 0 : 1 : zipWith tadd fibs (tail fibs)
  where
    tadd x y = let s = x+y
               in trace ("Adding " ++ show x ++ " and " ++ show y
                          ++ " to obtain " ++ show s)
                        s
Which produces
*TraceFibs> fibs !! 5
Adding 0 and 1 to obtain 1
Adding 1 and 1 to obtain 2
Adding 1 and 2 to obtain 3
Adding 2 and 3 to obtain 5
5
*TraceFibs> fibs !! 5
5
*TraceFibs> fibs !! 6
Adding 3 and 5 to obtain 8
8
*TraceFibs> fibs !! 16
Adding 5 and 8 to obtain 13
Adding 8 and 13 to obtain 21
Adding 13 and 21 to obtain 34
Adding 21 and 34 to obtain 55
Adding 34 and 55 to obtain 89
Adding 55 and 89 to obtain 144
Adding 89 and 144 to obtain 233
Adding 144 and 233 to obtain 377
Adding 233 and 377 to obtain 610
Adding 377 and 610 to obtain 987
987
*TraceFibs>

When something is evaluated in Haskell, it stays evaluated, as long as it's referenced by the same name [1].
In the following code, the list l is only evaluated once (which might be obvious):
let l = [1..10]
print l
print l -- None of the elements of the list are recomputed
Even if something is partially evaluated, that part stays evaluated:
let l = [1..10]
print $ take 5 l -- Evaluates l to [1, 2, 3, 4, 5, _]
print l -- 1 to 5 is already evaluated; only evaluates 6..10
In your example, when an element of the fibs list is evaluated, it stays evaluated. Since the arguments to zipWith reference the actual fibs list, it means that the zipping expression will use the already partially computed fibs list when computing the next elements in the list. This means that no element is evaluated twice.
[1] This is of course not strictly required by the language semantics, but in practice this is always the case.

Think of it this way. The variable fibs is a pointer to a lazy value. (You can think of a lazy value underneath as a data structure like (not real syntax) Lazy a = IORef (Unevaluated (IO a) | Evaluated a); i.e. it starts out as unevaluated with a thunk; then when it is evaluated it "changes" to something that remembers the value.) Because the recursive expression uses the variable fibs, its references point to the same lazy value (they "share" the data structure). The first time someone evaluates fibs, it runs the thunk to get the value and that value is remembered. And because the recursive expression points to the same lazy data structure, when it is evaluated it will see the evaluated value already. As they traverse the lazy "infinite list", there will only be one "partial list" in memory; zipWith will have two pointers to "lists" which are simply pointers to previous members of the same "list", due to the fact that it started with pointers to the same list.
Note that this is not really "memoizing"; it's just a consequence of referring to the same variable. There is generally no "memoizing" of function results (the following will be inefficient):
fibs () = 0 : 1 : zipWith tadd (fibs ()) (tail (fibs ()))
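The contrast can be sketched side by side. The name fibsF is hypothetical, and plain (+) is used in place of tadd to keep the sketch self-contained. Both definitions denote the same list; they differ only in how much work is repeated, and an optimising compiler may or may not recover the sharing in the second one.

```haskell
-- Shared: fibs names one list value, and both arguments of zipWith
-- point back into that same, partially evaluated list.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- Not shared (at least without optimisation): every call fibsF ()
-- may build a fresh list, so earlier elements are recomputed again and
-- again and the cost grows exponentially with the index.
fibsF :: () -> [Integer]
fibsF () = 0 : 1 : zipWith (+) (fibsF ()) (tail (fibsF ()))
```

For small prefixes both are usable, e.g. take 15 fibs and take 15 (fibsF ()) agree, but only fibs stays cheap as the index grows.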

Why is this Haskell code snippet not infinitely recursive?

To help me learn Haskell, I am working through the problems on Project Euler. After solving each problem, I check my solution against the Haskell wiki in an attempt to learn better coding practices. Here is the solution to problem 3:
primes = 2 : filter ((==1) . length . primeFactors) [3,5..]
primeFactors n = factor n primes
  where
    factor n (p:ps)
      | p*p > n        = [n]
      | n `mod` p == 0 = p : factor (n `div` p) (p:ps)
      | otherwise      = factor n ps
problem_3 = last (primeFactors 317584931803)
My naive reading of this is that primes is defined in terms of primeFactors, which is defined in terms of primes. So evaluating primeFactors 9 would follow this process:
1. Evaluate factor 9 primes.
2. Ask primes for its first element, which is 2.
3. Ask primes for its next element.
4. As part of this process, evaluate primeFactors 3.
5. Ask primes for its first element, which is 2.
6. Ask primes for its next element.
7. As part of this process, evaluate primeFactors 3.
...
In other words, steps 2-4 would repeat infinitely. Clearly I am mistaken, as the algorithm terminates. What mistake am I making here?

primeFactors only ever reads up to the square root of the number it's evaluating. It never looks further in the list, which means it never "catches up" to the number it's testing for inclusion in the list. Because Haskell is lazy, this means that the primeFactors test does terminate.
The other thing to remember is that primes isn't a function that evaluates to a list each time you access it, but rather a list that's constructed lazily. So once the 15th element has been accessed once, accessing it a second time is "free" (i.e. it doesn't require any further calculation).

Kevin's answer is satisfactory, but allow me to pinpoint the flaw in your logic. It is #6 that is wrong. So we're evaluating primeFactors 3:
primeFactors 3 ==>
factor 3 primes ==>
factor 3 (2 : THUNK) ==>
2*2 > 3 == True ==>
[3]
The THUNK need never be evaluated to determine that primeFactors 3 is [3].

primeFactors 3 doesn't ask primes for its next element, only the first one, because 2*2 is greater than 3 already
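This can be checked with a sketch in which factor is hoisted to the top level (a change from the question's where clause, made only so it can be called directly) and handed a list whose tail is undefined. The empty-list clause and the name probe are additions for illustration.

```haskell
primes :: [Integer]
primes = 2 : filter ((== 1) . length . primeFactors) [3,5..]

primeFactors :: Integer -> [Integer]
primeFactors n = factor n primes

factor :: Integer -> [Integer] -> [Integer]
factor n (p:ps)
  | p*p > n        = [n]                            -- no prime <= sqrt n divides n
  | n `mod` p == 0 = p : factor (n `div` p) (p:ps)
  | otherwise      = factor n ps
factor n []        = [n]                            -- not reached with the infinite list

-- Only the head of the list is inspected: the undefined tail plays the
-- role of the THUNK and would raise an error if it were ever forced.
probe :: [Integer]
probe = factor 3 (2 : undefined)
```

probe evaluates to [3] without touching the tail, primeFactors 9 gives [3,3], and take 5 primes gives [2,3,5,7,11], so the mutual recursion between primes and primeFactors does terminate.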
