Why is this Haskell code snippet not infinitely recursive? - haskell

To help me learn Haskell, I am working through the problems on Project Euler. After solving each problem, I check my solution against the Haskell wiki in an attempt to learn better coding practices. Here is the solution to problem 3:
primes = 2 : filter ((==1) . length . primeFactors) [3,5..]
primeFactors n = factor n primes
where
factor n (p:ps)
| p*p > n = [n]
| n `mod` p == 0 = p : factor (n `div` p) (p:ps)
| otherwise = factor n ps
problem_3 = last (primeFactors 317584931803)
My naive reading of this is that primes is defined in terms of primeFactors, which is defined in terms of primes. So evaluating primeFactors 9 would follow this process:
Evaluate factor 9 primes.
Ask primes for its first element, which is 2.
Ask primes for its next element.
As part of this process, evaluate primeFactors 3.
Ask primes for its first element, which is 2.
Ask primes for its next element.
As part of this process, evaluate primeFactors 3.
...
In other words, steps 2-4 would repeat infinitely. Clearly I am mistaken, as the algorithm terminates. What mistake am I making here?

primeFactors only ever reads up to the square root of the number it's evaluating. It never looks further in the list, which means it never "catches up" to the number it's testing for inclusion in the list. Because Haskell is lazy, this means that the primeFactors test does terminate.
The other thing to remember is that primes isn't a function that evaluates to a list each time you access it, but rather a list that's constructed lazily. So once the 15th element has been accessed once, accessing it a second time is "free" (e.g. it doesn't require any further calculation).

Kevin's answer is satisfactory, but allow me to pinpoint the flaw in your logic. It is #6 that is wrong. So we're evaluating primeFactors 3:
primeFactors 3 ==>
factor 3 primes ==>
factor 3 (2 : THUNK) ==>
2*2 > 3 == True ==>
[3]
The THUNK need never be evaluated to determine that the primeFactor 3 is [3].

primeFactors 3 doesn't ask primes for its next element, only the first one, because 2*2 is greater than 3 already

Related

Primes in Haskell

I'm learning Haskell, and I've tried to generate an infinite list of primes, but I can't understand what my function is doing wrong.
The function:
prime = 2:3:filter (\x -> all (\y -> (mod x y) > 0) (init prime)) [5..]
I think it's the init prime, but the strange thing is that even if I set an upper bound to the range (5..10 for example), the function loops forever and never gets any result for prime !! 2
Can you please tell me what I'm doing wrong?
Well, for one let's look at what init does for a finite list:
init [1] == []
init [1,2] == [1]
init [1,2,3] == [1,2]
ok, so it gives us all but the last element of the list.
So what's init primes? Well, prime without the last element. Hopefully if we implemented prime correctly it shouldn't have a last element (because there are infinitely many primes!), but more importantly we don't quite need to care yet because we don't have the full list for now anyway - we only care about the first couple of elements after all, so for us it's pretty much the same as just prime itself.
Now, looking at all: What does this do? Well, it takes a list and a predicate and tells us if all the elements of the list satisfy the predicate:
all (<5) [1..4] == True
all even [1..4] == False
it even works with infinite lists!
all (<5) [1..] == False
so what's going on here? Well, here's the thing: It does work with infinite lists... but only if we can actually evaluate the list up to the first element of the list that violates the predicate! Let's see if this holds true here:
all (\y -> (mod 5 y) > 0) (init prime)
so to find out if 5 is a prime number, we'd have to check if there's a number in prime minus the last element of prime that divides it. Let's see if we can do that.
Now let's look at the definition of prime, we get
all (\y -> (mod 5 y) > 0) (2:3:filter (\x -> all (\y -> (mod x y) > 0) (init prime)) [5..])
So to determine whether 5 is a prime number, we only have to check if it's:
divisible by 2 - it's not, let's continue
divisible by 3 - still no
divisible by ...? Well, we're in the process of checking what the 3rd prime is so we don't know yet...
and there's the crux of the problem. With this logic, to determine the third prime number you need to know the third prime number! Of course logically, we actually don't want to check this at all, rather we only need to check if any of the smaller prime numbers are divisors of the current candidate.
So how do we go about doing that? Well, we'll have to change our logic unfortunately. One thing we can do is try to remember how many primes we already have, and only take as many as we need for our comparison:
prime = 2 : 3 : morePrimes 2 [5..]
morePrimes n (x:xs)
| all (\y -> mod x y > 0) (take n prime) = x : morePrimes (n+1) xs
| otherwise = morePrimes n xs
so how does this work? Well, it basically does what we were just talking about: We remember how many primes we already have (starting at 2 because we know we have at least [2,3] in n. We then check if our next prime is divisible by any of the of n primes we already know by using take n, and if it is we know it's our next prime and we need to increment n - otherwise we just carry on.
There's also the more well known form inspired by (although not quite the same as) the Sieve of Eratosthenes:
prime = sieve [2..] where
sieve (p:xs) = p : sieve (filter (\x -> mod x p > 0) xs)
so how does this work? Well, again with a similar idea: We know that the next prime number needs to be non-divisible by any previous prime number. So what do we do? Well, starting at 2 we know that the first element in the list is a prime number. We then throw away every number divisible by that prime number using filter. And afterwards, the next item in the list is going to be a prime number again (because we didn't throw it away), so we can repeat the process.
Neither of these are one liners like the one you were hoping for though.
If the code in the other answer is restructured under the identity
[take n primes | n <- [0..]] == inits primes
eventually we get
import Data.List
-- [ ([], 2), ([2], 3), ([2,3], 5), ... ]
primes = 2 : [ c | (ps, p) <- zip (inits primes) primes,
c <- take 1 [c | c <- [p+1..],
and [mod c p > 0 | p <- ps]]]
Further improving it algorithmically, it becomes
primes = 2 : [ c | (ps, r:q:_) <- zip (inits primes) -- [] [3,4,...]
(tails $ 3 : map (^2) primes), -- [2] [4,9,...]
c <- [r..q-1], and [mod c p > 0 | p <- ps]] -- [2,3] [9,25,...]

Understanding the Limitations of Lazy Evaluation (Sieve of Eratosthenes)

In the Haskell Wiki article on prime numbers, the following implementation of the Sieve of Eratosthenes is described:
primes = 2 : 3 : minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- tail primes])
When doing...
primes !! 2
... how does Haskell recognize in this specific situation not to try all p's in the tail of primes (a.k.a [3..]), but instead only does a set minus with 3?
In other words: how does Haskell know that any of the other p's (or multiples thereof) will not match 5, which is the eventual answer. Is there a good rule of thumb to know when the compiler is smart enough to handle infinite cases?
(!!) only demands that primes be evaluated enough to find out that there are at least 3 elements, and what the third element is. To get that third element, we need to start evaluating the call to minus.
minus assumes that both its arguments are sorted. This is described in the docs, and is satisfied in the definition of primes. The first comparison minus performs is between 5 and 9, and this shows that 5 is the first element of the result. In the definition of minus this is the case LT -> x : loop xs (y:ys).
(!!) now has the third element of primes, so evaluation does not continue in primes or minus or unionAll. This back-and-forth between evaluation of subexpressions and pattern matching in the outer expressions is lazy evaluation.
Actually, the crux is in the implementation of unionAll. minus just pulls elements from its right argument one by one unawares (it assumes both its arguments are non-decreasing of course).
First, let's re-write it as
primes = 2 : ps
ps = 3 : t
t = minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- ps])
-- primes !! 2 == ps !! 1 == head t
= minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- (3 : t)])
= minus [5,7..] (unionAll ([9, 15..] : [[p*p, p*p+2*p..] | p <- t]))
Now unionAll is smart: it relies on the assumption (here, fact) that in unionAll xs it holds that (map head xs) are non-decreasing.
As such, it knows it does not have to compare 9 with anything! So it just produces it unconditionally (you can consult its definition to check it for yourself):
= minus [5,7..]
(9 : union [15, 21..] (unionAll ........))
Thus minus has something to compare the 5 and the 7 with, and produces:
= 5 : 7 : minus [9,11..]
(9 : union [15, 21..] (unionAll ........))
All this from knowing just the first odd prime, 3.

Haskell Does Not Evaluate Lazily takeWhile

isqrt :: Integer -> Integer
isqrt = floor . sqrt . fromIntegral
primes :: [Integer]
primes = sieve [2..] where
sieve (p:ps) = p : sieve [x | x <- ps, x `mod` p > 0]
primeFactors :: Integer -> [Integer]
primeFactors n = takeWhile (< n) [x | x <- primes, n `mod` x == 0]
Here is my code. I think you guessed what I am trying to do: A list of prime factors of a given number using infinite list of prime numbers. But this code does not evaluate lazily.
When I use ghci and :l mycode.hs and enter primeFactors 24, the result is [2, 3 ( and the cursor constantly flashing there) there isn't a further Prelude> prompt. I think there is a problem there. What am I doing wrong?
Thanks.
takeWhile never terminates for composite arguments. If n is composite, it has no prime factors >= n, so takeWhile will just sit there.
Apply takeWhile to the primes list and then filter the result with n mod x, like this:
primeFactors n = [x | x <- takeWhile (<= n) primes, n `mod` x == 0]
(<= is used instead of < for maximum correctness, so that prime factors of a prime number would consist of that number).
Have an illustration of what happens:
http://sketchtoy.com/67338195
Your problem isn't directly takeWhile, but rather the list comprehension.
[x | x <- primes, n `mod` x == 0]
For n = 24, we get 24 `mod` 2 == 0 and 24 `mod` 3 == 0, so the value of this list comprehension starts with 2 : 3 : .... But consider the ... part.
The list comprehension has to keep pulling values from primes and checking 24 `mod` x == 0. Since there are no more prime factors of 24 nothing will ever pass that test and get emitted as the third value of the list comprehension. But since there's always another prime to test, it will never stop and conclude that the remaining tail of the list is empty.
Because this is lazily evaluated, if you only ever ask for the first two elements of this list then you're fine. But if your program ever needs the third one (or even just to know whether or not there is a third element), then the list comprehension will just spin forever trying to come up with one.
takeWhile (< 24) keeps pulling elements from its argument until it finds one that is not < 24. 2 and 3 both pass that test, so takeWhile (< 24) does need to know what the third element of the list comprehension is.
But it's not really a problem with takeWhile; the problem is that you've written a list comprehension to find all of the prime factors (and nothing else), and then trying to use a filter on the results of that to cut off the infinite exploration of all the higher primes that can't possibly be factors. That doesn't really make sense if you stop to think about it; by definition anything that isn't a prime factor can't be an element of that list, so you can't filter out the non-factors larger than n from that list. Instead you need to filter the input to that list comprehension so that it doesn't try to explore an infinite space, as #n.m's answer shows.

Using non-deterministic list monad to find long Collatz sequences

I wrote the following code to solve Project Euler's No. 14:
The following iterative (Collatz) sequence is defined for the set of positive integers:
n → n/2 (n is even)
n → 3n + 1 (n is odd)
Q: Which starting number, under one million, produces the longest chain?
And my code:
collatz :: Integer -> [Integer]
collatz 1 = [1]
collatz n =
filter (< 1000000) prev >>= poss
where prev = collatz (n - 1)
poss :: Integer -> [Integer]
poss prev
| even prev && prev `mod` 3 == 1 && (prev - 1) `div` 3 > 1 = [2 * prev, (prev - 1) `div` 3]
| otherwise = [2 * prev]
Where collatz n returns a list of numbers that will generate a Collatz chain of length n. The problem is, I can only either not restrict the result or restrict the whole chain, instead of only the seed number, to be under 1000,000. Is it possible to use this model to solve the problem at all?
I think that this approach - while interesting - is fundamentally doomed. Suppose I discover that all the seeds which result in a chain of length 500 are above 2,000,000. How can I know that I won't find that in three more steps there's a seed under 1,000,000 that gets me there? I see no way to know when you're done.
The only viable approach I see to this problem is to compute the collatz length for every number from 1 to 999,999 and then do something like:
main :: IO ()
main = do
let collatzMax = maximumBy (compare `on` collatzLength) [1..999999]
print collatzMax
On the other hand, this provides a great opportunity to learn about CAFs since the function collatzLength could be naively defined as:
collatzLength 1 = 1
collatzLength n | n `mod` 2 == 0 = 1 + collatzLength (n `div` 2)
collatzLength n = 1 + collatzLength (3 * n + 1)
And that kind of recursion screams out for a CAF.
Sure, there are memoization modules that will go and build the CAF for you, but building one yourself is a useful exercise. It's a whole little mini-course in lazy infinitely-recursive data structures.
If that defeats you, you can glance at this spoiler of how to use a CAF and then rewrite it using a different data structure. (what about a 10-way tree instead of a binary tree? What about traversing the tree in a different order? Can you remove the call to showIntAtBase?)
Your idea is interesting, although not the most efficient one. It could be worth trying, although it'll be probably memory intensive. Some thoughts:
As some chains can go over 1000000, so you can't just filter out everything less in collatz. You need to keep all the numbers in each pass.
Calling collatz this way is inefficient, as it computes the sets all over again. Making it an infinite list that shares values would be more efficient:
collatz :: [[Integer]]
collatz = [1] : map (>>= poss) collatz
You need to figure out when you're done. For this you'd need to go through the number lists generated by collatz and count how many of them are below 1000000. When you have seen all the numbers below the limit, the last list will contain the numbers with the longest chain.
That said, I'm afraid this approach isn't computationally feasible. In particular, you'll generate exponentially many numbers and exponentially large ones. For example, if the longest chain would be 500, the result of collatz in that step would contain numbers up to 2^500. And as mentioned, there is no way to tell which of these huge numbers might be the one leading to the solution, so you can't just discard them.

Prime Factoring Function in Haskell

I am trying to make a function that will display a number's prime factors with a list (infinite) that I give it. Here is what I have so far:
-- Here is a much more efficient (but harder to understand) version of primes.
-- Try "take 100 primes" as an example (or even more if you like)
primes = 2 : primesFrom3 where
primesFrom3 = sieve [3,5..] 9 primesFrom3
sieve (x:xs) b ~ps#(p:q:_)
| x < b = x : sieve xs b ps
| otherwise = sieve [x | x <- xs, rem x p /= 0] (q^2) (tail ps)
-- Write a function that factors its first argument using the (infinite)
-- list of available factors given in its second argument
-- (using rem x p /= 0 to check divisibility)
primeFactsWith :: Integer -> [Integer] -> [Integer]
primeFactsWith n (p:ps) = if (rem n p /= 0) then
(primeFactsWith n ps)
else (primeFactsWith p ps)
The top half was not written by me and works just fine. I am trying to get the second half to work, but it isn't. Read the comments in the code to better understand exactly what I am trying to do. Thanks! Oh and please don't just spout the answer. Give me some hints on how to do it and maybe what is wrong.
What's wrong
The problem is that you do a recursive call in both branches, therefore the function will never stop.
Some Hints
To build a recursive list-producing function, you'll need two branches or cases:
Base case no recursive call, this stops the recursion and returns the final part of the result.
Recursive case here you modify the parameters of the function and call it again with the modified parameters, possibly also returning a part of the result.
You need two sub branches at the recursive branch. One if you've found a prime factor, and another if the current number is no prime factor.
Here is a skeleton, you need to fill in the parts in the <> brackets.
primeFactsWith :: Integer -> [Integer] -> [Integer]
primeFactsWith n (p:ps) = if <halt condition> then
<final result>
else if (rem n p /= 0) then
<not a factor - recursive call 1>
else
<found a factor - return it,
and make recursive call 2>
If you have found a prime factor, you can divide the number by it, to get a smaller number, without that factor. To perform integer division Haskell provides a function named div.
If you reach the number 1, you have generated all prime factors and you can stop. The final part of a prime factors list, that comes after all its factors, is an empty list.
You can drop any prime from your infinite list if you no longer need it, but be aware that a number could contain a prime several times in the factors list. If you want to drop p you can just use ps, from the pattern; if you want to keep p you must use (p:ps).
The cons operator (:) can be used to build a list. You can use it to return one number of the result list, and use a recursive call to find the remaining numbers, e.g.
x : foo y z
I hope that helps, if you have any questions don't hesitate to ask.
Here's a hint.
So you're recursing, which is good.
In one branch you keep looking for factors of n. In the other branch you seem to look for the factors of p, which is a bit weird, but whatevs.
Where do you return the factors of n you've found?

Resources