Primes in Haskell

I'm learning Haskell, and I've tried to generate an infinite list of primes, but I can't understand what my function is doing wrong.
The function:
prime = 2:3:filter (\x -> all (\y -> (mod x y) > 0) (init prime)) [5..]
I think it's the init prime, but the strange thing is that even if I set an upper bound on the range ([5..10], for example), the function loops forever and never produces a result for prime !! 2.
Can you please tell me what I'm doing wrong?

Well, for one let's look at what init does for a finite list:
init [1] == []
init [1,2] == [1]
init [1,2,3] == [1,2]
ok, so it gives us all but the last element of the list.
So what's init prime? Well, prime without its last element. If we implemented prime correctly it shouldn't have a last element (there are infinitely many primes!), but more importantly, we don't need to care about that yet: we only care about the first couple of elements, so for us it's pretty much the same as prime itself. (Strictly speaking, init has to look one element ahead before it can hand over an element, so it needs slightly more of prime than prime itself would; that only makes the problem below worse.)
Now, looking at all: What does this do? Well, it takes a list and a predicate and tells us if all the elements of the list satisfy the predicate:
all (<5) [1..4] == True
all even [1..4] == False
it even works with infinite lists!
all (<5) [1..] == False
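The reason `all` can answer False on an infinite list is that it only walks the list until the predicate fails. A minimal sketch, assuming a fold-based definition (`allFold` is a made-up name for illustration; it is not claimed to be how base defines `all`, although it gives the same results on lists):

```haskell
-- `allFold` is a hypothetical name for this sketch; the Prelude's `all`
-- gives the same results on lists.
allFold :: (a -> Bool) -> [a] -> Bool
allFold p = foldr (\x rest -> p x && rest) True

-- (&&) only looks at `rest` when `p x` is True, so the fold stops at the
-- first element that fails the predicate and never looks further.
stopsEarly :: Bool
stopsEarly = allFold (< 5) [1 ..]                        -- 5 fails, so this is False

ignoresUndefinedTail :: Bool
ignoresUndefinedTail = allFold even (2 : 3 : undefined)  -- 3 fails before `undefined` is reached
```

The flip side is that if every element inspected so far passes, `all` has no choice but to demand the next list cell, and that is exactly where the definition of prime gets stuck.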
so what's going on here? Well, here's the thing: It does work with infinite lists... but only if we can actually evaluate the list up to the first element of the list that violates the predicate! Let's see if this holds true here:
all (\y -> (mod 5 y) > 0) (init prime)
So to find out whether 5 is a prime number, we'd have to check whether any number in prime (minus the last element of prime) divides it. Let's see if we can do that.
Now, substituting the definition of prime, we get
all (\y -> (mod 5 y) > 0) (2:3:filter (\x -> all (\y -> (mod x y) > 0) (init prime)) [5..])
So to determine whether 5 is a prime number, we only have to check if it's:
divisible by 2 - it's not, let's continue
divisible by 3 - still no
divisible by ...? Well, we're in the process of checking what the 3rd prime is so we don't know yet...
And there's the crux of the problem: with this logic, to determine the third prime number you need to know the third prime number! Logically, of course, we don't want to check this at all; we only need to check whether any of the smaller prime numbers divide the current candidate.
So how do we go about doing that? Well, we'll have to change our logic unfortunately. One thing we can do is try to remember how many primes we already have, and only take as many as we need for our comparison:
prime = 2 : 3 : morePrimes 2 [5..]
morePrimes n (x:xs)
    | all (\y -> mod x y > 0) (take n prime) = x : morePrimes (n+1) xs
    | otherwise                              = morePrimes n xs
So how does this work? Well, it basically does what we were just talking about: we remember in n how many primes we already have (starting at 2, because we know we have at least [2,3]). We then use take n to check whether our candidate is divisible by any of the n primes we already know. If it isn't, we know it's our next prime and we increment n; otherwise we just carry on.
There's also the better-known form inspired by (although not quite the same as) the Sieve of Eratosthenes:
prime = sieve [2..] where
    sieve (p:xs) = p : sieve (filter (\x -> mod x p > 0) xs)
So how does this work? Again, with a similar idea: we know that the next prime number must not be divisible by any previous prime number. So what do we do? Starting at 2, we know that the first element in the list is a prime number. We then throw away every number divisible by that prime number using filter. Afterwards, the next item in the list is going to be a prime number again (because we didn't throw it away), so we can repeat the process.
Neither of these is a one-liner like the one you were hoping for, though.
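For what it's worth, the original one-liner can be rescued with a different bound (a sketch, not from this answer): replace init prime with takeWhile (\y -> y * y <= x) prime, i.e. only trial-divide by primes up to the square root of the candidate, the same square-root idea later answers on this page use. Testing a candidate x then only touches primes that are already in the list.

```haskell
-- Square-root bound instead of `init prime`: for x >= 5, every prime that
-- takeWhile inspects (including the first one with y*y > x, which it needs
-- to see in order to stop) is smaller than x, so it has already been produced.
prime :: [Integer]
prime = 2 : 3 : filter (\x -> all (\y -> mod x y > 0) (takeWhile (\y -> y * y <= x) prime)) [5 ..]
```

With this definition, prime !! 2 evaluates to 5 instead of looping.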

If the code in the other answer is restructured under the identity
[take n primes | n <- [0..]] == inits primes
eventually we get
import Data.List
-- [ ([], 2), ([2], 3), ([2,3], 5), ... ]
primes = 2 : [ c | (ps, p) <- zip (inits primes) primes,
                   c <- take 1 [c | c <- [p+1..],
                                    and [mod c p > 0 | p <- ps]]]
Further improving it algorithmically, it becomes
primes = 2 : [ c | (ps, r:q:_) <- zip (inits primes)                -- []     [3,4,...]
                                      (tails $ 3 : map (^2) primes), -- [2]    [4,9,...]
                   c <- [r..q-1], and [mod c p > 0 | p <- ps]]       -- [2,3]  [9,25,...]
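To see what the zip of inits and tails actually hands to the comprehension, here is a sketch that runs the same pattern over a short finite list standing in for primes (small, identityHolds, ranges and found are illustrative names, not from the answer):

```haskell
import Data.List (inits, tails)

-- A small finite stand-in for `primes` (an assumption of this sketch), so the
-- intermediate values are easy to print and compare.
small :: [Int]
small = [2, 3, 5, 7]

-- The identity used above, checked on a prefix: taking 0, 1, 2, ... elements
-- gives the same lists as `inits`.
identityHolds :: Bool
identityHolds = take 4 [take n small | n <- [0 ..]] == take 4 (inits small)

-- The (ps, r, q) triples drawn by the second definition: each candidate c in
-- [r .. q-1] is tested only against the primes in ps. Suffixes with fewer
-- than two elements do not match r:q:_ and are skipped.
ranges :: [([Int], Int, Int)]
ranges = [ (ps, r, q) | (ps, r:q:_) <- zip (inits small) (tails (3 : map (^ (2 :: Int)) small)) ]

-- Trial-dividing each range by its ps recovers the primes from 3 to 47.
found :: [Int]
found = [ c | (ps, r, q) <- ranges, c <- [r .. q - 1], and [mod c p > 0 | p <- ps] ]
```

Here ranges is [([],3,4),([2],4,9),([2,3],9,25),([2,3,5],25,49)]: between consecutive prime squares, only the primes below are needed as divisors.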

Related

Why does my function not work with an infinite list?

I'm trying to learn Haskell and implemented a function conseq that returns the list of all runs of n consecutive elements.
conseq :: Int -> [Int] -> [[Int]]
conseq n x
    | n == length(x) = [x]
    | n > length(x)  = [x]
    | otherwise      = [take n x] ++ (conseq n (drop 1 x))
This works correctly.
> take 5 $ conseq 2 [1..10]
[[1,2],[2,3],[3,4],[4,5],[5,6]]
However, if I pass [1..] instead of [1..10], the program gets stuck in an infinite loop.
As I understood it, Haskell has lazy evaluation, so I should still be able to get the same result, right? Is it length? Shouldn't the first two conditions evaluate to False as soon as the length becomes greater than n?
What did I misunderstand?
One of the main reasons why using length is not a good idea is that it never terminates on an infinite list: to compute the length, it has to walk all the way to the end of the list, and an infinite list has no end.
The good news, however, is that we don't need length. Calling it at every step would also make the time complexity worse (quadratic in the length of a finite list). We can work with two enumerators, one n-1 places ahead of the other. When the leading enumerator reaches the end of the list, we know that the trailing one has only n-1 elements left, and thus we can stop yielding values:
conseq :: Int -> [a] -> [[a]]
conseq n ys = go (drop (n-1) ys) ys
    where go [] _ = []
          go (_:as) ba@(~(_:bs)) = take n ba : go as bs
This gives us:
Prelude> conseq 3 [1 ..]
[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7],[6,7,8],[7,8,9],[8,9,10],[9,10,11],[10,11,12],[11,12,13],[12,13,14],[13,14,15],[14,15,16],[15,16,17],[16,17,18],[17,18,19],[18,19,20],[19,20,21],[20,21,22],[21,22,23],[22,23,24],[23,24,25],[24,25,26],[25,26,27],…
Prelude> conseq 3 [1 .. 4]
[[1,2,3],[2,3,4]]
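Another way to avoid measuring the whole list (an alternative sketch, not part of this answer) is to build every window with tails and take, and stop at the first window that came out shorter than n. The length here is harmless because it is only ever applied to lists of at most n elements.

```haskell
import Data.List (tails)

-- Sliding windows via `tails`: `take n` never looks past n elements, so each
-- window is finite even when the input is infinite; takeWhile stops at the
-- first window that is too short (which only happens near the end of a
-- finite list).
conseqTails :: Int -> [a] -> [[a]]
conseqTails n = takeWhile ((== n) . length) . map (take n) . tails
```

For example, conseqTails 3 [1 .. 4] gives [[1,2,3],[2,3,4]], like the version above.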
The first thing your function does is calculate length(x), so it knows whether it should return [x], [x], or [take n x] ++ (conseq n (drop 1 x))
length counts the number of elements in the list - all the elements. If you ask for the length of an infinite list, it never finishes counting.
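If you want to keep the structure of the question's function, one option (a sketch; atLeast is a hypothetical helper, not from the question or the answers) is to replace the length comparisons with a check that only counts up to the number it needs:

```haskell
-- True iff xs has at least k elements. `take k` stops after k cells, so this
-- terminates on infinite lists, unlike comparing against `length xs`.
atLeast :: Int -> [a] -> Bool
atLeast k xs = length (take k xs) == k

-- Same cases as the question: more than n elements left means emit a window
-- and recurse; n or fewer means return what is left as the final window.
conseq :: Int -> [a] -> [[a]]
conseq n x
    | atLeast (n + 1) x = take n x : conseq n (drop 1 x)
    | otherwise         = [x]
```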

Haskell Does Not Evaluate Lazily takeWhile

isqrt :: Integer -> Integer
isqrt = floor . sqrt . fromIntegral
primes :: [Integer]
primes = sieve [2..] where
    sieve (p:ps) = p : sieve [x | x <- ps, x `mod` p > 0]
primeFactors :: Integer -> [Integer]
primeFactors n = takeWhile (< n) [x | x <- primes, n `mod` x == 0]
Here is my code. I think you guessed what I am trying to do: A list of prime factors of a given number using infinite list of prime numbers. But this code does not evaluate lazily.
When I load it in ghci with :l mycode.hs and enter primeFactors 24, the output is [2,3 and then it stops, with the cursor flashing there; I never get another Prelude> prompt. I think there is a problem there. What am I doing wrong?
Thanks.
takeWhile never terminates for composite arguments: if n is composite, it has no prime factors >= n, so takeWhile never sees an element that makes it stop, and just sits there waiting for one.
Apply takeWhile to the primes list first and then filter the result with n `mod` x == 0, like this:
primeFactors n = [x | x <- takeWhile (<= n) primes, n `mod` x == 0]
(<= is used instead of < for maximum correctness, so that the prime factors of a prime number consist of that number itself.)
Here is an illustration of what happens:
http://sketchtoy.com/67338195
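A self-contained version of the fix for trying out in GHCi. Like the original, primeFactors lists each prime factor once (24 gives [2,3]). factorize is a hypothetical addition, not from the question or the answers, for the case where repeated factors are wanted: it divides each prime out as often as it goes and stops once p*p exceeds what is left (intended for n >= 1).

```haskell
primes :: [Integer]
primes = sieve [2 ..]
  where sieve (p:ps) = p : sieve [x | x <- ps, x `mod` p > 0]

-- The answer's fix: bound the (infinite) input with takeWhile first, then filter.
primeFactors :: Integer -> [Integer]
primeFactors n = [x | x <- takeWhile (<= n) primes, n `mod` x == 0]

-- Prime factors with multiplicity, e.g. 24 = 2*2*2*3.
factorize :: Integer -> [Integer]
factorize = go primes
  where
    go _ 1 = []
    go (p:ps) m
      | p * p > m      = [m]                        -- no factor <= sqrt m left, so m is prime
      | m `mod` p == 0 = p : go (p:ps) (m `div` p)  -- divide p out and try p again
      | otherwise      = go ps m                    -- move on to the next prime
```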
Your problem isn't directly takeWhile, but rather the list comprehension.
[x | x <- primes, n `mod` x == 0]
For n = 24, we get 24 `mod` 2 == 0 and 24 `mod` 3 == 0, so the value of this list comprehension starts with 2 : 3 : .... But consider the ... part.
The list comprehension has to keep pulling values from primes and checking 24 `mod` x == 0. Since there are no more prime factors of 24 nothing will ever pass that test and get emitted as the third value of the list comprehension. But since there's always another prime to test, it will never stop and conclude that the remaining tail of the list is empty.
Because this is lazily evaluated, if you only ever ask for the first two elements of this list then you're fine. But if your program ever needs the third one (or even just to know whether or not there is a third element), then the list comprehension will just spin forever trying to come up with one.
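The difference between asking for two elements and asking for a third can be checked directly. A minimal sketch reusing the question's primes (factorsOf24 and firstTwo are made-up names):

```haskell
primes :: [Integer]
primes = sieve [2 ..]
  where sieve (p:ps) = p : sieve [x | x <- ps, x `mod` p > 0]

-- Every prime factor of 24, found by searching all primes: a search over an
-- infinite space that never emits a third element.
factorsOf24 :: [Integer]
factorsOf24 = [x | x <- primes, 24 `mod` x == 0]

-- Fine: only the first two elements are demanded.
firstTwo :: [Integer]
firstTwo = take 2 factorsOf24

-- Each of these would hang, because each needs to know whether a third
-- element exists, so they are left commented out:
--   take 3 factorsOf24
--   takeWhile (< 24) factorsOf24
--   length factorsOf24
```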
takeWhile (< 24) keeps pulling elements from its argument until it finds one that is not < 24. 2 and 3 both pass that test, so takeWhile (< 24) does need to know what the third element of the list comprehension is.
But it's not really a problem with takeWhile; the problem is that you've written a list comprehension to find all of the prime factors (and nothing else), and then tried to use a filter on its results to cut off the infinite exploration of all the higher primes that can't possibly be factors. That doesn't really make sense if you stop to think about it: by definition, anything that isn't a prime factor can't be an element of that list, so there are no non-factors larger than n in it for you to filter out. Instead, you need to filter the input to that list comprehension so that it doesn't try to explore an infinite space, as @n.m's answer shows.

Keep getting stack overflow

I am repeatedly getting a stack overflow on my solution to Project Euler #7 and I have no idea why.
Here is my code:
import System.Environment
checkPrime :: Int -> Bool
checkPrime n = not $ testList n [2..n `div` 2]
--testList :: Int -> [Int] -> Bool
testList _ [] = False
testList n xs
    | (n `rem` (head xs) == 0) = True
    | otherwise = testList n (tail xs)
primesTill n = sum [1 | x <- [2..n], checkPrime x]
nthPrime n = nthPrime' n 2
nthPrime' n x
    | (primesTill x == n) = x
    | otherwise = nthPrime' n x+1
main = print (nthPrime 10001)
resolving the stack overflow
As @bheklilr mentioned in his comment, the stack overflow is caused by a wrong evaluation order in the otherwise branch of the nthPrime' function (function application binds more tightly than +):
nthPrime' n x+1
Will be interpreted as
(nthPrime' n x)+1
Because this expression is called recursively, your call of nthPrime' n 2 will expand into
(nthPrime' n 2)+1+1+1+1+1+1+1+1 ...
but the second argument never gets incremented, and your program builds up a mass of unevaluated thunks. The additions can only be performed once the left operand, nthPrime' n 2, has been reduced to an Int, but that call recurses endlessly with the same arguments, so this never happens. Every pending +1 is kept on the stack, and when there is no more space left you get a stack overflow error.
To solve this problem, you need to put parentheses around x+1, so that your recursive call looks like this:
nthPrime' n (x+1)
Now the argument is incremented before it is passed to the recursive call.
This should solve your stack overflow problem; you can try it out with a smaller number, e.g. 101, and you'll get the desired result.
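The precedence rule behind nthPrime' n x+1 can be seen in isolation with any one-argument function; double is a hypothetical stand-in used only for illustration:

```haskell
-- A tiny stand-in function, only for illustration.
double :: Int -> Int
double x = 2 * x

-- Function application binds tighter than any operator, so `double 3+1`
-- means `(double 3) + 1`, just as `nthPrime' n x+1` means `(nthPrime' n x) + 1`.
withoutParens :: Int
withoutParens = double 3+1    -- (double 3) + 1, i.e. 7

withParens :: Int
withParens = double (3+1)     -- double 4, i.e. 8
```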
runtime optimization
If you test your program with the original value 10001 you may realize that it still won't finish in a reasonable amount of time.
I won't go into the details of fancy algorithms to solve this problems, if you're interested in them you can easily find them online.
Instead, I'll show you where the problem in your code is and give you a simple solution.
The bottleneck is your nthPrime function:
primesTill n = sum [1 | x <- [2..n], checkPrime x]
nthPrime n = nthPrime' n 2
nthPrime' n x
    | (primesTill x == n) = x
    | otherwise = nthPrime' n (x+1)
This function checks whether the number of primes between 2 and x is equal to n. The idea is correct, but the runtime blows up, because you recalculate primesTill x in every iteration. To count the primes up to x, you test every number from 2 to x and then sum up the results. In the next step, for x+1, you forget everything you learned about the numbers between 2 and x and test them all again, and only as the last step do you test whether x+1 is prime. Then you repeat this (forget everything and test all numbers again) until you are finished. In total this performs a quadratic number of primality tests instead of a linear one.
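To put a rough number on it (a back-of-the-envelope count that ignores the cost of each checkPrime call itself): if the answer is N, nthPrime' calls primesTill x for every x from 2 to N, and each of those runs checkPrime x-1 times, so

```latex
\sum_{x=2}^{N} (x-1) \;=\; \sum_{k=1}^{N-1} k \;=\; \frac{N(N-1)}{2} \;\approx\; \frac{N^2}{2}
```

The 10001st prime is a little over 10^5, so this is on the order of 5 × 10^9 calls to checkPrime, compared with about 10^5 if each number were tested only once.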
Wouldn't it be great if the computer could remember the primes it has already found?
There are many ways to do this; I'll use a simple infinite list. If you are interested in other approaches, you can search for the terms memoization or dynamic programming.
We start with the list comprehension you used in primesTill:
[1 | x <- [2..n], checkPrime x]
This calculates all primes between 2 and n, but immediately forgets the prime number and replaces it with 1, so the first step will be to keep the actual numbers.
[x | x <- [2..n], checkPrime x]
This gives us a list of all prime numbers between 2 and n. If we had a sufficiently large list of prime numbers we could use the index function !! to get the 10001st prime number. So we need to set n to a really really big number, to be sure that the filtered list is long enough?
Lazy evaluation to the rescue!
Lazy evaluation in Haskell allows us to build an infinite list that is only evaluated as far as needed. If we don't supply an upper bound to a list generator, it will build such an infinite list for us.
[x | x <- [2..], checkPrime x]
Now we have an infinite list of all prime numbers.
We can bind it to a name, e.g. primes, and use it to define nthPrime (!! counts from 0, so the n-th prime is at index n-1):
primes = [x | x <- [2..], checkPrime x]
nthPrime n = primes !! (n - 1)
Now you can compile it with ghc -O2, run it and the result will be promptly delivered to you.
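The same memoized list can also be reused inside the primality test itself, which goes one step beyond this answer (a sketch, not part of it): trial-divide only by primes p with p*p <= n instead of by every number up to n `div` 2. The names isPrime and primes here are assumptions of the sketch.

```haskell
-- 2 is given explicitly so that isPrime never needs an element of `primes`
-- that is still being computed.
primes :: [Int]
primes = 2 : filter isPrime [3, 5 ..]

-- Only meant for the odd candidates >= 3 fed to it above: divide by the
-- already-found primes up to sqrt n.
isPrime :: Int -> Bool
isPrime n = all (\p -> n `rem` p /= 0) (takeWhile (\p -> p * p <= n) primes)

-- `!!` counts from 0, so the n-th prime (1-based) is at index n - 1.
nthPrime :: Int -> Int
nthPrime n = primes !! (n - 1)
```

With this, each candidate near 10^5 is divided by fewer than 70 primes rather than by tens of thousands of numbers.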

Haskell recursive list comprehension causes C Stack Overflow

So I'm making a list of prime numbers to help me learn haskell using simple trial division (no fancy stuff until I get better with the language). I'm trying to use the following code:
primes = 2 : [ x | x <- [3..], all (\p -> (mod x p) /= 0) primes]
This is loaded without an error. However:
>take 2 primes
[2ERROR - C stack overflow
I tried the same thing with nested list comprehensions. It doesn't work. I would guess that I'm making too many recursive calls, but this shouldn't be the case if I'm only computing one prime. In my mind, lazy evaluation should make take 2 primes do something along the lines of:
primes = 2 : [ 3 | all (\p -> (mod 3 p) /= 0) [2] ]
Which doesn't require all that much computation: mod 3 2 /= 0, so all (\p -> (mod 3 p) /= 0) [2] == True, which means take 2 primes == [2, 3], right? I don't understand why this isn't working. Hopefully someone much more versed in the black magic of functional programming can help me...
This is on HUGS, if that makes any difference.
EDIT- I was able to come up with this solution (not pretty):
primes = 2 : [ x | x <- [3..], all (\p -> (mod x p) /= 0) (takeWhile (<= (ceiling (sqrt (fromIntegral x)))) primes)]
EDIT2- The program works fine when interpreted through HUGS or GHCi, but when I try to compile it with GHC, it outputs test: <<loop>>. Anybody know what the problem is?
Hugs shouldn't do this, but the code is broken anyway so it doesn't matter. Consider:
primes = 2 : [ x | x <- [3..], all (\p -> (mod x p) /= 0) primes]
How do you determine whether 3 is prime? Well, does mod 3 2 == 0? No. Does mod 3 ??? == 0? OOPS! What is the next element of primes after 2? We don't know; we are trying to compute it. You need to add an ordering constraint that adds 3 (or any other x) once all primes p <= sqrt x have been tested.
The documentation for all says "For the result to be True, the list must be finite"
http://hackage.haskell.org/packages/archive/base/latest/doc/html/Prelude.html#v:all
The previous answers explained why the original comprehension didn't work, but not how to write one that would work.
Here is a list comprehension that recursively, lazily (albeit not efficiently) computes all primes:
let primes = [x | x <- 2:[3,5..], x == 2 || not (any (\p -> 0 == (mod x p)) (takeWhile (\b -> (b * b) <= x) primes))]
Obviously we don't need to check mod x p for all primes; we only need to do it for primes up to the square root of the potential prime. That's what the takeWhile is for. Forgive the (\b -> (b * b) <= x): it is meant to be equivalent to (<= sqrt x), but sqrt needs a floating-point argument and x is an integer, so the type checker rejects that. The bound has to be <= rather than <, otherwise squares of primes such as 9 or 25 slip through.
The x == 2 prevents the takeWhile from executing at all before we've added any elements to the list.
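A runnable version of that comprehension, assuming contains is meant to be the Prelude's any, next to the same definition with a strict < in the bound, to show which numbers the strict bound lets through (primesStrict is a made-up name):

```haskell
-- `x == 2` short-circuits, so the first element never depends on `primes`.
primes :: [Integer]
primes = [x | x <- 2 : [3, 5 ..],
              x == 2 || not (any (\p -> mod x p == 0) (takeWhile (\b -> b * b <= x) primes))]

-- With a strict `<`, 9 is only tested against primes whose square is below 9
-- (just 2), so it is wrongly accepted.
primesStrict :: [Integer]
primesStrict = [x | x <- 2 : [3, 5 ..],
                    x == 2 || not (any (\p -> mod x p == 0) (takeWhile (\b -> b * b < x) primesStrict))]
```

Here take 5 primesStrict is [2,3,5,7,9], while take 5 primes is [2,3,5,7,11].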

Explain this chunk of haskell code that outputs a stream of primes

I have trouble understanding this chunk of code:
let
    sieve (p:xs) = p : sieve (filter (\ x -> x `mod` p /= 0) xs)
in sieve [2 .. ]
Can someone break it down for me? I understand there is recursion in it, but thats the problem I can't understand how the recursion in this example works.
Contrary to what others have stated here, this function does not implement the true sieve of Eratosthenes. It does return the prime numbers in order, though, and in a similar manner, so it's okay to think of it loosely as the sieve of Eratosthenes.
I was about done explaining the code when mipadi posted his answer; I could delete it, but since I spent some time on it, and because our answers are not completely identical, I'll leave it here.
First of all, note that there were some syntax errors in the code you posted. The correct code is,
let sieve (p:xs) = p : sieve (filter (\x -> x `mod` p /= 0) xs) in sieve [2..]
let x in y defines x and allows its definition to be used in y. The result of this expression is the result of y. So in this case we define a function sieve and return the result of applying sieve to [2..].
Now let us have a closer look at the let part of this expression:
sieve (p:xs) = p : sieve (filter (\x -> x `mod` p /= 0) xs)
This defines a function sieve which takes a list as its argument.
(p:xs) is a pattern which matches p with the head of said list and xs with the tail (everything but the head).
In general, p : xs is a list whose first element is p and whose remaining elements are xs. Thus, the first element of sieve's result is the first element of the list it receives.
Now look at the remainder of the list:
sieve (filter (\x -> x `mod` p /= 0) xs)
We can see that sieve is being called recursively. Since sieve takes a list, the filter expression must return a list.
filter takes a filter function and a list. It returns only those elements in the list for which the filter function returns true.
In this case xs is the list being filtered and
(\x -> x `mod` p /= 0)
is the filter function.
The filter function takes a single argument x and returns True iff x is not a multiple of p.
Now that sieve is defined, we pass it [2 .. ], the list of all natural numbers starting at 2. Thus,
The number 2 will be returned. All other natural numbers which are multiples of 2 will be discarded.
The second number is thus 3. It will be returned. All other multiples of 3 will be discarded.
Thus the next number will be 5. Etc.
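The walk-through above can be replayed by unrolling the recursion with iterate, which makes each intermediate list visible. A minimal sketch; step, stages and firstPrimes are hypothetical names:

```haskell
-- One round of the sieve: drop the head p and filter its multiples out of the
-- rest. (The pattern is partial, which is fine because every stage is infinite.)
step :: [Integer] -> [Integer]
step (p:xs) = filter (\x -> x `mod` p /= 0) xs

-- The successive lists sieve works on:
-- [2,3,4,5,...], then [3,5,7,9,...], then [5,7,11,13,...], ...
stages :: [[Integer]]
stages = iterate step [2 ..]

-- The head of each stage is the next prime, i.e. the output of sieve [2..].
firstPrimes :: [Integer]
firstPrimes = [p | (p:_) <- take 5 stages]
```

For instance, map (take 5) (take 3 stages) is [[2,3,4,5,6],[3,5,7,9,11],[5,7,11,13,17]].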
It's actually pretty elegant.
First, we define a function sieve that takes a list of elements:
sieve (p:xs) =
In the body of sieve, we take the head of the list (because we're passing the infinite list [2..], and 2 is defined to be prime) and put it (lazily!) in front of the result of applying sieve to the rest of the list:
p : sieve (filter (\ x -> x `mod` p /= 0) xs)
So let's look at the code that does the work on the rest of the list:
sieve (filter (\ x -> x `mod` p /= 0) xs)
We're applying sieve to the filtered list. Let's break down what the filter part does:
filter (\ x -> x `mod` p /= 0) xs
filter takes a function and a list on which we apply that function, and retains the elements that meet the criterion given by the function. In this case, filter takes an anonymous function:
\ x -> x `mod` p /= 0
This anonymous function takes one argument, x. It computes x modulo p (where p is the head of the list in the current call of sieve):
x `mod` p
If the remainder is not equal to 0:
x `mod` p /= 0
Then the element x is kept in the list. If it is equal to 0, it's filtered out. This makes sense: if x is divisible by p, then x is divisible by more than just 1 and itself, and thus it is not prime.
It defines a generator - a stream transformer called "sieve",
Sieve s =
    while( True ):
        p := s.head
        s := s.tail
        yield p                             -- produce this
        s := Filter (nomultsof p) s         -- go next

primes := Sieve (Nums 2)
which uses a curried form of an anonymous function equivalent to
nomultsof p x = (mod x p) /= 0
Both Sieve and Filter are data-constructing operations with internal state and by-value argument passing semantics.
Here we can see that the most glaring problem of this code is not, repeat not, that it uses trial division to filter out the multiples from the working sequence, whereas it could find them directly by counting up in increments of p. If we were to replace the former with the latter, the resulting code would still have abysmal run-time complexity.
No, its most glaring problem is that it puts a Filter on top of its working sequence too soon, when it should really do that only after the prime's square is seen in the input. As a result it creates a quadratic number of Filters compared to what's really needed. The chain of Filters it creates is too long, and most of them aren't even needed at all.
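A rough way to see the size of that difference, assuming that producing all primes up to n makes the original code stack one filter per prime <= n, while the postponed version only needs filters for primes p with p*p <= n (filtersEager and filtersPostponed are hypothetical names, used for counting only):

```haskell
primes :: [Int]
primes = sieve [2 ..]
  where sieve (p:xs) = p : sieve (filter (\x -> x `mod` p /= 0) xs)

-- Filters stacked by the original code to get past n: one per prime <= n.
filtersEager :: Int -> Int
filtersEager n = length (takeWhile (<= n) primes)

-- Filters the postponed version needs for the same range: primes up to sqrt n.
filtersPostponed :: Int -> Int
filtersPostponed n = length (takeWhile (\p -> p * p <= n) primes)
```

For n = 10000 this gives 1229 filters versus 25, roughly the square relationship described above.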
The corrected version, with the filter creation postponed until the proper moment, is
Sieve ps s =
    while( True ):
        x := s.head
        s := s.tail
        yield x                             -- produce this
        p := ps.head
        q := p*p
        while( (s.head) < q ):
            yield (s.head)                  --   and these
            s := s.tail
        ps := ps.tail                       -- go next
        s := Filter (nomultsof p) s

primes := Sieve primes (Nums 2)
or in Haskell,
primes = sieve primes [2..]

sieve ps (x:xs) = x : h ++ sieve pt [x | x <- t, rem x p /= 0]
    where (p:pt) = ps
          (h,t)  = span (< p*p) xs
rem is used here instead of mod as it can be much faster in some interpreters, and the numbers are all positive here anyway.
Measuring the local orders of growth of an algorithm by taking its run times t1,t2 at problem-size points n1,n2, as logBase (n2/n1) (t2/t1), we get O(n^2) for the first one, and just above O(n^1.4) for the second (in n primes produced).
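The local order of growth formula follows from assuming that the run time behaves like a power law between the two measured points:

```latex
t \approx C\,n^{a}
\;\Longrightarrow\;
\frac{t_2}{t_1} = \left(\frac{n_2}{n_1}\right)^{a}
\;\Longrightarrow\;
a = \log_{n_2/n_1}\frac{t_2}{t_1} = \frac{\ln(t_2/t_1)}{\ln(n_2/n_1)}
```

For example, if doubling the number of primes produced quadruples the run time, then a = log_2 4 = 2.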
Just to clarify it, the missing parts could be defined in this (imaginary) language simply as
Nums x =                        -- numbers from x
    while( True ):
        yield x
        x := x+1

Filter pred s =                 -- filter a stream by a predicate
    while( True ):
        if pred (s.head) then yield (s.head)
        s := s.tail
update: Curiously, the first instance of this code in David Turner's 1976 SASL manual according to A.J.T. Davie's 1992 Haskell book,
primes = sieve [2..]
-- [Int] -> [Int]
sieve (p:nos) = p : sieve (remove (multsof p) nos)
actually admits two pairs of implementations for remove and multsof going together -- one pair for the trial division sieve as in this question, and the other for the ordered removal of each prime's multiples directly generated by counting, aka the genuine sieve of Eratosthenes (both would be non-postponed, of course). In Haskell,
-- trial division:
-- multsof :: Int -> (Int -> Bool)
multsof p n = (rem n p)==0
-- remove :: (Int -> Bool) -> ([Int] -> [Int])
remove m xs = filter (not.m) xs

-- genuine sieve of Eratosthenes:
-- multsof :: Int -> [Int]
multsof p = [p*p, p*p+p..]
-- remove :: [Int] -> ([Int] -> [Int])
remove m xs = minus xs m
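The second pair needs a minus on ordered lists. A self-contained sketch of the non-postponed genuine sieve with that pair; minus is written out here and assumes both lists are strictly increasing (the data-ordlist package's Data.List.Ordered.minus provides the same operation ready-made):

```haskell
-- Ordered difference: remove from xs every element that occurs in ys.
minus :: Ord a => [a] -> [a] -> [a]
minus xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : minus xt ys   -- x is not a multiple: keep it
  EQ -> minus xt yt       -- x is a multiple: drop it
  GT -> minus xs yt       -- this multiple is below x: skip past it
minus xs _ = xs

-- Multiples of p generated directly by counting, starting from p*p.
multsof :: Int -> [Int]
multsof p = [p * p, p * p + p ..]

remove :: [Int] -> [Int] -> [Int]
remove m xs = minus xs m

primes :: [Int]
primes = sieve [2 ..]
  where sieve (p:nos) = p : sieve (remove (multsof p) nos)
```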
(If only he would've postponed picking the actual implementation here...)
As for the postponed code, in a pseudocode with "list patterns" it could've been
primes = [2, ...sieve primes [3..]]
sieve [p, ...ps] [...h, p*p, ...nos] =
                 [...h, ...sieve ps (remove (multsof p) nos)]
which in modern Haskell can be written with ViewPatterns as
{-# LANGUAGE ViewPatterns #-}
primes = 2 : sieve primes [3..]
sieve (p:ps) (span (< p*p) -> (h, _p2 : nos))
                             = h ++ sieve ps (remove (multsof p) nos)
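A runnable version of the ViewPatterns code, choosing the trial-division pair for multsof and remove (an assumption of this sketch; the ordered-minus pair would work as well):

```haskell
{-# LANGUAGE ViewPatterns #-}

multsof :: Int -> (Int -> Bool)
multsof p n = rem n p == 0

remove :: (Int -> Bool) -> [Int] -> [Int]
remove m xs = filter (not . m) xs

primes :: [Int]
primes = 2 : sieve primes [3 ..]

-- `p` from the first argument is in scope in the view pattern of the second.
-- The input is split at p*p: everything before it (h) is already prime, p*p
-- itself is dropped, and only then is p's filter added.
sieve :: [Int] -> [Int] -> [Int]
sieve (p:ps) (span (< p * p) -> (h, _p2 : nos)) = h ++ sieve ps (remove (multsof p) nos)
```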
It's implementing the Sieve of Eratosthenes
Basically, start with a prime (2), and filter out from the rest of the integers, all multiples of two. The next number in that filtered list must also be a prime, and therefore filter all of its multiples from the remaining, and so on.
It says "the sieve of some list is the first element of the list (which we'll call p) and the sieve of the rest of the list, filtered such that only elements not divisible by p are allowed through". It then gets things started by returning the sieve of all integers from 2 to infinity (which is 2 followed by the sieve of all integers not divisible by 2, etc.).
I recommend The Little Schemer before you attack Haskell.
