Can this prime sieve code be further simplified in Haskell? - haskell

The code works well:
primes = next [2 ..]
  where
    next (p : ps) = p : next ts
      where
        ts = filter (\x -> mod x p /= 0) ps
However, GHCi warns that there is an incomplete pattern in next.
Well, formally that is correct.
But obviously the input of next can never be empty.
So is there a solution other than adding the pragma
({-# OPTIONS_GHC -Wno-incomplete-patterns #-})?

The exhaustiveness checker knows that next has type Integral a => [a] -> [a] (the Integral constraint comes from mod). The empty list is a valid argument to next, even if you never actually call next on the empty list.
The key here is that you don't really want Integral a => [a] as your argument type. You know it will only be called on an infinite list, so use a type that doesn't have finite lists as values.
import Prelude hiding (sequence)  -- our own sequence below would clash with Prelude's

data Stream a = Cons a (Stream a)

-- The infinite stream x, x+1, x+2, ...
sequence :: Num a => a -> Stream a
sequence x = Cons x (sequence (x + 1))

filterStream :: (a -> Bool) -> Stream a -> Stream a
filterStream p (Cons x xs)
  | p x       = Cons x (filterStream p xs)
  | otherwise = filterStream p xs

-- Since you'll probably want a list of values, not just a stream of them, at some point.
toList :: Stream a -> [a]
toList (Cons x xs) = x : toList xs

primes :: Stream Integer
primes = next (sequence 2)
  where
    -- Stream has a single constructor, so this pattern is exhaustive.
    next (Cons p xs) = Cons p (next xs')
      where xs' = filterStream (\x -> mod x p /= 0) xs
The Stream library provides a module Data.Stream that defines the Stream type and numerous analogs to list functions.
import qualified Data.Stream as S
-- S.toList exists as well.
primes :: S.Stream Integer
primes = next (S.fromList [2..])
  where next (S.Cons p xs) = S.Cons p (next (S.filter (\x -> mod x p /= 0) xs))

You've already got a proper answer to your question. For completeness, the other option is just to add the unneeded clause that we know will never be called:
primes = next [2 ..]
  where
    next (p : xs) =
      p : next [x | x <- xs, mod x p > 0]
    next _ = undefined
Another, more "old-style" solution, is to analyze the argument by explicit calls to head and tail (very much not recommended, in general):
primes = next [2 ..]
  where
    next xs = let { p = head xs } in
      p : next [x | x <- tail xs, mod x p > 0]
This could perhaps count as a simplification.
On an unrelated note, you write that it "works well". Unfortunately, while it does produce the correct results, it does so very slowly. Because it always takes only one element at a time off the input list, its time complexity is quadratic in the number n of primes produced. In other words, primes !! n takes time quadratic in n. Empirically, in GHCi:
> primes !! 1000
7927 -- (0.27 secs, 102987392 bytes)
> primes !! 2000
17393 -- (1.00 secs, 413106616 bytes)
> primes !! 3000
27457 -- (2.23 secs, 952005872 bytes)
> logBase (2/1) (1.00 / 0.27)
1.8889686876112561 -- n^1.9
> logBase (3/2) (2.23 / 1.00)
1.9779792870810489 -- n^2.0
In fact, a whole run of elements can be taken from the input at once, up to the square of the current prime, so that the code takes only about ~n^1.5 time, give or take a log factor:
{-# LANGUAGE ViewPatterns #-}
primes_ = 2 : next primes_ [3 ..]
  where
    next (p : ps) (span (< p*p) -> (h, xs)) =
      h ++ next ps [x | x <- xs, mod x p > 0]
    next _ _ = undefined
Empirically, again, we get
> primes !! 3000
27457 -- (0.08 secs, 29980864 bytes)
> primes !! 30000
350381 -- (1.81 secs, 668764336 bytes)
> primes !! 60000
746777 -- (4.74 secs, 1785785848 bytes)
> primes !! 100000
1299721 -- (9.87 secs, 3633306112 bytes)
> logBase (6/3) (4.74 / 1.81)
1.388897361815054 -- n^1.4
> logBase (10/6) (9.87 / 4.74)
1.4358377567888103 -- n^1.45
As we can see here, the complexity advantage expresses itself as an enormous speedup in absolute terms as well.
So this sieve is equivalent to optimal trial division, unlike the one in the question. Of course, when it was first proposed in 1976, Haskell had no view patterns; in fact, Haskell itself did not exist yet.
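Since the view pattern is only a convenience, here is a minimal sketch of the same postponed sieve without the extension: the span simply moves into a where clause. The name primesNoVP is hypothetical, and the extra next [] _ clause (never reached, because the list is infinite) is one way to make the pattern match exhaustive without a pragma.

```haskell
-- Sketch: the postponed sieve above, written without ViewPatterns.
primesNoVP :: [Integer]
primesNoVP = 2 : next primesNoVP [3 ..]
  where
    -- p is the next sieving prime. Every candidate below p*p has already
    -- survived sieving by all smaller primes, so it is emitted at once.
    next (p : ps) xs = h ++ next ps [x | x <- t, mod x p > 0]
      where
        (h, t) = span (< p * p) xs
    -- Never reached (primesNoVP is infinite); only makes the match exhaustive.
    next [] _ = []
```

For example, take 10 primesNoVP evaluates to [2,3,5,7,11,13,17,19,23,29].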

Related

Hanging self referencing list in Haskell to compute primes list

I recently started learning Haskell. To practice a bit, I wanted to try generating the list of prime numbers via self-reference, using the following code:
main = do
  print (smaller_than_sqrt 4 2)
  print (smaller_than_sqrt_list 5 [2..])
  print ("5")
  print (is_prime 5 [2..])
  print ("7")
  print (is_prime 7 [2..])
  print ("9")
  print (is_prime 9 [2..])
  print ("test")
  print (take 5 primes) -- Hangs
-- Integer square root
isqrt :: Int -> Int
isqrt = ceiling . sqrt . fromIntegral
-- Checks if x is smaller than sqrt(p)
smaller_than_sqrt :: Int -> Int -> Bool
smaller_than_sqrt p x = x <= isqrt p
-- Checks if x doesn't divide p
not_divides :: Int -> Int -> Bool
not_divides p x = p `mod` x /= 0
-- Takes in a number and an ordered list of numbers and only keeps the one smaller than sqrt(p)
smaller_than_sqrt_list :: Int -> [Int] -> [Int]
smaller_than_sqrt_list p xs = takeWhile (smaller_than_sqrt p) xs
-- Checks if p is prime by looking at the provided list of numbers and checking that none divides p
is_prime :: Int -> [Int] -> Bool
is_prime p xs = all (not_divides p) (smaller_than_sqrt_list p xs)
-- Works fine: primes = 2 : [ p | p <- [3..], is_prime p [2..]]
-- Doesn't work:
primes = 2 : 3 : [ p | p <- [5..], is_prime p primes]
But for some reason, referencing primes inside primes hangs when running with runhaskell, and is detected as a loop error when running the binary compiled with ghc.
However, I don't really understand why.
Clearly, the first two elements of primes are 2 and 3. What comes after that? The next element of primes is the first element of
[p | p <- [5..], is_prime p primes]
What's that? It could be 5, if is_prime 5 primes, or it could be some larger number. To find out which, we need to evaluate
smaller_than_sqrt_list 5 primes
Which requires
takeWhile (<= isqrt 5) primes
Which requires
takeWhile (<= 3) primes
Well, that's easy enough, it starts with 2:3:..., right? Okay, but what's the next element? We need to look at the third element of primes and see whether it's less or equal to 3. But the third element of primes is what we were trying to calculate to begin with!
The problem is that smaller_than_sqrt 5 3 is still True. To compute whether 5 is a prime, is_prime 5 primes expands to all (not_divides 5) (takeWhile (smaller_than_sqrt 5) primes), and takeWhile will attempt to iterate through primes until the predicate no longer holds. It holds for the first element (2), it still holds for the second element (3); will it hold for the next element? Wait, what's the next element? We're still computing which one that is!
It should be sufficient to use floor instead of ceiling in isqrt, or, more simply, just
smaller_than_sqrt p x = x * x <= p
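Putting the fix together, a minimal sketch of the question's self-referential definition with the x * x <= p test. The helper names are the question's own; isqrt is simply no longer needed, and is_prime inlines smaller_than_sqrt_list.

```haskell
-- x is a candidate divisor worth testing only while x*x <= p.
smaller_than_sqrt :: Int -> Int -> Bool
smaller_than_sqrt p x = x * x <= p

not_divides :: Int -> Int -> Bool
not_divides p x = p `mod` x /= 0

-- Tests p only against the prefix of xs that is <= sqrt p.
is_prime :: Int -> [Int] -> Bool
is_prime p xs = all (not_divides p) (takeWhile (smaller_than_sqrt p) xs)

-- Self-reference is now safe: testing p only needs primes whose square is
-- <= p, and all of those were produced before p is reached.
primes :: [Int]
primes = 2 : 3 : [p | p <- [5 ..], is_prime p primes]
```

With this definition, take 5 primes returns [2,3,5,7,11] instead of hanging.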

Build sorted infinite list of infinite lists

I know it is impossible to sort infinite lists, but I am trying to write a definition of the infinite increasing list of multiples of n numbers.
I already have the function
multiples :: Integer -> [Integer]
multiples n = map (*n) [1..]
that returns the infinite list of multiples of n. But now I want to build a function that given a list of Integers returns the increasing infinite list of the multiples of all the numbers in the list. So the function multiplesList :: [Integer] -> [Integer] given the input [3,5] should yield [3,5,6,9,10,12,15,18,20,....].
I'm new at Haskell, and I'm struggling with this. I think I should use foldr or map since I have to apply multiples to all the numbers in the input, but I don't know how. I can't manage to merge all the lists into one.
I would really appreciate it if someone could help me.
Thank you!
You are on the right path. Following the comments, here is a template you can complete:
multiples :: Integer -> [Integer]
multiples n = map (*n) [1..]
-- This is good old recursion.
mergeSortedList :: [Integer] -> [Integer] -> [Integer]
mergeSortedList [] xs = undefined
mergeSortedList xs [] = undefined
mergeSortedList (x:xs) (y:ys)
  | x < y  = x:mergeSortedList xs (y:ys) -- Just a hint ;)
  | x == y = undefined
  | x > y  = undefined
multiplesList :: [Integer] -> [Integer]
multiplesList ms = undefined -- Hint: foldX mergeSortedList initial xs
-- Which are initial and xs?
-- should you foldr or foldl?
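Spoiler for readers who want to check their work: one possible completion of the template, as a sketch assuming the input list of numbers is finite. It uses foldr with [] as the initial value, since merging any list with [] returns that list unchanged (foldl would also work for finite input).

```haskell
multiples :: Integer -> [Integer]
multiples n = map (* n) [1 ..]

-- Merge two ascending lists; a value present in both is kept only once.
mergeSortedList :: [Integer] -> [Integer] -> [Integer]
mergeSortedList [] ys = ys
mergeSortedList xs [] = xs
mergeSortedList (x:xs) (y:ys)
  | x < y     = x : mergeSortedList xs (y:ys)
  | x == y    = x : mergeSortedList xs ys   -- drop the duplicate
  | otherwise = y : mergeSortedList (x:xs) ys

-- [] is the identity of the merge, so it serves as the initial value.
multiplesList :: [Integer] -> [Integer]
multiplesList ms = foldr mergeSortedList [] (map multiples ms)
```

take 9 (multiplesList [3,5]) then gives [3,5,6,9,10,12,15,18,20], matching the expected output in the question. This merge would not work for an infinite list of inputs, because every clause inspects both arguments before producing anything.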
We can easily weave two infinite lists together positionally, taking one element from each list at each step,
weave (x:xs) ys = x : weave ys xs
or we could take longer prefixes each time,
-- warning: expository code only
weaveN n xs ys = take n xs ++ weaveN n ys (drop n xs)
but assuming both lists are not only infinite but also strictly increasing (i.e. there are no duplicates in the lists), we can guide the taking of prefixes by the head value of the opposite list:
umerge :: Ord a => [a] -> [a] -> [a]
-- warning: only works with infinite lists
umerge xs (y:ys) = a ++ [y | head b > y ] ++ umerge ys b
  where
    (a,b) = span (< y) xs
This is thus a possible encoding of the unique merge operation ("unique" meaning, there won't be any duplicates in its output).
Testing, it seems to work as intended:
> take 20 $ umerge [3,6..] [5,10..]
[3,5,6,9,10,12,15,18,20,21,24,25,27,30,33,35,36,39,40,42]
> [3,6..42] ++ [5,10..42] & sort & nub
[3,5,6,9,10,12,15,18,20,21,24,25,27,30,33,35,36,39,40,42]
> [ p | let { ms :: [Integer] ; ms = takeWhile (< 25^2) $
foldl1 umerge [[p*p,p*p+p..] | p <- [2..25]] },
p <- [2..545], not $ elem p ms ]
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,
97,101,...........,499,503,509,521,523,541]
> length it
100
And with an ingenious little tweak (due to Richard Bird, as seen in the JFP article by Melissa O'Neill) it can even be used to fold an infinite list of ascending lists, provided the lists are sorted in ascending order of their head elements. Then the head of the first argument is guaranteed to be the first element of the output and can thus be produced without testing:
umerge1 :: Ord a => [a] -> [a] -> [a]
-- warning: only works with infinite lists
-- assumes x < y
umerge1 (x:xs) ~(y:ys) = x : a ++ [y | head b > y ] ++ umerge ys b
  where
    (a,b) = span (< y) xs
Now
> take 100 [ p | let { ms :: [Integer] ;
ms = foldr1 umerge1 [[p*p,p*p+p..] | p <- [2..]] },
p <- [2..], not $ elem p $ takeWhile (<= p) ms ]
[2,3,5,7,11,13, ...... 523,541]
the same calculation works indefinitely.
To the literalists in the audience: yes, calling elem here is a Very Bad Thing. The OP hopefully would have recognized this on their own (*), but unfortunately I felt compelled to make this statement, thus inadvertently revealing it to them and depriving them of their well-earned a-ha moment.
Also, umerge1's definition can be radically simplified. Again, this is left to the OP to discover on their own (which would, again, be much better for them if I weren't compelled to make this remark; finding something on your own is that much more powerful and fulfilling).
(*) and would have searched for ways to replace it with something more efficient, on their own. No, this code is not presented as The Best Solution to Their Problem.
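The answer deliberately leaves this open, so treat the following as a spoiler and as one possible approach, not necessarily the one the author had in mind. It is a sketch that replaces the elem test with an ordered-list difference, here a hypothetical helper called minus, which walks the candidates and the merged composites in lockstep. umerge and umerge1 are copied from above, and primesM is a hypothetical name.

```haskell
-- Copied from above: unique merge of two ascending infinite lists.
umerge :: Ord a => [a] -> [a] -> [a]
umerge xs (y:ys) = a ++ [y | head b > y] ++ umerge ys b
  where
    (a, b) = span (< y) xs

-- Copied from above: the head of the first list is emitted untested.
umerge1 :: Ord a => [a] -> [a] -> [a]
umerge1 (x:xs) ~(y:ys) = x : a ++ [y | head b > y] ++ umerge ys b
  where
    (a, b) = span (< y) xs

-- Hypothetical helper: elements of xs that do not occur in ys, for two
-- ascending lists. Both lists are consumed in a single pass, so no value is
-- ever searched for repeatedly, unlike with elem.
minus :: Ord a => [a] -> [a] -> [a]
minus (x:xs) (y:ys) = case compare x y of
  LT -> x : minus xs (y:ys)   -- x is not a composite: keep it
  EQ -> minus xs ys           -- x is a composite: drop it
  GT -> minus (x:xs) ys       -- composite y is already behind x
minus xs _ = xs

-- Numbers from 2 on, minus all composites (every composite n has a smallest
-- prime factor p with p*p <= n, so it occurs in [p*p, p*p+p ..]).
primesM :: [Integer]
primesM = minus [2 ..] (foldr1 umerge1 [[p*p, p*p+p ..] | p <- [2 ..]])
```

take 10 primesM evaluates to [2,3,5,7,11,13,17,19,23,29], and the computation stays incremental because each composite is compared against exactly once.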

What optimizations can be made to this Haskell code?

--Returns last N elements in list
lastN :: Int -> [a] -> [a]
lastN n xs = let m = length xs in drop (m-n) xs
--create contiguous array starting from index b within list a
produceContiguous :: [a] -> Int -> [[a]]
produceContiguous [] _ = [[]]
produceContiguous arr ix = scanl (\acc x -> acc ++ [x]) [arr !! ix] inset
  where inset = lastN (length arr - (ix + 1)) arr
--Find maximum sum of all possible contiguous sub arrays, modulo [n]
--d is dummy data
let d = [1,2,3,10,6,3,1,47,10]
let maxResult = maximum $ map (\s -> maximum s) $ map (\c -> map (\ac -> (sum ac )`mod` (last n)) c ) $ map (\n -> produceContiguous d n ) [0..(length d) -1]
I'm a Haskell newb - just a few days into it .. If I'm doing something obviously wrong, whoopsies
You can improve the runtime a lot by observing that map sum (produceContiguous d n) (which has runtime Ω(m^2), m the length of drop n d -- possibly O(m^3) time because you're appending to the end of acc on each iteration) can be collapsed to scanl (+) 0 (drop n d) (which has runtime O(m)). There are plenty of other stylistic changes I would make as well, but that's the main algorithmic one I can think of.
Cleaning up all the stylistic stuff, I would probably write:
import Control.Monad
import Data.List
addMod n x y = (x+y) `mod` n
maxResult n = maximum . (scanl (addMod n) 0 <=< tails)
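To make the point-free maxResult easier to read, here is a sketch with an explicit variant. The name maxResultExplicit and the type signatures are assumptions of mine, not the answer's. It spells out what the list-monad composition with <=< does: for each suffix produced by tails, scanl yields the running sums (mod n) of every contiguous subarray starting there.

```haskell
import Control.Monad ((<=<))
import Data.List (tails)

-- (x + y) `mod` n, as in the answer.
addMod :: Int -> Int -> Int -> Int
addMod n x y = (x + y) `mod` n

-- The answer's point-free version, with a type signature added.
maxResult :: Int -> [Int] -> Int
maxResult n = maximum . (scanl (addMod n) 0 <=< tails)

-- Hypothetical explicit variant: for every suffix of xs, scanl produces the
-- running sums (mod n) of all contiguous subarrays that start there; the
-- initial 0 stands for the empty subarray.
maxResultExplicit :: Int -> [Int] -> Int
maxResultExplicit n xs =
  maximum [s | suffix <- tails xs, s <- scanl (addMod n) 0 suffix]
```

For the question's dummy data, maxResultExplicit 7 [1,2,3,10,6,3,1,47,10] is 6 (from 1 + 2 + 3). Each suffix is scanned once, so the work is quadratic in the list length rather than cubic.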
In ghci:
*Main> jaggedGoofyMax 100 [1..1000]
99
(12.85 secs, 24,038,973,096 bytes)
*Main> dmwitMax 100 [1..1000]
99
(0.42 secs, 291,977,440 bytes)
Not shown here is the version of jaggedGoofyMax with only the optimization from my first paragraph applied. It has slightly better runtime and memory usage than dmwitMax when run in ghci (but is basically identical to dmwitMax when both are compiled with -O2). So you can see that even for modest input sizes this optimization can make a big difference.

Project Euler 50: Algorithm is incredibly slow, failing to understand why

I'm using Project Euler to learn Haskell. I'm new at Haskell and am having a lot of trouble coming up with an algorithm that doesn't take an absurd amount of time. I'm estimating that the program here would take 14 gigayears to arrive at the solution.
The problem:
Which prime, below one-million, can be written as the sum of the most
consecutive primes?
Here's my source. I've left out isPrime. I've posted it because it's far too inefficient to solve the problem. I think the issue lies with the slicedChains and primeChains calls, but I'm not sure what it is. I've resolved this before with C++. But for whatever reason, the efficient solution seems beyond me in Haskell.
Edit: I've included isPrime.
import System.Environment (getArgs)
import Data.List (nub,maximumBy)
import Data.Ord (comparing)
isPrime :: Integer -> Bool
isPrime 1 = False
isPrime 2 = True
isPrime x
  | any (== 0) (fmap (x `mod`) [2..x-1]) = False
  | otherwise = True
primeChain :: Integer -> [Integer]
primeChain x = [ n | n <- 1 : 2 : [3,5..x-1], isPrime n ]
slice :: [a] -> [Int] -> [a]
slice xs args = take (to - from + 1) (drop from xs)
  where from = head args
        to = last args
subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
                          in if n>l then [] else subsequencesBySize xs !! (l-n)
  where
    subsequencesBySize [] = [[[]]]
    subsequencesBySize (x:xs) = let next = subsequencesBySize xs
                                in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])
slicedChains :: Int -> [Integer] -> [[Integer]]
slicedChains len xs = nub [x | x <- fmap (xs `slice`) subseqs, length x > 1]
  where subseqs = [x | x <- (subsequencesOfSize 2 [1..len]), (last x) > (head x)]
primeSums :: Integer -> [[Integer]]
primeSums x = filter (\ns -> sum ns == x) chain
  where xs = primeChain x
        len = length xs
        chain = slicedChains len xs
compLength :: [[a]] -> [a]
compLength xs = maximumBy (comparing length) xs
cleanSums :: [Integer] -> [[Integer]]
cleanSums xs = fmap (compLength) filtered
  where filtered = filter (not . null) (fmap primeSums xs)
main :: IO()
main = do
  args <- getArgs
  let arg = read (head args) :: Integer
  let xs = primeChain arg
  print $ maximumBy (comparing length) $ cleanSums xs
Your basic problem is that you are not pruning your search space based on the best solution you have found so far.
I can tell this just from the fact that you are using maximumBy to find the longest sequence.
For instance, if during your search you find a consecutive sequence of 4 primes whose sum is a prime < 10^6, you don't have to examine any sequence which begins with a prime greater than 250000.
To do this kind of pruning you have to keep track of the solution found so far and interleave the testing of candidate sequences with their generation so that the best solution found so far can stop the search early.
Update
There are several inefficiencies in slicedChains. Haskell lists are implemented as linked lists. This video is a pretty good overview of linked lists and how they differ from arrays: (link)
The following expressions in your code are going to be problematic w.r.t. efficiency:
* nub has quadratic running time
* length x > 1 - the complexity of length is O(n) where n is the length of the list. A better way to write this is:
lengthGreaterThan1 :: [a] -> Bool
lengthGreaterThan1 (_:_:_) = True
lengthGreaterThan1 _ = False
* subsequencesOfSize 2 [1..len] may be more succinctly written:
[ [a,b] | a <- [1..len], b <- [a+1..len] ]
and this will also ensure that a < b.
* The take and drop calls in slice are also O(n)
* In primeSums the call to primeChain will regenerate essentially the same list over and over again, resulting in many redundant calls to isPrime. A better approach is to define primeChain like this:
allPrimes = filter isPrime [1..]
primeChain x = takeWhile (<= x) allPrimes
The list allPrimes will be generated once, and primeChain simply takes prefixes of that list.
* primeSums x is charged with finding sequences whose sum is exactly x, but it looks at a lot of sequences that can't possibly work. For instance, primeSums 31 will examine:
11 + 13 + 17, 11 + 13 + 17 + 23, 11 + 13 + 17 + 23 + 29,
17 + 19, 17 + 19 + 23, 17 + 19 + 23 + 29,
19 + 23, 19 + 23 + 29
23 + 29
even though it's pretty obvious that none of these sums could equal 31.
So the first thing you need is a good data structure: Once you find a sequence of length n you don't care about sequences of shorter length, so your primary needs are: (1) tracking the sum, (2) tracking the primes in the set, (3) removing the least element, (4) adding a new greatest element. The key is amortization, where a big cost is paid infrequently enough that you can pretend it is a small cost per procedure. The data structure looks like this:
data Queue x = Q [x] [x]
q_empty (Q [] []) = True
q_empty _ = False
q_headtails (Q (x:xs) rest) = (x, Q xs rest)
q_headtails (Q [] xs) = case reverse xs of
  y:ys -> (y, Q ys [])
  []   -> error "End of queue."
q_append el (Q beg end) = Q beg (el:end)
So deconstructing the list is possible, but sometimes triggers an O(n) operation, but that's OK because when it does, we won't have to do it for another n steps, so it averages out to one operation per step. (You might also want to do it with a spine-strict list.)
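A small runnable sketch of the queue above, with type signatures added and two hypothetical helpers, fromListQ and toListQ, that show it behaves first-in, first-out: new elements go onto the back list and are only reversed into the front list when the front runs out.

```haskell
-- The two-list queue from the answer, with type signatures added.
data Queue x = Q [x] [x]

q_empty :: Queue x -> Bool
q_empty (Q [] []) = True
q_empty _ = False

-- Take from the front list; reverse the back list only when the front is empty.
q_headtails :: Queue x -> (x, Queue x)
q_headtails (Q (x:xs) rest) = (x, Q xs rest)
q_headtails (Q [] xs) = case reverse xs of
  y:ys -> (y, Q ys [])
  []   -> error "End of queue."

-- O(1): cons onto the back list.
q_append :: x -> Queue x -> Queue x
q_append el (Q beg end) = Q beg (el:end)

-- Hypothetical helper: enqueue the elements in order.
fromListQ :: [x] -> Queue x
fromListQ = foldl (flip q_append) (Q [] [])

-- Hypothetical helper: dequeue everything, in FIFO order.
toListQ :: Queue x -> [x]
toListQ q
  | q_empty q = []
  | otherwise = let (x, q') = q_headtails q in x : toListQ q'
```

For example, toListQ (fromListQ [1,2,3]) returns [1,2,3]: the single reverse happens on the first dequeue and pays for the two that follow.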
To save on length operations and summing the items of the list you probably want to cache those, too:
type Length = Int
type Sum = Int
type Prime = Int
data PrimeSeq = PS Length Sum (Queue Prime)
headTails (PS len sum q) = (x, PS (len - 1) (sum - x) xs)
  where (x, xs) = q_headtails q
append x (PS len sum xs) = PS (len + 1) (sum + x) (q_append x xs)
The algorithm for these looks like:
* Cache a copy of the PrimeSeq you're starting with.
* Keep adding primes to it and testing primality until you get to 10^6.
* If you find a new prime with a longer sequence, replace the cache.
* Whenever you run into 10^6, revert to the cache, pull a prime off the front of the queue, then repeat as needed.
Your prime generation is quadratic (isPrime 101 tests rem 101 100 == 0 even though 10 is the biggest number by which 101 needs to be tested -- and actually 7 is enough).
Yet even with it, a simple enough list-based code finds the answer in under 2 seconds (on an Intel Core i7 2.5 GHz, interpreted in GHCi). And with the code corrected to take advantage of the above mentioned optimization (and additionally, testing by primes only), it takes 0.1s.
Also, f x | t = False | otherwise = True is the same as f x = not t.
We are asked by the PE site not to give you even a hint.
But in general, the key to efficiency in Haskell, thanks to its laziness, is being generative with as little duplication of effort as possible. As one example, instead of calculating each slice of a list in isolation, starting anew each time, we can produce all of them together as part of one process,
slices :: Int -> [a] -> [[a]]
slices n = map (take n) . iterate tail -- sequence of list's slices of length n each
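As a sketch of the same principle applied to sums (an illustration of mine, not code from the answer): the hypothetical windowSums computes the sum of every length-k window from one shared list of prefix sums, so each window costs one subtraction instead of k additions.

```haskell
-- The answer's slices, for comparison (on a finite list it eventually
-- fails with tail [], so use it on infinite lists only).
slices :: Int -> [a] -> [[a]]
slices n = map (take n) . iterate tail

-- Hypothetical: the sum of each length-k window, i.e. map sum (slices k xs),
-- but computed from one shared list of prefix sums.
windowSums :: Int -> [Integer] -> [Integer]
windowSums k xs = zipWith (-) (drop k prefix) prefix
  where
    prefix = scanl (+) 0 xs   -- prefix !! i == sum (take i xs)
```

For example, take 3 (windowSums 2 [1..]) gives [3,5,7]. Unlike slices, windowSums also works on finite lists, where it yields one sum per complete window.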
Another principle is, try to solve a more general problem, of which yours is an instance.
Having written such a function, we can play with it by trying out different values for its parameters, from smaller to the bigger ones, for an exploratory style of problem solving. We're told about 21 consecutive primes. What about 22 of them? 27? 1127 of them? ... and I've said enough about this already.
If it starts taking too much time, we can assess the full solution's needed run time by empirical orders of growth analysis.
Though the solution is found quickly enough with your unoptimized isPrime code, the exploratory process can be prohibitively slow with it, but it is fast enough with the optimized code:
primes :: [Int]
primes = 2 : filter isPrime [3,5..]
isPrime n = and [rem n p > 0 | p <- takeWhile ((<= n).(^2)) primes]

Setting upper limit to the input set according to the output function

I'm currently stuck on setting upper limits in list comprehensions.
What I'm trying to do is to find all Fibonacci numbers below one million.
For this I had designed a rather simple recursive Fibonacci function
fib :: Int -> Integer
fib n
  | n == 0    = 0
  | n == 1    = 1
  | otherwise = fib (n-1) + fib (n-2)
The thing where I'm stuck on is defining the one million part. What I've got now is:
[ fib x | x <- [0..35], fib x < 1000000 ]
I chose this because I know that the 35th Fibonacci number is large enough.
However, what I'd like to have is to find that limit via a function and set it that way.
[ fib x | x <- [0..], fib x < 1000000 ]
This does give me the numbers, but it simply doesn't stop. It results in Haskell trying to find Fibonacci numbers below one million further in the sequence, which is rather fruitless.
Could anyone help me out with this? It'd be much appreciated!
The check fib x < 1000000 in the list comprehension keeps only the fib x values that are less than 1000000 and filters away the rest; but the list comprehension has no way of knowing that greater values of x imply greater values of fib x, and hence must continue until all x have been checked.
Use takeWhile instead:
takeWhile (< 1000000) [ fib x | x <- [0..]]
A list comprehension is guaranteed to look at every element of the list. You want takeWhile :: (a -> Bool) -> [a] -> [a]. With it, your list is simply takeWhile (< 1000000) $ map fib [1..]. The takeWhile function simply returns the leading portion of the list which satisfies the given predicate. There's also a similar dropWhile function, which drops the leading portion of the list which satisfies the given predicate, as well as span :: (a -> Bool) -> [a] -> ([a], [a]), where span p xs is just (takeWhile p xs, dropWhile p xs), and the similar break, which splits the list in two where the predicate first becomes true (break p is equivalent to span (not . p)). Thus, for instance:
takeWhile (< 3) [1,2,3,4,5,4,3,2,1] == [1,2]
dropWhile (< 3) [1,2,3,4,5,4,3,2,1] == [3,4,5,4,3,2,1]
span (< 3) [1,2,3,4,5,4,3,2,1] == ([1,2],[3,4,5,4,3,2,1])
break (> 3) [1,2,3,4,5,4,3,2,1] == ([1,2,3],[4,5,4,3,2,1])
It should be mentioned that for such a task the "canonical" (and faster) way is to define the numbers as an infinite stream, e.g.
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
takeWhile (<100) fibs
--[0,1,1,2,3,5,8,13,21,34,55,89]
The recursive definition may look scary (or even "magic") at first, but if you "think lazy", it will make sense.
A "loopy" (and in a sense more "imperative") way to define such an infinite list is:
fibs = map fst $ iterate (\(a,b) -> (b,a+b)) (0,1)
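Applied to the question, a minimal sketch that cuts both infinite definitions off with takeWhile (the names fibsZip, fibsIter and fibsBelowMillion are hypothetical). takeWhile stops at the first Fibonacci number that is not below one million, so the computation terminates.

```haskell
-- The "lazy" definition: each element is the sum of the two before it.
fibsZip :: [Integer]
fibsZip = 0 : 1 : zipWith (+) fibsZip (tail fibsZip)

-- The "loopy" definition: step the pair (F(n), F(n+1)) forward.
fibsIter :: [Integer]
fibsIter = map fst (iterate (\(a, b) -> (b, a + b)) (0, 1))

-- All Fibonacci numbers below one million; no upper bound on n is needed.
fibsBelowMillion :: [Integer]
fibsBelowMillion = takeWhile (< 1000000) fibsZip
```

In GHCi, fibsBelowMillion ends with 832040, the 30th Fibonacci number, since the next one, 1346269, exceeds the limit.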
[Edit]
For an efficient direct calculation (without infinite list) you can use matrix multiplication:
fib n = second $ (0,1,1,1) ** n where
  p ** 0 = (1,0,0,1)
  p ** 1 = p
  p ** n | even n    = (p `x` p) ** (n `div` 2)
         | otherwise = p `x` (p ** (n-1))
  (a,b,c,d) `x` (q,r,s,t) = (a*q+b*s, a*r+b*t,c*q+d*s,c*r+d*t)
  second (_,f,_,_) = f
(That was really fun to write, but I'm always grateful for suggestions)
The simplest thing I can think of is:
[ fib x | x <- [1..1000000], fib x < 1000000 ]
since fib n > n for all n > 5. (Note that this still evaluates fib for every x up to one million.)
