How to evaluate a lazy list once?


I followed this post to write a function nthPrimes that takes a list of
indices n and returns the corresponding list of nth primes:
import qualified Data.Set as PQ
main :: IO ()
main = print $ nthPrimes ns
  where
    ns = [1,3,10]
nthPrimes :: [Int] -> [Integer]
nthPrimes = map (primes !!)
primes :: [Integer]
primes = 2:sieve [3,5..]
  where
    sieve (x:xs) = x : sieve' xs (insertprime x xs PQ.empty)
    sieve' (x:xs) table
      | nextComposite == x = sieve' xs (adjust x table)
      | otherwise = x : sieve' xs (insertprime x xs table)
      where
        (nextComposite,_) = PQ.findMin table
    adjust x table
      | n == x = adjust x (PQ.insert (n', ns) newPQ)
      | otherwise = table
      where
        Just ((n, n':ns), newPQ) = PQ.minView table
    insertprime p xs = PQ.insert (p*p, map (*p) xs)
So, this will print [3,7,31].
However, since primes is lazy, I assume it will be evaluated again and again for each nth-prime lookup. If n has a known maximum (for instance 1000), we could evaluate all the needed primes just once, because primes never changes and computing it is CPU- and memory-heavy.
So the question is: how do I force the evaluation of the first X elements of primes so that the pre-evaluated elements can be reused by every function that takes from it?
I believe reusing the pre-evaluated elements will reduce the overall memory usage, especially when ns is a long list, correct?

As @chepner notes in the comments, primes is a list, not a function. The way Haskell works, the elements of the primes list are computed on an as-needed basis, but once computed they are kept in memory and not recomputed over and over.
You can see this yourself by loading your module into GHCi:
> :l MyPrimes
> :set +s
> nthPrimes [100000]
[1299721]
(6.45 secs, 3,246,628,344 bytes)
> nthPrimes [100001]
[1299743]
(0.01 secs, 188,184 bytes)
>
Calculating the 100,000th prime takes about 6 seconds. Once that's done, calculating the 100,001st prime takes only 0.01 seconds.
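To address the literal question of forcing the first X elements up front, here is a minimal sketch. It assumes a simple trial-division primes as a stand-in for the sieve in the question, and forcePrefix is a hypothetical helper, not something from the original post. Because primes is a top-level constant, forcing a prefix once means every later lookup inside that prefix is only a list walk.

```haskell
-- Stand-in for the question's sieve: trial division by earlier primes.
primes :: [Integer]
primes = 2 : filter isPrime [3,5..]
  where
    isPrime n = all (\p -> n `mod` p /= 0) (takeWhile (\p -> p * p <= n) primes)

-- Hypothetical helper: evaluates the first k elements of a list.
-- foldr seq () forces each element in turn before returning ().
forcePrefix :: Int -> [Integer] -> ()
forcePrefix k xs = foldr seq () (take k xs)

-- Forces everything up to the largest requested index, then looks up each n.
nthPrimes :: [Int] -> [Integer]
nthPrimes ns = forcePrefix (maximum (0 : map (+ 1) ns)) primes `seq` map (primes !!) ns
```

In practice the explicit forcing is rarely needed, since the first lookup of the largest index already evaluates the whole prefix; it only changes when the work happens.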

Related

Can this prime sieve code be further simplified in Haskell?

The code works well
primes = next [2 ..]
  where
    next (p : ps) = p : next ts
      where
        ts = filter (\x -> mod x p /= 0) ps
It's just that GHCi thinks there is an incomplete pattern in next.
Well, this is correct from a formal point of view.
But obviously the input of 'next' can never be empty.
So is there a solution other than adding the declaration
({-# OPTIONS_GHC -Wno-incomplete-patterns #-})?

The exhaustiveness checker knows that next has type Integral a => [a] -> [a]. The empty list is a valid argument to next, even if you never actually call next on the empty list.
The key here is that you don't really want Integral a => [a] as your argument type. You know it will only be called on an infinite list, so use a type that doesn't have finite lists as values.
import Prelude hiding (sequence) -- the sequence defined below would otherwise clash with Prelude's
data Stream a = Cons a (Stream a)
sequence :: Num a => a -> Stream a
sequence x = Cons x (sequence (x + 1))
filterStream :: (a -> Bool) -> Stream a -> Stream a
filterStream p (Cons x xs) | p x = Cons x (filterStream p xs)
                           | otherwise = filterStream p xs
-- Since you'll probably want a list of values, not just a stream of them, at some point.
toList :: Stream a -> [a]
toList (Cons x xs) = x : toList xs
primes :: Stream Integer
primes = next (sequence 2)
  where
    -- keep the head p as a prime, then sieve the rest recursively
    next (Cons p xs) = Cons p (next xs')
      where xs' = filterStream (\x -> mod x p /= 0) xs
The Stream library provides a module Data.Stream that defines the Stream type and numerous analogs to list functions.
import qualified Data.Stream as S
-- S.toList exists as well.
primes :: S.Stream Integer
primes = next (S.fromList [2..])
  where next (S.Cons p xs) = S.Cons p (next (S.filter (\x -> mod x p /= 0) xs))

You've already got a proper answer to your question. For completeness, the other option is just to add the unneeded clause that we know will never be called:
primes = next [2 ..]
  where
    next (p : xs) =
      p : next [x | x <- xs, mod x p > 0]
    next _ = undefined
Another, more "old-style" solution, is to analyze the argument by explicit calls to head and tail (very much not recommended, in general):
primes = next [2 ..]
  where
    next xs = let { p = head xs } in
      p : next [x | x <- tail xs, mod x p > 0]
This could perhaps count as a simplification.
On an unrelated note, you write that it "works well". Unfortunately, while indeed producing the correct results, it does so very slowly. Because of always taking only one element at a time off the input list, its time complexity is quadratic in the number n of primes produced. In other words, primes !! n takes time quadratic in n. Empirically,
> primes !! 1000
7927 -- (0.27 secs, 102987392 bytes)
> primes !! 2000
17393 -- (1.00 secs, 413106616 bytes)
> primes !! 3000
27457 -- (2.23 secs, 952005872 bytes)
> logBase (2/1) (1.00 / 0.27)
1.8889686876112561 -- n^1.9
> logBase (3/2) (2.23 / 1.00)
1.9779792870810489 -- n^2.0
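The logBase calculations above estimate the exponent b in a run time of roughly n^b from two measurements. As a small sketch, the same estimate can be wrapped in a helper; orderOfGrowth is a hypothetical name, not part of the original answer.

```haskell
-- Estimate b in t ~ n^b from two (problem size, run time) measurements.
orderOfGrowth :: (Double, Double) -> (Double, Double) -> Double
orderOfGrowth (n1, t1) (n2, t2) = logBase (n2 / n1) (t2 / t1)
```

For example, orderOfGrowth (1000, 0.27) (2000, 1.00) is the same computation as the first logBase line above.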
In fact a whole batch of elements may be taken from the input at once, up to the square of the current prime, with the code thus taking only about n^1.5 time, give or take a log factor:
{-# LANGUAGE ViewPatterns #-}
primes_ = 2 : next primes_ [3 ..]
  where
    next (p : ps) (span (< p*p) -> (h, xs)) =
      h ++ next ps [x | x <- xs, mod x p > 0]
    next _ _ = undefined
Empirically, again, we get
> primes !! 3000
27457 -- (0.08 secs, 29980864 bytes)
> primes !! 30000
350381 -- (1.81 secs, 668764336 bytes)
> primes !! 60000
746777 -- (4.74 secs, 1785785848 bytes)
> primes !! 100000
1299721 -- (9.87 secs, 3633306112 bytes)
> logBase (6/3) (4.74 / 1.81)
1.388897361815054 -- n^1.4
> logBase (10/6) (9.87 / 4.74)
1.4358377567888103 -- n^1.45
As we can see here, the complexity advantage expresses itself as an enormous speedup in absolute terms as well.
So then this sieve is equivalent to the optimal trial division, unlike the one in the question. Of course when it was first proposed in 1976, Haskell had no view patterns yet, and in fact there was yet no Haskell itself.

Why does my function not work with an infinite list?

I'm trying to learn haskell and implemented a function conseq that would return a list of consecutive elements of size n.
conseq :: Int -> [Int] -> [[Int]]
conseq n x
  | n == length(x) = [x]
  | n > length(x) = [x]
  | otherwise = [take n x] ++ (conseq n (drop 1 x))
This works correctly.
> take 5 $ conseq 2 [1..10]
[[1,2],[2,3],[3,4],[4,5],[5,6]]
However, if I pass [1..] instead of [1..10], the program gets stuck in an infinite loop.
As I understood it, haskell has lazy evaluation so I should still be able to get the same result right? Is it length? Shouldn't the first two conditions evaluate to false as soon as the length becomes greater than n?
What did I misunderstand?

One of the main reasons why using length is not a good idea is because when it has to be evaluated on an infinite list, it will get stuck in an infinite loop.
The good news is however, we don't need length. It would also make the time complexity worse. We can work with two enumerators, one is n-1 places ahead of the other. If this enumerator reaches the end of the list, then we know that the first enumerator still has n-1 elements, and thus we can stop yielding values:
conseq :: Int -> [a] -> [[a]]
conseq n ys = go (drop (n-1) ys) ys
  where go [] _ = []
        go (_:as) ba@(~(_:bs)) = take n ba : go as bs
This gives us thus:
Prelude> conseq 3 [1 ..]
[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7],[6,7,8],[7,8,9],[8,9,10],[9,10,11],[10,11,12],[11,12,13],[12,13,14],[13,14,15],[14,15,16],[15,16,17],[16,17,18],[17,18,19],[18,19,20],[19,20,21],[20,21,22],[21,22,23],[22,23,24],[23,24,25],[24,25,26],[25,26,27],…
Prelude> conseq 3 [1 .. 4]
[[1,2,3],[2,3,4]]

The first thing your function does is calculate length(x), so it knows whether it should return [x], [x], or [take n x] ++ (conseq n (drop 1 x))
length counts the number of elements in the list - all the elements. If you ask for the length of an infinite list, it never finishes counting.
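If you want to stay close to the original guards, the length comparisons can be replaced by a check that inspects at most n + 1 list cells. This is a sketch under that assumption; hasAtLeast and conseq' are hypothetical names introduced here for illustration.

```haskell
-- True when xs has at least n elements; looks at no more than n cells,
-- so it also terminates on infinite lists.
hasAtLeast :: Int -> [a] -> Bool
hasAtLeast n xs = n <= 0 || not (null (drop (n - 1) xs))

-- Same behaviour as the question's conseq, but "length x > n" is
-- expressed as "x has at least n + 1 elements".
conseq' :: Int -> [a] -> [[a]]
conseq' n xs
  | hasAtLeast (n + 1) xs = take n xs : conseq' n (drop 1 xs)
  | otherwise             = [xs]
```

With this version, take 5 $ conseq' 2 [1..] returns the same five pairs as the finite example in the question.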

Project Euler 50: Algorithm is incredibly slow, failing to understand why

I'm using Project Euler to learn Haskell. I'm new at Haskell and am having a lot of trouble coming up with an algorithm that doesn't take an absurd amount of time. I'm estimating that the program here would take 14 gigayears to arrive at the solution.
The problem:
Which prime, below one-million, can be written as the sum of the most
consecutive primes?
Here's my source. I've left out isPrime. I've posted it because it's far too inefficient to solve the problem. I think the issue lies with the slicedChains and primeChains calls, but I'm not sure what it is. I've resolved this before with C++. But for whatever reason, the efficient solution seems beyond me in Haskell.
Edit: I've included isPrime.
import System.Environment (getArgs)
import Data.List (nub,maximumBy)
import Data.Ord (comparing)
isPrime :: Integer -> Bool
isPrime 1 = False
isPrime 2 = True
isPrime x
    | any (== 0) (fmap (x `mod`) [2..x-1]) = False
    | otherwise = True
primeChain :: Integer -> [Integer]
primeChain x = [ n | n <- 1 : 2 : [3,5..x-1], isPrime n ]
slice :: [a] -> [Int] -> [a]
slice xs args = take (to - from + 1) (drop from xs)
    where from = head args
          to = last args
subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
                          in if n>l then [] else subsequencesBySize xs !! (l-n)
    where
        subsequencesBySize [] = [[[]]]
        subsequencesBySize (x:xs) = let next = subsequencesBySize xs
                                    in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])
slicedChains :: Int -> [Integer] -> [[Integer]]
slicedChains len xs = nub [x | x <- fmap (xs `slice`) subseqs, length x > 1]
    where subseqs = [x | x <- (subsequencesOfSize 2 [1..len]), (last x) > (head x)]
primeSums :: Integer -> [[Integer]]
primeSums x = filter (\ns -> sum ns == x) chain
    where xs = primeChain x
          len = length xs
          chain = slicedChains len xs
compLength :: [[a]] -> [a]
compLength xs = maximumBy (comparing length) xs
cleanSums :: [Integer] -> [[Integer]]
cleanSums xs = fmap (compLength) filtered
    where filtered = filter (not . null) (fmap primeSums xs)
main :: IO()
main = do
    args <- getArgs
    let arg = read (head args) :: Integer
    let xs = primeChain arg
    print $ maximumBy (comparing length) $ cleanSums xs

Your basic problem is that you are not pruning your search space based on the best solution you have found so far.
I can tell this just from the fact that you are using maximumBy to find the longest sequence.
For instance, if during your search you find a consecutive sequence of 4 primes whose sum is a prime < 10^6, you don't have to examine any sequence which begins with a prime greater than 250000.
To do this kind of pruning you have to keep track of the solution found so far and interleave the testing of candidate sequences with their generation so that the best solution found so far can stop the search early.
Update
There are several inefficiencies in slicedChains. Haskell lists are implemented as linked lists. This video is a pretty good overview of linked lists and how they differ from arrays: (link)
The following expressions in your code are going to be problematic w.r.t. efficiency:
* nub has quadratic running time
* length x > 1 - the complexity of length is O(n) where n is the length of the list. A better way to write this is:
lengthGreaterThan1 :: [a] -> Bool
lengthGreaterThan1 (_:_:_) = True
lengthGreaterThan1 _ = False
* subsequencesOfSize 2 [1..len] may be more succinctly written:
[ [a,b] | a <- [1..len], b <- [a+1..len] ]
and this will also ensure that a < b.
* The take and drop calls in slice are also O(n)
* In primeSums the call to primeChain will regenerate essentially the same list over and over again resulting in a lot of multiple calls to isPrime. A better approach is to define primeChain like this:
allPrimes = filter isPrime [1..]
primeChain x = takeWhile (<= x) allPrimes
The list allPrimes will be generated once, and primeChain simply takes prefixes of that list.
* primeSums x is charged with finding sequences whose sum is exactly x, but it looks at a lot of sequences that can't possibly work. For instance, primeSums 31 will examine:
11 + 13 + 17, 11 + 13 + 17 + 23, 11 + 13 + 17 + 23 + 29,
17 + 19, 17 + 19 + 23, 17 + 19 + 23 + 29,
19 + 23, 19 + 23 + 29
23 + 29
even though it's pretty obvious that none of these sums could equal 31.

So the first thing you need is a good data structure: Once you find a sequence of length n you don't care about sequences of shorter length, so your primary needs are: (1) tracking the sum, (2) tracking the primes in the set, (3) removing the least element, (4) adding a new greatest element. The key is amortization, where a big cost is paid infrequently enough that you can pretend it is a small cost per procedure. The data structure looks like this:
data Queue x = Q [x] [x]
q_empty (Q [] []) = True
q_empty _ = False
q_headtails (Q (x:xs) rest) = (x, Q xs rest)
q_headtails (Q [] xs) = case reverse xs of
    y:ys -> (y, Q ys [])
    [] -> error "End of queue."
q_append el (Q beg end) = Q beg (el:end)
So deconstructing the list is possible, but sometimes triggers an O(n) operation, but that's OK because when it does, we won't have to do it for another n steps, so it averages out to one operation per step. (You might also want to do it with a spine-strict list.)
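To make the amortization concrete, here is a self-contained sketch of the same two-list queue. It deliberately differs from the code above in one respect: removal returns a Maybe instead of calling error. The names emptyQ, push, pop and drain are hypothetical and chosen only for this illustration.

```haskell
-- Front list is consumed from its head; back list collects new items in reverse.
data Queue a = Q [a] [a]

emptyQ :: Queue a
emptyQ = Q [] []

-- O(1): cons onto the back list.
push :: a -> Queue a -> Queue a
push x (Q front back) = Q front (x : back)

-- O(1) while the front list is non-empty; otherwise one O(n) reverse
-- refills the front, which then serves the next n pops cheaply.
pop :: Queue a -> Maybe (a, Queue a)
pop (Q (x : front) back) = Just (x, Q front back)
pop (Q [] back) = case reverse back of
  []       -> Nothing
  (y : ys) -> Just (y, Q ys [])

-- Removes everything in FIFO order.
drain :: Queue a -> [a]
drain q = case pop q of
  Nothing      -> []
  Just (x, q') -> x : drain q'
```

Pushing 1, 2, 3 and then draining yields the items in insertion order, with the single reverse happening on the first pop.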
To save on length operations and summing the items of the list you probably want to cache those, too:
type Length = Int
type Sum = Int
type Prime = Int
data PrimeSeq = PS Length Sum (Queue Prime)
headTails (PS len sum q) = (x, PS (len - 1) (sum - x) xs)
    where (x, xs) = q_headtails q
append x (PS len sum xs) = PS (len + 1) (sum + x) (q_append x xs)
The algorithm for these looks like:
1. Cache a copy of the PrimeSeq you're starting with.
2. Keep adding primes to it and testing primality until you get to 10^6.
3. If you find a new prime with a longer sequence, replace the cache.
4. Whenever you run into 10^6, revert to the cache, pull a prime off the front of the queue, then repeat as needed.

Your prime generation is quadratic (isPrime 101 tests rem 101 100 == 0 even though 10 is the biggest number by which 101 needs to be tested -- and actually 7 is enough).
Yet even with it, a simple enough list-based code finds the answer in under 2 seconds (on an Intel Core i7 2.5 GHz, interpreted in GHCi). And with the code corrected to take advantage of the above mentioned optimization (and additionally, testing by primes only), it takes 0.1s.
Also, f x | t = False | otherwise = True is the same as f x = not t.
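Both remarks can be folded into one definition. This is only a sketch, assuming plain trial division by every integer d with d*d <= x (not yet by primes only):

```haskell
-- Trial division up to the square root, written as a single Boolean
-- expression instead of guards that return False / True.
isPrime :: Integer -> Bool
isPrime x = x > 1 && all (\d -> x `mod` d /= 0) (takeWhile (\d -> d * d <= x) [2 ..])
```

For 101 this tries only the divisors 2 through 10 rather than 2 through 100.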
We are asked by the PE site not to give you even a hint.
But in general, the key to efficiency in Haskell, thanks to its laziness, is being generative with as small a duplication of effort as possible. As one example, instead of calculating each slice of a list in isolation starting anew, we can produce the bunch of them together as part of one process,
slices :: Int -> [a] -> [[a]]
slices n = map (take n) . iterate tail -- sequence of list's slices of length n each
Another principle is, try to solve a more general problem, of which yours is an instance.
Having written such a function, we can play with it by trying out different values for its parameters, from smaller to the bigger ones, for an exploratory style of problem solving. We're told about 21 consecutive primes. What about 22 of them? 27? 1127 of them? ... and I've said enough about this already.
If it starts taking too much time, we can assess the full solution's needed run time by empirical orders of growth analysis.
Though the solution is found quickly enough with your unoptimized isPrime code, the exploratory process can be prohibitively slow with it, but it is fast enough with the optimized code:
primes :: [Int]
primes = 2 : filter isPrime [3,5..]
isPrime n = and [rem n p > 0 | p <- takeWhile ((<= n).(^2)) primes]

Project euler 10 - [haskell] Why so inefficient?

Alright, so I've picked up Project Euler where I left off when using Java, and I'm at problem 10. I use Haskell now and I figured it'd be good to learn some Haskell since I'm still very much a beginner.
http://projecteuler.net/problem=10
My friend who still codes in java came up with a very straight forward way to implement the sieve of eratosthenes:
http://puu.sh/5zQoU.png
I tried implementing a better looking (and what i thought was gonna be a slightly more efficient) Haskell function to find all primes up to 2,000,000.
I came to this very elegant, yet apparently enormously inefficient function:
primeSieveV2 :: [Integer] -> [Integer]
primeSieveV2 [] = []
primeSieveV2 (x:xs) = x:primeSieveV2( (filter (\n -> ( mod n x ) /= 0) xs) )
Now I'm not sure why my function is so much slower than his (he claims his runs in 5 ms). If anything, mine should be faster, since I only check composites once (they are removed from the list when they are found), whereas his checks them as many times as they can be formed.
Any help?

You don't actually have a sieve here. In Haskell you could write a sieve as
import Data.Vector.Unboxed hiding (forM_)
import Data.Vector.Unboxed.Mutable
import Control.Monad.ST (runST)
import Control.Monad (forM_, when)
import Prelude hiding (read)
sieve :: Int -> Vector Bool
sieve n = runST $ do
  vec <- new (n + 1) -- Create the mutable vector
  set vec True -- Set all the elements to True
  forM_ [2..n] $ \ i -> do -- Loop for i from 2 to n
    val <- read vec i -- read the value at i
    when val $ -- if the value is true, set all its multiples to false
      forM_ [2*i, 3*i .. n] $ \j -> write vec j False
  freeze vec -- return the immutable vector
main = print . ifoldl' summer 0 $ sieve 2000000
  where summer s i b = if b then i + s else s
This "cheats" by using a mutable unboxed vector, but it's pretty darn fast
$ ghc -O2 primes.hs
$ time ./primes
142913828923
real: 0.238 s
This is about 5x faster than my benchmarking of augustss's solution.
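If mutation in ST is unfamiliar, roughly the same marking idea can be sketched with an immutable unboxed array built in one pass by accumArray. This assumes the array package that ships with GHC; primesUpTo and isqrt are hypothetical names, the input is assumed to be non-negative, and no timing claim is made for this variant.

```haskell
import Data.Array.Unboxed (UArray, accumArray, assocs)

-- All primes <= n, found by marking multiples in an array of flags.
primesUpTo :: Int -> [Int]
primesUpTo n = [i | (i, True) <- assocs flags]
  where
    -- Every index 2..n starts as True; each listed multiple is set to False.
    -- Multiples are marked for every i up to sqrt n, prime or not, which is
    -- redundant but keeps the sketch short.
    flags :: UArray Int Bool
    flags = accumArray (\_ b -> b) True (2, n)
              [(m, False) | i <- [2 .. isqrt n], m <- [i * i, i * i + i .. n]]

    isqrt :: Int -> Int
    isqrt = floor . sqrt . (fromIntegral :: Int -> Double)
```

The sum asked for in the problem would then be sum (primesUpTo 1999999), ideally converted to Integer first on platforms where Int is 32 bits.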

To actually implement the sieve efficiently in Haskell you probably need to do it the Java way (i.e., allocate a mutable array and modify it).
For just generating primes I like this:
primes = 2 : filter (isPrime primes) [3,5 ..]
  where isPrime (p:ps) x = p*p > x || x `rem` p /= 0 && isPrime ps x
And then you can print the sum of all primes < 2,000,000
main = print $ sum $ takeWhile (< 2000000) primes
You can speed it up by adding a type signature primes :: [Int].
But it works well with Integer as well and that also gives you the correct sum (which 32 bit Int will not).
See The Genuine Sieve of Eratosthenes for more information.

The time complexity of your code is n^2 (in n primes produced). It is impractical to run for producing more than the first 10 to 20 thousand primes.
The main problem with that code is not that it uses rem but that it starts its filters prematurely, so creates too many of them. Here's how you fix it, with a small tweak:
{-# LANGUAGE PatternGuards #-}
primes = 2 : sieve primes [3..]
sieve (p:ps) xs | (h,t) <- span (< p*p) xs = h ++ sieve ps [x | x <- t, rem x p /= 0]
-- sieve ps (filter (\x->rem x p/=0) t)
main = print $ sum $ takeWhile (< 100000) primes
This improves the time complexity by about n^(1/2) (in n primes produced) and gives it a drastic speedup: it gets to 100,000 75x faster. Your 28 seconds should become ~ 0.4 sec. But you probably tested it in GHCi as interpreted code, not compiled. Marking it (1) as :: [Int] and compiling with the -O2 flag gives it another ~ 40x speedup, so it'll be ~ 0.01 sec. To reach 2,000,000 with this code takes ~ 90x longer, for a whopping ~ 1 sec of projected run time.
1) be sure to use sum $ map (fromIntegral :: Int -> Integer) $ takeWhile ... in main.
see also: http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth

Setting upper limit to the input set according to the output function

I'm currently stuck on setting upper limits in list comprehensions.
What I'm trying to do is to find all Fibonacci numbers below one million.
For this I had designed a rather simple recursive Fibonacci function
fib :: Int -> Integer
fib n
  | n == 0 = 0
  | n == 1 = 1
  | otherwise = fib (n-1) + fib (n-2)
The thing where I'm stuck on is defining the one million part. What I've got now is:
[ fib x | x <- [0..35], fib x < 1000000 ]
This because I know that the 35th number in the Fibonacci sequence is a high enough number.
However, what I'd like to have is to find that limit via a function and set it that way.
[ fib x | x <- [0..], fib x < 1000000 ]
This does give me the numbers, but it simply doesn't stop. It results in Haskell trying to find Fibonacci numbers below one million further in the sequence, which is rather fruitless.
Could anyone help me out with this? It'd be much appreciated!

The check fib x < 1000000 in the list comprehension keeps only the fib x values that are less than 1000000; but the list comprehension has no way of knowing that greater values of x imply greater values of fib x, and hence it must continue until all x have been checked.
Use takeWhile instead:
takeWhile (< 1000000) [ fib x | x <- [0..]]
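As a self-contained sketch of the takeWhile approach over an unbounded index range: fib below is the question's function written compactly, and fibsBelow is a hypothetical wrapper name.

```haskell
-- The question's naive recursive Fibonacci.
fib :: Int -> Integer
fib n
  | n < 2     = toInteger n
  | otherwise = fib (n - 1) + fib (n - 2)

-- Stops at the first value that is not below the limit, so it terminates
-- even though the index list [0 ..] is infinite.
fibsBelow :: Integer -> [Integer]
fibsBelow limit = takeWhile (< limit) (map fib [0 ..])
```

fibsBelow 1000000 then needs no hand-picked upper index such as 35.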

A list comprehension is guaranteed to look at every element of the list. You want takeWhile :: (a -> Bool) -> [a] -> [a]. With it, your list is simply takeWhile (< 1000000) $ map fib [1..]. The takeWhile function simply returns the leading portion of the list which satisfies the given predicate; there's also a similar dropWhile function which drops the leading portion of the list which satisfies the given predicate, as well as span :: (a -> Bool) -> [a] -> ([a], [a]), which is just (takeWhile p xs, dropWhile p xs), and the similar break, which breaks the list in two when the predicate is true (and is equivalent to span (not . p)). Thus, for instance:
takeWhile (< 3) [1,2,3,4,5,4,3,2,1] == [1,2]
dropWhile (< 3) [1,2,3,4,5,4,3,2,1] == [3,4,5,4,3,2,1]
span (< 3) [1,2,3,4,5,4,3,2,1] == ([1,2],[3,4,5,4,3,2,1])
break (> 3) [1,2,3,4,5,4,3,2,1] == ([1,2,3],[4,5,4,3,2,1])

It should be mentioned that for such a task the "canonical" (and faster) way is to define the numbers as an infinite stream, e.g.
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
takeWhile (<100) fibs
--[0,1,1,2,3,5,8,13,21,34,55,89]
The recursive definition may look scary (or even "magic") at first, but if you "think lazy", it will make sense.
A "loopy" (and in a sense more "imperative") way to define such an infinite list is:
fibs = map fst $ iterate (\(a,b) -> (b,a+b)) (0,1)
[Edit]
For an efficient direct calculation (without infinite list) you can use matrix multiplication:
fib n = second $ (0,1,1,1) ** n where
  p ** 0 = (1,0,0,1)
  p ** 1 = p
  p ** n | even n = (p `x` p) ** (n `div` 2)
         | otherwise = p `x` (p ** (n-1))
  (a,b,c,d) `x` (q,r,s,t) = (a*q+b*s, a*r+b*t,c*q+d*s,c*r+d*t)
  second (_,f,_,_) = f
(That was really fun to write, but I'm always grateful for suggestions)

The simplest thing I can think of is:
[ fib x | x <- [1..1000000] ]
Since fib n > n for all n > 5.
