How to make this Haskell program run faster

How to make this Haskell program run faster - haskell

So I've been trying to learn Haskell by solving some problems on Codeforce.
And I am getting a lot of TLE (Time Limit Exceed) even though I think my time complexity is optimal.
My question is: is the way I wrote this program that makes it slow?
For example, here is the problem.
Basically the answer is to find an for a given n , where
an = 2*an-1 + D(n) and D(n) = the difference of the number of divisors between n and n-1.
(update: the top limit for n is 106).
Below is my program.
import qualified Data.Map.Strict as Map
main = do t <- read <$> getLine
putStrLn . show $ solve t
solve :: Integer -> Integer
solve 0 = 1
solve 1 = 1
solve n = (2*(solve (n-1)) + (fact n) - (fact (n-1))) `mod` 998244353
where fact n = foldl (\s -> \t -> s*(snd t + 1)) 1 (Map.toList . factorization $ n)
--the number of divisors of a number
--copied from Internet,infinite prime list
primes :: [Integer]
primes = 2: 3: sieve (tail primes) [5,7..]
where
sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
where (h,~(_:t)) = span (< p*p) xs
--make factorization of a number
factorization :: Integer -> Map.Map Integer Integer
factorization 1 = Map.fromList []
factorization x = Map.insertWith (+) factor 1 (factorization (x `div` factor))
where factor = head $ filter (\s -> (x `mod` s) == 0) ls
ls = primes
This program failed to solve in the time limit.
So could anyone point me out where did I do wrong and how to fix it?
Or it just impossible to solve this problem using Haskell in time limit?

There are many ways in which your time complexity is not optimal. The most obvious one is a prime finder using trial division instead of, e.g., a sieve. Maybe it's fine because you only compute the primes once, but it does not inspire confidence.
factorization also has at least one glaring problem. Consider factoring a number like 78893012641, whose prime factorization is 280879^2. You will search each prime number up to 280879: expensive, but pretty much unavoidable. However, at this point you divide by 280879 and then try to factorize 280879, starting from 2 and scanning all the small primes again even though you just found out none of them are a factor!
As Li-yao Xia says in a comment, I would also be suspicious of the multiplication of very large Integers before taking their modulus, instead of taking a modulus after each multiplication.

You haven't copied the right piece of code from the "Internet". You should've instead copied primesTMWE for the primes list, but more importantly, primeFactors for the factorization algorithm.
Your foldl based calculation of the number of divisors from a number's factorization is perfectly fine, except perhaps foldl' should be used instead.
Notice that both solve n and solve (n-1) calculate fact (n-1), so better precalculate all of them..... perhaps a better algorithm exists to find the numbers of divisors for all numbers from 1 to n than calculating it for each number separately.
I suspect even with the right algorithms (which I link above) it's going to be tough, time-wise, if you're going to factorize each number independently (O(n) numbers, O(n1/2)) time to factorize each... each prime, at least).
Perhaps the thing to try here is the smallest-factor sieve which can be built in O(n log log n) time as usual with the sieve of Eratosthenes, and once it's built it lets you find the factorization of each number in O(log log n) time (it's the average number of prime factors for a number). It will have to be built up to n though (you can special-case the evens to halve the space requirements of course; or 6-coprimes to save another 1/6th). Probably as an STUArray (that link is an example; better codes can be found here on SO).
The smallest-factor sieve is just like the sieve of Eratosthenes, except it uses the smallest factor, not just a Boolean, as a mark.
To find a number's factorization then we just repeatedly delete by a number's smallest factor, n / sf(n) =: n1, repeating for n1 / sf(n1) =: n2, then n2, etc. until we hit a prime (which is any number which has itself as the smallest factor).
Since you only use those factors to calculate the number's total number of divisors, you can fuse the two calculations together into one joined loop, for extra efficiency.

Related

list is not being closed after finishing computation

For practice I wrote a Haskell program to find prime factors.
The code is the following:
getfactors :: Int -> [Int]
getfactors n = [x | x<-[1..n], n `mod` x == 0]
prime :: Int -> Bool
prime n | getfactors n == [1,n] = True
| otherwise = False
primefactors :: Int -> [Int]
primefactors n = [x | x <- getfactors n, prime x == True]
Everything works fine for small numbers, but when I enter large numbers the computation stops at the biggest prime factor and the expected list does not close.
For example:
>primefactors 1263
[3,421]
>primefactors 1387781234
[2,7,2161,6553
An explanation is very much appreciated.

Could not reproduce:
> :set +s
> primefactors 1387781234
[2,7,2161,6553]
(368.04 secs, 288,660,869,072 bytes)
Your algorithm is just very slow. There are lots of ways to improve it:
You are checking primality by trial division of all numbers smaller than the candidate. You can improve (without changing algorithm) by checking only up to the square root of the candidate.
Besides trial division, there are a wide range of other primality checking algorithms running the full spectrum from "simple but slow" to "complicated as heck but blazing fast". Other Internet sources will have plenty of details.
If you want to factor many numbers, it may be beneficial to memoize your primality checks between calls -- e.g. by storing a list of primes and iterating over them instead of iterating over all numbers. Since this is the only consumer of your primality check, you may want to consider creating this list directly rather than implementing a primality-checking algorithm first; again there's a wide range of algorithms for this running the spectrum from simple to fast.
Once you find a factor, you can divide the number you are factoring by that to get a smaller, faster number to compute the remaining factors with.
There are probably other easy opportunities to speed things up, and lots of prior work to read about online. Enjoy!

Using non-deterministic list monad to find long Collatz sequences

I wrote the following code to solve Project Euler's No. 14:
The following iterative (Collatz) sequence is defined for the set of positive integers:
n → n/2 (n is even)
n → 3n + 1 (n is odd)
Q: Which starting number, under one million, produces the longest chain?
And my code:
collatz :: Integer -> [Integer]
collatz 1 = [1]
collatz n =
filter (< 1000000) prev >>= poss
where prev = collatz (n - 1)
poss :: Integer -> [Integer]
poss prev
| even prev && prev `mod` 3 == 1 && (prev - 1) `div` 3 > 1 = [2 * prev, (prev - 1) `div` 3]
| otherwise = [2 * prev]
Where collatz n returns a list of numbers that will generate a Collatz chain of length n. The problem is, I can only either not restrict the result or restrict the whole chain, instead of only the seed number, to be under 1000,000. Is it possible to use this model to solve the problem at all?

I think that this approach - while interesting - is fundamentally doomed. Suppose I discover that all the seeds which result in a chain of length 500 are above 2,000,000. How can I know that I won't find that in three more steps there's a seed under 1,000,000 that gets me there? I see no way to know when you're done.
The only viable approach I see to this problem is to compute the collatz length for every number from 1 to 999,999 and then do something like:
main :: IO ()
main = do
let collatzMax = maximumBy (compare `on` collatzLength) [1..999999]
print collatzMax
On the other hand, this provides a great opportunity to learn about CAFs since the function collatzLength could be naively defined as:
collatzLength 1 = 1
collatzLength n | n `mod` 2 == 0 = 1 + collatzLength (n `div` 2)
collatzLength n = 1 + collatzLength (3 * n + 1)
And that kind of recursion screams out for a CAF.
Sure, there are memoization modules that will go and build the CAF for you, but building one yourself is a useful exercise. It's a whole little mini-course in lazy infinitely-recursive data structures.
If that defeats you, you can glance at this spoiler of how to use a CAF and then rewrite it using a different data structure. (what about a 10-way tree instead of a binary tree? What about traversing the tree in a different order? Can you remove the call to showIntAtBase?)

Your idea is interesting, although not the most efficient one. It could be worth trying, although it'll be probably memory intensive. Some thoughts:
As some chains can go over 1000000, so you can't just filter out everything less in collatz. You need to keep all the numbers in each pass.
Calling collatz this way is inefficient, as it computes the sets all over again. Making it an infinite list that shares values would be more efficient:
collatz :: [[Integer]]
collatz = [1] : map (>>= poss) collatz
You need to figure out when you're done. For this you'd need to go through the number lists generated by collatz and count how many of them are below 1000000. When you have seen all the numbers below the limit, the last list will contain the numbers with the longest chain.
That said, I'm afraid this approach isn't computationally feasible. In particular, you'll generate exponentially many numbers and exponentially large ones. For example, if the longest chain would be 500, the result of collatz in that step would contain numbers up to 2^500. And as mentioned, there is no way to tell which of these huge numbers might be the one leading to the solution, so you can't just discard them.

Prime Factoring Function in Haskell

I am trying to make a function that will display a number's prime factors with a list (infinite) that I give it. Here is what I have so far:
-- Here is a much more efficient (but harder to understand) version of primes.
-- Try "take 100 primes" as an example (or even more if you like)
primes = 2 : primesFrom3 where
primesFrom3 = sieve [3,5..] 9 primesFrom3
sieve (x:xs) b ~ps#(p:q:_)
| x < b = x : sieve xs b ps
| otherwise = sieve [x | x <- xs, rem x p /= 0] (q^2) (tail ps)
-- Write a function that factors its first argument using the (infinite)
-- list of available factors given in its second argument
-- (using rem x p /= 0 to check divisibility)
primeFactsWith :: Integer -> [Integer] -> [Integer]
primeFactsWith n (p:ps) = if (rem n p /= 0) then
(primeFactsWith n ps)
else (primeFactsWith p ps)
The top half was not written by me and works just fine. I am trying to get the second half to work, but it isn't. Read the comments in the code to better understand exactly what I am trying to do. Thanks! Oh and please don't just spout the answer. Give me some hints on how to do it and maybe what is wrong.

What's wrong
The problem is that you do a recursive call in both branches, therefore the function will never stop.
Some Hints
To build a recursive list-producing function, you'll need two branches or cases:
Base case no recursive call, this stops the recursion and returns the final part of the result.
Recursive case here you modify the parameters of the function and call it again with the modified parameters, possibly also returning a part of the result.
You need two sub branches at the recursive branch. One if you've found a prime factor, and another if the current number is no prime factor.
Here is a skeleton, you need to fill in the parts in the <> brackets.
primeFactsWith :: Integer -> [Integer] -> [Integer]
primeFactsWith n (p:ps) = if <halt condition> then
<final result>
else if (rem n p /= 0) then
<not a factor - recursive call 1>
else
<found a factor - return it,
and make recursive call 2>
If you have found a prime factor, you can divide the number by it, to get a smaller number, without that factor. To perform integer division Haskell provides a function named div.
If you reach the number 1, you have generated all prime factors and you can stop. The final part of a prime factors list, that comes after all its factors, is an empty list.
You can drop any prime from your infinite list if you no longer need it, but be aware that a number could contain a prime several times in the factors list. If you want to drop p you can just use ps, from the pattern; if you want to keep p you must use (p:ps).
The cons operator (:) can be used to build a list. You can use it to return one number of the result list, and use a recursive call to find the remaining numbers, e.g.
x : foo y z
I hope that helps, if you have any questions don't hesitate to ask.

Here's a hint.
So you're recursing, which is good.
In one branch you keep looking for factors of n. In the other branch you seem to look for the factors of p, which is a bit weird, but whatevs.
Where do you return the factors of n you've found?

Most efficient way to get digit count of arbitrarily big number

What is the most efficient way to get the digits of a number?
Lets begin with an example:
Imagine the Fibonacci sequence. Now lets say we want to know which Fibonacci number is the first to have 1000 digits (in base 10 representation). Up to 308 digits (1476th Fibonacci number) we can easily do this by using logBase 10 <number>. If the number is greater than the 1476th Fibonacci number, logBase will return Infinity and the calculation will fail. The problem is that 308 is somewhat far away from 1000, which was our initial goal.
A possible solution is to convert the number we want to know the number of digits of to a string and use it's length to determine the digit count. This is a little bit inefficient for my purposes because trying this with 10000 takes its sweet time.
The most efficient method shown in other questions is hardcoding all possible cases which I really do not want to do, especially because the number of digits exceeds 10 as needed in the proposed solutions.
So to come back to my question: What is the best (most efficient) way to determine a base 10 numbers digit count? Is it really converting it to a string and using its length or are there any "hacker" tricks like 0x5f3759df?
Note: I appreciate solutions in any language, even if this is tagged "haskell".

Why not use div until it's no longer greater than 10?
digitCount :: Integer -> Int
digitCount = go 1 . abs
where
go ds n = if n >= 10 then go (ds + 1) (n `div` 10) else ds
This is O(n) complexity, where n is the number of digits, and you could speed it up easily by checking against 1000, then 100, then 10, but this will probably be sufficient for most uses.
For reference, on my not-so-great laptop running it only in GHCi and using the horribly inaccurate :set +s statistics flag:
> let x = 10 ^ 10000 :: Integer
> :force x
<prints out 10 ^ 10000>
> digitCount x
10001
it :: Int
(0.06 secs, 23759220 bytes)
So it seems pretty quick, it can churn through a 10001 digit number in less than a 10th of a second without optimizations.
If you really wanted the O(log(n)) complexity, I would recommend writing your own version where you divide by 2 each time, but that one is a little more involved and trickier than dividing by 10. For your purposes this version will easily compute the number of digits up to about 20000 digits without problems.

If you just want to find the first number with at least digitCount digits in a list, you could test each number in O(1) by checking if fibBeingTested >= 10digitCount - 1. This works since 10digitCount - 1 is the lowest number with at least digitCount digits:
import Data.List (find)
fibs :: [Integer]
-- ...
findFib :: Int -> Integer
findFib digitCount =
let Just solution = find (>= tenPower) fibs
in
solution
where
tenPower = 10 ^ (digitCount - 1)
We use digitCount - 1 because 10^1, for instance, is 10 which has two digits.
As a result of the O(1) complexity that this comparison has, you can find Fibonacci numbers very quickly. On my machine:
λ> :set +s
λ> findFib 10000
[... the first Fibonacci number with at least 10,000 digits ...]
(0.23 secs, 121255512 bytes)
If the list of fibs has already been computed up to the 10,000th digit Fibonacci (for example, if you run findFib 10000 twice) it's even faster, which shows that more computation is taking place in calculating each Fibonacci number than in finding the one you're looking for:
λ> findFib 10000 -- Second run of findFib 10000
[... the first Fibonacci number with at least 10,000 digits ...]
(0.04 secs, 9922000 bytes)

For just getting up to a Fibonacci number that has more than 1000 digits, length . show (on Integer) suffices.
GHCi> let fibs = Data.Function.fix $ (0:) . scanl (+) 1
GHCi> let digits = length . (show :: Integer -> String)
GHCi> :set +t +s
GHCi> fst . head . dropWhile ((1000>) . digits . snd) $ zip [0..] fibs
4782
it :: Integer
(0.10 secs, 149103264 bytes)
For floating point numbers (so you can use logBase) outside the range of Double look to the numbers package. They are down-right slow, but you do have to pay something for that type of accuracy.

You could always try binary search to find the number of digits of n: first find a k such that 10^2^k ≥ n, and then divide n succesively by 10^2^(k-1), 10^2^(k-2), ..., 10^2^0:
numDigits n = fst $ foldr step (1,n) tenToPow2s
where
pow2s = iterate (*2) 1
tenToPow2s = zip pow2s . takeWhile (<=n) . iterate (^2) $ 10
step (k,t) (d,n) = if n>=t then (d+k, n `div` t) else (d,n)
For the specific case of Fibonacci numbers you could also just try math: the n-th Fibonacci number F(n) is between (φ^n-1)/√5 and (φⁿ+1)/√5 so for the base 10 logarithm we have:
log(F(n)) - n log(φ) + log(√5) ∈ [log(1 - 1/φⁿ), log(1 + 1/φⁿ)]
That interval gets tiny right away.

My solution for Euler Project #3 is too slow

I'm new to Haskell and tinkering around with the Euler Project problems. My solution for problem #3 is far too slow. At first I tried this:
-- Problem 3
-- The prime factors of 13195 are 5, 7, 13 and 29.
-- What is the largest prime factor of the number 600851475143 ?
problem3 = max [ x | x <- [1..n], (mod n x) == 0, n /= x]
where n = 600851475143
Then I changed it to return all x and not just the largest one.
problem3 = [ x | x <- [1..n], (mod n x) == 0, n /= x]
where n = 600851475143
After 30 minutes, the list is still being processed and the output looks like this
[1,71,839,1471,6857,59569,104441,486847,1234169,5753023,10086647,87625999,408464633,716151937
Why is it so slow? Am I doing something terribly wrong or is it normal for this sort of task?

With your solution, there are about 600 billion possible numbers. As noted by delnan, making every check of the number quicker is not going to make much difference, we must limit the number of candidates.
Your solution does not seem to be correct either. 59569 = 71 * 839 isn't it? The question
only asks for prime factors. Notice that 71 and 839 is in your list so you are
doing something right. In fact, you are trying to find all factors.
I think the most dramatic effect you get simply by dividing away the factor before continuing.
euler3 = go 2 600851475143
where
go cand num
| cand == num = [num]
| cand `isFactorOf` num = cand : go cand (num `div` cand)
| otherwise = go (cand + 1) num
isFactorOf a b = b `mod` a == 0
This may seem like an obvious optimization but it relies on the fact that if both a and b divides c and a is coprime to b then a divides c/b.
If you want to do more, the common "Only check until the square root" trick has been
mentioned here. The same trick can be applied to this problem, but the performance gain does not show, unfortunately, on this instance:
euler3 = go 2 600851475143
where
go cand num
| cand*cand > num = [num]
| cand `isFactorOf` num = cand : go cand (num `div` cand)
| otherwise = go (cand + 1) num
isFactorOf a b = b `mod` a == 0
Here, when a candidate is larger than the square root of the remaining number (num), we know that num must be a prime and therefore a prime factor of the original
number (600851475143).
It is possible to remove even more candidates by only considering prime numbers,
but this is slightly more advanced because you need to make a reasonably performant
way of generating primes. See this page for ways of doing that.

It's doing a lot of work! (It's also going to give you the wrong answer, but that's a separate issue!)
There are a few very quick ways you could speed it up by thinking about the problem a little first:
You are applying your function over all numbers 1..n, and checking each one of them to ensure it isn't n. Instead, you could just go over all numbers 1..n-1 and skip out n different checks (small though they are).
The answer is odd, so you can very quickly filter out any even numbers by going from 1..(n-1)/2 and checking for 2x instead of x.
If you think about it, all factors occur in pairs, so you can in fact just search from 1..sqrt(n) (or 1..sqrt(n)/2 if you ignore even numbers) and output pairs of numbers in each step.
Not related to the performance of this function, but it's worth noting that what you've implemented here will find all of the factors of a number, whereas what you want is only the largest prime factor. So either you have to test each of your divisors for primality (which is going to be slow, again) or you can implement the two in one step. You probably want to look at 'sieves', the most simple being the Sieve of Eratosthenes, and how you can implement them.

A complete factorization of a number can take a long time for big numbers. For Project Euler problems, a brute force solution (which this is) is usually not enough to find the answer in your lifetime.
Hint: you do not need to find all prime factors, just the biggest one.

TL;DR: The two things you were doing non-optimally, are: not stopping at the square root, and not dividing out each smallest factor, as they are found.
Here's a little derivation of the (2nd) factorization code shown in the answer by HaskellElephant. We start with your code:
f1 n = [ x | x <- [2..n], rem n x == 0]
n3 = 600851475143
Prelude> f1 n3
[71,839,1471,6857,59569,104441,486847Interrupted.
So it doesn't finish in any reasonable amount of time, and some of the numbers it produces are not prime... But instead of adding primality check to the list comprehension, let's notice that 71 is prime. The first number produced by f1 n is the smallest divisor of n, and thus it is prime. If it weren't, we'd find its smallest divisor first - a contradiction.
So, we can divide it out, and continue searching for the prime factors of newly reduced number:
f2 n = tail $ iterate (\(_,m)-> (\f->(f, quot m f)) . head $ f1 m) (1,n)
Prelude> f2 n3
[(71,8462696833),(839,10086647),(1471,6857),(6857,1),(*** Exception: Prelude.hea
d: empty list
(the error, because f1 1 == []). We're done! (6857 is the answer, here...). Let's wrap it up:
takeUntil p xs = foldr (\x r -> if p x then [x] else x:r) [] xs
pfactors1 n = map fst . takeUntil ((==1).snd) . f2 $ n -- prime factors of n
Trying out our newly minted solution,
Prelude> map pfactors1 [n3..]
[[71,839,1471,6857],[2,2,2,3,3,1259Interrupted.
suddenly we hit a new inefficiency wall, on numbers without small divisors. But if n = a*b and 1 < a <= b, then a*a <= a*b == n and so it is enough to test only until the square root of a number, to find its smallest divisor.
f12 n = [ x | x <- takeWhile ((<= n).(^2)) [2..n], rem n x == 0] ++ [n]
f22 n = tail $ iterate (\(_,m)-> (\f->(f, quot m f)) . head $ f12 m) (1,n)
pfactors2 n = map fst . takeUntil ((==1).snd) . f22 $ n
What couldn't finish in half an hour now finishes in under one second (on a typical performant box):
Prelude> f12 n3
[71,839,1471,6857,59569,104441,486847,600851475143]
All the divisors above sqrt n3 were not needed at all. We unconditionally add n itself as the last divisor in f12 so it is able to handle prime numbers:
Prelude> f12 (n3+6)
[600851475149]
Since n3 / sqrt n3 = sqrt n3 ~= 775146, your original attempt at f1 n3 should have taken about a week to finish. That's how important this optimization is, of stopping at the square root.
Prelude> f22 n3
[(71,8462696833),(839,10086647),(1471,6857),(6857,1),(1,1),(1,1),(1,1),(1,1),(1,
1),(1,1),(1,1),(1,1),(1,1),(1,1),(1,1),(1,1),(1,1),(1,1)Interrupted
We've apparently traded the "Prelude.head: empty list" error for a non-terminating - but productive - behavior.
Lastly, we break f22 up in two parts and fuse them each into the other functions, for a somewhat simplified code. Also, we won't start over anew, as f12 does, searching for the smallest divisor from 2 all the time, anymore:
-- smallest factor of n, starting from d. directly jump from sqrt n to n.
smf (d,n) = head $ [ (x, quot n x) | x <- takeWhile ((<=n).(^2)) [d..]
, rem n x == 0] ++ [(n,1)]
pfactors n = map fst . takeUntil ((==1).snd) . tail . iterate smf $ (2,n)
This expresses guarded (co)recursion through a higher-order function iterate, and is functionally equivalent to that code mentioned above. The following now runs smoothly, and we're even able to find a pair of twin primes as a bonus there:
Prelude Saga> map pfactors [n3..]
[[71,839,1471,6857],[2,2,2,3,3,1259,6628403],[5,120170295029],[2,13,37,227,27514
79],[3,7,7,11,163,2279657],[2,2,41,3663728507],[600851475149],[2,3,5,5,19,31,680
0809],[600851475151],[2,2,2,2,37553217197],[3,3,3,211,105468049],[2,7,11161,3845
351],[5,67,881,2035853],[2,2,3Interrupted.

Here is my solution for Euler Project #3. It takes only 1.22 sec on my Macbook Air.
First we should find all factors of the given number. But we know, that even numbers can't be prime numbers (except number 2). So, to solve Euler Project #3 we need not all, but only odd factors:
getOddFactors num = [ x | x <- [3,5..num], num `rem` x == 0 ]
But we can optimize this function. If we plan to find a factor of num greater than sqrt num, we should have another factor which is less than sqrt num - and these possible factors we have found already. Hence, we can limit our list of possible factors by sqrt num:
getOddFactors num = [ x | x <- [3, 5..(floor.sqrt.fromIntegral) num],
num `rem` x == 0 ]
Next we want to know which of our odd factors of num are prime numbers:
isPrime number = [ x | x <- [3..(floor.sqrt.fromIntegral) number],
number `rem` x == 0] == []
Next we can filter odd factors of num with the function isPrime to find all prime factors of num. But to use laziness of Haskell to optimize our solution, we apply function filter isPrime to the reversed list of odd factors of the num. As soon as our function finds the first value which is prime number, Haskell stops computations and returns solution:
largestPrimeFactor = head . filter isPrime . reverse . getOddDivisors
Hence, the solution is:
ghci> largestPrimeFactor 600851475143
6857
(1.22 secs, 110646064 bytes)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to make this Haskell program run faster - haskell

Related

list is not being closed after finishing computation

Using non-deterministic list monad to find long Collatz sequences

Prime Factoring Function in Haskell

Most efficient way to get digit count of arbitrarily big number

My solution for Euler Project #3 is too slow

Categories

Resources