Related
So I've been trying to learn Haskell by solving some problems on Codeforce.
And I am getting a lot of TLE (Time Limit Exceed) even though I think my time complexity is optimal.
My question is: is the way I wrote this program that makes it slow?
For example, here is the problem.
Basically the answer is to find an for a given n , where
an = 2*an-1 + D(n) and D(n) = the difference of the number of divisors between n and n-1.
(update: the top limit for n is 106).
Below is my program.
import qualified Data.Map.Strict as Map
main = do t <- read <$> getLine
putStrLn . show $ solve t
solve :: Integer -> Integer
solve 0 = 1
solve 1 = 1
solve n = (2*(solve (n-1)) + (fact n) - (fact (n-1))) `mod` 998244353
where fact n = foldl (\s -> \t -> s*(snd t + 1)) 1 (Map.toList . factorization $ n)
--the number of divisors of a number
--copied from Internet,infinite prime list
primes :: [Integer]
primes = 2: 3: sieve (tail primes) [5,7..]
where
sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
where (h,~(_:t)) = span (< p*p) xs
--make factorization of a number
factorization :: Integer -> Map.Map Integer Integer
factorization 1 = Map.fromList []
factorization x = Map.insertWith (+) factor 1 (factorization (x `div` factor))
where factor = head $ filter (\s -> (x `mod` s) == 0) ls
ls = primes
This program failed to solve in the time limit.
So could anyone point me out where did I do wrong and how to fix it?
Or it just impossible to solve this problem using Haskell in time limit?
There are many ways in which your time complexity is not optimal. The most obvious one is a prime finder using trial division instead of, e.g., a sieve. Maybe it's fine because you only compute the primes once, but it does not inspire confidence.
factorization also has at least one glaring problem. Consider factoring a number like 78893012641, whose prime factorization is 280879^2. You will search each prime number up to 280879: expensive, but pretty much unavoidable. However, at this point you divide by 280879 and then try to factorize 280879, starting from 2 and scanning all the small primes again even though you just found out none of them are a factor!
As Li-yao Xia says in a comment, I would also be suspicious of the multiplication of very large Integers before taking their modulus, instead of taking a modulus after each multiplication.
You haven't copied the right piece of code from the "Internet". You should've instead copied primesTMWE for the primes list, but more importantly, primeFactors for the factorization algorithm.
Your foldl based calculation of the number of divisors from a number's factorization is perfectly fine, except perhaps foldl' should be used instead.
Notice that both solve n and solve (n-1) calculate fact (n-1), so better precalculate all of them..... perhaps a better algorithm exists to find the numbers of divisors for all numbers from 1 to n than calculating it for each number separately.
I suspect even with the right algorithms (which I link above) it's going to be tough, time-wise, if you're going to factorize each number independently (O(n) numbers, O(n1/2)) time to factorize each... each prime, at least).
Perhaps the thing to try here is the smallest-factor sieve which can be built in O(n log log n) time as usual with the sieve of Eratosthenes, and once it's built it lets you find the factorization of each number in O(log log n) time (it's the average number of prime factors for a number). It will have to be built up to n though (you can special-case the evens to halve the space requirements of course; or 6-coprimes to save another 1/6th). Probably as an STUArray (that link is an example; better codes can be found here on SO).
The smallest-factor sieve is just like the sieve of Eratosthenes, except it uses the smallest factor, not just a Boolean, as a mark.
To find a number's factorization then we just repeatedly delete by a number's smallest factor, n / sf(n) =: n1, repeating for n1 / sf(n1) =: n2, then n2, etc. until we hit a prime (which is any number which has itself as the smallest factor).
Since you only use those factors to calculate the number's total number of divisors, you can fuse the two calculations together into one joined loop, for extra efficiency.
In Haskell, it is possible to define infinite lists like so:
[1.. ]
If found many articles which describe how to implement infinite lists and I understood how this works.
However, I cant think of any reason to use the concept of infinite datastructures.
Can someone give me an example of a problem, which can be solved easier (or maybe only) with an infinite list in Haskell?
The basic advantage of lists in Haskell is that they’re a control structure that looks like a data structure. You can write code that operates incrementally on streams of data, but it looks like simple operations on lists. This is in contrast to other languages that require the use of an explicitly incremental structure, like iterators (Python’s itertools), coroutines (C# IEnumerable), or ranges (D).
For example, a sort function can be written in such a way that it sorts as few elements as possible before it starts to produce results. While sorting the entire list takes O(n log n) / linearithmic time in the length of the list, minimum xs = head (sort xs) only takes O(n) / linear time, because head will only examine the first constructor of the list, like x : _, and leave the tail as an unevaluated thunk that represents the remainder of the sorting operation.
This means that performance is compositional: for example, if you have a long chain of operations on a stream of data, like sum . map (* 2) . filter (< 5), it looks like it would first filter all the elements, then map a function over them, then take the sum, producing a full intermediate list at every step. But what happens is that each element is only processed one at a time: given [1, 2, 6], this basically proceeds as follows, with all the steps happening incrementally:
total = 0
1 < 5 is true
1 * 2 == 2
total = 0 + 2 = 2
2 < 5 is true
2 * 2 == 4
total = 2 + 4 = 6
6 < 5 is false
result = 6
This is exactly how you would write a fast loop in an imperative language (pseudocode):
total = 0;
for x in xs {
if (x < 5) {
total = total + x * 2;
}
}
This means that performance is compositional: because of laziness, this code has constant memory usage during the processing of the list. And there is nothing special inside map or filter that makes this happen: they can be entirely independent.
For another example, and in the standard library computes the logical AND of a list, e.g. and [a, b, c] == a && b && c, and it’s implemented simply as a fold: and = foldr (&&) True. The moment it reaches a False element in the input, it stops evaluation, simply because && is lazy in its right argument. Laziness gives you composition!
For a great paper on all this, read the famous Why Functional Programming Matters by John Hughes, which goes over the advantages of lazy functional programming (in Miranda, a forebear of Haskell) far better than I could.
Annotating a list with its indices temporarily uses an infinite list of indices:
zip [0..] ['a','b','c','d'] = [(0,'a'), (1,'b'), (2,'c'), (3,'d')]
Memoizing functions while maintaining purity (in this case this transformation causes an exponential speed increase, because the memo table is used recursively):
fib = (memo !!)
where
memo = map fib' [0..] -- cache of *all* fibonacci numbers (evaluated on demand)
fib' 0 = 0
fib' 1 = 1
fib' n = fib (n-1) + fib (n-2)
Purely mocking programs with side-effects (free monads)
data IO a = Return a
| GetChar (Char -> IO a)
| PutChar Char (IO a)
Potentially non-terminating programs are represented with infinite IO strucutres; e.g. forever (putChar 'y') = PutChar 'y' (PutChar 'y' (PutChar 'y' ...))
Tries: if you define a type approximately like the following:
data Trie a = Trie a (Trie a) (Trie a)
it can represent an infinite collection of as indexed by the naturals. Note that there is no base case for the recursion, so every Trie is infinite. But the element at index n can be accessed in log(n) time. Which means you can do something like this (using some functions in the inttrie library):
findIndices :: [Integer] -> Trie [Integer]
findIndices = foldr (\(i,x) -> modify x (i:)) (pure []) . zip [0..]
this builds an efficient "reverse lookup table" which given any value in the list can tell you at which indices it occurs, and it both caches results and streams information as soon as it's available:
-- N.B. findIndices [0, 0,1, 0,1,2, 0,1,2,3, 0,1,2,3,4...]
> table = findIndices (concat [ [0..n] | n <- [0..] ])
> table `apply` 0
[0,1,3,6,10,15,21,28,36,45,55,66,78,91,...
all from a one-line infinite fold.
I'm sure there are more examples, there are so many cool things you can do.
I am trying to make a function that will display a number's prime factors with a list (infinite) that I give it. Here is what I have so far:
-- Here is a much more efficient (but harder to understand) version of primes.
-- Try "take 100 primes" as an example (or even more if you like)
primes = 2 : primesFrom3 where
primesFrom3 = sieve [3,5..] 9 primesFrom3
sieve (x:xs) b ~ps#(p:q:_)
| x < b = x : sieve xs b ps
| otherwise = sieve [x | x <- xs, rem x p /= 0] (q^2) (tail ps)
-- Write a function that factors its first argument using the (infinite)
-- list of available factors given in its second argument
-- (using rem x p /= 0 to check divisibility)
primeFactsWith :: Integer -> [Integer] -> [Integer]
primeFactsWith n (p:ps) = if (rem n p /= 0) then
(primeFactsWith n ps)
else (primeFactsWith p ps)
The top half was not written by me and works just fine. I am trying to get the second half to work, but it isn't. Read the comments in the code to better understand exactly what I am trying to do. Thanks! Oh and please don't just spout the answer. Give me some hints on how to do it and maybe what is wrong.
What's wrong
The problem is that you do a recursive call in both branches, therefore the function will never stop.
Some Hints
To build a recursive list-producing function, you'll need two branches or cases:
Base case no recursive call, this stops the recursion and returns the final part of the result.
Recursive case here you modify the parameters of the function and call it again with the modified parameters, possibly also returning a part of the result.
You need two sub branches at the recursive branch. One if you've found a prime factor, and another if the current number is no prime factor.
Here is a skeleton, you need to fill in the parts in the <> brackets.
primeFactsWith :: Integer -> [Integer] -> [Integer]
primeFactsWith n (p:ps) = if <halt condition> then
<final result>
else if (rem n p /= 0) then
<not a factor - recursive call 1>
else
<found a factor - return it,
and make recursive call 2>
If you have found a prime factor, you can divide the number by it, to get a smaller number, without that factor. To perform integer division Haskell provides a function named div.
If you reach the number 1, you have generated all prime factors and you can stop. The final part of a prime factors list, that comes after all its factors, is an empty list.
You can drop any prime from your infinite list if you no longer need it, but be aware that a number could contain a prime several times in the factors list. If you want to drop p you can just use ps, from the pattern; if you want to keep p you must use (p:ps).
The cons operator (:) can be used to build a list. You can use it to return one number of the result list, and use a recursive call to find the remaining numbers, e.g.
x : foo y z
I hope that helps, if you have any questions don't hesitate to ask.
Here's a hint.
So you're recursing, which is good.
In one branch you keep looking for factors of n. In the other branch you seem to look for the factors of p, which is a bit weird, but whatevs.
Where do you return the factors of n you've found?
I'm new to Haskell and tinkering around with the Euler Project problems. My solution for problem #3 is far too slow. At first I tried this:
-- Problem 3
-- The prime factors of 13195 are 5, 7, 13 and 29.
-- What is the largest prime factor of the number 600851475143 ?
problem3 = max [ x | x <- [1..n], (mod n x) == 0, n /= x]
where n = 600851475143
Then I changed it to return all x and not just the largest one.
problem3 = [ x | x <- [1..n], (mod n x) == 0, n /= x]
where n = 600851475143
After 30 minutes, the list is still being processed and the output looks like this
[1,71,839,1471,6857,59569,104441,486847,1234169,5753023,10086647,87625999,408464633,716151937
Why is it so slow? Am I doing something terribly wrong or is it normal for this sort of task?
With your solution, there are about 600 billion possible numbers. As noted by delnan, making every check of the number quicker is not going to make much difference, we must limit the number of candidates.
Your solution does not seem to be correct either. 59569 = 71 * 839 isn't it? The question
only asks for prime factors. Notice that 71 and 839 is in your list so you are
doing something right. In fact, you are trying to find all factors.
I think the most dramatic effect you get simply by dividing away the factor before continuing.
euler3 = go 2 600851475143
where
go cand num
| cand == num = [num]
| cand `isFactorOf` num = cand : go cand (num `div` cand)
| otherwise = go (cand + 1) num
isFactorOf a b = b `mod` a == 0
This may seem like an obvious optimization but it relies on the fact that if both a and b divides c and a is coprime to b then a divides c/b.
If you want to do more, the common "Only check until the square root" trick has been
mentioned here. The same trick can be applied to this problem, but the performance gain does not show, unfortunately, on this instance:
euler3 = go 2 600851475143
where
go cand num
| cand*cand > num = [num]
| cand `isFactorOf` num = cand : go cand (num `div` cand)
| otherwise = go (cand + 1) num
isFactorOf a b = b `mod` a == 0
Here, when a candidate is larger than the square root of the remaining number (num), we know that num must be a prime and therefore a prime factor of the original
number (600851475143).
It is possible to remove even more candidates by only considering prime numbers,
but this is slightly more advanced because you need to make a reasonably performant
way of generating primes. See this page for ways of doing that.
It's doing a lot of work! (It's also going to give you the wrong answer, but that's a separate issue!)
There are a few very quick ways you could speed it up by thinking about the problem a little first:
You are applying your function over all numbers 1..n, and checking each one of them to ensure it isn't n. Instead, you could just go over all numbers 1..n-1 and skip out n different checks (small though they are).
The answer is odd, so you can very quickly filter out any even numbers by going from 1..(n-1)/2 and checking for 2x instead of x.
If you think about it, all factors occur in pairs, so you can in fact just search from 1..sqrt(n) (or 1..sqrt(n)/2 if you ignore even numbers) and output pairs of numbers in each step.
Not related to the performance of this function, but it's worth noting that what you've implemented here will find all of the factors of a number, whereas what you want is only the largest prime factor. So either you have to test each of your divisors for primality (which is going to be slow, again) or you can implement the two in one step. You probably want to look at 'sieves', the most simple being the Sieve of Eratosthenes, and how you can implement them.
A complete factorization of a number can take a long time for big numbers. For Project Euler problems, a brute force solution (which this is) is usually not enough to find the answer in your lifetime.
Hint: you do not need to find all prime factors, just the biggest one.
TL;DR: The two things you were doing non-optimally, are: not stopping at the square root, and not dividing out each smallest factor, as they are found.
Here's a little derivation of the (2nd) factorization code shown in the answer by HaskellElephant. We start with your code:
f1 n = [ x | x <- [2..n], rem n x == 0]
n3 = 600851475143
Prelude> f1 n3
[71,839,1471,6857,59569,104441,486847Interrupted.
So it doesn't finish in any reasonable amount of time, and some of the numbers it produces are not prime... But instead of adding primality check to the list comprehension, let's notice that 71 is prime. The first number produced by f1 n is the smallest divisor of n, and thus it is prime. If it weren't, we'd find its smallest divisor first - a contradiction.
So, we can divide it out, and continue searching for the prime factors of newly reduced number:
f2 n = tail $ iterate (\(_,m)-> (\f->(f, quot m f)) . head $ f1 m) (1,n)
Prelude> f2 n3
[(71,8462696833),(839,10086647),(1471,6857),(6857,1),(*** Exception: Prelude.hea
d: empty list
(the error, because f1 1 == []). We're done! (6857 is the answer, here...). Let's wrap it up:
takeUntil p xs = foldr (\x r -> if p x then [x] else x:r) [] xs
pfactors1 n = map fst . takeUntil ((==1).snd) . f2 $ n -- prime factors of n
Trying out our newly minted solution,
Prelude> map pfactors1 [n3..]
[[71,839,1471,6857],[2,2,2,3,3,1259Interrupted.
suddenly we hit a new inefficiency wall, on numbers without small divisors. But if n = a*b and 1 < a <= b, then a*a <= a*b == n and so it is enough to test only until the square root of a number, to find its smallest divisor.
f12 n = [ x | x <- takeWhile ((<= n).(^2)) [2..n], rem n x == 0] ++ [n]
f22 n = tail $ iterate (\(_,m)-> (\f->(f, quot m f)) . head $ f12 m) (1,n)
pfactors2 n = map fst . takeUntil ((==1).snd) . f22 $ n
What couldn't finish in half an hour now finishes in under one second (on a typical performant box):
Prelude> f12 n3
[71,839,1471,6857,59569,104441,486847,600851475143]
All the divisors above sqrt n3 were not needed at all. We unconditionally add n itself as the last divisor in f12 so it is able to handle prime numbers:
Prelude> f12 (n3+6)
[600851475149]
Since n3 / sqrt n3 = sqrt n3 ~= 775146, your original attempt at f1 n3 should have taken about a week to finish. That's how important this optimization is, of stopping at the square root.
Prelude> f22 n3
[(71,8462696833),(839,10086647),(1471,6857),(6857,1),(1,1),(1,1),(1,1),(1,1),(1,
1),(1,1),(1,1),(1,1),(1,1),(1,1),(1,1),(1,1),(1,1),(1,1)Interrupted
We've apparently traded the "Prelude.head: empty list" error for a non-terminating - but productive - behavior.
Lastly, we break f22 up in two parts and fuse them each into the other functions, for a somewhat simplified code. Also, we won't start over anew, as f12 does, searching for the smallest divisor from 2 all the time, anymore:
-- smallest factor of n, starting from d. directly jump from sqrt n to n.
smf (d,n) = head $ [ (x, quot n x) | x <- takeWhile ((<=n).(^2)) [d..]
, rem n x == 0] ++ [(n,1)]
pfactors n = map fst . takeUntil ((==1).snd) . tail . iterate smf $ (2,n)
This expresses guarded (co)recursion through a higher-order function iterate, and is functionally equivalent to that code mentioned above. The following now runs smoothly, and we're even able to find a pair of twin primes as a bonus there:
Prelude Saga> map pfactors [n3..]
[[71,839,1471,6857],[2,2,2,3,3,1259,6628403],[5,120170295029],[2,13,37,227,27514
79],[3,7,7,11,163,2279657],[2,2,41,3663728507],[600851475149],[2,3,5,5,19,31,680
0809],[600851475151],[2,2,2,2,37553217197],[3,3,3,211,105468049],[2,7,11161,3845
351],[5,67,881,2035853],[2,2,3Interrupted.
Here is my solution for Euler Project #3. It takes only 1.22 sec on my Macbook Air.
First we should find all factors of the given number. But we know, that even numbers can't be prime numbers (except number 2). So, to solve Euler Project #3 we need not all, but only odd factors:
getOddFactors num = [ x | x <- [3,5..num], num `rem` x == 0 ]
But we can optimize this function. If we plan to find a factor of num greater than sqrt num, we should have another factor which is less than sqrt num - and these possible factors we have found already. Hence, we can limit our list of possible factors by sqrt num:
getOddFactors num = [ x | x <- [3, 5..(floor.sqrt.fromIntegral) num],
num `rem` x == 0 ]
Next we want to know which of our odd factors of num are prime numbers:
isPrime number = [ x | x <- [3..(floor.sqrt.fromIntegral) number],
number `rem` x == 0] == []
Next we can filter odd factors of num with the function isPrime to find all prime factors of num. But to use laziness of Haskell to optimize our solution, we apply function filter isPrime to the reversed list of odd factors of the num. As soon as our function finds the first value which is prime number, Haskell stops computations and returns solution:
largestPrimeFactor = head . filter isPrime . reverse . getOddDivisors
Hence, the solution is:
ghci> largestPrimeFactor 600851475143
6857
(1.22 secs, 110646064 bytes)
I am new to haskell and just learning the fun of functional programming. but have run into trouble right away with an implementation of the fibonacci function. Please find the code below.
--fibonacci :: Num -> [Num]
fibonacci 1 = [1]
fibonacci 2 = [1,1]
--fibonacci 3 = [2]
--fibonacci n = fibonacci n-1
fibonacci n = fibonacci (n-1) ++ [last(fibonacci (n-1)) + last(fibonacci (n-2))]
Rather awkward, I know. I can't find time to look up and write a better one. Though I wonder what makes this so inefficient. I know I should look it up, just hoping someone would feel the need to be pedagogic and spare me the effort.
orangegoat's answer and Sec Oe's answer contain a link to probably the best place to learn how to properly write the fibonacci sequence in Haskell, but here's some reasons why your code is inefficient (note, your code is not that different from the classic naive definition. Elegant? Sure. Efficient? Goodness, no):
Let's consider what happens when you call
fibonacci 5
That expands into
(fibonacci 4) ++ [(last (fibonacci 4)) + (last (fibonacci 3))]
In addition to concatenating two lists together with ++, we can already see that one place we're being inefficient is that we calculate fibonacci 4 twice (the two places we called fibonacci (n-1). But it gets worst.
Everywhere it says fibonacci 4, that expands into
(fibonacci 3) ++ [(last (fibonacci 3)) + (last (fibonacci 2))]
And everywhere it says fibonacci 3, that expands into
(fibonacci 2) ++ [(last (fibonacci 2)) + (last (fibonacci 1))]
Clearly, this naive definition has a lot of repeated computations, and it only gets worse when n gets bigger and bigger (say, 1000). fibonacci is not a list, it just returns lists, so it isn't going to magically memoize the results of the previous computations.
Additionally, by using last, you have to navigate through the list to get its last element, which adds on top of the problems with this recursive definition (remember, lists in Haskell don't support constant time random access--they aren't dynamic arrays, they are linked lists).
One example of a recursive definition (from the links mentioned) that does keep down on the computations is this:
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
Here, fibs is actually a list, and we can take advantage of Haskell's lazy evaluation to generate fibs and tail fibs as needed, while the previous computations are still stored inside of fibs. And to get the first five numbers, it's as simple as:
take 5 fibs -- [0,1,1,2,3]
(Optionally, you can replace the first 0 with a 1 if you want the sequence to start at 1).
All the ways to implement the fibonacci sequence in Haskell just follow the link
http://www.haskell.org/haskellwiki/The_Fibonacci_sequence
This implementation is inefficient because it makes three recursive calls. If we were to write a recurrence relation for computing fibonacci n to a normal form (note, pedantic readers: not whnf), it would look like:
T(1) = c
T(2) = c'
T(n) = T(n-1) + T(n-1) + T(n-2) + c''
(Here c, c', and c'' are some constants that we don't know.) Here's a recurrence which is smaller:
S(1) = min(c, c')
S(n) = 2 * S(n-1)
...but this recurrence has a nice easy closed form, namely S(n) = min(c, c') * 2^(n-1): it's exponential! Bad news.
I like the general idea of your implementation (that is, track the second-to-last and last terms of the sequence together), but you fell down by recursively calling fibonacci multiple times, when that's totally unnecessary. Here's a version that fixes that mistake:
fibonacci 1 = [1]
fibonacci 2 = [1,1]
fibonacci n = case fibonacci (n-1) of
all#(last:secondLast:_) -> (last + secondLast) : all
This version should be significantly faster. As an optimization, it produces the list in reverse order, but the most important optimization here was making only one recursive call, not building the list efficiently.
So even if you wouldn't know about the more efficient ways, how could you improve your solution?
First, looking at the signature it seems you don't want an infinite list, but a list of a given length. That's fine, the infinite stuff might be too crazy for you right now.
The second observation is that you need to access the end of the list quite often in your version, which is bad. So here is a trick which is often useful when working with lists: Write a version that work backwards:
fibRev 0 = []
fibRev 1 = [1]
fibRev 2 = [1,1]
fibRev n = let zs#(x:y:_) = fibRev (n-1) in (x+y) : zs
Here is how the last case works: We get the list which is one element shorter and call it zs. At the same time we match against the pattern (x:y:_) (this use of # is called an as-pattern). This gives us the first two elements of that list. To calculate the next value of the sequence, we have just to add these elements. We just put the sum (x+y) in front of the list zs we already got.
Now we have the fibonacci list, but it is backwards. No problem, just use reverse:
fibonacci :: Int -> [Int]
fibonacci n = reverse (fibRev n)
The reverse function isn't that expensive, and we call it here only one time.