list is not being closed after finishing computation - haskell

For practice I wrote a Haskell program to find prime factors.
The code is the following:
getfactors :: Int -> [Int]
getfactors n = [x | x<-[1..n], n `mod` x == 0]
prime :: Int -> Bool
prime n | getfactors n == [1,n] = True
| otherwise = False
primefactors :: Int -> [Int]
primefactors n = [x | x <- getfactors n, prime x == True]
Everything works fine for small numbers, but when I enter large numbers the computation stops at the biggest prime factor and the expected list does not close.
For example:
>primefactors 1263
[3,421]
>primefactors 1387781234
[2,7,2161,6553
An explanation is very much appreciated.

Could not reproduce:
> :set +s
> primefactors 1387781234
[2,7,2161,6553]
(368.04 secs, 288,660,869,072 bytes)
Your algorithm is just very slow. There are lots of ways to improve it:
You are checking primality by trial division of all numbers smaller than the candidate. You can improve (without changing algorithm) by checking only up to the square root of the candidate.
Besides trial division, there are a wide range of other primality checking algorithms running the full spectrum from "simple but slow" to "complicated as heck but blazing fast". Other Internet sources will have plenty of details.
If you want to factor many numbers, it may be beneficial to memoize your primality checks between calls -- e.g. by storing a list of primes and iterating over them instead of iterating over all numbers. Since this is the only consumer of your primality check, you may want to consider creating this list directly rather than implementing a primality-checking algorithm first; again there's a wide range of algorithms for this running the spectrum from simple to fast.
Once you find a factor, you can divide the number you are factoring by that to get a smaller, faster number to compute the remaining factors with.
There are probably other easy opportunities to speed things up, and lots of prior work to read about online. Enjoy!

Related

How to make this Haskell program run faster

So I've been trying to learn Haskell by solving some problems on Codeforce.
And I am getting a lot of TLE (Time Limit Exceed) even though I think my time complexity is optimal.
My question is: is the way I wrote this program that makes it slow?
For example, here is the problem.
Basically the answer is to find an for a given n , where
an = 2*an-1 + D(n) and D(n) = the difference of the number of divisors between n and n-1.
(update: the top limit for n is 106).
Below is my program.
import qualified Data.Map.Strict as Map
main = do t <- read <$> getLine
putStrLn . show $ solve t
solve :: Integer -> Integer
solve 0 = 1
solve 1 = 1
solve n = (2*(solve (n-1)) + (fact n) - (fact (n-1))) `mod` 998244353
where fact n = foldl (\s -> \t -> s*(snd t + 1)) 1 (Map.toList . factorization $ n)
--the number of divisors of a number
--copied from Internet,infinite prime list
primes :: [Integer]
primes = 2: 3: sieve (tail primes) [5,7..]
where
sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
where (h,~(_:t)) = span (< p*p) xs
--make factorization of a number
factorization :: Integer -> Map.Map Integer Integer
factorization 1 = Map.fromList []
factorization x = Map.insertWith (+) factor 1 (factorization (x `div` factor))
where factor = head $ filter (\s -> (x `mod` s) == 0) ls
ls = primes
This program failed to solve in the time limit.
So could anyone point me out where did I do wrong and how to fix it?
Or it just impossible to solve this problem using Haskell in time limit?
There are many ways in which your time complexity is not optimal. The most obvious one is a prime finder using trial division instead of, e.g., a sieve. Maybe it's fine because you only compute the primes once, but it does not inspire confidence.
factorization also has at least one glaring problem. Consider factoring a number like 78893012641, whose prime factorization is 280879^2. You will search each prime number up to 280879: expensive, but pretty much unavoidable. However, at this point you divide by 280879 and then try to factorize 280879, starting from 2 and scanning all the small primes again even though you just found out none of them are a factor!
As Li-yao Xia says in a comment, I would also be suspicious of the multiplication of very large Integers before taking their modulus, instead of taking a modulus after each multiplication.
You haven't copied the right piece of code from the "Internet". You should've instead copied primesTMWE for the primes list, but more importantly, primeFactors for the factorization algorithm.
Your foldl based calculation of the number of divisors from a number's factorization is perfectly fine, except perhaps foldl' should be used instead.
Notice that both solve n and solve (n-1) calculate fact (n-1), so better precalculate all of them..... perhaps a better algorithm exists to find the numbers of divisors for all numbers from 1 to n than calculating it for each number separately.
I suspect even with the right algorithms (which I link above) it's going to be tough, time-wise, if you're going to factorize each number independently (O(n) numbers, O(n1/2)) time to factorize each... each prime, at least).
Perhaps the thing to try here is the smallest-factor sieve which can be built in O(n log log n) time as usual with the sieve of Eratosthenes, and once it's built it lets you find the factorization of each number in O(log log n) time (it's the average number of prime factors for a number). It will have to be built up to n though (you can special-case the evens to halve the space requirements of course; or 6-coprimes to save another 1/6th). Probably as an STUArray (that link is an example; better codes can be found here on SO).
The smallest-factor sieve is just like the sieve of Eratosthenes, except it uses the smallest factor, not just a Boolean, as a mark.
To find a number's factorization then we just repeatedly delete by a number's smallest factor, n / sf(n) =: n1, repeating for n1 / sf(n1) =: n2, then n2, etc. until we hit a prime (which is any number which has itself as the smallest factor).
Since you only use those factors to calculate the number's total number of divisors, you can fuse the two calculations together into one joined loop, for extra efficiency.

Sieve of Euler space complexity in Haskell

I was given 2 different algorithms written in Haskell aimed to generate the first k primes. As the title suggests, they are the Sieve of Eratoshenes and Euler.
I am trying to understand why Euler implementation uses so much memory. What I thought so far is that the number of streams generated explodes quickly saturating the memory but from what I have understood they should be equal to 2k - 1 where k is the index of the prime we want.
So there should be other major problems.
I tried running the code via GHC's Debugger and it shows that much of the memory allocated is from the minus function so I think streams are definitely part of the problem.
Here minus removes from tcs every multiple of p starting from p*p which is coprime to every prime found so far (except p) and finally generates a new stream passed to the next recursive call of eulerSieve.
minus :: [Int] -> [Int] -> [Int]
minus xs#(x:txs) ys#(y:tys)
| x < y = x : minus txs ys
| otherwise = minus txs tys
eulerSieve :: [Int] -> [Int]
eulerSieve cs#(p:tcs) = p:eulerSieve (minus tcs (map (p*) cs))
Reading here I found that the space is bounded by O(k^2) but it doesn't explain why accurately.
Any suggestion would be appreciated.

Using non-deterministic list monad to find long Collatz sequences

I wrote the following code to solve Project Euler's No. 14:
The following iterative (Collatz) sequence is defined for the set of positive integers:
n → n/2 (n is even)
n → 3n + 1 (n is odd)
Q: Which starting number, under one million, produces the longest chain?
And my code:
collatz :: Integer -> [Integer]
collatz 1 = [1]
collatz n =
filter (< 1000000) prev >>= poss
where prev = collatz (n - 1)
poss :: Integer -> [Integer]
poss prev
| even prev && prev `mod` 3 == 1 && (prev - 1) `div` 3 > 1 = [2 * prev, (prev - 1) `div` 3]
| otherwise = [2 * prev]
Where collatz n returns a list of numbers that will generate a Collatz chain of length n. The problem is, I can only either not restrict the result or restrict the whole chain, instead of only the seed number, to be under 1000,000. Is it possible to use this model to solve the problem at all?
I think that this approach - while interesting - is fundamentally doomed. Suppose I discover that all the seeds which result in a chain of length 500 are above 2,000,000. How can I know that I won't find that in three more steps there's a seed under 1,000,000 that gets me there? I see no way to know when you're done.
The only viable approach I see to this problem is to compute the collatz length for every number from 1 to 999,999 and then do something like:
main :: IO ()
main = do
let collatzMax = maximumBy (compare `on` collatzLength) [1..999999]
print collatzMax
On the other hand, this provides a great opportunity to learn about CAFs since the function collatzLength could be naively defined as:
collatzLength 1 = 1
collatzLength n | n `mod` 2 == 0 = 1 + collatzLength (n `div` 2)
collatzLength n = 1 + collatzLength (3 * n + 1)
And that kind of recursion screams out for a CAF.
Sure, there are memoization modules that will go and build the CAF for you, but building one yourself is a useful exercise. It's a whole little mini-course in lazy infinitely-recursive data structures.
If that defeats you, you can glance at this spoiler of how to use a CAF and then rewrite it using a different data structure. (what about a 10-way tree instead of a binary tree? What about traversing the tree in a different order? Can you remove the call to showIntAtBase?)
Your idea is interesting, although not the most efficient one. It could be worth trying, although it'll be probably memory intensive. Some thoughts:
As some chains can go over 1000000, so you can't just filter out everything less in collatz. You need to keep all the numbers in each pass.
Calling collatz this way is inefficient, as it computes the sets all over again. Making it an infinite list that shares values would be more efficient:
collatz :: [[Integer]]
collatz = [1] : map (>>= poss) collatz
You need to figure out when you're done. For this you'd need to go through the number lists generated by collatz and count how many of them are below 1000000. When you have seen all the numbers below the limit, the last list will contain the numbers with the longest chain.
That said, I'm afraid this approach isn't computationally feasible. In particular, you'll generate exponentially many numbers and exponentially large ones. For example, if the longest chain would be 500, the result of collatz in that step would contain numbers up to 2^500. And as mentioned, there is no way to tell which of these huge numbers might be the one leading to the solution, so you can't just discard them.

Prime Factoring Function in Haskell

I am trying to make a function that will display a number's prime factors with a list (infinite) that I give it. Here is what I have so far:
-- Here is a much more efficient (but harder to understand) version of primes.
-- Try "take 100 primes" as an example (or even more if you like)
primes = 2 : primesFrom3 where
primesFrom3 = sieve [3,5..] 9 primesFrom3
sieve (x:xs) b ~ps#(p:q:_)
| x < b = x : sieve xs b ps
| otherwise = sieve [x | x <- xs, rem x p /= 0] (q^2) (tail ps)
-- Write a function that factors its first argument using the (infinite)
-- list of available factors given in its second argument
-- (using rem x p /= 0 to check divisibility)
primeFactsWith :: Integer -> [Integer] -> [Integer]
primeFactsWith n (p:ps) = if (rem n p /= 0) then
(primeFactsWith n ps)
else (primeFactsWith p ps)
The top half was not written by me and works just fine. I am trying to get the second half to work, but it isn't. Read the comments in the code to better understand exactly what I am trying to do. Thanks! Oh and please don't just spout the answer. Give me some hints on how to do it and maybe what is wrong.
What's wrong
The problem is that you do a recursive call in both branches, therefore the function will never stop.
Some Hints
To build a recursive list-producing function, you'll need two branches or cases:
Base case no recursive call, this stops the recursion and returns the final part of the result.
Recursive case here you modify the parameters of the function and call it again with the modified parameters, possibly also returning a part of the result.
You need two sub branches at the recursive branch. One if you've found a prime factor, and another if the current number is no prime factor.
Here is a skeleton, you need to fill in the parts in the <> brackets.
primeFactsWith :: Integer -> [Integer] -> [Integer]
primeFactsWith n (p:ps) = if <halt condition> then
<final result>
else if (rem n p /= 0) then
<not a factor - recursive call 1>
else
<found a factor - return it,
and make recursive call 2>
If you have found a prime factor, you can divide the number by it, to get a smaller number, without that factor. To perform integer division Haskell provides a function named div.
If you reach the number 1, you have generated all prime factors and you can stop. The final part of a prime factors list, that comes after all its factors, is an empty list.
You can drop any prime from your infinite list if you no longer need it, but be aware that a number could contain a prime several times in the factors list. If you want to drop p you can just use ps, from the pattern; if you want to keep p you must use (p:ps).
The cons operator (:) can be used to build a list. You can use it to return one number of the result list, and use a recursive call to find the remaining numbers, e.g.
x : foo y z
I hope that helps, if you have any questions don't hesitate to ask.
Here's a hint.
So you're recursing, which is good.
In one branch you keep looking for factors of n. In the other branch you seem to look for the factors of p, which is a bit weird, but whatevs.
Where do you return the factors of n you've found?

Solving Google Code Jam's "Minimum Scalar Product" in Haskell

In preparation for the upcoming Google Code Jam, I've started working on some problems. Here's one of the practice problems I've tried:
http://code.google.com/codejam/contest/32016/dashboard#s=p0
And here's the gist of my current Haskell solution:
{-
- Problem URL: http://code.google.com/codejam/contest/32016/dashboard#s=p0
-
- solve takes as input a list of strings for a particular case
- and returns a string representation of its solution.
-}
import Data.List
solve :: [String] -> String
solve input = show $ minimum options
where (l1:l2:l3:_) = input
n = read l1 :: Int
v1 = parseVector l2 n
v2 = parseVector l3 n
pairs = [(a,b) | a <- permutations v1, b <- permutations v2]
options = map scalar pairs
parseVector :: String -> Int -> [Int]
parseVector line n = map read $ take n $ (words line) :: [Int]
scalar :: ([Int],[Int]) -> Int
scalar (v1,v2) = sum $ zipWith (*) v1 v2
This works with the sample input provided, but dies on the small input to an out of memory error. Increasing the max heap size appears to do nothing but let it run indefinitely.
My main question is how to go about optimizing this so that it will actually return a solution, other than passing in the -O flag to ghc, which I've already done.
Improving your algorithm is the most important thing. Currently, you permute both lists, giving a total of (n!)^2 options to check. You need only permute one of the lists. That should not only reduce the time complexity but also the space usage drastically.
And make sure the minimum gets replaced by the strict minimum (as it should, with optimisations, since you're taking the minimum of a list of Ints, compile with -ddump-rule-firings and check for "minimumInt"). If it isn't, use foldl1' min instead of minimum.
I just checked:
Large dataset
T = 10
100 ≤ n ≤ 800
-100000 ≤ xi, yi ≤ 100000
For that, you need a dramatically better algorithm. 100! is about 9.3 * 10157, the universe will long have ceased to exist before you have checked a measurable portion of all the permutations of 100 elements. You must construct the solving permutations by looking at the data. To find out what the solutions would look like, investigate some small inputs and what permutations realise the minimum scalar product for those (having a look at the Cauchy-Bun'akovskiy-Schwarz inequality would also not hurt).
The solution I came up with:
{-
- Problem URL: http://code.google.com/codejam/contest/32016/dashboard#s=p0
-
- solve takes as input a list of strings for a particular case
- and returns a string representation of its solution.
-}
import Data.List
solve :: [String] -> String
solve input = show result
where (l1:l2:l3:_) = input
n = read l1 :: Int
v1 = parseVector l2 n
v2 = parseVector l3 n
result = scalar (sort v1) (reverse $ sort v2)
parseVector :: String -> Int -> [Integer]
parseVector line n = map read $ take n $ (words line) :: [Integer]
scalar :: [Integer] -> [Integer] -> Integer
scalar v1 v2 = sum $ zipWith (*) v1 v2 :: Integer
This seems to give the right answer and is orders of magnitude faster. The trick I guess is to not take the word "permutation" at face value when reading problems like this.
according to me(my solution work), you really don't need to permute,
you really need to sort both list,then you require queue and stack,
so how it work
ex:-
input
1 3 -5
-2 1 4
sort
-5 1 3
-2 1 4
then -2*3+4*-5+1*1=-6
ex2:-
so you see
-ve number multiply with most +ve or least -ve
+ve number multiply with most -ve or least +ve
My main question is how to go about optimizing this so that it will actually return a solution, other than passing in the -O flag to ghc, which I've already done.
I think you should reconsider your approach. I think this practice problem is not so much about coding skills but more about problem solving. Trying out all possible permutations is the brute force way and I think you have a good feeling now why this brute force strategy does not work in this case ;). It really can be done in 3 lines of code when you use stl algorithms (not including in- and output of course).
Actually Madan Ram already gave some kind of solution. However, in case you will ever be given such a problem in an interview, you will also have to know why it works for all cases (not just for a number of examples). So my advice is to take a pen and a piece of paper and do some examples by hand, then you will find out how to do it.
SPOILER ALERT
Its hard to give a hint without spoiling to much, but as the solution has been posted already... Try to start simple. Which element in the first vector has the biggest contribution to the scalar product? To which element from the second vector do you have to mulitply this one to get in total the smallest contribution to the scalar product?

Resources