Haskell function taking a long time to process - haskell

I am doing question 12 of project euler where I must find the first triangle number with 501 divisors. So I whipped up this with Haskell:
divS n = [ x | x <- [1..(n)], n `rem` x == 0 ]
tri n = (n* (n+1)) `div` 2
divL n = length (divS (tri n))
answer = [ x | x <- [100..] , 501 == (divL x)]
The first function finds the divisors of a number.
The second function calculates the nth triangle number
The third function counts the divisors of the nth triangle number.
The fourth function should return the value of the triangle number which has 501 divisors.
But so far this has run for a while without returning a result. Is the answer very large, or do I need some serious optimisation to make this work in a realistic amount of time?

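You need to use properties of the divisor function: http://en.wikipedia.org/wiki/Divisor_function
Notice that n and n + 1 are always coprime, and that the divisor function d is multiplicative over coprime factors. Exactly one of n and n + 1 is even, so after halving that one you can get d(n * (n + 1) / 2) by multiplying the divisor counts of two smaller coprime numbers, which you can compute once and reuse.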
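A minimal sketch of this idea, written in Python rather than Haskell for brevity. The names `count_divisors` and `first_triangle_with_over` are illustrative and do not come from the question, and the sketch recomputes each divisor count instead of caching it.

```python
def count_divisors(m):
    """Number of divisors of m, from its prime factorisation:
    if m = p1^a1 * ... * pk^ak then d(m) = (a1 + 1) * ... * (ak + 1)."""
    count = 1
    p = 2
    while p * p <= m:
        exponent = 0
        while m % p == 0:
            m //= p
            exponent += 1
        count *= exponent + 1
        p += 1
    if m > 1:  # a single prime factor larger than the square root remains
        count *= 2
    return count


def first_triangle_with_over(limit):
    """First triangle number with more than `limit` divisors."""
    n = 1
    while True:
        # Exactly one of n and n + 1 is even; halve that one.
        a, b = (n // 2, n + 1) if n % 2 == 0 else (n, (n + 1) // 2)
        # a and b are coprime, so d(a * b) = d(a) * d(b).
        if count_divisors(a) * count_divisors(b) > limit:
            return a * b
        n += 1
```

For example, `first_triangle_with_over(5)` returns 28, whose divisors are 1, 2, 4, 7, 14 and 28.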

It is probably faster to prime-factorise the number and then use the factorisation to find the divisors, than using trial division with all numbers <= sqrt(n).
The Sieve of Eratosthenes is a classical way of finding primes, which may be modified slightly to find the number of divisors of each natural number. Instead of just marking each non-prime as "not prime", you could make a list of all the primes dividing each number.
You can then use those primes to compute the complete set of divisors, or just the number of them, since that is all you need.
Another variation would be to mark not just multiples of primes, but multiples of all natural numbers. Then you could simply use a counter to keep track of the number of divisors for each number.
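The counter variation can be sketched as follows, again in Python for brevity. The function name `divisor_counts` and the fixed upper bound `limit` are assumptions made for the example; a real solution would still need to choose a bound large enough for the numbers involved.

```python
def divisor_counts(limit):
    """counts[k] is the number of divisors of k, for 1 <= k <= limit."""
    counts = [0] * (limit + 1)
    for d in range(1, limit + 1):
        # d divides exactly the multiples d, 2d, 3d, ... up to limit.
        for multiple in range(d, limit + 1, d):
            counts[multiple] += 1
    return counts
```

The inner loop runs about limit/d times for each d, so the total work is roughly limit * log(limit) steps, with no division or remainder operations at all.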
You also might want to check out The Genuine Sieve of Eratosthenes, which explains why trial division is way slower than the real sieve.
Lastly, you should look carefully at the different kinds of arrays in Haskell. I think it is probably easier to use the ST monad to implement the sieve, but it might be possible to achieve the correct complexity using accumArray, if you can make sure that your update function is strict. I have never managed to get this to work though, so you are on your own here.

If you were using C instead of Haskell, your function would still take a long time.
To make it faster you will need to improve the algorithm, using the suggestions from the above answers. I suggest changing the title and question description accordingly. Following that I'll delete this comment.
If you wish, I can spoil the problem by sharing my solution.
For now I'll give you my top-level code:
main =
  print .
  head . filter ((> 500) . length . divisors) .
  map (figureNum 3) $ [1..]
The algorithmic improvement lies in the divisors function. You can further improve it using rawicki's suggestion, but already this takes less than 100ms.

Some optimization tips:
Check for divisors only between 1 and sqrt(n). Every divisor d below sqrt(n) is paired with the divisor n/d above it, so each hit counts twice (and sqrt(n) itself counts once when n is a perfect square).
Don't build a list of divisors and then count the list; count them directly.
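Both tips together can be sketched like this, in Python for brevity. The name `count_divisors_paired` is made up for the example.

```python
from math import isqrt


def count_divisors_paired(n):
    """Count the divisors of n by testing candidates up to sqrt(n) only."""
    count = 0
    for d in range(1, isqrt(n) + 1):
        if n % d == 0:
            # d pairs with n // d; a perfect-square root pairs with itself.
            count += 1 if d * d == n else 2
    return count
```

No list is built: the function keeps a running count, and it performs about sqrt(n) remainder operations instead of n.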

Related

How to determine the smallest common divisor of a string?

I was asked the following question during a job interview and was stumped by it.
Part of the problem I had is making up my mind about what problem I was solving. At first I didn't think the question was internally consistent but then I realized it is asking you to solve two different things - the first task is to figure out whether one string contains a multiple of another string. But the second task is to find a smaller unit of division within both strings.
It's a bit more clear to me now with the pressure of the interview room behind me but I'm still not sure what the ideal algorithm would be here. Any suggestions?
Given two strings s & t, determine if s is divisible by t.
For example: "abab" is divisible by "ab"
But "ababab" is not divisible by "abab".
If it isn't divisible, return -1.
If it is, return the length of the smallest common divisor:
So, for "abababab" and "abab", return 2 as s is divisible
by t and the smallest common divisor is "ab" with length 2.
Oddly, you're asked to return -1 unless s is divisible by t (which is easy to check), and then you're only left with cases where t divides s.
If t divides s, then the smallest common divisor is just the smallest divisor of t.
The simplest way to find the smallest divisor of t is to check all the factors of its length to see if the prefix of that length divides t.
You can do it in linear time by building the Knuth-Morris-Pratt search table for t: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
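This will tell you all the suffixes of t that are also prefixes of t. Remove the longest such suffix (other than t itself) from the end of t; if the length of what remains divides the length of t, then that remaining prefix divides t and is its smallest divisor.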
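A sketch of this approach in Python, assuming the interview's conventions as stated in the problem (return -1 when s is not divisible by t). The function names are illustrative.

```python
def prefix_function(t):
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of t[:i + 1] that is also a suffix of it."""
    fail = [0] * len(t)
    k = 0
    for i in range(1, len(t)):
        while k > 0 and t[i] != t[k]:
            k = fail[k - 1]
        if t[i] == t[k]:
            k += 1
        fail[i] = k
    return fail


def smallest_common_divisor(s, t):
    """Length of the smallest string dividing both s and t, or -1 if
    s is not a whole number of repetitions of t."""
    if not t or len(s) % len(t) != 0 or t * (len(s) // len(t)) != s:
        return -1
    # Length of t minus its longest border is the candidate period.
    period = len(t) - prefix_function(t)[-1]
    return period if len(t) % period == 0 else len(t)
```

With the examples from the question, `smallest_common_divisor("abababab", "abab")` gives 2 and `smallest_common_divisor("ababab", "abab")` gives -1.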
Let n be the length of s and m the length of t. First find g = gcd(n, m), the largest length that divides both n and m. Then find all the divisors of g, which takes O(sqrt(g)) time. Check the divisors in increasing order: for each divisor l, test whether the prefix of length l, repeated n/l times, gives s and, repeated m/l times, gives t (using the KMP algorithm, the Rabin-Karp method, or another rolling hash). If so, stop and return l. If we run out of divisors without a match, return -1.

What is the time complexity of this algorithm (that solves leetcode question 650) (question 2)?

Hello I have been working on https://leetcode.com/problems/2-keys-keyboard/ and came upon this dynamic programming question.
You start with a single 'A' on an otherwise blank page and are given a number n; when you are done, there should be n copies of 'A' on the page. The catch is that only 2 operations are allowed: copy (which always copies all the A's currently on the page) and paste. Find the minimum number of operations needed to get n 'A's on the page.
I solved this problem but then found a better solution in the discussion section of leetcode, and I can't figure out its time complexity.
def minSteps(self, n):
    factors = 0
    i = 2
    while i <= n:
        while n % i == 0:
            factors += i
            n //= i  # integer division; n /= i would turn n into a float in Python 3
        i += 1
    return factors
The way this works is that i is never going to be bigger than the biggest prime factor p of n, so the outer loop is O(p), and the inner while loop is basically O(logn) since we are dividing n by i at each iteration.
But the way I look at it, we are doing O(logn) divisions in total for the inner loop while the outer loop is O(p), so using aggregate analysis this function is basically O(max(p, logn)). Is this correct?
Any help is welcome.
Your reasoning is correct: O(max(p, logn)) gives the time complexity, assuming that arithmetic operations take constant time. This assumption does not hold for arbitrarily large n that would not fit in the machine's fixed-size number storage, where you would need Big-Integer operations that have non-constant time complexity. But I will ignore that.
It is still odd to express the complexity in terms of p when that is not the input (but derived from it). Your input is only n, so it makes sense to express the complexity in terms of n alone.
Worst Case
Clearly, when n is prime, the algorithm is O(n) -- the inner loop never iterates.
For a prime n, the algorithm will take more time than for n+1, as even the smallest factor of n+1 (i.e. 2), will halve the number of iterations of the outer loop, and yet only add 1 block of constant work in the inner loop.
So O(n) is the worst case.
Average Case
For the average case, we note that the division of n happens just as many times as n has prime factors (counting duplicates). For example, for n = 12, we have 3 divisions, as n = 2·2·3
The average number of prime factors (counting duplicates) over the integers from 2 up to n approaches loglogn + B, where B is some constant. So we could say the average time complexity for the total execution of the inner loop is O(loglogn).
We need to add to that the execution of the outer loop. This corresponds to the average greatest prime factor. Over the same range this average approaches C·n/logn for some constant C, and so we have:
O(n/logn + loglogn)
Now n/logn is the more important term here, so this simplifies to:
O(n/logn)
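To see the worst case and a typical composite case side by side, here is an instrumented variant of the function. It is only a sketch for counting loop iterations; the name `min_steps_with_counts` and the returned tuple are made up for the illustration.

```python
def min_steps_with_counts(n):
    """Return (answer, outer_iterations, inner_iterations) for input n."""
    steps = 0
    outer = inner = 0
    i = 2
    while i <= n:
        outer += 1
        while n % i == 0:
            inner += 1
            steps += i
            n //= i
        i += 1
    return steps, outer, inner
```

For the prime 13 this returns `(13, 12, 1)`: the outer loop walks all the way from 2 to 13. For 12 = 2·2·3 it returns `(7, 2, 3)`: three divisions, and the outer loop stops as soon as the largest prime factor, 3, has been divided out.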

Add numbers to the beginning of lists

I have a list of lists, X_train, that looks like this
X_train = [[4,3,1,5], [3,1,6,2], [5,0,49,4], ... , [3,57,3,3]]
I wrote this piece of code
for x in range(0,len(X_train)):
    X_train[x].insert(0, x+1)
For each list in X_train, this code inserts the index of the list + 1 at the beginning of that list. That is, running
for x in range(0,len(X_train)):
    X_train[x].insert(0, x+1)
print(X_train)
will produce the following output
[[1,4,3,1,5],[2,3,1,6,2],[3,5,0,49,4],...,[n,3,57,3,3]]
where n is the number of lists in X_train.
Question: Is there a faster way to do this? I would like to be able to do this for very large lists, e.g. list with millions of sublists (if that's possible).
This is faster in my testing:
X = [[n, *l] for n, l in enumerate(X, 1)]
To my knowledge, the standard insert method in Python has a time complexity of O(n). Given your current implementation, your algorithm would have a time complexity of O(m x n), where m is the number of sublists and n is the number of elements in each sublist (I assume here that every sublist has the same number of elements).
You could use blist instead of the standard lists which has a time complexity of O(log n) for insertions. This means the total time reduces to O(m x log n). It's not that much of an improvement, though.
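If you want to measure the two approaches on your own data, a small timing harness along these lines may help. The function names and the synthetic test data are assumptions for the sketch, and the timings will vary by machine and by sublist length.

```python
import copy
import timeit


def prepend_with_insert(rows):
    """The question's approach: mutate each sublist in place."""
    for i, row in enumerate(rows, 1):
        row.insert(0, i)
    return rows


def prepend_with_comprehension(rows):
    """The comprehension approach: build new sublists instead of shifting."""
    return [[i, *row] for i, row in enumerate(rows, 1)]


if __name__ == "__main__":
    data = [[4, 3, 1, 5] for _ in range(100_000)]
    for fn in (prepend_with_insert, prepend_with_comprehension):
        # Copy in the setup step so each run starts from unmodified data.
        elapsed = timeit.timeit(
            "fn(rows)",
            setup="rows = copy.deepcopy(data)",
            globals={"fn": fn, "data": data, "copy": copy},
            number=1,
        )
        print(f"{fn.__name__}: {elapsed:.3f} s")
```

Note that the first function modifies the original sublists, while the second leaves them untouched and returns new ones, so they are not interchangeable if other code holds references to the sublists.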

Generating triangular number using iteration in haskell

I am trying to write a function in Haskell to generate triangular numbers. I am not allowed to use recursion; I am supposed to use iteration.
Here is my code:
triSeries 0 = [0]
triSeries n = take n $iterate (\x->(0+x)) 1
I know that the function I pass to iterate is wrong, but I have spent hours looking for the right one. Any hints, please?
Start by writing out some triangular numbers
T(1) = 1
T(2) = 1 + 2
T(3) = 1 + 2 + 3
An iterative process to generate T(n) is to start from [1..n], take the first element of the list, and add it to a running total. In a language with mutable state, you might write:
def tri(n):
    total = 0
    for x in range(1, n + 1):  # the numbers 1..n
        total += x
    return total
In Haskell, you can iteratively consume a list of numbers and accumulate state via a fold function (foldl, foldr, or some variant). Hopefully that's enough to get started with.
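Staying with the Python-style loop above, the same accumulation can be written as a fold. This is only a sketch: `functools.reduce` plays roughly the role that `foldl` plays in Haskell, and the name `tri_fold` is made up for the example.

```python
from functools import reduce


def tri_fold(n):
    """n-th triangular number as a left fold over the numbers 1..n."""
    # The lambda is the step function; 0 is the initial running total.
    return reduce(lambda total, x: total + x, range(1, n + 1), 0)
```

The explicit loop variable and the mutable total disappear: the step function receives the running total and the next element and returns the new total.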
Wikipedia may be a hint: its closed-form formula gives something like
triangular :: Int -> Int
triangular x = x * (x + 1) `div` 2
triSeries could then be something like
triSeries :: Int -> [Int]
triSeries x = map triangular [1..x]
which works like this:
> triSeries 10
[1,3,6,10,15,21,28,36,45,55]
As for iterate: maybe there is some way to use it here, but as John said, foldl would be sufficient. Take a look at this page; what you are looking for is at the very beginning.
It is not clear what is meant by "recursion is not allowed, use iteration". All functions that appear to be "iterative" are recursive inside.
In all your uses, iterate can only modify the input by a constant, and iterate (+1) 1 is the same as [1..]. Consider using a Data.List function that can combine a number from the infinite range [1..] and the previously computed sum to produce an infinite list of such sums:
T_i=i+T_{i-1}
This costs one addition per element, which is cheaper than computing x*(x+1) `div` 2 each time.
Consider using a Data.List function that can produce an infinite list of finite lists of sums from an infinite list of sums. This is going to be cheaper than computing a list of 10, then a list of 11 repeating the same computation done for the list of 10, etc.
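The recurrence T_i = i + T_{i-1} can be illustrated with a lazy running sum. The sketch below is in Python, where `itertools.accumulate` is the closest analogue of a Haskell scan; the names `triangulars` and `tri_series` are made up for the example.

```python
from itertools import accumulate, count, islice


def triangulars():
    """Infinite stream 1, 3, 6, 10, ... where T_i = i + T_{i-1}."""
    # count(1) yields 1, 2, 3, ...; accumulate keeps the running sum.
    return accumulate(count(1))


def tri_series(n):
    """First n triangular numbers, taken lazily from the infinite stream."""
    return list(islice(triangulars(), n))
```

`tri_series(10)` gives `[1, 3, 6, 10, 15, 21, 28, 36, 45, 55]`. Each element reuses the previous sum, so producing n elements costs n additions.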

Induction proof of correctness of fibonacci function

Haskell implementation of the familiar Fibonacci function
fibSlow n
  | n == 0    = 1                             --fib.1
  | n == 1    = 1                             --fib.2
  | otherwise = fibSlow(n-1) + fibSlow(n-2)   --fib.3
What is the induction proof of correctness for fibSlow?
To prove correctness of a function on the natural numbers by induction, you would show that it's correct for certain base cases, and then that it's correct for higher values of the parameter given the assumption that it's correct for lower ones. So you'd verify first that fibSlow 0 = 1, and then that fibSlow 1 = 1, and then that for n > 1, fibSlow n is equal to the (n-1)th fibonacci number plus the (n-2)th fibonacci number. Here you get to assume that those numbers are fibSlow (n-1) and fibSlow (n-2), since fibSlow is correct for all inputs less than n by the inductive hypothesis.
This might seem all rather trivial... because it is! The whole point of such an example in Haskell is that you can write code that's obviously correct. When you go to prove it correct, the proof just writes itself and amounts to looking at the code and noting that it clearly says exactly what you're trying to prove. This is one of the nice properties of a declarative language like Haskell.
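Written out, the argument might look like the following sketch. It assumes the specification is the sequence with F_0 = F_1 = 1, matching the base cases in the code, and it uses strong induction because the recursive step refers to two earlier values.

```latex
\textbf{Claim.} Let $F_0 = F_1 = 1$ and $F_n = F_{n-1} + F_{n-2}$ for $n \ge 2$.
Then $\mathit{fibSlow}(n) = F_n$ for every natural number $n$.

\textbf{Base cases.} By (fib.1), $\mathit{fibSlow}(0) = 1 = F_0$.
By (fib.2), $\mathit{fibSlow}(1) = 1 = F_1$.

\textbf{Inductive step.} Fix $n \ge 2$ and assume
$\mathit{fibSlow}(k) = F_k$ for all $k < n$. Then
\begin{align*}
\mathit{fibSlow}(n)
  &= \mathit{fibSlow}(n-1) + \mathit{fibSlow}(n-2) && \text{by (fib.3)} \\
  &= F_{n-1} + F_{n-2} && \text{by the inductive hypothesis} \\
  &= F_n && \text{by the definition of } F_n .
\end{align*}
```

The hypothesis applies to both n-1 and n-2 because each is a natural number smaller than n when n is at least 2.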
Apologies I haven't formally seen this kind of material for a while, so you're probably best looking at other sources if this is homework.
I think you want to show the existence of a monotone function which describes the "progress" of the recursion. This case should be pretty simple: the argument itself is monotonically decreasing. For a nonnegative n, the recursive call will be made with a lesser n', and that n' will never be less than zero.
You can also use strong induction to argue the function is defined on all n. You have declared it defined on 0 and 1, and it suffices to say that if it's defined on n and n+1, then it's defined on n+2. This is obvious by the definition of the recursive call.
I think you might be able to read up on some formalities in Jech's Set Theory book, in the Ordinals chapter.

Resources