Say we have an array with some numbers in it, and we are given a value d (where d >= 1) that dictates the minimum index distance between any two chosen elements. How would we find the largest sum of values subject to that constraint? Here is an example:
arr = [3000, 4000, 2000, 2500, 3500, 5000]
d = 3
Would return 4000 + 5000 = 9000 (since there is at least a distance of 3 between these two numbers). But in this case:
arr = [3000, 4000, 2000, 2500, 3500, 5000]
d = 2
Would return 4000 + 2500 + 5000 = 11500.
Edit: more explanation. We need to find the largest possible sum in an array. The only trick is that we cannot include two numbers that are fewer than d indices apart. In the second example we could simply sum all the numbers to get the largest value, but since we are bounded by d = 2, we have to pick a combination in which no two numbers are less than distance 2 apart. Another valid combination is 3000 + 2500 + 5000 = 10500, but since 11500 > 10500 (my second example), it is not the optimal answer.
This problem can be efficiently solved using a dynamic programming approach.
Let A be the input array and d be the gap size. Then you can construct an array B such that B[i] is the maximum sum for the first i+1 elements of A. You can compute the elements of B by a simple recurrence, and the last element contains the solution:
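def solve(A, d):
    n = len(A)
    B = [0] * n
    B[0] = A[0]
    # The first d elements are too close together to combine;
    # min(d, n) avoids an IndexError when d > n.
    for i in range(1, min(d, n)):
        B[i] = max(A[i], B[i-1])
    for i in range(d, n):
        B[i] = max(A[i] + B[i-d], B[i-1])
    return B[-1]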
You can do it in a single pass by iteratively computing the best sum for each prefix.
Once you have the best sum for each prefix up to but not including i, the best sum for i will either not include element i (and thus be equal to prefix_sums[i - 1]), or include element i, and thus be equal to a[i] + prefix_sums[i - d] (element i plus the best sum of elements that are at least d away from the i-th element).
a = [3000, 4000, 2000, 2500, 3500, 5000]
d = 2
prefix_sums = [a[0]]
for x in a[1:]:
    cur = x if len(prefix_sums) < d else x + prefix_sums[-d]
    cur = max(cur, prefix_sums[-1])
    prefix_sums.append(cur)
print(max(prefix_sums))
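As a sanity check on the recurrence described in both answers, here is a minimal sketch that pairs it with an exhaustive search. The names `max_spaced_sum` and `brute_force` are made up for this illustration, and it assumes non-negative values (so choosing nothing, with sum 0, is never better than the optimum).

```python
from itertools import combinations

def max_spaced_sum(values, d):
    """Best sum when chosen indices are pairwise at least d apart."""
    best = []  # best[i] = optimum over values[0..i]
    for i, x in enumerate(values):
        take = x + (best[i - d] if i >= d else 0)  # use element i
        skip = best[i - 1] if i >= 1 else 0        # leave element i out
        best.append(max(take, skip))
    return best[-1] if best else 0

def brute_force(values, d):
    """Try every subset of indices; only viable for tiny inputs."""
    n, best = len(values), 0
    for r in range(1, n + 1):
        for idx in combinations(range(n), r):
            # idx is ascending, so checking neighbouring indices is enough
            if all(b - a >= d for a, b in zip(idx, idx[1:])):
                best = max(best, sum(values[i] for i in idx))
    return best

arr = [3000, 4000, 2000, 2500, 3500, 5000]
print(max_spaced_sum(arr, 3), max_spaced_sum(arr, 2))
```

The exhaustive version is exponential in the array length, so it is only useful for checking the linear-time recurrence on small inputs.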
Definition of H Index used in this algorithm
Supposing a relational expression is represented as y = F(x1, x2, . . . , xn), where F returns an integer number greater than 0, and the function is to find a maximum value y satisfying the condition that there exist at least y elements whose values are not less than y. Hence, the H-index of any node i is defined as
H(i) = F(k_{j_1}, k_{j_2}, ..., k_{j_{k_i}})
where k_{j_1}, k_{j_2}, ..., k_{j_{k_i}} are the degrees of the neighboring nodes of node i.
Now I want to find the H Index of the nodes of the following graphs using the algorithm given below :
Graph :
Code (Written in Python and NetworkX) :
def hindex(g, n):
    nd = {}
    h = 0
    # print(len(list(g.neighbors(n))))
    for v in g.neighbors(n):
        #nd[v] = len(list(g.neighbors(v)))
        nd[v] = g.degree(v)
        snd = sorted(nd.values(), reverse=True)
        for i in range(0,len(snd)):
            h = i
            if snd[i] < i:
                break
        #print("H index of " + str(n)+ " : " + str(h))
    return h
Problem :
This algorithm is returning the wrong values of nodes 1, 5, 8 and 9
Actual Values :
Node 1 - 6 : H Index = 2
Node 7 - 9 : H Index = 1
But for Node 1 and 5 I am getting 1, and for Node 8 and 9 I am getting 0.
Any leads on where I am going wrong will be highly appreciated!
Try this:
def hindex(g, n):
    sorted_neighbor_degrees = sorted((g.degree(v) for v in g.neighbors(n)), reverse=True)
    h = 0
    for i in range(1, len(sorted_neighbor_degrees)+1):
        if sorted_neighbor_degrees[i-1] < i:
            break
        h = i
    return h
There's no need for a nested loop; just make a decreasing list, and calculate the h-index like normal.
The reason for 'i - 1' is just that our arrays are 0-indexed, while h-index is based on rankings (i.e. the k largest values) which are 1-indexed.
From the definition of h-index: For a non-increasing function f, h(f) is max i >= 0 such that f(i) >= i. This is, equivalently, the min i >= 1 such that f(i) < i, minus 1. Here, f(i) is equal to sorted_neighbor_degrees[i - 1]. There are of course many other ways (with different time and space requirements) to calculate h.
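To make the ranking argument concrete without depending on NetworkX, here is a small sketch that works on a plain adjacency dictionary. The helper names and the four-node graph are hypothetical; they are not the graph from the question.

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    h = 0
    # rank is 1-based: the rank-th largest value must be at least rank
    for rank, v in enumerate(sorted(values, reverse=True), start=1):
        if v < rank:
            break
        h = rank
    return h

def node_h_index(adj, node):
    """H-index of a node, computed from the degrees of its neighbours."""
    return h_index(len(adj[nb]) for nb in adj[node])

# Hypothetical graph: a triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print({node: node_h_index(adj, node) for node in adj})
```

Using `enumerate(..., start=1)` keeps the 1-based rank explicit, which avoids the `i - 1` index arithmetic discussed above.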
Given a set of non-negative distinct integers, and a value m, determine if there is a subset of the given set with sum divisible by m.
The solution on geeksforgeeks states that-
If n > m there will always be a subset with sum divisible by m (which is easy to prove with pigeonhole principle). So we need to handle only cases of n <= m.
Can somebody please explain what this case means and what is its relation to pigeonhole principle? Also how is this case different from n <= m?
Here is a more verbose version of that argument:
Label the numbers a1, a2, ... an in any order. Now consider the sums:
b1=a1
b2=a1+a2
b3=a1+a2+a3
...
bn=a1+a2+...+an
These are either all distinct, or one of the a's is 0 (which is divisible by m).
Now if any of the b's is divisible by m we are done.
Otherwise:
The remainder of a number that is not divisible by m lies in the range 1...(m-1). So there are m-1 possible remainders.
Since the numbers b1...bn are not divisible by m, they must have remainders in the range 1...(m-1). So you must pair the n b's (pigeons) with m-1 remainders (pigeonholes).
We have more pigeons than pigeonholes => there must be at least two pigeons in the same pigeonhole.
That means: there must be at least two b's with the same remainder: call them bi, bj (i<j). Since all of our b's are distinct and bi % m == bj % m (the remainders of bi/m and bj/m are the same), bj - bi = x * m (where x is a positive integer). Therefore bj - bi is divisible by m, and bj - bi = a(i+1) + ... + aj. Therefore a(i+1) + ... + aj is divisible by m, which is exactly what we wanted to prove.
Let us create a new set of numbers (i.e. a[ ] array) by doing prefix sum of given values (i.e. value[ ] array).
a[0] = value[0]
a[1] = value[0] + value[1]
...
a[n - 1] = value[0] + value[1] + .... + value[n - 1]
Now we have n new numbers. If any of them are divisible by m we are done.
Otherwise, if we divide the a[ ] array elements by m, the remainders all lie in the range [1, m - 1].
But we have a total of n values, and n > m - 1.
So, by the pigeonhole principle, there exist two distinct indices 0 <= i, j <= n - 1 such that a[i] mod(m) == a[j] mod(m).
Due to the above statement, we can say that a[i] - a[j] is divisible by m.
Now, let's consider i > j.
We also know that a[i] = value[i] + value[i - 1] + ... + value[0] and a[j] = value[j] + value[j - 1] + ... + value[0].
So, a[i] - a[j] = value[i] + value[i - 1] + ... + value[j + 1] is also divisible by m.
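The pigeonhole argument is constructive, so it can be turned into a short search. The sketch below (the function name `block_divisible_by` is invented here) tracks the remainder of each prefix sum and stops as soon as a remainder repeats. Treating the empty prefix as remainder 0 folds the "some prefix is already divisible by m" case into the same check.

```python
def block_divisible_by(values, m):
    """Return (start, end) with sum(values[start:end]) % m == 0, or None."""
    first_seen = {0: 0}  # remainder -> length of the first prefix having it
    total = 0
    for length, v in enumerate(values, start=1):
        total = (total + v) % m
        if total in first_seen:
            # Two prefixes share a remainder, so the block between them works.
            return first_seen[total], length
        first_seen[total] = length
    return None

values, m = [3, 1, 7, 5], 4
start, end = block_divisible_by(values, m)
print(values[start:end], sum(values[start:end]) % m)
```

With at least m values a repeat is guaranteed, so the function always returns a block. With fewer values, `None` only means that no contiguous block works; a non-contiguous subset may still exist, which is why the n <= m case needs a separate search.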
I'm having trouble getting my code to run quickly for Project Euler Problem 23. The problem is pasted below:
A perfect number is a number for which the sum of its proper divisors is exactly equal to the number. For example, the sum of the proper divisors of 28 would be 1 + 2 + 4 + 7 + 14 = 28, which means that 28 is a perfect number.
A number n is called deficient if the sum of its proper divisors is less than n and it is called abundant if this sum exceeds n.
As 12 is the smallest abundant number, 1 + 2 + 3 + 4 + 6 = 16, the smallest number that can be written as the sum of two abundant numbers is 24. By mathematical analysis, it can be shown that all integers greater than 28123 can be written as the sum of two abundant numbers. However, this upper limit cannot be reduced any further by analysis even though it is known that the greatest number that cannot be expressed as the sum of two abundant numbers is less than this limit.
Find the sum of all the positive integers which cannot be written as the sum of two abundant numbers.
And my code:
import math
import bisect
numbers = list(range(1, 20162))
tot = set()
numberabundance = []
abundant = []
for n in numbers:
    m = 2
    divisorsum = 1
    while m <= math.sqrt(n):
        if n % m == 0:
            divisorsum += m + (n / m)
        m += 1
    if math.sqrt(n) % 1 == 0:
        divisorsum -= math.sqrt(n)
    if divisorsum > n:
        numberabundance.append(1)
    else:
        numberabundance.append(0)
temp = 1
# print(numberabundance)
for each in numberabundance:
    if each == 1:
        abundant.append(temp)
    temp += 1
abundant_set = set(abundant)
print(abundant_set)
for i in range(12, 20162):
    for k in abundant:
        if i - k in abundant_set:
            tot.add(i)
            break
        elif i - k < i / 2:
            break
print(sum(numbers.difference(tot)))
I know the issue lies in the for loop at the bottom but I'm not quite sure how to fix it. I've tried modeling it after some of the other answers I've seen here but none of them seem to work. Any suggestions? Thanks.
Your upper bound is incorrect - the question states all integers greater than 28123 can be written ..., not 20162
After changing the bound, generation of abundant is correct, although you could do this generation in a single pass by directly adding to a set abundant, instead of creating the bitmask array numberabundance.
The final loop is also incorrect - as per the question, you must
Find the sum of all the positive integers
whereas your code
for i in range(12, 20162):
will skip numbers below 12 and also doesn't include the correct upper bound.
I'm a bit puzzled about your choice of
elif i - k < i / 2:
Since the abundants are already sorted, I would just check if the inner loop had passed the midpoint of the outer loop:
if k > i / 2:
Also, since we just need the sum of these numbers, I would just keep a running total, instead of having to do a final sum on a collection.
So here's the result after making the above changes:
import math
numbers = list(range(1, 28124))  # 1..28123 inclusive
abundant = set()
for n in numbers:
    m = 2
    divisorsum = 1
    while m <= math.sqrt(n):
        if n % m == 0:
            divisorsum += m + (n // m)
        m += 1
    if math.sqrt(n) % 1 == 0:
        # a square root was counted twice above
        divisorsum -= int(math.sqrt(n))
    if divisorsum > n:
        abundant.add(n)
#print(sorted(abundant))
# A set has no guaranteed order, so iterate over a sorted copy;
# the early exit below relies on ascending order.
sorted_abundant = sorted(abundant)
nonabundantsum = 0
for i in numbers:
    issumoftwoabundants = False
    for k in sorted_abundant:
        if k > i / 2:
            break
        if i - k in abundant:
            issumoftwoabundants = True
            break
    if not issumoftwoabundants:
        nonabundantsum += i
print(nonabundantsum)
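If the divisor-sum loop is still the slow part, one alternative (not what the answer above does) is to compute all proper-divisor sums at once with a sieve and then mark every reachable sum directly. The following is a sketch under that assumption; `non_abundant_sum` is a name invented for the example.

```python
def non_abundant_sum(limit=28123):
    """Sum of the integers in 1..limit that are not a sum of two abundant numbers."""
    # Sieve: add each d to every proper multiple of d.
    divisor_sum = [0] * (limit + 1)
    for d in range(1, limit // 2 + 1):
        for multiple in range(2 * d, limit + 1, d):
            divisor_sum[multiple] += d

    # Built in ascending order, which the early break below relies on.
    abundant = [n for n in range(1, limit + 1) if divisor_sum[n] > n]

    # Mark every sum a + b <= limit of two abundant numbers with a <= b.
    is_sum = [False] * (limit + 1)
    for i, a in enumerate(abundant):
        for b in abundant[i:]:
            if a + b > limit:
                break
            is_sum[a + b] = True

    return sum(n for n in range(1, limit + 1) if not is_sum[n])

print(non_abundant_sum())
```

Marking sums from the abundant numbers outward avoids testing each candidate `i` against the whole list, at the cost of a boolean array of size `limit + 1`.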
Is there any direct formula or systematic method to find the number of zeros in a given range? Suppose two integers M and N are given. If I have to find the total number of zero digits in the numbers of this range, what should I do?
Let M = 1234567890 and N = 2345678901
The answer is: 987654304
Thanks in advance .
Reexamining the Problem
Here is a simple solution in Ruby, which inspects each integer from the interval [m,n], determines the string of its digits in the standard base 10 positional system, and counts the occurring 0 digits:
def brute_force(m, n)
  if m > n
    return 0
  end
  z = 0
  m.upto(n) do |k|
    z += k.to_s.count('0')
  end
  z
end
If you run it in an interactive Ruby shell you will get
irb> brute_force(1,100)
=> 11
which is fine. However using the interval bounds from the example in the question
m = 1234567890
n = 2345678901
you will recognize that this will take considerable time. On my machine it takes far longer than a couple of seconds; I had to cancel it.
So the real question is not only to come up with the correct zero counts but to do it faster than the above brute force solution.
Complexity: Running Time
The brute force solution scans the base 10 string of each of the n-m+1 numbers k in the interval, and each such string has length floor(log_10(k))+1, so it will not use more than
O(n (log(n)+1))
string digit accesses. The slow example had an n of roughly n = 10^9.
Reducing Complexity
Yiming Rong's answer is a first attempt to reduce the complexity of the problem.
If the function for calculating the number of zeros regarding the interval [m,n] is F(m,n), then it has the property
F(m,n) = F(1,n) - F(1,m-1)
so that it suffices to look for a most likely simpler function G with the property
G(n) = F(1,n).
Divide and Conquer
Coming up with a closed formula for the function G is not that easy. E.g. the interval [1,1000] contains 192 zeros, but the interval [1001,2000] contains 300 zeros, because a case like k = 99 in the first interval would correspond to k = 1099 in the second interval, which yields another zero digit to count. k=7 would show up as 1007, yielding two more zeros.
What one can try is to express the solution for some problem instance in terms of solutions to simpler problem instances. This strategy is called divide and conquer in computer science. It works if at some complexity level it is possible to solve the problem instance and if one can deduce the solution of a more complex problem from the solutions of the simpler ones. This naturally leads to a recursive formulation.
E.g. we can formulate a solution for a restricted version of G, which is only working for some of the arguments. We call it g and it is defined for 9, 99, 999, etc. and will be equal to G for these arguments.
It can be calculated using this recursive function:
# zeros for 1..n, where n = (10^k)-1: 0, 9, 99, 999, ..
def g(n)
  if n <= 9
    return 0
  end
  n2 = (n - 9) / 10
  return 10 * g(n2) + n2
end
Note that this function is much faster than the brute force method: to count the zeros in the interval [1, 10^9-1], which is comparable to the m from the question, it needs just 9 calls. Its complexity is
O(log(n))
Again note that this g is not defined for arbitrary n, only for n = (10^k)-1.
Derivation of g
It starts with finding the recursive definition of the function h(n), which counts zeros in the numbers from 1 to n = (10^k) - 1, if the decimal representation has leading zeros.
Example: h(999) counts the zero digits for the number representations:
001..009
010..099
100..999
The result would be h(999) = 297.
Using k = floor(log10(n+1)), k2 = k - 1, n2 = (10^k2) - 1 = (n-9)/10 the function h turns out to be
h(n) = 9 [k2 + h(n2)] + h(n2) + n2 = 9 k2 + 10 h(n2) + n2
with the initial condition h(0) = 0. This allows us to formulate g as
g(n) = 9 [k2 + h(n2)] + g(n2)
with the initial condition g(0) = 0.
From these two definitions we can define the difference d between h and g as well, again as a recursive function:
d(n) = h(n) - g(n) = h(n2) - g(n2) + n2 = d(n2) + n2
with the initial condition d(0) = 0. Trying some examples leads to a geometric series, e.g. d(9999) = d(999) + 999 = d(99) + 99 + 999 = d(9) + 9 + 99 + 999 = 0 + 9 + 99 + 999 = (10^0)-1 + (10^1)-1 + (10^2)-1 + (10^3)-1 = (10^4 - 1)/(10-1) - 4. This gives the closed form
d(n) = n/9 - k
This allows us to express g in terms of g only:
g(n) = 9 [k2 + h(n2)] + g(n2) = 9 [k2 + g(n2) + d(n2)] + g(n2) = 9 k2 + 9 d(n2) + 10 g(n2) = 9 k2 + n2 - 9 k2 + 10 g(n2) = 10 g(n2) + n2
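The recurrences for h and g and the closed form for d can be checked numerically against direct counting. The sketch below is a Python transcription written for this purpose; the helper `brute` is an assumption of the example, not part of the original source.

```python
def g(n):
    """Zeros in 1..n without padding, for n = 10**k - 1."""
    if n <= 9:
        return 0
    n2 = (n - 9) // 10
    return 10 * g(n2) + n2

def h(n):
    """Zeros in 1..n when every number is padded to k digits, n = 10**k - 1."""
    if n == 0:
        return 0
    k2 = len(str(n)) - 1
    n2 = (n - 9) // 10
    return 9 * k2 + 10 * h(n2) + n2

def brute(n, padded):
    """Count zeros digit by digit, optionally padding to the width of n."""
    width = len(str(n)) if padded else 0
    return sum(str(i).zfill(width).count("0") for i in range(1, n + 1))

for k in range(1, 5):
    n = 10**k - 1
    assert g(n) == brute(n, padded=False)
    assert h(n) == brute(n, padded=True)
    assert h(n) - g(n) == n // 9 - k  # closed form d(n) = n/9 - k
print("recurrences agree with direct counting for k = 1..4")
```

The check only covers n = 9, 99, 999 and 9999, which is where h and g are defined; it says nothing about arbitrary n.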
Derivation of G
Using the above definitions and naming the k digits of the representation q_k, q_k2, .., q2, q1 we first extend h into H:
H(q_k q_k2..q_1) = q_k [k2 + h(n2)] + r (k2-kr) + H(q_kr..q_1) + n2
with initial condition H(q_1) = 0 for q_1 <= 9.
Note the additional definition r = q_kr..q_1. To understand why it is needed look at the example H(901), where the next level call to H is H(1), which means that the digit string length shrinks from k=3 to kr=1, needing an additional padding with r (k2-kr) zero digits.
Using this, we can extend g to G as well:
G(q_k q_k2..q_1) = (q_k-1) [k2 + h(n2)] + k2 + r (k2-kr) + H(q_kr..q_1) + g(n2)
with initial condition G(q_1) = 0 for q_1 <= 9.
Note: It is likely that one can simplify the above expressions like in case of g above. E.g. trying to express G just in terms of G and not using h and H. I might do this in the future. The above is already enough to implement a fast zero calculation.
Test Result
recursive(1234567890, 2345678901) =
987654304
expected:
987654304
success
See the source and log for details.
Update: I changed the source and log according to the more detailed problem description from that contest (allowing 0 as input, handling invalid inputs, 2nd larger example).
You can use a standard approach to find m = [1, M-1] and n = [1, N], then [M, N] = n - m.
Standard approaches are easily available: Counting zeroes.
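One such standard approach, sketched here in Python, counts zeros one decimal position at a time. It is a different technique from the recursion derived in the other answer. The function names are made up for this example, and it assumes m >= 1 (the number 0 itself is not counted).

```python
def zeros_up_to(n):
    """Number of 0 digits written when listing 1..n in base 10 (0 if n < 1)."""
    if n < 1:
        return 0
    count, p = 0, 1
    while p <= n:
        # Split n around the digit at position p: high | cur | low
        high, cur, low = n // (p * 10), (n // p) % 10, n % p
        if cur == 0:
            # prefixes 1..high-1 contribute p zeros each, prefix high contributes low+1
            count += (high - 1) * p + low + 1
        else:
            # every prefix 1..high contributes p zeros at this position
            count += high * p
        p *= 10
    return count

def zeros_between(m, n):
    """Zeros in [m, n], using F(m, n) = G(n) - G(m - 1)."""
    return zeros_up_to(n) - zeros_up_to(m - 1) if m <= n else 0

print(zeros_between(1, 100))                   # 11
print(zeros_between(1234567890, 2345678901))   # 987654304
```

The prefix must be at least 1 because a number cannot start with a zero, which is why the leading position never contributes.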
I am trying to understand the implementation of the linear-time suffix array construction algorithm by Kärkkäinen and Sanders. Details of the algorithm can be found here.
I managed to understand the overall concept, but I am failing to match it with the provided implementation and hence am not able to grasp it clearly.
Here are initial code paths which are confusing me.
As per paper : n0, n1, n2 represent number of triplets starting at i mod 3 = (0,1,2)
As per code : n0 = (n + 2) / 3, n1 = (n + 1) / 3, n2 = n / 3; => How have these initialisations been derived?
As per paper : We need to create T` which is concatenation of triplets at i mod 3 != 0
As per code : n02 = n0 + n2; s12 = [n02] ==> Where does n02 come from? It should be n12, i.e. n1 + n2.
As per code : for (int i = 0, j = 0; i < n + (n0 - n1); i++) fill s12 with triplets such that i%3 != 0; => Why does the for loop run n + (n0 - n1) times? Shouldn't it simply be n1 + n2?
I am not able to proceed because of these :( Please help.
Consider the following example where the length of the input is n=13:
STA | CKO | WER | FLO | W
As per code : n0 = (n + 2) / 3, n1 = (n + 1) / 3, n2 = n / 3; => How have these initialisations been derived?
Note that the number of triplets i mod3 = 0 is n/3 if n mod3 = 0 and n/3+1 otherwise (if n mod3 = 1 or n mod3 = 2). In the current example n/3 = 4 but since the last triplet 'W' is not complete it is not counted in the integer division. A 'trick' to make this computation directly is to use (n+2)/3. Effectively, if n mod3 = 0 then the result of the integer divisions (n+2)/3 and n/3 will be the same. However, if n mod3 = 1 or 2 then the result of (n+2)/3 will be n/3+1. The same applies to n1 and n2.
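The rounding trick is easy to confirm by counting directly. Here is a minimal sketch; the function name `residue_counts` is invented for the illustration.

```python
def residue_counts(n):
    """(n0, n1, n2): how many i in 0..n-1 have i % 3 == 0, 1, 2."""
    return (n + 2) // 3, (n + 1) // 3, n // 3

# Compare the formulas with a direct count for a range of input lengths.
for n in range(60):
    direct = tuple(sum(1 for i in range(n) if i % 3 == r) for r in range(3))
    assert residue_counts(n) == direct

print(residue_counts(13))  # the 13-character example above: (5, 4, 4)
```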
As per code : n02 = n0 + n2; s12 = [n02] ==> Where does n02 come from? It should be n12, i.e. n1 + n2.
As per code : for (int i = 0, j = 0; i < n + (n0 - n1); i++) fill s12 with triplets such that i%3 != 0; => Why does the for loop run n + (n0 - n1) times? Shouldn't it simply be n1 + n2?
Both questions have the same answer. In our example we'd have a B12 buffer like this:
B12 = B1 U B2 = {TA KO ER LO}
So you'd first sort the suffixes and end up with a suffix array of B12, which has 8 elements. To proceed to the merging step we first need to compute the suffix array of B0, which is obtained by sorting the tuples (B0(i),rank(i+1))... But this concrete case in which the last triplet has only one element (W) has a problem, because rank(i+1) is not defined for the last element of B0:
B0 = {0,3,6,9,12}
which sorted alphabetically results in
SA0 = {3, 9, 0, ?, ?}
Since the indices 6 and 12 contain a 'W', it is not enough to sort alphabetically, we need to check which goes first in the rank table, so let's check the rank of their suffixes.. oh, wait! rank(13) is not defined!
And that's why we add a dummy 0 to the last triplet of the input when the last triplet only contains one element (if n mod3 = 1). So then the size of B12 is n0+n2, no matter the size of n1, and one needs to add an extra element to B12 if B0 is larger than B1 (in which case n0-n1 = 1).
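To see how the loop bound and the buffer size fit together, the following sketch reproduces only the index selection of the quoted loop in Python. It is an illustration of the counting, not a port of the reference implementation, and `s12_positions` is a made-up name.

```python
def s12_positions(n):
    """Positions i with i % 3 != 0 that the loop bound n + (n0 - n1) visits."""
    n0, n1, n2 = (n + 2) // 3, (n + 1) // 3, n // 3
    positions = [i for i in range(n + (n0 - n1)) if i % 3 != 0]
    assert len(positions) == n0 + n2  # matches the size n02 of the s12 buffer
    return positions

# n = 13: position 13 lies one past the end of the input (the dummy suffix).
print(s12_positions(13))
```

For n = 13 the list ends with 13, which is the padded position that gives the last element of B0 a defined rank(i+1).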
Hope it was clear.