Is my recurrence relation right for subset sum? - dynamic-programming

Is this recurrence relation correct for the subset sum problem?
Statement: Print Yes or No depending on whether there is a subset of the given array a[ ] which sums up to a given number n.
dp[i][j] = true if some subset of the elements at indices 0 to j sums to i, and false otherwise.
dp[i][j] = min(dp[i-a[j]][j], dp[i][j-1])
Base case values :
dp[0][0] = true
dp[1...i][0] = false
Just trying to see if I have the recurrence relation right or not. Thanks for your guidance.

You are almost correct (not sure why you used min). But let dp[i][j] store whether a subset of arr[0], arr[1], ..., arr[j] (here arr[] is the array of elements) can sum up to i.
That is, dp[i][j] is 1 if the answer is yes and 0 if it is no. Ignoring the base cases, the recurrence relation is dp[i][j] = (dp[i][j-1] | dp[i-arr[j]][j-1]). For the exact code, base cases and implementation you can have a look here: http://www.geeksforgeeks.org/dynamic-programming-subset-sum-problem/.
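For concreteness, here is a small Python sketch of that boolean recurrence (the function and variable names are my own, not from the linked article, and it assumes a non-empty array):

def can_sum(arr, n):
    # dp[i][j] is True if some subset of arr[0..j] sums to i
    m = len(arr)
    dp = [[False] * m for _ in range(n + 1)]
    for j in range(m):
        dp[0][j] = True                     # the empty subset always sums to 0
    if arr[0] <= n:
        dp[arr[0]][0] = True                # with only arr[0] available, we can also reach arr[0]
    for i in range(1, n + 1):
        for j in range(1, m):
            dp[i][j] = dp[i][j - 1]                           # skip arr[j]
            if arr[j] <= i:
                dp[i][j] = dp[i][j] or dp[i - arr[j]][j - 1]  # or take arr[j]
    return dp[n][m - 1]

print(can_sum([3, 34, 4, 12, 5, 2], 9))     # True, e.g. 4 + 5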

Related

Finding the best condition for 2 datasets of numbers

Let's say I have 2 huge datasets of numbers like this:
wins = [0.3423, 0.6345, .... ]
loss = [0.123, 2.6345, .... ]
How do I find the best if condition to get the most wins out of that array:
if (value > n or othercondition):
    pass
What are the best n and operator to get more wins than losses? :) I hope I gave the right information.
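One simple way to read "find the best n" is as a brute-force sweep over candidate thresholds, scoring each cut-off. A minimal sketch (best_threshold is a made-up name, and the scoring rule of counting correctly separated values is my own assumption, not from the question):

def best_threshold(wins, losses):
    # Try every observed value as a candidate cut-off
    candidates = sorted(set(wins) | set(losses))
    best_n, best_score = None, -1
    for n in candidates:
        # Score: wins above the cut-off plus losses at or below it
        score = sum(1 for w in wins if w > n) + sum(1 for l in losses if l <= n)
        if score > best_score:
            best_n, best_score = n, score
    return best_n, best_score

print(best_threshold([0.3423, 0.6345], [0.123, 2.6345]))   # -> (0.123, 3) for this tiny sample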

Generating an array whose sum of squares of elements is equal to a given value

I have a task: given a value N, I should generate a list of length L > 1 such that the sum of the squares of its elements is equal to N.
I wrote this code:
import numpy as np

deltas = np.zeros(L)
deltas[0] = np.random.uniform(-N, N)
i = 1
while i < L and np.sum(np.array(deltas)**2) < N**2:
    deltas[i] = np.random.uniform(-np.sqrt(N**2 - np.sum(np.array(deltas)**2)),
                                  np.sqrt(N**2 - np.sum(np.array(deltas)**2)))
    i += 1
But this approach takes a long time if I generate such a list many times (I think because of the loop).
Note that I don't want my list to consist of just one unique value. The distribution of values does not have to be uniform - I took uniform just as an example.
Could you suggest any faster approach? Maybe there is a special function in some library?
If you don't mind a few repeating 1s, you could do something like this:
def square_list(integer):
    components = []
    total = 0
    remaining = integer
    while total != integer:
        # peel off the largest square that still fits into the remainder
        component = int(remaining ** 0.5)
        remaining -= component ** 2
        components.append(component)
        total = sum(x ** 2 for x in components)
    return components
This code works by taking the largest square, and then decreasing to the next largest square. It continues until the largest remaining square is 1, which at worst results in three 1s in the list.
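For example, assuming the function above:

print(square_list(12))   # -> [3, 1, 1, 1], since 9 + 1 + 1 + 1 == 12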
If you are looking for a more random distribution, it might make sense to randomly transform remaining as a separate variable before subtracting from it, i.e.:
value = transformation(remaining)
component = int(value ** 0.5)
which should give you more "random" values.

Search and remove algorithm

Say you have an ordered array of values representing x coordinates.
[0,25,50,60,75,100]
You might notice that without the 60, the values would be evenly spaced (25). This would be indicative of a repeating pattern, something that I need to extract using this list (regardless of the length and the values of the list). In this particular example, the algorithm should find and remove the 60.
There are no time or space complexity requirements.
Both the values in the list and the ideal spacing (e.g 25) are unknown. So the algorithm must obtain this by looking at the values. In addition, the number of values, and where the outliers are in the array are not guaranteed. There may be more than one outlier. The algorithm should return a list with the outliers removed. Extra points if the algorithm uses a threshold for the spacing.
Edit: In the example image, there is one outlier on the x axis (green line) and two on the y axis. The x-coordinates of the array represent the rho of the line on that axis.
arr = [0, 25, 50, 60, 75, 100]
First construct the array of distances between consecutive values:
import numpy as np

dist = np.array([arr[i+1] - arr[i] for i in range(len(arr) - 1)])
print(dist)
>> [25 25 10 15 25]
Now I'm using np.where and np.percentile to cut the array into 3 parts: the main values, the upper values and the lower values. I arbitrarily set the cut-offs at 5%.
cond_sup = np.where(dist > np.percentile(dist, 95))
print(cond_sup)
>> (array([]),)
cond_inf = np.where(dist < np.percentile(dist, 5))
print(cond_inf)
>> (array([2]),)
You now have the indexes where the value differs from the others.
So dist[2] has a problem, which means by construction that the problem lies between arr[2] and arr[2+1].
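If you then want to actually drop the offending value, one possible last step (this heuristic of removing the element just after each too-short gap is my own addition, not part of the answer above):

outliers = cond_inf[0] + 1        # indices of the elements right after each short gap
print(np.delete(arr, outliers))
>> [  0  25  50  75 100]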
I don't know whether you want to remove one or more numbers from this array, so I think the way to solve this problem is like this:
array A[] = [0,25,50,60,75,100];
sort the array (if needed).
create a new array B[] whose i-th value is B[i] = A[i+1] - A[i].
find the value of B[] that appears most often. That will be our distance.
find i such that A[i+1] - A[i] != distance.
find the smallest k > 1 such that A[i+k] - A[i] == distance.
then we need to remove A[i+1] .. A[i+k-1].
I hope it is right.
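A quick Python sketch of those steps (a simplified greedy reading of them; the use of collections.Counter for the most common gap, and the assumption that the first element is genuine, are mine):

from collections import Counter

def remove_outliers(a):
    a = sorted(a)
    # gaps between consecutive values
    gaps = [a[i + 1] - a[i] for i in range(len(a) - 1)]
    # the most common gap is taken as the true spacing
    spacing = Counter(gaps).most_common(1)[0][0]
    # keep only values that continue the regular spacing
    result = [a[0]]
    for x in a[1:]:
        if x - result[-1] == spacing:
            result.append(x)
    return result

print(remove_outliers([0, 25, 50, 60, 75, 100]))   # -> [0, 25, 50, 75, 100]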

How can I easily show that string index order when calculating Levenshtein distance doesn't matter for strings of the same length?

When working on my Levenshtein distance implementation I stumbled upon the fact that my indexes were swapped, as shown in this pseudocode (note the s1[j] == s2[i] instead of s1[i] == s2[j]).
L(i, j) = min(L(i - 1, j) + 1,
L(i, j - 1) + 1,
L(i - 1, j - 1) + (s1[j] == s2[i] ? 0 : 1))
But because my implementation calculates the matrix as a sequence of rectangular submatrices, it doesn't seem to affect the computation at all, and it always yields the correct result whether the indexes are swapped or not. (Or for simplicity just think of the strings as having the same length.)
Now my question is: how can I show (not necessarily formally) that the index order doesn't matter for equal-length strings? It seems that because this is the only place that affects the matrix, and because it ends up being symmetrical, swapping the indexes would just transpose the matrix, but I'm not sure whether I'm missing something important.
As you pointed out, this will only work if the two strings are of equal lengths.
But given a more formal definition of Levenshtein distance, the only thing actually referring to the content of the strings is the function r(x, y). The rest concerns only the lengths of the strings, which in this case are the same. So the effect of using s1[j] == s2[i] instead of s1[i] == s2[j] is the same as swapping the two input parameters s1 and s2.
Note: MSD = minimum sum of distances
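If you want an empirical sanity check to go with that argument, a small brute-force comparison of the two index orders on random equal-length strings (a sketch using the standard iterative DP, not necessarily the asker's exact code):

import random

def lev(s1, s2, swapped=False):
    # Standard DP table; `swapped` uses s1[j] == s2[i] instead of s1[i] == s2[j]
    n, m = len(s1), len(s2)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        L[i][0] = i
    for j in range(m + 1):
        L[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if swapped:
                same = s1[j - 1] == s2[i - 1]
            else:
                same = s1[i - 1] == s2[j - 1]
            L[i][j] = min(L[i - 1][j] + 1,
                          L[i][j - 1] + 1,
                          L[i - 1][j - 1] + (0 if same else 1))
    return L[n][m]

for _ in range(1000):
    a = ''.join(random.choice('ab') for _ in range(6))
    b = ''.join(random.choice('ab') for _ in range(6))
    assert lev(a, b) == lev(a, b, swapped=True)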

How to find the actual sequence of a Longest Increasing Subsequence?

This is not a homework problem. I am reviewing the Longest Increasing Subsequence problem by myself. I have read about it everywhere online. I understand how to find the "length", but I don't understand how to back-trace the actual sequence. I am using the patience sorting algorithm to find the length. Can anyone explain how to find the actual sequence? I do not really understand the version on Wikipedia. Can someone explain it in a different way?
Thanks.
Let's define max(j) as the length of the longest increasing subsequence among A[0..j]. There are two options: either we use A[j] in this subsequence, or we don't.
If we don't use it, then the value will be max(j-1). If we do use it, then the value will be
max*(i)+1, where i < j, A[i] < A[j], and i is chosen to maximize max*(i). (Here max*(j) denotes the length of the longest increasing subsequence up to A[j] that actually uses A[j]; we save both values, max(j) and max*(j), in each cell, and max*(j) itself is calculated each time as max*(i)+1.)
To sum up, the recursive formula for calculating max(j) will be:
max(j) = max{ max(j-1), max*(i)+1 }, and max*(j) = max*(i)+1.
In each array cell you can save a pointer that tells you whether you chose to use the A[j] cell or not. This way you can recover the whole sequence by moving backwards over the array.
Time complexity: evaluating the recursive formula and recovering the sequence at the end is O(n). The problem here is finding, for each A[j], the corresponding A[i] with i < j and A[i] < A[j].
Of course you can do it naively in O(n^2) (from each cell, go backwards until you find such an i). If you want to do better, then I'm pretty sure you can do it in O(n log n) in the following way:
Sort your array first.
1) Go to the smallest integer in the array, and denote its position in the array as k.
2) For A[k+1] we of course have A[k] < A[k+1]. If A[k+1] > A[k+2] then k fits the k+2 cell as well, and so on until we have A[k+m] < A[k+m+1]; then k+m fits k+m+1.
3) Delete all the cells whose corresponding cell you found in the previous stage.
4) Return to 1.
I hope that it helps. Please note that I worked this out on my own, so there is a small chance of a mistake here; please check that I'm right, and ask for more clarification if you need it.
This Python code solves the Longest Increasing Subsequence problem, and also returns one such sequence. The trick is that while the dynamic programming table is being filled, another array is also filled, storing the index of the element that was used to construct the optimal solution at each position.
from operator import itemgetter

def an_lis(nums):
    table, solution = lis_table(nums)
    if not table:
        return []
    # index of the cell where the longest increasing subsequence ends
    n, _ = max(enumerate(table), key=itemgetter(1))
    lis = [nums[n]]
    # follow the stored predecessor indices backwards
    while solution[n] != -1:
        lis.append(nums[solution[n]])
        n = solution[n]
    return lis[::-1]

def lis_table(nums):
    n = len(nums)
    table, solution = [0] * n, [-1] * n
    for i in range(n):
        maxLen, maxIdx = 0, -1
        for j in range(i):
            if nums[j] < nums[i] and table[j] > maxLen:
                maxLen, maxIdx = table[j], j
        # table[i]: LIS length ending at i; solution[i]: index of its predecessor
        table[i], solution[i] = 1 + maxLen, maxIdx
    return table, solution
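For example, assuming the code above:

print(an_lis([3, 1, 4, 1, 5, 9, 2, 6]))   # -> [3, 4, 5, 9]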