What is the approach to solve spoj KPMATRIX? - dynamic-programming

The problem link is here. The problem is basically to count all such sub matrices of a given matrix of size N by M, whose sum of elements is between A and B inclusive. N,M<=250. 10^-9<=A<=B<=10^9.
People have solved it using DP and BIT. I am not clear how.
First, i tried to solve a simpler version, 1-D case of the above problem: Given an array A, of length N, count all subarrays, where sum of elements in the subarray lies between A and B, but still couldn't think of better than O(n^2). Here is what i did :
I thought of making another array for keeping prefix sum of the original array, say prefix[N]. prefix[i] = A1 + A[2] + A[3] + ...A[i]. set prefix[ 1] = A [ 1]. Then for each i from 2 to N, problem is to count all j <= i such that sum Z = A[j] + A[j+1] + ..A[i] lies between A and B. This is equivalent to prefix[i] - prefix[j-1]. But it's still O(n^2), as for each i, j is hitting i places.
can anybody help me step by step to advance me in the given approach to solve the main problem ?.

Related

Maximum Sum of XOR operation on a selected element with array elements with an optimize approach

Problem: Choose an element from the array to maximize the sum after XOR all elements in the array.
Input for problem statement:
N=3
A=[15,11,8]
Output:
11
Approach:
(15^15)+(15^11)+(15^8)=11
My Code for brute force approach:
def compute(N,A):
ans=0
for i in A:
xor_sum=0
for j in A:
xor_sum+=(i^j)
if xor_sum>ans:
ans=xor_sum
return ans
Above approach giving the correct answer but wanted to optimize the approach to solve it in O(n) time complexity. Please help me to get this.
If you have integers with a fixed (constant) number of c bites then it should be possible because O(c) = O(1). For simplicity reasons I assume unsigned integers and n to be odd. If n is even then we sometimes have to check both paths in the tree (see solution below). You can adapt the algorithm to cover even n and negative numbers.
find max in array with length n O(n)
if max == 0 return 0 (just 0s in array)
find the position p of the most significant bit of max O(c) = O(1)
p = -1
while (max != 0)
p++
max /= 2
so 1 << p gives a mask for the highest set bit
build a tree where the leaves are the numbers and every level stands for a position of a bit, if there is an edge to the left from the root then there is a number that has bit p set and if there is an edge to the right there is a number that has bit p not set, for the next level we have an edge to the left if there is a number with bit p - 1 set and an edge to the right if bit p - 1 is not set and so on, this can be done in O(cn) = O(n)
go through the array and count how many times a bit at position i (i from 0 to p) is set => sum array O(cn) = O(n)
assign the root of the tree to node x
now for each i from p to 0 do the following:
if x has only one edge => x becomes its only child node
else if sum[i] > n / 2 => x becomes its right child node
else x becomes its left child node
in this step we choose the best path through the tree that gives us the most ones when xoring O(cn) = O(n)
xor all the elements in the array with the value of x and sum them up to get the result, actually you could have built the result already in the step before by adding sum[i] * (1 << i) to the result if going left and (n - sum[i]) * (1 << i) if going right O(n)
All the sequential steps are O(n) and therefore overall the algorithm is also O(n).

Python infinite recursion with formula

### Run the code below and understand the error messages
### Fix the code to sum integers from 1 up to k
###
def f(k):
return f(k-1) + k
print(f(10))
I am confused on how to fix this code while using recursion, I keep getting the error messages
[Previous line repeated 995 more times]
RecursionError: maximum recursion depth exceeded
Is there a simple way to fix this without using any while loops or creating more than 1 variable?
A recursion should have a termination condition, i.e. the base case. When your variable attains that value there are no more recursive function calls.
e.g. in your code,
def f(k):
if(k == 1):
return k
return f(k-1) + k
print(f(10))
we define the base case 1, if you want to take the sum of values from n to 1. You can put any other number, positive or negative there, if you want the sum to extend upto that number. e.g. maybe you want to take sum from n to -3, then base case would be k == -3.
Python doesn't have optimized tail recursion. You f function call k time. If k is very big number then Python trow RecursionError. You can see what is limit of recursion via sys.getrecursionlimit and change via sys.setrecursionlimit. But changing limit is not good idea. Instead of changing you can change your code logic or pattern.
Your recursion never terminates. You could try:
def f(k):
return k if k < 2 else f(k-1) + k
print(f(10))
You are working out the sum of all of all numbers from 1 to 10 which in essence returns the 10th triangular number. Eg. the number of black circles in each triangle
Using the formula on OEIS gives you this as your code.
def f(k):
return int(k*(k+1)/2)
print(f(10))
How do we know int() doesn't break this? k and k + 1 are adjacent numbers and one of them must have a factor of two so this formula will always return an integer if given an integer.

Algorithm to solve Local Alignment

Local alignment between X and Y, with at least one column aligning a C
to a W.
Given two sequences X of length n and Y of length m, we
are looking for a highest-scoring local alignment (i.e., an alignment
between a substring X' of X and a substring Y' of Y) that has at least
one column in which a C from X' is aligned to a W from Y' (if such an
alignment exists). As scoring model, we use a substitution matrix s
and linear gap penalties with parameter d.
Write a code in order to solve the problem efficiently. If you use dynamic
programming, it suffices to give the equations for computing the
entries in the dynamic programming matrices, and to specify where
traceback starts and ends.
My Solution:
I've taken 2 sequences namely, "HCEA" and "HWEA" and tried to solve the question.
Here is my code. Have I fulfilled what is asked in the question? If am wrong kindly tell me where I've gone wrong so that I will modify my code.
Also is there any other way to solve the question? If its available can anyone post a pseudo code or algorithm, so that I'll be able to code for it.
public class Q1 {
public static void main(String[] args) {
// Input Protein Sequences
String seq1 = "HCEA";
String seq2 = "HWEA";
// Array to store the score
int[][] T = new int[seq1.length() + 1][seq2.length() + 1];
// initialize seq1
for (int i = 0; i <= seq1.length(); i++) {
T[i][0] = i;
}
// Initialize seq2
for (int i = 0; i <= seq2.length(); i++) {
T[0][i] = i;
}
// Compute the matrix score
for (int i = 1; i <= seq1.length(); i++) {
for (int j = 1; j <= seq2.length(); j++) {
if ((seq1.charAt(i - 1) == seq2.charAt(j - 1))
|| (seq1.charAt(i - 1) == 'C') && (seq2.charAt(j - 1) == 'W')) {
T[i][j] = T[i - 1][j - 1];
} else {
T[i][j] = Math.min(T[i - 1][j], T[i][j - 1]) + 1;
}
}
}
// Strings to store the aligned sequences
StringBuilder alignedSeq1 = new StringBuilder();
StringBuilder alignedSeq2 = new StringBuilder();
// Build for sequences 1 & 2 from the matrix score
for (int i = seq1.length(), j = seq2.length(); i > 0 || j > 0;) {
if (i > 0 && T[i][j] == T[i - 1][j] + 1) {
alignedSeq1.append(seq1.charAt(--i));
alignedSeq2.append("-");
} else if (j > 0 && T[i][j] == T[i][j - 1] + 1) {
alignedSeq2.append(seq2.charAt(--j));
alignedSeq1.append("-");
} else if (i > 0 && j > 0 && T[i][j] == T[i - 1][j - 1]) {
alignedSeq1.append(seq1.charAt(--i));
alignedSeq2.append(seq2.charAt(--j));
}
}
// Display the aligned sequence
System.out.println(alignedSeq1.reverse().toString());
System.out.println(alignedSeq2.reverse().toString());
}
}
#Shole
The following are the two question and answers provided in my solved worksheet.
Aligning a suffix of X to a prefix of Y
Given two sequences X and Y, we are looking for a highest-scoring alignment between any suffix of X and any prefix of Y. As a scoring model, we use a substitution matrix s and linear gap penalties with parameter d.
Give an efficient algorithm to solve this problem optimally in time O(nm), where n is the length of X and m is the length of Y. If you use a dynamic programming approach, it suffices to give the equations that are needed to compute the dynamic programming matrix, to explain what information is stored for the traceback, and to state where the traceback starts and ends.
Solution:
Let X_i be the prefix of X of length i, and let Y_j denote the prefix of Y of length j. We compute a matrix F such that F[i][j] is the best score of an alignment of any suffix of X_i and the string Y_j. We also compute a traceback matrix P. The computation of F and P can be done in O(nm) time using the following equations:
F[0][0]=0
for i = 1..n: F[i][0]=0
for j = 1..m: F[0][j]=-j*d, P[0][j]=L
for i = 1..n, j = 1..m:
F[i][j] = max{ F[i-1][j-1]+s(X[i-1],Y[j-1]), F[i-1][j]-d, F[i][j-1]-d }
P[i][j] = D, T or L according to which of the three expressions above is the maximum
Once we have computed F and P, we find the largest value in the bottom row of the matrix F. Let F[n][j0] be that largest value. We start traceback at F[n][j0] and continue traceback until we hit the first column of the matrix. The alignment constructed in this way is the solution.
Aligning Y to a substring of X, without gaps in Y
Given a string X of length n and a string Y of length m, we want to compute a highest-scoring alignment of Y to any substring of X, with the extra constraint that we are not allowed to insert any gaps into Y. In other words, the output is an alignment of a substring X' of X with the string Y, such that the score of the alignment is the largest possible (among all choices of X') and such that the alignment does not introduce any gaps into Y (but may introduce gaps into X'). As a scoring model, we use again a substitution matrix s and linear gap penalties with parameter d.
Give an efficient dynamic programming algorithm that solves this problem optimally in polynomial time. It suffices to give the equations that are needed to compute the dynamic programming matrix, to explain what information is stored for the traceback, and to state where the traceback starts and ends. What is the running-time of your algorithm?
Solution:
Let X_i be the prefix of X of length i, and let Y_j denote the prefix of Y of length j. We compute a matrix F such that F[i][j] is the best score of an alignment of any suffix of X_i and the string Y_j, such that the alignment does not insert gaps in Y. We also compute a traceback matrix P. The computation of F and P can be done in O(nm) time using the following equations:
F[0][0]=0
for i = 1..n: F[i][0]=0
for j = 1..m: F[0][j]=-j*d, P[0][j]=L
for i = 1..n, j = 1..m:
F[i][j] = max{ F[i-1][j-1]+s(X[i-1],Y[j-1]), F[i][j-1]-d }
P[i][j] = D or L according to which of the two expressions above is the maximum
Once we have computed F and P, we find the largest value in the rightmost column of the matrix F. Let F[i0][m] be that largest value. We start traceback at F[i0][m] and continue traceback until we hit the first column of the matrix. The alignment constructed in this way is the solution.
Hope you get some idea about wot i really need.
I think it's quite easy to find resources or even the answer by google...as the first result of the searching is already a thorough DP solution.
However, I appreciate that you would like to think over the solution by yourself and are requesting some hints.
Before I give out some of the hints, I would like to say something about designing a DP solution
(I assume you know this can be solved by a DP solution)
A dp solution basically consisting of four parts:
1. DP state, you have to self define the physical meaning of one state, eg:
a[i] := the money the i-th person have;
a[i][j] := the number of TV programmes between time i and time j; etc
2. Transition equations
3. Initial state / base case
4. how to query the answer, eg: is the answer a[n]? or is the answer max(a[i])?
Just some 2 cents on a DP solution, let's go back to the question :)
Here's are some hints I am able to think of:
What is the dp state? How many dimensions are enough to define such a state?
Thinking of you are solving problems much alike to common substring problem (on 2 strings),
1-dimension seems too little and 3-dimensions seems too many right?
As mentioned in point 1, this problem is very similar to common substring problem, maybe you should have a look on these problems to get yourself some idea?
LCS, LIS, Edit Distance, etc.
Supplement part: not directly related to the OP
DP is easy to learn, but hard to master. I know a very little about it, really cannot share much. I think "Introduction to algorithm" is a quite standard book to start with, you can find many resources, especially some ppt/ pdf tutorials of some colleges / universities to learn some basic examples of DP.(Learn these examples is useful and I'll explain below)
A problem can be solved by many different DP solutions, some of them are much better (less time / space complexity) due to a well-defined DP state.
So how to design a better DP state or even get the sense that one problem can be solved by DP? I would say it's a matter of experiences and knowledge. There are a set of "well-known" DP problems which I would say many other DP problems can be solved by modifying a bit of them. Here is a post I just got accepted about another DP problem, as stated in that post, that problem is very similar to a "well-known" problem named "matrix chain multiplication". So, you cannot do much about the "experience" part as it has no express way, yet you can work on the "knowledge" part by studying these standard DP problems first maybe?
Lastly, let's go back to your original question to illustrate my point of view:
As I knew LCS problem before, I have a sense that for similar problem, I may be able to solve it by designing similar DP state and transition equation? The state s(i,j):= The optimal cost for A(1..i) and B(1..j), given two strings A & B
What is "optimal" depends on the question, and how to achieve this "optimal" value in each state is done by the transition equation.
With this state defined, it's easy to see the final answer I would like to query is simply s(len(A), len(B)).
Base case? s(0,0) = 0 ! We can't really do much on two empty string right?
So with the knowledge I got, I have a rough thought on the 4 main components of designing a DP solution. I know it's a bit long but I hope it helps, cheers.

Converting N strings to a common target string in maximum of K edits

I've a set of string [S1 S2 S3 ... Sn] and I'm to count all such target strings T such that each one of S1 S2... Sn can be converted into T within a total of K edits. All the strings are of fixed length L and an edit here is hamming distance.
All I've is sort of brute force approach.
so, If my alphabet size is 4, I've sample space of O(4^L) and it takes O(L) time to check each one of them. I can't seem to bring down the complexity from exponential to some poly or pseudo-poly! Is there any way to prune down the sample space to do better?
I tried to visualize it as in a L-dimensional vector space. I've been given N points and have to count all the points whose sum of distance from the given N points is less than or equal to K. i.e. d1 + d2 + d3 +...+ dN <= K
Is there any known geometric algorithm which solves this or similar problem with a better complexity? Kindly point me in the right direction or any hints are appreciated.
Thank you
You can do this efficiently with dynamic programming.
The key idea is that you don't need to enumerate all possible target strings, you just need to know how many ways targets are possible with K edits considering only the string indicies after I.
alphabet = 'abcd'
s = [ 'aabbbb', 'bacaaa', 'dabbbb', 'cabaaa']
# use memoized from http://wiki.python.org/moin/PythonDecoratorLibrary
#memoized
def count(edits_left, index):
if index == -1 and edits_left >= 0:
return 1
if edits_left < 0:
return 0
ret = 0
for char in alphabet:
edits_used = 0
for mutate_str in s:
if mutate_str[index] != char:
edits_used += 1
ret += count(edits_left - edits_used, index - 1)
return ret
Thinking out loud, it seems to me that this problem boils down to a combinatorial problem.
In general for a string S of length L, there are a total of C(L,K) (binomial coefficient) positions that can be substituted and therefore (ALPHABET_SIZE^K)*C(L,K) target strings T from a Hamming Distance of K.
Binomial Coefficient can be computed quite easily using Dynamic Programming and the Pascal Triangle... No need to get crazy into factoriel etc...
Now that one string case is treated, dealing with multiple strings is a little bit more tricky since you might double count targets. Intuitively though if S1 is K far from S2 then both string will generate the same set of target so you don't double count in this case. This last statement might be a long shot that's why I made sure to say "intuitively" :)
Hope it helps,

How to find the actual sequence of a Longest Increasing Subsequence?

This is not a homework problem. I am reviewing myself of the Longest Increasing Subsequence problem. I read every where online. I understand how to find the "length", but I don't understand how to back-trace the actual sequence. I am using the patience sorting algorithm to find the length. Can anyone explain how to find the actual sequence? I do not really understand the version in Wikipedia. Can someone explain in a different method or different way?
Thanks.
Lets define as max(j) as the longest increasing subsequence up to A[j]. There are two options: or we use A[j] in this subsequence, or we don't.
If we dont use it, then the value will be max(j-1). If we do use it, then the value will be
max(i)+1, when i is the biggest index such that i < j and A[i] < A[j]. (Here we assume that the max(i) sequence uses i- not neccessary true, but we can solve this issue by saving for each cell 2 values- the max(j) value, and max*(j), when max*(j) is the longest increasing subsequence up to A[j] that uses A[j]. max*(j) will be calculated each time as max*(i)+1).
To sum up, the recursive formula for calculating max(j) will be:
max{max(j-1),max*(i)+1},and max*(j)= max*(i)+1.
In each array cell you can save a pointer, that tells you if you chose to use the A[j] cell or not. In this way you can find all the sequence while moving backwards on the array.
Time Complexity: The complexity of the recursive formula and finding the sequence at the end is O(n). The problem here is finding for each A[j] the corresponding A[i] such that i is the biggest index such that i < j, A[i] < A[j].
Of course you can do it naivly in O(n^2) (from each cell go backwards until you find this i). If you want to do better then I'm pretty sure that you can do it in O(nlogn) in the following way:
*Sort your Array.
1) go for the smallest integer in the array, and notate is position in the array as k.
2)For A[k+1], we have of course A[k] < A[k+1]. If A[k+1]>A[k+2] then k will feet to the k+2 cell as well, and so on until we have A[k+m] < A[k+m+1], and then k+m is feet to k+m+1,
3)delete all the cells that you found thier corresponding cell in the previous stage
4) return to 1.
Hoped that it help. Please notice that I thought about it all alone, therefore there is a very small chance that there is some mistake here- please be convinced that I'm right and ask for more clarifications, if you need.
This Python code solves the Longest Increasing Sequence problem, and also returns one of such sequences. The trick is, at the same time that the dynamic programming table gets filled, another array is also filled, storing the index of the elements that were used to construct the optimal solution.
def an_lis(nums):
table, solution = lis_table(nums)
if not table:
return (0, [])
n, maxLen = max(enumerate(table), key=itemgetter(1))
lis = [nums[n]]
while solution[n] != -1:
lis.append(nums[solution[n]])
n = solution[n]
return lis[::-1]
def lis_table(nums):
n = len(nums)
table, solution = [0] * n, [-1] * n
for i in xrange(n):
maxLen, maxIdx = 0, -1
for j in xrange(i):
if nums[j] < nums[i] and table[j] > maxLen:
maxLen, maxIdx = table[j], j
table[i], solution[i] = 1 + maxLen, maxIdx
return (table, solution)

Resources