Book Shop Question (same logic but 2 different implementations) - dynamic-programming

You are in a book shop which sells n different books. You know the price and number of pages of each book.
You have decided that the total price of your purchases will be at most x. What is the maximum number of pages you can buy? You can buy each book at most once.
So I figured that it was an instance of the 0-1 knapsack problem.
In my first approach I created a dp array where dp[i][j] is the maximum number of pages using i money and the first j books.
int n, budget;
cin >> n >> budget;
vector<int> price(n), pages(n);
for (int &v : price)
    cin >> v;
for (int &v : pages)
    cin >> v;
vector<vector<int> > dp(budget + 1, vector<int>(n + 1, 0));
for (int i = 1; i < budget + 1; i++) {
    for (int j = 1; j < n + 1; j++) {
        if (i - price[j - 1] >= 0) {
            dp[i][j] = max(dp[i][j - 1], dp[i - price[j - 1]][j - 1] + pages[j - 1]);
        }
        else {
            dp[i][j] = dp[i][j - 1];
        }
    }
}
cout << dp[budget][n];
But the problem is that this solution exceeds the time limit.
The solution posted on the site had the same logic, but the rows and columns of the dp vector were flipped. That is, dp[i][j] is the maximum number of pages using j money and the first i books. That solution was much faster.
int n, x;
cin >> n >> x;
vector<int> price(n), pages(n);
for (int &v : price)
    cin >> v;
for (int &v : pages)
    cin >> v;
vector<vector<int> > dp(n + 1, vector<int>(x + 1, 0));
for (int i = 1; i <= n; i++)
{
    for (int j = 1; j <= x; j++)
    {
        if (j - price[i - 1] >= 0)
        {
            dp[i][j] = max(dp[i - 1][j], dp[i - 1][j - price[i - 1]] + pages[i - 1]);
        }
        else {
            dp[i][j] = dp[i - 1][j];
        }
    }
}
cout << dp[n][x] << endl;
I don't understand why there is a difference in time for the above implementations. I am new to competitive programming so please clarify my doubt.
Thanks in advance.
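A likely explanation, offered as an assumption rather than something measured here: a vector<vector<int>> stores each row in its own contiguous buffer. In the first version the inner loop reads dp[i - price[j-1]][j-1], so the row changes with every j and each read lands in a different buffer. In the second version the inner loop only reads row i-1, walking along one buffer. The sketch below isolates that difference with two traversals of the same table; the names Table, sumRowMajor and sumColumnMajor are made up for illustration, and the table is assumed to be rectangular.

```cpp
#include <cstddef>
#include <vector>

using Table = std::vector<std::vector<int>>;

// Inner loop walks along one row, so consecutive reads hit
// consecutive addresses in the same buffer.
long long sumRowMajor(const Table& t) {
    long long total = 0;
    for (std::size_t r = 0; r < t.size(); ++r)
        for (std::size_t c = 0; c < t[r].size(); ++c)
            total += t[r][c];
    return total;
}

// Inner loop switches to a different row on every step, so each
// read lands in a different row's buffer (assumes a rectangular table).
long long sumColumnMajor(const Table& t) {
    long long total = 0;
    if (t.empty()) return total;
    for (std::size_t c = 0; c < t[0].size(); ++c)
        for (std::size_t r = 0; r < t.size(); ++r)
            total += t[r][c];
    return total;
}
```

Both functions return the same sum; on a large table the second one tends to be noticeably slower, which is the same effect as choosing the budget as the row index in the dp table.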

Related

DP approach? Output should be the max sum. Interview question at a well-known company (still have no idea, any hints?)

You have a table and in each cell there is either a positive integer or the cell is blocked. You have a player starting from bottom left and want to get to the top right in such a way that you maximize the sum of integers on your way. You are only allowed to move up or right but not through blocked cells. Output should be the max sum.
In my code I am making the assumption that the answer will fit in a long long.
I am also assuming a square matrix for simplicity, but you can adapt the algorithm to any rectangular matrix with almost no effort.
If the input matrix is N x N, the complexity of this approach is O(N^2).
#include <vector>
#include <iostream>
#include <algorithm>
constexpr int maxDimension = 100;
using namespace std;
long long matrix[maxDimension][maxDimension];
long long dp[maxDimension][maxDimension];
int main()
{
// I am assuming that the matrix is filled with positive
// integers, and the blocked cell's are filled with -1.
// reading the values for the matrix
for(int i = 0; i < maxDimension; ++i)
{
for(int j = 0; j < maxDimension; ++j)
{
cin >> matrix[i][j];
}
}
/*
For every pair(i, j),
dp[i][j] is the maximum
sum we can achieve going from
(0,0) to (i, j)
*/
// Observation if dp[i][j] is equal to -1, it is because we cannot reach the cell (i, j) because of blocked cells
dp[0][0] = matrix[0][0];
// this calculates the dp for row == 0
for(int col = 1; col < maxDimension; ++col)
{
if(dp[0][col - 1] != -1 && matrix[0][col] != -1)
{
dp[0][col] = dp[0][col-1] + matrix[0][col];
}
else dp[0][col] = -1;
}
// now I will calculate the dp for column == 0
for(int row = 1; row < maxDimension; ++row)
{
if(dp[row - 1][0] != -1 && matrix[row][0] != -1)
{
dp[row][0] = dp[row-1][0] + matrix[row][0];
}
else dp[row][0] = -1;
}
// Now that I have calculated the base cases, I will calculate the dp for the other states
// I will use the following expression
/* dp[i][j] = if (matrix[i][j] == -1) -> -1
else if (dp[i-1][j] != -1 or dp[i][j-1] != -1) -> max(dp[i-1][j], dp[i][j - 1]) + matrix[i][j]
else -> -1
*/
for(int row = 1; row < maxDimension; ++row)
{
for(int col = 1; col < maxDimension; ++col)
{
if(matrix[row][col] != -1 && ( dp[row-1][col] != -1 || dp[row][col-1] != -1) )
{
dp[row][col] = max(dp[row-1][col], dp[row][col-1]) + matrix[row][col];
}
else dp[row][col] = -1;
}
}
if(dp[maxDimension-1][maxDimension-1] == -1) cout << "The top right cell is not reachable from the bottom left cell" << endl;
else cout << "The best sum possible is " << dp[maxDimension - 1][maxDimension - 1] << endl;
return 0;
}
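The adaptation to a rectangular matrix mentioned above could look roughly like the following. This is only a sketch under the same assumptions as the code above (blocked cells are -1, moves only increase the row or the column, the start is cell (0,0)); the function name maxPathSumSketch is made up for illustration, and it returns -1 when the last cell is not reachable.

```cpp
#include <algorithm>
#include <vector>

// Best path sum from (0,0) to the last cell, or -1 if unreachable.
// Blocked cells are marked with -1 in the input grid.
long long maxPathSumSketch(const std::vector<std::vector<long long>>& grid) {
    if (grid.empty() || grid[0].empty()) return -1;
    const int rows = static_cast<int>(grid.size());
    const int cols = static_cast<int>(grid[0].size());
    // best[r][c] stays -1 while cell (r,c) is blocked or unreachable.
    std::vector<std::vector<long long>> best(rows, std::vector<long long>(cols, -1));
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (grid[r][c] == -1) continue;            // blocked cell
            if (r == 0 && c == 0) {                    // starting cell
                best[r][c] = grid[r][c];
                continue;
            }
            long long from = -1;                       // best reachable predecessor
            if (r > 0) from = std::max(from, best[r - 1][c]);
            if (c > 0) from = std::max(from, best[r][c - 1]);
            if (from != -1) best[r][c] = from + grid[r][c];
        }
    }
    return best[rows - 1][cols - 1];
}
```

Handling the first row and first column inside the main loop avoids the two separate base-case loops, at the cost of two extra comparisons per cell.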

Suffix Array Implementation Bugs

I've coded a Suffix Array implementation and discovered an issue in it. Concretely, I printed the first seven suffix array ranks RA[0..6] of this string (length = 10^5) and got the following output:
80994
84360
87854
91517
95320
99277
83068
But the correct one had to be (everything shifted by 23):
81017
84383
87877
91540
95343
99300
83091
I know two ways to fix it, but I don't know why they work.
The first way was adding S[N++] = '$'; to the top of the buildSA() function (then the output was 1 less than the correct one, but it doesn't matter)
I also found another solution by decreasing the MAX_N constant to 1e5 + 10!
This is so much magic for me and I really need to know why this bug happened because I don't want to have this bug again.
#include <cstdio>
#include <cstring>
#include <algorithm>
using std::max;
const int MAX_N = 2e5 + 10;
int SA[MAX_N]; // The ith element is the index of the suffix
int RA[MAX_N]; // The rank of the suffix at i
int tmp[MAX_N]; // A temporary array
int B[MAX_N]; // An array for the buckets
int N;
char S[MAX_N];
void bucketSort(int k){
int i, m = max(256, N);
for(i = 0; i < m; i++)
B[i] = 0;
for(i = 0; i < N; i++)
B[i + k < N ? RA[i + k] : 0] ++;
for(i = 1; i < m; i++)
B[i] += B[i - 1];
for(i = N - 1; i >= 0; i--)
tmp[--B[SA[i] + k < N ? RA[SA[i] + k] : 0]] = SA[i];
for(i = 0; i < N; i++)
SA[i] = tmp[i];
}
void buildSA(){
for(int i = 0; i < N; i++){
SA[i] = i;
RA[i] = S[i];
}
for(int k = 1; k < N; k <<= 1){
bucketSort(k);
bucketSort(0);
int norder = 0;
tmp[SA[0]] = 0;
for(int i = 1; i < N; i++){
if(RA[SA[i]] == RA[SA[i - 1]] && RA[SA[i] + k] == RA[SA[i - 1] + k])
{} else norder++;
tmp[SA[i]] = norder;
}
for(int i = 0; i < N; i++)
RA[i] = tmp[i];
if(norder == N)
break;
}
}
void printSA(){
for(int i = 0; i < N; i++){
printf("%d: %s\n", SA[i], S + SA[i]);
}
}
int main(){
scanf("%s", S);
N = strlen(S);
buildSA();
for(int i = 0; i < 7; i++){
printf("%d\n",RA[i]);
}
return 0;
}
In the following line:
if(RA[SA[i]] == RA[SA[i - 1]] && RA[SA[i] + k] == RA[SA[i - 1] + k])
SA[i] + k can be >= N (the same goes for SA[i - 1] + k).
It should be (SA[i] + k) % N instead.
I think I got it after many wasted hours. Sometimes the smallest mistakes can result in wrong answers.
The "bad" code line is:
if(RA[SA[i]] == RA[SA[i - 1]] && RA[SA[i] + k] == RA[SA[i - 1] + k])
{} else norder++;
I verified this by using a very simple testcase (I couldn't generate randomly...) like:
abab
The resulting suffix array was
0: abab
2: ab
3: b
1: bab
which is clearly wrong.
At step k = 2, if we are comparing two suffixes like ab and abab then, we realize that they have the same rank, since their first k = 2 characters match. ab is suffix #2, by adding k = 2, we are out of range.
I've often coded it like this because I've always appended an auxiliary character (e.g. '$') to the end. If I don't put such a character (like in my case), SA[i] + k could actually be >= N and this code crashes.
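One way to avoid the out-of-range read without appending '$' is to route every rank lookup through a helper that returns a sentinel when the position is past the end. The sketch below is not the code from the question: it uses std::sort instead of the bucket sort (so it runs in O(n log^2 n)), and the names rankAt and buildSuffixArraySketch are made up for illustration.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Rank of the suffix starting at pos, or -1 when pos is past the end.
// The -1 plays the role of the '$' sentinel: it sorts before every real rank.
static int rankAt(const std::vector<int>& rank, int pos) {
    return pos < static_cast<int>(rank.size()) ? rank[pos] : -1;
}

std::vector<int> buildSuffixArraySketch(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n), rank(n), next(n);
    std::iota(sa.begin(), sa.end(), 0);
    for (int i = 0; i < n; ++i) rank[i] = static_cast<unsigned char>(s[i]);
    for (int k = 1; k < n; k <<= 1) {
        // Sort key: (rank of first k characters, rank of the following k characters).
        auto key = [&](int i) { return std::make_pair(rank[i], rankAt(rank, i + k)); };
        std::sort(sa.begin(), sa.end(), [&](int a, int b) { return key(a) < key(b); });
        // Re-rank: equal keys share a rank, a larger key gets the next rank.
        next[sa[0]] = 0;
        for (int i = 1; i < n; ++i)
            next[sa[i]] = next[sa[i - 1]] + (key(sa[i - 1]) < key(sa[i]) ? 1 : 0);
        rank = next;
        if (rank[sa[n - 1]] == n - 1) break;  // all ranks distinct, order is final
    }
    return sa;
}
```

For the test case abab this yields the order 2, 0, 3, 1 (ab, abab, b, bab): at k = 2 the suffix ab gets the key (rank, -1) and therefore sorts before abab instead of tying with it.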

Longest Common Prefix property

I was going through suffix array and its use to compute longest common prefix of two suffixes.
The source says:
"The lcp between two suffixes is the minimum of the lcp's of all pairs of adjacent suffixes between them on the array"
i.e. lcp(x,y)=min{ lcp(x,x+1),lcp(x+1,x+2),.....,lcp(y-1,y) }
where x and y are the two indices in the string at which the two suffixes start.
I am not convinced with the statement as in example of string "abca".
lcp(1,4)=1 (considering 1 based indexing)
but if I apply the above equation then
lcp(1,4)=min{lcp(1,2),lcp(2,3),lcp(3,4)}
and I think lcp(1,2)=0.
so the answer must be 0 according to the equation.
Am I getting it wrong somewhere?
I think the index referred to by the source is not an index into the string itself, but an index into the sorted suffixes.
a
abca
bca
ca
Hence
lcp(1,2) = lcp(a, abca) = 1
lcp(1,4) = min(lcp(1,2), lcp(2,3), lcp(3,4)) = 0
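To check this on small strings, here is a minimal sketch that sorts the suffixes naively, computes the LCP of adjacent sorted suffixes, and answers a query by taking the minimum over the range with a plain linear scan. The struct name LcpSketch and its members are made up for illustration, ranks are 0-based here, and the naive sort is only meant for short inputs.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

struct LcpSketch {
    std::string text;
    std::vector<int> suffixArray;  // suffixArray[r] = start of the r-th smallest suffix
    std::vector<int> height;       // height[r] = lcp of sorted suffixes r-1 and r

    explicit LcpSketch(const std::string& s) : text(s) {
        const int n = static_cast<int>(s.size());
        suffixArray.resize(n);
        std::iota(suffixArray.begin(), suffixArray.end(), 0);
        // Naive comparison sort of the suffixes, to keep the sketch short.
        std::sort(suffixArray.begin(), suffixArray.end(), [&](int a, int b) {
            return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
        });
        std::vector<int> rank(n);
        for (int r = 0; r < n; ++r) rank[suffixArray[r]] = r;
        height.assign(n, 0);
        // Compare each suffix with its predecessor in sorted order,
        // reusing the previous match length minus one.
        for (int i = 0, h = 0; i < n; ++i) {
            if (rank[i] == 0) { h = 0; continue; }
            const int j = suffixArray[rank[i] - 1];
            while (i + h < n && j + h < n && s[i + h] == s[j + h]) ++h;
            height[rank[i]] = h;
            if (h > 0) --h;
        }
    }

    // LCP of the sorted suffixes at 0-based ranks a and b.
    int lcpByRank(int a, int b) const {
        if (a > b) std::swap(a, b);
        if (a == b) return static_cast<int>(text.size()) - suffixArray[a];
        return *std::min_element(height.begin() + a + 1, height.begin() + b + 1);
    }
};
```

For "abca" the sorted order is a, abca, bca, ca, so LcpSketch("abca").lcpByRank(0, 1) is 1 and lcpByRank(0, 3) is 0, matching the values above once the 1-based indices are shifted by one.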
You can't find the LCP of two arbitrary suffixes by taking that minimum over positions in the string; the indices have to be positions in the suffix array.
We can calculate the LCP of any two suffixes (i, j) with the help of the following:
LCP(suffix i, suffix j) = LCP[RMQ(i + 1, j)]
Note that this requires i < j: the LCP itself is symmetric, but the range (i + 1, j) only makes sense in that order.
RMQ is Range Minimum Query.
Page 3 of this paper.
Details:
Step 1:
First calculate the LCP of adjacent (consecutive) suffix pairs.
n = length of the string.
suffixArray[] is the suffix array.
void calculateadjacentsuffixes(int n)
{
for (int i=0; i<n; ++i) Rank[suffixArray[i]] = i;
Height[0] = 0;
for (int i=0, h=0; i<n; ++i)
{
if (Rank[i] > 0)
{
int j = suffixArray[Rank[i]-1];
while (i + h < n && j + h < n && str[i+h] == str[j+h])
{
h++;
}
Height[Rank[i]] = h;
if (h > 0) h--;
}
}
}
Note: Height[i] = LCP of (suffix i-1, suffix i), i.e. the Height array contains the LCP of adjacent suffixes.
Step 2:
Calculate the LCP of any two suffixes i, j using the RMQ concept.
RMQ pre-compute function:
void preprocesses(int N)
{
int i, j;
//initialize M for the intervals with length 1
for (i = 0; i < N; i++)
M[i][0] = i;
//compute values from smaller to bigger intervals
for (j = 1; 1 << j <= N; j++)
{
for (i = 0; i + (1 << j) - 1 < N; i++)
{
if (Height[M[i][j - 1]] < Height[M[i + (1 << (j - 1))][j - 1]])
{
M[i][j] = M[i][j - 1];
}
else
{
M[i][j] = M[i + (1 << (j - 1))][j - 1];
}
}
}
}
Step 3: Calculate LCP between any two Suffixes i,j
int LCP(int i,int j)
{
/*Make sure we send i<j always */
/* By doing this ,it resolve following
suppose ,we send LCP(5,4) then it converts it to LCP(4,5)
*/
if(i>j)
swap(i,j);
/* confirmation over */
if(i==j)
{
return (Length_of_str-suffixArray[i]);
}
else
{
return Height[RMQ(i+1,j)];
//LCP(suffix i,suffix j)=LCPadj[RMQ(i + 1; j)]
//LCPadj=LCP of adjacent suffix =Height.
}
}
Where RMQ function is:
int RMQ(int i,int j)
{
int k=log((double)(j-i+1))/log((double)2);
int vv= j-(1<<k)+1 ;
if(Height[M[i][k]]<=Height[ M[vv][ k] ])
return M[i][k];
else
return M[ vv ][ k];
}
Refer to the Topcoder tutorials for RMQ.
You can check the complete implementation in C++ at my blog.

Recurrence equation for dynamic programming

I have a situation that is really similar to the knapsack problem but I just want to confirm that my recurrence equation is the same as the knapsack problem.
We have a maximum of M dollars to invest. We have N different investments, each of which has a cost m(i) and a profit g(i). We want to find the recurrence equation for maximizing the profit.
Here is my answer:
g(i,j) = max{g(i-1,j), g_i + g(i-1,j-m_i)} if j-m_i >= 0
g(i,j) = g(i-1,j) if j-m_i < 0
I hope my explanation is clear.
Thank you and have a nice day!
Bobby
Your recurrence equation is correct. The problem is the same as the traditional knapsack problem. You can also optimize the space complexity. Here is the C++ code.
int dp[M + 10]; // M must be a compile-time constant
int DP() {
    memset(dp, 0, sizeof(dp));
    for (int i = 0; i < N; ++i)
        // iterate j downward so each investment is used at most once
        for (int j = M; j >= m[i]; --j)
            dp[j] = max(dp[j], dp[j - m[i]] + g[i]);
    int ret = 0;
    for (int i = 0; i <= M; ++i) ret = max(ret, dp[i]);
    return ret;
}
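The downward loop over j in the code above is what keeps this a 0-1 knapsack. The following sketch makes the direction a parameter so the two behaviours can be compared; the function name bestProfitSketch and its parameters are made up for illustration, and costs are assumed to be positive.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One-dimensional knapsack table. With descending == true each investment
// is used at most once; with descending == false dp[j - cost[i]] may already
// include investment i, so the same investment can be taken repeatedly.
int bestProfitSketch(const std::vector<int>& cost, const std::vector<int>& profit,
                     int budget, bool descending) {
    std::vector<int> dp(budget + 1, 0);
    for (std::size_t i = 0; i < cost.size(); ++i) {
        if (descending) {
            for (int j = budget; j >= cost[i]; --j)
                dp[j] = std::max(dp[j], dp[j - cost[i]] + profit[i]);
        } else {
            for (int j = cost[i]; j <= budget; ++j)
                dp[j] = std::max(dp[j], dp[j - cost[i]] + profit[i]);
        }
    }
    return dp[budget];
}
```

With a single investment of cost 2 and profit 3 and a budget of 4, the descending loop returns 3, while the ascending loop returns 6 because it counts the investment twice.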

Finding similar/related texts algorithms

I searched a lot on Stack Overflow and Google but didn't find a good answer for this.
I'm going to develop a news reader system that crawls and collects news from the web (with a crawler), and then I want to find similar or related news across websites (in order to avoid showing duplicated news on the website).
I think the best live example of this is Google News: it collects news from the web, then categorizes it and finds related news and articles. This is what I want to do.
What's the best algorithm for doing this?
A relatively simple solution is to compute a tf-idf vector (en.wikipedia.org/wiki/Tf*idf) for each document, then use the cosine distance (en.wikipedia.org/wiki/Cosine_similarity) between these vectors as an estimate for semantic distance between articles.
This will probably capture semantic relationships better than Levenshtein distance and is much faster to compute.
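A minimal sketch of that idea, under loudly stated assumptions: tokens are just whitespace-separated words (no stemming, lowercasing or stop-word removal), and the idf uses a smoothed variant, log((1 + N) / (1 + df)) + 1, chosen here so that no weight becomes zero. The names TermVector, tfidfVectors and cosineSimilarity are made up for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using TermVector = std::map<std::string, double>;

// Builds one tf-idf vector per document from whitespace-separated words.
std::vector<TermVector> tfidfVectors(const std::vector<std::string>& documents) {
    std::vector<std::map<std::string, int>> termCounts;
    std::map<std::string, int> documentFrequency;  // in how many documents a word occurs
    for (const std::string& document : documents) {
        std::map<std::string, int> counts;
        std::istringstream words(document);
        std::string word;
        while (words >> word) ++counts[word];
        for (const auto& entry : counts) ++documentFrequency[entry.first];
        termCounts.push_back(counts);
    }
    const double total = static_cast<double>(documents.size());
    std::vector<TermVector> vectors;
    for (const auto& counts : termCounts) {
        TermVector weights;
        for (const auto& entry : counts) {
            // Smoothed idf (an assumption of this sketch): always positive.
            const double idf =
                std::log((1.0 + total) / (1.0 + documentFrequency[entry.first])) + 1.0;
            weights[entry.first] = entry.second * idf;
        }
        vectors.push_back(weights);
    }
    return vectors;
}

// Cosine of the angle between two sparse vectors; 0 if either is empty.
double cosineSimilarity(const TermVector& a, const TermVector& b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (const auto& entry : a) {
        normA += entry.second * entry.second;
        const auto match = b.find(entry.first);
        if (match != b.end()) dot += entry.second * match->second;
    }
    for (const auto& entry : b) normB += entry.second * entry.second;
    if (normA == 0.0 || normB == 0.0) return 0.0;
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}
```

Two articles with the same words score close to 1 and articles with no words in common score 0, so a duplicate filter could flag pairs above some threshold; a suitable threshold would have to be tuned on real data.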
This is one: http://en.wikipedia.org/wiki/Levenshtein_distance
public static SqlInt32 ComputeLevenstheinDistance(SqlString firstString, SqlString secondString)
{
int n = firstString.Value.Length;
int m = secondString.Value.Length;
int[,] d = new int[n + 1,m + 1];
// Step 1
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
// Step 2
for (int i = 0; i <= n; d[i, 0] = i++)
{
}
for (int j = 0; j <= m; d[0, j] = j++)
{
}
// Step 3
for (int i = 1; i <= n; i++)
{
//Step 4
for (int j = 1; j <= m; j++)
{
// Step 5
int cost = (secondString.Value[j - 1] == firstString.Value[i - 1]) ? 0 : 1;
// Step 6
d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
}
}
// Step 7
return d[n, m];
}
This is handy for the task at hand: http://code.google.com/p/boilerpipe/
Also, if you need to reduce the number of words to analyze, try this: http://ots.codeplex.com/
I have found the OTS VERY useful in sentiment analysis, whereby I can reduce the number of sentences into a small list of common phrases and/or words and calculate the overall sentiment based on this. The same should work for similarity.
