Program Correctness, Invariants and Predicate Logic for Selection Sort

I'm trying to prove the correctness of selection sort using only mathematical predicate logic. I'm finding it difficult to write the English statements given below as predicates and to carry the proof of correctness through using inference rules:
void sort(int[] a, int n) {
    for (int i = 0; i < n - 1; i++) {
        int best = i;
        for (int j = i; j < n; j++) {
            if (a[j] < a[best]) {
                best = j;
            }
        }
        // swap a[i] and a[best]
        int tmp = a[i];
        a[i] = a[best];
        a[best] = tmp;
    }
}
The statements I have to write as predicates are:
a[0...i-1] is sorted
all entries in a[i..n-1] are larger than or equal to the entries in a[0..i-1].

A statement about a subarray like a[0...i-1] is really a statement about all elements in that subarray, so you will need to use universal quantifiers to translate it into a statement about individual members.
To say that a subarray is sorted, we can say something like: "for any pair of indices j < k in the subarray, the values at those indices are in order."
For all 0 <= j < i-1 and all j < k <= i-1, a[j] <= a[k].
The second property is already written as a statement about "all entries" of two subarrays, but let's make it more explicit: "for any pair of indices j in the first subarray and k in the second subarray, the value in the first subarray is less than or equal to the value in the second subarray."
For all 0 <= j <= i-1 and all i <= k <= n-1, a[j] <= a[k].
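As a sanity check on the translation (separate from the formal proof itself), here is a small C++ sketch that asserts both quantified statements as invariants at the top of each outer iteration of the sort. The helper names invariant_sorted and invariant_partitioned are mine, not from the question:

#include <cassert>
#include <utility>
#include <vector>
using std::vector;

// For all 0 <= j < i-1 and all j < k <= i-1: a[j] <= a[k]
bool invariant_sorted(const vector<int>& a, int i) {
    for (int j = 0; j < i - 1; j++)
        for (int k = j + 1; k <= i - 1; k++)
            if (!(a[j] <= a[k])) return false;
    return true;
}

// For all 0 <= j <= i-1 and all i <= k <= n-1: a[j] <= a[k]
bool invariant_partitioned(const vector<int>& a, int i, int n) {
    for (int j = 0; j <= i - 1; j++)
        for (int k = i; k <= n - 1; k++)
            if (!(a[j] <= a[k])) return false;
    return true;
}

void sort_checked(vector<int>& a, int n) {
    for (int i = 0; i < n - 1; i++) {
        assert(invariant_sorted(a, i));         // a[0..i-1] is sorted
        assert(invariant_partitioned(a, i, n)); // a[0..i-1] <= a[i..n-1]
        int best = i;
        for (int j = i; j < n; j++)
            if (a[j] < a[best]) best = j;
        std::swap(a[i], a[best]);
    }
}

If either assertion ever fired, the corresponding predicate would not actually be an invariant of the outer loop.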

Related

I'm not able to understand the logic of the coin changing problem in O(sum) space complexity

I'm facing difficulty understanding the O(sum) space complexity solution to the coin changing problem.
The problem statement is:
You are given a set of coins A. In how many ways can you make the sum B, assuming you have an infinite supply of each coin in the set?
NOTE:
Coins in set A will be unique. The expected space complexity of this problem is O(B).
The solution is:
int count(int S[], int m, int n)
{
    int table[n+1];
    memset(table, 0, sizeof(table));
    table[0] = 1;
    for (int i = 0; i < m; i++)
        for (int j = S[i]; j <= n; j++)
            table[j] += table[j-S[i]];
    return table[n];
}
Can someone explain this code to me?
First, let's identify the parameters and variables used in the function:
Parameters:
S contains the denominations of the m coins, i.e. each element contains the value of one coin.
m represents the number of coin denominations. Essentially, it's the length of array S.
n represents the sum B to be achieved.
Variables:
table: Element i in array table contains the number of ways sum i can be achieved with the given coins. table[0] = 1 because there is a single way to achieve a sum of 0 (not using any coin).
i loops through each coin.
Logic:
The table is filled coin by coin: after the loop for coin i (value S[i]) has run, table[j] holds the number of ways to achieve sum j using only coins 0..i. For each such coin and for each sum j, the update is:
number of ways to achieve j using coins 0..i
= number of ways to achieve j using coins 0..i-1 (coin i not used)
+ number of ways to achieve j - S[i] using coins 0..i (coin i used at least once)
which is exactly what table[j] += table[j-S[i]] computes, since the inner loop walks j upward from S[i].
I did not completely decipher nor validate the rest of the code, but I hope this is a step in the right direction.
Added comments to code:
#include <stdio.h>
#include <string.h>

int count(int S[], int m, int n)
{
    int table[n+1];
    memset(table, 0, sizeof(table));
    table[0] = 1;
    for (int i = 0; i < m; i++)          // Loop through all of the coins
        for (int j = S[i]; j <= n; j++)  // For every sum j from the coin's value S[i] up to n
            table[j] += table[j-S[i]];   // Add the ways to achieve sum j - S[i] to the ways to achieve sum j
    return table[n];
}

int main() {
    int S[] = {1, 2};
    int m = 2;
    int n = 3;
    int c = count(S, m, n);
    printf("%d\n", c);
}
Notes:
The code avoids counting reorderings as separate ways: 3 = 1+1+1 or 1+2 (2 ways, instead of 3 if 2+1 were counted separately).
The result does not depend on the order of the coins in S.
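To make the update concrete, here is how table evolves for the example in main (S = {1, 2}, n = 3): it starts as [1, 0, 0, 0]; after the coin of value 1 it is [1, 1, 1, 1] (one way to form every sum using only 1s); after the coin of value 2 it is [1, 1, 2, 2], so count returns table[3] = 2, matching the two ways 1+1+1 and 1+2.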

Is it possible to parallelize or unroll this loop?

I am trying to see if I can improve the performance of the following C++ loop, which uses two-dimensional vectors (_external and _Table) and has a loop-carried dependency on the previous iteration. Additionally, it has a computed index in the innermost loop that makes the right-hand-side access of _Table non-sequential.
int N = 8000;
int M = 400;
int P = 100;
for (int i = 1; i <= N; i++) {
    for (int j = 0; j < M; j++) {
        for (int k = 0; k < P; k++) {
            int index = _external.at(j).at(k);
            _Table.at(j).at(i) += _Table.at(index).at(i-1);
        }
    }
}
What can I do to improve the performance of a loop like this?
Well it looks to me like the order in which these statements:
int index = _external.at(j).at(k);
_Table.at(j).at(i) += _Table.at(index).at(i-1);
are executed is critical to correctness. (In particular, column i of _Table depends on column i-1, so the iterations of the outer i loop cannot be reordered or run in parallel.)
So I think you are only left with micro-optimizations, like hoisting the expressions _Table.at(j).at(i) and _external.at(j) out of the innermost loop.
Consider this:
for (int k = 0; k < P; k++) {
    int index = _external.at(j).at(k);
    _Table.at(j).at(i) += _Table.at(index).at(i-1);
}
This loop is repeatedly adding numbers to _Table.at(j).at(i). Since (by inspection) _Table.at(index).at(i-1) must be reading from a different cell of the table (because of i-1 versus i), you could do this:
int temp = 0;
for (int k = 0; k < P; k++) {
    int index = _external.at(j).at(k);
    temp += _Table.at(index).at(i-1);
}
_Table.at(j).at(i) += temp;
This will reduce the number of calls to at, and may also improve cache performance a bit.
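Putting those micro-optimizations together, the loop nest might look like the sketch below. It assumes _Table and _external are std::vector<std::vector<int>> that are already sized so every index used is valid, which is also what would justify swapping the bounds-checked at() calls for operator[]:

// Assumes _Table and _external are std::vector<std::vector<int>>,
// sized so that every index used below is in range.
for (int i = 1; i <= N; i++) {
    for (int j = 0; j < M; j++) {
        const std::vector<int>& ext_row = _external[j]; // hoisted out of the k loop
        int temp = 0;
        for (int k = 0; k < P; k++) {
            temp += _Table[ext_row[k]][i - 1]; // reads only column i-1
        }
        _Table[j][i] += temp; // single write per (j, i) cell
    }
}

This does not change which additions are performed for each i; it only reduces the index arithmetic and bounds checking done per inner iteration.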

Total substrings with k ones

Given a binary string s, we need to find the number of its substrings, containing exactly k characters that are '1'.
For example: s = "1010" and k = 1, answer = 6.
Now, I solved it using a binary search over the cumulative sum array.
I also used another approach to solve it. The approach is as follows:
For each position i, find the total substrings that end at i containing
exactly k characters that are '1'.
The substrings that end at i and contain exactly k '1's correspond to the set of start indices j such that the substring from j to i contains exactly k '1's; the count for position i is the size of that set. To find all such j for a given i, we can rephrase the condition:
number of ones in positions 1..(j-1)
= (number of ones in positions 1..i) - (number of ones in positions j..i)
= C[i] - k,
which is equivalent to
C[j - 1] = C[i] - k,
where C is the cumulative sum array: C[i] = number of '1's among the first i characters of the string.
Now the problem is easy: for each position i, we count the prefixes whose sum equals C[i] - k.
But I found this solution,
#include <iostream>
#include <string>
using namespace std;

// Globals used by the snippet; these declarations and sizes are assumed
int k, s;
long long a;
long long C[1000005];
string S;

int main() {
    cin >> k >> S;
    C[0] = 1;
    for (int i = 0; S[i]; ++i) {
        s += S[i] == '1';
        ++C[s];
    }
    for (int i = k; i <= s; ++i) {
        if (k == 0) {
            a += (C[i] - 1) * C[i] / 2;
        } else {
            a += C[i] * C[i - k];
        }
    }
    cout << a << endl;
    return 0;
}
In the code, S is the given string, k is as described above, C is the cumulative sum array, and a is the answer.
What exactly the code is doing with that multiplication, I don't know.
Could anybody explain the algorithm?
If you look at the way C[i] is calculated, C[i] represents the number of characters starting at the ith '1' and ending just before the (i+1)st '1' (or at the end of the string).
If you take an example S = 1001000
C[0] = 1
C[1] = 3 // length of 100
C[2] = 4 // length of 1000
So, coming to your doubt: why the multiplication?
Say K = 1; then you want to find the substrings which contain exactly one '1'. You know that after the first '1' there are two zeros, since C[1] = 3, so the number of substrings starting at that '1' is 3, because you have to include the '1' itself:
{1, 10, 100}
But when you come to the second part, C[2] = 4: looking at 1000, you can make 4 substrings (equal to C[2]):
{1, 10, 100, 1000}
You should also notice that there are C[1] - 1 zeros before this '1'. By including those zeros you can make more substrings, in this case by prepending 0 once:
0{1, 10, 100, 1000}
=> {01, 010, 0100, 01000}
and 00 once:
00{1, 10, 100, 1000}
=> {001, 0010, 00100, 001000}
So essentially you get C[i] substrings that start at that '1', and you can also prepend anywhere from 1 to C[i-k] - 1 zeros in front of it, giving another (C[i-k] - 1) * C[i] substrings (the -1 is because the block of characters counted by C[i-k] starts with the previous '1', which must be left out). In total:
(C[i-k] - 1) * C[i] + C[i]
=> C[i-k] * C[i]
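As a quick check with the example S = 1001000 and k = 1: the formula gives C[1]*C[0] + C[2]*C[1] = 3*1 + 4*3 = 15, which matches counting by hand (the first '1' has 1 possible start and 3 possible ends, the second '1' has 3 possible starts and 4 possible ends, so 3 + 12 = 15 substrings with exactly one '1'). For k = 0 the code instead adds C[i]*(C[i]-1)/2 for each i, because an all-zero substring corresponds to choosing two distinct prefixes with the same number of ones.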

Asymmetric Levenshtein distance

Given two bit strings, x and y, with x longer than y, I'd like to compute a kind of asymmetric variant of the Levenshtein distance between them. Starting with x, I'd like to know the minimum number of deletions and substitutions it takes to turn x into y.
Can I just use the usual Levenshtein distance for this, or do I need to modify the algorithm somehow? In other words, with the usual set of edits of deletion, substitution, and addition, is it ever beneficial to delete more than the difference in lengths between the two strings and then add some bits back? I suspect the answer is no, but I'm not sure. If I'm wrong, and I do need to modify the definition of Levenshtein distance to disallow additions, how do I do so?
Finally, I would expect intuitively that I'd get the same distance if I started with y (the shorter string) and only allowed additions and substitutions. Is this right? I've got a sense for what these answers are, I just can't prove them.
If I understand you correctly, I think the answer is yes: the Levenshtein edit distance can differ from the distance you get when only deletions and substitutions of the larger string are allowed. Because of this, you would need to modify the algorithm (or create a different one) to get your limited version.
Consider the two strings "ABCD" and "ACDEF". The Levenshtein distance is 3 (ABCD -> ACD -> ACDE -> ACDEF). If we start with the longer string and limit ourselves to deletions and substitutions, we must use 4 edits (1 deletion and 3 substitutions). The reason is that an edit sequence that deletes from the shorter string on the way to the longer one cannot be mirrored when starting from the longer string, because the complementary insertion operation is disallowed.
Your last paragraph is true. If the path from shorter to longer uses only insertions and substitutions, then any allowed path can simply be reversed from the longer to the shorter. Substitutions are the same regardless of direction, but the inserts when going from small to large become deletions when reversed.
I haven't tested this thoroughly, but this modification shows the direction I would take, and it appears to work with the values I've tested it on. It's written in C# and follows the pseudocode in the Wikipedia entry for Levenshtein distance. There are obvious optimizations that could be made, but I refrained from making them so that the changes from the standard algorithm stay obvious. An important observation is that (under your constraints) if the strings are the same length, then substitution is the only operation allowed.
static int LevenshteinDistance(string s, string t) {
    int i, j;
    int m = s.Length;
    int n = t.Length;

    // for all i and j, d[i,j] will hold the Levenshtein distance between
    // the first i characters of s and the first j characters of t;
    // note that d has (m+1)*(n+1) values
    var d = new int[m + 1, n + 1];

    // set each element to zero
    // c# creates array already initialized to zero

    // source prefixes can be transformed into empty string by
    // dropping all characters
    for (i = 0; i <= m; i++) d[i, 0] = i;

    // target prefixes can be reached from empty source prefix
    // by inserting every character
    for (j = 0; j <= n; j++) d[0, j] = j;

    for (j = 1; j <= n; j++) {
        for (i = 1; i <= m; i++) {
            if (s[i - 1] == t[j - 1])
                d[i, j] = d[i - 1, j - 1]; // no operation required
            else {
                int del = d[i - 1, j] + 1;     // a deletion
                int ins = d[i, j - 1] + 1;     // an insertion
                int sub = d[i - 1, j - 1] + 1; // a substitution

                // the next two lines are the modification I've made
                //int insDel = (i < j) ? ins : del;
                //d[i, j] = (i == j) ? sub : Math.Min(insDel, sub);

                // the following 8 lines are a clearer version of the above 2 lines
                if (i == j) {
                    d[i, j] = sub;
                } else {
                    int insDel;
                    if (i < j) insDel = ins; else insDel = del;
                    // assign the smaller of insDel or sub
                    d[i, j] = Math.Min(insDel, sub);
                }
            }
        }
    }
    return d[m, n];
}
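For the example above, this modified function should return 4 for LevenshteinDistance("ACDEF", "ABCD"), i.e. the one deletion plus three substitutions, whereas the unmodified algorithm gives 3 for the same pair.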

A Dynamic Programming problem in USACO

In section 2.2, a problem called "Subset Sums" requires you to calculate in how many ways the set of integers from 1 to n can be partitioned into two sets whose sums are identical.
I know the recurrence is:
f[i][j]: the number of ways to reach sum j using the numbers 1...i
f[i][j] = f[i-1][j] + f[i-1][j-i]
If the initial condition is:
f[1][1] = 1; // others are all zero, main loop starts from 2
OR:
f[0][0] = 1; // others are all zero, main loop starts from 1
the answer is f[n][n*(n+1)/4] in both cases. Does this mean the initial condition doesn't affect the answer?
But if I use a one-dimensional array, say f[N]:
with f[0] = 1 and the loop starting from 1 (so f[0] is f[0][0] in fact), the answer is f[n]/2,
or with f[1] = 1 and the loop starting from 2 (f[1] is f[1][1]), the answer is f[n].
I am so confused...
I don't know if you are still stuck on this, but here's a solution for anyone else who stumbles onto this problem.
Let ways[i] be the number of ways you can get a sum of i using a subset of the numbers 1...N.
Then it becomes a variant of the 0-1 knapsack algorithm:
base case: ways[0] = 1
for (int i = 1; i <= N; i++) {
    for (int j = sum - i; j >= 0; --j) {  // sum is N*(N+1)/2
        ways[j + i] += ways[j];
    }
}
Your answer is located at ways[sum/2]/2.
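For anyone who wants to see the whole thing run, here is a minimal self-contained C++ sketch of that recurrence. The early exit for an odd total sum is my addition (no equal split exists in that case), and N = 7 is just a small example input, for which it prints 4:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    int N = 7;                     // partition the set {1, ..., N}
    int sum = N * (N + 1) / 2;     // total sum of 1..N
    if (sum % 2 != 0) {            // an odd total cannot be split into two equal halves
        cout << 0 << endl;
        return 0;
    }
    vector<long long> ways(sum + 1, 0);
    ways[0] = 1;                   // base case: one way to reach sum 0 (the empty subset)
    for (int i = 1; i <= N; i++) {
        for (int j = sum - i; j >= 0; --j) {
            ways[j + i] += ways[j];  // each number i is used at most once (0-1 knapsack)
        }
    }
    // every partition is counted twice: once as a subset, once as its complement
    cout << ways[sum / 2] / 2 << endl;  // prints 4 for N = 7
}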
