Dynamic Programming: 0/1 Knapsack - Retrieving Combinations as array - dynamic-programming

I've been studying Dynamic Programming from both a bottom-up iterative approach and a top-down recursive approach using memoization.
I've been tasked with solving the 0/1 Knapsack Problem and have successfully used the bottom-up approach but am unable to use the top-down approach.
Using information from a webpage(http://www.csl.mtu.edu/cs4321/www/Lectures/Lecture%2017%20-%20Knapsack%20Problem%20and%20Memory%20Function.htm) I have come up with the following pseudocode which successfully computes the Value of the optimal solution. My issue is I cannot think of a way to keep track of the correct combination of the items which constitute this solution.
// values array containing the "profits" of each item
// weights array containing the "weight" of each item
// memo_pad is a list used to memoize recursive results
values[], weights[], memo_pad[]
knapsack_memoized(i, w):
// i is the current item
// w is the remaining weight allowed in the knapsack
if memo_pad[i][w] < 0: // if value not memoized
if w < weights[i]:
memo_pad[i][w] = knapsack_memoized(i-1, w)
else:
memo_pad[i][w] = max{knapsack_memoized(i-1,w), values[i]+knapsack_memoized(i-1, w-weights[i])}
return memo_pad[i][w]
end
I cannot figure out how to find out what combination of the input items will give me the returned optimised value?

you are looking to return the maximum of two cases:
(1) nth item included
(2) item not included
try this...
else:
memo_pad[i][w] = max{values[i] + memo_pad[i-1][w-weights[i]],
memo_pad[i-1][w]}

Related

Algorithm, that finds the k-greatest number in O(n*log(k))

was wondering, if you have given an unsorted list of arrays of any length n >= k,
what is your idea, to find the k-greatest number in O(n*log(k)) time. So the k = 2 -greatest number of an Array containing the numbers 1 to 9 would be 8 for example.
I'm trying to code this in python, if you have an idea how in that time complexity :)
My answer is not python-specific, however you should be able to implement the used concepts in python, or find libraries already implementing them.
The basic idea is to iterate over the list and store the current greatest, second greatest, ... , k-greatest number in a separate data structure. Since you will be iterating over all n entries in your array, the complexity of this is in O(n * insertion_step_complexity)
As seen above, the insertion step needs to not exceed a complexity of O(log(k)) to achieve this you can use a AVL-Tree that has a complexity of O(log(m)) for inserting and deleting items, where m is the number of items stored within the avl-tree.
An algorithm would look like this:
def find_k_greatest_number(k, array):
avl_tree = initialize AVL tree here
avl_items = 0
for number in array:
if (number > avl_tree.smallest_number()):
if (avl_itmes >= k):
avl_tree.delete_smallest_number()
else:
avl_items++
avl_tree.insert(number)
return avl_tree.smallest_number()
Finding the smallest number in a sorted tree is dependent on its height. Since the AVL tree can't exceed the height of log(k) the complexity of finding the smallest number is O(log(k)).

Valid Sudoku: How to decrease runtime

Problem is to check whether the given 2D array represents a valid Sudoku or not. Given below are the conditions required
Each row must contain the digits 1-9 without repetition.
Each column must contain the digits 1-9 without repetition.
Each of the 9 3x3 sub-boxes of the grid must contain the digits 1-9 without repetition.
Here is the code I prepared for this, please give me tips on how I can make it faster and reduce runtime and whether by using the dictionary my program is slowing down ?
def isValidSudoku(self, boards: List[List[str]]) -> bool:
r = {}
a = {}
for i in range(len(boards)):
c = {}
for j in range(len(boards[i])):
if boards[i][j] != '.':
x,y = r.get(boards[i][j]+f'{j}',0),c.get(boards[i][j],0)
u,v = (i+3)//3,(j+3)//3
z = a.get(boards[i][j]+f'{u}{v}',0)
if (x==0 and y==0 and z==0):
r[boards[i][j]+f'{j}'] = x+1
c[boards[i][j]] = y+1
a[boards[i][j]+f'{u}{v}'] = z+1
else:
return False
return True
Simply optimizing assignment without rethinking your algorithm limits your overall efficiency by a lot. When you make a choice you generally take a long time before discovering a contradiction.
Instead of representing, "Here are the values that I have figured out", try to represent, "Here are the values that I have left to try in each spot." And now your fundamental operation is, "Eliminate this value from this spot." (Remember, getting it down to 1 propagates to eliminating the value from all of its peers, potentially recursively.)
Assignment is now "Eliminate all values but this one from this spot."
And now your fundamental search operation is, "Find the square with the least number of remaining possibilities > 1. Try each possibility in turn."
This may feel heavyweight. But the immediate propagation of constraints results in very quickly discovering constraints on the rest of the solution, which is far faster than having to do exponential amounts of reasoning before finding the logical contradiction in your partial solution so far.
I recommend doing this yourself. But https://norvig.com/sudoku.html has full working code that you can look at at need.

Fibonacci memoization order of execution

The following code is fibonacci sequence using memoization. But I do not understand the order of execution in the algorithm. If we do dynamic_fib(4), it will calculate dynamic_fib(3) + dynamic_fib(2). And left side calls first, then it calculates dynamic_fib(2) + dynamic_fib(1). But while calculating dynamic_fib(3), how does the cached answer of dynamic_fib(2) propagate up to be reused when we are not saving the result to the memory address of the dictionary like &dic[n] in C.
What I think should happen is, the answer for dynamic_fib(2) is gone because it only existed in that stack. So you have to calculate dynamic_fib(2) again when calculating dynamic_fib(4)
Am I missing something?
def dynamic_fib(n):
return fibonacci(n, {})
def fibonacci(n, dic):
if n == 0 or n == 1:
return n
if not dic.get(n, False):
dic[n] = fibonacci(n-1, dic) + fibonacci(n-2, dic)
return dic[n]
The function dynamic_fib (called once) just delegates the work to fibonacci, where the real work is done. In fibonacci you have the dictionary dic which is used to save the values of the function once it is calculated. So, for each of the values (2-n) when you call the function fibonacci for the first time, it calculates the result, but it also stores it in the dictionary, so that when the next time we ask for it, we alread have it, and we don't need to travel the whole tree again. So the complexity is linear , O(n).

Randomized algorithm for string matching

Question:
Given a text t[1...n, 1...n] and p[1...m, 1...m], n = 2m, from alphabet [0, Sigma-1], we say p matches t at [i,j] if t[i+k-1, j+L-1] = p[k,L] for all k,L. Design a randomized algorithm to find all matches in O(n^2) time with high probability.
Image:
Can someone help me understand what this text means? I believe it is saying that 't' has two words in it and the pattern is also two words but the length of both patterns is half of 't'. However, from here I don't understand how the range of [i,j] comes into play. That if statement goes over my head.
This could also be saying that t and p are 2D arrays and you are trying to match a "box" from the pattern in the t 2D array.
Any help would be appreciated, thank you!
The problem asks you to find a 2D pattern i.e defined by the p array in the t array which is also 2D.
The most obvious randomized solution to this problem would be to generate two random indexes i and j and then start searching for the pattern from that (i, j).
To avoid doing redundant searches you can keep track of which pairs of (i, j) you have visited before, this can be done using a simple look up 2D array.
The complexity of above would be O(n^3) in the worst case.
You can also use hashing for comparing the strings to reduce the complexity to O(n^2).
You first need to hash the t array row by row and store the value in an array like hastT, you can use the Rolling hash algorithm for that.
You can then hash the p array using Rolling hash algorithm and store the hashes row by row in the array hashP.
Then when you generate the random pair (i, j), you can get the hash of the corresponding t array using the array hashT in linear time instead of the brute force comparision that takes quadratic time and compare (Note there can be collisions in the hash you can brute force when a hash matches to be completely sure).
To find the corresponding hash using the hashT we can do the following, suppose the current pair (i, j) is (3, 4), and the dimensions of the p array are 2 x 3.
Then we can compare hashT[3][7] - hash[3][3] == hashP[3] to find the result, the above logic comes from the rolling hash algo.
Pseudocode for search in linear time using hashing :
hashT[][], hashS[]
i = rand(), j = rand();
for(int k = i;k < i + lengthOfColumn(p);i++){
if((hashT[i][j + lengthOfRow(p)] - hashT[i][j-1]) != hashP[i]){
//patter does not match.
return false;
}
}

MATLAB: fastest way to do a root-mean-squared error between a vector and array of vectors

I have a question regarding the fastest way to compute the RMSE between a single vector and an array of vectors. Specifically, I have a vector A representing an point and would like to find the index in a list B of points that A is closest to. Right now I am using:
tempmat = bsxfun(#minus,A,B);
tempmat1 = sqrt(sum(tempmat.^2,2);
index = find(tempmat1 == min(tempmat1));
this takes about 0.058 seconds to calculate the index. Is there a faster way in MATLAB of doing this? I performing this calculations literally millions of times.
Many thanks for reading,
Joe
tempmat = bsxfun(#minus,A,B);
tmpmat1 = sum(tempmat.^2,2);
[m,index] = min(tempmat1);
m = sqrt(m); %# optional, only if you need the actual numerical value
This avoids calculating sqrt on the whole array, since the minumum of the squared differences will have the same index. It also uses the second output of min to avoid the second pass of find.
You'll probably find that
tempmat = A - B(ones(1, size(A,1)), :)
is faster than the bsxfun version, unless size(A,1) is exceptionally large.
This assumes that A is your array and B is your vector. The RSS calculation implies that you have row vectors.
Also, I presume you know that you're calculating the RSS not RMS.

Resources