Time Complexity of a Bounded Knapsack problem - dynamic-programming

For the bounded knapsack (with repetition) problem, I am given a knapsack with capacity W. I must fill the knapsack so that the total value of the items inside it is maximized, where each item has an associated weight and value, w and v. However, I can take at most K copies of each item. I found an algorithm that does this, and I believe it runs in O(Wn) time. But I've found a lot of sources that say it should be O(WKn). Why would it be O(WKn), if that is indeed the correct time complexity?
For transparency, this is indeed a homework problem. I just need help understanding why the time complexity is O(WKn), unless the sources I read were simply wrong.
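For reference, a minimal sketch of the straightforward bounded-knapsack DP (not necessarily the algorithm referred to above) makes the O(WKn) count visible: one loop over the n items, one over the W+1 capacities, and one over the at most K copies of each item. The weights, values, and counts below are hypothetical.

    # Sketch of the textbook bounded-knapsack DP; weights[i], values[i] and
    # counts[i] describe item i (hypothetical inputs). dp[c] is the best value
    # achievable with capacity c using the items processed so far.
    def bounded_knapsack(W, weights, values, counts):
        dp = [0] * (W + 1)
        for i in range(len(weights)):                  # n items
            new_dp = dp[:]
            for c in range(W + 1):                     # W + 1 capacities
                for k in range(1, counts[i] + 1):      # up to K copies of item i
                    if k * weights[i] > c:
                        break
                    new_dp[c] = max(new_dp[c], dp[c - k * weights[i]] + k * values[i])
            dp = new_dp
        return dp[W]

    # Example with hypothetical data:
    # print(bounded_knapsack(10, [2, 3], [3, 4], [2, 3]))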

Related

Time complexity of A* and Dijkstra's search

In a grid-based environment, the number of edges is E = O(V).
A* would, in the worst case, explore all the nodes of the grid.
So the number of nodes in the open list would be O(E) in the worst case.
Therefore, A*'s worst-case time complexity is O(E).
Now, we know that Dijkstra's time complexity is O(E + V log V).
And in the worst case, A* would explore the same number of nodes as Dijkstra does.
Thus, the time complexity of A* should be equal to Dijkstra's, i.e. O(E).
But Dijkstra has an extra V log V term.
What am I missing here? I would appreciate a detailed explanation.
Thanks in advance.
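Not an answer, but a minimal A* sketch with a binary-heap open list (Python's heapq) may help locate the log factor the question is about: every push to and pop from the heap costs O(log V) on top of the O(E) edge scans. The graph format and heuristic below are hypothetical.

    import heapq

    # Minimal A* over an adjacency list {node: [(neighbour, weight), ...]}.
    # Each heap operation is O(log V); the neighbour scans add up to O(E).
    def a_star(graph, start, goal, h):
        g = {start: 0}
        open_heap = [(h(start), start)]              # entries are (f = g + h, node)
        while open_heap:
            f, node = heapq.heappop(open_heap)       # O(log V)
            if node == goal:
                return g[node]
            for neighbour, w in graph.get(node, []):
                tentative = g[node] + w
                if tentative < g.get(neighbour, float("inf")):
                    g[neighbour] = tentative
                    heapq.heappush(open_heap, (tentative + h(neighbour), neighbour))  # O(log V)
        return None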

Flipping a three-sided coin

I have two related questions on population statistics. I'm not a statistician, but I would appreciate pointers to learn more.
I have a process that results from flipping a three-sided coin (results: A, B, C), and I compute the statistic t = (A - C)/(A + B + C). In my problem, I have a set that randomly divides itself into sets X and Y, maybe uniformly, maybe not. I compute t for X and for Y. I want to know whether the difference I observe in those two t values is likely due to chance or not.
Now, if this were a simple binomial distribution (i.e., I'm just counting who ends up in X or Y), I'd know what to do: I compute n = |X| + |Y| and σ = sqrt(np(1-p)) (and I assume p = 0.5), and then I compare to the normal distribution. So, for example, if I observed |X| = 45 and |Y| = 55, I'd say σ = 5, and so I expect to see this much variation from the mean μ = 50 by chance 68.27% of the time. Put differently, I expect a greater deviation from the mean 31.73% of the time.
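As a concrete check, the normal-approximation arithmetic above can be reproduced in a few lines (assuming scipy is available; the numbers are the 45/55 example):

    from math import sqrt
    from scipy.stats import norm

    # Normal approximation to the binomial for the 45/55 example above.
    n, p = 45 + 55, 0.5
    mu = n * p                       # 50
    sigma = sqrt(n * p * (1 - p))    # 5.0
    z = (45 - mu) / sigma            # -1.0, one standard deviation below the mean
    # Two-sided probability of a deviation at least this large by chance:
    print(2 * norm.cdf(-abs(z)))     # about 0.3173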
There's an intermediate problem, which also interests me and which I think may help me understand the main problem, where I measure some property of the members of A and B. Let's say 25% in A measure positive and 66% in B measure positive. (A and B aren't the same cardinality -- the selection process isn't uniform.) I would like to know whether I should expect this difference by chance.
As a first draft, I computed t as though it were measuring coin flips, but I'm pretty sure that's not actually right.
Any pointers on what the correct way to model this is?
First problem
For the three-sided coin problem, have a look at the multinomial distribution. It's the distribution to use for a "binomial"-style problem with more than two outcomes.
Here is the example from Wikipedia (https://en.wikipedia.org/wiki/Multinomial_distribution):
Suppose that in a three-way election for a large country, candidate A received 20% of the votes, candidate B received 30% of the votes, and candidate C received 50% of the votes. If six voters are selected randomly, what is the probability that there will be exactly one supporter for candidate A, two supporters for candidate B and three supporters for candidate C in the sample?
Note: Since we’re assuming that the voting population is large, it is reasonable and permissible to think of the probabilities as unchanging once a voter is selected for the sample. Technically speaking this is sampling without replacement, so the correct distribution is the multivariate hypergeometric distribution, but the distributions converge as the population grows large.
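To make the quoted example concrete, here is a minimal sketch (assuming scipy) that evaluates the multinomial probability it describes:

    from scipy.stats import multinomial

    # Voting example from the quote: 6 sampled voters, outcome (1, 2, 3) for
    # candidates A, B, C with probabilities 0.2, 0.3, 0.5.
    prob = multinomial.pmf([1, 2, 3], n=6, p=[0.2, 0.3, 0.5])
    print(prob)  # 6!/(1! * 2! * 3!) * 0.2 * 0.3**2 * 0.5**3 = 0.135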
Second problem
The second problem looks like a job for cross-tabs. Use the chi-squared test of association to test whether there is a significant association between your variables, and use the standardized residuals of your cross-tab to identify which of the associations are more likely to occur and which are less likely.
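A minimal sketch of such a test with scipy; the 2x2 counts below are hypothetical (100 observations per group, matching the 25% and 66% rates mentioned in the question):

    import numpy as np
    from scipy.stats import chi2_contingency

    # Hypothetical 2x2 cross-tab: rows are the two groups, columns are
    # positive / negative counts.
    table = np.array([[25, 75],    # 25% positive out of 100
                      [66, 34]])   # 66% positive out of 100
    chi2, p_value, dof, expected = chi2_contingency(table)
    # Pearson residuals, (observed - expected) / sqrt(expected), point at the
    # cells that drive the association.
    residuals = (table - expected) / np.sqrt(expected)
    print(p_value, residuals)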

How does the Needleman-Wunsch algorithm compare to brute force?

I'm wondering how you can quantify the results of the Needleman-Wunsch algorithm (typically used for aligning nucleotide/protein sequences).
Consider some fixed scoring scheme and two sequences S1 and S2 of varying lengths. Say we calculate every possible alignment of S1 and S2 by brute force, and the highest-scoring alignment has score x. Of course, this has considerably higher complexity than the Needleman-Wunsch approach.
When using the Needleman-Wunsch algorithm to find a sequence alignment, say the resulting alignment has score y.
Consider r to be the score generated via Needleman-Wunsch for two random sequences R1 and R2.
How does x compare to y? Is y always greater than r for two sequences of known homology?
In general, I do understand that we use the Needleman-Wunsch algorithm to significantly speed up sequence alignment (vs a brute-force approach), but don't understand the cost in accuracy (if any) that comes with it. I had a go at reading the original paper (Needleman & Wunsch, 1970) but am still left with this question.
Needleman-Wunsch always produces an optimal answer: it's much faster than brute force and doesn't sacrifice accuracy in the process. The key insight it uses is that it's not actually necessary to generate all possible alignments, since most of them contain bad sub-alignments and couldn't possibly be optimal. Instead, the Needleman-Wunsch algorithm builds up optimal alignments for prefixes of the original sequences and then grows those smaller alignments into larger ones, using the guarantee that any optimal alignment must contain an optimal alignment for a slightly smaller case.
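For concreteness, a minimal sketch of that recurrence (score matrix only, no traceback; the match/mismatch/gap values are hypothetical):

    # Needleman-Wunsch score matrix with a simple scoring scheme
    # (match = +1, mismatch = -1, gap = -1; these values are hypothetical).
    def needleman_wunsch_score(s1, s2, match=1, mismatch=-1, gap=-1):
        n, m = len(s1), len(s2)
        # score[i][j] = best score aligning s1[:i] with s2[:j]
        score = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                diag = score[i - 1][j - 1] + (match if s1[i - 1] == s2[j - 1] else mismatch)
                score[i][j] = max(diag,                   # align s1[i-1] with s2[j-1]
                                  score[i - 1][j] + gap,  # gap in s2
                                  score[i][j - 1] + gap)  # gap in s1
        return score[n][m]

    # print(needleman_wunsch_score("GATTACA", "GCATGCU"))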
I think your question boils down to whether dynamic programming finds the optimal solution, i.e. guarantees that y >= x. For a discussion of this I would refer you to people who are likely smarter than me:
https://cs.stackexchange.com/questions/23599/how-is-dynamic-programming-different-from-brute-force
Basically, it says that dynamic programming produces the optimal result, i.e. the same as brute force, but only for problems that satisfy the Bellman principle of optimality.
According to the Wikipedia page for Needleman-Wunsch, the problem does satisfy the Bellman principle of optimality:
https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm
Specifically:
The Needleman–Wunsch algorithm is still widely used for optimal global alignment, particularly when the quality of the global alignment is of the utmost importance. However, the algorithm is expensive with respect to time and space, proportional to the product of the length of two sequences and hence is not suitable for long sequences.
There is also mention of optimality elsewhere in the same Wikipedia page.

How is the max of a set of admissible heuristics a dominating heuristic?

If you have a set of admissible heuristics h1, h2, ..., hn,
how is h = max(h1, h2, ..., hn) an admissible heuristic that dominates them all?
Isn't a lower h(n) value better?
For A*, f = g + h, and the element with the lowest f will be removed from the open list. So shouldn't taking the min give the dominating heuristic?
An admissible heuristic never overestimates the cost of reaching the goal state. That is, its estimate will be lower than the actual cost or exactly the actual cost, but never higher. This is required for greedy approaches like A* search to find the global best solution.
For example, imagine you found a solution with cost 10, while the best solution has cost 8. You're not using an admissible heuristic, and the heuristic's estimate for the solution that really has cost 8 is 12 (it's overestimating). As you already have a solution with cost 10, A* will never evaluate the best solution, as it is estimated to be more expensive.
Ideally, your heuristic should be as accurate as possible, i.e. an admissible heuristic shouldn't underestimate the true cost too much. If it does, A* will still find the best solution eventually, but it may take a lot longer to do so because it tries a lot of solutions that look good according to your heuristic, but turn out to be bad.
This is where the answer for your question lies. Your heuristics h1, ..., hn are all admissible, therefore they estimate a cost equal to or less than the true cost. The maximum of this set of estimates is therefore by definition the estimate that is closest to the actual cost (remember that you'll never overestimate). In the ideal case, it will be the exact cost.
If you were to take the minimum value, you would end up with the estimate that is furthest away from the actual cost -- as outlined above, A* would still find the best solution, but in a much less efficient manner.
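A small illustration, using two standard admissible heuristics for a 4-connected, unit-cost grid (in this particular pair the Manhattan term always wins, but the construction is the same for any set of admissible heuristics):

    import math

    # Two admissible heuristics for 4-connected grid movement with unit step cost:
    # neither one ever overestimates the true path cost.
    def euclidean(node, goal):
        return math.hypot(node[0] - goal[0], node[1] - goal[1])

    def manhattan(node, goal):
        return abs(node[0] - goal[0]) + abs(node[1] - goal[1])

    # The pointwise max is still admissible (each component never overestimates)
    # and dominates both components, because it is >= each of them at every node.
    def combined(node, goal):
        return max(euclidean(node, goal), manhattan(node, goal))

    # print(combined((0, 0), (3, 4)))  # max(5.0, 7) = 7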

PartitionProblem variation - fixed size of subsets

I have a problem which is a variation of the partition problem, which is NP-complete. This is an optimization problem, not a decision problem.
Problem: Partition a list of numbers into two subsets such that the difference of their sums is minimum, and find the two subsets. If n is even, then both subsets should have size n/2; if n is odd, then their sizes should be floor(n/2) and ceil(n/2).
Assuming that the pseudo-polynomial time DP algorithm is the best for an exact solution, how can it be modified to solve this? And what would be the best approximation algorithms for it?
Since you didn't specify which algorithm to use, I'll assume you use the one defined here:
http://www.cs.cornell.edu/~wdtseng/icpc/notes/dp3.pdf
Then, using this algorithm, you add a variable to track the best result, initialize it to N (the sum of all the numbers in the list, since you can always take one subset to be the empty set), and every time you update T (e.g. T[i] = true) you do something like bestRes = abs(i - N/2) < bestRes ? abs(i - N/2) : bestRes. Finally, you return bestRes. This of course doesn't change the complexity of the algorithm.
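A minimal sketch of that bookkeeping (assuming non-negative integers; it reports the actual difference |N - 2i| rather than |i - N/2|, which has the same minimizer, and, like the prose above, it ignores the equal-size constraint):

    # Subset-sum reachability table plus a running best result, as described above.
    def min_partition_difference(nums):
        N = sum(nums)
        reachable = [False] * (N + 1)    # reachable[s]: some subset sums to s
        reachable[0] = True
        best = N                         # the empty subset gives difference N
        for x in nums:
            for s in range(N - x, -1, -1):   # descending, so each number is used once
                if reachable[s] and not reachable[s + x]:
                    reachable[s + x] = True
                    best = min(best, abs(N - 2 * (s + x)))
        return best

    # print(min_partition_difference([3, 1, 4, 2, 2]))  # 0, e.g. {3, 1, 2} vs {4, 2}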
I've got no idea about your 2nd question.
