I understand the target approach for both the methods where Optimal Substructure calculates the optimal solution based on an input n while Overlapping Subproblems targets all the solutions for the range of input say from 1 to n.
Take a problem like the Rod Cutting Problem: while finding the optimal cut, do we consider every possible cut (so that it can be treated as having overlapping subproblems) and work bottom-up, or do we consider the optimal cut for a given input n and work top-down?
Hence, while both deal with optimality in the end, what are the exact differences between the two approaches?
I tried referring to this Overlapping Subproblem, Optimal Substructure and this page as well.
On a side note as well, does this relate to the solving approaches of Tabulation(top-down) and Memoization(bottom-up)?
This thread makes a valid point, but I'm hoping it can be broken down more simply.
To answer your main question: overlapping subproblems and optimal substructure are two different properties. A problem that satisfies both can be solved via Dynamic Programming. To understand the difference between them, you need to understand what each of these terms means in the context of Dynamic Programming.
I understand the target approach for both the methods where Optimal Substructure calculates the optimal solution based on an input n while Overlapping Subproblems targets all the solutions for the range of input say from 1 to n.
This is a poorly worded statement. You need to familiarize yourself with the basics of Dynamic Programming. Hopefully the following explanation will help you get started.
Let's start by defining what each of these terms, Optimal Substructure & Overlapping Subproblems, means.
Optimal Substructure: If the optimal solution to a problem S of size n can be computed by looking at JUST the optimal solutions of its subproblems s of size < n (and NOT all solutions to those subproblems), and combining them yields an optimal solution to S, then the problem S is said to have optimal substructure.
Example (Shortest Path Problem): consider an undirected graph with vertices a, b, c, d, e and edges (a,b), (a,e), (b,c), (c,d), (d,a) & (e,b). The shortest path between a & c is a -- b -- c, and this problem can be broken down into finding the shortest path between a & b and then the shortest path between b & c, which gives us a valid solution. Note that we have two ways of reaching b from a:
a -- b (Shortest path)
a -- e -- b
The Longest Path Problem, however, does not have optimal substructure. The longest path between a & d is a -- e -- b -- c -- d, but combining the longest path between a & c (a -- e -- b -- c) with the longest path between c & d (c -- b -- e -- a -- d) does not give us a valid (non-repeating vertices) longest path between a & d.
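If you want to check this concretely, here is a small brute-force verification of the example above (my own sketch, not part of the original answer; longest_simple_path is just a helper name I made up):

```python
# My own brute-force check of the example above on the same 5-vertex graph.
edges = [('a', 'b'), ('a', 'e'), ('b', 'c'), ('c', 'd'), ('d', 'a'), ('e', 'b')]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def longest_simple_path(u, goal, seen=()):
    # longest path from u to goal that never repeats a vertex
    if u == goal:
        return seen + (u,)
    best = None
    for v in adj[u] - set(seen) - {u}:
        p = longest_simple_path(v, goal, seen + (u,))
        if p is not None and (best is None or len(p) > len(best)):
            best = p
    return best

print(longest_simple_path('a', 'd'))   # ('a', 'e', 'b', 'c', 'd')
print(longest_simple_path('a', 'c'))   # ('a', 'e', 'b', 'c')
print(longest_simple_path('c', 'd'))   # ('c', 'b', 'e', 'a', 'd')
# Gluing the last two repeats a, b and e, so combining "optimal" sub-solutions
# does not yield a valid longest path: no optimal substructure.
```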
Overlapping Subproblems: If you look at the recursion-tree diagram for fib(5) from the link you shared:
You can see that the subproblem fib(1) is 'overlapping' across multiple branches, and thus fib(5) has overlapping subproblems (fib(1), fib(2), etc.).
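To make the overlap concrete, here is a tiny sketch of my own (not from the linked page) that counts how many times the naive recursion solves each subproblem:

```python
# My own sketch: count how many times the naive recursion solves each subproblem.
from collections import Counter

calls = Counter()

def fib(n):
    calls[n] += 1          # record every time the subproblem fib(n) is solved
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

fib(5)
print(calls)  # fib(1) is solved 5 times and fib(2) 3 times: overlapping subproblems
```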
On a side note as well, does this relate to the solving approaches of Tabulation(top-down) and Memoization(bottom-up)?
This again is a poorly worded question, and the labels are actually swapped: memoization is the top-down (recursive) approach, while tabulation is the bottom-up (iterative) approach. Both are ways of solving a DP problem by caching and reusing subproblem results. From the Wikipedia article on Memoization:
In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
For the given Fibonacci example, if we store fib(1) in a table after it is encountered the first time, we don't need to recompute it when we see it the next time. We can reuse the stored result, saving us a lot of computation.
When we implement an iterative solution, the "table" is usually an array (or an array of arrays), and when we implement a recursive solution, the "table" is usually a dynamic data structure such as a hashmap (dictionary).
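Since your question mentions the Rod Cutting Problem, here is a minimal sketch of both approaches (my own illustration; the prices list is made-up sample data):

```python
# My own sketch of both approaches for Rod Cutting; the prices are made-up sample data.
from functools import lru_cache

prices = [0, 1, 5, 8, 9, 10, 17, 17, 20]   # prices[i] = price of a piece of length i

# Top-down (memoization): recursion plus a cache keyed by the remaining length.
@lru_cache(maxsize=None)
def cut_rod_topdown(n):
    if n == 0:
        return 0
    return max(prices[i] + cut_rod_topdown(n - i) for i in range(1, n + 1))

# Bottom-up (tabulation): fill an array from the smallest subproblem upward.
def cut_rod_bottomup(n):
    best = [0] * (n + 1)
    for length in range(1, n + 1):
        best[length] = max(prices[i] + best[length - i] for i in range(1, length + 1))
    return best[n]

print(cut_rod_topdown(8), cut_rod_bottomup(8))   # both print 22
```

The top-down version only solves the subproblems the recursion actually reaches and keeps them in a cache, while the bottom-up version fills an array for every length from 0 to n; both reuse stored results instead of recomputing them.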
You can further read this link for better understanding of these two approaches.
I am trying to understand the Bellman Equation and am facing some confusing moments.
1) In different sources I have encountered different definitions of the Bellman Equation.
Sometimes it is defined as a state-value function
V(s) = R + y * V(s')
Sometimes it is defined as an action-value function
q(s, a) = r + max(q(s', a'))
Are both of these definitions correct? How was the Bellman equation introduced in the original paper?
The Bellman equation gives a definite form to dynamic programming solutions; using it we can generalise the solutions to optimisation problems which are recursive in nature and satisfy the optimal substructure property.
Optimal substructure, in simpler terms, means that the given problem can be broken down into smaller subproblems of the same form with smaller data. If optimal solutions to the smaller problems can be computed, then an optimal solution to the given (larger) problem can be computed from them.
Let's denote the solution value for a given state S by V(S), where S is the state, or the subproblem. Let R denote the cost incurred by choosing action a(i) in state S; R is a function f(S, a(i)), where a is the set of all possible actions that can be performed in state S.
V(S) = max{ f(S, a(i)) + y * V(S') }, where the max is taken by iterating over all possible i, and S' is the state reached by taking action a(i) in S. y is a fixed constant that discounts the subproblem-to-bigger-problem transition; for most problems y = 1, so you can ignore it for now.
So basically, at any given subproblem S, V(S) gives us the most optimal solution by trying every action a(i) that can be performed and the next state that the action leads to. If you think recursively and are used to such problems, it's easy to see why the above equation is correct.
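As a toy illustration (entirely my own example; the states, actions, rewards, and the discount value y = 0.9 are made up), here is the recurrence applied literally with value iteration:

```python
# My own toy example: the recurrence V(S) = max{ f(S, a) + y * V(S') } applied literally.
# States are positions 0..4 on a line, an action moves one step left or right, and the
# value f(S, a) is the reward collected on arrival at the next state. All numbers are made up.
rewards = {0: 0, 1: 1, 2: 0, 3: 2, 4: 10}
y = 0.9                                   # discount constant from the equation above

def actions(s):
    # next states reachable from s (move left or right, staying inside 0..4)
    return [s2 for s2 in (s - 1, s + 1) if 0 <= s2 <= 4]

V = {s: 0.0 for s in range(5)}
for _ in range(100):                      # value iteration: reapply the recurrence until it settles
    V = {s: max(rewards[s2] + y * V[s2] for s2 in actions(s)) for s in range(5)}

print(V)   # V(3), the state adjacent to the big reward at 4, ends up with the highest value
```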
I would suggest solving dynamic programming problems and looking at some standard problems and their solutions to get an idea of how those problems are broken down into smaller, similar problems and solved recursively. After that, the above equation will make more sense. Also, you will realise that the two equations you have written above are almost the same thing, just written in slightly different ways.
Here is a list of more commonly known DP problems and their solutions.
It is mentioned in Norvig's Artificial Intelligence that A* Search is optimally efficient. However, I could not figure out why, nor could I find a proof on the web. Does anyone happen to have a proof?
I hope I'm not doing your homework ;). I only sketch the proof here
The first thing you need to see is that A* is optimal, i.e. it returns the shortest path according to your cost function g. I think this proof is straightforward under the assumption that the heuristic h never overestimates the cost to the goal (i.e. h is admissible). If this didn't hold, optimal efficiency would be meaningless, as A* itself wouldn't be optimal.
Optimal efficiency: among all optimal algorithms that start from the same start node and use the same heuristic information, A* expands the fewest nodes.
Let's assume an algorithm B does not expand a node n which is expanded by A*. By definition, for this node g(n) + h(n) <= f, where f is the cost of the shortest path. Now consider a second problem in which all heuristic values are the same as in the original problem, but in which there is a new path through n to a new goal with total cost smaller than f.
Since B does not expand n, it would never reach this new goal and hence would not find this cheaper path. Therefore, our original assumption that B is optimal is violated.
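One way to make that sketch precise (my own restatement; note that a strict inequality is what makes the contradiction go through):

```latex
% My own restatement of the contradiction above, in the usual notation.
Assume $B$ is optimal, uses the same admissible heuristic $h$, and never expands a
node $n$ that $A^{*}$ expands with
\[
  g(n) + h(n) < f^{*},
\]
where $f^{*}$ is the optimal solution cost. Build a modified problem by adding an edge
from $n$ to a new goal $t'$ with cost $h(n)$; the heuristic stays admissible, and the
new goal is reachable with total cost $g(n) + h(n) < f^{*}$. Since $B$ never expands
$n$, it cannot distinguish the two problems and still returns the original solution of
cost $f^{*}$, which is now suboptimal, contradicting the optimality of $B$.
```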
I have read these words:
There are two key attributes that a problem must have in order for dynamic programming to be applicable: optimal substructure and overlapping subproblems. If a problem can be solved by combining optimal solutions to non-overlapping subproblems, the strategy is called "divide and conquer". This is why mergesort and quicksort are not classified as dynamic programming problems.
I have 3 questions:
Why are mergesort and quicksort not dynamic programming?
I think mergesort can also be divided into smaller problems, and those smaller problems can be divided again, and so on.
Does Dijkstra's algorithm use dynamic programming?
Are there applied examples of using dynamic programming?
The key words here are "overlapping subproblems" and "optimal substructure". When you execute quicksort or mergesort, you are recursively breaking down your array into smaller pieces that do not overlap. You never operate over the same elements of the original array twice during any given level of the recursion. This means there is no opportunity to re-use previous calculations. On the other hand, many problems DO involve performing the same calculations over overlapping subsets, and have the useful characteristic that an optimal solution to a subproblem can be re-used when computing the optimal solution to a larger problem.
Dijkstra's algorithm is a classic example of dynamic programming, as it re-uses prior computations to discover the shortest path between two nodes A and Z. Say that A's immediate neighbors are B and C. We can find the shortest path from A to Z by summing the distance between A and B with our computed shortest path from B to Z; and do similarly for finding the shortest path from C to Z. Then the shortest path from A to Z will be the shorter of these two paths. The key insight here is that we can re-use the shortest path computations for paths of length 2 when computing the shortest paths of length 3, and so on. Doing so results in a much more efficient algorithm.
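Here is a minimal sketch of the algorithm (my own illustration, with a made-up graph literal) with the reuse called out in comments:

```python
# My own sketch of Dijkstra's algorithm; the graph literal is made up.
# graph maps a node to a list of (neighbor, edge weight) pairs.
import heapq

def dijkstra(graph, source):
    dist = {source: 0}
    settled = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)                      # dist[u] is final now and gets reused below
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w             # a shorter path to v, built from the settled dist[u]
                heapq.heappush(heap, (dist[v], v))
    return dist

graph = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2), ('Z', 6)], 'C': [('Z', 3)]}
print(dijkstra(graph, 'A'))                 # {'A': 0, 'B': 1, 'C': 3, 'Z': 6}
```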
Dynamic programming can be used to solve many types of problems -- see http://en.wikipedia.org/wiki/Dynamic_programming#Examples:_Computer_algorithms for some examples.
For dynamic programming to be applicable to a problem, there should be
i. An optimal structure in the subproblems:
This means that when you break down your problem into smaller units, those smaller units also need to be broken down into yet smaller units for an optimal solution. For example, in merge sort, an array of numbers can be sorted if we divide it into two subarrays, sort them, and combine them; while sorting those two subarrays, we repeat the same process. So an optimal solution (a sorted array) is obtained when we find optimal solutions to its subproblems (we sort the subarrays and combine them). This requirement is fulfilled for merge sort. The subproblems must also be independent of each other for the structure to be optimal, which merge sort satisfies as well: the subproblems' solutions do not affect each other. For example, the sortedness of one half of the array is not affected by the sortedness of the other half.
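For reference, here is a plain merge sort sketch (my own, not part of the original answer); notice that the two recursive calls never operate on the same elements:

```python
# My own merge sort sketch; the two recursive calls work on disjoint halves,
# so the subproblems are independent of each other and never overlap.
def merge_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])      # independent subproblem 1
    right = merge_sort(a[mid:])     # independent subproblem 2
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([2, 1, 3, 4, 9, 4, 2, 1, 3, 1, 9, 4]))
```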
ii. Overlapping subproblems:
This means that while solving for the solution, the subproblems you formulate get repeated, and hence need to be solved only once. In the case of merge sort, this requirement is met only rarely. An array of numbers like 2 1 3 4 9 4 2 1 3 1 9 4 may be a good candidate for overlapping subproblems in merge sort: here the solution to the subproblem sort(2 1 3) could be stored in a table and reused, because it is computed twice during the run. But as you can see, the chance that a random array of numbers has this kind of repeated arrangement is very slim, so using a dynamic programming technique like memoization for an algorithm like merge sort would only make it less efficient.
Yes. Dijkstra's algorithm uses dynamic programming, as mentioned by @Alan in the comments. link
Yes. If I may quote Wikipedia here,
"Dynamic programming is widely used in bioinformatics for the tasks such as sequence alignment, protein folding, RNA structure prediction and protein-DNA binding." 1
1 https://en.wikipedia.org/wiki/Dynamic_programming
This is a problem that appeared in today's Pacific NW Region Programming Contest; no one solved it during the contest. It is problem B, and the complete problem set is here: http://www.acmicpc-pacnw.org/icpc-statements-2011.zip. There is a well-known O(n^2) algorithm for the LCS of two strings using Dynamic Programming, but when these strings are extended to rings I have no idea...
P.S. note that it is subsequence rather than substring, so the elements do not need to be adjacent to each other
P.P.S. It might not be O(n^2) but O(n^2 lg n) or something that can give the result in 5 seconds on a common computer.
Searching the web, this appears to be covered by section 4.3 of the paper "Incremental String Comparison", by Landau, Myers, and Schmidt at cost O(ne) < O(n^2), where I think e is the edit distance. This paper also references a previous paper by Maes giving cost O(mn log m) with more general edit costs - "On a cyclic string to string correcting problem". Expecting a contestant to reproduce either of these papers seems pretty demanding to me - but as far as I can see the question does ask for the longest common subsequence on cyclic strings.
You can double the first and second string and then use the ordinary method, and later wrap the positions around.
It is a good idea to "double" the strings and apply the standard dynamic programming algorithm. The problem with it is that to get the optimal cyclic LCS one then has to start the algorithm from multiple initial conditions. Just one initial condition (e.g. setting all Lij variables to 0 at the boundaries) will not do in general. In practice it turns out that the number of initial states needed is O(N) (they span a diagonal), so one gets back to an O(N^3) algorithm.
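For reference, here is the straightforward O(N^3) baseline this is essentially equivalent to (my own sketch, not from the answer): run the standard O(N^2) LCS table once for every rotation of one of the strings.

```python
# My own sketch of the straightforward O(N^3) baseline: run the standard O(N^2)
# LCS table once for every rotation of one string.
def lcs(x, y):
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

def clcs_bruteforce(x, y):
    # rotating one string is enough: any cyclic common subsequence can be cut so that
    # the other string is read without wrapping around
    return max(lcs(x[k:] + x[:k], y) for k in range(len(x)))

print(clcs_bruteforce("aabb", "bbaa"))   # 4: viewed as rings, the strings match completely
```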
However, the approach does have some virtue, as it can be used to design efficient O(N^2) heuristics (not exact but near exact) for CLCS.
I do not know if a true O(N^2) algorithm exists, and would be very interested if someone knows one.
The CLCS problem has a quite interesting "periodicity" property: the length of the CLCS of p-times repeated strings is p times the CLCS of the original strings. This can be proved by adopting a geometric view of the problem.
Also, the problem has some additional nice properties: it can be shown that if Lc(N) denotes the average CLCS length of two random strings of length N, then |Lc(N) - CN| is O(sqrt(N)), where C is the Chvatal-Sankoff constant. For the average length L(N) of the standard LCS, the only rate result I know of says that |L(N) - CN| is O(sqrt(N log N)). There could be a nice way to compare Lc(N) with L(N), but I don't know it.
Another question: it is clear that the CLCS length is not superadditive, unlike the LCS length. By this I mean it is not true that CLCS(X1X2, Y1Y2) is always greater than CLCS(X1, Y1) + CLCS(X2, Y2) (it is very easy to find counterexamples with a computer).
But it seems possible that the averaged length Lc(N) is superadditive (Lc(N1+N2) greater than Lc(N1)+Lc(N2)) - though if there is a proof I don't know it.
One modest interest in this question is that the values Lc(N)/N for the first few values of N would then provide good bounds to the Chvatal-Sankoff constant (much better than L(N)/N).
As a followup to mcdowella's answer, I'd like to point out that the O(n^2 lg n) solution presented in Maes' paper is the intended solution to the contest problem (check http://www.acmicpc-pacnw.org/ProblemSet/2011/solutions.zip). The O(ne) solution in Landau et al's paper does NOT apply to this problem, as that paper is targeted at edit distance, not LCS. In particular, the solution to cyclic edit distance only applies if the edit operations (add, delete, replace) all have unit (1, 1, 1) cost. LCS, on the other hand, is equivalent to edit distances with (add, delete, replace) costs (1, 1, 2). These are not equivalent to each other; for example, consider the input strings "ABC" and "CXY" (for the acyclic case; you can construct cyclic counterexamples similarly). The LCS of the two strings is "C", but the minimum unit-cost edit is to replace each character in turn.
At 110 lines but no complex data structures, Maes' solution falls towards the upper end of what is reasonable to implement in a contest setting. Even if Landau et al's solution could be adapted to handle cyclic LCS, the complexity of the data structure makes it infeasible in a contest setting.
Last but not least, I'd like to point out that an O(n^2) solution DOES exist for CLCS, described here: http://arxiv.org/abs/1208.0396. At 60 lines, with no complex data structures and only 2 arrays, this solution is quite reasonable to implement in a contest setting. Arriving at the solution might be a different matter, though.
What's the best way to (relative) grade a class (of 50 students) on a test (with 7 questions)?
They did not want the traditional percentile-intervals answer, but a more CS-ey one.
It's a pretty open-ended question; they asked us to assume the following framework:
Input
[m_1,...,m_50], where each m_i is a 7-vector for marks scored in the 7 questions for each of the 50 students.
[c_1,...,c_7], where each c_i is a vector of 'concepts' tested by each question. The c_i's need not be disjoint. We can assume an importance ordering among the elements of union(c_i).
Simplistic approach: Assuming that all concepts have the same value I would just sum it all up. One point for each concept everywhere.
Holistic approach: It could be that a question with more concepts is significantly harder than a question with fewer (and worth more than the sum of its concepts); concepts "interact" with each other. To account for this I would assign a value of (N over C) to each question, where N is the size of that question's concept vector and C is the total number of concepts, and then sum it all up.
True holistic approach: If concepts are repeated in different questions then we should "tone down" their influence. However, I'm not sure how to accomplish this. Maybe we should divide each (N over C) value by the number of repetitions of each concept involved; a rough sketch of both scorings is given below.
I ignored the importance ordering of concepts, because I don't know how to put a value on that.
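For concreteness, here is a rough sketch of the simplistic and repetition-adjusted scorings (entirely my own illustration; the data layout and the toy inputs are assumptions, shrunk to 3 questions and 2 students):

```python
# My own rough sketch of the simplistic and repetition-adjusted scorings; the data
# layout and the tiny inputs (3 questions, 2 students) are made-up assumptions.
from collections import Counter

# concepts[q] = set of concepts tested by question q (the c_i vectors)
concepts = [{"recursion"}, {"recursion", "graphs"}, {"graphs", "greedy", "proofs"}]
# marks[s][q] = fraction of question q that student s got right (the m_i vectors)
marks = [[1.0, 0.5, 0.0], [0.0, 1.0, 1.0]]

repetitions = Counter(c for cs in concepts for c in cs)   # how often each concept is tested

def simplistic(s):
    # one point per concept in every question, scaled by the mark on that question
    return sum(marks[s][q] * len(cs) for q, cs in enumerate(concepts))

def repetition_adjusted(s):
    # tone down repeated concepts by dividing their contribution by their repetition count
    return sum(marks[s][q] * sum(1 / repetitions[c] for c in cs)
               for q, cs in enumerate(concepts))

for s in range(len(marks)):
    print(s, simplistic(s), round(repetition_adjusted(s), 2))
```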