So I saw a video about the Knapsack problem, which can be solved recursively as well as with dynamic programming. The gist I got is that dynamic programming is nothing more than a dictionary, a list, or more generally a record of what we have already computed, so we don't have to compute it again.
Is that what dynamic programming is all about? Keeping records and using them when necessary?
In simple words, we solve small problems (called subproblems) and then use their solutions to solve bigger problems.
To achieve this, we keep a record of what we have computed so far, which can in turn be reused instead of being computed all over again.
We consider a dynamic programming approach when a problem has both:
- overlapping subproblems
- optimal substructure
In very simple words, dynamic programming has two faces: the top-down and the bottom-up approach.
In the top-down approach, we write a recursive (brute-force) solution and memoize its results, so that a stored result is reused whenever the same subproblem comes up again. It is brute force + memoization.
In the bottom-up approach, we start from the base cases, or very small subproblems whose solutions we already know. We build the solution to the larger problem by filling a dynamic programming table that covers every possible subproblem, which again gives a brute-force-like template.
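The two approaches can be sketched side by side. This is a minimal illustration that assumes the Fibonacci recurrence F(n) = F(n-1) + F(n-2) as the problem; the function names are made up for the example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib_top_down(n: int) -> int:
    """Top-down: plain recursion, with results memoized by lru_cache."""
    if n < 2:  # base cases F(0) = 0 and F(1) = 1
        return n
    return fib_top_down(n - 1) + fib_top_down(n - 2)


def fib_bottom_up(n: int) -> int:
    """Bottom-up: fill a table starting from the base cases."""
    if n < 2:
        return n
    table = [0] * (n + 1)
    table[1] = 1
    for i in range(2, n + 1):
        # Each entry only needs the two entries already filled in before it.
        table[i] = table[i - 1] + table[i - 2]
    return table[n]
```

Both functions do the same record keeping; they differ only in whether the record is filled on demand (top-down) or in a fixed order (bottom-up).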
Coming up with a mathematical recurrence for the problem and identifying the two properties above is the challenging part.
overlapping subproblems
Informally, when a problem needs the same subproblem to be solved more than once, we say it has overlapping subproblems.
optimal substructure
Informally, suppose you need to solve a problem of size n and you divide it into subproblems of size n'. Think of it as two stages: the problem of size n, and the subproblems of size n'. Assume you already know the optimal solutions for size n', and you combine them to get a solution for size n. If the combined solution is the same as the actual optimal solution for the problem of size n, then you can safely say that the problem has optimal substructure.
Let's take a simple example, finding the nth Fibonacci number, to understand the two properties well.
The usual mathematical recursive relation would be
F(n) = F(n-1) + F(n-2)
Let's try to figure out the two properties for this example.
It is usually easiest to pick a concrete value of n.
Let n be 3; then
F(3) = F(2) + F(1)
We know the solutions for the base cases: F(0) = 0 and F(1) = 1.
overlapping subproblems
        F(3)
       /    \
    F(2)    F(1)
   /    \
F(1)    F(0)
From the recursion tree above you can see that F(1) is computed twice. For larger n the repetition grows quickly: expanding F(5) computes F(3) twice and F(2) three times. So the problem has overlapping subproblems.
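One way to see the repeated work is to count how often the naive recursion is called for each argument. The sketch below uses n = 5 so the repetition is easier to spot; the `calls` counter and the function name are illustrative only.

```python
from collections import Counter


def fib_naive(n: int, calls: Counter) -> int:
    """Naive recursion that records how often each F(n) is requested."""
    calls[n] += 1
    if n < 2:  # base cases F(0) = 0 and F(1) = 1
        return n
    return fib_naive(n - 1, calls) + fib_naive(n - 2, calls)


calls = Counter()
result = fib_naive(5, calls)
# calls now maps each argument to the number of times it was computed;
# any count above 1 is an overlapping subproblem.
repeated = {k: c for k, c in calls.items() if c > 1}
```

Every entry in `repeated` is work that a memo table would do only once.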
Optimal Substructure
We know the Fibonacci sequence as 0 1 1 2 ...
Let's consider a subtree as
    F(2)
   /    \
F(1)    F(0)
Combining the optimal solutions of the subproblems by addition gives
F(2) = F(1) + F(0)
F(2) = 1 + 0
F(2) = 1
Combining the subproblem solutions has given us the actual optimal solution to the problem for n=2, which can be confirmed from the known Fibonacci sequence. So this problem also has optimal substructure.
From Wikipedia: "Dynamic Programming is a method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions using a memory-based data structure (array, map,etc)."
As with recursive algorithms, the key is breaking the problem down into smaller subproblems and using efficient data structures to help you in the task.
So, in a nutshell, it is exactly about efficient record keeping (+ sorting algorithms + smart data structures).
I'm trying to invent a programming exercise on suffix arrays. I learned an O(n*log(n)^2) algorithm for constructing one and then started experimenting with random input strings of varying length to find out when the naive approach becomes too slow. That is, I wanted to choose a string length that forces people to implement the "advanced" algorithm.
To my surprise, the naive algorithm (an O(n*log(n)) comparison sort over all suffixes) is not as slow as O(n^2 * log(n)) suggests. After thinking a bit, I realized that comparing suffixes of a randomly generated string does not take O(n) amortized time. We usually compare only the first few characters before we reach a difference, and at that point the comparison function returns. This of course depends on the size of the alphabet, but it does not depend much on the length of the suffixes.
I tried a simple implementation in PHP that processes a 50000-character string in 2 seconds (despite the slowness of a scripting language). If it behaved even as O(n^2), we would expect it to take at least several minutes (at 1e7 operations per second and ~1e9 operations total).
So I understand that even if it is O(n^2 * log(n)), the constant factor is a very small fraction of 1, something close to 0. Or should we speak of this complexity as worst-case only?
But what is the amortized time complexity of the naive approach? I'm a bit bewildered about how to assess it.
You seem to be confusing amortized and expected complexity; here you are talking about expected complexity. And yes, the stated complexity is computed assuming that a suffix comparison takes O(n). That is the worst case for a suffix comparison; for randomly generated input, most comparisons inspect only a constant number of characters. Thus O(n^2*log(n)) is the worst-case complexity.
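To check this on your own input, you can instrument the comparison function and count how many characters it actually inspects. The following is only a sketch of the naive approach in Python (the question used PHP); `naive_suffix_array` and its counters are hypothetical names.

```python
import random
from functools import cmp_to_key


def naive_suffix_array(s: str):
    """Sort all suffix start positions with a comparison sort.

    Returns (order, comparisons, inspected), where `inspected` is the total
    number of character positions examined over all comparisons.
    """
    n = len(s)
    comparisons = 0
    inspected = 0

    def compare(i: int, j: int) -> int:
        nonlocal comparisons, inspected
        comparisons += 1
        # Walk both suffixes until they differ or one of them ends.
        while i < n and j < n:
            inspected += 1
            if s[i] != s[j]:
                return -1 if s[i] < s[j] else 1
            i += 1
            j += 1
        if i == n and j == n:
            return 0
        # The suffix that ran out first is a prefix of the other: it sorts first.
        return -1 if i == n else 1

    order = sorted(range(n), key=cmp_to_key(compare))
    return order, comparisons, inspected


random.seed(0)
text = "".join(random.choice("abcd") for _ in range(2000))
order, comparisons, inspected = naive_suffix_array(text)
average_inspected = inspected / comparisons  # characters looked at per comparison
```

For a random string, `average_inspected` should stay small compared with the string length. A string such as "a" repeated n times shows the worst case, where each comparison walks to the end of the shorter suffix.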
One more note: on a modern computer you can perform a few billion elementary instructions per second, so it is possible to execute on the order of 50000^2 operations in 2 seconds. The correct way to benchmark the complexity of an algorithm is to measure the time it takes for inputs of size N, N*2, N*4, ... (as far as you can go) and then fit a function that describes the computational complexity.
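A rough way to do that fit is a least-squares line through log(time) versus log(size), whose slope estimates the exponent of a power law. This is one possible fitting choice, not the only one, and the helper names `measure` and `estimate_exponent` are made up for this sketch.

```python
import math
import time


def measure(func, sizes, make_input):
    """Time func(make_input(n)) once for every n in sizes."""
    timings = []
    for n in sizes:
        data = make_input(n)
        start = time.perf_counter()
        func(data)
        timings.append(time.perf_counter() - start)
    return timings


def estimate_exponent(sizes, costs):
    """Least-squares slope of log(cost) against log(size).

    A slope close to 2 suggests roughly quadratic growth, close to 1
    roughly linear growth. All costs must be strictly positive.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(c) for c in costs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator
```

Typical use would be `estimate_exponent(sizes, measure(my_algorithm, sizes, make_input))` with sizes such as 1000, 2000, 4000, 8000. Single timings are noisy, so repeating each measurement and keeping the minimum is a common refinement.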
Yesterday, I posted a question about the general concept of implementing the SVM primal form:
Support Vector Machine Primal Form Implementation
and "lejlot" helped me understand that what I am solving is a QP problem.
But I still don't understand how my objective function can be expressed as a QP problem
(http://en.wikipedia.org/wiki/Support_vector_machine#Primal_form)
I also don't understand how QP and the quasi-Newton method are related.
All I know is that a quasi-Newton method will SOLVE my QP problem, which is supposedly formulated from my objective function (I don't see the connection).
Can anyone walk me through this, please?
For SVMs, the goal is to find a classifier. This problem can be expressed in terms of a function that you are trying to minimize.
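For reference, here is how the hard-margin primal form lines up with a generic quadratic program. This assumes the hard-margin formulation with the sign convention w·x - b; the symbols z, P, q, G and h are generic QP notation introduced here, not something from the thread.

```latex
% Hard-margin SVM primal problem
\begin{aligned}
\min_{w,\,b}\quad & \tfrac{1}{2}\,\lVert w \rVert^2 \\
\text{s.t.}\quad  & y_i\,(w^\top x_i - b) \ge 1, \qquad i = 1,\dots,m
\end{aligned}

% Generic quadratic program
\begin{aligned}
\min_{z}\quad    & \tfrac{1}{2}\, z^\top P z + q^\top z \\
\text{s.t.}\quad & G z \le h
\end{aligned}

% Matching the two:
z = \begin{pmatrix} w \\ b \end{pmatrix}, \qquad
P = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}, \qquad
q = 0, \qquad
G_{i,:} = -\,y_i \begin{pmatrix} x_i^\top & -1 \end{pmatrix}, \qquad
h_i = -1
```

The objective is quadratic in z and the constraints are linear in z, which is what makes it a QP.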
Let's first consider the Newton iteration. It is a numerical method for finding a solution to a problem of the form F(x) = 0.
Instead of solving it analytically, we can solve it numerically with the following iteration:
x^(k+1) = x^k - DF(x^k)^-1 * F(x^k)
Here x^(k+1) is the (k+1)th iterate, x^k is the kth iterate, and DF(x^k)^-1 is the inverse of the Jacobian of F evaluated at x^k.
The update is repeated as long as we make progress in terms of step size (delta x), or until the function value is close enough to 0. The termination criteria can be chosen accordingly.
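In one dimension the Jacobian is just the derivative, so the iteration and both termination criteria fit in a few lines. This is a minimal sketch; the tolerance and iteration cap are arbitrary choices, and it assumes the derivative never becomes zero along the way.

```python
def newton_root(f, df, x0, tol=1e-12, max_iter=50):
    """Newton iteration for f(x) = 0 in one dimension."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:      # function value is close enough to 0
            return x
        step = fx / df(x)      # scalar analogue of DF(x)^-1 * F(x)
        x -= step
        if abs(step) < tol:    # no more progress in step size
            return x
    return x


# Example: solve x^2 - 2 = 0 starting from x = 1.
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```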
Now suppose we want to minimize a scalar function F, which means solving DF(x) = 0 (the n-dimensional analogue of f'(x) = 0). If we formulate the Newton iteration for that, we get
x^(k+1) = x^k - HF(x^k)^-1 * DF(x^k)
where HF(x^k)^-1 is the inverse of the Hessian matrix and DF(x^k) is the gradient of the function F. Note that we are doing n-dimensional analysis and cannot simply take a quotient; we have to use the inverse of the matrix.
Now we face some problems: in each step we have to calculate the Hessian matrix for the updated x, which is very inefficient. We also have to solve a system of linear equations, namely y = HF(x^k)^-1 * DF(x^k), or equivalently HF(x^k)*y = DF(x^k).
So instead of computing the Hessian in every iteration, we start with an initial guess of the Hessian (for example the identity matrix) and apply cheap low-rank updates after each iterate (rank-one in SR1, rank-two in BFGS). This is the quasi-Newton idea. For the exact formulas have a look here.
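The idea is easiest to see in one dimension, where the "Hessian" is a single number and the update reduces to the secant method applied to the derivative. This is a deliberately simplified sketch, not BFGS itself; `quasi_newton_1d` and its parameters are hypothetical.

```python
import math


def quasi_newton_1d(grad, x0, h0=1.0, tol=1e-10, max_iter=100):
    """Minimize a 1-D function given only its derivative `grad`.

    `h` is the running approximation of the second derivative. It starts
    at `h0` (the 1-D counterpart of the identity matrix) and is refreshed
    from gradient differences instead of being computed exactly.
    """
    x = x0
    g = grad(x)
    h = h0
    for _ in range(max_iter):
        if abs(g) < tol:
            break
        x_new = x - g / h            # Newton-like step using the approximation
        g_new = grad(x_new)
        dx, dg = x_new - x, g_new - g
        if dx != 0 and dg / dx > 0:
            h = dg / dx              # secant condition: h * dx = dg
        x, g = x_new, g_new
    return x


# Example: minimize exp(x) - 2x, whose derivative is exp(x) - 2.
x_min = quasi_newton_1d(lambda x: math.exp(x) - 2.0, x0=0.0)
```

In n dimensions `h` becomes a matrix, and methods such as BFGS update it so that the same secant condition holds while keeping it positive definite.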
So how does this link to SVMs?
When you look at the function you are trying to minimize, you can formulate a primal problem, which you can then reformulate as a dual Lagrangian problem that is convex and can be solved numerically. It is all well documented in the article, so I will not try to reproduce the formulas here in lower quality.
But the idea is the following: If you have a dual problem, you can solve it numerically. There are multiple solvers available. In the link you posted, they recommend coordinate descent, which solves the optimization problem for one coordinate at a time. Or you can use subgradient descent. Another method is to use L-BFGS. It is really well explained in this paper.
Another popular algorithm for solving problems like that is ADMM (alternating direction method of multipliers). To use ADMM you would have to reformulate the given problem into an equivalent problem that has the same solution but is in the format ADMM expects. For that I suggest reading Boyd's notes on ADMM.
In general: First, understand the function you are trying to minimize and then choose the numerical method that is most suited. In this case, subgradient descent and coordinate descent are most suited, as stated in the Wikipedia link.
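As an illustration of the subgradient option, here is a small pure-Python sketch for a linear SVM. It assumes the soft-margin primal objective (lam/2)*||w||^2 + mean(max(0, 1 - y*(w·x - b))), which is a common choice but not necessarily the exact form in the linked article; the function names, step size and epoch count are arbitrary.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the soft-margin primal objective."""
    dim = len(points[0])
    m = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) - b)
            if margin < 1:                # hinge loss is active for this point
                for j in range(dim):
                    grad_w[j] -= y * x[j] / m
                grad_b += y / m
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify x as +1 or -1 using the sign of w·x - b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) - b >= 0 else -1


# Tiny linearly separable toy data set (made up for this example).
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
```

A fixed step size keeps the sketch short; practical subgradient methods usually use a decreasing step size.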
I am working on multi-objective genetic algorithms. Say I have 4 objectives, 400 generations, and a population size of 100.
So how many function evaluations will there be?
I mean, is it 4*400*100 or 400*100?
If for each chromosome you evaluate 4 functions, then obviously you have a total of 4*400*100 evaluations.
What you might also want to consider is the running time of each of these evaluations: if 3 of the functions run in O(n) and the fourth runs in O(n^2), the total running time will be bounded by O(number_of_gens*population_size*n^2) and will be only mildly affected by the other three functions on large problem instances.
If you're asking about the number of evaluations as counted by MOO researchers (i.e., you want to know whether your algorithm is better than mine with the same number of evaluations), then the accepted answer is incorrect. In multi-objective optimization, we formally consider the problem not as optimizing k different functions, but as optimizing one vector-valued function.
It's one evaluation per individual, regardless of the dimensionality of the objective space.
As far as I know, the number of function evaluations of a genetic algorithm can be calculated with the following equation:
Number of function evaluations = size of the initial population + [number of new children (from crossover) + number of mutated children (from mutation)] * number of iterations.
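The counting conventions from the answers above can be written down directly. This is only bookkeeping; the function and parameter names are made up, and the offspring numbers in the usage below are invented for illustration.

```python
def evaluations_per_generation_model(generations, population_size,
                                     objectives=1, count_each_objective=False):
    """Every individual is evaluated once per generation.

    With count_each_objective=True each objective counts separately;
    otherwise the objectives are treated as one vector-valued function.
    """
    per_individual = objectives if count_each_objective else 1
    return generations * population_size * per_individual


def evaluations_offspring_model(initial_population, crossover_children,
                                mutated_children, iterations):
    """Initial population once, then only the new children each iteration."""
    return initial_population + (crossover_children + mutated_children) * iterations


per_objective = evaluations_per_generation_model(400, 100, objectives=4,
                                                 count_each_objective=True)
vector_valued = evaluations_per_generation_model(400, 100, objectives=4)
```

With the numbers from the question, `per_objective` is 4*400*100 and `vector_valued` is 400*100.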
I have a problem which is a variation of the partition problem which is NP-complete. This is an optimization problem, not a decision problem.
Problem: Partition a list of numbers into two subsets such that the difference of their sums is minimal, and find the two subsets. If n is even, both subsets should have size n/2; if n is odd, the sizes should be floor[n/2] and ceil[n/2].
Assuming that the pseudo-polynomial time DP algorithm is the best for an exact solution, how can it be modified to solve this? And what would be the best approximation algorithms for it?
Since you didn't specify which algorithm to use, I'll assume you use the one defined here:
http://www.cs.cornell.edu/~wdtseng/icpc/notes/dp3.pdf
Using this algorithm, you add a variable to track the best result, initialize it to N (the sum of all the numbers in the list, since you can always take one subset to be the empty set), and every time you update T (e.g. T[i] = true) you do something like bestRes = abs(i - N/2) < bestRes ? abs(i - N/2) : bestRes. At the end you return bestRes. This of course doesn't change the complexity of the algorithm.
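A sketch of this idea in Python follows, assuming non-negative integers. The first function follows the answer's approach (track the reachable subset sums and pick the one closest to half the total). The second adds a count dimension to respect the size constraint from the question; that extension is an assumption of this sketch, not something the answer states. Both return only the minimal difference, not the subsets.

```python
def min_partition_difference(nums):
    """Smallest |sum(A) - sum(B)| over all splits, with no size constraint."""
    total = sum(nums)
    reachable = [False] * (total + 1)   # reachable[s]: some subset sums to s
    reachable[0] = True
    for x in nums:
        # Go downward so that each number is used at most once.
        for s in range(total, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    return min(abs(total - 2 * s) for s in range(total + 1) if reachable[s])


def min_balanced_partition_difference(nums):
    """Same objective, but one subset must contain exactly floor(n/2) items."""
    n, total = len(nums), sum(nums)
    half = n // 2
    # sums_by_count[k]: all sums reachable using exactly k of the numbers
    sums_by_count = [set() for _ in range(half + 1)]
    sums_by_count[0].add(0)
    for x in nums:
        for k in range(half, 0, -1):    # downward so x is counted only once
            sums_by_count[k] |= {s + x for s in sums_by_count[k - 1]}
    return min(abs(total - 2 * s) for s in sums_by_count[half])
```

The balanced variant stays pseudo-polynomial, but it keeps one set of sums per subset size, so it uses roughly n/2 times as much memory as the unconstrained table.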
I've got no idea about your 2nd question.
In the J programming language,
-: i. 5
the expression above computes the halves of all integers in [0,4]. Now let's say I'd like to rewrite the -: function, just for the fun of it. My best guess so far was
]&%.2
but that doesn't seem to cut it. How do you do it?
%&2 NB. divide by two
0.5&* NB. multiply by one half
Note that ] % 2: would also work, but to ensure proper grammar you would either want to use it as the definition of a name or put the expression in parentheses.
I see you were using %., probably because you were dividing a matrix and thought you needed to do a "matrix divide".
The matrix divide and matrix inverse documented there are for matrix algebra, where you have a list of, well, essentially polynomials, and you want to transform them all at once in order to solve the equations. Matrix algebra is one of the things you can do really easily in J: there are built-ins for matrix divide and for inverting a matrix (as you have seen), and in the phrases section there are short phrases for all of the typical matrix transformations, such as taking the determinant.
But when you are simply dividing a vector by a scalar to get a vector, or you are dividing a matrix by the corresponding elements of another matrix, well, that is just the % division symbol.
If you want to understand this, look at Project Euler problem 101 (http://projecteuler.net/problem=101) and then google curve fitting on the Jsoftware.com site. Creating the matrices from the observations, together with the basic matrices shown there, allows you to solve ax^2+bx+c = y, where you have x and y and want to determine a, b, and c. Just remember to use extended arithmetic for everything: the resulting equations are very good but not perfect unless you do, and to solve the problem you need exact equations.
Just a thought, unless you want to play with Matrix Algebra, you might not care.