I'm wondering if there is a reasonable way of solving Multiple Knapsack using DP. I understand the 0-1 Knapsack Problem; the recurrence is quite straightforward: add the item / don't add the item.
dp[item][capacity] = max{
value[item] + dp[item - 1][capacity - weight[item]],
dp[item - 1][capacity]}
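For reference, this is roughly how I'd code that 0-1 recurrence bottom-up in Python (just a sketch; knapsack_01 and the values/weights/capacity parameters are names I'm making up here):

def knapsack_01(values, weights, capacity):
    # dp[i][c] = best value achievable using the first i items with capacity c
    n = len(values)
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # item i corresponds to values[i-1]/weights[i-1]
        for c in range(capacity + 1):
            dp[i][c] = dp[i - 1][c]    # don't add the item
            if weights[i - 1] <= c:    # add the item, if it fits
                dp[i][c] = max(dp[i][c],
                               values[i - 1] + dp[i - 1][c - weights[i - 1]])
    return dp[n][capacity]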
However, I cannot see how to get a recurrence equation for the Multiple Knapsack. Should I extend the recurrence to "add item to bag 1 / don't add item to bag 1 / add item to bag 2 / don't add item to bag 2" and so on? That does not seem like a good approach as the number of bags grows.
I need advice on a clustering model. There is a list of customers, each customer has a list of products, and each product is described by several words.
I want to cluster the customers into several groups by type of activity, that is, by general topics.
How would you turn this into vectors for a clustering model, for example for K-means?
My idea so far: turn every word into a fastText vector, select the top 100 words per customer by TF-IDF, and concatenate the 100-dimensional fastText vectors of those 100 words, which gives 10,000 columns. Is there something more economical to compute?
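To make that concrete, roughly what I have in mind (just a sketch; ft_model is assumed to be a 100-dimensional fastText-style model with a get_word_vector method, and customer_docs is one text blob per customer):

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def customer_vectors(customer_docs, ft_model, top_k=100):
    vec = TfidfVectorizer()
    tfidf = vec.fit_transform(customer_docs)          # one row per customer
    vocab = vec.get_feature_names_out()
    dim = ft_model.get_dimension()                    # 100 in my case
    rows = []
    for i in range(tfidf.shape[0]):
        weights = tfidf[i].toarray().ravel()
        # this customer's top_k words by TF-IDF weight
        top = [j for j in np.argsort(weights)[::-1][:top_k] if weights[j] > 0]
        parts = [ft_model.get_word_vector(vocab[j]) for j in top]
        parts += [np.zeros(dim)] * (top_k - len(parts))   # pad short customers
        rows.append(np.concatenate(parts))                # top_k * dim columns
    return np.vstack(rows)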
This is very related to recommender systems. I'd recommend reading about content-based vs collaborative-filtering recommender systems. An okay introduction is this blog post.
So, you can cluster based on many properties. Your proposed idea might work. If you have domain knowledge about the products, you could appeal to that before turning to word vectors. For example, let's say all the products are shelves. You could vectorize the products directly, say
vec = [
width,
depth,
height,
width * depth, # footprint/surface area is important on its own
width * depth * height,
color, # numeric representation
popularity, # possibly using a metric like sales
]
This is just an example, but it shows how you can directly vectorize your products without resorting to NLP.
If there is no way you can think of to directly vectorize your products, and you don't/can't use collaborative filtering (cold start problem, perhaps), then you might want to look at vectorizing the entire product description using the Universal Sentence Encoder, which outputs 512-dimensional vectors regardless of input size.
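A minimal sketch of that route, assuming the usual TF Hub module URL for USE v4 and an arbitrary cluster count (adjust both for your data):

import tensorflow_hub as hub
from sklearn.cluster import KMeans

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

# One description blob per customer (toy examples)
customer_texts = ["garden tools seeds fertilizer", "laptops phone chargers headsets"]
vectors = embed(customer_texts).numpy()          # shape: (n_customers, 512)
labels = KMeans(n_clusters=2).fit_predict(vectors)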
I have three arrays of points:
A=[[5,2],[1,0],[5,1]]
B=[[3,3],[5,3],[1,1]]
C=[[4,2],[9,0],[0,0]]
I need the most efficient way to find the three points (one for each array) that are closest to each other (within one pixel in each axis).
What I'm doing right now is taking one point as reference, let's say A[0], and cycling through all the B and C points looking for a solution. If A[0] gives me no result I move the reference to A[1], and so on. This approach has a huge problem: if I increase the number of points per array and/or the number of arrays, it sometimes takes far too long to converge, especially if the solution involves the last members of the arrays. So I'm wondering if there is any way to do this without using a reference, or any quicker way than just looping over all the elements.
The rules that I must follow are the following:
the final solution has to be made by only one element from each array like: S=[A[n],B[m],C[j]]
each selected element has to be within 1 pixel in X and Y from ALL the other members of the solution (so Xi-Xj<=1 and Yi-Yj<=1 for each member of the solution).
For example, in this simplified case one solution would be: S=[A[1],B[2],C[2]]
To clarify further: what I wrote above is just a simplified example to explain what I need. In my real case I don't know a priori the length of the lists nor the number of lists I have to work with; it could be A,B,C, or A,B,C,D,E... (each with a different number of points), etc. So I also need to find a way to make it as general as possible.
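Just to make my current approach concrete, here is roughly what I'm doing (a sketch; brute_force is just a name I'm using, and A, B, C are the lists above):

from itertools import combinations, product

def brute_force(*point_lists):
    # Try every combination of one point per list; keep the first one where
    # all pairs are within one pixel on both axes.
    for combo in product(*point_lists):
        if all(abs(p[0] - q[0]) <= 1 and abs(p[1] - q[1]) <= 1
               for p, q in combinations(combo, 2)):
            return list(combo)
    return None

print(brute_force(A, B, C))   # first valid combination found, e.g. [[5, 2], [5, 3], [4, 2]]

As you can see, the number of combinations explodes with more points and more lists.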
This requirement:
each selected element has to be within 1 pixel in X and Y from ALL the other members of the solution (so Xi-Xj<=1 and Yi-Yj<=1 for each member of the solution).
massively simplifies the problem, because it means that for any given (xi, yi), there are only nine possible choices of (xj, yj).
So I think the best approach is as follows:
Copy B and C into sets of tuples.
Iterate over A. For each point (xi, yi):
Iterate over the values of x from xi−1 to xi+1 and the values of y from yi−1 to yi+1. For each resulting point (xj, yj):
Check if (xj, yj) is in B. If so:
Iterate over the values of x from max(xi, xj)−1 to min(xi, xj)+1 and the values of y from max(yi, yj)−1 to min(yi, yj)+1. For each resulting point (xk, yk):
Check if (xk, yk) is in C. If so, we're done!
If we get to the end without having a match, that means there isn't one.
This requires roughly O(len(A) + len(B) + len(C)) time and O(len(B) + len(C)) extra space.
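A direct Python translation of those steps might look like this (a sketch; it returns the first valid triple it finds, or None):

def find_triple(A, B, C):
    B_set = {tuple(p) for p in B}
    C_set = {tuple(p) for p in C}
    for xi, yi in A:
        # only nine candidate positions in B can be within one pixel of (xi, yi)
        for xj in range(xi - 1, xi + 2):
            for yj in range(yi - 1, yi + 2):
                if (xj, yj) not in B_set:
                    continue
                # a point of C must be within one pixel of both points found so far
                for xk in range(max(xi, xj) - 1, min(xi, xj) + 2):
                    for yk in range(max(yi, yj) - 1, min(yi, yj) + 2):
                        if (xk, yk) in C_set:
                            return [xi, yi], [xj, yj], [xk, yk]
    return None

On the arrays from the question this returns ([5, 2], [5, 3], [4, 2]), which is another valid triple besides the one mentioned there.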
Edited to add (due to a follow-up question in the comments): if you have N lists instead of just 3, then instead of nesting N loops deep (which gives time exponential in N), you'll want to do something more like this:
Copy B, C, etc., into sets of tuples, as above.
Iterate over A. For each point (xi, yi):
Create a set containing (xi, yi) and its eight neighbors.
For each of the lists B, C, etc.:
For each element in the set of nine points, see if it's in the current list.
Update the set to remove any points that aren't in the current list and don't have any neighbors in the current list.
If the set still has at least one element, then — great, each list contained a point that's within one pixel of that element (with all of those points also being within one pixel of each other). So, we're done!
If we get to the end without having a match, that means there isn't one.
which is much more complicated to implement, but is linear in N instead of exponential in N.
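A sketch of that N-list version in Python (find_reference and the variable names are mine; the first list plays the role of A):

def find_reference(lists):
    others = [{tuple(p) for p in lst} for lst in lists[1:]]
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    for xi, yi in lists[0]:
        # the point and its eight neighbors
        candidates = {(xi + dx, yi + dy) for dx, dy in offsets}
        for current in others:
            # keep a candidate only if it, or one of its neighbors, is in the current list
            candidates = {(x, y) for x, y in candidates
                          if any((x + dx, y + dy) in current for dx, dy in offsets)}
            if not candidates:
                break
        else:
            # every list (including the first) contains a point within one pixel
            # of any surviving candidate
            return candidates.pop()
    return None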
Currently you are finding the solution with a brute-force algorithm which has O(n^2) complexity. If your lists contain 1000 items, your algorithm will need 1,000,000 iterations to run... (It's actually even O(n^3), as tobias_k pointed out.)
As you can see here: https://en.wikipedia.org/wiki/Closest_pair_of_points_problem, you could improve it by using a divide-and-conquer algorithm, which runs in O(n log n) time.
You should search for Delaunay triangulation and/or Voronoi diagram implementations.
NB: if you can use external libs, you should also consider taking a look at the scipy lib: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.Delaunay.html
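If you go that route, a minimal example of building the triangulation with scipy (just the entry point; pooling A, B and C into one array is my assumption, and turning the neighbour structure into the one-pixel matching rule is up to you):

import numpy as np
from scipy.spatial import Delaunay

points = np.array(A + B + C)    # A, B, C as in the question
tri = Delaunay(points)
print(tri.simplices)            # each row: indices of one triangle's vertices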
If I were to implement a low-pass filter on an array of digital samples with the following code, where original is the original array of data, filtered is the array of filtered data, and c is a certain constant:
filtered[0] = original[0];
for (int i = 1; i < original.length; i++) {
    filtered[i] = filtered[i-1] + c * (original[i] - filtered[i-1]);
}
Or a high-pass filter, with the third line replaced by:
filtered[i] = c * (filtered[i-1] + original[i] - original[i-1]);
What would be the relationship between c and the cutoff frequency of each?
Both of the filters are single-pole Infinite Impulse Response (IIR) filters.
IIR filters have analogues in the continuous-time domain (e.g. simple LC and RC circuits). Analysis usually starts with the desired transfer function H(ω), using the z-transform to convert to discrete time. A little rearrangement yields an equation that you can solve for your filter coefficients. [|H(ω)| is -3 dB at the cut-off frequency.]
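For the low-pass recurrence in the question, working that through (unless I've slipped in the algebra) gives cos(w_c) = 1 - c^2 / (2 * (1 - c)), where w_c = 2*pi*f_cutoff/f_sample, and you can invert that for c. A small sketch (function and parameter names are mine):

import math

def lowpass_c(f_cutoff, f_sample):
    # Solve |H(e^jw)|^2 = 1/2 for y[i] = y[i-1] + c*(x[i] - y[i-1]),
    # i.e. H(z) = c / (1 - (1 - c)*z^-1), which reduces to
    # c^2 + 2*K*c - 2*K = 0 with K = 1 - cos(2*pi*f_cutoff/f_sample).
    K = 1.0 - math.cos(2.0 * math.pi * f_cutoff / f_sample)
    return -K + math.sqrt(K * K + 2.0 * K)

For cut-off frequencies well below the sample rate this collapses to the familiar approximation c ≈ 2*pi*f_cutoff/f_sample.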
This material is typically taught in the first and second undergraduate years of Electronic engineering degrees, so there will be loads of online, and free, courseware to be had. You'll need the accompanying pure mathematics courses.
Lots of practical filter designs turn out to be analytically insoluble (or at least difficult); a common way to proceed is solving numerically.
MATLAB is the tool of choice for many. NI LabVIEW also has a filter designer. Neither is cheap.
Single-pole filters are easy to solve. This may help. There are also various online filter solvers if you want to design more complex - or higher-order - filters.
According to the wiki, it takes (N-1)! calculations to find the best tour of N cities by brute force. I found a better way to do it, but I can't do the math to work out just how much I improved it. I can tell you that on my home PC I have been able to solve a 20-city map in less than 1 hour. 20! = 2.43290200e+18. Here is what I did:
When searching for a route over N cities (let's call them City(1), City(2), City(3)... City(N)) with the brute-force algorithm, you will first test this order: City(1), City(2), City(3), City(4)... City(N), and some time later this one: City(1), City(3), City(2), City(4)... City(N). I claim that the second full calculation is unnecessary: if I calculate the shortest route for City(4) ... City(N) just once, I can reuse it in the second calculation and determine which route is better.
Using this trick I can reduce the number of calculations I do at the K-th city by (N - K), which is the number of options for choosing which city comes first, multiplied by (N - K - 1)!, which is the number of ways to order the remaining cities, minus the one time that I still need to perform the full calculation. So the saving is roughly (N - K)!, and you need to sum it over all K from K = 3 to K = N - 2.
This is as far as I got (which is not very far)... I hope you are able to help me calculate this.
Storing and reusing results you've already calculated is the basic idea behind dynamic programming. For the TSP there are dynamic programming algorithms that run in O(N^2 * 2^N) time, which will yield quicker results than your algorithm (you'll be able to solve problems with 25 vertices within minutes...).
See: Dynamic Programming for the TSP
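For reference, a compact sketch of that dynamic program (Held-Karp) in Python; dist is assumed to be an n-by-n distance matrix and the tour starts and ends at city 0:

from itertools import combinations

def held_karp(dist):
    n = len(dist)
    # dp[(mask, j)] = cheapest path that starts at 0, visits exactly the cities
    # in mask (mask always contains 0 and j), and ends at j
    dp = {(1 | 1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(3, n + 1):
        for subset in combinations(range(1, n), size - 1):
            mask = 1 | sum(1 << j for j in subset)
            for j in subset:
                prev = mask ^ (1 << j)
                dp[(mask, j)] = min(dp[(prev, k)] + dist[k][j]
                                    for k in subset if k != j)
    full = (1 << n) - 1
    return min(dp[(full, j)] + dist[j][0] for j in range(1, n))

Caching results per (set of visited cities, last city) is the systematic version of the "calculate the sub-route once and reuse it" idea from the question.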
I have a problem which is a variation of the partition problem which is NP-complete. This is an optimization problem, not a decision problem.
Problem: Partition a list of numbers into two subsets such that their difference of sums is minimal, and find the two subsets. If n is even, the subset sizes should both be n/2; if n is odd, they should be floor(n/2) and ceil(n/2).
Assuming that the pseudo-polynomial-time DP algorithm is the best choice for an exact solution, how can it be modified to solve this? And what would be the best approximation algorithms for it?
Since you didn't specify which algorithm to use, I'll assume you mean the one defined here:
http://www.cs.cornell.edu/~wdtseng/icpc/notes/dp3.pdf
Then, using this algorithm, you add a variable that tracks the best result, initialize it to N (the sum of all the numbers in the list, since you can always take one subset to be the empty set), and every time you set an entry of T (e.g. T[i] = true) you update it with something like bestRes = abs(i - N/2) < bestRes ? abs(i - N/2) : bestRes. At the end you return bestRes; since a subset summing to i leaves a difference of |N - 2i| = 2*|i - N/2|, the minimum difference of sums is 2 * bestRes. This of course doesn't change the complexity of the algorithm.
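A small sketch of that in Python, tracking the actual difference of sums directly (it assumes non-negative integers and, like the description above, ignores the equal-size requirement from the question):

def min_partition_difference(nums):
    total = sum(nums)
    reachable = [False] * (total + 1)    # reachable[s]: some subset sums to s
    reachable[0] = True
    best = total                         # one side empty, the other gets everything
    for x in nums:
        for s in range(total, x - 1, -1):
            if reachable[s - x] and not reachable[s]:
                reachable[s] = True
                best = min(best, abs(total - 2 * s))
    return best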
I've got no idea about your 2nd question.