DP sum all node combinations - dynamic-programming

Given a tree and a value for each node, how can we get the total sum for each possible path?
A
|
B
In the above tree there will be 4 such paths: A, B, A-B, B-A.
Each node will have a value assign to it: A: 3, B: 2
The expected output should be: 3+2+(3+2)+(2+3)
A naive solution for this problem is to do a DFS from source to target(for each possible combination) and get the sum by adding the DFS result, but I believe this problem can be solved more efficiently with DP, but don't really have so much experience with DP.

There is a nice solution for this one which doesn't involve (much) dynamic programming:
You can compute how often each node appears in paths. Lets just take an example with n=5 nodes:
A
|
B
/ \
C D
|
E
The computation for the leaves A, C and E is very easy. They only appear in paths that start or end with this node. There are 2n - 1 = 9 paths (-1 because the path A starts and end ends with A and is therefore counted twice in 2*n).
For the inner nodes it gets a bit more tricky. Lets look at the node D first. Of course D appears in all paths, that start or end in D. So we again have 2n - 1 = 9 paths. But now it can also be the case that D appears somewhere in the middle of a path. E.g. in the path A-B-D-E. This can only happen, if the path starts somewhere in the subtree ABC and ends in the subtree E or opposite. Combinatorics tells us, that there are size(ABC)*size(E) + size(E)*size(ABC) = 2*size(ABC)*size(E) = 2*3*1 = 6 many. So D appears in exactly 9 + 6 = 15 paths.
For node B it still gets a bit more tricky. There are again 2n - 1 = 9 paths starting or ending from B (This is true for each node). But again B can appear somewhere in the middle of a path. For this to happen a path must start in one of the subtrees A, C or DE and end in a different one. So there are 2*size(A)*size(C) + 2*size(C)*size(DE) + 2*size(DE)*size(A) = 2*1*1 + 2*1*2 + 2*2*1 = 2 + 4 + 4 = 10 possible paths. With a little bit of math you can see that this is identical to (n-1)^2 - size(A)^2 - size(C)^2 - size(DE)^2. So in total the node appears in 9 + 10 = 19 paths.
And the value you want to compute is 9*value(A) + 19*value(B) + 9*value(C) + 15*value(D) + 9*value(E).
With one depth-first-search you can compute the sizes of all subtrees with dynamic programming and compute the number of appearances for each node using the two formulas.

Related

Minimum Path Sum: is it possible to solve it using DP given negative value(s) in the grid?

Question description taken from leetcode: https://leetcode.com/problems/minimum-path-sum/
Given a m x n grid filled with non-negative numbers,
find a path from top left to bottom right, which minimizes the sum of all numbers along its path.
Note: You can only move either down or right at any point in time.
One Example:
Input: grid = [[1,3,1],
[1,5,1],
[4,2,1]]
Output: 7
Explanation: Because the path 1 → 3 → 1 → 1 → 1 minimizes the sum.
I tried to set some negative values in the grid, and it seems the transformation function still works.
dp[i][j] = min(dp[i-1][j], dp[i][j-1]) + grid[i][j]
Can anyone come up with a case to prove that DP won't work for this question if there is/are negative value(s) in the grid?
If surely it is mentioned that user can only move down or right then your code perfectly works for negative numbers as well
But if it is given that user can move in any direction and at last should reach the destination then
The above code works only for positive elements
consider the below example
grid = [ [1 , 10000,-700,10],
[2 , 4 ,9 ,6 ] ]
in this the min path is 1 -> 2 -> 4 -> 9 -> -700 -> 9 ->6

how to find all numbers that there distance from given point are less or equall to integer n

given a set of points D and some number K I want to find all numbers that are in D such that the distance between K and any found number is less or equal to integer N?
Example:
suppose we have D={5,9,0,6,7} and K=8 and N=1 then the result should be {9,7}
I was thinking to use k-d tree or VP tree but both as I understand (correct me if I am wrong please) find nearest neighbors and do not care about N in my example.
To summarize all the comment:
Solve this problem as brute force will take O(n) time as iterate on each element in D and check if its distance from k is less then n.
You have big data set but a lot of queries it is better to do pre-processing on D (with O(nlogn) and the you can get the answer in O(logn) -> by sorting D as pre-processes (in O(nlogn) as dimple sort of array.
Now, on given query search for k - notice binary search will stop if the number missing but he do stop at the closest value. From that index start spread to both side of D and for each check if still in n range. Notice the spreading in allow as it is include of O(|output|).
In your example: sorted D yield: [0,5,6,7,9]. Try finding k=8 will give false but index 3 or 4 (depended on the implementation). Let say is return index 3. for 3 till last index check if arr[i] - k < n if so print - if bigger stop. For the other side check k - arr[i] < n - if so print and if bigger stop -> this will give you 7,9
Hope that helps!

Minimum no of operations required to create String A By appending subsequence of String B to a empty string C

You have given two strings A and B. You have some empty string C. In one operation You can remove any no of characters (from anywhere) from String B and append it to string C. Minimum no of operations required to convert String C to String A.
e.g if
A is "ABCDE" and
B is "ABDEC" then
In 1st operation you will choose subsequence ABC from B and in 2nd operation DE.
So two operations are required.
if
A is "ABCDE"
B is "EDCBA" then
operations required 5.
Linear complexity expected O(n)
Just use a greedy algorithm.
1 - Let i = 0
2 - Let j = 0
3 - Search for the first A[i] in B after j
4 - If it exists, let j be its index in B, remove it from B, append it to C, increment i, and repeat from 3
5 - If it doesn't exist, repeat from 2
Each time you get to 5 corresponds to one operation.
Assuming all the characters of A (and B) are different, then here is a solution with linear complexity. You need a hashmap or something similar, as well as an array of indices, Y, of equal length to A and B.
1 - Put each character of A in the hashmap as key, with its index as value.
2 - Look up each character of B in the hashmap to get the value i, and put its index into Y at the position i.
3 - Go through Y counting the number of times that Y[i] < Y[i-1]. That's your number of operations.

Best way to match 4 million rows of data against each other and sort results by similarity?

We use libpuzzle ( http://www.pureftpd.org/project/libpuzzle/doc ) to compare 4 million images against each other for similarity.
It works quite well.
But rather then doing a image vs image compare using the libpuzzle functions, there is another method of comparing the images.
Here is some quick background:
Libpuzzle creates a rather small (544 bytes) hash of any given image. This hash can in turn be used to compare against other hashes using libpuzzles functions. There are a few APIs... PHP, C, etc etc... We are using the PHP API.
The other method of comparing the images is by creating vectors from the given hash, here is a paste from the docs:
Cut the vector in fixed-length words. For instance, let's consider the
following vector:
[ a b c d e f g h i j k l m n o p q r s t u v w x y z ]
With a word length (K) of 10, you can get the following words:
[ a b c d e f g h i j ] found at position 0
[ b c d e f g h i j k ] found at position 1
[ c d e f g h i j k l ] found at position 2
etc. until position N-1
Then, index your vector with a compound index of (word + position).
Even with millions of images, K = 10 and N = 100 should be enough to
have very little entries sharing the same index.
So, we have the vector method working. Its actually works a bit better then the image vs image compare since when we do the image vs image compare, we use other data to reduce our sample size. Its a bit irrelevant and application specific what other data we use to reduce the sample size, but with the vector method... we would not have to do so, we could do a real test of each of the 4 million hashes against each other.
The issue we have is as follows:
With 4 million images, 100 vectors per image, this becomes 400 million rows. We have found MySQL tends to choke after about 60000 images (60000 x 100 = 6 million rows).
The query we use is as follows:
SELECT isw.itemid, COUNT(isw.word) as strength
FROM vectors isw
JOIN vectors isw_search ON isw.word = isw_search.word
WHERE isw_search.itemid = {ITEM ID TO COMPARE AGAINST ALL OTHER ENTRIES}
GROUP BY isw.itemid;
As mentioned, even with proper indexes, the above is quite slow when it comes to 400 million rows.
So, can anyone suggest any other technologies / algos to test these for similarity?
We are willing to give anything a shot.
Some things worth mentioning:
Hashes are binary.
Hashes are always the same length, 544 bytes.
The best we have been able to come up with is:
Convert image hash from binary to ascii.
Create vectors.
Create a string as follows: VECTOR1 VECTOR2 VECTOR3 etc etc.
Search using sphinx.
We have not yet tried the above, but this should probably yield a bit better results than doing the mysql query.
Any ideas? As mentioned, we are willing to install any new service (postgresql? hadoop?).
Final note, an outline of exactly how this vector + compare method works can be found in question Libpuzzle Indexing millions of pictures?. We are in essence using the exact method provided by Jason (currently the last answer, awarded 200+ so points).
Don't do this in a database, just use a simple file. Below i have shown a file with some of the words from the two vectores [abcdefghijklmnopqrst] (image 1) and [xxcdefghijklxxxxxxxx] (image 2)
<index> <image>
0abcdefghij 1
1bcdefghijk 1
2cdefghijkl 1
3defghijklm 1
4efghijklmn 1
...
...
0xxcdefghij 2
1xcdefghijk 2
2cdefghijkl 2
3defghijklx 2
4efghijklxx 2
...
Now sort the file:
<index> <image>
0abcdefghij 1
0xxcdefghij 2
1bcdefghijk 1
1xcdefghijk 2
2cdefghijkl 1
2cdefghijkl 2 <= the index is repeated, those we have a match
3defghijklm 1
3defghijklx 2
4efghijklmn 1
4efghijklxx 2
When the file have been sorted it's easy to find the records that have the same index. Write a small program or something that can run through the sorted list and find the duplicates.
i have opted to 'answer my own' question as we have found a solution that works quite well.
in the initial question, i mentioned we were thinking of doing this via sphinx search.
well, we went ahead and did it and the results are MUCH better then doing this via mysql.
so, in essence the process looks like this:
a) generate hash from image.
b) 'vectorize' this hash into 100 parts.
c) binhex (binary to hex) each of these vectors since they are in binary format.
d) store in sphinx search like so:
itemid | 0_vector0 1_vector1 2_vec... etc
e) search using sphinx search.
initially... once we had this sphinxbase full of 4 million records, it would still take about 1 second per search.
we then enabled distributed indexing for this sphinxbase, on 8 cores, and now are about to query about 10+ searches per second. this is good enough for us.
one final step would be to further distribute this sphinxbase over the multiple servers we have, further utilizing the unused cpu cycles we have available.
but for the time being, good enough. we add about 1000-2000 'items' per day, so searching thru 'just the new ones' will happen quite quickly... after we do the initial scan.

Algorithm to form a given pattern using some strings

Given are 6 strings of any length. The words are to be arranged in the pattern shown below. They can be arranged either vertically or horizontally.
--------
| |
| |
| |
---------------
| |
| |
| |
--------
The pattern need not to be symmetric and there need to be two empty areas as shown.
For example:
Given strings
PQF
DCC
ACTF
CKTYCA
PGYVQP
DWTP
The pattern can be
DCC...
W.K...
T.T...
PGYVQP
..C..Q
..ACTF
where dot represent empty areas.
The other example is
RVE
LAPAHFUIK
BIRRE
KZGLPFQR
LLHU
UUZZSQHILWB
Pattern is
LLHU....
A..U....
P..Z....
A..Z....
H..S....
F..Q....
U..H....
I..I....
KZGLPFQR
...W...V
...BIRRE
If multiple patterns are possible then pattern with lexicographically smallest first line, then second line and so on is to be formed. What algorithm can be used to solve this?
Find strings which suits to this constraint:
strlen(a) + strlen(b) - 1 = strlen(c)
strlen(d) + strlen(e) - 1 = strlen(f)
After that try every possible situation if they are valid. For example;
aaa.....
d.f.....
d.f.....
d.f.....
cccccccc
..f....e
..f....e
..bbbbbb
There will be 2*2*2 = 8 different situation.
There are a number of heuristics that you can apply, but before that, let's go over some properties of the puzzle.
+aa+
c f
+ee+eee+
f d
+bbb+
Let us call the length of the string with the same character as appeared in the diagram above. We have:
a + b - 1 = e
c + d - 1 = f
I will refer to the 2 strings for the cross in the middle as middle strings.
We also infer that the length of the string cannot be less than 2. Therefore, we can infer:
e > a, e > b
f > c, f > d
From this, we know that the 2 shortest strings cannot be middle strings, due to the inequality above.
The 3 largest strings cannot be equal also, since after choosing any of 3 string as middle string, we are left with 2 largest strings that are equal, and it is impossible according to the inequality above.
The puzzle is only tricky when the lengths are regular. When the lengths are irregular, you can do direct mapping from length to position.
If we have the 2 largest strings being equal, due to the inequality above, they are the 2 middle strings. The worst case for this one is a "regular" puzzle, where the length a, b, c, d are equal.
If the 2 largest strings are unequal, the largest string's position can be determined immediately (since its length is unique in the puzzle) - as one of the middle string. In worst case, there can be 3 candidates for the other middle string - just brute force and check all of them.
Algorithm:
Try to map unique length string to the position.
Brute force the 2 strings in the middle (taken into consideration what I mentioned above), and brute force to fill in the rest.
Even with stupid brute force, there are only 6! = 720 cases, if the string can only go from left to right, up to down (no reverse). There will be 46080 cases (* 2^6) if the string is allowed to be in any direction.

Resources