Directed graph linear algorithm - dynamic-programming

I would like to know the best way to calculate the length of the shortest path between vertex s and every other vertex of the graph in linear time using dynamic programming.
The graph is weighted DAG.

What you can hope for is an algorithm that is linear in the number of edges and vertices, i.e. O(|E| + |V|), and that also works correctly in the presence of negative weights.
This is done by first computing a topological order and then 'exploring' the graph in the order given by this topological order.
Some notation: let's call d'(s,v) the shortest distance from s to v and d(u,v) the length/weight of the arc from u to v (if it exists).
Then, for a node v that is currently being visited, the shortest path from s to v is the minimum of d'(s,u)+d(u,v) for each in-neighbour u of v.
In principle, this is very similar to Dijkstra's algorithm except that we already know in which order to traverse the vertices.
The topological sorting ensures that all in-neighbours of v have already been visited and will not be updated again. So, whenever a node has been visited, the distance it is assigned is the correct shortest path from s to v. Therefore, you end up with a shortest s-v-path for each v.
A full description and implementation can be found here, which links to these lecture notes. I'm not sure where the algorithmic idea for this DAG algorithm was originally published in the literature.
This algorithm works for DAGs, even in the presence of negative weights/distances.
While a typical implementation of this algorithm will most likely not be done using dynamic programming explicitly, it can still be interpreted as such since the problem of finding a shortest path to a node v is computed using the shortest paths to the in-neighbours of v.
For further discussion on if/how this type of algorithm counts as dynamic programming, let me refer you to this question.
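As a rough illustration of the procedure described above, here is a minimal Python sketch. The function name `dag_shortest_paths`, the `(u, v, w)` edge-list format and the use of Kahn's algorithm for the topological order are assumptions made for this example, not taken from the linked implementation. It relaxes the out-arcs of each vertex in topological order, which yields the same distances as taking the minimum over in-neighbours.

```python
from collections import deque
import math

def dag_shortest_paths(n, edges, s):
    """Shortest distances from s in a weighted DAG with vertices 0..n-1.

    edges is a list of (u, v, w) arcs; negative weights are allowed.
    """
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1

    # Topological order via Kahn's algorithm: O(|V| + |E|).
    queue = deque(v for v in range(n) if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    # Visit vertices in topological order; when u is visited, dist[u] is final.
    dist = [math.inf] * n
    dist[s] = 0
    for u in order:
        if dist[u] == math.inf:
            continue  # u is not reachable from s
        for v, w in adj[u]:
            dist[v] = min(dist[v], dist[u] + w)
    return dist

edges = [(0, 1, 5), (0, 2, 3), (2, 1, -4), (1, 3, 2)]
print(dag_shortest_paths(5, edges, 0))
```

Vertices that cannot be reached from s keep the distance `math.inf`.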

It's possible that what you're looking for is the Bellman-Ford algorithm, which runs in O(|V||E|) time (not really linear).
Not sure if some witty dynamic-programming approach could improve on that though.

As hauron said, Bellman-Ford will give you what you're looking for in time O(|V||E|). This works even if your graph contains negative weighted edges, and Bellman-Ford uses dynamic programming at its core.
However, I must add that if your weights are non-negative, you can do Dijkstra from your vertex s in time O(|E| log |E|).

Initialize d[s] = 0.
For every vertex, calculate:
d[v] = min {d[u] + w(u,v) | (u,v) is an edge}
d[v] = ∞ if v has no incoming edges.
(The algorithm always halts since the graph is acyclic.)
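The recurrence above can be written almost literally as a memoized recursion. This is only a sketch: the name `shortest_from` and the `incoming` dictionary (vertex to list of `(u, w)` in-edges) are hypothetical, chosen for this example. Each d[v] is computed once, so the total work is O(|V| + |E|), but deep graphs may hit Python's recursion limit.

```python
from functools import lru_cache
import math

def shortest_from(s, incoming):
    """Return a function d(v) giving the shortest distance from s to v in a DAG.

    incoming maps each vertex v to a list of (u, w) pairs, one per edge (u, v).
    """
    @lru_cache(maxsize=None)
    def d(v):
        if v == s:
            return 0
        # The minimum over an empty set of in-edges is infinity.
        return min((d(u) + w for u, w in incoming.get(v, [])), default=math.inf)
    return d

incoming = {1: [(0, 5), (2, -4)], 2: [(0, 3)], 3: [(1, 2)]}
d = shortest_from(0, incoming)
print(d(3))
```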

Related

Dynamic programming efficient network

Hello, I have a dynamic-programming-related question. How can I compute the shortest path, in hops, from a starting node to an ending node, with the constraint that the vertices and edges on it have a value equal to or higher than a predefined value? For example, the highest rate of data in a network. Could someone provide some pseudo-code or any thoughts? Thank you in advance.
Build a new graph from the given network that does not contain the vertices and edges whose value is less than the predefined value. Then, from the start node in the new graph, run an algorithm that finds the shortest path to the end node, such as BFS, Dijkstra (greedy, not dynamic programming), Bellman-Ford, etc.
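A minimal sketch of this idea, assuming the network is given as an adjacency dictionary plus two dictionaries of values (all names here, such as `min_hops`, `node_value` and `edge_value`, are made up for the example). Instead of materialising the new graph, it skips the disallowed vertices and edges while running BFS, which has the same effect.

```python
from collections import deque

def min_hops(adj, node_value, edge_value, start, goal, threshold):
    """Fewest hops from start to goal using only vertices and edges
    whose value is at least threshold; None if no such path exists."""
    if node_value[start] < threshold or node_value[goal] < threshold:
        return None
    hops = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            return hops[u]
        for v in adj.get(u, ()):
            if v in hops:
                continue  # already reached with at most as many hops
            if node_value[v] < threshold or edge_value[(u, v)] < threshold:
                continue  # filtered out of the "new graph"
            hops[v] = hops[u] + 1
            queue.append(v)
    return None

adj = {"a": ["d", "b"], "b": ["d"], "d": []}
node_value = {"a": 10, "b": 10, "d": 10}
edge_value = {("a", "d"): 1, ("a", "b"): 5, ("b", "d"): 5}
print(min_hops(adj, node_value, edge_value, "a", "d", 5))
```

BFS is enough here because every hop counts the same; Dijkstra or Bellman-Ford would only be needed if the hops had different costs.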

LSH implementation in python 3 with Euclidean distance and seeing all neighbors in LSHForest

I am looking for an efficient implementation of LSH in python 3 that uses Euclidean distance.
There is the "in-python" LSHForest implementation, but it uses cosine distances.
Also, even using this implementation, I didn't find a way to see the content of each of the baskets, e.g., if using LSH for clustering - it only returns a certain number of approximate neighbors within a certain radius. But if I want to see all neighbors, I don't see how it can be done (I do not want to use an arbitrary radius of search and I am really not sure what is the meaning of a very large or infinite radius using this implementation).
Will appreciate any insight. Many thanks.
For software recommendations, please ask here: Software Recommendations.
For how this works, first read my answer and then assume that you ask the package (I haven't used it) for a big k (k should be the number of neighbors that the software returns) within a big radius r. That should return many neighbors; set k = N, where N is the number of points in your dataset, and you will get all the neighbors.
If you want to see all the neighbors within a certain bucket, then you have to investigate how many points a bucket can contain and set k to that number.

Optimal substructure in Dynamic Programming

I have been trying to understand Dynamic Programming, and what I understood is that there are two parts of DP.
Optimal substructures
Overlapping subproblems
I understand the second one, but I am not able to understand the first one.
Optimal substructure means that any optimal solution to a problem of size n is based on an optimal solution to the same problem when considering n' < n elements.
That means that, when building your solution for a problem of size n, you split the problem into smaller problems, one of them of size n'. Based on the optimal substructure property, you now need to consider only the optimal solution to n', not all possible solutions to it.
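An example is the knapsack problem:
D(i,k) = max { D(i-1,k), D(i-1,k-weight(i)) + cost(i) }
Here D(i,k) is the best total achievable with the first i items and capacity k, and cost(i) is what item i contributes when it is taken. The optimal substructure assumption is that D(i,k) only needs to check optimal solutions to D(i-1,k) and D(i-1,k-weight(i)); non-optimal solutions are not considered.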
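To make the recurrence concrete, here is a small sketch of the usual maximising form of the 0/1 knapsack DP, in which D(i,k) is the best total value using the first i items with capacity k. The function name `knapsack` and the argument layout are assumptions for this example; weights and capacity are taken to be non-negative integers.

```python
def knapsack(weights, values, capacity):
    """Best total value of a subset of items whose total weight <= capacity."""
    n = len(weights)
    # D[i][k]: best value using only the first i items with capacity k.
    D = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for k in range(capacity + 1):
            D[i][k] = D[i - 1][k]  # skip item i
            if weights[i - 1] <= k:  # take item i, if it fits
                take = D[i - 1][k - weights[i - 1]] + values[i - 1]
                D[i][k] = max(D[i][k], take)
    return D[n][capacity]

print(knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7))
```

Each cell only reads the optimal values of two smaller subproblems, which is exactly the optimal substructure property.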
An example where this does not hold is the Vertex Cover problem.
If you have a graph G = (V, E), assume you have an optimal solution to a subgraph G' = (V', E ∩ (V' × V')) such that V' ⊆ V. The optimal solution for G does not have to consist of the optimal solution for G'.
Another good example is the difference between finding a shortest simple path between every pair of vertices in a graph, and finding a longest simple path between each of these pairs. ("Simple" means that no vertex on a path can be visited twice; if we don't put this constraint in for the "longest" version of the problem, then we can get infinitely long paths whenever the graph contains a cycle.)
The Floyd-Warshall algorithm can compute the answer to the first problem efficiently by exploiting the fact that, if a path from u to v is shortest-possible, then for any vertex x on this path, it must be that the subpath from u to x, and the subpath from x to v, are also shortest-possible. (Suppose to the contrary that there was a vertex x on the "shortest possible" path from u to v such that the subpath from u to x was not shortest-possible: then it's possible to find some other, shorter path from u to x -- and this can also be used to make the overall path from u to v shorter by the same amount, so the original u-to-v path could not have been shortest-possible after all.) That means that when looking for the shortest u-to-v path, the algorithm only needs to consider building it out of shortest-possible (that is, optimal) subpaths between other pairs of vertices -- not out of the much larger number of all such subpaths.
In contrast, consider the problem of determining the longest simple path between any two vertices in a graph. Is it likewise true that, if the longest path from u to v goes through some vertex x, then the subpaths from u to x, and from x to v, are necessarily also longest-possible? Unfortunately not: It may well be that the longest path from u to x uses some vertices in its interior that are also needed by the longest path from x to v, meaning that we can't simply glue these two paths together to get a longest simple path from u to v.
As a general rule, we can always "get around" this problem by choosing to use a sufficiently detailed definition of the subproblem to be solved: In this case, instead of asking for the longest path between two given vertices u and v, we can ask for the longest path between two given vertices u and v which uses only vertices from a given set S. Where previously we could build a function shortest(u, v) that takes two parameters, we must now build a function longest(u, v, S) that takes three; the overall longest path between 2 vertices u and v could then be computed using longest(u, v, V), where V is the entire vertex set of the graph. With this new definition, it's now once again possible to produce optimal solutions by combining only optimal solutions to subproblems, because we can ensure that we only try gluing together paths that result from subproblems whose S sets are disjoint. We can now correctly determine the longest path from u to v that uses only vertices in S, namely longest(u, v, S), by calculating the maximum, over all vertices x in S, and all ways of partitioning S-{x} into two subsets A and B, of longest(u, x, A) + longest(x, v, B).
Unfortunately, there are now an exponential number of subproblems to be solved, because a set of n vertices can be partitioned in 2^(n-1) different ways. (The algorithm just described is not the most efficient possible DP for this problem, but even the most efficient known DP still has this exponential factor in its running time.) The challenge in designing a DP algorithm is always to find a way to define subproblems that results in few enough different subproblems (ideally, only polynomially many) while still maintaining the two properties of overlapping subproblems and optimal substructure.
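For illustration, here is a sketch of a subset-based DP for the longest simple path. Note that it is not the partition-based longest(u, v, S) recurrence described above but a different, bitmask-style formulation (extend a path one vertex at a time and remember which vertices it has used), which is still exponential. The name `longest_simple_path`, the directed `weight` dictionary and the vertex numbering 0..n-1 are assumptions for this example.

```python
from functools import lru_cache

def longest_simple_path(n, weight, u, v):
    """Length of the longest simple path from u to v, or None if there is none.

    weight maps (a, b) to the length of the directed edge a -> b.
    """
    @lru_cache(maxsize=None)
    def longest(end, used):
        # Longest simple path from u to `end` that visits exactly the
        # vertices in the bitmask `used`; None if no such path exists.
        if end == u:
            return 0 if used == 1 << u else None
        rest = used & ~(1 << end)
        best = None
        for p in range(n):
            if rest >> p & 1 and (p, end) in weight:
                sub = longest(p, rest)
                if sub is not None:
                    cand = sub + weight[(p, end)]
                    if best is None or cand > best:
                        best = cand
        return best

    # Try every vertex set that contains both endpoints.
    results = [longest(v, used) for used in range(1 << n)
               if used >> u & 1 and used >> v & 1]
    return max((r for r in results if r is not None), default=None)

weight = {(0, 1): 1, (1, 2): 1, (0, 2): 1, (1, 0): 1}
print(longest_simple_path(3, weight, 0, 2))
```

The subproblem is keyed by a vertex set, so there are on the order of n·2^n subproblems; this is only practical for small graphs.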
In simple words: the principle of optimality states that, while solving an optimization problem, one has to solve subproblems, and the solutions of those subproblems are part of the solution of the optimization problem. If a problem can be solved from optimal solutions to its subproblems, it has optimal substructure.
Example: say that in a graph the source vertex is s and the destination is d.
We have to find shortest(s,d).
graph is
a g
b e h d
s c f i
d
length(s,a)=14
length(s,b)=10
length(s,c)=1
length(s,d)=6
length(c,b)=1
Note: there is no direct edge for (s,e) or (s,f).
To find an algorithm for this, suppose we write a priority-queue structure that traverses the graph in order of least total PATH_LENGTH.
We assign each vertex a PATH_LENGTH from the source vertex.
We keep assigning a new path_length to an adjacent vertex whenever the new path_length < the existing path_length.
Example: Len(s,b) > Len(s,c)+Len(c,b);
reset len(s,b)=2;
The nodes adjacent to s build up minimal path lengths irrespective of the destination node, because they form the substructure that leads to the solution.
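A small sketch of this priority-queue idea (Dijkstra's algorithm), using only the five edge lengths listed above. Treating those edges as directed from the first vertex to the second is an assumption, the remaining edges of the figure are not included, and the function name `dijkstra` and the adjacency format are chosen for this example.

```python
import heapq

def dijkstra(adj, source):
    """Shortest path lengths from source; adj maps u to a list of (v, length)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter path to u was found later
        for v, w in adj.get(u, ()):
            nd = d + w
            # Reassign the path length only if the new one is smaller.
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

adj = {
    "s": [("a", 14), ("b", 10), ("c", 1), ("d", 6)],
    "c": [("b", 1)],
}
print(dijkstra(adj, "s"))
```

With these edges the path length of b drops from 10 to 2 once c has been processed, since 1 + 1 < 10.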

Dijkstra on 2D grid?

There are N points on a 2D grid (x,y). I need to find the shortest path, from point A to point B, but I can only travel from one point to another and I can't travel between two points if the distance between them is farther than a distance D. I thought it might be solved by using some kind of modified Dijkstra's algorithm, but I'm not sure how, because I've never implemented it before, just studied it on Wiki.
Well, Dijkstra finds shortest paths in graphs. So just consider the grid points to be nodes in a graph with edges between each node S and all other nodes T such that dist(S, T) <= D. You don't have to actually construct the graph because the edges are easily determined as needed by Dijkstra. Just check all nodes in a square around S with radius D. An S-T edge exists iff (Sx - Tx)^2 + (Sy - Ty)^2 <= D^2.
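A sketch of Dijkstra with edges determined on demand, assuming the points are `(x, y)` tuples, the edge weight is the Euclidean distance, and A and B are given as indices into the list (the name `shortest_route` is made up for this example). For simplicity it scans all points to find the neighbours of a node instead of only those in the square around it.

```python
import heapq
import math

def shortest_route(points, a, b, D):
    """Length of the shortest route from points[a] to points[b] where every
    hop is at most D long; math.inf if b cannot be reached."""
    dist = [math.inf] * len(points)
    dist[a] = 0.0
    heap = [(0.0, a)]
    while heap:
        d, i = heapq.heappop(heap)
        if i == b:
            return d
        if d > dist[i]:
            continue  # stale queue entry
        for j, q in enumerate(points):
            step = math.dist(points[i], q)
            # The edge i-j exists only if the two points are within D.
            if j != i and step <= D and d + step < dist[j]:
                dist[j] = d + step
                heapq.heappush(heap, (dist[j], j))
    return math.inf

points = [(0, 0), (3, 0), (6, 0), (3, 4)]
print(shortest_route(points, 0, 2, 5))
```

`math.dist` requires Python 3.8 or later.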
Wiki explanation is sufficient for this.
Dijkstra's algorithm takes 3 inputs: the graph, the starting node and the ending node.
To construct the graph, just do this:
For i in 1..n
    For j in i+1..n
        if (dist(points[i], points[j]) <= D)
            add j to the children of i
            add i to the children of j
After constructing the graph, run Dijkstra.
The subtlety of a question like this lies in a critical definition - what is the measure of distance in your grid?
There are many different shortest path problems and solutions, and they are studied throughout mathematics. They are each characterised by the 'topology' of the area being searched. Consider a few distinct topologies with their own solutions:
A one sided piece of paper
Suppose your grid represents coordinates on a piece of paper - the shortest path is easy to find, as it is simply a straight line between those points.
The surface of the moon
If your grid represents locations on the moon in terms of latitude and longitude, the shortest path is an arc along the moon's surface - If you drove "in a straight line" between two points on the moon, you would be travelling in an arc, because of the moon's curvature.
Road Intersections
If you want to find the distance between two intersections in a grid of roads, where the traffic on each road has a different speed, and you can only travel along the roads, then you can find the shortest path using Dijkstra's algorithm.
One way road intersections
A slight variation of the above - we only need to consider roads in one direction. There might not be any paths in this case.
Summary
To give a good solution, we need to understand the topology of your grid. If the distance is given by Pythagoras's theorem, then that indicates Euclidean geometry (like in the piece of paper example), so the solution is a straight line.
Is it possible you mean that you can travel between any two points if they are closer than D, like flying a plane between airports, for example?
EDIT: I didn't see your comment because you didn't use #. In your case your grid is like the airports a plane can fly between. The shortest path is found using Dijkstra's algorithm - the immediate neighbours of a point are all points closer than D. Find them, represent it all as a graph, and use Dijkstra's algorithm.
I would suggest using the formula for the distance between 2 points, i.e. sqrt((x2-x1)^2+(y2-y1)^2). This distance is always the shortest between 2 points.

Calculating the distance between each pair of a set of points

So I'm working on simulating a large number of n-dimensional particles, and I need to know the distance between every pair of points. Allowing for some error, and given the distance isn't relevant at all if it exceeds some threshold, are there any good ways to accomplish this? I'm pretty sure if I want dist(A,C) and already know dist(A,B) and dist(B,C) I can bound it by [|dist(A,B)-dist(B,C)|, dist(A,B)+dist(B,C)], and then store the results in a sorted array, but I'd like to not reinvent the wheel if there's something better.
I don't think the number of dimensions should greatly affect the logic, but maybe for some solutions it will. Thanks in advance.
If the problem was simply about calculating the distances between all pairs, then it would be a O(n^2) problem without any chance for a better solution. However, you are saying that if the distance is greater than some threshold D, then you are not interested in it. This opens the opportunities for a better algorithm.
For example, in 2D case you can use the sweep-line technique. Sort your points lexicographically, first by y then by x. Then sweep the plane with a stripe of width D, bottom to top. As that stripe moves across the plane new points will enter the stripe through its top edge and exit it through its bottom edge. Active points (i.e. points currently inside the stripe) should be kept in some incrementally modifiable linear data structure sorted by their x coordinate.
Now, every time a new point enters the stripe, you have to check the currently active points to the left and to the right no farther than D (measured along the x axis). That's all.
The purpose of this algorithm (as it is typically the case with sweep-line approach) is to push the practical complexity away from O(n^2) and towards O(m), where m is the number of interactions we are actually interested in. Of course, the worst case performance will be O(n^2).
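A rough sketch of this sweep for the 2D case, assuming points are `(x, y)` tuples and D is non-negative (the name `close_pairs` is made up for this example). The active points are kept in a plain sorted Python list via `bisect`; insertion and removal in a list are linear, so a balanced tree or similar structure would be needed to get the full benefit.

```python
import bisect
import math

def close_pairs(points, D):
    """Index pairs (i, j), i < j, whose Euclidean distance is at most D."""
    # Sweep bottom to top: sort by y, then by x.
    order = sorted(range(len(points)), key=lambda i: (points[i][1], points[i][0]))
    active = []  # (x, index) of points inside the stripe, sorted by x
    tail = 0     # position in `order` of the lowest point still in the stripe
    pairs = []
    for idx in order:
        x, y = points[idx]
        # Remove points that left the stripe through its bottom edge.
        while points[order[tail]][1] < y - D:
            old = order[tail]
            active.pop(bisect.bisect_left(active, (points[old][0], old)))
            tail += 1
        # Only active points with x in [x - D, x + D] can be close enough.
        lo = bisect.bisect_left(active, (x - D, -1))
        hi = bisect.bisect_right(active, (x + D, len(points)))
        for _, other in active[lo:hi]:
            if math.dist(points[idx], points[other]) <= D:
                pairs.append((min(idx, other), max(idx, other)))
        bisect.insort(active, (x, idx))
    return sorted(pairs)

points = [(0, 0), (1, 0), (5, 5), (0.5, 0.5), (5, 6)]
print(close_pairs(points, 1.0))
```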
The above applies to 2-dimensional case. For n-dimensional case I'd say you'll be better off with a different technique. Some sort of space partitioning should work well here, i.e. to exploit the fact that if the distance between partitions is known to be greater than D, then there's no reason to consider the specific points in these partitions against each other.
If the distance beyond a certain threshold is not relevant, and this threshold is not too large, there are common techniques to make this more efficient: limit the search for neighbouring points using space-partitioning data structures. Possible options are:
Binning.
Trees: quadtrees(2d), kd-trees.
Binning with spatial hashing.
Also, since the distance from point A to point B is the same as the distance from point B to point A, this distance should only be computed once. Thus, you should use the following loop (with points indexed 0 to n-1):
for point i from 0 to n-2:
    for point j from i+1 to n-1:
        distance(point i, point j)
Combining these two techniques is very common for n-body simulation for example, where you have particles affect each other if they are close enough. Here are some fun examples of that in 2d: http://forum.openframeworks.cc/index.php?topic=2860.0
Here's an explanation of binning (and hashing): http://www.cs.cornell.edu/~bindel/class/cs5220-f11/notes/spatial.pdf
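As a sketch of binning with spatial hashing combined with the compute-each-pair-once idea, assuming points are equal-length coordinate tuples and D > 0 (the name `close_pairs_binned` is made up for this example). Points are hashed into cubic cells of side D, so two points within distance D always lie in the same cell or in adjacent cells.

```python
from collections import defaultdict
from itertools import product
import math

def close_pairs_binned(points, D):
    """Index pairs (i, j), i < j, whose Euclidean distance is at most D."""
    if not points:
        return []
    # Hash every point into the grid cell (of side D) that contains it.
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(math.floor(c / D) for c in p)].append(i)

    dim = len(points[0])
    offsets = list(product((-1, 0, 1), repeat=dim))  # the cell and its neighbours
    pairs = []
    for cell, members in cells.items():
        for off in offsets:
            neighbour = tuple(c + o for c, o in zip(cell, off))
            for i in members:
                for j in cells.get(neighbour, ()):
                    # i < j makes sure each pair is tested only once.
                    if i < j and math.dist(points[i], points[j]) <= D:
                        pairs.append((i, j))
    return sorted(pairs)

points = [(0, 0, 0), (0.5, 0.5, 0.5), (3, 3, 3), (3, 3.9, 3)]
print(close_pairs_binned(points, 1.0))
```

The number of neighbouring cells is 3^n in n dimensions, so in high dimensions a tree-based structure may be a better fit than plain binning.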
