Time complexity of A* and Dijkstra's search

In a grid-based environment, the number of edges is E = O(V).
A* would, in the worst case, explore all the nodes of the grid.
So the number of nodes in the open list would be O(E) in the worst case.
Therefore, A*'s time complexity in the worst case is O(E).
Now, we know that Dijkstra's time complexity is O(E + V log V).
And A* in the worst case would explore the same number of nodes as Dijkstra.
Thus, the time complexity of A* should be the same as Dijkstra's, i.e. O(E).
But Dijkstra has an extra V log V term.
What am I missing here? I would appreciate a detailed explanation.
Thanks in advance.

Related

Consistent Heuristic Search - Tiebreaking Policy

So I have the following question in my midterm review and I'm not really sure how to go about it; it's the only one I couldn't figure out. The notation we used in class is f(n) = g(n) + h(n), where:
g(n): Cost from start to current
h(n): Estimated cost from current to goal, given heuristic
C*: Optimal cost from start to goal
I know it has something to do with the fact that the heuristic is consistent and this property:
If an instance of A* uses a consistent heuristic, then A* will expand
every node n for which f(n) < C*.
Any help would be greatly appreciated.
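As a purely illustrative aside (the g and h values below are made up), here is what the quoted property says in code: under a consistent heuristic, every node with f(n) = g(n) + h(n) < C* is guaranteed to be expanded.
# Illustrative only: hypothetical costs, not taken from the question.
g = {"A": 0, "B": 2, "C": 5, "D": 8}   # g(n): cost from start to n
h = {"A": 6, "B": 5, "C": 4, "D": 0}   # h(n): heuristic estimate from n to goal
C_star = 8                             # C*: optimal cost from start to goal

# Nodes A* must expand according to the property: f(n) = g(n) + h(n) < C*
guaranteed = [n for n in g if g[n] + h[n] < C_star]
print(guaranteed)  # ['A', 'B']: f(A) = 6 and f(B) = 7 are both below C* = 8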

DFS exploration of nodes vs A*

I'm trying to implement a search for a personal project in which the exploration of the nodes is relatively expensive. I was hesitating between using DFS (Dijkstra's Forward Search) or A*.
My question is, will there be a case where A* explores more nodes than DFS?
Dijkstra's original algorithm does not use a min-priority queue and runs in time O(|V|^2) (where |V| is the number of nodes). The implementation based on a min-priority queue implemented by a Fibonacci heap, running in O(|E| + |V| log |V|) (where |E| is the number of edges), is due to Fredman & Tarjan (1984). This is asymptotically the fastest known single-source shortest-path algorithm for arbitrary directed graphs with unbounded non-negative weights. However, specialized cases (such as bounded/integer weights, directed acyclic graphs, etc.) can indeed be improved further.
The time complexity of A* depends on the heuristic. In the worst case of an unbounded search space, the number of nodes expanded is exponential in the depth d of the solution (the shortest path): O(b^d), where b is the branching factor (the average number of successors per state). This assumes that a goal state exists at all and is reachable from the start state; if it is not, and the state space is infinite, the algorithm will not terminate.
The A* algorithm is a generalization of Dijkstra's algorithm that cuts down on the size of the subgraph that must be explored, if additional information is available that provides a lower bound on the "distance" to the target. This approach can be viewed from the perspective of linear programming: there is a natural linear program for computing shortest paths, and solutions to its dual linear program are feasible if and only if they form a consistent heuristic (speaking roughly, since the sign conventions differ from place to place in the literature). This feasible dual / consistent heuristic defines a non-negative reduced cost and A* is essentially running Dijkstra's algorithm with these reduced costs. If the dual satisfies the weaker condition of admissibility, then A* is instead more akin to the Bellman–Ford algorithm.
Worst-case time complexity: O(|E|) = O(b^d)
Worst-case space complexity: O(|V|) = O(b^d)
Dijkstra's algorithm can be viewed as a special case of A* where h(x)=0 for all x.
It should be noted, however, that Dijkstra's algorithm can be implemented more efficiently without including an h(x) value at each node.
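To make that relationship concrete, here is a minimal A* sketch in Python (graph format, names and example values are assumptions for illustration, not anything from the question); the heuristic is just a parameter, and leaving it at its default of 0 gives exactly Dijkstra's behaviour.
import heapq

def a_star(graph, start, goal, h=lambda node: 0):
    # graph: dict mapping node -> list of (neighbor, edge_cost) pairs.
    # With the default h (always 0) this is plain Dijkstra.
    g = {start: 0}                    # best known cost from start to each node
    open_list = [(h(start), start)]   # priority queue ordered by f = g + h
    while open_list:
        f, node = heapq.heappop(open_list)
        if node == goal:
            return g[node]
        for neighbor, cost in graph.get(node, []):
            new_g = g[node] + cost
            if new_g < g.get(neighbor, float("inf")):
                g[neighbor] = new_g
                heapq.heappush(open_list, (new_g + h(neighbor), neighbor))
    return None  # goal not reachable

# Same illustrative graph searched with h = 0 (Dijkstra) and with a small admissible heuristic.
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)]}
print(a_star(graph, "A", "D"))                                     # 3 (Dijkstra)
print(a_star(graph, "A", "D", h=lambda n: 0 if n == "D" else 1))   # 3 (A*)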

How is the max of a set of admissible heuristics, a dominating heuristic?

If you have a set of admissible heuristics: h1, h2, h3, ..., hn
How is h = max(h1, h2, h3, ..., hn) an admissible heuristic that dominates them all?
Isn't a lower h(n) value better?
For A*, f = g + h, and the element with the lowest f will be removed from the open list. So shouldn't taking the min give the dominating heuristic?
An admissible heuristic never overestimates the cost of reaching the goal state. That is, its estimate will be lower than the actual cost or exactly the actual cost, but never higher. This is required for greedy approaches like A* search to find the global best solution.
For example, imagine you found a solution with cost 10, while the best solution has cost 8. You're not using an admissible heuristic, and the heuristic's estimate for the solution that really has cost 8 is 12 (it's overestimating). As you already have a solution with cost 10, A* will never evaluate the best solution, as it is estimated to be more expensive.
Ideally, your heuristic should be as accurate as possible, i.e. an admissible heuristic shouldn't underestimate the true cost too much. If it does, A* will still find the best solution eventually, but it may take a lot longer to do so because it tries a lot of solutions that look good according to your heuristic, but turn out to be bad.
This is where the answer for your question lies. Your heuristics h1, ..., hn are all admissible, therefore they estimate a cost equal to or less than the true cost. The maximum of this set of estimates is therefore by definition the estimate that is closest to the actual cost (remember that you'll never overestimate). In the ideal case, it will be the exact cost.
If you were to take the minimum value, you would end up with the estimate that is furthest away from the actual cost -- as outlined above, A* would still find the best solution, but in a much less efficient manner.
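As a small, hypothetical illustration (the grid heuristics below are placeholders for a 4-connected grid, i.e. no diagonal moves), taking the pointwise maximum of admissible heuristics stays admissible while dominating each individual one:
def combine_admissible(heuristics):
    # Given admissible heuristics (each a function node -> estimate), return a
    # heuristic that dominates every one of them pointwise. It is still
    # admissible: each estimate is <= the true cost, so their maximum is too.
    return lambda node: max(h(node) for h in heuristics)

# Hypothetical example on a 4-connected grid with goal (4, 4).
def manhattan(node, goal=(4, 4)):
    return abs(node[0] - goal[0]) + abs(node[1] - goal[1])

def rows_only(node, goal=(4, 4)):
    return abs(node[0] - goal[0])

h = combine_admissible([rows_only, manhattan])
print(h((0, 0)))  # 8: the maximum, i.e. the tighter (more informed) estimate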

When the depth of a goal node is known, which graph search algorithm is best to use: BFS or DFS?

In a graph, when we know the depth at which the goal node lies, which graph search algorithm is fastest to use: BFS or DFS?
And how would you define "best"?
If you know that the goal node is at depth n from the root node (the node from which you begin the search), BFS will ensure that the search won't iterate over nodes with depth > n.
That said, DFS might still "choose" a route that is faster (iterates over fewer nodes) than BFS.
So to sum up, I don't think you can define "best" in such a scenario.
As I mentioned in the comments, if the solution is at a known depth d, you can use depth-limited search (DLS) instead of DFS. For all three methods (BFS, DFS and DLS), the algorithmic complexity is linear in the number of nodes and links in your state space graph in the worst case, i.e. O(|V| + |E|).
In practice, depending on d, DLS can be faster though, because BFS requires developing the search tree up to depth d-1, and possibly part of depth d (so almost the whole tree). With DLS, this happens only in the worst case.
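For reference, a minimal depth-limited search sketch (the graph representation and names are assumptions, purely for illustration):
def depth_limited_search(graph, node, goal, limit):
    # graph: dict mapping node -> list of neighbors.
    # Returns a path to goal as a list of nodes, or None if no path is found
    # within the given depth limit.
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for neighbor in graph.get(node, []):
        path = depth_limited_search(graph, neighbor, goal, limit - 1)
        if path is not None:
            return [node] + path
    return None

# If the goal is known to be at depth d, calling with limit=d guarantees the
# search never descends below that depth.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "E": ["F"]}
print(depth_limited_search(graph, "A", "F", limit=3))  # ['A', 'C', 'E', 'F']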

Calculating the distance between each pair of a set of points

So I'm working on simulating a large number of n-dimensional particles, and I need to know the distance between every pair of points. Allowing for some error, and given that the distance isn't relevant at all if it exceeds some threshold, are there any good ways to accomplish this? I'm pretty sure that if I want dist(A,C) and already know dist(A,B) and dist(B,C), I can bound it by [|dist(A,B) - dist(B,C)|, dist(A,B) + dist(B,C)] and then store the results in a sorted array, but I'd like not to reinvent the wheel if there's something better.
I don't think the number of dimensions should greatly affect the logic, but maybe for some solutions it will. Thanks in advance.
If the problem were simply about calculating the distances between all pairs, then it would be an O(n^2) problem without any chance of a better solution. However, you are saying that if the distance is greater than some threshold D, then you are not interested in it. This opens up opportunities for a better algorithm.
For example, in 2D case you can use the sweep-line technique. Sort your points lexicographically, first by y then by x. Then sweep the plane with a stripe of width D, bottom to top. As that stripe moves across the plane new points will enter the stripe through its top edge and exit it through its bottom edge. Active points (i.e. points currently inside the stripe) should be kept in some incrementally modifiable linear data structure sorted by their x coordinate.
Now, every time a new point enters the stripe, you have to check the currently active points to the left and to the right no farther than D (measured along the x axis). That's all.
The purpose of this algorithm (as it is typically the case with sweep-line approach) is to push the practical complexity away from O(n^2) and towards O(m), where m is the number of interactions we are actually interested in. Of course, the worst case performance will be O(n^2).
The above applies to the 2-dimensional case. For the n-dimensional case I'd say you'll be better off with a different technique. Some sort of space partitioning should work well here, i.e. exploit the fact that if the distance between partitions is known to be greater than D, then there's no reason to consider the specific points in these partitions against each other.
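Here is a rough Python sketch of the 2D sweep-line idea described above (point format and names are assumptions; a plain sorted list is used for the active set to keep it short, where a balanced search structure would be used in practice):
import bisect
import math

def close_pairs_2d(points, D):
    # Sweep-line sketch: return all pairs of 2D points whose distance is <= D.
    # points: list of (x, y) tuples. Illustrative, not optimized.
    pts = sorted(points, key=lambda p: (p[1], p[0]))  # sort by y, then by x
    active = []   # points currently inside the stripe, kept sorted by x
    pairs = []
    lo = 0        # index (into pts) of the oldest point still in the stripe
    for i, (x, y) in enumerate(pts):
        # Drop points that fell out of the stripe (more than D below current y).
        while lo < i and pts[lo][1] < y - D:
            active.remove(pts[lo])   # O(n) removal; fine for a sketch
            lo += 1
        # Only check active points whose x is within D of the new point's x.
        left = bisect.bisect_left(active, (x - D, -math.inf))
        right = bisect.bisect_right(active, (x + D, math.inf))
        for q in active[left:right]:
            if math.dist((x, y), q) <= D:
                pairs.append(((x, y), q))
        bisect.insort(active, (x, y))
    return pairs

print(close_pairs_2d([(0, 0), (1, 0.5), (1.2, 0.6), (5, 5)], D=1.0))  # [((1.2, 0.6), (1, 0.5))]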
If the distance beyond a certain threshold is not relevant, and this threshold is not too large, there are common techniques to make this more efficient: limit the search for neighbouring points using space-partitioning data structures. Possible options are:
Binning.
Trees: quadtrees(2d), kd-trees.
Binning with spatial hashing.
Also, since the distance from point A to point B is the same as the distance from point B to point A, each distance should only be computed once. Thus, you can use a loop like the following:
for i in range(n - 1):
    for j in range(i + 1, n):
        distance(points[i], points[j])
Combining these two techniques is very common for n-body simulations, for example, where particles affect each other if they are close enough. Here are some fun examples of that in 2D: http://forum.openframeworks.cc/index.php?topic=2860.0
Here's an explanation of binning (and hashing): http://www.cs.cornell.edu/~bindel/class/cs5220-f11/notes/spatial.pdf
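For completeness, here is a minimal binning sketch in 2D (names and point format are hypothetical): each point goes into a grid cell of side D, so only points in the same cell or in neighbouring cells need to be compared.
from collections import defaultdict
import math

def neighbor_pairs(points, D):
    # Return all pairs of 2D points whose distance is <= D, using a uniform
    # grid (binning) with cell size D so only adjacent cells need checking.
    cells = defaultdict(list)
    for p in points:
        cells[(int(p[0] // D), int(p[1] // D))].append(p)

    pairs = []
    for (cx, cy), bucket in cells.items():
        # Compare points within the same cell...
        for i in range(len(bucket) - 1):
            for j in range(i + 1, len(bucket)):
                if math.dist(bucket[i], bucket[j]) <= D:
                    pairs.append((bucket[i], bucket[j]))
        # ...and against half of the neighbouring cells, so each pair of
        # adjacent cells is handled exactly once (no double-counting).
        for dx, dy in [(1, 0), (1, 1), (0, 1), (-1, 1)]:
            for p in bucket:
                for q in cells.get((cx + dx, cy + dy), []):
                    if math.dist(p, q) <= D:
                        pairs.append((p, q))
    return pairs

print(neighbor_pairs([(0.1, 0.1), (0.4, 0.2), (3.0, 3.0)], D=0.5))  # [((0.1, 0.1), (0.4, 0.2))]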

Resources