Difference between Breadth-First Search and Iterative Deepening Search

I understand BFS and DFS, but for the life of me I cannot figure out the difference between iterative deepening and BFS. Apparently iterative deepening has the same memory usage as DFS, but I am unable to see how this is possible, as it just keeps expanding like BFS.
If anyone can clarify, that would be awesome.
Tree to work on if required:
    A
   / \
  B   C
 /   / \
D   E   F

From what I understand, iterative deepening does DFS down to depth 1, then DFS down to depth 2, ..., down to depth n, and so on until it finds no more levels.
For example, I think that tree would be read:

depth   read      visited (traversal order)
1       A         A
2       ABC       ABAC
3       ABDCEF    ABDBACECF

I believe it's pretty much doing a separate depth-limited DFS for each level and throwing away the memory between iterations.

From my understanding of the algorithm, IDDFS (iterative-deepening depth-first search) is simply a depth-first search performed multiple times, deepening the level of nodes searched at each iteration. Therefore, the memory requirements are the same as depth-first search because the maximum depth iteration is just the full depth-first search.
Therefore, for the example of the tree you gave, the first iteration would visit node A, the second iteration would visit nodes A, B, and C, and the third iteration would visit all of the nodes of the tree.
The reason it is implemented like this is so that, if the search has a time constraint, then the algorithm will at least have some idea of what a "good scoring" node is if it reaches the time-limit before doing a full traversal of the tree.
This is different from a breadth-first search because at each iteration the nodes are visited just as they would be in a depth-first search, not as in a breadth-first search. Typically, an IDDFS implementation will also store the "best scoring" node found in each iteration.
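As a minimal sketch of that idea in Python (assuming a children(node) callable that returns a node's successors; the names are illustrative, not from any particular library):

    def depth_limited_dfs(node, goal, limit, children):
        """DFS that refuses to descend below the given depth limit."""
        if node == goal:
            return node
        if limit == 0:
            return None
        for child in children(node):
            found = depth_limited_dfs(child, goal, limit - 1, children)
            if found is not None:
                return found
        return None

    def iddfs(root, goal, max_depth, children):
        """Run the depth-limited DFS with limits 0, 1, ..., max_depth.
        Memory at any moment is only the current root-to-node path (O(depth))."""
        for limit in range(max_depth + 1):
            found = depth_limited_dfs(root, goal, limit, children)
            if found is not None:
                return found
        return None

    # Tree from the question: A -> B, C;  B -> D;  C -> E, F
    tree = {'A': ['B', 'C'], 'B': ['D'], 'C': ['E', 'F'], 'D': [], 'E': [], 'F': []}
    print(iddfs('A', 'F', 3, lambda n: tree[n]))   # prints F on the third iteration

Note that each iteration repeats the work of the previous ones; the memory saving comes from only ever holding the current path, not a whole frontier.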

The memory usage is the maximum number of nodes it saves at any point, not the number of nodes visited.
At any time, IDDFS needs to store only the nodes on the branch it is currently expanding: only A and C if we are expanding C (in your example). BFS, on the other hand, must save every node of the depth it is searching. To see the effect, take a tree with branching factor 8 rather than 2: to search to a depth of 3, BFS has to hold whole levels in its queue (64 nodes at depth 2, and up to 512 at depth 3), while IDDFS needs only the 3 nodes on the current path.

In each iteration of iterative deepening search, we have a depth limit and we traverse the graph using the DFS approach; however, at each step of each iteration, we only need to keep track of the nodes on the path from the root down to depth d. That's the saving in memory.
For example, look at the last row of the picture below. If we had used BFS, we would have had to keep track of all nodes up to depth 2. But because we are using DFS, we don't need to keep all of them in memory: some nodes have already been visited, so we don't need them, and others have not been generated yet, so we will add them later. We just keep the path back to the root (the gray path).
The picture is from the book Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.

The concept of DFS with iterative deepening is quite straightforward!
Main idea:
Modify the original DFS algorithm to stop at a maximum provided depth.
Then call that depth-limited DFS for increasing maximum depths: 1, 2, 3, ..., n.
So in your case (for the given tree):
depth = 1
A
depth = 2
  A
 / \
B   C
depth = 3
    A
   / \
  B   C
 /   / \
D   E   F
Now, suppose there are b actions per state and the solution requires d actions. Then:
Space: O(d)
Time: O(b^d)
Now the question arises: what, then, is the difference between DFS with iterative deepening and BFS?
The answer: DFS with iterative deepening is far better than BFS in terms of memory consumption. The worst-case space complexity of BFS is O(b^d).
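To make that space difference concrete, here is a rough Python sketch (the helper names are mine, not from any library) that counts the peak number of stored nodes for BFS versus the peak path length for a depth-limited DFS on a complete tree with branching factor b and depth d:

    from collections import deque

    def bfs_peak_queue(b, d):
        """Peak queue size for BFS on a complete tree with branching factor b, depth d."""
        queue = deque([0])                # store depths only; node identities don't matter here
        peak = 1
        while queue:
            depth = queue.popleft()
            if depth < d:
                queue.extend([depth + 1] * b)
            peak = max(peak, len(queue))
        return peak

    def dls_peak_path(b, d):
        """Peak number of nodes on the current path for a depth-limited DFS."""
        peak = 0
        def dfs(depth):
            nonlocal peak
            peak = max(peak, depth + 1)   # nodes currently held: the root-to-node path
            if depth < d:
                for _ in range(b):
                    dfs(depth + 1)
        dfs(0)
        return peak

    print(bfs_peak_queue(8, 3))   # on the order of b^d (hundreds of nodes)
    print(dls_peak_path(8, 3))    # d + 1 = 4 nodes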

Related

Time complexity of A* and Dijkstra's search

In a grid-based environment, the number of edges is E = O(V).
A* would, in the worst case, explore all the nodes of the grid.
So, the number of nodes in the open list would be O(E) in the worst case.
Therefore, A*'s worst-case time complexity is O(E).
Now, we know that Dijkstra's time complexity is O(E + V log V).
And A* in the worst case would explore the same number of nodes as Dijkstra would.
Thus, the time complexity of A* should have come out equal to Dijkstra's, i.e. O(E).
But we have an extra V log V term in Dijkstra's.
What am I missing here? I would appreciate a detailed explanation.
Thanks in advance.

How to find farthest neighbors in Euclidean space?

Given a set P of n points in 2D, for any point x in P, what is the fastest way to find out the farthest neighbor of x? By farthest neighbor, we mean a point in P which has the maximum Euclidean distance to x.
To the best of my knowledge, the current standard kNN search algorithm for various trees (R-Trees, quadtrees, kd-trees) was developed by:
G. R. Hjaltason and H. Samet, "Distance browsing in spatial databases," ACM TODS 24(2):265-318, 1999.
See here. It traverses the tree based on a priority queue of nearest nodes/entries. One key insight is that the algorithm also works for farthest neighbor search.
The basic algorithm uses a priority queue. The queue can contain tree nodes as well as data entries, all sorted by their distance to your search point.
As an initial step, it adds the root node to the priority queue. Then repeat the following until k entries have been found:
1. Take the first element from the queue. If it is a data entry, return it. If it is a node, add all elements in the node to the priority queue.
2. Repeat step 1.
The paper describes an implementation for R-Trees, but they claim it can be applied to most tree-like structures. I have implemented the nearest neighbor version myself for R-Trees and PH-Trees (a special type of quadtree), both in Java. I think I know how to do it efficiently for KD-Trees but I believe it is somewhat complicated.
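Here is a hedged Python sketch of that best-first traversal, adapted to the farthest-neighbor case by ordering the queue on an upper bound of the distance, largest first. The tree interface (is_leaf_entry, children, max_dist_bound, distance) is hypothetical; with an R-Tree, max_dist_bound(node, q) would be the largest possible distance from q to anything inside the node's bounding rectangle:

    import heapq
    import itertools

    def farthest_k(root, query_point, k, is_leaf_entry, children, max_dist_bound, distance):
        """Best-first 'distance browsing' adapted to farthest-neighbor search."""
        counter = itertools.count()       # tie-breaker so the heap never compares tree objects
        # Negate distances so the largest distance comes out of the min-heap first.
        heap = [(-max_dist_bound(root, query_point), next(counter), root)]
        results = []
        while heap and len(results) < k:
            _, _, item = heapq.heappop(heap)
            if is_leaf_entry(item):
                # Its exact distance is >= every bound left in the queue,
                # so it is the farthest remaining point.
                results.append(item)
            else:
                for child in children(item):
                    if is_leaf_entry(child):
                        key = -distance(child, query_point)
                    else:
                        key = -max_dist_bound(child, query_point)
                    heapq.heappush(heap, (key, next(counter), child))
        return results

The only change from the nearest-neighbor version is the ordering key: maximum-distance bounds instead of minimum-distance bounds.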

When the depth of a goal node is known, which graph search algorithm is better to use: BFS or DFS?

In a graph, when we know the depth at which the goal node is, which graph search algorithm is fastest to use: BFS or DFS?
And how would you define "best"?
If you know that the goal node is at depth n from the root node (the node from which you begin the search), BFS ensures that the search won't iterate over nodes at depth > n.
That said, DFS might still "choose" a route that is faster (iterates over fewer nodes) than BFS.
So, to sum up, I don't think you can define "best" in such a scenario.
As I mentioned in the comments, if the solution is at a known depth d, you can use depth-limited search (DLS) instead of DFS. For all three methods (BFS, DFS and DLS), the algorithmic complexity is linear in the number of nodes and links in your state space graph in the worst case, i.e. O(|V| + |E|).
In practice, depending on d, DLS can be faster, because BFS requires developing the search tree down to depth d-1, and possibly part of depth d (so almost the whole tree). With DLS, this happens only in the worst cases.

kd-tree BBF algorithm time complexity

I have 2000 points with 5000 dimensions, and I want to get the nearest neighbour.
Now I have some problems; could anybody give an answer?
People say it works well with high dimensions. What's the time complexity?
#param max_nn_chks search is cut off after examining this many tree entries
After reading the algorithm, I wonder whether I would get a wrong answer when I set max_nn_chks too low. If yes, please tell me how to set this parameter; otherwise, please give a reason. Thanks.
Is the kd-tree the best data structure for my data to get the nearest neighbour?
The time complexity is basically the same as in restricted kd-tree search, plus a little extra time to maintain the priority queue. The restricted kd-tree search algorithm needs to traverse the tree to its full depth (log2 of the point count) times the limit (the maximum number of leaf nodes/points allowed to be visited).
Yes, you will get a wrong answer if the limit is too low. You can only measure the fraction of true nearest neighbours found versus the number of leaf nodes searched; from this, you can determine your optimal value.
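A small sketch of that measurement in Python (approx_nn is a placeholder for whatever BBF/kd-tree routine you use at a given max_nn_chks; it is not a real library call):

    import numpy as np

    def brute_force_nn(data, query):
        """Index of the exact nearest neighbour by exhaustive search (the ground truth)."""
        return int(np.argmin(np.linalg.norm(data - query, axis=1)))

    def recall(data, queries, approx_nn):
        """Fraction of queries for which the approximate search finds the true NN.

        approx_nn(data, query) stands in for your BBF/kd-tree search with a given
        cutoff; it is a placeholder, not a real library call.
        """
        hits = sum(approx_nn(data, q) == brute_force_nn(data, q) for q in queries)
        return hits / len(queries)

    # Sweep the cutoff (max_nn_chks) over a range of values, compute the recall for
    # each, and pick the smallest cutoff whose recall is still acceptable for you.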
Usually a randomized kd-tree forest and hierarchical k-means tree perform best. FLANN provides a method to determine which algorithm to use (k-means vs randomized kd-tree forest) and sets the optimal parameters for you.
The structure of data also has a big impact. If you know there are clusters of points being close together, for example, you can group them in a single node of a tree (represent them by their centroid, for example) and speed up the search.
Other techniques such as visual words, PCA or random projections can also be applied to the data. It's quite an active field of research.

Calculating the distance between each pair of a set of points

So I'm working on simulating a large number of n-dimensional particles, and I need to know the distance between every pair of points. Allowing for some error, and given that the distance isn't relevant at all if it exceeds some threshold, are there any good ways to accomplish this? I'm pretty sure that if I want dist(A,C) and already know dist(A,B) and dist(B,C), I can bound it by [|dist(A,B) - dist(B,C)|, dist(A,B) + dist(B,C)] and then store the results in a sorted array, but I'd like not to reinvent the wheel if there's something better.
I don't think the number of dimensions should greatly affect the logic, but maybe for some solutions it will. Thanks in advance.
If the problem were simply about calculating the distances between all pairs, then it would be an O(n^2) problem without any chance for a better solution. However, you are saying that if the distance is greater than some threshold D, then you are not interested in it. This opens up the opportunity for a better algorithm.
For example, in 2D case you can use the sweep-line technique. Sort your points lexicographically, first by y then by x. Then sweep the plane with a stripe of width D, bottom to top. As that stripe moves across the plane new points will enter the stripe through its top edge and exit it through its bottom edge. Active points (i.e. points currently inside the stripe) should be kept in some incrementally modifiable linear data structure sorted by their x coordinate.
Now, every time a new point enters the stripe, you have to check the currently active points to the left and to the right no farther than D (measured along the x axis). That's all.
The purpose of this algorithm (as it is typically the case with sweep-line approach) is to push the practical complexity away from O(n^2) and towards O(m), where m is the number of interactions we are actually interested in. Of course, the worst case performance will be O(n^2).
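A rough Python sketch of that sweep (using a plain sorted list for the active set; a balanced search tree would be the asymptotically cleaner choice, and the stripe is pruned by a linear scan here to keep the sketch short):

    import bisect
    import math

    def close_pairs_2d(points, D):
        """All pairs of 2D points at Euclidean distance <= D, via a sweep line."""
        pts = sorted(points, key=lambda p: (p[1], p[0]))   # sweep bottom to top (by y)
        active = []                                        # (x, y) tuples, sorted by x
        pairs = []
        for x, y in pts:
            # Drop points that are more than D below the current sweep position.
            active = [p for p in active if y - p[1] <= D]
            # Candidates are the active points within D along the x axis.
            lo = bisect.bisect_left(active, (x - D, -math.inf))
            hi = bisect.bisect_right(active, (x + D, math.inf))
            for qx, qy in active[lo:hi]:
                if (x - qx) ** 2 + (y - qy) ** 2 <= D * D:
                    pairs.append(((qx, qy), (x, y)))
            bisect.insort(active, (x, y))
        return pairs

    print(close_pairs_2d([(0, 0), (1, 0), (5, 5), (1.2, 0.5)], 1.5))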
The above applies to the 2-dimensional case. For the n-dimensional case, I'd say you'll be better off with a different technique. Some sort of space partitioning should work well here, i.e. exploit the fact that if the distance between partitions is known to be greater than D, then there's no reason to compare the specific points in those partitions against each other.
If the distance beyond a certain threshold is not relevant, and this threshold is not too large, there are common techniques to make this more efficient: limit the search for neighbouring points using space-partitioning data structures. Possible options are:
Binning.
Trees: quadtrees (2D), kd-trees.
Binning with spatial hashing.
Also, since the distance from point A to point B is the same as the distance from point B to point A, each distance should only be computed once. Thus, you should use the following loop:

    for point i from 0 to n-1:
        for point j from i+1 to n-1:
            distance(point i, point j)
Combining these two techniques is very common in n-body simulation, for example, where particles affect each other if they are close enough. Here are some fun examples of that in 2D: http://forum.openframeworks.cc/index.php?topic=2860.0
Here's an explanation of binning (and hashing): http://www.cs.cornell.edu/~bindel/class/cs5220-f11/notes/spatial.pdf
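For illustration, here is a hedged Python sketch of that binning idea with cell size D (it works in any dimension, though the 3^dim neighbouring cells become a cost of their own in high dimensions):

    from collections import defaultdict
    from itertools import product
    import math

    def close_pairs_binned(points, D):
        """All index pairs (i, j) with distance <= D, using a grid of cell size D."""
        dim = len(points[0])
        grid = defaultdict(list)
        for idx, p in enumerate(points):
            grid[tuple(int(math.floor(c / D)) for c in p)].append(idx)

        pairs = []
        for cell, members in grid.items():
            for off in product((-1, 0, 1), repeat=dim):
                neighbor = tuple(c + o for c, o in zip(cell, off))
                if neighbor < cell:                    # handle each pair of cells once
                    continue
                for i in members:
                    for j in grid.get(neighbor, ()):
                        if neighbor != cell or j > i:  # avoid duplicates within a cell
                            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                            if d2 <= D * D:
                                pairs.append((i, j))
        return pairs

    print(close_pairs_binned([(0.0, 0.0), (0.4, 0.3), (5.0, 5.0)], 1.0))   # [(0, 1)]

Since the cell size equals the threshold D, any pair within D is guaranteed to lie in the same or an adjacent cell, so far-apart pairs are never even considered.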
