A* Search Advantages of Dynamic Weighting - search

I was reading about the variants of the A* search algorithm and I came across dynamic weighting. As I understand it, a weight is applied to the search equation, which decreases as the search gets closer to the goal node. I was specifically looking at this article: http://theory.stanford.edu/~amitp/GameProgramming/Variations.html
Can anyone tell me what the advantages of this would be? Why would you not care what nodes you expand at the start? Is it to help searches that don't necessarily have a good heuristic?
Thanks

For the TLDNR-crowd:
Dynamic weighting sacrifices solution optimality to speed up the search. The larger the weight, the more greedy the search.
For my fellow scholars:
Motivation
From the Wikipedia A-star article:
A-star's admissibility criterion guarantees an optimal solution path, but it also means that A* must examine all equally meritorious paths to find the optimal path. We can speed up the search at the expense of optimality by relaxing the admissibility criterion to obtain an approximate solution. Oftentimes we want to bound this relaxation, so that we can guarantee that the solution path is no worse than (1 + ε) times the optimal solution path. This new guarantee is referred to as ε-admissible.
Static Weighting
Before we talk about dynamic weighting, let's compare A-star to the simplest ε-admissible relaxation: static-weighted A-star.
In static-weighted A-star, f(n) = g(n) + w·h(n), with w=(1+ε) for some ε>0. To illustrate the effect on optimality and search speed, compare the number of nodes expanded in each of the following illustrations. Empty circles represent nodes in the open set; filled-in circles are in the closed set.
A-star (left) vs. Weighted A-star with ε=4 (right)
As you can see, weighted A-star expanded far fewer nodes and completed about 3x as fast. However, since we used ε=4, weighted A-star could theoretically return a solution that is up to (1+ε) = (1+4) = 5 times as long as the optimal path.
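As a rough illustration of static weighting, here is a minimal sketch on a small 4-connected grid. The grid, the Manhattan heuristic, and the names `weighted_astar`, `GRID`, `START` and `GOAL` are assumptions made for this example and do not come from the illustrations above.

```python
import heapq

def weighted_astar(grid, start, goal, w=1.0):
    """Return (path_cost, nodes_expanded) using f(n) = g(n) + w*h(n).

    grid is a list of strings, '#' marks a wall, every move costs 1.
    """
    def h(cell):
        # Manhattan distance: admissible on a 4-connected unit-cost grid
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_g = {start: 0}
    frontier = [(w * h(start), 0, start)]
    expanded = 0
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g, expanded
        if g > best_g[cell]:
            continue  # stale queue entry, a cheaper route was found later
        expanded += 1
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + w * h((nr, nc)), ng, (nr, nc)))
    return None, expanded

GRID = [
    "..........",
    "....#.....",
    "....#.....",
    "....#.....",
    "..........",
]
START, GOAL = (2, 0), (2, 9)

optimal_cost, expanded_plain = weighted_astar(GRID, START, GOAL, w=1.0)
weighted_cost, expanded_weighted = weighted_astar(GRID, START, GOAL, w=5.0)  # eps = 4
print(optimal_cost, expanded_plain, weighted_cost, expanded_weighted)
```

With w=1 this is plain A-star. With w=5 the returned cost is only guaranteed to stay within 5 times the optimal cost; how many expansions are saved depends on the map.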
Dynamic Weighting
Dynamic Weighting is a technique that makes the heuristic weight a function of the search state, i.e. f(n) = g(n) + w(n)·h(n), where w(n) = 1 + ε - ε·d(n)/N, d(n) is the depth of the search at node n, and N is an upper bound on the search depth.
In this way, dynamic-weight A-Star initially behaves very much like a Greedy Best First search, but as the search depth (i.e. the number of hops in the graph) increases, the algorithm takes a more conservative approach, behaving more like the traditional A-star algorithm.
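The weight schedule can be sketched in a few lines. The function names are hypothetical, and clamping the depth at N (so the weight never drops below 1) is an assumption added here for safety, not something stated above.

```python
def dynamic_weight(depth, max_depth, eps):
    """w(n) = 1 + eps - eps * d(n) / N, with d(n) clamped to N (assumption)."""
    d = min(depth, max_depth)
    return 1 + eps - eps * d / max_depth

def f_value(g, h, depth, max_depth, eps):
    """Priority used to order the open set: f(n) = g(n) + w(n) * h(n)."""
    return g + dynamic_weight(depth, max_depth, eps) * h

# The weight starts at 1 + eps (greedy-like) and decays to 1 (plain A-star).
for depth in (0, 5, 10):
    print(depth, dynamic_weight(depth, max_depth=10, eps=4))
```

At depth 0 the heuristic counts 1 + ε times as much as the path cost so far; at depth N both terms count equally.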
Amit Patel's page says
With dynamic weighting, you assume that at the beginning of your
search, it’s more important to get (anywhere) quickly; at the end of
the search, it’s more important to get to the goal.
He is correct, but I would say that with dynamic weighting, you assume that at the beginning of your search, it's more important to follow your heuristic; at the end of the search, it becomes equally important to consider the length of the path, too.
Additional Materials and Links:
Asst. Prof. Ira Pohl -- The Avoidance of (Relative) Catastrophe, Heuristic Competence, Genuine Dynamic Weighting and Computational Issues in Heuristic Problem Solving
Dynamic Weighting on Amit Patel's Variants of A*
Wikipedia -- Bounded Relaxation for the A* Search Algorithm

Related

What is the difference between uniform-cost search and best-first search methods?

Both methods have a data structure which holds the nodes (with their cost) to expand. Both methods first expand the node with the best cost. So, what is the difference between them?
I was told that uniform-cost search is a blind method and best-first search is not, which confused me even more (both have information about node costs or not?).
The difference is in the heuristic function.
Uniform-cost search is uninformed search: it doesn't use any domain knowledge. It expands the least cost node, and it does so in every direction because no information about the goal is provided. It can be viewed as a function f(n) = g(n) where g(n) is a path cost ("path cost" itself is a function that assigns a numeric cost to a path with respect to performance measure, e.g. distance in kilometers, or number of moves etc.). It simply is a cost to reach node n.
Best-first search is informed search: it uses a heuristic function to estimate how close the current state is to the goal (are we getting close to the goal?). Hence our cost function f(n) = g(n) is combined with the cost to get from n to the goal, the h(n) (heuristic function that estimates that cost) giving us f(n) = g(n) + h(n). An example of a best-first search algorithm is A* algorithm.
Yes, both methods keep a list of nodes to expand, but best-first search orders that list by path cost plus heuristic, which tends to reduce the number of nodes it has to expand.
There is a little misunderstanding here. Uniform cost search, best first search and A* search are all different algorithms. Uniform cost is an uninformed search algorithm, whereas Best First and A* are informed search algorithms. Informed means that it uses a heuristic function for deciding which node to expand. The difference between best first search and A* is that best first uses f(n) = h(n) for choosing the node to expand, and A* uses f(n) = g(n)+h(n). h(n) is the heuristic function. g(n) is the actual cost from the starting node to node n.
More details can be seen here: https://www.cs.utexas.edu/~mooney/cs343/slide-handouts/heuristic-search.4.pdf
Slight correction to the accepted answer
Best-first search does not estimate how close to goal the current state is, it estimates how close to goal each of the next states will be (from the current state) to influence the path selected.
Uniform-cost search expands the least cost node (regardless of heuristic), and best-first search expands the least (cost + heuristic) node.
f(n) is the cost function used to evaluate the potential nodes to expand
g(n) is the cost of moving to a node n
h(n) is the estimated cost that it will take to get to the final goal state from n, if we were to go to n
The f(n) used in uniform-cost search
f(n) = g(n)
The f(n) used in best-first search (A* is an example of best-first search)
f(n) = h(n)
The f(n) used in A* search.
Note: The h(n) from best-first search above is expanded in A* so that it always includes g(n). It is still basically just a heuristic, but it is a heuristic that includes g(n).
f(n) = g(n) + h(n).
Each of these functions is evaluating the potential expansion nodes, not the current node when traversing the tree looking for an n that is a goal state
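To make the three choices of f(n) concrete, here is a small sketch in which one generic search loop is driven by three different priority functions. The graph, the heuristic table and the name `best_first` are made up for illustration.

```python
import heapq

def best_first(graph, h, start, goal, f):
    """Generic best-first search; f(g, h) decides which node is expanded next."""
    frontier = [(f(0, h[start]), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g[node]:
            continue  # stale entry
        for nbr, cost in graph[node]:
            ng = g + cost
            if ng < best_g.get(nbr, float("inf")):
                best_g[nbr] = ng
                heapq.heappush(frontier, (f(ng, h[nbr]), ng, nbr, path + [nbr]))
    return None

# Hypothetical graph: the route through A looks good to the heuristic but is expensive.
GRAPH = {"S": [("A", 1), ("B", 4)], "A": [("G", 10)], "B": [("G", 2)], "G": []}
H = {"S": 2, "A": 1, "B": 2, "G": 0}

ucs = best_first(GRAPH, H, "S", "G", lambda g, h: g)        # f(n) = g(n)
greedy = best_first(GRAPH, H, "S", "G", lambda g, h: h)     # f(n) = h(n)
astar = best_first(GRAPH, H, "S", "G", lambda g, h: g + h)  # f(n) = g(n) + h(n)
print(ucs, greedy, astar)
```

On this graph the greedy variant follows the heuristic into the expensive route through A, while the other two return the cheaper route through B.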
The differences are given below:
Uniform-cost search (UCS) expands the node with the lowest path cost (i.e. with the lowest g(n)), whereas best-first search (BFS) expands the node that appears closest to the goal
UCS cannot deal with a heuristic function, whereas BFS can deal with a heuristic function
In UCS, f(n) = g(n), whereas, in BFS, f(n) = g(n) + h(n).
Uniform-cost search picks the unvisited node with the lowest distance, calculates the distance through it to each unvisited neighbor, and updates the neighbor's distance if smaller.
Best-first search is a heuristic-based algorithm that attempts to predict how close the end of a path (i.e. the last node in the path) is to the goal node, so that paths which are judged to be closer to a solution are expanded first.

How does the Needleman Wunsch algorithm compare to brute force?

I'm wondering how you can quantify the results of the Needleman-Wunsch algorithm (typically used for aligning nucleotide/protein sequences).
Consider some fixed scoring scheme and two sequences of varying length S1 and S2. Say we calculate every possible alignment of S1 and S2 by brute force, and the highest scoring alignment has a score x. And of course, this has considerably higher complexity than the Needleman-Wunsch approach.
When using the Needleman-Wunsch algorithm to find a sequence alignment, say that it has a score y.
Consider r to be the score generated via Needleman-Wunsch for two random sequences R1 and R2.
How does x compare to y? Is y always greater than r for two sequences of known homology?
In general, I do understand that we use the Needleman-Wunsch algorithm to significantly speed up sequence alignment (vs a brute-force approach), but don't understand the cost in accuracy (if any) that comes with it. I had a go at reading the original paper (Needleman & Wunsch, 1970) but am still left with this question.
Needleman-Wunsch always produces an optimal answer - it's much faster than brute force and doesn't sacrifice accuracy in the process. The key insight it uses is that it's not actually necessary to generate all possible alignments, since most of them contain bad sub-alignments and couldn't possibly be optimal. The Needleman-Wunsch algorithm instead builds up optimal alignments for fragments of the original strands and then grows those smaller alignments into larger ones, using the guarantee that any optimal alignment must contain an optimal alignment for a slightly smaller case.
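The claim that x and y coincide can be checked on small inputs. The sketch below assumes a simple linear scoring scheme (match +1, mismatch -1, gap -1); the function names and the test strings are made up for illustration.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score, O(len(a) * len(b)) time."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def brute_force_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score over every alignment, enumerated recursively (exponential time)."""
    if not a:
        return len(b) * gap
    if not b:
        return len(a) * gap
    pair = match if a[0] == b[0] else mismatch
    return max(
        pair + brute_force_score(a[1:], b[1:], match, mismatch, gap),
        gap + brute_force_score(a[1:], b, match, mismatch, gap),
        gap + brute_force_score(a, b[1:], match, mismatch, gap),
    )

for s1, s2 in [("GATTACA", "GCATGCG"), ("ACGT", "AGT"), ("AAAA", "TT")]:
    print(s1, s2, nw_score(s1, s2), brute_force_score(s1, s2))
```

Both functions maximise over the same set of alignments; the dynamic-programming version just avoids recomputing the score of each prefix pair.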
I think your question boils down to whether dynamic programming finds the optimal solution, i.e. guarantees that y >= x. For a discussion on this I would refer to people who are likely smarter than me:
https://cs.stackexchange.com/questions/23599/how-is-dynamic-programming-different-from-brute-force
Basically, it says that dynamic programming will produce an optimal result, i.e. the same as brute force, but only for particular problems that satisfy the Bellman principle of optimality.
According to Wikipedia page for Needleman-Wunsch, the problem does satisfy Bellman principle of optimality:
https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm
Specifically:
The Needleman–Wunsch algorithm is still widely used for optimal global
alignment, particularly when the quality of the global alignment is of
the utmost importance. However, the algorithm is expensive with
respect to time and space, proportional to the product of the length
of two sequences and hence is not suitable for long sequences.
There is also mention of optimality elsewhere in the same Wikipedia page.

DFS exploration of nodes vs A*

I'm trying to implement a search for a personal project in which the exploration of the nodes is relatively expensive. I was hesitating between using DFS (Dijkstra's Forward Search) and A*.
My question is, will there be a case where A* explores more nodes than DFS?
Dijkstra's original algorithm does not use a min-priority queue and runs in O(|V|^2) time (where |V| is the number of nodes). The implementation based on a min-priority queue implemented by a Fibonacci heap and running in O(|E| + |V| log |V|) (where |E| is the number of edges) is due to (Fredman & Tarjan 1984). This is asymptotically the fastest known single-source shortest-path algorithm for arbitrary directed graphs with unbounded non-negative weights. However, specialized cases (such as bounded/integer weights, directed acyclic graphs etc) can indeed be improved further.
The time complexity of A* depends on the heuristic. In the worst case of an unbounded search space, the number of nodes expanded is exponential in the depth of the solution (the shortest path) d: O(b^d), where b is the branching factor (the average number of successors per state). This assumes that a goal state exists at all, and is reachable from the start state; if it is not, and the state space is infinite, the algorithm will not terminate.
The A* algorithm is a generalization of Dijkstra's algorithm that cuts down on the size of the subgraph that must be explored, if additional information is available that provides a lower bound on the "distance" to the target. This approach can be viewed from the perspective of linear programming: there is a natural linear program for computing shortest paths, and solutions to its dual linear program are feasible if and only if they form a consistent heuristic (speaking roughly, since the sign conventions differ from place to place in the literature). This feasible dual / consistent heuristic defines a non-negative reduced cost and A* is essentially running Dijkstra's algorithm with these reduced costs. If the dual satisfies the weaker condition of admissibility, then A* is instead more akin to the Bellman–Ford algorithm.
Worst case performance O(|E|)=O(b^{d})
and
Worst case space complexity O(|V|)=O(b^{d})
Dijkstra's algorithm can be viewed as a special case of A* where h(x)=0 for all x.
It should be noted, however, that Dijkstra's algorithm can be implemented more efficiently without including a h(x) value at each node.
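The special-case relationship can be seen by running one routine with two heuristics and counting expansions. This is a sketch on a made-up unit-cost grid; `search`, `manhattan` and the map are hypothetical, and the Manhattan distance is assumed because it is consistent on such a grid.

```python
import heapq

def search(grid, start, goal, h):
    """A* on a 4-connected unit-cost grid; returns (cost, nodes_expanded)."""
    best_g = {start: 0}
    frontier = [(h(start), 0, start)]
    expanded = 0
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g, expanded
        if g > best_g[cell]:
            continue  # stale entry
        expanded += 1
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
                if g + 1 < best_g.get(nxt, float("inf")):
                    best_g[nxt] = g + 1
                    heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return None, expanded

GRID = [
    "........",
    "...#....",
    "...#....",
    "........",
    "........",
    "........",
]
START, GOAL = (1, 0), (2, 7)

def manhattan(cell):
    return abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1])

dijkstra_cost, dijkstra_expanded = search(GRID, START, GOAL, lambda cell: 0)  # h(x) = 0
astar_cost, astar_expanded = search(GRID, START, GOAL, manhattan)
print(dijkstra_cost, dijkstra_expanded, astar_cost, astar_expanded)
```

Both runs return the same path cost. With a consistent heuristic like this one, A* does not expand more nodes here than the h(x)=0 run; an inadmissible or inconsistent heuristic would not carry that assurance.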

Information Retrieval: How to combine different word results when using tf-idf?

Let's say I have a user search query which looks like:
"the happy bunny"
I have already computed tf-idf and have something like this (following are made up example values) for each document in which I am searching (of course the idf is always the same):
term    tf       idf   score
the     0.06     1     0.06 * 1 = 0.06
happy   0.002    20    0.002 * 20 = 0.04
bunny   0.0005   60    0.0005 * 60 = 0.03
I have two questions with what to do next.
Firstly, the still has the highest score, even though it is adjusted for rarity by idf, still it's not exactly important - do you think I should square the idf values to weight in terms of rare words, or would this give bad results? Otherwise I'm worried that the is getting equal importance to happy and bunny, and it should be obvious that bunny is the most important word in the search. As long as rare always equals important then it would be always a good idea to weight in terms of rarity, but if that is not always the case then doing so could really mess up the results.
Secondly and more importantly: what is the best/preferred method for combining the scores for each word together to give each document a single score that represents how well it reflects the entire search query? I was thinking of adding them, but it has become apparent that that is going to give higher priority to a document containing 10,000 happy but only 1 bunny instead of another document with 500 happy and 500 bunny (which would be a better match).
First, make sure that you are computing the correct TF-IDF values. As others have pointed out, they do not look right. TF is relative to specific documents, and we often do not need to compute it for queries (since raw term frequency is almost always 1 in queries). There are different types of TF functions to pick from (check the Wikipedia page on tf-idf, it has good coverage). Log Normalisation is common and the most efficient scheme, since it saves an extra disk access to get the respective document's total frequency maxF that is needed for something like Double Normalisation. When you are dealing with large volumes of documents this can be expensive, especially if you can't bring these into memory. A bit of insight on inverted files can go a long way in understanding some of the underlying complexities. Log normalisation is efficient and is a non-linear function, therefore better than raw frequency.
Once you are certain on your weighting scheme, then you may want to consider a stop list to get rid of very common/noisy words. These do not contribute to the rank of documents. It is generally recommended to use a stop list of high frequency, very common words. Do a search and you will find many available, including the one that Lucene uses.
The rest depends on your ranking strategy, which in turn depends on your implementation/model. The vector space model (VSM) is simple and readily available in libraries like Lucene, Lemur, etc. VSM computes the dot product (scalar product) of the weights of the terms shared by the query and a document. Term weights are normalised via vector length normalisation (which solves your second question), and the result of applying the model is a value between 0 and 1. This is interpreted as the cosine of the angle between the two vectors, i.e. their dot product divided by the product of their Euclidean lengths.
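As a toy check of the length-normalisation point, the sketch below scores the two documents from the question with log-normalised tf and cosine similarity. The idf values are the made-up ones from the question, "the" is assumed to have been removed by a stop list, and the documents are assumed to contain no other terms; all names are hypothetical.

```python
import math

# Made-up idf values taken from the question; "the" dropped as a stop word.
IDF = {"happy": 20.0, "bunny": 60.0}

def weights(counts):
    """Log-normalised tf times idf for every known term with a positive count."""
    return {t: (1 + math.log(c)) * IDF[t] for t, c in counts.items() if c > 0 and t in IDF}

def cosine(u, v):
    """Dot product of two sparse vectors divided by the product of their lengths."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

query = weights({"happy": 1, "bunny": 1})
doc_a = weights({"happy": 10000, "bunny": 1})   # many "happy", a single "bunny"
doc_b = weights({"happy": 500, "bunny": 500})   # balanced counts

score_a = cosine(query, doc_a)
score_b = cosine(query, doc_b)
print(score_a, score_b)
```

Under these assumptions the balanced document ranks above the one dominated by a single term, which is the behaviour the question was asking for.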
One of the earliest comprehensive studies on weighting schemes and ranking with VSM is an article by Salton (pdf) and is a good read if you are interested in Information Retrieval. A bit outdated perhaps (notice how log normalisation is not mentioned in the article).
Your best read I believe is the book Introduction to Information Retrieval by Christopher Manning. It will take you through everything that you need to know, from indexing to ranking schemes, etc. A bit lacking on ranking models (does not cover some of the more complex probabilistic approaches).
You should reconsider your TF and IDF values; they do not look correct. The TF value is usually just how often the word occurs, so if the word "the" appeared 20 times its TF value would be 20. A word like "the" should have a very low IDF value (possibly around 4 decimal places, 0.000...).
You could use stop word removal if words like "the" are not necessary; they would then be removed rather than just given a low score.
A vector space model could be used for this.
Can you compute tf-idf for amalgamated terms? That is, you first generate a sentiment that considers each of its components as equal before treating the sentiment as a single term for which you now compute the tf-idf.

kd-tree BBF algorithm time complexity

I have 2000 points with 5000 dimensions, and I want to get the nearest neighbour.
Now I have some problems; could anybody give an answer?
People say it works well with high dimensions. What's the time complexity?
#param max_nn_chks search is cut off after examining this many tree entries
After I read the algorithm, I wonder if I would get the wrong answer when I set the max_nn_chks too low. If yes, then just tell me how to set this parameter, else give a reason, thanks.
Is the kd-tree the best data structure for my data to get the nearest neighbour?
The time complexity is basically the same as in restricted KD-Tree search plus some little time to maintain the priority queue. The restricted KD-Tree search algorithm needs to traverse the tree in its full depth (log2 of the point count) times the limit (maximum number of leaf nodes/points allowed to be visited).
Yes, you will get a wrong answer if the limit is too low. You can only measure fraction of true NN found versus number of leaf nodes searched. From this, you can determine your optimal value.
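One way to run that measurement is sketched below. This is a simplified best-bin-first search written for illustration, not the implementation the question refers to; only the parameter name `max_nn_chks` is borrowed from it. The data is random and much smaller than 2000 x 5000, and all function names are hypothetical.

```python
import heapq
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build(points, idxs, depth=0):
    """kd-tree: a leaf is a list with one index, an inner node is a tuple."""
    if len(idxs) <= 1:
        return idxs
    axis = depth % len(points[0])
    idxs = sorted(idxs, key=lambda i: points[i][axis])
    mid = len(idxs) // 2
    return (axis, points[idxs[mid]][axis],
            build(points, idxs[:mid], depth + 1), build(points, idxs[mid:], depth + 1))

def bbf_nearest(points, tree, q, max_nn_chks):
    """Best-bin-first search, cut off after max_nn_chks (>= 1) point checks."""
    best_i, best_d = None, float("inf")
    heap = [(0.0, 0, tree)]  # (lower bound on squared distance, tie-breaker, node)
    pushed, checks = 1, 0
    while heap and checks < max_nn_chks:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d:
            break  # no remaining branch can hold a closer point
        while isinstance(node, tuple):
            axis, split, left, right = node
            diff = q[axis] - split
            near, far = (left, right) if diff < 0 else (right, left)
            heapq.heappush(heap, (max(bound, diff * diff), pushed, far))
            pushed += 1
            node = near
        for i in node:
            checks += 1
            d = sq_dist(points[i], q)
            if d < best_d:
                best_i, best_d = i, d
    return best_i

def recall(points, tree, queries, max_nn_chks):
    """Fraction of queries for which the cut-off search finds a true nearest neighbour."""
    hits = 0
    for q in queries:
        exact = min(sq_dist(p, q) for p in points)
        hits += sq_dist(points[bbf_nearest(points, tree, q, max_nn_chks)], q) == exact
    return hits / len(queries)

rng = random.Random(0)
points = [[rng.random() for _ in range(8)] for _ in range(200)]
queries = [[rng.random() for _ in range(8)] for _ in range(50)]
tree = build(points, list(range(len(points))))
for limit in (1, 5, 20, len(points)):
    print(limit, recall(points, tree, queries, limit))
```

Plotting the printed fraction against the limit on your own data gives the trade-off curve from which a value for the cut-off can be picked.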
Usually a randomized kd-tree forest and hierarchical k-means tree perform best. FLANN provides a method to determine which algorithm to use (k-means vs randomized kd-tree forest) and sets the optimal parameters for you.
The structure of data also has a big impact. If you know there are clusters of points being close together, for example, you can group them in a single node of a tree (represent them by their centroid, for example) and speed up the search.
Other techniques such as visual words, PCA or random projections can be employed on the data. It's quite an active field of research.

Resources