Take a simple example: a dog wants to eat all the food placed at different locations, and there are some obstacles along the search paths. How can I find a heuristic function for the A* algorithm that reduces the number of expanded nodes?
I tried several heuristic functions, but they all expanded too many nodes. So I wonder: what is the best heuristic you can come up with?
Thanks
(reposted from cs.stackexchange since I got no answers or comments)
I want to solve the problem of finding a shortest path on a directed weighted graph from a certain node to any of a specified set of destination nodes (preferably the closest one, but that's not that important). The standard (I believe) way to do this with the A* algorithm is to use a distance-to-closest-goal heuristic (which is admissible) and exit as soon as any of the goal nodes is reached.
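For concreteness, this is roughly what I mean, sketched in Python under the purely illustrative assumption that every node has coordinates in a pos mapping, so the heuristic is the straight-line distance to the closest goal (the minimum over admissible per-goal estimates is itself admissible):

import math

def multi_goal_heuristic(node, goals, pos):
    # pos is a hypothetical node -> (x, y) mapping, used only for illustration.
    # Straight-line distance to the *closest* goal stays admissible as long
    # as each per-goal estimate is admissible.
    x, y = pos[node]
    return min(math.hypot(x - gx, y - gy) for gx, gy in (pos[g] for g in goals))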
However, in my scenario (which is game AI, if that matters) some (or all) of the goals might be unreachable; furthermore, the set of nodes reachable from such goals is typically quite small (or, at least, I want to optimize in that particular case). For the case of a single goal, bidirectional search sounds promising: the reverse search direction would quickly exhaust all reachable nodes and conclude that no path exists. These slides by Andrew Goldberg et al. describe the bidirectional A* algorithm with proper conditions on the heuristics, as well as stopping conditions.
My question is: is there a way to combine these two approaches, i.e. to perform bidirectional A* to find path to any of a specified set of goal nodes? I'm not sure what heuristic function to choose for the reverse search direction, what are the stopping conditions, etc. Googling for anything on this topic didn't get me anywhere either.
What is the difference between informed and uninformed searches? Can you explain this with some examples?
Blind or Uninformed Search
It is a search without "information" about the goal node.
An example is breadth-first search (BFS). In BFS, the search proceeds layer by layer: all nodes in one layer are visited before any node in the next layer. This continues until the node being expanded is the goal node. No information about the goal node is used to decide which nodes to visit, expand or generate.
We can think of a blind or uninformed search as a brute-force search.
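As a rough sketch of how "blind" this is, here is a minimal BFS in Python, assuming (for illustration) that the graph is given as an adjacency dict. Note that the goal is used only in the test on the expanded node, never to choose the expansion order:

from collections import deque

def bfs(graph, start, goal):
    # Blind, layer-by-layer search: nothing about the goal influences
    # which node is expanded next.
    frontier = deque([start])
    parents = {start: None}
    while frontier:
        node = frontier.popleft()           # FIFO queue -> expand layer by layer
        if node == goal:                    # goal test; the only use of `goal`
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]               # path from start to goal
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:     # visit each node at most once
                parents[neighbor] = node
                frontier.append(neighbor)
    return None                             # goal unreachable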
Heuristic or Informed Search
It is a search with "information" about the goal.
An example of this type of algorithm is A*. In this algorithm, nodes are visited and expanded using information about the goal node as well. The information about the goal node is given by a heuristic function (a function that associates information about the goal node with each node of the state space). In the case of A*, the heuristic information associated with each node n is an estimate of the distance from n to the goal node.
We can think of an informed search as an approximately "guided" search.
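For comparison, here is a minimal A* sketch under the same assumed adjacency-dict representation (this time with edge weights), with the heuristic h supplied by the caller. The only essential difference from the blind search above is that the expansion order is driven by f(n) = g(n) + h(n):

import heapq
import itertools

def a_star(graph, start, goal, h):
    # Informed search: expansion order is driven by f(n) = g(n) + h(n),
    # where h(n) is the heuristic estimate of the distance from n to the goal.
    tie = itertools.count()                  # tie-breaker so tuples never compare nodes
    open_heap = [(h(start), next(tie), 0, start, None)]
    closed = {}                              # node -> parent, set on expansion
    g_best = {start: 0}
    while open_heap:
        _, _, g, node, parent = heapq.heappop(open_heap)
        if node in closed:
            continue                         # stale entry; node already expanded
        closed[node] = parent
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = closed[node]
            return path[::-1]
        for neighbor, w in graph.get(node, {}).items():
            ng = g + w
            if ng < g_best.get(neighbor, float("inf")):
                g_best[neighbor] = ng
                heapq.heappush(open_heap, (ng + h(neighbor), next(tie), ng, neighbor, node))
    return None                              # goal unreachable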
An uninformed search is a brute-force or "blind" search. It uses no knowledge about the problem, and is hence possibly less efficient than an informed search.
Examples of uninformed search algorithms are breadth-first search, depth-first search, depth-limited search, uniform-cost search, depth-first iterative deepening search and bidirectional search.
An informed search (also called a "heuristic search") uses prior knowledge about the problem ("domain knowledge"), and is hence possibly more efficient than an uninformed search.
Examples of informed search algorithms are best-first search and A*.
The differences between uninformed search and informed search are given below:
Uninformed search has access only to the problem definition, whereas informed search has access to both the problem definition and a heuristic function.
Uninformed search is less efficient, whereas informed search is more efficient.
Uninformed search is known as blind search, whereas informed search is known as heuristic search.
Uninformed search uses more computation, whereas informed search uses less computation.
I am working on an NLTK project, intended in principle to be like a standard thesaurus but (quasi-)continuous. To take one example, there are dozens of entries connected with books, including both religious classics and ledgers.
I tried fiddling with some terms, but I seemed to get just a smaller slice of the pie by doing that. (A "ledger" result contained "daybook", but it was a much smaller collection than one would find by reading a book.) The discussion of "synsets" in the documentation seems to imply that you can find terms close to an existing term, but the synsets look like islands to me.
What (if any) means are there to say something like "I want all words with a match score above threshold XYZ" or "I want to match the n closest related terms"? The documentation makes it look like this is possible, with a really nice way of calculating a proximity score between two words, but I don't see how to adjust the threshold or, alternatively, how to request the n closest matches.
What are my best bets here?
If you want to be able to compute distance between arbitrary pairs of words, WordNet is the wrong tool for the job: It is a network of particular terms, so either there is a path between two nodes or there is not. Look around for corpus-based measures instead.
A quick google gave this thread (not on SO) that could serve as a starting point.
In the nltk, I would start by taking a look at nltk.text.ContextIndex, which seems to be behind the nltk demo function nltk.Text.similar(). It won't calculate distances between pairs of words, but at least you'll have a rich network of contexts you can start from.
>>> contexts = nltk.text.ContextIndex(nltk.corpus.brown.words()[:100000])
>>> contexts.similar_words("fact")
['jury', 'announcement', 'Washington', 'addition', '1961', 'impression',
'news', 'belief', 'commissioners', 'Laos', 'return', '1959', '1960', '1956',
'result', 'University', 'opinion', 'work', 'course', 'hope']
I'll leave it to you to remove punctuation, stopwords etc. I haven't looked at the algorithms behind this, but you can always implement your own favorite algorithm if this doesn't do the job for you.
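If you specifically need a score to threshold on or to rank by, ContextIndex also appears to expose word_similarity_dict() (verify the method name against your NLTK version; I'm treating it as an assumption here):

>>> scores = contexts.word_similarity_dict("fact")      # {word: similarity score}
>>> [w for w, s in scores.items() if s > 0.1]           # "score above threshold XYZ"
>>> sorted(scores, key=scores.get, reverse=True)[:20]   # "the n closest matches"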
I have a python app with a database of businesses and I want to be able to search for businesses by name (for autocomplete purposes).
For example, consider the names "best buy", "mcdonalds", "sony" and "apple".
I would like the queries "app", "appel" and "ple" all to return "apple".
"Mc'donalds" should return "mcdonalds".
"bst b" and "best-buy" should both return "best buy".
Which algorithm am I looking for, and does it have a python implementation?
Thanks!
The Levenshtein distance should do.
Look around - there are implementations in many languages.
Levenshtein distance will do this.
Note: this is a pairwise distance; you have to calculate it against every string in your database, which can be a big problem if you have a lot of entries.
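For a small table a brute-force scan is perfectly workable, though. A self-contained sketch with a plain dynamic-programming Levenshtein (the helper names are just illustrative):

def levenshtein(a, b):
    # Classic DP edit distance: minimum number of single-character
    # insertions, deletions and substitutions turning a into b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def closest_names(query, names, n=5):
    # Brute-force scan over the whole table: O(len(names)) distance
    # computations per query, fine for small databases.
    return sorted(names, key=lambda name: levenshtein(query.lower(), name.lower()))[:n]

# closest_names("mc'donalds", ["best buy", "mcdonalds", "sony", "apple"])
# puts "mcdonalds" first (distance 1: delete the apostrophe).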
If you run into that problem, record all the typos your users make (a typo being a query with no direct match) and build an offline correction database containing all the typo->fix mappings. Some companies do this even more cleverly, e.g. Google watches how users correct their own typos and learns the mappings from that.
Soundex or Metaphone might work.
I think what you are looking for belongs to the huge field of data quality and data cleansing. I doubt you will find a ready-made Python implementation for this, since it has to cleanse a considerable amount of data in a database, data which could be of business value.
Levenshtein distance goes in the right direction, but only half the way. There are several tricks to get it to handle partial matches as well.
One would be to use subsequence dynamic time warping (DTW is actually a generalization of Levenshtein distance). For this you relax the start and end cases when calculating the cost matrix. If you relax only one of the conditions, you get autocompletion with spell checking. I am not sure whether a Python implementation is available, but if you want to implement it yourself it should not be more than 10-20 LOC.
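To make the 10-20 LOC claim concrete, here is one way to write the relaxed DP (my interpretation of the relaxation, not a reference implementation). Relaxing only the end boundary means taking the minimum over the last row instead of the bottom-right cell, so the query may stop anywhere inside the candidate; relaxing the start as well zeroes the first row, which gives approximate substring matching:

def relaxed_levenshtein(query, text, relax_start=False):
    # Edit distance from `query` to some prefix of `text` (end relaxed).
    # With relax_start=True the match may also begin anywhere in `text`,
    # i.e. approximate substring matching.
    prev = [0] * (len(text) + 1) if relax_start else list(range(len(text) + 1))
    for i, cq in enumerate(query, 1):
        cur = [i]
        for j, ct in enumerate(text, 1):
            cur.append(min(prev[j] + 1,                 # delete from query
                           cur[j - 1] + 1,              # insert into query
                           prev[j - 1] + (cq != ct)))   # substitute
        prev = cur
    return min(prev)  # relaxed end: the match may stop anywhere in `text`

# relaxed_levenshtein("bst b", "best buy")               -> 1  (autocomplete + typo)
# relaxed_levenshtein("ple", "apple", relax_start=True)  -> 0  (substring match)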
The other idea would be to use a trie for speed-up, which can run DTW/Levenshtein against multiple entries simultaneously (a huge speed-up if your database is large). There is a paper on Levenshtein on tries at IEEE, so you can find the algorithm there. Again, you would need to relax the final boundary condition so that you get partial matches. However, since you step down the trie, you just need to check when you have fully consumed the input and then return all leaves below that point.
Check out difflib: http://docs.python.org/library/difflib.html
It should help you.
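For instance, difflib.get_close_matches gives fuzzy whole-string matching out of the box (it won't handle the prefix-only queries like "app", but it covers the typo cases):

>>> import difflib
>>> difflib.get_close_matches("appel", ["best buy", "mcdonalds", "sony", "apple"], n=3, cutoff=0.6)
['apple']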
I would like to better understand how the various common search algorithms relate to each other. Does anyone know of a resource, such as a hierarchy diagram or concise textual description of this?
A small example of what I mean is:
A* Search
-> Uniform-cost is a variant of A* where the heuristic is a constant function
-> Dijkstra's is a variant of uniform-cost search with no goal
-> Breadth-first search is a variant of A* where all step costs are positive and identical
etc.
Thanks!
There is no hierarchy as such, just a bunch of different algorithms with different traits.
e.g. A* can be considered to be based on Dijkstra's, with an added heuristic.
Or it can be considered to be based on a heuristic-based best-first search, with an additional factor of the path cost so far.
Similarly, A* is implemented much the same way as a typical breadth-first search (i.e. with a queue of nodes, though a priority queue in A*'s case). Iterative-deepening A* (IDA*) is based on A* in that it uses the same cost and heuristic measurements, but is actually implemented as a depth-first search method.
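One way to make these relationships concrete: a single priority-queue search covers several of these variants depending only on how the priority is computed. This is a sketch of the relationship (assuming a weighted adjacency-dict graph), not a canonical taxonomy:

import heapq
import itertools

def best_first(graph, start, goal, priority):
    # Generic best-first search; `priority(g, node)` picks the variant:
    #   lambda g, n: g + h(n)   -> A*
    #   lambda g, n: g          -> uniform-cost (Dijkstra with a goal test)
    #   lambda g, n: h(n)       -> greedy best-first
    tie = itertools.count()                  # tie-breaker so tuples never compare nodes
    heap = [(priority(0, start), next(tie), 0, start)]
    best_g = {start: 0}
    while heap:
        _, _, g, node = heapq.heappop(heap)
        if node == goal:
            return g                         # cost of the path found
        if g > best_g.get(node, float("inf")):
            continue                         # stale entry
        for neighbor, w in graph.get(node, {}).items():
            ng = g + w
            if ng < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = ng
                heapq.heappush(heap, (priority(ng, neighbor), next(tie), ng, neighbor))
    return None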
There's also a big crossover with optimisation algorithms here. Some people think of genetic algorithms as a bunch of complex hill-climbing attempts, but others consider them a form of beam search.
It's common for search and optimisation algorithms to draw properties from more than one source, to mix and match approaches to make them more relevant either to the search domain or the computing requirements, so rather than a hierarchy of methods you'll find a selection of themes that crop up across various approaches.
Try this http://en.wikipedia.org/wiki/Search_algorithm#Classes_of_search_algorithms