What is the difference between informed and uninformed searches? Can you explain this with some examples?
Blind or Uninformed Search
It is a search without "information" about the goal node.
An example is breadth-first search (BFS). In BFS, the search proceeds layer by layer: all nodes in the current layer are visited before any node in the next layer. This continues until an expanded node turns out to be the goal node. No information about the goal node is used to visit, expand or generate nodes.
We can think of a blind or uninformed search as a brute-force search.
Heuristic or Informed Search
It is a search with "information" about the goal.
An example of this type of algorithm is A*. In A*, nodes are visited and expanded using information about the goal node as well. This information is given by a heuristic function, which associates goal-related information with each node of the state space. In the case of A*, the heuristic value associated with a node n is an estimate of the distance from n to the goal node.
We can think of an informed search as an approximately "guided" search.
An uninformed search is a brute-force or "blind" search. It uses no knowledge about the problem, so it is possibly less efficient than an informed search.
Examples of uninformed search algorithms are breadth-first search, depth-first search, depth-limited search, uniform-cost search, depth-first iterative deepening search and bidirectional search.
An informed search (also called "heuristic search") uses prior knowledge about the problem ("domain knowledge"), so it is possibly more efficient than an uninformed search.
Examples of informed search algorithms are best-first search and A*.
Differences between uninformed search and informed search are given below:
An uninformed search technique has access only to the problem definition, whereas an informed search technique has access to both the heuristic function and the problem definition.
Uninformed search is generally less efficient, whereas informed search is generally more efficient.
Uninformed search is known as blind search, whereas informed search is known as heuristic search.
Uninformed search typically uses more computation, whereas informed search uses less.
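To make the contrast concrete, here is a toy Python sketch of both (the graph, step costs and heuristic values are invented purely for illustration):

    import heapq
    from collections import deque

    graph = {  # node -> list of (neighbour, step cost)
        'A': [('B', 1), ('C', 4)],
        'B': [('C', 1), ('D', 5)],
        'C': [('D', 1)],
        'D': [],
    }
    h = {'A': 3, 'B': 2, 'C': 1, 'D': 0}  # estimated distance to the goal 'D'

    def bfs(start, goal):
        """Uninformed: expands layer by layer, ignoring where the goal is."""
        frontier, seen = deque([start]), {start}
        while frontier:
            node = frontier.popleft()
            if node == goal:
                return node
            for nbr, _ in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    frontier.append(nbr)

    def a_star(start, goal):
        """Informed: orders expansion by g(n) + h(n), guided by the heuristic."""
        frontier = [(h[start], 0, start)]  # (f, g, node)
        best_g = {start: 0}
        while frontier:
            f, g, node = heapq.heappop(frontier)
            if node == goal:
                return g  # cost of the cheapest path found
            for nbr, cost in graph[node]:
                new_g = g + cost
                if new_g < best_g.get(nbr, float('inf')):
                    best_g[nbr] = new_g
                    heapq.heappush(frontier, (new_g + h[nbr], new_g, nbr))

    print(bfs('A', 'D'))     # -> 'D'
    print(a_star('A', 'D'))  # -> 3 (path A-B-C-D)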
(reposted from cs.stackexchange since I got no answers or comments)
I want to solve the problem of finding a shortest path on a directed weighted graph from a certain node to any of a specified set of destination nodes (preferably the closest one, but that's not that important). The standard (I believe) way to do this with the A* algorithm is to use a distance-to-closest-goal heuristic (which is admissible) and exit as soon as any of the goal nodes is reached.
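For concreteness, here is a rough sketch of what I mean by that standard approach; the graph format and the pos coordinate map (used for the straight-line-distance heuristic) are just illustrative assumptions:

    import heapq
    import math

    def multi_goal_a_star(graph, pos, start, goals):
        """A* towards a set of goals: h(n) = straight-line distance to the
        closest goal (admissible); exit as soon as any goal is expanded."""
        def h(n):
            x, y = pos[n]
            return min(math.hypot(x - pos[g][0], y - pos[g][1]) for g in goals)

        frontier = [(h(start), 0, start, [start])]  # (f, g, node, path)
        best_g = {start: 0}
        while frontier:
            _, g, node, path = heapq.heappop(frontier)
            if node in goals:
                return g, path
            for nbr, cost in graph.get(node, []):
                new_g = g + cost
                if new_g < best_g.get(nbr, float('inf')):
                    best_g[nbr] = new_g
                    heapq.heappush(frontier, (new_g + h(nbr), new_g, nbr, path + [nbr]))
        return None  # no goal is reachable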
However, in my scenario (which is game AI, if that matters) some (or all) of the goals might be unreachable; furthermore, the set of nodes reachable from such goals is typically quite small (or, at least, I want to optimize in that particular case). For the case of a single goal, bidirectional search sounds promising: the reverse search direction would quickly exhaust all reachable nodes and conclude that no path exists. These slides by Andrew Goldberg et al. describe the bidirectional A* algorithm with proper conditions on the heuristics, as well as stopping conditions.
My question is: is there a way to combine these two approaches, i.e. to perform bidirectional A* to find path to any of a specified set of goal nodes? I'm not sure what heuristic function to choose for the reverse search direction, what are the stopping conditions, etc. Googling for anything on this topic didn't get me anywhere either.
I am working on a pet search engine (SE).
What I have right now is a boolean keyword SE, as a library that is split into two parts:
index: this is an inverted index, i.e. it associates terms with the original documents where they appear
query: which is supplied by the user and can be an arbitrarily complex boolean expression that looks like (mobile OR android OR iphone) AND game
I'd like to improve the search engine so that it automatically extends simple queries to boolean queries that include search terms which do not appear in the original query, i.e. I'd like to support synonyms.
I need some help to build the synonyms graph.
How can I compute a list of words that appear in similar contexts?
Here is an example of the kind of synonym lists I'd like to compute:
psql, pgsql, postgres, postgresql
mobile, iphone, android
and also synonyms that include n-grams, like:
rdbms, relational database management systems, ...
The algorithm doesn't have to be perfect, since I can post-process the results by hand, but I at least need a clue about which terms are similar to which other terms.
In the standard Information Retrieval (IR) literature, this enrichment of a query with additional terms (that don't appear in the initial/original query) is known as query expansion.
There are plenty of standard approaches which, generally speaking, are based on the idea of scoring terms by some factors and then selecting a number of terms (say K, a parameter) that have the highest scores.
To compute the term selection score, it is assumed that the top M ranked documents retrieved by the initial query are relevant; this is called pseudo-relevance feedback.
The factors on which the term selection function generally depends are:
The term frequency of a term in a top-ranked document - the higher the better.
The number of documents (out of the top M) in which the term occurs - the higher the better.
How many times an additional term co-occurs with a query term - the higher the better.
The co-occurrence factor is the most important and would give you terms such as 'pgsql' if the original query contains 'psql'.
Note that if the documents are too short, this method will not work well, and you have to use other methods that are necessarily semantics-based, such as i) word-vector-based expansion or ii) WordNet-based expansion.
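Here is a toy sketch of this scoring; top_docs is assumed to hold the tokenised top-M documents from the initial retrieval, and the formula is a deliberately simplified stand-in for the standard weighted variants:

    from collections import Counter

    def expansion_terms(query_terms, top_docs, k=5):
        """Score candidate expansion terms from pseudo-relevant documents."""
        query_terms = set(query_terms)
        score = Counter()
        for doc in top_docs:
            tokens = set(doc)
            if tokens & query_terms:              # doc co-occurs with the query
                for t in tokens - query_terms:
                    score[t] += doc.count(t)      # term frequency in a top doc
                    score[t] += 1                 # document-frequency bonus
        return [t for t, _ in score.most_common(k)]

    top_docs = [["psql", "crashed", "pgsql", "restart"],
                ["postgres", "psql", "backup"],
                ["mobile", "game", "android"]]
    print(expansion_terms(["psql"], top_docs))
    # -> e.g. ['pgsql', 'crashed', 'restart', 'postgres', 'backup'] (ties arbitrary)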
I have a python app with a database of businesses and I want to be able to search for businesses by name (for autocomplete purposes).
For example, consider the names "best buy", "mcdonalds", "sony" and "apple".
I would like "app" to return "apple", as well as "appel" and "ple".
"Mc'donalds" should return "mcdonalds".
"bst b" and "best-buy" should both return "best buy".
Which algorithm am I looking for, and does it have a python implementation?
Thanks!
The Levenshtein distance should do.
Look around - there are implementations in many languages.
Levenshtein distance will do this.
Note: this is a distance, so you have to calculate it against every string in your database, which can be a big problem if you have a lot of entries.
If you run into that problem, record all the typos the users make (typo = no direct match) and build a correction database offline that contains all the typo->fix mappings. Some companies do this even more cleverly, e.g. Google watches how users correct their own typos and learns the mappings from that.
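For reference, a minimal pure-Python sketch of the distance itself (standard dynamic programming; in practice you would likely use a C-backed library):

    def levenshtein(a, b):
        """Minimum number of single-character edits turning a into b."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    print(levenshtein("appel", "apple"))  # -> 2 (a transposition costs 2 plain edits)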
Soundex or Metaphone might work.
I think what you are looking for falls into the huge field of Data Quality and Data Cleansing. I doubt you will find a ready-made Python implementation for this, as it has to cleanse a considerable amount of data in the database, which could be of business value.
Levenshtein distance goes in the right direction, but only halfway. There are several tricks to get it to use partial matches as well.
One would be to use subsequence dynamic time warping (DTW is actually a generalization of Levenshtein distance). For this you relax the start and end cases when calculating the cost matrix. If you relax only one of the conditions, you get autocompletion with spell checking. I am not sure whether a Python implementation is available, but if you want to implement it yourself it should not be more than 10-20 lines of code.
The other idea would be to use a trie for speedup, which can run DTW/Levenshtein on multiple results simultaneously (a huge speedup if your database is large). There is an IEEE paper on Levenshtein over tries, so you can find the algorithm there. Again, for this you would need to relax the final boundary condition so you get partial matches. However, since you step down the trie, you just need to check when you have fully consumed the input and then return all leaves.
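To sketch the "relax the final boundary condition" idea in plain Python (without the trie speedup): score the query against the cheapest prefix of each candidate, which is exactly what autocompletion with spell checking needs:

    def prefix_edit_distance(query, candidate):
        """min over all prefixes p of candidate of levenshtein(query, p)."""
        prev = list(range(len(query) + 1))
        best = prev[-1]  # the empty prefix
        for cc in candidate:
            cur = [prev[0] + 1]
            for j, qc in enumerate(query, 1):
                cur.append(min(prev[j] + 1,
                               cur[j - 1] + 1,
                               prev[j - 1] + (qc != cc)))
            prev = cur
            best = min(best, cur[-1])  # relaxed end condition: any prefix may win
        return best

    print(prefix_edit_distance("bst b", "best buy"))  # -> 1 ("best b" is one edit away)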
Check this one: http://docs.python.org/library/difflib.html
It should help you.
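For example, get_close_matches does the fuzzy matching out of the box (the names here echo the question):

    import difflib

    names = ["best buy", "mcdonalds", "sony", "apple"]
    print(difflib.get_close_matches("appel", names))       # -> ['apple']
    print(difflib.get_close_matches("mc'donalds", names))  # -> ['mcdonalds']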
I would like to better understand how the various common search algorithms relate to each other. Does anyone know of a resource, such as a hierarchy diagram or concise textual description of this?
A small example of what I mean is:
A* Search
-> Uniform-cost is a variant of A* where the heuristic is a constant function
-> Dijkstra's is a variant of uniform-cost search with no goal
-> Breadth-first search is a variant of uniform-cost search where all step costs are positive and identical
etc.
Thanks!
There is no hierarchy as such, just a bunch of different algorithms with different traits.
e.g. A* can be considered to be based on Dijkstra's, with an added heuristic.
Or it can be considered to be based on a heuristic best-first search, with an additional factor of the path cost so far.
Similarly, A* is implemented in much the same way as a typical breadth-first search (i.e. with a queue of nodes). Iterative-deepening A* (IDA*) is based on A* in that it uses the same cost and heuristic measurements, but is actually implemented as a depth-first search method.
There's also a big crossover with optimisation algorithms here. Some people think of genetic algorithms as a bunch of complex hill-climbing attempts, but others consider them a form of beam search.
It's common for search and optimisation algorithms to draw properties from more than one source, to mix and match approaches to make them more relevant either to the search domain or the computing requirements, so rather than a hierarchy of methods you'll find a selection of themes that crop up across various approaches.
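To make that "selection of themes" point concrete, here is a toy sketch of one priority-queue skeleton that becomes uniform-cost search/Dijkstra's, A*, or a greedy best-first search purely by swapping the priority function (the graph format and heuristic h are assumptions for illustration):

    import heapq

    def best_first(graph, start, goal, priority):
        """Generic best-first loop; priority(g, node) orders the frontier."""
        frontier = [(priority(0, start), 0, start)]
        best_g = {start: 0}
        while frontier:
            _, g, node = heapq.heappop(frontier)
            if node == goal:
                return g
            for nbr, cost in graph.get(node, []):
                new_g = g + cost
                if new_g < best_g.get(nbr, float('inf')):
                    best_g[nbr] = new_g
                    heapq.heappush(frontier, (priority(new_g, nbr), new_g, nbr))

    # uniform-cost / Dijkstra:  best_first(graph, s, t, lambda g, n: g)
    # A*:                       best_first(graph, s, t, lambda g, n: g + h[n])
    # greedy best-first:        best_first(graph, s, t, lambda g, n: h[n])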
Try this http://en.wikipedia.org/wiki/Search_algorithm#Classes_of_search_algorithms
I've seen a few sites that list related searches when you perform a search; namely, they suggest other search queries you may be interested in.
I'm wondering the best way to model this on a medium-sized site (not enough traffic to rely on visitor stats to infer relationships). My initial thought is to store the top 10 results for each unique query; then, when a new search is performed, find all the historical searches that share some of their top 10 results but ideally not all of them (matching all of them might indicate an equivalent search, and hence not a useful suggestion).
I imagine that some people have done this functionality before and may be able to provide some ideas of different ways to do this. I'm not necessarily looking for one winning idea since the solution will no doubt vary substantially depending on the size and nature of the site.
Have you considered a matrix with keywords on one axis and documents on the other? Once you have the set of vectors representing the keywords, find the keyword(s) present in your initial result set, and then rank the other keywords by how many documents they reference or how many times they intersect the initial result set.
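A toy sketch of that idea, with invented data (docs maps document ids to their keyword sets):

    from collections import Counter

    docs = {1: {"python", "search", "index"},
            2: {"python", "django", "search"},
            3: {"golf", "clubs"}}

    def related_keywords(result_doc_ids, query_keywords):
        """Rank non-query keywords by how often they occur in the result set."""
        counts = Counter()
        for d in result_doc_ids:
            counts.update(docs[d] - set(query_keywords))
        return [kw for kw, _ in counts.most_common()]

    print(related_keywords([1, 2], ["python"]))
    # -> ['search', 'index', 'django'] (order of ties may vary)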
I've tried a number of different approaches to this, with various degrees of success. In the end, I think the best approach is highly dependent on the domain/topics being searched, and how the users form queries.
Your thought about storing previous searches seems reasonable to me. I'd be curious to see how it works in practice (I mean that in the most sincere way -- there are many nuances that can cause these techniques to fail in the "real world", particularly when data is sparse).
Here are some techniques I've used in the past, and seen in the literature:
Thesaurus based approaches: Index into a thesaurus for each term that the user has used, and then use some heuristic to filter the synonyms to show the user as possible search terms.
Stem and search on that: Stem the search terms (e.g. with the Porter Stemming Algorithm) and then use the stemmed terms instead of the initially provided queries, giving the user the option of searching for exactly the terms they specified. (Or do the opposite: search the exact terms first, and use stemming to find the other terms that stem to the same root. This second approach obviously takes some pre-processing of a known dictionary, or you can collect terms as your indexer finds them.) A small sketch of the stemming variant follows this list.
Chaining: Parse the results found by the user's query and extract key terms from the top N results (KEA is one library/algorithm that you can look at for keyword extraction techniques.)
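Here is the promised sketch of the stemming variant, using NLTK's PorterStemmer (an external dependency; the vocabulary is invented for illustration):

    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()

    def expand_by_stem(query_terms, vocabulary):
        """Return vocabulary terms sharing a stem with any query term."""
        query_stems = {stemmer.stem(t) for t in query_terms}
        return [w for w in vocabulary if stemmer.stem(w) in query_stems]

    vocab = ["searching", "searched", "searches", "apple", "runner"]
    print(expand_by_stem(["search"], vocab))
    # -> ['searching', 'searched', 'searches']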