Compare depth-first branch and bound and IDA* search algorithms

I want to compare and know the precise differences between depth-first branch and bound and IDA* algorithms. I browsed the internet but I am unable to find clear explanations. Please help!

IDA* does an f-cost-limited depth-first search, pruning paths that are more expensive (according to the lower-bound heuristic) than the current cost bound. It gradually increases the bound until a solution is found.
DFBnB searches through a tree keeping track of the best solution found thus far, gradually decreasing the cost of the best solution until it is optimal. DFBnB also uses a lower-bound heuristic to prune any paths that are more expensive than the current best solution.
Some algorithms, like Budgeted Tree Search, do both types of pruning - using both current cost bounds and the best found solution thus far.
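To make the contrast concrete, here is a minimal Python sketch of both on a toy graph search (the `neighbors` and `h` callables are illustrative assumptions, not part of any particular library):

```python
import math

def ida_star(start, goal, neighbors, h):
    """IDA*: depth-first search limited by an f = g + h bound; the bound
    grows to the smallest pruned f-value each round."""
    def dfs(node, g, bound, path):
        f = g + h(node)
        if f > bound:
            return f, None                  # pruned: report the exceeded f
        if node == goal:
            return f, list(path)
        smallest = math.inf
        for nxt, cost in neighbors(node):
            if nxt in path:                 # avoid cycles on the current path
                continue
            path.append(nxt)
            t, found = dfs(nxt, g + cost, bound, path)
            path.pop()
            if found is not None:
                return t, found
            smallest = min(smallest, t)
        return smallest, None

    bound = h(start)
    while True:
        bound, found = dfs(start, 0, bound, [start])
        if found is not None:
            return found
        if bound == math.inf:
            return None                     # no solution exists

def dfbnb(start, goal, neighbors, h):
    """DFBnB: depth-first search that keeps the best (incumbent) solution
    found so far and prunes any path with g + h >= incumbent cost."""
    best_cost, best_path = math.inf, None
    def dfs(node, g, path):
        nonlocal best_cost, best_path
        if node == goal:
            if g < best_cost:
                best_cost, best_path = g, list(path)
            return
        for nxt, cost in neighbors(node):
            if nxt in path:
                continue
            if g + cost + h(nxt) >= best_cost:   # prune against incumbent
                continue
            path.append(nxt)
            dfs(nxt, g + cost, path)
            path.pop()
    dfs(start, 0, [start])
    return best_path
```

With an admissible `h`, both return an optimal path; the difference is purely in what drives the pruning bound (a growing f-limit versus a shrinking incumbent cost).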

Related

Hypothesis search tree

I have an object with many fields, each with a different range of values. I want to use Hypothesis to generate different instances of this object.
Is there a limit to the number of combinations of field values Hypothesis can handle? And what does the search tree Hypothesis creates look like? I don't need all the combinations, but I want to make sure that I get a fair number of them, testing many different values for each field. In particular, I want to make sure Hypothesis is not doing a DFS until it hits the maximum number of examples to generate.
TLDR: don't worry, this is a common use-case and even a naive strategy works very well.
The actual search process used by Hypothesis is complicated (as in, "lead author's PhD topic"), but it's definitely not a depth-first search! Briefly, it's a uniform distribution layered on a pseudo-random number generator, with a coverage-guided fuzzer biasing that towards less-explored code paths, with strategy-specific heuristics on top of that.
In general, I trust this process to pick good examples far more than I trust my own judgement, or that of anyone without years of experience in QA or testing research!

How to modify the A* algorithm to support multiple targets in a maze

If I have an A* function that finds the optimal path from a starting point to a target in a maze, how should I modify the heuristic function so that it remains admissible and, when there are multiple targets, the function still returns the optimal result?
Assuming that the problem is to visit only one target:
The first solution that comes to mind is to loop over all possible targets, compute the admissible heuristic value for each of them, and then finally return the minimum of those values as the final heuristic value. That way you're sure that the heuristic is still admissible (does not overestimate the distance to any of the targets).
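As a sketch of that idea (the function names are illustrative), with a Manhattan-distance heuristic on a grid it looks like this:

```python
def manhattan(a, b):
    # Standard admissible grid heuristic for 4-directional movement.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def multi_target_heuristic(node, targets):
    # Admissible as long as the per-target heuristic is admissible:
    # the true cost to the nearest target is at least the smallest estimate.
    return min(manhattan(node, t) for t in targets)
```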
EDIT: Now assuming that the problem is to visit all targets:
In this case, A* may not even necessarily be the best algorithm to use. You can use A* with the heuristic as described above to first find the shortest path to the closest target. Then, upon reaching the first (closest) target, you can run it again in the same way to find the next closest target, etc.
But this may not give an optimal overall solution. It is possible that it is better to visit the second-closest target first (for example), because doing so may enable a cheaper follow-up path to the remaining targets.
If you want an optimal overall solution, you'll probably want to look at a different approach altogether. Your problem is similar to the Travelling Salesman Problem (though not exactly the same, since I think you're allowed to visit the same point multiple times, for example). This link may also be able to help you.

When is Alpha-Beta Pruning inefficient

Is there any case where we can say alpha-beta pruning is inefficient? In other words, say we have a game where you have to reach 27 to win, and you and your opponent may only add 1, 2, or 5 each turn. Is alpha-beta pruning efficient here? Isn't it a little confusing to evaluate it that way, especially at the beginning of our case, where there are a lot of possibilities we don't really care about?
I feel like I can explain this, but I can't! Help.
For this game, it might be possible to reduce it to some mathematical formula, in which case tree search and alpha-beta pruning would be overkill.
But let's say it is not possible. You have a game with two or three outcomes: LOSS(-1), WIN(1) and possibly DRAW(0), and no meaningful evaluation of intermediate positions. Then you would need to search to the end of each variation, and so e.g. iterative deepening would be pointless.
However, alpha-beta pruning could be very efficient: If beta=-1 (meaning the opponent has found a win), you can just return -1 right away, without even searching a PV. If beta=0, the only time you would need to search all child nodes is when all (except possibly the last) moves lose.
The condition for alpha-beta to be sufficiently efficient is, of course, that the complete tree is small enough to traverse in reasonable time.
EDIT: I forgot to mention that for your particular example, remembering evaluations would have much, much greater effect than alpha-beta pruning with regard to the number of nodes traversed (from 2688332 to 77).
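The effect of remembering evaluations is easy to reproduce. Here is a minimal sketch for the counting game, assuming the player who brings the total to exactly 27 wins and overshooting moves are illegal (the rules as stated leave this open):

```python
from functools import lru_cache

MOVES = (1, 2, 5)
TARGET = 27

@lru_cache(maxsize=None)      # "remembering evaluations": each total is solved once
def value(total):
    """Return +1 if the player to move wins from `total`, -1 if they lose."""
    best = -1
    for m in MOVES:
        if total + m == TARGET:
            return 1          # immediate winning move
        if total + m < TARGET:
            best = max(best, -value(total + m))
    return best
```

Without the cache, the same totals are re-solved exponentially often; with it, there are only 27 distinct states, which is exactly the kind of node-count collapse the answer describes.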

String Matching Algorithms

I have a python app with a database of businesses and I want to be able to search for businesses by name (for autocomplete purposes).
For example, consider the names "best buy", "mcdonalds", "sony" and "apple".
I would like "app" to return "apple", and likewise "appel" and "ple".
"Mc'donalds" should return "mcdonalds".
"bst b" and "best-buy" should both return "best buy".
Which algorithm am I looking for, and does it have a python implementation?
Thanks!
The Levenshtein distance should do.
Look around - there are implementations in many languages.
Levenshtein distance will do this.
Note: this is a distance; you have to calculate it against every string in your database, which can be a big problem if you have a lot of entries.
If you run into this, record all the typos your users make (typo = no direct match) and build a correction database offline that contains all the typo->fix mappings. Some companies do this even more cleverly, e.g. Google watches how users correct their own typos and learns the mappings from that.
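For reference, the classic dynamic-programming version of the distance is short enough to inline (a sketch; in practice you would probably use an existing package):

```python
def levenshtein(a, b):
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (or match)
        prev = curr
    return prev[-1]
```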
Soundex or Metaphone might work.
What you are looking for falls into the large field of data quality and data cleansing. I doubt you will find a ready-made Python implementation, since this usually means cleansing a considerable amount of data in a database, which is of real business value.
Levenshtein distance goes in the right direction, but only halfway. There are several tricks to make it use the partial matches as well.
One is subsequence dynamic time warping (DTW is actually a generalization of Levenshtein distance). For this you relax the start and end cases when calculating the cost matrix. If you relax only one of the conditions, you get autocompletion with spell checking. I am not sure whether a Python implementation is available, but if you want to implement it yourself it should not be more than 10-20 LOC.
The other idea is to use a trie for speed-up, which can run DTW/Levenshtein on multiple entries simultaneously (a huge speedup if your database is large). There is an IEEE paper on Levenshtein on tries, so you can find the algorithm there. Again, you would need to relax the final boundary condition to get partial matches; but since you step down the trie, you just need to check when you have fully consumed the input and then return all leaves.
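A sketch of the relaxed-boundary idea (free start and end in the database string, so the query may match approximately anywhere inside it; the function name is illustrative):

```python
def substring_distance(pattern, text):
    # Like Levenshtein, but row 0 is all zeros (a match may start anywhere
    # in `text`) and we take the minimum of the last row (it may end anywhere).
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, 1):
        curr = [i]
        for j, t in enumerate(text, 1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (p != t)))
        prev = curr
    return min(prev)
```

Relaxing only the end boundary (keep row 0 as `0..len(text)` but still take `min(prev)`) gives the autocompletion-with-spell-checking variant mentioned above.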
Check out difflib (http://docs.python.org/library/difflib.html); it should help you.
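For instance, `difflib.get_close_matches` from the standard library already handles the fuzzy part; a small sketch using the example names from the question (the `search` helper and its normalisation rules are illustrative):

```python
import difflib

names = ["best buy", "mcdonalds", "sony", "apple"]

def search(query, candidates=names):
    # Normalise punctuation first, then fuzzy-match the rest.
    cleaned = query.lower().replace("'", "").replace("-", " ")
    return difflib.get_close_matches(cleaned, candidates, n=3, cutoff=0.6)
```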

Resource outlining hierarchy of search algorithms?

I would like to better understand how the various common search algorithms relate to each other. Does anyone know of a resource, such as a hierarchy diagram or concise textual description of this?
A small example of what I mean is:
A* Search
-> Uniform-cost is a variant of A* where the heuristic is a constant function
-> Dijkstra's is a variant of uniform-cost search with no goal
-> Breadth-first search is a variant of uniform-cost search where all step costs are positive and identical
etc.
Thanks!
There is no hierarchy as such, just a bunch of different algorithms with different traits.
eg. A* can be considered to be based on Dijkstra's, with an added heuristic.
Or it can be considered to be based on a heuristic-based best-first search, with an additional factor of the path cost so far.
Similarly, A* is implemented much the same way as a typical breadth-first search (i.e. with a queue of nodes). Iterative-deepening A* (IDA*) is based on A* in that it uses the same cost and heuristic measurements, but is actually implemented as a depth-first search method.
There's also a big crossover with optimisation algorithms here. Some people think of genetic algorithms as a bunch of complex hill-climbing attempts, but others consider them a form of beam search.
It's common for search and optimisation algorithms to draw properties from more than one source, to mix and match approaches to make them more relevant either to the search domain or the computing requirements, so rather than a hierarchy of methods you'll find a selection of themes that crop up across various approaches.
Try this http://en.wikipedia.org/wiki/Search_algorithm#Classes_of_search_algorithms