Is A* an optimal search algorithm (i.e., will it find the best solution) even if the heuristic is non-monotonic? Why or why not?
A* is an optimal search algorithm as long as the heuristic is admissible.
But if the heuristic is inconsistent, you will need to re-expand nodes to ensure optimality. (That is, if you find a shorter path to a node on the closed list, you need to update its g-cost and put it back on the open list.)
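To make the re-expansion concrete, here is a minimal Haskell sketch (the name astarSearch and its parameters are illustrative, not from any library), using a linear scan of the open set in place of a real priority queue. The key step is in relax: when a cheaper path to a node is found, its g-cost is updated and it is moved back from the closed set to the open set.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S
import Data.List (minimumBy)
import Data.Ord (comparing)

-- A* graph search that re-opens closed nodes when a cheaper path to them
-- is found -- the re-expansion needed for optimality with an admissible
-- but inconsistent heuristic.
astarSearch
  :: Ord s
  => (s -> [(s, Int)])  -- successor states with edge costs
  -> (s -> Int)         -- heuristic estimate of remaining cost
  -> (s -> Bool)        -- goal test
  -> s                  -- start state
  -> Maybe Int          -- cost of an optimal path, if one exists
astarSearch succs h isGoal start =
  go (S.singleton start) S.empty (M.singleton start 0)
  where
    go open closed bestG
      | S.null open = Nothing
      | isGoal n    = Just g        -- safe to stop only on expansion
      | otherwise   =
          let (open', closed', bestG') =
                foldl relax
                      (S.delete n open, S.insert n closed, bestG)
                      (succs n)
          in go open' closed' bestG'
      where
        -- expand the open node with the smallest f = g + h
        n = minimumBy (comparing (\s -> bestG M.! s + h s)) (S.toList open)
        g = bestG M.! n
        relax st@(op, cl, bg) (m, c)
          | g + c < M.findWithDefault maxBound m bg =
              -- cheaper path found: record it and (re-)open m,
              -- even if m was already expanded (on the closed list)
              (S.insert m op, S.delete m cl, M.insert m (g + c) bg)
          | otherwise = st
```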
This re-expansion can introduce exponential overhead in state spaces with exponentially growing edge costs. There are variants of A* that reduce this to polynomial overhead, essentially by interleaving a Dijkstra-style search between A* node expansions.
This paper has an overview of recent work, and in particular cites work by Martelli and Mero, who detail these worst-case graphs and suggest optimizations to A*.
A* is guaranteed to be optimal (without re-expanding nodes) when the heuristic is both admissible and monotonic (also referred to as consistent). For a discussion of the distinction between these criteria, see this question. Effectively, consistency means that the difference between the heuristic estimates at any two nodes cannot overestimate the actual distance between those nodes. This is necessary because any such overestimate could result in the search ignoring a good path in favor of a path that is actually worse.
Let's walk through an example of why this is true. Consider two frontier nodes, A and B, whose estimated total costs (path cost so far plus heuristic) are 5 and 6, respectively. If the heuristic is admissible and consistent, the best path that could possibly pass through B costs at least 6, so it is safe to expand A first.

But what if the heuristic isn't consistent? For a heuristic to be admissible but inconsistent, it must always underestimate the distance to the goal, yet with no reliable relationship between the estimates at different nodes in the graph. For a concrete example, see my answer to this question, in which the heuristic function randomly chooses between two other functions. If the estimates at A and B were not calculated with the same function, we don't actually know which of them currently has the potential to lead to a shorter path; in effect, we aren't using the same scale to measure them. So we could choose A when B was actually the better option, and this might result in finding the goal through a sub-optimal path.
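As a minimal sketch of that "same scale" property (isConsistent is an illustrative name, not a library function): consistency is the requirement that the heuristic never drops by more than the cost of an edge, which is exactly what guarantees that f = g + h never decreases along a path.

```haskell
-- Consistency (monotonicity) requires h(n) <= cost(n, n') + h(n')
-- for every edge (n, n'). A quick check over an explicit edge list:
isConsistent :: (s -> Int) -> [(s, s, Int)] -> Bool
isConsistent h edges = all edgeOk edges
  where edgeOk (n, n', c) = h n <= c + h n'
```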
Related
As the title says (A* vs. uniform-cost search), this is what I'm thinking: a trivial heuristic, say estimate = 0 for every node, essentially makes A* count only the path cost so far. Is there still a difference between the two algorithms if the heuristic is trivial?
You can, as you say, use a heuristic of 0 for every node, and functionally the two algorithms will be very close.

There is one subtle difference between A* and Uniform Cost Search (UCS): if the search knows that all edges have cost 1, it can terminate as soon as the goal is generated, while A* with general edge costs can only safely terminate when the goal is expanded, because a cheaper path to the goal might still be on the open list.

The other difference is the complexity of the data structures needed: A* must handle arbitrary edge costs, so it needs a more sophisticated priority queue to be efficient.
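Assuming the astarSearch sketch from the first answer on this page, the relationship can be made explicit in one line: UCS is just A* with the zero heuristic.

```haskell
-- UCS is A* with the constant-zero heuristic (reuses the astarSearch
-- sketch from earlier on this page).
ucs :: Ord s => (s -> [(s, Int)]) -> (s -> Bool) -> s -> Maybe Int
ucs succs isGoal = astarSearch succs (const 0) isGoal
```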
In Haskell or some other functional programming language, how would you implement a heuristic search?
Take as an example search space the nine-puzzle, that is, a 3x3 grid with 8 tiles and 1 hole; you move tiles into the hole until you have correctly assembled a picture. The heuristic is the "Manhattan heuristic", which evaluates a board position by adding up the distance each tile is from its target position, where the distance is the number of squares horizontally plus the number of squares vertically the tile needs to be moved to reach its correct location.
I have been reading John Hughes' paper on pretty-printing, as I know that a pretty-printer back-tracks to find better solutions. I am trying to understand how to generalise a heuristic search along these lines.
===
Note that my ultimate aim here is not to write a solver for the 9-puzzle, but to learn some general techniques for writing efficient heuristic searches in FP languages. I am also interested to learn if there is code that can be generalised and re-used across a wider class of such problems, rather than solving any specific problem.
For example, a search space can be characterised by a function that maps a State to a list of States, together with some 'operation' describing how one state is transitioned into another. There could also be a goal function, mapping a State to a Bool, indicating when a goal State has been reached; and of course the heuristic function, mapping a State to a number reflecting how well it is estimated to score. Other descriptions of the search are possible.
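As one possible concrete rendering of that characterisation in Haskell (all names here are illustrative), together with the Manhattan heuristic described above:

```haskell
-- A search problem as a record of functions over some state type s:
data SearchProblem s = SearchProblem
  { successors :: s -> [s]   -- states reachable in one move
  , isGoal     :: s -> Bool  -- has a goal state been reached?
  , heuristic  :: s -> Int   -- estimated cost remaining (lower is better)
  }

-- The Manhattan heuristic for the puzzle, with a board represented as a
-- list of (tile, (row, column)) pairs; tile t's target square is taken to
-- be (t `div` 3, t `mod` 3), and the hole is simply left off the list.
manhattan :: [(Int, (Int, Int))] -> Int
manhattan board =
  sum [ abs (r - t `div` 3) + abs (c - t `mod` 3)
      | (t, (r, c)) <- board ]
```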
I don't think it's necessarily very specific to FP or Haskell (unless you utilize lists as "multiple possibility" monads, as in Learn You A Haskell For Great Good).
One way to do it would be to write a recursive function taking the following:
the current state (that is, the board configuration)
possibly some path metadata, e.g., the number of steps from the initial configuration (which is just the recursion depth), or a memoization map of all the states already considered
possibly some decision metadata, e.g., a pseudo-random number generator
Within each recursive call, the function would take the state and check whether it is the required result. If not, it would:
if it uses a memoization map, check whether this state was already considered
if it uses a recursive-step count, check whether to pursue the choices further
if it decides to recursively call itself on the possible choices emanating from this state (e.g., if there are different tiles that can be pushed into the hole), it could do so in an order based on the heuristic (or pseudo-randomly biased by that order)
The function would return whether it succeeded and, if they are used, updated versions of the memoization map and/or the pseudo-random number generator.
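Here is a minimal Haskell sketch of the recursive function described above (illustrative names, not a library API): a depth-limited DFS that tries choices in heuristic order and carries a set of the states already seen on the current path.

```haskell
import qualified Data.Set as S
import Data.List (sortOn)

-- Depth-limited DFS with heuristic ordering of choices and a set to
-- avoid revisiting states along the current path.
heuristicDfs
  :: Ord s
  => (s -> [s])   -- possible choices from a state
  -> (s -> Bool)  -- goal test
  -> (s -> Int)   -- heuristic: lower is more promising
  -> Int          -- recursion-depth limit
  -> s            -- current state (the board configuration)
  -> Maybe [s]    -- path to a goal, if one was found
heuristicDfs succs isGoal h = go S.empty
  where
    go seen depth s
      | isGoal s        = Just [s]
      | depth <= 0      = Nothing          -- step count exhausted
      | S.member s seen = Nothing          -- already considered
      | otherwise       =
          -- try the most promising choices first; laziness stops the
          -- search at the first success
          firstJust [ (s :) <$> go (S.insert s seen) (depth - 1) s'
                    | s' <- sortOn h (succs s) ]
    firstJust xs = case [r | Just r <- xs] of
                     (r:_) -> Just r
                     []    -> Nothing
```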
I'm quite sure the * (star) in the name of the A* algorithm means that the algorithm is admissible, i.e., it is guaranteed to find the shortest path in the graph if such a path exists (when the heuristic employed is optimistic).
Am I right? I was looking for information about the topic but couldn't find any reference. Hopefully, more experienced users in this community know more about the history of A* than I do.
By the way, I think that other algorithms like IDA*, D*, SMA*, MOA*, NAMOA*, ..., which are based on A*, follow the same naming convention.
The reason is that researchers first came up with an improved version of Dijkstra's algorithm, which they called A1. Later on, the inventors of A* discovered an improvement of A1 that they called A2. They then managed to prove that A2 was actually optimal under some assumptions on the heuristic in use. Because A2 was optimal, it was renamed A*. In science, and in optimisation in particular, a "*" symbol is often used to denote optimal solutions. Some also interpret the "*" as meaning "any version number", since it was proven that no "A3" algorithm could be built that would outperform A2/A*.
By the way, in this context "optimal" doesn't mean that it reaches the optimal solution, but that it does so while expanding the minimum number of nodes. Of course, A* is also complete (it finds a solution whenever one exists) and, with an admissible heuristic, the solution it returns is optimal.
It is mentioned in Russell and Norvig's Artificial Intelligence: A Modern Approach that A* search is optimally efficient. However, I could not figure out why, nor could I find a proof on the web. Does anyone happen to have a proof?
I hope I'm not doing your homework ;). I will only sketch the proof here.
The first thing you need to see is that A* is optimal, that is, it returns the shortest path according to your cost function g. This proof is straightforward under the assumption that the heuristic h never overestimates the cost of the solution. If this didn't hold, optimal efficiency would be meaningless, as A* wouldn't be optimal.
Optimal efficiency: among all optimal algorithms that start from the same start node and use the same heuristic, A* expands the fewest nodes.
Let's assume an algorithm B does not expand a node n that A* expands. Because A* expands n, we have f(n) = g(n) + h(n) < C*, where C* is the cost of the shortest path. Now consider a second problem in which all heuristic values are the same as in the original problem, but in which there is a new path through n to a new goal with total cost smaller than C*.
On this second problem, B would again not expand n, and hence would never reach the new goal. B therefore fails to find the optimal path, which contradicts our assumption that B is optimal.
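For reference, here is the same adversary argument stated a little more carefully (a sketch in the style of the standard Dechter and Pearl proof), with C* denoting the optimal solution cost:

```latex
% Sketch of the adversary argument; C^* denotes the optimal solution cost.
\begin{itemize}
  \item A* expands every node $n$ with $f(n) = g(n) + h(n) < C^*$.
  \item Suppose an admissible algorithm $B$ skips such a node $n$.
        Construct a second problem, identical to the first ($h$ unchanged),
        except that $n$ now leads to a new goal of total cost
        $g(n) + h(n) < C^*$.
  \item Without expanding $n$, $B$ cannot distinguish the two problems, so
        it again skips $n$ and returns a solution of cost $C^*$, missing
        the cheaper goal. This contradicts the assumption that $B$ is
        admissible (optimal).
\end{itemize}
```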
Does anyone have a good reference for the connections between A-star search and more general integer programming formulations for a Euclidean shortest path problem?
In particular, I'm interested in how one modifies A-star to cope with additional (perhaps path-dependent) constraints, and whether it makes sense to use a general-purpose LP/IP solver to tackle constrained shortest-path problems like this, or whether something more specialised is required to achieve the same kind of performance obtained by A-star together with a good heuristic.
I'm not afraid of maths, but most of the references I'm finding for more complex shortest-path problems aren't very explicit about how they relate to heuristic-guided algorithms like A* (perhaps because 'A*' is hard to google for...).
You might want to look into constraint optimization, specifically soft arc-consistency, and constraint satisfaction, specifically arc-consistency or other types of consistency such as i-consistency. Here are some references on constraint optimization:
[1] Schiex, Thomas. Soft Constraint Processing. http://www.inra.fr/mia/T/schiex/Export/Ecole.pdf
[2] Dechter, Rina. Constraint Processing, 1st ed. Morgan Kaufmann, San Francisco, CA, 2003.
[3] Kask, Kalev, and Dechter, Rina. Mini-Bucket Heuristics for Improved Search. In Proc. UAI-1999, Morgan Kaufmann, San Francisco, CA, 1999, pp. 314–323.
[3] might be especially interesting because it deals with combining A* with a heuristic of the type you seem to be interested in.
I'm not sure whether this helps you. Here's how I got the idea that it might:
Constraint optimization is a generalization of SAT towards optimization and variables with more than two values. A set of soft constraints, i.e., partial cost functions, together with a set of discrete variables defines your problem. Typically a branch-and-bound algorithm is used to traverse the search tree that this problem implies. Soft arc-consistency refers to a family of heuristics that use local soft constraints to compute an approximate distance to the goal node in that search tree from your current position. These heuristics are used within the branch-and-bound search, much like heuristics are used within A* search.
Branch-and-bound relates to A* over trees in much the same way that depth-first search relates to breadth-first search. So, apart from the fact that a DFS-like algorithm (branch-and-bound, sketched below) is used in this case, and that it operates on a tree instead of a graph, it looks like soft arc-consistency or other types of consistency are what you are looking for.
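To make the comparison concrete, here is a minimal depth-first branch-and-bound sketch in Haskell (illustrative names; lowerBound stands in for a soft arc-consistency based heuristic). The bound test plays the same role that f = g + h plays in A*.

```haskell
import Data.List (sortOn)

-- Depth-first branch-and-bound for minimization: prune any subtree whose
-- cost-so-far plus lower bound on the remaining cost already reaches the
-- best complete solution found so far.
branchAndBound
  :: (s -> [(s, Int)])  -- choices from a partial assignment, with costs
  -> (s -> Bool)        -- is the assignment complete?
  -> (s -> Int)         -- lower bound on the remaining cost
  -> s                  -- root of the search tree
  -> Maybe Int          -- cost of the best solution, if any was found
branchAndBound succs complete lowerBound root = go maxBound 0 root
  where
    go best g s
      | complete s               = if g < best then Just g else Nothing
      | g + lowerBound s >= best = Nothing    -- bound: prune this subtree
      | otherwise                =
          foldl step Nothing (sortOn snd (succs s))  -- cheapest move first
      where
        step acc (s', c) =
          let best' = maybe best id acc  -- incumbent after earlier siblings
          in case go best' (g + c) s' of
               Nothing -> acc
               Just v  -> Just v         -- v strictly improves the incumbent
```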
Unfortunately, while you can in principle use A* in place of branch-and-bound, it is not yet clear (as far as I know) how in general you could combine A* with soft arc-consistency. Going from a tree to a graph might complicate things further, but I don't know that.
So, no final answer, just some stuff to look at as a starter, maybe :).