Proof of optimal efficiency of A* Search - search

It is mentioned in Norvig's Artificial Intelligence that A* Search is optimally efficient. However, I could not figure out why, nor could I find the proof on the web. Does anyone happen to have a proof?

I hope I'm not doing your homework ;). I only sketch the proof here.
The first thing you need to see is that A* is optimal, that is, it returns the shortest path according to your cost function g. I think this proof is trivial under the assumption that the heuristic h does not overestimate the cost of the solution. If this didn't hold, optimal efficiency would be meaningless, as A* wouldn't be optimal.
Optimal efficiency: among all optimal algorithms that start from the same node and use the same heuristic, A* expands the fewest nodes.
Let's assume an algorithm B does not expand a node n which is expanded by A*. By definition, for this node g(n)+h(n) <= f, where f is the cost of the shortest path. Consider a second problem for which all heuristic values are the same as in the original problem. However, there is a new path through n to a new goal with total cost smaller than f.
The assumed algorithm B would not expand n and hence would never reach this new goal. Hence, B wouldn't find this optimal path. Therefore, our original assumption that B is optimal is violated.
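The argument can be played out on a toy instance. The sketch below is only an illustration: the graphs, the heuristic table and the `skip` parameter (which stands in for "algorithm B" by refusing to expand n) are all made up for this example and are not part of any library.

```python
import heapq

def a_star(graph, h, start, goals, skip=frozenset()):
    """Return (cost, expanded nodes). Nodes in `skip` are never expanded."""
    open_heap = [(h[start], 0, start)]
    best_g = {start: 0}
    expanded = []
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry
        if node in goals:
            return g, expanded
        if node in skip:
            continue  # "algorithm B" refuses to expand this node
        expanded.append(node)
        for succ, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(succ, float("inf")):
                best_g[succ] = new_g
                heapq.heappush(open_heap, (new_g + h[succ], new_g, succ))
    return None, expanded

# Hypothetical heuristic values, identical in both problems and admissible in both.
H = {"S": 0, "n": 1, "m": 3, "G": 0, "G2": 0}

# Problem 1: the best path is S -> m -> G with cost 5; n has g + h = 2 < 5.
GRAPH_1 = {"S": [("n", 1), ("m", 2)], "n": [("G", 10)], "m": [("G", 3)]}
# Problem 2: same as problem 1 plus a cheap new goal G2 behind n (total cost 2).
GRAPH_2 = {"S": [("n", 1), ("m", 2)], "n": [("G", 10), ("G2", 1)], "m": [("G", 3)]}

print(a_star(GRAPH_1, H, "S", {"G"}))                      # A* on problem 1
print(a_star(GRAPH_1, H, "S", {"G"}, skip={"n"}))          # B on problem 1
print(a_star(GRAPH_2, H, "S", {"G", "G2"}))                # A* on problem 2
print(a_star(GRAPH_2, H, "S", {"G", "G2"}, skip={"n"}))    # B on problem 2
```

On problem 1, B gets away with expanding fewer nodes and still returns cost 5. On problem 2, where B sees exactly the same heuristic values, it returns cost 5 again while A* finds the new goal at cost 2, so B is no longer optimal.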

Related

Trivial Heuristic for A* essentially makes it Equivalent to Uniform Cost Search?

As the title says, this is what I'm thinking. A trivial heuristic, say estimate = 0 for every node, essentially makes it count only the current cost. Is there still a difference between the two algorithms if the heuristic is trivial?
You can, as you say, use a heuristic of 0 for every node and functionally they will be very close.
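There is one subtle difference if by Uniform Cost Search (UCS) you mean a search that knows all edges are cost 1 (which is effectively breadth-first search): it can terminate when the goal is generated, while A* can only terminate when the goal is expanded. UCS in the textbook sense, which expands nodes in order of g and allows arbitrary edge costs, also has to wait until the goal is expanded, and it behaves exactly like A* with a heuristic of 0.
The other difference is the complexity of the data structures needed. A* might have arbitrary edge costs, so it needs a more complicated priority queue to be efficient, whereas a plain FIFO queue is enough when every edge costs the same.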
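To make the termination point concrete, here is a minimal sketch. The function name `best_first`, the `stop_on_generate` flag and both graphs are invented for this illustration. With the default heuristic of 0 the queue is ordered by g alone, which is what UCS does.

```python
import heapq

def best_first(graph, start, goal, h=lambda n: 0, stop_on_generate=False):
    """A* over `graph`; with h = 0 nodes are ordered by path cost g only."""
    open_heap = [(h(start), 0, start)]
    best_g = {start: 0}
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if g > best_g[node]:
            continue  # stale queue entry
        if node == goal:
            return g  # goal test when the node is expanded
        for succ, cost in graph.get(node, []):
            new_g = g + cost
            if stop_on_generate and succ == goal:
                return new_g  # early exit: only safe when all edges cost the same
            if new_g < best_g.get(succ, float("inf")):
                best_g[succ] = new_g
                heapq.heappush(open_heap, (new_g + h(succ), new_g, succ))
    return None

# Hypothetical graphs: one with unit edge costs, one with arbitrary costs.
UNIT = {"S": [("A", 1), ("B", 1)], "A": [("G", 1)], "B": [("G", 1)]}
WEIGHTED = {"S": [("G", 10), ("A", 1)], "A": [("G", 1)]}

print(best_first(UNIT, "S", "G"), best_first(UNIT, "S", "G", stop_on_generate=True))
print(best_first(WEIGHTED, "S", "G"), best_first(WEIGHTED, "S", "G", stop_on_generate=True))
```

With unit costs both termination rules return 2. With arbitrary costs, stopping when the goal is generated returns the direct edge of cost 10, while waiting until the goal is expanded returns the cheaper path of cost 2.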

What is the difference between overlapping subproblems and optimal substructure?

I understand the target approach for both the methods where Optimal Substructure calculates the optimal solution based on an input n while Overlapping Subproblems targets all the solutions for the range of input say from 1 to n.
Take a problem like the Rod Cutting Problem. While finding the optimal cut, do we consider each cut, so that it can be treated as an Overlapping Subproblem and solved bottom-up? Or do we consider the optimal cut for a given input n and work top-down?
Hence, while they both deal with optimality in the end, what are the exact differences between the two approaches?
I tried referring to this Overlapping Subproblem, Optimal Substructure and this page as well.
On a side note as well, does this relate to the solving approaches of Tabulation(top-down) and Memoization(bottom-up)?
This thread makes a valid point, but I'm hoping it could be broken down more simply.
To answer your main question: overlapping subproblems and optimal substructure are two different concepts/properties; a problem that has both of these properties can be solved via Dynamic Programming. To understand the difference between them, you need to understand what each of these terms means with regard to Dynamic Programming.
I understand the target approach for both the methods where Optimal Substructure calculates the optimal solution based on an input n while Overlapping Subproblems targets all the solutions for the range of input say from 1 to n.
This is a poorly worded statement. You need to familiarize yourself with the basics of Dynamic Programming. Hopefully the following explanation will help you get started.
Let's start with defining what each of these terms, Optimal Substructure & Overlapping Subproblems, mean.
Optimal Substructure: If the optimal solution to a problem S of size n can be calculated by looking JUST at the optimal solution of a subproblem s of size < n, and NOT at all solutions to that subproblem, and this still results in an optimal solution for problem S, then S is considered to have optimal substructure.
Example (Shortest Path Problem): consider an undirected graph with vertices a,b,c,d,e and edges (a,b), (a,e), (b,c), (c,d), (d,a) & (e,b). The shortest path between a & c is a -- b -- c, and this problem can be broken down into finding the shortest path between a & b and then the shortest path between b & c, which gives us a valid solution. Note that we have two ways of reaching b from a:
a -- b (Shortest path)
a -- e -- b
The Longest Path Problem does not have optimal substructure. The longest path between a & d is a -- e -- b -- c -- d, but joining the longest paths between a & c (a -- e -- b -- c) and c & d (c -- b -- e -- a -- d) won't give us a valid (non-repeating vertices) longest path between a & d.
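Both claims can be checked by brute force on the example graph. The helper names below (`simple_paths`, `shortest`, `longest`) are made up for this sketch, and enumerating every simple path is only practical for a graph this small.

```python
# The undirected example graph from the text.
EDGES = [("a", "b"), ("a", "e"), ("b", "c"), ("c", "d"), ("d", "a"), ("e", "b")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, []).append(v)
    ADJ.setdefault(v, []).append(u)

def simple_paths(src, dst, path=None):
    """Yield every path from src to dst that visits no vertex twice."""
    path = [src] if path is None else path
    if src == dst:
        yield list(path)
        return
    for nxt in ADJ[src]:
        if nxt not in path:
            path.append(nxt)
            yield from simple_paths(nxt, dst, path)
            path.pop()

def shortest(src, dst):
    return min(simple_paths(src, dst), key=len)

def longest(src, dst):
    return max(simple_paths(src, dst), key=len)

# Shortest paths compose: the a-b and b-c pieces give an optimal a-c path.
glued_short = shortest("a", "b") + shortest("b", "c")[1:]
print(glued_short, len(glued_short) == len(shortest("a", "c")))

# Longest paths do not compose: the glued path repeats vertices.
glued_long = longest("a", "c") + longest("c", "d")[1:]
print(longest("a", "d"), glued_long, len(set(glued_long)) < len(glued_long))
```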
Overlapping Subproblems: If you look at this diagram from the link you shared:
You can see that subproblem fib(1) is 'overlapping' across multiple branches and thus fib(5) has overlapping subproblems (fib(1), fib(2), etc).
On a side note as well, does this relate to the solving approaches of Tabulation(top-down) and Memoization(bottom-up)?
This again is a poorly worded question. Top-down(recursive) and bottom-up(iterative) approaches are different ways of solving a DP problem using memoization. From the Wikipedia article of Memoization:
In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
For the given Fibonacci example, if we store fib(1) in a table the first time it is encountered, we don't need to recompute it the next time we see it. We can reuse the stored result, which saves us a lot of computation.
When we implement an iterative solution, "table" is usually an array (or array of arrays) and when we implement a recursive solution, "table" is usually a dynamic data structure, a hashmap (dictionary).
You can further read this link for better understanding of these two approaches.
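As a small sketch of the two styles described above (the function names are my own, and fib(0) = 0, fib(1) = 1 is assumed as the base case): the recursive version keeps its "table" in a dictionary, the iterative version fills an array from the smallest subproblem upwards, and a naive version is included only to count how often subproblems get recomputed.

```python
def fib_naive(n, calls):
    """Plain recursion; calls[0] counts how many times the function runs."""
    calls[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, calls) + fib_naive(n - 2, calls)

def fib_top_down(n, memo=None):
    """Recursive solution; results are cached in a dictionary."""
    memo = {} if memo is None else memo
    if n < 2:
        return n
    if n not in memo:
        memo[n] = fib_top_down(n - 1, memo) + fib_top_down(n - 2, memo)
    return memo[n]

def fib_bottom_up(n):
    """Iterative solution; the table is an array filled from small to large."""
    table = [0, 1] + [0] * (n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

calls = [0]
print(fib_naive(5, calls), calls[0])        # many repeated subproblems
print(fib_top_down(5), fib_bottom_up(5))    # each subproblem solved once
```

For fib(5) the naive recursion makes 15 calls because the same subproblems appear in several branches; the other two versions compute each value once.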

Does A* need to know the optimal solution cost while utilizing an admissible heuristic?

I've read through a few stackoverflows on this topic, as well as the wikipedia on A* but I'm still a little confused. I think this post almost explained it completely to me: A* heuristic, overestimation/underestimation?
My only confusion left is, how does the A* know the optimal solution? It seems like with an admissible heuristic, you can throw out paths that exceed the known optimal solution, because the heuristic is guaranteed to be less than or equal. But how would A* know the optimal ahead of time?
Would this search work and guarantee an optimal solution if you didn't know the optimal path cost?
A* does not know the optimal solution; the heuristic gives only an educated guess which helps to accelerate the search process. Seeing that you have already read some theoretical explanations, let's try a different approach here, with an example:
Starting at the green node, A* explores the node with the smallest cost + heuristic (1.5 + 4 = 5.5, node 'a'). Cost + heuristic can be read as "how much it costs to get there plus how much I think is left to reach the goal". In other words, it's the estimated total cost to the goal. So it makes sense that we select the smallest value. Node 'd' has a higher value (2 + 4.5 = 6.5), so we just leave it in the queue.
By expanding the neighbors of 'a', we add 'b' to the queue and compute its value, which is 1.5 + 2 + 2 = 5.5 (the first two terms are the cost to get there, the last term is how much I think is left). It is still better than the value for 'd', so we keep exploring this path. Note that the heuristic at 'b' is 2, which means that we think that this is the additional cost remaining to get to the goal... which is clearly wrong, there is no way to get there from 'b' with cost 2! But it poses no problem to the A* algorithm, because we are underestimating the real cost.
Expanding 'b', we add its neighbor 'c' to the queue with value 1.5 + 2 + 3 + 4 = 10.5. Now, remember that 'd' is still in the queue? It now has the smallest value (6.5), so we leave 'c' in the queue and try 'd' because it is a more promising path. This decision is possible because we know that it costs 6.5 just to get to 'c', and we think that there is still a cost of 4 to get to the goal. In this case, the heuristic is correct, which is also ok for the A* algorithm.
By expanding 'd' we add 'e' to the queue with value 2 + 3 + 2 = 7. The heuristic here is correct, which we already know is ok for A*. Then we would explore 'e' and find the goal. But let's suppose we had h(e) = 6, giving 'e' a value of 2 + 3 + 6 = 11. It would mean that 'c' would be the next best guess (10.5) and we would try a hopeless path! This is why a heuristic that overestimates is not admissible: it makes A* take the wrong exploration path.
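The walkthrough can be replayed with a few lines of code. This is a sketch under assumptions: the edge costs are read off the sums in the text, the costs of the last edges (c to goal = 4, e to goal = 2) are inferred from the remarks that the heuristic is "correct" at 'c' and 'e', edges are treated as one-way, and h(start) = 0 is an arbitrary choice that does not affect the result.

```python
import heapq

GRAPH = {
    "start": [("a", 1.5), ("d", 2)],
    "a": [("b", 2)],
    "b": [("c", 3)],
    "c": [("goal", 4)],   # inferred: h(c) = 4 is said to be correct
    "d": [("e", 3)],
    "e": [("goal", 2)],   # inferred: h(e) = 2 is said to be correct
}
H = {"start": 0, "a": 4, "b": 2, "c": 4, "d": 4.5, "e": 2, "goal": 0}

def a_star(graph, h, start, goal):
    """Return (cost, path, expansion order) using cost + heuristic as priority."""
    open_heap = [(h[start], 0, start, [start])]
    closed = set()
    order = []
    while open_heap:
        _, g, node, path = heapq.heappop(open_heap)
        if node in closed:
            continue
        closed.add(node)
        order.append(node)
        if node == goal:
            return g, path, order
        for succ, cost in graph.get(node, []):
            if succ not in closed:
                new_g = g + cost
                heapq.heappush(open_heap, (new_g + h[succ], new_g, succ, path + [succ]))
    return None, [], order

print(a_star(GRAPH, H, "start", "goal"))

# Same graph, but with the overestimate h(e) = 6 from the text.
H_BAD = dict(H, e=6)
print(a_star(GRAPH, H_BAD, "start", "goal"))
```

With the original heuristic the nodes are expanded in the order start, a, b, d, e, goal and the path through 'd' and 'e' (cost 7) is returned. With h(e) = 6, 'c' is expanded before 'e', and under the edge costs assumed here the search ends up returning the path through 'c' with cost 10.5.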
If you're looking for proofs, here is an informal one from Wikipedia for admissibility:
When A* terminates its search, it has found a path whose actual cost is lower than the estimated cost of any path through any open node. But since those estimates are optimistic, A* can safely ignore those nodes. In other words, A* will never overlook the possibility of a lower-cost path and so is admissible.
And for optimality:
Suppose now that some other search algorithm B terminates its search with a path whose actual cost is not less than the estimated cost of a path through some open node. Based on the heuristic information it has, Algorithm B cannot rule out the possibility that a path through that node has a lower cost. So while B might consider fewer nodes than A*, it cannot be admissible. Accordingly, A* considers the fewest nodes of any admissible search algorithm.
You may also want to check this video: A* optimality proof.
It achieves this by working through all the possible candidates with the help of the heuristic. So you will end up with all the needed tiles/vertices/waypoints in your closed list.

What does the star in the A* algorithm mean?

I'm quite sure the * (star) in the A* algorithm means that the algorithm is admissible, i.e. it is guaranteed that it finds the shortest path in the graph if this path exists (when the heuristic employed is optimistic).
Am I right? I was unsuccessfully looking for any info about the topic but I couldn't find any reference. Hopefully, most experienced users in this community know something else about the history of A* than I do.
By the way, I think that other algorithms like IDA*, D*, SMA*, MOA*, NAMOA*, ... that are based on A* follow the same naming convention.
The reason is that scientists first came up with an improved version of the Dijkstra algorithm they called A1. Later on, the inventors of A* discovered an improvement of A1 that they called A2. These people then managed to prove that A2 was actually optimal under some assumptions on the heuristic in use. Because A2 was optimal, it was renamed A*. In science, and in optimisation in particular, a " * " symbol is often used to denote optimal solutions. Some also interpret the " * " as meaning "any version number" since it was proven impossible to build an "A3" algorithm that would outperform A2/A*.
By the way, in this context, "optimal" doesn't mean that it reaches the optimal solution, but that it does so while exploring the minimum number of nodes. Of course, A* is also complete, which means it finds a solution whenever one exists, and that solution is the optimal one if we use an admissible heuristic.

Monotonicity and A*. Is it optimal?

Is A* an optimal search algorithm (ie. will it find the best solution) even if it is non-monotonic? Why or Why not?
A* is an optimal search algorithm as long as the heuristic is admissible.
But, if the heuristic is inconsistent, you will need to re-expand nodes to ensure optimality. (That is, if you find a shorter path to a node on the closed list, you need to update the g-cost and put it back on the open list.)
This re-expansion can introduce exponential overhead in state spaces with exponentially growing edge costs. There are variants of A* which reduce this to polynomial overhead by essentially interleaving a Dijkstra search in between each A* node expansion.
This paper has an overview of recent work, and particularly citations of other work by Martelli and Mero who detail these worst-case graphs and suggest optimizations to improve A*.
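A small sketch of the re-expansion rule from this answer. The graph, the heuristic values and the `reopen` flag are invented for the illustration; the heuristic is admissible but inconsistent, because h(A) = 4 is larger than the edge cost 1 from A to C plus h(C) = 0.

```python
import heapq

def a_star(graph, h, start, goal, reopen):
    """A* with a closed list; `reopen` controls whether closed nodes may be re-expanded."""
    open_heap = [(h[start], 0, start)]
    best_g = {start: 0}
    closed = set()
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if g > best_g[node]:
            continue  # stale queue entry
        if node == goal:
            return g
        closed.add(node)
        for succ, cost in graph.get(node, []):
            new_g = g + cost
            if new_g >= best_g.get(succ, float("inf")):
                continue
            if succ in closed:
                if not reopen:
                    continue  # keep the old, worse g-cost
                closed.discard(succ)  # shorter path found: put it back on the open list
            best_g[succ] = new_g
            heapq.heappush(open_heap, (new_g + h[succ], new_g, succ))
    return None

# Hypothetical instance: the best path is S -> A -> C -> G with cost 5.
GRAPH = {"S": [("A", 1), ("B", 1)], "A": [("C", 1)], "B": [("C", 2)], "C": [("G", 3)]}
H = {"S": 0, "A": 4, "B": 0, "C": 0, "G": 0}

print(a_star(GRAPH, H, "S", "G", reopen=False))
print(a_star(GRAPH, H, "S", "G", reopen=True))
```

Without re-expansion, C is closed with g = 3 (reached via B) before the cheaper route via A is discovered, and the search returns cost 6. With re-expansion, C goes back on the open list with g = 2 and the optimal cost 5 is found.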
A* that never re-expands closed nodes is guaranteed to be optimal only when the heuristic is both admissible and monotonic (also referred to as consistent). For a discussion of the distinction between these criteria, see this question. Effectively, the consistency requirement means that the heuristic cannot overestimate the distance between any pair of nodes in the graph being searched. This is necessary because any overestimate could result in the search ignoring a good path in favor of a path that is actually worse.
Let's walk through an example of why this is true. Consider two nodes, A and B, with heuristic estimates of 5 and 6, respectively. If the heuristic is admissible and consistent, the shortest path that could possibly exist goes through A and is no shorter than 5. But what if the heuristic isn't consistent? For a heuristic to be admissible but inconsistent, it must never overestimate the distance to the goal, yet there is no clear relationship between the heuristic estimates at different nodes within the graph. For a concrete example, see my answer to this question. In this example, the heuristic function randomly chooses between two other functions. If the heuristic estimates at A and B were not calculated based on the same function, we don't actually know which one of them currently has the potential to lead to a shorter path. In effect, we aren't using the same scale to measure them. So we could choose A when B was actually the better option. This might result in us finding the goal through a sub-optimal path.
