Heuristic Search Upper Bound

I have the following question in my midterm review:
Suppose that you were planning to run A* on a unit-cost domain (i.e. where all actions have a cost of 1), with a heuristic function that only returned non-negative integer values. Further, assume that you expect all optimal solutions to cost no more than 100. In no more than 6 sentences, identify an upper bound on the number of unique f-costs that the search may encounter, and justify your answer.
The way I see it, if the optimal solution does not cost more than 100, then at a maximum, it costs 100. Since all actions have a cost of 1, and we assume the optimal solution costs 100, then it must have started at start_node with a g-cost of 0, but what could the upper bound be on the h-cost? We are not told whether the heuristic function is admissible or not, so how do I know that it's not wildly over-/under-estimating the h-cost? This makes me think the f-cost can vary infinitely. Where am I going wrong?


Flipping a three-sided coin

I have two related questions on population statistics. I'm not a statistician, but would appreciate pointers to learn more.
I have a process that results from flipping a three-sided coin (results: A, B, C) and I compute the statistic t=(A-C)/(A+B+C). In my problem, I have a set that randomly divides itself into sets X and Y, maybe uniformly, maybe not. I compute t for X and Y. I want to know whether the difference I observe in those two t values is likely due to chance or not.
Now if this were a simple binomial distribution (i.e., I'm just counting who ends up in X or Y), I'd know what to do: I compute n=|X|+|Y|, σ=sqrt(np(1-p)) (and I assume my p=.5), and then I compare to the normal distribution. So, for example, if I observed |X|=45 and |Y|=55, I'd say σ=5 and so I expect to have this variation from the mean μ=50 by chance 68.27% of the time. Alternately, I expect greater deviation from the mean 31.73% of the time.
There's an intermediate problem, which also interests me and which I think may help me understand the main problem, where I measure some property of members of A and B. Let's say 25% in A measure positive and 66% in B measure positive. (A and B aren't the same cardinality -- the selection process isn't uniform.) I would like to know whether I should expect this difference by chance.
As a first draft, I computed t as though it were measuring coin flips, but I'm pretty sure that's not actually right.
Any pointers on what the correct way to model this is?
First problem
For the three-sided coin problem, have a look at the multinomial distribution. It's the distribution to use for a "binomial" problem with more than 2 outcomes.
Here is the example from Wikipedia (https://en.wikipedia.org/wiki/Multinomial_distribution):
Suppose that in a three-way election for a large country, candidate A received 20% of the votes, candidate B received 30% of the votes, and candidate C received 50% of the votes. If six voters are selected randomly, what is the probability that there will be exactly one supporter for candidate A, two supporters for candidate B and three supporters for candidate C in the sample?
Note: Since we’re assuming that the voting population is large, it is reasonable and permissible to think of the probabilities as unchanging once a voter is selected for the sample. Technically speaking this is sampling without replacement, so the correct distribution is the multivariate hypergeometric distribution, but the distributions converge as the population grows large.
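For concreteness, here is a minimal sketch of that calculation in plain Python (the helper name multinomial_pmf is just for illustration):

    from math import factorial

    def multinomial_pmf(counts, probs):
        """P(X1=k1, ..., Xm=km) = n!/(k1!*...*km!) * p1^k1 * ... * pm^km"""
        n = sum(counts)
        coef = factorial(n)
        prob = 1.0
        for k, p in zip(counts, probs):
            coef //= factorial(k)   # build the multinomial coefficient step by step
            prob *= p ** k
        return coef * prob

    # One A supporter, two B supporters, three C supporters out of six voters:
    print(multinomial_pmf([1, 2, 3], [0.2, 0.3, 0.5]))   # 0.135

This reproduces the 0.135 probability for the Wikipedia example above.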
Second problem
The second problem seems to be a problem for cross-tabs. Use the "Chi-squared test for association" to test whether there is a significant association between your variables, and use the "standardized residuals" of your cross-tab to identify which of the associations is more likely to occur and which is less likely.
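As a rough sketch of what that looks like in practice (the 2x2 counts below are made up purely for illustration, and scipy.stats.chi2_contingency is assumed to be available):

    import numpy as np
    from scipy.stats import chi2_contingency

    # Hypothetical cross-tab: rows = group (X, Y), columns = (positive, negative).
    table = np.array([[25, 75],    # e.g. 25% positive out of 100 in one group
                      [66, 34]])   # e.g. 66% positive out of 100 in the other

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")

    # Pearson (standardized) residuals: cells with |residual| roughly > 2
    # deviate most from what independence would predict.
    residuals = (table - expected) / np.sqrt(expected)
    print(residuals)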

A* Search Advantages of Dynamic Weighting

I was reading about the variants of the A* search algorithm and I came across dynamic weighting. As I understand it, a weight is applied to the search equation, which decreases as the search gets closer to the goal node. I was specifically looking at this article: http://theory.stanford.edu/~amitp/GameProgramming/Variations.html
Can anyone tell me what the advantages of this would be? Why would you not care what nodes you expand at the start? Is it to help searches that don't necessarily have a good heuristic?
Thanks
For the TLDNR-crowd:
Dynamic weighting sacrifices solution optimality to speed up the search. The larger the weight, the more greedy the search.
For my fellow scholars:
Motivation
From the Wikipedia A-star article:
A-star's admissibility criterion guarantees an optimal solution path, but it also means that A* must examine all equally meritorious paths to find the optimal path. We can speed up the search at the expense of optimality by relaxing the admissibility criterion to obtain an approximate solution. Oftentimes we want to bound this relaxation, so that we can guarantee that the solution path is no worse than (1 + ε) times the optimal solution path. This new guarantee is referred to as ε-admissible.
Static Weighting
Before we talk about dynamic weighting, let's compare A-star to the simplest ε-admissible relaxation: static-weighted A-star.
In static-weighted A-star, f(n) = g(n) + w·h(n), with w=(1+ε) for some ε>0. To illustrate the effect on optimality and search speed, compare the number of nodes expanded in each of the following illustrations. Empty circles represent nodes in the open set; filled-in circles are in the closed set.
A-star (left) vs. Weighted A-star with ε=4 (right)
As you can see, weighted A-star expanded far fewer nodes and completed about 3x as fast. However, since we used ε=4, weighted A-star could theoretically return a solution that is (1+ε) = (1+4) = 5 times as long as the optimal path.
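To make the mechanics concrete, here is a minimal, hypothetical sketch of static-weighted A-star in Python; the graph interface (neighbors, cost, h) is assumed rather than taken from the illustrations above. Setting eps=0 recovers ordinary A-star.

    import heapq
    from itertools import count

    def weighted_a_star(start, goal, neighbors, cost, h, eps=4.0):
        """Best-first search with f(n) = g(n) + (1 + eps) * h(n).
        Larger eps is greedier; the returned path is at worst (1 + eps)
        times the optimal cost when h is admissible."""
        w = 1.0 + eps
        tie = count()   # tie-breaker so the heap never has to compare nodes
        open_heap = [(w * h(start), next(tie), 0.0, start, None)]
        g_cost = {start: 0.0}
        parent, closed = {}, set()
        while open_heap:
            _, _, g, node, par = heapq.heappop(open_heap)
            if node in closed:
                continue
            closed.add(node)
            parent[node] = par
            if node == goal:                  # reconstruct the path start -> goal
                path = [node]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return list(reversed(path))
            for nxt in neighbors(node):
                g_new = g + cost(node, nxt)
                if g_new < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = g_new
                    heapq.heappush(open_heap, (g_new + w * h(nxt), next(tie), g_new, nxt, node))
        return None                           # no path found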
Dynamic Weighting
Dynamic Weighting is a technique that makes the heuristic weight a function of the search state, i.e. f(n) = g(n) + w(n)·h(n), where w(n) = 1 + ε - ε·d(n)/N, d(n) is the depth of node n in the search, and N is an anticipated upper bound on the search depth.
In this way, dynamic-weight A-Star initially behaves very much like a Greedy Best First search, but as the search depth (i.e. the number of hops in the graph) increases, the algorithm takes a more conservative approach, behaving more like the traditional A-star algorithm.
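A minimal sketch of the corresponding priority function (the names depth and max_depth are illustrative, not from the original papers):

    def dynamic_weight(depth, max_depth, eps=4.0):
        """w(n) = 1 + eps - eps * d(n) / N: close to (1 + eps) near the root,
        tapering off to 1 (plain A-star) as the depth approaches the bound N."""
        return 1.0 + eps - eps * min(depth, max_depth) / max_depth

    def dynamic_f(g, h_value, depth, max_depth, eps=4.0):
        """Priority used to order the open list: f(n) = g(n) + w(n) * h(n)."""
        return g + dynamic_weight(depth, max_depth, eps) * h_value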
Amit Patel's page says:
With dynamic weighting, you assume that at the beginning of your search, it’s more important to get (anywhere) quickly; at the end of the search, it’s more important to get to the goal.
He is correct, but I would say that with dynamic weighting, you assume that at the beginning of your search, it's more important to follow your heuristic; at the end of the search, it becomes equally important to consider the length of the path, too.
Additional Materials and Links:
Asst. Prof. Ira Pohl -- The Avoidance of (Relative) Catastrophe, Heuristic Competence, Genuine Dynamic Weighting and Computational Issues in Heuristic Problem Solving
Dynamic Weighting on Amit Patel's Variants of A*
Wikipedia -- Bounded Relaxation for the A* Search Algorithm

calculating reliability of measurements

I have many measurements of age of the same person. Let's say:
[23 25 32 23 25]
I would like to output a single value and a reliability score of this value. The single value can be the average.
Reliability, I don't really know how to calculate. The value should be between 0 and 1, where 1 means all ages are equal and a very unreliable measurement should be near 0.
Probably the variance should be used here, but it's not clear to me how to normalize it between 0 and 1 in a meaningful way (1/(x+1) is not very meaningful :)).
Assume some probability distribution (or determine what probability distribution your data fits most accurately). A good choice is a normal distribution, which for discrete data requires a continuity correction. See example here: http://www.milefoot.com/math/stat/pdfc-normaldisc.htm
In your example, your reliability score for the average age of 26 (25.6 rounded to the nearest integer) is simply the probability that X falls in the range (25.5, 26.5).
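A short sketch of that calculation, assuming SciPy is available and that a normal fit is acceptable for such a small sample:

    import numpy as np
    from scipy.stats import norm

    ages = np.array([23, 25, 32, 23, 25])
    mu, sigma = ages.mean(), ages.std(ddof=1)   # sample mean 25.6, sample std ~3.7

    # "Reliability" of reporting 26 (= 25.6 rounded): the probability that a
    # normal variable with the fitted parameters lands in (25.5, 26.5),
    # i.e. a continuity-corrected point probability.
    score = norm.cdf(26.5, mu, sigma) - norm.cdf(25.5, mu, sigma)
    print(round(score, 3))   # roughly 0.11 for these numbers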
The easiest way for assessing reliability (or internal consistency) is to use Cronbach's alpha. I guess most statistics software has this method built-in.
https://en.wikipedia.org/wiki/Cronbach%27s_alpha
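Note that Cronbach's alpha is defined over a table of several subjects each measured by several items (or raters), not over a single list of values. A minimal sketch of the textbook formula, using a made-up subjects-by-measurements matrix:

    import numpy as np

    def cronbach_alpha(data):
        """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores),
        where rows are subjects and columns are items (here: repeated measurements)."""
        data = np.asarray(data, dtype=float)
        k = data.shape[1]
        item_vars = data.var(axis=0, ddof=1)      # variance of each measurement column
        total_var = data.sum(axis=1).var(ddof=1)  # variance of each subject's total
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Made-up example: 3 persons, each with 5 age measurements.
    measurements = [[23, 25, 32, 23, 25],
                    [41, 40, 39, 44, 40],
                    [19, 22, 20, 18, 21]]
    print(cronbach_alpha(measurements))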

How is the max of a set of admissible heuristics, a dominating heuristic?

If you have a set of admissible heuristics: h1, h2, h3, ..., hn
How is h = max(h1, h2, h3, ..., hn) an admissible heuristic that dominates them all?
Isn't a lower h(n) value better?
For A*, f = g + h, and the node with the lowest f will be removed from the open list and expanded next. So shouldn't taking the min give the dominating heuristic?
An admissible heuristic never overestimates the cost of reaching the goal state. That is, its estimate will be lower than the actual cost or exactly the actual cost, but never higher. This is required for best-first approaches like A* search to be guaranteed to find the globally optimal solution.
For example, imagine you found a solution with cost 10, while the best solution has cost 8. You're not using an admissible heuristic, and the heuristic's estimate for the path that really costs 8 is 12 (it's overestimating). As you already have a solution with cost 10, A* will never expand the path to the best solution, because it is estimated to be more expensive.
Ideally, your heuristic should be as accurate as possible, i.e. an admissible heuristic shouldn't underestimate the true cost too much. If it does, A* will still find the best solution eventually, but it may take a lot longer to do so because it tries a lot of solutions that look good according to your heuristic, but turn out to be bad.
This is where the answer for your question lies. Your heuristics h1, ..., hn are all admissible, therefore they estimate a cost equal to or less than the true cost. The maximum of this set of estimates is therefore by definition the estimate that is closest to the actual cost (remember that you'll never overestimate). In the ideal case, it will be the exact cost.
If you were to take the minimum value, you would end up with the estimate that is furthest away from the actual cost -- as outlined above, A* would still find the best solution, but in a much less efficient manner.
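In code, the dominating heuristic is just the pointwise maximum of the individual estimates; a tiny sketch, assuming each hi is a callable on nodes:

    def dominating_heuristic(heuristics):
        """Given admissible heuristics h1, ..., hn, return h(n) = max_i hi(n).
        No hi overestimates, so their pointwise maximum never overestimates
        either, and it is at least as informed as every individual hi."""
        return lambda node: max(h(node) for h in heuristics)

    # Usage (hypothetical): h = dominating_heuristic([h1, h2, h3]); f = g + h(node)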

Numerical Integration

Generally speaking, when you are numerically evaluating an integral, say in MATLAB, do I just pick a large number for the bounds, or is there a way to tell MATLAB to "take the limit"?
I am assuming that you just use the large number because different machines would be able to handle numbers of different magnitudes.
I am just wondering if there is a way to improve my code. I am doing lots of expected value calculations via Monte Carlo and often use the trapezoid method to check myself if my degrees of freedom are small enough.
Strictly speaking, it's impossible to evaluate a numerical integral out to infinity. In most cases, if the integral in question is finite, you can simply integrate over a reasonably large range and the result converges to a stable value. For a normal density, for example, everything beyond about 10 sigma contributes essentially nothing, so integrating out to 10 sigma is, for better or worse, as close as you are going to get to evaluating the same integral all the way out to infinity.
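For what it's worth, both MATLAB's integral function and SciPy's quad accept infinite limits directly, so you often don't have to pick the bound yourself. A small SciPy sketch comparing the two approaches on a standard normal density:

    import numpy as np
    from scipy.integrate import quad

    f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density

    # Option 1: let the library handle the improper integral directly.
    val_inf, err_inf = quad(f, -np.inf, np.inf)

    # Option 2: truncate at a "large enough" finite bound (here +/- 10 sigma).
    val_trunc, err_trunc = quad(f, -10, 10)

    print(val_inf, val_trunc)   # both ~1.0; the difference is far below the quadrature error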
It depends very much on what type of function you want to integrate. If it is "smooth" (no jumps -- preferably not in any derivatives either, but that becomes progressively less important) and finite, then you have two main choices (limiting myself to the simplest approaches):
1. If it is periodic -- meaning you could put the left and right ends together and there would be no jumps in value (or derivatives...) there either -- distribute your points evenly over the interval, sample the function values to get the estimated average, and then multiply by the length of the interval to get your integral.
2. If it is not periodic: use Gauss-Legendre quadrature.
Monte Carlo is almost invariably a poor method: it progresses very slowly towards (machine) precision: for every additional significant digit you need roughly 100 times more points!
The two methods above, for periodic and non-periodic "nice" (smooth, etc.) functions, give fair results already with a very small number of sample points and then progress very rapidly towards higher precision: 1 or 2 extra points usually add several digits to your precision! This far outweighs the drawback that you have to discard the previous result whenever you make a new attempt with more sample points: you REPLACE the previous set of points with a fresh one, whereas in Monte Carlo you can simply add points to the existing set and refine the outcome.
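As a small illustration of that convergence gap, the sketch below compares Gauss-Legendre quadrature (via numpy.polynomial.legendre.leggauss) with plain Monte Carlo on an arbitrary smooth integrand, exp(x) on [-1, 1]:

    import numpy as np

    f = np.exp
    exact = np.e - 1.0 / np.e          # integral of exp(x) over [-1, 1]

    # Gauss-Legendre: nodes and weights on [-1, 1]; a handful of points suffices.
    for n in (2, 4, 8):
        x, w = np.polynomial.legendre.leggauss(n)
        print(f"Gauss-Legendre, {n:2d} points: error = {abs(np.dot(w, f(x)) - exact):.1e}")

    # Plain Monte Carlo on the same interval: the error shrinks only like 1/sqrt(N).
    rng = np.random.default_rng(0)
    for n in (100, 10_000, 1_000_000):
        x = rng.uniform(-1.0, 1.0, size=n)
        estimate = 2.0 * f(x).mean()   # interval length times average function value
        print(f"Monte Carlo, {n:9d} points: error = {abs(estimate - exact):.1e}")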
