This is mainly a theoretical question and it's quite simple, so I thought I could answer the questions correctly, but my answers were wrong and I want to know why. I have a graph which looks like this.
Where: K is the start point, V is the end point, X marks the vertices you can't move through, and the - | characters are the edges. The cost of every edge is 1, and the heuristic is the Euclidean distance in the plane.
The first iteration begins with K, and I had to answer the following questions:
In which iteration will we reach V (the end point)? The options were 9, 6, 8 or 7; I answered 6, but it was incorrect.
In the third iteration, which vertex will be added to the closed list? Options: I, G, H, F. My answer: H (incorrect).
After how many iterations will we reach the terminal state? My answer: 6, but incorrect again.
So could somebody please explain the correct steps for solving the problem and the answers to these questions? Thank you.
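For what it's worth, here is a minimal A* sketch in Python showing the mechanics the questions ask about (which vertex gets closed at each iteration). The grid below is only a hypothetical stand-in, since the original figure isn't reproduced here; edge cost is 1 and the heuristic is the Euclidean distance, as in the exercise.

import heapq, math

# Hypothetical 4x4 grid: '.' is free, 'X' is blocked; start and goal are (row, col).
grid = ["....",
        ".XX.",
        ".X..",
        "...."]
start, goal = (0, 0), (3, 3)

def h(a, b):                      # Euclidean distance heuristic
    return math.dist(a, b)

def a_star(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    open_heap = [(h(start, goal), 0, start)]   # entries are (f, g, node)
    g_cost, closed = {start: 0}, set()
    iteration = 0
    while open_heap:
        f, g, node = heapq.heappop(open_heap)
        if node in closed:
            continue
        iteration += 1
        closed.add(node)
        print("iteration", iteration, "closes", node)
        if node == goal:
            return iteration
        r, c = node
        for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):  # edge cost 1
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != 'X':
                ng = g + 1
                if ng < g_cost.get((nr, nc), float('inf')):
                    g_cost[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + h((nr, nc), goal), ng, (nr, nc)))

a_star(grid, start, goal)

Running something like this on your actual grid and counting the printed iterations is one way to check the expected answers.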
Suppose I have a series of (imperfect) azimuth readouts, giving me vague angles between a number of points. Lines projected from points A, B, C obviously never converge in a single point to define the location of point D. Hence, angles as viewed from A, B and C need to be adjusted.
To make it more fun, I might be more certain of the relative positions of specific points (suppose I locate them on a satellite image, or I know for a fact they are oriented perfectly north-south), so I might want to use that certainty in my calculations and NOT adjust certain angles at all.
By what technique should I average the resulting coordinates, to achieve a "mostly accurate" overall shape?
I considered treating the difference between non-adjusted and adjusted angles as "tension" and trying to "relieve" it in subsequent passes, but that approach gives priority to points calculated earlier.
Another approach could be to calculate the total "tension" in the set, then shake all angles by a random amount, see if that results in less tension, and repeat, trying to evolve a better solution.
As I understand it you have a bunch of unknown points (p[] say) and a number of measurements of azimuths, say Az[i,j] of p[j] from p[i]. You want to find the coordinates of the points.
You'll need to fix one point. This is because if the values p[] are a solution -- i.e. they give the measured azimuths -- then so too is q[], where for some fixed x,
q[i] = p[i] + x
I'll suppose you fix p[0].
You'll also need to fix a distance. This is because if p[] is a solution, so too is q[] where now for some fixed s,
q[i] = p[0] + s*(p[i] - p[0])
I'll suppose you fix dist(p[0], p[1]), and that there is an azimuth Az[0,1]. You'd be best to choose p[0] and p[1] so that there is a reliable azimuth between them. Then we can compute p[1].
The usual way to approach such problems is least squares. That is, we seek p[] to minimise
Sum square( (Az[i,j] - Azimuth( p[i], p[j]))/S[i,j])
where Az[i,j] is your measurement data
Azimuth( r, s) is the function that gives the azimuth of the point s from the point r
S[i,j] is the 'sd' (standard deviation) of the measurement Az[i,j] -- the higher the sd of a particular observation is, relative to the others, the less it affects the final result.
The above is a non-linear least squares problem. There are many solvers available for this, but generally speaking, as well as providing the data -- the Az[] and the S[] -- and the observation model -- the Azimuth function -- you need to provide an initial estimate of the state -- the values sought, in your case p[2] ..
It is highly likely that if your initial estimate is wrong the solver will fail.
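For illustration only, here is roughly what the least squares step can look like with scipy.optimize.least_squares; the data layout, the function names and the azimuth convention (clockwise from north, with x east and y north) are all assumptions you'd adapt to your own measurements.

import numpy as np
from scipy.optimize import least_squares

# az_obs: list of (i, j, measured_azimuth_radians, sd); points 0 and 1 are held fixed.
def residuals(flat, az_obs, p0, p1):
    pts = np.vstack([p0, p1, flat.reshape(-1, 2)])   # free points start at index 2
    res = []
    for i, j, az, sd in az_obs:
        dx, dy = pts[j] - pts[i]
        pred = np.arctan2(dx, dy)                     # azimuth clockwise from north (assumed convention)
        diff = np.arctan2(np.sin(az - pred), np.cos(az - pred))  # wrap the error to [-pi, pi]
        res.append(diff / sd)
    return res

# initial_guess: your initial estimate of p[2].. as an (n-2) x 2 array
def solve(az_obs, p0, p1, initial_guess):
    fit = least_squares(residuals, np.asarray(initial_guess, float).ravel(),
                        args=(az_obs, np.asarray(p0, float), np.asarray(p1, float)))
    return np.vstack([p0, p1, fit.x.reshape(-1, 2)])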
One way to find this estimate would be to start with a set K of known point indices and seek to expand it. You would start with K being {0,1}. Then look for points that have as many azimuths as possible to points in K; for such points, estimate their position geometrically from the known points and the azimuths, and add them to K. If at the end all the points are in K, you can go on to the least squares. If they aren't, it's possible that a different pair of initial fixed points might do better, or maybe you are stuck.
The latter case is a real possibility. For example, suppose you had points p[0], p[1], p[2], p[3] and azimuths Az[0,1], Az[1,2], Az[1,3], Az[2,3].
As above we fix the positions of p[0] and p[1]. But we can't compute positions of p[2] and p[3] because we do not know the distances of 2 or 3 from 1. The 1,2,3 triangle could be scaled arbitrarily and still give the same azimuths.
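For the geometric step in that expansion, a point that has azimuths from two points already in K can be placed by intersecting the two bearing rays. A small sketch, using the same assumed clockwise-from-north convention as above:

import numpy as np

def intersect_bearings(A, azA, B, azB):
    # Direction vectors for azimuths measured clockwise from north (x east, y north).
    dA = np.array([np.sin(azA), np.cos(azA)])
    dB = np.array([np.sin(azB), np.cos(azB)])
    # Solve A + t*dA = B + s*dB for t and s.
    M = np.column_stack([dA, -dB])
    try:
        t, s = np.linalg.solve(M, np.asarray(B, float) - np.asarray(A, float))
    except np.linalg.LinAlgError:
        return None                      # bearings are parallel: no usable fix
    return np.asarray(A, float) + t * dA

If a candidate point has azimuths to more than two members of K, you can average the intersections obtained from the different pairs.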
There are N points on a 2D grid (x,y). I need to find the shortest path, from point A to point B, but I can only travel from one point to another and I can't travel between two points if the distance between them is farther than a distance D. I thought it might be solved by using some kind of modified Dijkstra's algorithm, but I'm not sure how, because I've never implemented it before, just studied it on Wiki.
Well, Dijkstra finds shortest paths in graphs. So just consider the grid points to be nodes in a graph, with edges between each node S and all other nodes T such that dist(S, T) <= D. You don't have to actually construct the graph, because the edges are easily determined as needed by Dijkstra: just check all nodes in a square around S with radius D. An S-T edge exists iff (Sx - Tx)^2 + (Sy - Ty)^2 <= D^2.
The Wiki explanation is sufficient for this.
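As a concrete sketch of that (the names and the point representation are assumptions): Dijkstra run over the point indices directly, generating each node's neighbours on demand with the dist(S, T) <= D test instead of building an explicit graph. Here the edge weight is taken to be the Euclidean distance; if you only care about the number of hops, use a weight of 1 instead.

import heapq, math

def shortest_path_length(points, start, goal, D):
    # points: list of (x, y); start, goal: indices into points; edge weight = Euclidean distance.
    dist_to = {start: 0.0}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        if u == goal:
            return d
        done.add(u)
        for v in range(len(points)):          # neighbours found on the fly
            if v not in done:
                w = math.dist(points[u], points[v])
                if w <= D and d + w < dist_to.get(v, float('inf')):
                    dist_to[v] = d + w
                    heapq.heappush(heap, (d + w, v))
    return None                                # goal unreachable within the distance limit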
Dijkstra's algorithm takes 3 inputs: the graph, the starting node and the ending node.
To construct the graph, just do this:
import math

# Build an undirected adjacency list: i and j are neighbours iff they are within distance D.
graph = {i: [] for i in range(len(points))}
for i in range(len(points)):
    for j in range(i + 1, len(points)):
        if math.dist(points[i], points[j]) <= D:
            graph[i].append(j)
            graph[j].append(i)
After constructing the graph, perform Dijkstra, for example as sketched below.
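A minimal sketch of that last step, assuming the adjacency list built above and taking the path cost to be the summed Euclidean distance along the edges (a plain BFS would do instead if every hop counts as 1):

import heapq, math

def dijkstra(points, graph, start, goal):
    # graph: adjacency lists built as above; edge weight = Euclidean distance between endpoints.
    dist_to = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == goal:                          # rebuild the path by walking the prev links
            path = [goal]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v in graph[u]:
            nd = d + math.dist(points[u], points[v])
            if nd < dist_to.get(v, float('inf')):
                dist_to[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None, []                            # no path within the distance limit D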
The subtlety of a question like this lies in a critical definition - what is the measure of distance in your grid?
There are many different shortest path problems and solutions, and they are studied throughout mathematics. They are each characterised by the 'topology' of the area being searched. Consider a few distinct topologies, each with its own solution:
A one sided piece of paper
Suppose your grid represents coordinates on a piece of paper - the shortest path is easy to find, as it is simply a straight line between those points.
The surface of the moon
If your grid represents locations on the moon in terms of latitude and longitude, the shortest path is an arc along the moon's surface - If you drove "in a straight line" between two points on the moon, you would be travelling in an arc, because of the moon's curvature.
Road Intersections
If you want to find the distance between two intersections in a grid of roads, where the traffic on each road has a different speed, and you can only travel along the roads, then you can find the shortest path using Dijkstra's algorithm.
One way road intersections
A slight variation of the above - we only need to consider roads in one direction. There might not be any paths in this case.
Summary
To give a good solution, we need to understand the topology of your grid. If the distance is given by Pythagoras's theorem, then that indicates Euclidean geometry (as in the piece of paper example), so the solution is a straight line.
Is it possible you mean that you can travel between any two points if they are closer than D - like flying a plane between airports, for example?
EDIT: I didn't see your comment because you didn't use #. In your case your grid is like the airports a plane can fly between. The shortest path is found using Dijkstra's algorithm - the immediate neighbours of a point are all points closer than D. Find them, represent it all as a graph, and use Dijkstra's algorithm.
I would suggest using the formula for the distance between 2 points, i.e. sqrt((x2-x1)^2+(y2-y1)^2). This distance is always the shortest between 2 points.
I apologise for the newbishness of this question in advance, but I am stuck. I am trying to solve this question.
I can do parts i)-iv) but I am stuck on v). I know that to calculate the margin y, you do
y=2/||W||
and I know that W is the normal to the hyperplane, I just don't know how to calculate it. Is it always
W = [1; 1] ?
Similarly, for the bias in W^T * x + b = 0,
how do I find the value of x from the data points? Thank you for your help.
Consider building an SVM over the (very little) data set shown in the picture. For an example like this, the maximum margin weight vector will be parallel to the shortest line connecting points of the two classes, that is, the line between (1, 1) and (2, 3), giving a weight vector of (1, 2). The optimal decision surface is orthogonal to that line and intersects it at the halfway point. Therefore, it passes through (1.5, 2). So, the SVM decision boundary is:
x1 + 2*x2 - 5.5 = 0
Working algebraically, with the standard constraint that y_i * (W^T * x_i + b) >= 1, we seek to minimize ||W||. This happens when this constraint is satisfied with equality by the two support vectors. Further, we know that the solution is W = (a, 2a) for some a. So we have that:
a + 2a + b = -1
2a + 6a + b = +1
Therefore a = 2/5 and b = -11/5, and W = (2/5, 4/5). So the optimal hyperplane is given by
W = (2/5, 4/5) and b = -11/5.
The margin is
y = 2/||W|| = 2/sqrt(4/25 + 16/25) = 2/(2/sqrt(5)) = sqrt(5)
This answer can be confirmed geometrically by examining the picture.
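If it helps, the algebra can also be checked numerically. The sketch below assumes the two training points really are (1, 1) with label -1 and (2, 3) with label +1, as in the reconstruction above, and fits a (nearly) hard-margin linear SVM with scikit-learn; it should recover roughly W = (0.4, 0.8) and b = -2.2.

import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 3.0]])    # assumed data points from the example
y = np.array([-1, 1])

clf = SVC(kernel="linear", C=1e6)          # very large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]
print("W =", w, "b =", b)                  # expect roughly [0.4 0.8] and -2.2
print("margin =", 2 / np.linalg.norm(w))   # expect roughly sqrt(5) ~ 2.236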
This is a programming homework assignment, of which I have no qualms about doing it myself however I'm quite stuck on the geometry of it. I need to be able to determine the exact point of intersection given the center and radius of a circle and two end points of a vertical line segment, and since geometry isn't my forte I was hoping for some help (even pointers in the right direction would be appreciated!)
This probably isn't the best place to ask a question like this but I'm not really sure where else to look for help, my apologies if it's against the rules or something.
edit:
My apologies, what I am really having trouble with is determining what the points of intersection are (and whether there is one intersection or two). I've tried each solution given, and they work great for determining whether there is an intersection or not, but my problem still persists, as I mis-worded my question. If anyone can help with that it'd be much appreciated!
Try http://mathworld.wolfram.com/Circle-LineIntersection.html, this covers the geometry aspect of your problem quite well.
If C=(x0,y0) is the center, r the radius, and k the abscissa of the line, you have
y = y0 +/- sqrt(r^2-(k-x0)^2), but no intersection if r < abs(k-x0)
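A small sketch of that formula applied to a vertical segment (the function name and segment representation are just for illustration); it returns zero, one or two intersection points and also clips them to the segment's y-range:

import math

def circle_vertical_segment_intersections(cx, cy, r, k, y1, y2):
    # Circle centre (cx, cy), radius r; vertical segment x = k from y1 to y2.
    dx = k - cx
    disc = r * r - dx * dx
    if disc < 0:
        return []                                  # the line misses the circle entirely
    root = math.sqrt(disc)
    ys = {cy + root, cy - root}                    # a single value if the line is tangent
    lo, hi = min(y1, y2), max(y1, y2)
    return [(k, y) for y in sorted(ys) if lo <= y <= hi]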
Using the centre (x, y) of the circle, find the distance of this particular line from the centre (refer to the point-to-line distance formula).
Now if this distance is greater than the radius of the circle, the line won't intersect it; otherwise, it will.
I've been trying to find an answer to this for months (to be used in a machine learning application), and it doesn't seem like it should be a terribly hard problem, but I'm a software engineer, and math was never one of my strengths.
Here is the scenario:
I have a (possibly) unevenly weighted coin and I want to figure out the probability of it coming up heads. I know that coins from the same box that this one came from have an average probability of p, and I also know the standard deviation of these probabilities (call it s).
(If other summary properties of the probabilities of other coins aside from their mean and stddev would be useful, I can probably get them too.)
I toss the coin n times, and it comes up heads h times.
The naive approach is that the probability is just h/n - but if n is small this is unlikely to be accurate.
Is there a computationally efficient way (i.e. one that doesn't involve very very large or very very small numbers) to take p and s into consideration and come up with a more accurate probability estimate, even when n is small?
I'd appreciate it if any answers could use pseudocode rather than mathematical notation since I find most mathematical notation to be impenetrable ;-)
Other answers:
There are some other answers on SO that are similar, but the answers provided are unsatisfactory. For example this is not computationally efficient because it quickly involves numbers way smaller than can be represented even in double-precision floats. And this one turned out to be incorrect.
Unfortunately you can't do machine learning without knowing some basic math---it's like asking somebody for help in programming but not wanting to know about "variables" , "subroutines" and all that if-then stuff.
The better way to do this is called Bayesian integration, but there is a simpler approximation called "maximum a posteriori" (MAP). It's pretty much like the usual thinking, except you can put in the prior distribution.
Fancy words, but you may ask, well, where did the h/(h+t) formula come from? Of course it's obvious, but it turns out that it is the answer you get when you have "no prior". And the method below is the next level of sophistication up, when you add a prior. Going to Bayesian integration would be the next one, but that's harder and perhaps unnecessary.
As I understand it the problem is twofold: first you draw a coin from the bag of coins. This coin has a "headsiness" called theta, so that it gives a head on a fraction theta of the flips. But the theta for this coin comes from the master distribution, which I guess I'll assume is Gaussian with mean P and standard deviation S.
What you do next is to write down the total unnormalized probability (called likelihood) of seeing the whole shebang, all the data: (h heads, t tails)
L = (theta)^h * (1-theta)^t * Gaussian(theta; P, S).
Gaussian(theta; P, S) = exp( -(theta-P)^2/(2*S^2) ) / sqrt(2*Pi*S^2)
This is the meaning of "first draw 1 value of theta from the Gaussian" and then draw h heads and t tails from a coin using that theta.
The MAP principle says, if you don't know theta, find the value which maximizes L given the data that you do know. You do that with calculus. The trick to make it easy is that you take logarithms first. Define LL = log(L). Wherever L is maximized, then LL will be too.
so
LL = h*log(theta) + t*log(1-theta) - (theta-P)^2 / (2*S^2) - 1/2 * log(2*pi*S^2)
To look for extrema by calculus, you find the value of theta such that dLL/dtheta = 0.
Since the last term with the log has no theta in it you can ignore it.
dLL/dtheta = (h/theta) + (P-theta)/S^2 - (t/(1-theta)) = 0.
If you can solve this equation for theta you will get an answer, the MAP estimate for theta given the number of heads h and the number of tails t.
If you want a fast approximation, try doing one step of Newton's method, where you start with your proposed theta at the obvious (called maximum likelihood) estimate of theta = h/(h+t).
And where does that 'obvious' estimate come from? If you do the stuff above but don't put in the Gaussian prior, you get h/theta - t/(1-theta) = 0, and you'll come up with theta = h/(h+t).
If your prior probabilities are really small, as is often the case, instead of near 0.5, then a Gaussian prior on theta is probably inappropriate, as it puts some weight on negative probabilities, which is clearly wrong. More appropriate is a Gaussian prior on log theta (a 'lognormal distribution'). Plug it in the same way and work through the calculus.
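A minimal numerical version of the recipe above (the function name and the example numbers are arbitrary): start at the maximum likelihood estimate h/(h+t) and take a few Newton steps on dLL/dtheta. A library root-finder would work just as well; this is only to keep it self-contained and free of very large or very small numbers.

def map_estimate(h, t, P, S, steps=20):
    # Solve dLL/dtheta = h/theta - t/(1-theta) + (P-theta)/S**2 = 0 by Newton's method,
    # starting from the maximum-likelihood estimate h/(h+t).
    theta = (h + 1e-9) / (h + t + 2e-9)              # nudge away from exactly 0 or 1
    for _ in range(steps):
        f = h / theta - t / (1 - theta) + (P - theta) / S**2
        fprime = -h / theta**2 - t / (1 - theta)**2 - 1 / S**2
        theta = min(max(theta - f / fprime, 1e-9), 1 - 1e-9)   # keep theta inside (0, 1)
    return theta

# e.g. 3 heads out of 4 tosses, coins from a box with mean 0.5 and sd 0.05:
print(map_estimate(3, 1, 0.5, 0.05))                 # pulled strongly back towards 0.5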
You can use p as a prior on your estimated probability. This is basically the same as doing pseudocount smoothing. I.e., use
(h + c * p) / (n + c)
as your estimate. When h and n are large, this just becomes h / n. When h and n are small, it is just c * p / c = p. The choice of c is up to you. You can base it on s, but in the end you have to decide how small is too small.
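As a tiny sketch (the default c = 5 is purely illustrative):

def smoothed_estimate(h, n, p, c=5.0):
    # Shrinks the raw h/n towards the box average p; larger c means more shrinkage.
    return (h + c * p) / (n + c)

print(smoothed_estimate(3, 4, 0.5))    # about 0.61 rather than the raw 0.75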
You don't have nearly enough info in this question.
How many coins are in the box? If it's two, then in some scenarios (for example, one coin always comes up heads, the other always tails) knowing p and s would be useful. If it's more than a few, and especially if only some of the coins are only slightly weighted, then it is not useful.
What is a small n? 2? 5? 10? 100? What is the probability of a weighted coin coming up heads/tail? 100/0, 60/40, 50.00001/49.99999? How is the weighting distributed? Is every coin one of 2 possible weightings? Do they follow a bell curve? etc.
It boils down to this: the differences between a weighted and an unweighted coin, the distribution of weighted coins, and the number of coins in your box will all decide what n has to be for you to solve this with high confidence.
The name for what you're trying to do is a Bernoulli trial. Knowing the name should be helpful in finding better resources.
Response to comment:
If you have differences in p that small, you are going to have to do a lot of trials and there's no getting around it.
Assuming a uniform distribution of bias, p will still be 0.5, and all the standard deviation will tell you is that at least some of the coins have a minor bias.
How many tosses you need will, again, be determined under these circumstances by the weighting of the coins. Even with 500 tosses, you won't get strong confidence (about 2/3) in detecting a .51/.49 split.
In general, what you are looking for is Maximum Likelihood Estimation. The Wolfram Demonstrations Project has an illustration of estimating the probability of a coin landing heads, given a sample of tosses.
Well, I'm no math man, but I think the simple Bayesian approach is intuitive and broadly applicable enough to be worth putting a little thought into. Others above have already suggested this, but perhaps if you're like me you would prefer more verbosity.
In this lingo, you have a set of mutually-exclusive hypotheses, H, and some data D, and you want to find the (posterior) probabilities that each hypothesis Hi is correct given the data. Presumably you would choose the hypothesis that had the largest posterior probability (the MAP as noted above), if you had to choose one. As Matt notes above, what distinguishes the Bayesian approach from only maximum likelihood (finding the H that maximizes Pr(D|H)) is that you also have some PRIOR info regarding which hypotheses are most likely, and you want to incorporate these priors.
So you have, from basic probability, Pr(H|D) = Pr(D|H)*Pr(H)/Pr(D). You can estimate these Pr(H|D) numerically by creating a series of discrete probabilities Hi for each hypothesis you wish to test, e.g. [0.0, 0.05, 0.1 ... 0.95, 1.0], and then determining your prior Pr(Hi) for each Hi -- above it is assumed you have a normal distribution of priors, and if that is acceptable you could use the mean and stdev to get each Pr(Hi) -- or use another distribution if you prefer. With coin tosses, Pr(D|H) is of course determined by the binomial, using the observed number of successes in n trials and the particular Hi being tested. The denominator Pr(D) may seem daunting, but we assume that we have covered all the bases with our hypotheses, so Pr(D) is the summation of Pr(D|Hi)*Pr(Hi) over all Hi.
Very simple if you think about it a bit, and maybe not so if you think about it a bit more.
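Here is a short sketch of that recipe; the grid of 101 hypotheses, the Gaussian prior truncated to [0, 1], and the example numbers are all assumptions to make it runnable.

import math

def posterior(h, n, p, s, num_hypotheses=101):
    # Discrete hypotheses Hi = 0.0, 0.01, ..., 1.0 for the coin's heads probability.
    Hs = [i / (num_hypotheses - 1) for i in range(num_hypotheses)]
    # Prior Pr(Hi): normal(p, s) evaluated on the grid, normalised implicitly below.
    prior = [math.exp(-(x - p) ** 2 / (2 * s ** 2)) for x in Hs]
    # Likelihood Pr(D|Hi): binomial probability of h heads in n tosses (constant factor cancels).
    like = [x ** h * (1 - x) ** (n - h) for x in Hs]
    unnorm = [pr * li for pr, li in zip(prior, like)]
    total = sum(unnorm)                      # plays the role of Pr(D)
    return Hs, [u / total for u in unnorm]

Hs, post = posterior(h=3, n=4, p=0.5, s=0.1)
print("posterior mean:", sum(x * w for x, w in zip(Hs, post)))
print("MAP estimate:  ", max(zip(post, Hs))[1])

From the posterior you can read off either the MAP value or the posterior mean as your estimate.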