Suppose I have a photograph, and four pixel coordinates representing the corners of a rectangular sheet of paper. My goal is to determine the rotation, translation, and projection that map the 3D scene, which contains the sheet of paper lying on a plane, to the 2D image.
I understand there are augmented reality libraries for this, like ARToolkit. However, they all require additional information, namely the parameters of the camera used to take the photograph. My question is: why is having the rectangle's four corner points (in addition to knowing the rectangle's real-world dimensions) insufficient information to recover the 3D information?
It makes sense mathematically since there are so many more unknown variables that bring us from 3D coordinates to 2D screen space, but I'm having a hard time grounding that concept in what I see.
Thanks!
Does it help for you to count degrees of freedom?
There are 3 degrees of freedom involved in deciding where in space to put the camera. 3 more degrees of freedom to decide how to turn it. 1 degree of freedom to figure out how much the picture it took had been enlarged, and finally 2 degrees of freedom to fix where on the resulting flat image we're looking.
That makes 9 degrees of freedom in total. However, knowing the locations of four points in the final cropped image gives us only 8 continuously varying values (two coordinates per point). Therefore there must be a way to slide the camera, zoom level and translation parameters around such that those four points stay in the same place on the screen (while everything else distorts subtly).
If we know even one of these nine parameters, such as the camera's focal length (in pixels!), then there's some hope of getting an unambiguous answer.
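For example, once a focal length is assumed, the four corner correspondences pin down the pose. A minimal sketch using OpenCV's solvePnP (the corner pixels, focal length and image size below are made-up placeholder values):

```python
import numpy as np
import cv2

# The real-world rectangle, e.g. an A4 sheet (0.210 m x 0.297 m), in its own plane Z = 0.
object_points = np.array([
    [0.0,   0.0,   0.0],
    [0.210, 0.0,   0.0],
    [0.210, 0.297, 0.0],
    [0.0,   0.297, 0.0],
], dtype=np.float64)

# The four corner pixels clicked in the photograph, in the same order (placeholders).
image_points = np.array([
    [318.0, 142.0],
    [845.0, 168.0],
    [810.0, 620.0],
    [290.0, 655.0],
], dtype=np.float64)

# Assumed intrinsics: the "one known parameter" is the focal length in pixels;
# the principal point is taken to be the centre of a 1280 x 960 image.
f = 1200.0
K = np.array([[f, 0.0, 640.0],
              [0.0, f, 480.0],
              [0.0, 0.0, 1.0]])

# Lens distortion is ignored here (pass None).
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)   # rotation of the sheet's frame relative to the camera
print(ok, R, tvec)
```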
I am dealing with a reverse-engineering problem regarding road geometry and estimation of design conditions.
Suppose you have a set of points obtained by measuring positions along a road. This road has straight sections as well as curved sections. Straight sections are, of course, represented by lines, and curves are represented by circles of unknown center and radius. There are also transition sections, which may be clothoids / Euler spirals or any other usual track transition curve. A representation of the track may look like this:
We know in advance that the road / track was designed with this transition + circle + transition principle for every curve, yet we only have the measurement points, and the goal is to find the parameters describing every curve on the track, that is, the transition parameters as well as each circle's center and radius.
I have written some code using a nonlinear optimization algorithm, where a user can select start and end points and fit a circle to the arc section between them, as shown in the next figure:
However, I can't find a suitable way to take the transitions into account. After giving it some thought, I came to think that this is because, given a set of discrete points -with their measurement error- representing a full curve, it is not entirely clear where it "begins" and where it "ends", and it is even less clear where the entry transition, the circle proper, and the exit transition each "begin" and "end".
Is there any work on this subject that I may have missed? Is there a proper way to fit the whole transition + curve + transition structure to the set of points?
As far as I know, there's no direct method to fit a clothoid1-circle-clothoid2 sequence to a given set of points.
The basic facts are that two points define a straight line, and three points define a unique circle.
The clothoid is far more complex, because you need: the parameter A, the final radius Rf, an initial point (px, py), the radius Ri at that point, and the tangent T (angle with the X-axis) at that point.
That is five pieces of data you must determine to find the solution.
Because clothoid coordinates are computed from series expansions of the Fresnel integrals (see https://math.stackexchange.com/a/3359006/688039 for a short explanation), followed by a translation and rotation, there is no easy way to fit this spiral to a given set of points.
When I've had to deal with your issue, what I've done is:
Calculate the radius of the circle through each triplet of consecutive points: p1p2p3, p2p3p4, p3p4p5, etc. (see the sketch after this list).
Observe the sequence of radii. Similar values mean a circle, steadily increasing/decreasing values mean a clothoid, and very large values mean a straight section.
For each basic element (line, circle), find the most probable characteristics (angles, vertices, radii), either by hand or by some regression method. Often, common sense works best.
For a spiral, you can start with approximate values taken from the adjacent elements; these will typically be the initial angle and point and the initial and final radii. Then iterate, playing with the Fresnel integrals and the change of coordinates, until you find a "good" parameter A. Then repeat with small changes to the other values, the ones you took from the adjacent elements.
Make whatever adjustments seem reasonable. For example, many values (A, radii) tend to be whole numbers, without decimals, simply because that was easier for the designer to type.
If you can write a small applet to carry out these steps, that's enough. Typical road-design software helps, but it doesn't spare you the iteration process.
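As a rough sketch of the triplet-radius step and of evaluating clothoid points through the Fresnel integrals (assuming Python with NumPy/SciPy; the conventions and any thresholds you apply to the radius sequence are up to you):

```python
import numpy as np
from scipy.special import fresnel

def circumradius(p1, p2, p3):
    """Radius of the circle through three points (np.inf if nearly collinear)."""
    a = np.linalg.norm(p2 - p3)
    b = np.linalg.norm(p1 - p3)
    c = np.linalg.norm(p1 - p2)
    # Twice the signed area of the triangle p1 p2 p3.
    cross = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0])
    if abs(cross) < 1e-12:
        return np.inf
    return a * b * c / (2.0 * abs(cross))

def triplet_radii(points):
    """Radius for p1p2p3, p2p3p4, p3p4p5, ... Near-constant runs suggest a circle,
    steadily growing/shrinking runs a clothoid, very large values a straight."""
    pts = np.asarray(points, dtype=float)
    return np.array([circumradius(pts[i], pts[i + 1], pts[i + 2])
                     for i in range(len(pts) - 2)])

def clothoid_point(s, A):
    """Point at arc length s of a clothoid with parameter A (curvature s / A**2),
    starting at the origin tangent to the x-axis; translate and rotate the result
    afterwards to match the pose taken from the adjacent element."""
    scale = A * np.sqrt(np.pi)
    S, C = fresnel(s / scale)   # scipy returns (S, C) in that order
    return np.array([scale * C, scale * S])
```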
If the points are dense compared to the effective radii of curvature, estimate the local curvature by least-squares fitting of a circle to a small number of points, taking into account that the curvature is zero most of the time.
You will obtain a plot with constant values and ramps that connect them. You can use an estimate of the slope at the inflection points to figure out the transition points.
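A minimal sketch of that local-curvature estimate, assuming NumPy; it fits a circle to a sliding window of points with the algebraic (Kåsa) least-squares method (the window size is a placeholder):

```python
import numpy as np

def fit_circle_kasa(points):
    """Algebraic (Kasa) least-squares circle fit; returns (cx, cy, R)."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones(len(pts))])
    b = x ** 2 + y ** 2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    return cx, cy, np.sqrt(c + cx ** 2 + cy ** 2)

def local_curvature(points, window=7):
    """Curvature 1/R from a circle fitted to each sliding window of points.
    Plotted along the track: plateaus are circular arcs, ramps are transitions,
    values near zero are straights."""
    pts = np.asarray(points, dtype=float)
    half = window // 2
    kappa = np.full(len(pts), np.nan)
    for i in range(half, len(pts) - half):
        _, _, R = fit_circle_kasa(pts[i - half:i + half + 1])
        kappa[i] = 1.0 / R if np.isfinite(R) and R > 0 else 0.0
    return kappa
```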
I made an object tracker that calculates the position of an object recorded in a live camera feed using stereoscopic cameras. The math was simple once the camera distance and orientation were known. Now, however, I would like to be able to extract all these parameters quickly, so that when I change my setup or cameras I can recalibrate just as quickly.
To calculate the object position I made some simplifications/assumptions, which made the math easier: the cameras are in the same YZ plane, so there is only a distance in x between them. Their tilt is also just in the XY plane.
To reverse the triangulation, I thought a test pattern (a square) of 4 points with known mutual distances would suffice. Ideally I would like to recover the cameras' positions (their distances to the test pattern and to each other), their rotation about X (and maybe Y and Z if applicable/possible), as well as their view angle (to translate pixel positions into real-world distances; that should be a camera constant, but if I change cameras it is quite a bit of work to determine accurately).
I started with the same trigonometric calculations, but I always end up short of parameters. I am wondering whether there is an existing solution or a solid approach. If I need to add a parameter (like distances, which are easy enough to measure), that's no problem, though my calculations didn't give me any simple equations with that possibility.
I also read about homography in OpenCV, but it seems to apply to 2D space only, or does it?
Any help is appreciated!
My app captures the shape of a room by having the user point a camera at floor corners, and then doing a bunch of math, eventually ending up with a polygon.
The assumption is that the walls are straight (not curved). The majority of the corners are formed by walls at right angles to each other, but in some cases might not be.
Depending on how accurately the user points the camera, the (x, y) coordinates I derive for a corner might be beyond the actual corner, in front of the actual corner, or, less likely, to the left or right of it. Obviously, in this case, when I connect the dots I get weird parallelogram or rhomboid shapes. See example.
I am looking for a program or algorithm to normalize or regularize these shapes, provided we know which corners are supposed to be right angles.
My initial attempt involved finding segments whose angles were "close" to each other, adjusting them all to the same angle, and then recalculating the vertices. However, this algorithm proved to be unstable.
My current thinking is to find the angles which are most obtuse (as would be caused by a point mistakenly placed beyond the actual corner) or most acute (as would be caused by a point mistakenly placed in front of the actual corner), and find the corner point which would make each a right angle. The problem, however, is that such an adjustment could have side effects on other corners, such as pushing them even further away from right angles. I sense I need some kind of algorithm which takes all the information and optimizes/solves it at once (is this a kind of linear programming problem?), but I am stuck.
There is not a unique solution.
For example, take the perpendicular from the midpoint of an edge to the two neighboring edges. This will give you two new corners.
Or take the perpendicular from the end point of an edge to other edges.
Or compute the average of angles in the end points of an edge. Use this average and the middle point of the edge to compute new corners.
Or...
To get the most faithful result, capture (or calculate) the distances from each corner to the other three. Build triangles from those distances. Then use the average of the coordinates you compute for each corner from the 2 or 3 triangles that contain it.
The resulting angles will not be exactly 90 degrees, but the polygon will represent the room fairly well.
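A rough sketch of that triangle construction, assuming NumPy and a full 4x4 matrix of measured corner-to-corner distances (corner 0 is pinned at the origin and corner 1 on the x-axis, so the result is defined only up to rotation, translation and mirroring; averaging over the alternative triangles is left out for brevity):

```python
import numpy as np

def corners_from_distances(D):
    """Rebuild four corner coordinates from a 4x4 matrix of pairwise distances
    D[i][j], following the triangle idea above."""
    D = np.asarray(D, dtype=float)
    d01 = D[0, 1]
    p0 = np.array([0.0, 0.0])
    p1 = np.array([d01, 0.0])

    def trilaterate(r0, r1):
        # Intersection of circles of radius r0 around p0 and r1 around p1;
        # returns both mirror-image solutions.
        x = (r0 ** 2 - r1 ** 2 + d01 ** 2) / (2.0 * d01)
        y = np.sqrt(max(r0 ** 2 - x ** 2, 0.0))
        return np.array([x, y]), np.array([x, -y])

    # Corner 2 from the triangle (0, 1, 2); take the solution above the x-axis.
    p2 = trilaterate(D[0, 2], D[1, 2])[0]

    # Corner 3 from the triangle (0, 1, 3); pick the mirror image that agrees
    # best with the remaining measured distance D[2, 3].
    cand_a, cand_b = trilaterate(D[0, 3], D[1, 3])
    p3 = min((cand_a, cand_b),
             key=lambda p: abs(np.linalg.norm(p - p2) - D[2, 3]))

    return np.vstack([p0, p1, p2, p3])
```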
If I am taking images from a pair of cameras whose principal axes (in both cameras) are perpendicular to the baseline, do I need to rectify the images? A typical example would be Bumblebee stereo cameras.
If you can also guarantee that:
the camera axes are parallel (probably so if bought as a single package like the Bumblebee)
you have no lens distortion (probably not)
all the other internal camera parameters are identical
your measurement axis is parallel to your baseline
then you might be able to skip image rectification. Personally I wouldn't.
Just think about lens distortion. Even assuming everything else is equal and aligned, this alone can mess things up. Suppose a feature appears at the edge of one image and at the centre of the other. At the edge it might be distorted a few pixels away from where it should be, while at the centre it appears where it should. Without rectification, your stereoscopic calculation (which assumes straight lines from object to sensor) is going to give you bad results.
It depends on what you mean by "rectify". In stereo vision, it is common to also ensure that the epipolar lines are aligned, so that the i-th row in image 1 corresponds to the i-th row in image 2. An optional step is to reduce the distortion introduced by the rectification process.
If you are taking images from a pair of cameras whose principal axes are perpendicular to the baseline, then the epipoles are mapped to infinity (the epipolar lines within each image are parallel). You still need another transform to align the epipolar lines across the two images. You will find this transform, along with the transform that reduces distortion, in Loop & Zhang's paper.
And be careful about lens distortion (see wxffles' answer).
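If you do end up rectifying without a full calibration, here is a hedged sketch of OpenCV's uncalibrated route (this is Hartley's method rather than Loop & Zhang, so treat it purely as an illustration; the match files and image names are hypothetical):

```python
import numpy as np
import cv2

# Matched feature locations (N x 2 float arrays) in the two images, e.g. from a
# feature detector plus matcher; obtaining them is assumed here.
pts1 = np.load("matches_left.npy")     # hypothetical files holding the matches
pts2 = np.load("matches_right.npy")

F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)

# Homographies H1, H2 that send the epipoles to infinity and align the epipolar
# lines row-by-row in both images (Hartley-style uncalibrated rectification).
img_size = (640, 480)                  # (width, height) of the images, assumed
ok, H1, H2 = cv2.stereoRectifyUncalibrated(pts1, pts2, F, img_size)

if ok:
    left = cv2.warpPerspective(cv2.imread("left.png"), H1, img_size)
    right = cv2.warpPerspective(cv2.imread("right.png"), H2, img_size)
```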
We are trying to solve the problem of locating a point in two different representations of a plane. The first plane has been rotated to create perspective; the second is a 2D view of that same plane. We have 4 points on each of the planes that we know to be equivalent. The question is: if we have an arbitrary point in plane 1, how do we find the corresponding point in plane 2?
It is probably best to illustrate the use case in order to clarify the question. We have the image illustrated on the left.
Projective plane
2D layout diagram of space
So the givens we have are the red squares in both pictures. Note that, if possible, I'd prefer the 2D space not to be restricted to a square. These correspondences are available to us ahead of time and known. I also have green dots laid out on the plane in the first image. I'd like to be able to project a dot from image 1 onto the space in image 2.
Note also that for image 1 I do not have a defined window or eye position. I just know that the red square in image 1 is a transform of the red square from image 2, and that image 2 is in 2D space.
This is a special case of finding mappings between quadrilaterals that preserve straight lines. These are generally called homographic or projective transforms. Here, one of the quads is a square, so this is a popular special case. You can google these terms ("quad to quad", etc) to find explanations and code, but here are some for you.
Perspective Transform Estimation
a gaming forum discussion
extracting a quadrilateral image to a rectangle
Projective Mappings for Image Warping by Paul Heckbert.
The math isn't particularly pleasant, but it isn't that hard either. You can also find some code from one of the above links.
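For a concrete sketch (assuming NumPy; the corner coordinates and the green dot below are placeholder values), the homography can be solved from the four correspondences as an 8x8 linear system and then applied to any point of image 1:

```python
import numpy as np

def homography_from_quads(src, dst):
    """3x3 projective transform mapping the four src corners onto the four
    dst corners (direct linear solve of the standard 8x8 system, h33 = 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, point):
    """Map a single (x, y) point and divide by the homogeneous coordinate."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return np.array([x / w, y / w])

# Placeholder corners: the red square seen in perspective (image 1) and the
# same square in the flat 2D layout (image 2), listed in matching order.
quad_image1 = [(321, 205), (552, 231), (540, 422), (295, 390)]
square_image2 = [(0, 0), (100, 0), (100, 100), (0, 100)]

H = homography_from_quads(quad_image1, square_image2)
green_dot_image1 = (410, 300)
print(apply_homography(H, green_dot_image1))   # where the dot lands in image 2
```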
Update
And this is one of my favorites: Computing a projective transformation