I'm currently studying for an exam (which is in two days) for computer vision but am getting pretty confused with the epipolar geometry stuff. I'm going through some past exam papers and I'm stuck on these two questions. Any help/explanations would be appreciated.
1) How are two cameras placed one with respect to another if epipoles in both the images coincide with the principal points (traces of optical axes)?
2) How are two cameras placed one with respect to another if epipoles in both the images lie infinitely far along the Y-axis of the world co-ordinate frame and have the same x-coordinate?
For the second one, my first guess was the cameras sit on top of one another and face the same direction. I'm not sure if it's right though.
The epipole is the position of one camera's centre of projection from the point of view of the other. So:
1) The two cameras are directly facing each other. Think of taking a photo, taking a few steps forwards, turning around 180 degrees and taking another picture. The line joining the centres of projection of the two cameras is normal to both image planes. (The same condition also holds if the second camera is moved straight forward along the optical axis without turning, so that both cameras face the same direction.)
2) Think of taking a photo, crouching down a few inches and taking another one. If the epipole is at infinity, it means that the line joining the centres of projection of the two viewpoints is parallel to the image plane. (The other camera can still be rotated relative to the current viewpoint, though.)
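To make the definition above concrete, here is a minimal sketch that projects the other camera's centre into the current camera with a plain pinhole model. The focal length and principal point (f, cx, cy) are made-up values, and the current camera is assumed to sit at the origin looking along +Z; none of this comes from the exam question.

```python
def project_centre(other_centre, f=800.0, cx=320.0, cy=240.0):
    """Project the other camera's centre (given in this camera's frame)
    to homogeneous pixel coordinates (u, v, w). f, cx, cy are assumed values."""
    X, Y, Z = other_centre
    return (f * X + cx * Z, f * Y + cy * Z, Z)


def describe(epipole, tol=1e-9):
    """Return ('finite', (x, y)) or ('at infinity', direction in the image)."""
    u, v, w = epipole
    if abs(w) < tol:
        return ("at infinity", (u, v))
    return ("finite", (u / w, v / w))


# Case 1: the other camera is straight ahead on the optical axis.
print(describe(project_centre((0.0, 0.0, 3.0))))   # ('finite', (320.0, 240.0))

# Case 2: the other camera is displaced vertically, parallel to the image plane.
print(describe(project_centre((0.0, -0.5, 0.0))))  # ('at infinity', (0.0, -400.0))
```

In case 1 the epipole lands exactly on the principal point; in case 2 the homogeneous w is zero, so the epipole is at infinity in the image's vertical direction.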
Back story: I'm creating a Three.js based 3D graphing library. Similar to sigma.js, but 3D. It's called graphosaurus and the source can be found here. I'm using Three.js and using a single particle representing a single node in the graph.
This was the first task I had to deal with: given an arbitrary set of points (that each contain X,Y,Z coordinates), determine the optimal camera position (X,Y,Z) that can view all the points in the graph.
My initial solution (which we'll call Solution 1) involved calculating the bounding sphere of all the points and then scaling it to a sphere of radius 5 around the point 0,0,0. Since the points are guaranteed to always fall in that area, I can set a static position for the camera (assuming the FOV is static) and the data will always be visible. This works well, but it requires either changing the point coordinates the user specified or duplicating all the points, neither of which is great.
My new solution (which we'll call Solution 2) involves not touching the coordinates of the input data, but instead just positioning the camera to match the data. I encountered a problem with this solution. For some reason, when dealing with really large data, the particles seem to flicker when positioned in front of or behind other particles.
Here are examples of both solutions. Make sure to move the graph around to see the effects:
Solution 1
Solution 2
You can see the diff for the code here
Let me know if you have any insight on how to get rid of the flickering. Thanks!
It turns out that my near value for the camera was too low and the far value was too high, resulting in "z-fighting". By narrowing these values on my dataset, the problem went away. Since my dataset is user dependent, I need to determine an algorithm to generate these values dynamically.
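One possible way to generate these values dynamically is to derive them from a bounding sphere of the data and the camera position. The sketch below is only an assumption about how that could look: the function name, the centroid-based sphere (not the minimal one), and the margin and min_near defaults are all made up for illustration. The resulting numbers would then be assigned to the camera's near and far values.

```python
import math


def near_far_for_points(points, camera_pos, margin=1.05, min_near=1e-3):
    """Pick near/far clipping distances that tightly enclose the points.

    Uses a simple centroid-centred bounding sphere (an assumption; it is not
    the minimal sphere). margin pads the sphere slightly; min_near keeps
    near strictly positive when the camera is inside the sphere.
    """
    n = len(points)
    centre = tuple(sum(p[i] for p in points) / n for i in range(3))
    radius = max(math.dist(p, centre) for p in points) * margin
    d = math.dist(camera_pos, centre)
    near = max(d - radius, min_near)
    far = d + radius
    return near, far


# Hypothetical data: two nodes on the x-axis, camera 10 units away on z.
nodes = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(near_far_for_points(nodes, (0.0, 0.0, 10.0), margin=1.0))
```

Depth-buffer precision depends heavily on the ratio of far to near, so keeping near as large as the data allows matters more than trimming far.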
I noticed that in Solution 2 the flickering only occurs when the camera is moving. One possible reason is that, when the camera position is changing rapidly, different transforms get applied to different particles. So if the camera moves from X to X + DELTAX during a time step, one set of particles gets the camera transform for X while the others get the transform for X + DELTAX.
If you separate your rendering from the user interaction, that should fix the issue, assuming this is the issue. That means that you should apply the same transform to all the particles and the edges connecting them, by locking (not updating) the transform matrix until the rendering loop is done.
I would like to calculate the distance between my camera and a recognized "object".
The recognized "object" is a black rectangle sticker on a white board for example. I know the values of the rectangle (x,y).
Is there a method that I can use to calculate the distance with the values of my original rectangle, and the values of the picture of the rectangle I took with the camera?
I searched the forum for answers, but none of them dealt specifically with calculating the distance from these attributes.
I am working on a robot called Nao from Aldebaran Robotics, and I am planning to use OpenCV to recognize the black rectangle.
If you could compute the angle subtended by the image of the target, then the distance to the target should be proportional to cot (i.e. 1/tan) of that angle. You should find that the number of pixels the target spans in the image corresponds roughly to that angle, but I doubt the relationship is completely linear, especially up close.
The behaviour of your camera lens is likely to affect this measurement, so it will depend on your exact setup.
Why not measure the size of the target at several distances, and plot a scatter graph? You could then fit a curve to the data to get a size->distance function for your particular system. If your camera is close to an "ideal" camera, then you should find this graph looks like cot, and you should be able to find your values of a and b to match dist = a * cot (b * width).
If you try this experiment, why not post the answers here, for others to benefit from?
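As a rough sketch of the calibrate-and-fit idea above: for small angles cot(b * width) is approximately 1 / (b * width), so dist = a * cot(b * width) reduces to dist ≈ k / width with k = a / b. That simplification is an assumption (it is what an ideal pinhole camera gives), and the sample numbers below are invented for illustration, not real measurements.

```python
def fit_inverse_width(widths_px, distances):
    """Least-squares fit of k in the model distance = k / width_px."""
    xs = [1.0 / w for w in widths_px]
    return sum(d * x for d, x in zip(distances, xs)) / sum(x * x for x in xs)


def estimate_distance(k, width_px):
    """Predict distance from the measured width of the target in pixels."""
    return k / width_px


# Invented calibration data: target width in pixels at known distances (metres).
widths = [200.0, 100.0, 50.0]
dists = [0.5, 1.0, 2.0]

k = fit_inverse_width(widths, dists)
print(estimate_distance(k, 80.0))
```

If the residuals of this fit grow systematically at close range, that is a hint the lens deviates from the ideal model and a fuller fit (such as the cot form) is worth trying.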
[Edit: a note about 'ideal' cameras]
For a camera image to look 'realistic' to us, the image should approximate projection onto a plane held in front of the eye (because camera images are viewed by us by holding a planar image in front of our eyes). Imagine holding a sheet of tracing paper up in front of your eye, and sketching the object's silhouette on that paper. The second diagram on this page shows sort of what I mean. You might describe a camera which achieves this as an "ideal" camera.
Of course, in real life, cameras don't work via tracing paper, but with lenses. Very complicated lenses. Have a look at the lens diagram on this page. For various reasons which you could spend a lifetime studying, it is very tricky to create a lens which works exactly like the tracing paper example would work under all conditions. Start with this wiki page and read on if you want to know more.
So you are unlikely to be able to compute an exact relationship between pixel length and distance: you should measure it and fit a curve.
It is a big topic. If you want to proceed from a single image, take a look at this old paper by A. Criminisi. For an in-depth view, read his Ph.D. thesis. Then start playing with the OpenCV routines in the "projective geometry" section.
I have been working on Image/Object Recognition as well. I just released a python programmed android app (ported to android) that recognizes objects, people, cars, books, logos, trees, flowers... anything:) It also shows its thought process as it "thinks" :)
I've put it out as a test for 99 cents on google play.
Here's the link if you're interested, there's also a video of it in action:
https://play.google.com/store/apps/details?id=com.davecote.androideyes
Enjoy!
:)
Suppose I have a photograph, and four pixel coordinates representing the corners of a rectangular sheet of paper. My goal is to determine the rotation, translation, and projection which maps from the 3D scene containing the sheet of paper on a plane to the 2D image.
I understand there are augmented reality libraries for this, like ARToolkit. However, they all require additional information, namely the parameters of the camera used to take the photograph. My question is, how come having the rectangle's four corner points (in addition to knowing the rectangle's real-world dimensions) is insufficient information to extrapolate 3D information?
It makes sense mathematically since there are so many more unknown variables that bring us from 3D coordinates to 2D screen space, but I'm having a hard time grounding that concept in what I see.
Thanks!
Does it help for you to count degrees of freedom?
There are 3 degrees of freedom involved in deciding where in space to put the camera. 3 more degrees of freedom to decide how to turn it. 1 degree of freedom to figure out how much the picture it took had been enlarged, and finally 2 degrees of freedom to fix where on the resulting flat image we're looking.
That makes 9 degrees of freedom in total. However, knowing the location of four points in the final cropped image gives us only 8 continuously varying variables. Therefore there must be a way to slide the camera, zoom level and translation parameters around such that those four points stay in the same place on the screen (while everything else distorts subtly).
If we know even one of these nine parameters, such as the camera's focal length (in pixels!), then there's some hope of getting an unambiguous answer.
If I am taking images from a pair of cameras whose principal axes are both perpendicular to the baseline, do I need to rectify the images? A typical example would be the Bumblebee stereo cameras.
If you can also guarantee that:
the camera axes are parallel (maybe so if bought as a single package like the bumblebee)
you have no lens distortion (probably not)
all the other internal camera parameters are identical
your measurement axis is parallel to your baseline
then you might be able to skip image rectification. Personally I wouldn't.
Just think about lens distortion. Even assuming everything else is equal and aligned, this might mess things up. Suppose a feature appears at the edge of one image and at the centre of the other. At the edge it might be displaced by a few pixels, while at the centre it appears where it should. Without rectification, your stereoscopic calculation (which assumes straight lines from object to sensor) is going to give you bad results.
Depends what you mean by "rectify". In stereo vision, it is common to ensure that the epipolar lines are aligned too. That means the i-th row in image 1 corresponds to the i-th row in image 2. An optional step is to reduce distortion caused by the rectification process.
If you are taking images from a pair of cameras whose principal axes are perpendicular to the baseline, then the epipoles are mapped to infinity (the epipolar lines within each image are parallel). You still need another transform to align the epipolar lines across the two images. You will find this transform in Loop & Zhang's paper, along with the transform to reduce distortion.
And be careful about lens distortion (see wxffles' answer).
The problem we are trying to solve is locating a point in two different representations of a plane. The first plane we have is rotated to create perspective; the second is a 2D view of that same plane. We have 4 points on each of the planes that we know to be equivalent. The question is: if we have an arbitrary point in plane 1, how do we find the corresponding point in plane 2?
It is probably best to illustrate the use case in order to clarify the question. We have an image illustrated on the left.
Projective plane
2D layout diagram of space
So the givens that we have are the red squares from both pictures. Note that if possible, I’d like it to be possible that the 2D space isn’t necessarily a square. These are available to us ahead of time and known. I also have green dots laid out on the plane in the first image. I’d like to be able to do a projection of the dot in image 1 onto the space in image 2.
Note also that for image 1 I do not have a defined window or eye position. I just know that the red square in image 1 is a transform of the red square from image 2, and that image 2 is in 2D space.
This is a special case of finding mappings between quadrilaterals that preserve straight lines. These are generally called homographic or projective transforms. Here, one of the quads is a square, so this is a popular special case. You can google these terms ("quad to quad", etc) to find explanations and code, but here are some for you.
Perspective Transform Estimation
a gaming forum discussion
extracting a quadrilateral image to a rectangle
Projective Mappings for Image Warping by Paul Heckbert.
The math isn't particularly pleasant, but it isn't that hard either. You can also find some code from one of the above links.
Update
And this is one of my favorites: Computing a projective transformation
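For reference, here is a minimal pure-Python sketch of the quad-to-quad mapping discussed above: it builds the 8x8 linear system from the four point pairs, solves for the 3x3 projective matrix, and maps an arbitrary point. The function names are made up for this example, and fixing the bottom-right matrix entry to 1 is an assumption that fails in the rare case where the source origin maps to infinity. It is not taken from any of the linked articles.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_quads(src, dst):
    """3x3 matrix H (with H[2][2] = 1) mapping four src points to four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def map_point(H, point):
    """Apply H to a 2D point, including the perspective divide."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical corners of the red quad in the perspective image (pixels)
# and of the matching rectangle in the 2D layout.
image_quad = [(120, 80), (400, 100), (460, 300), (60, 320)]
layout_quad = [(0, 0), (10, 0), (10, 6), (0, 6)]

H = homography_from_quads(image_quad, layout_quad)
print(map_point(H, (250, 200)))  # a green dot in image 1, mapped into the layout
```

The target quad does not have to be a square; any four points with no three collinear will do. OpenCV's getPerspectiveTransform computes the same kind of matrix if you would rather not solve the system yourself.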