If I am taking images from a pair of cameras whose principle axis(in both the cameras) is perpendicular to the baseline do I need to rectify the images?Typical example would be bumblebee stereo cameras.
If you can also guarantee that:
the camera axes are parallel (maybe so if bought as a single package like the bumblebee)
you have no lens distortion (probably not)
all the other internal camera parameters are identical
your measurement axis is parallel to your baseline
then you might be able to skip image rectification. Personally I wouldn't.
Just think about lens distortion. Even assuming everything else is equal and aligned, this might mess things up. Suppose a feature appears on the edge in one image and a the centre of the other. At the edge it might be distorted a few pixels away, while at the centre it appears where it should. Without rectification, your stereoscopic calculation (which assumes straight lines from object to sensor) is going to give you bad results.
Depends what you mean by "rectify". In stereo vision, it is common to ensure that the epipolar lines are aligned too. That means the i-th row in image 1 corresponds to the i-th row in image 2. An optional step is to reduce distortion caused by the rectification process.
If you are taking images from a pair of cameras whose principle axis is perpendicular to the baseline, then you have epipoles mapped on infinity (parallel epipolar lines in the same image). You need another transform to align the epipolar lines in both images. You will find this transform in Loop & Zhang's paper, also the transform to reduce distortion.
And be careful about lens distortion (see wxffles' answer).
Related
I am working with GD library, and I'm looking for a way to detect the nearest pixel to the middle center of shapes, as well as total area used by each shape in a monochromic black-and-white image.
I'm having difficulty coming up with an efficient algorithm to do this. If you have done something similar to this in the past, I'd be grateful for any solution that would help.
Check out the binary image library
Essentially, Otsu threshold to separate out foreground from background, then label connected components. That particular image looks very clean but you might need morph ops to clean it up a bit and get rid of small holes and other artifacts.
Then you have area trivially (count pixels in component) or almost as trivially (use the weighted area function that penalises edge pixels). Centre is just mean.
http://malcolmmclean.github.io/binaryimagelibrary/
#MalcolmMcLean is right but there are remaining difficulties (if you are after maximum accuracy).
If you threshold with Otsu, there are a few pairs of "kissing" dots which will form a single blob using connected component analysis.
In addition, Otsu threshoding will discard some of the partially filled edge pixels so that the weighted averages will be inaccurate. A cure would be to increase the threshold (up to 254 is possible), but that worsens the problem of the kissing dots.
A workaround is to keep a low threshold and dilate the blobs individually to obtain suitable masks that cover all edge pixels. Even so, slight inaccuracies will result in the vicinity of the kissings.
Blob splitting by the watershed transform is also possible but more care is required to handle the common pixels. I doubt that a prefect solution is possible.
An alternative is the use of subpixel edge detection and least-squares circle fitting (after blob detection with a very low threshold to separate the dots). By avoiding the edge pixels common to two circles, you can probably achieve excellent results.
Back story: I'm creating a Three.js based 3D graphing library. Similar to sigma.js, but 3D. It's called graphosaurus and the source can be found here. I'm using Three.js and using a single particle representing a single node in the graph.
This was the first task I had to deal with: given an arbitrary set of points (that each contain X,Y,Z coordinates), determine the optimal camera position (X,Y,Z) that can view all the points in the graph.
My initial solution (which we'll call Solution 1) involved calculating the bounding sphere of all the points and then scale the sphere to be a sphere of radius 5 around the point 0,0,0. Since the points will be guaranteed to always fall in that area, I can set a static position for the camera (assuming the FOV is static) and the data will always be visible. This works well, but it either requires changing the point coordinates the user specified, or duplicating all the points, neither of which are great.
My new solution (which we'll call Solution 2) involves not touching the coordinates of the inputted data, but instead just positioning the camera to match the data. I encountered a problem with this solution. For some reason, when dealing with really large data, the particles seem to flicker when positioned in front/behind of other particles.
Here are examples of both solutions. Make sure to move the graph around to see the effects:
Solution 1
Solution 2
You can see the diff for the code here
Let me know if you have any insight on how to get rid of the flickering. Thanks!
It turns out that my near value for the camera was too low and the far value was too high, resulting in "z-fighting". By narrowing these values on my dataset, the problem went away. Since my dataset is user dependent, I need to determine an algorithm to generate these values dynamically.
I noticed that in the sol#2 the flickering only occurs when the camera is moving. One possible reason can be that, when the camera position is changing rapidly, different transforms get applied to different particles. So if a camera moves from X to X + DELTAX during a time step, one set of particles get the camera transform for X while the others get the transform for X + DELTAX.
If you separate your rendering from the user interaction, that should fix the issue, assuming this is the issue. That means that you should apply the same transform to all the particles and the edges connecting them, by locking (not updating ) the transform matrix until the rendering loop is done.
Background
Using gluTess to build a triangle list in Direct3D9 from a GDI+ DrawString(..) path:
A pixel shader (v3.0) is then used to fill in the shape. When painting with opaque values, everything looks fine:
The problem
At certain font sizes, if the color has an alpha component (ie Argb #55FFFFFF) we begin to see these nasty tessellation artifacts where triangles may overlap ever so slightly:
At larger font sizes the problem is sometimes not present:
Using Intel's excellent GPA Frame Analyzer Pixel History tool, we can see in areas where the artifacts occur, the pixel has been "touched" 3 times from the single Erg.
I'm trying to figure out how I can stop my pixel shader from touching the same pixel more than once.
Other solutions relating to overdraw prevention seem to be all about zbuffer strategies, however this problem is more to do with painting of a single 2D triangle list within a single pixel shader pass.
I'm at a bit of a loss trying to come up with a solution on this one. I was hoping that HLSL might have some sort of "touch each pixel only once" flag, but I've been unable to find anything like that. The closest I've found was to set the BLENDOP to MAX instead of ADD. But the output is not correct when blending over other colors in the scene.
I also have SRCBLEND = ONE, DSTBLEND = INVSRCALPHA. The only combination of flags which produce correct output (albeit with overdraw artifacts.)
I have played with SEPARATEALPHABLENDENABLE in the GPA frame analyzer, which sounded like almost exactly what I need here -- set blending to MAX but only on the "alpha" channel, however from what I can determine, that setting (and corresponding BLENDOPALPHA) affects nothing at all.
One final thing I thought of was to bake text as opaque onto a texture, and then repaint that texture into the scene with the appropriate alpha value applied, however this doesn't actually work in this project because I also support gradient brushes, where stop values may contain alpha, meaning either the artifacts would still be seen, or the final output just plain wrong if we stripped the alpha away from the stop values prior to baking to a texture. Also the whole endeavor would be hideously expensive.
Any hints or pointers would be appreciated. Thanks for reading.
The problem you're seeing shouldn't happen.
If two of your triangles are overlapping it's because you've placed the vertices in such a way that when the adjacent triangles are drawn, they overlap. What's probably happening is that these two adjacent triangles share two vertices, but each triangle has its own copy of each vertex that's been calculated to be in a very, very slightly different position.
The solution to the problem isn't to try and make the pixel shader touch the pixel only once it's to use an index buffer (if you aren't already) and have the shared vertices between each triangle actually share the same vertex and not use one that's ever-so-slightly not in the same place as the one used by the adjacent triangle.
If you aren't in control of the tessellation algorithm being used you may have to run a pass over the vertex buffer after its been generated to detect and merge vertices that are within some very small tolerance of one another. Even without an index buffer, a naive solution would be this:
For each vertex in the vertex buffer, compare its position to every other vertex in the rest of the vertex buffer.
If two vertices are within some small tolerance of another, replace the second vertex's position with the position of the one you are comparing it against.
This should have the effect of pairing up the positions of two vertices if they are close enough that you deem them to be the same.
You now shouldn't have any problem with overlapping triangles. In everyday rendering two triangles share edges with each other all the time and you won't ever get the effect where they appear to every-so-slightly overlap. The hardware guarantees that a sample point is either on one side of the line or the other, but never both at the same time, no matter how close the point is to the line (even if it's mathematically on the line, it still fails on one side or the other).
From the oldest games to the very modern, it seems like you can still see through walls or most often the ground in some camera positions.
Why is collision difficult to effectively compute in graphics engines?
Is it rounding/loss of precision accumulating leading to a mis-rendered view?
This is not actually collision in the explicit sense. The camera position is probably not actually "inside" the wall or the ground in those situations, but it is simply very close to it.
In computer 3D graphics the camera has a concept of a near plane and a far plane. Only geometry located between these two planes will be visible, while the rest will be clipped. If you are too close to something and align the camera correctly, then chances are that some parts of the geometry will be too close to the camera as defined by the near plane and as a result that geometry will not be rendered.
Now, the distance to this near plane can be set by the developers, and it can be set to be very short - short enough to ensure that situations like these cannot occur. However, the depth buffer or z buffer that is used to determine which objects are closest to the camera during rendering, and thus which objects to render and which not to render, is closely related to the near and far plane distances.
In graphics hardware the depth buffer is represented using a fixed amount of bits for each pixel, for example 32 bits. These 32 bits must be enough to accurately represent the entire span between the near plane and the far plane. It is also not linear, but will use more precision closer to the camera. As a result, choosing a very small near plane distance will greatly reduce the overall precision of the depth buffer. This can cause annoying flickering throughout the entire scene wherever two objects are very close to each others.
You can read more about this issue here as well as section 12.040 here.
It's not about difficulty (of course, it's not easy to compute collision/clipping of non-convex object), but you still have only like ~33ms to compute whole frame, so some compromise have to be made (collision mesh is not the same like mesh you really see). If there is no time for precise solution (to fulfill all conditions - camera distance, object which have to be seen, collision avoidance), you have to fallback to some "easy" solution like see through the wall.
Suppose I have a photograph, and four pixel coordinates representing the corners of a rectangular sheet of paper. My goal is to determine the rotation, translation, and projection which maps from the 3D scene containing the sheet of paper on a plane to the 2D image.
I understand there are augmented reality libraries for this, like ARToolkit. However, they all require additional information, namely the parameters of the camera used to take the photograph. My question is, how come having the rectangle's four corner points (in addition to knowing the rectangle's real-world dimensions) is insufficient information to extrapolate 3D information?
It makes sense mathematically since there are so many more unknown variables that bring us from 3D coordinates to 2D screen space, but I'm having a hard time grounding that concept in what I see.
Thanks!
Does it help for you to count degrees of freedom?
There are 3 degrees of freedom involved in deciding where in space to put the camera. 3 more degrees of freedom to decide how to turn it. 1 degree of freedom to figure out how much the picture it took had been enlarged, and finally 2 degrees of freedom to fix where on the resulting flat image we're looking.
That makes 9 degrees of freedom in total. However, knowing the location of four points in the final cropped image gives us only 8 continuously varying variables. Therefore there must be a way to slide the camera, zoom level and translation parameters around such that those four points stay in the same place on the screen (while everything else distorts subtly).
If we know even one of these nine parameters, such as the camera's focal length (in pixels!), then there's some hope of getting an unambiguous answer.