I made an object tracker that calculates the position of an object recorded in a live camera feed using stereoscopic cameras. The math was simple once I knew the camera distance and orientation. Now I would like a way to quickly extract all these parameters, so that when I change my setup or cameras I can recalibrate quickly.
To calculate the object position I made some simplifications/assumptions, which made the math easier: the cameras share the same Y and Z coordinates, so there is only a distance in x between them. Their tilt is also just in the XY plane.
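For readers who want the simplified setup spelled out, here is a minimal sketch of a planar two-ray triangulation. It is not the poster's code. It assumes (hypothetically) that x runs along the baseline, z is depth, the left camera sits at the origin, the right camera at (baseline, 0), and each bearing is the angle of the ray from the +z direction, positive toward +x, with the camera's tilt already added in.

```python
import math

def triangulate(bearing_left, bearing_right, baseline):
    """Intersect two rays in the x/z plane.

    Left camera at (0, 0), right camera at (baseline, 0).
    Bearings are in radians, measured from +z, positive toward +x
    (assumed convention, tilt already included).
    Returns (x, z) of the intersection.
    """
    # Left ray:  x = z * tan(bearing_left)
    # Right ray: x = baseline + z * tan(bearing_right)
    denom = math.tan(bearing_left) - math.tan(bearing_right)
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel, no intersection")
    z = baseline / denom
    x = z * math.tan(bearing_left)
    return x, z
```

Calibration is the reverse problem: the baseline and tilts are the unknowns and the target point positions are known.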
To reverse the triangulation I thought a test pattern (square) of 4 points, of which I know the distances to each other, would suffice. Ideally I would like to get the cameras' positions (distances to the test pattern and to each other), their rotation in X (and maybe Y and Z if applicable/possible), as well as their view angle (to translate pixel positions to real-world distances; that should be a camera constant, but if I change cameras it is quite a bit of work to determine accurately).
I started with the same trigonometric calculations, but I always end up missing parameters. I am wondering if there is an existing solution or a solid approach. If I need to add parameters (such as distances, which are easy enough to measure), that's no problem (though my calculations didn't give me any simple equations with that possibility).
I also read about homography in OpenCV, but it seems to apply to 2D space only, or does it not?
Any help is appreciated!
I'm building a 2D game where the player can only see things that are not blocked by other objects. Consider this example of how it looks now:
I've implemented a raytracing algorithm for this and it seems to work just fine (I've reduced the boundaries for the demo to make all edges visible).
As you can see, the lighter area is built from a bunch of triangles, each of which has one vertex at the player's position. Each pair of neighbouring triangles shares two points.
However, I want to calculate the bounds of the external part of the polygon so that I can fill it with black triangles "hiding" what the player cannot see.
One way to do it is to "mask" the black rectangle with the current polygon, but I'm afraid that's very inefficient.
Any ideas for an efficient algorithm to achieve this?
Thanks!
A non-analytical, rough solution.
Cast rays with gradually increasing polar angle
Record when a ray first hits an object (and the point where it hits)
Keep going until it no longer hits the same object (and record where it last hit)
Using the two recorded points, construct a trapezoid that extends to infinity (or wherever)
Caveats:
Doesn't work too well with concavities - need to include all points in-between as well. May need Delaunay triangulation etc... messy!
May need extra states to account for objects tucked in behind each other.
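The steps above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: obstacles are circles given as hypothetical (cx, cy, r) tuples, the player is outside all of them, "infinity" is replaced by a finite `reach`, and an object that straddles angle 0 simply ends up with two quads. It does not address the caveats listed above.

```python
import math

def ray_circle_hit(origin, angle, circle):
    """Distance along the ray to the nearest hit on a circle, or None."""
    (ox, oy), (cx, cy, r) = origin, circle
    dx, dy = math.cos(angle), math.sin(angle)
    fx, fy = ox - cx, oy - cy
    b = fx * dx + fy * dy
    disc = b * b - (fx * fx + fy * fy - r * r)
    if disc < 0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0 else None

def point_at(origin, angle, dist):
    return (origin[0] + dist * math.cos(angle),
            origin[1] + dist * math.sin(angle))

def make_quad(origin, first, last, reach):
    """Trapezoid from the two recorded hit points out to `reach`."""
    (a0, t0), (a1, t1) = first, last
    return [point_at(origin, a0, t0), point_at(origin, a1, t1),
            point_at(origin, a1, reach), point_at(origin, a0, reach)]

def shadow_quads(origin, circles, steps=360, reach=1000.0):
    quads = []
    current, first, last = None, None, None
    for i in range(steps + 1):  # one extra iteration flushes the final run
        angle = 2 * math.pi * i / steps
        hit = None  # (object index, distance) of the nearest hit
        if i < steps:
            for idx, c in enumerate(circles):
                t = ray_circle_hit(origin, angle, c)
                if t is not None and (hit is None or t < hit[1]):
                    hit = (idx, t)
        obj = hit[0] if hit else None
        if obj != current:
            # The ray stopped hitting the previous object: emit its quad.
            if current is not None:
                quads.append(make_quad(origin, first, last, reach))
            current = obj
            first = (angle, hit[1]) if hit else None
        if hit:
            last = (angle, hit[1])
    return quads
```

Each returned quad can be split into two black triangles. A finer angular step gives tighter shadow edges at the cost of more rays.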
Back story: I'm creating a Three.js based 3D graphing library. Similar to sigma.js, but 3D. It's called graphosaurus and the source can be found here. I'm using Three.js and using a single particle representing a single node in the graph.
This was the first task I had to deal with: given an arbitrary set of points (that each contain X,Y,Z coordinates), determine the optimal camera position (X,Y,Z) that can view all the points in the graph.
My initial solution (which we'll call Solution 1) involved calculating the bounding sphere of all the points and then scaling it to a sphere of radius 5 around the point 0,0,0. Since the points are guaranteed to always fall in that area, I can set a static position for the camera (assuming the FOV is static) and the data will always be visible. This works well, but it requires either changing the point coordinates the user specified or duplicating all the points, neither of which is great.
My new solution (which we'll call Solution 2) involves not touching the coordinates of the input data, but instead just positioning the camera to match the data. I encountered a problem with this solution. For some reason, when dealing with really large data, the particles seem to flicker when positioned in front of or behind other particles.
Here are examples of both solutions. Make sure to move the graph around to see the effects:
Solution 1
Solution 2
You can see the diff for the code here
Let me know if you have any insight on how to get rid of the flickering. Thanks!
It turns out that my near value for the camera was too low and the far value was too high, resulting in "z-fighting". By narrowing these values on my dataset, the problem went away. Since my dataset is user dependent, I need to determine an algorithm to generate these values dynamically.
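One possible way to generate these values dynamically is to derive them from a bounding sphere of the data. The sketch below is an assumption-laden illustration, not what graphosaurus does: it uses the centroid plus the maximum distance as a (non-minimal) bounding sphere, assumes the camera looks at the sphere's centre, and the `padding` and `min_near` parameters are hypothetical knobs.

```python
import math

def near_far_for_points(camera_pos, points, min_near=1e-3, padding=1.05):
    """Suggest near/far clip distances that tightly enclose the points.

    Assumes the camera is aimed at the centre of the bounding sphere,
    so every point's view depth lies in [d - radius, d + radius].
    """
    n = len(points)
    center = tuple(sum(p[i] for p in points) / n for i in range(3))
    radius = max(math.dist(center, p) for p in points)
    d = math.dist(camera_pos, center)
    # Pull near slightly closer and push far slightly out for safety;
    # near must stay positive even if the camera is inside the sphere.
    near = max((d - radius) / padding, min_near)
    far = (d + radius) * padding
    return near, far
```

The values would need to be recomputed whenever the camera moves toward or away from the data, and the camera's projection matrix updated afterwards.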
I noticed that in Solution 2 the flickering only occurs when the camera is moving. One possible reason is that, when the camera position is changing rapidly, different transforms get applied to different particles. So if the camera moves from X to X + DELTAX during a time step, one set of particles gets the camera transform for X while the others get the transform for X + DELTAX.
If that is the cause, separating your rendering from the user interaction should fix it. That means applying the same transform to all the particles and the edges connecting them, by locking (not updating) the transform matrix until the rendering loop is done.
Suppose I have a photograph, and four pixel coordinates representing the corners of a rectangular sheet of paper. My goal is to determine the rotation, translation, and projection which maps from the 3D scene containing the sheet of paper on a plane to the 2D image.
I understand there are augmented reality libraries for this, like ARToolkit. However, they all require additional information, namely the parameters of the camera used to take the photograph. My question is, how come having the rectangle's four corner points (in addition to knowing the rectangle's real-world dimensions) is insufficient information to extrapolate 3D information?
It makes sense mathematically since there are so many more unknown variables that bring us from 3D coordinates to 2D screen space, but I'm having a hard time grounding that concept in what I see.
Thanks!
Does it help for you to count degrees of freedom?
There are 3 degrees of freedom involved in deciding where in space to put the camera. 3 more degrees of freedom to decide how to turn it. 1 degree of freedom to figure out how much the picture it took had been enlarged, and finally 2 degrees of freedom to fix where on the resulting flat image we're looking.
That makes 9 degrees of freedom in total. However, knowing the location of four points in the final cropped image gives us only 8 continuously varying variables. Therefore there must be a way to slide the camera, zoom level and translation parameters around such that those four points stay in the same place on the screen (while everything else distorts subtly).
If we know even one of these nine parameters, such as the camera's focal length (in pixels!), then there's some hope of getting an unambiguous answer.
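To make "focal length in pixels" concrete: under a simple pinhole model it follows from the image width and the horizontal field of view. The sketch below assumes an uncropped image with the principal point at the image centre, which is exactly the assumption that cropping breaks; the function name is just for illustration.

```python
import math

def focal_length_px(image_width_px, horizontal_fov_deg):
    """Pinhole focal length in pixels from image width and horizontal FOV."""
    half_fov = math.radians(horizontal_fov_deg) / 2
    return (image_width_px / 2) / math.tan(half_fov)
```

For example, a 640-pixel-wide image with a 90 degree horizontal field of view has a focal length of about 320 pixels.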
What kind of algorithms would generate random "goo balls" like those in World of Goo? I'm using Processing, but any generic algorithm would do.
I guess it boils down to how to "randomly" make balls that are kind of round, but not perfectly round, and still looking realistic?
Thanks in advance!
The thing that makes objects realistic in World of Goo is not their shape, but the fact that the behavior of objects is a (more or less) realistic simulation of 2D physics, especially
bending, stretching, compressing (elastic deformation)
breaking due to stress
and all of the above with proper simulation of dynamics, with no perceivable shortcuts
So, try to make the behavior of your objects realistic and that will make them look (feel) realistic.
Not sure if this is what you're looking for since I can't look at that site from work. :)
A circle is just a special case of an ellipse, where the major and minor axes are equal. A squished ball shape is an ellipse where one of the axes is longer than the other. You can generate different lengths for the axes and rotate the ellipse around to get these kinds of irregular shapes.
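A minimal sketch of that idea, in Python rather than Processing. The parameter names (`squish`, `n`, `rng`) are made up for illustration: pick two random axis lengths near a base radius, pick a random rotation, and sample points around the outline.

```python
import math
import random

def random_ellipse(cx, cy, base_radius, squish=0.3, n=32, rng=random):
    """Outline points of a randomly squished, randomly rotated ellipse."""
    # Each semi-axis deviates from base_radius by up to +/- squish (as a fraction).
    a = base_radius * (1 + rng.uniform(-squish, squish))
    b = base_radius * (1 + rng.uniform(-squish, squish))
    theta = rng.uniform(0, math.pi)  # rotation of the ellipse
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    points = []
    for k in range(n):
        t = 2 * math.pi * k / n
        x, y = a * math.cos(t), b * math.sin(t)
        # Rotate by theta, then translate to the centre.
        points.append((cx + x * cos_t - y * sin_t,
                       cy + x * sin_t + y * cos_t))
    return points
```

Passing a seeded `random.Random` instance as `rng` makes the shapes reproducible.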
Maybe Metaballs (wiki) are something to start from.. but I'm not sure.
Otherwise I would suggest a particle approach in which a ball is composed of many particles that stick together, giving an irregularity (mind that this needs a minimal physics engine to handle the spring body that keeps all the particles together).
As Unreason said, World of Goo is not so much about shape, but physics simulation.
But an easy way to create ball-like irregular shapes could be to start with n vertices (points) V_1, V_2 ... V_n on a circle and apply some random deformation to it. There are many ways to do that, going from simply moving around some single vertices to complex physical simulations.
Some ideas:
1) Choose a random vertex V_i and a random vector T, and apply T as a translation (movement) to V_i. Apply T to all other vertices V_j too, but scaled down depending on the "distance" from V_i (where distance could be the absolute difference between j and i, or the actual geometric distance from V_j to V_i). For the scaling factor you could use any function f with f(0) = 1 that decreases with increasing distance (basically a radial basis function).
for each V_j
    V_j = scalingFactor(distance(V_i, V_j)) * translationVector + V_j
2) Move V_i as in 1), but now simulate spring-like connections between all neighbouring vertices and iteratively move all vertices based on the forces created by the stretched springs.
3) For more round shapes you can do 1) or 2) on the control points of a B-spline curve.
Beware of self-intersections when you move vertices too much.
Just some rough ideas, not tested...
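Idea 1) could look roughly like this in Python. The Gaussian falloff and the `falloff` parameter are one arbitrary choice of radial basis function, not something prescribed above. The distance here is the geometric one, measured against the original position of V_i.

```python
import math

def circle_vertices(n, radius=1.0):
    """n points evenly spaced on a circle around the origin."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def deform(vertices, i, translation, falloff=0.5):
    """Move vertex i by `translation`; other vertices follow, scaled down
    by a Gaussian of their distance to vertex i."""
    vix, viy = vertices[i]  # original position of V_i, fixed for all distances
    tx, ty = translation
    result = []
    for vx, vy in vertices:
        d = math.hypot(vx - vix, vy - viy)
        w = math.exp(-(d / falloff) ** 2)  # 1 at d = 0, decreasing with d
        result.append((vx + w * tx, vy + w * ty))
    return result
```

Applying `deform` a few times with different random `i` and small random translations gives a lumpy but still roughly round outline; large translations risk the self-intersections mentioned above.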
In my Direct3D application, the camera can be moved using the mouse or arrow keys. But if I hard code (0,1,0) as the up direction vector in LookAtLH, the frame goes blank at some orientations of the camera.
I just learned the hard way that when looking along the Y-axis, (0,1,0) no longer works as the Up direction (seems obvious?). I am thinking of switching my up direction to something else for each of these special cases. Is there a more graceful way to handle this?
Assuming you can calculate a vector pointing forward (what you are looking at minus your position) and a vector pointing right (always in the XZ-plane unless you can roll), normalize both of these vectors; then up is forward x right (where x is the cross product).
In general, you can plug in your yaw, pitch and roll into a rotation matrix and rotate the axis vectors to get right, up and forward, but I guess that's what you are using LookAtLH to avoid.
See http://en.wikipedia.org/wiki/Rotation_matrix#The_3-dimensional_rotation_matricies
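A small sketch of the forward x right approach, in Python for brevity. It assumes a left-handed, Y-up convention with +Z as forward at zero yaw and no roll; the yaw/pitch parameterisation is my own assumption, not something from the question. Both vectors are unit length by construction, so no extra normalization is needed here.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def camera_basis(yaw, pitch):
    """Return (right, up, forward) for a camera without roll (radians)."""
    forward = (math.sin(yaw) * math.cos(pitch),
               math.sin(pitch),
               math.cos(yaw) * math.cos(pitch))
    # Right depends only on yaw, so it stays in the XZ-plane and is
    # well defined even when looking straight up or down.
    right = (math.cos(yaw), 0.0, -math.sin(yaw))
    up = cross(forward, right)
    return right, up, forward
```

The resulting up vector can be passed to LookAtLH in place of the hard-coded (0,1,0); it stays non-zero and perpendicular to forward even when the camera looks along the Y-axis.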
The graceful way to handle this is to use unit quaternions. A quaternion is a vector of 4 values that encodes an orientation in 3D space (not a rotation as some articles assert), and a unit quaternion is one where the vector length sqrt(x^2+y^2+z^2+w^2) is 1.0. There is a set of mathematical operations for working with quaternions that are analogous to using matrices to encode rotations, with the added bonus that quaternions can never represent a degenerate orientation. You can freely convert quaternions to a 3x3 or 4x4 matrix when you need to feed the result to a GPU.
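As an illustration of the conversion step, here is a minimal Python sketch that normalizes a quaternion and turns it into a 3x3 rotation matrix. It assumes (x, y, z, w) component order and the column-vector convention (v' = M v); Direct3D's row-vector convention would use the transpose.

```python
import math

def normalize(q):
    """Scale a quaternion (x, y, z, w) to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def quat_to_matrix(q):
    """3x3 rotation matrix for a quaternion (x, y, z, w)."""
    x, y, z, w = normalize(q)  # renormalize to guard against drift
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]
```

Because the input is renormalized first, the result is always a proper rotation matrix, which is the "never degenerate" property in practice.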
Your problem is that, while you are moving your camera, you will introduce a little twist into the camera's up direction. By forcing the camera to re-center itself on the (0,1,0) vector every iteration, you are in effect rotating the camera and then clamping the camera's orientation to remain on the surface of a sphere. When your camera hits the pole of this sphere, there is no good direction to call "up", and your matrix goes singular and gives you zero-sized polygons (hence the black screen). Quaternions have the ability to interpolate through these poles and come out the other side just fine, leaving you with a valid matrix at all times. All you have to do is control the "twist".
To measure this twist you should read Ken Shoemake's article "Fiber Bundle Twist Reduction" in the book Graphics Gems 4. He shows a good way to measure this accumulated twist and how to remove it when it is offensive.