Reconstruction 3d for a rotation camera - matlab-cvst

I have rotating camera images and I'm trying this example of a MATLAB computer vision toolbox (https://www.mathworks.com/matlabcentral/fileexchange/67383-stereo-triangulation)
I have the calibration and rotation matrix for each image, however I always find 3d points equal to (0,0,0).
It is noted that the translation is null which makes the fourth column null.

You cannot reconstruct a 3D point from a rotating camera.
I suggest you try and draw an example. The idea of triangulation is to compute the intersection of two backprojection rays. These rays pass through the camera center and the point to be reconstructed. In your drawing, you'll find that the intersection becomes more and more accurate the larger the so-called stereo baseline is (that's the translation from one camera center to the other).
Now, for a rotating camera, the camera center remains the same and therefore, the two rays are identical. An intersection is not defined.

Related

What is the reference point for measuring angles in OpenCV?

I'm trying to infer an object's direction of movement using dense optical flow in OpenCV. I'm using calcOpticalFlowFarneback() to get flow coordinates and cartToPolar() to acquire vector angles which would indicate direction.
To interpret the results I need to know the reference point for measuring the angle. I have found this blog post indicating that the range of angles is 360°. That tells me that the angle measurement would go along the lines of the unit circle. I couldn't make out much more than that.
The documentation for cartToPolar() doesn't cover this and my attempts at testing it have failed.
It seems that the angle produced by cartToPolar() is in reference to the unit circle rotated clockwise by 90° centered on the image coordinate starting point in the top left corner. It would look like this.
I came to this conclusion by using the dense optical flow example provided by OpenCV. I replaced the line hsv[...,0] = ang*180/np.pi/2 with hsv[...,0] = ang*180/np.pi to get correct angle conversion from radians. Then I tested a video with people moving from top right to bottom left and vice versa. I sampled the dominant color with GIMP and got RGB values which I converted to HSV values. Hue value corresponds to the angle in degrees.
People moving from top right to bottom left produced an angle of about 300° and people moving the other way round produced an angle of about 120°. This hinted at the way the unit circle is positioned.
Looking at the code, fastAtan32f is used to compute the angles. and that seems to be a atan2 implementation.

skimage project an image's 3D plane to fronto-parallel view

I'm working on implementing Akush Gupta's synthetic data generation dataset (http://www.robots.ox.ac.uk/~vgg/data/scenetext/gupta16.pdf). In his work. he used a convolutional neural network to extract a point cloud from a 2-dimensional scenery image, segmented the point clouds to isolate different planes, used RANSAC to fit a 3d plane to the point cloud segments, and then warped the pixels for the segment, given the 3D plane, to a fronto-parallel view.
I'm stuck in this last part- warping my extracted 3D plane to a fronto-parallel view. I have X, Y, and Z vectors as well as a normal vector. I'm thinking what I need to do is perform some type of perspective transform or rotation that would bring all the pixels on the plane to a complete 0 Z-axis while the X and Y would remain the same. I could be wrong about this, it's been a long time since I've had any formal training in geometry or linear algebra.
It looks like skimage's Perspective Transform requires me to know the dimensions of the final segment coordinates in 2d space. It looks like AffineTransform requires me to know the rotation. All I have at this point is my X,Y,Z and normal vector and the suspicion that I may know my destination plane by just setting the Z axis to all zeros. I'm not sure if my assumption is correct but I need to be able to warp all the pixels in the segment of interest to fronto-parallel, fit a bounding box, place text inside of it, then warp the final segment back to the original perspective in 3d space.
Any help with how to think about this or implement it would be massively useful.

Algorithm to calculate and display a ribbon on a 3D triangle mesh

I am looking for an algorithm for the following problem:
Given:
A 3D triangle mesh. The mesh represents a part of the surface of the earth.
A polyline (a connected series of line segments) whose vertices are always on an edge or on a vertex of a triangle of the mesh. The polyline represents the centerline of a road on the surface of the earth.
I need to calculate and display the road i.e. add half of the road's width on each side of the center line, calculate the resulting vertices in the corresponding triangles of the mesh, fill the area of the road and outline the sides of the road.
What is the simplest and/or most effective strategy to do this? How do I store the data of the road most efficiently?
I see 2 options here:
render thick polyline with road texture
While rendering polyline you need TBN matrix so use
polyline tangent as tangent
surface normal as normal
binormal=tangent x normal
shift actual point p position to
p0=p+d*binormal
p1=p-d*binormal
and render textured line (p0,p1). This approach is not precise match to surface mesh so you need to disable depth or use some sort of blending. Also on sharp turns it could miss some parts of a curve (in that case you can render rectangle or disc instead of line.
create the mesh by shifting polyline to sides by half road size
This produces mesh accurate road fit, but due to your limitations the shape of the road can be very distorted without mesh re-triangulation in some cases. I see it like this:
for each segment of road cast 2 lines shifted by half of road size (green,brown)
find their intersection (aqua dots) with shared edge of mesh with the current road control point (red dot)
obtain the average point (magenta dot) from the intersections and use that as road mesh vertex. In case one of the point is outside shared mesh ignore it. In case both intersections are outside shared edge find closest intersection with different edge.
As you can see this can lead to serious road thickness distortions in some cases (big differences between intersection points, or one of the intersection points is outside surface mesh edge).
If you need accurate road thickness then use the intersection of the casted lines as a road control point instead. To make it possible either use blending or disabling Depth while rendering or add this point to mesh of the surface by re-triangulating the surface mesh. Of coarse such action will also affect the road mesh and you need to iterate few times ...
Another way is use of blended texture for road (like sprites) and compute the texture coordinate for the control points. If the road is too thick then thin it by shifting the texture coordinate ... To make this work you need to select the most far intersection point instead of average ... Compute the real half size of the road and from that compute texture coordinate.
If you get rid of the limitation (for road mesh) that road vertex points are at surface mesh segments or vertexes then you can simply use the intersection of shifted lines alone. That will get rid of the thickness artifacts and simplify things a lot.

Terrain tile scale in case of tilted camera

I am working on 3d terrain visualization tool right now. Surface is logically covered with square tiles. This tiling could be visualized as follows:
Suppose I want to draw a picture on these tiles. The level of detail for a picture is required to be selected according to the current camera scale which is calculated for each tile individually.
In case of vertical camera (no tilt, i.e. camera looks perpendicularly on the ground) all tiles have the same scale which is camera focal length divided on camera height above the ground.
Following picture depicts the situation:
where red triangle is camera which has no tilt, BG is camera height above the ground and EG is focal length, then scale = AC/DF = BG/EG
But if camera has tilt (i.e. pitch angle isn't 0) then scale is changed from tile to tile (even from point to point).
So I wonder if there any kind method to produce reasonable scale for each tile in that case ?
There may be (there almost surely is) a more straightforward solution, but what you could do is regular world to screen coordinate conversion.
You just take the coordinates of bounding points of the tile and calculate to which pixels on the screen these will project (you of course get floating point precision). From this, I believe you can calculate the "scale" you are mentioning.
This is applicable to any point or set of points in the world space.
Here is tutorial on how to do this "by hand".
If you are rendering the tiles with OpenGL or DirectX, you can do this much easier.

How to map points in a 3D-plane into screen plane

I have given an assignment of to project a object in 3D space into a 2D plane using simple graphics in C. The question is that a cube is placed in fixed 3D space and there is camera which is placed in a position whose co-ordinates are x,y,z and the camera is looking at the origin i.e. 0,0,0. Now we have to project the cube vertex into the camera plane.
I am proceeding with the following steps
Step 1: I find the equation of the plane aX+bY+cZ+d=0 which is perpendicular to the line drawn from the camera position to the origin.
Step 2: I find the projection of each vertex of the cube to the plane which is obtained in the above step.
Now I want to map those vertex position which i got by projection in step 2 in the plane aX+bY+cZ+d=0 into my screen plane.
thanks,
I don't think that by letting the z co-ordinate equals zero will lead me to the actual mapping. So any help to figure out this.
You can do that in two simple steps:
Translate the cube's coordinates to the camera's system (using
rotation), such that the camera's own coordinates in that system are x=y=z=0 and the cube's translated z's are > 0.
Project the translated cube's coordinates onto a 2d plain by dividing its x's and y's by their respective z's (you may need to apply a constant scaling factor here for the coordinates to be reasonable for the screen, e.g. not too small and within +/-half the screen's height in pixels). This will create the perspective effect. You can now draw pixels using these divided x's and y's on the screen assuming x=y=0 is the center of it.
This is pretty much how it is done in 3d games. If you use cube vertex coordinates, then you get projections of its sides onto the screen. You may then solid-fill the resultant 2d shapes or texture-map them. But for that you'll have to first figure out which sides are not obscured by others (unless, of course, you use a technique called z-buffering). You don't need that for a simple wire-frame demo, though, just draw straight lines between the projected vertices.

Resources