What is the unit of "focal length = 1.0" in findEssentialMat (OpenCV)? - python-3.x

Recently I tried to compute the essential matrix with the findEssentialMat function in OpenCV
(https://docs.opencv.org/master/d9/d0c/group__calib3d.html#ga13f7e34de8fa516a686a56af1196247f).
I noticed that the parameters default to focal length = 1.0 and pp = (0, 0), but what is the unit of these two values: m, mm, or pixels?
Furthermore, shouldn't the principal point be at the image center, i.e. (h/2, w/2) in pixel coordinates or (0.5, 0.5) in normalized coordinates?

Micka is already right in his comment. The camera model of OpenCV is illustrated, for example, here. As you can see, the image coordinate system is defined such that the x-axis points to the right and the y-axis points down. The origin is at the image center (where the optical axis z intersects the image plane), which is why the default value for the principal point is (0 px, 0 px). This is a standard coordinate system definition in computer vision, and MATLAB, for example, uses the same one.
The focal length defaults to 1.0 px because, even if your camera is not calibrated, you can still compute e.g. the essential matrix, and the math gets easier.
If you want to use your camera matrix for a 3D reconstruction, you should definitely calibrate your camera or insert appropriate values for the focal length (and principal point). In theory, you can convert the pixel system to a metric system if you know the sensor size.
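If you do have calibrated intrinsics in pixels, you can pass them instead of the defaults, either as focal/pp or as a full camera matrix. A minimal sketch with synthetic data (the intrinsics and point sets below are made-up values, not from the question):
import numpy as np
import cv2

# Assumed intrinsics in pixels (e.g. from a prior cv2.calibrateCamera run).
focal = 800.0
pp = (320.0, 240.0)
K = np.array([[focal, 0, pp[0]],
              [0, focal, pp[1]],
              [0, 0, 1]])

# Synthetic 3D points and a second view translated along x, just to have matches.
pts3d = np.random.uniform([-1, -1, 4], [1, 1, 8], (50, 3))
proj = lambda P: (P[:, :2] / P[:, 2:3]) * focal + pp
pts1 = proj(pts3d).astype(np.float32)
pts2 = proj(pts3d - [0.5, 0.0, 0.0]).astype(np.float32)  # camera moved +0.5 along x

# Variant 1: pass the calibrated focal length (pixels) and principal point.
E1, mask1 = cv2.findEssentialMat(pts1, pts2, focal=focal, pp=pp,
                                 method=cv2.RANSAC, prob=0.999, threshold=1.0)

# Variant 2: equivalent call with the full 3x3 camera matrix.
E2, mask2 = cv2.findEssentialMat(pts1, pts2, K,
                                 method=cv2.RANSAC, prob=0.999, threshold=1.0)
print(E1.round(3))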

Related

3D reconstruction for a rotating camera

I have images from a rotating camera and I'm trying this example from the MATLAB Computer Vision Toolbox (https://www.mathworks.com/matlabcentral/fileexchange/67383-stereo-triangulation).
I have the calibration and rotation matrix for each image; however, I always get 3D points equal to (0,0,0).
Note that the translation is null, which makes the fourth column of the projection matrix null.
You cannot reconstruct a 3D point from a rotating camera.
I suggest you try and draw an example. The idea of triangulation is to compute the intersection of two backprojection rays. These rays pass through the camera center and the point to be reconstructed. In your drawing, you'll find that the intersection becomes more and more accurate the larger the so-called stereo baseline is (that's the translation from one camera center to the other).
Now, for a rotating camera, the camera center remains the same and therefore, the two rays are identical. An intersection is not defined.
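As a quick illustration (a NumPy sketch with made-up intrinsics, not the MATLAB example itself): backproject a pixel through two cameras that share the same center but differ only by a rotation, and you get the same ray both times, so there is nothing to intersect.
import numpy as np

# Made-up intrinsics and a pure rotation (10 degrees about the y-axis) between the views.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
theta = np.deg2rad(10)
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
C = np.array([1.0, 2.0, 3.0])      # shared camera center: no translation between the views

# Backprojection ray of view 1 for a pixel u1: points of the form C + s * d1.
u1 = np.array([400.0, 250.0, 1.0])
d1 = np.linalg.inv(K) @ u1         # ray direction in view-1 coordinates

# The same scene ray seen in view 2 projects to pixel u2; backproject it and
# rotate the direction back into view-1 coordinates.
u2 = K @ (R @ d1)
d2 = R.T @ np.linalg.inv(K) @ (u2 / u2[2])

# Both rays start at the same center C and have parallel directions, so they
# coincide; their intersection is not defined.
print(np.cross(d1, d2))            # ~[0, 0, 0]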

How to rotate an image with nearest-neighbor and bilinear interpolation without using any OpenCV library?

I want to know a basic mathematics-based algorithm to rotate an image using nearest-neighbor and bilinear interpolation, without using the OpenCV library or imrotate.
The image must not be cropped after rotation; the full image must be displayed.
A rotation corresponds to an affine transformation of the coordinates and is easily described using matrices/vectors; it is not hard to find the formulas on the Web.
Now, the important thing to know is that rather than taking the pixels of the original image and mapping them to the transformed image, you must work backwards.
Scan every pixel of the transformed image and, by applying the inverse transform, find the corresponding coordinates in the original image. You need to do this using real (floating-point) coordinates.
Then
for the nearest-neighbor method, round the coordinates and just copy the source pixel value to the destination;
for the bilinear method, consider the four pixels around the obtained coordinates (truncate the coordinates to integers to find the top-left one). Finally, compute the destination pixel as a bilinear combination of the four source pixels, using the fractional parts of the coordinates as the interpolation weights.
Check the figures here: http://wtlab.iis.u-tokyo.ac.jp/wataru/lecture/rsgis/rsnote/cp9/cp9-7.htm
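A minimal NumPy sketch of this backward-mapping approach (my own illustration, not part of the original answer; the output canvas is enlarged so nothing is cropped):
import numpy as np

def rotate_image(img, angle_deg, method="bilinear"):
    h, w = img.shape[:2]
    a = np.deg2rad(angle_deg)
    cos_a, sin_a = np.cos(a), np.sin(a)

    # Output canvas large enough to hold the whole rotated image (no cropping).
    new_w = int(np.ceil(abs(w * cos_a) + abs(h * sin_a)))
    new_h = int(np.ceil(abs(w * sin_a) + abs(h * cos_a)))
    out = np.zeros((new_h, new_w) + img.shape[2:], dtype=img.dtype)

    cx_src, cy_src = (w - 1) / 2.0, (h - 1) / 2.0
    cx_dst, cy_dst = (new_w - 1) / 2.0, (new_h - 1) / 2.0

    # Scan every destination pixel and apply the inverse rotation to find its
    # (real-valued) source coordinates.
    ys, xs = np.mgrid[0:new_h, 0:new_w]
    xd, yd = xs - cx_dst, ys - cy_dst
    xsrc = cos_a * xd + sin_a * yd + cx_src
    ysrc = -sin_a * xd + cos_a * yd + cy_src

    if method == "nearest":
        # Round to the nearest source pixel and copy it.
        xi, yi = np.rint(xsrc).astype(int), np.rint(ysrc).astype(int)
        valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
        out[ys[valid], xs[valid]] = img[yi[valid], xi[valid]]
    else:
        # Bilinear: truncate to get the top-left neighbour, use the fractional
        # parts as interpolation weights over the four surrounding pixels.
        x0, y0 = np.floor(xsrc).astype(int), np.floor(ysrc).astype(int)
        valid = (x0 >= 0) & (x0 < w - 1) & (y0 >= 0) & (y0 < h - 1)
        x0v, y0v = x0[valid], y0[valid]
        fx, fy = xsrc[valid] - x0v, ysrc[valid] - y0v
        if img.ndim == 3:
            fx, fy = fx[:, None], fy[:, None]
        top = (1 - fx) * img[y0v, x0v] + fx * img[y0v, x0v + 1]
        bot = (1 - fx) * img[y0v + 1, x0v] + fx * img[y0v + 1, x0v + 1]
        out[ys[valid], xs[valid]] = ((1 - fy) * top + fy * bot).astype(img.dtype)
    return out

# Usage: rotated = rotate_image(img, 30, method="nearest")  # or "bilinear"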

Dynamic Camera Calibration OpenCV python3

I am planning on making a robotic arm. I have a camera mounted on the arm and I am using OpenCV with Python 3 to do the image processing.
I want the arm to detect a point on the ground and the servos to move accordingly. I have completed the detection and the calculation of world coordinates, as well as the inverse kinematics that is required.
The problem is that I have calibrated the camera for a certain height (20 cm), so the correct world coordinates are obtained at a height of 20 cm only. I want the camera to keep correcting the reading every 2 s as it moves toward the ground (downward).
Is there a way that I can do the calibration dynamically, and give dynamic coordinates to my arm? I don't know if this is the right approach. If there is another method to do this, please help.
I am assuming you are using the undistort function to first undistort the image and then using the rotation vector (rvec) and translation vector (tvec) along with the distortion coefficients to get the world coordinates. The correct coordinates are only obtained at that specific height because rvec and tvec describe the pose of the chessboard used for calibration.
A smart way to overcome this is to eliminate the rotation and translation vectors altogether.
The camera intrinsics remain the same at any height/rotation, so they can be reused directly. Also, rather than recalibrating every 2 seconds (which would consume too much CPU), use the method below to get the values.
Let's say (img_x, img_y) is the image coordinate that you need to transform to the world coordinate (world_x, world_y), and cameraMatrix is your camera matrix. For this method you need to know distance_cam, the perpendicular distance of your object from the camera.
In Python, with the cameraMatrix from your OpenCV calibration, use the following code:
import numpy as np
from numpy.linalg import inv

# cameraMatrix is the 3x3 intrinsic matrix from calibration; distance_cam is the
# known perpendicular distance of the object from the camera (defined elsewhere).
img_x, img_y = 20, 30                        # your image coordinates go here
pixel = np.array([[img_x], [img_y], [1.0]])  # homogeneous 3x1 pixel vector
world_coord = inv(cameraMatrix) @ pixel      # cameraMatrix^(-1) * coordinates (matrix product, not element-wise *)
world_coord = world_coord * distance_cam     # scale the normalized ray by the known distance
world_x = world_coord[0][0]
world_y = world_coord[1][0]                  # note: row 1, column 0 (not [1][1])
print(world_x, world_y)
Note that the units used for the camera matrix don't matter here: after multiplying by its inverse you have the unitless ratios x/z and y/z. So you can choose distance_cam in any unit and the result will be in that unit; if distance_cam is in mm, then world_x and world_y will also be in mm.

skimage project an image's 3D plane to fronto-parallel view

I'm working on implementing Ankush Gupta's synthetic data generation dataset (http://www.robots.ox.ac.uk/~vgg/data/scenetext/gupta16.pdf). In his work, he used a convolutional neural network to extract a point cloud from a 2-dimensional scenery image, segmented the point cloud to isolate different planes, used RANSAC to fit a 3D plane to each point-cloud segment, and then warped the pixels of the segment, given the 3D plane, to a fronto-parallel view.
I'm stuck on this last part: warping my extracted 3D plane to a fronto-parallel view. I have X, Y, and Z vectors as well as a normal vector. I'm thinking what I need to do is perform some type of perspective transform or rotation that would bring all the pixels on the plane to Z = 0 while X and Y remain the same. I could be wrong about this; it's been a long time since I've had any formal training in geometry or linear algebra.
It looks like skimage's Perspective Transform requires me to know the dimensions of the final segment coordinates in 2d space. It looks like AffineTransform requires me to know the rotation. All I have at this point is my X,Y,Z and normal vector and the suspicion that I may know my destination plane by just setting the Z axis to all zeros. I'm not sure if my assumption is correct but I need to be able to warp all the pixels in the segment of interest to fronto-parallel, fit a bounding box, place text inside of it, then warp the final segment back to the original perspective in 3d space.
Any help with how to think about this or implement it would be massively useful.
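For the specific idea of a rotation that brings the plane to Z ≈ 0, here is a minimal NumPy sketch with made-up values (it only aligns the plane's normal with the z-axis and re-centres the points; it is not the full pixel warp from the paper):
import numpy as np

# Stand-ins for the segment's 3D points and fitted plane normal (made-up here;
# in practice they come from the RANSAC plane fit).
n = np.array([0.3, -0.2, 0.93]); n /= np.linalg.norm(n)
pts = np.random.rand(100, 3)

# Rotation that takes the plane normal onto the +z axis (Rodrigues' formula).
z = np.array([0.0, 0.0, 1.0])
axis = np.cross(n, z)
s, c = np.linalg.norm(axis), float(n @ z)
if s < 1e-8:                          # normal already (anti)parallel to z
    R = np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
else:
    k = axis / s
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    R = np.eye(3) + s * K + (1 - c) * (K @ K)

rotated = pts @ R.T                    # the plane is now parallel to z = const
rotated[:, 2] -= rotated[:, 2].mean()  # shift so the plane sits near z = 0
# rotated[:, :2] are the fronto-parallel (x, y) coordinates of the segment.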

How to map points in a 3D plane onto the screen plane

I have been given an assignment to project an object in 3D space onto a 2D plane using simple graphics in C. A cube is placed at a fixed position in 3D space and there is a camera placed at coordinates (x, y, z), looking at the origin (0, 0, 0). Now we have to project the cube's vertices onto the camera plane.
I am proceeding with the following steps
Step 1: I find the equation of the plane aX+bY+cZ+d=0 which is perpendicular to the line drawn from the camera position to the origin.
Step 2: I find the projection of each vertex of the cube to the plane which is obtained in the above step.
Now I want to map those vertex positions, which I got in step 2 by projecting onto the plane aX+bY+cZ+d=0, into my screen plane.
thanks,
I don't think that simply setting the z coordinate to zero will give the actual mapping, so any help figuring this out would be appreciated.
You can do that in two simple steps:
Translate the cube's coordinates to the camera's system (using rotation), such that the camera's own coordinates in that system are x=y=z=0 and the cube's translated z's are > 0.
Project the translated cube's coordinates onto a 2D plane by dividing its x's and y's by their respective z's (you may need to apply a constant scaling factor here for the coordinates to be reasonable for the screen, e.g. not too small and within +/- half the screen's height in pixels). This will create the perspective effect. You can now draw pixels using these divided x's and y's on the screen, assuming x=y=0 is the center of it.
This is pretty much how it is done in 3d games. If you use cube vertex coordinates, then you get projections of its sides onto the screen. You may then solid-fill the resultant 2d shapes or texture-map them. But for that you'll have to first figure out which sides are not obscured by others (unless, of course, you use a technique called z-buffering). You don't need that for a simple wire-frame demo, though, just draw straight lines between the projected vertices.
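A minimal NumPy sketch of those two steps (the camera position and the scaling factor are made-up values; the assignment itself would be written in C, but the math is identical):
import numpy as np

# Unit cube centred at the origin (8 vertices).
cube = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)], float)

# Made-up camera position; the camera looks at the origin.
cam = np.array([4.0, 3.0, 5.0])

# Step 1: rotate/translate into the camera's system so the camera sits at
# x=y=z=0 and looks along +z (all translated z's are > 0 here).
forward = -cam / np.linalg.norm(cam)
right = np.cross([0.0, 1.0, 0.0], forward)
right /= np.linalg.norm(right)
up = np.cross(forward, right)
R = np.vstack([right, up, forward])     # world-to-camera rotation (rows are the axes)
cube_cam = (cube - cam) @ R.T

# Step 2: perspective divide (x/z, y/z) plus a constant scaling factor to get
# pixel-sized numbers; (0, 0) is the centre of the screen.
scale = 300.0                           # made-up, tune for your screen resolution
screen = scale * cube_cam[:, :2] / cube_cam[:, 2:3]
print(screen)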
