How to use the bundle adjustment to optimize homography matrix - image-stitching

I did mulitiple cameras image stiching project rencently,I estimated the parameters of the cameras first(use autostitch),and calculated the homography matrixs through those parameters,but something wrong and the result is as follows.
enter image description here
enter image description here
There are more stretchings and more match errors on the left and right sides of the result.Someone told me that I should use bundle adjustment to optimize the homography matrixs,but I don't know what to do,please enlighten me,thanks.

The focal lengths of all images (except the left most) are not multiplied by a scaling factor.
Have a look at line 653 of this code.
cameras[i].K().convertTo(K, CV_32F);
float swa = (float)seam_work_aspect;
K(0,0) *= swa; K(0,2) *= swa;
K(1,1) *= swa; K(1,2) *= swa;
here K(0,0) and K(1,1) are focal lengths which must be scaled by a scaling factor i.e. swa.

Related

De-Skewing image

I am unable to figure out how does this deskew is working
def deskew(img):
m = cv2.moments(img)
if abs(m['mu02']) < 1e-2:
return img.copy()
skew = m['mu11']/m['mu02']
M = np.float32([[1, skew, -0.5*SZ*skew], [0, 1, 0]])
img = cv2.warpAffine(img,M,(SZ, SZ),flags=affine_flags)
return img
I know that the moment is a quantitative measure of the shape.
In image processing, the moments give information about the total
area or Intensity, the centroid of the shape and the orientation of the
shape.
Area or total Mass:-
The zeroth moment M(0,0) gives the total Mass or Area.
In image processing, the M(0,0) is the sum of all the pixels and if it is a binary image then sum of pixels gives the area.
Center of mass or Centroid:- When the first moment is divided by
the total mass then it gives the centroid.
Centroid is that point where the shape is perfectly balanced on the
tip of the pin.
M(0,1)/M(0,0) ,M(1,0)/M(0,0)
I think the image from the tutorial you got the code from gives the intuitive idea pretty well:
To deskew the image, they used skewness on x axis (mu02) relative to the variance mu11. They used shear matrix with inverse of image skewness, which is why in skew = m['mu11']/m['mu02'] mu02 and mu11 fraction is flipped. To deskew relative to the center of the top of the image, rather than the (0,0) point, they also used translation, which is where you get M[0, 2] = -0.5*SZ*skew

Cosmic ray removal in spectra

Python developers
I am working on spectroscopy in a university. My experimental 1-D data sometimes shows "cosmic ray", 3-pixel ultra-high intensity, which is not what I want to analyze. So I want to remove this kind of weird peaks.
Does anybody know how to fix this issue in Python 3?
Thanks in advance!!
A simple solution could be to use the algorithm proposed by Whitaker and Hayes, in which they use modified z scores on the derivative of the spectrum. This medium post explains how it works and its implementation in python https://towardsdatascience.com/removing-spikes-from-raman-spectra-8a9fdda0ac22 .
The idea is to calculate the modified z scores of the spectra derivatives and apply a threshold to detect the cosmic spikes. Afterwards, a fixer is applied to remove the spike points and replace it by the mean values of the surrounding pixels.
# definition of a function to calculate the modified z score.
def modified_z_score(intensity):
median_int = np.median(intensity)
mad_int = np.median([np.abs(intensity - median_int)])
modified_z_scores = 0.6745 * (intensity - median_int) / mad_int
return modified_z_scores
# Once the spike detection works, the spectrum can be fixed by calculating the average of the previous and the next point to the spike. y is the intensity values of a spectrum, m is the window which we will use to calculate the mean.
def fixer(y,m):
threshold = 7 # binarization threshold.
spikes = abs(np.array(modified_z_score(np.diff(y)))) > threshold
y_out = y.copy() # So we don't overwrite y
for i in np.arange(len(spikes)):
if spikes[i] != 0: # If we have an spike in position i
w = np.arange(i-m,i+1+m) # we select 2 m + 1 points around our spike
w2 = w[spikes[w] == 0] # From such interval, we choose the ones which are not spikes
y_out[i] = np.mean(y[w2]) # and we average the value
return y_out
The answer depends a on what your data looks like: If you have access to two-dimensional CCD readouts that the one-dimensional spectra were created from, then you can use the lacosmic module to get rid of the cosmic rays there. If you have only one-dimensional spectra, but multiple spectra from the same source, then a quick ad-hoc fix is to make a rough normalisation of the spectra and remove those pixels that are several times brighter than the corresponding pixels in the other spectra. If you have only one one-dimensional spectrum from each source, then a less reliable option is to remove all pixels that are much brighter than their neighbours. (Depending on the shape of your cosmics, you may even want to remove the nearest 5 pixels or something, to catch the wings of the cosmic ray peak as well).

What is the focal length and image plane distance from this raytracing formula

I have a 4x4 camera matrix comprised of right, up, forward and position vectors.
I raytrace the scene with the following code that I found in a tutorial but don't really entirely understand it:
for (int i = 0; i < m_imageSize.width; ++i)
{
for (int j = 0; j < m_imageSize.height; ++j)
{
u = (i + .5f) / (float)(m_imageSize.width - 1) - .5f;
v = (m_imageSize.height - 1 - j + .5f) / (float)(m_imageSize.height - 1) - .5f;
Ray ray(cameraPosition, normalize(u*cameraRight + v*cameraUp + 1 / tanf(m_verticalFovAngleRadian) *cameraForward));
I have a couple of questions:
How can I find the focal length of my raytracing camera?
Where is my image plane?
Why cameraForward needs to be multiplied with this 1/tanf(m_verticalFovAngleRadian)?
Focal length is a property of lens systems. The camera model that this code uses, however, is a pinhole camera, which does not use lenses at all. So, strictly speaking, the camera does not really have a focal length. The corresponding optical properties are instead expressed as the field of view (the angle that the camera can observe; usually the vertical one). You could calculate the focal length of a camera that has an equivalent field of view with the following formula (see Wikipedia):
FOV = 2 * arctan (x / 2f)
FOV diagonal field of view
x diagonal of film; by convention 24x36 mm -> x=43.266 mm
f focal length
There is no unique image plane. Any plane that is perpendicular to the view direction can be seen as the image plane. In fact, the projected images differ only in their scale.
For your last question, let's take a closer look at the code:
u = (i + .5f) / (float)(m_imageSize.width - 1) - .5f;
v = (m_imageSize.height - 1 - j + .5f) / (float)(m_imageSize.height - 1) - .5f;
These formulas calculate u/v coordinates between -0.5 and 0.5 for every pixel, assuming that the entire image fits in the box between -0.5 and 0.5.
u*cameraRight + v*cameraUp
... is just placing the x/y coordinates of the ray on the pixel.
... + 1 / tanf(m_verticalFovAngleRadian) *cameraForward
... is defining the depth component of the ray and ultimately the depth of the image plane you are using. Basically, this is making the ray steeper or shallower. Assume that you have a very small field of view, then 1/tan(fov) is a very large number. So, the image plane is very far away, which produces exactly this small field of view (when keeping the size of the image plane constant since you already set the x/y components). On the other hand, if the field of view is large, the image plane moves closer. Note that this notion of image plane is only conceptual. As I said, all other image planes are equally valid and would produce the same image. Another way (and maybe a more intuitive one) to specify the ray would be
u * tanf(m_verticalFovAngleRadian) * cameraRight
+ v * tanf(m_verticalFovAngleRadian) * cameraUp
+ 1 * cameraForward));
As you see, this is exactly the same ray (just scaled). The idea here is to set the conceptual image plane to a depth of 1 and scale the x/y components to adapt the size of the image plane. tan(fov) (with fov being the half field of view) is exactly the size of the half image plane at a depth of 1. Just draw a triangle to verify that. Note that this code is only able to produce square image planes. If you want to allow rectangular ones, you need to take into account the ratio of the side lengths.

Why does the projection of an image over 3d points show this distortion?

I have a question regarding the projection of an image over a set of 3D points. The image is given to me as a JPG, together with position and attitude information of the camera relative to a cartesian coordinate system (Xc,Yc,Zc and yaw, pitch, roll), as well as the horizontal and vertical field of view (in degrees).
Points are given using solely their 3d position in the same coordinate system (Xp,Yp,Zp).
In my coordinate system, Z is up. To project the image onto the points, I
compute the vector from camera to each point
Vector3 c2p = (Xp,Yp,Zp)-(Xc,Yc,Zc);
rotate c2p according to my camera's attitude (quaternion):
Vector3 c2pCamFrame = getCamQuaternion().conjugate().rotate(c2p);
compute azimuth and elevation from the camera's "center ray" to the point:
float azimuth = atan2(c2pCamFrame.x(),c2pCamFrame.y()));
float elevation = atan2(c2pCamFrame.z(),sqrt(pow(c2pCamFrame.x(),2)+pow(c2pCamFrame.y(),2)));
if azimuth and elevation are within the field of view, I assign the color of the corresponding pixel to the point.
This works almost perfectly, and the "almost" motivates my question. Let me show you:
I cannot figure out why the elevation of the projection is distorted. In the bottom right of the image, you can see that points outside the frustum (exceeding the elevation) actually become colored - and this distortion is null at an azimuth of 0 degrees and peaks at the left and right edges of the image, creating the pillow distortion.
Why does this distortion appear? I'd love to understand this problem both in geometrical as well as mathematical terms. Thank you!
The field of view angles are only valid on the principal axes. But you can do it the other way around. I.e. calculate the x/y bounds from the angles:
maxX = tan(horizontal_fov / 2)
maxY = tan(vertical_fov / 2)
And check
if(abs(c2pCamFrame.x() / c2pCamFrame.z()) <= maxX
&& abs(c2pCamFrame.y() / c2pCamFrame.z()) <= maxY)
Additionally, you might want to check if the points are in front of the camera:
... && c2pCamFrame.z() > 0
This assumes a left-handed coordinate system.

How can I translate an image with subpixel accuracy?

I have a system that requires moving an image on the screen. I am currently using a png and just placing it at the desired screen coordinates.
Because of a combination of the screen resolution and the required frame rate, some frames are identical because the image has not yet moved a full pixel. Unfortunately, the resolution of the screen is not negotiable.
I have a general understanding of how sub-pixel rendering works to smooth out edges but I have been unable to find a resource (if it exists) as to how I can use shading to translate an image by less than a single pixel.
Ideally, this would be usable with any image but if it was only possible with a simple shape like a circle or a ring, that would also be acceptable.
Sub-pixel interpolation is relatively simple. Typically you apply what amounts to an all-pass filter with a constant phase shift, where the phase shift corresponds to the required sub-pixel image shift. Depending on the required image quality you might use e.g. a 5 point Lanczos or other windowed sinc function and then apply this in one or both axes depending on whether you want an X shift or a Y shift or both.
E.g. for a 0.5 pixel shift the coefficients might be [ 0.06645, 0.18965, 0.27713, 0.27713, 0.18965 ]. (Note that the coefficients are normalised, i.e. their sum is equal to 1.0.)
To generate a horizontal shift you would convolve these coefficients with the pixels from x - 2 to x + 2, e.g.
const float kCoeffs[5] = { 0.06645f, 0.18965f, 0.27713f, 0.27713f, 0.18965f };
for (y = 0; y < height; ++y) // for each row
for (x = 2; x < width - 2; ++x) // for each col (apart from 2 pixel border)
{
float p = 0.0f; // convolve pixel with Lanczos coeffs
for (dx = -2; dx <= 2; ++dx)
p += in[y][x + dx] * kCoeffs[dx + 2];
out[y][x] = p; // store interpolated pixel
}
Conceptually, the operation is very simple. First you scale up the image (using any method of interpolation, as you like), then you translate the result, and finally you subsample down to the original image size.
The scale factor depends on the precision of sub-pixel translation you want to do. If you want to translate by 0.5 degrees, you need scale up the original image by a factor of 2 then you translate the resulting image by 1 pixel; if you want to translate by 0.25 degrees, you need to scale up by a factor of 4, and so on.
Note that this implementation is not efficient because when you scale up you end up calculating pixel values that you won't actually use because they're just dropped when you subsample back to the original image size. The implementation in Paul's answer is more efficient.

Resources