How to find camera rotation matrix when axis flips or changes - geometry

Let us assume that we have camera extrinsics [R|t] (camera-to-world) and a permutation matrix P that flips or swaps axes; depending on the variant, its determinant may be +1 or -1. For example,
P = np.array([
    [1, 0, 0],
    [0, 0, 1],
    [0, -1, 0]
])
maps a point (x, y, z) to (x, z, -y). If my guess is correct, this affects the rotation matrix in the new coordinate system, resulting in a new rotation matrix R' that determines the orientation of the transformed camera. Is the new extrinsic just P[R|t], so that I get R' = PR? How do I find this?

Flipping an odd number of axes is not a rotation! But that does not matter: you can use the matrix inverse to compute this. If I see it correctly, you have:
R' = R*P
where R' is your new (flipped) matrix, R is the original matrix, and you want to know P, so:
R' = R*P                       // multiply both sides by Inverse(R) from the left
Inverse(R)*R' = Inverse(R)*R*P
Inverse(R)*R' = P
In case your R is orthonormal, its inverse is the same as its transpose, which is much faster to compute. If R is merely orthogonal, you can still use the transpose instead of the inverse, but you would need to correct the basis vector lengths afterwards.
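A minimal NumPy sketch of the derivation above, assuming R is orthonormal (the matrices here are placeholders for your actual extrinsics):
import numpy as np

# Placeholder rotation and the axis permutation from the question above.
R = np.eye(3)
P = np.array([[1, 0, 0],
              [0, 0, 1],
              [0, -1, 0]])

# Following the convention used in the answer, R' = R * P.
R_prime = R @ P

# Recover P as Inverse(R) * R'; for an orthonormal R the transpose is the inverse.
P_recovered = R.T @ R_prime
print(np.allclose(P_recovered, P))  # True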

Related

How to use "cv2.estimateAffine3D" correctly, to align two 3d coordinate systems?

I have to align two 3D coordinate systems, and I have the coordinates of 32 points in the real world for each coordinate system.
I already know that I have to calculate the translation and the rotation, and I (think) I also understand what the calculation is. Because some smart people have already solved this problem, I just want to use OpenCV for it, and I asked the aiChatBot how to do this calculation with and without OpenCV.
I think using "cv2.estimateAffine3D" looks right.
Maybe someone can just give me an example of how to use it correctly?
At the moment, the example code I have from the aiChatBot looks like this:
import cv2
import numpy as np
# Define the coordinates of the three points in the first coordinate system
p1 = np.array([[1, 2, 3]], dtype=np.float32)
p2 = np.array([[4, 5, 6]], dtype=np.float32)
p3 = np.array([[7, 8, 9]], dtype=np.float32)
# Define the coordinates of the three points in the second coordinate system
q1 = np.array([[1, 2, 3]], dtype=np.float32)
q2 = np.array([[5, 6, 7]], dtype=np.float32)
q3 = np.array([[9, 10, 11]], dtype=np.float32)
# Calculate the affine transformation that aligns the two coordinate systems
T = cv2.estimateAffine3D(p1, p2, p3)
# Reshape the point arrays to 2D or 3D
q1_reshaped = q1.reshape(1, -1, 3)
q2_reshaped = q2.reshape(1, -1, 3)
q3_reshaped = q3.reshape(1, -1, 3)
# Apply the transformation to the second coordinate system
q1_aligned = cv2.transform(q1_reshaped, T)
q2_aligned = cv2.transform(q2_reshaped, T)
q3_aligned = cv2.transform(q3_reshaped, T)
# Print the transformed points
print(q1_aligned)
print(q2_aligned)
print(q3_aligned)
The error I get is:
error: OpenCV(4.6.0) :-1: error: (-5:Bad argument) in function 'transform'
Overload resolution failed:
m is not a numerical tuple
Expected Ptr<cv::UMat> for argument 'm
The answer from the aiChatBot to solve this is:
"It looks like the error you are experiencing is caused by passing an incorrect argument to the cv2.transform function. The cv2.transform function expects a 2D or 3D point array as the first argument and a 3x3 or 3x4 transformation matrix as the second argument, but you are passing a 1D point array and a 3x3 or 3x4 transformation matrix as arguments.
To fix this error, you need to pass a 2D or 3D point array as the first argument to the cv2.transform function. You can do this by reshaping the 1D point arrays q1, q2, and q3 using the np.reshape function, as follows:"
But there we go in a loop, because this "solution" is already in the code.
So I cannot really find any more information about this. Can anybody help me with it?
cheers!
I want to align two coordinate systems.
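Since the question is left open here, a minimal hedged sketch of how cv2.estimateAffine3D is typically called: it expects two matched N×3 point sets (N ≥ 4, not coplanar) and returns a (retval, matrix, inliers) tuple, and the resulting 3×4 matrix is then passed to cv2.transform. The points below are made up; with the data from the question you would pass all 32 matched points per system:
import cv2
import numpy as np

# Matched points in the two coordinate systems (illustrative, non-coplanar values).
src = np.array([[0, 0, 0],
                [1, 0, 0],
                [0, 1, 0],
                [0, 0, 1]], dtype=np.float32)
dst = src + np.array([1, 2, 3], dtype=np.float32)  # same points translated by (1, 2, 3)

# estimateAffine3D returns a status flag, a 3x4 affine matrix and an inlier mask.
retval, M, inliers = cv2.estimateAffine3D(src, dst)

# cv2.transform expects the points shaped (1, N, 3) and the 3x4 matrix as 'm'.
src_aligned = cv2.transform(src.reshape(1, -1, 3), M)
print(M)
print(src_aligned)  # should be close to dst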

Understanding an Einsum usage for graph convolution

I am reading the code for the spatial-temporal graph convolution operation here:
https://github.com/yysijie/st-gcn/blob/master/net/utils/tgcn.py and I'm having some difficulty understanding what is happening with an einsum operation. In particular
For x, a tensor of shape (N, kernel_size, kc//kernel_size, t, v), where
kernel_size is typically 3, and let's say kc = 64*kernel_size, t is the number of frames, say 64, and v the number of vertices, say 25. N is the batch size.
Now, for a tensor A of shape (3, 25, 25), where each of the three (25, 25) slices is a filtering op on the graph vertices, an einsum is computed as:
x = torch.einsum('nkctv,kvw->nctw', (x, A))
I'm not sure how to interpret this expression. What I think it's saying is that for each batch element and for each channel c_i out of the 64, we sum the three matrices obtained by matrix multiplication of the (64, 25) feature map at that channel with A[i]. Do I have this correct? The expression is a bit of a mouthful, and notation-wise there is a somewhat strange usage of kc as one variable name, but then a decomposition into k as the kernel size and c as the number of channels (192//3 = 64) in the einsum expression. Any insights appreciated.
It helps when you look closely at the notation:
nkctv for the left side
kvw for the right side
nctw being the result
Missing from the result are:
k
v
These dimensions are summed away into a single value and squeezed, leaving us with the resulting shape.
Something along the lines of the following (shapes expanded with added 1s are broadcast and multiplied per element, then summed):
left: (n, k, c, t, v, 1)
right: (1, k, 1, 1, v, w)
Now it goes (l, r for left and right):
out = torch.mul(l, r)
out = torch.sum(out, dim=(1, 4))
squeeze any singleton dimensions
It is pretty hard to grasp at first; thinking of Einstein summation as shapes being "mixed" with each other like this helps, at least for me.
Y = torch.einsum('nkctv,kvw->nctw', (x, A)) means:
[image: einsum interpretation on the graph]
For better understanding, I have replaced the x on the left-hand side with Y.
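To check that reading of the expression, here is a small sketch (with made-up, smaller shapes) comparing the einsum against the broadcast-multiply-then-sum interpretation described above:
import torch

# Illustrative shapes: batch, kernel size, channels, frames, vertices.
N, k, c, t, v, w = 2, 3, 8, 10, 25, 25
x = torch.randn(N, k, c, t, v)
A = torch.randn(k, v, w)

y_einsum = torch.einsum('nkctv,kvw->nctw', x, A)

# Expand to broadcastable shapes, multiply element-wise, then sum over k and v.
l = x.unsqueeze(-1)           # (N, k, c, t, v, 1)
r = A.view(1, k, 1, 1, v, w)  # (1, k, 1, 1, v, w)
y_manual = (l * r).sum(dim=(1, 4))

print(torch.allclose(y_einsum, y_manual, atol=1e-5))  # True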

Get 3D point on directional light ray in Blender Python given Euler angles

I am trying to get a 3D point on the Sun light's ray in Blender, so that I can use it to specify the directional light's target position in Three.js. I have read How to convert Euler angles to directional vector? but I could not get it to work. Please let me know how to get it.
I think it is a good question. In Blender, since the light's unit direction vector starts along the z-axis (the light points down when initialized), I think you can use the last column of the total rotation matrix. The function for calculating the total rotation matrix is given here. Here is a modification of that function that returns a point at unit distance in the direction of the light source:
import numpy as np

def getCosinesFromEuler(roll, pitch, yaw):
    Rz_yaw = np.array([
        [np.cos(yaw), -np.sin(yaw), 0],
        [np.sin(yaw),  np.cos(yaw), 0],
        [          0,            0, 1]])
    Ry_pitch = np.array([
        [ np.cos(pitch), 0, np.sin(pitch)],
        [             0, 1,             0],
        [-np.sin(pitch), 0, np.cos(pitch)]])
    Rx_roll = np.array([
        [1,            0,             0],
        [0, np.cos(roll), -np.sin(roll)],
        [0, np.sin(roll),  np.cos(roll)]])
    # total rotation matrix, applied to the initial +z direction
    rotMat = Rz_yaw @ Ry_pitch @ Rx_roll
    return rotMat @ np.array([0, 0, 1])
And it can be called like this:
# assuming ob is the light object
roll = ob.rotation_euler.x
pitch = ob.rotation_euler.y
yaw = ob.rotation_euler.z
x, y, z = getCosinesFromEuler(roll, pitch, yaw)
And this point (x,y,z) needs to be subtracted from the position of the light object to get a point at unit distance on the ray.
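For example, a minimal sketch of that last step (assuming ob and the x, y, z computed above; mathutils.Vector is Blender's built-in vector type):
from mathutils import Vector

# Point at unit distance along the light ray, usable as the
# directional-light target position in Three.js.
target = ob.location - Vector((x, y, z))
print(tuple(target))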

Rotating 2D grayscale image with transformation matrix

I am new to image processing, so I am really confused about the coordinate system used for images. I have a sample image and I am trying to rotate it 45° clockwise. My transformation matrix is T = [[cos45, sin45], [-sin45, cos45]].
Here is the code:
import numpy as np
from matplotlib import pyplot as plt
from skimage import io

image = io.imread('sample_image')
img_transformed = np.zeros(image.shape, dtype=np.uint8)
trans_matrix = np.array([[np.cos(45), np.sin(45)], [-np.sin(45), np.cos(45)]])
for i, row in enumerate(image):
    for j, col in enumerate(row):
        pixel_data = image[i, j]  # get the value of the pixel at the corresponding location
        input_coord = np.array([i, j])  # this will be my [x, y] vector
        result = trans_matrix @ input_coord
        i_out, j_out = result  # store the resulting coordinate location
        # make sure the i and j values remain within the index range
        if (0 < int(i_out) < image.shape[0]) and (0 < int(j_out) < image.shape[1]):
            img_transformed[int(i_out)][int(j_out)] = pixel_data
plt.imshow(img_transformed, cmap='gray')
The image comes out distorted and doesn't seem right. I know that in pixel coordinates the origin is at the top-left corner (row, column). Is the rotation happening with respect to the origin at the top-left corner? Is there a way to shift the origin to the center, or to any other given point?
Thank you all!
Yes, as you suspect, the rotation is happening with respect to the top-left corner, which has coordinates (0, 0). (Also: the NumPy trigonometric functions use radians rather than degrees, so you need to convert your angle.) To compute a rotation with respect to the center, you do a little hack: you compute the transformation for moving the image so that it is centered on (0, 0), you rotate it, and then you move the result back. You need to combine these transformations into a single matrix rather than applying them to the image one after the other, because otherwise you would lose everything that lands in negative coordinates along the way.
It's much, much easier to do this using homogeneous coordinates, which add an extra "dummy" dimension to your coordinates. Here's what your code would look like in homogeneous coordinates:
import numpy as np
from matplotlib import pyplot as plt
from skimage import io

image = io.imread('sample_image')
img_transformed = np.zeros(image.shape, dtype=np.uint8)

c, s = np.cos(np.radians(45)), np.sin(np.radians(45))
rot_matrix = np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])
x, y = np.array(image.shape) // 2
# move center to (0, 0)
translate1 = np.array([[1, 0, -x], [0, 1, -y], [0, 0, 1]])
# move center back to (x, y)
translate2 = np.array([[1, 0, x], [0, 1, y], [0, 0, 1]])
# compose all three transformations together
trans_matrix = translate2 @ rot_matrix @ translate1

for i, row in enumerate(image):
    for j, col in enumerate(row):
        pixel_data = image[i, j]  # get the value of the pixel at the corresponding location
        input_coord = np.array([i, j, 1])  # homogeneous [x, y, 1] coordinate
        result = trans_matrix @ input_coord
        i_out, j_out, _ = result  # store the resulting coordinate location
        # make sure the i and j values remain within the index range
        if (0 < int(i_out) < image.shape[0]) and (0 < int(j_out) < image.shape[1]):
            img_transformed[int(i_out)][int(j_out)] = pixel_data
plt.imshow(img_transformed, cmap='gray')
The above should work OK, but you will probably get some black spots due to aliasing. What can happen is that no coordinate (i, j) from the input lands exactly on a given output pixel, so that pixel never gets updated. Instead, what you need to do is iterate over the pixels of the output image, then use the inverse transform to find which pixel in the input image maps closest to that output pixel. Something like:
inverse_tform = np.linalg.inv(trans_matrix)
for i, j in np.ndindex(img_transformed.shape):
    i_orig, j_orig, _ = np.round(inverse_tform @ [i, j, 1]).astype(int)
    if i_orig in range(image.shape[0]) and j_orig in range(image.shape[1]):
        img_transformed[i, j] = image[i_orig, j_orig]
Hope this helps!
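As a side note beyond the original answer, library routines already implement this inverse-mapping-plus-interpolation approach; a minimal sketch using scipy.ndimage (assuming the same 'sample_image' file):
from scipy import ndimage
from skimage import io

image = io.imread('sample_image')
# rotate() maps output pixels back through the inverse transform and interpolates;
# the angle is in degrees and the rotation is about the image centre.
rotated = ndimage.rotate(image, 45, reshape=False)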

Apply an affine transform to a bounding rectangle

I am working on a pedestrian tracking algorithm using Python3 & OpenCV.
We can use SIFT keypoints as an identifier of a pedestrian silhouette on a frame and then perform brute force matching between two sets of SIFT keypoints (i.e. between one frame and the next one) to find the pedestrian in the next frame.
To visualize this on the sequence of frames, we can draw a bounding rectangle delimiting the pedestrian. This is what it looks like:
The main problem is about characterizing the motion of the pedestrian using the keypoints. The idea here is to find an affine transform (that is, translation in x & y, rotation & scaling) using the coordinates of the keypoints on 2 successive frames. Ideally, this affine transform somehow corresponds to the motion of the pedestrian. To track this pedestrian, we would then just have to apply the same affine transform on the bounding rectangle coordinates.
That last part doesn't work well. The rectangle consistently shrinks over several frames until it inevitably disappears, or it drifts away from the pedestrian, as you can see below or in the previous image:
To be specific, we characterize the bounding rectangle with 2 extreme points:
There are some built-in cv2 functions that can apply an affine transform to an image, like cv2.warpAffine(), but I want to apply it only to the bounding rectangle coordinates (i.e. 2 points, or 1 point + width & height).
To find the affine transform between the 2 sets of keypoints, I've written my own function (I can post the code if it helps), but I've observed similar results when using cv2.getAffineTransform(), for instance.
Do you know how to properly apply an affine transform to this bounding rectangle?
EDIT: here's some explanation & code for better context:
The pedestrian detection is done with the pre-trained SVM classifier available in OpenCV: hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector()) & hog.detectMultiScale().
Once a first pedestrian is detected, the SVM returns the coordinates of the associated bounding rectangle (xA, yA, w, h) (we stop using the SVM after the 1st detection as it is quite slow, and we are focusing on one pedestrian for now).
We select the corresponding region of the current frame with image[yA: yA+h, xA: xA+w] and search for SURF keypoints within it with surf.detectAndCompute().
This returns the keypoints & their associated descriptors (an array of 64 characteristics for each keypoint).
We perform brute-force matching, based on the L2 norm between the descriptors and the distance in pixels between the keypoints, to construct pairs of keypoints between the current frame & the previous one. The code for this function is pretty long, but it should be similar to cv2.BFMatcher(cv2.NORM_L2, crossCheck=True) (a short sketch of this matcher call is given after this explanation).
Once we have the matched pairs of keypoints, we can use them to find the affine transform with this function:
previousKpts = previousKpts[:5]  # select the 5 best matches
currentKpts = currentKpts[:5]

# build A matrix of shape [2 * Nb of keypoints, 4]
A = np.ndarray((2 * len(previousKpts), 4))
for idx, keypoint in enumerate(previousKpts):
    # Keypoint.pt = (x-coord, y-coord)
    A[2 * idx, :] = [keypoint.pt[0], -keypoint.pt[1], 1, 0]
    A[2 * idx + 1, :] = [keypoint.pt[1], keypoint.pt[0], 0, 1]

# build b matrix of shape [2 * Nb of keypoints, 1]
b = np.ndarray((2 * len(previousKpts), 1))
for idx, keypoint in enumerate(currentKpts):
    b[2 * idx, :] = keypoint.pt[0]
    b[2 * idx + 1, :] = keypoint.pt[1]

# convert the numpy.ndarrays to matrices:
A = np.matrix(A)
b = np.matrix(b)

# solution of the form x = [x1, x2, x3, x4]' = ((A' * A)^-1) * A' * b
x = np.linalg.inv(A.T * A) * A.T * b

theta = math.atan2(x[1, 0], x[0, 0])  # outputs rotation angle in [-pi, pi]
alpha = math.sqrt(x[0, 0] ** 2 + x[1, 0] ** 2)  # scaling parameter
bx = x[2, 0]  # translation along x-axis
by = x[3, 0]  # translation along y-axis

return theta, alpha, bx, by
We then just have to apply the same affine transform to the corner points of the bounding rectangle:
# define the 4 bounding points using xA, yA
xB = xA + w
yB = yA + h
rect_pts = np.array([[[xA, yA]], [[xB, yA]], [[xA, yB]], [[xB, yB]]], dtype=np.float32)
# warp the affine transform into a full perspective transform
affine_warp = np.array([[alpha * np.cos(theta), -alpha * np.sin(theta), bx],
                        [alpha * np.sin(theta),  alpha * np.cos(theta), by],
                        [0, 0, 1]], dtype=np.float32)
# apply the affine transform
rect_pts = cv2.perspectiveTransform(rect_pts, affine_warp)
xA = rect_pts[0, 0, 0]
yA = rect_pts[0, 0, 1]
xB = rect_pts[3, 0, 0]
yB = rect_pts[3, 0, 1]
return xA, yA, xB, yB
Save the updated rectangle coordinates (xA, yA, xB, yB) and all current keypoints & descriptors, then iterate over the next frame: select image[yA: yB, xA: xB] using the (xA, yA, xB, yB) we previously saved, get SURF keypoints, etc.
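For reference, here is a minimal sketch of the brute-force matching step mentioned in the explanation above; the keypoint/descriptor variables (prev_kpts, prev_desc, curr_kpts, curr_desc) are assumptions standing in for the output of surf.detectAndCompute() on the previous and current frames:
import cv2

# Brute-force matcher on the L2 norm with cross-checking, as mentioned above.
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = bf.match(prev_desc, curr_desc)

# Sort by descriptor distance and keep the corresponding keypoint pairs.
matches = sorted(matches, key=lambda m: m.distance)
previousKpts = [prev_kpts[m.queryIdx] for m in matches]
currentKpts = [curr_kpts[m.trainIdx] for m in matches]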
As Micka suggested, cv2.perspectiveTransform() is an easy way to accomplish this. You'll just need to turn your affine warp into a full perspective transform (homography) by adding a third row at the bottom with the values [0, 0, 1]. For example, let's put a box with w, h = 100, 200 at the point (10, 20) and then use an affine transformation to shift the points so that the box is moved to (0, 0) (i.e. shift 10 pixels to the left and 20 pixels up):
>>> xA, yA, w, h = (10, 20, 100, 200)
>>> xB, yB = xA + w, yA + h
>>> rect_pts = np.array([[[xA, yA]], [[xB, yA]], [[xA, yB]], [[xB, yB]]], dtype=np.float32)
>>> affine_warp = np.array([[1, 0, -10], [0, 1, -20], [0, 0, 1]], dtype=np.float32)
>>> cv2.perspectiveTransform(rect_pts, affine_warp)
array([[[ 0., 0.]],
[[ 100., 0.]],
[[ 0., 200.]],
[[ 100., 200.]]], dtype=float32)
So that works perfectly, as expected. You could also simply transform the points yourself with matrix multiplication (points are row vectors here, hence the transpose):
>>> rect_pts.dot(affine_warp[:2, :2].T) + affine_warp[:2, 2]
array([[[ 0., 0.]],
[[ 100., 0.]],
[[ 0., 200.]],
[[ 100., 200.]]], dtype=float32)
