De-Skewing image - python-3.x

I am unable to figure out how this deskew function works:
def deskew(img):
    m = cv2.moments(img)
    if abs(m['mu02']) < 1e-2:
        return img.copy()
    skew = m['mu11']/m['mu02']
    M = np.float32([[1, skew, -0.5*SZ*skew], [0, 1, 0]])
    img = cv2.warpAffine(img,M,(SZ, SZ),flags=affine_flags)
    return img
I know that a moment is a quantitative measure of a shape.
In image processing, moments give information about the total area or intensity, the centroid of the shape, and the orientation of the shape.
Area or total mass: the zeroth moment M(0,0) gives the total mass or area.
In image processing, M(0,0) is the sum of all pixel values; for a binary image, that sum is the area.
Centroid (center of mass): dividing the first moments by the total mass gives the centroid, the point at which the shape would balance perfectly on the tip of a pin:
M(1,0)/M(0,0), M(0,1)/M(0,0)

I think the image from the tutorial you got the code from gives the intuitive idea pretty well:
To deskew the image, they use the ratio of mu11 (the covariance between x and y) to mu02 (the variance along y). That ratio, skew = m['mu11']/m['mu02'], estimates how far the shape drifts in x for each step in y, which is the slant of the shape. They then apply a shear matrix built from that value so that the slant is undone. To shear about the middle row of the image (y = SZ/2) rather than about the (0,0) point, they also add a translation, which is where M[0, 2] = -0.5*SZ*skew comes from.
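To make the mu11/mu02 ratio concrete, here is a minimal NumPy sketch that computes the two central moments by hand for a synthetic slanted stroke. The helper name `skew_from_moments` and the test image are made up for illustration; in the original code `cv2.moments` supplies the same quantities.

```python
import numpy as np

def skew_from_moments(img):
    """Estimate the slant (x drift per unit y) as mu11 / mu02."""
    img = np.asarray(img, dtype=np.float64)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()                       # total mass
    x_bar = (xs * img).sum() / m00        # centroid x
    y_bar = (ys * img).sum() / m00        # centroid y
    mu11 = ((xs - x_bar) * (ys - y_bar) * img).sum()  # x-y covariance (unnormalised)
    mu02 = ((ys - y_bar) ** 2 * img).sum()            # variance along y (unnormalised)
    return mu11 / mu02

# Synthetic stroke that moves one pixel right for every two rows down.
stroke = np.zeros((20, 20))
for y in range(20):
    stroke[y, 5 + y // 2] = 1.0

skew = skew_from_moments(stroke)  # close to the true slant of 0.5
```

Because the stroke is quantised to whole pixels, the estimate is close to, but not exactly, the slope used to draw it.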

Related

How to find the direction of triangles in an image using OpenCV

I am trying to find the direction of triangles in an image. Below is the image:
These triangles point upward, downward, leftward or rightward. This is not the actual image. I have already used Canny edge detection to find edges, then contours, and the dilated image is shown below.
My logic to find the direction:
Among the three corner coordinates, if I can identify the base coordinates of the triangle (the two corners that share the same abscissa or ordinate), I can make a base vector. The angle between the unit vectors and the base vector can then be used to identify the direction. But this method can only determine whether the triangle is up/down or left/right; it cannot differentiate between up and down, or between right and left. I tried to find the corners using cv2.goodFeaturesToTrack, but as far as I know it gives only the 3 strongest corner points in the entire image. So I am wondering if there is another way to find the direction of the triangles.
Here is my code in python to differentiate between the triangle/square and circle:
# blue masking: keep only the pixels whose value is 25
mask_blue = np.copy(img1)
row, columns = mask_blue.shape
for i in range(0, row):
    for j in range(0, columns):
        if mask_blue[i][j] == 25:
            mask_blue[i][j] = 255
        else:
            mask_blue[i][j] = 0
blue_edges = cv2.Canny(mask_blue, 10, 10)
kernel_blue = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2, 2))
dilated_blue = cv2.dilate(blue_edges, kernel_blue)
blue_contours, hierarchy = cv2.findContours(dilated_blue, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
for cnt in blue_contours:
    area = cv2.contourArea(cnt)
    perimeter = cv2.arcLength(cnt, True)
    M = cv2.moments(cnt)
    cx = int(M['m10']/M['m00'])
    cy = int(M['m01']/M['m00'])
    # perimeter^2 / area is about 12.6 for a circle and 16 for a square
    if 12 < (perimeter*perimeter)/area < 14.8:
        shape = "circle"
    elif 14.8 < (perimeter*perimeter)/area < 18:
        shape = "square"
    elif 18 < (perimeter*perimeter)/area and area > 200:
        shape = "triangle"
    print(shape)
    print(area)
    print((perimeter*perimeter)/area, "\n")
cv2.imshow('mask_blue', dilated_blue)
cv2.waitKey(0)
cv2.destroyAllWindows()
Source image can be found here: img1
Please help: how can I find the direction of the triangles?
Thank you.
Assuming that you only have four cases: [up, down, left, right], this code should work well for you.
The idea is simple:
1. Get the bounding rectangle for your contour. Use: box = cv2.boundingRect(contour_pnts)
2. Crop the image using the bounding rectangle.
3. Reduce the image vertically and horizontally using cv2.reduce with the REDUCE_SUM option. Now you have the sum of pixels along each axis. The axis with the largest sum determines whether the triangle base is vertical or horizontal.
4. To identify whether the triangle is pointing left/right or up/down, check whether the bounding rectangle center is before or after the max col/row.
The code (assumes you start from the cropped image):
ver_reduce = cv2.reduce(img, 0, cv2.REDUCE_SUM, dtype=cv2.CV_32F)  # column sums, shape (1, width)
hor_reduce = cv2.reduce(img, 1, cv2.REDUCE_SUM, dtype=cv2.CV_32F)  # row sums, shape (height, 1)
# For smoothing the reduced vector, could be removed
ver_reduce = cv2.GaussianBlur(ver_reduce, (3, 1), 0)
hor_reduce = cv2.GaussianBlur(hor_reduce, (1, 3), 0)
_, ver_max, _, ver_col = cv2.minMaxLoc(ver_reduce)
_, hor_max, _, hor_row = cv2.minMaxLoc(hor_reduce)
ver_col = ver_col[0]  # x of the column with the largest sum
hor_row = hor_row[1]  # y of the row with the largest sum
contour_pnts = cv2.findNonZero(img)  # in my code I do not have the original contour points
rect_center, size, angle = cv2.minAreaRect(contour_pnts)
print(rect_center)
if ver_max > hor_max:
    # the base is vertical, so the triangle points left or right
    if rect_center[0] > ver_col:
        print('right')
    else:
        print('left')
else:
    # the base is horizontal, so the triangle points up or down
    if rect_center[1] > hor_row:
        print('down')
    else:
        print('up')
Photos:
Well, Mark has mentioned a solution that may not be as efficient but is perhaps more accurate. I think this one should be equally efficient but perhaps less accurate. Since you already have code that finds triangles, try adding the following after you have found a triangle contour:
hull = cv2.convexHull(cnt) # convex hull of contour
hull = cv2.approxPolyDP(hull,0.1*cv2.arcLength(hull,True),True)
# You can double check if the contour is a triangle here
# by something like len(hull) == 3
You should get 3 hull points for a triangle; these are the 3 vertices of your triangle. Given that your triangles always face in one of only 4 directions, for a triangle facing left or right the apex (the vertex opposite the base) has a Y coordinate close to the Y coordinate of the centroid, and whether it points left or right depends on whether the apex X is less than or greater than the centroid X. Similarly, compare the hull and centroid X and Y for a triangle pointing up or down.
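The vertex-versus-centroid comparison can be sketched as follows. This is an illustration rather than the answerer's code: the function name `triangle_direction` is made up, it assumes image coordinates (y grows downward), and it uses the mean of the three vertices as the centroid, which for a triangle coincides with the area centroid.

```python
import numpy as np

def triangle_direction(vertices):
    """Classify a triangle as 'up', 'down', 'left' or 'right'.

    `vertices` holds the 3 hull points, e.g. the (3, 1, 2) array returned
    by cv2.approxPolyDP or a plain list of (x, y) pairs.
    """
    pts = np.asarray(vertices, dtype=np.float64).reshape(3, 2)
    centroid = pts.mean(axis=0)
    offsets = pts - centroid
    # The apex is the vertex that lines up with the centroid along one axis,
    # i.e. the one whose smaller offset component is closest to zero.
    apex = offsets[np.argmin(np.abs(offsets).min(axis=1))]
    dx, dy = apex
    if abs(dx) > abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"   # image y axis points down

print(triangle_direction([(0, 10), (10, 10), (5, 0)]))
print(triangle_direction([(0, 0), (0, 10), (8, 5)]))
```

The sketch only makes sense for the four axis-aligned orientations described in the question; a rotated triangle would need an angle-based test instead.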

Converting a series of depth maps and x, y, z, theta values into a 3D model

I have a quadrotor which flies around and knows its x, y, z positions and angular displacement along the x, y, z axis. It captures a constant stream of images which are converted into depth maps (we can estimate the distance between each pixel and the camera).
How can one program an algorithm which converts this information into a 3D model of the environment? That is, how can we generate a virtual 3D map from this information?
Example: below is a picture that illustrates what the quadrotor captures (top) and what the image is converted into to feed into a 3D mapping algorithm (bottom)
Let's suppose this image was taken from a camera with x, y, z coordinates (10, 5, 1) in some units and angular displacement of 90, 0, 0 degrees about the x, y, z axes. What I want to do is take a bunch of these photo-coordinate tuples and convert them into a single 3D map of the area.
Edit 1 on 7/30: One obvious solution is to use the angle of the quadrotor with respect to the x, y, and z axes, together with the distance map, to work out the Cartesian coordinates of any obstructions with trigonometry. I figure I could probably write an algorithm which uses this approach with a probabilistic method to make a crude 3D map, possibly vectorizing it to make it faster.
However, I would like to know if there is any fundamentally different and hopefully faster approach to solving this?
Simply convert your data to Cartesian coordinates and store the result. Because the topology of the input data (the spatial relation between data points) is known, you can map directly to a mesh/surface instead of to a point cloud (PCL), which would require triangulation, a convex hull, etc.
Your images suggest you have known topology (neighboring pixels are also neighbors in 3D), so you can construct the 3D mesh surface directly:
1. Align both the RGB and depth 2D maps.
In case this is not already done, see:
Align already captured rgb and depth images
2. Convert to a Cartesian coordinate system.
First we compute the position of each pixel in camera local space:
For each pixel (x,y) in the RGB map we look up the depth (its distance to the camera focal point) and compute the 3D position relative to the camera focal point. For that we can use triangle similarity:
x = camera_focus.x + (pixel.x-camera_focus.x)*depth(pixel.x,pixel.y)/focal_length
y = camera_focus.y + (pixel.y-camera_focus.y)*depth(pixel.x,pixel.y)/focal_length
z = camera_focus.z + depth(pixel.x,pixel.y)
where pixel is the 2D pixel position, depth(x,y) is the corresponding depth, and focal_length=znear is the fixed camera parameter (determining the FOV). camera_focus is the position of the camera focal point. Usually the camera focal point is in the middle of the camera image, at distance znear from the image (projection plane).
As this is taken from a moving device, you need to convert the result into some global coordinate system (using your camera position and orientation in space). The best tool for that is:
Understanding 4x4 homogenous transform matrices
3. Construct the mesh.
As your input data are already spatially sorted, we can construct a QUAD grid directly. Simply take each pixel and its neighbors and form QUADS. So if a 2D position (x,y) in your data is converted into 3D (x,y,z) with the approach described in the previous bullet, we can write it as a function that returns the 3D position:
(x,y,z) = 3D(x,y)
Then we can form QUADS like this:
QUAD( 3D(x,y),3D(x+1,y),3D(x+1,y+1),3D(x,y+1) )
We can use for loops:
for (x=0;x<xs-1;x++)
    for (y=0;y<ys-1;y++)
        QUAD( 3D(x,y),3D(x+1,y),3D(x+1,y+1),3D(x,y+1) )
where xs,ys is the resolution of your maps.
In case you do not know the camera properties, you can set focal_length to any reasonable constant (resulting in fish-eye effects and/or scaled output) or infer it from the input data, as in:
Transformation of 3D objects related to vanishing points and horizon line
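The three steps can be sketched in NumPy as below. This is an illustration under stated assumptions, not a definitive implementation: the function names are made up, the principal point is assumed to sit at the image center, the camera focal point is taken as the origin of camera space, and `camera_to_world` stands for whatever 4x4 homogeneous matrix you build from the quadrotor pose.

```python
import numpy as np

def depth_to_camera_points(depth, focal_length):
    """Back-project an (h, w) depth map to (h, w, 3) camera-space points."""
    h, w = depth.shape
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0       # assumed principal point
    px, py = np.meshgrid(np.arange(w), np.arange(h))
    x = (px - cx) * depth / focal_length        # triangle similarity
    y = (py - cy) * depth / focal_length
    return np.stack([x, y, depth], axis=-1)

def to_world(points, camera_to_world):
    """Apply a 4x4 homogeneous transform to an (h, w, 3) point grid."""
    h, w, _ = points.shape
    homo = np.concatenate([points.reshape(-1, 3), np.ones((h * w, 1))], axis=1)
    return (homo @ camera_to_world.T)[:, :3].reshape(h, w, 3)

def quad_indices(h, w):
    """One quad per 2x2 pixel neighborhood, as indices into the flattened grid."""
    idx = np.arange(h * w).reshape(h, w)
    quads = np.stack([idx[:-1, :-1], idx[:-1, 1:], idx[1:, 1:], idx[1:, :-1]], axis=-1)
    return quads.reshape(-1, 4)

depth = np.full((3, 4), 2.0)                    # toy depth map: flat wall 2 units away
pose = np.eye(4)
pose[:3, 3] = [10.0, 5.0, 1.0]                  # camera position, no rotation
world = to_world(depth_to_camera_points(depth, focal_length=1.0), pose)
quads = quad_indices(*depth.shape)
```

The vertex order in each quad follows the QUAD( 3D(x,y), 3D(x+1,y), 3D(x+1,y+1), 3D(x,y+1) ) pattern above. A real pipeline would also skip quads that span large depth discontinuities, which this sketch does not attempt.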

What is the fastest way to find the center of an irregular convex polygon?

I'm interested in a fast way to calculate the rotation-independent center of a simple, convex, (non-intersecting) 2D polygon.
The example below (on the left) shows the mean center (sum of all points divided by the total), and the desired result on the right.
Some options I've already considered.
bound-box center (depends on rotation, and ignores points based on their relation to the axis).
Straight skeleton - too slow to calculate.
I've found a way which works reasonably well (weight the points by the edge lengths), but this means a square-root call for every edge, which I'd like to avoid. (I will post it as an answer, even though I'm not entirely satisfied with it.)
Note, I'm aware of this question's similarity with: What is the fastest way to find the "visual" center of an irregularly shaped polygon?
However, having to handle concave polygons increases the complexity of the problem significantly.
The points of the polygon can be weighted by their edge lengths, which compensates for uneven point distribution.
This works for concave polygons too, but in that case the center point isn't guaranteed to be inside the polygon.
Pseudo-code:
def poly_center(poly):
    sum_center = (0, 0)
    sum_weight = 0.0
    for point in poly:
        weight = ((point - point.next).length +
                  (point - point.prev).length)
        sum_center += point * weight
        sum_weight += weight
    return sum_center / sum_weight
Note, we can pre-calculate all edge lengths to halve the number of length calculations, or reuse the previous edge length for half+1 length calculations. This is just written as an example to show the logic.
Including this answer for completeness since it's the best method I've found so far.
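A runnable variant of the pseudo-code, assuming the polygon is a plain list of (x, y) tuples in order, might look like this. It computes each edge length once, as suggested in the note above; the name `poly_center_weighted` is made up for the sketch.

```python
import math

def poly_center_weighted(poly):
    """Center of a polygon with vertices weighted by adjacent edge lengths."""
    n = len(poly)
    # edge_len[i] is the length of the edge from vertex i to vertex i + 1.
    edge_len = [math.hypot(poly[(i + 1) % n][0] - poly[i][0],
                           poly[(i + 1) % n][1] - poly[i][1]) for i in range(n)]
    sum_x = sum_y = sum_w = 0.0
    for i, (x, y) in enumerate(poly):
        weight = edge_len[i - 1] + edge_len[i]   # previous edge + next edge
        sum_x += x * weight
        sum_y += y * weight
        sum_w += weight
    return sum_x / sum_w, sum_y / sum_w

# A 4x4 square with extra points crowded along the bottom edge.
square = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 4), (0, 4)]
center = poly_center_weighted(square)
```

For this square the plain vertex mean would be dragged toward the crowded bottom edge, while the weighted version lands on the geometric center, because the weighting is equivalent to taking the centroid of the perimeter.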
There is not really a better way than accumulating the coordinates weighted by the edge lengths, which indeed takes N square roots.
If you accept an approximation, it is possible to skip some of the vertices by curve simplification, as follows:
- decide on a deviation tolerance;
- start from vertex 0 and jump to vertex M (say M=N/2);
- check if the deviation along the polyline from 0 to M exceeds the tolerance (for this, compute the height of the triangle formed by the vertices 0, M/2, M);
- if the deviation is exceeded, repeat recursively with 0, M/4, M/2 and M/2, 3M/4, M;
- if the deviation is not exceeded, assume that the shape is straight between 0 and M;
- continue until the end of the polygon.
Where the points are dense (like the left edge in your example), you should get some speedup.
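One possible reading of these steps is sketched below. The function names are made up, the polygon is assumed to be a list of at least three (x, y) tuples, and, as in the description, only the middle vertex of each run is tested, so a run that bulges away from its midpoint can be missed. The triangle height is compared in squared form, so the test itself needs no square root.

```python
def deviates(poly, first, mid, last, tol):
    """True if vertex `mid` is farther than `tol` from the chord first-last."""
    n = len(poly)
    ax, ay = poly[first % n]
    bx, by = poly[last % n]
    px, py = poly[mid % n]
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)  # twice the triangle area
    base_sq = (bx - ax) ** 2 + (by - ay) ** 2
    if base_sq == 0:
        return (px - ax) ** 2 + (py - ay) ** 2 > tol * tol
    return cross * cross > tol * tol * base_sq   # height^2 > tol^2

def kept_between(poly, first, last, tol):
    """Indices strictly between `first` and `last` that must be kept."""
    if last - first < 2:
        return []
    mid = (first + last) // 2
    if not deviates(poly, first, mid, last, tol):
        return []                                # treat first..last as straight
    return kept_between(poly, first, mid, tol) + [mid] + kept_between(poly, mid, last, tol)

def simplify_polygon(poly, tol):
    """Return a subset of the vertices, skipping runs that look straight."""
    n = len(poly)
    m = n // 2
    keep = [0] + kept_between(poly, 0, m, tol) + [m] + kept_between(poly, m, n, tol)
    return [poly[i] for i in keep]

square = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 4), (0, 4)]
reduced = simplify_polygon(square, tol=0.1)
```

The reduced polygon can then be fed to the edge-length weighting, which pays for square roots only on the vertices that survive.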
I think it's easiest to do something with the centers of mass of the Delaunay triangulation of the polygon points, i.e.
import numpy as np
from scipy import spatial

def _centroid_poly(poly):
    # poly is an (N, 2) NumPy array of vertices
    T = spatial.Delaunay(poly).simplices
    n = T.shape[0]
    W = np.zeros(n)
    C = 0
    for m in range(n):
        sp = poly[T[m, :], :]
        W[m] = spatial.ConvexHull(sp).volume  # for 2D input, .volume is the triangle area
        C += W[m] * np.mean(sp, axis=0)       # area-weighted triangle centroid
    return C / np.sum(W)
This works well for me!

Why does the projection of an image over 3d points show this distortion?

I have a question regarding the projection of an image over a set of 3D points. The image is given to me as a JPG, together with position and attitude information of the camera relative to a cartesian coordinate system (Xc,Yc,Zc and yaw, pitch, roll), as well as the horizontal and vertical field of view (in degrees).
Points are given using solely their 3d position in the same coordinate system (Xp,Yp,Zp).
In my coordinate system, Z is up. To project the image onto the points, I:
1. compute the vector from the camera to each point:
Vector3 c2p = (Xp,Yp,Zp)-(Xc,Yc,Zc);
2. rotate c2p according to my camera's attitude (quaternion):
Vector3 c2pCamFrame = getCamQuaternion().conjugate().rotate(c2p);
3. compute azimuth and elevation from the camera's "center ray" to the point:
float azimuth = atan2(c2pCamFrame.x(),c2pCamFrame.y());
float elevation = atan2(c2pCamFrame.z(),sqrt(pow(c2pCamFrame.x(),2)+pow(c2pCamFrame.y(),2)));
4. if azimuth and elevation are within the field of view, assign the color of the corresponding pixel to the point.
This works almost perfectly, and the "almost" motivates my question. Let me show you:
I cannot figure out why the elevation of the projection is distorted. In the bottom right of the image, you can see that points outside the frustum (exceeding the elevation) actually become colored - and this distortion is null at an azimuth of 0 degrees and peaks at the left and right edges of the image, creating the pillow distortion.
Why does this distortion appear? I'd love to understand this problem both in geometrical as well as mathematical terms. Thank you!
The field of view angles are only valid on the principal axes. But you can do it the other way around. I.e. calculate the x/y bounds from the angles:
maxX = tan(horizontal_fov / 2)
maxY = tan(vertical_fov / 2)
And check
if(abs(c2pCamFrame.x() / c2pCamFrame.z()) <= maxX
&& abs(c2pCamFrame.y() / c2pCamFrame.z()) <= maxY)
Additionally, you might want to check if the points are in front of the camera:
... && c2pCamFrame.z() > 0
This assumes a left-handed coordinate system.
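A small Python sketch of the tangent-based check, next to an elevation-angle check in the spirit of the question, shows where the two disagree. It follows this answer's convention that z is the viewing direction (the question's code uses y as forward instead); both function names are made up for the illustration.

```python
import math

def in_field_of_view(x, y, z, horizontal_fov_deg, vertical_fov_deg):
    """Frustum test on camera-frame coordinates, with z as the view axis."""
    if z <= 0:
        return False                      # behind the camera
    max_x = math.tan(math.radians(horizontal_fov_deg) / 2)
    max_y = math.tan(math.radians(vertical_fov_deg) / 2)
    return abs(x / z) <= max_x and abs(y / z) <= max_y

def elevation_deg(x, y, z):
    """Elevation angle measured against the full horizontal distance."""
    return math.degrees(math.atan2(y, math.hypot(x, z)))

# A point off to the side and slightly above a 90 x 90 degree frustum.
point = (0.9, 1.1, 1.0)
inside_frustum = in_field_of_view(*point, 90, 90)   # y / z = 1.1 exceeds tan(45 deg)
angle = elevation_deg(*point)                        # still below 45 degrees
```

Away from the principal axis the horizontal distance grows, so the elevation angle shrinks and the angle test accepts points that the frustum test rejects, which matches the distortion growing toward the left and right edges.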

How can I translate an image with subpixel accuracy?

I have a system that requires moving an image on the screen. I am currently using a png and just placing it at the desired screen coordinates.
Because of a combination of the screen resolution and the required frame rate, some frames are identical because the image has not yet moved a full pixel. Unfortunately, the resolution of the screen is not negotiable.
I have a general understanding of how sub-pixel rendering works to smooth out edges but I have been unable to find a resource (if it exists) as to how I can use shading to translate an image by less than a single pixel.
Ideally, this would be usable with any image but if it was only possible with a simple shape like a circle or a ring, that would also be acceptable.
Sub-pixel interpolation is relatively simple. Typically you apply what amounts to an all-pass filter with a constant phase shift, where the phase shift corresponds to the required sub-pixel image shift. Depending on the required image quality you might use e.g. a 5 point Lanczos or other windowed sinc function and then apply this in one or both axes depending on whether you want an X shift or a Y shift or both.
E.g. for a 0.5 pixel shift the coefficients might be [ 0.06645, 0.18965, 0.27713, 0.27713, 0.18965 ]. (Note that the coefficients are normalised, i.e. their sum is equal to 1.0.)
To generate a horizontal shift you would convolve these coefficients with the pixels from x - 2 to x + 2, e.g.
const float kCoeffs[5] = { 0.06645f, 0.18965f, 0.27713f, 0.27713f, 0.18965f };
for (int y = 0; y < height; ++y)            // for each row
    for (int x = 2; x < width - 2; ++x)     // for each col (apart from 2 pixel border)
    {
        float p = 0.0f;                     // convolve pixel with Lanczos coeffs
        for (int dx = -2; dx <= 2; ++dx)
            p += in[y][x + dx] * kCoeffs[dx + 2];
        out[y][x] = p;                      // store interpolated pixel
    }
Conceptually, the operation is very simple. First you scale up the image (using any method of interpolation, as you like), then you translate the result, and finally you subsample down to the original image size.
The scale factor depends on the precision of the sub-pixel translation you want. If you want to translate by 0.5 pixels, you need to scale up the original image by a factor of 2 and then translate the resulting image by 1 pixel; if you want to translate by 0.25 pixels, you need to scale up by a factor of 4, and so on.
Note that this implementation is not efficient because when you scale up you end up calculating pixel values that you won't actually use because they're just dropped when you subsample back to the original image size. The implementation in Paul's answer is more efficient.
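As a rough illustration of the scale-up, translate, subsample idea, the NumPy sketch below shifts a grayscale image horizontally. It is only a sketch: the function name is made up, the upscale is plain pixel replication, the downscale is a block average, and pixels shifted in from outside the image are filled with zeros.

```python
import numpy as np

def subpixel_shift_x(img, shift, factor):
    """Shift a 2D image right by `shift` pixels; shift * factor must be an integer."""
    steps = int(round(shift * factor))
    up = np.repeat(img.astype(np.float64), factor, axis=1)   # scale up
    moved = np.zeros_like(up)
    if steps > 0:                                            # translate
        moved[:, steps:] = up[:, :-steps]
    elif steps < 0:
        moved[:, :steps] = up[:, -steps:]
    else:
        moved = up
    h, w = img.shape
    return moved.reshape(h, w, factor).mean(axis=2)          # subsample

row = np.array([[0.0, 0.0, 4.0, 0.0, 0.0]])
half = subpixel_shift_x(row, 0.5, 2)
print(half)   # [[0. 0. 2. 2. 0.]]
```

With replication and block averaging this reduces to linear interpolation between neighboring pixels; a better upscaling filter gives a sharper result at a higher cost.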
