I'm trying to write Python code to calculate the distance between two 3D points. The points are listed as follows:
Timestamp, X, Y, Z, Distance
2613, 4.35715, 5.302030, -0.447308
2614, 7.88429, -8.401940, -0.484432
2615, 4.08796, 2.213850, -0.515359
2616, 4.35715, 5.302030, -0.447308
2617, 7.88429, -8.401940, -0.484432
I know the formula, but I'm not sure how to select the columns to run the 3D distance formula on.
This is essentially the same question as How can the Euclidean distance be calculated with NumPy?
You can use numpy.linalg.norm or scipy.linalg.norm.
E.g.
scipy.linalg.norm(p1 - p2), where p1 and p2 are arrays holding the (X, Y, Z) values of two rows (not the timestamps).
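A minimal sketch with NumPy, using the rows with timestamps 2613 and 2614 from the data above:
import numpy as np

# (X, Y, Z) of the rows with timestamps 2613 and 2614
p1 = np.array([4.35715, 5.302030, -0.447308])
p2 = np.array([7.88429, -8.401940, -0.484432])

print(np.linalg.norm(p1 - p2))  # Euclidean distance between the two 3D points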
Try this code and see if it gives you some ideas to start:
# distance between 2 points in 3D
from math import pow, sqrt

def calculate_dist(point1, point2):
    x, y, z = point1
    a, b, c = point2
    distance = sqrt(pow(a - x, 2) +
                    pow(b - y, 2) +
                    pow(c - z, 2))
    return distance

point1 = (2, 3, 4)  # tuple
point2 = (1, 5, 7)
print(calculate_dist(point1, point2))
# See the sketch below for applying this to your data rows.
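For instance, a rough sketch of applying calculate_dist to the rows listed in the question, assuming they are stored as (timestamp, x, y, z) tuples and you want the distance between consecutive timestamps:
rows = [
    (2613, 4.35715, 5.302030, -0.447308),
    (2614, 7.88429, -8.401940, -0.484432),
    (2615, 4.08796, 2.213850, -0.515359),
]

for (t1, *p1), (t2, *p2) in zip(rows, rows[1:]):
    print(t1, '->', t2, calculate_dist(p1, p2))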
Does PyTorch grid_sample work for this particular case?
I have an image of size [B, 100, 200] that I want to map into a smaller [B, 50, 60] space. I have the 1-to-1 pixel mappings stored in a [B, 100, 200, 2] tensor, where, I guess, most of the values are 0.
Is this actually possible?
The answer is yes, it is possible! But not with grid_sample alone.
You can check the documentation here:
grid_sample: https://pytorch.org/docs/stable/nn.functional.html#grid-sample
interpolate: https://pytorch.org/docs/stable/nn.functional.html#interpolate
You will need the following:
import torch.nn.functional as F

warped_img = F.grid_sample(input, grid)
small_img = F.interpolate(warped_img, size=(50, 60))
input is the image you have, [B, 100, 200];
grid is the 1-to-1 pixel mapping, [B, 100, 200, 2]; the values should be normalized to the range [-1, 1], with (-1, -1) being the upper-left corner. Notice that if most of the values are zeros, most pixels will be mapped to the center of the image. That is because grid takes the end location of each pixel, not the displacement.
warped_img = F.grid_sample(input, grid)
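As a side note, if your mapping is stored in absolute pixel coordinates rather than normalized ones, here is a small sketch of the normalization (pixel_map, H and W are assumed names for your mapping and the input height/width; channel 0 holds x in [0, W-1], channel 1 holds y in [0, H-1]):
# pixel_map: [B, H, W, 2] float tensor of absolute (x, y) end locations
grid = pixel_map.clone()
grid[..., 0] = 2.0 * grid[..., 0] / (W - 1) - 1.0  # x -> [-1, 1]
grid[..., 1] = 2.0 * grid[..., 1] / (H - 1) - 1.0  # y -> [-1, 1]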
Now, to downsize the image you have to use F.interpolate
small_img = F.interpolate(warped_img, size=(50, 60))
Notice that you can also downsize first (not sure how that will impact the end-result). That is because the grid is normalized!
import torch.nn.functional as F

# grid is [B, H, W, 2]; permute it so interpolate resizes the spatial dims, then permute back
warped_img = F.grid_sample(F.interpolate(input, size=(50, 60)),
                           F.interpolate(grid.permute(0, 3, 1, 2), size=(50, 60)).permute(0, 2, 3, 1))
Notice that grid is the end location of each pixel, not the displacement. If you have a flow field (just the displacement of each pixel), you can turn it into a grid with the following:
def warp(img, flow, size):
    B, C, H, W = img.size()
    # mesh grid of absolute pixel coordinates
    grid_x = torch.arange(W, device=img.device)
    grid_y = torch.arange(H, device=img.device)
    yy, xx = torch.meshgrid(grid_y, grid_x)
    grid = torch.cat((xx.unsqueeze(0), yy.unsqueeze(0)), dim=0)  # [2, H, W]: channel 0 = x, channel 1 = y
    # add the displacement to get the end location of each pixel
    vgrid = grid + flow.clamp(min=-1000., max=1000.)
    # scale grid to [-1, 1] (this normalization matches align_corners=True)
    vgrid[:, 0, :, :] = 2.0 * vgrid[:, 0, :, :].clone() / max(W - 1, 1) - 1.0
    vgrid[:, 1, :, :] = 2.0 * vgrid[:, 1, :, :].clone() / max(H - 1, 1) - 1.0
    vgrid = vgrid.permute(0, 2, 3, 1)  # [B, H, W, 2] as grid_sample expects
    warped_img = F.grid_sample(img, vgrid, align_corners=True)
    small_img = F.interpolate(warped_img, size=size)
    return small_img
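A quick usage sketch of warp, with assumed shapes and zero displacement (so the output should simply be a downsampled version of img):
import torch
import torch.nn.functional as F

img = torch.rand(1, 1, 100, 200)    # [B, C, H, W]
flow = torch.zeros(1, 2, 100, 200)  # zero displacement for every pixel
small_img = warp(img, flow, size=(50, 60))
print(small_img.shape)              # torch.Size([1, 1, 50, 60])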
Recently I have been working on cloud motion tracking using images. In many video-based examples, a quiver plot is shown that moves according to the tracked object.
The quiver documentation takes four main arguments ([X, Y], U, V), where X and Y are the starting points and U and V the directions. On the other hand, optical flow (based on this example) returns p1 (the displacements) with shape (m, n, l) for an image of shape (200, 200). My confusion is about how to order the parameters, because goodFeaturesToTrack also returns the same kind of array as p1.
How can I join both components to plot a quiver of the cloud motion?
I found a pretty good solution. I explain my full example here using the Hamburg taxi sequence:
Download the taxi sequence.
$ curl -O ftp://ftp.ira.uka.de/pub/vid-text/image_sequences/taxi/taxi.zip
$ unzip -q taxi.zip
Get all images and pick two random frames
from pathlib import Path
import numpy as np
import cv2 as cv
from PIL import Image
import matplotlib.pyplot as plt
taxis_fnames = sorted(Path('taxi').iterdir())
rand_idx = np.random.randint(len(taxis_fnames) - 4)  # leave room for the second frame
taxi1 = Image.open(taxis_fnames[rand_idx])
taxi2 = Image.open(taxis_fnames[rand_idx + 4])
Compute the optical flow
flow = cv.calcOpticalFlowFarneback(np.array(taxi1),
np.array(taxi2),
None, 0.5, 3, 15, 3, 5, 1.2, 0)
Plot the quiver
step = 3
plt.quiver(np.arange(0, flow.shape[1], step), np.arange(flow.shape[0], -1, -step),
flow[::step, ::step, 0], flow[::step, ::step, 1])
The step is there to downsample the number of optical flow vectors picked. The x positions go from 0 to the image width, while the y positions are reversed (otherwise the optical flow would be upside down), from the image height down to 0. On some occasions, you will have to change the step so that the height and width are divisible by it.
The resulting image:
Here is a general method for plotting a quiver field easily and accurately.
def plot_quiver(ax, flow, spacing, margin=0, **kwargs):
    """Plots less dense quiver field.

    Args:
        ax: Matplotlib axis
        flow: motion vectors
        spacing: space (px) between each arrow in grid
        margin: width (px) of enclosing region without arrows
        kwargs: quiver kwargs (default: angles="xy", scale_units="xy")
    """
    h, w, *_ = flow.shape
    nx = int((w - 2 * margin) / spacing)
    ny = int((h - 2 * margin) / spacing)
    x = np.linspace(margin, w - margin - 1, nx, dtype=np.int64)
    y = np.linspace(margin, h - margin - 1, ny, dtype=np.int64)
    flow = flow[np.ix_(y, x)]
    u = flow[:, :, 0]
    v = flow[:, :, 1]
    kwargs = {**dict(angles="xy", scale_units="xy"), **kwargs}
    ax.quiver(x, y, u, v, **kwargs)
    ax.set_ylim(sorted(ax.get_ylim(), reverse=True))
    ax.set_aspect("equal")
Example usage:
flow = cv2.calcOpticalFlowFarneback(
frame_1, frame_2, None, 0.5, 3, 15, 3, 5, 1.2, 0
)
fig, ax = plt.subplots()
plot_quiver(ax, flow, spacing=10, scale=1, color="#ff44ff")
I would like to get some tips on how to properly visualize/plot two 2-dimensional arrays of the same shape,
say ground_arr and water_arr. ground_arr represents the elevation of some surface, and water_arr represents the height/depth of water on top of that surface. The total elevation is then of course ground_arr + water_arr.
For now I'm using plt.imshow(water_arr, cmap=...) to see only the water, and plt.imshow(water_arr + ground_arr) to see the total elevation, but I would like to merge both in the same plot, to get something like a map.
Any tips?
Suppose you have 2D arrays of height values for the terrain and for the water level, and that the water level is set to zero at the places without water.
Just set the water level to NaN where you want the water image to be transparent.
import numpy as np
import matplotlib.pyplot as plt
# Generate test data, terrain is some sine on the distance to the center
terrain_x, terrain_y = np.meshgrid(np.linspace(-15, 15, 1000), np.linspace(-15, 15, 1000))
r = np.sqrt(terrain_x * terrain_x + terrain_y * terrain_y)
terrain_z = 5 + 5 * np.sin(r)
# test data for water has some height where r is between 3 and 4 pi, zero everywhere else
water_z = np.where(3 * np.pi < r, 3 - terrain_z, 0)
water_z = np.where(4 * np.pi > r, water_z, 0)
extent = [-15, 15, -15, 15]
fig, (ax1, ax2, ax3) = plt.subplots(ncols=3)
ax1.imshow(terrain_z, cmap="YlOrBr", extent=extent)
ax1.set_title('Terrain')
ax2.imshow(water_z, cmap="Blues", extent=extent)
ax2.set_title('Water')
ax3.imshow(terrain_z, cmap="YlOrBr", extent=extent)
water_z = np.where(water_z > 0, water_z, np.nan)
ax3.imshow(water_z, cmap="Blues", extent=extent)
ax3.set_title('Combined')
plt.show()
I am working on a pedestrian tracking algorithm using Python3 & OpenCV.
We can use SIFT keypoints as an identifier of a pedestrian silhouette on a frame and then perform brute force matching between two sets of SIFT keypoints (i.e. between one frame and the next one) to find the pedestrian in the next frame.
To visualize this on the sequence of frames, we can draw a bounding rectangle delimiting the pedestrian. This is what it looks like:
The main problem is about characterizing the motion of the pedestrian using the keypoints. The idea here is to find an affine transform (that is translation in x & y, rotation & scaling) using the coordinates of the keypoints on 2 successives frames. Ideally, this affine transform somehow corresponds to the motion of the pedestrian. To track this pedestrian, we would then just have to apply the same affine transform on the bounding rectangle coordinates.
That last part doesn’t work well. The rectangle consistently shrinks over several frames until it inevitably disappears, or it drifts away from the pedestrian, as you can see below or on the previous image:
To be specific, we characterize the bounding rectangle with 2 extreme points:
There are some built-in cv2 functions that can apply an affine transform to an image, like cv2.warpAffine(), but I want to apply it only to the bounding rectangle coordinates (i.e. 2 points, or 1 point + width & height).
To find the affine transform between the 2 sets of keypoints, I’ve written my own function (I can post the code if it helps), but I’ve observed similar results when using cv2.getAffineTransform() for instance.
Do you know how to properly apply an affine transform to this bounding rectangle?
EDIT: here’s some explanation & code for better context:
The pedestrian detection is done with the pre-trained SVM classifier available in OpenCV: hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector()) & hog.detectMultiScale()
Once a first pedestrian is detected, the SVM returns the coordinates of the associated bounding rectangle (xA, yA, w, h) (we stop using the SVM after the 1st detection as it is quite slow, and we are focusing on one pedestrian for now)
We select the corresponding region of the current frame with image[yA: yA+h, xA: xA+w] and search for SURF keypoints within it with surf.detectAndCompute()
This returns the keypoints & their associated descriptors (an array of 64 characteristics for each keypoint)
We perform brute-force matching based on the L2 norm between the descriptors and the distance in pixels between the keypoints, to build pairs of keypoints between the current frame and the previous one. The code for this function is pretty long, but it should be similar to cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
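For context, a rough sketch of those steps (not my actual code; first_frame and next_frame are placeholder grayscale frames, and SURF requires the opencv-contrib build):
import cv2

# 1) HOG + SVM pedestrian detection on the first frame
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
rects, _ = hog.detectMultiScale(first_frame, winStride=(8, 8))
xA, yA, w, h = rects[0]  # keep the first detected pedestrian

# 2) SURF keypoints & descriptors inside the bounding rectangle
surf = cv2.xfeatures2d.SURF_create()
kpts_prev, desc_prev = surf.detectAndCompute(first_frame[yA:yA + h, xA:xA + w], None)
kpts_curr, desc_curr = surf.detectAndCompute(next_frame[yA:yA + h, xA:xA + w], None)

# 3) Brute-force matching on the 64-dimensional descriptors (L2 norm, cross-check)
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = sorted(bf.match(desc_prev, desc_curr), key=lambda m: m.distance)
previousKpts = [kpts_prev[m.queryIdx] for m in matches]
currentKpts = [kpts_curr[m.trainIdx] for m in matches]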
Once we have the matched pairs of keypoints, we can use them to find the affine transform with this function:
previousKpts = previousKpts[:5]  # keep the 5 best matches
currentKpts = currentKpts[:5]
# build A matrix of shape [2 * Nb of keypoints, 4]
A = np.ndarray((2 * len(previousKpts), 4))
for idx, keypoint in enumerate(previousKpts):
    # Keypoint.pt = (x-coord, y-coord)
    A[2 * idx, :] = [keypoint.pt[0], -keypoint.pt[1], 1, 0]
    A[2 * idx + 1, :] = [keypoint.pt[1], keypoint.pt[0], 0, 1]
# build b matrix of shape [2 * Nb of keypoints, 1]
b = np.ndarray((2 * len(previousKpts), 1))
for idx, keypoint in enumerate(currentKpts):
    b[2 * idx, :] = keypoint.pt[0]
    b[2 * idx + 1, :] = keypoint.pt[1]
# convert the numpy.ndarrays to matrix:
A = np.matrix(A)
b = np.matrix(b)
# solution of the form x = [x1, x2, x3, x4]' = ((A' * A)^-1) * A' * b
x = np.linalg.inv(A.T * A) * A.T * b
theta = math.atan2(x[1, 0], x[0, 0])  # rotation angle in [-pi, pi]
alpha = math.sqrt(x[0, 0] ** 2 + x[1, 0] ** 2)  # scaling parameter
bx = x[2, 0]  # translation along x-axis
by = x[3, 0]  # translation along y-axis
return theta, alpha, bx, by
We then just have to apply the same affine transform to the corner points of the bounding rectangle:
# define the 4 bounding points using xA, yA
xB = xA + w
yB = yA + h
rect_pts = np.array([[[xA, yA]], [[xB, yA]], [[xA, yB]], [[xB, yB]]], dtype=np.float32)
# turn the affine transform into a full perspective transform
affine_warp = np.array([[alpha * np.cos(theta), -alpha * np.sin(theta), bx],
                        [alpha * np.sin(theta), alpha * np.cos(theta), by],
                        [0, 0, 1]], dtype=np.float32)
# apply the affine transform
rect_pts = cv2.perspectiveTransform(rect_pts, affine_warp)
xA = rect_pts[0, 0, 0]
yA = rect_pts[0, 0, 1]
xB = rect_pts[3, 0, 0]
yB = rect_pts[3, 0, 1]
return xA, yA, xB, yB
Save the updated rectangle coordinates (xA, yA, xB, yB), all current keypoints & descriptors, and iterate over the next frame: select image[yA: yB, xA: xB] using the (xA, yA, xB, yB) we previously saved, get SURF keypoints, etc.
As Micka suggested, cv2.perspectiveTransform() is an easy way to accomplish this. You'll just need to turn your affine warp into a full perspective transform (homography) by adding a third row at the bottom with the values [0, 0, 1]. For example, let's put a box with w, h = 100, 200 at the point (10, 20) and then use an affine transformation to shift the points so that the box is moved to (0, 0) (i.e. shift 10 pixels to the left and 20 pixels up):
>>> xA, yA, w, h = (10, 20, 100, 200)
>>> xB, yB = xA + w, yA + h
>>> rect_pts = np.array([[[xA, yA]], [[xB, yA]], [[xA, yB]], [[xB, yB]]], dtype=np.float32)
>>> affine_warp = np.array([[1, 0, -10], [0, 1, -20], [0, 0, 1]], dtype=np.float32)
>>> cv2.perspectiveTransform(rect_pts, affine_warp)
array([[[ 0., 0.]],
[[ 100., 0.]],
[[ 0., 200.]],
[[ 100., 200.]]], dtype=float32)
So that works perfectly as expected. You could also just simply transform the points yourself with matrix multiplication:
>>> rect_pts.dot(affine_warp[:2, :2].T) + affine_warp[:2, 2]
array([[[ 0., 0.]],
[[ 100., 0.]],
[[ 0., 200.]],
[[ 100., 200.]]], dtype=float32)
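For completeness, cv2.transform applies a 2x3 affine matrix directly to a point array, so the extra homography row isn't strictly needed:
>>> cv2.transform(rect_pts, affine_warp[:2])
which returns the same four transformed points as above.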