Point cloud object recognition with FPFH + svmlight

I am doing a project on stair recognition that combines the Point Cloud Library (PCL) and svmlight.
I am now able to segment the point cloud, cluster it, and extract features using the Fast Point Feature Histogram (FPFH).
The problem is: how can I transform the FPFH results (a feature cloud of FPFHSignature33, i.e. a 33-bin histogram for each point) into a feature vector that can be used as input for svmlight?
I know I need to label each sample "+1" or "-1" for positive or negative, but what should the feature vector and its values be for each training example?
I'm totally confused.
Any suggestion or hint is appreciated! Thanks!

You probably want a global feature descriptor like VFH or CVFH. Here are some useful resources:
3D Descriptors for Object and Category Recognition: a Comparative Evaluation.
Tutorial: Point Cloud Library: Three-Dimensional Object Recognition and 6 DOF Pose Estimation.
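If you do move to one global descriptor per segmented cluster (VFH is a single 308-bin histogram, for example), turning it into svmlight input is just a matter of writing one line per training sample in the <label> <index>:<value> format, with 1-based feature indices. A minimal sketch, assuming you have already exported each cluster's descriptor to a NumPy array (the file names and the +1/-1 labels below are placeholders):

import numpy as np
from sklearn.datasets import dump_svmlight_file

# Hypothetical data: one global descriptor per segmented cluster,
# labelled +1 for stair clusters and -1 for everything else.
X = np.vstack([np.load(p) for p in ["stair_0.npy", "stair_1.npy", "other_0.npy"]])
y = np.array([+1, +1, -1])

# svmlight format: "<label> <index>:<value> ...", with indices starting at 1.
dump_svmlight_file(X, y, "train.dat", zero_based=False)

If you stay with per-point FPFH instead, you first need to pool the 33-bin histograms over each cluster (e.g. average them) so that every training sample has a fixed-length vector.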

Related

What happens in the 'Input Transform' in the PointNet architecture?

I am reading papers to understand how raw point cloud data is converted into a machine-learning-readable dataset, and I have a question about the paper PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In the PointNet architecture (shown in the picture below), after the raw point cloud enters the network it first goes through an 'Input transform' stage, where some processing happens in a T-Net (transformation network) followed by a matrix multiplication. My question is: what happens in the 'Input transform' and 'Feature transform' parts? What is the input data and what is the output data?
You can find the research paper via its DOI: 10.1109/CVPR.2017.16
I'm trying to work this out as well, so consider this an incomplete answer. I think the input transform, with its 3x3 matrix, spatially transforms (via some affine transformation) the n x 3 input (3-dimensional: think x, y, z). Intuitively you can think of it this way: if you give it a rotated object (say an upside-down chair), it de-rotates the object to a canonical representation (an upright chair). It's a 3x3 matrix so that the dimensionality of the input is preserved. That way the input becomes invariant to changes of pose (perspective). After this, the shared MLPs (essentially 1x1 convolutions) increase the number of features from n x 3 to n x 64, and the next T-Net does the same as the first one: it moves the higher-dimensional feature space into a canonical form. As to exactly how the box works, I'm reading the code and will let you know.
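To make the T-Net part more concrete, here is a minimal, unofficial PyTorch sketch of the input transform: a shared per-point MLP, a symmetric max pool over points, and a small fully connected head that regresses a 3x3 matrix (64x64 for the later feature transform), which is then multiplied back onto the input. The layer widths follow the paper, but the code is only illustrative, not the authors' implementation:

import torch
import torch.nn as nn

class TNet(nn.Module):
    """Sketch of PointNet's input transform (k=3) / feature transform (k=64)."""
    def __init__(self, k=3):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(                  # shared per-point MLP ("1x1 conv")
            nn.Conv1d(k, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.fc = nn.Sequential(                   # regress a k x k matrix
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, k * k),
        )

    def forward(self, x):                          # x: (batch, k, n_points)
        feat = self.mlp(x).max(dim=2).values       # symmetric max pool over points
        mat = self.fc(feat).view(-1, self.k, self.k)
        mat = mat + torch.eye(self.k, device=x.device)   # bias towards identity
        return torch.bmm(mat, x), mat              # aligned points + the matrix

points = torch.rand(8, 3, 1024)                    # 8 clouds of 1024 xyz points
aligned, transform = TNet(k=3)(points)             # aligned: (8, 3, 1024)

So the input is the raw n x 3 cloud (or the n x 64 features for the second T-Net) and the output is the same tensor multiplied by a learned, roughly pose-normalising matrix.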

3D point cloud matching

I have a 3D point cloud and I would like to match different point clouds with each other for recognition purposes. Does OpenCV or TensorFlow do this for me? If yes, how?
Example:
src1 = pointCloud of object 1
src2 = pointCloud of object 2
compare(src1, src2)
Output: Both point clouds are of the same object or different objects.
I want to achieve something like this. Please help with some ideas or resources.
OpenCV Surface Matching can be used to detect a given point cloud within another point cloud and find its pose.
Open3D has a 3D reconstruction module, but it is used to register (find the poses of) RGBD images and reconstruct a 3D object from them. There is a sub-step in which different point cloud fragments are registered (their poses estimated) and combined into a single point cloud for reconstruction, but I'm not sure whether that is useful for your task.
There are also many 3D point cloud object detection methods that use neural networks, but you would have to generate the training data yourself if your objects are not available in a standard dataset.
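If all you need is a rough same-or-different decision rather than full detection, one cheap baseline is to register one cloud onto the other and look at how well it fits. A hedged sketch using Open3D's ICP (API as of roughly Open3D 0.10+; the distance threshold and file names are placeholders, and it assumes the clouds start in roughly the same pose, otherwise add a global registration step first):

import numpy as np
import open3d as o3d

def compare(src1, src2, threshold=0.02):
    # Align src2 onto src1 with point-to-point ICP and report the fit.
    # High fitness and low inlier RMSE suggest the same object.
    result = o3d.pipelines.registration.registration_icp(
        src2, src1, threshold, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.fitness, result.inlier_rmse

src1 = o3d.io.read_point_cloud("object1.pcd")   # placeholder files
src2 = o3d.io.read_point_cloud("object2.pcd")
print(compare(src1, src2))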

DeepSORT's Feature extractor cannot be used for Person ReIdentification

I am using this repo for DeepSORT - https://github.com/nwojke/deep_sort
I am trying to build a Multi Camera Person Tracking system. I want to save and utilize the features extracted by one camera in footage from other cameras.
The feature extractor, which is trained on the MARS dataset, doesn't seem to help in differentiating between two different people.
I wrote the snippet below to check the cosine distance between images of the same person and images of different people.
import glob

import cv2
import torch
import torch.nn as nn
# Extractor lives in the deep_sort_pytorch repo (deep_sort/deep/feature_extractor.py);
# adjust the import to match your project layout.
from deep_sort_pytorch.deep_sort.deep.feature_extractor import Extractor

def _cosine_distance(a, b):
    cos = nn.CosineSimilarity(dim=1, eps=1e-6)
    return 1. - cos(torch.from_numpy(a), torch.from_numpy(b))

extr = Extractor("./deep_sort_pytorch/deep_sort/deep/checkpoint/ckpt.t7")
image_paths = glob.glob("mars/*.jpg")
features_list = []
for path in image_paths:
    im = cv2.imread(path)
    im_crops = [im]                      # the extractor expects a list of crops
    features = extr(im_crops)
    features_list.append(features)

for f in features_list:
    print(_cosine_distance(f, features_list[0]), "<<")
As expected, the cosine distance between images of the same person is very low.
But unexpectedly, the cosine distance between crops of two different people is also similarly low.
I thought the feature extractor would help me differentiate between them.
Should I increase the latent space dimension from 512 to something bigger?
Or maybe I am misunderstanding the role of the feature extractor.
A slightly larger feature space may help, but your main issue is the architecture of the feature extractor. In order to match people and distinguish them from impostors, features corresponding to small local regions (e.g. shoes, glasses) and to global whole-body regions are equally important. This is not captured by the simple feature extractor provided by https://github.com/nwojke/deep_sort. For more information on this, check: https://arxiv.org/pdf/1905.00953.pdf. I recommend trying any of the OSNet models provided here: https://kaiyangzhou.github.io/deep-person-reid/MODEL_ZOO
I can also recommend checking out my repository: https://github.com/mikel-brostrom/Yolov5_DeepSort_Pytorch. It seems to provide everything you need: multi-camera multi-object tracking and OSNet models.
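For reference, swapping in an OSNet extractor through deep-person-reid is only a few lines. A sketch assuming you have installed torchreid and downloaded a checkpoint from the model zoo (the checkpoint path below is a placeholder); you can then rerun your cosine-distance check on the returned features:

import glob
from torchreid.utils import FeatureExtractor

extractor = FeatureExtractor(
    model_name="osnet_x1_0",
    model_path="osnet_x1_0_market.pth",   # placeholder: any model-zoo checkpoint
    device="cuda",
)

crops = glob.glob("mars/*.jpg")           # same crops as in the question
features = extractor(crops)               # torch tensor, shape (num_images, 512)
print(features.shape)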

Point Cloud triangulation using marching-cubes in Python 3

I'm working on a 3D reconstruction system and want to generate a triangular mesh from the registered point cloud data using Python 3. My objects are not convex, so the marching cubes algorithm seems to be the solution.
I would prefer to use an existing implementation of such a method, so I tried scikit-image and Open3D, but neither API accepts a raw point cloud as input (note that I'm not an expert in those libraries). My attempts to convert my data failed, and I'm running out of ideas since the documentation does not clarify the input format of the functions.
These are my desired snippets, where pcd_to_volume is what I need.
scikit-image
import numpy as np
from skimage.measure import marching_cubes_lewiner

N = 10000
pcd = np.random.rand(N, 3)

def pcd_to_volume(pcd, voxel_size):
    # TODO

volume = pcd_to_volume(pcd, voxel_size=0.05)
verts, faces, normals, values = marching_cubes_lewiner(volume, 0)
open3d
import numpy as np
import open3d

N = 10000
pcd = np.random.rand(N, 3)

def pcd_to_volume(pcd, voxel_size):
    # TODO

volume = pcd_to_volume(pcd, voxel_size=0.05)
mesh = volume.extract_triangle_mesh()
I'm not able to find a way to properly write the pcd_to_volume function. I do not prefer one library over the other, so both solutions are fine with me.
Do you have any suggestions for properly converting my data? A point cloud is an Nx3 matrix with dtype=float.
Do you know another implementation [of the marching cubes algorithm] that works on raw point cloud data? I would prefer libraries like scikit-image and Open3D, but I will also take GitHub projects into account.
Do you know another implementation [of the marching cubes algorithm] that works on raw point cloud data?
Hoppe's paper, Surface Reconstruction from Unorganized Points, might contain the information you need, and it is open sourced.
And the latest Open3D seems to contain surface reconstruction algorithms such as alpha shapes, ball pivoting, and Poisson reconstruction.
From what I know, marching cubes is usually used for extracting a polygonal mesh of an isosurface from a three-dimensional discrete scalar field (that's what you mean by volume). The algorithm does not work on raw point cloud data.
Hoppe's algorithm works by first generating a signed distance function field (an SDF volume) and then passing it to marching cubes. This can be seen as one implementation of your pcd_to_volume, and it's not the only way!
If the raw point cloud is all you have, then the situation is a little constrained. As you might see, the Poisson reconstruction and screened Poisson reconstruction algorithms both implement pcd_to_volume in their own way (they are highly related). However, they need additional point normal information, and the normals have to be consistently oriented (for consistent orientation you can read this question).
While some Delaunay-based algorithms (which do not use marching cubes), like alpha shapes, may not need point normals as input, it is hard to get a satisfactory result for surfaces with complex topology because of the orientation problem. Graph-cut methods can use visibility information to solve that.
Having said that, if your data comes from depth images, you will usually have visibility information, and you can use a TSDF to build a good surface mesh. Open3D has already implemented that.
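As a concrete illustration of that last route, here is a hedged Open3D sketch (API names as of roughly Open3D 0.10+, radius and depth values chosen arbitrarily): it skips marching cubes entirely and goes point cloud -> oriented normals -> Poisson mesh, which in practice takes the place of your pcd_to_volume step.

import numpy as np
import open3d as o3d

N = 10000
points = np.random.rand(N, 3)                       # stand-in for your Nx3 cloud

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)

# Poisson reconstruction needs consistently oriented normals.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
pcd.orient_normals_consistent_tangent_plane(30)

mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)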

Canny algorithm is enough for creating a feature descriptor image and giving for SVM?

I retrieve contours from images using the Canny algorithm. Is that enough to build a descriptor image to feed into an SVM and find similarities, or do I necessarily need other features like elongation, perimeter, or area?
I ask because, inspired by this example: http://scikit-learn.org/dev/auto_examples/plot_digits_classification.html, I fed in my images first in greyscale and then as Canny edge maps, and in both cases my confusion matrix was full of zeros for the precision, recall, f1-score, and support measures.
My advice is:
Unless you have a small number of images in your database and/or the recognition is going to be really specific (not a random thing, for example), I would highly recommend applying one or more feature extractors such as SIFT, Fourier descriptors, Haralick features, or the Hough transform to extract more detail that can be summarised in a short vector.
Then you can apply the SVM on top of all this in order to get more accuracy.
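As a concrete, hedged illustration of that pipeline, here is a small sketch that follows the same digits example linked in the question but swaps the raw pixels for HOG descriptors. HOG is not one of the extractors listed above, just a readily available stand-in from scikit-image, and the cell/block sizes are arbitrary choices for the 8x8 digit images:

import numpy as np
from skimage.feature import hog
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split

# Same digits dataset as the linked scikit-learn example.
digits = datasets.load_digits()

# Replace raw pixels / edge maps with a richer descriptor per image.
X = np.array([
    hog(img, orientations=9, pixels_per_cell=(4, 4), cells_per_block=(1, 1))
    for img in digits.images
])

X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, test_size=0.5, random_state=0)

clf = svm.SVC(gamma="scale")
clf.fit(X_train, y_train)
print(metrics.classification_report(y_test, clf.predict(X_test)))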
