I'm trying to get good hand tracking without any SDK, only with math and OpenGL.
First, with a Kinect v1 I get the 3D depth points and from them the convex hull (CH) of the point set. I found that the CH is geometrically different at every frame, so I tried the point set registration technique to match points and preserve the shape of the CH, but I still get undesired results (i.e. vibrations of about ±3-4 degrees). Can you recommend a computer vision algorithm or approach for this problem?
You can use the KinFu library from the PCL package; it implements the "Kinect Fusion" algorithm.
Is there a library in Python or C++ that is capable of estimating normals of point clouds in a consistent way?
By "in a consistent way" I mean that the orientation of the normals is globally consistent over the surface.
For example, when I use the Python open3d package:
downpcd.estimate_normals(search_param=o3d.geometry.KDTreeSearchParamHybrid(
radius=4, max_nn=300))
I get inconsistent results, where some of the normals point inward while the rest point outward.
Many thanks.
UPDATE: GOOD NEWS!
The tangent plane algorithm is now implemented in Open3D!
The source code and the documentation.
You can just call pcd.orient_normals_consistent_tangent_plane(k=15).
Here k is the knn graph parameter.
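For example, continuing from the snippet in the question (the file name and voxel size are only placeholders):

import open3d as o3d

pcd = o3d.io.read_point_cloud("cloud.ply")        # placeholder file name
downpcd = pcd.voxel_down_sample(voxel_size=2.0)   # placeholder voxel size

# Estimate normals as in the question, then make their orientation globally consistent.
downpcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=4, max_nn=300))
downpcd.orient_normals_consistent_tangent_plane(k=15)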
Original answer:
Like Mark said, if your point cloud comes from multiple depth images, then you can call open3d.geometry.orient_normals_towards_camera_location(pcd, camera_loc) before concatenating them together (assuming you're using the Python version of Open3D).
However, if you don't have that information, you can use the tangent plane algorithm:
1. Build a knn-graph for your point cloud. The graph nodes are the points; two points are connected if one is among the other's k nearest neighbors.
2. Assign weights to the edges in the graph. The weight associated with edge (i, j) is computed as 1 - |ni ⋅ nj|, where ni and nj are the normals at points i and j.
3. Generate the minimum spanning tree of the resulting graph.
4. Root the tree at an initial node and traverse it in depth-first order, assigning each node an orientation that is consistent with that of its parent (flipping the normal if necessary).
The above algorithm comes from Section 3.3 of Hoppe's 1992 SIGGRAPH paper, Surface Reconstruction from Unorganized Points; the algorithm is also open sourced.
AFAIK the algorithm does not guarantee a perfect orientation, but it should be good enough. A rough sketch of the propagation step is given below.
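If you want to roll it yourself, here is a rough, unoptimized sketch of that propagation using SciPy (the function name and parameters are mine, and it assumes the knn-graph has a single connected component):

import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, depth_first_order

def orient_normals(points, normals, k=15):
    # Propagate a consistent orientation over the MST of the knn-graph.
    n = len(points)
    _, idx = cKDTree(points).query(points, k=k + 1)   # column 0 is the point itself

    # Edge weight 1 - |ni . nj|: small when neighboring normals are nearly parallel.
    edges = {}
    for i in range(n):
        for j in idx[i, 1:]:
            a, b = (i, j) if i < j else (j, i)
            edges[(a, b)] = 1.0 - abs(np.dot(normals[a], normals[b])) + 1e-8

    rows = [a for a, b in edges]
    cols = [b for a, b in edges]
    graph = coo_matrix((list(edges.values()), (rows, cols)), shape=(n, n))

    # Minimum spanning tree, then depth-first traversal from an arbitrary root,
    # flipping any normal that disagrees with its parent's orientation.
    mst = minimum_spanning_tree(graph)
    mst = mst + mst.T
    order, predecessors = depth_first_order(mst, i_start=0, directed=False)

    oriented = normals.copy()
    for i in order[1:]:
        if np.dot(oriented[i], oriented[predecessors[i]]) < 0:
            oriented[i] = -oriented[i]
    return oriented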
If you know the viewpoint from which each point was captured, it can be used to orient the normals.
I assume that this is not the case, so given your situation, where the cloud seems rather watertight and uniformly sampled, mesh reconstruction is promising.
The PCL library offers many alternatives in its surface module. For the sake of normal estimation, I would start with either:
ConcaveHull
Greedy projection triangulation
Although simple, they should be enough to produce a single coherent mesh.
Once you have a mesh, each triangle defines a normal (the cross product of two of its edges). It is important to note that a mesh isn't just a collection of independent faces. The faces are connected, and this connectivity enforces a coherent orientation across the mesh.
In pcl::PolygonMesh, every triangle face is defined by an ordered set of vertices, and this ordering defines the orientation:
order of vertices => order of cross product => well-defined, unambiguous normals
You can either use the normals from the mesh directly (via nearest-neighbor lookup), or calculate a low-resolution mesh and just use it to orient the cloud.
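To make the "order of vertices => cross product" point concrete, here is a tiny generic sketch (plain NumPy, not PCL code):

import numpy as np

def face_normal(v0, v1, v2):
    # The winding order (v0, v1, v2) fixes the sign of the cross product,
    # so consistently ordered faces yield consistently oriented normals.
    n = np.cross(v1 - v0, v2 - v0)
    return n / np.linalg.norm(n)

# Example: a triangle in the z = 0 plane, counter-clockwise order -> +Z normal.
print(face_normal(np.array([0.0, 0, 0]), np.array([1.0, 0, 0]), np.array([0.0, 1, 0])))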
I made an object tracker that calculates the position of an object recorded in a live camera feed using stereoscopic cameras. The math is simple once you know the camera distance and orientation. However, I now think it would be nice to be able to quickly extract all these parameters, so that when I change my setup or cameras I can quickly recalibrate.
To calculate the object position I made some simplifications/assumptions, which made the math easier: the cameras are in the same YZ plane, so there is only a distance in x between them. Their tilt is also just in the XY plane.
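For reference, under those assumptions the math boils down to standard parallel-stereo triangulation, roughly like this (an illustrative sketch with made-up names, not my actual code, and ignoring the tilt):

import numpy as np

def triangulate_parallel(x_left, x_right, y, focal_px, baseline):
    # Two identical cameras looking along +Z, separated by `baseline` along X.
    # x_left / x_right are the pixel x-coordinates of the object in each image,
    # measured from the image centers; focal_px is the focal length in pixels.
    disparity = x_left - x_right
    Z = focal_px * baseline / disparity      # depth
    X = x_left * Z / focal_px                # offset in the left camera frame
    Y = y * Z / focal_px
    return np.array([X, Y, Z])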
To reverse the triangulation, I thought a test pattern (a square) of 4 points whose distances to each other I know would suffice. Ideally I would like to get the cameras' positions (their distances to the test pattern and to each other), their rotation about X (and maybe Y and Z if applicable/possible), as well as their view angle (to translate pixel positions to real-world distances; that should be a camera constant, but in case I change cameras it is quite a bit of work to determine accurately).
I started with the same trigonometric calculations, but I always end up missing parameters. I am wondering if there is an existing solution or a solid approach. If I need to add parameters (like distances, which are easy enough to measure), that's no problem (my calculations didn't give me any simple equations with that possibility, though).
I also read about homography in OpenCV, but it seems it applies to 2D space only, or does it?
Any help is appreciated!
I am an undergraduate student working on detecting defects on the surface of an object in a given digital image using image processing techniques. I am planning on using the OpenCV library for the image processing functions. Currently I am trying to decide which defect detection algorithm to use. This is one of my very first projects in this field, so any help related to this issue is appreciated. The reference image with a defect (missing teeth in the gear) that I am currently working with is linked below ("defective gear image").
defective gear image
Get the convex hull of the gear (which is a polygon) and shrink it slightly so that it crosses the teeth. Make sure that the centroid of the gear is the fixed point of the shrinking.
Then sample the pixels along the hull, preferably using equidistant points (divide the perimeter by a multiple of the number of teeth). The unwrapped profile will be a dashed line, with missing dashes corresponding to missing teeth, and the problem is reduced to 1D.
You can also try a polar unwarping, making the outline straight, but you will need an accurate location of the center.
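A rough OpenCV/Python sketch of the hull-sampling idea (it assumes you already have a clean binary mask of the gear; the file name, shrink factor and sample count are only placeholders):

import cv2
import numpy as np

mask = cv2.imread("gear_mask.png", cv2.IMREAD_GRAYSCALE)   # binary gear silhouette

# Largest contour of the gear and its convex hull.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
gear = max(contours, key=cv2.contourArea)
hull = cv2.convexHull(gear).reshape(-1, 2).astype(np.float64)

# Shrink the hull slightly toward the gear centroid so it cuts through the teeth.
m = cv2.moments(gear)
centroid = np.array([m["m10"] / m["m00"], m["m01"] / m["m00"]])
shrunk = centroid + 0.92 * (hull - centroid)                # 0.92 = placeholder shrink

# Sample the mask at points equally spaced along the shrunk hull's perimeter.
closed = np.vstack([shrunk, shrunk[:1]])
seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
cum = np.concatenate([[0.0], np.cumsum(seg)])
n_samples = 720                                             # e.g. a multiple of the tooth count
profile = []
for s in np.linspace(0, cum[-1], n_samples, endpoint=False):
    i = np.searchsorted(cum, s, side="right") - 1
    t = (s - cum[i]) / seg[i]
    p = closed[i] + t * (closed[i + 1] - closed[i])
    profile.append(mask[int(round(p[1])), int(round(p[0]))] > 0)

# `profile` is now a 1D on/off pattern; unusually long "off" runs indicate missing teeth.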
I am currently trying to construct the area covered by a device over an operating period.
The first step in this process appears to be constructing a polygon of the covered area.
Since the pattern is not a standard shape, convex hulls overstate the covered area by jumping to the largest coverage area possible.
I have found a paper that appears to cover the concept of non-convex hull generation, but no discussion of how to implement it in a high-level language.
http://www.geosensor.net/papers/duckham08.PR.pdf
Has anyone seen a straightforward algorithm for constructing a non-convex (concave) hull, or perhaps any Python code to achieve the same result?
I have tried convex hulls, mainly qhull with a limited edge size, with limited success.
I have also noticed some licensed libraries that cannot be redistributed, so unfortunately those are off the table.
Any better ideas or cookbooks?
You might try looking into Alpha Shapes. The CGAL library can compute them.
Edit: I see that the paper you linked references alpha shapes and also has an algorithm listing. Is that not high-level enough for you? Since you listed Python as a tag, I'm sure there are Delaunay triangulation libraries in Python, which I think is the hardest part of implementing the algorithm; you just need to make sure you can modify the resulting triangulation output. The boundary query functions can probably be implemented with associative arrays.
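For what it's worth, here is a rough 2D alpha-shape sketch built on SciPy's Delaunay triangulation (the alpha value is something you would have to tune for your data):

import numpy as np
from scipy.spatial import Delaunay

def alpha_shape_edges(points, alpha):
    # Keep Delaunay triangles whose circumradius is below 1/alpha,
    # then return the boundary edges of the kept region.
    tri = Delaunay(points)
    edge_count = {}
    for ia, ib, ic in tri.simplices:
        pa, pb, pc = points[ia], points[ib], points[ic]
        a, b, c = np.linalg.norm(pb - pc), np.linalg.norm(pa - pc), np.linalg.norm(pa - pb)
        s = (a + b + c) / 2.0
        area = max(s * (s - a) * (s - b) * (s - c), 1e-12) ** 0.5   # Heron's formula
        circum_r = a * b * c / (4.0 * area)
        if circum_r < 1.0 / alpha:
            for edge in ((ia, ib), (ib, ic), (ic, ia)):
                key = tuple(sorted(edge))
                edge_count[key] = edge_count.get(key, 0) + 1
    # Boundary edges appear in exactly one kept triangle.
    return [e for e, n in edge_count.items() if n == 1]

points = np.random.rand(200, 2)
print(len(alpha_shape_edges(points, alpha=3.0)))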
Back story: I'm creating a Three.js based 3D graphing library. Similar to sigma.js, but 3D. It's called graphosaurus and the source can be found here. I'm using Three.js and using a single particle representing a single node in the graph.
This was the first task I had to deal with: given an arbitrary set of points (that each contain X,Y,Z coordinates), determine the optimal camera position (X,Y,Z) that can view all the points in the graph.
My initial solution (which we'll call Solution 1) involved calculating the bounding sphere of all the points and then scaling it to a sphere of radius 5 around the point (0, 0, 0). Since the points are then guaranteed to always fall in that area, I can set a static position for the camera (assuming the FOV is static) and the data will always be visible. This works well, but it either requires changing the point coordinates the user specified or duplicating all the points, neither of which is great.
My new solution (which we'll call Solution 2) involves not touching the coordinates of the input data, but instead just positioning the camera to match the data. I encountered a problem with this solution: for some reason, when dealing with really large data, the particles seem to flicker when positioned in front of or behind other particles.
Here are examples of both solutions. Make sure to move the graph around to see the effects:
Solution 1
Solution 2
You can see the diff for the code here
Let me know if you have any insight on how to get rid of the flickering. Thanks!
It turns out that my near value for the camera was too low and the far value was too high, resulting in "z-fighting". By narrowing these values for my dataset, the problem went away. Since my dataset is user-dependent, I need to determine an algorithm to generate these values dynamically.
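The rough idea I'm considering is to derive them from a bounding sphere of the data and the camera distance, along these lines (a language-agnostic sketch; the margin and minimum near value are arbitrary):

import numpy as np

def near_far_from_points(points, camera_pos, margin=1.05, min_near=0.1):
    # Loose bounding sphere: centroid plus the largest distance from it.
    center = points.mean(axis=0)
    radius = np.linalg.norm(points - center, axis=1).max()
    # Clamp the frustum as tightly as possible around that sphere.
    d = np.linalg.norm(np.asarray(camera_pos) - center)
    near = max(min_near, (d - radius) / margin)
    far = (d + radius) * margin
    return near, far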
I noticed that in Solution 2 the flickering only occurs when the camera is moving. One possible reason is that, when the camera position is changing rapidly, different transforms get applied to different particles. So if the camera moves from X to X + DELTAX during a time step, one set of particles gets the camera transform for X while the others get the transform for X + DELTAX.
If you separate your rendering from the user interaction, that should fix the issue, assuming this is the issue. That means you should apply the same transform to all the particles and the edges connecting them, by locking (not updating) the transform matrix until the rendering loop is done.