I am working on a project to reconstruct a 3D object from rotation-invariant shape descriptors. So far I have converted a 3D object to a harmonic map, computed the spherical harmonic coefficients, and in turn the rotation-invariant shape descriptors, which are the norms of the spherical harmonic coefficients. I have essentially implemented the method of the following research paper. Now I would like to reconstruct the 3D object from the shape descriptors, but I cannot find any methods for doing so. Are there any methods for reconstruction from these descriptors?
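For reference, the descriptor computation I have so far looks roughly like the sketch below (a minimal sketch only; it assumes the coefficients are grouped by degree l, and the function and variable names are my own, not from the paper):

import numpy as np

def rotation_invariant_descriptors(coeffs_by_degree):
    """Collapse spherical harmonic coefficients into one norm per degree.

    coeffs_by_degree: list where element l holds the 2*l + 1 complex
    coefficients of degree l (assumed layout, not from the paper).
    """
    return np.array([np.linalg.norm(np.asarray(c_l)) for c_l in coeffs_by_degree])

# Toy usage with made-up coefficients for degrees 0..2.
coeffs = [np.array([1.0 + 0j]),
          np.array([0.2 + 0.1j, 0.5 + 0j, -0.3 + 0.2j]),
          np.random.randn(5) + 1j * np.random.randn(5)]
descriptors = rotation_invariant_descriptors(coeffs)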
3D density maps can of course be plotted as heatmaps, but what about when the data itself is homogeneous (near 0) except for a small part (a 2D cross section, for example):
This should give a letter 'E' shape as the 2D "model". The original data is not saved as a point cloud, however.
A naive approach would be to keep the pixels whose values exceed a certain threshold and then smooth the border. However, this does not take into account that the border pixels have small values.
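A minimal sketch of that naive idea for the 3D case, assuming skimage is available (the array, sizes, and threshold below are placeholders); marching_cubes at least interpolates the surface between voxels rather than cutting exactly on pixel borders, though it still ignores how small the border values are:

import numpy as np
from skimage import measure

# Toy 3D density map: mostly near zero, signal only in a small region.
density = np.zeros((64, 64, 64))
density[20:40, 20:40, 30:34] = np.random.rand(20, 20, 4)

# Naive approach: pick an iso-level and extract the surface at that value.
level = 0.3  # arbitrary threshold, would need tuning
verts, faces, normals, values = measure.marching_cubes(density, level=level)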
Another would be to use point-cloud-based algorithms that come with modeling software, but then the point cloud's probability function would still be discontinuous at pixel borders, and it would not take into account that only one side has signal.
Is there any tested solution to this (the example is 2D; the actual case is many 2D slices that compose a low-resolution 3D density map)? I was thinking of giving border pixels an area proportional to their signal value, with the border defined from the gradient. Any suggestions?
I was thinking of model visualization results similar to this (it seems to be based on an established point-cloud algorithm):
I'm trying to develop a fully-convolutional neural net to estimate the 2D locations of keypoints in images that contain renders of known 3D models. I've read plenty of literature on this subject (human pose estimation, model based estimation, graph networks for occluded objects with known structure) but no method I've seen thus far allows for estimating an arbitrary number of keypoints of different classes in an image. Every method I've seen is trained to output k heatmaps for k keypoint classes, with one keypoint per heatmap. In my case, I'd like to regress k heatmaps for k keypoint classes, with an arbitrary number of (non-overlapping) points per heatmap.
In this toy example, the network would output heatmaps around each visible location of an upper vertex for each shape. The cubes have 4 vertices on top, the extruded pentagons have 2, and the pyramids just have 1. Sometimes points are offscreen or occluded, and I don't wish to output heatmaps for occluded points.
The architecture is a 6-6 layer U-Net (as in this paper https://arxiv.org/pdf/1804.09534.pdf). The ground-truth heatmaps are normal distributions centered around each keypoint. When training the network with a batch size of 5 and L2 loss, the network learns to never make an estimate whatsoever, just outputting blank images. Data types are converted properly, and values are normalized to the range 0 to 1 for the input and 0 to 255 for the output. I'm not sure how to solve this. Are there any red flags with my general approach? I'll post code if there's no clear problem in general...
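For context, the ground-truth generation looks roughly like the sketch below (my own minimal version, not the actual training code; the function name, image size, and sigma are placeholders). Each class gets one channel, and any number of visible keypoints can be rendered into that channel:

import numpy as np

def render_heatmaps(keypoints_per_class, height, width, sigma=3.0):
    """Render one heatmap channel per keypoint class.

    keypoints_per_class: list of length k; element c is a list of (x, y)
                         pixel coordinates for visible keypoints of class c.
    Returns an array of shape (k, height, width) with values in [0, 1].
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmaps = np.zeros((len(keypoints_per_class), height, width))
    for c, points in enumerate(keypoints_per_class):
        for (x, y) in points:
            blob = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
            # Take the max so nearby blobs don't sum above 1.
            heatmaps[c] = np.maximum(heatmaps[c], blob)
    # Scale by 255 here if the training targets are in the 0-255 range.
    return heatmaps

# Example: class 0 has two visible keypoints, class 1 has one.
hm = render_heatmaps([[(10, 12), (40, 30)], [(25, 25)]], height=64, width=64)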
I'm reading about Interactive Graphics. In particular, I started the section about viewing and did not fully understand this sentence:
Initially, we start with the model-view matrix set to an identity matrix, so the camera frame and the object frame are identical.
I know what a model-view matrix is, and I know that in this case the camera looks down the negative z-axis. But I don't understand exactly what the difference between the object frame and the camera frame is.
You have two matrices: View and Model. View represents where you are looking from and in which direction (the camera), and Model represents where the object you are currently rendering is and how it is oriented.
However, to speed up rendering, just one cumulative matrix is used:
ModelView = Inverse(View) * Model
So, for example, when you write something like this in OpenGL:
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
Then both the View and Model matrices are identical and equal to the identity matrix. After this point you add your incremental rotations and translations either to View (in inverse order and direction) or to Model (in normal order and direction).
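A minimal numeric illustration of the ModelView = Inverse(View) * Model composition above, using NumPy (the matrix contents are made up purely to show the order of composition):

import numpy as np

def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix (column-vector convention)."""
    m = np.eye(4)
    m[:3, 3] = [tx, ty, tz]
    return m

# View: where the camera sits in the world (made-up placement).
view = translation(0.0, 0.0, 5.0)      # camera 5 units along +z
# Model: where the object sits in the world (made-up placement).
model = translation(1.0, 0.0, 0.0)     # object 1 unit along +x

# One cumulative matrix, as described above.
modelview = np.linalg.inv(view) @ model

# The object ends up 1 unit to the right of and 5 units in front of the camera.
p_object = np.array([0.0, 0.0, 0.0, 1.0])   # a point in the object frame
p_eye = modelview @ p_object                # -> [1, 0, -5, 1]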
For more info see:
Understanding 4x4 homogenous transform matrices
Especially the last 3 links in there...
I'm working on implementing Ankush Gupta's synthetic data generation pipeline (http://www.robots.ox.ac.uk/~vgg/data/scenetext/gupta16.pdf). In his work, he used a convolutional neural network to extract a point cloud from a 2-dimensional scene image, segmented the point cloud to isolate different planes, used RANSAC to fit a 3D plane to each point-cloud segment, and then warped the pixels of each segment, given its 3D plane, to a fronto-parallel view.
I'm stuck on this last part: warping my extracted 3D plane to a fronto-parallel view. I have X, Y, and Z vectors as well as a normal vector. I think what I need to do is perform some type of perspective transform or rotation that brings all the pixels on the plane to zero on the Z-axis while X and Y remain the same. I could be wrong about this; it's been a long time since I've had any formal training in geometry or linear algebra.
It looks like skimage's Perspective Transform requires me to know the dimensions of the final segment coordinates in 2D space, and AffineTransform requires me to know the rotation. All I have at this point are my X, Y, Z and normal vectors, and the suspicion that I may get my destination plane by just setting the Z-axis to all zeros. I'm not sure if my assumption is correct, but I need to be able to warp all the pixels in the segment of interest to fronto-parallel, fit a bounding box, place text inside it, and then warp the final segment back to the original perspective in 3D space.
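For example, the rotation I have in mind could perhaps be built directly from the normal: rotate the plane so its normal lines up with the z-axis, which makes the rotated points (nearly) constant in Z. A minimal sketch with NumPy, assuming the normal is already unit length (variable names are mine):

import numpy as np

def rotation_aligning_normal_to_z(n):
    """Return a 3x3 rotation that maps unit normal n onto (0, 0, 1),
    using Rodrigues' formula on the axis n x z."""
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(n, z)
    s = np.linalg.norm(v)          # sin of the angle between n and z
    c = np.dot(n, z)               # cos of the angle
    if s < 1e-8:                   # already (anti-)parallel to z
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    vx = np.array([[0, -v[2], v[1]],
                   [v[2], 0, -v[0]],
                   [-v[1], v[0], 0]])
    return np.eye(3) + vx + vx @ vx * ((1 - c) / s**2)

# points: (N, 3) array of plane points; n: unit normal of the fitted plane.
# After rotation, the Z coordinates are all (nearly) equal, i.e. the plane
# is fronto-parallel; subtracting the mean Z puts it at Z = 0.
# R = rotation_aligning_normal_to_z(n)
# flat = points @ R.T
# flat[:, 2] -= flat[:, 2].mean()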
Any help with how to think about this or implement it would be massively useful.
I understand that to create a shape (let's say a 3D sphere, for example) I have to first find the vertex locations of the shape and second, use the parametric equation to create the x, y, z points of the triangle mesh. I am currently looking at sample code for creating shapes, and it appears that after using the parametric equation to find the vertices of the triangle mesh, unit normals to the sphere at those vertices are computed.
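The sample code seems to do something roughly like the following (this is my own minimal sketch of a latitude/longitude parameterization, not the actual sample code; for a sphere centered at the origin, the unit normal at a vertex is just the vertex position divided by the radius):

import numpy as np

def sphere_vertices_and_normals(radius=1.0, n_lat=16, n_lon=32):
    """Sample a sphere with the parametric equation
    (x, y, z) = r * (sin(theta)cos(phi), sin(theta)sin(phi), cos(theta))
    and return per-vertex unit normals."""
    theta = np.linspace(0.0, np.pi, n_lat)        # polar angle
    phi = np.linspace(0.0, 2.0 * np.pi, n_lon)    # azimuth
    theta, phi = np.meshgrid(theta, phi, indexing="ij")
    x = radius * np.sin(theta) * np.cos(phi)
    y = radius * np.sin(theta) * np.sin(phi)
    z = radius * np.cos(theta)
    vertices = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # For a sphere centered at the origin, the outward unit normal at a
    # vertex is the vertex position divided by the radius.
    normals = vertices / radius
    return vertices, normals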
I understand why the position vectors in the first step are used to create the 3D shape, and that a normal vector is perpendicular to the surface, but I don't understand why the unit normal vectors at the vertices are needed to create the shapes. What's the purpose of finding the normals at the vertices?
I am not sure I totally understand your question, but one very important use for normals in computer graphics is calculating reflections. For instance, if you're writing a simple raytracer, Lambertian reflectance is quite easy to compute if you know the normal vector where your camera ray intersects a surface. Normals are similarly required for (off the top of my head) the majority of calculations involved in more complex rendering techniques.
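As a concrete illustration of the Lambertian case, here is a minimal sketch (the names and values are placeholders): the diffuse term is just the clamped dot product of the unit surface normal and the unit direction toward the light.

import numpy as np

def lambertian(normal, light_dir, albedo=1.0, light_intensity=1.0):
    """Lambertian (diffuse) reflectance at a surface point.

    normal:    unit surface normal at the intersection point.
    light_dir: unit vector from the point toward the light source.
    """
    n_dot_l = max(0.0, float(np.dot(normal, light_dir)))  # clamp back-facing
    return albedo * light_intensity * n_dot_l

# Example: light 45 degrees off the normal -> intensity ~ cos(45 deg) ~ 0.707
normal = np.array([0.0, 0.0, 1.0])
light_dir = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)
print(lambertian(normal, light_dir))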