I'm trying to use the face.evoLVe library, a high-performance face recognition library in PyTorch. Going through the code, I encountered a list of coordinates named REFERENCE_FACIAL_POINTS:
REFERENCE_FACIAL_POINTS = [ # default reference facial points for crop_size = (112, 112); should adjust REFERENCE_FACIAL_POINTS accordingly for other crop_size
[30.29459953, 51.69630051],
[65.53179932, 51.50139999],
[48.02519989, 71.73660278],
[33.54930115, 92.3655014],
[62.72990036, 92.20410156]
]
Further down in the code, these numbers are converted into NumPy arrays and used heavily throughout align_trans.py.
I have some questions:
What exactly are these numbers? Reading the comment, I'm certain they are locations of the eyes, lips, etc., but what exactly do they represent and how are they calculated?
They seem to be tightly coupled to the size of the input image (the one used in training, at the very least). Knowing this, how can one calculate new reference points for other image sizes?
Are these points for frontal poses only, or will they work on profiles, etc. as well? If they don't, how can we add reference points for profiles or other arbitrary facial poses?
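To make the second question concrete, here is a naive sketch in NumPy: if a new crop frames the face the same way as the 112x112 one, the points could simply be rescaled per axis. `scale_reference_points` is a hypothetical helper, not part of face.evoLVe, and whether proportional scaling is what the library actually expects is an assumption.

```python
import numpy as np

# Values copied from the snippet above; per its comment they assume a (112, 112) crop.
REFERENCE_FACIAL_POINTS = np.array([
    [30.29459953, 51.69630051],
    [65.53179932, 51.50139999],
    [48.02519989, 71.73660278],
    [33.54930115, 92.3655014],
    [62.72990036, 92.20410156],
])


def scale_reference_points(points, old_size=(112, 112), new_size=(224, 224)):
    """Hypothetical helper: rescale (x, y) points from old_size to new_size.

    Sizes are (width, height). Assumes the new crop frames the face
    in the same way as the old one, which may not hold in practice.
    """
    scale = np.asarray(new_size, dtype=float) / np.asarray(old_size, dtype=float)
    return np.asarray(points, dtype=float) * scale


# Example: reference points for a 224x224 crop under the assumption above.
scaled = scale_reference_points(REFERENCE_FACIAL_POINTS, new_size=(224, 224))
print(scaled)
```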
I have been asked to create three dimensional vector embeddings for a series of words. Although I understand what an embedding is and that word2vec will be able to create the vector embeddings, I cannot find a resource that shows me how to create a three dimensional vector (all the resources show many more dimensions than this).
The format I have to create the file in is:
house 34444 0.3232 0.123213 1.231231
dog 14444 0.76762 0.76767 1.45454
which is in the format <token>\t<word_count>\t<vector_embedding_separated_by_spaces>
Can anyone point me towards a resource that will show me how to create the desired file format given some training text?
Once you've decided on a programming language and a word2vec library, its documentation will likely highlight a configurable parameter that lets you specify the dimensionality of the vectors it trains. So you just need to change that parameter from its typical values, like 100 or 300, to 3.
(Note, though, that 3-dimensional word vectors are unlikely to show the interesting and useful properties of higher-dimensional vectors.)
Once you've used such a library to create the vectors-in-memory, writing them out in your specified format becomes just a file-IO problem, unrelated to word2vec itself. In typical languages, you'd open a new file for writing, loop over your data printing each line properly, then close the file.
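As a rough illustration of that file-IO step in Python, the sketch below writes the format from the question. The `vocab` dictionary layout and the `write_embeddings` name are assumptions made for this example; with a real library you would fill the dictionary from the trained model (in gensim 4.x, for instance, the dimensionality parameter is `vector_size`).

```python
import os
import tempfile


def write_embeddings(path, vocab):
    """Write one tab-separated line per token: token, word count, vector.

    `vocab` maps token -> (word_count, vector); this layout is an
    assumption for the sketch, not something word2vec prescribes.
    """
    with open(path, "w", encoding="utf-8") as f:
        for token, (count, vector) in vocab.items():
            # Vector components are separated by single spaces.
            vec = " ".join(str(x) for x in vector)
            f.write(f"{token}\t{count}\t{vec}\n")


# Example data taken from the question.
vocab = {
    "house": (34444, [0.3232, 0.123213, 1.231231]),
    "dog": (14444, [0.76762, 0.76767, 1.45454]),
}

out_path = os.path.join(tempfile.mkdtemp(), "vectors.tsv")
write_embeddings(out_path, vocab)
```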
(To get a more detailed answer from StackOverflow, you'd want to pick a specific language/library, show what you've already tried with actual code, and show how the results/errors achieved fall short of your goal.)
I'm working on a 3D reconstruction system and want to generate a triangular mesh from the registered point cloud data using Python 3. My objects are not convex, so the marching cubes algorithm seems to be the solution.
I would prefer to use an existing implementation of such a method, so I tried scikit-image and Open3D, but neither API accepts raw point clouds as input (note that I'm not an expert in those libraries). My attempts to convert my data failed, and I'm running out of ideas since the documentation does not clarify the input format of the functions.
These are my desired snippets, where pcd_to_volume is what I need.
scikit-image
import numpy as np
from skimage.measure import marching_cubes_lewiner
N = 10000
pcd = np.random.rand(N,3)
def pcd_to_volume(pcd, voxel_size):
    pass  # TODO
volume = pcd_to_volume(pcd, voxel_size=0.05)
verts, faces, normals, values = marching_cubes_lewiner(volume, 0)
open3d
import numpy as np
import open3d
N = 10000
pcd = np.random.rand(N,3)
def pcd_to_volume(pcd, voxel_size):
    pass  # TODO
volume = pcd_to_volume(pcd, voxel_size=0.05)
mesh = volume.extract_triangle_mesh()
I'm not able to find a way to properly write the pcd_to_volume function. I have no preference between the two libraries, so either solution is fine with me.
Do you have any suggestions to properly convert my data? A point cloud is an Nx3 matrix where dtype=float.
Do you know another implementation [of the marching cubes algorithm] that works on raw point cloud data? I would prefer libraries like scikit-image and Open3D, but I will also consider GitHub projects.
Do you know another implementation [of the marching cubes algorithm] that works on raw point cloud data?
Hoppe's paper "Surface reconstruction from unorganized points" might contain the information you need, and it's open source.
The latest Open3D also seems to include surface reconstruction algorithms like alphaShape, ballPivoting and PoissonReconstruction.
From what I know, marching cubes is usually used for extracting a polygonal mesh of an isosurface from a three-dimensional discrete scalar field (that's what you mean by volume). The algorithm does not work on raw point cloud data.
Hoppe's algorithm works by first generating a signed distance function field (an SDF volume) and then passing it to marching cubes. This can be seen as an implementation of your pcd_to_volume, and it's not the only way!
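To give an idea of what a pcd_to_volume could look like, here is a minimal NumPy sketch. Note that it is not Hoppe's method: it samples an unsigned distance field (distance to the nearest point) on a regular grid, using a brute-force nearest-neighbour search, so it is only practical for small clouds and coarse grids. The returned `origin` and the padding of two voxels are choices made for this example.

```python
import numpy as np


def pcd_to_volume(pcd, voxel_size):
    """Sample an unsigned distance field of an (N, 3) point cloud on a grid.

    Returns (volume, origin): volume[i, j, k] is the distance from the grid
    node at origin + voxel_size * (i, j, k) to its nearest cloud point.
    """
    pcd = np.asarray(pcd, dtype=float)
    # Pad the bounding box so the level set does not touch the border.
    origin = pcd.min(axis=0) - 2 * voxel_size
    upper = pcd.max(axis=0) + 2 * voxel_size
    axes = [np.arange(a, b + voxel_size, voxel_size) for a, b in zip(origin, upper)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    nodes = grid.reshape(-1, 3)

    dist = np.empty(len(nodes))
    chunk = 4096  # process grid nodes in chunks to bound memory use
    for start in range(0, len(nodes), chunk):
        block = nodes[start:start + chunk]
        d2 = ((block[:, None, :] - pcd[None, :, :]) ** 2).sum(axis=-1)
        dist[start:start + chunk] = np.sqrt(d2.min(axis=1))
    return dist.reshape(grid.shape[:-1]), origin


rng = np.random.default_rng(0)
pcd = rng.random((200, 3))
voxel_size = 0.1
volume, origin = pcd_to_volume(pcd, voxel_size)

# With scikit-image installed, the volume could then be meshed, e.g.:
#   from skimage.measure import marching_cubes
#   verts, faces, normals, values = marching_cubes(
#       volume, level=voxel_size, spacing=(voxel_size,) * 3)
#   verts += origin
```

Because the field is unsigned, the extracted level set is a thin shell around the points rather than a single oriented surface; a signed field, as in Hoppe's approach, needs normal information.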
If the raw point cloud is all you have, the situation is a little bit constrained. As you might see, the Poisson reconstruction and Screened Poisson reconstruction algorithms both implement pcd_to_volume in their own way (they are closely related). However, they need additional point normal information, and the normals have to be consistently oriented. (For consistent orientation you can read this question.)
While some Delaunay-based algorithms (which do not use marching cubes) like alphaShape and this may not need point normals as input, for surfaces with complex topology it's hard to get a satisfactory result due to the orientation problem. The graph cuts method can use visibility information to solve that.
Having said that, if your data comes from depth images, you will usually have visibility information, and you can use TSDF to build a good surface mesh. Open3D has already implemented that.
I am working on a project where I am trying to extract key features of a bicycle from an overall image. I am currently investigating the use of Haar Cascades to train my computer to find certain regions of interest from said bicycles, e.g. the pedal-sprocket, seat, handle-bars. Then I will extract local features from these sub regions accordingly. The purpose is to create an overall descriptor of a particular bicycle so I can try to match it throughout a sample set of images of other bicycles.
My questions are as follows: Can I train a Haar classifier to look for a sub-component of an overall object? For example, say I want to look for the handlebars on a bicycle. How should I design the training? Should I detect the bicycle first, and then detect the handlebars within the overall bicycle region (Similar to detecting the eyes within a face in terms of facial recognition)? Since I know beforehand that all my images will contain a picture of a bicycle, I'm not sure if there is any point in detecting the bicycle to begin with and then looking for sub components.
In terms of training a Haar cascade and creating an XML that I can use (in OpenCV 3.1 and Python 3.6), could I just set up the positive and negative images with pictures of bicycles and no bicycles respectively? With the difference being that I isolate the particular area of interest by cropping the image appropriately each time (e.g. where the handlebars are)?
Also open to any recommendations about how others might solve the general problem of extracting key features for object matching. This is just one approach I am currently investigating. Thanks!
I understand (and please correct me if my understanding is wrong) that the primary purpose of a CNN is to reduce the number of parameters compared with what you would need if you were to use a fully connected NN. A CNN achieves this by extracting "features" of images.
A CNN can do this because in a natural image there are small features, such as lines and elementary curves, that may occur in an "invariant" fashion and constitute the image much like elementary building blocks.
My question is: suppose we create layers of feature maps, say, 5 of them, and we get these by sliding a window of size, say, 5x5 over an image of, say, 100x100 pixels. Initially, these feature maps are initialized as random weight matrices, and they must progressively adjust their weights with gradient descent, right? But then, if we are getting these feature maps by using exactly the same sized windows, sliding in exactly the same way (sharing the same starting point and the same stride value), over exactly the same image, how can these maps learn different features of the image? Won't they all come out the same, say, a line or a curve?
Is it due to the different initial values of the weight matrices? (I.e. some weight matrices are more receptive to learning a certain particular feature than others?)
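A toy NumPy sketch of the situation in question, under assumptions made for this example only (a single 8x8 random image, two 3x3 filters, a ReLU, and a made-up squared-error loss on the pooled activations): it computes the gradient each filter would receive when both start from the same weights versus from different random weights.

```python
import numpy as np


def extract_patches(img, k):
    """Flattened k x k windows of img (stride 1, no padding)."""
    h, w = img.shape
    return np.array([img[i:i + k, j:j + k].ravel()
                     for i in range(h - k + 1)
                     for j in range(w - k + 1)])


def filter_gradients(patches, weights, target=1.0):
    """Gradient of a toy loss with respect to each filter (one row per filter).

    loss = (sum over filters of mean_p relu(patch_p . w_f) - target) ** 2
    """
    pre = patches @ weights.T                 # (num_patches, num_filters)
    err = np.maximum(pre, 0).mean(axis=0).sum() - target
    mask = (pre > 0).astype(float)            # derivative of the ReLU
    return 2 * err * (mask.T @ patches) / len(patches)


rng = np.random.default_rng(0)
patches = extract_patches(rng.standard_normal((8, 8)), k=3)

w = rng.standard_normal(9)
same_init = np.stack([w, w])                  # both filters start identical
diff_init = rng.standard_normal((2, 9))       # independent random starts

g_same = filter_gradients(patches, same_init)
g_diff = filter_gradients(patches, diff_init)

print("identical init, identical gradients:", np.allclose(g_same[0], g_same[1]))
print("random init, identical gradients:", np.allclose(g_diff[0], g_diff[1]))
```

In this toy setup, filters that start identical receive identical updates and so can never diverge, which is at least consistent with the idea that the random initial values are what lets the maps end up different.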
Thanks!! I wrote my 4 questions/opinions and indexed them, for the ease of addressing them separately!
I would like to do some odd geometric/odd shape recognition. But I'm not sure how to do it.
Here's what I have so far:
Convert RGB image to Monochrome.
Otsu Threshold
Hough Transform.
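For reference, the first two steps can be sketched with plain NumPy as below. The luma weights and the synthetic test image are assumptions for this example; in practice an imaging library would normally do both steps.

```python
import numpy as np


def to_gray(rgb):
    """RGB -> 8-bit grayscale using the common BT.601 luma weights."""
    gray = rgb[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


def otsu_threshold(gray):
    """Return the threshold t that maximises the between-class variance.

    Pixels <= t form one class, pixels > t the other.
    """
    prob = np.bincount(gray.ravel(), minlength=256) / gray.size
    omega = np.cumsum(prob)                   # probability of the class <= t
    mu = np.cumsum(prob * np.arange(256))     # cumulative first moment
    denom = omega * (1.0 - omega)
    valid = denom > 0                         # skip thresholds with an empty class
    sigma_b = np.zeros(256)
    sigma_b[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))


# Synthetic image: dark background (50) with a bright 10x10 square (200).
rgb = np.full((20, 20, 3), 50, dtype=np.uint8)
rgb[5:15, 5:15] = 200

gray = to_gray(rgb)
t = otsu_threshold(gray)
binary = gray > t                             # True where the bright shape is
```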
I'm not sure what to do next.
For geometric information, you could do a raster to vector conversion to convert your image into coordinated vectors (lines and points) and finite element analysis to look for known shapes. Not easy but libraries should be available for both.
Edit: Note that there are sometimes easier practical solutions, but they depend on the image and the types of errors, for example removing perspective, identifying a 3D object from a 2D image, significance of colour, etc. You often see registration markers added to the real-world object to overcome this and allow much easier identification. Looking up articles on feature extraction techniques might help.