Convert wavefront obj mesh to custom mesh representation - graphics

Code that I'm trying to understand represents meshes using this structure:
struct vertex
{
    float v[3]; // vertex coords
    float t[3]; // texture coords
    float n[3]; // normals
};

typedef unsigned face[3];

struct mesh
{
    vector<vertex> vn;
    vector<face> fn;
};
Then I wrote a quick parser for the Wavefront OBJ mesh format. This is the sample mesh that I'm loading; it only has v, vn, vt and f elements. The important part of this sample is that every face uses matching indices for v/vt/vn, so I can load it directly into the struct mesh above:
f 2/2/2 2/2/2 1/1/1
f 1/1/1 2/2/2 3/3/3
...
Now I'm trying to figure out how to load an OBJ mesh where the v/vt/vn indices do not match. This second sample mesh supposedly represents the identical shape to the one above. As you can see, faces do not always have matching v/vt/vn triplets, like here:
f 3/3/3 1/1/1 2/2/2
f 2/2/2 1/1/1 6/12/6
...
f 3/15/3 10/13/10 9/14/9
...
It would be fine if each vertex index always appeared in the same triplet (e.g. always 3/15/3), but there are also different triplets for v=3, such as 3/3/3.
If I ignore the /vt/vn part for now I can still load the shape, but I lose the normals/texture coords and do not get a correct visual representation (the shape loses its textures and light reflection).
What should I do to load that mesh properly into my internal representation? Should I just create two vertices with identical coords, where one vertex would have vt=3, vn=3 and the other would have vt=15, vn=3?
(disclaimer: my experience in 3d graphics and meshes is about a day and a half)
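A minimal C++ sketch of that vertex-duplication idea (my own illustration, not the original parser; the OBJ line parsing and the position/texcoord/normal arrays are assumed to exist already): every distinct v/vt/vn triplet becomes one entry in mesh::vn, and faces then store those indices.
#include <array>
#include <map>
#include <vector>
using std::vector;

// struct vertex, face and mesh as defined at the top of the question.

// Turn one 1-based "v/vt/vn" triplet from an f-line into an index into out.vn,
// appending a new (possibly duplicated-position) vertex the first time the
// combination is seen.
unsigned resolveTriplet(const std::array<int,3>& idx,
                        const vector<std::array<float,3>>& positions,   // from v lines
                        const vector<std::array<float,3>>& texcoords,   // from vt lines
                        const vector<std::array<float,3>>& normals,     // from vn lines
                        std::map<std::array<int,3>, unsigned>& cache,
                        mesh& out)
{
    auto it = cache.find(idx);
    if (it != cache.end())
        return it->second;                       // this exact triplet already has a vertex

    vertex vtx;
    for (int i = 0; i < 3; ++i) {
        vtx.v[i] = positions[idx[0] - 1][i];     // OBJ indices are 1-based
        vtx.t[i] = texcoords[idx[1] - 1][i];
        vtx.n[i] = normals  [idx[2] - 1][i];
    }
    const unsigned newIndex = static_cast<unsigned>(out.vn.size());
    out.vn.push_back(vtx);
    cache.emplace(idx, newIndex);
    return newIndex;
}
In practice this means 3/3/3 and 3/15/3 become two vertices with identical coords but different texture coordinates, which is exactly the approach the question already suspects.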


Converting a series of depth maps and x, y, z, theta values into a 3D model

I have a quadrotor which flies around and knows its x, y, z positions and angular displacement along the x, y, z axis. It captures a constant stream of images which are converted into depth maps (we can estimate the distance between each pixel and the camera).
How can one program an algorithm which converts this information into a 3D model of the environment? That is, how can we generate a virtual 3D map from this information?
Example: below is a picture that illustrates what the quadrotor captures (top) and what the image is converted into to feed into a 3D mapping algorithm (bottom)
Let's suppose this image was taken from a camera with x, y, z coordinates (10, 5, 1) in some units and angular displacement of 90, 0, 0 degrees about the x, y, z axes. What I want to do is take a bunch of these photo-coordinate tuples and convert them into a single 3D map of the area.
Edit 1 on 7/30: One obvious solution is to use the angle of the quadrotor with respect to the x, y, and z axes together with the distance map to figure out the Cartesian coordinates of any obstructions with trigonometry. I figure I could probably write an algorithm which uses this approach with a probabilistic method to make a crude 3D map, possibly vectorizing it to make it faster.
However, I would like to know if there is any fundamentally different and hopefully faster approach to solving this?
Simply convert your data to Cartesian coordinates and store the result. Since you know the topology (the spatial relation between data points) of the input data, you can map it directly to a mesh/surface instead of to a PCL (point cloud), which would require triangulation or a convex hull, etc.
Your images suggest you have known topology (neighboring pixels are also neighbors in 3D), so you can construct the 3D mesh surface directly:
Align both the RGB and depth 2D maps.
In case this is not already done, see:
Align already captured rgb and depth images
Convert to the Cartesian coordinate system.
First we compute the position of each pixel in camera local space:
for each pixel (x,y) in the RGB map we look up its depth (distance to the camera focal point) and compute its 3D position relative to that focal point. For that we can use triangle similarity:
x = camera_focus.x + (pixel.x-camera_focus.x)*depth(pixel.x,pixel.y)/focal_length
y = camera_focus.y + (pixel.y-camera_focus.y)*depth(pixel.x,pixel.y)/focal_length
z = camera_focus.z + depth(pixel.x,pixel.y)
where pixel is the pixel's 2D position, depth(x,y) is the corresponding depth, and focal_length = znear is the fixed camera parameter (determining the FOV). camera_focus is the camera focal point position. Usually the camera focal point is in the middle of the camera image, at znear distance from the image (projection) plane.
As this is taken from a moving device, you need to convert this into some global coordinate system, using your camera position and orientation in space. For that, the best reference is:
Understanding 4x4 homogenous transform matrices
Construct the mesh.
As your input data are already spatially sorted, we can construct a QUAD grid directly: for each pixel take its neighbors and form QUADs. So if the 2D position (x,y) in your data is converted into a 3D position (x,y,z) with the approach described in the previous step, we can write it in the form of a function that returns the 3D position:
(x,y,z) = 3D(x,y)
Then QUADs can be formed like this:
QUAD( 3D(x,y), 3D(x+1,y), 3D(x+1,y+1), 3D(x,y+1) )
and we can loop over the whole map:
for (x=0;x<xs-1;x++)
 for (y=0;y<ys-1;y++)
  QUAD( 3D(x,y), 3D(x+1,y), 3D(x+1,y+1), 3D(x,y+1) );
where xs,ys is the resolution of your maps.
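As a rough C++ sketch of the two steps above (pixel-to-camera-space conversion plus the QUAD grid); the DepthMap layout, the emit_quad callback and the image-centered focal point are my assumptions, not part of the answer:
#include <functional>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical container: one depth value per pixel of an xs-by-ys map.
struct DepthMap {
    int xs, ys;
    std::vector<float> d;                        // row-major, d[y*xs + x]
    float at(int x, int y) const { return d[y * xs + x]; }
};

// Convert one pixel to camera-local Cartesian coordinates with the triangle
// similarity formulas above (focal point assumed at the image center).
Vec3 pixelTo3D(const DepthMap& m, int x, int y, float focal_length)
{
    const float cx = 0.5f * m.xs, cy = 0.5f * m.ys;
    const float depth = m.at(x, y);
    return { cx + (x - cx) * depth / focal_length,
             cy + (y - cy) * depth / focal_length,
             depth };
}

// Walk the grid and emit one QUAD per 2x2 pixel neighborhood.
void buildQuadGrid(const DepthMap& m, float focal_length,
                   const std::function<void(Vec3, Vec3, Vec3, Vec3)>& emit_quad)
{
    for (int x = 0; x < m.xs - 1; ++x)
        for (int y = 0; y < m.ys - 1; ++y)
            emit_quad(pixelTo3D(m, x,     y,     focal_length),
                      pixelTo3D(m, x + 1, y,     focal_length),
                      pixelTo3D(m, x + 1, y + 1, focal_length),
                      pixelTo3D(m, x,     y + 1, focal_length));
}
Each emitted QUAD is still in camera-local space; applying the camera's 4x4 pose matrix, as described in the transform step above, moves it into the global map.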
In case you do not know the camera properties, you can set focal_length to any reasonable constant (resulting in fish-eye effects and/or scaled output) or infer it from the input data; see:
Transformation of 3D objects related to vanishing points and horizon line

How to calculate correct plane-frustum intersection?

Question:
I need to calculate the intersection shape (purple) of a plane defined by Ax + By + Cz + D = 0 and a frustum defined by 4 rays emitting from the corners of a rectangle (red arrows). The result should be a quadrilateral (4 points), and an important requirement is that the result shape must be in the plane's local space. The plane is created with a transformation matrix T (the plane's normal is vec3(0, 0, 1) in T's space).
Explanation:
This is the perspective form of my rectangle projection to another space (transformation / matrix / node). I am able to calculate the intersection shape of any rectangle without perspective rays (all rays parallel) by a plane-line intersection algorithm (pseudocode):
Definitions:
// Plane defined by normal (A, B, C) and D
struct Plane { vec3 n; float d; };
// Line defined by 2 points
struct Line { vec3 a, b; };
Intersection:
vec3 PlaneLineIntersection(Plane plane, Line line) {
    vec3 ba = normalize(line.b - line.a);   // line direction
    float dotA  = dot(plane.n, line.a);
    float dotBA = dot(plane.n, ba);
    float t = (plane.d - dotA) / dotBA;     // breaks down when dotBA == 0 (line parallel to plane)
    return line.a + ba * t;
}
The perspective form comes with some problems, because some of the rays can be parallel to the plane (the intersection point is at infinity) or the final shape is self-intersecting. It works in some cases, but it's not enough for an arbitrary transformation. How do I get the correct intersection part of the plane with perspective?
Simply put, I need to get the visible part of an arbitrary plane as seen by an arbitrary perspective "camera".
Thank you for suggestions.
Intersection between a plane (one Ax+By+Cz+D = 0 equation) and a line (two plane equations) is a matter of solving the 3x3 system for x,y,z.
Doing all calculations in T-space (origin at the top of the pyramid) is easier, as some of A,B,C are 0.
What I don't know if you are aware of is that perspective is a kind of projection that distorts z ("depth", distance from the origin). So if the plane that contains the rectangle is not perpendicular to the axis of the frustum (the z-axis), then it's not a rectangle when projected onto the plane, but a trapezoid.
Anyhow, using the perspective projection matrix you can get projected coordinates for the four rectangle corners.
To tell which side of a plane a point is on, just plug the point's coordinates into the plane equation and check the sign, as shown here.
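A small C++ sketch of that per-corner test (my own construction, with the plane written as dot(n,p) + d = 0 and the parallel-ray case handled explicitly rather than solved as a 3x3 system):
#include <cmath>
#include <optional>

struct Vec3 { float x, y, z; };

static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

struct Plane { Vec3 n; float d; };   // points p with dot(n, p) + d == 0

// Which side of the plane a point is on: the sign of the plane equation.
float planeSide(const Plane& p, Vec3 pt) { return dot(p.n, pt) + p.d; }

// Intersect a ray (origin + t*dir, t >= 0) with the plane.
// Returns nothing when the ray is (near) parallel or the hit lies behind the origin.
std::optional<Vec3> rayPlane(const Plane& p, Vec3 origin, Vec3 dir)
{
    const float denom = dot(p.n, dir);
    if (std::fabs(denom) < 1e-6f) return std::nullopt;   // parallel: intersection at infinity
    const float t = -(dot(p.n, origin) + p.d) / denom;
    if (t < 0.0f) return std::nullopt;                   // plane is behind the ray origin
    return Vec3{ origin.x + t*dir.x, origin.y + t*dir.y, origin.z + t*dir.z };
}
Corner rays that return no hit are exactly the parallel/at-infinity cases mentioned in the question; those sides of the visible region have to come from clipping rather than from a direct intersection point.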
Your question seems inherently mathematical, so excuse my mathematical solution on StackOverflow. If your four arrows emit from a single point and the formed side planes share a common angle, then you are looking for a solution to the frustum projection problem. Your requirements simplify the problem quite a bit because you define the plane with a normal, not two bounded vectors, so if you agree to the definitions...
then I can provide you with the mathematical solution here (an Internet Explorer .mht file, possibly requiring a modern Windows OS). If you are thinking about an actual implementation, then I can only direct you to a very similar frustum projection implementation that I have uploaded here (Lua): https://github.com/quiret/mta_lua_3d_math
The roadmap for the implementation could be as follows: create condition container classes for all sub-problems (0 < k1*a1 + k2, etc.) plus the and/or chains, write algorithms for the comparisons across and-chains as well as for normal-form creation, then optimize object construction/memory allocation. Since each check for frustum intersection requires just a fixed number of algebraic objects, you can implement an efficient cache.

Understanding What a TextureBlitter is in this Haskell Graphics Program

In a private window manager/compositor Haskell repository I have come across the following datatype which I am trying to understand:
data TextureBlitter = TextureBlitter {
_textureBlitterProgram :: Program, -- OpenGL Type
_textureBlitterVertexCoordEntry :: AttribLocation, -- OpenGL Type
_textureBlitterTextureCoordEntry :: AttribLocation, -- OpenGL Type
_textureBlitterMatrixLocation :: UniformLocation -- OpenGL Type
} deriving Eq
The types Program, AttribLocation, and UniformLocation are from this OpenGL library.
The Problem: I cannot find good information online about what the concept of a "texture blitter" is. So I'm hoping that people with more expertise might immediately have a good guess as to what this type is (probably) used for.
I'm assuming that the field _textureBlitterProgram :: Program is an OpenGL shader program. But what about the other entries? And what is a TextureBlitter as a whole supposed to represent?
EDIT: I have discovered shaders with the same name in my repo:
//textureblitter.vert
#version 300 es
precision highp float;
uniform highp mat4 matrix;
in highp vec3 vertexCoordEntry;
in highp vec2 textureCoordEntry;
out highp vec2 textureCoord;
void main() {
textureCoord = textureCoordEntry;
gl_Position = matrix * vec4(vertexCoordEntry, 1.);
}
and
//textureblitter.frag
#version 300 es
precision highp float;
uniform sampler2D uTexSampler;
in highp vec2 textureCoord;
out highp vec4 fragmentColor;
void main() {
fragmentColor = texture(uTexSampler, textureCoord);
}
I don't use Haskell or its OpenGL package, but the names and shaders you show are pretty descriptive. I'll try to explain what a texture is in OpenGL parlance.
Let's say you have a picture of size width x height, and suppose it's stored in a two-dimensional matrix of size [w,h].
Instead of accessing a pixel in that matrix by its a,b coordinates, let's use normalized coordinates (i.e. in the [0-1] range): u = a/w and v = b/h. These formulas need u and v to be floats, so no rounding to integer is done.
Using u,v coordinates allows us to access any pixel in a "generic" matrix.
Now you want to show that picture on the screen. Its rectangle can be scaled, rotated or even deformed by a perspective projection. Somehow you know the final four coordinates of that rectangle.
If you also use normalized coordinates (again in the [0-1] range), then a mapping between picture coordinates and rectangle coordinates makes the picture adjust to the (likely deformed) rectangle.
This is how OpenGL works. You pass the vertices of the rectangle and compute their normalized final coordinates with some matrix. You also pass the picture matrix (called a texture) and map it to those final coordinates.
The program where all of this computing and mapping is done is a shader, which is usually composed of two sub-shaders: a Vertex Shader that works vertex by vertex (the VS runs exactly once per vertex), and a Fragment Shader that works with fragments (interpolated points between vertices).
TextureBlitter, or "an object that blits a picture onto the screen":
You set the program (shader) to use. You can have several shaders with different effects (e.g. modifying the colors of the picture); just select one.
You set the vertices. The AttribLocation represents the point of connection between your vertices and the shader that uses them (an attribute in shader parlance).
The same goes for the "picture" (texture) coordinates.
You set the matrix that transforms the vertices. Because it's the same for all vertices, another type of connection is used: UniformLocation (a uniform in shader parlance).
I suppose you can find a good tutorial with examples of how to set up and use this "texture blitter".
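For a concrete picture of how those four fields would typically be used each frame, here is a rough C-style OpenGL ES sketch (my own guess at the usage pattern, not code from the repository; the VBO layout and the rest of the setup are assumed):
#include <GLES3/gl3.h>   /* or your platform's OpenGL (ES) header */

/* Assumed inputs: the four handles held by TextureBlitter, plus a VBO that
   interleaves vec3 positions with vec2 texture coordinates, and the texture. */
void blitTexture(GLuint program,
                 GLint vertexCoordLoc, GLint textureCoordLoc, GLint matrixLoc,
                 GLuint vbo, GLuint texture, const GLfloat mvp[16])
{
    glUseProgram(program);                                 /* _textureBlitterProgram */
    glBindBuffer(GL_ARRAY_BUFFER, vbo);

    /* _textureBlitterVertexCoordEntry: feeds "vertexCoordEntry" in the vertex shader */
    glEnableVertexAttribArray((GLuint)vertexCoordLoc);
    glVertexAttribPointer((GLuint)vertexCoordLoc, 3, GL_FLOAT, GL_FALSE,
                          5 * sizeof(GLfloat), (const void*)0);

    /* _textureBlitterTextureCoordEntry: feeds "textureCoordEntry" */
    glEnableVertexAttribArray((GLuint)textureCoordLoc);
    glVertexAttribPointer((GLuint)textureCoordLoc, 2, GL_FLOAT, GL_FALSE,
                          5 * sizeof(GLfloat), (const void*)(3 * sizeof(GLfloat)));

    /* _textureBlitterMatrixLocation: the "matrix" uniform */
    glUniformMatrix4fv(matrixLoc, 1, GL_FALSE, mvp);

    glBindTexture(GL_TEXTURE_2D, texture);                 /* sampled by uTexSampler */
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);                 /* one textured quad */
}
In the Haskell record, the four fields are just the handles needed to repeat this sequence: the compiled Program, the two AttribLocations matching vertexCoordEntry and textureCoordEntry, and the UniformLocation of matrix.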

What does face list represent?

I know that in mesh representation it is common to use three lists:
Vertex list: all the vertices. This is easy to understand.
Normal list: normals for each surface, I guess?
And the face list: I have no idea what it represents, and I don't know how to calculate it.
For example, this is a mesh describing a triangular prism I found online.
double vertices[][] = {{0,1,-1},
{-0.5,0,-1},
{0.5,0,-1},
{0,1,-3},
{-0.5,0,-3},
{0.5,0,-3},
};
int faces[][] = {{0,1,2}, //front
{3,5,4}, //back
{1,4,5,2},//base
{0,3,4,1}, //left side
{0,2,5,3} //right side
};
double normals[][] = { {0,0,1}, //front face
{0,0,-1}, //back face
{0,-1,0}, //base
{-2.0/Math.sqrt(5),1.0/Math.sqrt(5),0}, //left
{2.0/Math.sqrt(5),1.0/Math.sqrt(5),0} //right
};
Why are there 4 elements in the base, left and right faces, but only 3 in the front and back faces? How do I calculate them manually?
Usually, the faces array stores, for each face, indices into the vertices array. So the first face is a triangle consisting of vertices[0], vertices[1] and vertices[2]; the second consists of vertices[3], vertices[5] and vertices[4]; and so on.
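To make the indexing concrete, here is a small self-contained sketch (my own illustration, in C++ rather than the question's Java) that walks the prism's face list and prints the coordinates of every corner, treating 4-index entries simply as quads:
#include <cstddef>
#include <cstdio>
#include <vector>

int main()
{
    // The prism data from the question, with the quads kept as 4-index faces.
    double vertices[6][3] = { {0,1,-1}, {-0.5,0,-1}, {0.5,0,-1},
                              {0,1,-3}, {-0.5,0,-3}, {0.5,0,-3} };
    std::vector<std::vector<int>> faces = { {0,1,2}, {3,5,4},
                                            {1,4,5,2}, {0,3,4,1}, {0,2,5,3} };

    for (std::size_t f = 0; f < faces.size(); ++f) {
        std::printf("face %zu:", f);
        for (int idx : faces[f])                 // each entry is an index into vertices
            std::printf(" (%g, %g, %g)",
                        vertices[idx][0], vertices[idx][1], vertices[idx][2]);
        std::printf("\n");
    }
}
So a face with 3 indices is a triangle and one with 4 indices is a quad; renderers that only accept triangles split each quad into two triangles.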
For triangular meshes, a face is a triangle defined by 3 vertices. Normally, a mesh is composed of a list of n vertices and m faces. For example:
Vertices:
Point_0 = {0,0,0}
Point_1 = {2,0,0}
Point_3 = {0,2,0}
Point_4 = {0,3,0}
...
Point_n = {30,0,0}
Faces:
Face_0 = {Point_1, Point_4, Point_5}
Face_1 = {Point_2, Point_4, Point_7}
...
Face_m = {Point_1, Point_2, Point_n}
For the sake of brevity, you can define Face_0 as a set of indices: {1,4,5}.
In addition, the normal vector is computed as a cross product of two edge vectors of the face. By convention, the normal vector points towards the outside of the mesh. For example:
normal_face_0 = CrossProduct( (Point_4 - Point_1), (Point_5 - Point_4) )
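A minimal C++ sketch of that cross-product computation (my own illustration):
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
Vec3 cross(Vec3 a, Vec3 b) { return { a.y*b.z - a.z*b.y,
                                      a.z*b.x - a.x*b.z,
                                      a.x*b.y - a.y*b.x }; }

// Unit normal of the face (p0, p1, p2), following the face's winding order.
Vec3 faceNormal(Vec3 p0, Vec3 p1, Vec3 p2)
{
    Vec3 n = cross(sub(p1, p0), sub(p2, p1));        // edge (p0->p1) x edge (p1->p2)
    double len = std::sqrt(n.x*n.x + n.y*n.y + n.z*n.z);
    return { n.x / len, n.y / len, n.z / len };      // assumes the face is not degenerate
}
Applied to the prism's front face {0,1,2} from the question, this yields (0,0,1), matching the first entry of its normals array.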
In your case, it is quite weird to see four indices in a face definition. Normally, there should be only 3 items in the array. Are you sure this is not a mistake?

3D Graphics Algorithms (Hardware)

I am trying to design an ASIC graphics processor. I have done extensive research on the topic, but I am still kind of fuzzy on how to translate and rotate points. I am using orthographic projection to rasterize the transformed points.
I have been using the following lecture regarding the matrix multiplication (homogeneous coordinates):
http://www.cs.kent.edu/~zhao/gpu/lectures/Transformation.pdf
Could someone please explain this in a little more depth to me? I am still somewhat shaky on the algorithm. I am passing a camera position (x,y,z) and a camera vector (x,y,z) representing the camera angle, along with a point (x,y,z). What should go where within the matrices to transform the point to its new location?
Here's the complete transformation algorithm in pseudocode:
void project(Vec3d objPos, Matrix4d modelViewMatrix,
             Matrix4d projMatrix, Rect viewport, Vec3d& winCoords)
{
    Vec4d in(objPos.x, objPos.y, objPos.z, 1.0);
    in = projMatrix * modelViewMatrix * in;
    in /= in.w; // perspective division
    // "in" is now in normalized device coordinates, which are in the range [-1, 1].

    // Map coordinates to range [0, 1]
    in.x = in.x / 2 + 0.5;
    in.y = in.y / 2 + 0.5;
    in.z = in.z / 2 + 0.5;

    // Map to viewport
    winCoords.x = in.x * viewport.w + viewport.x;
    winCoords.y = in.y * viewport.h + viewport.y;
    winCoords.z = in.z;
}
Then rasterize using winCoords.x and winCoords.y.
For an explanation of the stages of this algorithm, see question 9.011 from the OpenGL FAQ.
For the first few years they were on sale, mass-market PC graphics processors didn't translate or rotate points at all. Are you required to implement this feature? If not, you may wish to let software do it. Depending on your circumstances, software may be the more sensible route.
If you are required to implement the feature, I'll tell you how they did it in the early days.
The hardware has sixteen floating point registers that represent a 4x4 matrix. The application developer loads these registers with the ModelViewProjection matrix just before rendering a mesh of triangles. The ModelViewProjection matrix is:
Model * View * Projection
Where "Model" is a matrix that brings vertices from "model" coordinates into "world" coordinates, "View" is a matrix that brings vertices from "world" coordinates into "camera" coordinates, and "Projection" is a matrix that brings vertices from "camera" coordinates to "screen" coordinates. Together they bring vertices from "model" coordinates - coordinates relative to the 3D model they belong to - into "screen" coordinates, where you intend to rasterize them as triangles.
Those are three different matrices, but they're multiplied together and the 4x4 result is written to hardware registers.
When a buffer of vertices is to be rendered as triangles, the hardware reads in each vertex as an [x,y,z] vector from memory and treats it as [x,y,z,w] with w always 1. It then multiplies each vector by the 4x4 ModelViewProjection matrix to get [x',y',z',w']. If there is perspective (you said there isn't), we then divide by w' to get [x'/w', y'/w', z'/w', 1].
Then triangles are rasterized with the newly computed vertices. This enables a model's vertices to be in read-only memory if desired, though the model and camera may be in motion.
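As a software model of that per-vertex hardware stage (a sketch with my own naming, not a description of any particular chip), the work amounts to a 4x4 multiply with an implicit w = 1, followed by an optional divide:
struct Vec4 { float x, y, z, w; };

// Row-major 4x4 matrix: m[row][col]. Loaded once per mesh, like the 16 hardware registers.
struct Mat4 { float m[4][4]; };

// Treat the incoming [x,y,z] as [x,y,z,1] and multiply by the ModelViewProjection matrix.
Vec4 transformVertex(const Mat4& mvp, float x, float y, float z)
{
    const float in[4] = { x, y, z, 1.0f };
    float out[4];
    for (int row = 0; row < 4; ++row) {
        out[row] = 0.0f;
        for (int col = 0; col < 4; ++col)
            out[row] += mvp.m[row][col] * in[col];
    }
    return { out[0], out[1], out[2], out[3] };
}

// Optional perspective division; for a purely orthographic projection w' stays 1,
// so this step changes nothing.
Vec4 perspectiveDivide(Vec4 v)
{
    return { v.x / v.w, v.y / v.w, v.z / v.w, 1.0f };
}
With an orthographic projection, as in the question, the matrix leaves w' at 1, so the divide is a no-op and the result can go straight to the viewport mapping shown in the project() pseudocode above.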
