I am trying to understand what the up vector is in ray tracing, but I'm not sure I get it correctly. Is the up vector used to construct the image plane? I can get the "right vector" by taking the cross product of the up vector and the forward vector (= target - origin). Does it have any other purpose? A good example would help me understand it.
Let's review how you construct these vectors:
First of all, your forward vector is straightforward: vec3 forward = vec3(camPos - target), i.e. the opposite of the direction your camera is facing, where target is a point in 3D space and camPos is the current position of your camera.
Now, you can't define a Cartesian coordinate system with only one vector, so we need two more to describe the coordinate system of the camera / the view of the ray tracer. Let's find a vector that is perpendicular to the forward vector. Such a vector can be found with vec3 v = cross(anyVec, forward). In fact, you could use almost any vector (anything except a vector parallel to forward and the null vector) to get a second direction, but that is not convenient. We want that, when looking along the z-axis (0, 0, 1), "right" comes out as (1, 0, 0). This holds for vec3 right = cross(yAxis, forward) with vec3 yAxis = (0, 1, 0). If you change your forward vector so you are no longer looking along the z-axis, your right vector changes too, similar to how your "right" changes when you change your orientation.
Now only one vector is left to describe the orientation of our camera. It has to be perpendicular to both the right and forward vectors. Such a vector can be found with vec3 up = cross(forward, right). It describes what "up" means for the camera.
Please note that the forward, right and up vectors need to be normalized.
If you were to do a handstand, your up vector would be (0, -1, 0), meaning you would see everything upside down, while the other two vectors stay exactly the same. If you look straight down at the floor at a 90° angle, your up vector becomes (0, 0, 1), your forward vector (0, 1, 0), and your right vector stays at (1, 0, 0). So the function, or "motivation", of the up vector is that we need it to describe the orientation of your camera; you need it to "nod" the camera, i.e. adjust its pitch.
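Putting those three steps together, a minimal C++/GLM sketch might look like this (camPos, target and the struct/function names are just illustrative, not taken from the question):

#include <glm/glm.hpp>

// Orthonormal camera basis built from a position and a target point.
struct CameraBasis {
    glm::vec3 forward; // points from the target back toward the camera
    glm::vec3 right;
    glm::vec3 up;      // the camera's own "up", not the world up
};

CameraBasis makeBasis(const glm::vec3& camPos, const glm::vec3& target)
{
    const glm::vec3 worldUp(0.0f, 1.0f, 0.0f); // reference used to derive "right"

    CameraBasis b;
    b.forward = glm::normalize(camPos - target);                 // opposite of the viewing direction
    b.right   = glm::normalize(glm::cross(worldUp, b.forward));  // degenerate if forward is parallel to worldUp
    b.up      = glm::cross(b.forward, b.right);                  // already unit length
    return b;
}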
Is the up vector used to get the image plane?
The up vector is used to describe the orientation of your camera, i.e. the view into your scene. I assume you are using a view matrix produced with the "look at" method. When applying the view matrix you are not rotating your camera; you are rotating all the objects/vertices. The camera always faces the same direction (normally along the negative z-axis). For more information see: https://www.scratchapixel.com/lessons/3d-basic-rendering/ray-tracing-generating-camera-rays/generating-camera-rays
The matrix that transforms world-space coordinates into view space is called the view matrix. A look-at matrix is a view matrix constructed from an eye point, a look direction (or target point), and an up vector.
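For concreteness, a hand-rolled right-handed look-at matrix built from exactly those three inputs could look like the sketch below (equivalent in spirit to what glm::lookAt does internally; the function name is made up for illustration):

#include <glm/glm.hpp>

glm::mat4 lookAtRH(const glm::vec3& eye, const glm::vec3& target, const glm::vec3& up)
{
    const glm::vec3 f = glm::normalize(target - eye);       // viewing direction
    const glm::vec3 r = glm::normalize(glm::cross(f, up));  // camera right
    const glm::vec3 u = glm::cross(r, f);                   // camera up

    glm::mat4 m(1.0f);                      // GLM is column-major: m[col][row]
    // Rotation part: the camera basis written as rows.
    m[0][0] = r.x;  m[1][0] = r.y;  m[2][0] = r.z;
    m[0][1] = u.x;  m[1][1] = u.y;  m[2][1] = u.z;
    m[0][2] = -f.x; m[1][2] = -f.y; m[2][2] = -f.z;
    // Translation part: move the eye to the origin.
    m[3][0] = -glm::dot(r, eye);
    m[3][1] = -glm::dot(u, eye);
    m[3][2] =  glm::dot(f, eye);
    return m;
}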
Does the technique that Vulkan uses (and, I assume, other graphics libraries too) to interpolate vertex attributes in a perspective-correct manner require that the vertex shader normalize the homogeneous camera-space vertex position (i.e. divide through by the w-coordinate so that the w-coordinate is 1.0) prior to multiplication by a typical projection matrix of the form...
g/s 0 0 0
0 g 0 n
0 0 f/(f-n) -nf/(f-n)
0 0 1 0
...in order for perspective-correctness to work properly?
Or, will perspective-correctness continue to work on any homogeneous vertex position in camera-space (with a w-coordinate other than 1.0)?
(I didn't completely follow the perspective-correctness math, so it is unclear to me which is the case.)
Update:
In order to clarify terminology:
vec4 modelCoordinates = vec4(x_in, y_in, z_in, 1);
mat4 modelToWorld = ...;
vec4 worldCoordinates = modelToWorld * modelCoordinates;
mat4 worldToCamera = ...;
vec4 cameraCoordinates = worldToCamera * worldCoordinates;
mat4 cameraToProjection = ...;
vec4 clipCoordinates = cameraToProjection * cameraCoordinates;
output(clipCoordinates);
cameraToProjection is a matrix like the one shown in the question
The question is does cameraCoordinates.w have to be 1.0?
And, consequently, does the last row of both the modelToWorld and worldToCamera matrices have to be 0 0 0 1?
You have this exactly backwards. Doing the perspective divide in the shader is what prevents perspective-correct interpolation. The rasterizer needs the perspective information provided by the W component to do its job. With a W of 1, the interpolation is done in window space, without any regard to perspective.
Provide a clip-space coordinate to the output of your vertex processing stage, and let the system do what it exists to do.
the vertex shader must normalize the homogeneous camera-space vertex position (i.e. divide through by the w-coordinate such that the w-coordinate is 1.0) prior to multiplication by a typical projection matrix of the form...
If your camera-space vertex position does not have a W of 1.0, then one of two things has happened:
You are deliberately operating in a post-projection world space or some similar construct. This is a perfectly valid thing to do, and the math for a camera space can be perfectly reasonable.
Your code is broken somewhere. That is, you intend for your world and camera space to be a normal, Euclidean, non-homogeneous space, but somehow the math didn't work out. Obviously, this is not a perfectly valid thing to do.
In both cases, dividing by W is the wrong thing to do. If your world space that you're placing a camera into is post-projection (such as in this example), dividing by W will break your perspective-correct interpolation, as outlined above. If your code is broken, dividing by W will merely mask the actual problem; better to fix your code than to hide the bug, as it may crop up elsewhere.
To see whether or not the camera coordinates need to be in normal form, let's represent the camera coordinates as multiples of w, so they are (wx,wy,wz,w).
Multiplying through by the given projection matrix, we get the clip coordinates (wxg/s, wyg, fwz/(f-n) - nfw/(f-n), wz).
Calculating the x-y framebuffer coordinates as per the fixed Vulkan formula, we get (P_x * xg/(sz) + O_x, P_y * gy/z + O_y). Notice this does not depend on w, so the framebuffer position of a polygon's vertices doesn't require the camera coordinates to be in normal form.
Likewise, the calculation of the barycentric coordinates of fragments within a polygon depends only on the x, y framebuffer coordinates, and so is also independent of w.
However, perspective-correct interpolation of fragment attributes does depend on the W_clip of the vertices, as this is used in the formula given in the Vulkan spec. As shown above, W_clip is wz, which depends on w and scales with it, so we can conclude that the camera coordinates must be in normal form (their w must be 1.0).
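If you want to convince yourself numerically, the sketch below (C++/GLM; the concrete values of g, s, n, f are arbitrary placeholders, only the (0, 0, 1, 0) bottom row matters) multiplies such a projection matrix by a camera-space point once with w = 1 and once uniformly scaled by w, and compares the resulting clip-space w:

#include <glm/glm.hpp>
#include <cstdio>

int main()
{
    // A projection matrix with the structure from the question; GLM is column-major: P[col][row].
    const float g = 1.0f, s = 1.0f, n = 0.1f, f = 100.0f;
    glm::mat4 P(0.0f);
    P[0][0] = g / s;
    P[1][1] = g;
    P[2][2] = f / (f - n);
    P[3][2] = -n * f / (f - n);
    P[2][3] = 1.0f;                                        // bottom row (0, 0, 1, 0)

    const glm::vec4 camNormal(2.0f, 3.0f, 5.0f, 1.0f);     // w = 1
    const glm::vec4 camScaled = 4.0f * camNormal;          // same projective point, w = 4

    const glm::vec4 clipA = P * camNormal;
    const glm::vec4 clipB = P * camScaled;

    // clip.w equals camera-space z times the incoming w (W_clip = wz), so it scales
    // with w even though the projected position clip.xy / clip.w stays the same.
    std::printf("clip w: %f vs %f\n", clipA.w, clipB.w);   // 5 vs 20
    return 0;
}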
I have a quadrotor which flies around and knows its x, y, z position and its angular displacement about the x, y, z axes. It captures a constant stream of images, which are converted into depth maps (we can estimate the distance between each pixel and the camera).
How can one program an algorithm which converts this information into a 3D model of the environment? That is, how can we generate a virtual 3D map from this information?
Example: below is a picture that illustrates what the quadrotor captures (top) and what the image is converted into to feed into a 3D mapping algorithm (bottom)
Let's suppose this image was taken from a camera with x, y, z coordinates (10, 5, 1) in some units and angular displacement of 90, 0, 0 degrees about the x, y, z axes. What I want to do is take a bunch of these photo-coordinate tuples and convert them into a single 3D map of the area.
Edit 1 on 7/30: One obvious solution is to use the angles of the quadrotor with respect to the x, y, and z axes together with the distance map to figure out the Cartesian coordinates of any obstructions with trigonometry. I figure I could probably write an algorithm which uses this approach together with a probabilistic method to make a crude 3D map, possibly vectorizing it to make it faster.
However, I would like to know if there is any fundamentally different and hopefully faster approach to solving this?
Simply convert your data to Cartesian coordinates and store the result. As you have known topology (spatial relation between data points) of the input data, this can be mapped directly to a mesh/surface instead of to a point cloud (which would require triangulation or a convex hull, etc.).
Your images suggest you have known topology (neighboring pixels are also neighbors in 3D), so you can construct a 3D mesh surface directly:
align both RGB and Depth 2D maps.
In case this is not already done see:
Align already captured rgb and depth images
convert to Cartesian coordinate system.
First we compute the position of each pixel in camera local space:
For each pixel (x,y) in the RGB map, we look up the depth distance to the camera focal point and compute the 3D position relative to it. For that we can use triangle similarity:
x = camera_focus.x + (pixel.x-camera_focus.x)*depth(pixel.x,pixel.y)/focal_length
y = camera_focus.y + (pixel.y-camera_focus.y)*depth(pixel.x,pixel.y)/focal_length
z = camera_focus.z + depth(pixel.x,pixel.y)
where pixel is the pixel's 2D position, depth(x,y) is the corresponding depth, and focal_length = znear is the fixed camera parameter (determining the FOV). camera_focus is the camera focal point position. It is usual for the camera focal point to be in the middle of the camera image, znear away from the image (projection) plane.
As this is taken from a moving device, you need to convert this into some global coordinate system (using your camera position and orientation in space); a combined sketch is given after the link below. The best reference for that is:
Understanding 4x4 homogenous transform matrices
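Under those assumptions (pinhole model, principal point in the middle of the image, focal_length = znear in pixels), one possible per-pixel conversion is sketched below in C++/GLM. The names principal, focalLength and cameraToWorld are placeholders, and the camera_focus offset from the formulas above is folded into the cameraToWorld pose matrix built from the quadrotor's position and orientation:

#include <glm/glm.hpp>

// Convert one depth-map pixel into a world-space point.
glm::vec3 pixelToWorld(int px, int py, float depth,
                       const glm::vec2& principal, float focalLength,
                       const glm::mat4& cameraToWorld)
{
    // Camera-local position via triangle similarity (see the formulas above).
    glm::vec3 local;
    local.x = (px - principal.x) * depth / focalLength;
    local.y = (py - principal.y) * depth / focalLength;
    local.z = depth;

    // Move from the camera's local frame into the global map frame.
    return glm::vec3(cameraToWorld * glm::vec4(local, 1.0f));
}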
construct mesh
As your input data are already spatially sorted, we can construct a QUAD grid directly. Simply take each pixel and its neighbors and form QUADs. So, if the 2D position (x,y) in your data is converted into 3D (x,y,z) with the approach described in the previous bullet, we can write it in the form of a function that returns a 3D position:
(x,y,z) = 3D(x,y)
Then I can form QUADS like this:
QUAD( 3D(x,y),3D(x+1,y),3D(x+1,y+1),3D(x,y+1) )
we can use for loops:
for (x=0;x<xs-1;x++)
for (y=0;y<ys-1;y++)
QUAD( 3D(x,y),3D(x+1,y),3D(x+1,y+1),3D(x,y+1) )
where xs, ys are the resolutions of your maps; a fleshed-out version of this loop follows below.
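If the target renderer expects triangles and an index buffer rather than immediate-mode QUADs, the same double loop can be written as in the sketch below (the row-major vertex indexing and the Mesh struct are assumptions for illustration only):

#include <vector>
#include <glm/glm.hpp>

struct Mesh {
    std::vector<glm::vec3>    vertices;
    std::vector<unsigned int> indices;
};

// points holds the already converted 3D positions, row-major: index = y*xs + x.
Mesh gridToMesh(const std::vector<glm::vec3>& points, int xs, int ys)
{
    Mesh m;
    m.vertices = points;
    for (int y = 0; y < ys - 1; ++y)
        for (int x = 0; x < xs - 1; ++x) {
            unsigned int i00 =  y      * xs + x;
            unsigned int i10 =  y      * xs + x + 1;
            unsigned int i01 = (y + 1) * xs + x;
            unsigned int i11 = (y + 1) * xs + x + 1;
            // QUAD( 3D(x,y), 3D(x+1,y), 3D(x+1,y+1), 3D(x,y+1) ) as two triangles.
            m.indices.insert(m.indices.end(), { i00, i10, i11 });
            m.indices.insert(m.indices.end(), { i00, i11, i01 });
        }
    return m;
}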
In case you do not know the camera properties, you can set focal_length to any reasonable constant (resulting in fish-eye effects and/or scaled output) or infer it from the input data; see:
Transformation of 3D objects related to vanishing points and horizon line
Problem:
Vulkan's right-handed coordinate system becomes a left-handed coordinate system after applying the projection matrix. How can I make it consistent with the Vulkan coordinate system?
Details:
I know that Vulkan uses a right-handed coordinate system where
X+ points toward right
Y+ points toward down
Z+ points toward inside the screen
I've this line in the vertex shader: https://github.com/AndreaCatania/HelloVulkan/blob/master/shaders/shader.vert#L23
gl_Position = scene.cameraProjection * scene.cameraView * meshUBO.model * vec4(vertexPosition, 1.0);
At this point: https://github.com/AndreaCatania/HelloVulkan/blob/master/main.cpp#L62-L68 I'm defining the position of the camera at the center of the scene and the position of the box at (4, 4, -10) in world space.
The result is this:
As you can see in the picture above, I'm getting Z- pointing into the screen, but it should be positive.
Is this expected and I need to add something more, or did I do something wrong?
Useful part of code:
Projection calculation: https://github.com/AndreaCatania/HelloVulkan/blob/master/VisualServer.cpp#L88-L98
void Camera::reloadProjection(){
projection = glm::perspectiveRH_ZO(FOV, aspect, near, far);
isProjectionDirty = false;
}
Camera UBO fill: https://github.com/AndreaCatania/HelloVulkan/blob/master/VisualServer.cpp#L403-L414
SceneUniformBufferObject sceneUBO = {};
sceneUBO.cameraView = camera.transform;
sceneUBO.cameraProjection = camera.getProjection();
I do not use or know Vulkan, but the perspective projection matrix (at least in OpenGL) looks in the Z- direction, which inverts one axis of your coordinate system. That inverts the winding rule of the coordinate system.
If you want to preserve the original winding, then just invert the Z-axis vector in the matrix. For more info see:
Understanding 4x4 homogenous transform matrices
So just scale the Z axis by -1, either by an analogy to glScale(1.0, 1.0, -1.0); or by direct access to the matrix cells; a sketch follows.
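For example, with GLM one way to express that flip is the sketch below (assuming a glm::mat4 projection; whether you want the flip at all depends on the winding/handedness convention the rest of your pipeline assumes):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Post-multiply the projection by a (1, 1, -1) scale, the analogue of glScale(1.0, 1.0, -1.0).
glm::mat4 flipZ(const glm::mat4& proj)
{
    const glm::mat4 zFlip = glm::scale(glm::mat4(1.0f), glm::vec3(1.0f, 1.0f, -1.0f));
    return proj * zFlip;
}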
All of the OpenGL left-handed vs. Vulkan right-handed coordinate-system difference happens in NDC space; it means your view matrix doesn't care.
If you are using GLM, everything you do in world space or view space is done in a right-handed coordinate system; GLM, a very popular math library, uses a right-handed coordinate system by default.
Your view matrix must be set accordingly. The only way to get a right-handed system with x going from left to right and y from bottom to top is to have your z looking direction pointing toward negative values. If you don't provide a right-handed system to your glm::lookAt call, glm will effectively convert it, with one of your axes getting flipped via a series of glm::cross calls; see the glm source code.
the proper way:
glm::vec3 eye = glm::vec3(0, 0, 10);
glm::vec3 up = glm::vec3(0, 1, 0);
glm::vec3 center = glm::vec3(0, 0, 0);
// looking in the negative z direction
glm::mat4 viewMat = glm::lookAt(eye, center, up);
Personally, I store all the information for the coordinate-system conversion in the projection matrix, because by default glm does it for you for the z coordinate.
from songho: http://www.songho.ca/opengl/gl_projectionmatrix.html
Note that the eye coordinates are defined in the right-handed coordinate system, but NDC uses the left-handed coordinate system. That is, the camera at the origin is looking along -Z axis in eye space, but it is looking along +Z axis in NDC. Since glFrustum() accepts only positive values of near and far distances, we need to NEGATE them during the construction of GL_PROJECTION matrix.
Because we are looking in the negative z direction, glm negates the sign by default.
It turns out that the y coordinate is flipped between Vulkan and OpenGL, so everything will get turned upside down. One way to resolve the problem is to negate the y values as well:
glm::mat4 projection = glm::perspective(glm::radians(verticalFov), screenDimension.x / screenDimension.y, near, far);
// Vulkan NDC space points downward by default everything will get flipped
projection[1][1] *= -1.0f;
If you follow the above steps, you should end up with something very similar to old OpenGL applications, and with the up vector of your camera having the same sign as most 3D models.
Question:
I need to calculate the intersection shape (purple) of a plane defined by Ax + By + Cz + D = 0 and a frustum defined by 4 rays emitted from the corners of a rectangle (red arrows). The result should be a quadrilateral (4 points), and an important requirement is that the resulting shape must be in the plane's local space. The plane is created with a transformation matrix T (the plane's normal is vec3(0, 0, 1) in T's space).
Explanation:
This is the perspective form of my rectangle projection into another space (transformation / matrix / node). I am able to calculate the intersection shape of any rectangle without perspective rays (all rays parallel) with a plane-line intersection algorithm (pseudocode):
Definitions:
// Plane defined by normal (A, B, C) and D
struct Plane { vec3 n; float d; };
// Line defined by 2 points
struct Line { vec3 a, b; };
Intersection:
vec3 PlaneLineIntersection(Plane plane, Line line) {
    // direction of the line, normalized
    vec3 ba = normalize(line.b - line.a);
    float dotA  = dot(plane.n, line.a);
    float dotBA = dot(plane.n, ba);
    float t = (plane.d - dotA) / dotBA;
    return line.a + ba * t;
}
The perspective form comes with some problems, because some of the rays could be parallel to the plane (the intersection point is at infinity), or the final shape is self-intersecting. It works in some cases, but it's not enough for an arbitrary transformation. How do I get the correct intersection part of the plane with perspective?
Simply put, I need to get the visible part of an arbitrary plane as seen by an arbitrary perspective "camera".
Thank you for suggestions.
The intersection between a plane (one Ax + By + Cz + D = 0 equation) and a line (two plane equations) is a matter of solving a 3x3 linear system for x, y, z.
Doing all calculations in T-space (the origin is at the apex of the pyramid) is easier, as some of A, B, C are 0.
What I don't know is whether you are aware that perspective is a kind of projection that distorts z ("depth", distance from the origin). So if the plane that contains the rectangle is not perpendicular to the axis of the frustum (the z-axis), then it's not a rectangle when projected onto the plane, but a trapezoid.
Anyhow, using the perspective projection matrix you can get projected coordinates for the four rectangle corners.
To tell whether a point is on one side of a plane or the other, just plug the point's coordinates into the plane equation and check the sign, as shown here.
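A minimal sketch of that sign test, assuming the plane is stored as a glm::vec4 (A, B, C, D):

#include <glm/glm.hpp>

// Evaluate A*x + B*y + C*z + D; the sign tells which half-space the point is in,
// and a (near) zero result means the point lies on the plane.
float planeSide(const glm::vec4& plane, const glm::vec3& p)
{
    return glm::dot(glm::vec3(plane), p) + plane.w;
}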
Your question is inherently mathematical, so excuse my mathematical solution on Stack Overflow. If your four arrows emit from a single point and the side planes they form share a common apex angle, then you are looking for a solution to the frustum projection problem. Your requirements simplify the problem quite a bit, because you define the plane with a normal rather than two bounding vectors. So, if you agree to the definitions...
then I can provide you with the mathematical solution here (Internet Explorer .mht file, possibly requiring modern Windows OS). If you are thinking about an actual implementation then I can only direct you to a very similar frustum projection implementation that I have implemented/uploaded here (Lua): https://github.com/quiret/mta_lua_3d_math
The roadmap for the implementation could be as follows: create condition container classes for all sub-problems (0 < k1*a1 + k2, etc.) plus the and/or chains, write algorithms for the comparisons across and-chains as well as for normal-form creation, and optimize object construction/memory allocation. Since each check for frustum intersection requires just a fixed number of algebraic objects, you can implement an efficient cache.
Say we are rendering an image for an IKEA catalog that includes a mug with a smooth, mirror-like surface.
The mug will be illuminated by an environment map of a room interior with a window, a directional light, and an ambient component.
The environment map is represented in spherical coordinates using φ and θ
(e.g. point (1, 0, 0) is (φ = 90◦, θ = 90◦); point (-1, 0, 0) is (φ = 90◦, θ = −90◦)).
The camera is positioned at (0, 0, 20), viewing in direction (0, 0, -1) with up direction (0, 1, 0). The mug is centered at the coordinate origin, with height 10 and radius 5. The mug's axis is aligned with the y axis, and the whole mug can be captured in the image.
For a nice product photo we'd like to see the window reflected in the side of the mug. Where can the window be placed in the environment map so that it will be reflected in the side of the cylindrical mug? Compute the (φ, θ) coordinates of the corners of the region, and the highest and lowest φ and θ that will be reflected in the mug.
How do I approach this problem? Is there a specific equation I should be utilizing? Thanks in advance.
You can solve that by casting rays from the viewer to the mug and reflecting them onto the map: say, one ray per corner of the desired reflected quadrilateral on the mug.
Reflection is simply computed by the reflection law: the normal to the surface is the bisector of the incident and reflected rays.
First compute the incident ray from the viewer to one of the chosen corners. Then compute the normal at that point (it is perpendicular to the mug's axis of rotation, in the direction of the radius through the point). From the incident vector and the normal, you can find the direction of the reflected vector.
Converting this vector to spherical coordinates gives you a corner of the quadrilateral in the environment map.
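As a sketch of those steps in C++/GLM (the spherical convention below matches the example in the question, where (1, 0, 0) maps to (φ = 90°, θ = 90°); everything else, including the function name, is illustrative only):

#include <cmath>
#include <glm/glm.hpp>

// Reflect a viewing ray off a point on the side of the cylindrical mug and
// return the reflected direction as (phi, theta) in degrees.
glm::vec2 reflectToSpherical(const glm::vec3& eye, const glm::vec3& surfacePoint)
{
    // Outward normal of a cylinder aligned with the y axis, centered at the origin:
    // perpendicular to the axis, along the radius through the point.
    glm::vec3 n = glm::normalize(glm::vec3(surfacePoint.x, 0.0f, surfacePoint.z));

    glm::vec3 incident  = glm::normalize(surfacePoint - eye);
    glm::vec3 reflected = glm::reflect(incident, n);      // incident - 2*dot(n, incident)*n

    float phi   = glm::degrees(std::acos(glm::clamp(reflected.y, -1.0f, 1.0f))); // angle from +Y
    float theta = glm::degrees(std::atan2(reflected.x, reflected.z));            // around the Y axis
    return glm::vec2(phi, theta);
}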