I'm currently attempting to create a first-person space flight camera.
First, allow me to define what I mean by that.
Notice that I am currently using Row-Major matrices in my math library (meaning, the basis vectors in my 4x4 matrices are laid out in rows, and the affine translation part is in the fourth row). Hopefully this helps clarify the order in which I multiply my matrices.
What I have so Far
So far, I have successfully implemented a simple first-person camera view. The code for this is as follows:
fn fps_camera(&mut self) -> beagle_math::Mat4 {
let pitch_matrix = beagle_math::Mat4::rotate_x(self.pitch_in_radians);
let yaw_matrix = beagle_math::Mat4::rotate_y(self.yaw_in_radians);
let view_matrix = yaw_matrix.get_transposed().mul(&pitch_matrix.get_transposed());
let translate_matrix = beagle_math::Mat4::translate(&self.position.mul(-1.0));
translate_matrix.mul(&view_matrix)
}
This works as expected. I am able to walk around and look around with the mouse.
What I am Attempting to do
However, an obvious limitation of this implementation is that since pitch and yaw is always defined relative to a global "up" direction, the moment I pitch more than 90 degrees, getting the world to essentially being upside-down, my yaw movement is inverted.
What I would like to attempt to implement is what could be seen more as a first-person "space flight" camera. That is, no matter what your current orientation is, pitching up and down with the mouse will always translate into up and down in the game, relative to your current orientation. And yawing left and right with your mouse will always translate into a left and right direction, relative to your current orientation.
Unfortunately, this problem has got me stuck for days now. Bear with me, as I am new to the field of linear algebra and matrix transformations. So I must be misunderstanding or overlooking something fundamental. What I've implemented so far might thus look... stupid and naive :) Probably because it is.
What I've Tried so far
The way that I always end up coming back to thinking about this problem is to basically redefine the world's orientation every frame. That is, in a frame, you translate, pitch, and yaw the world coordinate space using your view matrix. You then somehow redefine this orientation as being the new default or zero-rotation. By doing this, you can then, in your next frame apply new pitch and yaw rotations based on this new default orientation, which (by my thinking, anyways), would mean that mouse movement will always translate directly to up, down, left, and right, no matter how you are oriented, because you are basically always redefining the world coordinate space in terms relative to what your previous orientation was, as opposed to the simple first-person camera, which always starts from the same initial coordinate space.
The latest code I have which attempts to implement my idea is as follows:
fn space_camera(&mut self) -> beagle_math::Mat4 {
let previous_pitch_matrix = beagle_math::Mat4::rotate_x(self.previous_pitch);
let previous_yaw_matrix = beagle_math::Mat4::rotate_y(self.previous_yaw);
let previous_view_matrix = previous_yaw_matrix.get_transposed().mul(&previous_pitch_matrix.get_transposed());
let pitch_matrix = beagle_math::Mat4::rotate_x(self.pitch_in_radians);
let yaw_matrix = beagle_math::Mat4::rotate_y(self.yaw_in_radians);
let view_matrix = yaw_matrix.get_transposed().mul(&pitch_matrix.get_transposed());
let translate_matrix = beagle_math::Mat4::translate(&self.position.mul(-1.0));
// SAVES
self.previous_pitch += self.pitch_in_radians;
self.previous_yaw += self.yaw_in_radians;
// RESETS
self.pitch_in_radians = 0.0;
self.yaw_in_radians = 0.0;
translate_matrix.mul(&(previous_view_matrix.mul(&view_matrix)))
}
This, however, does nothing to solve the issue. It actually gives the exact same result and problem as the fps camera.
My thinking behind this code is basically: Always keep track of an accumulated pitch and yaw (in the code that is the previous_pitch and previous_yaw) based on deltas each frame. The deltas are pitch_in_radians and pitch_in_yaw, as they are always reset each frame.
I then start off by constructing a view matrix that would represent how the world was orientated previously, that is the previous_view_matrix. I then construct a new view matrix based on the deltas of this frame, that is the view_matrix.
I then attempt to do a view matrix that does this:
Translate the world in the opposite direction of what represents the camera's current position. Nothing is different here from the FPS camera.
Orient that world according to what my orientation has been so far (using the previous_view_matrix. What I would want this to represent is the default starting point for the deltas of my current frame's movement.
Apply the deltas of the current frame using the current view matrix, represented by view_matrix
My hope was that in step 3, the previous orientation would be seen as a starting point for a new rotation. That if the world was upside-down in the previous orientation, the view_matrix would apply a yaw in terms of the camera's "up", which would then avoid the problem of inverted controls.
I must surely be either attacking the problem from the wrong angle, or misunderstanding essential parts of matrix multiplication with rotations.
Can anyone help pin-point where I'm going wrong?
[EDIT] - Rolling even when you only pitch and yaw the camera
For anyone just stumbling upon this, I fixed it by a combination of the marked answer and Locke's answer (ultimately, in the example given in my question, I also messed up the matrix multiplication order).
Additionally, when you get your camera right, you may stumble upon the odd side-effect that holding the camera stationary, and just pitching and yawing it about (such as moving your mouse around in a circle), will result in your world slowly rolling as well.
This is not a mistake, this is how rotations work in 3D. Kevin added a comment in his answer that explains it, and additionally, I also found this GameDev Stack Exchange answer explaining it in further detail.
The problem is that two numbers, pitch and yaw, provide insufficient degrees of freedom to represent consistent free rotation behavior in space without any “horizon”. Two numbers can represent a look-direction vector but they cannot represent the third component of camera orientation, called roll (rotation about the “depth” axis of the screen). As a consequence, no matter how you implement the controls, you will find that in some orientations the camera rolls strangely, because the effect of trying to do the math with this information is that every frame the roll is picked/reconstructed based on the pitch and yaw.
The minimal solution to this is to add a roll component to your camera state. However, this approach (“Euler angles”) is both tricky to compute with and has numerical stability issues (“gimbal lock”).
Instead, you should represent your camera/player orientation as a quaternion, a mathematical structure that is good for representing arbitrary rotations. Quaternions are used somewhat like rotation matrices, but have fewer components; you'll multiply quaternions by quaternions to apply player input, and convert quaternions to matrices to render with.
It is very common for general purpose game engines to use quaternions for describing objects' rotations. I haven't personally written quaternion camera code (yet!) but I'm sure the internet contains many examples and longer explanations you can work from.
It looks like a lot of the difficulty you are having is due to trying to normalize the transformation to apply the new translation. It seems like this is probably a large part of what is tripping you up. I would suggest changing how you store your position and rotation. Instead, try letting your view matrix define your position.
/// Apply rotation based on the change in mouse position
pub fn on_mouse_move(&mut self, dx: f32, dy: f32) {
// I think this is correct, but it might need tweaking
let rotation_matrix = Mat4::rotate_xy(-y, x);
self.apply_movement(&rotation_matrix, &Vec3::zero())
}
/// Append axis-aligned movement relative to the camera and rotation
pub fn apply_movement(&mut self, rotation: &Mat4<f32>, translation: &Vec3<f32>) {
// Create transformation matrix for translation
let translation = Mat4::translate(translation);
// Append translation and rotation to existing view matrix
self.view_matrix = self.view_matrix * translation * rotation;
}
/// You can get the position from the last column [x, y, z, w] of your view matrix.
pub fn translation(&self) -> Vec3<f32> {
self.view_matrix.column(3).into()
}
I made a couple assumptions about the library:
Mat4 implements Mul<Self> so you do not need to call x.mul(y) explicitly and can instead use x * y. Same goes for Sub.
There exists a Mat4::rotate_xy function. If there isn't one, it would be equivalent to Mat4::rotate_xyz(delta_pitch, delta_yaw, 0.0) or Mat4::rotate_x(delta_pitch) * Mat4::rotate_y(delta_yaw).
I'm somewhat eyeballing the equations so hopefully this is correct. The main idea is to take the delta from the previous inputs and create matrices from that which can then be added on top of the previous view_matrix. If you attempt to take the difference after creating transformation matrices it will only be more work for you (and your processor).
As a side note I see you are using self.position.mul(-1.0). This tells me that your projection matrix is probably backwards. You likely want to adjust your projection matrix by scaling it by a factor of -1 in the z axis.
Related
I made an object tracker that calculates the position of an object recorded in a live camera feed using stereoscopic cameras. The math was simple, once you know the camera distance and orientation. However, now I thought it would be nice to allow me to quickly extract all these parameters, so when I change my setup or cameras I will be able to quickly calibrate it again.
To calculate the object position I made some simplifications/assumptions, which made the math easier: the cameras are in the same YZ plane, so there is only a distance in x between them. Their tilt is also just in the XY plane.
To reverse the triangulation I thought a test pattern (square) of 4 points of which I know the distances to each other would suffice. Ideally I would like to get the cameras' positions (distances to test pattern and each other), their rotation in X (and maybe Y and Z if applicable/possible), as well as their view angle (to translate pixel position to real world distances - that should be a camera constant, but in case I change cameras, it is quite a bit to define accurately)
I started with the same trigonometric calculations, but always miss parameters. I am wondering if there is an existing solution or a solid approach. If I need to add parameter (like distances, they are easy enough to measure), it's no problem (my calculations didn't give me any simple equations with that possibility though).
I also read about Homography in opencv, but it seems it applies to 2D space only, or not?
Any help is appreciated!
Back story: I'm creating a Three.js based 3D graphing library. Similar to sigma.js, but 3D. It's called graphosaurus and the source can be found here. I'm using Three.js and using a single particle representing a single node in the graph.
This was the first task I had to deal with: given an arbitrary set of points (that each contain X,Y,Z coordinates), determine the optimal camera position (X,Y,Z) that can view all the points in the graph.
My initial solution (which we'll call Solution 1) involved calculating the bounding sphere of all the points and then scale the sphere to be a sphere of radius 5 around the point 0,0,0. Since the points will be guaranteed to always fall in that area, I can set a static position for the camera (assuming the FOV is static) and the data will always be visible. This works well, but it either requires changing the point coordinates the user specified, or duplicating all the points, neither of which are great.
My new solution (which we'll call Solution 2) involves not touching the coordinates of the inputted data, but instead just positioning the camera to match the data. I encountered a problem with this solution. For some reason, when dealing with really large data, the particles seem to flicker when positioned in front/behind of other particles.
Here are examples of both solutions. Make sure to move the graph around to see the effects:
Solution 1
Solution 2
You can see the diff for the code here
Let me know if you have any insight on how to get rid of the flickering. Thanks!
It turns out that my near value for the camera was too low and the far value was too high, resulting in "z-fighting". By narrowing these values on my dataset, the problem went away. Since my dataset is user dependent, I need to determine an algorithm to generate these values dynamically.
I noticed that in the sol#2 the flickering only occurs when the camera is moving. One possible reason can be that, when the camera position is changing rapidly, different transforms get applied to different particles. So if a camera moves from X to X + DELTAX during a time step, one set of particles get the camera transform for X while the others get the transform for X + DELTAX.
If you separate your rendering from the user interaction, that should fix the issue, assuming this is the issue. That means that you should apply the same transform to all the particles and the edges connecting them, by locking (not updating ) the transform matrix until the rendering loop is done.
This is an excerpt from Fundamentals of Computer Graphics by Peter Shirley. On Page 114 (in the 3rd edition it reads:
We'd like to be able to change the viewpoint in 3D and look in any
direction. There are a multitude of conventions for specifying viewer
position and orientation. We will use the following one:
the eye position e
the gaze direction g
the view-up vector t
The eye position is a location that the eye "sees from". If you think
of graphics as a photographic process, it is the center of the lens.
The gaze direction is any vector in the direction that the viewer is
looking. The view-up vector is any vector in the plane that both
bisects the viewer's head into right and left halves and points "to
the sky" for a person standing on the ground. These vectors provide us
with enough information to set up a coordinate system with origin e
and uvw basis.....
The bold sentence is the one confusing me the most. Unfortunately the book provides only very basic and crude diagrams and doesn't provide any examples.
Does this sentence mean that all view-up vectors are simply (0, 1, 0)?
I tried it on some examples but it didn't quite match up with the given solutions (though it came close sometimes).
Short answer: the view-up vector is not derived from other components: instead, it is a user input, chosen so as to ensure the camera is right-side up. Or, to put it another way, the view-up vector is how you tell your camera system what direction "up" is, for purposes of orienting the camera.
The reason you need a view-up vector is that the position and gaze direction of the camera is not enough to completely determine its pose: you can still spin the camera around the position/gaze axis. The view-up vector is needed to finish locking down the camera; and because people usually prefer to look at things right-side up, the view-up vector is conventionally a fixed direction, determined by how the scene is oriented in your coordinate space.
In theory, the view-up vector could be in any direction, but in practice "up" is usually a coordinate direction. Which coordinate is "up" is a matter of convention: in your case, it appears the Y-axis is "up", but some systems prefer the Z-axis.
That said, I will reiterate: you can choose pretty much any direction you want. If you want your first-person POV to "lean" (e.g., to look around a corner, or to indicate intoxication), you can tweak your view-up vector to accomplish this. Also, consider camera control in a game like Super Mario Galaxy...
I took a graphics class last year, and I'm pretty rusty. I referenced some old notes, so here goes.
I think that bolded line is just trying to explain what the view-up vector (VUP) would mean in one case for sake of introduction, not what it necessarily is in all cases. The wording is a bit odd; here's a rewording: "Consider a person standing on the ground. VUP in that case would be the vector that bisects the viewer's head and points to the sky."
To determine a standard upward vector, do the following:
Normalize g.
g_norm x (0,1,0) gives you view-right, or the vector to the right of your camera view
view-right x g gives you VUP.
You can then apply a rotation if you wish to do so.
Does this sentence mean that all view-up vectors are simply (0, 1, 0)?
No (0,1,0) is the world up vector. We're looking for the camera's up vector.
Others have written in depth explanations. I will provide the code below, which is largely self documenting. Example is in DirectX C++.
DirectX::XMVECTOR Camera::getDirection() const noexcept
{
// camDirection = camPosition - camTarget
const dx::XMVECTOR forwardVector{0.0f, 0.0f, 1.0f, 0.0f};
const auto lookVector = dx::XMVector3Transform( forwardVector,
dx::XMMatrixRotationRollPitchYaw( m_pitch, m_yaw, 0.0f ) );
const auto camPosition = dx::XMLoadFloat3( &m_position );
const auto camTarget = dx::XMVectorAdd( camPosition,
lookVector );
return dx::XMVector3Normalize( dx::XMVectorSubtract( camPosition,
camTarget ) );
}
DirectX::XMVECTOR Camera::getRight() const noexcept
{
const dx::XMVECTOR upVector{0.0f, 1.0f, 0.0f, 0.0f};
return dx::XMVector3Cross( upVector,
getDirection() );
}
DirectX::XMVECTOR Camera::getUp() const noexcept
{
return dx::XMVector3Cross( getDirection(),
getRight() );
}
Can you change the perspective in POV-Ray, so that convergence between parallel lines does not look so steep?
E.g. change this angle (the convergence of the checkered floor into the distance) here
To an angle like this
I want it to seem like you're looking at something nearby, so with a smaller angle of convergence in parallel lines.
To illustrate it more: instead of a view like this
Use a view like this
Move the camera backwards and zoom in (by making the angle smaller):
camera {
perspective
location <0,0,-15> // move this backwards
sky y
up y
angle 30 // make this smaller
right (image_width/image_height)*x
look_at <0,0,0>
}
You can go to the extreme by using an orthographic "camera":
camera {
orthographic
location <0,0,-15> // Move backwards, no matter how far
sky y
up y * h // where h = hight you want to cover
right x * w // where w = width you want to cover
look_at <0,0,0>
}
The other extreme is the fish-eye lens.
You need to reduce the field of view of your camera's view frustum. The larger the field of view, the more stuff you're trying to squeeze into the output of your camera's render and so they parallel lines will converge faster. So in your first example with a cube, the camera will be more focused on the cube and the areas just immediately around it, than the whole environment.
The other option is to make your far plane much closer to your near plane, so you don't see many things that are far off. So in you first image example, you'll only see the first four or five grids instead.
In my Direct3D application, the camera can be moved using the mouse or arrow keys. But if I hard code (0,1,0) as the up direction vector in LookAtLH, the frame goes blank at some orientations of the camera.
I just learned the hard way that when looking along the Y-axis, (0,1,0) no longer works as the Up direction (seems obvious?). I am thinking of switching my up direction to something else for each of these special cases. Is there a more graceful way to handle this?
Assuming you can calculate a vector pointing forward (what you are looking at - your position) and a vector pointing right (always on the XZ-plane unless you can roll). Normalize both these vectors, then up is forward x right (where x is cross product).
In general, you can plug in your yaw, pitch and roll into a rotation matrix and rotate the axis vectors to get right, up and forward, but I guess that's what you are using LookAtLH to avoid.
See http://en.wikipedia.org/wiki/Rotation_matrix#The_3-dimensional_rotation_matricies
The graceful way to handle this is to use Unit Quaternions. A quaternion is a vector of 4 values that encodes an orientation in 3D space (not a rotation as some articles assert) and a unit quaternion is one where the vector length sqrt(x^2+y^2+z^2+w^2) is 1.0. There are a set of mathematical operations for working with quaternions that are analogous to using matrices to encode rotations, with the added bonus that quaternions can never represent an degenerate orientation. You can freely convert quaternions to a 3x3 or 4x4 matrix when you need to feed the result to a GPU.
Your problem is that, while you are moving your camera, you will introduce a little twist into the camera's up direction. By forcing the camera to re-center itself on the (0,1,0) vector every iteration, you are in effect rotating the camera and then clamping the camera's orientation to remain on the surface of a sphere, but when your camera hits the pole of this sphere there is no good direction to call "up" and your matrix goes singular and gives you zero-sized polygons (hence the black screen). Quaternions have the ability to interpolate through these poles and come out the other side just fine, leaving you with a valid matrix at all times. all you have to do is control the "twist".
To measure this twist you should read Ken Shoemake's article "Fiber Bundle Twist Reduction" in the book Graphics Gems 4. He shows a good way to measure this accumulated twist and how to remove it when it is offensive.