Simulating 3D 'cards' with just orthographic rendering - graphics

I am rendering textured quads with an orthographic projection and would like to simulate 'depth' by modifying the UVs and the vertex positions of each quad's four points (top left, top right, bottom left, bottom right).
I've found that if I give the top-left and bottom-right corners the same y position, I don't get a linear 'skew' but a warped one: the texture covering the top triangle (one of the two that make up the quad) gets squashed, while the bottom triangle's texture looks normal.
I can change UVs, any of the four points on the quad (but only in 2D space, it's orthographic projection anyway so 3D space won't matter much). So basically I'm trying to simulate perspective on a two dimensional quad in orthographic projection, any ideas? Is it even mathematically possible/feasible?
Ideally, what I'd like is a function where I can set an x/y rotation as well as a virtual z 'position' (which simulates z depth) and have it internally calculate the positions/UVs to create the 3D effect. It seems like this should all be mathematical, where a set of 2D transforms can be applied to each corner of the quad to simulate depth; I just don't know how to make it happen. I'd guess it requires trigonometry or something. I'm trying to crunch the math but not making much progress.
Here's what I mean:
Top left is just the card, center is the card with a y rotation of X degrees and right most is a card with an x and y rotation of different degrees.

To compute the 2D coordinates of the corners, choose their coordinates in 3D and apply the 3D perspective equations:
Start from the original card corner (x, y, z).
Apply a rotation (by matrix multiplication) to get (x', y', z').
Apply a perspective projection (choose some camera origin, direction and field of view).
In the simplest case this is:
x'' = x' / z'
y'' = y' / z'
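As a rough illustration of the rotate-then-divide steps above, here is a minimal Python sketch. The function name `project_corner`, the `camera_distance` parameter and the choice to scale the result so that an unrotated card keeps its size are assumptions made for this example, not part of the original answer.

```python
import math

def project_corner(x, y, rot_x_deg, rot_y_deg, camera_distance=3.0):
    """Rotate the card corner (x, y, 0) about the y axis, then the x axis,
    and apply a simple perspective divide. camera_distance is a hypothetical
    'virtual z' that keeps the card in front of the camera."""
    ax, ay = math.radians(rot_x_deg), math.radians(rot_y_deg)
    z = 0.0  # the flat card lies in the z = 0 plane before rotation
    # Rotation about the y axis.
    x1 = x * math.cos(ay) + z * math.sin(ay)
    z1 = -x * math.sin(ay) + z * math.cos(ay)
    # Rotation about the x axis.
    y2 = y * math.cos(ax) - z1 * math.sin(ax)
    z2 = y * math.sin(ax) + z1 * math.cos(ax)
    # Perspective divide; multiplying by camera_distance leaves an
    # unrotated card unchanged.
    depth = z2 + camera_distance
    return (x1 * camera_distance / depth, y2 * camera_distance / depth)

# The four 2D corner positions for a card rotated 60 degrees about y:
corners = [project_corner(cx, cy, 0, 60) for cx, cy in
           [(-1, 1), (1, 1), (-1, -1), (1, -1)]]
```

The corners that rotate towards the camera end up further from the centre than the ones that rotate away, which is the foreshortening the question is after.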
The bigger problem is the texturing, that is, how texture coordinates are obtained from pixel coordinates.
The correct way is to use a homographic transformation of the form:
U(x,y) = ( ax + cy + e ) / (gx + hy + 1)
V(x,y) = ( bx + dy + f ) / (gx + hy + 1)
which is in fact the result of the perspective equations applied to a plane.
a,b,c,d,e,f,g,h are computed so that (with U,V in [0..1]):
(U,V)(top'',left'') = (0,0)
(U,V)(top'',right'') = (0,1)
(U,V)(bottom'',left'') = (1,0)
(U,V)(bottom'',right'') = (1,1)
But your 2D rendering framework probably uses a bilinear interpolation instead:
U( x , y ) = a + b * x + c * y + d * ( x * y )
V( x , y ) = e + f * x + g * y + h * ( x * y )
In that case you get a bad-looking result.
And it is even worse if the renderer splits the quad into two triangles!
So I see only two options:
use a 3D renderer
compute the texturing yourself, if you only need a few images and not a realtime animation.
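If you go the "compute it yourself" route, the eight coefficients of the homographic mapping in this answer can be found by solving an 8x8 linear system built from the four corner correspondences. The sketch below is only one way to do it: `solve_linear`, `homography_coeffs` and `texture_coords` are hypothetical helper names, and the solver is a plain Gauss-Jordan elimination written out to avoid assuming any library.

```python
def solve_linear(A, rhs):
    """Solve A * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                k = M[r][col] / M[col][col]
                M[r] = [mv - k * cv for mv, cv in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def homography_coeffs(corners, uvs):
    """corners: four projected (x, y) points; uvs: their (U, V) targets.
    Returns [a, b, c, d, e, f, g, h] for
    U = (a x + c y + e) / (g x + h y + 1), V = (b x + d y + f) / (g x + h y + 1)."""
    A, rhs = [], []
    for (x, y), (u, v) in zip(corners, uvs):
        # Multiply out the denominator so each equation is linear in a..h.
        A.append([x, 0, y, 0, 1, 0, -u * x, -u * y])
        rhs.append(u)
        A.append([0, x, 0, y, 0, 1, -v * x, -v * y])
        rhs.append(v)
    return solve_linear(A, rhs)

def texture_coords(coeffs, x, y):
    """Evaluate the homographic mapping at pixel (x, y)."""
    a, b, c, d, e, f, g, h = coeffs
    w = g * x + h * y + 1
    return ((a * x + c * y + e) / w, (b * x + d * y + f) / w)

# Example: a trapezoid on screen mapped to the unit texture square.
quad = [(0, 0), (4, 1), (0, 4), (4, 3)]
uv = [(0, 0), (1, 0), (0, 1), (1, 1)]
coeffs = homography_coeffs(quad, uv)
```

Calling `texture_coords(coeffs, x, y)` for every pixel inside the quad gives the texel to sample, which is the per-pixel work a 3D renderer would otherwise do for you.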

Related

Tessellating hexagons over a rectangle

I have an infinite grid of hexagons, defined by a cubic (x y z) coordinate system like so:
I also have a viewport -- a rectangular canvas where I will draw the hexagons.
My issue is this. Because the grid of hexagons is infinite in all directions, I can't feasibly draw all of them at once. Therefore, I need to draw all the hexagons that are in the viewport, and ONLY those hexagons.
This image summarizes what I want to do:
In this image, purple-colored hexagons are those I want to render, while white-colored hexagons are those I don't want to render. The black rectangle is the viewport: all the hexagons that intersect with it are to be drawn. How would I find which hexagons to render (i.e. their xyz coordinates)?
Some other info:
I have a function that can recall a hexagon tile and draw it centered at position (x,y) in the viewport, given its cubic xyz coordinates. Therefore, all I should need is the xyz coords of each hexagon to draw, and I can draw them. This might simplify the problem.
I have formulas to convert from cubic hexagon coordinates to x/y coordinates, and back. In the diagram above, r/g/b are the axes of the cubic coordinates, x and y are the cartesian coordinates, and s is the length of a hexagon's edge:
y = 3/2 * s * b
b = 2/3 * y / s
x = sqrt(3) * s * ( b/2 + r)
x = - sqrt(3) * s * ( b/2 + g )
r = (sqrt(3)/3 * x - y/3 ) / s
g = -(sqrt(3)/3 * x + y/3 ) / s
r + b + g = 0
Let X0, Y0 be the coordinates of the top-left corner of the rectangle, RectWidth the rectangle width, and HexWidth = s * Sqrt(3) the hexagon width (the x step between neighbouring hexagons in a row, as given by the x formula above).
Find the center of the closest hexagon: r0, g0, b0, HX0, HY0. (The rect corner lies in this hexagon, because hexagons are Voronoi diagram cells.) Remember the horizontal and vertical shifts DX = X0 - HX0, DY = Y0 - HY0.
Draw a horizontal row of Ceil(RectWidth/HexWidth) hexagons, incrementing the r coordinate, decrementing g, and keeping b the same: ROWINC=(1,-1,0).
Note that if DY > HexWidth/2, you need an extra top row with initial coordinates shifted up: (r0, g0-1, b0+1).
Shift the starting point by L=(0, 1, -1) if DX < 0, or by R=(1, 0, -1) otherwise. Draw another horizontal row with the same ROWINC.
Shift the row starting point the alternate way (L after R, R after L). Draw horizontal rows until the bottom edge is reached.
Check whether an extra row is needed at the bottom.
You can think of the rectangular box in terms of constraints on an axis.
In the diagram, the horizontal lines correspond to b and your constraints will be of the form somenumber ≤ b and b ≤ somenumber. For example the rectangle might be in the range 3 ≤ b ≤ 7.
The vertical lines are a little trickier, but they are a “diagonal” that corresponds to r-g. Your constraints will be of the form somenumber ≤ r-g and r-g ≤ somenumber. For example it might be the range -4 ≤ r-g ≤ 5.
Now you have two axes with constraints on them, and you can form a loop. The easiest thing will be to have the outer loop use b:
for (b = 3; b <= 7; b++) {
    …
}
The inner loop is a little trickier, because that's the diagonal constraint. Since we know r+g+b=0, and we know the value of b from the outer loop, we can rewrite the two-variable constraint on r-g. Express r+g+b=0 as g=0-r-b. Now substitute into r-g and get r-(0-r-b). Simplify r-(0-r-b) to 2*r+b. Instead of -4 ≤ r-g we can say -4 ≤ 2*r+b or -4-b ≤ 2*r or (-4-b)/2 ≤ r. Similarly, we can rearrange r-g ≤ 5 to 2*r+b ≤ 5 to r ≤ (5-b)/2. Because r must be an integer, round the lower bound up and the upper bound down. This gives us our inner loop:
for (b = 3; b <= 7; b++) {
    for (r = ceil((-4-b)/2.0); r <= floor((5-b)/2.0); r++) {
        g = 0-b-r;
        …
    }
}
The last bit is to generalize, replacing the constants 3,7,-4,5 with the actual bounds for your rectangle.
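One possible way to generalize, sketched in Python under a few assumptions: the rectangle is given in the same x/y coordinates as the conversion formulas in the question, the bounds on b come from y = 3/2 * s * b, and the bounds on r-g come from x = sqrt(3) * s * (b/2 + r), which gives r-g = 2*x / (sqrt(3) * s). The function name `hexes_in_rect` and the idea of growing the rectangle by half a hexagon on each side are choices made for this example; the growth means a few hexagons just outside the corners may be included.

```python
import math

def hexes_in_rect(x_min, y_min, x_max, y_max, s):
    """Return (r, g, b) for every hexagon whose center lies inside the
    rectangle grown by half a hexagon (a slight superset of the hexagons
    that actually intersect the rectangle)."""
    half_w = math.sqrt(3) / 2 * s  # half the hexagon width
    half_h = s                     # half the hexagon height
    # Constraint on b, from y = 3/2 * s * b.
    b_lo = math.ceil((2 / 3) * (y_min - half_h) / s)
    b_hi = math.floor((2 / 3) * (y_max + half_h) / s)
    # Constraint on r - g, from x = sqrt(3) * s * (b/2 + r) and r - g = 2r + b.
    d_lo = 2 * (x_min - half_w) / (math.sqrt(3) * s)
    d_hi = 2 * (x_max + half_w) / (math.sqrt(3) * s)
    result = []
    for b in range(b_lo, b_hi + 1):
        # d_lo <= 2r + b <= d_hi, rounded inwards to integer r.
        r_lo = math.ceil((d_lo - b) / 2)
        r_hi = math.floor((d_hi - b) / 2)
        for r in range(r_lo, r_hi + 1):
            result.append((r, -r - b, b))
    return result
```

Each returned triple can then be passed to the existing draw-by-cubic-coordinates function.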

Perspective Projection: Proving that 1/z is Linear?

In 3D rendering (or geometry for that matter), in the rasterization algorithm, when you project the vertices of a triangle onto the screen and then test whether a pixel overlaps the 2D triangle, you often need to find the depth, or z-coordinate, of the triangle at the point the pixel overlaps. Generally, the method consists of computing the barycentric coordinates of the pixel in the 2D "projected" image of the triangle, and then using these coordinates to interpolate the z-coordinates of the triangle's original vertices (before the vertices were projected).
Now, it's written in all textbooks that you can't interpolate the z-coordinates of the vertices directly but need to do this instead:
(sorry can't get Latex to work?)
1/z = w0 * 1/v0.z + w1 * 1/v1.z + w2 * 1/v2.z
Where w0, w1, and w2 are the barycentric coordinates of the "pixel" on the triangle.
Now, I am looking for two things:
what would be the formal proof to show that interpolating z doesn't work?
what would be the formal proof to show that 1/z does the right thing?
To show this is not homework ;-) and that I have done some work on my own, I have found the following explanation for question 2.
Basically a triangle can be defined by a plane equation. Thus you can write:
Ax + By + Cz = D.
Then you isolate z to get z = (D - Ax - By)/C
Then you divide this equation by z, as you would with a perspective divide, and if you develop, regroup, etc. you get:
1/z = C/D + (A/D)(x/z) + (B/D)(y/z).
Then, naming C'=C/D, B'=B/D and A'=A/D, you get:
1/z = A'x/z + B'y/z + C'
It says that x/z and y/z are just the coordinates of the points on the triangle once projected onto the screen, and that the right-hand side is an "affine" function, therefore 1/z is a linear function???
That doesn't seem like a demonstration to me? Or maybe it's the right idea, but can't really say how you can tell by just looking at the equation that this is an affine function. If you multiply all the terms you just get:
A'x + B'y + C'z = 1.
Which is just basically our original equations (just need to replace A' B' and C' with the proper term).
Not sure what you are trying to ask here, but if you look at:
1/z = A'x/z + B'y/z + C'
and rewrite it as:
1/z = A'u + B'v + C'
where (u,v) are screen coordinates of the triangle after perspective projection, you can see that the depth (z) of a point on the triangle is not linearly related to (u,v) but 1/depth is and that is what the textbooks are trying to teach you.
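A small numeric check can make this concrete. The Python sketch below works in a 2D slice (x, z) with the simplest projection u = x/z, which is an assumption made to keep the example short; `compare_interpolations` is a hypothetical name. It picks a point on a 3D segment, finds its screen-space weight, and compares interpolating z directly with interpolating 1/z.

```python
def compare_interpolations(p0, p1, t):
    """p0, p1: (x, z) endpoints of a segment; t: position along it in 3D.
    Returns (true depth, depth from interpolating z, depth from interpolating 1/z),
    where both interpolations use the screen-space weight."""
    # The true point on the segment, before projection.
    x = p0[0] + t * (p1[0] - p0[0])
    z = p0[1] + t * (p1[1] - p0[1])
    # Project the endpoints and the point with u = x / z.
    u0, u1, u = p0[0] / p0[1], p1[0] / p1[1], x / z
    # Weight of the projected point between the projected endpoints.
    w = (u - u0) / (u1 - u0)
    z_linear = (1 - w) * p0[1] + w * p1[1]
    z_inverse = 1.0 / ((1 - w) / p0[1] + w / p1[1])
    return z, z_linear, z_inverse

z_true, z_lin, z_inv = compare_interpolations((-1.0, 1.0), (1.0, 3.0), 0.5)
```

For this segment the midpoint in 3D has depth 2, but it lands three quarters of the way across on screen, so interpolating z gives 2.5 while interpolating 1/z recovers 2.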

Equation for the ray with parallel projection

What will be the equation for the ray and ray origin when we are using parallel projection and how to derive that?
In traditional raytracing, you use a ray that starts at your eye point. For each pixel you calculate where it is on a virtual screen in front of the camera and shoot a ray through that pixel.
Let pO be the eye point, d the direction of the camera, r a vector pointing to the right and u a vector pointing up. Let w be the number of pixels in the screen horizontally and h the number of pixels vertically.
The parametric equation for a ray going through any pixel x, y is then:
ray = pO + t * normalize (d + (x - 0.5w)/0.5w * r + (y - 0.5h)/0.5h * u)
where t is the parameter.
For a parallel projection, move the virtual screen to the origin and calculate the x, y to be the origin of the ray then use the same direction d for each ray:
ray = (pO + (x - 0.5w)/0.5w * r + (y - 0.5h)/0.5h * u) + t*d
For a perspective projection, you have an eye origin, direction, right and up vectors. You then run a vector from the eye origin to each pixel in a virtual screen by scaling the right and up vectors.
In a parallel projection, you do the same calculation for the point on the screen, but your origin becomes that point and you use the same direction for each ray.
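The two ray formulas can be put side by side in a short Python sketch. Vectors are plain 3-tuples here, and the helper names (`screen_offset`, `perspective_ray`, `parallel_ray`) are made up for this example; each function returns an (origin, direction) pair.

```python
import math

def screen_offset(x, y, w, h, r, u):
    """Offset of pixel (x, y) from the screen center, in units of r and u."""
    sx = (x - 0.5 * w) / (0.5 * w)
    sy = (y - 0.5 * h) / (0.5 * h)
    return tuple(sx * ri + sy * ui for ri, ui in zip(r, u))

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def perspective_ray(p0, d, r, u, x, y, w, h):
    """All rays share the eye point; the direction varies per pixel."""
    off = screen_offset(x, y, w, h, r, u)
    direction = normalize(tuple(di + oi for di, oi in zip(d, off)))
    return p0, direction

def parallel_ray(p0, d, r, u, x, y, w, h):
    """The origin varies per pixel; all rays share the direction d."""
    off = screen_offset(x, y, w, h, r, u)
    origin = tuple(pi + oi for pi, oi in zip(p0, off))
    return origin, d

eye, forward, right, up = (0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 1, 0)
origin, direction = parallel_ray(eye, forward, right, up, 100, 50, 100, 100)
```

In the usage example the pixel at the right edge of a 100x100 screen gets its origin shifted one unit along r, while its direction stays equal to d.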

Clip matrix for 3D Perspective Projection

I am trying to create a simple 3D graphics engine and have found and used the equations I found here: http://en.wikipedia.org/wiki/3D_projection#cite_note-0. (I have calculations for Dx, Dy, Dz and Bx, By)
It works, but when I rotate the camera far enough, lines start flying all over the place and eventually the polygons that went off screen start to come back on the opposite side of the screen (you can go here: http://mobile.sheridanc.on.ca/~claassen/3d.html and use the W, A, S and D keys to rotate the camera to see what I'm talking about).
I read this discussion: How to convert a 3D point into 2D perspective projection? where he talked about using a clip matrix, but I'm still a little confused as to how exactly to use one. Also, I'm not sure if I am using 'homogeneous coordinates' as described in the discussion.
After multiplying by the perspective projection matrix (aka clip matrix) you end up with a homogeneous 4-vector [x,y,z,w]. This is called npc (normalized projection coordinates), and also called clip coordinates. To get the 2D coordinates on the screen you typically use something like
xscreen = (x/w) * screen_width
yscreen = (y/w) * screen_height
For points in front of the camera this gives you what you want. But points behind the camera will have w<0 and you will get values that map to valid screen coordinates even though the point is behind the camera. To avoid this you need to clip. Any vertex that has a w<0 needs to be clipped.
A quick thing to try is to just not draw any line if either vertex has w<0. This should fix the strange polygons that show up in your scene. But it will also remove some lines that should be visible.
To completely fix the problem you need to clip all the lines which have one vertex in front of the camera and one vertex behind the camera. Clipping means cutting the line in two and throwing away the part that is behind the camera. The line is "clipped" by a plane that goes through the camera and is parallel to the display screen. The problem is to find the point on the line that corresponds to this plane (i.e. where the line intersects the plane). This will occur at the point on the line where w==0. You can find this point, but then when you try to find the screen coordinates
xscreen = (x/w) * screen_width
yscreen = (y/w) * screen_height
you end up dividing by 0 (w==0). This is the reason for the "near clipping plane". The near clipping plane is also parallel to the display screen but is in front of the camera (between the camera and the scene). The distance between the camera and the near clipping plane is the "near" parameter of the projection matrix:
[ near/width   0             0                          0 ]
[ 0            near/height   0                          0 ]
[ 0            0             (far+near)/(far-near)      1 ]
[ 0            0             -(2*near*far)/(far-near)   0 ]
To clip to the near plane you have to find the point on the line that intersects the near clipping plane. This is the point where w == near. So if you have a line with vertices v1,v2 where
v1 = [x1, y1, z1, w1]
v2 = [x2, y2, z2, w2]
you need to check whether each vertex is in front of or behind the near clip plane. v1 is in front if w1 >= near and behind if w1 < near. If v1 and v2 are both in front, draw the line. If v1 and v2 are both behind, don't draw the line. If v1 is in front and v2 is behind, you need to find vc, the point where the line intersects the near clip plane. Here n is the fraction of the way from v1 to v2 at which w reaches near:
n = (w1 - near) / (w1 - w2)
xc = ((1-n) * x1) + (n * x2)
yc = ((1-n) * y1) + (n * y2)
zc = ((1-n) * z1) + (n * z2)
wc = near
vc = [xc, yc, zc, wc]
Now draw the line between v1 and vc.
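The three cases can be wrapped in one small function. This is a sketch in Python, with `clip_line_to_near` as a made-up name and vertices as (x, y, z, w) tuples in clip coordinates; it computes the intersection as vc = v1 + n * (v2 - v1), with n the fraction of the way from the visible vertex to the hidden one at which w equals near.

```python
def clip_line_to_near(v1, v2, near):
    """Return the visible part of the line as a pair of vertices,
    or None if the whole line is behind the near plane."""
    in1, in2 = v1[3] >= near, v2[3] >= near
    if in1 and in2:
        return v1, v2          # fully visible, nothing to clip
    if not in1 and not in2:
        return None            # fully behind the near plane
    if not in1:
        v1, v2 = v2, v1        # make v1 the vertex in front
    # Fraction of the way from v1 to v2 where w reaches near.
    n = (v1[3] - near) / (v1[3] - v2[3])
    vc = tuple((1 - n) * a + n * b for a, b in zip(v1[:3], v2[:3])) + (near,)
    return v1, vc

clipped = clip_line_to_near((0, 0, 0, 4), (4, 4, 4, 0), 1)
```

Only after this step should the surviving vertices be divided by w to get screen coordinates.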
This might be a misunderstanding of the terminology. The clip matrix is more appropriately known as a projection matrix. In OpenGL at least, the projection matrix transforms 4D homogeneous coordinates in view coordinate space (VCS) to clipping coordinate space (CCS). Projection from the CCS to normalized device coordinate space (NDCS) requires the perspective division, i.e., dividing each component by the W component. Clipping is correctly done before this step. So, a 'clipping matrix' doesn't remove the need to clip the geometry prior to projection. I hope I've understood you, and this doesn't sound condescending.
That said, I think you've obviously got the projection matrix right - it works. I suspect that the vertices passing behind the eye have negative W, which means they should be clipped; but I also suspect they have negative Z, so the division is yielding a positive Z value. If you really want to clip the geometry, rather than discard whole triangles, do a search for 'homogeneous clipping'. If you're not really working in 4D homogeneous space, you might start by looking at 'Sutherland-Hodgman' 3D clipping.

Projective transformation

Given two image buffers (assume it's an array of ints of size width * height, with each element a color value), how can I map an area defined by a quadrilateral from one image buffer into the other (always square) image buffer? I'm led to understand this is called "projective transformation".
I'm also looking for a general (not language- or library-specific) way of doing this, such that it could be reasonably applied in any language without relying on "magic function X that does all the work for me".
An example: I've written a short program in Java using the Processing library (processing.org) that captures video from a camera. During an initial "calibrating" step, the captured video is output directly into a window. The user then clicks on four points to define an area of the video that will be transformed, then mapped into the square window during subsequent operation of the program. If the user were to click on the four points defining the corners of a door visible at an angle in the camera's output, then this transformation would cause the subsequent video to map the transformed image of the door to the entire area of the window, albeit somewhat distorted.
Using linear algebra is much easier than all that geometry! Plus you won't need to use sine, cosine, etc, so you can store each number as a rational fraction and get the exact numerical result if you need it.
What you want is a mapping from your old (x,y) co-ordinates to your new (x',y') co-ordinates. You can do it with matrices. You need to find the 2-by-4 projection matrix P such that P times the old coordinates equals the new co-ordinates. We'll assume that you're mapping lines to lines (not, for instance, straight lines to parabolas). Because you have a projection (parallel lines don't stay parallel) and translation (sliding), you need a factor of (xy) and (1), too. Drawn as matrices:
[a b c d]   [ x ]   [x']
[e f g h] * [ y ] = [y']
            [x*y]
            [ 1 ]
You need to know a through h so solve these equations:
a*x_0 + b*y_0 + c*x_0*y_0 + d = i_0
a*x_1 + b*y_1 + c*x_1*y_1 + d = i_1
a*x_2 + b*y_2 + c*x_2*y_2 + d = i_2
a*x_3 + b*y_3 + c*x_3*y_3 + d = i_3
e*x_0 + f*y_0 + g*x_0*y_0 + h = j_0
e*x_1 + f*y_1 + g*x_1*y_1 + h = j_1
e*x_2 + f*y_2 + g*x_2*y_2 + h = j_2
e*x_3 + f*y_3 + g*x_3*y_3 + h = j_3
Again, you can use linear algebra:
[x_0 y_0 x_0*y_0 1] [a e] [i_0 j_0]
[x_1 y_1 x_1*y_1 1] * [b f] = [i_1 j_1]
[x_2 y_2 x_2*y_2 1] [c g] [i_2 j_2]
[x_3 y_3 x_3*y_3 1] [d h] [i_3 j_3]
Plug in your corners for x_n,y_n,i_n,j_n. (Corners work best because they are far apart to decrease the error if you're picking the points from, say, user-clicks.) Take the inverse of the 4x4 matrix and multiply it by the right side of the equation. The transpose of that matrix is P. You should be able to find functions to compute a matrix inverse and multiply online.
Where you'll probably have bugs:
When computing, remember to check for division by zero. That's a sign that your matrix is not invertible. That might happen if you try to map one (x,y) co-ordinate to two different points.
If you write your own matrix math, remember that matrices are usually specified row,column (vertical,horizontal) and screen graphics are x,y (horizontal,vertical). You're bound to get something wrong the first time.
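Here is one way the solve could look in Python, without relying on a matrix library. Rather than forming the inverse explicitly, this sketch runs Gauss-Jordan elimination on the 4x4 system with both right-hand columns attached, which yields the same a through h; `fit_bilinear` and `apply_bilinear` are hypothetical names.

```python
def fit_bilinear(src, dst):
    """src: four (x, y) points, dst: the four (x', y') points they map to.
    Returns P as two rows, [a, b, c, d] and [e, f, g, h]."""
    n = 4
    # Each row: [x, y, x*y, 1 | i, j]
    M = [[x, y, x * y, 1.0, i, j] for (x, y), (i, j) in zip(src, dst)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            # This is the division-by-zero case mentioned above.
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                k = M[r][col] / M[col][col]
                M[r] = [mv - k * cv for mv, cv in zip(M[r], M[col])]
    abcd = [M[i][4] / M[i][i] for i in range(n)]
    efgh = [M[i][5] / M[i][i] for i in range(n)]
    return [abcd, efgh]

def apply_bilinear(P, x, y):
    """Multiply P by the column [x, y, x*y, 1]."""
    basis = (x, y, x * y, 1.0)
    return tuple(sum(c * t for c, t in zip(row, basis)) for row in P)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
clicked = [(10, 10), (30, 12), (28, 40), (8, 35)]
P = fit_bilinear(square, clicked)
```

With P fitted from the unit square to the clicked corners, `apply_bilinear(P, u, v)` for u, v in [0, 1] tells you which source pixel to read for each position in the square output.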
EDIT
The assumption below of the invariance of angle ratios is incorrect. Projective transformations instead preserve cross-ratios and incidence. A solution then is:
Find the point C' at the intersection of the lines defined by the segments AD and CP.
Find the point B' at the intersection of the lines defined by the segments AD and BP.
Determine the cross-ratio of B'DAC', i.e. r = (B'A * DC') / (DA * B'C').
Construct the projected line F'HEG'. The cross-ratio of these points is equal to r, i.e. r = (F'E * HG') / (HE * F'G').
F'F and G'G will intersect at the projected point Q so equating the cross-ratios and knowing the length of the side of the square you can determine the position of Q with some arithmetic gymnastics.
Hmmmm....I'll take a stab at this one. This solution relies on the assumption that ratios of angles are preserved in the transformation. See the image for guidance (sorry for the poor image quality...it's REALLY late). The algorithm only provides the mapping of a point in the quadrilateral to a point in the square. You would still need to implement dealing with multiple quad points being mapped to the same square point.
Let ABCD be a quadrilateral where A is the top-left vertex, B is the top-right vertex, C is the bottom-right vertex and D is the bottom-left vertex. The pair (xA, yA) represent the x and y coordinates of the vertex A. We are mapping points in this quadrilateral to the square EFGH whose side has length equal to m.
Compute the lengths AD, CD, AC, BD and BC:
AD = sqrt((xA-xD)^2 + (yA-yD)^2)
CD = sqrt((xC-xD)^2 + (yC-yD)^2)
AC = sqrt((xA-xC)^2 + (yA-yC)^2)
BD = sqrt((xB-xD)^2 + (yB-yD)^2)
BC = sqrt((xB-xC)^2 + (yB-yC)^2)
Let thetaD be the angle at the vertex D and thetaC be the angle at the vertex C. Compute these angles using the cosine law:
thetaD = arccos((AD^2 + CD^2 - AC^2) / (2*AD*CD))
thetaC = arccos((BC^2 + CD^2 - BD^2) / (2*BC*CD))
We map each point P in the quadrilateral to a point Q in the square. For each point P in the quadrilateral, do the following:
Find the distance DP:
DP = sqrt((xP-xD)^2 + (yP-yD)^2)
Find the distance CP:
CP = sqrt((xP-xC)^2 + (yP-yC)^2)
Find the angle thetaP1 between CD and DP:
thetaP1 = arccos((DP^2 + CD^2 - CP^2) / (2*DP*CD))
Find the angle thetaP2 between CD and CP:
thetaP2 = arccos((CP^2 + CD^2 - DP^2) / (2*CP*CD))
The ratio of thetaP1 to thetaD should be the ratio of thetaQ1 to 90. Therefore, calculate thetaQ1:
thetaQ1 = thetaP1 * 90 / thetaD
Similarly, calculate thetaQ2:
thetaQ2 = thetaP2 * 90 / thetaC
Find the distance HQ:
HQ = m * sin(thetaQ2) / sin(180-thetaQ1-thetaQ2)
Finally, the x and y position of Q relative to the bottom-left corner of EFGH is:
x = HQ * cos(thetaQ1)
y = HQ * sin(thetaQ1)
You would have to keep track of how many colour values get mapped to each point in the square so that you can calculate an average colour for each of those points.
I think what you're after is a planar homography, have a look at these lecture notes:
http://www.cs.utoronto.ca/~strider/vis-notes/tutHomography04.pdf
If you scroll down to the end you'll see an example of just what you're describing. I expect there's a function in the Intel OpenCV library which will do just this.
There is a C++ project on CodeProject that includes source for projective transformations of bitmaps. The maths are on Wikipedia here. Note that so far as I know, a projective transformation will not map any arbitrary quadrilateral onto another, but will do so for triangles; you may also want to look up skewing transforms.
If this transformation has to look good (as opposed to the way a bitmap looks if you resize it in Paint), you can't just create a formula that maps destination pixels to source pixels. Values in the destination buffer have to be based on a complex averaging of nearby source pixels or else the results will be highly pixelated.
So unless you want to get into some complex coding, use someone else's magic function, as smacl and Ian have suggested.
Here's how I would do it in principle:
map the origin of A to the origin of B via a translation vector t.
take the unit vectors of A, (1,0) and (0,1), and calculate how they would be mapped onto the unit vectors of B.
this gives you a transformation matrix M so that every vector a in A maps to M a + t
invert the matrix and negate the translation vector, so that for every vector b in B you have the inverse mapping b -> M^-1 (b - t)
once you have this transformation, for each point in the target area in B, find the corresponding point in A and copy.
The advantage of this mapping is that you only calculate the points you need, i.e. you loop on the target points, not the source points. It was a widely used technique in the "demo coding" scene a few years back.
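A minimal Python sketch of that loop-over-the-target idea follows. It assumes flat row-major buffers, a 2x2 matrix M and translation t as described in the steps above, and nearest-neighbour sampling by flooring the source coordinate; `warp_affine` is a made-up name. Note that M a + t is an affine map, so on its own it handles skew and scale but not true perspective.

```python
import math

def warp_affine(src, src_w, src_h, dst_w, dst_h, M, t):
    """Fill a dst_w x dst_h buffer by mapping each target pixel b back to
    the source with a = M^-1 (b - t) and copying the pixel found there."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("M is not invertible")
    inv = ((M[1][1] / det, -M[0][1] / det),
           (-M[1][0] / det, M[0][0] / det))
    dst = [0] * (dst_w * dst_h)
    for by in range(dst_h):
        for bx in range(dst_w):
            px, py = bx - t[0], by - t[1]
            ax = inv[0][0] * px + inv[0][1] * py
            ay = inv[1][0] * px + inv[1][1] * py
            sx, sy = math.floor(ax), math.floor(ay)  # nearest-neighbour sample
            if 0 <= sx < src_w and 0 <= sy < src_h:
                dst[by * dst_w + bx] = src[sy * src_w + sx]
    return dst

# Scale a 2x2 image up to 4x4.
small = [1, 2,
         3, 4]
big = warp_affine(small, 2, 2, 4, 4, ((2, 0), (0, 2)), (0, 0))
```

Because every target pixel is visited exactly once, there are no holes in the output, which is the main reason to loop over the destination rather than the source.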

Resources