How do I get base-space transform from CGContext? - transform

The offset for CGContext.setShadow has to be specified in base-space:
offset - Specifies a translation in base-space.
(from https://developer.apple.com/documentation/coregraphics/cgcontext/1455205-setshadow)
What is this "base-space"?
Semi related docs have this explanation:
The drawing (user) coordinate system. This coordinate system is used when you issue drawing commands.
The view coordinate system (base space). This coordinate system is a fixed coordinate system relative to the view.
The (physical) device coordinate system. This coordinate system represents pixels on the physical screen.
(from: https://developer.apple.com/library/archive/documentation/2DDrawing/Conceptual/DrawingPrintingiOS/GraphicsDrawingOverview/GraphicsDrawingOverview.html)
This makes sense. However, how do I get the transform of this base space? There is CGContext.userSpaceToDeviceSpaceTransform, but that seems to be the transform from user->physical. How do I get from user->base or base->physical?

I believe that the base space is equivalent to the user space when it has an identity transform matrix. In Apple's documentation, Figure 1-1 shows the base space with the origin in the upper-left and an identity transform (the indicated pixel at (3, 5) is 3 to the right and 5 down, as expected).
Thus, the shadow offset is in untransformed units. This is probably convenient for you, as you probably want the shadow offset to be the same, regardless of what the scale factor is. (If you scale a piece of vector clip art in a PowerPoint presentation, you want the shadow to be offset the same no matter how big you expand the clip art.)
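As a quick illustration, here is a minimal sketch against the Core Graphics C API (compiled as C++; it assumes a valid CGContextRef named ctx obtained elsewhere, for example the current drawing context of a view). Scaling the CTM enlarges what is drawn, but the base-space shadow offset passed to CGContextSetShadow is unaffected:

    #include <CoreGraphics/CoreGraphics.h>

    // Sketch: 'ctx' is assumed to be a valid CGContextRef obtained
    // elsewhere (e.g. the current drawing context of a view).
    void drawScaledBoxWithShadow(CGContextRef ctx)
    {
        CGContextSaveGState(ctx);

        // The offset is interpreted in base space (as if the CTM were the
        // identity), so the scale applied below does not change it.
        CGContextSetShadow(ctx, CGSizeMake(4.0, -4.0), 3.0);

        // Scaling user space enlarges the rectangle being drawn, but the
        // shadow is still displaced by the same base-space amount.
        CGContextScaleCTM(ctx, 2.0, 2.0);

        CGContextSetRGBFillColor(ctx, 0.2, 0.4, 0.8, 1.0);
        CGContextFillRect(ctx, CGRectMake(10.0, 10.0, 50.0, 50.0));

        CGContextRestoreGState(ctx);
    }

As for getting the transforms explicitly: one approach (my own assumption, not something the docs spell out) is to record CGContextGetCTM(ctx) at the moment the context is handed to you (that value is effectively base->device); anything you concatenate afterwards is the user->base part, which you can recover by multiplying the current CTM by the inverse of the recorded one.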

Related

How do I find the world coordinates of a pixel on the image plane?

A bit of background
I am writing a simple ray tracer in C++. I have most of the core complete but don't understand how to retrieve the world coordinate of a pixel on the image plane. I need this location so that I can cast the ray into the world.
Currently I have a Camera with a position (a.k.a. my perspective reference point) and a direction vector which is not normalized. The direction's length gives the distance to the center of the image plane, and its orientation gives which way the camera is facing.
There are other values associated with the camera but they should not be relevant.
My image coordinates will range from -1 to 1, and the perspective (focal length) will change based on the length of the camera's direction vector.
What I need help with
I need to go from pixel coordinates (say [0, 256] in an image 256 pixels on each side) to my world coordinates.
I will also want to program this so that no matter where the camera is placed and where it is directed, I can find the pixel's world coordinates. (Currently the camera will almost always be centered at the origin and will look down the negative z axis; I would like to program this with future changes in mind.) It is also important to know whether this code should be pushed down into my threaded code as well, or whether it will be calculated by the main thread and the resulting ray then used in the threaded code.
(Image omitted here; source: in.tum.de. I did not make the image; it is only there to give an idea of what I need.)
Please leave comments if you need any additional info. Otherwise I would like a simple theory/code example of what to do.
Basically you have to invert the process that maps a point through the model-view-projection (MVP) and viewport transforms into unit-cube and then screen coordinates. Look at the following URLs for programming help:
http://nehe.gamedev.net/article/using_gluunproject/16013/
https://sites.google.com/site/vamsikrishnav/gluunproject
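Here is a minimal C++ sketch of that unprojection done directly, without gluUnProject, under the setup described in the question: a camera position, a non-normalized direction whose length is the distance to the image plane, an assumed up vector, and image-plane coordinates in [-1, 1]. The Vec3 helpers and function names are illustrative, not from the asker's code:

    #include <cmath>

    struct Vec3 { double x, y, z; };

    Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
    Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
    Vec3 operator*(double s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
    Vec3 cross(Vec3 a, Vec3 b) {
        return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
    }
    Vec3 normalize(Vec3 v) {
        double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
        return {v.x / len, v.y / len, v.z / len};
    }

    // Returns the world-space position of pixel (px, py) on the image plane.
    // eye          : camera position (perspective reference point)
    // dir          : non-normalized view direction; its length is the distance
    //                from the eye to the centre of the image plane
    // up           : approximate up vector (must not be parallel to dir)
    // width/height : image resolution in pixels
    Vec3 pixelToWorld(Vec3 eye, Vec3 dir, Vec3 up,
                      int px, int py, int width, int height)
    {
        // Build an orthonormal camera basis.
        Vec3 forward = normalize(dir);
        Vec3 right   = normalize(cross(forward, up));
        Vec3 trueUp  = cross(right, forward);

        // Map pixel indices to normalized coordinates in [-1, 1],
        // sampling at pixel centres.
        double u = 2.0 * (px + 0.5) / width  - 1.0;
        double v = 1.0 - 2.0 * (py + 0.5) / height;   // flip so +v is up

        // The centre of the image plane is eye + dir (dir is not normalized,
        // so its length acts as the focal distance).
        Vec3 planeCentre = eye + dir;
        return planeCentre + u * right + v * trueUp;
    }

    // The primary ray for that pixel is then:
    //   origin    = eye
    //   direction = normalize(pixelToWorld(...) - eye)

Because this only depends on the camera's position, direction, and up vector, it keeps working when the camera is moved away from the origin; it is also cheap enough to compute per pixel inside the threaded code.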

How to map points in a 3D-plane into screen plane

I have been given an assignment to project an object in 3D space onto a 2D plane using simple graphics in C. The question is that a cube is placed in fixed 3D space, and there is a camera placed at a position whose coordinates are x, y, z; the camera is looking at the origin, i.e. (0, 0, 0). Now we have to project the cube's vertices onto the camera plane.
I am proceeding with the following steps
Step 1: I find the equation of the plane aX+bY+cZ+d=0 which is perpendicular to the line drawn from the camera position to the origin.
Step 2: I find the projection of each vertex of the cube to the plane which is obtained in the above step.
Now I want to map the vertex positions which I got by projection in step 2 (lying in the plane aX+bY+cZ+d=0) onto my screen plane.
thanks,
I don't think that simply setting the z coordinate to zero will give me the actual mapping, so any help figuring this out would be appreciated.
You can do that in two simple steps:
1. Translate the cube's coordinates into the camera's coordinate system (using rotation), such that the camera's own coordinates in that system are x=y=z=0 and the cube's translated z's are > 0.
2. Project the translated cube's coordinates onto a 2D plane by dividing their x's and y's by their respective z's (you may need to apply a constant scaling factor here for the coordinates to be reasonable for the screen, e.g. not too small and within +/- half the screen's height in pixels). This creates the perspective effect. You can now draw pixels using these divided x's and y's on the screen, assuming x=y=0 is the center of it.
This is pretty much how it is done in 3d games. If you use cube vertex coordinates, then you get projections of its sides onto the screen. You may then solid-fill the resultant 2d shapes or texture-map them. But for that you'll have to first figure out which sides are not obscured by others (unless, of course, you use a technique called z-buffering). You don't need that for a simple wire-frame demo, though, just draw straight lines between the projected vertices.
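Here is a minimal C++ sketch of those two steps for a camera at position cam looking at the origin; the look-at construction, the up vector, and the scale constant are illustrative assumptions rather than part of the assignment:

    #include <cmath>

    struct Vec3 { double x, y, z; };
    struct Vec2 { double x, y; };

    Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
    double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
    Vec3 cross(Vec3 a, Vec3 b) {
        return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
    }
    Vec3 normalize(Vec3 v) {
        double len = std::sqrt(dot(v, v));
        return {v.x / len, v.y / len, v.z / len};
    }

    // Step 1: express a world-space vertex in the camera's coordinate
    // system.  The camera sits at 'cam' and looks at the origin; 'up'
    // is any vector not parallel to the viewing direction.
    Vec3 toCameraSpace(Vec3 vertex, Vec3 cam, Vec3 up)
    {
        Vec3 forward = normalize(sub({0, 0, 0}, cam));  // camera -> origin
        Vec3 right   = normalize(cross(forward, up));
        Vec3 trueUp  = cross(right, forward);

        Vec3 rel = sub(vertex, cam);                    // translate
        return {dot(rel, right), dot(rel, trueUp), dot(rel, forward)};  // rotate
    }

    // Step 2: perspective projection by dividing by z.  'scale' is a
    // constant (e.g. half the screen height in pixels divided by
    // tan(fov/2)) chosen so the result fits the screen; (0, 0) is the
    // screen centre.
    Vec2 project(Vec3 cameraSpace, double scale)
    {
        return {scale * cameraSpace.x / cameraSpace.z,
                scale * cameraSpace.y / cameraSpace.z};
    }

For a wire-frame cube, run every vertex through both functions and draw straight lines between the resulting 2D points.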

Why is a quadrilateral insufficient to determine projection/rotation/etc.?

Suppose I have a photograph, and four pixel coordinates representing the corners of a rectangular sheet of paper. My goal is to determine the rotation, translation, and projection which maps from the 3D scene containing the sheet of paper on a plane to the 2D image.
I understand there are augmented reality libraries for this, like ARToolkit. However, they all require additional information, namely the parameters of the camera used to take the photograph. My question is, how come having the rectangle's four corner points (in addition to knowing the rectangle's real-world dimensions) is insufficient information to extrapolate 3D information?
It makes sense mathematically since there are so many more unknown variables that bring us from 3D coordinates to 2D screen space, but I'm having a hard time grounding that concept in what I see.
Thanks!
Does it help for you to count degrees of freedom?
There are 3 degrees of freedom involved in deciding where in space to put the camera. 3 more degrees of freedom to decide how to turn it. 1 degree of freedom to figure out how much the picture it took had been enlarged, and finally 2 degrees of freedom to fix where on the resulting flat image we're looking.
That makes 9 degrees of freedom in total. However, knowing the location of four points in the final cropped image gives us only 8 continuously varying variables. Therefore there must be a way to slide the camera, zoom level and translation parameters around such that those four points stay in the same place on the screen (while everything else distorts subtly).
If we know even one of these nine parameters, such as the camera's focal length (in pixels!), then there's some hope of getting an unambiguous answer.
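In symbols, the count from the answer above is:

    \underbrace{3}_{\text{position}} + \underbrace{3}_{\text{rotation}} + \underbrace{1}_{\text{zoom}} + \underbrace{2}_{\text{image offset}} = 9 \ \text{unknowns},
    \qquad
    \underbrace{4 \times 2}_{\text{corner coordinates}} = 8 \ \text{constraints}.

Since 8 < 9, the four corners alone leave at least a one-parameter family of camera configurations that reproduce the same quadrilateral.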

Do I need to rectify if camera planes are aligned?

If I am taking images from a pair of cameras whose principal axes (in both cameras) are perpendicular to the baseline, do I need to rectify the images? A typical example would be bumblebee stereo cameras.
If you can also guarantee that:
- the camera axes are parallel (maybe so if bought as a single package like the bumblebee)
- you have no lens distortion (probably not)
- all the other internal camera parameters are identical
- your measurement axis is parallel to your baseline
then you might be able to skip image rectification. Personally I wouldn't.
Just think about lens distortion. Even assuming everything else is equal and aligned, this might mess things up. Suppose a feature appears on the edge of one image and at the centre of the other. At the edge it might be distorted a few pixels away, while at the centre it appears where it should. Without rectification, your stereoscopic calculation (which assumes straight lines from object to sensor) is going to give you bad results.
Depends what you mean by "rectify". In stereo vision, it is common to ensure that the epipolar lines are aligned too. That means the i-th row in image 1 corresponds to the i-th row in image 2. An optional step is to reduce distortion caused by the rectification process.
If you are taking images from a pair of cameras whose principal axes are perpendicular to the baseline, then you have epipoles mapped to infinity (parallel epipolar lines in the same image). You need another transform to align the epipolar lines in both images. You will find this transform in Loop & Zhang's paper, along with the transform to reduce distortion.
And be careful about lens distortion (see wxffles' answer).
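If you do end up rectifying, and assuming you are using OpenCV (the question does not say so; the matrix names below are placeholders for your own calibration data), the usual pipeline looks roughly like this sketch:

    #include <opencv2/core.hpp>
    #include <opencv2/calib3d.hpp>
    #include <opencv2/imgproc.hpp>

    // K1, D1, K2, D2 : intrinsics and distortion coefficients for the two cameras
    // R, T           : rotation and translation of camera 2 w.r.t. camera 1
    // imageSize      : size of the input images
    void rectifyPair(const cv::Mat& K1, const cv::Mat& D1,
                     const cv::Mat& K2, const cv::Mat& D2,
                     const cv::Mat& R,  const cv::Mat& T,
                     cv::Size imageSize,
                     const cv::Mat& left, const cv::Mat& right,
                     cv::Mat& leftRect, cv::Mat& rightRect)
    {
        cv::Mat R1, R2, P1, P2, Q;

        // Computes the rotations/projections that make the epipolar lines
        // horizontal and row-aligned between the two images.
        cv::stereoRectify(K1, D1, K2, D2, imageSize, R, T, R1, R2, P1, P2, Q);

        // Build per-camera remap tables (this also undoes lens distortion).
        cv::Mat map1x, map1y, map2x, map2y;
        cv::initUndistortRectifyMap(K1, D1, R1, P1, imageSize, CV_32FC1, map1x, map1y);
        cv::initUndistortRectifyMap(K2, D2, R2, P2, imageSize, CV_32FC1, map2x, map2y);

        cv::remap(left,  leftRect,  map1x, map1y, cv::INTER_LINEAR);
        cv::remap(right, rightRect, map2x, map2y, cv::INTER_LINEAR);
    }

After this, corresponding features lie on the same row in both output images, so the stereo matcher only has to search along one dimension.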

What does 'Polygon' mean in terms of 3D Graphics?

An old Direct3D book says
"...you can achieve an acceptable frame rate with hardware acceleration while displaying between 2000 and 4000 polygons per frame..."
What is one polygon in Direct3D? Do they mean one primitive (indexed or otherwise) or one triangle?
That book means triangles. Otherwise, what if I wanted 1000-sided polygons? Could I still achieve 2000-4000 such shapes per frame?
In practice, the only thing you'll want it to be is a triangle, because if a polygon is not a triangle it's generally tessellated into triangles anyway (e.g., a quad consists of two triangles, et cetera). A basic triangulation (tessellation) algorithm for a convex polygon is really simple: you loop through the vertices and fan out a triangle from the first vertex to each consecutive pair of the remaining vertices.
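A minimal sketch of that fan triangulation for a convex polygon, assuming vertices are referred to by index:

    #include <vector>

    struct Triangle { int a, b, c; };   // indices into the polygon's vertex list

    // Triangulate a convex polygon with n >= 3 vertices by fanning out
    // from vertex 0: (0,1,2), (0,2,3), (0,3,4), ...
    std::vector<Triangle> triangulateConvex(int n)
    {
        std::vector<Triangle> tris;
        for (int i = 1; i + 1 < n; ++i)
            tris.push_back({0, i, i + 1});
        return tris;
    }

So a 1000-sided convex polygon would become 998 triangles, which is how it would actually be counted against a triangle budget.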
Here, a "polygon" refers to a triangle. However, as you point out, there are many more variables than just the number of triangles that determine performance.
Key issues that matter are:
The format of storage (indexed or not; list, fan, or strip)
The location of storage (host-memory vertex arrays, host-memory vertex buffers, or GPU-memory vertex buffers)
The mode of rendering (is the draw primitive command issued fully from the host, or via instancing)
Triangle size
Together, those variables can create much greater than a 2x variation in performance.
Similarly, the hardware on which the application is running may vary 10x or more in performance in the real world: a GPU (or integrated graphics processor) that was low-end in 2005 will perform 10-100x slower in any meaningful metric than a current top-of-the-line GPU.
All told, any recommendation that you use 2000-4000 triangles is so ridiculously outdated that it should be entirely ignored today. Even low-end hardware today can easily push 100,000 triangles in a frame under reasonable conditions. Further, most visually interesting applications today are dominated by pixel shading performance, not triangle count.
General rules of thumb for achieving good triangle throughput today:
Use [indexed] triangle (or quad) lists
Store data in GPU-memory vertex buffers
Draw large batches with each draw-primitive call (thousands of primitives)
Use triangles mostly >= 16 pixels on screen
Don't use the Geometry Shader (especially for geometry amplification)
Do all of those things, and any machine today should be able to render tens or hundreds of thousands of triangles with ease.
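To illustrate the first rule of thumb above: an indexed triangle list stores each unique vertex once and refers to it by index. A library-free C++ sketch (in real code both arrays would be uploaded into GPU-memory vertex/index buffers):

    #include <cstdint>
    #include <vector>

    struct Vertex { float x, y, z; };

    // A quad made of two triangles.  With an index buffer the four corner
    // vertices are stored once; without indexing the two triangles would
    // need six vertex records.
    std::vector<Vertex> vertices = {
        {0, 0, 0}, {1, 0, 0}, {1, 1, 0}, {0, 1, 0}
    };
    std::vector<std::uint16_t> indices = {
        0, 1, 2,    // first triangle
        0, 2, 3     // second triangle
    };
    // Both arrays would normally live in GPU-memory buffers and be drawn
    // with a single indexed draw call covering many such primitives.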
According to this page, a polygon is n-sided in Direct3D.
In C#:
public static Mesh Polygon(
Device device,
float length,
int sides
)
As others already said, polygons here means triangles.
The main advantage of triangles is that, since 3 points define a plane, a triangle is coplanar by definition. This means that every point within the triangle is exactly defined as a linear combination of its vertices. More than three vertices aren't necessarily coplanar, and they don't define a unique surface.
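As a small illustration, any point of a triangle can be written as a weighted (barycentric) sum of its three vertices, and the result is guaranteed to lie in the triangle's plane; the struct and function names here are made up for the sketch:

    struct Vec3 { float x, y, z; };

    // Point inside (or on) the triangle (a, b, c) for barycentric weights
    // u, v, w >= 0 with u + v + w == 1.  Because the three vertices define
    // a plane, the result always lies on that plane.
    Vec3 pointOnTriangle(Vec3 a, Vec3 b, Vec3 c, float u, float v, float w)
    {
        return { u * a.x + v * b.x + w * c.x,
                 u * a.y + v * b.y + w * c.y,
                 u * a.z + v * b.z + w * c.z };
    }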
An advantage more in mechanical modeling than in graphics is that triangles are also undeformable.

Resources