How to calculate a pixel's world space position on an image plane formed by a virtual camera?

First, the question "Calculating camera ray direction to 3d world pixel" helped me a bit in understanding what the virtual camera setup is like. I don't understand how the vectors work in this setup, and I thought normalized device coordinates had to be used, which led me to this page: http://www.scratchapixel.com/lessons/3d-basic-lessons/lesson-6-rays-cameras-and-images/building-primary-rays-and-rendering-an-image/ What I am trying to do is build a ray tracer and, as the question states, find a pixel's position in order to shoot a ray through it. What I would really, really like is an actual example showing a virtual camera setup, a screen resolution, and how to calculate a pixel's position and then transform it to world space coordinates. Experts, thank you for your help! :D

Multiply a matrix by the coordinates. What matrix? There are lots of choices. For example, XNA uses a projection matrix, a view matrix and a world matrix. Applying all of them transforms world coordinates into pixel coordinates, and applying their inverses goes the other way. Breaking it down this way helps you understand the different transformations going on, so you can more easily construct the matrices.
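To make the chain of transformations concrete, here is a minimal sketch of the usual pixel -> normalized device coordinates -> camera space -> world space route. Everything in it is an assumption for illustration rather than any particular engine's API: the function names, a camera that looks down -z in its own space with the image plane one unit away, a vertical field of view, and a row-major camera-to-world matrix applied to column vectors.

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component vector; return x, y, z."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(3)]

def primary_ray(px, py, width, height, vfov_deg, cam_to_world):
    """Return (origin, pixel_position, direction) in world space for pixel (px, py)."""
    aspect = width / height
    scale = math.tan(math.radians(vfov_deg) / 2.0)
    # Pixel centre -> normalized device coordinates in [0, 1].
    ndc_x = (px + 0.5) / width
    ndc_y = (py + 0.5) / height
    # NDC -> camera space on the plane z = -1 (y is flipped so +y points up).
    cam_x = (2.0 * ndc_x - 1.0) * aspect * scale
    cam_y = (1.0 - 2.0 * ndc_y) * scale
    # Camera space -> world space.
    origin = mat_vec(cam_to_world, (0.0, 0.0, 0.0, 1.0))
    pixel = mat_vec(cam_to_world, (cam_x, cam_y, -1.0, 1.0))
    d = [p - o for p, o in zip(pixel, origin)]
    length = math.sqrt(sum(c * c for c in d))
    return origin, pixel, [c / length for c in d]

# Hypothetical setup: 640x480 image, 90 degree vertical field of view,
# camera translated to (0, 0, 5) and still looking down -z.
cam_to_world = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 5.0],
    [0.0, 0.0, 0.0, 1.0],
]
origin, pixel, direction = primary_ray(0, 0, 640, 480, 90.0, cam_to_world)
```

The ray for a pixel then starts at origin and travels along direction; pixel is the world space position of that pixel's centre on the image plane.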

Isn't this webpage providing you already with 4 pages of explanation on how these rays are built? It seems like you haven't made the effort to read the content of the link you are referring to. I would suggest you read it first, try to understand it, maybe look at the source code they provide and come back with a real question regarding what you potentially don't understand.
It's all there, and I am not going to rewrite what these people seem to have already put a lot of energy into explaining! (nor should anybody else really ...).

Related

Are there existing tools that raytrace triangle meshes?

Disclaimer: I'm not 100% on whether this is a well-formed question, so please feel free to comment and suggest improvements. I'll be actively looking out for ways to improve this question.
I have a triangle mesh, let's say the Stanford Bunny. Now, I want to raycast a ray from a source point in 3D along a 3D direction vector, and identify just the first intersection of that ray with the triangle mesh.
I already have a naive implementation cooked up. However, I'm looking for a more advanced implementation. In particular, I'll be casting many millions of rays in many directions, so I'm looking for a multi-threaded or GPU-accelerated implementation.
I have to believe that there must be some pretty complete projects online, as raycasting triangle meshes is a fundamental part of 3D computer graphics. However, I can't find anything beyond personal projects, which leads me to believe that I am using the wrong search terms, or something pretty simple along those lines.
I am looking for suggestions on existing tools that can raytrace polygonal meshes.
If all you need is the distance to the mesh for millions of rays, it might be a good idea to look up a CUDA ray tracing tutorial online. This will show you how to cast many millions of rays. In most tutorials, ray tracing is used to render to the screen with the camera matrix. However, this is not necessary. Simply set each ray's starting parameters, such as its origin and 3D direction vector, to what you need them to be, then copy the output data back to the CPU. Be wary of the bandwidth between the GPU and CPU: transferring millions of intersection points between them can make the program run exceptionally slowly.
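For reference, the per-ray work that such a kernel would run can be sketched on the CPU as follows. This is a brute-force version that uses the Möller–Trumbore ray/triangle test and keeps the nearest hit. The function names are made up for illustration, and a serious implementation would add an acceleration structure such as a BVH instead of testing every triangle.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(orig, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: return the ray parameter t of the hit, or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    t_vec = sub(orig, v0)
    u = dot(t_vec, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = cross(t_vec, e1)
    v = dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None   # ignore hits behind the ray origin

def first_hit(orig, direction, triangles):
    """Return (t, triangle_index) of the nearest hit, or None if nothing is hit."""
    best = None
    for i, (v0, v1, v2) in enumerate(triangles):
        t = ray_triangle(orig, direction, v0, v1, v2)
        if t is not None and (best is None or t < best[0]):
            best = (t, i)
    return best

# Two parallel triangles in front of the origin; the nearer one should win.
triangles = [
    ((-1.0, -1.0, -5.0), (1.0, -1.0, -5.0), (0.0, 1.0, -5.0)),
    ((-1.0, -1.0, -2.0), (1.0, -1.0, -2.0), (0.0, 1.0, -2.0)),
]
hit = first_hit((0.0, 0.0, 0.0), (0.0, 0.0, -1.0), triangles)
```

If direction is a unit vector, t is the distance along the ray, so returning only (t, index) per ray instead of full intersection points also keeps the GPU-to-CPU transfer small.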

How do I find the world coordinates of a pixel on the image plane?

A bit of background
I am writing a simple ray tracer in C++. I have most of the core complete but don't understand how to retrieve the world coordinate of a pixel on the image plane. I need this location so that I can cast the ray into the world.
Currently I have a Camera with a position (aka my perspective reference point) and a direction (vector) which is not normalized. The direction vector signifies the center of the image plane (through its length) and which way the camera is facing.
There are other values associated with the camera but they should not be relevant.
My image coordinates will range from -1 to 1, and the perspective (focal length) will change based on the length of the direction vector associated with the camera.
What I need help with
I need to go from pixel coordinates (say [0, 256] in an image 256 pixels on each side) to my world coordinates.
I will also want to program this so that, no matter where the camera is placed and where it is directed, I can find the pixel in world coordinates. (Currently the camera will almost always be centered at the origin and will look down the negative z axis. I would like to program this with the future changes in mind.) It is also important to know whether this code should be pushed down into my threaded code as well. Otherwise it will be calculated by the main thread and then the ray will be used in the threaded code.
(source: in.tum.de)
I did not make this image and it is only there to give an idea of what I need.
Please leave comments if you need any additional info. Otherwise I would like a simple theory/code example of what to do.
Basically you have to apply the inverse of V * MVP (the viewport transform times the model-view-projection matrix), which is what takes a point to unit-cube dimensions and then to the screen. Look at the following URLs for programming help:
http://nehe.gamedev.net/article/using_gluunproject/16013/
https://sites.google.com/site/vamsikrishnav/gluunproject
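For the camera described in the question (a position plus an unnormalized direction whose length is the distance to the image plane, and image coordinates from -1 to 1), a matrix inverse is not strictly needed. A rough sketch that builds the camera basis directly could look like this. The function name, the world_up default and the square image are assumptions, and it breaks down when the camera looks straight along world_up.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def pixel_to_world(px, py, size, cam_pos, cam_dir, world_up=(0.0, 1.0, 0.0)):
    """World position of the centre of pixel (px, py) and the ray through it.

    size    : image is size x size pixels, pixel (0, 0) is the top-left corner
    cam_dir : unnormalized; its length is the distance to the image plane
    """
    forward = normalize(cam_dir)
    right = normalize(cross(forward, world_up))   # undefined if forward is parallel to world_up
    up = cross(right, forward)
    # Pixel centre -> image coordinates in [-1, 1], with +y pointing up.
    sx = (px + 0.5) / size * 2.0 - 1.0
    sy = 1.0 - (py + 0.5) / size * 2.0
    # Centre of the image plane, then offset along the camera's right and up axes.
    centre = tuple(p + d for p, d in zip(cam_pos, cam_dir))
    world = tuple(c + sx * r + sy * u for c, r, u in zip(centre, right, up))
    ray_dir = normalize(tuple(w - p for w, p in zip(world, cam_pos)))
    return world, ray_dir

# Camera at the origin looking down -z, image plane 2 units away, 256x256 image.
world, ray_dir = pixel_to_world(0, 0, 256, (0.0, 0.0, 0.0), (0.0, 0.0, -2.0))
```

Each pixel is computed independently of the others, so this can run inside the worker threads just as well as on the main thread.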

Emulating a perspective rectangle on 2D

So, I'm currently developing a puzzle game of sorts, and I came upon something I'm not sure how to approach.
As you can see from the screenshot below, the text on the sides next to the main square is distorted along the diagonal of the quadrilateral. This is because this is not a screenshot of a 3D environment, but rather a 2D environment where the squares have been stretched in such a way that it looks like it's 3D.
I have tried using 3D perspective and changing depths, and while it solves the issue of the distorted sides, I was wondering if it's possible to fix this issue without doing 3D perspectives. Mainly because the current mesh transformation scheme took a while to get to, and converting that to something that works on 3D space is extra effort that might be avoidable.
I have a feeling this is unavoidable, but I'm curious if anyone knows a solution. I'm currently using OpenGL ES 1.
Probably not the answer you wanted, but I'd go with the 3d transformation because it will save you not only this distortion, but will simplify many other things down the road and give you opportunities to do nice effects.
What you are lacking in this scene is "perspective-correct interpolation", which is slightly non-linear, and is done automatically when you provide coordinates with depth information.
It may be possible to emulate it in other ways (though your options are limited since you do not have shaders available), but they will all likely be less efficient than using the dedicated functionality of your GPU. I recommend that you switch to using 3D coordinates.
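To show where the non-linearity comes from, here is a small sketch of perspective-correct interpolation of one texture coordinate along a screen-space edge. The function names are illustrative. The idea is that u/w and 1/w are interpolated linearly in screen space and then divided, where w is the clip-space depth term at each endpoint.

```python
def lerp(a, b, t):
    return a + (b - a) * t

def affine_u(u0, u1, t):
    """Plain linear interpolation in screen space (what the distorted 2D quad gets)."""
    return lerp(u0, u1, t)

def perspective_correct_u(u0, w0, u1, w1, t):
    """Interpolate u/w and 1/w linearly in screen space, then divide."""
    inv_w = lerp(1.0 / w0, 1.0 / w1, t)
    u_over_w = lerp(u0 / w0, u1 / w1, t)
    return u_over_w / inv_w

# Halfway across the screen between a near endpoint (w = 1) and a far one (w = 3):
flat = affine_u(0.0, 1.0, 0.5)                            # 0.5
correct = perspective_correct_u(0.0, 1.0, 1.0, 3.0, 0.5)  # 0.25
```

When both endpoints have the same w the two functions agree, which is why the distortion only shows up on the slanted side quads.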
Actually, I just found the answer. Turns out there's a Q coordinate which you can use to play around with trapezoidal texture distortion:
texture mapping a trapezoid with a square texture in OpenGL
http://www.xyzw.us/~cass/qcoord/
http://hacksoflife.blogspot.com.au/2008/08/perspective-correct-texturing-in-opengl.html
Looks like it won't be as correct as doing it 3D, but I suppose it will be easier for my use right now.

Distance between the camera and a recognized "object"

I would like to calculate the distance between my camera and a recognized "object".
The recognized "object" is a black rectangle sticker on a white board for example. I know the values of the rectangle (x,y).
Is there a method that I can use to calculate the distance with the values of my original rectangle, and the values of the picture of the rectangle I took with the camera?
I searched the forum for answers, but none of them dealt specifically with calculating the distance from these attributes.
I am working on a robot called Nao from Aldebaran Robotics, and I am planning to use OpenCV to recognize the black rectangle.
If you could compute the angle taken up by the image of the target, then the distance to the target should be proportional to cot (i.e. 1/tan) of that angle. You should find that the number of pixels in the image corresponds roughly to the angle, but I doubt it is completely linear, especially up close.
The behaviour of your camera lens is likely to affect this measurement, so it will depend on your exact setup.
Why not measure the size of the target at several distances, and plot a scatter graph? You could then fit a curve to the data to get a size->distance function for your particular system. If your camera is close to an "ideal" camera, then you should find this graph looks like cot, and you should be able to find your values of a and b to match dist = a * cot (b * width).
If you try this experiment, why not post the answers here, for others to benefit from?
[Edit: a note about 'ideal' cameras]
For a camera image to look 'realistic' to us, the image should approximate projection onto a plane held in front of the eye (because camera images are viewed by us by holding a planar image in front of our eyes). Imagine holding a sheet of tracing paper up in front of your eye, and sketching the object's silhouette on that paper. The second diagram on this page shows sort of what I mean. You might describe a camera which achieves this as an "ideal" camera.
Of course, in real life, cameras don't work via tracing paper, but with lenses. Very complicated lenses. Have a look at the lens diagram on this page. For various reasons which you could spend a lifetime studying, it is very tricky to create a lens which works exactly like the tracing paper example would work under all conditions. Start with this wiki page and read on if you want to know more.
So you are unlikely to be able to compute an exact relationship between pixel length and distance: you should measure it and fit a curve.
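As a starting point for the measure-and-fit approach, here is a rough sketch. It assumes the simplest case: an ideal pinhole camera with the target facing the camera near the image centre, where distance = k / width_px (k being the focal length in pixels times the real target width). The cot model from above is included for comparison. The function names and the sample numbers are made up for illustration, not measurements.

```python
import math

def fit_inverse_model(samples):
    """Least-squares fit of distance = k / width_px.

    samples: list of (width_px, measured_distance) pairs.
    """
    num = sum(d / w for w, d in samples)
    den = sum(1.0 / (w * w) for w, d in samples)
    return num / den

def distance_inverse(width_px, k):
    return k / width_px

def distance_cot(width_px, a, b):
    """The dist = a * cot(b * width) model; b * width_px is an angle in radians."""
    return a / math.tan(b * width_px)

# Made-up calibration data (pixels, centimetres), only to show the call pattern.
samples = [(100.0, 50.0), (50.0, 100.0), (25.0, 200.0)]
k = fit_inverse_model(samples)
estimate = distance_inverse(40.0, k)
```

If the residuals of this one-parameter fit grow at close range, that is the lens behaviour mentioned above, and a richer curve (or a proper camera calibration) is the next step.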
It is a big topic. If you want to proceed from a single image, take a look at this old paper by A. Criminisi. For an in-depth view, read his Ph.D. thesis. Then start playing with the OpenCV routines in the "projective geometry" section.
I have been working on image/object recognition as well. I just released an app written in Python (ported to Android) that recognizes objects, people, cars, books, logos, trees, flowers... anything :) It also shows its thought process as it "thinks" :)
I've put it out as a test for 99 cents on google play.
Here's the link if you're interested, there's also a video of it in action:
https://play.google.com/store/apps/details?id=com.davecote.androideyes
Enjoy!
:)

3D laser scanner capturing normals?

The university lab I work at is in the process of purchasing a laser scanner for scanning 3D objects. From the start we've been trying to find a scanner that is able to capture real RAW normals from the actual scanned surface. It seems that most scanners only capture points, and then the software interpolates to find the normal of the approximate surface.
Does anybody know if there is actually such a thing as capturing raw normals? Is there a scanner that can do this and not interpolate the normals from the point data?
Highly unlikely. Laser scanning is done using ranges. What you want would mean combining two entirely different techniques. Normals could be evaluated with higher precision using well-controlled lighting etc., but that requires a very different kind of setup. Also consider the sampling problem: what good is a normal with higher resolution than your position data?
If you already know the bidirectional reflectance distribution function of the material that composes your 3D object, it is possible that you could use a gonioreflectometer to compare the measured BRDF at a point. You could then individually optimize a computed normal at that point by comparing a hypothetical BRDF against the actual measured value.
Admittedly, this would be a reasonably computationally-intensive task. However, if you are only going through this process fairly rarely, it might be feasible.
For further information, I would recommend that you speak with either Greg Ward (Larson) of Radiance fame or Peter Shirley at NVIDIA.
Here is an example article of using structured light to reconstruct normals from gradients.
Shape from 2D Edge Gradients
I didn't find the exact article I was looking for, but this seems to be on the same principle.
You can reconstruct normals from the angle and width of the stripe after being deformed on the object.
You could with a structured light + camera setup.
The normal would come from the angle between the projected line and the position on the image. As the other posters point out, you can't do it from a point laser scanner.
Capturing raw normals is almost always done using photometric stereo. This almost always requires placing some assumptions on the underlying reflectance, but even with somewhat inaccurate normals you can often do well when combining them with another source of data:
Really nice code for combining point clouds (from a laser scan for example) with surface normals: http://www.cs.princeton.edu/gfx/pubs/Nehab_2005_ECP/
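To give an idea of what photometric stereo computes per pixel, here is a minimal sketch for the textbook case: three known distant light directions and a Lambertian surface, so that intensity = albedo * dot(light, normal). The function names are illustrative, and shadows, highlights and noise are ignored, which is exactly the kind of reflectance assumption mentioned above.

```python
import math

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, b):
    """Solve the 3x3 system m * x = b with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("light directions must not be coplanar")
    x = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = b[r]
        x.append(det3(mc) / d)
    return x

def normal_from_three_lights(lights, intensities):
    """lights: three unit light directions; intensities: the pixel's brightness
    under each light. Returns (unit_normal, albedo) under the Lambertian model."""
    g = solve3(lights, intensities)          # g = albedo * normal
    albedo = math.sqrt(sum(c * c for c in g))
    return [c / albedo for c in g], albedo

# Synthetic check: lights along the axes, so the intensities are albedo * normal.
lights = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
normal, albedo = normal_from_three_lights(lights, (0.0, 0.3, 0.4))
```

With more than three lights the same system is solved in a least-squares sense, and the resulting normals can then be combined with scanned positions as in the work linked above.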
