Distance between the camera and a recognized "object" - geometry

I would like to calculate the distance between my camera and a recognized "object".
The recognized "object" is a black rectangle sticker on a white board for example. I know the values of the rectangle (x,y).
Is there a method that I can use to calculate the distance with the values of my original rectangle, and the values of the picture of the rectangle I took with the camera?
I searched the forum for answeres, but none of the were specified to calculate the distance with these attributes.
I am working on a robot called Nao from Aldebaran Robotics, I am planing to use OpenCV to recognize the black rectangle.

If you could compute the angle taken up by the image of the target, then the distance to the target should be proportional to cot (i.e. 1/tan) of that angle. You should find that the number of pixels in the image corresponded roughly to the angles, but I doubt it is completely linear, especially up close.
The behaviour of your camera lens is likely to affect this measurement, so it will depend on your exact setup.
Why not measure the size of the target at several distances, and plot a scatter graph? You could then fit a curve to the data to get a size->distance function for your particular system. If your camera is close to an "ideal" camera, then you should find this graph looks like cot, and you should be able to find your values of a and b to match dist = a * cot (b * width).
If you try this experiment, why not post the answers here, for others to benefit from?
[Edit: a note about 'ideal' cameras]
For a camera image to look 'realistic' to us, the image should approximate projection onto a plane held infront of the eye (because camera images are viewed by us by holding a planar image in front of our eyes). Imagine holding a sheet of tracing paper up in front of your eye, and sketching the objects silhouette on that paper. The second diagram on this page shows sort of what I mean. You might describe a camera which achieves this as an "ideal" camera.
Of course, in real life, cameras don't work via tracing paper, but with lenses. Very complicated lenses. Have a look at the lens diagram on this page. For various reasons which you could spend a lifetime studying, it is very tricky to create a lens which works exactly like the tracing paper example would work under all conditions. Start with this wiki page and read on if you want to know more.
So you are unlikely to be able to compute an exact relationship between pixel length and distance: you should measure it and fit a curve.

It is a big topic. If you want to proceed from a single image, take a look at this old paper by A. Criminisi. For an in-depth view, read his Ph.D. thesis. Then start playing with the OpenCV routines in the "projective geometry" sectiop.

I have been working on Image/Object Recognition as well. I just released a python programmed android app (ported to android) that recognizes objects, people, cars, books, logos, trees, flowers... anything:) It also shows it's thought process as it "thinks" :)
I've put it out as a test for 99 cents on google play.
Here's the link if you're interested, there's also a video of it in action:
https://play.google.com/store/apps/details?id=com.davecote.androideyes
Enjoy!
:)

Related

Triangulate camera position and orientation in regards to known objects

I made an object tracker that calculates the position of an object recorded in a live camera feed using stereoscopic cameras. The math was simple, once you know the camera distance and orientation. However, now I thought it would be nice to allow me to quickly extract all these parameters, so when I change my setup or cameras I will be able to quickly calibrate it again.
To calculate the object position I made some simplifications/assumptions, which made the math easier: the cameras are in the same YZ plane, so there is only a distance in x between them. Their tilt is also just in the XY plane.
To reverse the triangulation I thought a test pattern (square) of 4 points of which I know the distances to each other would suffice. Ideally I would like to get the cameras' positions (distances to test pattern and each other), their rotation in X (and maybe Y and Z if applicable/possible), as well as their view angle (to translate pixel position to real world distances - that should be a camera constant, but in case I change cameras, it is quite a bit to define accurately)
I started with the same trigonometric calculations, but always miss parameters. I am wondering if there is an existing solution or a solid approach. If I need to add parameter (like distances, they are easy enough to measure), it's no problem (my calculations didn't give me any simple equations with that possibility though).
I also read about Homography in opencv, but it seems it applies to 2D space only, or not?
Any help is appreciated!

How do I find the world coordinates of a pixel on the image plane?

A bit of background
I am writing a simple ray tracer in C++. I have most of the core complete but don't understand how to retrieve the world coordinate of a pixel on the image plane. I need this location so that I can cast the ray into the world.
Currently I have a Camera with a position(aka my perspective reference point), a direction (vector) which is not normalized. The directions length signifies the center of the image plane and which way the camera is facing.
There are other values associated with the camera but they should not be relevant.
My image coordinates will range from -1 to 1 and the perspective(focal length), will change based on the distance of the direction associated with the camera.
What I need help with
I need to go from pixel coordinates (say [0, 256] in an image 256 pixels on each side) to my world coordinates.
I will also want to program this so that no matter where the camera is placed and where it is directed, that I can find the pixel in the world coordinates. (Currently the camera will almost always be centered at the origin and will look down the negative z axis. I would like to program this with the future changes in mind.) It is also important to know if this code should be pushed down into my threaded code as well. Otherwise it will be calculated by the main thread and then the ray will be used in the threaded code.
(source: in.tum.de)
I did not make this image and it is only there to give an idea of what I need.
Please leave comments if you need any additional info. Otherwise I would like a simple theory/code example of what to do.
Basically you have to do the inverse process of V * MVP which transforms the point to unit cube dimensions. Look at the following urls for programming help
http://nehe.gamedev.net/article/using_gluunproject/16013/ https://sites.google.com/site/vamsikrishnav/gluunproject

How to calculate a pixels world space position on an image plane formed by a virtual camera?

First, this Calculating camera ray direction to 3d world pixel helped me a bit in understanding what the virtual camera setup is like. I don't understand how the vectors work in this setup, and I thought normalized device coordinates had to be used which led me to this page http://www.scratchapixel.com/lessons/3d-basic-lessons/lesson-6-rays-cameras-and-images/building-primary-rays-and-rendering-an-image/. What I am trying to do is build a ray tracer, and as the question states, find out the pixels position in order to shoot out a ray. What I really, really really would like, is an actually example showing a virtual camera setup, screen resolution and how to calculate a pixels position, then transform to world space coordinates. Experts!, Thank you for your help! :D
Multiply a matrix by the coordinates. What matrix? There are lots of choices. For example XNA uses a projection matrix, view matrix and world matrix. Applying all of them transforms pixel coordinates into world coordinates or vice versa. Breaking it down this way helps to understand the different transformations going on so you can more easily construct the matrices.
Isn't this webpage providing you already with 4 pages of explanation on how these rays are built? It seems like you haven't made the effort to read the content of the link you are referring to. I would suggest you read it first, try to understand it, maybe look at the source code they provide and come back with a real question regarding what you potentially don't understand.
It's all there, and I am not going to re-write what these people seem to have put a lot of energy already to explain! (nor should anybody else really ...).

Obstacle avoidance using 2 fixed cameras on a robot

I will be start working on a robotics project which involves a mobile robot that has mounted 2 cameras (1.3 MP) fixed at a distance of 0.5m in between.I also have a few ultrasonic sensors, but they have only a 10 metter range and my enviroment is rather large (as an example, take a large warehouse with many pillars, boxes, walls .etc) .My main task is to identify obstacles and also find a roughly "best" route that the robot must take in order to navigate in a "rough" enviroment (the ground floor is not smooth at all). All the image processing is not made on the robot, but on a computer with NVIDIA GT425 2Gb Ram.
My questions are :
Should I mount the cameras on a rotative suport, so that they take pictures on a wider angle?
It is posible creating a reasonable 3D reconstruction based on only 2 views at such a small distance in between? If so, to what degree I can use this for obstacle avoidance and a best route construction?
If a roughly accurate 3D representation of the enviroment can be made, how can it be used as creating a map of the enviroment? (Consider the following example: the robot must sweep an fairly large area and it would be energy efficient if it would not go through the same place (or course) twice;however when a 3D reconstruction is made from one direction, how can it tell if it has already been there if it comes from the opposite direction )
I have found this response on a similar question , but I am still concerned with the accuracy of 3D reconstruction (for example a couple of boxes situated at 100m considering the small resolution and distance between the cameras).
I am just starting gathering information for this project, so if you haved worked on something similar please give me some guidelines (and some links:D) on how should I approach this specific task.
Thanks in advance,
Tamash
If you want to do obstacle avoidance, it is probably easiest to use the ultrasonic sensors. If the robot is moving at speeds suitable for a human environment then their range of 10m gives you ample time to stop the robot. Keep in mind that no system will guarantee that you don't accidentally hit something.
(2) It is posible creating a reasonable 3D reconstruction based on only 2 views at such a small distance in between? If so, to what degree I can use this for obstacle avoidance and a best route construction?
Yes, this is possible. Have a look at ROS and their vSLAM. http://www.ros.org/wiki/vslam and http://www.ros.org/wiki/slam_gmapping would be two of many possible resources.
however when a 3D reconstruction is made from one direction, how can it tell if it has already been there if it comes from the opposite direction
Well, you are trying to find your position given a measurement and a map. That should be possible, and it wouldn't matter from which direction the map was created. However, there is the loop closure problem. Because you are creating a 3D map at the same time as you are trying to find your way around, you don't know whether you are at a new place or at a place you have seen before.
CONCLUSION
This is a difficult task!
Actually, it's more than one. First you have simple obstacle avoidance (i.e. Don't drive into things.). Then you want to do simultaneous localisation and mapping (SLAM, read Wikipedia on that) and finally you want to do path planning (i.e. sweeping the floor without covering area twice).
I hope that helps?
I'd say no if you mean each eye rotating independently. You won't get the accuracy you need to do the stereo correspondence and make calibration a nightmare. But if you want the whole "head" of the robot to pivot, then that may be doable. But you should have some good encoders on the joints.
If you use ROS, there are some tools which help you turn the two stereo images into a 3d point cloud. http://www.ros.org/wiki/stereo_image_proc. There is a tradeoff between your baseline (the distance between the cameras) and your resolution at different ranges. large baseline = greater resolution at large distances, but it also has a large minimum distance. I don't think i would expect more than a few centimeters of accuracy from a static stereo rig. and this accuracy only gets worse when you compound there robot's location uncertainty.
2.5. for mapping and obstacle avoidance the first thing i would try to do is segment out the ground plane. the ground plane goes to mapping, and everything above is an obstacle. check out PCL for some point cloud operating functions: http://pointclouds.org/
if you can't simply put a planar laser on the robot like a SICK or Hokuyo, then i might try to convert the 3d point cloud into a pseudo-laser-scan then use some off the shelf SLAM instead of trying to do visual slam. i think you'll have better results.
Other thoughts:
now that the Microsoft Kinect has been released, it is usually easier (and cheaper) to simply use that to get a 3d point cloud instead of doing actual stereo.
This project sounds a lot like the DARPA LAGR program. (learning applied to ground robots). That program is over, but you may be able to track down papers published from it.

3D laser scanner capturing normals?

The Lab university I work at is in the process of purchasing a laser scanner for scanning 3D objects. All along from the start we've been trying to find a scanner that is able to capture real RAW normals from the actual scanned surface. It seems that most scanners only capture points and then the software interpolates to find the normal of the approximate surface.
Does anybody know if there is actually such a thing as capturing raw normals? Is there a scanner that can do this and not interpolate the normals from the point data?
Highly unlikely. Laser scanning is done using ranges. What you want would be combining two entirely different techniques. Normals could be evaluated with higher precision using well controlled lighting etc, but requiring a very different kind of setup. Also consider the sampling problem: What good is a normal with higher resolution than your position data?
If you already know the bidirectional reflectance distribution function of the material that composes your 3D object, it is possible that you could use a gonioreflectometer to compare the measured BRDF at a point. You could then individually optimize a computed normal at that point by comparing a hypothetical BRDF against the actual measured value.
Admittedly, this would be a reasonably computationally-intensive task. However, if you are only going through this process fairly rarely, it might be feasible.
For further information, I would recommend that you speak with either Greg Ward (Larson) of Radiance fame or Peter Shirley at NVIDIA.
Here is an example article of using structured light to reconstruct normals from gradients.
Shape from 2D Edge Gradients
I didn't find the exact article I was looking for, but this seems to be on the same principle.
You can reconstruct normals from the angle and width of the stripe after being deformed on the object.
You could with a structured light + camera setup.
The normal would come from the angle betwen the projected line and the position on the image. As the other posters point out - you can't do it from a point laser scanner.
Capturing raw normals is almost always done using photometric stereo. This almost always requires placing some assumptions on the underlying reflectance, but even with somewhat inaccurate normals you can often do well when combining them with another source of data:
Really nice code for combining point clouds (from a laser scan for example) with surface normals: http://www.cs.princeton.edu/gfx/pubs/Nehab_2005_ECP/

Resources