Can I measure the distance between the camera and an object using a single image? - python-3.x

I want to measure the distance between the camera and an object using OpenCV without giving a known distance. Is it possible? If so, can it be done with only one image?

Monocular depth estimation is an open problem in computer vision research. Since it is not a trivial operation, you will almost certainly need a deep learning approach. Once you have your network, you may be able to run it with the DNN module of OpenCV, if that network is supported.
As for accuracy, it depends on what accuracy you need: this approach will surely be less accurate than an RGB-D camera, which has millimeter-level accuracy (depending on the distance; for example, the Intel RealSense D415 claims 2% error up to 4 meters).
Here you can find a survey of exactly this field, take a look!
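If you want to try this with OpenCV alone, a monocular depth network exported to ONNX (MiDaS is a common choice) can usually be loaded through the DNN module. The sketch below is illustrative only: the model file name, input size, and preprocessing are assumptions that depend on the particular export you use.

    import cv2
    import numpy as np

    # Illustrative only: the model file, input size and preprocessing depend on
    # the particular ONNX export you download (e.g. a MiDaS "small" model).
    net = cv2.dnn.readNet("midas_small.onnx")

    img = cv2.imread("scene.jpg")
    blob = cv2.dnn.blobFromImage(img, 1.0 / 255.0, (256, 256), swapRB=True)
    net.setInput(blob)
    depth = net.forward()[0]          # relative (inverse) depth, not meters

    # Resize to the original resolution and normalize for visualization
    depth = cv2.resize(depth, (img.shape[1], img.shape[0]))
    depth_vis = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    cv2.imwrite("depth.png", depth_vis)

Note that without a known distance or camera baseline the network only gives you relative depth; turning it into meters requires some external scale reference.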

Related

Which face detection method is suitable for detecting faces of people at a long distance?

I have checked five different methods for face detection.
1. Haar cascade
2. Dlib HOG
3. Python face_recognition module
4. DLib_CNN
5. OpenCV CNN
All these methods have some advantages and disadvantages, and I found that the OpenCV CNN works best out of these five algorithms. But for my application I need to detect the faces of people at a far distance, and for this purpose even the OpenCV CNN is not working well (it detects the faces of people close to the camera but not people far away). Is there any other algorithm that detects faces of people at a far distance?
One of the ways is to do instance segmentation in order to get all the classes in the environment, including distant objects.
Once you get all the classes, you can draw a bounding box around the required far-off face class, upsample it, and send it to your face detection NN. Suppose your image is 54x54x3; it will be upsampled to 224x224x3 and sent to your trained NN (see the sketch below).
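As a rough sketch (the function name, box format, and 224x224 target size are just assumptions for illustration), the crop-and-upsample step could look like this:

    import cv2

    def upsample_face_crop(frame, box, size=(224, 224)):
        """Crop a small face region and upsample it for the detector.

        `frame` is the full BGR image and `box` is (x, y, w, h) from the
        earlier instance-segmentation / region-proposal step.
        """
        x, y, w, h = box
        patch = frame[y:y + h, x:x + w]
        # Bicubic interpolation preserves more detail than nearest/linear
        # when blowing up a 54x54 crop to 224x224
        return cv2.resize(patch, size, interpolation=cv2.INTER_CUBIC)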
Face Detection State-of-the-art practical considerations
Face detection is often the first stage of a computer vision pipeline, so the algorithm needs to perform in real time. It is therefore useful to compare the various face detection algorithms and their pros and cons in order to pick the right one for your application. Many algorithms have been developed over the years, as shown below.
Our recent favorite is YuNet because of its balance between speed and accuracy. Apart from that, RetinaFace is also very accurate but it is a larger model and is a little slow. We have compared the top 9 algorithms for Face Detection on some of the features that we should keep in mind while choosing a Face Detection algorithm:
Speed
Accuracy
Size of face
Robustness to occlusion
Robustness to Lighting variation
Robustness to Orientation or Pose
You can check out the Face Detection ultimate guide that gives a brief overview of the popular face detection algorithms.
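As a starting point, YuNet ships with recent OpenCV builds via the FaceDetectorYN interface. The snippet below is a minimal sketch assuming OpenCV >= 4.5.4 and a YuNet ONNX file downloaded from the OpenCV model zoo (the file name and the score threshold are placeholders to adjust):

    import cv2

    img = cv2.imread("group_photo.jpg")
    h, w = img.shape[:2]

    # The model file comes from the OpenCV zoo; the name here is a placeholder
    detector = cv2.FaceDetectorYN.create("face_detection_yunet.onnx", "", (w, h), 0.6)
    _, faces = detector.detect(img)   # each row: x, y, w, h, 10 landmark coords, score

    for face in (faces if faces is not None else []):
        x, y, bw, bh = face[:4].astype(int)
        cv2.rectangle(img, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
    cv2.imwrite("detections.png", img)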

Camera calibration algorithm evaluation

Recently, I have been developing an algorithm to improve camera calibration in my research group. I would like to ask: is there any method to evaluate a camera calibration algorithm, so that I can compare results among different algorithms?
The easiest way I can think of is taking the pixel-wise mean squared difference between the calibrated result and the original. Are there any other suggestions?
Are you talking about geometric camera calibration (focal length, optical center, etc.), or color calibration?
For geometric camera calibration, the main criterion is reprojection errors. Presumably, you are using some sort of calibration pattern, like a checkerboard, where you can detect a set of points. To evaluate calibration accuracy, you look at the distances between the detected points and the reprojected points.
This is what a calibration algorithm typically tries to minimize. Depending on the calibration software you use, you may also be able to look at the uncertainty of the estimated camera parameters.
See this example in MATLAB.
Alternatively, you can use your calibrated camera to measure an object of a known size, and see how precise your measurement is.
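For a checkerboard calibration done with cv2.calibrateCamera, the reprojection error check is straightforward; a sketch (the arguments are the usual calibration inputs and outputs, assumed to be already computed) could be:

    import cv2
    import numpy as np

    def rms_reprojection_error(object_points, image_points, rvecs, tvecs, K, dist):
        """RMS pixel distance between detected and reprojected pattern points.

        object_points / image_points are the per-view lists passed to
        cv2.calibrateCamera; rvecs, tvecs, K, dist are its outputs.
        """
        total_sq_err, total_pts = 0.0, 0
        for obj, img, rvec, tvec in zip(object_points, image_points, rvecs, tvecs):
            proj, _ = cv2.projectPoints(obj, rvec, tvec, K, dist)
            err = cv2.norm(img, proj, cv2.NORM_L2)
            total_sq_err += err * err
            total_pts += len(obj)
        return np.sqrt(total_sq_err / total_pts)

A lower RMS value (well under a pixel for a good calibration) means the model explains the detected points better, which is exactly what the calibration itself minimizes.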

Obstacle avoidance using 2 fixed cameras on a robot

I will soon start working on a robotics project that involves a mobile robot with 2 cameras (1.3 MP) mounted 0.5 m apart. I also have a few ultrasonic sensors, but they only have a 10 meter range and my environment is rather large (as an example, take a large warehouse with many pillars, boxes, walls, etc.). My main task is to identify obstacles and also find a roughly "best" route that the robot must take in order to navigate in a "rough" environment (the ground floor is not smooth at all). All the image processing is done not on the robot, but on a computer with an NVIDIA GT425 and 2 GB of RAM.
My questions are:
Should I mount the cameras on a rotating support, so that they capture a wider angle?
Is it possible to create a reasonable 3D reconstruction based on only 2 views at such a small distance in between? If so, to what degree can I use this for obstacle avoidance and construction of a best route?
If a roughly accurate 3D representation of the environment can be made, how can it be used to create a map of the environment? (Consider the following example: the robot must sweep a fairly large area, and it would be energy efficient if it did not go through the same place (or course) twice; however, when a 3D reconstruction is made from one direction, how can it tell whether it has already been there if it comes from the opposite direction?)
I have found this response on a similar question, but I am still concerned about the accuracy of the 3D reconstruction (for example, a couple of boxes situated at 100 m, considering the small resolution and distance between the cameras).
I am just starting to gather information for this project, so if you have worked on something similar please give me some guidelines (and some links :D) on how I should approach this specific task.
Thanks in advance,
Tamash
If you want to do obstacle avoidance, it is probably easiest to use the ultrasonic sensors. If the robot is moving at speeds suitable for a human environment then their range of 10m gives you ample time to stop the robot. Keep in mind that no system will guarantee that you don't accidentally hit something.
(2) Is it possible to create a reasonable 3D reconstruction based on only 2 views at such a small distance in between? If so, to what degree can I use this for obstacle avoidance and construction of a best route?
Yes, this is possible. Have a look at ROS and their vSLAM. http://www.ros.org/wiki/vslam and http://www.ros.org/wiki/slam_gmapping would be two of many possible resources.
however when a 3D reconstruction is made from one direction, how can it tell if it has already been there if it comes from the opposite direction
Well, you are trying to find your position given a measurement and a map. That should be possible, and it wouldn't matter from which direction the map was created. However, there is the loop closure problem. Because you are creating a 3D map at the same time as you are trying to find your way around, you don't know whether you are at a new place or at a place you have seen before.
CONCLUSION
This is a difficult task!
Actually, it's more than one. First you have simple obstacle avoidance (i.e. Don't drive into things.). Then you want to do simultaneous localisation and mapping (SLAM, read Wikipedia on that) and finally you want to do path planning (i.e. sweeping the floor without covering area twice).
I hope that helps?
I'd say no if you mean each eye rotating independently. You won't get the accuracy you need to do the stereo correspondence, and it will make calibration a nightmare. But if you want the whole "head" of the robot to pivot, then that may be doable. But you should have good encoders on the joints.
If you use ROS, there are some tools which help you turn the two stereo images into a 3D point cloud: http://www.ros.org/wiki/stereo_image_proc. There is a tradeoff between your baseline (the distance between the cameras) and your resolution at different ranges: a large baseline gives greater resolution at large distances, but it also has a large minimum distance. I don't think I would expect more than a few centimeters of accuracy from a static stereo rig, and this accuracy only gets worse when you compound the robot's location uncertainty. (A rough sketch of the disparity-to-point-cloud step is given after these points.)
2.5. For mapping and obstacle avoidance, the first thing I would try to do is segment out the ground plane. The ground plane goes to mapping, and everything above it is an obstacle. Check out PCL for some point cloud operating functions: http://pointclouds.org/
If you can't simply put a planar laser on the robot, like a SICK or Hokuyo, then I might try to convert the 3D point cloud into a pseudo-laser-scan and then use some off-the-shelf SLAM instead of trying to do visual SLAM. I think you'll have better results.
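Here is the rough sketch mentioned above for turning a rectified pair into a point cloud outside of ROS, using OpenCV's semi-global block matcher (the SGBM parameters and the Q matrix from your cv2.stereoRectify calibration are assumptions to fill in):

    import cv2
    import numpy as np

    def stereo_point_cloud(left_gray, right_gray, Q):
        """Dense point cloud from a rectified stereo pair.

        `Q` is the 4x4 disparity-to-depth matrix from cv2.stereoRectify,
        computed during calibration of the 0.5 m baseline rig.
        """
        matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                        blockSize=5, P1=8 * 5 * 5, P2=32 * 5 * 5)
        # SGBM returns disparities in 16.4 fixed-point format
        disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
        points = cv2.reprojectImageTo3D(disparity, Q)   # H x W x 3, camera frame
        mask = disparity > 0                            # drop pixels with no valid match
        return points[mask]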
Other thoughts:
Now that the Microsoft Kinect has been released, it is usually easier (and cheaper) to simply use that to get a 3D point cloud instead of doing actual stereo.
This project sounds a lot like the DARPA LAGR program (Learning Applied to Ground Robots). That program is over, but you may be able to track down papers published from it.

Image stabilization using optical flow

I'm working on image stabilization using optical flow.
The algorithm that I've used is like this: first I find good features to track with the OpenCV function cvGoodFeaturesToTrack, and then I estimate the optical flow using another OpenCV function, cvCalcOpticalFlowPyrLK.
Now I want to stabilize the video sequence, for which I think I need to take the average of the optical flow vectors.
I'm working on a real-time application, so I can't use either SIFT or SURF.
The problem is that I don't know how to take the average.
Can anyone show me what to do?
Regards
You don't need to average anything. Optical flow will return the position of the "good features to track" in the second image. Transform the second image so that these features coincide with the features on the first image (use GetPerspectiveTransform).
I'll probably write an article on this soon on my website http://aishack.in/
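For reference, a minimal sketch of that warp step with the modern Python API (using a RANSAC homography via findHomography rather than a strict 4-point getPerspectiveTransform, since the tracker returns many noisy matches) might look like this:

    import cv2
    import numpy as np

    def stabilize_frame(prev_gray, curr_gray, curr_frame):
        """Warp the current frame so its tracked features align with the previous frame."""
        prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                           qualityLevel=0.01, minDistance=30)
        curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                       prev_pts, None)
        good_prev = prev_pts[status.flatten() == 1]
        good_curr = curr_pts[status.flatten() == 1]

        # Map the current features back onto the previous frame's positions;
        # RANSAC discards tracking outliers
        H, _ = cv2.findHomography(good_curr, good_prev, cv2.RANSAC)
        h, w = curr_frame.shape[:2]
        return cv2.warpPerspective(curr_frame, H, (w, h))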

3D laser scanner capturing normals?

The university lab I work at is in the process of purchasing a laser scanner for scanning 3D objects. From the start we've been trying to find a scanner that is able to capture real, raw normals from the actual scanned surface. It seems that most scanners only capture points, and then the software interpolates to find the normal of the approximate surface.
Does anybody know if there is actually such a thing as capturing raw normals? Is there a scanner that can do this rather than interpolating the normals from the point data?
Highly unlikely. Laser scanning is done using ranges. What you want would require combining two entirely different techniques. Normals could be evaluated with higher precision using well-controlled lighting etc., but that requires a very different kind of setup. Also consider the sampling problem: what good is a normal with higher resolution than your position data?
If you already know the bidirectional reflectance distribution function of the material that composes your 3D object, it is possible that you could use a gonioreflectometer to compare the measured BRDF at a point. You could then individually optimize a computed normal at that point by comparing a hypothetical BRDF against the actual measured value.
Admittedly, this would be a reasonably computationally-intensive task. However, if you are only going through this process fairly rarely, it might be feasible.
For further information, I would recommend that you speak with either Greg Ward (Larson) of Radiance fame or Peter Shirley at NVIDIA.
Here is an example article on using structured light to reconstruct normals from gradients.
Shape from 2D Edge Gradients
I didn't find the exact article I was looking for, but this seems to be on the same principle.
You can reconstruct normals from the angle and width of the stripe after being deformed on the object.
You could with a structured light + camera setup.
The normal would come from the angle between the projected line and the position in the image. As the other posters point out, you can't do it with a point laser scanner.
Capturing raw normals is almost always done using photometric stereo. This generally requires placing some assumptions on the underlying reflectance, but even with somewhat inaccurate normals you can often do well when combining them with another source of data:
Really nice code for combining point clouds (from a laser scan for example) with surface normals: http://www.cs.princeton.edu/gfx/pubs/Nehab_2005_ECP/
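For reference, classic Lambertian photometric stereo reduces to a per-pixel least-squares solve: with K images taken under K known light directions, the intensities satisfy I = L (albedo * n). A minimal sketch (assuming grayscale images of a static object and known light directions) is:

    import numpy as np

    def photometric_stereo_normals(images, light_dirs):
        """Per-pixel surface normals via Lambertian photometric stereo.

        `images`: K x H x W array of grayscale images of a static object,
        each lit from a different known direction; `light_dirs`: K x 3.
        """
        K, H, W = images.shape
        I = images.reshape(K, -1).astype(np.float64)   # K x (H*W) intensities
        L = np.asarray(light_dirs, dtype=np.float64)   # K x 3 light directions

        # Solve I = L @ G in the least-squares sense; G = albedo * normal
        G, *_ = np.linalg.lstsq(L, I, rcond=None)      # 3 x (H*W)
        albedo = np.linalg.norm(G, axis=0)
        normals = (G / np.maximum(albedo, 1e-8)).T.reshape(H, W, 3)
        return normals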
