I'm working on image stabilization using optical flow.
My algorithm is as follows: first I find good features to track with OpenCV's "cvGoodFeaturesToTrack", and then I estimate the optical flow with another OpenCV function, "cvCalcOpticalFlowPyrLK".
Now I want to stabilize the video sequence, and I think I need to take the average of the optical flow vectors to do that.
I'm working on a real-time application, so I can't use SIFT or SURF.
The problem is that I don't know how to take the average.
Can anyone show me what to do?
Regards
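You don't need to average anything. Optical flow returns the positions of the "good features to track" in the second image. Transform the second image so that these features coincide with the corresponding features in the first image (use GetPerspectiveTransform; note that it takes exactly four point pairs, so with many tracked features FindHomography, which fits a transform to all of them, is the more natural choice).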
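If the camera shake between frames is mostly translation (an assumption; a perspective transform also covers rotation, scaling and perspective), averaging does have a precise meaning: the least-squares translation that maps the tracked points onto their new positions is exactly the mean flow vector. The pure-Python sketch below assumes the two point lists are already paired up from cvCalcOpticalFlowPyrLK with failed tracks removed. The function names are hypothetical, and the median variant is one simple way to reduce the influence of features on independently moving objects.

```python
import statistics
from typing import List, Tuple

Point = Tuple[float, float]


def estimate_translation(prev_pts: List[Point], curr_pts: List[Point]) -> Tuple[float, float]:
    """Least-squares translation t minimizing sum ||(p + t) - q||^2.

    The minimizer is t = mean(q - p), i.e. the average optical flow vector.
    """
    if not prev_pts or len(prev_pts) != len(curr_pts):
        raise ValueError("need equally sized, non-empty point lists")
    n = len(prev_pts)
    dx = sum(q[0] - p[0] for p, q in zip(prev_pts, curr_pts)) / n
    dy = sum(q[1] - p[1] for p, q in zip(prev_pts, curr_pts)) / n
    return dx, dy


def median_translation(prev_pts: List[Point], curr_pts: List[Point]) -> Tuple[float, float]:
    """Per-axis median of the flow vectors: less sensitive to outlier tracks."""
    dxs = [q[0] - p[0] for p, q in zip(prev_pts, curr_pts)]
    dys = [q[1] - p[1] for p, q in zip(prev_pts, curr_pts)]
    return statistics.median(dxs), statistics.median(dys)
```

To stabilize, shift the second frame by the negative of the estimated translation (here, for points that all moved by (2, -1), shift by (-2, +1)).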
I'll probably write an article on this soon on my website http://aishack.in/
I want to measure the distance between the camera and an object using OpenCV without a known reference distance. Is it possible? If so, can it be done with only one image?
Monocular depth estimation is an open problem in computer vision research. Since it is not a trivial task, you will certainly need a deep learning approach. Then, once you have your network, you may be able to run it with OpenCV's DNN module, if that network is supported.
As for accuracy, it depends on what you need: this approach will surely be less accurate than an RGB-D camera, which can reach millimeter accuracy (depending on the distance; for example, the Intel RealSense D415 claims 2% error up to 4 meters).
Here you can find a survey of exactly this field; take a look!
I'm working on a face recognition project with Python and OpenCV. I can detect faces, but I have a problem:
I don't know how to make the system differentiate between real and fake faces using a 2D image.
If anyone has any ideas, please help me.
Thank you.
There is a really good article (code included) by Adrian from PyImageSearch tackling this exact problem with a liveness detector.
Below is an extract from that article:
There are a number of approaches to liveness detection, including:
Texture analysis, including computing Local Binary Patterns (LBPs) over face regions and using an SVM to classify the faces as real or spoofed.
Frequency analysis, such as examining the Fourier domain of the face.
Variable focusing analysis, such as examining the variation of pixel values between two consecutive frames.
Heuristic-based algorithms, including eye movement, lip movement, and blink detection. This set of algorithms attempts to track eye movement and blinks to ensure the user is not holding up a photo of another person (since a photo will not blink or move its lips).
Optical Flow algorithms, namely examining the differences and properties of optical flow generated from 3D objects and 2D planes.
3D face shape, similar to what is used in Apple's iPhone face recognition system, enabling the face recognition system to distinguish between real faces and printouts/photos/images of another person.
Combinations of the above, enabling a face recognition system engineer to pick and choose the liveness detection models appropriate for their particular application.
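To make the texture-analysis item concrete, here is a minimal pure-Python sketch of the basic 3x3 Local Binary Pattern and its normalized 256-bin histogram, the kind of feature vector the extract describes feeding to an SVM. It is not the article's code: it skips the "uniform pattern" and multi-radius variants, and in practice you would compute histograms over a grid of face regions and concatenate them. SVM training is not shown.

```python
from typing import List


def lbp_image(gray: List[List[int]]) -> List[List[int]]:
    """Basic 3x3 LBP code for every interior pixel (border pixels are skipped)."""
    h, w = len(gray), len(gray[0])
    # Neighbour offsets (dy, dx), clockwise from the top-left pixel;
    # neighbour k sets bit k when it is >= the centre pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            center = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if gray[y + dy][x + dx] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(gray: List[List[int]]) -> List[float]:
    """Normalized 256-bin histogram of LBP codes (a texture feature vector)."""
    hist = [0] * 256
    total = 0
    for row in lbp_image(gray):
        for code in row:
            hist[code] += 1
            total += 1
    return [v / total for v in hist] if total else [0.0] * 256
```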
You can solve this problem in several ways. I'm listing some of them here; you can find more by referring to research papers.
Motion Approach: Ask the user to blink or move, which is evidence that they are real (most likely to work on video datasets or sequential images).
Feature Approach: Extract useful features from an Image and use them to make binary classification decisions to say real or not.
Frequency Analysis: Examining the Fourier domain of the face.
Optical Flow algorithms: Examine the differences and properties of optical flow generated from 3D objects and 2D planes.
Texture Analysis: You can also compute Local Binary Patterns with OpenCV to classify images as fake or real; refer to this link for details on this approach.
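For the motion approach, one commonly used blink heuristic is the eye aspect ratio (EAR): with six landmarks p1..p6 around an eye (p1 and p4 at the corners, p2 and p3 on the upper lid, p6 and p5 below them), EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), which drops sharply when the eye closes. The sketch below assumes the landmarks come from some facial landmark detector (not shown); the 0.2 threshold and 2-frame minimum are illustrative values you would tune on your own data.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR for six eye landmarks ordered p1..p6."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def count_blinks(ear_series: Iterable[float], threshold: float = 0.2, min_frames: int = 2) -> int:
    """Count runs of at least min_frames consecutive frames with EAR below threshold."""
    blinks = 0
    run = 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # a blink still in progress at the end of the sequence
        blinks += 1
    return blinks
```

A printed photo should produce zero blinks over a long enough window, although a replayed video can still fool this check on its own.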
I'm after advice from the image processing / computer vision experts here. I'm trying to develop a robust, scalable algorithm to extract the dimensions of a person's body, for example, their upper-body width.
Problems:
images without faces
person sitting
multiple faces
person is holding something, thus covering part of their body
Ways of doing this:
* Haar: supervised training on a lot of data of different body parts, and hope for the best.
* HOG: 1. face detection -> afterwards, HOG and assumptions along the way with different filters
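For readers unfamiliar with the second option, the core of HOG is a per-cell histogram of gradient orientations weighted by gradient magnitude. This pure-Python sketch computes one cell's histogram with 9 unsigned orientation bins; it deliberately omits the bin interpolation and block normalization of full HOG (OpenCV's HOGDescriptor implements the complete descriptor), so treat it as an illustration only.

```python
import math
from typing import List


def hog_cell_histogram(cell: List[List[float]], n_bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg).

    Gradients use centred differences on interior pixels, and each pixel
    votes into a single bin (no interpolation, no block normalization).
    """
    h, w = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            # Unsigned orientation: opposite gradient directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += mag
    return hist
```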
Note: all images will be scaled to the same size.
Computation time for the second approach MIGHT be more demanding (though I doubt it),
but for the first method, training is almost impossible and would take much more time.
P.S.
I know there's a paper about using pedestrian data, but that would work for full-body, standing people, not for sitting ones.
I'm open to hearing all your ideas; ask away if you have anything to add.
The implementation would hopefully be done in Node.js.
Thank you
DPM (Deformable Part Models) is widely used in computer vision for object detection, and it tends to work under occlusion and when only part of an object is present in the image. The grammar model for humans is very good and has state-of-the-art results on standard datasets. It takes around a second to perform detection on a single image; it's MATLAB code, so it's expected to be slow.
http://www.cs.berkeley.edu/~rbg/latent/
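As a rough illustration of why part-based models cope with pose changes and partial visibility, the star-model score in DPM is the root filter response plus, for each part, the best part filter response minus a quadratic deformation cost on the part's displacement (dx, dy) from its anchor, plus a bias. The brute-force sketch below scores a single root placement from precomputed response maps. The real implementation at the link works on a HOG feature pyramid and uses distance transforms for speed, and the response values in the test are made up.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]


def best_part_placement(response: List[List[float]], anchor: Cell,
                        deform: Sequence[float]) -> Tuple[float, Cell]:
    """Best (score, (x, y)) for one part: response minus deformation cost.

    response[y][x] is the part filter score; deform = (d1, d2, d3, d4)
    weights the displacement features (dx, dy, dx^2, dy^2).
    """
    ax, ay = anchor
    d1, d2, d3, d4 = deform
    best: Tuple[float, Cell] = (float("-inf"), anchor)
    for y, row in enumerate(response):
        for x, r in enumerate(row):
            dx, dy = x - ax, y - ay
            s = r - (d1 * dx + d2 * dy + d3 * dx * dx + d4 * dy * dy)
            if s > best[0]:
                best = (s, (x, y))
    return best


def star_model_score(root_score: float, parts, bias: float = 0.0):
    """Total score for one root placement and the chosen part positions."""
    total = root_score + bias
    placements = []
    for response, anchor, deform in parts:
        s, pos = best_part_placement(response, anchor, deform)
        total += s
        placements.append(pos)
    return total, placements
```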
I would like to calculate the distance between my camera and a recognized "object".
The recognized "object" is, for example, a black rectangular sticker on a whiteboard. I know the real dimensions of the rectangle (x, y).
Is there a method I can use to calculate the distance from the dimensions of the original rectangle and the dimensions of the rectangle in the picture I took with the camera?
I searched the forum for answers, but none of them addressed calculating the distance from these attributes.
I am working on a robot called Nao from Aldebaran Robotics, and I am planning to use OpenCV to recognize the black rectangle.
If you could compute the angle subtended by the image of the target, then the distance to the target should be proportional to cot (i.e. 1/tan) of half that angle. You should find that the number of pixels in the image corresponds roughly to the angle, but I doubt the relationship is completely linear, especially up close.
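Under an ideal pinhole model (an assumption; real lenses deviate from it), the relationship can be written out: a target of real width W centred on the optical axis and imaged w pixels wide subtends an angle theta with tan(theta / 2) = w / (2f), where f is the focal length in pixels, so distance = (W / 2) * cot(theta / 2) = W * f / w. In this sketch f is assumed to come from a camera calibration (for example OpenCV's calibrateCamera), and the numbers in the test are made up.

```python
import math


def distance_from_width(real_width: float, pixel_width: float, focal_px: float) -> float:
    """Distance to a fronto-parallel target centred on the optical axis (ideal pinhole).

    real_width is in metres; pixel_width and focal_px are in pixels.
    """
    # Half of the angle subtended by the target.
    half_angle = math.atan(pixel_width / (2.0 * focal_px))
    # distance = (W / 2) * cot(theta / 2), which equals W * f / w.
    return (real_width / 2.0) / math.tan(half_angle)
```

For example, a 10 cm sticker imaged 40 pixels wide by a camera with an 800-pixel focal length comes out at about 2 m.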
The behaviour of your camera lens is likely to affect this measurement, so it will depend on your exact setup.
Why not measure the size of the target at several distances, and plot a scatter graph? You could then fit a curve to the data to get a size->distance function for your particular system. If your camera is close to an "ideal" camera, then you should find this graph looks like cot, and you should be able to find your values of a and b to match dist = a * cot (b * width).
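The curve fit suggested above can be done without any special library: for a fixed b the model dist = a * cot(b * width) is linear in a, so a has a closed-form least-squares value, and b can be found by a simple grid search. This is only one way to do it (scipy.optimize.curve_fit would also work). The search range for b is an assumption you would choose from your own data, and the test uses synthetic measurements rather than real ones.

```python
import math
from typing import Sequence, Tuple


def fit_cot_model(widths: Sequence[float], distances: Sequence[float],
                  b_min: float, b_max: float, steps: int = 2000) -> Tuple[float, float]:
    """Fit dist = a * cot(b * width); returns (a, b).

    Requires 0 < b * width < pi for every width and every b in [b_min, b_max].
    """
    best = None
    for k in range(steps + 1):
        b = b_min + (b_max - b_min) * k / steps
        c = [1.0 / math.tan(b * w) for w in widths]
        # Closed-form least-squares a for this b.
        a = sum(d * ci for d, ci in zip(distances, c)) / sum(ci * ci for ci in c)
        sse = sum((d - a * ci) ** 2 for d, ci in zip(distances, c))
        if best is None or sse < best[0]:
            best = (sse, a, b)
    return best[1], best[2]
```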
If you try this experiment, why not post the results here for others to benefit from?
[Edit: a note about 'ideal' cameras]
For a camera image to look 'realistic' to us, the image should approximate projection onto a plane held in front of the eye (because we view camera images by holding a planar image in front of our eyes). Imagine holding a sheet of tracing paper up in front of your eye and sketching the object's silhouette on that paper. The second diagram on this page shows roughly what I mean. You might describe a camera that achieves this as an "ideal" camera.
Of course, in real life, cameras don't work via tracing paper, but with lenses. Very complicated lenses. Have a look at the lens diagram on this page. For various reasons which you could spend a lifetime studying, it is very tricky to create a lens which works exactly like the tracing paper example would work under all conditions. Start with this wiki page and read on if you want to know more.
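One concrete way a real lens departs from the tracing-paper model is radial distortion. The radial part of the distortion model used by OpenCV's camera calibration multiplies normalized image coordinates by 1 + k1*r^2 + k2*r^4 (plus higher-order terms). The sketch below applies just those two terms, with made-up coefficients, to show that the same physical width can appear narrower or wider depending on where it falls in the image.

```python
from typing import Tuple


def distort_radial(x: float, y: float, k1: float, k2: float = 0.0) -> Tuple[float, float]:
    """Apply a two-term radial distortion to normalized image coordinates."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor


def apparent_width(x_left: float, x_right: float, k1: float, k2: float = 0.0) -> float:
    """Distorted width of a segment lying on the horizontal axis (y = 0)."""
    left, _ = distort_radial(x_left, 0.0, k1, k2)
    right, _ = distort_radial(x_right, 0.0, k1, k2)
    return right - left
```

With barrel distortion (k1 < 0), a segment of normalized width 0.1 near the image edge comes out narrower than the same segment at the centre.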
So you are unlikely to be able to compute an exact relationship between pixel length and distance: you should measure it and fit a curve.
It is a big topic. If you want to proceed from a single image, take a look at this old paper by A. Criminisi. For an in-depth view, read his Ph.D. thesis. Then start playing with the OpenCV routines in the "projective geometry" section.
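As a small taste of the projective-geometry toolbox (not Criminisi's method itself): in homogeneous coordinates the image line through two points is their cross product, and the intersection of two lines is again a cross product. Intersecting the images of two lines that are parallel in the scene gives their vanishing point, a basic ingredient of single-view metrology. The function names in this pure-Python sketch are hypothetical.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]
Point = Tuple[float, float]


def cross(u: Vec3, v: Vec3) -> Vec3:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def line_through(p: Point, q: Point) -> Vec3:
    """Homogeneous line (a, b, c) with a*x + b*y + c = 0 through p and q."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersect(l1: Vec3, l2: Vec3) -> Optional[Point]:
    """Intersection of two image lines, or None if they are parallel in the image."""
    x, y, w = cross(l1, l2)
    if abs(w) < 1e-12:
        return None  # the vanishing point lies at infinity
    return x / w, y / w
```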
I have been working on image/object recognition as well. I just released an Android app, written in Python and ported to Android, that recognizes objects, people, cars, books, logos, trees, flowers... anything :) It also shows its thought process as it "thinks" :)
I've put it out as a test for 99 cents on Google Play.
Here's the link if you're interested, there's also a video of it in action:
https://play.google.com/store/apps/details?id=com.davecote.androideyes
Enjoy!
:)
I will be starting work on a robotics project involving a mobile robot with 2 cameras (1.3 MP) mounted 0.5 m apart. I also have a few ultrasonic sensors, but they have only a 10 meter range, and my environment is rather large (for example, a large warehouse with many pillars, boxes, walls, etc.). My main task is to identify obstacles and also find a roughly "best" route that the robot must take to navigate a "rough" environment (the ground floor is not smooth at all). None of the image processing is done on the robot; it runs on a computer with an NVIDIA GT425 (2 GB RAM).
My questions are :
1. Should I mount the cameras on a rotating support, so that they take pictures over a wider angle?
2. Is it possible to create a reasonable 3D reconstruction based on only 2 views with such a small distance between them? If so, to what degree can I use this for obstacle avoidance and best-route construction?
3. If a roughly accurate 3D representation of the environment can be made, how can it be used to create a map of the environment? (Consider the following example: the robot must sweep a fairly large area, and it would be energy efficient if it did not go through the same place (or course) twice; however, when a 3D reconstruction is made from one direction, how can the robot tell whether it has already been there if it comes from the opposite direction?)
I have found this response to a similar question, but I am still concerned about the accuracy of the 3D reconstruction (for example, for a couple of boxes 100 m away, given the low resolution and the small distance between the cameras).
I am just starting to gather information for this project, so if you have worked on something similar, please give me some guidelines (and some links :D) on how I should approach this specific task.
Thanks in advance,
Tamash
If you want to do obstacle avoidance, it is probably easiest to use the ultrasonic sensors. If the robot is moving at speeds suitable for a human environment, then their 10 m range gives you ample time to stop the robot. Keep in mind that no system will guarantee that you don't accidentally hit something.
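Whether 10 m is "ample" can be checked with a quick stopping-distance calculation: the distance covered during the reaction time plus the braking distance v^2 / (2a). The speeds, deceleration, reaction time and safety margin used here are hypothetical numbers, not properties of any particular robot.

```python
def stopping_distance(speed: float, decel: float, reaction_time: float) -> float:
    """Metres travelled while reacting plus braking distance v^2 / (2a)."""
    return speed * reaction_time + speed * speed / (2.0 * decel)


def can_stop_within(sensor_range: float, speed: float, decel: float,
                    reaction_time: float, margin: float = 0.5) -> bool:
    """True if the robot stops, with a safety margin, before reaching sensor_range."""
    return stopping_distance(speed, decel, reaction_time) + margin <= sensor_range
```

At 1.5 m/s with 0.5 m/s^2 of braking and a 0.2 s reaction time, the robot needs about 2.55 m, well within 10 m; at 4 m/s with the same braking it would need over 16 m.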
(2) Is it possible to create a reasonable 3D reconstruction based on only 2 views with such a small distance between them? If so, to what degree can I use this for obstacle avoidance and best-route construction?
Yes, this is possible. Have a look at ROS and their vSLAM. http://www.ros.org/wiki/vslam and http://www.ros.org/wiki/slam_gmapping would be two of many possible resources.
However, when a 3D reconstruction is made from one direction, how can it tell whether it has already been there if it comes from the opposite direction?
Well, you are trying to find your position given a measurement and a map. That should be possible, and it wouldn't matter from which direction the map was created. However, there is the loop closure problem. Because you are creating a 3D map at the same time as you are trying to find your way around, you don't know whether you are at a new place or at a place you have seen before.
CONCLUSION
This is a difficult task!
Actually, it's more than one. First you have simple obstacle avoidance (i.e. don't drive into things). Then you want to do simultaneous localisation and mapping (SLAM; read the Wikipedia article on that), and finally you want to do path planning (i.e. sweeping the floor without covering the same area twice).
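For the route-finding part of path planning (the "best route" in the question; coverage planning for sweeping is a separate problem not shown here), a common starting point is A* search on an occupancy grid. The sketch below assumes a hypothetical grid where 0 is traversable and 1 is an obstacle, for example produced by a ground/obstacle segmentation, with 4-connected moves of unit cost.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def astar(grid: List[List[int]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Shortest 4-connected path from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])

    def h(c: Cell) -> int:
        # Manhattan distance never overestimates for unit-cost 4-connected moves.
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    came_from: Dict[Cell, Cell] = {}
    g: Dict[Cell, int] = {start: 0}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if cost > g[cur]:
            continue  # stale heap entry: a cheaper route was already found
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < g.get(nxt, float("inf")):
                    g[nxt] = new_cost
                    came_from[nxt] = cur
                    heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None
```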
I hope that helps?
1. I'd say no if you mean each eye rotating independently. You won't get the accuracy you need for the stereo correspondence, and it will make calibration a nightmare. But if you want the whole "head" of the robot to pivot, then that may be doable, although you should have some good encoders on the joints.
2. If you use ROS, there are some tools that help you turn the two stereo images into a 3D point cloud: http://www.ros.org/wiki/stereo_image_proc. There is a tradeoff between your baseline (the distance between the cameras) and your resolution at different ranges: a large baseline gives greater resolution at large distances, but it also has a large minimum distance. I don't think I would expect better than a few centimeters of accuracy from a static stereo rig, and this accuracy only gets worse when you compound it with the robot's location uncertainty.
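The baseline/resolution tradeoff follows from rectified stereo geometry: depth Z = f * B / d for focal length f (pixels), baseline B and disparity d (pixels), so a disparity error of one pixel changes the depth by roughly Z^2 / (f * B). Plugging in the question's 0.5 m baseline with an assumed focal length of 1000 pixels (the real value depends on the lens) shows why boxes 100 m away are hard: they produce only about 5 pixels of disparity, and a one-pixel error moves the estimate by about 20 m.

```python
def disparity_px(depth_m: float, focal_px: float, baseline_m: float) -> float:
    """Disparity of a point at depth_m for a rectified stereo pair: d = f * B / Z."""
    return focal_px * baseline_m / depth_m


def depth_resolution_m(depth_m: float, focal_px: float, baseline_m: float,
                       disparity_step_px: float = 1.0) -> float:
    """Approximate depth change caused by a disparity error of disparity_step_px.

    From Z = f * B / d, |dZ/dd| = Z^2 / (f * B).
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_step_px
```

At 5 m the same rig gives about 5 cm per pixel of disparity error, consistent with the "few centimeters" estimate above.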
2.5. For mapping and obstacle avoidance, the first thing I would try is to segment out the ground plane. The ground plane goes to mapping, and everything above it is an obstacle. Check out PCL for some point cloud operations: http://pointclouds.org/
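A minimal pure-Python sketch of the ground-plane idea using RANSAC, assuming points are (x, y, z) with z pointing up (camera frames often differ, so the up vector is a parameter). It rejects candidate planes tilted more than a chosen angle from horizontal, so that a large wall is not mistaken for the floor. The threshold, tilt limit and iteration count are illustrative, and PCL provides production versions of this kind of sample-consensus plane segmentation.

```python
import math
import random
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def _sub(a: Point3, b: Point3) -> Point3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(u: Point3, v: Point3) -> Point3:
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def split_ground(points: List[Point3], threshold: float = 0.03, iterations: int = 200,
                 up: Point3 = (0.0, 0.0, 1.0), max_tilt_deg: float = 15.0,
                 seed: int = 0) -> Tuple[List[Point3], List[Point3]]:
    """RANSAC ground-plane fit; returns (ground_points, other_points)."""
    rng = random.Random(seed)
    min_cos = math.cos(math.radians(max_tilt_deg))
    best = set()
    for _ in range(iterations):
        p1, p2, p3 = rng.sample(points, 3)
        n = _cross(_sub(p2, p1), _sub(p3, p1))
        norm = math.sqrt(_dot(n, n))
        if norm < 1e-12:
            continue  # collinear sample: no unique plane
        n = (n[0] / norm, n[1] / norm, n[2] / norm)
        if abs(_dot(n, up)) < min_cos:
            continue  # too steep to be floor (e.g. a wall or a box side)
        d = -_dot(n, p1)
        inliers = {i for i, p in enumerate(points) if abs(_dot(n, p) + d) < threshold}
        if len(inliers) > len(best):
            best = inliers
    ground = [p for i, p in enumerate(points) if i in best]
    others = [p for i, p in enumerate(points) if i not in best]
    return ground, others
```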
3. If you can't simply put a planar laser such as a SICK or Hokuyo on the robot, then I might try to convert the 3D point cloud into a pseudo-laser-scan and then use some off-the-shelf SLAM instead of trying to do visual SLAM. I think you'll have better results.
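The pseudo-laser-scan conversion amounts to keeping, for each bearing bin, the nearest point within a height band. This pure-Python sketch assumes a robot frame with x forward, y left and z up; beam i covers bearings starting at -pi + i * 2*pi / n_beams, and the height band and maximum range are illustrative values. The resulting list plays the role of the ranges a 2D laser SLAM package consumes.

```python
import math
from typing import Iterable, List, Tuple

Point3 = Tuple[float, float, float]


def pointcloud_to_scan(points: Iterable[Point3], n_beams: int = 360,
                       min_z: float = 0.1, max_z: float = 1.5,
                       max_range: float = 30.0) -> List[float]:
    """Nearest range per bearing bin; bins with no points keep max_range."""
    ranges = [max_range] * n_beams
    bin_width = 2.0 * math.pi / n_beams
    for x, y, z in points:
        if not (min_z <= z <= max_z):
            continue  # ignore the floor and overhead structure
        r = math.hypot(x, y)
        if r == 0.0 or r > max_range:
            continue
        angle = math.atan2(y, x) + math.pi  # shift [-pi, pi] to [0, 2*pi]
        idx = int(angle / bin_width) % n_beams
        ranges[idx] = min(ranges[idx], r)
    return ranges
```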
Other thoughts:
Now that the Microsoft Kinect has been released, it is usually easier (and cheaper) to simply use it to get a 3D point cloud instead of doing actual stereo.
This project sounds a lot like the DARPA LAGR (Learning Applied to Ground Robots) program. That program is over, but you may be able to track down papers published from it.