Problems with template matching and pyrDown - node.js

I am trying to make a normal template matching search more efficient by first doing the search on downscaled representations of the image. Basically I do a double pyrDown -> quarter resolution.
For most images and templates this works beautifully, but for some others I get really bad matching results. It seems to be especially bad for thin fonts or low contrast.
Look at this example image and the template (both are included in the zip linked at the end):
At 100% resolution I get a matching probability of 99.9%
At 50% resolution I get 90%
At 25% resolution I get 87%
I don't really know why it's so bad for some images/templates. I tried to recreate and test this in Photoshop by hiding/showing the 25% downscaled template on top of the 25% downscaled image, and as you can see, it's not 100% congruent:
https://giphy.com/gifs/coWDjcvHysKgn95IFa
I need a way to get higher matching scores for these cases at low resolution, because the search needs to be fast.
Any ideas on how to improve my algorithm?
Here are the original files:
https://www.dropbox.com/s/llbdj9bx5eprxbk/images.zip?dl=0

This is not unusual, and those scores seem perfectly fine. However, here are some ideas that might help you improve the situation:
You mentioned that it seems to be especially bad for thin fonts. This could be happening because some of the pixels in the thin lines are smoothed out or distorted by the Gaussian filter that pyrDown applies. It could also be an indication that you have reduced the resolution too much. Unfortunately the pyrDown function in OpenCV reduces the resolution by a fixed factor of 2, so it does not let you fine-tune the scale factor. Another thing you could try is the resize() function with interpolation set to INTER_LINEAR or INTER_CUBIC. resize() will let you use any scale factor, so you have more control over the performance vs. accuracy trade-off.
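As a rough sketch of that idea (shown here with Python's cv2 bindings rather than the node.js ones, and with made-up file names), resize() lets you pick an arbitrary scale factor instead of pyrDown's fixed halving:

    import cv2

    # Hypothetical file names - substitute your own scene and template.
    img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
    tmpl = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

    # pyrDown always halves the size; resize() accepts any factor, so you can
    # try e.g. 0.35 instead of 0.25 to trade speed against accuracy.
    scale = 0.35
    img_small = cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    tmpl_small = cv2.resize(tmpl, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)

    result = cv2.matchTemplate(img_small, tmpl_small, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    print(max_val, max_loc)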
Use multiple templates of the same object. If you come to a scene and can only achieve an 87% score, create a template out of that scene and add it to a database of templates to be used. Obviously, as the number of templates increases, so does the time it takes to complete the search.
The best way to deal with this scenario is to perform an exhaustive match on the highest level of the pyramid and then track it down to the lowest level using a reduced search space on the lower levels. By exhaustive I mean you search all rows and all columns across the entire top pyramid level image. You keep track of the locations (row, col) of the highest matches on the highest level (you are probably already doing that). Then you multiply those locations by a factor of 2 and perform a restricted search on the next lower level (e.g. a 5 x 5 window centered on the scaled-up location). You keep doing this until you are at the bottom level. This gives you the best overall accuracy and performance, and it is also the way most industrial computer vision packages do it.
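A minimal sketch of that coarse-to-fine scheme, again in Python with OpenCV (the function name, parameter names and window size here are mine, chosen just for illustration):

    import cv2

    def coarse_to_fine_match(img, tmpl, levels=2, radius=5):
        # Build pyramids: index 0 = full resolution, last = smallest.
        imgs, tmpls = [img], [tmpl]
        for _ in range(levels):
            imgs.append(cv2.pyrDown(imgs[-1]))
            tmpls.append(cv2.pyrDown(tmpls[-1]))

        # Exhaustive search on the top (smallest) level.
        res = cv2.matchTemplate(imgs[-1], tmpls[-1], cv2.TM_CCOEFF_NORMED)
        _, score, _, (x, y) = cv2.minMaxLoc(res)

        # Track the match down: double the location and search a small window
        # around it on each finer level.
        for lvl in range(levels - 1, -1, -1):
            x, y = 2 * x, 2 * y
            th, tw = tmpls[lvl].shape[:2]
            h, w = imgs[lvl].shape[:2]
            x0, y0 = max(x - radius, 0), max(y - radius, 0)
            x1, y1 = min(x + radius + tw, w), min(y + radius + th, h)
            roi = imgs[lvl][y0:y1, x0:x1]
            res = cv2.matchTemplate(roi, tmpls[lvl], cv2.TM_CCOEFF_NORMED)
            _, score, _, (dx, dy) = cv2.minMaxLoc(res)
            x, y = x0 + dx, y0 + dy
        return (x, y), score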

Related

Image resizing: what is a "filter"?

I'm trying to understand how image resizing works - please, can someone explain to me what a "filter" is good for?
Does a filter calculate how much a source pixel contributes to a destination pixel?
There are filters like "box" and "gaussian", but is there a filter called "bicubic"? Am I mixing two concepts here, one being "convolution filter" and ...?
Is it possible to use the same filter for both upscaling and downscaling? (It would be really great to see some example code for this.)
Is it desirable to first stretch the image in one dimension and then in the other one?
In image resizing, the filter avoids a phenomenon called aliasing. If you try to resize without a filter, aliasing typically manifests as obnoxious pixellated effects, which are especially visible when animated...
To answer your points:
The filter does calculate how much each source pixel contributes to each destination pixel. For resizing, you want a linear filter, which is pretty simple: the filter can be viewed as a small grayscale image; effectively, you center the filter over a location corresponding to each output pixel, multiply each nearby pixel by the filter value at that location, and add them up to get the output pixel value (there is a small sketch of this at the end of this answer).
All such filters are "convolution filters", because convolution is the mathematical name for the operation described above. A "box" filter literally looks like a box -- every pixel within the box is weighted equally, while "gaussian" filters are more roundish blobs, feathering towards zero at the edge.
The most important thing for upscaling and downscaling is to choose the right size for your filter. Briefly, you want to scale your filter based on whichever of the input and output has the lowest resolution. The second most important thing is to avoid bad filters: the "box" filter is what you usually get when you try to resize without filtering; a "bilinear" filter as provided by computer graphics hardware yields mediocre upscaling, but is supplied at the wrong size for downscaling.
For performance reasons, it is desirable to scale images in one dimension and then the other one. This means your filter runs much faster: in time proportional to the filter width, instead of proportional to the filter area. All the filters discussed here are "separable", which means you can apply them in this way.
If you choose a high-quality filter, the exact form is less critical than you might think. There are two classes of good filters: all-positive ones like "gaussian", which tend to the blurry side, and negative-lobed ones like "lanczos", which are sharp but may yield slight ringing effects. Note that "bicubic" is a category of filters, which includes "B-spline" (all-positive) as well as "Mitchell" and "Catmull-Rom" (which have negative lobes).
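To make points 1 and 4 concrete, here is a small, unoptimized sketch (Python/NumPy, all names are mine) of resampling with a simple triangle ("linear") filter: each output sample is a weighted sum of nearby input samples, the filter is widened when downscaling, and a 2-D resize is done separably, one dimension at a time:

    import numpy as np

    def tent(x):
        # Triangle ("linear") filter: weight 1 at the centre, falling to 0 at distance 1.
        return np.maximum(0.0, 1.0 - np.abs(x))

    def resample_1d(src, out_len):
        # Centre the filter over each output sample, weight the nearby input
        # samples by the filter value, and sum them up.
        src = np.asarray(src, dtype=float)
        scale = len(src) / out_len
        width = max(scale, 1.0)              # widen the filter when downscaling
        out = np.zeros(out_len)
        for i in range(out_len):
            centre = (i + 0.5) * scale - 0.5
            taps = np.arange(int(np.floor(centre - width)), int(np.ceil(centre + width)) + 1)
            w = tent((taps - centre) / width)
            idx = np.clip(taps, 0, len(src) - 1)   # clamp at the borders
            out[i] = np.sum(w * src[idx]) / np.sum(w)
        return out

    def resample_2d(img, out_h, out_w):
        # Separable resize: run the 1-D resampler across every row, then every column.
        rows = np.array([resample_1d(r, out_w) for r in img])
        return np.array([resample_1d(c, out_h) for c in rows.T]).T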

Number of polygons in a 3D object and the rendering workload?

Is there any relation (preferably an equation) between the number of polygons in a 3D object and the rendering workload? I want to see how much the rendering workload would be increased if for instance the number of polygons doubles.
There is no clear connection between the arbitrary number of polygons and the mythical "workload".
See the following samples:
You render a cube with 6 faces composed of 12 triangles. You get, say, 1000 fps (without vsync). When you tessellate the cube into 120 triangles, most likely the fps counter stays at 1000.
You render a single fullscreen-sized quad with a heavy fragment shader with a lot of calculation. You get 0.5fps (or more, but I hope you get the point).
Another extreme: you are rendering a thousand similar cubes, each with a different texture. The render state changes will take most of the time, not the actual rendering.
So, polygons may cover different screen areas and they may not be rendered within a single primitive. If you're talking about one big vertex array with a large number of polygons, then for certain scenarios the performance change should be something like linear. "Something like" because the video card and the drivers clip the invisible polygons and perform early-out tests for each pixel being rendered.
Could you define 'workload'? – Erno yesterday
Well, I mean working calculations. I want to see how much overhead (for GPU, CPU, memory, ...) would be increased. Actually I want to infer the energy usage of the device – user1196937 2 hours ago
If that is the actual question (a comparison of energy usage):
You will have to pick specific configurations and test those. Energy usage differs greatly from GPU to GPU and machine to machine.
Some GPU manufacturers give very detailed information on the performance of their processors, but when you want to compare them you will need an actual machine.

Recognizing line segments from a sequence of points

Given an input of 2D points, I would like to segment them into lines. So if you draw a zig-zag style line, each of the segments should be recognized as a line. Usually, I would use OpenCV's
cvHoughLines or a similar approach (PCA with an outlier remover), but in this case the program is not allowed to make "false-positive" errors. If the user draws a line and it's not recognized - that's ok, but if the user draws a circle and it comes out as a square - that's not ok. So I have an upper bound on the error - but if it's a long line and some of the points are at a greater distance from the approximated line, that's ok again. Summed up:
-line detection
-no false positives
-bounded, dynamically adjusting error
Oh, and the points are drawn in sequence, just like hand drawing.
At least it does not have to be fast; it's for a sketching tool. Does anyone have an idea?
This has the same difficulty as voice and gesture recognition. In other words, you can never be 100% sure that you've found all the corners/junctions, and among those you've found you can never be 100% sure they are correct. The reason you can't be absolutely sure is because of ambiguity. The user might have made a single stroke, intending to create two lines that meet at a right angle. But if they did it quickly, the 'corner' might have been quite round, so it wouldn't be detected.
So you will never be able to avoid false positives. The best you can do is mitigate them by exploring several possible segmentations, and using contextual information to decide which is the most likely.
There are lots of papers on sketch segmentation every year. This seems like a very basic thing to solve, but it is still an open topic. The one I use is out of Texas A&M, called MergeCF. It is nicely summarized in this paper: http://srlweb.cs.tamu.edu/srlng_media/content/objects/object-1246390659-1e1d2af6b25a2ba175670f9cb2e989fe/mergeCF-sbim09-fin.pdf.
Basically, you find the areas that have high curvature (higher than some fraction of the mean curvature) and slow speed (so you need timestamps). Combining curvature and speed improves the initial fit quite a lot. That will give you clusters of points, which you reduce to a single point in some way (e.g. the one closest to the middle of the cluster, or the one with the highest curvature, etc.). This is an 'over fit' of the stroke, however. The next stage of the algorithm is to iteratively pick the smallest segment, and see what would happen if it is merged with one of its neighboring segments. If merging doesn't increase the overall error too much, you remove the point separating the two segments. Rinse, repeat, until you're done.
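For what it's worth, here is a very rough sketch (Python/NumPy; all names and thresholds are mine and would need tuning) of the curvature-plus-speed corner finding followed by greedy merging. It merges whichever corner is cheapest to remove rather than starting from the shortest segment as the paper does, but the idea is the same:

    import numpy as np

    def segment_stroke(pts, times, curv_factor=1.0, merge_tol=1.5):
        pts = np.asarray(pts, dtype=float)
        times = np.asarray(times, dtype=float)
        d = np.gradient(pts, axis=0)                       # finite-difference derivatives
        dd = np.gradient(d, axis=0)
        speed = np.hypot(d[:, 0], d[:, 1]) / (np.gradient(times) + 1e-9)
        curvature = np.abs(d[:, 0] * dd[:, 1] - d[:, 1] * dd[:, 0]) \
                    / (np.hypot(d[:, 0], d[:, 1]) ** 3 + 1e-9)

        # Candidate corners: high curvature (relative to the median) and low speed.
        corners = [0] + [i for i in range(1, len(pts) - 1)
                         if curvature[i] > curv_factor * np.median(curvature)
                         and speed[i] < np.median(speed)] + [len(pts) - 1]

        def seg_error(a, b):
            # Max distance of points a..b from the straight line joining them.
            p, q = pts[a], pts[b]
            v = q - p
            n = np.hypot(v[0], v[1]) + 1e-9
            rel = pts[a:b + 1] - p
            return np.max(np.abs(v[0] * rel[:, 1] - v[1] * rel[:, 0]) / n)

        # Repeatedly drop the corner whose removal hurts the fit the least,
        # as long as the merged segment's error stays within tolerance.
        while len(corners) > 2:
            errs = [seg_error(corners[i - 1], corners[i + 1]) for i in range(1, len(corners) - 1)]
            k = int(np.argmin(errs))
            before = max(seg_error(corners[k], corners[k + 1]),
                         seg_error(corners[k + 1], corners[k + 2]))
            if errs[k] > merge_tol * (before + 1.0):       # crude "not too much worse" test
                break
            del corners[k + 1]
        return corners                                     # indices of the segment endpoints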
It has been a while since I've looked at the new segmenters, but I don't think there have been any breakthroughs.
In my implementation I use curvature median rather than mean in my initial threshold, which seems to give me better results. My heavily modified implementation is here, which is definitely not a self-contained thing, but it might give you some insight. http://code.google.com/p/pen-ui/source/browse/trunk/thesis-code/src/org/six11/sf/CornerFinder.java

Obstacle avoidance using 2 fixed cameras on a robot

I will be starting work on a robotics project which involves a mobile robot that has two cameras (1.3 MP) mounted at a fixed distance of 0.5 m from each other. I also have a few ultrasonic sensors, but they only have a 10 meter range and my environment is rather large (as an example, take a large warehouse with many pillars, boxes, walls, etc.). My main task is to identify obstacles and also find a roughly "best" route that the robot must take in order to navigate in a "rough" environment (the ground floor is not smooth at all). All the image processing is done not on the robot, but on a computer with an NVIDIA GT425 and 2 GB RAM.
My questions are :
Should I mount the cameras on a rotating support, so that they can take pictures over a wider angle?
Is it possible to create a reasonable 3D reconstruction based on only 2 views at such a small distance in between? If so, to what degree can I use this for obstacle avoidance and best-route construction?
If a roughly accurate 3D representation of the environment can be made, how can it be used to create a map of the environment? (Consider the following example: the robot must sweep a fairly large area and it would be energy efficient if it did not go through the same place (or course) twice; however, when a 3D reconstruction is made from one direction, how can it tell if it has already been there if it comes from the opposite direction?)
I have found this response on a similar question, but I am still concerned with the accuracy of the 3D reconstruction (for example, a couple of boxes situated at 100 m, considering the low resolution and the small distance between the cameras).
I am just starting to gather information for this project, so if you have worked on something similar please give me some guidelines (and some links :D) on how I should approach this specific task.
Thanks in advance,
Tamash
If you want to do obstacle avoidance, it is probably easiest to use the ultrasonic sensors. If the robot is moving at speeds suitable for a human environment then their range of 10m gives you ample time to stop the robot. Keep in mind that no system will guarantee that you don't accidentally hit something.
(2) Is it possible to create a reasonable 3D reconstruction based on only 2 views at such a small distance in between? If so, to what degree can I use this for obstacle avoidance and best-route construction?
Yes, this is possible. Have a look at ROS and their vSLAM. http://www.ros.org/wiki/vslam and http://www.ros.org/wiki/slam_gmapping would be two of many possible resources.
however when a 3D reconstruction is made from one direction, how can it tell if it has already been there if it comes from the opposite direction
Well, you are trying to find your position given a measurement and a map. That should be possible, and it wouldn't matter from which direction the map was created. However, there is the loop closure problem. Because you are creating a 3D map at the same time as you are trying to find your way around, you don't know whether you are at a new place or at a place you have seen before.
CONCLUSION
This is a difficult task!
Actually, it's more than one. First you have simple obstacle avoidance (i.e. Don't drive into things.). Then you want to do simultaneous localisation and mapping (SLAM, read Wikipedia on that) and finally you want to do path planning (i.e. sweeping the floor without covering area twice).
I hope that helps?
I'd say no if you mean each eye rotating independently. You won't get the accuracy you need to do the stereo correspondence, and it will make calibration a nightmare. But if you want the whole "head" of the robot to pivot, then that may be doable. You should have good encoders on the joints, though.
If you use ROS, there are some tools which help you turn the two stereo images into a 3D point cloud. http://www.ros.org/wiki/stereo_image_proc. There is a tradeoff between your baseline (the distance between the cameras) and your resolution at different ranges: a large baseline gives greater resolution at large distances, but it also has a large minimum distance. I don't think I would expect more than a few centimeters of accuracy from a static stereo rig, and this accuracy only gets worse when you compound the robot's location uncertainty.
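To get a feel for those numbers, the usual pinhole-stereo relations give depth Z = f*B/d for disparity d, so a disparity error of delta_d pixels translates into a depth error of roughly Z^2 * delta_d / (f * B). A quick back-of-envelope (the focal length and disparity error below are assumptions, not measurements of your rig):

    f_px = 1000.0     # assumed focal length in pixels for a 1.3 MP camera
    B = 0.5           # baseline in metres, from the question
    delta_d = 0.5     # assumed disparity estimation error in pixels

    for Z in (2.0, 10.0, 50.0, 100.0):
        err = Z ** 2 * delta_d / (f_px * B)
        print("at %5.1f m: depth uncertainty ~ %.2f m" % (Z, err))

With these assumed numbers the uncertainty at 100 m is on the order of metres, which is why boxes at that range are essentially out of reach for a 0.5 m baseline.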
2.5. For mapping and obstacle avoidance, the first thing I would try to do is segment out the ground plane. The ground plane goes to mapping, and everything above it is an obstacle. Check out PCL for some point cloud operating functions: http://pointclouds.org/
If you can't simply put a planar laser on the robot, like a SICK or Hokuyo, then I might try to convert the 3D point cloud into a pseudo-laser-scan and then use some off-the-shelf SLAM instead of trying to do visual SLAM. I think you'll have better results.
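A pseudo-laser-scan can be as simple as keeping, for each bearing, the nearest point that falls inside some height band. A rough sketch (Python/NumPy; the function name, bin count and height band are arbitrary choices of mine):

    import numpy as np

    def cloud_to_scan(points, n_beams=360, z_min=0.1, z_max=1.5):
        # points: N x 3 array in robot-centred coordinates, z up.
        pts = points[(points[:, 2] > z_min) & (points[:, 2] < z_max)]
        ranges = np.full(n_beams, np.inf)
        bearings = np.arctan2(pts[:, 1], pts[:, 0])                  # -pi..pi
        bins = ((bearings + np.pi) / (2 * np.pi) * n_beams).astype(int) % n_beams
        dists = np.hypot(pts[:, 0], pts[:, 1])
        np.minimum.at(ranges, bins, dists)                           # keep closest hit per bin
        return ranges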
Other thoughts:
Now that the Microsoft Kinect has been released, it is usually easier (and cheaper) to simply use that to get a 3D point cloud instead of doing actual stereo.
This project sounds a lot like the DARPA LAGR program (Learning Applied to Ground Robots). That program is over, but you may be able to track down papers published from it.

What are the efficient and accurate algorithms to exclude outliers from a set of data?

I have a set of 200 data rows (which implies a small dataset). I want to carry out some statistical analysis, but before that I want to exclude outliers.
What are the potential algorithms for this purpose? Accuracy is a matter of concern.
I am very new to statistics, so I need help with very basic algorithms.
Overall, the thing that makes a question like this hard is that there is no rigorous definition of an outlier. I would actually recommend against using a certain number of standard deviations as the cutoff for the following reasons:
A few outliers can have a huge impact on your estimate of standard deviation, as standard deviation is not a robust statistic.
The interpretation of standard deviation depends hugely on the distribution of your data. If your data is normally distributed then 3 standard deviations is a lot, but if it's, for example, log-normally distributed, then 3 standard deviations is not a lot.
There are a few good ways to proceed:
Keep all the data, and just use robust statistics (median instead of mean, Wilcoxon test instead of T-test, etc.). Probably good if your dataset is large.
Trim or Winsorize your data. Trimming means removing the top and bottom x%. Winsorizing means setting the top and bottom x% to the xth and (100 - x)th percentile values respectively.
If you have a small dataset, you could just plot your data and examine it manually for implausible values.
If your data looks reasonably close to normally distributed (no heavy tails and roughly symmetric), then use the median absolute deviation instead of the standard deviation as your test statistic and filter to 3 or 4 median absolute deviations away from the median (see the sketch just after this list).
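A minimal sketch of the median-absolute-deviation filter from that last point (Python/NumPy; the cutoff and scaling constant are conventional choices, not requirements):

    import numpy as np

    def mad_filter(x, n_mads=3.5):
        x = np.asarray(x, dtype=float)
        med = np.median(x)
        # 1.4826 rescales the MAD so it is comparable to a standard deviation
        # for normally distributed data.
        mad = 1.4826 * np.median(np.abs(x - med))
        if mad == 0:
            return x
        return x[np.abs(x - med) <= n_mads * mad]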
Start by plotting the leverage of the outliers and then go for some good ol' interocular trauma (aka look at the scatterplot).
Lots of statistical packages have outlier/residual diagnostics, but I prefer Cook's D. You can calculate it by hand if you'd like using this formula from mtsu.edu (original link is dead, this is sourced from archive.org).
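If you happen to work in Python, statsmodels will compute Cook's D for you, so there is no need to do it by hand (the toy data and the 4/n cutoff below are just for illustration):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)
    y = 2.0 * x + rng.normal(0, 1, 50)
    y[10] += 15                                   # plant one outlier

    model = sm.OLS(y, sm.add_constant(x)).fit()
    cooks_d, _ = model.get_influence().cooks_distance
    suspects = np.where(cooks_d > 4 / len(x))[0]  # common rule-of-thumb cutoff
    print(suspects)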
You may have heard the expression 'six sigma'.
This refers to plus and minus 3 sigma (ie, standard deviations) around the mean.
Anything outside the 'six sigma' range could be treated as an outlier.
On reflection, I think 'six sigma' is too wide.
This article describes how it amounts to "3.4 defective parts per million opportunities."
It seems like a pretty stringent requirement for certification purposes. Only you can decide if it suits you.
Depending on your data and its meaning, you might want to look into RANSAC (random sample consensus). This is widely used in computer vision, and generally gives excellent results when trying to fit data with lots of outliers to a model.
And it's very simple to conceptualize and explain. On the other hand, it's non-deterministic, which may cause problems depending on the application.
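If your data can be described by a model (here a straight line, using scikit-learn's RANSACRegressor; the threshold and toy data are only illustrative), the inlier mask it returns is effectively your outlier filter:

    import numpy as np
    from sklearn.linear_model import RANSACRegressor

    rng = np.random.default_rng(1)
    x = np.linspace(0, 10, 200).reshape(-1, 1)
    y = 3.0 * x.ravel() + 1.0 + rng.normal(0, 0.3, 200)
    y[::20] += rng.uniform(10, 20, 10)            # sprinkle in gross outliers

    ransac = RANSACRegressor(residual_threshold=1.0)   # defaults to a linear model
    ransac.fit(x, y)
    inliers = ransac.inlier_mask_                 # boolean mask of non-outlier rows
    print(inliers.sum(), "of", len(y), "points kept")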
Compute the standard deviation on the set, and exclude everything outside of the first, second or third standard deviation.
Here is how I would go about it in SQL Server
The query below will get the average weight from a fictional Scale table holding a single weigh-in for each person while not permitting those who are overly fat or thin to throw off the more realistic average:
select w.Gender, Avg(w.Weight) as AvgWeight
from ScaleData w
join ( select d.Gender, Avg(d.Weight) as AvgWeight,
              2 * STDDEVP(d.Weight) as StdDeviation
       from ScaleData d
       group by d.Gender ) d
  on w.Gender = d.Gender
 and w.Weight between d.AvgWeight - d.StdDeviation
                  and d.AvgWeight + d.StdDeviation
group by w.Gender
There may be a better way to go about this, but it works and works well. If you have come across another more efficient solution, I'd love to hear about it.
NOTE: with 2*STDDEVP the query above excludes roughly the most extreme 5% of values (about 2.5% at each end, assuming roughly normal data) for the purpose of the average. You can adjust how much is excluded by adjusting the multiplier in 2*STDDEVP, as per: http://en.wikipedia.org/wiki/Standard_deviation
If you just want to analyse the data, say to compute the correlation with another variable, it's OK to exclude outliers. But if you want to model / predict, it is not always best to exclude them straight away.
Try to treat them with methods such as capping, or, if you suspect the outliers contain information/patterns, replace them with missing values and model/predict them. I have written some examples of how you can go about this here using R.
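In the same spirit, a small sketch in Python/NumPy (the percentile limits are arbitrary): either cap the extremes, or mark them as missing so they can be imputed or modelled later:

    import numpy as np

    def cap_outliers(x, lo_pct=1, hi_pct=99):
        # Capping: clamp values outside the chosen percentiles.
        lo, hi = np.percentile(x, [lo_pct, hi_pct])
        return np.clip(x, lo, hi)

    def mask_outliers(x, lo_pct=1, hi_pct=99):
        # Alternative: mark extreme values as missing for later imputation/modelling.
        x = np.asarray(x, dtype=float).copy()
        lo, hi = np.percentile(x, [lo_pct, hi_pct])
        x[(x < lo) | (x > hi)] = np.nan
        return x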
