I am working with the GD library, and I'm looking for a way to detect the pixel nearest to the centre of each shape, as well as the total area covered by each shape, in a monochrome black-and-white image.
I'm having difficulty coming up with an efficient algorithm to do this. If you have done something similar to this in the past, I'd be grateful for any solution that would help.
Check out the binary image library
Essentially, Otsu threshold to separate out foreground from background, then label connected components. That particular image looks very clean but you might need morph ops to clean it up a bit and get rid of small holes and other artifacts.
Then you have area trivially (count pixels in component) or almost as trivially (use the weighted area function that penalises edge pixels). Centre is just mean.
http://malcolmmclean.github.io/binaryimagelibrary/
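The labelling, area and centre steps described above can be sketched in a few lines of plain Python. This is only an illustration on an already thresholded grid, not the binary image library's API: the function names `label_blobs` and `describe` are made up here, 4-connectivity is an assumption, and reading the pixels out of the image (with GD or anything else) is left out.

```python
from collections import deque

def label_blobs(image):
    """Return one pixel list per 4-connected foreground blob.

    `image` is a list of rows; any truthy value counts as foreground.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y0 in range(height):
        for x0 in range(width):
            if not image[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            blobs.append(pixels)
    return blobs

def describe(pixels):
    """Area (pixel count), centre (mean) and the blob pixel nearest the centre."""
    area = len(pixels)
    cx = sum(x for x, _ in pixels) / area
    cy = sum(y for _, y in pixels) / area
    nearest = min(pixels, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    return area, (cx, cy), nearest

image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1],
    [0, 1, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
results = [describe(blob) for blob in label_blobs(image)]
```

Restricting the nearest-pixel search to the blob's own pixels means the answer is always a foreground pixel, even for concave shapes whose mean falls outside the shape.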
@MalcolmMcLean is right but there are remaining difficulties (if you are after maximum accuracy).
If you threshold with Otsu, there are a few pairs of "kissing" dots which will form a single blob using connected component analysis.
In addition, Otsu thresholding will discard some of the partially filled edge pixels, so the weighted averages will be inaccurate. A cure would be to increase the threshold (up to 254 is possible), but that worsens the problem of the kissing dots.
A workaround is to keep a low threshold and dilate the blobs individually to obtain suitable masks that cover all edge pixels. Even so, slight inaccuracies will result in the vicinity of the kissings.
Blob splitting by the watershed transform is also possible, but more care is required to handle the common pixels. I doubt that a perfect solution is possible.
An alternative is the use of subpixel edge detection and least-squares circle fitting (after blob detection with a very low threshold to separate the dots). By avoiding the edge pixels common to two circles, you can probably achieve excellent results.
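As a rough illustration of the circle-fitting step, here is a minimal sketch of one common least-squares variant, the algebraic (Kåsa) fit. The answer does not say which fit it has in mind, so this choice is an assumption, as is the function name `fit_circle`. The subpixel edge points are assumed to be already extracted; the sample points below are synthetic and cover only part of the circle, mimicking the case where the edge pixels shared with a neighbouring dot are left out.

```python
import math

def fit_circle(points):
    """Algebraic (Kasa) least-squares circle fit; returns (cx, cy, radius)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Accumulate moments in centred coordinates (u, v) for better conditioning.
    suu = svv = suv = suuu = svvv = suvv = svuu = 0.0
    for x, y in points:
        u, v = x - mx, y - my
        suu += u * u
        svv += v * v
        suv += u * v
        suuu += u ** 3
        svvv += v ** 3
        suvv += u * v * v
        svuu += v * u * u
    # Normal equations:  [suu suv] [uc]   [(suuu + suvv) / 2]
    #                    [suv svv] [vc] = [(svvv + svuu) / 2]
    b1 = (suuu + suvv) / 2
    b2 = (svvv + svuu) / 2
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("points are collinear or too few for a circle fit")
    uc = (b1 * svv - b2 * suv) / det
    vc = (b2 * suu - b1 * suv) / det
    radius = math.sqrt(uc * uc + vc * vc + (suu + svv) / n)
    return mx + uc, my + vc, radius

# Synthetic edge points on a partial arc (0 to 200 degrees) of a known circle.
true_cx, true_cy, true_r = 10.3, 7.6, 4.2
arc = [(true_cx + true_r * math.cos(math.radians(a)),
        true_cy + true_r * math.sin(math.radians(a)))
       for a in range(0, 201, 20)]
cx, cy, r = fit_circle(arc)
```

The area then follows from the fitted radius rather than from a pixel count, which is what makes the result insensitive to the threshold.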
Problem
I'm trying to use opencv2 to detect PlayStation Move Motion Controllers in still images. In an attempt to increase the contrast between the orbs and the backgrounds, I decided to modify the input image to automatically scale the brightness level between the image's mean level and 96 above for each channel, then when converting to grayscale, taking the maximum value instead of the default transform, since some orbs are saturated but not "bright".
However, my best attempts at adjusting the parameters don't seem to work well: the detector finds circles that aren't there in preference to the obvious ones.
What can I do to improve the accuracy of the detection? What other improvements or algorithms do you think I could use?
Samples
In order of best to worst:
2 Wands, 1 Wand detected (showing all 2 detected circles)
2 Wands, 1 Wand detected with many nonexistent circles (showing top 4 circles)
1 Wand (against a dark background), 6 total circles, the lowest-ranked of which is the correct one (showing all 6 circles)
1 Wand (against a dark background), 44 total circles detected, none of which are that Wand (showing all 44 circles)
I am using this function call:
cv2.HoughCircles(img_gray,cv2.HOUGH_GRADIENT,
dp=1, minDist=24, param1=90, param2=25,
minRadius=2, maxRadius=48)
All images are resized and cropped to 640x480 (the resolution of the PS3 Eye). No blur is performed.
I think hough circles is the wrong approach for you, as you are not really looking for circles. You are looking for circular areas with strong intensity. Use e.g. blob detection instead, I linked a guide:
https://www.learnopencv.com/blob-detection-using-opencv-python-c/
In the blob detection, you need to set the parameters to get a proper high-intensity circular area.
As the other user said, Hough circles aren't the best approach here, because the Hough transform looks for perfect circles only, whereas your target is "circular" but not a circle (due to motion blur, light bleed/reflection, noise, etc.).
I suggest converting the image to HSV, then filtering by hue/color and intensity to get a binary threshold instead of using grayscale directly (that will help remove background and noise and limit the search area).
Then, using findContours() (faster than blob detection), check for contours of high circularity within the expected size/area range, and maybe even solidity.
area = cv2.contourArea(contour)
perimeter = cv2.arcLength(contour,True)
circularity = 4*np.pi*area / (perimeter**2)
solidity = area/cv2.contourArea(cv2.convexHull(contour))
Your biggest problem will be the orb contour merging with the background due to low contrast, so some form of adaptive threshold could help.
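To show how the circularity test separates round contours from other shapes, here is a sketch that runs without OpenCV. The shoelace area and summed edge lengths stand in for cv2.contourArea and cv2.arcLength, and the contours are plain lists of (x, y) vertices. The helper name `looks_like_orb` and all of its thresholds are hypothetical and would need tuning on real images.

```python
import math

def polygon_area(points):
    """Shoelace formula; stands in for cv2.contourArea."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x0, y0 = points[i - 1]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

def polygon_perimeter(points):
    """Closed perimeter; stands in for cv2.arcLength(contour, True)."""
    return sum(math.dist(points[i - 1], p) for i, p in enumerate(points))

def circularity(points):
    """4*pi*area / perimeter^2: 1.0 for a circle, lower for anything else."""
    return 4 * math.pi * polygon_area(points) / polygon_perimeter(points) ** 2

def looks_like_orb(points, min_area=50.0, max_area=7500.0, min_circularity=0.8):
    """Hypothetical filter: size within range and shape close to a circle."""
    area = polygon_area(points)
    return min_area <= area <= max_area and circularity(points) >= min_circularity

# A radius-20 circle approximated by 64 vertices, and a 30x30 square.
circle = [(100 + 20 * math.cos(2 * math.pi * i / 64),
           100 + 20 * math.sin(2 * math.pi * i / 64)) for i in range(64)]
square = [(0, 0), (30, 0), (30, 30), (0, 30)]

circle_score = circularity(circle)   # close to 1
square_score = circularity(square)   # pi / 4, about 0.785
```

With the assumed threshold of 0.8 the square is rejected and the circle accepted; motion-blurred orbs will score somewhere in between, so the threshold is a trade-off.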
I have this fun idea of a project i'd like to do, but i'm not really sure about the math part of it. Here is the idea:
Make a plastic card that would simulate a 9 finger multitouch gesture when it is held against a capacitive screen
Based on the "9 finger" placement, determine some sort of a unique string and use it as an encryption/decryption key for an app
This way i could just open an app, touch the screen with the card and it would get authorized.
But here's the problem:
It shouldn't matter where you place the card on a screen, because the card would be pretty small to fit various screen sizes
The rectangle in which we can randomly position the 9 "fingers" would optimally be 4.5cm x 3cm
The "finger" itself is only recognized as a touch if it is about a 6mm circle (not sure if this can be made smaller)
I figured we could find the left-top "finger" and get every other "finger's" X and Y difference from it. Then concatenate the resulting numbers into a string and use it as a decryption/encryption key. So basically:
key = concat(X2 - X1, Y2 - Y1, X3 - X1, Y3 - Y1, ...)
But i think such an approach would have very few possible combinations (given a relatively small card size and a relatively big "finger") and one could easily write a program to generate all possible combinations and break the key in no time. Am i right about this? If so, how could i improve this?
Thanks for your thoughts
UPDATE 1: actually tried it out on iOS. The result is not promising, since the "fingers" get detected differently each time. The distance between them varies significantly (by as much as 40 pixels!). So i guess this is not as easy as i expected, since the OS seems to detect the touch differently each time for the same two circles.
Your question is lacking some relevant information: how far apart need the circles be so that the system can still distinguish them? What resolution can you realistically expect for the circle centers? And by “6mm circle”, do you mean 6mm diameter or radius (or even circumference)?
Lacking details, I'll make some pretty rough approximations. I'll start by requiring that two of the circles will be placed in opposite corners of the card. That way, you can find them by looking for a pair with maximal distance, and from that compute the orientation and size of the card and correct for that. This leaves 7 fingers to be placed randomly. I'll assume 1mm resolution, and restrict myself to a 45×30mm area. Which means 39×24=936 positions per circle, for a total of 936^7 ≈ 6.3×10^20 ≈ 2^69 combinations. OK, this does not exclude overlapping circles. But since the card is still rather sparsely covered, that shouldn't amount to too much. I'd say 64 bits of entropy (i.e. 2^64 possible combinations) should be reasonable even if you enforce non-overlapping circles. If you can really detect the circle centers with the required resolution, that is. This should be sufficient security for most applications. Far better than 8-letter passwords, but worse than the symmetric keys usually used for e.g. AES.
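The arithmetic above is easy to redo for other resolutions. The sketch below reproduces the estimate under the same assumptions (a 6mm circle diameter, 7 free circles, overlaps ignored) and adds a 3mm resolution purely as a made-up comparison point; the function name `key_bits` is hypothetical.

```python
import math

def key_bits(step_mm, width_mm=45, height_mm=30, circle_mm=6, free_circles=7):
    """Upper-bound entropy estimate in bits, ignoring overlap between circles."""
    cols = (width_mm - circle_mm) // step_mm   # centre positions along the width
    rows = (height_mm - circle_mm) // step_mm  # centre positions along the height
    positions = cols * rows
    return positions, free_circles * math.log2(positions)

positions_1mm, bits_1mm = key_bits(1)   # 936 positions, about 69 bits
positions_3mm, bits_3mm = key_bits(3)   # 104 positions, about 47 bits
```

The drop from roughly 69 to roughly 47 bits shows how strongly the key space depends on the achievable position resolution.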
Since all of this depends very much on the resolution, it might be worthwhile to investigate that aspect first. Usually you'll get pixel coordinates for your finger positions, but it would be expecting too much to assume that you'd always get the pixel coordinate closest to the center of your circle. So you might start by writing a small application which draws a 6mm circle and records coordinates it receives. Then place a 6mm artificial circle in that drawn one a large number of times. Look how far the recorded positions differ from the center of circle. Take the maximum of those differences, perhaps after removing outliers. I'd add a pixel or two to that, to account for rounding errors due to the rotation of the card. Then turn that pixel count back into a metric length. This is the resolution you can expect. You might have to do this for several devices. If you do perform these experiments, let me know what you find and I'll update my answer accordingly.
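The evaluation of such an experiment could look roughly like the following sketch. Everything specific in it is an assumption: the function name, the fraction of samples kept when discarding outliers, the 2-pixel margin, the pixel density of the device, and the recorded touch positions, which are invented for the example.

```python
import math

def expected_resolution_mm(samples, centre, pixels_per_inch,
                           keep_fraction=0.95, margin_px=2):
    """Worst retained deviation from the drawn centre, converted to millimetres."""
    dists = sorted(math.dist(p, centre) for p in samples)
    # Drop the largest deviations as outliers, keeping at least one sample.
    kept = dists[:max(1, math.ceil(len(dists) * keep_fraction))]
    worst_px = kept[-1] + margin_px   # margin for rounding after card rotation
    return worst_px * 25.4 / pixels_per_inch

# Made-up recordings around a circle drawn at (100, 100) on a 254 ppi screen.
recorded = [(100, 100), (103, 104), (100, 98), (130, 100)]
resolution = expected_resolution_mm(recorded, (100, 100), pixels_per_inch=254,
                                    keep_fraction=0.75)
```

Here the 30-pixel reading is discarded as an outlier, the worst remaining deviation is 5 pixels, and with the margin that comes to 7 pixels or 0.7mm on this hypothetical screen.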
If I am taking images from a pair of cameras whose principal axes are both perpendicular to the baseline, do I need to rectify the images? A typical example would be the Bumblebee stereo cameras.
If you can also guarantee that:
the camera axes are parallel (maybe so if bought as a single package like the bumblebee)
you have no lens distortion (probably not)
all the other internal camera parameters are identical
your measurement axis is parallel to your baseline
then you might be able to skip image rectification. Personally I wouldn't.
Just think about lens distortion. Even assuming everything else is equal and aligned, this might mess things up. Suppose a feature appears at the edge of one image and at the centre of the other. At the edge it might be displaced by a few pixels, while at the centre it appears where it should. Without rectification, your stereoscopic calculation (which assumes straight lines from object to sensor) is going to give you bad results.
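To put a number on "a few pixels", here is a sketch using a single-coefficient radial distortion model, which is a common simplification and not necessarily what any particular camera exhibits. The coefficient `k1` and the focal length are invented values chosen only to illustrate the effect.

```python
def distort(x, y, k1):
    """Apply one-coefficient radial distortion in normalised image coordinates."""
    r2 = x * x + y * y
    scale = 1 + k1 * r2
    return x * scale, y * scale

FOCAL_PX = 500.0   # hypothetical focal length in pixels
K1 = -0.05         # hypothetical mild barrel distortion

def shift_in_pixels(x, y):
    """Horizontal displacement caused by distortion, in pixels."""
    xd, _ = distort(x, y, K1)
    return (xd - x) * FOCAL_PX

centre_shift = shift_in_pixels(0.0, 0.0)   # feature on the optical axis
edge_shift = shift_in_pixels(0.6, 0.0)     # feature 300 px from the centre
```

With these made-up values the feature at the centre does not move at all, while the one near the edge moves by about 5.4 pixels, which would show up directly as a disparity error.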
Depends what you mean by "rectify". In stereo vision, it is common to ensure that the epipolar lines are aligned too. That means the i-th row in image 1 corresponds to the i-th row in image 2. An optional step is to reduce distortion caused by the rectification process.
If you are taking images from a pair of cameras whose principal axes are perpendicular to the baseline, then the epipoles are mapped to infinity (the epipolar lines within each image are parallel). You still need another transform to align the epipolar lines across the two images. You will find this transform in Loop & Zhang's paper, along with the transform to reduce distortion.
And be careful about lens distortion (see wxffles' answer).
So say I have an image that I want to "pixelate". I want this sharp image represented by a grid of, say, 100 x 100 squares. So if the original photo is 500 px X 500 px, each square is 5 px X 5 px. So each square would have a color corresponding to the 5 px X 5 px group of pixels it swaps in for...
How do I figure out what this one color, which is best representative of the stuff it covers, is? Do I just take the R G and B numbers for each of the 25 pixels and average them? Or is there some obscure other way I should know about? What is conventionally used in "pixelation" functions, say like in photoshop?
If you want to know about the 'theory' of pixelation, read up on resampling (and downsampling in particular). Pixelation algorithms are simply downsampling an image (using some downsampling method) and then upsampling it using nearest-neighbour interpolation. Note that in code these two steps may be fused into one.
For downsampling in general, to downsample by a factor of n the image is first filtered by an appropriate low-pass filter, and then one sample out of every n is taken. An "ideal" filter to use is the sinc filter, but because of issues with implementing it, the Lanczos filter is often used as a close alternative.
However, for almost all purposes when doing pixelization, using a simple box blur should work fine, and is very simple to implement. This is just an average of nearby pixels.
If you don't need to change the output size of the image, then this means you divide the image into blocks (the big resulting pixels) which are k×k pixels, and then replace all the pixels in each block with the average value of the pixels in that block.
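The block-averaging approach just described can be sketched in plain Python as follows. The image representation (a list of rows of RGB tuples) and the function name `pixelate` are assumptions made for the example; a real implementation would work on whatever pixel buffer the imaging library provides.

```python
def pixelate(image, k):
    """Replace each k-by-k block of an RGB image with the block's mean colour.

    `image` is a list of rows of (r, g, b) tuples. Blocks at the right and
    bottom edges are smaller when the size is not a multiple of k.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for by in range(0, height, k):
        for bx in range(0, width, k):
            ys = range(by, min(by + k, height))
            xs = range(bx, min(bx + k, width))
            block = [image[y][x] for y in ys for x in xs]
            # Average each channel independently over the block.
            mean = tuple(round(sum(p[c] for p in block) / len(block))
                         for c in range(3))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out

tiny = [
    [(10, 20, 30), (30, 40, 50)],
    [(50, 60, 70), (70, 80, 90)],
]
blocky = pixelate(tiny, 2)   # every pixel becomes (40, 50, 60)
```

This fuses the downsampling and the nearest-neighbour upsampling into one pass, since the output keeps the input's dimensions.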
When the source and target grids are evenly divisible and aligned like this, most algorithms give similar results. If the grids are fixed, go for simple averages.
In other cases, especially when resizing by a small percentage, the quality difference is quite evident. The simplest enhancement over a simple average is to weight each pixel value by how much of it is contained in the target pixel's area.
For more algorithms, check multivariate interpolation.
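The area-weighting idea mentioned above is easiest to see in one dimension. The following sketch resizes a single row of samples, weighting each source pixel by the length of its overlap with the target pixel's footprint; a 2-D version would apply the same weights along both axes. The function name is made up for the example.

```python
def resample_area_weighted(row, new_len):
    """Resize a 1-D row, weighting source pixels by their overlap with each target pixel."""
    scale = len(row) / new_len           # source pixels per target pixel
    out = []
    for i in range(new_len):
        start, end = i * scale, (i + 1) * scale
        total = 0.0
        j = int(start)
        while j < len(row) and j < end:
            # Length of the part of source pixel [j, j+1) inside [start, end).
            overlap = min(end, j + 1) - max(start, j)
            total += row[j] * overlap
            j += 1
        out.append(total / scale)
    return out

shrunk = resample_area_weighted([0, 30, 60], 2)
```

Going from 3 samples to 2, the middle sample is split evenly between both outputs, so the result is [10.0, 50.0] rather than what a plain average of whole pixels would give.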
Does anyone know of a graphics system which handles composition of multiple anti-aliased lines well?
I'm showing a dependency diagram and have a bunch of curves emanating from a point. These are drawn anti-aliased in the usual way, of blending partially covered pixels. So if two lines would occupy the same half of a pixel, the antialiasing blends it to 75% filled rather than 50% filled. With enough lines drawn on top of each other, the pixel blend clamps and you end up with aliased lines.
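The effect is easy to reproduce numerically. This small sketch applies the usual "over" blend repeatedly to one pixel, assuming every line covers the same half of it:

```python
def draw_over(coverage_so_far, alpha):
    """Standard 'over' blend of one more anti-aliased line onto a pixel."""
    return coverage_so_far + alpha * (1.0 - coverage_so_far)

pixel = 0.0
history = []
for _ in range(8):
    pixel = draw_over(pixel, 0.5)   # each line covers the same half of the pixel
    history.append(pixel)
# history starts 0.5, 0.75, 0.875, ... and approaches 1.0
```

After eight lines the pixel is above 99% filled even though only half of it is geometrically covered.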
I know anti-grain geometry has algorithms for calculating blends which cater for lines which abut, and that oversampling might work, but are there any other approaches?
Handling this form of line composition well is going to be slow (you have to consider all the lines that impinge upon each pixel using a deferred rendering approach). I doubt that there are many (if any) libraries out there that will do it for you.
The quickest and easiest method (and possibly the only realistic and cost-effective solution for your case), which will work with virtually any drawing library, would be to supersample it: draw to an offscreen bitmap at much higher resolution (e.g. 4 times wider and higher, with lines 4 pixels wide), and then scale the result down with bilinear filtering. Disable antialiasing when drawing the high-resolution version, as it will only slow things down. The main downside is that it uses a lot of memory for the offscreen bitmap.
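A toy version of the supersampling idea, using nested lists in place of a real offscreen bitmap, might look like this. Note that the downscale here is a plain box average over each 4×4 block rather than bilinear filtering, which keeps the example short; the helper names and the rectangle standing in for a line are assumptions.

```python
SCALE = 4  # supersampling factor in each direction

def draw_rect(canvas, x0, y0, x1, y1):
    """Fill hi-res cells [x0, x1) x [y0, y1) with no anti-aliasing."""
    for y in range(y0, y1):
        for x in range(x0, x1):
            canvas[y][x] = 1

def downsample(canvas, scale):
    """Average each scale-by-scale block into one output pixel."""
    height, width = len(canvas) // scale, len(canvas[0]) // scale
    return [[sum(canvas[y * scale + dy][x * scale + dx]
                 for dy in range(scale) for dx in range(scale)) / scale ** 2
             for x in range(width)]
            for y in range(height)]

# Offscreen canvas for a 2x1 pixel output image.
canvas = [[0] * (2 * SCALE) for _ in range(1 * SCALE)]
for _ in range(100):
    draw_rect(canvas, 0, 0, 2, 4)   # 100 lines, all covering the left half of pixel 0
result = downsample(canvas, SCALE)  # [[0.5, 0.0]]
```

Because the high-resolution writes are opaque, a hundred coincident lines leave exactly the same cells set as one line, so the final coverage stays at 50%.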
If you need an existing system that gets antialiased lines "visually correct", you might try using one of several existing RenderMan-compliant 3D renderers. The REYES algorithm, which many of these renderers use, works by breaking up primitives into micropolygons, then sampling them at several random point locations within each pixel. So even if you have a million lines collectively obscuring 50% of a pixel, the resulting image value will show roughly 50% coverage. (This is, for example, how the millions of antialiased hairs are drawn on characters in many animated movies.)
Of course, using a full-blown 3D renderer to draw 2D lines is like driving nails with a sledgehammer. You'd need a fairly pathological scenario for the 3D renderer to be any more efficient than simply supersampling with a traditional 2D renderer.
It sounds like you want a premade drawing library, which I do not know of.
However, to answer your question of knowing any approach that would work, you can consider a pixel to be a square. You can then approximate any shape that you draw as a polygon that intersects the pixel box. By clipping these polygons against the box of the pixel and against each other, you can get a very good estimate of the areas associated with each color that intersects the pixel for accurate antialiasing. This is, of course, very slow to calculate and is not suitable for interactive drawing.
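The first half of that idea, clipping a shape against the pixel's box and measuring the remaining area, can be sketched with the Sutherland-Hodgman algorithm. The sketch does not attempt the harder part of clipping overlapping shapes against each other, and the function names are made up for the example.

```python
def clip_to_pixel(polygon, px, py):
    """Clip a polygon to the pixel square [px, px+1] x [py, py+1]."""
    def clip(points, inside, intersect):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))   # entering the half-plane
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))       # leaving the half-plane
        return out

    def cross_x(xc):   # intersection of segment a-b with the line x = xc
        return lambda a, b: (xc, a[1] + (b[1] - a[1]) * (xc - a[0]) / (b[0] - a[0]))

    def cross_y(yc):   # intersection of segment a-b with the line y = yc
        return lambda a, b: (a[0] + (b[0] - a[0]) * (yc - a[1]) / (b[1] - a[1]), yc)

    edges = [
        (lambda p: p[0] >= px, cross_x(px)),
        (lambda p: p[0] <= px + 1, cross_x(px + 1)),
        (lambda p: p[1] >= py, cross_y(py)),
        (lambda p: p[1] <= py + 1, cross_y(py + 1)),
    ]
    points = list(polygon)
    for inside, intersect in edges:
        if not points:
            break
        points = clip(points, inside, intersect)
    return points

def polygon_area(points):
    """Shoelace formula; returns 0 for an empty polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x0, y0 = points[i - 1]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

# A thick horizontal line, modelled as a quad, crossing pixel (0, 0).
line_quad = [(-1, 0.5), (3, 0.5), (3, 2), (-1, 2)]
coverage = polygon_area(clip_to_pixel(line_quad, 0, 0))
far_away = polygon_area(clip_to_pixel(line_quad, 10, 10))
```

The quad covers the upper half of pixel (0, 0), so its coverage comes out as 0.5, and a pixel the quad never touches gets 0.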