dithering vs ordered dithering - graphics

I understand how dithering works in general, but what is the difference between dithering and ordered dithering?
Also can anyone point me to some good resources?

Taken from here:
Random dither
Random dithering could be termed the "bubblesort" of digital halftoning algorithms. It was the first attempt (documented as far back as 1951) to correct the contouring produced by fixed thresholding, and it has traditionally been referenced for comparison in most studies of digital halftoning. In fact, the name "ordered dither" (which will be discussed later) was chosen to contrast random dither.
Ordered dither
While patterning was an important step toward the digital reproduction of the classic halftone, its main shortcoming was the spatial enlargement (and corresponding reduction in resolution) of the image. Ordered dither represents a major improvement in digital halftoning where this spatial distortion was eliminated and the image could then be rendered in its original size.

The main difference between (error-diffusion) dither and ordered dither is how the quantisation error is handled.
Dither - the quantisation error of the current pixel is spread to neighbouring pixels that have not been processed yet (in Floyd-Steinberg: the right, bottom-left, bottom and bottom-right pixels). Thus, the quantisation of every pixel affects its neighbours. As a result, the dithered image looks smoother (like drawing with strokes).
Ordered dither - no error is propagated. Instead, thresholds are taken from a pattern (matrix) of a specified size. While processing pixels, the corresponding threshold is looked up in the pattern and applied to the pixel. The distribution of thresholds in the pattern determines the visual effect that will be produced.
Usually, the thresholds are evenly distributed and the resulting image is as smooth as possible.
For example, if the high-value thresholds are concentrated around the center of the pattern, the effect is "halftoning".
In conclusion, it is worth mentioning that ordered dither is far simpler and much faster. It was used back in the '90s in Windows 95/98, when displays were limited to 256 colors or 16-bit color.
You can get source code and demo project from here
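To make the contrast concrete, here is a minimal, dependency-free sketch of both approaches for a grayscale image stored as a list of rows (values 0..255). The 4x4 Bayer matrix and the function names are my own choices for illustration, not taken from the linked project.

```python
# 4x4 Bayer index matrix (values 0..15); a common choice, assumed here.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def ordered_dither(gray):
    """Threshold each pixel against a tiled Bayer matrix (no error is propagated)."""
    out = []
    for y, row in enumerate(gray):
        out_row = []
        for x, value in enumerate(row):
            # Scale the matrix entry to a threshold in the 0..255 range.
            threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) * 255 / 16
            out_row.append(255 if value > threshold else 0)
        out.append(out_row)
    return out


def floyd_steinberg(gray):
    """Quantise to black/white and diffuse each pixel's error to its neighbours."""
    height, width = len(gray), len(gray[0])
    work = [[float(v) for v in row] for row in gray]  # working copy
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255.0 if old >= 128 else 0.0
            work[y][x] = new
            err = old - new
            # Floyd-Steinberg weights: right 7/16, bottom-left 3/16,
            # bottom 5/16, bottom-right 1/16.
            for dx, dy, w in ((1, 0, 7), (-1, 1, 3), (0, 1, 5), (1, 1, 1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and ny < height:
                    work[ny][nx] += err * w / 16
    return [[int(v) for v in row] for row in work]


# A flat mid-grey image: both methods should turn on roughly half the pixels.
flat = [[128] * 8 for _ in range(8)]
ordered = ordered_dither(flat)
diffused = floyd_steinberg(flat)
```

Note that ordered_dither looks at each pixel in isolation, which is why it is simple and fast, while floyd_steinberg has to carry state from one pixel to the next.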

Related

opencv2: Circle detection not detecting the obvious ones

Problem
I'm trying to use opencv2 to detect PlayStation Move Motion Controllers in still images. In an attempt to increase the contrast between the orbs and the background, I decided to modify the input image by automatically scaling the brightness of each channel to the range between the image's mean level and 96 above it. Then, when converting to grayscale, I take the maximum channel value instead of using the default transform, since some orbs are saturated but not "bright".
However, my best attempts at adjusting the parameters do not seem to work well: circles that aren't there are detected ahead of the obvious ones.
What can I do to improve the accuracy of the detection? What other improvements or algorithms do you think I could use?
Samples
In order of best to worst:
2 Wands, 1 Wand detected (showing all 2 detected circles)
2 Wands, 1 Wand detected with many nonexistent circles (showing top 4 circles)
1 Wand (against a dark background), 6 total circles, the lowest-ranked of which is the correct one (showing all 6 circles)
1 Wand (against a dark background), 44 total circles detected, none of which are that Wand (showing all 44 circles)
I am using this function call:
cv2.HoughCircles(img_gray,cv2.HOUGH_GRADIENT,
dp=1, minDist=24, param1=90, param2=25,
minRadius=2, maxRadius=48)
All images are resized and cropped to 640x480 (the resolution of the PS3 Eye). No blur is performed.
I think hough circles is the wrong approach for you, as you are not really looking for circles. You are looking for circular areas with strong intensity. Use e.g. blob detection instead, I linked a guide:
https://www.learnopencv.com/blob-detection-using-opencv-python-c/
In the blob detection, you need to set the parameters to get a proper high-intensity circular area.
As the other user said, Hough circles aren't the best approach here, because the Hough transform looks for near-perfect circles, whereas your target is "circular" but not a true circle (due to motion blur, light bleed/reflection, noise, etc.).
I suggest converting the image to HSV and then filtering by hue/color and intensity to get a binary threshold instead of using grayscale directly (that will help remove background and noise and limit the search area).
Then, using findContours() (faster than blob detection), check for contours with high circularity, an expected size/area range and maybe even solidity.
area = cv2.contourArea(contour)
perimeter = cv2.arcLength(contour,True)
circularity = 4*np.pi*area / (perimeter**2)
solidity = area/cv2.contourArea(cv2.convexHull(contour))
Your biggest problem will be the orb contour merging with the background due to low contrast, so maybe some adaptive thresholding could help.
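To get a feel for which circularity values to accept, here is a small dependency-free sketch that evaluates the same 4*pi*area/perimeter^2 measure on a few hand-made polygons. The circularity() helper and the test shapes are hypothetical; with OpenCV you would feed in the contour's area and arc length as in the snippet above.

```python
import math


def circularity(points):
    """Return 4*pi*area / perimeter**2 for a closed polygon given as (x, y) vertices."""
    area2 = 0.0      # twice the signed area (shoelace formula)
    perimeter = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        area2 += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return 4 * math.pi * (abs(area2) / 2) / perimeter ** 2


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
# A 64-sided regular polygon stands in for a near-circular contour.
near_circle = [(20 * math.cos(2 * math.pi * k / 64), 20 * math.sin(2 * math.pi * k / 64))
               for k in range(64)]
thin_strip = [(0, 0), (100, 0), (100, 2), (0, 2)]

square_score = circularity(square)        # pi / 4, about 0.785
circle_score = circularity(near_circle)   # close to 1
strip_score = circularity(thin_strip)     # close to 0
```

A perfect circle scores 1 and elongated or ragged shapes score much lower, so the cutoff you pick decides how much blur and bleed you tolerate; contours extracted from real pixels will score somewhat lower than these ideal polygons.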

How to compare images and determine which has more content?

Goal: I want to grab the best frame from an animated GIF and use it as a static preview image. I believe the best frame is one that shows the most content - not necessarily the first or last frame.
Take this GIF for example:
--
This is the first frame:
--
Here is the 28th frame:
It's clear that the 28th frame represents the entire GIF well.
How could I programmatically determine whether one frame has more pixels/content than another? Any thoughts, ideas, packages/modules, or articles that you can point me to would be greatly appreciated.
One straightforward way this could be accomplished would be to estimate the entropy of each image and choose the frame with maximal entropy.
In information theory, entropy can be thought of as the "randomness" of the image. An image of a single color is very predictable; the flatter the distribution of pixel values, the more random the image. This is closely related to the compression method described by Arthur-R, as entropy is the lower bound on how much data can be losslessly compressed.
Estimating Entropy
One way to estimate the entropy is to approximate the probability mass function for pixel intensities using a histogram. To generate the plot below I first convert the image to grayscale, then compute the histogram using a bin spacing of 1 (for pixel values from 0 to 255). Then, normalize the histogram so that the bins sum to 1. This normalized histogram is an approximation of the pixel probability mass function.
Using this probability mass function we can easily estimate the entropy of the grayscale image which is described by the following equation
H = E[-log(p(x))]
Where H is entropy, E is the expected value, and p(x) is the probability that any given pixel takes the value x.
Programmatically, H can be estimated by simply computing -p(x)*log(p(x)) for each nonzero value p(x) in the histogram and then adding them together. Using log base 2 gives H in bits.
Plot of entropy vs. frame number for your example.
with frame 21 (the 22nd frame) having the highest entropy.
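As a rough sketch of the estimate described above, assuming each frame has already been converted to a flat list of grayscale values in 0..255 (the toy frames and the function name are made up for illustration):

```python
import math
from collections import Counter


def histogram_entropy(pixels):
    """Estimate entropy in bits from the normalised histogram of grey levels."""
    counts = Counter(pixels)
    total = len(pixels)
    entropy = 0.0
    for count in counts.values():
        p = count / total          # estimated probability of this grey level
        entropy -= p * math.log2(p)
    return entropy


flat_frame = [200] * 1000                    # single colour
two_level_frame = [0] * 500 + [255] * 500    # two equally likely values
all_levels_frame = list(range(256)) * 4      # every level equally likely

frames = [flat_frame, two_level_frame, all_levels_frame]
# Pick the frame whose histogram entropy is largest.
best_index = max(range(len(frames)), key=lambda i: histogram_entropy(frames[i]))
```

Only levels that actually occur end up in the Counter, so the log of zero never comes up.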
Observations
The entropy computed here is not equal to the true entropy of the image because it makes the assumption that each pixel is independently sampled from the same distribution. To get the true entropy we would need to know the joint distribution of the image, which we won't be able to know without understanding the underlying random process that generated the images (which would include human interaction). However, I don't think the true entropy would be very useful and this measure should give a reasonable estimate of how much content is in the image.
This method will fail if some not-so-interesting frame contains much more noise (randomly colored pixels) than the most interesting frame because noise results in a high entropy. For example, the following image is pure uniform noise and therefore has maximum entropy (H = 8 bits), i.e. no compression is possible.
Ruby Implementation
I don't know ruby but it looks like one of the answers to this question refers to a package for computing entropy of an image.
From m. simon borg's comment
FWIW, using Ruby's File.size() returns 1904 bytes for the 28th frame
image and 946 bytes for the first frame image – m. simon borg
File.size() should be roughly proportional to entropy.
As an aside, if you check the size of the 200x200 noise image on disk you will see that the file is 40,345 bytes even after compression, but the uncompressed data is only 40,000 bytes. Information theory tells us that no compression scheme can ever losslessly compress such images on average.
There are a couple ways I might go about this. My first thought (this may not be the most practical solution, but it seems theoretically interesting!) would be to try losslessly compressing each frame, and in theory, the frame with the least repeatable content (and thus the most unique content) would have the largest size, so you could then compare the size in bytes/bits of each compressed frame. The accuracy of this solution would probably be highly dependent on the photo passed in.
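A minimal sketch of the compression idea, assuming each frame is available as a flat list of 8-bit pixel values. zlib is just one convenient lossless compressor from the standard library, and the two toy frames are invented for the example.

```python
import random
import zlib


def compressed_size(pixels):
    """Size in bytes of the zlib-compressed pixel data (values 0..255)."""
    return len(zlib.compress(bytes(pixels), 9))


rng = random.Random(0)  # fixed seed so the example is repeatable
flat_frame = [255] * 40000                                # blank 200x200 frame
busy_frame = [rng.randrange(256) for _ in range(40000)]   # noise-like frame

flat_size = compressed_size(flat_frame)
busy_size = compressed_size(busy_frame)
# The frame with the larger compressed size is taken to have "more content".
best = "busy" if busy_size > flat_size else "flat"
```

The blank frame shrinks to a tiny fraction of its raw size while the noise-like frame barely compresses at all, which is also why this approach can be fooled by noisy frames.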
A more realistic/ practical solution might be to grab the predominant color in the GIF (so in the example, the background color), and then iterate through each pixel and increment a counter each time the color of the current pixel doesn't match the color of the background.
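Something along these lines, as a sketch: frames are assumed to be flat lists of (r, g, b) tuples, and the helper names are hypothetical.

```python
from collections import Counter


def dominant_colour(frames):
    """Most common pixel colour across all frames (assumed to be the background)."""
    counts = Counter()
    for frame in frames:
        counts.update(frame)
    return counts.most_common(1)[0][0]


def content_pixel_count(frame, background):
    """Number of pixels in the frame that differ from the background colour."""
    return sum(1 for pixel in frame if pixel != background)


WHITE, BLACK, RED = (255, 255, 255), (0, 0, 0), (255, 0, 0)
first_frame = [WHITE] * 95 + [BLACK] * 5
later_frame = [WHITE] * 60 + [BLACK] * 25 + [RED] * 15

frames = [first_frame, later_frame]
background = dominant_colour(frames)
scores = [content_pixel_count(frame, background) for frame in frames]
best_index = scores.index(max(scores))
```

An exact colour match is the simplest test; photographic or anti-aliased frames would probably need a tolerance instead.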
I'm thinking about some more optimized/ sample based solutions, and will edit my response to include them a little later, if performance is a concern for you.
I think you could use a REST web service API for this, because it is very hard to do without one.
For example, these are some well-known APIs:
https://cloud.google.com/vision/
https://www.clarifai.com/
https://vize.ai
https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/
https://imagga.com

Detecting center and area of shapes in an image

I am working with the GD library, and I'm looking for a way to detect the pixel nearest to the center of each shape, as well as the total area used by each shape, in a monochrome black-and-white image.
I'm having difficulty coming up with an efficient algorithm to do this. If you have done something similar to this in the past, I'd be grateful for any solution that would help.
Check out the binary image library
Essentially, Otsu threshold to separate out foreground from background, then label connected components. That particular image looks very clean but you might need morph ops to clean it up a bit and get rid of small holes and other artifacts.
Then you have area trivially (count pixels in component) or almost as trivially (use the weighted area function that penalises edge pixels). Centre is just mean.
http://malcolmmclean.github.io/binaryimagelibrary/
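If you just want to see the idea without pulling in a library, here is a rough pure-Python sketch of the labelling step plus area and centre. It assumes the image has already been thresholded into rows of 0/1 values, uses 4-connectivity, and the function names are made up rather than the library's API.

```python
from collections import deque


def label_components(binary):
    """Return a list of components, each a list of (x, y) foreground pixels."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            # Flood-fill a new component starting from this pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            pixels = []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            components.append(pixels)
    return components


def area_and_centre(pixels):
    """Area is the pixel count; the centre is the mean of the pixel coordinates."""
    area = len(pixels)
    cx = sum(x for x, _ in pixels) / area
    cy = sum(y for _, y in pixels) / area
    return area, (cx, cy)


image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1],
    [0, 1, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
shapes = [area_and_centre(component) for component in label_components(image)]
```

Rounding the mean coordinates gives the pixel nearest to the centre of each shape.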
@MalcolmMcLean is right but there are remaining difficulties (if you are after maximum accuracy).
If you threshold with Otsu, there are a few pairs of "kissing" dots which will form a single blob using connected component analysis.
In addition, Otsu thresholding will discard some of the partially filled edge pixels so that the weighted averages will be inaccurate. A cure would be to increase the threshold (up to 254 is possible), but that worsens the problem of the kissing dots.
A workaround is to keep a low threshold and dilate the blobs individually to obtain suitable masks that cover all edge pixels. Even so, slight inaccuracies will result in the vicinity of the kissings.
Blob splitting by the watershed transform is also possible but more care is required to handle the common pixels. I doubt that a perfect solution is possible.
An alternative is the use of subpixel edge detection and least-squares circle fitting (after blob detection with a very low threshold to separate the dots). By avoiding the edge pixels common to two circles, you can probably achieve excellent results.

( p5.js ) FFT report lower frequencies "too loud" and higher frequencies "mute"?

I have been experimenting with simple FFT using p5 sound and then plotting the bands of the spectrum visually.
One thing I noticed is that the lower frequencies appear very loud in almost all tracks, while the high frequencies seem to be muted.
For instance, when doing an FFT with only 16 bands, most of the sound lands in the first 4 bands, and the other (higher) frequencies are reported as "muted" or just too quiet.
You can see this in this example, for instance: http://p5js.org/reference/#/p5.FFT where, even with relatively high frequencies, the right side of the spectrum stays totally flat; the lower frequencies are reported as the loudest even though what you hear is more of a middle/higher-pitched sound.
It seems that some sort of transformation has to be applied to the FFT result in order to get a visual representation that better matches what we hear?
Am I missing something? I'm surely missing some basic information about how FFT works and how the frequencies are reported, but is this a common problem with a common solution?
The human auditory system is fundamentally logarithmic (base 2) in nature - each octave has twice the bandwidth of the one below it. As a consequence of this, the vast majority of the frequency content of human-perceivable sound is below 1 kHz, and signal power is spread more thinly between FFT bins at higher frequencies - which is precisely what your graph shows.
Spectrum displays - which is what I suspect you're expecting to see here - are usually plotted with log(F) on the x-axis and signal power in dB on the y-axis. Your code draws a graph with both axes linear.
In addition, if no window function is applied to the samples used to calculate the FFT, what you effectively get is the rectangular window - very far from a good choice in this application.
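As an illustration of the log-frequency and dB idea (in Python rather than p5.js, and not using any p5 API), here is a sketch that groups linear FFT magnitudes into octave-wide bands and reports each band in dB. The function names, the 31.25 Hz starting frequency and the 32 kHz sample rate are assumptions chosen to keep the numbers round.

```python
import math


def hann_window(n):
    """Hann window coefficients to multiply into a block of n samples before the FFT."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def octave_bands(magnitudes, sample_rate, lowest_hz=31.25):
    """Group linear FFT magnitudes into octave-wide bands and return levels in dB.

    magnitudes[k] is assumed to be the magnitude of bin k, with the bins
    evenly covering 0 .. sample_rate / 2.
    """
    bin_hz = (sample_rate / 2) / len(magnitudes)
    bands = []
    low = lowest_hz
    while low < sample_rate / 2:
        high = low * 2  # each octave is twice as wide as the one below it
        first = int(low / bin_hz)
        last = min(int(high / bin_hz), len(magnitudes))
        power = sum(m * m for m in magnitudes[first:last])
        level_db = 10 * math.log10(power) if power > 0 else float("-inf")
        bands.append((low, high, level_db))
        low = high
    return bands


# A perfectly flat spectrum: every linear bin holds the same magnitude.
flat_spectrum = [1.0] * 1024
bands = octave_bands(flat_spectrum, sample_rate=32000)
step_db = bands[1][2] - bands[0][2]  # level difference between adjacent octaves
```

With equal energy in every linear bin, each octave band covers twice as many bins as the one below, so it comes out about 3 dB higher; grouping bins this way is what stops the top of the spectrum from looking empty.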

Do I need to rectify if camera planes are aligned?

If I am taking images from a pair of cameras whose principal axes are both perpendicular to the baseline, do I need to rectify the images? A typical example would be Bumblebee stereo cameras.
If you can also guarantee that:
the camera axes are parallel (maybe so if bought as a single package like the bumblebee)
you have no lens distortion (probably not)
all the other internal camera parameters are identical
your measurement axis is parallel to your baseline
then you might be able to skip image rectification. Personally I wouldn't.
Just think about lens distortion. Even assuming everything else is equal and aligned, this might mess things up. Suppose a feature appears at the edge of one image and at the centre of the other. At the edge it might be displaced by a few pixels, while at the centre it appears where it should. Without rectification, your stereoscopic calculation (which assumes straight lines from object to sensor) is going to give you bad results.
Depends what you mean by "rectify". In stereo vision, it is common to ensure that the epipolar lines are aligned too. That means the i-th row in image 1 corresponds to the i-th row in image 2. An optional step is to reduce distortion caused by the rectification process.
If you are taking images from a pair of cameras whose principal axes are perpendicular to the baseline, then the epipoles are mapped to infinity (the epipolar lines within each image are parallel). You still need another transform to align the epipolar lines between the two images. You will find this transform in Loop & Zhang's paper, along with the transform to reduce distortion.
And be careful about lens distortion (see wxffles' answer).

Resources