Image resizing: what is a "filter"? - graphics

I'm trying to understand how image resizing works. Can someone please explain what a "filter" is good for?
Does a filter calculate how much a source pixel contributes to a destination pixel?
There are filters like "box" and "gaussian", but is there a filter called "bicubic"? Am I mixing two concepts here, one being "convolution filter" and ...?
Is it possible to use the same filter for both upscaling and downscaling? (It would be really great to see some example code for this.)
Is it desirable to first stretch the image in one dimension and then in the other?

In image resizing, the filter avoids a phenomenon called aliasing. If you try to resize without a filter, aliasing typically manifests as obnoxious pixelated effects, which are especially visible when animated.
To answer your points:
The filter does calculate how much each source pixel contributes to each destination. For resizing, you want a linear filter, which is pretty simple: the filter can be viewed as a small grayscale image; effectively, you center the filter over a location corresponding to each output pixel, multiply each nearby pixel by the filter value at that location, and add them up to get the output pixel value.
All such filters are "convolution filters", because convolution is the mathematical name for the operation described above. A "box" filter literally looks like a box -- every pixel within the box is weighted equally, while "gaussian" filters are more roundish blobs, feathering towards zero at the edge.
The most important thing for upscaling and downscaling is to choose the right size for your filter. Briefly, you want to scale your filter based on whichever of the input and output has the lower resolution. The second most important thing is to avoid bad filters: the "box" filter is what you usually get when you try to resize without filtering; a "bilinear" filter as provided by computer graphics hardware yields mediocre upscaling, but is supplied at the wrong size for downscaling.
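To make the "scale the filter to the lower resolution" rule concrete, here is a minimal 1-D sketch. The tent (bilinear) kernel, the pixel-centre mapping `(i + 0.5) * scale - 0.5`, the edge clamping and the name `resample_1d` are illustrative assumptions, not something the answer prescribes. The same kernel is used in both directions; only its width changes when shrinking.

```python
import math


def tent(x):
    """Tent (bilinear) kernel with support [-1, 1]."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0


def resample_1d(src, out_len, kernel=tent, support=1.0):
    """Resample a list of samples to out_len samples with a linear filter.

    The same kernel serves for upscaling and downscaling: when shrinking,
    it is stretched by the scale factor so its width follows the coarser
    (output) grid; when enlarging, it keeps its natural width.
    """
    in_len = len(src)
    scale = in_len / out_len          # > 1 means downscaling
    stretch = max(1.0, scale)         # widen the kernel only when shrinking
    radius = support * stretch
    out = []
    for i in range(out_len):
        # Centre of output pixel i, expressed in input pixel coordinates.
        center = (i + 0.5) * scale - 0.5
        acc = weight_sum = 0.0
        for j in range(math.floor(center - radius), math.ceil(center + radius) + 1):
            w = kernel((j - center) / stretch)
            if w:
                acc += w * src[min(max(j, 0), in_len - 1)]  # clamp at the edges
                weight_sum += w
        out.append(acc / weight_sum)  # normalise so the weights sum to 1
    return out
```

For example, `resample_1d([0, 10], 4)` upsamples to `[0.0, 2.5, 7.5, 10.0]`, while shrinking `[0, 0, 10, 10]` to two samples stretches the tent over two input pixels and gives `[1.25, 8.75]`.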
For performance reasons, it is desirable to scale images in one dimension and then the other one. This means your filter runs much faster: in time proportional to the filter width, instead of proportional to the filter area. All the filters discussed here are "separable", which means you can apply them in this way.
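A sketch of the separable approach, assuming an image stored as a plain Python list of rows and a hypothetical `resize_separable` helper; the Gaussian kernel (sigma 0.5, window of about 2 pixels) is an arbitrary choice. Each pass only touches a 1-D neighbourhood, so the per-pixel cost grows with the filter width rather than with its area.

```python
import math


def gaussian(x, sigma=0.5):
    """Gaussian kernel; the caller only sums it over a window of about +/- support."""
    return math.exp(-0.5 * (x / sigma) ** 2)


def resample_1d(src, out_len, kernel=gaussian, support=2.0):
    """1-D resampling; the kernel is stretched when shrinking."""
    in_len = len(src)
    scale = in_len / out_len
    stretch = max(1.0, scale)
    out = []
    for i in range(out_len):
        center = (i + 0.5) * scale - 0.5
        lo = math.floor(center - support * stretch)
        hi = math.ceil(center + support * stretch)
        acc = weight_sum = 0.0
        for j in range(lo, hi + 1):
            w = kernel((j - center) / stretch)
            acc += w * src[min(max(j, 0), in_len - 1)]  # clamp at the edges
            weight_sum += w
        out.append(acc / weight_sum)
    return out


def resize_separable(image, out_w, out_h):
    """Resize a 2-D image (list of rows): all rows first, then all columns."""
    # Pass 1: horizontal, giving in_h rows of out_w pixels.
    tmp = [resample_1d(row, out_w) for row in image]
    # Pass 2: vertical, resampling each column of tmp to out_h pixels.
    cols = [resample_1d([row[x] for row in tmp], out_h) for x in range(out_w)]
    # Transpose the columns back into rows.
    return [[cols[x][y] for x in range(out_w)] for y in range(out_h)]
```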
If you choose a high-quality filter, the exact form is less critical than you might think. There are two classes of good filters: all-positive ones like "gaussian", which tend toward the blurry side, and negative-lobed ones like "lanczos", which are sharp but may yield slight ringing effects. Note that "bicubic" is a category of filters, which includes "B-spline" (all-positive) as well as "Mitchell" and "Catmull-Rom" (which have negative lobes).
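The bicubic category above is commonly written as the two-parameter Mitchell-Netravali cubic with parameters B and C. The sketch below assumes that parametrization: B = 1, C = 0 gives the cubic B-spline, B = C = 1/3 the "Mitchell" filter, and B = 0, C = 0.5 Catmull-Rom. Any of them can serve as the kernel of a resampler with support 2; the name `cubic_kernel` is hypothetical.

```python
def cubic_kernel(x, B, C):
    """Mitchell-Netravali family of cubic filters, support [-2, 2]."""
    x = abs(x)
    if x < 1:
        return ((12 - 9 * B - 6 * C) * x ** 3
                + (-18 + 12 * B + 6 * C) * x ** 2
                + (6 - 2 * B)) / 6
    if x < 2:
        return ((-B - 6 * C) * x ** 3
                + (6 * B + 30 * C) * x ** 2
                + (-12 * B - 48 * C) * x
                + (8 * B + 24 * C)) / 6
    return 0.0


# Named members of the family as (B, C) pairs.
BICUBIC = {
    "B-spline": (1.0, 0.0),        # all-positive, blurry
    "Mitchell": (1 / 3, 1 / 3),    # small negative lobes
    "Catmull-Rom": (0.0, 0.5),     # sharper, larger negative lobes
}
```

Evaluating each member at x = 1.5 shows the split described above: the B-spline stays positive there, while Mitchell and Catmull-Rom dip below zero.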

Related

How to identify joints in the profile of a shape?

I'm working on a system to automatically take 2D profiles of components and assemble them into 3D shapes.
Imagine you are given these pieces:
You want to make this shape:
I'm highlighting one of the components to show how they fit together.
I'm open to any suggestions on how to go about doing this, but the current approach I'm attempting first finds joints that may fit together just by looking at the 2D profile.
How could I go about identifying the "tabs" from the polyline profile?
The same technique should also work on assemblies like this:
See "How to compare two shapes?".
So you are basically trying to find the "same" sequences in polylines encoded in the polar increment format (turn angle, line length), and then just check whether the relative positions of the matched sequences are the same in both shapes.
Beware that the locks might have some gap between the joined shapes to ensure assembly is possible. In some cases the gap might even be negative (overlap), depending on material and function, so you need to compare the sequences with some margin.
Also, I would divide each shape into its sides to speed up the process, as the lock is most likely not crossing sides.
You may define a "code" for a tab. For example:
3,C,5,C,3 would mean: three units of length, then turn 90° counter-clockwise, then 5 units of length, then turn 90° counter-clockwise, then 3 units of length.
Of course, more identifiers than C can be used, for different angles and so on.
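A minimal sketch of producing such a code from an open polyline. The helper names and the 'W' symbol for a clockwise 90° turn are hypothetical choices; lengths are rounded to whole units, which assumes the profile is drawn on a unit grid, and any other angle is kept as a rounded number of degrees.

```python
import math


def polar_increments(points):
    """Turn an open polyline into [length, turn, length, turn, ..., length].

    Turns are signed angles in degrees between consecutive segments:
    positive = counter-clockwise, negative = clockwise.
    """
    lengths, turns = [], []
    for i in range(len(points) - 1):
        (x0, y0), (x1, y1) = points[i], points[i + 1]
        lengths.append(math.hypot(x1 - x0, y1 - y0))
        if i > 0:
            xp, yp = points[i - 1]
            ax, ay = x0 - xp, y0 - yp      # direction of the previous segment
            bx, by = x1 - x0, y1 - y0      # direction of the current segment
            cross = ax * by - ay * bx
            dot = ax * bx + ay * by
            turns.append(math.degrees(math.atan2(cross, dot)))
    seq = [lengths[0]]
    for turn, length in zip(turns, lengths[1:]):
        seq += [turn, length]
    return seq


def to_code(seq, tol=1.0):
    """Symbolic code: 'C' = 90 deg CCW, 'W' = 90 deg CW (hypothetical symbols)."""
    code = []
    for k, v in enumerate(seq):
        if k % 2 == 0:
            code.append(round(v))          # lengths sit at even positions
        elif abs(v - 90) <= tol:
            code.append("C")
        elif abs(v + 90) <= tol:
            code.append("W")
        else:
            code.append(round(v))          # other angles kept in degrees
    return code
```

For the tab (0,0) → (3,0) → (3,5) → (0,5), `to_code(polar_increments(...))` yields `[3, 'C', 5, 'C', 3]`, the code from the example above.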
A tab in another piece that fits the tab of the first piece has the same (or a very similar) 3,C,5,C,3 code.
So finding the same code in both pieces may indicate a fit. Check whether the adjacent codes in both pieces also fit, and you're done.
Notice that pieces can be rotated. This doesn't change the code, but may change the order of the adjacent codes.
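To tolerate the assembly gaps mentioned earlier and the fact that the starting vertex of a closed outline is arbitrary, the comparison can be done cyclically with tolerances. This is a sketch under assumed conventions (outline stored as `(length, turn_degrees)` edges, tolerances of 0.5 units and 5°, hypothetical name `find_tab`), not a complete matcher; you would still check the adjacent codes as described.

```python
def find_tab(piece, pattern, len_tol=0.5, ang_tol=5.0):
    """Return the start indices where `pattern` occurs in the closed outline `piece`.

    Both are lists of (length, turn_degrees) edges, where the turn is the one
    taken at the end of the edge. The outline is treated as cyclic, so the
    vertex the traversal started from does not matter. The turn after the
    last pattern edge is ignored, because it belongs to the surrounding outline.
    """
    n, m = len(piece), len(pattern)
    hits = []
    for start in range(n):
        for k in range(m):
            length, turn = piece[(start + k) % n]   # wrap around the outline
            p_len, p_turn = pattern[k]
            if abs(length - p_len) > len_tol:
                break
            if k < m - 1 and abs(turn - p_turn) > ang_tol:
                break
        else:                                        # no break: every edge matched
            hits.append(start)
    return hits
```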

Problems with template matching and pyrDown

I am trying to make a normal template matching search more efficient by first doing the search on downscaled representations of the image. Basically, I apply pyrDown twice to get quarter resolution.
For most images and templates this works beautifully, but for some others I get really bad matching results. It seems to be especially bad for thin fonts or small contrast.
Look at this example:
And this template:
At 100% resolution I get a matching probability of 99.9%
At 50% resolution I get 90%
At 25% resolution I get 87%
I don't really know why it's so bad for some images/templates. I tried to recreate and test this in Photoshop by hiding/showing the 25% downscaled template on top of the 25% downscaled image, and as you can see, it's not 100% congruent:
https://giphy.com/gifs/coWDjcvHysKgn95IFa
I need a way to get more probability for those matchings at low resolution because it needs to be fast.
Any ideas on how to improve my algorithm?
Here are the original files:
https://www.dropbox.com/s/llbdj9bx5eprxbk/images.zip?dl=0
This is not unusual and those scores seem perfectly fine. However here are some ideas that might help you improve the situation:
You mentioned that it seems to be especially bad for thin fonts. This could be happening because some of the pixels in the lines are being smoothed out or distorted by the Gaussian filter that pyrDown applies. It could also be an indication that you have reduced the resolution too much. Unfortunately, the pyrDown function in OpenCV reduces the resolution by a factor of 2, so it does not let you fine-tune with other scale factors. Another thing you could try is the resize() function with interpolation set to INTER_LINEAR or INTER_CUBIC. resize() lets you resize the image by any scale factor, so you have more control over performance vs. accuracy.
Use multiple templates of the same object. If you come to a scene where you can only achieve an 87% score, create a template from that scene and add it to the database of templates to be used. Obviously, as the number of templates increases, so does the time it takes to complete the search.
The best way to deal with this scenario is to perform an exhaustive match on the highest level of the pyramid, then track the match down to the lowest level using a reduced search space on the lower levels. By exhaustive I mean that you search all rows and all columns across the entire top pyramid level image. You keep track of the locations (row, col) of the highest matches on the highest level (you are probably already doing that). Then you multiply those locations by 2 and perform a restricted search on the next lower level (e.g. a 5 x 5 shift centered on the rough location). You keep doing this until you reach the bottom level. This gives you the best overall accuracy and performance. It is also the way most industrial computer vision packages do it.
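The coarse-to-fine search can be sketched in pure Python as below. To keep it self-contained, it assumes grayscale images stored as lists of rows, uses 2x2 block averaging as a stand-in for cv2.pyrDown, and scores with the sum of squared differences (lower is better) instead of a normalized score from cv2.matchTemplate; all function names are hypothetical. With `levels=2` and `radius=2` it mirrors the "double pyrDown, then 5 x 5 shift" scheme described above.

```python
def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (stand-in for pyrDown)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]


def ssd(img, tpl, y, x):
    """Sum of squared differences with the template's top-left corner at (y, x)."""
    return sum((img[y + i][x + j] - tpl[i][j]) ** 2
               for i in range(len(tpl)) for j in range(len(tpl[0])))


def best_in(img, tpl, ys, xs):
    """Lowest-SSD position among the candidate rows/cols that fit in the image."""
    H, W, h, w = len(img), len(img[0]), len(tpl), len(tpl[0])
    cands = [(ssd(img, tpl, y, x), y, x) for y in ys for x in xs
             if 0 <= y <= H - h and 0 <= x <= W - w]
    _, y, x = min(cands)
    return y, x


def coarse_to_fine(img, tpl, levels=2, radius=2):
    """Exhaustive search on the top level, then restricted searches going down."""
    imgs, tpls = [img], [tpl]
    for _ in range(levels):
        imgs.append(downsample(imgs[-1]))
        tpls.append(downsample(tpls[-1]))
    top = imgs[-1]
    # Every row and column of the smallest level.
    y, x = best_in(top, tpls[-1], range(len(top)), range(len(top[0])))
    # Double the position and search a small window on each finer level.
    for lvl in range(levels - 1, -1, -1):
        y, x = 2 * y, 2 * x
        y, x = best_in(imgs[lvl], tpls[lvl],
                       range(y - radius, y + radius + 1),
                       range(x - radius, x + radius + 1))
    return y, x
```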

How to clean border noise from a license plate using filters

I am filtering images to remove noise from them.
This image is of a license plate, and to detect the letters I need them to be free of noise.
Original image:
Output:
Is there any way to remove the white part above that 5, or at least reduce it?
I have a couple of images like this with that problem, which occurs when I transform the image horizontally. Any help is welcome.
With just low-level operations (filters), you can't reduce the black area at the top, because it is of the same nature as the characters themselves. Any action you take against this zone will also damage the characters. No filter will work satisfactorily.
Hence you must use some extra contextual information such as "against the top edge", and possibly "forming a straight edge". Even so, finding the exact border with the 5 is challenging.
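One way to use the "against the top edge, forming a straight edge" context is a row-profile heuristic: starting from the top row, clear rows whose foreground fraction is high (the band spans nearly the whole width, while characters do not), and stop at the first row that isn't. This is a hypothetical sketch assuming a binarized image with 1 = foreground and an arbitrary 80% fill threshold; as noted, it will not recover the exact boundary where the band merges with the 5.

```python
def strip_top_band(binary, min_fill=0.8):
    """Clear the mostly-foreground rows at the top of a binarized plate.

    `binary` is a list of rows of 0/1 values (1 = foreground). Rows are
    cleared from the top down while their foreground fraction is at least
    `min_fill`; the first row below the threshold stops the scan.
    Returns a new image and leaves the input untouched.
    """
    out = [row[:] for row in binary]
    for row in out:
        if sum(row) / len(row) < min_fill:
            break                      # first row that is not part of the band
        row[:] = [0] * len(row)
    return out
```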

SVG filter that highlights a path upwards (image sample provided)

I'm trying to highlight the countries of an SVG map in a specific manner.
This is the result I want to achieve:
Before
After
Using the drop shadow technique provided here: https://stackoverflow.com/a/6094674, I was able to obtain a small relief effect, but I think this might not be the correct direction.
How should I approach this?
This gives you a shadow effect, which is fine. But if you want a true relief effect that works for all shapes (even fiddly ones with thin horizontal lines), then you'll need to composite multiple copies of the SourceGraphic, incrementally offset in y once for each pixel. Alternatively, you can use lighting primitives and some fancy compositing.
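As a sketch of the "multiple offset copies" idea, the Python helper below writes out the SVG filter markup. It assumes you want the extruded side to appear below the shape (so the shape looks raised) and uses the standard feOffset, feMerge, feFlood and feComposite primitives; the function name, default depth and colour are arbitrary. Apply it with `filter="url(#relief)"` on a country path. For larger depths you may need to enlarge the filter region (its default extends 10% beyond the bounding box).

```python
def relief_filter(filter_id="relief", depth=4, side_color="#555555"):
    """Return SVG markup for a filter that fakes a raised (extruded) shape.

    `depth` copies of the shape's alpha are each shifted down by one more
    pixel, merged, painted with `side_color`, and the original graphic is
    drawn on top of that stack.
    """
    lines = [f'<filter id="{filter_id}">']
    # One offset copy of the silhouette per pixel of depth.
    for dy in range(1, depth + 1):
        lines.append(f'  <feOffset in="SourceAlpha" dx="0" dy="{dy}" result="off{dy}"/>')
    # Union of all offset silhouettes.
    lines.append('  <feMerge result="stack">')
    lines += [f'    <feMergeNode in="off{dy}"/>' for dy in range(1, depth + 1)]
    lines.append('  </feMerge>')
    # Paint the stack with a flat colour.
    lines.append(f'  <feFlood flood-color="{side_color}" result="color"/>')
    lines.append('  <feComposite in="color" in2="stack" operator="in" result="side"/>')
    # Side underneath, original graphic on top.
    lines.append('  <feMerge>')
    lines.append('    <feMergeNode in="side"/>')
    lines.append('    <feMergeNode in="SourceGraphic"/>')
    lines.append('  </feMerge>')
    lines.append('</filter>')
    return "\n".join(lines)
```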

Downsampling and applying a lowpass filter to digital audio

I've got a 44 kHz audio stream from a CD, represented as an array of 16-bit PCM samples. I'd like to cut it down to an 11 kHz stream. How do I do that? From my engineering classes many years ago, I know that the stream won't be able to describe anything over 5500 Hz accurately anymore, so I assume I want to cut everything above that out too. Any ideas? Thanks.
Update: There is some code on this page that converts from 48 kHz to 8 kHz using a simple algorithm and a coefficient array that looks like { 1, 4, 12, 12, 4, 1 }. I think that is what I need, but I need it for a factor of 4 rather than 6. Any idea how those constants are calculated? Also, I end up converting the 16-bit samples to floats anyway, so I can do the downsampling with floats rather than shorts, if that helps the quality at all.
Read up on FIR and IIR filters. These are the filters that use a coefficient array.
If you do a Google search for "FIR or IIR filter designer" you will find lots of software and online applets that do the hard job (getting the coefficients) for you.
EDIT:
This page here ( http://www-users.cs.york.ac.uk/~fisher/mkfilter/ ) lets you enter the parameters of your filter and will spit out ready to use C-Code...
You're right that you need to apply low-pass filtering to your signal. Any content above 5500 Hz will still be present in your downsampled signal, but 'aliased' to another frequency, so you'll have to remove it before downsampling.
It's a good idea to do the filtering with floats. There are fixed-point filter algorithms too, but those generally involve quality trade-offs. If you've got floats, use them!
Using DFTs for filtering is generally overkill, and it makes things more complicated because DFTs are not a continuous process but work on buffers.
Digital filters generally come in two flavors: FIR and IIR. They're based on the same idea, but IIR filters use feedback loops to achieve a steeper response with far fewer coefficients. This might be a good idea for downsampling, because you need a very steep filter slope there.
Downsampling is sort of a special case. Because you're going to throw away 3 out of 4 samples there's no need to calculate them. There is a special class of filters for this called polyphase filters.
Try googling for polyphase IIR or polyphase FIR for more information.
Notice (in addition to the other comments) that the simple, intuitive approach "downsample by a factor of 4 by replacing each group of 4 consecutive samples with their average" is not optimal, but it is nevertheless not wrong, either practically or conceptually, because the averaging amounts precisely to a low-pass filter (a rectangular window, which corresponds to a sinc in frequency). What would be conceptually wrong is to just downsample by taking one out of every 4 samples: that would definitely introduce aliasing.
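The difference between averaging and plain sample-dropping is easy to see with a tone at a quarter of the original sample rate, which lies above the new Nyquist frequency. A minimal sketch (function names are hypothetical):

```python
def decimate_by_averaging(samples, factor=4):
    """Replace each group of `factor` samples by its mean: a box low-pass
    filter followed by keeping one output per group."""
    n = len(samples) // factor * factor      # drop an incomplete tail group
    return [sum(samples[i:i + factor]) / factor for i in range(0, n, factor)]


def decimate_naive(samples, factor=4):
    """Keep every `factor`-th sample with no filtering at all (aliases)."""
    return samples[::factor]
```

For the input `[1, 0, -1, 0] * 8`, `decimate_by_averaging` returns all zeros (the tone is removed), while `decimate_naive` returns all ones: the tone has aliased to DC.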
By the way: practically any software that does some resampling (audio, image or whatever; example for the audio case: sox) takes this into account, and frequently lets you choose the underlying low-pass filter.
You need to apply a low-pass filter before you downsample the signal to avoid "aliasing". The cutoff frequency of the low-pass filter should be less than the Nyquist frequency of the new sample rate, which is half that (lower) sample rate.
The "best" solution possible is indeed a DFT, discarding the top 3/4 of the frequencies, and performing an inverse DFT with the domain restricted to the bottom 1/4. Discarding the top 3/4 is a low-pass filter in this case. Padding to a power-of-2 number of samples will probably give you a speed benefit. Be aware of how your FFT package stores samples, though. If it's a complex FFT (which is much easier to analyze, and generally has nicer properties), the frequencies will either go from -22 kHz to 22 kHz, or from 0 to 44 kHz. In the first case, you want the middle 1/4; in the latter, the outermost 1/4.
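A minimal sketch of the DFT route, using an O(n²) DFT in pure Python for clarity (a real implementation would use an FFT library). It assumes the 0..n-1 bin layout, so it keeps the outermost quarter of the bins, zeroes the new Nyquist bin for simplicity, and rescales so amplitudes are preserved; the block length must be a multiple of 2 × factor. The function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT (includes the 1/n normalisation)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def downsample_dft(x, factor=4):
    """Low-pass and decimate one block by keeping the lowest 1/factor of the bins."""
    n = len(x)
    if n % (2 * factor):
        raise ValueError("block length must be a multiple of 2 * factor")
    m = n // factor              # output length
    half = m // 2
    X = dft(x)
    # With bins stored 0..n-1, the low frequencies are the first `half` bins
    # (positive) and the last `half - 1` bins (negative); the new Nyquist
    # bin is set to zero.
    Y = X[:half] + [0j] + X[n - half + 1:]
    # Rescale so a sinusoid keeps its amplitude (idft divides by m, not n).
    y = idft([v * m / n for v in Y])
    return [v.real for v in y]
```

Applied to 64 samples of a tone in bin 3, this returns 16 samples equal to every 4th input sample; a tone in bin 20 (above the new Nyquist) comes out as zeros.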
You can do an adequate job by averaging sample values together. The naïve way of grabbing samples four by four and doing an equal weighted average works, but isn't too great. Instead you'll want to use a "kernel" function that averages them together in a non-intuitive way.
Mathwise, discarding everything outside the low-frequency band is multiplication by a box function in frequency space. The (inverse) Fourier transform turns pointwise multiplication into a convolution of the (inverse) Fourier transforms of the functions, and vice versa. So, if we want to work in the time domain, we need to perform a convolution with the (inverse) Fourier transform of the box function. This turns out to be proportional to the "sinc" function sin(at)/(at), where a is the half-width of the box in frequency space (the cutoff, as an angular frequency). So at every 4th location (since you're downsampling by a factor of 4) you can add up the points near it, multiplied by sin(a dt)/(a dt), where dt is the distance in time to that location. How near? Well, that depends on how good you want it to sound. It's common to ignore everything outside the first zero, for instance, or just to take the number of points to be the ratio by which you're downsampling.
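Here is a sketch of the time-domain version: a Hamming-windowed sinc low-pass, evaluated only at every 4th position, which is the "don't compute the discarded outputs" saving that polyphase FIR decimators formalize. The tap count (31), the Hamming window and the cutoff of 0.45/factor cycles per input sample (slightly below the new Nyquist of 0.5/factor, i.e. π/4 rad/sample for a factor of 4) are assumptions, not tuned values, and the function names are hypothetical.

```python
import math


def lowpass_taps(num_taps=31, cutoff=0.1125):
    """Windowed-sinc low-pass FIR; `cutoff` is in cycles per input sample."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        # Ideal low-pass impulse response: 2*fc*sinc(2*fc*t).
        h = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        # Hamming window truncates the infinite sinc smoothly.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    total = sum(taps)
    return [v / total for v in taps]         # unity gain at DC


def decimate_fir(x, factor=4, taps=None):
    """Filter and keep 1 of `factor` samples, computing only the kept outputs."""
    if taps is None:
        taps = lowpass_taps(cutoff=0.45 / factor)
    out = []
    for i in range(0, len(x), factor):        # skip the outputs we would discard
        acc = 0.0
        for k, h in enumerate(taps):
            j = i - k
            if 0 <= j < len(x):
                acc += h * x[j]
        out.append(acc)
    return out
```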
Finally there's the piss-poor (but fast) way of just discarding the majority of the samples, keeping just the zeroth, the fourth, and so on.
Honestly, if it fits in memory, I'd recommend just going the DFT route. If it doesn't, use one of the software filter packages that others have recommended to construct the filter for you.
The process you're after is called "decimation".
There are 2 steps:
Apply a low-pass filter to the data (in your case, a low-pass filter with its cutoff at π/4).
Downsample (in your case, keep 1 out of every 4 samples).
There are many methods to design and apply the Low Pass Filter.
You may start here:
http://en.wikipedia.org/wiki/Filter_design
You could make use of libsamplerate to do the heavy lifting. libsamplerate is a C API, and it takes care of calculating the filter coefficients. You can select from different quality filters so that you can trade off quality for speed.
If you would prefer not to write any code, you could just use Audacity to do the sample rate conversion. It offers a powerful GUI, and makes use of libsamplerate for its sample rate conversion.
I would try applying a DFT, chopping off 3/4 of the result and applying an inverse DFT. I can't tell whether it will sound good without actually trying it, though.
I recently came across BruteFIR which may already do some of what you're interested in?
You have to apply low-pass filter (removing frequencies above 5500 Hz) and then apply decimation (leave every Nth sample, every 4th in your case).
For decimation, FIR rather than IIR filters are usually employed, because they don't depend on previous outputs, so you don't have to calculate anything for the discarded samples. IIR filters generally depend on both inputs and outputs, so unless a specific type of IIR is used, you'd have to calculate every output sample before discarding 3/4 of them.
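To illustrate why IIR filters are awkward here, consider a one-pole IIR low-pass. It is a deliberately simple example, far too gentle to be a real anti-aliasing filter, and `alpha` and the function name are arbitrary. Because each output depends on the previous output, the recursion has to run on every input sample even though only every 4th output is kept.

```python
def decimate_one_pole(x, factor=4, alpha=0.2):
    """One-pole IIR low-pass y[n] = y[n-1] + alpha * (x[n] - y[n-1]),
    keeping only every `factor`-th output (illustration only)."""
    y = 0.0
    out = []
    for n, v in enumerate(x):
        y += alpha * (v - y)       # feedback: needs the previous output every time
        if n % factor == 0:
            out.append(y)          # only these outputs are kept
    return out
```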
Just googled an intro-level article on the subject: https://www.dspguru.com/dsp/faqs/multirate/decimation
