I'm using https://github.com/liuliu/ccv for text detection in business cards. The problem is that the recall is low and a good amount of text in the business cards is missed - sometimes partially and at times completely. Can anyone suggest values for the parameters in the file swtdetect.c that could give high recall? I can compromise on the precision.
Related
I am trying to make a normal template matching search more effizient by first doing the search on downscaled representations of the image. Basically I do a double pyrDown -> quarter resolution.
For most images and templates this works beautifully, but for some others I get really bad matching results. It seems to be especially bad for thin fonts or small contrast.
Look at this example:
And this template:
At 100% resolution I get a matching probability of 99,9%
At 50% resolution I get 90%
At 25% resolution I get 87%
I don't really know why its so bad for some images/templates. I tried to recreate and test in photoshop by hiding/showing the 25% downscaled template on top of the 25% downscaled image, and as you can see, it's not 100% congruent:
https://giphy.com/gifs/coWDjcvHysKgn95IFa
I need a way to get more probability for those matchings at low resolution because it needs to be fast.
Any ideas on how to improve my algorithm?
Here are the original files:
https://www.dropbox.com/s/llbdj9bx5eprxbk/images.zip?dl=0
This is not unusual and those scores seem perfectly fine. However here are some ideas that might help you improve the situation:
You mentioned that it seems to be especially bad for thin fonts. This could be happening because some of the pixels in the lines are being smoothed out or distorted with the Gaussian filter that is applied on pyrDown. It could also be an indication that you have reduced the resolution too much. Unfortunately I think the pyrDown function in OpenCV reduces the resolution by a factor of 2 so it does not give you the ability to fine tune it by other scale factors. Another thing you could try is the instruction resize() with interpolation set to INTER_LINEAR or INTER_CUBIC. The resize() function will allow you to resize the image using any scale factor so you might have more control of performance vs accuracy.
Use multiple templates of the same objects. If you come to a scene and can only achieve an 87% score, create a template out of that scene. Then add it to a database of templates that are to be utilized. Obviously as the amount of templates increases so does the time it takes to complete the search.
The best way to deal with this scenario is to perform an exhaustive match on the highest level of the pyramid then track it down to the lowest level using a reduced search space on lower levels. By exhaustive I mean you will search all rows and all columns across the entire top pyramid level image. You will keep track of the locations (row, col) of the highest matches on the highest level (you are already probably doing that). Then you will multiply those locations by a factor of 2 and perform a restricted search on the next lowest level (ex. 5 x 5 shift centered on the rough location). You keep doing this until you are at the bottom level. This will give you the best overall accuracy and performance. This is also the way most industrial computer vision packages do it.
I have become a part of this infinite question of how to estimate position from accelerometer data achieved by an Inertial measurement unit. I am wondering how to compensate for integration ''drift'' during linear movement using Kalman filtering.
At this moment I got my acceleration in a fixed coordinate system and all movements are in know directions with no change in angular position.
So at this point we got acceleration in 3D (x-y-z) in known directions, an acceleration in x will yield for zero acceleration in y and z and so on. Assuming perfect conditions, which are not the case, of course some noise with be added to the other directions when moving in one direction but lets ''leave'' this out at this point. In addition, It is important to note that the system only has to estimate a limited period, approximately about 1 second using a sampling freq of 512 Hz.
It also important to note that I have compensated for the offset (gravity and misalignment of the accelerometer in the IMU) and bias of the acceleromter data when static. Meaning when the sensor is non-moving all my readings are constant zero before going into the Kalman filter.
To more characterize my problem I have this graph to illustrate my problem with drift. This is estimations on 5 seconds to more show what I'm struggling with.
Position-estimation-drift-problem
Here we are looking into a movement in one direction, the movement are 20cm movement in y direction which in my case are forward relative to my starting position.
Is there a way to reduce/eliminate this drift when integrating my signal. For instance assume something about drifting when my sensor is non-moving. Or to compute using some correction in my Kalman algorithm to subtract or add to my estimated velocity and position. The system does not have to run in real time so any tuning bias compensation can be adjusted for looking back into the data. But I would be preferable if it was possible to take new measurements with slightly different movements and not tune more then needed.
Finally where/how can I compensate for this, in the Kalman algorithm or before/after, or should I be in for a disappointment already?
If I left out some important information please ask so i can elaborate more, an at last any thoughts/ideas are welcome!
Remember I do only need to estimate for second’s worth of time so my hope is that this makes it more achievable, but i might be wrong?
I can only guess/suggest few tricks, but you will probably get some significant error if you only based on accelerometer.
seems that detecting motionless is not resetting the speed, just acceleration (according to your graph) so this should be an easy fix
if we are talking an a car/other type of surface motion with contact / friction, your motionless can be set by characterizing the noise of in motion/self sensor noise
kalman parameters may be off
run multiple kernels and average results (may also try particle filter)
if its not for online application you can also try fitting offsets/drift and reduce them by assuming there is not motion in constant speed or other approaches that can replace the kalman filter which is designed for real time best estimation.
error seems a-symmetric in time, just run it in both directions (:
what are you measuring at 512 Hz??? maybe you can better model it
I can go on and on but if you supply data and code, it would be much easier.
Good luck,
Lev
I've been hunting all over the web for material about vocoder or autotune, but haven't got any satisfactory answers. Could someone in a simple way please explain how do you autotune a given sound file using a carrier sound file?
(I'm familiar with ffts, windowing, overlap etc., I just don't get the what do we do when we have the ffts of the carrier and the original sound file which has to be modulated)
EDIT: After looking around a bit more, I finally got to know exactly what I was looking for -- a channel vocoder. The way it works is, it takes two inputs, one a voice signal and the other a musical signal rich in frequency. The musical signal is modulated by the envelope of the voice signal, and the output signal sounds like the voice singing in the musical tone.
Thanks for your help!
Using a phase vocoder to adjust pitch is basically pitch estimation plus interpolation in the frequency domain.
A phase vocoder reconstruction method might resample the frequency spectrum at, potentially, a new FFT bin spacing to shift all the frequencies up or down by some ratio. The phase vocoder algorithm additionally uses information shared between adjacent FFT frames to make sure this interpolation result can create continuous waveforms across frame boundaries. e.g. it adjusts the phases of the interpolation results to make sure that successive sinewave reconstructions are continuous rather than having breaks or discontinuities or phase cancellations between frames.
How much to shift the spectrum up or down is determined by pitch estimation, and calculating the ratio between the estimated pitch of the source and that of the target pitch. Again, phase vocoders use information about any phase differences between FFT frames to help better estimate pitch. This is possible by using more a bit more global information than is available from a single local FFT frame.
Of course, this frequency and phase changing can smear out transient detail and cause various other distortions, so actual phase vocoder products may additionally do all kinds of custom (often proprietary) special case tricks to try and fix some of these problems.
The first step is pitch detection. There are a number of pitch detection algorithms, introduced briefly in wikipedia: http://en.wikipedia.org/wiki/Pitch_detection_algorithm
Pitch detection can be implemented in either frequency domain or time domain. Various techniques in both domains exist with various properties (latency, quality, etc.) In the F domain, it is important to realize that a naive approach is very limiting because of the time/frequency trade-off. You can get around this limitation, but it takes work.
Once you've identified the pitch, you compare it with a desired pitch and determine how much you need to actually pitch shift.
Last step is pitch shifting, which, like pitch detection, can be done in the T or F domain. The "phase vocoder" method other folks mentioned is the F domain method. T domain methods include (in increasing order of quality) OLA, SOLA and PSOLA, some of which you can read about here: http://www.scribd.com/doc/67053489/60/Synchronous-Overlap-and-Add-SOLA
Basically you do an FFT, then in the frequency domain you move the signals to the nearest perfect semitone pitch.
Forgive me if I may come as ignorant but I would like to ask some questions regarding using Filter Algorithms for Note Onset Detection.
Is 'Detection Function' the same as using Filters on the audio signal? Or generally, what is the difference between Detection Function, Filtering (pre-processing the signal), and Peak-Picking?
I've constantly heard about the Low-Pass (or High-Pass) filter, but I am confused. I read that it works on cancelling out certain frequencies that are below (or above) a certain threshold. However, I am using the Time-Domain for calculating Note Onsets (that is, using the change in signal amplitude/energy). So I am not sure on how I can apply low-pass filtering to the time-domain. Any other good filters for note-onset detection?
What is the difference between, Spectral and Phase energy? (I have an idea that spectral refers to the spectogram or frequencies, but I do not know what Phase is)
I am having difficulties with working with dynamic thresholding. Any suggestions for a good algorithm? For example, I have the following signal:
As shown in the image above, there are note onsets that I have missed. A brief description of my algorithm, I calculate and take note of the energy/amplitude changes that occur in the audio signal. Then I get the maximum 'energy change' and based on the sensitivity, I take a percentage of it and set it as the threshold. So this is where the problem of dealing with varying degrees of amplitude/energy comes in. If I set the sensitivity too low, I come up with 'ghost' onsets and if I set the sensitivity too high, I miss out on some onsets. Any suggestions to improve the algorithm (or suggest a new algorithm) that I am using?
I am sure that it is difficult to have 100% accuracy but I need to have a better algorithm for note onset detection compared with what I have now. I would appreciate all the help I can get. Thank you very much!
One way is to detect sudden increases in the amplitude envelope. One way of calculating the amplitude envelope is to rectify the input signal (i.e. take the absolute value) and then low pass filter it. Check out the code examples in http://www.musicdsp.org for time domain filter examples and envelope followers.
I'm looking for ways to determine the quality of a photography (jpg). The first thing that came into my mind was to compare the file-size to the amount of pixel stored within. Are there any other ways, for example to check the amount of noise in a jpg? Does anyone have a good reading link on this topic or any experience? By the way, the project I'm working on is written in C# (.net 3.5) and I use the Aurigma Graphics Mill for image processing.
Thanks in advance!
I'm not entirely clear what you mean by "quality", if you mean the quality setting in the JPG compression algorithm then you may be able to extract it from the EXIF tags of the image (relies on the capture device putting them in and no-one else overwriting them) for your library see here:
http://www.aurigma.com/Support/DocViewer/30/JPEGFileFormat.htm.aspx
If you mean any other sort of "quality" then you need to come up with a better definition of quality. For example, over-exposure may be a problem in which case hunting for saturated pixels would help determine that specific sort of quality. Or more generally you could look at statistics (mean, standard deviation) of the image histogram in the 3 colour channels. The image may be out of focus, in which case you could look for a cutoff in the spatial frequencies of the image Fourier transform. If you're worried about speckle noise then you could try applying a median filter to the image and comparing back to the original image (more speckle noise would give a larger change) - I'm guessing a bit here.
If by "quality" you mean aesthetic properties of composition etc then - good luck!
The 'quality' of an image is not measurable, because it doesn't correspond to any particular value.
If u take it as number of pixels in the image of specific size its not accurate. You might talk about a photograph taken in bad light conditions as being of 'bad quality', even though it has exactly the same number of pixels as another image taken in good light conditions. This term is often used to talk about the overall effect of an image, rather than its technical specifications.
I wanted to do something similar, but wanted the "Soylent Green" option and used people to rank images by performing comparisons. See the question responses here.
I think you're asking about how to determine the quality of the compression process itself. This can be done by converting the JPEG to a BMP and comparing that BMP to the original bitmap from with the JPEG was created. You can iterate through the bitmaps pixel-by-pixel and calculate a pixel-to-pixel "distance" by summing the differences between the R, G and B values of each pair of pixels (i.e. the pixel in the original and the pixel in the JPEG) and dividing by the total number of pixels. This will give you a measure of the average difference between the original and the JPEG.
Reading the number of pixels in the image can tell you the "megapixel" size(#pixels/1000000), which can be a crude form of programatic quality check, but that wont tell you if the photo is properly focused, assuming it is supposed to be focused (think fast-motion objects, like trains), nor weather or not there is something in the pic worth looking at, that will require a human, or pigeon if you prefer.