Analyze "whistle" sound for pitch/note

I am trying to build a system that can process a recording of someone whistling and output the notes.
Can anyone recommend an open-source platform that I can use as the base for note/pitch recognition and analysis of wave files?
Thanks in advance.

As many others have already said, FFT is the way to go here. I've written a little example in Java using FFT code from http://www.cs.princeton.edu/introcs/97data/. In order to run it, you will need the Complex class from that page also (see the source for the exact URL).
The code reads in a file, goes window-wise over it and does an FFT on each window. For each FFT it looks for the maximum coefficient and outputs the corresponding frequency. This does work very well for clean signals like a sine wave, but for an actual whistle sound you probably have to add more. I've tested with a few files with whistling I created myself (using the integrated mic of my laptop computer), the code does get the idea of what's going on, but in order to get actual notes more needs to be done.
1) You might need a more intelligent windowing technique. What my code uses now is a simple rectangular window. Since the FFT assumes that the input signal can be periodically continued, additional frequencies are detected when the first and the last sample in the window don't match. This is known as spectral leakage ( http://en.wikipedia.org/wiki/Spectral_leakage ). Usually one uses a window that down-weights samples at the beginning and the end of the window ( http://en.wikipedia.org/wiki/Window_function ). Although the leakage shouldn't cause the wrong frequency to be detected as the maximum, using a window will increase the detection quality.
2) To match the frequencies to actual notes, you could use an array containing the frequencies (like 440 Hz for a') and then look for the frequency that's closest to the one that has been identified. However, if the whistling is off standard tuning, this won't work any more. Given that the whistling is still correct but only tuned differently (like a guitar or other musical instrument can be tuned differently and still sound "good", as long as the tuning is done consistently for all strings), you could still find notes by looking at the ratios of the identified frequencies. You can read http://en.wikipedia.org/wiki/Pitch_%28music%29 as a starting point on that. This is also interesting: http://en.wikipedia.org/wiki/Piano_key_frequencies
3) Moreover it might be interesting to detect the points in time when each individual tone starts and stops. This could be added as a pre-processing step. You could do an FFT for each individual note then. However, if the whistler doesn't stop but just bends between notes, this would not be that easy.
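As a rough illustration of point 1, a Hann window could be applied to each block of samples before it is handed to fft(). This is only a sketch: the names HannWindow, coefficients and apply are made up for this example and are not part of the program further down.

```java
// Hypothetical helper, not part of the original program: tapers a block of
// samples with a Hann window to reduce spectral leakage.
final class HannWindow {

    // w[i] = 0.5 * (1 - cos(2*pi*i / (n - 1))) for i = 0 .. n-1
    static double[] coefficients(int n) {
        double[] w = new double[n];
        if (n == 1) {
            w[0] = 1.0; // a single-sample window has nothing to taper
            return w;
        }
        for (int i = 0; i < n; i++) {
            w[i] = 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1)));
        }
        return w;
    }

    // Returns a windowed copy of data[offset .. offset + n - 1].
    static double[] apply(double[] data, int offset, int n) {
        double[] w = coefficients(n);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = data[offset + i] * w[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] ones = new double[8];
        java.util.Arrays.fill(ones, 1.0);
        // The printed values rise from 0 at the edges towards the middle.
        System.out.println(java.util.Arrays.toString(apply(ones, 0, 8)));
    }
}
```

In run(), the loop that fills x could then multiply data[offset+i] by the matching coefficient, with the coefficients computed once outside the window loop.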
Definitely have a look at the libraries the others suggested. I don't know any of them, but maybe they already contain functionality for doing what I've described above.
And now to the code. Please let me know what worked for you, I find this topic pretty interesting.
Edit: I updated the code to include overlapping and a simple mapper from frequencies to notes. It works only for "tuned" whistlers though, as mentioned above.
package de.ahans.playground;
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
public class FftMaxFrequency {
// taken from http://www.cs.princeton.edu/introcs/97data/FFT.java.html
// (first hit in Google for "java fft")
// needs Complex class from http://www.cs.princeton.edu/introcs/97data/Complex.java
public static Complex[] fft(Complex[] x) {
int N = x.length;
// base case
if (N == 1) return new Complex[] { x[0] };
// radix 2 Cooley-Tukey FFT
if (N % 2 != 0) { throw new RuntimeException("N is not a power of 2"); }
// fft of even terms
Complex[] even = new Complex[N/2];
for (int k = 0; k < N/2; k++) {
even[k] = x[2*k];
}
Complex[] q = fft(even);
// fft of odd terms
Complex[] odd = even; // reuse the array
for (int k = 0; k < N/2; k++) {
odd[k] = x[2*k + 1];
}
Complex[] r = fft(odd);
// combine
Complex[] y = new Complex[N];
for (int k = 0; k < N/2; k++) {
double kth = -2 * k * Math.PI / N;
Complex wk = new Complex(Math.cos(kth), Math.sin(kth));
y[k] = q[k].plus(wk.times(r[k]));
y[k + N/2] = q[k].minus(wk.times(r[k]));
}
return y;
}
static class AudioReader {
private AudioFormat audioFormat;
public AudioReader() {}
public double[] readAudioData(File file) throws UnsupportedAudioFileException, IOException {
AudioInputStream in = AudioSystem.getAudioInputStream(file);
audioFormat = in.getFormat();
int depth = audioFormat.getSampleSizeInBits();
long length = in.getFrameLength();
if (audioFormat.isBigEndian()) {
throw new UnsupportedAudioFileException("big endian not supported");
}
if (audioFormat.getChannels() != 1) {
throw new UnsupportedAudioFileException("only 1 channel supported");
}
byte[] tmp = new byte[(int) length];
byte[] samples = null;
int bytesPerSample = depth/8;
int bytesRead;
while (-1 != (bytesRead = in.read(tmp))) {
if (samples == null) {
samples = Arrays.copyOf(tmp, bytesRead);
} else {
int oldLen = samples.length;
samples = Arrays.copyOf(samples, oldLen + bytesRead);
for (int i = 0; i < bytesRead; i++) samples[oldLen+i] = tmp[i];
}
}
double[] data = new double[samples.length/bytesPerSample];
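// convert every complete little-endian sample (the old bound skipped the last one)
for (int i = 0; i + bytesPerSample <= samples.length; i += bytesPerSample) {
int sample = 0;
// low bytes are masked so their sign bit does not corrupt the value
for (int j = 0; j < bytesPerSample - 1; j++) sample |= (samples[i+j] & 0xFF) << (j*8);
// the most significant byte carries the sign of the whole sample
sample |= samples[i+bytesPerSample-1] << ((bytesPerSample-1)*8);
data[i/bytesPerSample] = (double) sample / Math.pow(2, depth);
}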
return data;
}
public AudioFormat getAudioFormat() {
return audioFormat;
}
}
public class FrequencyNoteMapper {
private final String[] NOTE_NAMES = new String[] {
"A", "Bb", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"
};
private final double[] FREQUENCIES;
private final double a = 440;
private final int TOTAL_OCTAVES = 6;
private final int START_OCTAVE = -1; // relative to A
public FrequencyNoteMapper() {
FREQUENCIES = new double[TOTAL_OCTAVES*12];
int j = 0;
for (int octave = START_OCTAVE; octave < START_OCTAVE+TOTAL_OCTAVES; octave++) {
for (int note = 0; note < 12; note++) {
int i = octave*12+note;
FREQUENCIES[j++] = a * Math.pow(2, (double)i / 12.0);
}
}
}
public String findMatch(double frequency) {
if (frequency == 0)
return "none";
double minDistance = Double.MAX_VALUE;
int bestIdx = -1;
for (int i = 0; i < FREQUENCIES.length; i++) {
if (Math.abs(FREQUENCIES[i] - frequency) < minDistance) {
minDistance = Math.abs(FREQUENCIES[i] - frequency);
bestIdx = i;
}
}
int octave = bestIdx / 12;
int note = bestIdx % 12;
return NOTE_NAMES[note] + octave;
}
}
public void run (File file) throws UnsupportedAudioFileException, IOException {
FrequencyNoteMapper mapper = new FrequencyNoteMapper();
// size of window for FFT
int N = 4096;
int overlap = 1024;
AudioReader reader = new AudioReader();
double[] data = reader.readAudioData(file);
// sample rate is needed to calculate actual frequencies
float rate = reader.getAudioFormat().getSampleRate();
// go over the samples window-wise
for (int offset = 0; offset < data.length-N; offset += (N-overlap)) {
// for each window calculate the FFT
Complex[] x = new Complex[N];
for (int i = 0; i < N; i++) x[i] = new Complex(data[offset+i], 0);
Complex[] result = fft(x);
// find index of maximum coefficient
double max = -1;
int maxIdx = 0;
for (int i = result.length/2; i >= 0; i--) {
if (result[i].abs() > max) {
max = result[i].abs();
maxIdx = i;
}
}
// calculate the frequency of that coefficient
double peakFrequency = (double)maxIdx*rate/(double)N;
// and get the time of the start and end position of the current window
double windowBegin = offset/rate;
double windowEnd = (offset+(N-overlap))/rate;
System.out.printf("%f s to %f s:\t%f Hz -- %s\n", windowBegin, windowEnd, peakFrequency, mapper.findMatch(peakFrequency));
}
}
public static void main(String[] args) throws UnsupportedAudioFileException, IOException {
new FftMaxFrequency().run(new File("/home/axr/tmp/entchen.wav"));
}
}
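For whistlers who are in tune with themselves but not with a' = 440 Hz, the ratio idea from point 2 could be sketched as follows. In equal temperament the interval between two frequencies is 12 * log2(f2 / f1) semitones, which does not depend on the absolute tuning. The names IntervalEstimator, semitones and nearestInterval are invented for this example.

```java
// Hypothetical helper: measures the musical interval between two detected
// peak frequencies instead of matching each one against a fixed table.
final class IntervalEstimator {

    // Signed distance from f1 to f2 in equal-tempered semitones.
    static double semitones(double f1, double f2) {
        return 12.0 * Math.log(f2 / f1) / Math.log(2.0);
    }

    // Rounds the interval to the nearest whole number of semitones.
    static long nearestInterval(double f1, double f2) {
        return Math.round(semitones(f1, f2));
    }

    public static void main(String[] args) {
        // A whistler tuned slightly flat: both notes are off, the interval is not.
        double first = 430.0;
        double second = 430.0 * Math.pow(2.0, 7.0 / 12.0);
        System.out.println(nearestInterval(first, second) + " semitones");
    }
}
```

Taking the first stable note as the reference and expressing every later note as an interval from it gives the melody up to transposition, which is often enough to recognise a tune.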

I think this open-source platform suits you:
http://code.google.com/p/musicg-sound-api/

Well, you could always use FFTW to perform the Fast Fourier Transform. It's a very well-respected library. Once you've got an FFT of your signal you can analyze the resultant array for peaks. A simple histogram-style analysis should give you the frequencies with the greatest volume. Then you just have to compare those frequencies to the frequencies that correspond to different pitches.

In addition to the other great options:
csound pitch detection: http://www.csounds.com/manual/html/pvspitch.html
fmod: http://www.fmod.org/ (has a free version)
aubio: http://aubio.org/doc/pitchdetection_8h.html

You might want to consider Python(x,y). It's a scientific programming framework for python in the spirit of Matlab, and it has easy functions for working in the FFT domain.

If you use Java, have a look at the TarsosDSP library. It has a pretty good ready-to-go pitch detector.
Here is an example for Android, but I think it doesn't require many modifications to use it elsewhere.

I'm a fan of the FFT, but for the monophonic and fairly pure sinusoidal tones of whistling, a zero-cross detector would do a far better job at determining the actual frequency at a much lower processing cost. Zero-cross detection is used in electronic frequency counters that measure the clock rate of whatever is being tested.
If you are going to analyze anything other than pure sine wave tones, then FFT is definitely the way to go.
A very simple implementation of zero-cross detection in Java is on GitHub.
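To give an idea of how little code zero-cross detection needs, here is a minimal sketch. It is not the GitHub implementation mentioned above; the class and method names are hypothetical, and it assumes a mono signal centred on zero with little noise.

```java
// Hypothetical sketch: estimates the frequency of a near-sinusoidal signal
// from the spacing of its rising zero crossings.
final class ZeroCrossing {

    static double estimateFrequency(double[] samples, double sampleRate) {
        double first = -1.0;
        double last = -1.0;
        int crossings = 0;
        for (int i = 1; i < samples.length; i++) {
            // rising crossing: previous sample below zero, current one at or above
            if (samples[i - 1] < 0 && samples[i] >= 0) {
                // linear interpolation of the crossing position between i-1 and i
                double frac = -samples[i - 1] / (samples[i] - samples[i - 1]);
                double pos = (i - 1) + frac;
                if (crossings == 0) first = pos;
                last = pos;
                crossings++;
            }
        }
        if (crossings < 2) return 0.0; // not enough crossings (e.g. silence)
        // (crossings - 1) full periods lie between the first and last crossing
        return (crossings - 1) * sampleRate / (last - first);
    }

    public static void main(String[] args) {
        double rate = 44100.0;
        double[] tone = new double[4096];
        for (int i = 0; i < tone.length; i++) {
            tone[i] = Math.sin(2.0 * Math.PI * 440.0 * i / rate);
        }
        System.out.println(estimateFrequency(tone, rate) + " Hz");
    }
}
```

On a real recording, noise around the zero line produces spurious crossings, so some hysteresis or a band-pass filter in front of the counter would probably be needed.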

Related

Profit Maximization based on dynamic programming

I have been trying to solve this problem :
" You have to travel to different villages to make some profit.
In each village, you gain some profit. But the catch is, from a particular village i, you can only move to a village j if and only if i < j and the profit gain from village j is a multiple of the profit gain from village i.
You have to tell the maximum profit you can gain while traveling."
Here is the link to the full problem:
https://www.hackerearth.com/practice/algorithms/dynamic-programming/introduction-to-dynamic-programming-1/practice-problems/algorithm/avatar-and-his-quest-d939b13f/description/
I have been trying to solve this problem for quite a few hours. I know this is a variant of the longest increasing subsequence but the first thought that came to my mind was to solve it through recursion and then memoize it. Here is a part of the code to my approach. Please help me identify the mistake.
static int[] dp;
static int index;
static int solve(int[] p) {
int n = p.length;
int max = 0;
for(int i = 0;i<n; i++)
{
dp = new int[i+1];
Arrays.fill(dp,-1);
index = i;
max = Math.max(max,profit(p,i));
}
return max;
}
static int profit(int[] p, int n)
{
if(dp[n] == -1)
{
if(n == 0)
{
if(p[index] % p[n] == 0)
dp[n] = p[n];
else
dp[n] = 0;
}
else
{
int v1 = profit(p,n-1);
int v2 = 0;
if(p[index] % p[n] == 0)
v2 = p[n] + profit(p,n-1);
dp[n] = Math.max(v1,v2);
}
}
return dp[n];
}

I used an extra array to get the solution; my code is written in Java.
public static int getmaxprofit(int[] p, int n){
// p is the array that contains all the village profits
// n is the number of villages
// used one extra array msis, that would be just a copy of p initially
int i,j,max=0;
int msis[] = new int[n];
for(i=0;i<n;i++){
msis[i]=p[i];
}
// while iterating through p, look backward for every village j that may precede village i: p[j] must be smaller and p[i] must be a multiple of p[j]
for(i=1;i<n;i++){
for(j=0;j<i;j++){
if(p[i]>p[j] && p[i]%p[j]==0 && msis[i] < msis[j]+p[i]){
msis[i] = msis[j]+p[i];
}
}
}
for(i=0;i<n;i++){
if(max < msis[i]){
max = msis[i];
}
}
return max;
}
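The same recurrence can also be written top-down with memoization, which is closer to what the question attempted. This is a sketch under the same rule as the loop above (a move from village j to a later village i is allowed only when p[i] is a larger multiple of p[j]) and it assumes all profits are positive. The names MaxProfit, best and solve are made up for the example.

```java
import java.util.Arrays;

// Hypothetical top-down variant of the bottom-up loop above.
final class MaxProfit {
    private final int[] p;
    private final long[] memo; // memo[i] = best(i), or -1 if not computed yet

    MaxProfit(int[] p) {
        this.p = p;
        this.memo = new long[p.length];
        Arrays.fill(memo, -1);
    }

    // Maximum total profit of a valid trip that ends at village i.
    long best(int i) {
        if (memo[i] != -1) return memo[i];
        long result = p[i]; // a trip that starts and ends at i
        for (int j = 0; j < i; j++) {
            if (p[i] > p[j] && p[i] % p[j] == 0) {
                result = Math.max(result, best(j) + p[i]);
            }
        }
        memo[i] = result;
        return result;
    }

    // Best trip overall: try every village as the last stop.
    long solve() {
        long max = 0;
        for (int i = 0; i < p.length; i++) {
            max = Math.max(max, best(i));
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(new MaxProfit(new int[] {1, 2, 3, 4, 9, 8}).solve());
    }
}
```

Unlike the code in the question, the memo table is created once and indexed only by the last village of the trip, so results are shared between all starting points.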

Loss of data during the Inverse-FFT of an Image

I am using the following code to convert a Bitmap to Complex and vice versa.
Even though these were copied directly from the Accord.NET framework, while testing these static methods I discovered that their repeated use causes 'data loss'. As a result, the final output becomes distorted.
public partial class ImageDataConverter
{
#region private static Complex[,] FromBitmapData(BitmapData bmpData)
private static Complex[,] ToComplex(BitmapData bmpData)
{
Complex[,] comp = null;
if (bmpData.PixelFormat == PixelFormat.Format8bppIndexed)
{
int width = bmpData.Width;
int height = bmpData.Height;
int offset = bmpData.Stride - (width * 1);//1 === 1 byte per pixel.
if ((!Tools.IsPowerOf2(width)) || (!Tools.IsPowerOf2(height)))
{
throw new Exception("Imager width and height should be n of 2.");
}
comp = new Complex[width, height];
unsafe
{
byte* src = (byte*)bmpData.Scan0.ToPointer();
for (int y = 0; y < height; y++)
{
for (int x = 0; x < width; x++, src++)
{
comp[y, x] = new Complex((float)*src / 255,
comp[y, x].Imaginary);
}
src += offset;
}
}
}
else
{
throw new Exception("EightBppIndexedImageRequired");
}
return comp;
}
#endregion
public static Complex[,] ToComplex(Bitmap bmp)
{
Complex[,] comp = null;
if (bmp.PixelFormat == PixelFormat.Format8bppIndexed)
{
BitmapData bmpData = bmp.LockBits( new Rectangle(0, 0, bmp.Width, bmp.Height),
ImageLockMode.ReadOnly,
PixelFormat.Format8bppIndexed);
try
{
comp = ToComplex(bmpData);
}
finally
{
bmp.UnlockBits(bmpData);
}
}
else
{
throw new Exception("EightBppIndexedImageRequired");
}
return comp;
}
public static Bitmap ToBitmap(Complex[,] image, bool fourierTransformed)
{
int width = image.GetLength(0);
int height = image.GetLength(1);
Bitmap bmp = Imager.CreateGrayscaleImage(width, height);
BitmapData bmpData = bmp.LockBits(
new Rectangle(0, 0, width, height),
ImageLockMode.ReadWrite,
PixelFormat.Format8bppIndexed);
int offset = bmpData.Stride - width;
double scale = (fourierTransformed) ? Math.Sqrt(width * height) : 1;
unsafe
{
byte* address = (byte*)bmpData.Scan0.ToPointer();
for (int y = 0; y < height; y++)
{
for (int x = 0; x < width; x++, address++)
{
double min = System.Math.Min(255, image[y, x].Magnitude * scale * 255);
*address = (byte)System.Math.Max(0, min);
}
address += offset;
}
}
bmp.UnlockBits(bmpData);
return bmp;
}
}
(The DotNetFiddle link of the complete source code)
(ImageDataConverter)
Output:
As you can see, the FFT is working correctly, but the I-FFT isn't.
That is because the bitmap-to-complex conversion and vice versa isn't working as expected.
What could be done to correct the ToComplex() and ToBitmap() functions so that they don't lose data?

I do not code in C# so handle this answer with extreme prejudice!
Just from a quick look I spotted few problems:
ToComplex()
Is converting BMP into 2D complex matrix. When you are converting you are leaving imaginary part unchanged, but at the start of the same function you have:
Complex[,] complex2D = null;
complex2D = new Complex[width, height];
So the imaginary parts are either undefined or zero, depending on your complex class constructor. This means you are missing half of the data needed for reconstruction! You should restore the original complex matrix from 2 images, one for the real and the second for the imaginary part of the result.
ToBitmap()
You are saving the magnitude, which I think is sqrt( Re*Re + Im*Im ), so it is the magnitude spectrum, not the original complex values, and so you cannot reconstruct the image from it... You should store Re,Im in 2 separate images.
8bit per pixel
That is not much and can cause significant round off errors after FFT/IFFT so reconstruction can be really distorted.
[Edit1] Remedy
There are more options to repair this for example:
use floating complex matrix for computations and bitmap only for visualization.
This is the safest way because you avoid additional conversion round-offs. This approach has the best precision. But you need to rewrite your DIP/CV algorithms to support complex domain matrices instead of bitmaps, which requires a fair amount of work.
rewrite your conversions to support real and imaginary part images
Your conversion is really bad as it does not store/restore the Real and Imaginary parts as it should, and it also does not account for negative values (at least I do not see it; instead they are cut down to zero, which is WRONG). I would rewrite the conversion to this:
// conversion scales: byte = (value + ofset) / scale, value = byte * scale - ofset
static float Re_ofset=256.0f,Re_scale=512.0f/255.0f;
static float Im_ofset=256.0f,Im_scale=512.0f/255.0f;
private static Complex[,] ToComplex(BitmapData bmpRe,BitmapData bmpIm)
{
//...
byte* srcRe = (byte*)bmpRe.Scan0.ToPointer();
byte* srcIm = (byte*)bmpIm.Scan0.ToPointer();
// for each line
for (int y = 0; y < height; y++)
{
// for each pixel
for (int x = 0; x < width; x++, srcRe++, srcIm++)
{
// decode both bytes back to signed values before storing them
float re = (*srcRe)*Re_scale-Re_ofset;
float im = (*srcIm)*Im_scale-Im_ofset;
complex2D[y, x] = new Complex(re, im);
}
// skip the stride padding in both images
srcRe += offset;
srcIm += offset;
}
//...
}
public static Bitmap ToBitmapRe(Complex[,] complex2D)
{
//...
float Re = ((float)complex2D[y, x].Real+Re_ofset)/Re_scale;
Re = Math.Min(Re,255.0f);
Re = Math.Max(Re, 0.0f);
*address = (byte)Re;
//...
}
public static Bitmap ToBitmapIm(Complex[,] complex2D)
{
//...
float Im = ((float)complex2D[y, x].Imaginary+Im_ofset)/Im_scale;
Im = Math.Min(Im,255.0f);
Im = Math.Max(Im, 0.0f);
*address = (byte)Im;
//...
}
Where:
Re_ofset = -min(complex2D[,].Real);
Im_ofset = -min(complex2D[,].Imaginary);
Re_scale = (max(complex2D[,].Real )-min(complex2D[,].Real ))/255.0;
Im_scale = (max(complex2D[,].Imaginary)-min(complex2D[,].Imaginary))/255.0;
or cover a bigger interval than the complex matrix values.
You can also encode both the Real and Imaginary parts into a single image; for example, the first half of the image could be the Real and the next the Imaginary part. In that case you do not need to change the function headers or names at all, but you would need to handle the images as 2 joined squares, each with a different meaning.
You can also use RGB images where R = Real, B = Imaginary or any other encoding that suits you.
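To make the offset/scale idea concrete, here is a small round-trip sketch, written in Java rather than C#. It assumes the values lie in a known interval [min, max] with max > min; the class and method names are hypothetical.

```java
// Hypothetical sketch: stores a signed floating-point value in one byte
// using an offset and a scale, and restores it again.
final class ByteQuantizer {
    private final double min;
    private final double scale; // size of one quantization step

    ByteQuantizer(double min, double max) {
        this.min = min;
        this.scale = (max - min) / 255.0;
    }

    // Maps [min, max] onto 0..255; values outside the interval are clamped.
    int encode(double value) {
        long q = Math.round((value - min) / scale);
        return (int) Math.max(0, Math.min(255, q));
    }

    // Inverse mapping; the result differs from the input by at most scale / 2
    // for values inside [min, max].
    double decode(int stored) {
        return stored * scale + min;
    }

    double step() {
        return scale;
    }

    public static void main(String[] args) {
        ByteQuantizer q = new ByteQuantizer(-256.0, 256.0);
        double restored = q.decode(q.encode(-100.0));
        System.out.println("restored " + restored + ", step " + q.step());
    }
}
```

Negative values survive the round trip because the offset shifts them into the byte range, but every value is still rounded to one of only 256 levels. That rounding is the loss discussed under "8bit per pixel" above.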
[Edit2] some examples to make my points more clear
example of approach #1
The image is in form of floating point 2D complex matrix and the images are created only for visualization. There is little rounding error this way. The values are not normalized so the range is <0.0,255.0> per pixel/cell at first but after transforms and scaling it could change greatly.
As you can see, I added scaling, so all pixels are multiplied by 315 to actually see anything, because the FFT output values are small except for a few cells. But that is only for visualization; the complex matrix is unchanged.
example of approach #2
Well, as I mentioned before, you do not handle negative values, you normalize values to the range <0,1> and back by scaling and rounding off, and you use only 8 bits per pixel to store the sub-results. I tried to simulate that with my code and here is what I got (using the complex domain instead of the wrongly used magnitude spectrum like you did). Here is the C++ source, only as a template example, as you do not have the functions and classes behind it:
transform t;
cplx_2D c;
rgb2i(bmp0);
c.ld(bmp0,bmp0);
null_im(c);
c.mul(1.0/255.0);
c.mul(255.0); c.st(bmp0,bmp1); c.ld(bmp0,bmp1); i2iii(bmp0); i2iii(bmp1); c.mul(1.0/255.0);
bmp0->SaveToFile("_out0_Re.bmp");
bmp1->SaveToFile("_out0_Im.bmp");
t. DFFT(c,c);
c.wrap();
c.mul(255.0); c.st(bmp0,bmp1); c.ld(bmp0,bmp1); i2iii(bmp0); i2iii(bmp1); c.mul(1.0/255.0);
bmp0->SaveToFile("_out1_Re.bmp");
bmp1->SaveToFile("_out1_Im.bmp");
c.wrap();
t.iDFFT(c,c);
c.mul(255.0); c.st(bmp0,bmp1); c.ld(bmp0,bmp1); i2iii(bmp0); i2iii(bmp1); c.mul(1.0/255.0);
bmp0->SaveToFile("_out2_Re.bmp");
bmp1->SaveToFile("_out2_Im.bmp");
And here the sub results:
As you can see, after the DFFT and wrap the image is really dark and most of the values are rounded off. So the result after unwrap and IDFFT is really poor.
Here some explanations to code:
c.st(bmpre,bmpim) is the same as your ToBitmap
c.ld(bmpre,bmpim) is the same as your ToComplex
c.mul(scale) multiplies complex matrix c by scale
rgb2i converts RGB to grayscale intensity <0,255>
i2iii converts grayscale intensity to grayscale RGB image

I'm not really good at these puzzles, but double-check this division:
comp[y, x] = new Complex((float)*src / 255, comp[y, x].Imaginary);
You can lose precision, as described here:
Complex class definition in Remarks section.
Maybe this happens in your case.
Hope this helps.

Non-blocking read of /dev/input/mice in java

I am trying to poll a mouse and update its position in Java using "/dev/input/mice". However right now I cannot figure out a way to determine if there is new data. How do I read an input file in a non-blocking way?
public class Mouse {
String MOUSE_STREAM = "/dev/input/mice";
BufferedInputStream bis;
private Mouse(){
bis = new BufferedInputStream(new FileInputStream(new File (MOUSE_STREAM)));
}
public Vect getDelta() {
int sumX = 0;
int sumY = 0;
// This is what I want, but available always returns 0;
// while (bis.available() > 3) {
// Otherwise, If I don't check then .read() will block until the next event
int buttons = bis.read();
int x = bis.read();
int y = bis.read();
sumX += x;
sumY += y;
}
return new Vect(sumX, sumY);
}
}
PS I need to be using a very low level interface because I'm operating in a Real-Time version of linux without much support for higher level libraries.

Finding the local maxima/peaks and minima/valleys of histograms

Ok, so I have a histogram (represented by an array of ints), and I'm looking for the best way to find local maxima and minima. Each histogram should have 3 peaks, one of them (the first one) probably much higher than the others.
I want to do several things:
1) Find the first "valley" following the first peak (in order to get rid of the first peak altogether in the picture)
2) Find the optimum "valley" value in between the remaining two peaks to separate the picture
I already know how to do step 2 by implementing a variant of Otsu.
But I'm struggling with step 1
3) In case the valley in between the two remaining peaks is not low enough, I'd like to give a warning.
Also, the image is quite clean with little noise to account for
What would be the brute-force algorithms to do steps 1 and 3? I could find a way to implement Otsu, but the brute-force is escaping me, math-wise. As it turns out, there is more documentation on doing methods like otsu, and less on simply finding peaks and valleys. I am not looking for anything more than whatever gets the job done (i.e. it's a temporary solution, just has to be implementable in a reasonable timeframe, until I can spend more time on it)
I am doing all this in c#
Any help on which steps to take would be appreciated!
Thank you so much!
EDIT: some more data:
Most histograms are likely to look like the first one, with the first peak representing the background.

Use a peakiness test. It is a method that finds every candidate peak between two local minima and scores its peakiness with a formula. If the peakiness is higher than a threshold, the peak is accepted.
Source: UCF CV CAP5415 lecture 9 slides
Below is my code:
public static List<int> PeakinessTest(int[] histogram, double peakinessThres)
{
int j=0;
List<int> valleys = new List<int> ();
//The start of the valley
int vA = histogram[j];
int P = vA;
//The end of the valley
int vB = 0;
//The width of the valley, default width is 1
int W = 1;
//The sum of the pixels between vA and vB
int N = 0;
//The measure of the peaks peakiness
double peakiness=0.0;
int peak=0;
bool l = false;
try
{
while (j < 254)
{
l = false;
vA = histogram[j];
P = vA;
W = 1;
N = vA;
int i = j + 1;
//To find the peak
while (P < histogram[i])
{
P = histogram[i];
W++;
N += histogram[i];
i++;
}
//To find the border of the valley other side
peak = i - 1;
vB = histogram[i];
N += histogram[i];
i++;
W++;
l = true;
while (vB >= histogram[i])
{
vB = histogram[i];
W++;
N += histogram[i];
i++;
}
//Calculate peakiness
peakiness = (1 - (double)((vA + vB) / (2.0 * P))) * (1 - ((double)N / (double)(W * P)));
if (peakiness > peakinessThres & !valleys.Contains(j))
{
//peaks.Add(peak);
valleys.Add(j);
valleys.Add(i - 1);
}
j = i - 1;
}
}
catch (Exception)
{
if (l)
{
vB = histogram[255];
peakiness = (1 - (double)((vA + vB) / (2.0 * P))) * (1 - ((double)N / (double)(W * P)));
if (peakiness > peakinessThres)
valleys.Add(255);
//peaks.Add(255);
return valleys;
}
}
//if(!valleys.Contains(255))
// valleys.Add(255);
return valleys;
}
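Pulled out of the loop, the measure computed above is (1 - (vA + vB) / (2 * P)) * (1 - N / (W * P)), where vA and vB are the valley heights on either side, P is the peak height, W is the number of bins from valley to valley and N is the sum of those bins. A stand-alone sketch of just that formula, in Java and with hypothetical names:

```java
// Hypothetical helper: the peakiness score for one valley-to-valley span.
final class Peakiness {

    // vA, vB: valley heights; p: peak height; w: span width in bins;
    // n: sum of all bins in the span (both valleys included).
    static double score(int vA, int vB, int p, int w, int n) {
        if (p == 0 || w == 0) return 0.0; // empty span, nothing to score
        double valleyTerm = 1.0 - (vA + vB) / (2.0 * p);  // deep valleys -> near 1
        double areaTerm = 1.0 - (double) n / ((double) w * p); // narrow peak -> near 1
        return valleyTerm * areaTerm;
    }

    // Convenience overload: scores histogram[from..to], taking the largest
    // bin in that range as the peak.
    static double score(int[] histogram, int from, int to) {
        int p = 0;
        int n = 0;
        for (int i = from; i <= to; i++) {
            p = Math.max(p, histogram[i]);
            n += histogram[i];
        }
        return score(histogram[from], histogram[to], p, to - from + 1, n);
    }

    public static void main(String[] args) {
        System.out.println(score(new int[] {0, 2, 10, 2, 0}, 0, 4));
    }
}
```

A sharp, isolated peak scores close to 1 and a flat region scores 0, so the threshold passed as peakinessThres decides how pronounced a peak must be before its valleys are reported.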

Check for Text within an Image C#. Is using memcmp an option?

I'm working on a research project which requires me to identify text within an image. On the forum I saw a post about using memcmp, but I'm having no luck with this.
To give more details on my task :
I screen capture this. My image reads "GPS: Initial Location 34 45 23".
I then look into a predefined map of images that I load at the start of my application. The map contains images for text: Initial, Reset, Launch, ....
How do I check whether the image I captured matches one of the predefined images in the map?
Kindly help.
Attaching a snapshot of code
public static bool CompareMemCmp(Bitmap b1, Bitmap b2)
{
if ((b1 == null) != (b2 == null)) return false;
var bd1 = b1.LockBits(new Rectangle(new Point(0, 0), b1.Size), ImageLockMode.ReadOnly, b1.PixelFormat);
var bd2 = b2.LockBits(new Rectangle(new Point(0, 0), b2.Size), ImageLockMode.ReadOnly, b2.PixelFormat);
try
{
IntPtr bd1scan0 = bd1.Scan0;
IntPtr bd2scan0 = bd2.Scan0;
int stride = bd1.Stride;
int len = stride * b1.Height;
int stride2 = bd2.Stride;
int len2 = stride2 * b2.Height;
for (int i = 0; i < len; ++i)
{
bd1scan0 = bd1.Scan0 + i;
int test = memcmp(bd1scan0, bd2scan0, len2);
if (test == 0)
{
Console.WriteLine("Found the string");
return true;
}
}
return false;
}
finally
{
b1.UnlockBits(bd1);
b2.UnlockBits(bd2);
}
}

If you are looking for an exact match, i.e. a match where every bit is the same, you could use this approach. However, if this is not the case, other algorithms might be better. One example would be to use cross correlation. I used it to compare audio files and it works great. See this question
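As a sketch of the cross-correlation idea (in Java, with made-up names), the zero-mean normalized cross-correlation of two equally sized grayscale pixel arrays gives a score between -1 and 1, where 1 means the two images match up to brightness and contrast. How the pixels are extracted from the bitmaps is left out here.

```java
// Hypothetical sketch: compares two grayscale images, given as flat arrays
// of pixel intensities, by zero-mean normalized cross-correlation.
final class Ncc {

    static double score(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < a.length; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= a.length;
        meanB /= b.length;

        double num = 0.0;
        double varA = 0.0;
        double varB = 0.0;
        for (int i = 0; i < a.length; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            num += da * db;
            varA += da * da;
            varB += db * db;
        }
        double denom = Math.sqrt(varA * varB);
        // a completely uniform image has no structure to correlate with
        return denom == 0.0 ? 0.0 : num / denom;
    }

    public static void main(String[] args) {
        double[] template = {1, 2, 3, 4};
        double[] brighter = {12, 14, 16, 18}; // same pattern, other contrast
        System.out.println(score(template, brighter));
    }
}
```

To find a word inside a larger screenshot, the template would be slid over every candidate position and the best score kept. The acceptance threshold is something to tune on real captures.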
