Multiple audio tones to the sound card using PortAudio

I am trying to generate a tone to the sound card (frequency: 1950 Hz, duration: 40 ms, level: -30 dB, right channel (stereo), on stream 1). Eventually, I would like to play two of these tones at once (one going to channel 1 and one to channel 2).
Any help or direction is greatly appreciated.
Thanks,
DW
Hi Bjorn, I tried this but I am not getting the frequency I expect (and the sound is not clean). Any idea what's wrong?
I greatly appreciate any help.
#define SAMPLE_RATE (44100)
#define TABLE_SIZE (200)
float FREQUENCY = 422;
...
for (int i = 0; i < TABLE_SIZE; i++)
{
    data.sine[i] = (float) sin( (double)i * ((2.0 * M_PI)/(double)SAMPLE_RATE) * FREQUENCY );
}
data.left_phase = 0;
data.right_phase = 0;
...
... in callback function ...
for (unsigned long i = 0; i < framesPerBuffer; i++)
{
    // fill output buffer with sin wave
    *out++ = data->amp * data->sine[data->left_phase];  // left
    *out++ = data->amp * data->sine[data->right_phase]; // right
    data->left_phase += 1;
    if (data->left_phase >= TABLE_SIZE)
        data->left_phase -= TABLE_SIZE;
    data->right_phase += 1;
    if (data->right_phase >= TABLE_SIZE)
        data->right_phase -= TABLE_SIZE;
}

PortAudio has sample code for generating tones; you just need to figure out the frequency. See for example this answer:
[portaudio]Transmit and Detect frequency - Windows
Update:
Rather than trying to store a table of sine data, simply calculate the sine value in the callback using this formula:
amplitude[n] = sin( n * desiredFreq * 2 * pi / samplerate )
so (untested) your code will look something like this:
typedef struct
{
    long n;
} MyData;

float FREQUENCY = 422;

static int MyCallback(
    const void *inputBuffer,
    void *outputBuffer,
    unsigned long framesPerBuffer,
    const PaStreamCallbackTimeInfo* timeInfo,
    PaStreamCallbackFlags statusFlags,
    void *userData
    )
{
    MyData *data = (MyData*)userData;
    float *out = (float*)outputBuffer;
    (void) timeInfo; /* Prevent unused variable warnings. */
    (void) statusFlags;
    (void) inputBuffer;
    for (unsigned long i = 0; i < framesPerBuffer; i++)
    {
        // fill output buffer with a sine wave; SAMPLE_RATE and M_PI as defined above
        float v = sin( data->n * FREQUENCY * 2 * M_PI / (float) SAMPLE_RATE );
        *out++ = v; // left
        *out++ = v; // right
        data->n += 1; // advance the sample counter, otherwise the output is constant
    }
    return paContinue;
}
This code is not without problems: e.g. eventually n will "wrap around", and I'm not sure sin stays accurate and efficient as its argument grows. Nevertheless, it's a good starting point, and if you just need to generate a few seconds of a tone on modern hardware, this is really all you need. If you need something fancier, get this working first; then you can worry about making it more efficient and robust with a lookup table (LUT).

Related

Add anti-aliasing/bandlimit for looped wav sample (NOT Fourier transform)

How do I build anti-aliased interpolation in C++? I have a simple 4096- or 1024-sample buffer. Of course, when I play it at high frequencies I get aliasing issues; to avoid them, the signal must be band-limited at high frequencies. Roughly speaking, a sawtooth wave at high frequencies should look like a regular sine. That's what I want, so that my sound doesn't sound dirty, like turning the knobs of an old FM/AM radio in your car.
I know how to build band-limited square, triangle, and sawtooth waves with a Fourier transform, so my question is only about the wavetable.
I found the solution in the AudioKit sources. One buffer is split into 10 buffers/octaves, so when you play a sound you don't play your original wave, but a sample that was prepared for the specific octave.
Import WaveStack.hpp into your project:
namespace AudioKitCore
{
    // WaveStack represents a series of progressively lower-resolution sampled versions of a
    // waveform. Client code supplies the initial waveform, at a resolution of 1024 samples,
    // equivalent to 43.6 Hz at 44.1K samples/sec (about 23.44 cents below F1, midi note 29),
    // and then calls initStack() to create the filtered higher-octave versions.
    // This provides a basis for anti-aliased oscillators; see class WaveStackOscillator.
    struct WaveStack
    {
        // Highest-resolution rep uses 2^maxBits samples
        static constexpr int maxBits = 10; // 1024
        // maxBits also defines the number of octave levels; highest level has just 2 samples
        float *pData[maxBits];

        WaveStack();
        ~WaveStack();

        // Fill pWaveData with 1024 samples, then call this
        void initStack(float *pWaveData, int maxHarmonic=512);

        void init();
        void deinit();

        float interp(int octave, float phase);
    };
}
WaveStack.cpp
#include "WaveStack.hpp"
#include "kiss_fftr.h"
namespace AudioKitCore
{
WaveStack::WaveStack()
{
int length = 1 << maxBits; // length of level-0 data
pData[0] = new float[2 * length]; // 2x is enough for all levels
for (int i=1; i<maxBits; i++)
{
pData[i] = pData[i - 1] + length;
length >>= 1;
}
}
WaveStack::~WaveStack()
{
delete[] pData[0];
}
void WaveStack::initStack(float *pWaveData, int maxHarmonic)
{
// setup
int fftLength = 1 << maxBits;
float *buf = new float[fftLength];
kiss_fftr_cfg fwd = kiss_fftr_alloc(fftLength, 0, 0, 0);
kiss_fftr_cfg inv = kiss_fftr_alloc(fftLength, 1, 0, 0);
// copy supplied wave data for octave 0
for (int i=0; i < fftLength; i++) pData[0][i] = pWaveData[i];
// perform initial forward FFT to get spectrum
kiss_fft_cpx spectrum[fftLength / 2 + 1];
kiss_fftr(fwd, pData[0], spectrum);
float scaleFactor = 1.0f / (fftLength / 2);
for (int octave = (maxHarmonic==512) ? 1 : 0; octave < maxBits; octave++)
{
// zero all harmonic coefficients above new Nyquist limit
int maxHarm = 1 << (maxBits - octave - 1);
if (maxHarm > maxHarmonic) maxHarm = maxHarmonic;
for (int h=maxHarm; h <= fftLength/2; h++)
{
spectrum[h].r = 0.0f;
spectrum[h].i = 0.0f;
}
// perform inverse FFT to get filtered waveform
kiss_fftri(inv, spectrum, buf);
// resample filtered waveform
int skip = 1 << octave;
float *pOut = pData[octave];
for (int i=0; i < fftLength; i += skip) *pOut++ = scaleFactor * buf[i];
}
// teardown
kiss_fftr_free(inv);
kiss_fftr_free(fwd);
delete[] buf;
}
void WaveStack::init()
{
}
void WaveStack::deinit()
{
}
float WaveStack::interp(int octave, float phase)
{
while (phase < 0) phase += 1.0;
while (phase >= 1.0) phase -= 1.0f;
int nTableSize = 1 << (maxBits - octave);
float readIndex = phase * nTableSize;
int ri = int(readIndex);
float f = readIndex - ri;
int rj = ri + 1; if (rj >= nTableSize) rj -= nTableSize;
float *pWaveTable = pData[octave];
float si = pWaveTable[ri];
float sj = pWaveTable[rj];
return (float)((1.0 - f) * si + f * sj);
}
}
Then use it in this way:
//wave and outputWave should be float[1024];
void getSample(int octave, float* wave, float* outputWave) {
    uint_fast32_t impulseCount = 1024;
    if (octave == 0) {
        impulseCount = 737;
    } else if (octave == 1) {
        impulseCount = 369;
    } else if (octave == 2) {
        impulseCount = 185;
    } else if (octave == 3) {
        impulseCount = 93;
    } else if (octave == 4) {
        impulseCount = 47;
    } else if (octave == 5) {
        impulseCount = 24;
    } else if (octave == 6) {
        impulseCount = 12;
    } else if (octave == 7) {
        impulseCount = 6;
    } else if (octave == 8) {
        impulseCount = 3;
    } else if (octave == 9) {
        impulseCount = 2;
    }
    // Get sample for octave
    stack->initStack(wave, impulseCount);
    for (int i = 0; i < 1024; i++) {
        float phase = (1.0/float(1024)) * i;
        // get interpolated wave and apply volume compensation
        outputWave[i] = stack->interp(0, phase) / 2.0;
    }
}
Then, when the 10 buffers are ready, you can use them when playing a sound. With this code you can get the index of the buffer/octave for your frequency:
uint_fast8_t getBufferIndex(const float& frequency) {
    if (frequency >= 0 && frequency < 40) {
        return 0;
    } else if (frequency >= 40 && frequency < 80) {
        return 1;
    } else if (frequency >= 80 && frequency < 160) {
        return 2;
    } else if (frequency >= 160 && frequency < 320) {
        return 3;
    } else if (frequency >= 320 && frequency < 640) {
        return 4;
    } else if (frequency >= 640 && frequency < 1280) {
        return 5;
    } else if (frequency >= 1280 && frequency < 2560) {
        return 6;
    } else if (frequency >= 2560 && frequency < 5120) {
        return 7;
    } else if (frequency >= 5120 && frequency < 10240) {
        return 8;
    } else if (frequency >= 10240) {
        return 9;
    }
    return 0;
}
So if I know that my note's frequency is 440 Hz, then I get the wave for this note in this way:
float notInterpolatedSound[1024];
float interpolatedSound[1024];
uint_fast8_t octaveIndex = getBufferIndex(440.0);
getSample(octaveIndex, notInterpolatedSound, interpolatedSound);
//tada!
P.S. The code above is a low-pass filter. I also tried sinc interpolation, but for me sinc was very expensive and not exact, although maybe I did it wrong.

An Algorithm for producing fake audio visualizer

Does anybody know an algorithm for making a random series of numbers (like 100 Java bytes, >= -127 and <= 127) which, when drawn as a bar chart, looks similar to a regular audio spectrum, like those SoundCloud ones?
I'm trying to write one. It has multiple random and sine calculations, but the result is very ugly, something between a sine wave and an old toothbrush. I would be very thankful if you could direct me to one which is aesthetically convincing.
An algorithm with an explanation (and/or picture) is fine. Pseudocode would be very nice of you. Actual Java code is a bonus. :D
Edit:
This is the code I'm using right now. It's convoluted, but basically I'm adding a random deviation to a sine wave with a random amplitude (which I'm not sure was a good idea).
private static final int FREQ = 7;
private static final double DEG_TO_RAD = Math.PI / 180;
private static final int MAX_AMPLITUDE = 127;
private static final float DEVIATION = 0.1f; // 10 percent is maximum deviation
private void makeSinusoidRandomBytes() {
    byte[] bytes = new byte[AUDIO_VISUALIZER_DENSITY];
    for (int i = 0; i < AUDIO_VISUALIZER_DENSITY; i++) {
        int amplitude = random.nextInt(MAX_AMPLITUDE) - MAX_AMPLITUDE/2;
        byte dev = (byte) (random.nextInt((int) Math.max(Math.abs(2 * DEVIATION * amplitude), 1))
                - Math.abs(DEVIATION * amplitude));
        bytes[i] = (byte) (Math.sin(i * FREQ * DEG_TO_RAD) * amplitude - dev);
    }
    this.bytes = bytes;
}
A real sound wave is actually a combination of sine waves of different frequencies and amplitudes added together, not random deviations from a sine wave. The difficult part is choosing a combination of wave amplitudes and frequencies that produces output you will subjectively like! However, most sound waves have a base frequency and then a number of overtones which "fit into" that wavelength; for example, there might be an overtone at 3/2 the base frequency with 2/3 the base amplitude. By combining these overtones and scaling the resulting waveform to the -127 to +127 range, you'll get an actual sound wave.
The following code is C#, but close enough to Java to give you an idea. It's from a game, where I needed to combine many sine waves together to create various types of oscillating effects:
/// <summary>
/// Return a value between 0 and 1 based on a sine-wave oscillating with a given combination of periods at a given point in time
/// </summary>
/// <param name="time">time to get wave value at</param>
/// <param name="periods">lengths of waves</param>
/// <returns>height of wave</returns>
public static float MultiPulse(float time, params float[] periods)
{
    float c = 0;
    foreach (float p in periods)
    {
        float cp = (MathHelper.Pi / p) * time;
        float s = ((float)Math.Sin(cp) + 1) / 2;
        c += s / periods.Length;
    }
    return c;
}
You probably want to modify that to allow you to specify different amplitudes as well as periods for the waves you are combining.
By combining many widely varying amplitudes and periods (frequencies) you should by trial and error be able to get something convincing.
Based on the idea see sharper gave me, this is the code I'm using right now:
int mainAmp = random.nextInt(MAX_AMPLITUDE) - MAX_AMPLITUDE / 2;
int overtoneAmp = random.nextInt(MAX_AMPLITUDE * 2 / 3) - MAX_AMPLITUDE / 3;
int overtone2Amp = random.nextInt(MAX_AMPLITUDE * 4 / 7) - MAX_AMPLITUDE * 2 / 7; // centered on zero, like the others
int mainFreq = random.nextInt(7) + 7;
int overtoneFreq = mainFreq * 3 / 2;
int overtone2Freq = mainFreq * 7 / 4;
byte[] bytes = new byte[AUDIO_VISUALIZER_DENSITY];
for (int i = 0; i < AUDIO_VISUALIZER_DENSITY; i++) {
    bytes[i] = (byte) (Math.sin(i * mainFreq * DEG_TO_RAD) * mainAmp
            + Math.sin(i * overtoneFreq * DEG_TO_RAD) * overtoneAmp
            + Math.sin(i * overtone2Freq * DEG_TO_RAD) * overtone2Amp);
}
The main frequency is between 8 and 15 for my app; you can play with those. The other two overtones I'm using are (2 - 1/2)x and (2 - 1/4)x the main frequency. You can add more, like (2 - 1/8)x etc., or use another series of frequencies. I also randomize the amplitudes to get a unique wave each time.
These are some waves I'm drawing using this code:

Errors with repeated FFTW calls

I'm having a strange issue that I can't resolve; this simple example demonstrates the problem. I have a sine wave defined on [0, 2*pi]. I take the Fourier transform using FFTW. Then, in a for loop, I repeatedly take the inverse Fourier transform. In each iteration, I take the average of my solution and print the result. I expect the average to stay the same with each iteration because nothing changes the solution y. However, when I pick N = 256 and other even values of N, the average grows as if there are numerical errors, whereas if I choose, say, N = 255 or N = 257, I get what I expect (avg = 0.0 for each iteration).
Code:
#include <stdio.h>
#include <stdlib.h>
#include <fftw3.h>
#include <math.h>

int main(void)
{
    int N = 256;
    double dx = 2.0 * M_PI / (double)N, dt = 1.0e-3;
    double *x, *y;
    x = (double *) malloc (sizeof (double) * N);
    y = (double *) malloc (sizeof (double) * N);
    // initial conditions
    for (int i = 0; i < N; i++) {
        x[i] = (double)i * dx;
        y[i] = sin(x[i]);
    }
    fftw_complex yhat[N/2 + 1];
    fftw_plan fftwplan, fftwplan2;
    // forward plan
    fftwplan = fftw_plan_dft_r2c_1d(N, y, yhat, FFTW_ESTIMATE);
    fftw_execute(fftwplan);
    // set N/2th mode to zero if N is even
    if (N % 2 == 0) {
        yhat[N/2][0] = 0.0;
        yhat[N/2][1] = 0.0;
    }
    // backward plan
    fftwplan2 = fftw_plan_dft_c2r_1d(N, yhat, y, FFTW_ESTIMATE);
    for (int i = 0; i < 50; i++) {
        // yhat to y
        fftw_execute(fftwplan2);
        // rescale
        for (int j = 0; j < N; j++) {
            y[j] = y[j] / (double)N;
        }
        double avg = 0.0;
        for (int j = 0; j < N; j++) {
            avg += y[j];
        }
        printf("%.15f\n", avg/N);
    }
    fftw_destroy_plan(fftwplan);
    fftw_destroy_plan(fftwplan2);
    fftw_cleanup();
    free(x);
    free(y);
    return 0;
}
Output for N = 256:
0.000000000000000
0.000000000000000
0.000000000000000
-0.000000000000000
0.000000000000000
0.000000000000022
-0.000000000000007
-0.000000000000039
0.000000000000161
-0.000000000000314
0.000000000000369
0.000000000004775
-0.000000000007390
-0.000000000079126
-0.000000000009457
-0.000000000462023
0.000000000900855
-0.000000000196451
0.000000000931323
-0.000000009895302
0.000000039348379
0.000000133179128
0.000000260770321
-0.000003233551979
0.000008285045624
-0.000016331672668
0.000067450106144
-0.000166893005371
0.001059055328369
-0.002521514892578
0.005493164062500
-0.029907226562500
0.093383789062500
-0.339111328125000
1.208251953125000
-3.937500000000000
13.654296875000000
-43.812500000000000
161.109375000000000
-479.250000000000000
1785.500000000000000
-5369.000000000000000
19376.000000000000000
-66372.000000000000000
221104.000000000000000
-753792.000000000000000
2387712.000000000000000
-8603776.000000000000000
29706240.000000000000000
-96833536.000000000000000
Any ideas?
libfftw has the odious habit of modifying its inputs. Back up yhat if you want to do repeated inverse transforms.
OTOH, it's perverse, but why are you repeating the same operation if you don't expect it to give different results? (Despite this being the case.)
As indicated in comments: "if you want to keep the input data unchanged, use the FFTW_PRESERVE_INPUT flag. Per http://www.fftw.org/doc/Planner-Flags.html"
For example:
// backward plan
fftwplan2 = fftw_plan_dft_c2r_1d(N, yhat, y, FFTW_ESTIMATE | FFTW_PRESERVE_INPUT);

Raw Sound playing

I've been working for some time with image formats, and I know that an image is an array of pixels (24, maybe 32 bits each). The question is: how is a sound file represented? To be honest, I'm not even sure what I should be googling for. I would also be interested in how you use the data, I mean actually playing the sounds in the file. For an image file you have all sorts of abstract devices to draw an image on (Graphics in Java and C#, HDC in C++ (Win32), etc.). I hope I have been clear enough.
Here's a dandy overview of how .wav is stored. I found it by typing "wave file format" into google.
http://www.sonicspot.com/guide/wavefiles.html
WAV files can also store compressed audio, but I believe most of the time they are not compressed; the WAV format is designed as a container with a number of options for how the audio is stored.
Here's a snippet of C# code that I found in another question here at Stack Overflow and that I like: it builds a WAV-formatted audio MemoryStream and then plays that stream (without saving it to a file, as many other answers rely on). Saving it to disk can easily be added with one line of code if you want, but most of the time that'd be undesirable.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Windows.Forms;

public static void PlayBeep(UInt16 frequency, int msDuration, UInt16 volume = 16383)
{
    var mStrm = new MemoryStream();
    BinaryWriter writer = new BinaryWriter(mStrm);
    const double TAU = 2 * Math.PI;
    int formatChunkSize = 16;
    int headerSize = 8;
    short formatType = 1;
    short tracks = 1;
    int samplesPerSecond = 44100;
    short bitsPerSample = 16;
    short frameSize = (short)(tracks * ((bitsPerSample + 7) / 8));
    int bytesPerSecond = samplesPerSecond * frameSize;
    int waveSize = 4;
    int samples = (int)((decimal)samplesPerSecond * msDuration / 1000);
    int dataChunkSize = samples * frameSize;
    int fileSize = waveSize + headerSize + formatChunkSize + headerSize + dataChunkSize;
    // var encoding = new System.Text.UTF8Encoding();
    writer.Write(0x46464952); // = encoding.GetBytes("RIFF")
    writer.Write(fileSize);
    writer.Write(0x45564157); // = encoding.GetBytes("WAVE")
    writer.Write(0x20746D66); // = encoding.GetBytes("fmt ")
    writer.Write(formatChunkSize);
    writer.Write(formatType);
    writer.Write(tracks);
    writer.Write(samplesPerSecond);
    writer.Write(bytesPerSecond);
    writer.Write(frameSize);
    writer.Write(bitsPerSample);
    writer.Write(0x61746164); // = encoding.GetBytes("data")
    writer.Write(dataChunkSize);
    {
        double theta = frequency * TAU / (double)samplesPerSecond;
        // 'volume' is UInt16 with range 0 thru Uint16.MaxValue ( = 65 535)
        // we need 'amp' to have the range of 0 thru Int16.MaxValue ( = 32 767)
        // so we simply set amp = volume / 2
        double amp = volume >> 1; // Shifting right by 1 divides by 2
        for (int step = 0; step < samples; step++)
        {
            short s = (short)(amp * Math.Sin(theta * (double)step));
            writer.Write(s);
        }
    }
    mStrm.Seek(0, SeekOrigin.Begin);
    new System.Media.SoundPlayer(mStrm).Play();
    writer.Close();
    mStrm.Close();
} // public static void PlayBeep(UInt16 frequency, int msDuration, UInt16 volume = 16383)
But this code shows a bit of insight into the WAV format, and it even lets you build your own WAV data in C# source code.

Analyze "whistle" sound for pitch/note

I am trying to build a system that will be able to process a recording of someone whistling and output notes.
Can anyone recommend an open-source platform which I can use as the base for the note/pitch recognition and analysis of wave files ?
Thanks in advance
As many others have already said, FFT is the way to go here. I've written a little example in Java using FFT code from http://www.cs.princeton.edu/introcs/97data/. In order to run it, you will need the Complex class from that page also (see the source for the exact URL).
The code reads in a file, goes over it window-wise, and does an FFT on each window. For each FFT it looks for the maximum coefficient and outputs the corresponding frequency. This works very well for clean signals like a sine wave, but for an actual whistle sound you probably have to add more. I've tested with a few whistling files I recorded myself (using the integrated mic of my laptop); the code does get the idea of what's going on, but more needs to be done to get actual notes.
1) You might need a more intelligent window technique. What my code uses now is a simple rectangular window. Since the FFT assumes that the input signal can be periodically continued, additional frequencies are detected when the first and the last sample in the window don't match. This is known as spectral leakage ( http://en.wikipedia.org/wiki/Spectral_leakage ); usually one uses a window that down-weights samples at the beginning and the end of the window ( http://en.wikipedia.org/wiki/Window_function ). Although the leakage shouldn't cause the wrong frequency to be detected as the maximum, using a window will increase the detection quality; see the sketch after this list.
2) To match the frequencies to actual notes, you could use an array containing the frequencies (like 440 Hz for a') and then look for the frequency that's closest to the one that has been identified. However, if the whistling is off standard tuning, this won't work any more. Given that the whistling is still correct but only tuned differently (like a guitar or other musical instrument can be tuned differently and still sound "good", as long as the tuning is done consistently for all strings), you could still find notes by looking at the ratios of the identified frequencies. You can read http://en.wikipedia.org/wiki/Pitch_%28music%29 as a starting point on that. This is also interesting: http://en.wikipedia.org/wiki/Piano_key_frequencies
3) Moreover it might be interesting to detect the points in time when each individual tone starts and stops. This could be added as a pre-processing step. You could do an FFT for each individual note then. However, if the whistler doesn't stop but just bends between notes, this would not be that easy.
Definitely have a look at the libraries the others suggested. I don't know any of them, but maybe they already contain functionality for doing what I've described above.
And now to the code. Please let me know what worked for you; I find this topic pretty interesting.
Edit: I updated the code to include overlapping and a simple mapper from frequencies to notes. It works only for "tuned" whistlers though, as mentioned above.
package de.ahans.playground;

import java.io.File;
import java.io.IOException;
import java.util.Arrays;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class FftMaxFrequency {

    // taken from http://www.cs.princeton.edu/introcs/97data/FFT.java.html
    // (first hit in Google for "java fft")
    // needs Complex class from http://www.cs.princeton.edu/introcs/97data/Complex.java
    public static Complex[] fft(Complex[] x) {
        int N = x.length;
        // base case
        if (N == 1) return new Complex[] { x[0] };
        // radix 2 Cooley-Tukey FFT
        if (N % 2 != 0) { throw new RuntimeException("N is not a power of 2"); }
        // fft of even terms
        Complex[] even = new Complex[N/2];
        for (int k = 0; k < N/2; k++) {
            even[k] = x[2*k];
        }
        Complex[] q = fft(even);
        // fft of odd terms
        Complex[] odd = even; // reuse the array
        for (int k = 0; k < N/2; k++) {
            odd[k] = x[2*k + 1];
        }
        Complex[] r = fft(odd);
        // combine
        Complex[] y = new Complex[N];
        for (int k = 0; k < N/2; k++) {
            double kth = -2 * k * Math.PI / N;
            Complex wk = new Complex(Math.cos(kth), Math.sin(kth));
            y[k] = q[k].plus(wk.times(r[k]));
            y[k + N/2] = q[k].minus(wk.times(r[k]));
        }
        return y;
    }
    static class AudioReader {
        private AudioFormat audioFormat;

        public AudioReader() {}

        public double[] readAudioData(File file) throws UnsupportedAudioFileException, IOException {
            AudioInputStream in = AudioSystem.getAudioInputStream(file);
            audioFormat = in.getFormat();
            int depth = audioFormat.getSampleSizeInBits();
            long length = in.getFrameLength();
            if (audioFormat.isBigEndian()) {
                throw new UnsupportedAudioFileException("big endian not supported");
            }
            if (audioFormat.getChannels() != 1) {
                throw new UnsupportedAudioFileException("only 1 channel supported");
            }
            byte[] tmp = new byte[(int) length];
            byte[] samples = null;
            int bytesPerSample = depth/8;
            int bytesRead;
            while (-1 != (bytesRead = in.read(tmp))) {
                if (samples == null) {
                    samples = Arrays.copyOf(tmp, bytesRead);
                } else {
                    int oldLen = samples.length;
                    samples = Arrays.copyOf(samples, oldLen + bytesRead);
                    for (int i = 0; i < bytesRead; i++) samples[oldLen+i] = tmp[i];
                }
            }
            double[] data = new double[samples.length/bytesPerSample];
            for (int i = 0; i < samples.length-bytesPerSample; i += bytesPerSample) {
                int sample = 0;
                for (int j = 0; j < bytesPerSample; j++) sample += samples[i+j] << j*8;
                data[i/bytesPerSample] = (double) sample / Math.pow(2, depth);
            }
            return data;
        }

        public AudioFormat getAudioFormat() {
            return audioFormat;
        }
    }
    public class FrequencyNoteMapper {
        private final String[] NOTE_NAMES = new String[] {
            "A", "Bb", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"
        };
        private final double[] FREQUENCIES;
        private final double a = 440;
        private final int TOTAL_OCTAVES = 6;
        private final int START_OCTAVE = -1; // relative to A

        public FrequencyNoteMapper() {
            FREQUENCIES = new double[TOTAL_OCTAVES*12];
            int j = 0;
            for (int octave = START_OCTAVE; octave < START_OCTAVE+TOTAL_OCTAVES; octave++) {
                for (int note = 0; note < 12; note++) {
                    int i = octave*12+note;
                    FREQUENCIES[j++] = a * Math.pow(2, (double)i / 12.0);
                }
            }
        }

        public String findMatch(double frequency) {
            if (frequency == 0)
                return "none";
            double minDistance = Double.MAX_VALUE;
            int bestIdx = -1;
            for (int i = 0; i < FREQUENCIES.length; i++) {
                if (Math.abs(FREQUENCIES[i] - frequency) < minDistance) {
                    minDistance = Math.abs(FREQUENCIES[i] - frequency);
                    bestIdx = i;
                }
            }
            int octave = bestIdx / 12;
            int note = bestIdx % 12;
            return NOTE_NAMES[note] + octave;
        }
    }
    public void run(File file) throws UnsupportedAudioFileException, IOException {
        FrequencyNoteMapper mapper = new FrequencyNoteMapper();
        // size of window for FFT
        int N = 4096;
        int overlap = 1024;
        AudioReader reader = new AudioReader();
        double[] data = reader.readAudioData(file);
        // sample rate is needed to calculate actual frequencies
        float rate = reader.getAudioFormat().getSampleRate();
        // go over the samples window-wise
        for (int offset = 0; offset < data.length-N; offset += (N-overlap)) {
            // for each window calculate the FFT
            Complex[] x = new Complex[N];
            for (int i = 0; i < N; i++) x[i] = new Complex(data[offset+i], 0);
            Complex[] result = fft(x);
            // find index of maximum coefficient
            double max = -1;
            int maxIdx = 0;
            for (int i = result.length/2; i >= 0; i--) {
                if (result[i].abs() > max) {
                    max = result[i].abs();
                    maxIdx = i;
                }
            }
            // calculate the frequency of that coefficient
            double peakFrequency = (double)maxIdx*rate/(double)N;
            // and get the time of the start and end position of the current window
            double windowBegin = offset/rate;
            double windowEnd = (offset+(N-overlap))/rate;
            System.out.printf("%f s to %f s:\t%f Hz -- %s\n", windowBegin, windowEnd, peakFrequency, mapper.findMatch(peakFrequency));
        }
    }

    public static void main(String[] args) throws UnsupportedAudioFileException, IOException {
        new FftMaxFrequency().run(new File("/home/axr/tmp/entchen.wav"));
    }
}
I think this open-source platform suits you:
http://code.google.com/p/musicg-sound-api/
Well, you could always use FFTW to perform the Fast Fourier Transform. It's a very well respected framework. Once you've got an FFT of your signal you can analyze the resulting array for peaks. A simple histogram-style analysis should give you the frequencies with the greatest volume. Then you just have to compare those frequencies to the frequencies that correspond with different pitches.
In addition to the other great options:
csound pitch detection: http://www.csounds.com/manual/html/pvspitch.html
fmod: http://www.fmod.org/ (has a free version)
aubio: http://aubio.org/doc/pitchdetection_8h.html
You might want to consider Python(x,y). It's a scientific programming framework for Python in the spirit of Matlab, and it has easy functions for working in the FFT domain.
If you use Java, have a look at the TarsosDSP library. It has a pretty good ready-to-go pitch detector.
Here is an example for Android, but I think it doesn't require too many modifications to use it elsewhere.
I'm a fan of the FFT, but for the monophonic and fairly pure sinusoidal tones of whistling, a zero-cross detector would do a far better job of determining the actual frequency at a much lower processing cost. Zero-cross detection is used in electronic frequency counters that measure the clock rate of whatever is being tested.
If you're going to analyze anything other than pure sine-wave tones, then the FFT is definitely the way to go.
A very simple implementation of zero cross detection in Java on GitHub
