Transforming Audio Samples From Time Domain to Frequency Domain - audio

as a software engineer I am facing with some difficulties while working on a signal processing problem. I don't have much experience in this area.
What I try to do is to sample the environmental sound with 44100 sampling rate and for fixed size windows to test if a specific frequency (20KHz) exists and is higher than a threshold value.
Here is what I do according to the perfect answer in How to extract frequency information from samples from PortAudio using FFTW in C
102400 samples (2320 ms) is gathered from audio port with 44100 sampling rate. Sample values are between 0.0 and 1.0
int samplingRate = 44100;
int numberOfSamples = 102400;
float samples[numberOfSamples] = ListenMic_Function(numberOfSamples,samplingRate);
Window size or FFT Size is 1024 samples (23.2 ms)
int N = 1024;
Number of windows is 100
int noOfWindows = numberOfSamples / N;
Splitting samples to noOfWindows (100) windows each having size of N (1024) samples
float windowSamplesIn[noOfWindows][N];
for i:= 0 to noOfWindows -1
windowSamplesIn[i] = subarray(samples,i*N,(i+1)*N);
endfor
Applying Hanning window function on each window
float windowSamplesOut[noOfWindows][N];
for i:= 0 to noOfWindows -1
windowSamplesOut[i] = HanningWindow_Function(windowSamplesIn[i]);
endfor
Applying FFT on each window (real to complex conversion done inside the FFT function)
float frequencyData[noOfWindows][samplingRate/2];
for i:= 0 to noOfWindows -1
frequencyData[i] = RealToComplex_FFT_Function(windowSamplesOut[i], samplingRate);
endfor
In the last step, I use the FFT function implemented in this link: http://www.codeproject.com/Articles/9388/How-to-implement-the-FFT-algorithm ; because I cannot implement an FFT function from the scratch.
What I can't be sure is while giving N (1024) samples to FFT function as input, samplingRate/2 (22050) decibel values is returned as output. Is it what an FFT function does?
I understand that because of Nyquist Frequency, I can detect half of sampling rate frequency at most. But is it possible to get decibel values for each frequency up to samplingRate/2 (22050) Hz?
Thanks,
Vahit

See see How do I obtain the frequencies of each value in an FFT?
From a 1024 sample input, you can get back 512 meaningful frequency-levels.
So, yes, within your window, you'll get back a level for the Nyquist frequency.
The lowest frequency level you'll see is for DC (0 Hz), and the next one up will be for SampleRate/1024, or around 44 Hz, the next for 2 * SampleRate/1024, and so on, up to 512 * SampleRate / 1024 Hz.

Since only one band is used in your FFT, I would expect your results to be tarnished by side-band effects, even with proper windowing. It might work, but you might also get false positives with some input frequencies. Also, your signal is close to your niquist, so you are assuming a fairly good signal path up to your FFT. I don't think this is the right approach.
I think a better approach to this kind of signal detection would be with a high order filter (depending on your requirements, I would guess fourth or fifth order, which isn't actually that high). If you don't know how to design a high order filter, you could use two or three second order filters in series. Designing a second order filter, sometimes called a "biquad" is described here:
http://www.musicdsp.org/files/Audio-EQ-Cookbook.txt
albeit very tersely and with some assumptions of prior knowledge. I would use a high-pass (HP) filter with corner frequency as low as you can make it, probably between 18 and 20 kHz. Keep in mind there is some attenuation at the corner frequency, so after applying a filter multiple times you will drop a little signal.
After you filter the audio, take the RMS or average amplitude (that is, the average of the absolute value), to find the average level over a time period.
This technique has several advantages over what you are doing now, including better latency (you can start detecting within a few samples), better reliability (you won't get false-positives in response to loud signals at spurious frequencies), and so on.
This post might be of relevance: http://blog.bjornroche.com/2012/08/why-eq-is-done-in-time-domain.html

Related

ESP8266 analogRead() microphone Input into playable audio

My goal is to record audio using an electret microphone hooked into the analog pin of an esp8266 (12E) and then be able to play this audio on another device. My circuit is:
In order to check the output of the microphone I connected the circuit to the oscilloscope and got this:
In the "gif" above you can see the waves made by my voice when talking to microphone.
here is my code on esp8266:
void loop() {
sensorValue = analogRead(sensorPin);
Serial.print(sensorValue);
Serial.print(" ");
}
I would like to play the audio on the "Audacity" software in order to have an understanding of the result. Therefore, I copied the numbers from the serial monitor and paste it into the python code that maps the data to (-1,1) interval:
def mapPoint(value, currentMin, currentMax, targetMin, targetMax):
currentInterval = currentMax - currentMin
targetInterval = targetMax - targetMin
valueScaled = float(value - currentMin) / float(currentInterval)
return round(targetMin + (valueScaled * targetInterval),5)
class mapper():
def __init__(self,raws):
self.raws=raws.split(" ")
self.raws=[float(i) for i in self.raws]
def mapAll(self):
self.mappeds=[mapPoint(i,min(self.raws),max(self.raws),-1,1) for i in self.raws ]
self.strmappeds=str(self.mappeds).replace(",","").replace("]","").replace("[","")
return self.strmappeds
Which takes the string of numbers, map them on the target interval (-1 ,+1) and return a space (" ") separated string of data ready to import into Audacity software. (Tools>Sample Data Import and then select the text file including the data). The result of importing data from almost 5 seconds voice:
which is about half a second and when I play I hear unintelligible noise. I also tried lower frequencies but there was only noise there, too.
The suspected causes for the problem are:
1- Esp8266 has not the capability to read the analog pin fast enough to return meaningful data (which is probably not the case since it's clock speed is around 100MHz).
2- The way software is gathering the data and outputs it is not the most optimized way (In the loop, Serial.print, etc.)
3- The microphone circuit output is too noisy. (which might be, but as observed from the oscilloscope test, my voice has to make a difference in the output audio. Which was not audible from the audacity)
4- The way I mapped and prepared the data for the Audacity.
Is there something else I could try?
Are there similar projects out there? (which to my surprise I couldn't find anything which was done transparently!)
What can be the right way to do this? (since it can be a very useful and economic method for recording, transmitting and analyzing audio.)
There are many issues with your project:
You do not set a bias voltage on A0. The ADC can only measure voltages between Ground and VCC. When removing the microphone from the circuit, the voltage at A0 should be close to VCC/2. This is usually achieved by adding a voltage divider between VCC and GND made of 2 resistors, and connected directly to A0. Between the cap and A0.
Also, your circuit looks weird... Is the 47uF cap connected directly to the 3.3V ? If that's the case, you should connect it to pin 2 of the microphone instead. This would also indicate that right now your ADC is only recording noise (no bias voltage will do that).
You do not pace you input, meaning that you do not have a constant sampling rate. That is a very important issue. I suggest you set yourself a realistic target that is well within the limits of the ADC, and the limits of your serial port. The transfer rate in bytes/sec of a serial port is usually equal to baud-rate / 8. For 9600 bauds, that's only about 1200 bytes/sec, which means that once converted to text, you max transfer rate drops to about 400 samples per second. This issue needs to be addressed and the max calculated before you begin, as the max attainable overall sample rate is the maximum of the sample rate from the ADC and the transfer rate of the serial port.
The way to grab samples depends a lot on your needs and what you are trying to do with this project, your audio bandwidth, resolution and audio quality requirements for the application and the amount of work you can put into it. Reading from a loop as you are doing now may work with a fast enough serial port, but the quality will always be poor.
The way that is usually done is with a timer interrupt starting the ADC measurement and an ADC interrupt grabbing the result and storing it in a small FIFO, while the main loop transfers from this ADC fifo to the serial port, along the other tasks assigned to the chip. This cannot be done directly with the Arduino libraries, as you need to control the ADC directly to do that.
Here a short checklist of things to do:
Get the full ESP8266 datasheet from Expressif. Look up the actual specs of the ADC, mainly: the sample rates and resolutions available with your oscillator, and also its electrical constraints, at least its input voltage range and input impedance.
Once you know these numbers, set yourself some target, the math needed for successful project need input numbers. What is your application? Do you want to record audio or just detect a nondescript noise? What are the minimum requirements needed for things to work?
Look up in the Arduino documentartion how to set up a timer interrupt and an ADC interrupt.
Look up in the datasheet which registers you'll need to access to configure and run the ADC.
Fix the voltage bias issue on the ADC input. Nothing can work before that's done, and you do not want to destroy your processor.
Make sure the input AC voltage (the 'swing' voltage) is large enough to give you the results you want. It is not unusual to have to amplify a mic signal (with an opamp or a transistor), just for impedance matching.
Then you can start writing code.
This may sound awfully complex for such a small task, but that's what the average day of an embedded programmer looks like.
[EDIT] Your circuit would work a lot better if you simply replaced the 47uF DC blocking capacitor by a series resistor. Its value should be in the 2.2k to 7.6k range, to keep the circuit impedance within the 10k Ohms or so needed for the ADC. This would insure that the input voltage to A0 is within the operating limits of the ADC (GND-3.3V on the NodeMCU board, 0-1V with bare chip).
The signal may still be too weak for your application, though. What is the amplitude of the signal on your scope? How many bits of resolution does that range cover once converted by the ADC? Example, for a .1V peak to peak signal (SIG = 0.1), an ADC range of 0-3.3V (RNG = 3.3) and 10 bits of resolution (RES = 1024), you'll have
binary-range = RES * (SIG / RNG)
= 1024 * (0.1 / 3.3)
= 1024 * .03
= 31.03
A range of 31, which means around Log2(31) (~= 5) useful bits of resolution, is that enough for your application ?
As an aside note: The ADC will give you positive values, with a DC offset, You will probably need to filter the digital output with a DC blocking filter before playback. https://manual.audacityteam.org/man/dc_offset.html

Creating audio level meter - signal normalization

I have program which tracks audio signal in real time. Every processed sample I am able to read value of it in range between <-1, 1>.
I would like to create(and later display) audio level meter. From what I understand - to do it I need to keep converting my audio signal in real time, on each channel to dB and then display dB values on each channel in some graphical form of bars.
I am a bit lost how to do it and it should be simple matter. Would just normalization from <-1, 1> to <0, 1> (like... [n-sample +1]/2) and then calculating 20*log10 from each upcoming sample make it?
You can't plot the signal directly, as it always varying positive and negative.
Therefore you need to average out the strength of the signal every so many samples.
Say you're sampling at 44.1kHz, perhaps you might choose 4410 samples so you're updating your display 10 times per second.
So you calculate the RMS of your 4410 samples - see http://en.wikipedia.org/wiki/Root_mean_square
The RMS value is always positive.
You can then convert this to Db:
dBV = 20 x log10(Vrms)
This assumes that your maximum signal -1 to +1 corresponds to -1 to +1 volt. You will need to do further adjustments if not.

tuning pid in systems with delay

I need to tune PI(D) gains in a system which has a quite large delay. It's a common temperature controller, but the temperature probe is far away from the heater. Some further info:
the response of the probe is delayed about 10 seconds from any change on the heater
the temperature is sampled # 1 Hz, with a resolution of 0.01 °C
the heater is controller in PWM with a period of 1 Hz, with a 10-bit PWM
the goal is to maintain the oscillation below ±0.05 °C
Currently I'm using the controller as PI. I can't avoid oscillations. The higher the gain, the smaller and faster the oscillations. Still too high (about ±0.15 °C).
Reducing the P and I gains leads to very long and deep oscillations.
I think this is due to the delay.
The settling time is not a problem, it may take all the time it needs.
I'm puzzling over how get the system to work. Let's think to use only I. When the probe reaches the target value and the I output starts to decrease, the temperature will rise for some other time. I cannot use the derivative term because the variations are too slow and the dError is very close to zero (if I set the dGain to a huge value there is too much noise).
Any idea?
Try P-only. How fast are the proportional-only oscillations? If you can't tune Kp small enough to get no oscillations, then your heater is overpowered for your system.
If the dead time of the of the system is on the order of 10s, the time constant (T_i) for the Integral term should be 3.3 times the dead time, using a Ziegler Nichols open-loop PI rule ( https://controls.engin.umich.edu/wiki/index.php/PIDTuningClassical#Ziegler-Nichols_Open-Loop_Tuning_Method_or_Process_Reaction_Method: ) , and then Integral term should be Ki = Kp/T_i. So with deadtime = 10s, then Ki should be Kp/33 or slower.
If you are getting integral-only oscillations, then the integral is winding up and down quicker than the process responds, and it should be even smaller.
Also -- think of the units of the different terms. It might not be the delay causing your problems so much as the resolution of the measurement and control systems. If you're driving a (for example) 100W heater with a 1/1024 resolution PWM, you've got 0.1W resolution per PWM count that you are trying to adjust based on 0.01C temperature differences. At less than Kp = 100 PWMcount/degree (or 10W/degree) you don't have enough resolution in the PWM to make changes in response to a 0.01C error. At a Kp=10PWM/C you might need a 0.10C change to result in an actual change in the PWM power. Can you use a higher resolution PWM?
Thinking of it the other way, if you want to operate a system over a range of 30C at 0.01C, I'd think you would want at least a 15bit PWM to have 10 times the resolution in the controlled system. With only 10 bits of PWM you only get about 1C of total range with control at 10x the resolution of the measurements.
Normally for large delays you have two options: Lower the gains of the system or, if you have a model of the plant you are controlling, use a Smith Predictior.
I would start by modelling your system (using open-loop steps in the input) to quantify the delay and the time constant of your plant, then check if the sampling of the temperature and the PWM rate are OK.
Notice that if your PWM frequency is too small in comparison to the plant dynamics, you will have sustained oscillations because of the slow PWM. You can check it using just an constant input to your PWM (with no controllers, open loop).
EDIT: Didn't see that the problem was already solved, but I'll leave this here for reference.

discrete fourier transform frequency bound?

for a 8KHz wav sound i took 20ms sample which has 160 samples of data, plotted the FFT spectrum in audacity.
It gave the magnitudes in 3000 and 4000 Hz as well, shouln't it be giving the magnitudes until
the 80Hz,because there is 160 samples of data?
For a sample rate of Fs = 8 khz the FFT will give meaningful results from DC to Nyquist (= Fs / 2), i.e. 0 to 4 kHz. The width of each FFT bin will be 1 / 20 ms = 50 Hz.
actually audacity shows the peaks as 4503Hz which means understands to 1Hz bins. by the way if I take 20ms and repeat it 50 times to make as 1s sample,is the fft going to be for 1Hz bins? and also audacity has the option for the window as far as I know If you use windowing then the components should be multiple times of 2,like 1,2,4,8,etc.. but it shows the exact frequencies,then why it uses the windowing?
The best sampling rate is 2*frequency.
in different frequencys you should to change the sampling rate.

Deciding on length of FFT

I am working on a tool to compare two wave files for similarity in their waveforms. Ex, I have a wave file of duration 1min and i make another wave file using the first one but have made each 5sec data at an interval of 5sec to 0.
Now my software will tell that there is waveform difference at time interval 5sec to 10sec, 15sec to 20sec, 25sec to 30 sec and so on...
As of now, with initial development, this is working fine.
Following are 3 test sets:
I have two wave files with sampling rate of 960Hz, mono, with no of data samples as 138551 (arnd 1min 12sec files). I am using 128 point FFT (splitting file in 128 samples chunk) and results are good.
When I use the same algorithm on wave files of sampling rate 48KHz, 2-channel with no of data samples 6927361 for each channel (arnd 2min 24 sec file), the process becomes too slow. When I use 4096 point FFT, the process is better.
But, the 4096 point FFT on files of 22050Hz, 2-channels with number of data samples 55776 for each channel (arnd 0.6sec file) gives very poor results. In this case 128 point FFT gives good result.
So, I am confused on how to decide the length of FFT so that my results are good in each case.
I guess the length should depend on number of samples and sampling rate.
Please give your inputs on this.
Thanks
The length of the FFT, N, will determine the resolution in the frequency domain:
resolution (Hz) = sample_rate (Hz) / N
So for example in case (1) you have resolution = 960 / 128 = 7.5 Hz. SO each bin in the resulting FFT (or presumably the power spectrum derived from this) will be 7.5 Hz wide, and you will be able to differentiate between frequency components which are at least this far apart.
Since you don't say what kind of waveforms these are, or what the purpose of your application is, it's hard to know what kind of resolution you need.
One important further point - many people using FFT for the first time are unaware that in general you need to apply a window function prior to the FFT to avoid spectral leakage.
I have to say I have found your question very cryptic. I think you should look into Short-time Fourier transform. The reason I say this is because you are looking at quite a large amount of samples if you use a sampling frequency of 44.1KhZ over 2mins with 2 channels. One fft across the entire amount will take quite a while indeed, not to mention the estimate will be biased as the signals mean and variance will change drastically over the whole duration. To avoid this you want to frame the time-domain signal first, these frames can be as small as 20ms-40ms (commonly used for speech) and often overlapping (Welch method of Spectral Estimation). Then you apply a window function such as Hamming or Hanning window to reduce spectral leakage and calculate an N-Point fft for each frame. Where N is the next power of two above the number of samples in that frame.
For example:
Fs = 8Khz, single channel;
time = 120sec;
no_samples = time * Fs = 960000 ;
frame length T_length= 20ms;
frame length in samples N_length = 160;
frame overlap T_overlap= 10ms;
frame overlap in samples N_overlap= 80;
Num of frames N_frames = (no_samples - (N_length-N_overlap))/N_overlap = 11999;
FFT length = 256;
So you will be processing 11999 frames in total, but your FFT length will be small. You will only need an FFT length of 256 (next power of two above frame length 160). Most algorithms that implement the fft require the signal length and fft length to be the same. All you have to do is append zeros to your framed signal up until 256. So pad each frame with x amount of zeros, where x = FFT_length-N_length. My latest android app does this on recorded speech and uses the short-time FFT data to display the Spectrogram of speech and also performs various spectral modification and filtering, its called Speech Enhancement for Android

Resources