How would I sample an audio track at the Nyquist frequency using C and a microcontroller? - audio

This is as simple and as specific as I can make it, so please try to help me out.
What I want to do:
1) Input an audio track (analog)
2) Convert it to a digital signal using the microcontroller's ADC
3) Have the microcontroller's/board's timer sample the data at selected intervals
4) Tell the board to take the "sampled audio track" and sample it at a rate of 2B (B meaning the highest frequency in the signal)
F = frequency, in Hz (cycles per second); e.g. 100 Hz = 100 cycles/sec
Sampling period T = 1/(2F)
Example problem: highest frequency 1000 Hz, so T = 1/(2 x 1000 Hz) = 1/2000 = 5x10^-4 s, i.e. one sample every 0.5 ms
5) Feed it back into the board's ADC and convert it back to analog, so that the output is a perfect reconstruction of the initial audio track.
Using Fourier analysis I will determine the highest frequency in the track, which sets the rate at which I will sample it.
In theory it sounds easy enough and straightforward, but what I need is to program this in C and use my MSP430 chip/Experimenter's Board to sample the track.
I'm going to be using Texas Instruments CCS and Octave for my programming and debugging. This is the board that I will be using.
Questions:
Is C the right language for this? Can I get any examples of how to sample the track at the Nyquist frequency using C? What code in C will tell the board to use the ADC? Any recommended information that is similar or that will help me with this project would also be appreciated.

I don't fully understand what you want to do, but I'll answer your specific questions.
Yes, C is the right language for this.
You should probably look at application code on the Texas Instruments website to see how to interact with the ADC. You can start with the example code listed at the bottom of the page you linked to. It has C code that shows how to use the ADC.
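To give a flavor of what that example code looks like, here is a minimal sketch of a timer-driven ADC capture, assuming a value-line MSP430 (e.g. an MSP430G2553) and the register names from TI's msp430.h; vector and register names vary between parts, so treat this as an outline rather than drop-in code:

    #include <msp430.h>

    #define NUM_SAMPLES 256                 /* power of two, for cheap wrap */

    unsigned int samples[NUM_SAMPLES];      /* 10-bit results land here */
    volatile unsigned int idx = 0;

    int main(void)
    {
        WDTCTL = WDTPW + WDTHOLD;           /* stop the watchdog */

        /* ADC10: Vcc/Vss reference, 16-cycle sample-and-hold, interrupts on */
        ADC10CTL0 = SREF_0 + ADC10SHT_2 + ADC10ON + ADC10IE;
        ADC10CTL1 = INCH_0;                 /* convert analog channel A0 */
        ADC10AE0 |= BIT0;                   /* enable analog input on P1.0 */

        /* Timer_A in up mode sets the sampling rate. With a 1 MHz SMCLK,
           TACCR0 = 500 - 1 gives 2000 samples/s, i.e. 2B for B = 1 kHz. */
        TACCR0 = 500 - 1;
        TACCTL0 = CCIE;                     /* interrupt once per period */
        TACTL = TASSEL_2 + MC_1;            /* SMCLK, up mode */

        __bis_SR_register(GIE);             /* global interrupt enable */
        for (;;)
            ;                               /* samples[] fills in the background */
    }

    /* Timer tick: kick off one conversion per sampling period. */
    #pragma vector = TIMER0_A0_VECTOR
    __interrupt void timer_isr(void)
    {
        ADC10CTL0 |= ENC + ADC10SC;         /* enable and start conversion */
    }

    /* Conversion complete: store the result in a circular buffer. */
    #pragma vector = ADC10_VECTOR
    __interrupt void adc_isr(void)
    {
        samples[idx++ & (NUM_SAMPLES - 1)] = ADC10MEM;
    }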
Incidentally, an ADC only converts analog to digital. To go digital to analog, you need a DAC, which this board does not appear to have.

5) An ADC doesn't do digital-to-analog conversion, because it's an ADC, not a DAC. But you can use PWM with a low-pass filter to output an analog signal.
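As a sketch of the PWM idea (again assuming a G2xx-style Timer_A and msp430.h names; the PWM carrier here is only about 3.9 kHz at a 1 MHz SMCLK, so a faster clock or lower resolution would be needed for real audio):

    #include <msp430.h>

    /* 8-bit "PWM DAC" on P1.2 (TA0.1 on a G2553): write one duty-cycle
       value per sample period and let an external RC low-pass filter on
       the pin smooth the pulses back into an analog voltage. */
    void pwm_dac_init(void)
    {
        P1DIR |= BIT2;                  /* P1.2 as output */
        P1SEL |= BIT2;                  /* route Timer_A output TA0.1 to it */
        TACCR0 = 256 - 1;               /* 8-bit resolution */
        TACCTL1 = OUTMOD_7;             /* reset/set: duty = TACCR1 / TACCR0 */
        TACCR1 = 128;                   /* start at mid-scale */
        TACTL = TASSEL_2 + MC_1;        /* SMCLK, up mode */
    }

    /* Output one 8-bit sample: the filtered pin voltage tracks 'value'. */
    void pwm_dac_write(unsigned char value)
    {
        TACCR1 = value;
    }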
It is often a bad idea to sample a signal at exactly the Nyquist rate. This causes a lot of aliasing trouble near the top of the band: for example, a signal at frequency F - deltaF, where deltaF is small, will look like F amplitude-modulated at 2*deltaF.
That's why the CD sampling rate is 44.1 kSPS rather than 40 kSPS (twice the nominal 20 kHz upper limit of hearing): the extra margin leaves room for the anti-aliasing filter to roll off.

You have to sample the signal at a rate at least twice the highest frequency present in it; otherwise you get aliasing (distortion of the original signal). You cannot determine the highest frequency in the signal with Fourier analysis before sampling, because to perform an FFT you must first convert the analog signal to digital values, and that already requires choosing the conversion rate you were trying to determine.
In practice, the highest frequency in the input signal is defined by the analog filter that the signal must pass before analog-to-digital conversion. For example, a first-order RC low-pass with R = 16 kOhm and C = 1 nF cuts off around f_c = 1/(2*pi*R*C), roughly 10 kHz, so sampling above 20 kSPS would be appropriate.

Related

Detecting a specific pattern from a FFT in Arduino

I have an FFT output from a microphone and I want to detect a specific animal's howl from it (the animal howls in a characteristic frequency range). Is there any way to implement a pattern recognition algorithm on an Arduino to do that?
I already have the FFT part of it working, with 128 samples at a 2 kHz sampling rate.
Look up audio fingerprinting. Essentially you probe the frequency-domain output of the FFT call and take a snapshot of the range of frequencies together with the magnitude of each frequency, then compare this between the known animal signal and the unknown signal and output a measurement of those differences.
Naturally this difference will approach zero when the unknown signal is your actual known signal.
Here is another layer: for better fidelity, instead of performing a single FFT of the entire audio available, do many FFT calls, each with a subset of the samples, and for each call slide this window of samples further into the audio clip. Say your audio clip is 2 seconds long, yet you only ever send 200 milliseconds' worth of samples into each FFT call: that gives you at least 10 FFT result sets instead of just one had you gulped the entire clip. This gives you time specificity, an additional dimension with which to derive a richer data difference between known and unknown signals. Experiment to see whether it helps to slide the window just a tad each time instead of lining the windows up end to end.
To be explicit: you have a range of frequencies spread across the X axis, and along the Y axis you have magnitude values for each frequency at different points in time, plucked from your audio clip as you slide the sample window as per the above paragraph. So now you have a two-dimensional grid of data points.
Again, to tighten the confidence intervals you will want to perform all of the above across several different audio clips of your known source animal's howl against each unknown signal, so now you have a three-dimensional parameter landscape. Each additional dimension you can muster gives more traction and hence more accurate results.
Start with easily distinguished audio: say a 50 Hz sine tone as the known signal against an 8000 Hz sine wave as the unknown. Then try a single strum of a guitar as the known against, say, a trumpet as the unknown. Then progress to actual field recordings.
Audacity is an excellent free audio workhorse of the industry; it easily plots a WAV file to show its time-domain signal or FFT spectrogram. Sonic Visualiser is also a top-shelf tool.
This is not a simple silver bullet, but each layer you add to your solution can give you better results; it is a process you are crafting, not a single-dimensional trigger you squeeze.
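To make the windowed comparison concrete, here is a bare-bones sketch in C; it uses a naive O(N^2) DFT for clarity rather than a real FFT, and the window and hop sizes are arbitrary choices:

    #include <math.h>
    #include <stddef.h>

    #define WIN 256                 /* samples per analysis window */
    #define HOP 128                 /* slide by half a window */
    #define PI  3.14159265358979323846f

    /* Magnitude spectrum of one window via a naive DFT (O(N^2) - fine
       for a demo, but swap in a real FFT for anything serious). */
    static void magnitude(const float *x, float *mag)
    {
        for (size_t k = 0; k < WIN / 2; k++) {
            float re = 0.0f, im = 0.0f;
            for (size_t n = 0; n < WIN; n++) {
                float w = 2.0f * PI * k * n / WIN;
                re += x[n] * cosf(w);
                im -= x[n] * sinf(w);
            }
            mag[k] = sqrtf(re * re + im * im);
        }
    }

    /* Sum of squared spectral differences across all sliding windows.
       Approaches zero as the unknown clip approaches the known one. */
    float spectral_distance(const float *known, const float *unknown, size_t len)
    {
        float mag_a[WIN / 2], mag_b[WIN / 2];
        float dist = 0.0f;
        for (size_t pos = 0; pos + WIN <= len; pos += HOP) {
            magnitude(known + pos, mag_a);
            magnitude(unknown + pos, mag_b);
            for (size_t k = 0; k < WIN / 2; k++) {
                float d = mag_a[k] - mag_b[k];
                dist += d * d;
            }
        }
        return dist;
    }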

Digital signal processing of audio signals

I have recently started working on audio signals. On converting an audio signal to digital (ADC), how will I know the frequency of the original signal? A digital signal is just an array of numbers, and there seems to be no information about frequency in it. Please help me with this.
You can get the frequency back (assuming it's not aliased because of incorrect sampling) by analysing the magnitude plot of the DFT and converting the prominent digital frequency to a continuous one. This is very googleable.
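In code, converting that prominent digital frequency back to a continuous one amounts to finding the peak magnitude bin and scaling by the sampling rate; a sketch, assuming mag[] holds the magnitudes of an N-point DFT:

    /* Given the magnitude spectrum of an N-point DFT of a signal sampled
       at fs Hz, the strongest bin k corresponds to the continuous
       frequency f = k * fs / N. */
    double dominant_frequency(const double *mag, int n, double fs)
    {
        int peak = 1;                   /* skip bin 0, the DC component */
        for (int k = 2; k < n / 2; k++)
            if (mag[k] > mag[peak])
                peak = k;
        return peak * fs / n;
    }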
There is something that should be understood: if you meet the Nyquist criterion, there is only one way to reconstruct the original signal (using an ideal reconstruction filter).
In other words, if you meet the Nyquist criterion, the discrete signal has exactly all the information of the continuous signal, which means it also has all the information of the signal's spectrum (Fourier transform).
The difference between the time domain and the frequency domain is purely one of representation.
(Of course this is exactly true only for ideal sampling and reconstruction; in real cases it is not far from true if the Nyquist criterion holds to a good approximation.)

The Sound of Hydrogen using the NIST Spectral Database

In the video The Sound of Hydrogen (original here), the sound is created by taking data from the NIST Atomic Spectra Database and importing this edited data into Mathematica to modulate a sine wave. I was wondering how he turned the data from the website into the values shown in the video (3:47, top of the page), because it is nothing like what is initially seen on the website.
Short answer: It's different because in the tutorial the sampling rate is 8 kHz while it's probably higher in the original video.
Long answer:
First of all, note how the Rydberg formula provides the resonance frequencies of hydrogen as $\nu_{nm} = c R \left(\frac1{n^2}-\frac1{m^2}\right)$, where $c$ is the speed of light and $R$ the Rydberg constant. The highest frequency is $\nu_{1\infty}\approx 3000$ THz, while for $n,m\to\infty$ there is basically no lower limit, though if you restrict yourself to the Lyman series ($n=1$) and the Balmer series ($n=2$), the lower limit is $\nu_{23}\approx 400$ THz. These are electromagnetic frequencies corresponding to light, and not entirely within the visible spectrum (430–790 THz): there is some IR and lots of UV in there which you cannot see. "minutephysics" now simply treats these frequencies as sound frequencies that are remapped to the human hearing range (ca. 20–20000 Hz).
But as the video stated, not all these frequencies resonate with the same strength, and the data at http://nist.gov/pml/data/asd.cfm also includes the amplitudes. For the frequency $\nu_{nm}$ let's call the intensity $I_{nm}$ (intensity is amplitude squared, I wonder if the video treated that correctly). Then your signal is simply
$f(t) = \sum\limits_{n=1}^N \sum\limits_{m=n+1}^M I_{nm}\sin(2\pi\,\alpha(\nu_{nm})\,t+\phi_{nm})$
where $\alpha$ denotes the frequency rescaling (probably something linear like $\alpha(\nu) = (20 + (\nu-400\cdot10^{12})\cdot\frac{20000-20}{(3000-400)\cdot 10^{12}})$ Hz) and the optional phase $\phi_{nm}$ is probably equal to zero.
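A sketch of that synthesis in C, taking the linear $\alpha$ above and $\phi_{nm}=0$ at face value (the line frequencies and intensities would come from the NIST table, and the 8 kHz rate is assumed, as in the tutorial):

    #include <math.h>

    #define FS 8000.0                  /* tutorial sampling rate, in Hz */
    #define PI 3.14159265358979323846

    /* Linear remapping of an optical line frequency (Hz) into the human
       hearing range, following the alpha() suggested above:
       400-3000 THz -> 20-20000 Hz. */
    static double alpha(double nu)
    {
        return 20.0 + (nu - 400e12) * (20000.0 - 20.0) / ((3000.0 - 400.0) * 1e12);
    }

    /* Sum the remapped, intensity-weighted lines into an audio buffer.
       nu[] and intensity[] would be filled from the NIST database. */
    void synthesize(const double *nu, const double *intensity, int lines,
                    double *out, int samples)
    {
        for (int i = 0; i < samples; i++) {
            double t = i / FS, s = 0.0;
            for (int j = 0; j < lines; j++)
                s += intensity[j] * sin(2.0 * PI * alpha(nu[j]) * t);
            out[i] = s;                /* normalize to [-1, 1] afterwards */
        }
    }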
Why does it sound slightly different? Probably the actual video did use a higher sampling rate than the 8 kHz used in the tutorial video.

Real time pitch detection

I'm trying to do real-time pitch detection of a user's singing, but I'm running into a lot of problems. I've tried lots of methods, including FFT (FFT Problem (Returns random results)) and autocorrelation (Autocorrelation pitch detection returns random results with mic input), but I can't seem to get any method to give a good result. Can anyone suggest a method for real-time pitch tracking, or how to improve a method I already have? I can't seem to find any good C/C++ methods for real-time pitch detection.
Thanks,
Niall.
Edit: Just to note, I've checked that the mic input data is correct, and that when using a sine wave the results are more or less the correct pitch.
Edit: Sorry this is late, but at the moment I'm visualizing the autocorrelation by taking the values out of the results array and plotting each index on the X axis with its value on the Y axis (both divided by 100000 or so, using OpenGL); plugging the data into a VST host and using VST plugins isn't an option for me. At the moment it just looks like some random dots. Am I doing it correctly, or can you point me towards some code for doing it, or help me understand how to visualize the raw audio data and autocorrelation data?
Taking a step back... To get this working you MUST figure out a way to plot intermediate steps of this process. What you're trying to do is not particularly hard, but it is error prone and fiddly. Clipping, windowing, bad wiring, aliasing, DC offsets, reading the wrong channels, the weird FFT frequency axis, impedance mismatches, frame size errors... who knows. But if you can plot the raw data, and then plot the FFT, all will become clear.
I found several open source implementations of real-time pitch tracking
dywapitchtrack uses a wavelet-based algorithm
"Realtime C# Pitch Tracker" uses a modified autocorrelation approach now removed from Codeplex - try searching on GitHub
aubio (mentioned by piem; several algorithms are available)
There are also some pitch trackers out there which might not be designed for real-time, but may be usable that way for all I know, and could also be useful as a reference to compare your real-time tracker to:
Praat is an open source package sometimes used for pitch extraction by linguists and you can find the algorithm documented at http://www.fon.hum.uva.nl/paul/praat.html
Snack and WaveSurfer also contain a pitch extractor
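For reference, the core of a naive time-domain autocorrelation pitch estimate is only a few lines; real trackers add windowing, normalization, peak interpolation, and voicing decisions on top of something like this sketch:

    #include <stddef.h>

    /* Estimate the pitch (Hz) of n samples taken at fs, by picking the
       lag within a plausible range whose autocorrelation is largest.
       Returns 0 if the buffer is too short for the lowest pitch. */
    double estimate_pitch(const float *x, size_t n, double fs)
    {
        size_t min_lag = (size_t)(fs / 1000.0);    /* 1000 Hz ceiling */
        size_t max_lag = (size_t)(fs / 50.0);      /* 50 Hz floor */
        size_t best_lag = 0;
        double best = 0.0;

        if (max_lag >= n)
            return 0.0;
        for (size_t lag = min_lag; lag <= max_lag; lag++) {
            double sum = 0.0;
            for (size_t i = 0; i + lag < n; i++)
                sum += (double)x[i] * x[i + lag];
            if (sum > best) {
                best = sum;
                best_lag = lag;
            }
        }
        return best_lag ? fs / (double)best_lag : 0.0;
    }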
I know this answer isn't going to make everyone happy, but here goes.
This stuff is hard, very hard. First, go read as many tutorials as you can find on FFTs, autocorrelation, and wavelets. Although I'm still struggling with DSP, I did get some insights from the following.
https://www.coursera.org/course/audio — the course isn't running at the moment, but the videos are still available.
http://miracle.otago.ac.nz/tartini/papers/Philip_McLeod_PhD.pdf — a thesis about the development of a pitch recognition algorithm.
http://dsp.stackexchange.com — a whole site dedicated to digital signal processing.
If, like me, you didn't do enough maths to completely follow the tutorials, don't give up, as some of the diagrams and examples still helped me understand what was going on.
Next come test data and testing. Write yourself a library that generates test files to use in checking your algorithms.
1) A super simple pure sine wave generator. Say you are writing YAT (Yet Another Tuner): use your sine generator to create a series of files around 440 Hz, say from 420-460 Hz in varying increments, and see how sensitive and accurate your code is. Can it resolve to within 5 Hz, 1 Hz, or finer still? (A sketch of such a generator appears after this list.)
2) Then upgrade your sine wave generator so that it adds a series of weaker harmonics to the signal.
3) Next come real-world variations on harmonics. For most stringed instruments you'll see a series of harmonics at simple multiples of the fundamental frequency F0, but for instruments like clarinets and flutes, because of the way the air behaves in the chamber, the even harmonics may be missing or very weak. For some instruments F0 itself is missing but can be determined from the distribution of the other harmonics. (F0 is what the human ear perceives as pitch.)
4) Throw in some deliberate distortion by shifting the harmonic peak frequencies up and down in an irregular manner.
The point is that if you create files with known results, it is much easier to verify that what you are building actually works, bugs aside of course.
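Here is the sketch promised above: a minimal generator that writes a mono 16-bit WAV containing a fundamental plus weaker harmonics (the 1/k harmonic weighting is an arbitrary choice, and a little-endian host is assumed, since WAV fields are little-endian):

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    #define FS 44100
    #define PI 3.14159265358979323846

    /* Write a mono 16-bit WAV containing f0 plus 'harmonics' overtones
       (harmonic k at amplitude 1/k), for feeding to a pitch tracker. */
    int write_test_tone(const char *path, double f0, int harmonics,
                        double seconds)
    {
        FILE *f = fopen(path, "wb");
        if (!f)
            return -1;

        uint32_t n = (uint32_t)(seconds * FS);
        uint32_t data_bytes = n * 2;
        uint32_t riff_size = 36 + data_bytes, fmt_size = 16, rate = FS;
        uint32_t byte_rate = FS * 2;
        uint16_t format = 1, channels = 1, block = 2, bits = 16;

        fwrite("RIFF", 1, 4, f);      fwrite(&riff_size, 4, 1, f);
        fwrite("WAVEfmt ", 1, 8, f);  fwrite(&fmt_size, 4, 1, f);
        fwrite(&format, 2, 1, f);     fwrite(&channels, 2, 1, f);
        fwrite(&rate, 4, 1, f);       fwrite(&byte_rate, 4, 1, f);
        fwrite(&block, 2, 1, f);      fwrite(&bits, 2, 1, f);
        fwrite("data", 1, 4, f);      fwrite(&data_bytes, 4, 1, f);

        for (uint32_t i = 0; i < n; i++) {
            double t = (double)i / FS, s = 0.0;
            for (int k = 1; k <= harmonics; k++)
                s += sin(2.0 * PI * f0 * k * t) / k;
            int16_t sample = (int16_t)(s / harmonics * 32767.0);
            fwrite(&sample, 2, 1, f);
        }
        fclose(f);
        return 0;
    }

    /* e.g. write_test_tone("a440.wav", 440.0, 4, 2.0), then sweep 420-460 Hz */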
There are also a number of "libraries" out there containing sound samples.
https://freesound.org — from the Coursera series mentioned above.
http://theremin.music.uiowa.edu/MIS.html
Next, be aware that your microphone is not perfect, and unless you have spent thousands of dollars on it, it will have a fairly variable frequency response. In particular, if you are working with low notes, then cheaper microphones (read: the built-in ones in your PC or phone) have significant roll-off starting at around 80-100 Hz. With reasonably good external ones you might get down to 30-40 Hz. Go find the data on your microphone.
You can also check what happens by playing the tone through speakers and then recording it with your favourite microphone. But of course now we are talking about two sets of frequency response curves.
When it comes to performance, there are a number of freely available libraries out there, although do be aware of the various licensing models.
Above all don't give up after your first couple of tries. Best of luck.
Here's the C++ source code for an unusual two-stage algorithm that I devised which can do real-time pitch detection on polyphonic MP3 files while they play on Windows. This free application (PitchScope Player, available on the web) is frequently used to detect the notes of a guitar or saxophone solo in an MP3 recording. The algorithm is designed to detect the most dominant pitch (a musical note) at any given moment within an MP3 music file; note onsets are inferred from a significant change in that most dominant pitch during the recording.
When a single key is pressed upon a piano, what we hear is not just one frequency of sound vibration, but a composite of multiple sound vibrations occurring at different mathematically related frequencies. The elements of this composite of vibrations at differing frequencies are referred to as harmonics or partials. For instance, if we press the Middle C key on the piano, the individual frequencies of the composite's harmonics will start at 261.6 Hz as the fundamental frequency, 523 Hz would be the 2nd Harmonic, 785 Hz would be the 3rd Harmonic, 1046 Hz would be the 4th Harmonic, etc. The later harmonics are integer multiples of the fundamental frequency, 261.6 Hz ( ex: 2 x 261.6 = 523, 3 x 261.6 = 785, 4 x 261.6 = 1046 ). Linked at the bottom, is a snapshot of the actual harmonics which occur during a polyphonic MP3 recording of a guitar solo.
Instead of an FFT, I use a modified DFT, with logarithmic frequency spacing, to first detect these possible harmonics by looking for frequencies with peak levels (see the diagram below). Because of the way I gather data for my modified log DFT, I do NOT have to apply a windowing function to the signal, nor do I need overlap-add. And I have created the DFT so its frequency channels are logarithmically located in order to align directly with the frequencies where harmonics are created by the notes on a guitar, saxophone, etc.
Now being retired, I have decided to release the source code for my pitch detection engine within a free demonstration app called PitchScope Player. PitchScope Player is available on the web, and you could download the executable for Windows to see my algorithm at work on a mp3 file of your choosing. The below link to GitHub.com will lead you to my full source code where you can view how I detect the harmonics with a custom Logarithmic DFT transform, and then look for partials (harmonics) whose frequencies satisfy the correct integer relationship which defines a 'pitch'.
My Pitch Detection Algorithm is actually a two-stage process: a) First the ScalePitch is detected ('ScalePitch' has 12 possible pitch values: {E, F, F#, G, G#, A, A#, B, C, C#, D, D#} ) b) and after ScalePitch is determined, then the Octave is calculated by examining all the harmonics for the 4 possible Octave-Candidate notes. The algorithm is designed to detect the most dominant pitch (a musical note) at any given moment in time within a polyphonic MP3 file. That usually corresponds to the notes of an instrumental solo. Those interested in the C++ source code for my Two-Stage Pitch Detection algorithm might want to start at the Estimate_ScalePitch() function within the SPitchCalc.cpp file at GitHub.com.
https://github.com/CreativeDetectors/PitchScope_Player
Below is the image of a Logarithmic DFT (created by my C++ software) for 3 seconds of a guitar solo on a polyphonic mp3 recording. It shows how the harmonics appear for individual notes on a guitar, while playing a solo. For each note on this Logarithmic DFT we can see its multiple harmonics extending vertically, because each harmonic will have the same time-width. After the Octave of the note is determined, then we know the frequency of the Fundamental.
I had a similar problem with microphone input on a project I did a few years back: it turned out to be due to a DC offset.
Make sure you remove any bias before attempting FFT or whatever other method you are using.
It is also possible that you are running into headroom or clipping problems.
Graphs are the best way to diagnose most problems with audio.
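A cheap way to remove such a bias is a one-pole DC-blocking filter applied before any analysis; a sketch, where the 0.995 pole is a typical but arbitrary choice:

    /* One-pole DC blocker: y[n] = x[n] - x[n-1] + R * y[n-1].
       With R close to 1.0 only the near-DC content is removed. */
    void dc_block(const float *in, float *out, int n)
    {
        const float R = 0.995f;
        float prev_x = 0.0f, prev_y = 0.0f;

        for (int i = 0; i < n; i++) {
            float y = in[i] - prev_x + R * prev_y;
            prev_x = in[i];
            prev_y = y;
            out[i] = y;
        }
    }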
Take a look at this sample application:
http://www.codeproject.com/KB/audio-video/SoundCatcher.aspx
I realize the app is in C# and you need C++, and I realize this is .NET/Windows and you're on a Mac... but I figured his FFT implementation might be a starting reference point. Try to compare your FFT implementation to his (his is the iterative, breadth-first version of the Cooley-Tukey FFT). Are they similar?
Also, the "random" behavior you're describing might be because you're grabbing data returned by your sound card directly without assembling the values from the byte array properly. Did you ask your sound card for 16-bit samples and then give it a byte array to store them in? If so, remember that two consecutive bytes in the returned array make up one 16-bit audio sample.
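That reassembly looks like the following sketch, assuming little-endian signed 16-bit samples, which is what most sound APIs return:

    #include <stdint.h>
    #include <stddef.h>

    /* Combine consecutive byte pairs from the capture buffer into 16-bit
       samples: low byte first, high byte second (little-endian). */
    void bytes_to_samples(const uint8_t *raw, int16_t *samples,
                          size_t n_samples)
    {
        for (size_t i = 0; i < n_samples; i++)
            samples[i] = (int16_t)(raw[2 * i] | (raw[2 * i + 1] << 8));
    }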
Java code for a real-time pitch detector is available at http://code.google.com/p/freqazoid/.
It works fairly well on any computer running post-2008 real-time Java. The project has been dropped and could be picked up by any interested party; contact me if you want further details.
Check out aubio, an open source library which includes several state-of-the-art methods for pitch tracking.
I have asked a similar question here:
C/C++/Obj-C Real-time algorithm to ascertain Note (not Pitch) from Vocal Input
EDIT:
Performous contains a C++ module for real-time pitch detection.
There is also the YIN pitch-tracking algorithm.
You can do real-time pitch detection, including of a singer's voice, with TarsosDSP:
https://github.com/JorenSix/TarsosDSP
just in case anyone hasn't heard of it yet :-)
Can you adapt anything from instrument tuners? My delightfully compact guitar tuner is able to detect the pitch of the strings pretty well. I see this reference to a piano tuner which explains an algorithm to some extent.
Here are some open source libraries that implement pitch detection:
WORLD : speech analysis/synthesis toolkit. This is especially suitable if your source signal is voice.
aubio : audio feature extraction library. Implements many pitch detection algorithms.
Pitch detection : a collection of pitch detection algorithms implemented in C++.
dywapitchtrack : a high quality pitch detection algorithm.
YIN : another implementation of the YIN algorithm in a single C++ source file.

Sound pressure display for WAVE PCM data

The digital sound is played using a DirectSound device. It is necessary to display the sound activity in decibels, like analog devices do.
What is the right way to calculate sound pressure from WAVE PCM data (44100 Hz, 16-bit)?
If you just need an "idea" of the sound pressure, you can simply compute the log-energy on some time frames of the signal: split the signal every N samples, compute 10*log10(sum(x_n^2)) where x_n are the N samples, and you get a value in the dB domain. If you need to display a precise measure (that is, your 0 dB matches, say, a mixing table's 0 dB), it is a bit more complicated.
See here for more details:
http://music.columbia.edu/pipermail/music-dsp/2002-April/048341.html
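A direct translation of that recipe, as a sketch (using the mean rather than the raw sum, so the value doesn't depend on the frame length; the result is relative to full scale, not a calibrated sound pressure level, and 16-bit PCM samples should be divided by 32768.0 first):

    #include <math.h>

    /* Log-energy of one frame of n samples, each scaled to [-1, 1]:
       10 * log10(mean(x^2)), i.e. dB relative to full scale (dBFS). */
    double frame_db(const float *x, int n)
    {
        double energy = 0.0;
        for (int i = 0; i < n; i++)
            energy += (double)x[i] * x[i];
        double mean_power = energy / n;
        if (mean_power < 1e-12)
            mean_power = 1e-12;        /* avoid log10(0) for silence */
        return 10.0 * log10(mean_power);
    }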
Sound pressure is a measure of force per unit area. To determine this you would have to have information about the speaker(s) on which the audio is played. You can obtain a decibel level with respect to an arbitrary reference (as opposed to the threshold of hearing) with the algorithm proposed by cournape.
Calculate the average signal power over a time interval, compute the base-10 logarithm, and multiply by 10. The average power is calculated by averaging the square of each sample over the interval. Note that both positive and negative values are necessary (i.e. it must be an AC signal), so make sure the PCM values are floating-point, 2's-complement, or correctly offset unsigned values.
Also, by applying Parseval's theorem to the Fourier transform you can generate signal levels for different frequency bands.
