Sound pressure display for WAVE PCM data - audio

The digital sound is played using a DirectSound device. I need to display the sound activity in decibels, the way analog level meters do.
What is the right way to calculate sound pressure from the WAVE PCM data (44100 Hz, 16-bit)?

If you just need an "idea" of the sound pressure, you can simply compute the log-energy over short time frames of the signal: split the signal every N samples, compute 10*log10(sum(xn**2)) where xn are the N samples, and you get a value in the dB domain. If you need to display a precise measure (that is, your 0 dB matches, say, a mixing desk's 0 dB), it is a bit more complicated.
See here for more details:
http://music.columbia.edu/pipermail/music-dsp/2002-April/048341.html
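A minimal Python sketch of that log-energy idea, assuming 16-bit mono PCM already unpacked into a sequence of integers (the 1024-sample frame size is an arbitrary choice):

```python
import math

def frame_levels_db(samples, frame_size=1024, full_scale=32768.0):
    """Return a rough dB level (relative to full scale) for each frame of PCM samples."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        # mean square of the normalised samples = average power of the frame
        power = sum((s / full_scale) ** 2 for s in frame) / frame_size
        # 10*log10(power) gives a dB value; guard against log(0) on silent frames
        levels.append(10.0 * math.log10(power) if power > 0 else float("-inf"))
    return levels
```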

Sound pressure is a measure of force per unit area. To determine this you would have to have information about the speaker(s) on which the audio is played. You can obtain a decibel level with respect to an arbitrary reference (as opposed to the threshold of hearing) with the algorithm proposed by cournape.
Calculate the average signal power over a time interval, take the base-10 logarithm, and multiply by 10. The average power is calculated by averaging the square of each sample over the interval. Note that both positive and negative sample values are necessary (i.e. it must be an AC signal), so make sure the PCM values are interpreted as floating-point, 2's complement or offset unsigned values accordingly.
Also, by applying Parseval's theorem and the Fourier transform, you can generate signal levels for different frequency bands.
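A hedged sketch of that last point: per-band levels can be obtained by summing the squared FFT magnitudes over the bins belonging to each band. The band edges below are arbitrary examples, not anything prescribed by the answer:

```python
import numpy as np

def band_levels_db(samples, sample_rate, bands=((20, 250), (250, 2000), (2000, 20000))):
    """Approximate dB level per frequency band, using an FFT of the whole buffer."""
    x = np.asarray(samples, dtype=np.float64) / 32768.0   # normalise 16-bit PCM
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    # Parseval: the total power of the signal equals the (scaled) sum of |X[k]|^2,
    # so summing |X[k]|^2 over a band gives that band's share of the power.
    power = (np.abs(spectrum) ** 2) / (len(x) ** 2)
    levels = []
    for lo, hi in bands:
        mask = (freqs >= lo) & (freqs < hi)
        band_power = power[mask].sum()
        levels.append(10.0 * np.log10(band_power) if band_power > 0 else float("-inf"))
    return levels
```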

Related

What's the actual data in a WAV file?

I'm following the Python Challenge riddles, and I now need to analyse a WAV file. I've learned there is a Python module that reads the frames, and that these frames are 16-bit or 8-bit.
What I don't understand is what these bits represent. Are these values directly transformed to a voltage applied to the speakers (say, via some scaling factor)?
The bits represent the voltage level of an electrical waveform at a specific moment in time.
To convert the electrical representation of a sound wave (an analog signal) into digital data, you sample the waveform at regular intervals.
Each sample is a number that represents the height of the analog signal at that point in time (the X axis being time, and the Y axis being voltage).
In .WAV files, these points are represented by 8-bit numbers (having 256 different possible values) or 16 bit numbers (having 65536 different possible values). The more bits you have in each number, the greater the accuracy of your digital sampling.
WAV files can actually contain all sorts of things, but it is most typically linear pulse-code modulation (LPCM). Each frame contains a sample for each channel. If you're dealing with a mono file, then each frame is a single sample. The sample rate specifies how many samples per second there are per channel. CD-quality audio is 16-bit samples taken 44,100 times per second.
These samples are actually measuring the pressure level for that point in time. Imagine a speaker compressing air in front of it to create sound, vibrating back and forth. For this example, you can equate the sample level to the position of the speaker cone.
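A small Python sketch of what those frames look like in practice, using the standard-library wave module on a hypothetical 16-bit mono file named example.wav:

```python
import wave
import struct

with wave.open("example.wav", "rb") as wav:          # hypothetical file name
    n_channels = wav.getnchannels()
    sample_width = wav.getsampwidth()                # bytes per sample: 1 (8-bit) or 2 (16-bit)
    frame_rate = wav.getframerate()                  # e.g. 44100 frames per second
    frames = wav.readframes(wav.getnframes())

# For 16-bit LPCM each sample is a signed little-endian short;
# for 8-bit it would be an unsigned byte centred on 128 instead.
if sample_width == 2:
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
else:
    samples = struct.unpack("%dB" % len(frames), frames)

print(n_channels, frame_rate, samples[:10])
```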

Calculate A-weighted (or B or C) SPL decibel iOS

How can I calculate A-weighted and C-weighted dB sound levels from the microphone on iOS?
Here is what I have tried, but the reading I get is far below the sound level meter I have next to my iPhone:
Using the Novocain library, which I have slightly modified to set the audio session mode to Measurement.
Using the Maximilian audio library to run the incoming audio frames through an FFT and convert the amplitudes into dB.
Using the Maximilian audio library's Octave Analyser to place the FFT output into octave bins from 10 Hz to 20480 Hz.
For each octave bin I apply the relevant dB-weighting gain (e.g. apply a -70 dB gain to the dB value stored in the 10 Hz bin to get an A-weighted value).
Adding the dB values of all the bins together by converting each bin's dB value (and its gain) to an amplitude, summing, and converting back to a dB value again.
Is this on the right track? I have my doubts. Can someone outline an approach, or suggest a library and/or other example (I have looked)?
To note – I would like approximate dB(A) and dB(C) values; this does not need to be scientific. I'm not sure how to compensate for the frequency response of the microphone. Could the above technique be correct if it also compensated for the microphone's response?
I don't think you can measure physical Sound Pressure Levels from a device. In step 2 you "convert the amplitudes into dB." However the amplitude that you record from the device has arbitrary units. When recording 16-bit audio, the audio is represented as numbers in the range -32768 to +32767. If you are working with floating point data then this is normalised by 32768 so that it has a range of (approximately) -1 to +1.
The device's microphone has to cope with a wide variety of sound levels. Generally devices will have some form of Automatic Gain Control which adapts to the current average sound level. This means that if you measure a peak value of 1.0 then you have no way of knowing the actual SPL it corresponded to. You can convert a recording to a series of dBs, but this uses a different definition of dB: as a power ratio. This has no correlation with SPL measurements such as dB(A).
It may be possible to produce an approximate dB(A) measurement if you are able to turn off the AGC and calibrate your device against your sound meter.
EDIT: JASA have published a paper with a detailed comparison of existing SPL measurement apps, including statistics comparing different generations of iPhone: Evaluation of smartphone sound measurement applications
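For what it's worth, the band-combination step described in the question can be sketched like this; the corrections below are the standard octave-band A-weighting values, and calibration_offset_db is a hypothetical constant that would have to be determined against a reference sound level meter (with AGC disabled), as the answer suggests:

```python
import math

# Standard octave-band A-weighting corrections (dB), centre frequency -> correction.
A_WEIGHTING = {
    31.5: -39.4, 63: -26.2, 125: -16.1, 250: -8.6, 500: -3.2,
    1000: 0.0, 2000: +1.2, 4000: +1.0, 8000: -1.1, 16000: -6.6,
}

def a_weighted_level(band_levels_db, calibration_offset_db=0.0):
    """Combine per-octave-band dB levels into a single approximate dB(A) figure.

    band_levels_db maps octave centre frequency -> measured level in dB
    (relative to whatever reference your analyser uses). The result is only
    meaningful as dB(A) SPL if calibration_offset_db has been determined
    against a real sound level meter.
    """
    total_power = 0.0
    for centre, level in band_levels_db.items():
        weighted = level + A_WEIGHTING.get(centre, 0.0)
        total_power += 10.0 ** (weighted / 10.0)   # dB -> power, then sum
    return 10.0 * math.log10(total_power) + calibration_offset_db
```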

How would I sample an audio track at the Nyquist frequency using C and a microcontroller?

I have made this as simple and as specific as I can, so please try to help me out.
By this, I mean I want to:
1) Input an audio track (analog).
2) Convert it to a digital signal using the microcontroller's ADC.
3) Have the microcontroller's/board's timer sample the data at selected intervals.
4) Tell the board to take the "sampled audio track" and sample it at a rate of 2B, where B is the highest frequency present. F = frequency in Hz (cycles per second), so the sampling period is T = 1/(2F). Example: if the highest frequency is 1000 Hz, then T = 1/(2*1000) = 1/2000 s = 5x10^-4 s, i.e. one sample every 0.5 ms.
5) Spit it back out through the board's ADC and convert it back to analog, so that the output is a perfect reconstruction of the initial audio track.
Using Fourier analysis I will determine the highest frequency present, and that will set the rate at which I sample the track.
In theory it sounds easy enough and straightforward, but what I need is to program this in C and use my MSP430 chip/experimenter's board to sample the track.
I'm going to be using Texas Instruments CCS and Octave for my programming and debugging. This is the board that I will be using.
Questions:
Is C the right language for this? Can I get any examples of how to sample the track at the Nyquist rate using C? What code in C will tell the board to use the ADC component? And any recommended information that is similar or that will help me with this project?
I don't fully understand what you want to do, but I'll answer your specific questions.
Yes, C is the right language for this.
You should probably look at application code on the Texas Instruments website to see how to interact with the ADC. You can start with the example code listed at the bottom of the page you linked to. It has C code that shows how to use the ADC.
Incidentally, an ADC only converts analog to digital. To go digital to analog, you need a DAC, which this board does not appear to have.
5) An ADC doesn't do digital-to-analog conversion; it's an ADC, not a DAC. But you can use PWM with a low-pass filter to output an analog signal.
It is often a bad idea to sample a signal at exactly the Nyquist rate. This causes a lot of aliasing near the top of the band: for example, a component at frequency F-deltaF, where deltaF is small, will look like a tone at F amplitude-modulated at 2*deltaF.
That's why the CD sampling rate is 44.1 kSPS rather than 30 kSPS (twice the 15 kHz upper frequency limit).
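A small numpy sketch of that effect (an illustration added here, not part of either answer): sampling a tone sitting just below half the sample rate produces samples whose envelope beats at twice the offset.

```python
import numpy as np

fs = 2000.0                    # sample rate: exactly twice a 1000 Hz bandwidth
delta = 5.0                    # the tone sits 5 Hz below fs/2
f = fs / 2 - delta
n = np.arange(2000)            # one second of samples

x = np.cos(2 * np.pi * f * n / fs)

# Algebraically x[n] == (-1)**n * cos(2*pi*delta*n/fs): it looks like a tone at
# fs/2 whose envelope beats at 2*delta Hz, the artefact described above.
envelope = np.abs(x)
print(envelope[:10])
```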
You have to sample the signal at a frequency at least twice as high as the highest frequency in your signal, otherwise you get aliasing effects (distortion of the original signal). It is not possible to determine the highest frequency in your signal with Fourier analysis, because to perform an FFT you first have to convert your analog signal to digital values at some conversion frequency - which is exactly what you are trying to determine with the FFT.
The highest frequency in your input signal is defined by the analog (anti-aliasing) filter that the signal must pass through before analog-to-digital conversion.

Basic unit of Sound?

If we consider computer graphics to be the art of image synthesis, where the basic unit is a pixel, what is the basic unit of sound synthesis?
[This relates to programming as I want to generate this via a computer program.]
Thanks!
The basic unit is a sample.
In a WAVE file, a sample is just an integer specifying where to move the speaker cone to.
The sample rate determines how often a new sample is fed to the speakers (I'm not entirely sure how this part works, but it does get converted to an analog signal first). The samples are typically laid out in the file one right after another.
When you plot all the samples with x-axis being time and y-axis being sample_value, you can see the waveform.
In a WAV file, samples can (theoretically) be of any bit depth (the bits-per-sample field is a 16-bit value, so up to 65535 bits), and that depth remains constant throughout the file. Typically 16 or 24 bits are used.
Computer graphics can also have vector shapes as basic units, not just pixels. Generally, vector graphics are generated via computer tools while captured data tends to appear as a grid of pixels (corresponding to an array of sensors in a camera or other capture device). Obviously there is considerable crossover between those classifications.
Similarly, there are sampled (such as .WAV) and generative (such as .MIDI) forms of computer audio. In the sampled case, the smallest unit is a single sample. Just as an array of pixels in the brightness, x- and y-dimensions comes together to form an image, an array of samples in the loudness and time dimensions comes together to form a sound. In the generative case, it will be something more like a single tone rendered in a particular voice, just like vector graphics have paths drawn with particular textures.
A pixel can have a value and be encoded in digital bitmap samples. The same properties apply to sound and digital audio samples.
A pixel is a physical device that can only render the amplitudes of 3 frequencies of light (Red, Green, Blue) at a time. A speaker is a physical device that can render the amplitudes of a wide range of frequencies (~40,000) at a time. The bit resolution of a sample (the number of bits used to store the value of a sample) mainly determines how many colors/tones can be rendered - the fidelity of the physical playback device.
Also, as patterns of pixels can be encoded or compressed, most patterns of sound samples are also encoded or compressed (or both).
The fundamental unit of signal processing (of which audio is a special case) would be the sample.
The frequency at which you need to sample a signal depends on the maximum frequency present in the waveform. The sampling theorem states that it is normally sufficient to sample at twice the maximum frequency present in the signal.
http://en.wikipedia.org/wiki/Sampling_theorem
The human ear is sensitive to sounds up to around 20kHz (the upper limit falls with age). This is why music on CD is sampled at 44.1kHz.
It is often more useful to think of music as being comprised of individual frequencies.
http://www.phys.unsw.edu.au/jw/sound.spectrum.html
Most sound analysis and creation is based on this idea.
Related concepts:
Psychoacoustics: Human perception of sound. Relates to modern sound compression techniques such as mp3.
Fourier series: How complex waveforms are composed of individual frequencies.
I would say the basic unit of sound synthesis is the sine wave. But your definition of synthesis is perhaps different from what audio people would refer to as sound synthesis. Sound synthesis is the creation of sound using the fundamental components of sound.
With sine waves, we can synthesise sounds using many techniques such as subtractive synthesis, additive synthesis or FM synthesis.
Fourier theory states that every sound is a summation of sine waves of differing phases, frequencies and amplitudes.
OK, so how do we represent a sine wave on a computer? Well, a sine wave is generated using a buffer (array) of 'samples' that have been produced by a function or read from a table. The same technique applies to any sound captured on a computer.
A 'sample' is typically represented as a number between -1 and 1 that directly correlates to the amplitude of the sound at a given moment in time. A typical sound recorded at 16-bit depth would have 65536 (2^16) possible amplitude values. When recording, a sample is typically captured 44,100 times per second of sound. This is called the sampling frequency, or simply the sample rate.
Upon playback from your computer, each sample passes through a digital-to-analogue converter and generates a vibration in your PC speaker, which in turn causes your ear to perceive the recorded sound.
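A minimal Python sketch of that idea: filling a buffer with samples of a 440 Hz sine wave at a 44.1 kHz sample rate (the frequency, amplitude and duration are arbitrary choices):

```python
import math

sample_rate = 44100          # samples per second
frequency = 440.0            # pitch of the tone in Hz
duration = 1.0               # seconds
amplitude = 0.5              # between -1 and 1, leaving some headroom

# One sample per time step: amplitude * sin(2*pi*f*t), with t = n / sample_rate
buffer = [
    amplitude * math.sin(2 * math.pi * frequency * n / sample_rate)
    for n in range(int(sample_rate * duration))
]

# To write this to a 16-bit WAV you would scale each value by 32767
# and pack it as a signed short.
```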
Sound can be expressed as several different units, but the most common in synthesis/computer music is decibels (dB), which are a relative logarithmic measure of amplitude. Specifically they are normally relative to the maximum amplitude of the audio system.
When measuring sound in "real life", the units are normally A-weighted Decibels or dB(A).
The frequency of a sound (i.e. its pitch) is determined by how quickly its amplitude varies over time, or in the digital world, over samples. The number of samples per unit of real time is called the sampling rate; conventional hi-fi systems use a sampling rate of 44.1 kHz (44,100 samples per second) and synthesis/recording software usually supports up to 96 kHz.
Every sound in the digital domain can be represented as a waveform, with the X-axis representing time (or sample number) and the Y-axis representing amplitude.
The frequency and amplitude of the wave are what make up a sound.
That is for a single tone.
Music, or for that matter most noise, is a composite of multiple simultaneous sound waves superimposed on one another.
The unit for amplitude is the bel (we use tenths of a bel, hence the term decibel). The unit for frequency is the hertz.
That being said, synthesis of music is a large field.
Bitmapped graphics are based on sampling the amplitude of light in a 2D space, where each sample is digitized to a given bit depth and often converted to a logarithmic representation at a different bit depth. The samples are always positive, since you can't be darker than pure black. Each of these samples is called a pixel.
Sound recording is most often based on sampling the magnitude of sound pressure at a microphone, where the samples are taken at constant time intervals. These samples can be positive or negative with respect to perfect silence. Most often these samples are not converted to a logarithm, even though sound is perceived in a logarithmic fashion just as light is. There is no special term to refer to these samples as there is with pixels.
The Bels and Decibels mentioned by others are useful in the context of measuring peak or average sound levels. They are not used to describe the individual sound samples.
You might also find it useful to know how sound file formats compare to image file formats. WAVE is an uncompressed format specific to Windows and is analogous to BMP. MP3 is a lossy compression analogous to JPEG. FLAC is a lossless compression analogous to 24-bit PNG.
If computer graphics are colored dots in 2 dimensional space representing a 3 dimensional space, then sound synthesis is amplitude values regularly partitioned in time representing musical events.
If you want your result to sound like music (the kind of music most people like at least), then you are either going to use some standard synthesis techniques, or literally waste decades of your life reinventing them from scratch.
The most basic techniques are additive synthesis, in which the individual elements are the frequencies, amplitudes, and phases of sine oscillators; subtractive synthesis, where you work with filter coefficients and a complex input waveform; frequency modulation synthesis, where you work with modulation depths and rates of stages of modulation; granular synthesis where short (hundredths to tenths of a second long) enveloped pieces of a recorded sound or an artificial waveform are combined in immense numbers. Each of these in practice uses parameters that evolve over the course of a note, and often you will mix elements of various techniques into a larger instrument.
I recommend this book, though it doesn't have the math for many concepts it at least lays the ground for the concepts used, and gives a nice overview of the techniques.
You wouldn't waste your time going sample by sample to do music in practice any more than you would waste your time going pixel by pixel to render 3d (in other words yeah go sample by sample if making a tool for other people to make music with, but that is way too low a level if you are interested in the task of making music).
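As a sketch of the first of those techniques, additive synthesis sums a handful of sine oscillators, each with its own frequency, amplitude and phase; the partials below are arbitrary example values:

```python
import math

sample_rate = 44100
duration = 1.0

# Each partial: (frequency in Hz, amplitude, phase in radians) - example values only
partials = [(220.0, 0.5, 0.0), (440.0, 0.25, 0.0), (660.0, 0.125, math.pi / 2)]

samples = []
for n in range(int(sample_rate * duration)):
    t = n / sample_rate
    # The output at each instant is simply the sum of all the oscillators
    value = sum(a * math.sin(2 * math.pi * f * t + p) for f, a, p in partials)
    samples.append(value)
```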
Probably the envelope. A tone/note has a shape described by attack, decay, sustain and release.
The byte, or word, depending on the bit-depth of the sound.

Explain the FFT to me

I want to take audio PCM data and find peaks in it. Specifically, I want to return the frequency and time at which a peak occurs.
My understanding of this is that I have to take the PCM data and dump it into an array, setting it as the real values with the imaginary parts set to 0. I then take the FFT, and I get an array back. If each number in the array is a magnitude value, how do I get the frequency associated with each one? Also, do I take the magnitude from the real & imaginary parts, or just discard the imaginary values?
Finally, if I wanted to find the peaks in a single song, do I just set a small window to FFT and slide it across all of the audio? Any suggestions on how large that window should be?
If the samplerate of your PCM data is F, then the highest frequency component in the FFT is F/2. Suppose your PCM data was sampled at 44100Hz, then your FFT values will run from 0Hz (DC) to 22050Hz. If you start with N samples, (N being a power of 2), then the FFT may return N/2 values representing all positive frequencies from 0 to F/2, or it may return N values that also include the negative frequencies from -F/2 to 0. You should check the specification of your FFT algorithm to find out to which frequency each array item is mapped.
To find the peaks, you need to look at the magnitude of the FFT values. So you need to add the squared real and imaginary parts of each complex value.
Suppose your FFT of N PCM samples returns N/2 complex values representing the positive frequencies. Then the spacing between two adjacent frequency bins is F/N Hz. With F=44100Hz and N=1024 samples, this is about 43Hz. This is your frequency resolution. If you need to resolve lower-frequency beats, the FFT window will need to be longer.
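A short numpy sketch of that recipe, assuming mono floating-point samples in a hypothetical array pcm and a sample rate fs: take the FFT of one window, compute the magnitudes, and map the strongest bin back to a frequency.

```python
import numpy as np

def peak_frequency(pcm, fs, window_size=1024, offset=0):
    """Return (frequency_hz, magnitude) of the strongest bin in one window."""
    frame = np.asarray(pcm[offset:offset + window_size], dtype=np.float64)
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))   # window to reduce leakage
    magnitudes = np.abs(spectrum)                            # sqrt(re^2 + im^2) per bin
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)          # bin k maps to k*fs/N Hz
    k = int(np.argmax(magnitudes[1:]) + 1)                   # skip the DC bin
    return freqs[k], magnitudes[k]

# The time of the peak is simply offset / fs seconds; sliding the offset across
# the whole track and repeating gives peaks over time (i.e. a spectrogram view).
```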
Well, suppose we take a buffer of 512 samples of the input wave, treat them as complex numbers with the imaginary parts set to zero (as is usual for purely real input), and pass the array to the FFT, with a sample rate of 8192 Hz.
We get back 512 complex FFT values, and the magnitude of each one tells us something useful.
To get the frequency resolution we divide the sample rate by the buffer size:
8192/512 = 16;
16 Hz is the resolution of the FFT output, which means each bin reports the strength of the input near a multiple of 16 Hz.
For example, if the input contains components at 3, 48, 23 and 128 Hz with amplitudes 10, 5, 12 and 8 dB (ref = 1), then after the FFT the 3 Hz component shows up mostly in the 0 Hz bin, the 23 Hz component is spread between the 16 Hz and 32 Hz bins, and the 48 Hz and 128 Hz components fall exactly on bins and appear there at roughly their true amplitudes.
"FFT is frequency domain" means the result is arranged by frequency. The time domain, on the other hand, is arranged by time: we listen to music from second zero to second N. The FFT output is instead arranged by frequency, in ascending order from 0 Hz up to half the sample rate.
The FFT does not see the continuous signal (which has effectively infinitely many values); instead, samples are taken from the audio every 1/(sample rate) seconds and buffered (in our case 512 at a time), and each buffer of 512 samples fed to the FFT yields 512 output values. Because the output is indexed by frequency, the time ordering of the samples inside the buffer is no longer visible.
Power is therefore reported every 16 Hz, and the power of a component that falls between bins is spread over the nearest bins according to how close it is to each index.
Higher resolution can be achieved by using a larger buffer (more samples per FFT), since the resolution is the sample rate divided by the buffer size.
To display the spectrum we print each index in ascending order together with its amplitude:
Amplitude = 20*log10(magnitude/ref)
The amplitude printed next to each index shows the power of that frequency, and it gets more accurate as the resolution improves.
In conclusion, the FFT produces an array of amplitudes, each expressing the power of the signal at its corresponding index (frequency).
You may actually be looking for a spectrogram, which is basically an FFT of the data in a small window that's slid along the time axis. If you have software that implements this, it might save you some effort. It's what's commonly used for analysing time varying acoustic signals, and is a very useful way to look at sounds. Also, there are some tricks, for example, with windowing data for FFTs, that the spectrogram will probably get right, but will be harder (though not very hard) for you to do correctly.
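If scipy is available, a hedged sketch of that approach might look like this (the window parameters are arbitrary examples):

```python
import numpy as np
from scipy import signal

# pcm: 1-D array of mono samples, fs: sample rate in Hz (placeholders here)
fs = 44100
pcm = np.random.randn(fs * 2)          # two seconds of noise, just for illustration

# A spectrogram is an FFT taken in short overlapping windows slid along the signal;
# scipy applies a window function for you, which handles the "tricks" mentioned above.
freqs, times, Sxx = signal.spectrogram(pcm, fs=fs, nperseg=1024, noverlap=512)

# Find the strongest time/frequency cell: its frequency and time of occurrence.
f_idx, t_idx = np.unravel_index(np.argmax(Sxx), Sxx.shape)
print("peak at %.1f Hz, %.3f s" % (freqs[f_idx], times[t_idx]))
```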
