How can I calculate A-weighted and C-weighted dB sound levels from the microphone on iOS?
Here is what I have tried, but the reading I get is far below that of the sound level meter I have next to my iPhone:
1. Using the Novocaine library, which I have slightly modified to set the audio session mode to Measurement.
2. Using the Maximilian audio library to run the incoming audio frames through an FFT and converting the amplitudes into dB.
3. Using the Maximilian audio library's Octave Analyser to place the FFT output into octave bins from 10 Hz to 20480 Hz.
4. For each octave bin, applying a dB gain taken from the relevant weighting curve (e.g. applying a -70 dB gain to the dB value stored in the 10 Hz bin to get an A-weighted value).
5. Adding the dB values of the bins together: converting each bin's dB value, and its gain, to an amplitude, performing the addition, and converting the result back to dB.
Is this on the right track? I have my doubts. Can someone outline an approach, or suggest a library and/or another example? (I have looked.)
To note: I would like approximate dB(A) and dB(C) values; this does not need to be scientific. I am not sure how to compensate for the frequency response of the microphone. Could the above technique be correct if it also compensated for the microphone's response?
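The last step of the approach above adds amplitudes. Band levels from an FFT are usually treated as uncorrelated, in which case the conventional way to combine them is to add their powers (10^(L/10)) rather than their amplitudes. A minimal sketch, assuming you already have one dB level per octave band and the matching weighting correction for each band (such as the -70 dB A-weighting at 10 Hz mentioned above); the function name is hypothetical, and the result is still only relative to full scale unless the microphone chain is calibrated.

```python
import math

def combine_weighted_bands(band_levels_db, weights_db):
    """Sum per-band levels energetically after applying a weighting curve.

    Each band's level plus its weighting correction (both in dB) is turned
    into a power ratio with 10**(L/10); the powers are added and the total
    is converted back to dB.
    """
    if len(band_levels_db) != len(weights_db):
        raise ValueError("need one weighting correction per band")
    total_power = sum(10 ** ((level + weight) / 10)
                      for level, weight in zip(band_levels_db, weights_db))
    return 10 * math.log10(total_power)

# Two equal, unweighted bands add up to about +3 dB, not +6 dB.
print(round(combine_weighted_bands([60.0, 60.0], [0.0, 0.0]), 2))  # 63.01
```

Power summation is what makes two equal bands come out 3 dB louder than one; amplitude summation would give 6 dB, which overstates the total for uncorrelated bands.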
I don't think you can measure physical Sound Pressure Levels from a device. In step 2 you "convert the amplitudes into dB." However the amplitude that you record from the device has arbitrary units. When recording 16-bit audio, the audio is represented as numbers in the range -32768 to +32767. If you are working with floating point data then this is normalised by 32768 so that it has a range of (approximately) -1 to +1.
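To make the "arbitrary units" point concrete: the normalization step is nothing more than a fixed scale factor, so the resulting numbers carry no information about pascals. A minimal sketch, assuming little-endian signed 16-bit PCM; the function name is illustrative.

```python
import struct

def int16_bytes_to_floats(raw):
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    count = len(raw) // 2                     # ignore a trailing odd byte, if any
    samples = struct.unpack("<%dh" % count, raw[:2 * count])
    return [s / 32768.0 for s in samples]     # -32768..32767 -> -1.0..0.99997

print(int16_bytes_to_floats(struct.pack("<3h", -32768, 0, 16384)))  # [-1.0, 0.0, 0.5]
```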
The device's microphone has to cope with a wide variety of sound levels. Generally, devices will have some form of Automatic Gain Control (AGC) which adapts to the current average sound level. This means that if you measure a peak value of 1.0, you have no way of knowing the actual SPL it corresponded to. You can convert a recording to a series of dB values, but this uses a different definition of dB: a power ratio relative to full scale. This has no fixed relationship to SPL measurements such as dB(A).
It may be possible to produce an approximate dB(A) measurement if you are able to turn off the AGC and calibrate your device against your sound meter.
EDIT: JASA has published a paper with a detailed comparison of existing SPL measurement apps, including statistics comparing different generations of iPhone: "Evaluation of smartphone sound measurement applications".
I have an FFT output from a microphone and I want to detect a specific animal's howl from that (it howls in a characteristic frequency spectrum). Is there any way to implement a pattern recognition algorithm in Arduino to do that?
I already have the FFT part of it working with 128 samples at a 2 kHz sampling rate.
Look up audio fingerprinting. Essentially, you probe the frequency-domain output of the FFT call and take a snapshot of the range of frequencies together with the magnitude of each frequency. Then you compare this snapshot between the known animal signal and the unknown signal, and output a measurement of their differences.
Naturally, this difference will approach zero when the unknown signal is your actual known signal.
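A minimal PC-side sketch of the snapshot comparison, assuming the magnitude spectra are compared after scaling away overall loudness and that Euclidean distance is an acceptable difference measure (both are my choices, not something audio fingerprinting prescribes). The naive DFT stands in for your existing FFT; on an Arduino you would use the FFT output you already have and fixed-size arrays. All names are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Magnitudes of DFT bins 0..N/2 (a naive DFT standing in for your FFT)."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def spectral_distance(known, unknown):
    """Euclidean distance between two spectra after scaling each to unit length.

    Scaling removes overall loudness, so only the spectral shape is compared.
    0.0 means identical shapes; larger values mean more different.
    """
    def unit(spec):
        norm = math.sqrt(sum(m * m for m in spec)) or 1.0
        return [m / norm for m in spec]
    a, b = unit(known), unit(unknown)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

FS, N = 2000, 128  # 2 kHz sampling rate and 128-sample frames, as in the question

def tone(freq_hz):
    return [math.sin(2 * math.pi * freq_hz * t / FS) for t in range(N)]

known = magnitude_spectrum(tone(500))
print(spectral_distance(known, magnitude_spectrum(tone(500))))  # 0.0
print(spectral_distance(known, magnitude_spectrum(tone(250))))  # close to 1.414
```

A detection threshold on this distance would have to be tuned on real recordings of the howl.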
Here is another layer: for better fidelity, instead of performing a single FFT over all the available audio, do many FFT calls, each on a subset of the samples, sliding this window of samples further into the audio clip for each call. Say your audio clip is 2 seconds long, but you only ever send 200 milliseconds' worth of samples into each FFT call: this gives you at least 10 FFT result sets instead of the single one you would get by gulping the entire clip. This adds time specificity, an additional dimension from which to derive a richer difference between the known and unknown signals. Experiment to see whether it helps to slide the window by just a fraction of its length instead of lining up each window end to end.
To be explicit: you have a range of frequencies spread across the X axis and the points in time (one per position of the sliding sample window described above) along the Y axis, with a magnitude value for each frequency at each point in time. So now you have a two-dimensional grid of data points.
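The sliding-window idea produces what is usually called a spectrogram. A minimal sketch, assuming a Hann taper on each window (my addition, to reduce leakage between bins) and a naive DFT so it runs with the standard library only; the names and the 50 % overlap are illustrative.

```python
import cmath
import math

def spectrogram(samples, window, hop):
    """Magnitude spectra of successive windows: rows are time, columns are frequency bins.

    A hop smaller than `window` makes the windows overlap (sliding "just a tad").
    """
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / window) for i in range(window)]
    grid = []
    for start in range(0, len(samples) - window + 1, hop):
        frame = [samples[start + i] * hann[i] for i in range(window)]
        grid.append([abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / window)
                             for t in range(window)))
                     for k in range(window // 2 + 1)])
    return grid

fs = 2000
clip = [math.sin(2 * math.pi * 500 * t / fs) for t in range(fs // 2)]  # 0.5 s, 500 Hz
grid = spectrogram(clip, window=128, hop=64)   # 50 % overlap
print(len(grid), "time steps x", len(grid[0]), "frequency bins")
```

Two such grids (known and unknown) can then be compared cell by cell, for example with the same distance measure used for single snapshots.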
Again, to increase your confidence, you will want to perform all of the above across several different audio clips of your known animal's howl against each of your unknown signals, so now you have a three-dimensional parameter landscape. Each additional dimension you can muster gives more traction, and hence more accurate results.
Start with easily distinguished known audio against a very different unknown audio: say a 50 Hz sine tone as the known signal against an 8000 Hz sine wave as the unknown. Then try a single strum of a guitar as the known signal and, say, a trumpet as the unknown. Then progress to using actual audio clips.
Audacity is an excellent free audio workhorse: it easily plots a WAV file to show its time-domain signal or FFT spectrogram. Sonic Visualiser is also a top-shelf tool to use.
This is not a simple silver bullet, but each layer you add to your solution can give you better results. It is a process you are crafting, not a single-dimensional trigger to squeeze.
In the video The Sound of Hydrogen (original here), the sound is created by taking data from the NIST Atomic Spectra Database and then importing this edited data into Mathematica to modulate a sine wave. I was wondering how he turned the data from the website into the values shown in the video (at 3:47, top of the page), because it is nothing like what is initially seen on the website.
Short answer: It's different because in the tutorial the sampling rate is 8 kHz while it's probably higher in the original video.
Long answer:
First of all, note how the Rydberg formula provides the resonance frequencies of hydrogen as $\nu_{nm} = c R \left(\frac1{n^2}-\frac1{m^2}\right)$, where $c$ is the speed of light and $R$ the Rydberg constant. The highest frequency is $\nu_{1\infty}\approx 3000$ THz, while for $n,m\to\infty$ there is basically no lower limit; if you restrict yourself to the Lyman series ($n=1$) and the Balmer series ($n=2$), the lower limit is $\nu_{23}\approx 400$ THz. These are electromagnetic frequencies corresponding to light, though not all of them are in the visible spectrum (roughly 430–790 THz): there's some IR and lots of UV in there, which you cannot see. "minutephysics" now simply treats these frequencies as sound frequencies remapped to the human hearing range (ca. 20-20000 Hz).
But as the video stated, not all these frequencies resonate with the same strength, and the data at http://nist.gov/pml/data/asd.cfm also includes the amplitudes. For the frequency $\nu_{nm}$, let's call the intensity $I_{nm}$ (intensity is amplitude squared; I wonder whether the video treated that correctly). Then your signal is simply
$f(t) = \sum\limits_{n=1}^N \sum\limits_{m=n+1}^M I_{nm}\sin(2\pi\,\alpha(\nu_{nm})\,t+\phi_{nm})$
where $\alpha$ denotes the frequency rescaling (probably something linear like $\alpha(\nu) = (20 + (\nu-400\cdot10^{12})\cdot\frac{20000-20}{(3000-400)\cdot 10^{12}})$ Hz) and the optional phase $\phi_{nm}$ is probably equal to zero.
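A minimal sketch of that sum of sines, assuming the linear mapping $\alpha$ above, zero phases, and equal placeholder intensities (real $I_{nm}$ values would come from the NIST database, which is not reproduced here). Because the Lyman limit is really about 3290 THz rather than 3000 THz, the highest lines land slightly above 20 kHz, so a 48 kHz sampling rate is assumed to keep them below the Nyquist frequency. All function names are hypothetical.

```python
import math

C = 299_792_458.0         # speed of light, m/s
RYDBERG = 10_973_731.568  # Rydberg constant, 1/m

def line_frequency(n, m):
    """Rydberg formula: nu_nm = c R (1/n^2 - 1/m^2), in Hz."""
    return C * RYDBERG * (1 / n**2 - 1 / m**2)

def alpha(nu):
    """Guessed linear map from 400..3000 THz onto 20..20000 Hz."""
    return 20 + (nu - 400e12) * (20000 - 20) / ((3000 - 400) * 1e12)

def hydrogen_signal(duration_s, rate=48_000, n_max=2, m_max=10, intensity=None):
    """f(t) = sum_nm I_nm sin(2 pi alpha(nu_nm) t), with all phases phi_nm = 0.

    `intensity(n, m)` is a placeholder for I_nm; the default weights every
    line equally.
    """
    if intensity is None:
        intensity = lambda n, m: 1.0
    lines = [(alpha(line_frequency(n, m)), intensity(n, m))
             for n in range(1, n_max + 1) for m in range(n + 1, m_max + 1)]
    return [sum(i_nm * math.sin(2 * math.pi * f * t / rate) for f, i_nm in lines)
            for t in range(round(duration_s * rate))]

signal = hydrogen_signal(0.01)
peak = max(abs(x) for x in signal)
normalized = [x / peak for x in signal]   # scale to [-1, 1] before writing a file
```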
Why does it sound slightly different? Probably because the original video used a higher sampling rate than the 8 kHz used in the tutorial video.
If we consider computer graphics to be the art of image synthesis, where the basic unit is a pixel, what is the basic unit of sound synthesis?
[This relates to programming as I want to generate this via a computer program.]
The basic unit is a sample.
In a WAVE file, the sample is just an integer specifying where to move the speaker cone to.
The sample rate determines how often a new sample is fed to the speakers (I'm not entirely sure how this part works, but it does get converted to an analog signal first). The samples are typically laid out in the file one right after another.
When you plot all the samples with x-axis being time and y-axis being sample_value, you can see the waveform.
In a WAVE file, the bits-per-sample field can (theoretically) hold any value from 0 to 65535, and the bit depth remains constant throughout the file. In practice, 16 or 24 bits are typically used.
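To make "a sample is just an integer" concrete, here is a small round trip through Python's standard `wave` module: write a short sine tone as 16-bit mono PCM into memory, then read the integers back. The 16-bit mono restriction and the function names are assumptions of this sketch.

```python
import io
import math
import struct
import wave

def sine_wav_bytes(freq=440.0, rate=44_100, seconds=0.01):
    """Build a mono, 16-bit PCM WAVE file in memory containing a sine tone."""
    n = round(rate * seconds)
    samples = [round(32767 * math.sin(2 * math.pi * freq * t / rate)) for t in range(n)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # bytes per sample: 2 bytes = 16 bits
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % n, *samples))  # little-endian int16
    return buf.getvalue()

def read_samples(wav_bytes):
    """Return (sample rate, bits per sample, list of integer samples)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        if r.getsampwidth() != 2 or r.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono")
        raw = r.readframes(r.getnframes())
        values = struct.unpack("<%dh" % (len(raw) // 2), raw)
        return r.getframerate(), 8 * r.getsampwidth(), list(values)

rate, bits, samples = read_samples(sine_wav_bytes())
print(rate, bits, len(samples), samples[:5])
```

Plotting `samples` against their index gives exactly the waveform picture described above.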
Computer graphics can also have vector shapes as basic units, not just pixels. Generally, vector graphics are generated via computer tools while captured data tends to appear as a grid of pixels (corresponding to an array of sensors in a camera or other capture device). Obviously there is considerable crossover between those classifications.
Similarly, there are sampled (such as .WAV) and generative (such as .MIDI) forms of computer audio. In the sampled case, the smallest unit is a single sample. Just as an array of pixels, each holding a brightness at an x and y position, comes together to form an image, an array of samples, each holding a loudness at a point in time, comes together to form a sound. In the generative case, it will be something more like a single tone rendered in a particular voice, just as vector graphics have paths drawn with particular textures.
A pixel can have a value and be encoded in digital bitmap samples. The same properties apply to sound and digital audio samples.
A pixel is a physical device that can only render the amplitudes of 3 frequencies of light (red, green, blue) at a time. A speaker is a physical device that can render the amplitudes of a wide range of frequencies (~40,000) at a time. The bit resolution of a sample (the number of bits used to store its value) mainly determines how many distinct levels can be represented (shades of color for a pixel, amplitude steps for a sound sample), and hence the fidelity of playback.
Also, as patterns of pixels can be encoded or compressed, most patterns of sound samples are also encoded or compressed (or both).
The fundamental unit of signal processing (of which audio is a special case) would be the sample.
The rate at which you need to sample a signal depends on the maximum frequency present in the waveform. The sampling theorem states that a band-limited signal can be reconstructed exactly if it is sampled at more than twice the maximum frequency present in it.
http://en.wikipedia.org/wiki/Sampling_theorem
The human ear is sensitive to sounds up to around 20 kHz (the upper limit lowers with age). This is why music on CD is sampled at 44.1 kHz.
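Why the sampling rate has to exceed twice the highest frequency: a tone above half the sampling rate is not simply lost, it folds back ("aliases") to a lower frequency that was never in the original sound. A small sketch for a pure real tone; the helper name is hypothetical.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a pure real tone at f_hz after sampling at fs_hz."""
    f_mod = f_hz % fs_hz               # samples cannot distinguish f from f + k*fs
    return min(f_mod, fs_hz - f_mod)   # fold into the range 0..fs/2

print(alias_frequency(30_000, 44_100))  # 14100: above fs/2, folds back
print(alias_frequency(20_000, 44_100))  # 20000: below fs/2, unchanged
```

This is why recording chains low-pass filter the signal below half the sampling rate before converting it.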
It is often more useful to think of music as being comprised of individual frequencies.
http://www.phys.unsw.edu.au/jw/sound.spectrum.html
Most sound analysis and creation is based on this idea.
Related concepts:
Psychoacoustics: Human perception of sound. Relates to modern sound compression techniques such as mp3.
Fourier series: How complex waveforms are composed of individual frequencies.
I would say the basic unit of sound synthesis is the sine wave. But your definition of synthesis is perhaps different from what audio people would call sound synthesis. Sound synthesis is the creation of sound using the fundamental components of sound.
With sine waves, we can synthesise sounds using many techniques, such as subtractive synthesis, additive synthesis or FM synthesis.
Fourier theory states that every sound is a summation of sine waves of differing phases, frequencies and amplitudes.
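As a small illustration of that statement (and of additive synthesis), a square wave can be approximated by summing its odd sine harmonics with amplitudes 1/k. A sketch with an illustrative function name; the 48 kHz rate is chosen so that every harmonic used stays below the Nyquist frequency.

```python
import math

def additive_square(freq, rate, n_samples, n_harmonics):
    """Approximate a square wave by additive synthesis of odd sine harmonics.

    Fourier series: square(t) = (4/pi) * sum over odd k of sin(2*pi*k*freq*t) / k
    """
    odd = range(1, 2 * n_harmonics, 2)          # k = 1, 3, 5, ...
    return [4 / math.pi * sum(math.sin(2 * math.pi * k * freq * t / rate) / k for k in odd)
            for t in range(n_samples)]

square = additive_square(freq=100, rate=48_000, n_samples=480, n_harmonics=50)  # one period
print(round(square[120], 2), round(square[360], 2))  # close to 1 and -1
```

With a finite number of harmonics the result ripples and overshoots near the jumps (the Gibbs phenomenon); adding more harmonics narrows the ripple but does not remove the overshoot.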
OK, so how do we represent a sine wave on a computer? Well, a sine wave is generated as a buffer (array) of 'samples' that have been computed by a function or read from a table. The same representation applies to any sound captured on a computer.
A 'sample' is typically represented as a number between -1 and 1 that directly corresponds to the amplitude of a sound at a given moment in time. A typical sound recorded at 16-bit depth has 65536 (2^16) possible amplitude values. When recording, typically 44,100 samples are captured per second of sound. This is called the sampling frequency, or simply the sample rate.
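A sketch of the "read from a table" approach: a wavetable oscillator that steps through a precomputed sine table at a rate set by the desired frequency, followed by quantization to the 65536 levels of 16-bit PCM. The 1024-entry table, the truncating lookup (no interpolation) and the names are assumptions of this sketch.

```python
import math

TABLE_SIZE = 1024
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def wavetable_sine(freq, rate, n_samples):
    """Read a sine wave out of a lookup table (truncating lookup, no interpolation)."""
    phase, step = 0.0, freq * TABLE_SIZE / rate   # table entries advanced per sample
    out = []
    for _ in range(n_samples):
        out.append(SINE_TABLE[int(phase) % TABLE_SIZE])
        phase = (phase + step) % TABLE_SIZE
    return out

def to_int16(samples):
    """Quantize floats in [-1, 1] to 16-bit integers, clipping anything outside."""
    return [max(-32768, min(32767, round(s * 32767))) for s in samples]

buffer = to_int16(wavetable_sine(441, 44_100, 100))  # one period of a 441 Hz tone
```

Linear interpolation between neighbouring table entries is the usual next step when the truncation error becomes audible.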
Upon playback from your computer, each sample passes through a digital-to-analogue converter and generates a vibration in your PC's speaker, which in turn causes your ear to perceive the recorded sound.
Sound can be expressed as several different units, but the most common in synthesis/computer music is decibels (dB), which are a relative logarithmic measure of amplitude. Specifically they are normally relative to the maximum amplitude of the audio system.
When measuring sound in "real life", the units are normally A-weighted Decibels or dB(A).
A sound's waveform is its amplitude over time or, in the digital world, its amplitude over samples; its frequency (perceived as pitch) is how many times per second that waveform repeats. The number of samples per unit of real time is called the sampling rate; conventional hi-fi systems use a sampling rate of 44.1 kHz (44,100 samples per second) and synthesis/recording software usually supports up to 96 kHz.
Every sound in the digital domain can be represented as a waveform with the X-axis representing the time (or sample number) and the Y-axis representing the amplitude.
Frequency and amplitude of the wave are what make up sound.
That is for a tone.
Music or for that matter most noise is a composite of multiple simultaneous sound waves superimposed on one another.
The unit for amplitude level is the bel (we use tenths of a bel, hence the term decibel).
The unit for frequency is the hertz.
That being said, synthesis of music is a large field.
Bitmapped graphics are based on sampling the amplitude of light in a 2D space, where each sample is digitized to a given bit depth and often converted to a logarithmic representation at a different bit depth. The samples are always positive, since you can't be darker than pure black. Each of these samples is called a pixel.
Sound recording is most often based on sampling the magnitude of sound pressure at a microphone, where the samples are taken at constant time intervals. These samples can be positive or negative with respect to perfect silence. Most often these samples are not converted to a logarithm, even though sound is perceived in a logarithmic fashion just as light is. There is no special term to refer to these samples as there is with pixels.
The Bels and Decibels mentioned by others are useful in the context of measuring peak or average sound levels. They are not used to describe the individual sound samples.
You might also find it useful to know how sound file formats compare to image file formats. WAVE is typically an uncompressed format that originated on Windows and is analogous to BMP. MP3 is a lossy compression analogous to JPEG. FLAC is a lossless compression analogous to 24-bit PNG.
If computer graphics are colored dots in 2 dimensional space representing a 3 dimensional space, then sound synthesis is amplitude values regularly partitioned in time representing musical events.
If you want your result to sound like music (the kind of music most people like at least), then you are either going to use some standard synthesis techniques, or literally waste decades of your life reinventing them from scratch.
The most basic techniques are additive synthesis, in which the individual elements are the frequencies, amplitudes, and phases of sine oscillators; subtractive synthesis, where you work with filter coefficients and a complex input waveform; frequency modulation synthesis, where you work with modulation depths and rates of stages of modulation; granular synthesis where short (hundredths to tenths of a second long) enveloped pieces of a recorded sound or an artificial waveform are combined in immense numbers. Each of these in practice uses parameters that evolve over the course of a note, and often you will mix elements of various techniques into a larger instrument.
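A minimal two-operator FM sketch, written in the phase-modulation form in which digital FM is often implemented: one sine (the modulator) wobbles the phase of another (the carrier). The modulation depth `index` and the modulator rate are exactly the parameters mentioned above; the function name is hypothetical.

```python
import math

def fm_tone(carrier_hz, modulator_hz, index, rate, n_samples):
    """Two-operator FM: x[t] = sin(2*pi*fc*t/rate + index * sin(2*pi*fm*t/rate)).

    `index` is the modulation depth: 0 gives a pure sine at the carrier, and
    larger values spread energy into sidebands at fc +/- k*fm.
    """
    return [math.sin(2 * math.pi * carrier_hz * t / rate
                     + index * math.sin(2 * math.pi * modulator_hz * t / rate))
            for t in range(n_samples)]

tone = fm_tone(carrier_hz=220, modulator_hz=220, index=3.0, rate=44_100, n_samples=44_100)
```

Varying `index` over the course of a note (for example with an envelope) is what gives FM sounds their evolving timbre.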
I recommend this book: though it doesn't have the math for many concepts, it at least lays the groundwork for the concepts used and gives a nice overview of the techniques.
You wouldn't waste your time going sample by sample to make music in practice, any more than you would go pixel by pixel to render 3D. In other words, yes, go sample by sample if you are making a tool for other people to make music with, but that is far too low a level if you are interested in the task of making music.
Probably the envelope. A tone/note has a shape described by its attack, decay, sustain and release (ADSR).
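A minimal linear ADSR sketch: the envelope is just one gain value per sample, multiplied into an oscillator's output. Straight-line segments and a release that occupies the last part of the note are simplifying assumptions (many synthesizers use exponential segments and start the release on note-off); the function name is hypothetical. If the note is shorter than attack + decay + release, the envelope is simply truncated.

```python
import math

def adsr(n_samples, rate, attack, decay, sustain_level, release):
    """Linear ADSR envelope with one value per sample.

    attack, decay and release are durations in seconds; sustain_level is 0..1.
    """
    a, d, r = (round(x * rate) for x in (attack, decay, release))
    s = max(0, n_samples - a - d - r)                              # sustain length
    env = [i / a for i in range(a)]                                # 0 -> 1
    env += [1 - (1 - sustain_level) * i / d for i in range(d)]     # 1 -> sustain
    env += [sustain_level] * s                                     # hold
    env += [sustain_level * (1 - i / r) for i in range(r)]         # sustain -> 0
    return env[:n_samples]

# Shape a one-second 440 Hz tone by multiplying it with the envelope sample by sample.
rate = 8000
env = adsr(rate, rate, attack=0.05, decay=0.1, sustain_level=0.6, release=0.3)
note = [e * math.sin(2 * math.pi * 440 * t / rate) for t, e in enumerate(env)]
```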
The byte, or word, depending on the bit-depth of the sound.
I want to calculate room noise level with the computer's microphone. I record noise as an audio file, but how can I calculate the noise dB level?
I don't know how to start!
All the previous answers are correct if you want a technically accurate or scientifically valuable answer. But if you just want a general estimation of comparative loudness, like if you want to check whether the dog is barking or whether a baby is crying and you want to specify the threshold in dB, then it's a relatively simple calculation.
Many wave-file editors have a vertical scale in decibels. There are no calibration or reference measurements, just a simple calculation:
dB = 20 * log10(amplitude)
The amplitude in this case is expressed as a number between 0 and 1, where 1 represents the maximum amplitude in the sound file. For example, if you have a 16 bit sound file, the amplitude can go as high as 32767. So you just divide the sample by 32767. (We work with absolute values, positive numbers only.) So if you have a wave that peaks at 14731, then:
amplitude = 14731 / 32767
≈ 0.45 (0.4496)
dB = 20 * log10(0.4496)
≈ -6.94
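The same calculation as a small function that scans a block of 16-bit samples for its peak. The name and the -inf convention for silence are my own choices; like the hand calculation, it only reports level relative to full scale, with the caveats that follow.

```python
import math

def peak_dbfs(samples, full_scale=32767):
    """20*log10(max|x| / full_scale): peak level relative to full scale, in dB."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak / full_scale) if peak else float("-inf")

print(round(peak_dbfs([3000, -200, 14731]), 2))  # -6.94
```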
But there are very important things to consider, specifically the answers given by the others.
1) As Jörg W Mittag says, dB is a relative measurement. Since we don't have calibrations and references, this measurement is only relative to itself. And by that I mean that you will be able to see that the sound in the sound file at this point is 3 dB louder than at that point, or that this spike is 5 decibels louder than the background. But you cannot know how loud it is in real life, not without the calibrations that the others are referring to.
2) This was also mentioned by PaulR and user545125: Because you're evaluating according to a recorded sound, you are only measuring the sound at the specific location where the microphone is, biased to the direction the microphone is pointing, and filtered by the frequency response of your hardware. A few feet away, a human listening with human ears will get a totally different sound level and different frequencies.
3) Without calibrated hardware, you cannot say that the sound is 60dB or 89dB or whatever. All that this calculation can give you is how the peaks in the sound file compare to other peaks in the same sound file.
If this is all you want, then it's fine, but if you want to do something serious, like determine whether the noise level in a factory is safe for workers, then listen to Paul, user545125 and Jörg.
You do need reference hardware (i.e., a reference mic) to calculate noise level (dB SPL, or sound pressure level). One thing Radio Shack sells is a $50 dB SPL meter. If you're doing scientific calculations, I wouldn't use it. But if the goal is to get a general idea of a weighted measurement (dBA or dBC) of the sound pressure in a given environment, then it might be useful. As a sound engineer, I use mine all the time to see how much sound volume I'm generating while I mix. It's usually accurate to within 2 dB.
That's my answer. The rest is FYI stuff.
Jorg is correct that dB SPL is a relative measurement. All decibel measurements are. But you've implied a reference of 0 dB SPL, or 20 micropascals, scientifically agreed to be the most quiet sound a human ear can detect (though, understandably, what a person can actually hear is very difficult to determine). This, according to Wikipedia, is about the sound of a flying mosquito from about 10 feet away (http://en.wikipedia.org/wiki/Decibel).
By assuming you don't understand decibels, I think Jorg is just trying to out-geek you. He clearly didn't give you a practical answer. :-)
Unweighted measurements (dB, instead of dBA or dBC) are rarely used, because most sound pressure is not detected by the human ear. In a given office environment, there is usually 80-100 dB SPL (sound pressure level). To give you an idea of how much is not heard: in the U.S., occupational regulations (OSHA) limit noise exposure to 90 dBA averaged over an 8-hour work shift, with hearing-conservation measures required from 85 dBA, while 80 dBA is about the background noise level of your average downtown street (difficult, but not impossible, to talk over). 85 dBA is oppressive, and at 90, most people are trying to get away. So the difference between 80 dB and 80 dBA is very significant: 80 dBA is difficult to talk over, and 80 dB is quite peaceful. :-)
So what is 'A' weighting? 'A' weighting compensates for the fact that we don't perceive lower-frequency sounds as well as high-frequency sounds (we hear 20 Hz to 20,000 Hz). There's a lot of low-end rumble that our ears/brains pretty much ignore. In addition, we're more sensitive to a certain midrange (1000 Hz to 4000 Hz). Most agree that this frequency range contains the sounds of the consonants of speech (vowels happen at a much lower frequency). Imagine talking with just vowels: you can't understand anything. Thus, the ability of humans to communicate (conventionally) rests in the 1 kHz-5 kHz bump in hearing sensitivity. Interestingly, this is why most telephone systems only transmit about 300 Hz to 3400 Hz: it was determined that this was the minimal response needed to understand the voice on the other end.
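The A curve (and the flatter C curve used for dB(C)) has a standard analytic form in IEC 61672. The sketch below uses the commonly published pole frequencies (20.6, 107.7, 737.9 and 12194 Hz) and normalizing offsets (+2.00 dB and +0.06 dB), so both curves read about 0 dB at 1 kHz. These are the per-frequency corrections you would add to unweighted dB values before summing; the function names are hypothetical.

```python
import math

def a_weighting_db(f):
    """A-weighting correction in dB at frequency f (Hz)."""
    f2 = f * f
    ra = (12194**2 * f2**2) / ((f2 + 20.6**2)
                               * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
                               * (f2 + 12194**2))
    return 20 * math.log10(ra) + 2.00

def c_weighting_db(f):
    """C-weighting correction in dB at frequency f (Hz)."""
    f2 = f * f
    rc = (12194**2 * f2) / ((f2 + 20.6**2) * (f2 + 12194**2))
    return 20 * math.log10(rc) + 0.06

for f in (31.5, 100, 1000, 4000):
    print(f, round(a_weighting_db(f), 1), round(c_weighting_db(f), 1))
```

The printed table shows the low-frequency roll-off described above: A-weighting discards far more low-end rumble than C-weighting.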
But I think that's more than you wanted to know. Hope it helps. :-)
You can't easily measure absolute dB SPL, since your microphone and analogue hardware are not calibrated. You may be able to do an approximate calibration for a particular hardware set up but you would need to repeat this for every different microphone and hardware set up that you plan to support.
If you do have some kind of SPL reference source that you can use, then it gets easier:
use your reference source to generate a tone at a known level dB_ref (dB SPL) and measure its amplitude V_ref
measure the ambient noise amplitude V_noise
calculate noise level = 20 * log10 (V_noise / V_ref) + dB_ref
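The three steps above as a function. A minimal sketch, assuming V_ref and V_noise are RMS values measured with the same, unchanged gain settings, in whatever units your recording uses (fractions of full scale work, since only their ratio matters); the name and the example readings are hypothetical.

```python
import math

def calibrated_level_db(v_noise_rms, v_ref_rms, db_ref):
    """noise level = 20*log10(V_noise / V_ref) + dB_ref  (flat, unweighted)."""
    if v_noise_rms <= 0 or v_ref_rms <= 0:
        raise ValueError("levels must be positive")
    return 20 * math.log10(v_noise_rms / v_ref_rms) + db_ref

# Hypothetical readings: a 94 dB SPL reference tone measures 0.5 RMS (full-scale
# units) and the room measures 0.005 RMS with the same gain settings.
print(calibrated_level_db(0.005, 0.5, 94.0))  # about 54 dB, 40 dB below the reference
```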
Of course this assumes that the frequency response of your microphone and audio hardware is reasonably flat and that you just want a flat (unweighted) noise figure. If you want a weighted (e.g. A-weight) noise figure then you'll have to do rather more processing.
According to Merchant et al. (section 3.2 in the appendix: "Measuring acoustic habitats", Methods in Ecology and Evolution, 2015), you can actually calculate absolute, calibrated SPL values using manufacturer specifications by subtracting a correction term S from your relative (scaled-to-maximum) SPL values:
S = M + G + 20*log10(1/Vadc) + 20*log10(2^Nbit-1)
where M is the sensitivity of the transducer (microphone) in dB re 1 V/Pa, G is the gain applied by the user (in dB), Vadc is the zero-to-peak voltage of the ADC, given by multiplying the rms ADC voltage by a conversion factor of √2, and Nbit is the bit sampling depth.
The last term is necessary if your system scales the amplitude by its maximum.
The correction will be more accurate using end-to-end calibration with sound calibrators.
Note that the formula above is dependent on frequency, but you could apply it over a wider frequency range if your microphone has a flat frequency response.
You can't. dB is a relative unit; in other words, it is a unit for comparing two measurements against each other. You can only say that measurement A is x dB louder than measurement B, but in your case you only have one measurement. Therefore, it simply isn't possible to calculate the dB level.
The short answer is: you cannot do sound level measurements with your laptop, nor with your cellphone, etc., for all the reasons outlined previously, plus the fact that your cellphone, laptop, etc. use compression algorithms to ensure that everything recorded is within the hardware's capability. So if, for example, you record a sound and then run it through signal processing software such as Head Artemis or LMS Test.Lab, the indicated sound pressure level will always be in the neighborhood of 80 dB(A), regardless of the true level. I can say this from having used cellphone or laptop audio to get an idea of a noise frequency spectrum while taking level measurements with a calibrated sound level meter. Interestingly, Radio Shack used to sell a microphone intended for speech input while videoconferencing that had a very flat frequency response over a broad range and cost only about $15.
I use a sound level calibrator. It produces 94 dB or 114 dB at 1 kHz, which is a frequency where the weighting filters share the same level. With the calibrator at 114 dB, I adjust the mic gain to reach almost full-scale input, simply watching a sound-card-based virtual oscilloscope. Now I know V_ref at 114 dB.
I developed a simple software-based SPL meter that I can provide if needed. You can use REW too.
You have to know that PC hardware hardly reaches 60 dB of dynamic range, so when calibrating at 114 dB it won't read less than 54 dB, which is pretty high if you consider that sleeping is good with less than 35 dB(A). In this case you can calibrate at 94 dB and then measure down to 34 dB, but again you will hit the PC and mic self-noise, which may prevent you from reaching such low levels. Anyway, once calibrated, measurements at 114 dB and 94 dB should read fine.
Note: the lab standard pistonphone calibrator operates at 250 Hz.
Well! I used RobertT's method, but it always gave me an overflow exception. Then I used int dB = -36 - (value * -1); and the exception went away, but I don't know whether it gives dB values. If you know, please comment on whether the code below gives a dB value or not.
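VB.NET:
Dim dB As Integer = -36 - (9 * -1) ' same as 9 - 36 = -27: a linear offset, not a logarithm
C#:
int dB = -36 - (9 * -1); // same as 9 - 36 = -27: a linear offset, not a logarithm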
The digital sound is played through a DirectSound device. I need to display sound activity in decibels, like analog devices do.
What is the right way to calculate sound pressure from the WAVE PCM data (44100 Hz, 16-bit)?
If you just need an "idea" of the sound pressure, you can simply compute the log-energy over time frames of the signal: split the signal every N samples, compute 10*log10(sum(x_n**2)), where x_n are the N samples, and you get a value in the dB domain. If you need to display a precise measure (that is, your 0 dB matches, say, a mixing desk's 0 dB), it is a bit more complicated.
See here for more details:
http://music.columbia.edu/pipermail/music-dsp/2002-April/048341.html
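A minimal sketch of the frame-based log-energy idea above for 16-bit PCM. It divides by N (mean power rather than total energy) so that the level does not depend on the frame length, and it reports dB relative to full scale; both choices, and the function name, are mine rather than part of the answer.

```python
import math

def frame_levels_db(samples, frame_len, full_scale=32768.0):
    """10*log10(mean(x^2)) for each non-overlapping frame, in dB re full scale."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum((x / full_scale) ** 2 for x in frame) / frame_len
        levels.append(10 * math.log10(power) if power > 0 else float("-inf"))
    return levels

# A full-scale square wave reads 0 dB; a full-scale sine reads about -3 dB.
square = [32768.0, -32768.0] * 512
sine = [32768.0 * math.sin(2 * math.pi * t / 64) for t in range(1024)]
print(frame_levels_db(square, 1024), frame_levels_db(sine, 1024))
```

For a meter display, frames of roughly 20-50 ms (about 1000-2000 samples at 44.1 kHz) are a common starting point.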
Sound pressure is a measure of force per unit area. To determine this you would have to have information about the speaker(s) on which the audio is played. You can obtain a decibel level with respect to an arbitrary reference (as opposed to the threshold of hearing) with the algorithm proposed by cournape.
Calculate the average signal power over a time interval, compute its base-10 logarithm and multiply by 10. The average power is calculated by averaging the square of each sample over the interval. Note that the samples must take both positive and negative values (i.e. represent an AC signal centred on zero), so make sure the PCM values are floating-point or two's complement; if they are offset unsigned values (as in 8-bit WAV), subtract the offset first.
Also, by applying Parseval's theorem and the Fourier transform, you can generate signal levels for different frequency bands.
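A sketch of the band-level idea, assuming bands are given as (low, high) pairs in Hz. It uses a naive DFT so it runs with the standard library only (an FFT library would replace it in practice); the function name and band edges are hypothetical.

```python
import cmath
import math

def band_levels_db(frame, rate, bands):
    """Mean power of `frame` within each (low_hz, high_hz) band, in dB.

    Parseval's theorem for the DFT: sum |x[n]|^2 = (1/N) * sum |X[k]|^2, so the
    frame's mean power is sum |X[k]|^2 / N^2, and each bin's share can be
    assigned to a band. Bins k and N-k represent the same frequency,
    min(k, N-k) * rate / N, so both halves of the spectrum are counted.
    """
    n = len(frame)
    spectrum = [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    levels = []
    for low, high in bands:
        power = sum(abs(spectrum[k]) ** 2 for k in range(n)
                    if low <= min(k, n - k) * rate / n < high) / n ** 2
        levels.append(10 * math.log10(power) if power > 0 else float("-inf"))
    return levels

rate = 8000
frame = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(64)]  # 1 kHz sine
print(band_levels_db(frame, rate, [(0, 500), (500, 1500), (1500, 4001)]))
```

Because the bands partition the spectrum, converting each band level back to power and summing recovers the frame's overall mean power.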