How can I calculate audio dB level? - audio

I want to calculate room noise level with the computer's microphone. I record noise as an audio file, but how can I calculate the noise dB level?
I don't know how to start!

All the previous answers are correct if you want a technically accurate or scientifically valuable answer. But if you just want a general estimation of comparative loudness, like if you want to check whether the dog is barking or whether a baby is crying and you want to specify the threshold in dB, then it's a relatively simple calculation.
Many wave-file editors have a vertical scale in decibels. There is no calibration or reference measurements, just a simple calculation:
dB = 20 * log10(amplitude)
The amplitude in this case is expressed as a number between 0 and 1, where 1 represents the maximum amplitude in the sound file. For example, if you have a 16 bit sound file, the amplitude can go as high as 32767. So you just divide the sample by 32767. (We work with absolute values, positive numbers only.) So if you have a wave that peaks at 14731, then:
amplitude = 14731 / 32767
= 0.44
dB = 20 * log10(0.44)
= -7.13
But there are very important things to consider, specifically the answers given by the others.
1) As Jörg W Mittag says, dB is a relative measurement. Since we don't have calibrations and references, this measurement is only relative to itself. And by that I mean that you will be able to see that the sound in the sound file at this point is 3 dB louder than at that point, or that this spike is 5 decibels louder than the background. But you cannot know how loud it is in real life, not without the calibrations that the others are referring to.
2) This was also mentioned by PaulR and user545125: Because you're evaluating according to a recorded sound, you are only measuring the sound at the specific location where the microphone is, biased to the direction the microphone is pointing, and filtered by the frequency response of your hardware. A few feet away, a human listening with human ears will get a totally different sound level and different frequencies.
3) Without calibrated hardware, you cannot say that the sound is 60dB or 89dB or whatever. All that this calculation can give you is how the peaks in the sound file compares to other peaks in the same sound file.
If this is all you want, then it's fine, but if you want to do something serious, like determine whether the noise level in a factory is safe for workers, then listen to Paul, user545125 and Jörg.

You do need reference hardware (i.e., a reference mic) to calculate noise level (dB SPL, or sound pressure level). One thing Radio Shack sells is a $50 dB SPL meter. If you're doing scientific calculations, I wouldn't use it. But if the goal is to get a general idea of a weighted measurement (dBA or dBC) of the sound pressure in a given environment, then it might be useful. As a sound engineer, I use mine all the time to see how much sound volume I'm generating while I mix. It's usually accurate to within 2 dB.
That's my answer. The rest is FYI stuff.
Jorg is correct that dB SPL is a relative measurement. All decibel measurements are. But you've implied a reference of 0 dB SPL, or 20 micropascals, scientifically agreed to be the most quiet sound a human ear can detect (though, understandably, what a person can actually hear is very difficult to determine). This, according to Wikipedia, is about the sound of a flying mosquito from about 10 feet away (http://en.wikipedia.org/wiki/Decibel).
By assuming you don't understand decibels, I think Jorg is just trying to out-geek you. He clearly didn't give you a practical answer. :-)
Unweighted measurements (dB, instead of dBA or dBC) are rarely used, because most sound pressure is not detected by the human ear. In a given office environment, there is usually 80-100 dB SPL (sound pressure level). To give you an idea of exactly how much is not heard, in the U.S., occupational regulations limit noise exposure to 80 dBA for a given 8-hour work shift (80 dBA is about the background noise level of your average downtown street - difficult, but not impossible to talk over). 85 dBA is oppressive, and at 90, most people are trying to get away. So the difference between 80 dB and 80 dBA is very significant -- 80 dBA is difficult to talk over, and 80 dB is quite peaceful. :-)
So what is 'A' weighting? 'A' weighting compensates for the fact that we don't perceive lower frequency sounds as well as high frequency sounds (we hear 20 Hz to 20,000 Hz). There's a lot of low-end rumble that our ears/brains pretty much ignore. In addition, we're more sensitive to a certain midrange (1000 Hz to 4000 Hz). Most agree that this frequency range contains the sounds of consonants of speech (vowels happen at a much lower frequency). Imagine talking with just vowels. You can't understand anything. Thus, the ability of a human to be able to communicate (conventionally) rests in the 1kHz-5kHz bump in hearing sensitivity. Interestingly, this is why most telephone systems only transmit 300 Hz to 3000 Hz. It was determined that this was the minimal response needed to understand the voice on the other end.
But I think that's more than you wanted to know. Hope it helps. :-)

You can't easily measure absolute dB SPL, since your microphone and analogue hardware are not calibrated. You may be able to do an approximate calibration for a particular hardware set up but you would need to repeat this for every different microphone and hardware set up that you plan to support.
If you do have some kind of SPL reference source that you can use then then it gets easier:
use your reference source to generate a tone at a known dB SPL - measure this
measure the ambient noise
calculate noise level = 20 * log10 (V_noise / V_ref) + dB_ref
Of course this assumes that the frequency response of your microphone and audio hardware is reasonably flat and that you just want a flat (unweighted) noise figure. If you want a weighted (e.g. A-weight) noise figure then you'll have to do rather more processing.

According to Merchant et al. (section 3.2 in the appendix: "Measuring acoustic habitats", Methods in Ecology and Evolution, 2015), you can actually calculate absolute, calibrated SPL values using manufacturer specifications by subtracting a correction term S to your relative (scaled to maximum) SPL values:
S = M + G + 20*log10(1/Vadc) + 20*log10(2^Nbit-1)
where M is the sensitivity of the transducer (microphone) re 1 V/Pa. G is the gain applied by the user. Vadc is the zero-to-peak voltage, given by multiplying the rms ADC voltage by a conversion factor of squareroot(2). Nbit is the bit sampling depth.
The last term is necessary if your system scales the amplitude by its maximum.
The correction will be more accurate using end-to-end calibration with sound calibrators.
Note that the formula above is dependent on frequency, but you could apply it over a wider frequency range if your microphone has a flat frequency response.

You can't. dB is a relative unit, IOW it is a unit for comparing two measurements against each other. You can only say that measurement A is x dB louder than measurement B, but in your case you only have one measurement. Therefore, it simply isn't possible to calculate the dB level.

The short answer is: you cannot do sound level measurements with your laptop, nor with your cellphone, etc., for all the reasons outlined previously, plus the fact your cellphone, laptop, etc. use compression algorithms to assure that everything recorded is within the hardware capability. So, if for example you measure a sound then run it through signal processing software such as Head Artemis or LMS Test.Lab, the indicated sound pressure level will always be in the neighborhood of 80 dB(A) regardless of the true level. I can say this from having used cellphone or laptop audio to get an idea of a noise frequency spectrum, while taking level measurements using a calibrated sound level meter. Interestingly, Radio Shack used to sell a microphone intended for speech input while videoconferencing that had very flat frequency response over a broad range, and only cost about $15.

I use a sound level calibrator.
It produces 94 dB or 114dB at 1 KHz
wich is a frecuency where weighting
filters share the same level.
With calibrator at 114dB I adjust mic gain to reach almost full scale
input simply watching a sound card based virtual osciloscope.
Now I know Vref # 114dB.
I developed a simple software based SPL meter
that can be provided if needed. You can use REW too.
You hace to know that PC hardware hardly
reaches 60 dB of dynamic range so calibrating
#114 dB it wont read less than 54dB, wich
is pretty high if you consider that sleeping
is good with less than 35 dB A.
In this case you can calibrate at 94dB
and then you may measure down to 34dB
but again you will hit pc and mic self noise
wich may you prevent to reach such low levels.
Anyway, once calibrated, measures at 114dB
and 94dB should read fine.
Note: the lab standard pistonphone calibrator operates at 250 Hz.

Well! I Used RobertT's Method But It Always Giving Me Oveflow Exception, Then I Used:- int dB = -36 - (value * -1), The Exception Gone, I Don't Know Whether It's Telling dB Values, If You Knew Using Code Given Below, Please Comment Me Whether it's A dB Value or not.
VB.NET:-
Dim dB As Integer = -36 - (9 * -1)
C#:-
int dB = -36 - (9 * -1)

Related

Fast frequency measurement

I need to measure signal frequency while the musicians play music, and it happens to be a bit too fast for FFT (Fast Fourier Transform).
Musicians play music at 90-140 bpm. This means that there are 90-140 groups of notes each minute, up to 8 (more frequently, up to 4) notes in each group (60/140/8 = 0.0536 sec, 60/90/4 = 0.167 sec), that is, notes may change at the rate of 6-19 notes per second.
The music uses a logarithmic scale: the range between, say, 440Hz and 880Hz is divided into 12 notes, only 7 of which are used for melody. (Basically, they use only the white keys on the piano; when they want to shift the starting frequency, they use some of the black keys and don't use some white keys.)
That is, the frequency of each next note is multiplied by 2^(1/12) = 1.05946.
To make things more complicated, the A (La) frequency may vary from 438 to 446 Hz. The string instruments in theory can be tuned, while the wind instruments depend on the air temperature and humidity, so the frequency happens to be re-negotiated by the musicians during the sound check.
Sometimes musicians and vocalists make errors in frequency, they call it "out of tune". They want a device that would inform them of such "out of tune errors". They have tuners, but the tuners require playing the same sound for about 1 sec before they start showing anything. This works for tuning, but does not work while the music is played.
Most likely, the tuner is doing FFT, and due to the formula
df = 1/T
waits for 1 second to get the 1Hz resolution.
For A=440Hz, the difference in frequency between two notes is 440*0.05946 = 26.16 Hz, to get that frequency resolution, one has to use acquisition time of 0.038 sec, that is, at tempo=196bpm FFT is able to just distinguish two notes, at 98 bpm it is able to tell a 50% out-of-tune error provided that it starts acquisition at the very moment that the pitch changes. If we allow the pitch change in the course of an acquisition period, we get 49 bpm, which is just too slow. In addition, it is very desirable to be more precise about the frequency, say, detect a 25% out-of-tune error.
Is there a way to measure frequency better than FFT, that is, with better resolution in less acquisition time? (At least 2 times better, ideally, 8 times better.)
In exchange, I do not need to distinguish between notes of different octaves, e.g. both 440 and 880 may be recognized as A. (Probably, more trade-offs are possible, just nothing else comes to my mind right now.)
UPD
Here's a really good drawing:
UPD2
I have found a PhD thesis and open source software (TARTINI -- the real-time music analysis tool) at:
http://miracle.otago.ac.nz/tartini/
(The pages are also available via the web archive service: http://web.archive.org = http://archive.org = http://waybackmachine.org )
Regarding the FFT, assuming the narrow-band spectral frequency content is sparse and well separated in low enough background noise, frequency peaks can be interpolated or phase vocoded to much higher resolution than the FFT bin spacing (bin spacing as related to the inverse of the length of the segment of actual time-domain data). Parabolic interpolation is common, but there are other more accurate interpolation kernels. Phase vocoder frequency estimation methods require stationarity across 2 overlapped frames, however the total span of those 2 frames can be relatively short.
But the peak spectral frequency reported by an FFT is not the same as a pitch frequency as perceived by a human (as voices and many musical instruments can radiate more acoustic spectral energy in an overtone series than at pitch frequency, sometimes slightly inharmonically.) There are algorithms more suited for pitch estimation than FFTs (alone). A partial list is in this answer: FFT on iPhone to ignore background noise and find lower pitches
Many academic papers on pitch estimation methods for music can be found on the music-ir/MIREX site: http://www.music-ir.org/mirex/wiki/MIREX_HOME

Calculate A-weighted (or B or C) SPL decibel iOS

How can I calculate A-weighted and C-weighted dB sound levels from the microphone on iOS?
Here is what I have tried, but the reading I get is far below the sound level meter I have next to my iPhone:
Using the Novocain library, which I have slightly modified to set the audio session mode to Measurement.
Using the Maximilian audio library to run the incoming audio frames through an FFT and converting the amplitudes into dB.
Using the Maximilian audio libraries's Octave Analyser to place the FFT output into octave bins from 10hz to 20480hz.
For each octave bin I am apply a db gain of the relevant dB-weighing (e.g. apply -70.f db gain to the db value stored in the 10hz bin to get an A-weighted dB gain).
Added the db values of each bin together by reducing each dB bin to an amplitude, and the gain to an an amplitude, making the addition, and converting back to a dB value again.
Is this on the correct track, I have my doubts? Can someone outline an approach? Suggest a library and/or other example (I have looked).
To note – I would like approximate dB(A) and dB(C) values, this does not need to be scientific. Not sure how to compensate for the frequency response of the microphone, could the above technique be correct if it were compensating for the response of the microphone?
I don't think you can measure physical Sound Pressure Levels from a device. In step 2 you "convert the amplitudes into dB." However the amplitude that you record from the device has arbitrary units. When recording 16-bit audio, the audio is represented as numbers in the range -32768 to +32767. If you are working with floating point data then this is normalised by 32768 so that it has a range of (approximately) -1 to +1.
The device's microphone has to cope with a wide variety of sound levels. Generally devices will have some form of Automatic Gain Control which adapts to the current average sound level. This means that if you measure a peak value of 1.0 then you have no way of knowing the actual SPL it corresponded to. You can convert a recording to a series of dBs, but this uses a different definition of dB: as a power ratio. This has no correlation with SPL measurements such as dB(A).
It may be possible to produce an approximate dB(A) measurement if you are able to turn off the AGC and calibrate your device against your sound meter
EDIT: JASA have published a paper with a detailed comparison of existing SPL measurement apps with stats for the comparison of different generations of iPhone: Evaluation of smartphone sound measurement applications

Pitch recognition of musical notes on a smart phone, pt. 2

As a follow-up to my previous question, if I want my smartphone application to detect a certain musical note, and I only need to know whether the incoming sound is that musical note or not, with a certain amount of fuzziness, to allow the note to be off-key by x cents.
Given that, is there a superior method over others for speed and accuracy? That is, by knowing that the note you are looking for is, say, a #C3, how best to tell if that note is present or not? I'm assuming that looking for a single note would be easier than separating out all waveforms, and then looking at the results for the fundamental frequency.
In the responses to my original question, one respondent suggested that autocorrelation might work well if you know that the notes are within a certain range. I wonder if autocorrelation would then work even better, if you only have to check for the presence or absence of a certain note (+/- x cents).
Those methods being:
Kiss FFT
FFTW
Discrete Wavelet Transform
autocorrelation
zero crossing analysis
octave-spaced filters
DWT
Any thoughts would be appreciated.
As you describe it, you just need to determine if a particular pitch is present. A very simple (fast) detector would just record the equivalent of one period of the waveform, then record another period and correlate them, like an oversimplified (single-lag) autocorrelation. If there's a high match, you know the waveform being recorded is repeating at around the same period, or a harmonic of it.
For instance, to detect 1 kHz, record 1 ms of audio (48 samples at 48 kHz), then record another 1 ms, and compare them (correlate = multiply all samples and sum). If they line up (correlation above some threshold), then you're listening to 1 kHz, 2 kHz, 3 kHz, or some other multiple. Doing several periods would give you more confidence on the match.
A true autocorrelation would tell you which harmonic, specifically, if that's important to you.

How can I determine how loud a WAV file will sound?

I have a bunch of different audio recordings in WAV format (all different instruments and pitches), and I want to "normalize" them so that they all sound approximately the same volume when played.
I've tried measuring the average sample magnitude (the sum of all absolute values divided by the number of samples), but normalizing by this measurement doesn't work very well. I think this method isn't working because it doesn't take into account the frequency of the sounds, and I know that higher-frequency recordings sound louder than lower-frequency sounds of the same amplitude.
Does anyone know a good method for measuring the loudness of a sound?
Root Mean Square is often used to estimate the loudness of sound files. This is because a sound that is very loud might not be perceived that way if it is very short. Also remember that power increases exponentially with the square of amplitude.
The audio geeks at Hydrogen Audio know a ton about this stuff...check out their free Replay Gain software. You may not need to do any programming at all.
EDIT: Included comment feedback on power vs. amplitude.
To add to PeterAllenWebb's response:
Before you calculate the RMS, you should "center" your sample first (think of a 5-minute .wav where each sample has the maximum +amplitude). The best way to do that is to use a highpass filter at a subsonic frequency.
That would still not take the frequencies that humans are sensitive to in count. To do that, you could use A-weighting. There's a page where you can calculate it online:
http://www.diracdelta.co.uk/science/source/a/w/aweighting/source.html
The code seems to be here:
http://www.diracdelta.co.uk/science/source/a/w/aweighting/multicalc.js
Well not being an expert on audio and adding to the previous comment, you should figure out what you define as the "shortest amount of time for peak power" and then just convert the wave to raw floating point and use RMS over the stretch of time and continuously take chunks of that length of time, find the MAX and there you have your highest peak power.
To reiterate what some other people have said, use RMS value to estimate the "loudness" of a passage of sound.
But, if you're dealing with impulsive sounds like plucking or drum hits, you'd want to do a sliding RMS value and pick out only the peak RMS value. Measure 100 ms of the sound, slide the window, measure again, etc. and then normalize according to the largest value you find.
Definitely remove any DC value before doing the RMS, and A-weighting will make it more like how we hear. Here's code for A-weighting in MATLAB/Octave and Python.
I might be way off here, but, if you have wavepad you can load in multiple files and mess with the volumes a little bit so they are all the same. Also, if you have certain sections of a file that are louder, you can select that section and lower the volume for that one section.
EDIT: And sorry, it;s not really a "method" for measuring volume, but if you just need to make them all the same this should work fine.

Pitch recognition of musical notes on a smart phone

With limited resources such as slower CPUs, code size and RAM, how best to detect the pitch of a musical note, similar to what an electronic or software tuner would do?
Should I use:
Kiss FFT
FFTW
Discrete Wavelet Transform
autocorrelation
zero crossing analysis
octave-spaced filters
other?
In a nutshell, what I am trying to do is to recognize a single musical note, two octaves below middle-C to two octaves above, played on any (reasonable) instrument. I'd like to be within 20% of the semitone - in other words, if the user plays too flat or too sharp, I need to distinguish that. However, I will not need the accuracy required for tuning.
If you don't need that much accuracy, an FFT could be sufficient. Window the chunk of audio first so that you get well-defined peaks, then find the first significant peak.
Bin width = sampling rate / FFT size:
Fundamentals range from 20 Hz to 7 kHz, so a sampling rate of 14 kHz would be enough. The next "standard" sampling rate is 22050 Hz.
The FFT size is then determined by the precision you want. FFT output is linear in frequency, while musical tones are logarithmic in frequency, so the worst case precision will be at low frequencies. For 20% of a semitone at 20 Hz, you need a width of 1.2 Hz, which means an FFT length of 18545. The next power of two is 215 = 32768. This is 1.5 seconds of data, and takes my laptop's processor 3 ms to calculate.
This won't work with signals that have a "missing fundamental", and finding the "first significant" peak is somewhat difficult (since harmonics are often higher than the fundamental), but you can figure out a way that suits your situation.
Autocorrelation and harmonic product spectrum are better at finding the true fundamental for a wave instead of one of the harmonics, but I don't think they deal as well with inharmonicity, and most instruments like piano or guitar are inharmonic (harmonics are slightly sharp from what they should be). It really depends on your circumstances, though.
Also, you can save even more processor cycles by computing only within a specific frequency band of interest, using the Chirp-Z transform.
I've written up a few different methods in Python for comparison purposes.
If you want to do pitch recognition in realtime (and accurate to within 1/100 of a semi-tone), your only real hope is the zero-crossing approach. And it's a faint hope, sorry to say. Zero-crossing can estimate pitch from just a couple of wavelengths of data, and it can be done with a smartphone's processing power, but it's not especially accurate, as tiny errors in measuring the wavelengths result in large errors in the estimated frequency. Devices like guitar synthesizers (which deduce the pitch from a guitar string with just a couple of wavelengths) work by quantizing the measurements to notes of the scale. This may work for your purposes, but be aware that zero-crossing works great with simple waveforms, but tends to work less and less well with more complex instrument sounds.
In my application (a software synthesizer that runs on smartphones) I use recordings of single instrument notes as the raw material for wavetable synthesis, and in order to produce notes at a particular pitch, I need to know the fundamental pitch of a recording, accurate to within 1/1000 of a semi-tone (I really only need 1/100 accuracy, but I'm OCD about this). The zero-crossing approach is much too inaccurate for this, and FFT-based approaches are either way too inaccurate or way too slow (or both sometimes).
The best approach that I've found in this case is to use autocorrelation. With autocorrelation you basically guess the pitch and then measure the autocorrelation of your sample at that corresponding wavelength. By scanning through the range of plausible pitches (say A = 55 Hz thru A = 880 Hz) by semi-tones, I locate the most-correlated pitch, then do a more finely-grained scan in the neighborhood of that pitch to get a more accurate value.
The approach best for you depends entirely on what you're trying to use this for.
I'm not familiar with all the methods you mention, but what you choose should depend primarily on the nature of your input data. Are you analysing pure tones, or does your input source have multiple notes? Is speech a feature of your input? Are there any limitations on the length of time you have to sample the input? Are you able to trade off some accuracy for speed?
To some extent what you choose also depends on whether you would like to perform your calculations in time or in frequency space. Converting a time series to a frequency representation takes time, but in my experience tends to give better results.
Autocorrelation compares two signals in the time domain. A naive implementation is simple but relatively expensive to compute, as it requires pair-wise differencing between all points in the original and time-shifted signals, followed by differentiation to identify turning points in the autocorrelation function, and then selection of the minimum corresponding to the fundamental frequency. There are alternative methods. For example, Average Magnitude Differencing is a very cheap form of autocorrelation, but accuracy suffers. All autocorrelation techniques run the risk of octave errors, since peaks other than the fundamental exist in the function.
Measuring zero-crossing points is simple and straightforward, but will run into problems if you have multiple waveforms present in the signal.
In frequency-space, techniques based on FFT may be efficient enough for your purposes. One example is the harmonic product spectrum technique, which compares the power spectrum of the signal with downsampled versions at each harmonic, and identifies the pitch by multiplying the spectra together to produce a clear peak.
As ever, there is no substitute for testing and profiling several techniques, to empirically determine what will work best for your problem and constraints.
An answer like this can only scratch the surface of this topic. As well as the earlier links, here are some relevant references for further reading.
Summary of pitch detection algorithms (Wikipedia)
Pros and cons of Autocorrelation vs Harmonic Product Spectrum
A high-level overview of pitch detection methods
In my project danstuner, I took code from Audacity. It essentially took an FFT, then found the peak power by putting a cubic curve on the FFT and finding the peak of that curve. Works pretty well, although I had to guard against octave-jumping.
See Spectrum.cpp.
Zero crossing won't work because a typical sound has harmonics and zero-crossings much more than the base frequency.
Something I experimented with (as a home side project) was this:
Sample the sound with ADC at whatever sample rate you need.
Detect the levels of the short-term positive and negative peaks of the waveform (sliding window or similar). I.e. an envelope detector.
Make a square wave that goes high when the waveform goes within 90% (or so) of the positive envelope, and goes low when the waveform goes within 90% of the negative envelope. I.e. a tracking square wave with hysteresis.
Measure the frequency of that square wave with straight-forward count/time calculations, using as many samples as you need to get the required accuracy.
However I found that with inputs from my electronic keyboard, for some instrument sounds it managed to pick up 2× the base frequency (next octave). This was a side project and I never got around to implementing a solution before moving on to other things. But I thought it had promise as being much less CPU load than FFT.

Resources