Why is this decibel calculation /10 instead of /20?

AudioKit supports normalizing an audio file to a given max level in decibels (dB).
My understanding is that these would be field quantities not power quantities.
Thus, shouldn't this be / 20.0 instead of / 10.0?
let gainFactor = Float( pow(10.0, newMaxLevel / 10.0) / pow(10.0, level / 10.0))
and shouldn't this be 20 * instead of 10 *?
return 10 * log10(maxLev)
Code in question:
https://github.com/AudioKit/AudioKit/blob/9ae8641551bc5f7b4c9b4c887aa327155fac83b5/AudioKit/Common/Internals/Audio%20File/AKAudioFile%2BProcessing.swift#L42
https://github.com/AudioKit/AudioKit/blob/9ae8641551bc5f7b4c9b4c887aa327155fac83b5/AudioKit/Common/Internals/Audio%20File/AKAudioFile.swift#L294
Reference:
http://www.sengpielaudio.com/calculator-db.htm
https://en.wikipedia.org/wiki/Field,_power,_and_root-power_quantities
Thanks!

This is a common misunderstanding among electrical engineering students when learning about power gain and voltage/current gain on a log scale; I'm surprised you've encountered it here.
In my field, log-scale graphs are most often used to represent the power gain of a system over a given frequency range. In this case a 10 dB rise corresponds to a 10-fold increase in power. The same isn't true for voltages and currents: because power increases with the square of voltage and current, a 10-fold increase in voltage or current means a 100-fold increase in power, and conversely a 10-fold increase in power corresponds to only a sqrt(10) ~= 3.16-fold increase in voltage or current.
Thus, when we take the log of the equation for a signal's power over frequency (for a resistor, say), we find:
log(P(ω)) = log(V(ω)^2 / R) = 2log(V(ω)) - log(R)
Multiply both sides by 10 to convert bels to decibels:
10log(P(ω)) = 20log(V(ω)) - 10log(R)
The relationship between the amplitude and power of a sound wave is similar to the relationship between the amplitude and power of an electrical signal, AFAIK. This is why a 20 dB increase corresponds to a 10-fold increase in amplitude, while a 10 dB increase corresponds to a 10-fold increase in power.
P.S. To answer your question you need to determine if the levels are a unit of power or a unit of amplitude. You know more about the units the function uses than I do as my experience with audio files is highly limited, but my instinct is that an audio file is a representation of amplitude, in which case you would be correct; the conversion factors would need to be 20, not 10.
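To make the difference concrete, here is a minimal sketch (not AudioKit's actual code) of the two conversions in Swift; the dB values are made up:
import Foundation

// Hypothetical dB levels, analogous to level and newMaxLevel in the code in question.
let level = -12.0        // current peak level of the file, in dB
let newMaxLevel = -1.0   // desired peak level, in dB

// Power quantities change by 10 dB per decade, field quantities by 20 dB per decade.
let powerRatio = pow(10.0, (newMaxLevel - level) / 10.0)      // ~12.6x
let amplitudeRatio = pow(10.0, (newMaxLevel - level) / 20.0)  // ~3.55x

// If the levels describe sample amplitudes (field quantities), the gain applied
// to the samples should be the amplitude ratio, i.e. the /20 version.
let gainFactor = Float(amplitudeRatio)
print(powerRatio, amplitudeRatio, gainFactor)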

Related

Are there ways other than FFT to implement a guitar tuner?

I want to build a precise guitar tuner. This is usually done by computing an FFT and finding the peak, but that approach is of limited use for several reasons:
Discrete bin spacing gives insufficient resolution for tuning a bass guitar.
Computation time and complexity grow when increasing the buffer size (and/or sampling rate), which introduces visible delay (lag).
Most of the frequency range, where all of the FFT's precision is concentrated, is unused: everything above 1-2 kHz is not applicable for tuning musical instruments.
There should be a simpler way for signals that have a single-frequency sinusoidal shape. Given a small enough buffer (say 256 samples at a 96 kHz sampling rate), how could you measure the base (lowest) frequency?
In simple words: how do you find the frequency F such that the difference between a sine signal of frequency F and the actually recorded signal gives a smaller error than for any other frequency? (So we can definitely conclude that a sinusoid of frequency F is the best approximation of the recorded sound buffer.)
PS. Anything but FFT!
Here is a simple approach based on zero crossings. It relies on being able to treat the instrument signal as a simple sinusoid. This may work OK when the signal-to-noise ratio is high, but it is not a very robust method.
Bandpass filter around the fundamental frequency of the tone you want to tune for, e.g. 82.41 Hz for the low E string on a guitar.
Consider a window of the last N samples. Set it to, say, 100 ms to update the pitch estimate 10 times per second.
Perform zero-crossing detection with a threshold value T. T could be set to 10% of the signal peak, for example. Measure the periods between consecutive crossings and collect them in an array.
Take the median of the periods to get your pitch estimate.
You can also compute the quantiles of the periods to estimate how reliable the method is. If they give very different numbers from the median, then the method is not working well.
The approach can be extended by computing autocorrelation on the zero-crossings, as described in
https://www.cycfi.com/2018/03/fast-and-efficient-pitch-detection-bitstream-autocorrelation/
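For reference, here is a minimal Swift sketch of the zero-crossing estimator described above; it assumes the buffer has already been band-pass filtered around the target note, and the 10%-of-peak threshold fraction is just the suggestion from the steps above:
import Foundation

// Rough pitch estimate from upward threshold crossings, as in the steps above.
// Assumes samples has already been band-pass filtered around the target note.
func estimatePitch(samples: [Float], sampleRate: Double, thresholdFraction: Float = 0.1) -> Double? {
    guard let peak = samples.map({ abs($0) }).max(), peak > 0 else { return nil }
    let threshold = peak * thresholdFraction

    // Indices where the signal crosses the threshold going upward.
    var crossings: [Int] = []
    for i in 1..<samples.count where samples[i - 1] < threshold && samples[i] >= threshold {
        crossings.append(i)
    }
    guard crossings.count >= 2 else { return nil }

    // Periods (in samples) between consecutive crossings.
    var periods: [Double] = []
    for j in 1..<crossings.count {
        periods.append(Double(crossings[j] - crossings[j - 1]))
    }

    // The median is more robust to an occasional spurious crossing than the mean.
    let sorted = periods.sorted()
    let median = sorted.count % 2 == 1
        ? sorted[sorted.count / 2]
        : (sorted[sorted.count / 2 - 1] + sorted[sorted.count / 2]) / 2.0

    return sampleRate / median   // period in samples -> frequency in Hz
}
With a 100 ms window at 44.1 kHz, a correctly tuned low E string should give a median period of roughly 535 samples, i.e. about 82.4 Hz.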

Verify transmit power to be within certain limits of its expected value over 95% of test measurements

I have a requirement where I have to verify that the transmit power of a device, as measured at its connector, is within 2 dB of its expected value over 95% of test measurements.
I am using a signal analyzer to analyze the transmitted power. I only get the average power value, min, max and stdDev of the measurements and not the individual power measurements.
Now, the question is how would I verify the "95% thing" using average power, min, max and stdDev. It seems that I can use normal distribution to find the 95% confidence level.
I would appreciate if someone can help me on this.
Thanks in anticipation
The way I'm reading this, it seems you are a statistical beginner, so if I'm wrong there, the rest of this answer will probably be insultingly basic, and I'm sorry.
Anyway, the idea is that if a dataset is normally distributed, and all the observations are independent of one another, then 95% of the data points will fall within 1.96 standard deviations of the mean.
Do you get identical estimates of average power every time you measure, or are there some slight random differences from reading to reading? My guess is that it's the second. If you were to measure the power a whole bunch of times, and each time you plotted your average power value on a histogram, then that histogram of sample means would have the shape of a bell curve. This bell curve of sample means would have its own mean and standard deviation, and if you have thousands or millions of data points going into the calculation of each average power reading, it's not horrible to assume that it is a normal distribution. The explanation for this phenomenon is known as the 'central limit theorem', and I recommend both the Khan academy's presentation of it as well as the wikipedia page on it.
On the other hand, if your average power is the mean of some small number of data points, like for instance n = 5 or n = 30, then the assumption of a normal distribution of sample means can be pretty bad. In this case, your 95% confidence interval around the average power goes from qt(0.975, n-1)*SD/sqrt(n) below the average to qt(0.975, n-1)*SD/sqrt(n) above the average, where qt(0.975, n-1) is the 97.5th percentile of the t distribution with n-1 degrees of freedom and SD is your measured standard deviation.
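As a small numerical sketch of both cases (the mean, SD and n below are made up, and the t quantiles are taken from a standard t-table):
import Foundation

// Hypothetical summary values reported by the signal analyzer.
let mean = 23.4    // average power, dBm
let sd = 0.8       // standard deviation, dB
let n = 30.0       // number of measurements behind that average

// Large-sample case: roughly 95% of individual readings fall within 1.96 SD of
// the mean, assuming they are normally distributed and independent.
let spreadLow = mean - 1.96 * sd
let spreadHigh = mean + 1.96 * sd

// Small-sample 95% confidence interval for the mean itself, using the t
// distribution: 2.045 is the 97.5th percentile of t with 29 degrees of freedom
// (it would be 2.776 for n = 5).
let t = 2.045
let halfWidth = t * sd / sqrt(n)
print("spread of readings:", spreadLow, "to", spreadHigh)
print("95% CI for the mean:", mean - halfWidth, "to", mean + halfWidth)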

"Winamp style" spectrum analyzer

I have a program that plots the spectrum analysis (Amp/Freq) of a signal, which is pretty much the DFT converted to polar form. However, this is not exactly the sort of graph that, say, Winamp (right at the top-left corner), or effectively any other audio software, plots. I am not really sure what this sort of graph is called (if it has a distinct name at all), so I am not sure what to look for.
I am pretty positive the frequency axis is base-two exponential (logarithmic); the amplitude axis puzzles me, though.
Any pointers?
Actually an interesting question. I know what you are saying; the frequency axis is certainly logarithmic. But what about the amplitude? In response to another poster, the amplitude can't simply be in units of dB alone, because dB has no concept of zero. This introduces the idea of quantization error, SNR, and dynamic range.
Assume that the received digitized (i.e., discrete time and discrete amplitude) time-domain signal, x[n], is equal to s[n] + e[n], where s[n] is the transmitted discrete-time signal (i.e., continuous amplitude) and e[n] is the quantization error. Suppose x[n] is represented with b bits, and for simplicity, takes values in [0,1). Then the maximum peak-to-peak amplitude of e[n] is one quantization level, i.e., 2^{-b}.
The dynamic range is defined to be, in decibels, 20 log10 (max peak-to-peak |s[n]|)/(max peak-to-peak |e[n]|) = 20 log10 1/(2^{-b}) = 20b log10 2 ≈ 6.02b dB. For 16-bit audio, the dynamic range is about 96 dB. For 8-bit audio, it is about 48 dB.
So how might Winamp plot amplitude? My guesses:
The minimum amplitude is assumed to be -6.02b dB, and the maximum amplitude is 0 dB. Visually, Winamp draws the window with these thresholds in mind.
Another nonlinear map, such as log(1+X), is used. This function is always nonnegative, and when X is large, it approximates log(X).
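A small sketch of the first guess, mapping bin magnitudes onto a fixed [-6.02·b, 0] dB display range; this is my own guess at the scaling, not Winamp's actual code:
import Foundation

let bits = 16.0
let dynamicRange = 6.02 * bits   // ~96 dB for 16-bit audio, as derived above

// Map a linear FFT bin magnitude (0...1 = full scale) to a 0...1 bar height.
func barHeight(magnitude: Double) -> Double {
    guard magnitude > 0 else { return 0 }            // dB has no concept of zero
    let db = 20.0 * log10(magnitude)                 // 0 dB at full scale, negative below
    let clipped = max(db, -dynamicRange)             // floor the display at -96 dB
    return (clipped + dynamicRange) / dynamicRange   // 0 = silence ... 1 = full scale
}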
Any other experts out there who know? Let me know what you think. I'm interested, too, exactly how this is implemented.
To generate a power spectrum you need to do the following steps:
apply window function to time domain data (e.g. Hanning window)
compute FFT
calculate log of FFT bin magnitudes for N/2 points of FFT (typically 10 * log10(re * re + im * im))
This gives log magnitude (i.e. dB) versus linear frequency.
If you also want a log frequency scale then you will need to accumulate the magnitude from appropriate ranges of bins (and you will need a fairly large FFT to start with).
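Here is a minimal Swift sketch of those three steps; it uses a naive O(N²) DFT so it stays self-contained, but for real use you would call an FFT routine (e.g. vDSP on Apple platforms):
import Foundation

// Log-magnitude spectrum (dB) of a real signal, following the steps above.
func logMagnitudeSpectrum(_ x: [Double]) -> [Double] {
    let n = x.count
    guard n > 1 else { return [] }

    // 1. Apply a window function to the time-domain data (Hann window here).
    var windowed = [Double](repeating: 0, count: n)
    for i in 0..<n {
        windowed[i] = x[i] * 0.5 * (1.0 - cos(2.0 * Double.pi * Double(i) / Double(n - 1)))
    }

    // 2. Compute the transform (naive DFT for clarity) and
    // 3. take 10 * log10(re * re + im * im) for the first N/2 bins.
    var spectrum: [Double] = []
    for k in 0..<(n / 2) {
        var re = 0.0
        var im = 0.0
        for i in 0..<n {
            let phase = -2.0 * Double.pi * Double(k) * Double(i) / Double(n)
            re += windowed[i] * cos(phase)
            im += windowed[i] * sin(phase)
        }
        spectrum.append(10.0 * log10(re * re + im * im + 1e-20))  // epsilon avoids log10(0)
    }
    return spectrum
}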
Well, I'm not 100% sure what you mean, but surely it's just bucketing the data from an FFT?
If you want the data such that you have (for a 44 kHz file) frequency points at 22 kHz, 11 kHz, 5.5 kHz, etc., then you could use a wavelet decomposition, I guess ...
This thread may help ya a bit ...
Converting an FFT to a spectrogram
Same sort of information as a spectrogram I'd guess ...
What you need is a power spectrum graph. You have to compute the DFT of your signal's current window, then take the squared magnitude of each bin.

How can I calculate audio dB level?

I want to calculate room noise level with the computer's microphone. I record noise as an audio file, but how can I calculate the noise dB level?
I don't know how to start!
All the previous answers are correct if you want a technically accurate or scientifically valuable answer. But if you just want a general estimation of comparative loudness, like if you want to check whether the dog is barking or whether a baby is crying and you want to specify the threshold in dB, then it's a relatively simple calculation.
Many wave-file editors have a vertical scale in decibels. There are no calibration or reference measurements, just a simple calculation:
dB = 20 * log10(amplitude)
The amplitude in this case is expressed as a number between 0 and 1, where 1 represents the maximum amplitude in the sound file. For example, if you have a 16 bit sound file, the amplitude can go as high as 32767. So you just divide the sample by 32767. (We work with absolute values, positive numbers only.) So if you have a wave that peaks at 14731, then:
amplitude = 14731 / 32767
= 0.44
dB = 20 * log10(0.44)
= -7.13
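The same arithmetic as a tiny Swift sketch, assuming 16-bit samples:
import Foundation

// Peak level of a buffer of 16-bit samples, in dB relative to full scale.
func peakLevelDB(_ samples: [Int16]) -> Double {
    let peak = samples.map { abs(Int($0)) }.max() ?? 0
    guard peak > 0 else { return -Double.infinity }   // digital silence
    let amplitude = Double(peak) / 32767.0            // normalize to 0...1
    return 20.0 * log10(amplitude)
}
// A buffer peaking at 14731 gives roughly -7 dB, as in the example above.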
But there are very important things to consider, specifically the answers given by the others.
1) As Jörg W Mittag says, dB is a relative measurement. Since we don't have calibrations and references, this measurement is only relative to itself. And by that I mean that you will be able to see that the sound in the sound file at this point is 3 dB louder than at that point, or that this spike is 5 decibels louder than the background. But you cannot know how loud it is in real life, not without the calibrations that the others are referring to.
2) This was also mentioned by PaulR and user545125: Because you're evaluating according to a recorded sound, you are only measuring the sound at the specific location where the microphone is, biased to the direction the microphone is pointing, and filtered by the frequency response of your hardware. A few feet away, a human listening with human ears will get a totally different sound level and different frequencies.
3) Without calibrated hardware, you cannot say that the sound is 60 dB or 89 dB or whatever. All that this calculation can give you is how the peaks in the sound file compare to other peaks in the same sound file.
If this is all you want, then it's fine, but if you want to do something serious, like determine whether the noise level in a factory is safe for workers, then listen to Paul, user545125 and Jörg.
You do need reference hardware (i.e., a reference mic) to calculate noise level (dB SPL, or sound pressure level). One thing Radio Shack sells is a $50 dB SPL meter. If you're doing scientific calculations, I wouldn't use it. But if the goal is to get a general idea of a weighted measurement (dBA or dBC) of the sound pressure in a given environment, then it might be useful. As a sound engineer, I use mine all the time to see how much sound volume I'm generating while I mix. It's usually accurate to within 2 dB.
That's my answer. The rest is FYI stuff.
Jorg is correct that dB SPL is a relative measurement. All decibel measurements are. But you've implied a reference of 0 dB SPL, or 20 micropascals, scientifically agreed to be the most quiet sound a human ear can detect (though, understandably, what a person can actually hear is very difficult to determine). This, according to Wikipedia, is about the sound of a flying mosquito from about 10 feet away (http://en.wikipedia.org/wiki/Decibel).
By assuming you don't understand decibels, I think Jorg is just trying to out-geek you. He clearly didn't give you a practical answer. :-)
Unweighted measurements (dB, instead of dBA or dBC) are rarely used, because most sound pressure is not detected by the human ear. In a given office environment, there is usually 80-100 dB SPL (sound pressure level). To give you an idea of exactly how much is not heard, in the U.S., occupational regulations limit noise exposure to 80 dBA for a given 8-hour work shift (80 dBA is about the background noise level of your average downtown street - difficult, but not impossible to talk over). 85 dBA is oppressive, and at 90, most people are trying to get away. So the difference between 80 dB and 80 dBA is very significant -- 80 dBA is difficult to talk over, and 80 dB is quite peaceful. :-)
So what is 'A' weighting? 'A' weighting compensates for the fact that we don't perceive lower frequency sounds as well as high frequency sounds (we hear 20 Hz to 20,000 Hz). There's a lot of low-end rumble that our ears/brains pretty much ignore. In addition, we're more sensitive to a certain midrange (1000 Hz to 4000 Hz). Most agree that this frequency range contains the sounds of consonants of speech (vowels happen at a much lower frequency). Imagine talking with just vowels. You can't understand anything. Thus, the ability of a human to be able to communicate (conventionally) rests in the 1kHz-5kHz bump in hearing sensitivity. Interestingly, this is why most telephone systems only transmit 300 Hz to 3000 Hz. It was determined that this was the minimal response needed to understand the voice on the other end.
But I think that's more than you wanted to know. Hope it helps. :-)
You can't easily measure absolute dB SPL, since your microphone and analogue hardware are not calibrated. You may be able to do an approximate calibration for a particular hardware set up but you would need to repeat this for every different microphone and hardware set up that you plan to support.
If you do have some kind of SPL reference source that you can use then it gets easier:
use your reference source to generate a tone at a known dB SPL - measure this
measure the ambient noise
calculate noise level = 20 * log10 (V_noise / V_ref) + dB_ref
Of course this assumes that the frequency response of your microphone and audio hardware is reasonably flat and that you just want a flat (unweighted) noise figure. If you want a weighted (e.g. A-weight) noise figure then you'll have to do rather more processing.
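A minimal sketch of that calculation, assuming you have recordings of the reference tone and of the ambient noise made with identical gain settings; the 94 dB figure is just an example reference level:
import Foundation

// RMS of a buffer of normalized samples.
func rms(_ x: [Double]) -> Double {
    return sqrt(x.reduce(0) { $0 + $1 * $1 } / Double(x.count))
}

// Ambient noise level in dB SPL, given a recording of a reference source of known
// level made with the same microphone and gain settings.
func noiseLevelSPL(noise: [Double], referenceTone: [Double], referenceDBSPL: Double = 94.0) -> Double {
    return 20.0 * log10(rms(noise) / rms(referenceTone)) + referenceDBSPL
}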
According to Merchant et al. (section 3.2 in the appendix: "Measuring acoustic habitats", Methods in Ecology and Evolution, 2015), you can actually calculate absolute, calibrated SPL values using manufacturer specifications by subtracting a correction term S from your relative (scaled to maximum) SPL values:
S = M + G + 20*log10(1/Vadc) + 20*log10(2^Nbit-1)
where M is the sensitivity of the transducer (microphone) re 1 V/Pa, G is the gain applied by the user, Vadc is the zero-to-peak voltage (given by multiplying the rms ADC voltage by a conversion factor of sqrt(2)), and Nbit is the bit sampling depth.
The last term is necessary if your system scales the amplitude by its maximum.
The correction will be more accurate using end-to-end calibration with sound calibrators.
Note that the formula above is dependent on frequency, but you could apply it over a wider frequency range if your microphone has a flat frequency response.
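A sketch of that correction, transcribing the formula above with placeholder values; all the numbers are made up, and the reading of the last term (whether it should instead be 2^(Nbit-1)) is worth checking against the paper:
import Foundation

// Placeholder manufacturer/setup values -- take these from your own spec sheets.
let M = -35.0                 // transducer sensitivity, dB re 1 V/Pa (placeholder)
let G = 20.0                  // gain applied by the user, dB (placeholder)
let vADC = 1.0 * sqrt(2.0)    // zero-to-peak ADC voltage: rms ADC voltage times sqrt(2)
let nBit = 16.0               // bit sampling depth

// Correction term S as written above (last term read literally as 2^Nbit - 1).
let S = M + G + 20.0 * log10(1.0 / vADC) + 20.0 * log10(pow(2.0, nBit) - 1.0)

// Absolute SPL = relative (scaled-to-maximum) SPL minus the correction term.
let relativeSPL = -12.0       // placeholder relative level, dB
let absoluteSPL = relativeSPL - S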
You can't. dB is a relative unit, IOW it is a unit for comparing two measurements against each other. You can only say that measurement A is x dB louder than measurement B, but in your case you only have one measurement. Therefore, it simply isn't possible to calculate the dB level.
The short answer is: you cannot do sound level measurements with your laptop, cellphone, etc., for all the reasons outlined previously, plus the fact that your cellphone, laptop, etc. use compression algorithms to ensure that everything recorded is within the hardware's capability. So if, for example, you measure a sound and then run it through signal-processing software such as Head Artemis or LMS Test.Lab, the indicated sound pressure level will always be in the neighborhood of 80 dB(A) regardless of the true level. I can say this from having used cellphone or laptop audio to get an idea of a noise frequency spectrum while taking level measurements with a calibrated sound level meter. Interestingly, Radio Shack used to sell a microphone intended for speech input while videoconferencing that had a very flat frequency response over a broad range, and it only cost about $15.
I use a sound level calibrator. It produces 94 dB or 114 dB at 1 kHz, which is a frequency where the weighting filters share the same level.
With the calibrator at 114 dB I adjust the mic gain to reach almost full-scale input, simply by watching a sound-card-based virtual oscilloscope. Now I know Vref at 114 dB.
I developed a simple software-based SPL meter that can be provided if needed. You can use REW too.
You have to know that PC hardware hardly reaches 60 dB of dynamic range, so calibrating at 114 dB it won't read less than 54 dB, which is pretty high if you consider that sleeping is comfortable below 35 dB(A). In this case you can calibrate at 94 dB and then measure down to 34 dB, but again you will hit PC and mic self-noise, which may prevent you from reaching such low levels.
Anyway, once calibrated, measurements around 114 dB and 94 dB should read fine.
Note: the lab-standard pistonphone calibrator operates at 250 Hz.
Well, I used RobertT's method but it always gave me an overflow exception, so I used int dB = -36 - (value * -1) instead and the exception went away. I don't know whether it actually gives dB values; if you know, please comment on whether the code below produces a dB value or not.
VB.NET:
Dim dB As Integer = -36 - (9 * -1)
C#:
int dB = -36 - (9 * -1);

Is there an FFT that uses a logarithmic division of frequency?

Wikipedia's Wavelet article contains this text:
The discrete wavelet transform is also less computationally complex, taking O(N) time as compared to O(N log N) for the fast Fourier transform. This computational advantage is not inherent to the transform, but reflects the choice of a logarithmic division of frequency, in contrast to the equally spaced frequency divisions of the FFT.
Does this imply that there's also an FFT-like algorithm that uses a logarithmic division of frequency instead of linear? Is it also O(N)? This would obviously be preferable for a lot of applications.
Yes. Yes. No.
It is called the Logarithmic Fourier Transform. It has O(n) time. However it is useful for functions which decay slowly with increasing domain/abscissa.
Referring back to the Wikipedia article: "The main difference is that wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency."
So if you can be localized only in time (or space, pick your interpretation of the abscissa), then wavelets (or the discrete cosine transform) are a reasonable approach. But if you need to go on and on and on, then you need the Fourier transform.
Read more about LFT at http://homepages.dias.ie/~ajones/publications/28.pdf
Here is the abstract:
We present an exact and analytical expression for the Fourier transform of a function that has been sampled logarithmically. The procedure is significantly more efficient computationally than the fast Fourier transformation (FFT) for transforming functions or measured responses which decay slowly with increasing abscissa value. We illustrate the proposed method with an example from electromagnetic geophysics, where the scaling is often such that our logarithmic Fourier transform (LFT) should be applied. For the example chosen, we are able to obtain results that agree with those from an FFT to within 0.5 per cent in a time that is a factor of 1.0e2 shorter. Potential applications of our LFT in geophysics include conversion of wide-band electromagnetic frequency responses to transient responses, glacial loading and unloading, aquifer recharge problems, normal mode and earth tide studies in seismology, and impulsive shock wave modelling.
EDIT: After reading up on this I think this algorithm is not really useful for this question, I will give a description anyway for other readers.
There is also Filon's algorithm, a method based on Filon's quadrature, which can be found in Numerical Recipes and in this PhD thesis [1].
The time scale is log-spaced, as is the resulting frequency scale.
This algorithm is used for data/functions which decay to 0 in the observed time interval (which is probably not your case); a typical simple example would be an exponential decay.
If your data consists of points (x_0, y_0), (x_1, y_1), ..., (x_{n-1}, y_{n-1}) and you want to calculate the spectrum A(f), where f is the frequency, log-spaced from, let's say, f_min = 1/x_max to f_max = 1/x_min:
The real part for each frequency f is then calculated by:
A_re(f) = sum over i = 0 ... n-2 of { (y_{i+1} - y_i)/(x_{i+1} - x_i) * [ cos(2*pi*f*x_{i+1}) - cos(2*pi*f*x_i) ] / (2*pi*f)^2 }
The imaginary part is:
A_im(f) = y_0/(2*pi*f) + sum over i = 0 ... n-2 of { (y_{i+1} - y_i)/(x_{i+1} - x_i) * [ sin(2*pi*f*x_{i+1}) - sin(2*pi*f*x_i) ] / (2*pi*f)^2 }
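A direct Swift transcription of those two formulas, in case anyone wants to experiment; the data points and frequency grid are left as inputs:
import Foundation

// Log-spaced frequency grid from fMin to fMax.
func logSpacedFrequencies(fMin: Double, fMax: Double, count: Int) -> [Double] {
    return (0..<count).map { fMin * pow(fMax / fMin, Double($0) / Double(count - 1)) }
}

// Real and imaginary parts of A(f) for irregularly (e.g. logarithmically)
// sampled data points (x_i, y_i), following the formulas above.
func spectrum(x: [Double], y: [Double], frequencies: [Double]) -> [(re: Double, im: Double)] {
    return frequencies.map { (f: Double) -> (re: Double, im: Double) in
        let w = 2.0 * Double.pi * f
        var re = 0.0
        var im = y[0] / w
        for i in 0..<(x.count - 1) {
            let slope = (y[i + 1] - y[i]) / (x[i + 1] - x[i])
            re += slope * (cos(w * x[i + 1]) - cos(w * x[i])) / (w * w)
            im += slope * (sin(w * x[i + 1]) - sin(w * x[i])) / (w * w)
        }
        return (re: re, im: im)
    }
}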
[1] Blochowicz, Thomas: Broadband Dielectric Spectroscopy in Neat and Binary Molecular Glass Formers. University of Bayreuth, 2003, Chapter 3.2.3
To do what you want, you need to measure over different time windows, which means the lower frequencies get updated least often (inversely proportional to powers of 2).
Check FPPO here:
https://www.rationalacoustics.com/files/FFT_Fundamentals.pdf
This means that higher frequencies will update more often. You always average (a moving average is good), but you can also let it move faster. Of course, if you plan on using the inverse FFT, you don't want any of this. Also, better accuracy (smaller bandwidth) at lower frequencies means these need to update much more slowly, with windows like 16k samples (roughly 1/3 s).
Yeah, a low-frequency signal naturally varies slowly, so of course you need a lot of time to detect it. This is not a problem that math can fix; it's a natural trade-off, and you can't have both high accuracy at low frequencies and fast response.
I think the link I provide will clarify some of your options...7 years after you asked the question, unfortunately.
