Reducing sample bit-depth by truncating - audio

I have to reduce the bit-depth of a digital audio signal from 24 to 16 bit.
Taking only the 16 most significant bits (i.e. truncating) of each sample is equivalent to doing a proportional calculation (out = in * 0xFFFF / 0xFFFFFF)?

You'll get better sounding results by adding a carefully crafted noise signal to the original signal, just below the truncating threshold, before truncating (a.k.a. dithering).

x * 0xffff / 0xffffff is overly of pedantic, but not in a good way if your samples are signed -- and probably not in a good way in general.
Yes, you want the maximum value in your source range to match the maximum value in your destination range, but the values used there are only for unsigned ranges, and the distribution of quantisation steps means that it'll be very rare that you use the largest possible output value.
If the samples are signed then the peak positive values would be 0x7fff and 0x7fffff, while the peak negative values would be -0x8000 and -0x800000. Your first problem is deciding whether +1 is equal to 0x7fff, or -1 is equal to -0x8000. If you choose the latter then it's a simple shift operation. If you try to have both then zero stops being zero.
After that you have a problem that division rounds towards zero. This means that too many values get rounded to zero compared with other values. This causes distortion.
If you want to scale according to the peak positive values, the correct form would be:
out = rint((float)in * 0x7fff / 0x7fffff);
If you fish around a bit you can probably find an efficient way to do that with integer arithmetic and no division.
This form should correctly round to the nearest available output value for any given input, and it should map the largest possible input value to the largest possible output value, but it's going to have an ugly distribution of quantisation steps scattered throughout the range.
Most people prefer:
out = (in + 128) >> 8;
if (out > 0x7fff) out = 0x7fff;
This form makes things the tiniest bit louder, to the point that positive values may clip slightly, but the quantisation steps are distributed evenly.
You add 128 because right-shift rounds towards negative infinity. The average quantisation error is -128 and you add 128 to correct this to keep 0 at precisely 0. The test for overflow is necessary because an input value of 0x7fffff would otherwise give a result of 0x8000, and when you store this in a 16-bit word it would wrap around giving a peak negative value.
C pedants can poke holes in the assumptions about right-shift and division behaviour, but I'm overlooking those for clarity.
However, as others have pointed out you generally shouldn't reduce the bit depth of audio without dithering, and ideally noise shaping. TPDF dither is as follows:
out = (in + (rand() & 255) - (rand() & 255)) >> 8;
if (out < -0x8000) out = -0x8000;
if (out > 0x7fff) out = 0x7fff;
Again, big issues with the usage of rand() which I'm going to overlook for clarity.

I assume you mean (in * 0xFFFF) / 0xFFFFFF, in which case, yes.

Dithering by adding noise will in general give you better results. The key to this is the shape of the noise. The popular pow-r dithering algorithms have a specific shape that is very popular in a lot of digital audio workstation applications (Cakewalk's SONAR, Logic, etc).
If you don't need the full on fidelity of pow-r, you can simply generate some noise at fairly low amplitude and mix it into your signal. You'll find this masks some of the quantization effects.

Related

estimate standard error from sample means

Random sample of 143 girl and 127 boys were selected from a large population.A measurement was taken of the haemoglobin level(measured in g/dl) of each child with the following result.
girl n=143 mean = 11.35 sd = 1.41
boys n=127 mean 11.01 sd =1.32
estimate the standard error of the difference between the sample means
In essence, we'd pool the standard errors by adding them. This implies that we´re answering the question: what is the vairation of the sampling distribution considering both samples?
SD = sqrt( (sd₁**2 / n₁) + (sd₂**2 / n₂) \
SD = sqrt( (1.41**2 / 143) + (1.32**2 / 127) ≈ 0.1662
Notice that the standrad deviation squared is simply the variance of each sample. As you can see, in our case the value is quite small, which indicates that the difference between sampled means doesn´t need to be that large for there to be a larger than expected difference between obervations.
We´d calculate the difference between means as 0.34 (or -0.34 depending on the nature of the question) and divide this difference by the standrad error to get a t-value. In our case 2.046 (or -2.046) indicates that the observed difference is 2.046 times larger than the average difference we would expect given the variation the variation that we measured AND the size of our sample.
However, we need to verify whether this observation is statistically significant by determining the t-critical value. This t-critical can be easily calculated by using a t-value chart: one needs to know the alpha (typically 0.05 unless otherwise stated), one needs to know the original alternative hypothesis (if it was something along the lines of there is a difference between genders then we would apply a two tailed distribution - if it was something along the lines of gender X has a hameglobin level larger/smaller than gender X then we would use a single tailed distribution).
If the t-value > t-critical then we would claim that the difference between means is statistically significant, thereby having sufficient evident to reject the null hypothesis. Alternatively, if t-value < t-critical, we would not have statistically significant evidence against the null hypothesis, thus we would fail to reject the null hypothesis.

How to calculate minimum cycles of sine wave to have integer N cycles in a buffer?

I found a similar question posted but not with a suitable answer.
I want to have sine wave buffers, doing audio synthesis, and wanting to figure an optimal size for buffers to avoid artifacts where the sine wave's period measured in samples is a non-integer.
For instance, a sine at A4, 440.0 Hz, with sample rate of 44100, repeats every 1/440 * 44100 samples, or every 100.22727272727272 samples. If you simply create a circular buffer of 100 (or 101) samples to contain a single cycle of the sine, playback has noticeable artifacts do to the disjoint.
Based on the response to the previous poster's version, the proposed solution was to express the period as a whole fraction and find the greatest common divisor to find the number of cycles and samples needed. However in practice I keep finding ratios where the greatest common divisor calculates as 1, which intuitively seems unlikely to me.
Using the above example, in Python:
freq = 440.0
rate = 44100
period = 1.0 / freq * rate
ratio = period.as_integer_ratio()
from fractions import gcd
divisor = gcd(*ratio)
Divisor equals 1, apparently meaning that there is no common divisor, meaning that it requires infinite cycles of the frequency to ever find an integer number of samples to perfectly store the sine.
Am I doing this wrong? Is there another solution?
NOTE in practice, I've found that setting the buffer size somewhere around 1000x the actual period of the sine eliminates artifacts to the naked ear, but I would like to lower my memory usage and so would like to mathematically determine smaller allowable buffer sizes based on arbitrary frequencies.
What gcd(*ratio) returning 1 tells you is that the fraction you obtained with period.as_integer_ratio() has already been simplified, not that "it requires infinite cycles of the frequency to ever find an integer number of samples to perfectly store the sine".
However that simplification might not be so great given round-off errors when computing period = 1.0 / freq * rate. You would be better off using:
divisor = fractions.gcd(rate, freq)
ratio = (rate/divisor, freq/divisor)
At that point since ratio is a simplified fraction (in your specific case (2205, 22)), ratio[1] cycles would fit in an integer number of samples (more specifically ratio[0]*ratio[1] or in your specific case 48510 samples).

"Winamp style" spectrum analyzer

I have a program that plots the spectrum analysis (Amp/Freq) of a signal, which is preety much the DFT converted to polar. However, this is not exactly the sort of graph that, say, winamp (right at the top-left corner), or effectively any other audio software plots. I am not really sure what is this sort of graph called (if it has a distinct name at all), so I am not sure what to look for.
I am preety positive about the frequency axis being base two exponential, the amplitude axis puzzles me though.
Any pointers?
Actually an interesting question. I know what you are saying; the frequency axis is certainly logarithmic. But what about the amplitude? In response to another poster, the amplitude can't simply be in units of dB alone, because dB has no concept of zero. This introduces the idea of quantization error, SNR, and dynamic range.
Assume that the received digitized (i.e., discrete time and discrete amplitude) time-domain signal, x[n], is equal to s[n] + e[n], where s[n] is the transmitted discrete-time signal (i.e., continuous amplitude) and e[n] is the quantization error. Suppose x[n] is represented with b bits, and for simplicity, takes values in [0,1). Then the maximum peak-to-peak amplitude of e[n] is one quantization level, i.e., 2^{-b}.
The dynamic range is the defined to be, in decibels, 20 log10 (max peak-to-peak |s[n]|)/(max peak-to-peak |e[n]|) = 20 log10 1/(2^{-b}) = 20b log10 2 = 6.02b dB. For 16-bit audio, the dynamic range is 96 dB. For 8-bit audio, the dynamic range is 48 dB.
So how might Winamp plot amplitude? My guesses:
The minimum amplitude is assumed to be -6.02b dB, and the maximum amplitude is 0 dB. Visually, Winamp draws the window with these thresholds in mind.
Another nonlinear map, such as log(1+X), is used. This function is always nonnegative, and when X is large, it approximates log(X).
Any other experts out there who know? Let me know what you think. I'm interested, too, exactly how this is implemented.
To generate a power spectrum you need to do the following steps:
apply window function to time domain data (e.g. Hanning window)
compute FFT
calculate log of FFT bin magnitudes for N/2 points of FFT (typically 10 * log10(re * re + im * im))
This gives log magnitude (i.e. dB) versus linear frequency.
If you also want a log frequency scale then you will need to accumulate the magnitude from appropriate ranges of bins (and you will need a fairly large FFT to start with).
Well I'm not 100% sure what you mean but surely its just bucketing the data from an FFT?
If you want to get the data such that you have (for a 44Khz file) frequency points at 22Khz, 11Khz 5.5Khz etc then you could use a wavelet decomposition, i guess ...
This thread may help ya a bit ...
Converting an FFT to a spectogram
Same sort of information as a spectrogram I'd guess ...
What you need is power spectrum graph. You have to compute DFT of your signal's current window. Then square each value.

Non-Uniform Random Number Generator Implementation?

I need a random number generator that picks numbers over a specified range with a programmable mean.
For example, I need to pick numbers between 2 and 14 and I need the average of the random numbers to be 5.
I use random number generators a lot. Usually I just need a uniform distribution.
I don't even know what to call this type of distribution.
Thank you for any assistance or insight you can provide.
You might be able to use a binomial distribution, if you're happy with the shape of that distribution. Set n=12 and p=0.25. This will give you a value between 0 and 12 with a mean of 3. Just add 2 to each result to get the range and mean you are looking for.
Edit: As for implementation, you can probably find a library for your chosen language that supports non-uniform distributions (I've written one myself for Java).
A binomial distribution can be approximated fairly easily using a uniform RNG. Simply perform n trials and record the number of successes. So if you have n=10 and p=0.5, it's just like flipping a coin 10 times in a row and counting the number of heads. For p=0.25 just generate uniformly-distributed values between 0 and 3 and only count zeros as successes.
If you want a more efficient implementation, there is a clever algorithm hidden away in the exercises of volume 2 of Knuth's The Art of Computer Programming.
You haven't said what distribution you are after. Regarding your specific example, a function which produced a uniform distribution between 2 and 8 would satisfy your requirements, strictly as you have written them :)
If you want a non-uniform distribution of the random number, then you might have to implement some sort of mapping, e.g:
// returns a number between 0..5 with a custom distribution
int MyCustomDistribution()
{
int r = rand(100); // random number between 0..100
if (r < 10) return 1;
if (r < 30) return 2;
if (r < 42) return 3;
...
}
Based on the Wikipedia sub-article about non-uniform generators, it would seem you want to apply the output of a uniform pseudorandom number generator to an area distribution that meets the desired mean.
You can create a non-uniform PRNG from a uniform one. This makes sense, as you can imagine taking a uniform PRNG that returns 0,1,2 and create a new, non-uniform PRNG by returning 0 for values 0,1 and 1 for the value 2.
There is more to it if you want specific characteristics on the distribution of your new, non-uniform PRNG. This is covered on the Wikipedia page on PRNGs, and the Ziggurat algorithm is specifically mentioned.
With those clues you should be able to search up some code.
My first idea would be:
generate numbers in the range 0..1
scale to the range -9..9 ( x-0.5; x*18)
shift range by 5 -> -4 .. 14 (add 5)
truncate the range to 2..14 (discard numbers < 2)
that should give you numbers in the range you want.
You need a distributed / weighted random number generator. Here's a reference to get you started.
Assign all numbers equal probabilities,
while currentAverage not equal to intendedAverage (whithin possible margin)
pickedNumber = pick one of the possible numbers (at random, uniform probability, if you pick intendedAverage pick again)
if (pickedNumber is greater than intendedAverage and currentAverage<intendedAverage) or (pickedNumber is less than intendedAverage and currentAverage>intendedAverage)
increase pickedNumber's probability by delta at the expense of all others, conserving sum=100%
else
decrease pickedNumber's probability by delta to the benefit of all others, conserving sum=100%
end if
delta=0.98*delta (the rate of decrease of delta should probably be experimented with)
end while

8 bit audio samples to 16 bit

This is my "weekend" hobby problem.
I have some well-loved single-cycle waveforms from the ROMs of a classic synthesizer.
These are 8-bit samples (256 possible values).
Because they are only 8 bits, the noise floor is pretty high. This is due to quantization error. Quantization error is pretty weird. It messes up all frequencies a bit.
I'd like to take these cycles and make "clean" 16-bit versions of them. (Yes, I know people love the dirty versions, so I'll let the user interpolate between dirty and clean to whatever degree they like.)
It sounds impossible, right, because I've lost the low 8 bits forever, right? But this has been in the back of my head for a while, and I'm pretty sure I can do it.
Remember that these are single-cycle waveforms that just get repeated over and over for playback, so this is a special case. (Of course, the synth does all kinds of things to make the sound interesting, including envelopes, modulations, filters cross-fading, etc.)
For each individual byte sample, what I really know is that it's one of 256 values in the 16-bit version. (Imagine the reverse process, where the 16-bit value is truncated or rounded to 8 bits.)
My evaluation function is trying to get the minimum noise floor. I should be able to judge that with one or more FFTs.
Exhaustive testing would probably take forever, so I could take a lower-resolution first pass. Or do I just randomly push randomly chosen values around (within the known values that would keep the same 8-bit version) and do the evaluation and keep the cleaner version? Or is there something faster I can do? Am I in danger of falling into local minimums when there might be some better minimums elsewhere in the search space? I've had that happen in other similar situations.
Are there any initial guesses I can make, maybe by looking at neighboring values?
Edit: Several people have pointed out that the problem is easier if I remove the requirement that the new waveform would sample to the original. That's true. In fact, if I'm just looking for cleaner sounds, the solution is trivial.
You could put your existing 8-bit sample into the high-order byte of your new 16-bit sample, and then use the low order byte to linear interpolate some new 16 bit datapoints between each original 8-bit sample.
This would essentially connect a 16 bit straight line between each of your original 8-bit samples, using several new samples. It would sound much quieter than what you have now, which is a sudden, 8-bit jump between the two original samples.
You could also try apply some low-pass filtering.
Going with the approach in your question, I would suggest looking into hill-climbing algorithms and the like.
http://en.wikipedia.org/wiki/Hill_climbing
has more information on it and the sidebox has links to other algorithms which may be more suitable.
AI is like alchemy - we never reached the final goal, but lots of good stuff came out along the way.
Well, I would expect some FIR filtering (IIR if you really need processing cycles, but FIR can give better results without instability) to clean up the noise. You would have to play with it to get the effect you want but the basic problem is smoothing out the sharp edges in the audio created by sampling it at 8 bit resolutions. I would give a wide birth to the center frequency of the audio and do a low pass filter, and then listen to make sure I didn't make it sound "flat" with the filter I picked.
It's tough though, there is only so much you can do, the lower 8 bits is lost, the best you can do is approximate it.
It's almost impossible to get rid of noise that looks like your signal. If you start tweeking stuff in your frequency band it will take out the signal of interest.
For upsampling, since you're already using an FFT, you can add zeros to the end of the frequency domain signal and do an inverse FFT. This completely preserves the frequecy and phase information of the original signal, although it spreads the same energy over more samples. If you shift it 8bits to be a 16bit samples first, this won't be a too much of a problem. But I usually kick it up by an integer gain factor before doing the transform.
Pete
Edit:
The comments are getting a little long so I'll move some to the answer.
The peaks in the FFT output are harmonic spikes caused by the quantitization. I tend to think of them differently than the noise floor. You can dither as someone mentioned and eliminate the amplitude of the harmonic spikes and flatten out the noise floor, but you loose over all signal to noise on the flat part of your noise floor. As far as the FFT is concerned. When you interpolate using that method, it retains the same energy and spreads over more samples, this reduces the amplitude. So before doing the inverse, give your signal more energy by multipling by a gain factor.
Are the signals simple/complex sinusoids, or do they have hard edges? i.e. Triangle, square waves, etc. I'm assuming they have continuity from cycle to cycle, is that valid? If so you can also increase your FFT resolution to more precisely pinpoint frequencies by increasing the number of waveform cycles fed to your FFT. If you can precisely identify the frequencies use, assuming they are somewhat discrete, you may be able to completely recreate the intended signal.
The 16-bit to 8-bit via truncation requirement will produce results that do not match the original source. (Thus making finding an optimal answer more difficult.) Typically you would produce a fixed point waveform by attempting to "get the closest match" that means rounding to the nearest number (trunking is a floor operation). That is most likely how they were originally generated. Adding 0.5 (in this case 0.5 is 128) and then trunking the output would allow you to generate more accurate results. If that's not a worry then ok, but it definitely will have a negative effect on accuracy.
UPDATED:
Why? Because the goal of sampling a signal is to be able to as close a possible reproduce the signal. If conversion threshold is set poorly on the sampling all you're error is to one side of signal and not well distributed and centered about zero. On such systems you typically try to maximize the use the availiable dynamic range, particularly if you have low resolution such as an 8-bit ADC.
Band limited versions? If they are filtered at different frequencies, I'd suspect it was to allow you to play the same sound with out distortions when you went too far out from the other variation. Kinda like mipmapping in graphics.
I suspect the two are the same signal with different aliasing filters applied, this may be useful in reproducing the original. They should be the same base signal with different convolutions applied.
There might be a simple approach taking advantange of the periodicity of the waveforms. How about if you:
Make a 16-bit waveform where the high bytes are the waveform and the low bytes are zero - call it x[n].
Calculate the discrete Fourier transform of x[n] = X[w].
Make a signal Y[w] = (dBMag(X[w]) > Threshold) ? X[w] : 0, where dBMag(k) = 10*log10(real(k)^2 + imag(k)^2), and Threshold is maybe 40 dB, based on 8 bits being roughly 48 dB dynamic range, and allowing ~1.5 bits of noise.
Inverse transform Y[w] to get y[n], your new 16 bit waveform.
If y[n] doesn't sound nice, dither it with some very low level noise.
Notes:
A. This technique only works in the original waveforms are exactly periodic!
B. Step 5 might be replaced with setting the "0" values to random noise in Y[w] in step 3, you'd have to experiment a bit to see what works better.
This seems easier (to me at least) than an optimization approach. But truncated y[n] will probably not be equal to your original waveforms. I'm not sure how important that constraint is. I feel like this approach will generate waveforms that sound good.

Resources