How do I attenuate a WAV file by a given decibel value? - audio

If I wanted to reduce a WAV file's amplitude by 25%, I would write something like this:
for (int i = 0; i < data.Length; i++)
{
data[i] *= 0.75;
}
A lot of the articles I read on audio techniques, however, discuss amplitude in terms of decibels. I understand the logarithmic nature of decibel units in principle, but not so much in terms of actual code.
My question is: if I wanted to attenuate the volume of a WAV file by, say, 20 decibels, how would I do this in code like my above example?
Update: formula (based on Nils Pipenbrinck's answer) for attenuating by a given number of decibels (entered as a positive number e.g. 10, 20 etc.):
public void AttenuateAudio(float[] data, int decibels)
{
float gain = (float)Math.Pow(10, (double)-decibels / 20.0);
for (int i = 0; i < data.Length; i++)
{
data[i] *= gain;
}
}
So, if I want to attenuate by 20 decibels, the gain factor is 0.1.

I think you want to convert from decibel to gain.
The equations for audio are:
decibel to gain:
gain = 10 ^ (attenuation in dB / 20)
(the attenuation is negative for a volume reduction; e.g. -20 dB gives a gain of 0.1)
or in C:
gain = powf(10, attenuation / 20.0f);
The equation to convert from gain to dB is:
attenuation_in_db = 20 * log10(gain)
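For instance, a quick sanity check of both formulas in plain C (my own snippet, not part of the original answer) might look like this:
#include <math.h>
#include <stdio.h>

int main(void)
{
    float attenuation_db = -20.0f;                     /* negative = quieter */
    float gain = powf(10.0f, attenuation_db / 20.0f);  /* -> 0.1 */
    float back = 20.0f * log10f(gain);                 /* -> -20.0 */
    printf("gain = %f, dB = %f\n", gain, back);
    return 0;
}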

If you just want to adjust some audio, I've had good results with the normalize package from nongnu.org. If you want to study how it's done, the source code is freely available. I've also used wavnorm, whose home page seems to be down at the moment.

One thing to consider: .WAV files have MANY different formats. The code above only works for WAVE_FORMAT_FLOAT. If you're dealing with PCM files, your samples will be 8, 16, 24 or 32 bit integers (8 bit PCM uses unsigned integers from 0..255; 24 bit PCM can be packed or unpacked: packed means the 3-byte values sit directly next to each other, unpacked means each 3-byte value lives in a 4-byte package).
And then there's the issue of alternate encodings - for instance in Win7, all the Windows sounds are actually MP3 files in a WAV container.
It's unfortunately not as simple as it sounds :(.
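For the common 16-bit signed PCM case, a minimal sketch (my own, assuming the samples are already unpacked into host-order int16_t values) could look like the following: convert to float, apply the dB-derived gain, and clamp back into range.
#include <math.h>
#include <stddef.h>
#include <stdint.h>

void attenuate_pcm16(int16_t *samples, size_t count, float decibels)
{
    float gain = powf(10.0f, -decibels / 20.0f);    /* e.g. 20 dB -> 0.1 */
    for (size_t i = 0; i < count; i++) {
        float v = samples[i] * gain;
        if (v >  32767.0f) v =  32767.0f;           /* clamp to the int16 range */
        if (v < -32768.0f) v = -32768.0f;
        samples[i] = (int16_t)v;
    }
}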

Oops, I misunderstood the question… You can see my Python implementations of converting from dB to a float (which you can use as a multiplier on the amplitude, as you show above) and vice versa:
https://github.com/jiaaro/pydub/blob/master/pydub/utils.py
In a nutshell, for sample amplitudes it's:
10 ^ (db_change / 20)
(the / 10 form applies to power ratios rather than amplitudes), so to reduce the volume by 6 dB you would multiply the amplitude of each sample by:
10 ^ (-6 / 20) == 10 ^ (-0.3) ≈ 0.501

Related

How to detect a basic audio signal inside a much bigger one (mpg123 output signal)

I am new to signal processing and I don't really understand the basics (and more). Sorry in advance for any mistakes in my understanding so far.
I am writing C code to detect a basic signal (an 18 Hz simple sinusoid of 2 sec duration; generating it with Audacity is pretty simple) inside a much bigger mp3 file. I read the mp3 file and copy it until I match the sound signal.
The signal to match is (1st channel: 18 Hz sine signal, 2nd channel: nothing/doesn't matter).
To match the sound, I calculate the frequency content of the mp3 until I find a good percentage of 18 Hz energy during ~2 sec. As this frequency is not very common, I don't have to match it very precisely.
I use mpg123 to convert my file and fill the buffers with what it returns. I initialised it to convert the mp3 to mono raw audio:
init:
int ret;
const long *rates;
size_t rate_count, i;
mpg123_rates(&rates, &rate_count);
mpg123_handle *m = mpg123_new(NULL, &ret);
if(m == NULL)
{
    //err
} else {
    // configure the handle only after the NULL check
    mpg123_format_none(m);
    for(i=0; i<rate_count; ++i)
        mpg123_format(m, rates[i], MPG123_MONO, MPG123_ENC_SIGNED_32);
    mpg123_open_feed(m);
}
(...)
unsigned char out[8*MAX_MP3_BUF_SIZE];
ret = mpg123_decode(m, buf->data, buf->size, out, 8*MAX_MP3_BUF_SIZE, &size);
But I have no idea what to do with the resulting buffer to calculate the FFT and get the frequency.
//FREQ Calculation with libfftw3
int transform_size = MAX_MP3_BUF_SIZE * 2;
// real-to-complex plan: the input is an array of doubles, the output holds transform_size/2 + 1 complex bins
double *fftin = (double*) fftw_malloc(sizeof(double) * transform_size);
fftw_complex *fftout = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * (transform_size/2 + 1));
fftw_plan p = fftw_plan_dft_r2c_1d(transform_size, fftin, fftout, FFTW_ESTIMATE);
I can get good raw audio (PCM?) into a buffer; if I write it out, it can be read and converted into a WAV with sox:
sox --magic -r 44100 -e signed -b 32 -c 1 rps.raw rps.wav
Any help is appreciated. My knowledge of signal processing is poor and I am not even sure what to do with the FFT to get the frequency of the signal. The code is just FYI; it is part of a much bigger project (for which a simple grep is not an option).
Don't use MP3 for this. There's a good chance your 18 Hz will disappear or at least become distorted; 18 Hz is well below the audible range, and MP3 and other lossy algorithms use a variety of techniques to remove sounds that we're not going to hear.
Assuming PCM, since you only need one frequency band, consider using the Goertzel algorithm. This is more efficient than FFT/DFT for your use case.
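As a rough illustration, here is a minimal Goertzel sketch (function name and signature are mine; it assumes the mono signed 32-bit samples produced by the mpg123 setup above). You would run it on roughly 2-second blocks and compare the returned value against the total block energy or a threshold:
#include <math.h>
#include <stdint.h>

/* Squared magnitude of a single target frequency bin for one block of samples. */
double goertzel_power(const int32_t *samples, int n, double target_hz, double sample_rate)
{
    const double pi = 3.14159265358979323846;
    double omega = 2.0 * pi * target_hz / sample_rate;
    double coeff = 2.0 * cos(omega);
    double s_prev = 0.0, s_prev2 = 0.0;

    for (int i = 0; i < n; i++) {
        double s = (double)samples[i] + coeff * s_prev - s_prev2;
        s_prev2 = s_prev;
        s_prev  = s;
    }
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2;
}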

Converting 24 bit USB audio stream into 32 bit stream

I'm trying to convert a 24 bit usb audio stream into a 32 bit stream so my microcontroller's peripherals can play happily with the stream (it can only handle 16 or 32 bit data like most mcus...).
The following code is what I got from the mcu's company... didn't work as expected and I ended up getting really distorted audio.
// Function takes usb stream and processes the data for our peripherals
// #data - usb stream data
// #byte_count - size of stream
void process_usb_stream(uint8_t *data, uint16_t byte_count) {
// Etc code that gets buffers ready to read the stream...
// Conversion here!
int32_t *buffer;
int sample_count = 0;
for (int i = 0; i < byte_count; i += 3) {
buffer[sample_count++] = data[i] | data[i+1] << 8 | data[i+2] << 16;
}
// Send buffer to peripherals for them to use...
}
Any help with converting the data from a 24 bit stream to 32 bit stream would be super awesome! This area of work is very hard for me :(
data[...] is a uint8_t, so cast it to int32_t before shifting. Relying on the implicit promotion to int is risky: once you shift a byte with its top bit set left by 24 (as you'll need to below), you shift into the sign bit of a plain int, which is undefined behaviour.
Also, you need to shift by another 8 bits to get the full range and put the sign bit in the right place.
Also, you're treating the data as if it were in little-endian format. Make sure it is. I'll assume that's correct, so something like this works:
int32_t *buffer;  // must point to storage for at least byte_count / 3 samples
int sample_count = 0;
for (int i = 0; i+3 <= byte_count; ) {
int32_t v = ((int32_t)data[i++])<<8;
v |= ((int32_t)data[i++])<<16;
v |= ((int32_t)data[i++])<<24;
buffer[sample_count++] = v;
}
Finally, note that this assumes that byte_count is divisible by 3 -- make sure that's true!
This is DSP stuff; you could also post this question on http://dsp.stackexchange.com.
In DSP the process of changing the bit depth is called scaling.
16 bit resolution has 65536 possible values, 24 bit resolution has 16777216, and 32 bit has 4294967296, so the factor between 24 and 32 bit is 256.
According to https://electronics.stackexchange.com/questions/229268/what-is-name-of-process-used-to-change-sample-bit-depth/229271, reduction from 24 bit to 16 bit is called scaling down and is done by dividing each value by 256. This can be done with a right shift by 8 bits, y = x >> 8; when scaling down this way the least significant bits are lost.
Scaling up to 32 bit is more complicated and there are several approaches. The simplest is to push the 24 bit value into a 32 bit register and left-shift it by 8 (i.e. multiply by 256):
data32 = data24 << 8;
This leaves the low 8 bits at zero; fancier schemes try to fill in those missing bits, for example by interpolation.
There may be much better scaling-up algorithms; ask on http://dsp.stackexchange.com
See also http://blog.bjornroche.com/2013/05/the-abcs-of-pcm-uncompressed-digital.html for the scaling up problem...
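A minimal sketch of the plain shift-based scaling described above (function names are mine; the 24-bit input is assumed to be already sign-extended into an int32_t):
#include <stdint.h>

/* 24 -> 32 bit: left shift by 8 (multiply by 256); the low 8 bits stay zero. */
int32_t scale_up_24_to_32(int32_t s24)
{
    return (int32_t)((uint32_t)s24 << 8);   /* the cast avoids shifting into the sign bit */
}

/* 24 -> 16 bit: right shift by 8; the 8 least significant bits are lost. */
int16_t scale_down_24_to_16(int32_t s24)
{
    return (int16_t)(s24 >> 8);
}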

Panning stereo audio samples

Suppose I've got a 16-bit PCM audio file. I wanna pan all of it completely to the left. How would I do this, purely through byte manipulation? Do I just mix the samples of the right channel with those of the left channel?
I'd also like to ask (since it seems related), how would I go about turning stereo samples into mono samples?
I'm doing this with Haxe, but code in something like C (or just an explanation of the method) should be sufficient. Thanks!
You'll first need to convert the raw bytes into int arrays. Your output for the left channel will be the sum divided by 2.
for (int i = 0 ; i < numFrames ; ++i)
{
*pOutputL++ = (*pInputL++ + *pInputR++) >> 1;
*pOutputR++ = 0;
}
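A slightly more self-contained sketch of the same idea (function names are mine), assuming the raw bytes have already been converted into interleaved host-order int16_t frames (L R L R ...):
#include <stddef.h>
#include <stdint.h>

/* Pan hard left: the left channel gets the mix of both channels, the right is silenced. */
void pan_left(int16_t *interleaved, size_t numFrames)
{
    for (size_t i = 0; i < numFrames; i++) {
        int32_t l = interleaved[2 * i];
        int32_t r = interleaved[2 * i + 1];
        interleaved[2 * i]     = (int16_t)((l + r) / 2);
        interleaved[2 * i + 1] = 0;
    }
}

/* Stereo -> mono: the same average, written once per frame. */
void stereo_to_mono(const int16_t *stereo, int16_t *mono, size_t numFrames)
{
    for (size_t i = 0; i < numFrames; i++)
        mono[i] = (int16_t)(((int32_t)stereo[2 * i] + stereo[2 * i + 1]) / 2);
}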

What is the OpenCL equivalent of this CUDA "cudaMallocPitch" code?

My PC has an AMD processor with an ATI 3200 GPU which doesn't support OpenCL; the rest of the code runs by falling back to the CPU itself.
I am converting some code from CUDA to OpenCL but am stuck on one particular part for which there is no exact equivalent in OpenCL. Since I have little experience with OpenCL I can't work this out; please suggest a solution if you think one will work.
The CUDA code is,
size_t pitch = 0;
cudaError error = cudaMallocPitch((void**)&gpu_data, (size_t*)&pitch,
instances->cols * sizeof(float), instances->rows);
for( int i = 0; i < instances->rows; i++ ){
error = cudaMemcpy((void*)(gpu_data + (pitch/sizeof(float))*i),
(void*)(instances->data + (instances->cols*i)),
instances->cols * sizeof(float) ,cudaMemcpyHostToDevice);
If I remove the pitch value from the above, I end up with a problem: nothing is written to the device memory "gpu_data".
Can somebody please convert this code to OpenCL? I have converted it myself, but it's not working and the data is not written to "gpu_data". My converted OpenCL code is
gpu_data = clCreateBuffer(context, CL_MEM_READ_WRITE, ((instances->cols)*(instances->rows))*sizeof(float), NULL, &ret);
for( int i = 0; i < instances->rows; i++ ){
ret = clEnqueueWriteBuffer(command_queue, gpu_data, CL_TRUE, 0, ((instances->cols)*(instances->rows))*sizeof(float),(void*)(instances->data + (instances->cols*i)) , 0, NULL, NULL);
Sometimes it runs fine with this code but gets stuck in the reading part, i.e.
ret = clEnqueueReadBuffer(command_queue, gpu_data, CL_TRUE, 0,sizeof( float ) * instances->cols* 1 , instances->data, 0, NULL, NULL);
over here. It gives an error like
Unhandled exception at 0x10001098 in CL_kmeans.exe: 0xC000001D: Illegal Instruction.
When Break is pressed, it gives:
No symbols are loaded for any call stack frame. The source code cannot be displayed.
while debugging. In the call stack it is displaying:
OCL8CA9.tmp.dll!10001098()
[Frames below may be incorrect and/or missing, no symbols loaded for OCL8CA9.tmp.dll]
amdocl.dll!5c39de16()
I really don't know what it means. Can someone please help me get rid of this problem?
First of all, in the CUDA code you're doing a horribly inefficient thing to copy the data. The CUDA runtime has the function cudaMemcpy2D that does exactly what you are trying to do by looping over different rows.
What cudaMallocPitch does is to compute an optimal pitch (= distance in byte between rows in a 2D array) such that each new row begins at an address that is optimal for coalescing, and then allocates a memory area as large as pitch times the number of rows you specify. You can emulate the same thing in OpenCL by first computing the optimal pitch and then doing the allocation of the correct size.
The optimal pitch is computed by (1) getting the base address alignment preference for your card (the CL_DEVICE_MEM_BASE_ADDR_ALIGN property with clGetDeviceInfo; note that the returned value is in bits, so you have to divide by 8 to get it in bytes); let's call this base. (2) Find the smallest multiple of base that is no less than your natural data pitch (sizeof(type) times the number of columns); this will be your pitch.
You then allocate pitch times number of rows bytes, and pass the pitch information to kernels.
Also, when copying data from the host to the device and conversely, you want to use clEnqueue{Read,Write}BufferRect, which are specifically designed to copy 2D data (they are the counterparts of cudaMemcpy2D).
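As a rough sketch (OpenCL 1.1 or later; it reuses the names from the question, assumes a valid device, context and command_queue, and omits error checking), the pitch computation and a single rectangular copy might look like this:
#include <CL/cl.h>

cl_uint align_bits = 0;
clGetDeviceInfo(device, CL_DEVICE_MEM_BASE_ADDR_ALIGN, sizeof(align_bits), &align_bits, NULL);
size_t align   = align_bits / 8;                          /* reported in bits, convert to bytes */
size_t natural = instances->cols * sizeof(float);         /* natural row pitch in bytes */
size_t pitch   = ((natural + align - 1) / align) * align; /* round up to a multiple of align */

cl_int ret;
cl_mem gpu_data = clCreateBuffer(context, CL_MEM_READ_WRITE,
                                 pitch * instances->rows, NULL, &ret);

/* Copy the whole 2D array in one call: host rows are tightly packed (natural),
   device rows use the padded pitch. */
size_t buffer_origin[3] = {0, 0, 0};
size_t host_origin[3]   = {0, 0, 0};
size_t region[3]        = {natural, (size_t)instances->rows, 1};
ret = clEnqueueWriteBufferRect(command_queue, gpu_data, CL_TRUE,
                               buffer_origin, host_origin, region,
                               pitch, 0,      /* buffer row pitch, slice pitch */
                               natural, 0,    /* host row pitch, slice pitch */
                               instances->data, 0, NULL, NULL);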

Need to do 64 bit multiplication on a machine with 32 bit longs

I'm working on a small embedded system that has 32 bit long ints. For one calculation I need to output the time since 1970 in ms. I can get the time in seconds since 1970 as a 32 bit unsigned long, but how can I represent this as a 64 bit number of ms if my biggest int is only 32 bits? I'm sure Stack Overflow will have a cunning answer! I am using Dynamic C, close to standard C. I have some sample code from another system which has a 64 bit long long data type:
long long T = (long long)(SampleTime * 1000.0 + 0.5);
data.TimeLower = (unsigned int)(T & 0xffffffff);
data.TimeUpper = (unsigned short)((T >> 32) & 0xffff);
Since you are only multiplying by 1000 (seconds -> millis), you can do it with two 16-bit multiplies, one add and a bit of bit fiddling. I have used your putative data type to store the result below:
uint32_t time32 = time();
uint32_t t1 = (time32 & 0xffff) * 1000;
uint32_t t2 = ((time32 >> 16) * 1000) + (t1 >> 16);
data.TimeLower = (uint32_t) ((t2 & 0xffff) << 16) | (t1 & 0xffff);
data.TimeUpper = (uint32_t) (t2 >> 16);
The standard approach, assuming you have a 16x16->32 multiply available, would be to split both numbers into 16-bit high and low parts, compute four partial products, and add the results. If you don't have a 16x16->32 primitive which is faster than a 32x32->32 primitive, though, I'm not sure what the best approach would be. I would think that a 32x32->32 multiply should be more useful than a 16x16->32, but I can't think how one would use it.
Personally, I wish there were a standard primitive to return the top half of a NxN multiply (32x32, certainly; also 16x16 for smaller machines and 64x64 for larger ones).
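For illustration, a minimal sketch of that four-partial-product approach (type and function names are mine; the struct mirrors the TimeUpper/TimeLower split from the question):
#include <stdint.h>

typedef struct { uint32_t lo; uint32_t hi; } u64_parts;

/* 32x32 -> 64 bit unsigned multiply built from 16x16 -> 32 partial products. */
u64_parts mul32x32(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xffff, a_hi = a >> 16;
    uint32_t b_lo = b & 0xffff, b_hi = b >> 16;

    uint32_t p0 = a_lo * b_lo;   /* contributes to bits  0..31 */
    uint32_t p1 = a_lo * b_hi;   /* contributes to bits 16..47 */
    uint32_t p2 = a_hi * b_lo;   /* contributes to bits 16..47 */
    uint32_t p3 = a_hi * b_hi;   /* contributes to bits 32..63 */

    uint32_t mid = (p0 >> 16) + (p1 & 0xffff) + (p2 & 0xffff);

    u64_parts r;
    r.lo = (mid << 16) | (p0 & 0xffff);
    r.hi = p3 + (p1 >> 16) + (p2 >> 16) + (mid >> 16);
    return r;
}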
It might be helpful if you were more specific about what kinds of calculations you need to do. 64-bit multiplication implemented with 32-bit operations is quite slow, and you may have the additional overhead of 64-bit division (to convert back to seconds and milliseconds), which is even slower.
Without knowing more about what exactly you need to do, it seems to me that it would be more efficient to use a struct, containing a 32-bit unsigned int for the number of seconds and a 16-bit int for the number of milliseconds (the "remainder"). (Or use a 32-bit int for the milliseconds if 64-bit alignment is more important than saving a couple of bytes.)
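For example, the split representation could be as simple as this (field names are mine):
#include <stdint.h>

struct timestamp_ms {
    uint32_t seconds;        /* whole seconds since 1970 */
    uint16_t milliseconds;   /* 0..999 remainder */
};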
