Panning stereo audio samples - audio

Suppose I've got a 16-bit PCM audio file. I wanna pan all of it completely to the left. How would I do this, purely through byte manipulation? Do I just mix the samples of the right channel with those of the left channel?
I'd also like to ask (since it seems related), how would I go about turning stereo samples into mono samples?
I'm doing this with Haxe, but code in something like C (or just an explanation of the method) should be sufficient. Thanks!

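You'll first need to convert the raw bytes into arrays of signed integer samples. To pan everything to the left, the output for the left channel is the sum of both channels divided by 2, and the right channel is silence. (The same average, written to a single output channel, is also your mono mix.)
for (int i = 0 ; i < numFrames ; ++i)
{
// two 16-bit samples summed as int cannot overflow before the divide
*pOutputL++ = (*pInputL++ + *pInputR++) / 2;
*pOutputR++ = 0;
}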
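Since the question asks for pure byte manipulation, here is a minimal sketch of the same idea working directly on the raw bytes. It assumes interleaved, signed 16-bit little-endian stereo (the usual WAV layout); the function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one signed 16-bit little-endian sample from two raw bytes. */
static int16_t read_s16le(const uint8_t *p)
{
    uint16_t u = (uint16_t)((uint16_t)p[0] | ((uint16_t)p[1] << 8));
    return (int16_t)u; /* relies on two's complement, as on all common platforms */
}

/* Write one signed 16-bit sample as two little-endian bytes. */
static void write_s16le(uint8_t *p, int16_t v)
{
    uint16_t u = (uint16_t)v;
    p[0] = (uint8_t)(u & 0xFFu);
    p[1] = (uint8_t)(u >> 8);
}

/* Downmix interleaved stereo (L R L R ...) to mono.
   `out` must hold at least num_frames * 2 bytes. Returns bytes written. */
size_t stereo_to_mono_s16le(const uint8_t *in, size_t num_frames, uint8_t *out)
{
    for (size_t i = 0; i < num_frames; ++i) {
        int left  = read_s16le(in + 4 * i);
        int right = read_s16le(in + 4 * i + 2);
        write_s16le(out + 2 * i, (int16_t)((left + right) / 2));
    }
    return num_frames * 2;
}

/* Pan hard left in place: left gets the average, right becomes silence. */
void pan_hard_left_s16le(uint8_t *buf, size_t num_frames)
{
    for (size_t i = 0; i < num_frames; ++i) {
        int left  = read_s16le(buf + 4 * i);
        int right = read_s16le(buf + 4 * i + 2);
        write_s16le(buf + 4 * i, (int16_t)((left + right) / 2));
        write_s16le(buf + 4 * i + 2, 0);
    }
}
```

A frame here means one left sample plus one right sample, so a stereo buffer of N bytes holds N / 4 frames.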

Related

How to detect a basic audio signal into a much bigger one (mpg123 output signal)

I am new to signal processing and I don't really understand the basics (and more). Sorry in advance for any mistakes in my understanding so far.
I am writing C code to detect a basic signal (an 18Hz simple sinusoid of 2 sec duration; generating it with Audacity is pretty simple) inside a much bigger mp3 file. I read the mp3 file and copy it until I match the sound signal.
The signal to match is { 1st channel: 18Hz sin. signal, 2nd channel: nothing/doesn't matter }.
To match the sound, I am calculating the frequency of the mp3 until I find a good percentage of 18Hz freq. during ~ 2 sec. As this frequency is not very common, I don't have to match it very precisely.
I used mpg123 to convert my file, I fill the buffers with what it returns. I initialised it to convert the mp3 to Mono RAW audio:
init:
int ret;
const long *rates;
size_t rate_count, i;
mpg123_rates(&rates, &rate_count);
mpg123_handle *m = mpg123_new(NULL, &ret);
if(m == NULL)
{
//err
} else {
// only touch the handle once we know it is valid
mpg123_format_none(m);
for(i=0; i<rate_count; ++i)
mpg123_format(m, rates[i], MPG123_MONO, MPG123_ENC_SIGNED_32);
mpg123_open_feed(m);
}
(...)
unsigned char out[8*MAX_MP3_BUF_SIZE];
ret = mpg123_decode(m, buf->data, buf->size, out, 8*MAX_MP3_BUF_SIZE, &size);
(...)
But I have no idea how to feed the resulting buffer into the FFT to get the frequency.
//FREQ Calculation with libfftw3
int transform_size = MAX_MP3_BUF_SIZE * 2;
// an r2c transform takes real (double) input and produces transform_size/2 + 1 complex bins
double *fftin = (double*) fftw_malloc(sizeof(double) * transform_size);
fftw_complex *fftout = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * (transform_size / 2 + 1));
fftw_plan p = fftw_plan_dft_r2c_1d(transform_size, fftin, fftout, FFTW_ESTIMATE);
I can get good RAW audio (PCM?) into a buffer (if I write it out, it can be read and converted into wave with sox):
sox --magic -r 44100 -e signed -b 32 -c 1 rps.raw rps.wav
Any help is appreciated. My knowledge of signal processing is poor, and I am not even sure what to do with the FFT to get the frequency of the signal. The code is just FYI; it is part of a much bigger project (for which a simple grep is not an option).
Don't use MP3 for this. There's a good chance your 18 Hz will disappear or at least become distorted. 18 Hz is well below the audible range. MP3 and other lossy algorithms use a variety of techniques to remove sounds that we're not going to hear.
Assuming PCM, since you only need one frequency band, consider using the Goertzel algorithm. This is more efficient than FFT/DFT for your use case.

Converting 24 bit USB audio stream into 32 bit stream

I'm trying to convert a 24 bit usb audio stream into a 32 bit stream so my microcontroller's peripherals can play happily with the stream (it can only handle 16 or 32 bit data like most mcus...).
The following code is what I got from the mcu's company... didn't work as expected and I ended up getting really distorted audio.
// Function takes usb stream and processes the data for our peripherals
// #data - usb stream data
// #byte_count - size of stream
void process_usb_stream(uint8_t *data, uint16_t byte_count) {
// Etc code that gets buffers ready to read the stream...
// Conversion here!
int32_t *buffer;
int sample_count = 0;
for (int i = 0; i < byte_count; i += 3) {
buffer[sample_count++] = data[i] | data[i+1] << 8 | data[i+2] << 16;
}
// Send buffer to peripherals for them to use...
}
Any help with converting the data from a 24 bit stream to 32 bit stream would be super awesome! This area of work is very hard for me :(
data[...] is a uint8_t, so it is promoted to int before shifting. On an MCU where int is only 16 bits, data[...]<<16 is undefined and data[...]<<8 can overflow, neither of which is what you want. Cast to a 32-bit type before shifting.
Also, you need to shift by another 8 bits to get the full range and put the sign bit in the right place.
Also, you're treating the data as if it were in little-endian format. Make sure it is. I'll assume that's correct, so something like this works:
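int32_t *buffer; // must point at storage set up elsewhere
int sample_count = 0;
for (int i = 0; i+3 <= byte_count; i += 3) {
// assemble in unsigned arithmetic so the shift into the sign bit is well defined
uint32_t v = ((uint32_t)data[i])<<8;
v |= ((uint32_t)data[i+1])<<16;
v |= ((uint32_t)data[i+2])<<24;
buffer[sample_count++] = (int32_t)v;
}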
Finally, note that this assumes that byte_count is divisible by 3 -- make sure that's true!
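As a self-contained version of the same idea, here is a sketch that takes the output buffer as a parameter. It assumes packed little-endian 24-bit samples, as above; the function name is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert packed little-endian 24-bit samples to left-justified 32-bit samples.
   `out` must have room for byte_count / 3 samples.
   Trailing bytes that do not form a whole sample are ignored.
   Returns the number of samples written. */
size_t s24le_to_s32(const uint8_t *data, size_t byte_count, int32_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i + 3 <= byte_count; i += 3) {
        uint32_t v = ((uint32_t)data[i]     << 8)
                   | ((uint32_t)data[i + 1] << 16)
                   | ((uint32_t)data[i + 2] << 24);
        out[n++] = (int32_t)v; /* top byte carries the sign bit */
    }
    return n;
}
```

Because the 24-bit value ends up in the top three bytes, full-scale 24-bit input becomes (almost) full-scale 32-bit output, and the low byte of every result is zero.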
This is DSP territory, so consider also posting this question on http://dsp.stackexchange.com
In DSP the process of changing the bit depth is called scaling.
16 bit resolution has 65536 possible values
24 bit resolution has 16777216 possible values
32 bit resolution has 4294967296 possible values
so the factor between each step is 256.
According to https://electronics.stackexchange.com/questions/229268/what-is-name-of-process-used-to-change-sample-bit-depth/229271
reduction from 24 bit to 16 bit is called scaling down and is done by dividing each value by 256.
This can be done by shifting each value right by 8 bits:
y = x >> 8. When scaling down this way the least significant byte is lost.
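A small sketch of scaling down, assuming the 24-bit sample sits in the low 24 bits of a 32-bit word. The helper names are made up for this example. It uses floor division instead of `>>` so the result for negative samples does not depend on how the compiler shifts signed values.

```c
#include <stdint.h>

/* Sign-extend a 24-bit sample stored in the low 24 bits of a 32-bit word. */
int32_t sign_extend_24(uint32_t raw)
{
    raw &= 0xFFFFFFu;
    return (raw & 0x800000u) ? (int32_t)raw - 0x1000000 : (int32_t)raw;
}

/* Scale a sign-extended 24-bit sample down to 16 bits by dropping the low 8 bits. */
int16_t s24_to_s16(int32_t x)
{
    int32_t q = x / 256;      /* C division truncates toward zero ... */
    if (x % 256 < 0) {
        q -= 1;               /* ... so step down to get floor, like an arithmetic x >> 8 */
    }
    return (int16_t)q;
}
```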
Scaling up to 32 bit is the reverse operation: multiply each value by 2⁸ = 256.
Put the 24 bit value in a 32 bit register and left-shift it by 8, so every bit moves up 8 positions:
bit 31 = old bit 23
bit 30 = old bit 22
...
bit 8 = old bit 0
Bits 7 to 0 are left at zero by the shift. If you want to fill them rather than leave them empty, that is where something like (linear) interpolation would come in.
Maybe there are much better scaling up algorithms; ask on http://dsp.stackexchange.com
See also http://blog.bjornroche.com/2013/05/the-abcs-of-pcm-uncompressed-digital.html for the scaling up problem...

AAC stream resampled incorrectly

I have a very particular problem that I wish I could find the answer to.
I'm trying to read an AAC stream from a URL (online streaming radio e.g. live.noroc.tv:8000/radionoroc.aacp) with the NAudio library and get IEEE 32 bit floating point samples.
Besides that, I would like to resample the stream to a particular sample rate and channel count (rate 5512, mono).
Below is the code which accomplishes that:
int tenSecondsOfDownloadedAudio = 5512 * 10;
float[] buffer = new float[tenSecondsOfDownloadedAudio];
using (var reader = new MediaFoundationReader(pathToUrl))
{
var ieeeFloatWaveFormat = WaveFormat.CreateIeeeFloatWaveFormat(5512, 1); // mono
using (var resampler = new MediaFoundationResampler(reader, ieeeFloatWaveFormat))
{
var waveToSampleProvider = new WaveToSampleProvider(resampler);
int readSamples = 0;
float[] tempBuffer = new float[5512]; // 1 second buffer
while(readSamples <= tenSecondsOfDownloadedAudio)
{
int read = waveToSampleProvider.Read(tempBuffer, 0, tempBuffer.Length);
if(read == 0)
{
Thread.Sleep(500); // allow streaming buffer to get loaded
continue;
}
Array.Copy(tempBuffer, 0, buffer, readSamples, tempBuffer.Length);
readSamples += read;
}
}
}
These particular samples are then written to a Wave audio file using the following simple method:
using (var writer = new WaveFileWriter("path-to-audio-file.wav", WaveFormat.CreateIeeeFloatWaveFormat(5512, 1)))
{
writer.WriteSamples(samples, 0, samples.Length);
}
What I've encountered is that NAudio does not read 10 seconds of audio (as requested) but only 5, even though the buffer array gets fully loaded with samples (which at this rate and channel count should amount to 10 seconds of audio).
Thus the final audio file plays the stream twice as slowly as it should (a 5 second stream is played over 10 seconds).
Is this somehow related to different bit depths (should I record at 64 bits per sample as opposed to 32)?
I do my testing on Windows Server 2008 R2 x64, with MFT codecs installed.
Would really appreciate any suggestions.
The problem seems to be with MediaFoundationReader failing to handle HE-AACv2 in an ADTS container, which is a standard online radio stream format and most likely the one you are dealing with.
Adobe products have the same problem and mistreat this format in exactly the same way, stretching the first half of the audio to the whole duration: Corrupted AAC files recorded from online stream
Supposedly, it has something to do with an HE-AACv2 stereo stream actually being a mono stream with an additional info channel for Parametric Stereo.

LibAV - what approach to take for realtime audio and video capture?

I'm using libav to encode raw RGB24 frames to h264 and muxing it to flv. This all works
fine and I've streamed for more than 48 hours w/o any problems! My next step
is to add audio to the stream. I'll be capturing live audio and I want to encode it
in real time using speex, mp3 or Nellymoser.
Background info
I'm new to digital audio and therefore I might be doing things wrong. But basically my application gets a "float" buffer with interleaved audio. This "audioIn" function gets called by the application framework I'm using. The buffer contains 256 samples per channel,
and I have 2 channels. Because I might be mixing terminology, this is how I use the
data:
// input = array with audio samples
// bufferSize = 256
// nChannels = 2
void audioIn(float * input, int bufferSize, int nChannels) {
// convert from float to S16
short* buf = new signed short[bufferSize * 2];
for(int i = 0; i < bufferSize; ++i) { // loop over all samples
int dx = i * 2;
buf[dx + 0] = (float)input[dx + 0] * numeric_limits<short>::max(); // convert frame of the first channel
buf[dx + 1] = (float)input[dx + 1] * numeric_limits<short>::max(); // convert frame of the second channel
}
// add this to the libav wrapper.
av.addAudioFrame((unsigned char*)buf, bufferSize, nChannels);
delete[] buf;
}
Now that I have a buffer, where each sample is 16 bits, I pass this short* buffer, to my
wrapper av.addAudioFrame() function. In this function I create a buffer, before I encode
the audio. From what I read, the AVCodecContext of the audio encoder sets the frame_size. This frame_size must match the number of samples in the buffer when calling avcodec_encode_audio2(). I think this because of what is documented here.
Then, especially the line:
If it is not set, frame->nb_samples must be equal to avctx->frame_size for all frames except the last.
(Please correct me here if I'm wrong about this.)
After encoding I call av_interleaved_write_frame() to actually write the frame.
When I use mp3 as codec my application runs for about 1-2 minutes and then my server, which is receiving the video/audio stream (flv, tcp), disconnects with a message "Frame too large: 14485504". This message is generated because the rtmp-server is getting a frame which is way too big. And this is probably due to the fact I'm not interleaving correctly with libav.
Questions:
There are quite a few bits I'm not sure of, even after going through the source code of libav, so I hope someone has a working example of encoding audio which comes from a buffer "outside" libav (i.e. your own application). I.e. how do you create a buffer which is large enough for the encoder? How do you make the "realtime" streaming work when you need to wait on this buffer to fill up?
As I wrote above I need to keep track of a buffer before I can encode. Does someone else have some code which does this? I'm using AVAudioFifo now. The functions which encode the audio and fill/read the buffer are here too: https://gist.github.com/62f717bbaa69ac7196be
I compiled with --enable-debug=3 and disabled optimizations, but I'm not seeing any
debug information. How can I make libav more verbose?
Thanks!
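To illustrate the buffering problem described above (input arrives in chunks of 256 samples per channel, while the encoder wants exactly frame_size samples per call), here is a minimal plain-C sketch of an accumulate-then-drain FIFO. It is only a stand-in for what AVAudioFifo does inside libav and uses no libav calls; the type, function names and capacity are all made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FIFO_CAPACITY 8192 /* in int16_t values; hypothetical size */

typedef struct {
    int16_t data[FIFO_CAPACITY];
    size_t  count; /* number of values currently stored */
} SampleFifo;

/* Append n interleaved values. Returns 0 on success, -1 if they do not fit. */
int fifo_push(SampleFifo *f, const int16_t *src, size_t n)
{
    if (n > FIFO_CAPACITY - f->count) {
        return -1;
    }
    memcpy(f->data + f->count, src, n * sizeof *src);
    f->count += n;
    return 0;
}

/* Remove exactly n values into dst if that many are buffered.
   Returns 1 if a full frame was produced, 0 if more input is needed. */
int fifo_pop_frame(SampleFifo *f, int16_t *dst, size_t n)
{
    if (f->count < n) {
        return 0;
    }
    memcpy(dst, f->data, n * sizeof *dst);
    /* Shift the remainder to the front; simple rather than fast. */
    memmove(f->data, f->data + n, (f->count - n) * sizeof *dst);
    f->count -= n;
    return 1;
}
```

In the audio callback you would push each incoming chunk and then pop in a loop for as long as a full frame is available, encoding each popped frame. For stereo, n is frame_size * 2 values. Nothing blocks: if there is not enough data yet, the callback simply returns and the next chunk tops the FIFO up.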

How do I attenuate a WAV file by a given decibel value?

If I wanted to reduce a WAV file's amplitude by 25%, I would write something like this:
for (int i = 0; i < data.Length; i++)
{
data[i] *= 0.75;
}
A lot of the articles I read on audio techniques, however, discuss amplitude in terms of decibels. I understand the logarithmic nature of decibel units in principle, but not so much in terms of actual code.
My question is: if I wanted to attenuate the volume of a WAV file by, say, 20 decibels, how would I do this in code like my above example?
Update: formula (based on Nils Pipenbrinck's answer) for attenuating by a given number of decibels (entered as a positive number e.g. 10, 20 etc.):
public void AttenuateAudio(float[] data, int decibels)
{
float gain = (float)Math.Pow(10, (double)-decibels / 20.0);
for (int i = 0; i < data.Length; i++)
{
data[i] *= gain;
}
}
So, if I want to attenuate by 20 decibels, the gain factor is .1.
I think you want to convert from decibel to gain.
The equations for audio are:
decibel to gain (use a negative dB value to attenuate, e.g. -20):
gain = 10 ^ (attenuation in db / 20)
or in C:
gain = powf(10, attenuation / 20.0f);
The equation to convert from gain to db is:
attenuation_in_db = 20 * log10 (gain)
If you just want to adjust some audio, I've had good results with the normalize package from nongnu.org. If you want to study how it's done, the source code is freely available. I've also used wavnorm, whose home page seems to be out at the moment.
One thing to consider: .WAV files have MANY different formats. The code above only works for WAVE_FORMAT_IEEE_FLOAT. If you're dealing with PCM files, then your samples are going to be 8, 16, 24 or 32 bit integers (8 bit PCM uses unsigned integers from 0..255; 24 bit PCM can be packed or unpacked, where packed == 3 byte values packed next to each other and unpacked == 3 byte values in a 4 byte package).
And then there's the issue of alternate encodings - For instance in Win7, all the windows sounds are actually MP3 files in a WAV container.
It's unfortunately not as simple as it sounds :(.
Oops I misunderstood the question… You can see my python implementations of converting from dB to a float (which you can use as a multiplier on the amplitude like you show above) and vice-versa
https://github.com/jiaaro/pydub/blob/master/pydub/utils.py
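In a nutshell, for an amplitude multiplier it's:
10 ^ (db_gain / 20)
(dividing by 10 instead of 20 gives the ratio for power, not amplitude) so to reduce the volume by 6 dB you would multiply the amplitude of each sample by:
10 ^ (-6 / 20) == 10 ^ (-0.3) == 0.5012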
