Type conversion from BYTE* to int* in VS C++ - visual-c++

I have 2 sample programs. One converts a WAV into an MP3. Another captures my speakers and writes a WAV file. I try to combine the 2: input speakers, output MP3. My result is a MP3 file of correct length (so I passed the correct number of integers) but with only noise (so I passed the wrong data)
The working conversion source Wav to Mp3:
short int pcm_buffer[PCM_SIZE*2];
unsigned char mp3_buffer[MP3_SIZE];
read = fread(pcm_buffer, 2 * sizeof(short int), PCM_SIZE, pcm);
write=lame_encode_buffer_interleaved(gfp, pcm_buffer, read, mp3_buffer, MP3_SIZE);
size_t lBytesWritten = fwrite(mp3_buffer, write, sizeof(char), mp3);
The working encoding in Wav is:
BYTE *pData;
hr = pAudioCaptureClient->GetBuffer( &pData, &nNumFramesToRead, &dwFlags, NULL, NULL);
LONG lBytesToWrite = nNumFramesToRead * nBlockAlign;
mmioWrite(hFile, reinterpret_cast<PCHAR>(pData), lBytesToWrite);
My version is:
BYTE *pData;
unsigned char mp3_buffer[MP3_SIZE];
hr = pAudioCaptureClient->GetBuffer( &pData, &nNumFramesToRead, &dwFlags, NULL, NULL);
LONG lBytesToMP3=lame_encode_buffer_interleaved(gfp, reinterpret_cast<short int*>(pData), nNumFramesToRead, mp3_buffer, MP3_SIZE);
fwrite(mp3_buffer, lBytesToMP3, sizeof(char), mp3);
I don't see the error. I coded C++ until some 10 years ago in Borland C++, and see now that Microsoft protects the world against incompetent programmers. But (I'm really sorry) I don't know how they want me to do it right!

Your problem is here:
LONG lBytesToMP3=lame_encode_buffer_interleaved(gfp, reinterpret_cast<short int*>(pData), nNumFramesToRead, mp3_buffer, MP3_SIZE);
You are passing 8-bit sound data to lame_encode_buffer_interleaved when it is designed to take 16-bit sound data. The fact that you are casting your data via reinterpret_cast<short int*>(pData) should make it clear.
You need to actually pass in 16-bit sound samples.
If you can capture 16-bit samples from the input, that's best. If not, converting from 8-bit samples to 16-bit samples is easy. You just scale the values up! Make sure you handle signedness correctly.

Related

How to detect a basic audio signal into a much bigger one (mpg123 output signal)

I am new to signal processing and I don't really understand the basics (and more). Sorry in advance for any mistake into my understanding so far.
I am writing C code to detect a basic signal (18Hz simple sinusoid 2 sec duration, generating it using Audacity is pretty simple) into a much bigger mp3 file. I read the mp3 file and copy it until I match the sound signal.
The signal to match is { 1st channel: 18Hz sin. signal , 2nd channel: nothing/doesn't matter).
To match the sound, I am calculating the frequency of the mp3 until I find a good percentage of 18Hz freq. during ~ 2 sec. As this frequency is not very common, I don't have to match it very precisely.
I used mpg123 to convert my file, I fill the buffers with what it returns. I initialised it to convert the mp3 to Mono RAW audio:
init:
int ret;
const long *rates;
size_t rate_count, i;
mpg123_rates(&rates, &rate_count);
mpg123_handle *m = mpg123_new(NULL, &ret);
mpg123_format_none(m);
for(i=0; i<rate_count; ++i)
mpg123_format(m, rates[i], MPG123_MONO, MPG123_ENC_SIGNED_32);
if(m == NULL)
{
//err
} else {
mpg123_open_feed(m);
}
(...)
unsigned char out[8*MAX_MP3_BUF_SIZE];
ret = mpg123_decode(m, buf->data, buf->size, out, 8*MAX_MP3_BUF_SIZE, &size);
`(...)
unsigned char out[8*MAX_MP3_BUF_SIZE];
ret = mpg123_decode(m, buf->data, buf->size, out, 8*MAX_MP3_BUF_SIZE, &size);
(...) `
But I have to idea how to get the resulting buffer to calculate the FFT to get the frequency.
//FREQ Calculation with libfftw3
int transform_size = MAX_MP3_BUF_SIZE * 2;
fftw_complex *fftout = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * transform_size);
fftw_complex *fftin = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * transform_size);
fftw_plan p = fftw_plan_dft_r2c_1d(transform_size, fftin, fftout, FFTW_ESTIMATE);
I can get a good RAW Audio (PCM ?) into a buffer (if I write it, it can be read and converted into wave with sox:
sox --magic -r 44100 -e signed -b 32 -c 1 rps.raw rps.wav
Any help is appreciated. My knowledge of signal processing is poor, I am not even sure of what to do with the FFT to get the frequency of the signal. Code is just fyi, it is contained into a much bigger project (for which a simple grep is not an option)
Don't use MP3 for this. There's a good chance your 18 Hz will disappear or at least become distorted. 18 Hz is will below audible. MP3 and other lossy algorithms use a variety of techniques to remove sounds that we're not going to hear.
Assuming PCM, since you only need one frequency band, consider using the Goertzel algorithm. This is more efficient than FFT/DFT for your use case.

android AudioTrack playback short array (16bit)

I have an application that playback audio. It takes encoded audio data over RTP and decode it to 16bit array. The decoded 16bit array is converted to 8 bit array (byte array) as this is required for some other functionality.
Even though audio playback is working it is breaking continuously and very hard to recognise audio output. If I listen carefully I can tell it is playing the correct audio.
I suspect this is due to the fact I convert 16 bit data stream into a byte array and use the write(byte[], int, int, AudioTrack.WRITE_NON_BLOCKING) of AudioTrack class for audio playback.
Therefore I converted the byte array back to a short array and used write(short[], int, int, AudioTrack.WRITE_NON_BLOCKING) method to see if it could resolve the problem.
However now there is no audio sound at all. In the debug output I can see the short array has data.
What could be the reason?
Here is the AUdioTrak initialization
sampleRate =AudioTrack.getNativeOutputSampleRate(AudioManager.STREAM_MUSIC);
minimumBufferSize = AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
audioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
AudioFormat.CHANNEL_OUT_STEREO,
AudioFormat.ENCODING_PCM_16BIT,
minimumBufferSize,
AudioTrack.MODE_STREAM);
Here is the code converts short array to byte array
for (int i=0;i<internalBuffer.length;i++){
bufferIndex = i*2;
buffer[bufferIndex] = shortToByte(internalBuffer[i])[0];
buffer[bufferIndex+1] = shortToByte(internalBuffer[i])[1];
}
Here is the method that converts byte array to short array.
public short[] getShortAudioBuffer(byte[] b){
short audioBuffer[] = null;
int index = 0;
int audioSize = 0;
ByteBuffer byteBuffer = ByteBuffer.allocate(2);
if ((b ==null) && (b.length<2)){
return null;
}else{
audioSize = (b.length - (b.length%2));
audioBuffer = new short[audioSize/2];
}
if ((audioSize/2) < 2)
return null;
byteBuffer.order(ByteOrder.LITTLE_ENDIAN);
for(int i=0;i<audioSize/2;i++){
index = i*2;
byteBuffer.put(b[index]);
byteBuffer.put(b[index+1]);
audioBuffer[i] = byteBuffer.getShort(0);
byteBuffer.clear();
System.out.print(Integer.toHexString(audioBuffer[i]) + " ");
}
System.out.println();
return audioBuffer;
}
Audio is decoded using opus library and the configuration is as follows;
opus_decoder_ctl(dec,OPUS_SET_APPLICATION(OPUS_APPLICATION_AUDIO));
opus_decoder_ctl(dec,OPUS_SET_SIGNAL(OPUS_SIGNAL_MUSIC));
opus_decoder_ctl(dec,OPUS_SET_FORCE_CHANNELS(OPUS_AUTO));
opus_decoder_ctl(dec,OPUS_SET_MAX_BANDWIDTH(OPUS_BANDWIDTH_FULLBAND));
opus_decoder_ctl(dec,OPUS_SET_PACKET_LOSS_PERC(0));
opus_decoder_ctl(dec,OPUS_SET_COMPLEXITY(10)); // highest complexity
opus_decoder_ctl(dec,OPUS_SET_LSB_DEPTH(16)); // 16bit = two byte samples
opus_decoder_ctl(dec,OPUS_SET_DTX(0)); // default - not using discontinuous transmission
opus_decoder_ctl(dec,OPUS_SET_VBR(1)); // use variable bit rate
opus_decoder_ctl(dec,OPUS_SET_VBR_CONSTRAINT(0)); // unconstrained
opus_decoder_ctl(dec,OPUS_SET_INBAND_FEC(0)); // no forward error correction
Let's assume you have a short[] array which contains the 16-bit one channel data to be played.
Then each sample is a value between -32768 and 32767 which represents the signal amplitude at the exact moment. And 0 value represents a middle point (no signal). This array can be passed to the audio track with ENCODING_PCM_16BIT format encoding.
But things are going weird when playing ENCODING_PCM_8BIT is used (See AudioFormat)
In this case each sample encoded by one byte. But each byte is unsigned. That means, it's value is between 0 and 255, while 128 represents the middle point.
Java has no unsigned byte format. Byte format is signed. I.e. values -128...-1 will represent actual values of 128...255. So you have to be careful when converting to the byte array, otherwise it will be a noise with barely recognizable source sound.
short[] input16 = ... // the source 16-bit audio data;
byte[] output8 = new byte[input16.length];
for (int i = 0 ; i < input16.length ; i++) {
// To convert 16 bit signed sample to 8 bit unsigned
// We add 128 (for rounding), then shift it right 8 positions
// Then add 128 to be in range 0..255
int sample = ((input16[i] + 128) >> 8) + 128;
if (sample > 255) sample = 255; // strip out overload
output8[i] = (byte)(sample); // cast to signed byte type
}
To perform backward conversion all should be the same: each single sample to be converted to exactly one sample of the output signal
byte[] input8 = // source 8-bit unsigned audio data;
short[] output16 = new short[input8.length];
for (int i = 0 ; i < input8.length ; i++) {
// to convert signed byte back to unsigned value just use bitwise AND with 0xFF
// then we need subtract 128 offset
// Then, just scale up the value by 256 to fit 16-bit range
output16[i] = (short)(((input8[i] & 0xFF) - 128) * 256);
}
The issue of not being able to convert data from byte array to short array was resolved when used bitwise operators instead of using ByteArray. It could be due not setting the correct parameters in ByteArray or it is not suitable for such conversion.
Nevertheless implementing conversion using bitwise operators resolved the problem. Since the original question has been resolved by this approach, please consider this as the final answer.
I will raise a separate topic for playback issue.
Thank you for all your support.

LibAV - what approach to take for realtime audio and video capture?

I'm using libav to encode raw RGB24 frames to h264 and muxing it to flv. This works
all fine and I've streamed for more then 48 hours w/o any problems! My next step
is to add audio to the stream. I'll be capturing live audio and I want to encode it
in real time using speex, mp3 or nelly moser.
Background info
I'm new to digital audio and therefore I might be doing things wrong. But basically my application gets a "float" buffer with interleaved audio. This "audioIn" function gets called by the application framework I'm using. The buffer contains 256 samples per channel,
and I have 2 channels. Because I might be mixing terminology, this is how I use the
data:
// input = array with audio samples
// bufferSize = 256
// nChannels = 2
void audioIn(float * input, int bufferSize, int nChannels) {
// convert from float to S16
short* buf = new signed short[bufferSize * 2];
for(int i = 0; i < bufferSize; ++i) { // loop over all samples
int dx = i * 2;
buf[dx + 0] = (float)input[dx + 0] * numeric_limits<short>::max(); // convert frame of the first channel
buf[dx + 1] = (float)input[dx + 1] * numeric_limits<short>::max(); // convert frame of the second channel
}
// add this to the libav wrapper.
av.addAudioFrame((unsigned char*)buf, bufferSize, nChannels);
delete[] buf;
}
Now that I have a buffer, where each sample is 16 bits, I pass this short* buffer, to my
wrapper av.addAudioFrame() function. In this function I create a buffer, before I encode
the audio. From what I read, the AVCodecContext of the audio encoder sets the frame_size. This frame_size must match the number of samples in the buffer when calling avcodec_encode_audio2(). Why I think this, is because of what is documented here.
Then, especially the line:
If it is not set, frame->nb_samples must be equal to avctx->frame_size for all frames except the last.*(Please correct me here if I'm wrong about this).
After encoding I call av_interleaved_write_frame() to actually write the frame.
When I use mp3 as codec my application runs for about 1-2 minutes and then my server, which is receiving the video/audio stream (flv, tcp), disconnects with a message "Frame too large: 14485504". This message is generated because the rtmp-server is getting a frame which is way to big. And this is probably due to the fact I'm not interleaving correctly with libav.
Questions:
There quite some bits I'm not sure of, even when going through the source code of libav and therefore I hope if someone has an working example of encoding audio which comes from a buffer which which comes from "outside" libav (i.e. your own application). i.e. how do you create a buffer which is large enough for the encoder? How do you make the "realtime" streaming work when you need to wait on this buffer to fill up?
As I wrote above I need to keep track of a buffer before I can encode. Does someone else has some code which does this? I'm using AVAudioFifo now. The functions which encodes the audio and fills/read the buffer is here too: https://gist.github.com/62f717bbaa69ac7196be
I compiled with --enable-debug=3 and disable optimizations, but I'm not seeing any
debug information. How can I make libav more verbose?
Thanks!

C & Fmod Ex - playing a PCM array/buffer in Real Time

I use an array to process radio signal and to obtain raw PCM audio. I am desperately trying to play this audio using Fmod Ex.
Basically, would it be possible to create a stream corresponding to my circular buffer, that I could access in a thread-safe way ? Any basic information about what methods to use would be greatly appreciated.
If no, could any other Windows 7 API do the trick and how ? (ASIO, Wasapi...)
Thx °-°
I'm assuming your data is continuous (always updating) so you would want to stream it into FMOD, to do this you could override the file callbacks for a particular sound. There is a good example of doing this with the FMOD API usercreatedsound example. If you just want to play a static buffer simply fill out a createsoundexinfo struct describing the data, use the FMOD_OPENMEMORY flag and pass a pointer to the data through createSound as name_or_data. Below is an example of the more complex stream case:
When creating the sound you would use FMOD_CREATESOUNDEXINFO to specify the details of your data, then pass that to createStream. Note this is basically how you would do the static sample case except you are using FMOD_OPENUSER, setting decode size and specifying the callbacks to read the data instead of FMOD_OPENMEMORY and passing the data via the name_or_data param:
FMOD_CREATESOUNDEXINFO exinfo;
memset(&createsoundexinfo, 0, sizeof(FMOD_CREATESOUNDEXINFO));
exinfo.cbsize = sizeof(FMOD_CREATESOUNDEXINFO); /* required. */
exinfo.decodebuffersize = 44100; /* Chunk size of stream update in samples. This will be the amount of data passed to the user callback. */
exinfo.length = 44100 * channels * sizeof(signed short) * 5; /* Length of PCM data in bytes of whole song (for Sound::getLength) */
exinfo.numchannels = channels; /* Number of channels in the sound. */
exinfo.defaultfrequency = 44100; /* Default playback rate of sound. */
exinfo.format = FMOD_SOUND_FORMAT_PCM16; /* Data format of sound. */
exinfo.pcmreadcallback = pcmreadcallback; /* User callback for reading. */
exinfo.pcmsetposcallback = pcmsetposcallback; /* User callback for seeking. */
result = system->createStream(NULL, FMOD_OPENUSER, &exinfo, &sound);
ERRCHECK(result);
Here you are saying that you will provide PCM16 44khz data, customize as required, and give two function callbacks for read and setposition which FMOD will call asking you to either seek your buffer or read something from it:
FMOD_RESULT F_CALLBACK pcmreadcallback(FMOD_SOUND *sound, void *data, unsigned int datalen)
{
// Read from your buffer here...
return FMOD_OK;
}
FMOD_RESULT F_CALLBACK pcmsetposcallback(FMOD_SOUND *sound, int subsound, unsigned int position, FMOD_TIMEUNIT postype)
{
// Seek to a location in your data, may not be required for what you want to do
return FMOD_OK;
}
That should be everything you need to get FMOD playing back your buffer.

How do I attenuate a WAV file by a given decibel value?

If I wanted to reduce a WAV file's amplitude by 25%, I would write something like this:
for (int i = 0; i < data.Length; i++)
{
data[i] *= 0.75;
}
A lot of the articles I read on audio techniques, however, discuss amplitude in terms of decibels. I understand the logarithmic nature of decibel units in principle, but not so much in terms of actual code.
My question is: if I wanted to attenuate the volume of a WAV file by, say, 20 decibels, how would I do this in code like my above example?
Update: formula (based on Nils Pipenbrinck's answer) for attenuating by a given number of decibels (entered as a positive number e.g. 10, 20 etc.):
public void AttenuateAudio(float[] data, int decibels)
{
float gain = (float)Math.Pow(10, (double)-decibels / 20.0);
for (int i = 0; i < data.Length; i++)
{
data[i] *= gain;
}
}
So, if I want to attenuate by 20 decibels, the gain factor is .1.
I think you want to convert from decibel to gain.
The equations for audio are:
decibel to gain:
gain = 10 ^ (attenuation in db / 20)
or in C:
gain = powf(10, attenuation / 20.0f);
The equations to convert from gain to db are:
attenuation_in_db = 20 * log10 (gain)
If you just want to adust some audio, I've had good results with the normalize package from nongnu.org. If you want to study how it's done, the source code is freely available. I've also used wavnorm, whose home page seems to be out at the moment.
One thing to consider: .WAV files have MANY different formats. The code above only works for WAVE_FORMAT_FLOAT. If you're dealing with PCM files, then your samples are going to be 8, 16, 24 or 32 bit integers (8 bit PCM uses unsigned integers from 0..255, 24 bit PCM can be packed or unpacked (packed == 3 byte values packed next to each other, unpacked == 3 byte values in a 4 byte package).
And then there's the issue of alternate encodings - For instance in Win7, all the windows sounds are actually MP3 files in a WAV container.
It's unfortunately not as simple as it sounds :(.
Oops I misunderstood the question… You can see my python implementations of converting from dB to a float (which you can use as a multiplier on the amplitude like you show above) and vice-versa
https://github.com/jiaaro/pydub/blob/master/pydub/utils.py
In a nutshell it's:
10 ^ (db_gain / 10)
so to reduce the volume by 6 dB you would multiply the amplitude of each sample by:
10 ^ (-6 / 10) == 10 ^ (-0.6) == 0.2512

Resources