PCM Data format with fmod - difference on multiple platforms - audio

I am writing a Unity3D plugin that reads data from an MP3 file and feeds the PCM data to Unity so that it can play it inside the engine. On iOS, I use the AVAssetReaderAudioMixOutput class to decode and read the data, and on Android/Windows, I use FMOD.
I have set up a program on both Windows and iOS which uses FMOD to play back the music, just like Unity3D does.
I am having trouble getting the same results on both iOS and Windows, and I can't seem to find the difference in audio output settings/format that would cause the difference.
So first, these are the settings that I set for my output audio stream, which are the same settings as Unity3D uses:
FMOD_CREATESOUNDEXINFO exinfo2;
memset(&exinfo2, 0, sizeof(FMOD_CREATESOUNDEXINFO));
exinfo2.cbsize = sizeof(FMOD_CREATESOUNDEXINFO);
exinfo2.decodebuffersize = 44100;
exinfo2.length = 44100 * 1 * sizeof(float) * 100;
exinfo2.numchannels = 1;
exinfo2.defaultfrequency = 44100;
exinfo2.format = FMOD_SOUND_FORMAT_PCMFLOAT;
exinfo2.pcmreadcallback = pcmreadcallback;
result = system_->createStream("./1.mp3", FMOD_LOOP_NORMAL | FMOD_SOFTWARE | FMOD_OPENUSER | FMOD_CREATESTREAM, &exinfo2, &sound2_);
ERRCHECK(result);
result = system_->playSound(FMOD_CHANNEL_FREE,sound2_,false,0);
Basically, 1 channel, 32-bit floating point PCM data. This is set on both the iOS and Windows playback programs.
Now, on iOS, I set the AVAssetReaderAudioMixOutput audio settings like this:
NSDictionary *audioSetting = [NSDictionary dictionaryWithObjectsAndKeys:
[NSNumber numberWithFloat:44100.0],AVSampleRateKey,
[NSNumber numberWithInt:1],AVNumberOfChannelsKey, //how many channels has original?
[NSNumber numberWithInt:32],AVLinearPCMBitDepthKey, //was 16
[NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
[NSNumber numberWithBool:YES], AVLinearPCMIsFloatKey, //was NO
[NSNumber numberWithBool:0], AVLinearPCMIsBigEndianKey,
[NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
[NSData data], AVChannelLayoutKey, nil];
I set the PCMIsFloatKey to 1 so that the PCM data is a floating point, I set the bit-depth to 32, 1 channel, so that everything matches the FMOD output settings.
I read the data and write it into a circular buffer:
float* convertedBuffer = (float * ) audioBufferList.mBuffers[0].mData;
//We don't need the audioconverter on iOS
//Fill up the circular buffer
for(int i = 0; i < numSamples; i++)
{
circularAudioBuffer_[bufferWritePosition_] = convertedBuffer[i];
bufferWritePosition_++;
if(bufferWritePosition_ >= circularBufferSize_)
bufferWritePosition_ = 0;
}
Then read the data from the buffer and write it into the audio stream in the pcmreadcallback:
float *writeBuffer = (float *)data;
for(int i = 0; i < dataLength; i++)
{
writeBuffer[i] = circularAudioBuffer_[bufferReadPosition_];
bufferReadPosition_++;
if(bufferReadPosition_ >= circularBufferSize_)
bufferReadPosition_ = 0;
}
With this, the audio plays perfectly, and the values inside the circular buffer range from 0.0 to 1.0f.
Now, on Windows, I initialize the sound from which I read the data like this:
exinfo.cbsize = sizeof(FMOD_CREATESOUNDEXINFO);
exinfo.decodebuffersize = 44100;
exinfo.numchannels = 1;
exinfo.defaultfrequency = 44100;
exinfo.format = FMOD_SOUND_FORMAT_PCMFLOAT;
This sets the same parameters: 1 channel, 32-bit floating point. I read the data and write it into the buffer:
FMOD_RESULT result = sound_->readData(rawBuffer, N * 4, &bytesRead); // N floats, 4 bytes each
float* floatBuffer = (float*) rawBuffer;
for(int j = 0; j < N; j++)
{
circularAudioBuffer_[bufferWritePosition_++] = floatBuffer[j];
if(bufferWritePosition_ >= circularBufferSize_)
bufferWritePosition_ = 0;
}
Now, when I read the data, I get very high or very low floating-point values (about 1e34 or -1e33). In the test program, I can't hear anything in the output.
I can switch the input and output sound format to PCM32 and it plays fine in the test program, but it can't be read properly in Unity3D (it screeches a lot, but I can make out the song).
Can anyone help me figure this out and make it work properly using the PCMFLOAT format? Thanks!
TL;DR: I can't read data from FMOD sound with PCMFLOAT format!

From FMOD Support (FMOD Forums):
The specified output format doesn't matter. The FMOD codec will always return PCM16, while the iOS codec returns PCM floats. So, I need to convert them:
(float)pcm16in[j] / 32768.0f;
In addition to this, I was (accidentally) initializing the output stream with an MP3 file, which made it impossible to change the output format.
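The conversion described above can be sketched as a small helper. This is only a minimal sketch, assuming the decoder hands back signed 16-bit samples; the function name pcm16ToFloat and its parameters are hypothetical and not part of FMOD.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an FMOD API): converts signed 16-bit PCM
// samples to 32-bit floats in the range [-1.0, 1.0).
std::vector<float> pcm16ToFloat(const int16_t* pcm16in, std::size_t count)
{
    std::vector<float> out(count);
    for (std::size_t j = 0; j < count; ++j) {
        // Dividing by 32768 maps -32768 to -1.0 and 32767 to just under 1.0.
        out[j] = static_cast<float>(pcm16in[j]) / 32768.0f;
    }
    return out;
}
```

The returned floats could then be written into the circular buffer in place of the raw values read from the decoder.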

Related

sending audio via bluetooth a2dp source esp32

I am trying to send a measured I2S analogue signal (e.g. from a mic) to the sink device via Bluetooth instead of the default noise.
Currently I am trying to change bt_app_a2d_data_cb():
static int32_t bt_app_a2d_data_cb(uint8_t *data, int32_t i2s_read_len)
{
if (i2s_read_len < 0 || data == NULL) {
return 0;
}
char* i2s_read_buff = (char*) calloc(i2s_read_len, sizeof(char));
bytes_read = 0;
i2s_adc_enable(I2S_NUM_0);
while(bytes_read == 0)
{
i2s_read(I2S_NUM_0, i2s_read_buff, i2s_read_len,&bytes_read, portMAX_DELAY);
}
i2s_adc_disable(I2S_NUM_0);
// taking care of the watchdog//
TIMERG0.wdt_wprotect=TIMG_WDT_WKEY_VALUE;
TIMERG0.wdt_feed=1;
TIMERG0.wdt_wprotect=0;
uint32_t j = 0;
uint16_t dac_value = 0;
// change 16bit input signal to 8bit
for (int i = 0; i < i2s_read_len; i += 2) {
dac_value = ((((uint16_t) (i2s_read_buff[i + 1] & 0xf) << 8) | ((i2s_read_buff[i + 0]))));
data[j] = (uint8_t) dac_value * 256 / 4096;
j++;
}
// testing for loop
//uint8_t da = 0;
//for (int i = 0; i < i2s_read_len; i++) {
// data[i] = (uint8_t) (i2s_read_buff[i] >> 8);// & 0xff;
// da++;
// if(da>254) da=0;
//}
free(i2s_read_buff);
i2s_read_buff = NULL;
return i2s_read_len;
}
I can hear the sawtooth sound from the sink device.
Any ideas what to do?
Your data can be an array of floats representing the analog signal or its variations; for example, a 32 kHz sound signal contains 32,000 float values for every second of captured sound. If the data is to be transmitted offline, you can prepare the outgoing data as a buffer followed by a terminator and send the buffer through the Bluetooth module of the sending device, which is connected to the appropriate microcontroller. On the receiving device, once you get a terminator character such as "\r", you can process the incoming buffer. In my case, I had to send a string array of numbers, but I often received one or two unknown characters at the start, so I rejected them while filling the receive container.
how to trim unknown first characters of string in code vision
If you want it online, i.e. your data must be transmitted and played concurrently, you must account for delays and allow reasonable processing time for all the microcontrollers and devices involved, such as Bluetooth modules, EEPROM ICs and so on.
I'm also working on a project "a2dp source esp32".
I'm playing a wav-file from spiffs.
If the wav-file is 44100 Hz, 16-bit, stereo, then you can write a stream of bytes from the file directly to the array data[ ].
When I tried to write less data than the len variable specifies and return a smaller value (for example 88), I got an error; now I'm trying to figure out how to reduce this buffer because of the high latency (len=512).
Also, the data in the array data[ ] is stored as stereo.
Example: read data from file to data[ ]-array:
size_t read;
read = fread((void*) data, 1, len, fwave);//fwave is a file
if(read<len){//If get EOF, go to begin of the file
fseek(fwave, 0x2C, SEEK_SET);//skip the 44-byte wav-header
read = fread((void*) (&(data[read])), 1, len-read, fwave);//read up
}
If the file is mono, I convert it to stereo like this (I read half as much and then duplicate the data):
int32_t lenHalf=len/2;
read = fread((void*) data, 1, lenHalf, fwave);
if(read<lenHalf){
fseek(fwave, 0x2C, SEEK_SET);//skip the 44-byte wav-header
read = fread((void*) (&(data[read])), 1, lenHalf-read, fwave);//read up
}
//copy to the second channel
uint16_t *data16=(uint16_t*)data;
for (int i = lenHalf/2-1; i >= 0; i--) {
data16[(i << 1)] = data16[i];
data16[(i << 1) + 1] = data16[i];
}
I think you are getting the sawtooth sound for one of these reasons:
your data is mono;
the i2s_read_len in your "return i2s_read_len;" is less than len;
you convert the 16-bit input signal to 8-bit (// change 16bit input signal to 8bit), but the array data[ ] holds 16-bit samples: 2ByteLeft-2ByteRight-2ByteLeft-2ByteRight-...
I'm not sure; it's a guess.
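If the last guess is right, the callback would need to emit 16-bit stereo frames rather than 8-bit values. Below is a minimal sketch of that conversion, assuming each ADC reading is a 12-bit unsigned value stored in the low bits of a little-endian 16-bit word (as the masking in the question suggests). The helper adcToStereoFrames is hypothetical and not part of ESP-IDF.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: expands mono 12-bit ADC readings into interleaved
// signed 16-bit stereo frames (L, R, L, R, ...), little-endian.
// Returns the number of bytes written to 'out'.
std::size_t adcToStereoFrames(const uint8_t* adc, std::size_t adcBytes,
                              uint8_t* out, std::size_t outBytes)
{
    std::size_t written = 0;
    for (std::size_t i = 0; i + 1 < adcBytes && written + 4 <= outBytes; i += 2) {
        // Rebuild the 12-bit unsigned reading (0..4095); this is an assumption
        // about the ADC word layout.
        uint16_t raw = static_cast<uint16_t>(((adc[i + 1] & 0x0F) << 8) | adc[i]);
        // Centre around zero and scale up to the signed 16-bit range.
        int16_t sample = static_cast<int16_t>((static_cast<int32_t>(raw) - 2048) * 16);
        uint16_t bits = static_cast<uint16_t>(sample);
        uint8_t lo = static_cast<uint8_t>(bits & 0xFF);
        uint8_t hi = static_cast<uint8_t>(bits >> 8);
        out[written++] = lo;  // left channel, low byte
        out[written++] = hi;  // left channel, high byte
        out[written++] = lo;  // right channel, low byte
        out[written++] = hi;  // right channel, high byte
    }
    return written;
}
```

Each 2-byte reading becomes a 4-byte stereo frame, so filling a callback buffer of len bytes would take len / 2 bytes of ADC data.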

Frame buffer texture data update using DirectX

I am trying my hand at the DirectX 11 template in VS 2015 with VC++. I am using:
D3D11_MAPPED_SUBRESOURCE with Map and Unmap to update the texture.
Now I have a separate file in my project where I am reading pixels, and I need to upload them to this texture.
I am using a struct to hold the texture data:
struct Frames{
int text_Width;
int text_height;
unsigned int text_Sz;
unsigned char* text_Data; };
I want to know how I can use this struct from a separate file to upload the texture data in my DirectX-based Spinning Cube file.
You don't mention what format the data is, which is essential to knowing how to do this, but let's assume your text_Data points to an array of R8G8B8A8 data (i.e. each pixel is 32-bits with 8-bits each of Red, Green, Blue, and Alpha in that order from LSB to MSB). If so, it would look like:
Frames f = ...; // your structure
D3D11_TEXTURE2D_DESC desc = {};
desc.Width = UINT(f.text_Width);
desc.Height = UINT(f.text_height);
desc.MipLevels = desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
desc.SampleDesc.Count = 1;
desc.Usage = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
desc.CPUAccessFlags = 0;
desc.MiscFlags = 0;
D3D11_SUBRESOURCE_DATA initData = {};
initData.pSysMem = f.text_Data;
initData.SysMemPitch = UINT( 4 * f.text_Width );
initData.SysMemSlicePitch = UINT( f.text_Sz );
Microsoft::WRL::ComPtr<ID3D11Texture2D> pTexture;
HRESULT hr = d3dDevice->CreateTexture2D( &desc, &initData, &pTexture );
if (FAILED(hr))
...
Note this is covered on MSDN in the How to use Direct3D 11 topics, although the sample code style there is a little dated.
Take a look at the DirectX Tool Kit for DirectX 11 and the tutorials in particular. There's no reason to write your own loader when you can just use DDSTextureLoader or WICTextureLoader.
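Since the question mentions updating the texture through Map and Unmap, one detail is worth sketching: the row pitch of a mapped texture can be larger than 4 * width, so the pixels are usually copied row by row rather than in one block. The helper below is a hypothetical, API-independent sketch; with Direct3D 11, dst and dstPitch would presumably come from the pData and RowPitch members of D3D11_MAPPED_SUBRESOURCE.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies 'height' rows of pixel data into a destination
// whose rows may be padded. rowBytes is the number of meaningful bytes per
// row (4 * width for R8G8B8A8 data).
void copyRows(uint8_t* dst, std::size_t dstPitch,
              const uint8_t* src, std::size_t srcPitch,
              std::size_t rowBytes, std::size_t height)
{
    for (std::size_t y = 0; y < height; ++y) {
        // Advance each side by its own pitch so destination padding is skipped.
        std::memcpy(dst + y * dstPitch, src + y * srcPitch, rowBytes);
    }
}
```

For the Frames struct, a call might look like copyRows(mappedBytes, mappedPitch, f.text_Data, 4 * f.text_Width, 4 * f.text_Width, f.text_height), where mappedBytes and mappedPitch are placeholder names for the mapped pointer and pitch.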

How to convert sample rate from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16?

I am decoding AAC to PCM with FFmpeg using avcodec_decode_audio3. However, it decodes into the AV_SAMPLE_FMT_FLTP sample format (PCM 32-bit float planar) and I need AV_SAMPLE_FMT_S16 (PCM 16-bit signed, S16LE).
I know that ffmpeg can do this easily with -sample_fmt. I want to do the same in code, but I still couldn't figure it out.
audio_resample did not work for me: it fails with the error message: .... conversion failed.
EDIT 9th April 2013: Worked out how to use libswresample to do this... much faster!
At some point in the last 2-3 years FFmpeg's AAC decoder's output format changed from AV_SAMPLE_FMT_S16 to AV_SAMPLE_FMT_FLTP. This means that each audio channel has its own buffer, and each sample value is a 32-bit floating point value scaled from -1.0 to +1.0.
Whereas with AV_SAMPLE_FMT_S16 the data is in a single buffer, with the samples interleaved, and each sample is a signed 16-bit integer from -32768 to +32767.
And if you really need your audio as AV_SAMPLE_FMT_S16, then you have to do the conversion yourself. I figured out two ways to do it:
1. Use libswresample (recommended)
#include "libswresample/swresample.h"
...
SwrContext *swr;
...
// Set up SWR context once you've got codec information
swr = swr_alloc();
av_opt_set_int(swr, "in_channel_layout", audioCodec->channel_layout, 0);
av_opt_set_int(swr, "out_channel_layout", audioCodec->channel_layout, 0);
av_opt_set_int(swr, "in_sample_rate", audioCodec->sample_rate, 0);
av_opt_set_int(swr, "out_sample_rate", audioCodec->sample_rate, 0);
av_opt_set_sample_fmt(swr, "in_sample_fmt", AV_SAMPLE_FMT_FLTP, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
swr_init(swr);
...
// In your decoder loop, after decoding an audio frame:
AVFrame *audioFrame = ...;
int16_t* outputBuffer = ...;
swr_convert(swr, (uint8_t**)&outputBuffer, audioFrame->nb_samples, (const uint8_t**)audioFrame->extended_data, audioFrame->nb_samples);
And that's all you have to do!
2. Do it by hand in C (original answer, not recommended)
So in your decode loop, when you've got an audio packet you decode it like this:
AVCodecContext *audioCodec; // init'd elsewhere
AVFrame *audioFrame; // init'd elsewhere
AVPacket packet; // init'd elsewhere
int16_t* outputBuffer; // init'd elsewhere
int out_size = 0;
...
int len = avcodec_decode_audio4(audioCodec, audioFrame, &out_size, &packet);
And then, if you've got a full frame of audio, you can convert it fairly easily:
// Convert from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16
int in_samples = audioFrame->nb_samples;
int in_linesize = audioFrame->linesize[0];
int i=0;
float* inputChannel0 = (float*)audioFrame->extended_data[0];
// Mono
if (audioFrame->channels==1) {
for (i=0 ; i<in_samples ; i++) {
float sample = *inputChannel0++;
if (sample<-1.0f) sample=-1.0f; else if (sample>1.0f) sample=1.0f;
outputBuffer[i] = (int16_t) (sample * 32767.0f);
}
}
// Stereo
else {
float* inputChannel1 = (float*)audioFrame->extended_data[1];
for (i=0 ; i<in_samples ; i++) {
outputBuffer[i*2] = (int16_t) ((*inputChannel0++) * 32767.0f);
outputBuffer[i*2+1] = (int16_t) ((*inputChannel1++) * 32767.0f);
}
}
// outputBuffer now contains 16-bit PCM!
I've left a couple of things out for clarity... the clamping in the mono path should ideally be duplicated in the stereo path. And the code can be easily optimized.
I found two resampling functions in FFmpeg. Their performance may be better.
avresample_convert()
http://libav.org/doxygen/master/group__lavr.html
swr_convert() http://spirton.com/svn/MPlayer-SB/ffmpeg/libswresample/swresample_test.c
Thanks Reuben for a solution to this. I did find that some of the sample values were slightly off when compared with a straight ffmpeg -i file.wav. It seems that in the conversion, they use a round() on the value.
To do the conversion, I did what you did with a bit of modification to work for any number of channels:
if (audioCodecContext->sample_fmt == AV_SAMPLE_FMT_FLTP)
{
int nb_samples = decoded_frame->nb_samples;
int channels = decoded_frame->channels;
int outputBufferLen = nb_samples * channels * 2;
short* outputBuffer = new short[outputBufferLen/2];
for (int i = 0; i < nb_samples; i++)
{
for (int c = 0; c < channels; c++)
{
float* extended_data = (float*)decoded_frame->extended_data[c];
float sample = extended_data[i];
if (sample < -1.0f) sample = -1.0f;
else if (sample > 1.0f) sample = 1.0f;
outputBuffer[i * channels + c] = (short)round(sample * 32767.0f);
}
}
// Do what you want with the data etc.
}
I went from ffmpeg 0.11.1 -> 1.1.3 and found the change of sample format annoying. I looked at setting the request_sample_fmt to AV_SAMPLE_FMT_S16 but it seems the aac decoder doesn't support anything other than AV_SAMPLE_FMT_FLTP anyway.
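The per-channel loop above can be isolated into a self-contained function that is easy to test without FFmpeg. This is a sketch under assumptions: planarFloatToInterleavedS16 is a hypothetical name, and planes stands in for AVFrame::extended_data cast to float pointers.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, independent of FFmpeg: 'planes' holds one pointer per
// channel, each pointing at nbSamples floats (planar layout).
// Returns interleaved signed 16-bit samples (c0, c1, ..., c0, c1, ...).
std::vector<int16_t> planarFloatToInterleavedS16(const float* const* planes,
                                                 int channels, int nbSamples)
{
    std::vector<int16_t> out(static_cast<std::size_t>(channels) * nbSamples);
    for (int i = 0; i < nbSamples; ++i) {
        for (int c = 0; c < channels; ++c) {
            float sample = planes[c][i];
            // Clamp to [-1, 1] so out-of-range input cannot overflow int16_t.
            if (sample < -1.0f) sample = -1.0f;
            else if (sample > 1.0f) sample = 1.0f;
            // Round to nearest instead of truncating toward zero.
            out[static_cast<std::size_t>(i) * channels + c] =
                static_cast<int16_t>(std::lround(sample * 32767.0f));
        }
    }
    return out;
}
```

Returning a std::vector avoids the manual new[] in the snippet above, at the cost of one allocation per frame.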

Generate audio tone to sound card in C++ or C#

I am trying to generate a tone to the sound card (frequency: 1950 Hz, duration: 40 ms, level: -30 dB, right channel, on stream 1). Any recommendations on how to accomplish this using C++ or C#? Are there any libraries (C++ or C#) for generating such a precise tone?
David, playing audio through the speakers is built right into .NET (I think since the .NET 2.0 Framework). Using System.Media.SoundPlayer you can play a sound from a memory stream that you build (in WAV format). Here is a function I coded that plays a simple frequency for a certain duration. Regarding the decibels and sending it to the sound card, I don't really understand what specifics you are referring to. For instance, I fail to understand how audio measured in decibels is sent to the sound card. My understanding is that decibels are simply a measure of how loud a sound is, that is, after it has been reproduced by the speakers. The volume control on the speakers therefore affects what decibel level your sounds will produce, so sending a certain decibel level to the sound card makes no sense to me. Maybe you need something more detailed and this doesn't work for you, but maybe you can run with it and get it to work for what you need, or maybe it is almost exactly what you are asking for.
The process I use in this code lets you build any audio you want and play it. So you can create two sine waves or many more, or triangle waves, or even speech synthesis with this method if you want. The method takes calculated sound samples and plays them, so you need to code what each audio sample should be at a given moment in time. WAV allows stereo sound too, but this code sample only produces mono sound. If you want stereo, it just needs to be modified to generate the bytes for a stereo WAV format instead. I expect it would not be too difficult.
Happy coding!
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Windows.Forms;
public static void PlayBeep(UInt16 frequency, int msDuration, UInt16 volume = 16383)
{
var mStrm = new MemoryStream();
BinaryWriter writer = new BinaryWriter(mStrm);
const double TAU = 2 * Math.PI;
int formatChunkSize = 16;
int headerSize = 8;
short formatType = 1;
short tracks = 1;
int samplesPerSecond = 44100;
short bitsPerSample = 16;
short frameSize = (short)(tracks * ((bitsPerSample + 7) / 8));
int bytesPerSecond = samplesPerSecond * frameSize;
int waveSize = 4;
int samples = (int)((decimal)samplesPerSecond * msDuration / 1000);
int dataChunkSize = samples * frameSize;
int fileSize = waveSize + headerSize + formatChunkSize + headerSize + dataChunkSize;
// var encoding = new System.Text.UTF8Encoding();
writer.Write(0x46464952); // = encoding.GetBytes("RIFF")
writer.Write(fileSize);
writer.Write(0x45564157); // = encoding.GetBytes("WAVE")
writer.Write(0x20746D66); // = encoding.GetBytes("fmt ")
writer.Write(formatChunkSize);
writer.Write(formatType);
writer.Write(tracks);
writer.Write(samplesPerSecond);
writer.Write(bytesPerSecond);
writer.Write(frameSize);
writer.Write(bitsPerSample);
writer.Write(0x61746164); // = encoding.GetBytes("data")
writer.Write(dataChunkSize);
{
double theta = frequency * TAU / (double)samplesPerSecond;
// 'volume' is UInt16 with range 0 thru Uint16.MaxValue ( = 65 535)
// we need 'amp' to have the range of 0 thru Int16.MaxValue ( = 32 767)
double amp = volume >> 1; // so we simply set amp = volume / 2
for (int step = 0; step < samples; step++)
{
short s = (short)(amp * Math.Sin(theta * (double)step));
writer.Write(s);
}
}
mStrm.Seek(0, SeekOrigin.Begin);
new System.Media.SoundPlayer(mStrm).Play();
writer.Close();
mStrm.Close();
} // public static void PlayBeep(UInt16 frequency, int msDuration, UInt16 volume = 16383)
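If the -30 dB in the question is read as a level relative to digital full scale (an assumption; the question does not say), the sample generation can be adapted to that level and to the right channel only. The sketch below is in C++, which the question also allows. makeRightChannelTone is a hypothetical name, and the result is only the interleaved sample data, not a complete WAV file.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: renders a sine tone into interleaved 16-bit stereo
// samples (L, R, L, R, ...), silent on the left and audible on the right.
// levelDb is interpreted as dB relative to full scale and is assumed <= 0.
std::vector<int16_t> makeRightChannelTone(double frequencyHz, int msDuration,
                                          double levelDb, int sampleRate = 44100)
{
    const double pi = 3.14159265358979323846;
    const int frames =
        static_cast<int>(static_cast<long long>(sampleRate) * msDuration / 1000);
    // Decibels to linear amplitude: amp = fullScale * 10^(dB / 20).
    const double amp = 32767.0 * std::pow(10.0, levelDb / 20.0);
    const double theta = 2.0 * pi * frequencyHz / sampleRate;

    std::vector<int16_t> samples(static_cast<std::size_t>(frames) * 2, 0);
    for (int n = 0; n < frames; ++n) {
        // Even indices (left channel) stay zero; odd indices carry the tone.
        samples[2 * static_cast<std::size_t>(n) + 1] =
            static_cast<int16_t>(std::lround(amp * std::sin(theta * n)));
    }
    return samples;
}
```

Calling makeRightChannelTone(1950.0, 40, -30.0) gives 1764 stereo frames, which could be written as the data chunk of a WAV header that declares two tracks.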
NAudio provides a robust audio library for .NET.
NAudio is an open source .NET audio and MIDI library, containing dozens of useful audio related classes intended to speed development of audio related utilities in .NET. It has been in development since 2002 and has grown to include a wide variety of features. While some parts of the library are relatively new and incomplete, the more mature features have undergone extensive testing and can be quickly used to add audio capabilities to an existing .NET application. NAudio can be quickly added to your .NET application using NuGet.
Here's an article that walks step-by-step through using NAudio to create a sine wave. You can create the sine wave with any desired frequency, for any desired duration:
http://msdn.microsoft.com/en-us/magazine/ee309883.aspx

How do you select audio input device in core audio?

I am writing a program that needs to deal with multiple audio inputs.
I am currently using AudioQueues to get the input, but this is only from the default input device.
Is there any way to either:
Select which input device the AudioQueues use.
Change the default input device.
I know that I can use kAudioHardwarePropertyDevices in Core Audio to get a list of output devices. Is there a similar one I can use for input devices?
I banged my head against how to do this for a while, and finally figured it out:
BOOL isMic = NO;
BOOL isSpeaker = NO;
AudioDeviceID device = audioDevices[i];
// Determine direction of the device by asking for the number of input or
// output streams.
propertyAddress.mSelector = kAudioDevicePropertyStreams;
propertyAddress.mScope = kAudioDevicePropertyScopeInput;
UInt32 dataSize = 0;
OSStatus status = AudioObjectGetPropertyDataSize(device,
&propertyAddress,
0,
NULL,
&dataSize);
UInt32 streamCount = dataSize / sizeof(AudioStreamID);
if (streamCount > 0)
{
isMic = YES;
}
propertyAddress.mScope = kAudioDevicePropertyScopeOutput;
dataSize = 0;
status = AudioObjectGetPropertyDataSize(device,
&propertyAddress,
0,
NULL,
&dataSize);
streamCount = dataSize / sizeof(AudioStreamID);
if (streamCount > 0)
{
isSpeaker = YES;
}
As you can see, the key part is to use the ScopeInput/ScopeOutput parameter values.
kAudioHardwarePropertyDevices is used for both output and input devices. Devices can have both input and output channels, or can have only input or output channels.
Most of the AudioDevice... functions take a Boolean isInput parameter so that you can query the input side of the device.
