Is there a way using Core Audio on OS X to extract a set of frames in an AIFF file into an array of 32-bit floats suitable for performing an FFT on?
Yes. The easiest way to do it is to use the ExtAudioFile API. There's a great example in Apple's ConvertFile sample code. Have a look at UseExtAF.cpp.
For a sample rate of 44.1 kHz, the AudioStreamBasicDescription for 32-bit floating point LPCM would look like this:
AudioStreamBasicDescription fmt;
fmt.mSampleRate = 44100;
fmt.mFormatID = kAudioFormatLinearPCM;
fmt.mFormatFlags = kLinearPCMFormatFlagIsFloat;
fmt.mBitsPerChannel = sizeof(Float32) * 8;
fmt.mChannelsPerFrame = 1; // set this to 2 for stereo
fmt.mBytesPerFrame = fmt.mChannelsPerFrame * sizeof(Float32);
fmt.mFramesPerPacket = 1;
fmt.mBytesPerPacket = fmt.mFramesPerPacket * fmt.mBytesPerFrame;
Related
I am working with Audio Unit RemoteIO's to obtain a low latency audio output. My problem is AFAIK audio unit only accepts several audio formats depending on the hardware. My problem is I have a C++ DSP Sound engine and it works with float interleaved PCM. I do not want to implement a format converter since it can slow things down in the remote IO callback. I tried obtaining a low latency Audio Unit with the following format:
AudioStreamBasicDescription const audioDescription = {
.mSampleRate = defaultSampleRate,
.mFormatID = kAudioFormatLinearPCM,
.mFormatFlags = kAudioFormatFlagIsFloat,
.mBytesPerPacket = defaultSampleRate * STEREO_CHANNEL,
.mFramesPerPacket = 1,
.mBytesPerFrame = STEREO_CHANNEL * sizeof(Float32),
.mChannelsPerFrame = STEREO_CHANNEL,
.mBitsPerChannel = 8 * sizeof(Float32),
.mReserved = 0
};
status = AudioUnitSetProperty(audioUnit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Input,
kOutputBus,
&audioDescription,
sizeof(audioDescription));
This fails with the error code kAudioUnitErr_FormatNotSupported -10868. If I try to obtain a float PCM NON-interleaved audio stream with the following:
AudioStreamBasicDescription const audioDescription = {
.mSampleRate = defaultSampleRate,
.mFormatID = kAudioFormatLinearPCM,
.mFormatFlags = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved,
.mBytesPerPacket = sizeof(float),
.mFramesPerPacket = 1,
.mBytesPerFrame = sizeof(float),
.mChannelsPerFrame = STEREO_CHANNEL,
.mBitsPerChannel = 8 * sizeof(float),
.mReserved = 0
};
status = AudioUnitSetProperty(audioUnit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Input,
kOutputBus,
&audioDescription,
sizeof(audioDescription));
Everything works fine. However I want to obtain an interleaved audio stream for my DSP engine to work without format conversion. Is this possible at all?
PS. waiting for hotpaw2 to guide me :)
Your error is probably due to this line:
.mBytesPerPacket = defaultSampleRate * STEREO_CHANNEL,
I do have a very particular problem, I wish I could find the answer to.
I'm trying to read an AAC stream from an URL (online streaming radio e.g. live.noroc.tv:8000/radionoroc.aacp) with NAudio library and get IEEE 32 bit floating samples.
Besides that I would like to resample the stream to a particular sample rate and channel count (rate 5512, mono).
Below is the code which accomplishes that:
int tenSecondsOfDownloadedAudio = 5512 * 10;
float[] buffer = new float[tenSecondsOfDownloadedAudio];
using (var reader = new MediaFoundationReader(pathToUrl))
{
var ieeeFloatWaveFormat = WaveFormat.CreateIeeeFloatWaveFormat(5512, 1); // mono
using (var resampler = new MediaFoundationResampler(reader, ieeeFloatWaveFormat))
{
var waveToSampleProvider = new WaveToSampleProvider(resampler);
int readSamples = 0;
int tempBuffer = new float[5512]; // 1 second buffer
while(readSamples <= tenSecondsOfDownloadedAudio)
{
int read = waveToSampleProvider.Read(tempBuffer, 0, tempBuffer.Length);
if(read == 0)
{
Thread.Sleep(500); // allow streaming buffer to get loaded
continue;
}
Array.Copy(tempBuffer, 0, buffer, readSamples, tempBuffer.Length);
readSamples += read;
}
}
}
These particular samples are then written to a Wave audio file using the following simple method:
using (var writer = new WaveFileWriter("path-to-audio-file.wav", WaveFormat.CreateIeeeFloatWaveFormat(5512, 1)))
{
writer.WriteSamples(samples, 0, samples.Length);
}
What I've encountered is that NAudio does not read 10 seconds of audio (as it was requested) but only 5, though the buffer array gets fully loaded with samples (which at this rate and channel count should contain 10 seconds of audio samples).
Thus the final audio file plays the stream 2 times as slower as it should (5 second stream is played as 10).
Is this somewhat related to different bit depths (should I record at 64 bits per sample as opposite to 32).
I do my testing at Windows Server 2008 R2 x64, with MFT codecs installed.
Would really appreciate any suggestions.
The problem seems to be with MediaFoundationReader failing to handle HE-AACv2 in ADTS container with is a standard online radio stream format and most likely the one you are dealing with.
Adobe products have the same problem mistreating this format exactly the same way^ stretching the first half of the audio to the whole duration and : Corrupted AAC files recorded from online stream
Supposedly, it has something to do with HE-AACv2 stereo stream being actually a mono stream with additional info channel for Parametric Stereo.
Suppose I've got a 16-bit PCM audio file. I wanna pan all of it completely to the left. How would I do this, purely through byte manipulation? Do I just mix the samples of the right channel with those of the left channel?
I'd also like to ask (since it seems related), how would I go about turning stereo samples into mono samples?
I'm doing this with Haxe, but code in something like C (or just an explanation of the method) should be sufficient. Thanks!
You'll first need to convert the raw bytes into int arrays. Your output for the left channel will be the sum divided by 2.
for (int i = 0 ; i < numFrames ; ++i)
{
*pOutputL++ = (*pInputL++ + *pInputR++) >> 1;
*pOutputR++ = 0;
}
I am trying to generate a tone to the sound card (Frequency: 1950 hz, duration: 40 ms, level: -30 db, right-channel, on steam 1). Any recommendations on how to accomplish this using C++ or C#. Are there any libraries (C++ or C#) for generating such precise tone?
David, playing audio to the speakers was built right into .NET (i think in the .NET 2.0 Framework). Using the System.Media.SoundPlayer you can play a sound from a memory stream that you build (in WAV format). Here is a function i coded that plays a simple frequency for a certain duration. Regarding the decibels and sending it to the sound card, i don't really understand what specifics you are referring to. For instance i fail to understand how audio as measured in decibels is sent to the sound card. My understanding is that decibels are simply a measure of how loud a sound is, thus after it's been reproduced by the speakers. Thus the volume control on the speakers affects what decibel your sounds will produce, and sending a certain decibel to the sound card thus makes no sense to me. Maybe you need something more detailed and maybe this doesn't work for you. But maybe you can run with this and get it to work for what you need. And maybe it is almost exactly what you are asking.
The process i use in this code allows one to build any audio you want, and plays it. So you can create 2 sine waves or many, many more, or triangle waves, or even speech synthesis with this method if you want. This method takes sound samples which are calculated and then plays those, so you need to code what each audio sample needs to be at the given moment in time. WAV allows stereo sound too, but this code sample only uses non-stereo sound. If you want stereo sound then it just needs modified to generate the bytes for a stereo WAV format instead. I expect it would not be too difficult.
Happy coding!
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Windows.Forms;
public static void PlayBeep(UInt16 frequency, int msDuration, UInt16 volume = 16383)
{
var mStrm = new MemoryStream();
BinaryWriter writer = new BinaryWriter(mStrm);
const double TAU = 2 * Math.PI;
int formatChunkSize = 16;
int headerSize = 8;
short formatType = 1;
short tracks = 1;
int samplesPerSecond = 44100;
short bitsPerSample = 16;
short frameSize = (short)(tracks * ((bitsPerSample + 7) / 8));
int bytesPerSecond = samplesPerSecond * frameSize;
int waveSize = 4;
int samples = (int)((decimal)samplesPerSecond * msDuration / 1000);
int dataChunkSize = samples * frameSize;
int fileSize = waveSize + headerSize + formatChunkSize + headerSize + dataChunkSize;
// var encoding = new System.Text.UTF8Encoding();
writer.Write(0x46464952); // = encoding.GetBytes("RIFF")
writer.Write(fileSize);
writer.Write(0x45564157); // = encoding.GetBytes("WAVE")
writer.Write(0x20746D66); // = encoding.GetBytes("fmt ")
writer.Write(formatChunkSize);
writer.Write(formatType);
writer.Write(tracks);
writer.Write(samplesPerSecond);
writer.Write(bytesPerSecond);
writer.Write(frameSize);
writer.Write(bitsPerSample);
writer.Write(0x61746164); // = encoding.GetBytes("data")
writer.Write(dataChunkSize);
{
double theta = frequency * TAU / (double)samplesPerSecond;
// 'volume' is UInt16 with range 0 thru Uint16.MaxValue ( = 65 535)
// we need 'amp' to have the range of 0 thru Int16.MaxValue ( = 32 767)
double amp = volume >> 2; // so we simply set amp = volume / 2
for (int step = 0; step < samples; step++)
{
short s = (short)(amp * Math.Sin(theta * (double)step));
writer.Write(s);
}
}
mStrm.Seek(0, SeekOrigin.Begin);
new System.Media.SoundPlayer(mStrm).Play();
writer.Close();
mStrm.Close();
} // public static void PlayBeep(UInt16 frequency, int msDuration, UInt16 volume = 16383)
NAudio provides a robust audio library for .NET.
NAudio is an open source .NET audio and MIDI library, containing dozens of useful audio related classes intended to speed development of audio related utilities in .NET. It has been in development since 2002 and has grown to include a wide variety of features. While some parts of the library are relatively new and incomplete, the more mature features have undergone extensive testing and can be quickly used to add audio capabilities to an existing .NET application. NAudio can be quickly added to your .NET application using NuGet.
Here's an article that walks step-by-step through using NAudio to create a sine wave. You can create the sine wave with any desired frequency, for any desired duration:
http://msdn.microsoft.com/en-us/magazine/ee309883.aspx
If I wanted to reduce a WAV file's amplitude by 25%, I would write something like this:
for (int i = 0; i < data.Length; i++)
{
data[i] *= 0.75;
}
A lot of the articles I read on audio techniques, however, discuss amplitude in terms of decibels. I understand the logarithmic nature of decibel units in principle, but not so much in terms of actual code.
My question is: if I wanted to attenuate the volume of a WAV file by, say, 20 decibels, how would I do this in code like my above example?
Update: formula (based on Nils Pipenbrinck's answer) for attenuating by a given number of decibels (entered as a positive number e.g. 10, 20 etc.):
public void AttenuateAudio(float[] data, int decibels)
{
float gain = (float)Math.Pow(10, (double)-decibels / 20.0);
for (int i = 0; i < data.Length; i++)
{
data[i] *= gain;
}
}
So, if I want to attenuate by 20 decibels, the gain factor is .1.
I think you want to convert from decibel to gain.
The equations for audio are:
decibel to gain:
gain = 10 ^ (attenuation in db / 20)
or in C:
gain = powf(10, attenuation / 20.0f);
The equations to convert from gain to db are:
attenuation_in_db = 20 * log10 (gain)
If you just want to adust some audio, I've had good results with the normalize package from nongnu.org. If you want to study how it's done, the source code is freely available. I've also used wavnorm, whose home page seems to be out at the moment.
One thing to consider: .WAV files have MANY different formats. The code above only works for WAVE_FORMAT_FLOAT. If you're dealing with PCM files, then your samples are going to be 8, 16, 24 or 32 bit integers (8 bit PCM uses unsigned integers from 0..255, 24 bit PCM can be packed or unpacked (packed == 3 byte values packed next to each other, unpacked == 3 byte values in a 4 byte package).
And then there's the issue of alternate encodings - For instance in Win7, all the windows sounds are actually MP3 files in a WAV container.
It's unfortunately not as simple as it sounds :(.
Oops I misunderstood the question… You can see my python implementations of converting from dB to a float (which you can use as a multiplier on the amplitude like you show above) and vice-versa
https://github.com/jiaaro/pydub/blob/master/pydub/utils.py
In a nutshell it's:
10 ^ (db_gain / 10)
so to reduce the volume by 6 dB you would multiply the amplitude of each sample by:
10 ^ (-6 / 10) == 10 ^ (-0.6) == 0.2512