How do you calculate bitrate for an uncompressed audio file?

For an uncompressed file I thought it was
SampleRate / (NoOfBits * NoOfChannels) = BitRate
which would give
44100 Hz / (16 bits * 2) = 1378 kbps
However this bitrate calculator returns 1411.20 and when I look at the properties of an actual AIFF file in iTunes it also returns 1411 kbps.
So I assume my thinking is incorrect; what have I missed?

Your formula is incorrect - it's just a coincidence that you're getting something close to the right answer (albeit with a handy change of units!).
For uncompressed audio it would be:
bit rate = sample rate * channels * bits per sample
which for CD audio would be:
= 44100 * 2 * 16
= 1411200 bits/s
= 1411.2 kbits/s
See this relevant question for further details.
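As a quick sanity check of that arithmetic, here is a minimal C sketch (purely illustrative, not from the original thread):

#include <stdio.h>

int main(void) {
    /* CD audio: 44.1 kHz, stereo, 16 bits per sample */
    long sample_rate = 44100;
    long channels    = 2;
    long bits        = 16;
    long bit_rate    = sample_rate * channels * bits;
    printf("%ld bits/s = %.1f kbit/s\n", bit_rate, bit_rate / 1000.0);
    /* prints: 1411200 bits/s = 1411.2 kbit/s */
    return 0;
}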

@Paul R is correct; you can see this in the JavaScript for the page:
function calc(data) {
    var cA2B = data.pA2B;  // sample rate
    var cA3B = data.pA3B;  // word length
    var cA4B = data.pA4B;  // channels
    var cA5D = data.pA5D;  // bitrate kbps compressed
    var cA8B = data.pA8B;  // MB per hour uncompressed
    var cA8D = data.pA8D;  // MB per hour compressed
    var cA9B = data.pA9B;  // # files uncompressed
    var cA9D = data.pA9D;  // # files compressed
    var cA5B = cA3B * cA2B * cA4B / 1000;
    var cA6D = cA5D / 8 * 60 * 60 / 1024;
    var cA10B = cA8B * cA9B * 60 * (cA5B / 8) / 1024;
    var cA10D = cA8D * cA9D * 60 * (cA5D / 8) / 1024;
    var cA6B = cA5B / 8 * 60 * 60 / 1024;
    data.pA5B = cA5B;  // the bitrate for uncompressed
    data.pA6B = cA6B;
    data.pA6D = cA6D;
    data.pA10B = cA10B;
    data.pA10D = cA10D;
}
I added the comments and the (slightly more) pretty printing.

Related

Encode LINEAR16 audio to Twilio media audio/x-mulaw | NodeJS

I have been trying to stream a mulaw media stream back to Twilio. The requirement is that the payload must be encoded as audio/x-mulaw with a sample rate of 8000 and base64 encoded.
My input is from @google-cloud/text-to-speech in LINEAR16 (see the Google docs).
I tried wavefile. This is how I encoded the response from @google-cloud/text-to-speech:
const wav = new wavefile.WaveFile(speechResponse.audioContent)
wav.toBitDepth('8')
wav.toSampleRate(8000)
wav.toMuLaw()
Then I send the result back to Twilio via WebSocket
twilioWebsocket.send(JSON.stringify({
    event: 'media',
    media: {
        payload: wav.toBase64(),
    },
    streamSid: meta.streamSid,
}))
The problem is we only hear random noise on the other end of the Twilio call; it seems like the encoding is not right.
Secondly, I have checked the @google-cloud/text-to-speech output audio by saving it to a file, and it was proper and clear.
Can anyone please help me with the encoding?
I also had this same problem. The error is in wav.toBase64(), as this includes the WAV header. Twilio media streams expect raw audio data, which you can get with wav.data.samples, so your code would be:
const wav = new wavefile.WaveFile(speechResponse.audioContent)
wav.toBitDepth('8')
wav.toSampleRate(8000)
wav.toMuLaw()
const payload = Buffer.from(wav.data.samples).toString('base64');
I just had the same problem. The solution is that you need to convert the LINEAR16 audio by hand to the corresponding MULAW codec.
You can use the code from a music library.
I created a function out of this to convert a LINEAR16 byte array to mulaw:
short2ulaw(b: Buffer): Buffer {
    // Linear16 to linear8 -> buffer is half the size.
    // Given the nature of LINEAR16, the length should ALWAYS be even.
    const returnbuffer = Buffer.alloc(b.length / 2)
    for (let i = 0; i < b.length / 2; i++) {
        // JavaScript has no 16-bit integer type; every number is a
        // double-precision 64-bit float, so the masking is done by hand.
        let short = b.readInt16LE(i * 2)
        let sign = 0
        // Set aside the sign bit and work with the magnitude
        if (short < 0) {
            sign = 0x80
            short = -short
        }
        short = short > 32635 ? 32635 : short // clip the magnitude
        const sample = short + 0x84 // add the mu-law bias
        // this.exp_lut is the standard mu-law segment (exponent) lookup table
        const exponent = this.exp_lut[sample >> 8] & 0x7f
        const mantissa = (sample >> (exponent + 3)) & 0x0f
        // complement and mask to a byte, keeping the inverted sign bit
        let ulawbyte = ~(sign | (exponent << 4) | mantissa) & 0xff
        ulawbyte = ulawbyte == 0 ? 0x02 : ulawbyte // CCITT zero-trap
        returnbuffer.writeUInt8(ulawbyte, i)
    }
    return returnbuffer
}
Now you can use this on raw PCM (LINEAR16). You just need to strip the WAV header bytes at the beginning of the Google stream, since Google adds a WAV header.
You can then base64-encode the resulting buffer and send it to Twilio.

How to lower the quality and specs of a WAV file on Linux

So to preface my problem, I'll give some context.
In SDL2 you can load WAV files, as in this example from the wiki:
SDL_AudioSpec wav_spec;
Uint32 wav_length;
Uint8 *wav_buffer;
/* Load the WAV */
if (SDL_LoadWAV("test.wav", &wav_spec, &wav_buffer, &wav_length) == NULL) {
    fprintf(stderr, "Could not open test.wav: %s\n", SDL_GetError());
} else {
    /* Do stuff with the WAV data, and then... */
    SDL_FreeWAV(wav_buffer);
}
The issue I'm getting from SDL_GetError is "Complex WAVE files not supported".
Now the wav file I'm intending to open has the following properties:
Playing test.wav.
Detected file format: WAV / WAVE (Waveform Audio) (libavformat)
ID_AUDIO_ID=0
[lavf] stream 0: audio (pcm_s24le), -aid 0
Clip info:
encoded_by: Pro Tools
ID_CLIP_INFO_NAME0=encoded_by
ID_CLIP_INFO_VALUE0=Pro Tools
originator_reference:
ID_CLIP_INFO_NAME1=originator_reference
ID_CLIP_INFO_VALUE1=
date: 2016-05-1
ID_CLIP_INFO_NAME2=date
ID_CLIP_INFO_VALUE2=2016-05-1
creation_time: 20:13:34
ID_CLIP_INFO_NAME3=creation_time
ID_CLIP_INFO_VALUE3=20:13:34
time_reference:
ID_CLIP_INFO_NAME4=time_reference
ID_CLIP_INFO_VALUE4=
ID_CLIP_INFO_N=5
Load subtitles in dir/
ID_FILENAME=dir/test.wav
ID_DEMUXER=lavfpref
ID_AUDIO_FORMAT=1
ID_AUDIO_BITRATE=2304000
ID_AUDIO_RATE=48000
ID_AUDIO_NCH=2
ID_START_TIME=0.00
ID_LENGTH=135.53
ID_SEEKABLE=1
ID_CHAPTERS=0
Selected audio codec: Uncompressed PCM [pcm]
AUDIO: 48000 Hz, 2 ch, s24le, 2304.0 kbit/100.00% (ratio: 288000->288000)
ID_AUDIO_BITRATE=2304000
ID_AUDIO_RATE=48000
ID_AUDIO_NCH=2
AO: [pulse] 48000Hz 2ch s16le (2 bytes per sample)
ID_AUDIO_CODEC=pcm
From the wiki.libsdl.org/SDL_OpenAudioDevice page and the subsequent wiki.libsdl.org/SDL_AudioSpec#Remarks page I can at least surmise that a WAV file of the following quality should work:
freq = 48000;
format = AUDIO_F32;
channels = 2;
samples = 4096;
The main problem I can see is that my WAV file has the s16le format, which is not listed on the SDL_AudioSpec page.
This leads me to believe I need to reduce the quality of test.wav so it does not appear as "complex" to SDL.
When I search for Complex WAVE files not supported, nothing helpful comes up, except that it appears in the SDL_mixer library, which as far as I know I'm not using.
Can the format be changed via ffmpeg to work in SDL2?
Edit: This appears to be the actual code in SDL2 where it complains. I don't really know enough about C to dig all the way through the vast SDL2 library, but I thought it might help if someone notices something just from hints like variable names and such:
/* Read the audio data format chunk */
chunk.data = NULL;
do {
    if ( chunk.data != NULL ) {
        SDL_free(chunk.data);
        chunk.data = NULL;
    }
    lenread = ReadChunk(src, &chunk);
    if ( lenread < 0 ) {
        was_error = 1;
        goto done;
    }
    /* 2 Uint32's for chunk header+len, plus the lenread */
    headerDiff += lenread + 2 * sizeof(Uint32);
} while ( (chunk.magic == FACT) || (chunk.magic == LIST) );

/* Decode the audio data format */
format = (WaveFMT *)chunk.data;
if ( chunk.magic != FMT ) {
    SDL_SetError("Complex WAVE files not supported");
    was_error = 1;
    goto done;
}
After a couple of hours of fun audio converting I got it working; I'll have to tweak it to try to get better sound quality.
To answer the question at hand, the conversion can be done with:
ffmpeg -i old.wav -acodec pcm_s16le -ac 1 -ar 16000 new.wav
To find codecs on your version of ffmpeg:
ffmpeg -codecs
This format works with SDL.
Next, within SDL, when setting the desired SDL_AudioSpec, make sure to have the correct settings:
freq = 16000;
format = AUDIO_S16LSB;
channels = 2;
samples = 4096;
Finally, the main issue was most likely using the legacy SDL_MixAudio instead of the newer SDL_MixAudioFormat, with the following settings:
SDL_MixAudioFormat(stream, mixData, AUDIO_S16LSB, len, SDL_MIX_MAXVOLUME / 2);
as can be found on the wiki.
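Putting those settings together, here is a minimal hedged C sketch of opening a playback device with SDL_OpenAudioDevice (assuming SDL2; the queueing calls shown in comments are one way to feed it):

#include <stdio.h>
#include <SDL2/SDL.h>

int main(void) {
    if (SDL_Init(SDL_INIT_AUDIO) != 0) {
        fprintf(stderr, "SDL_Init failed: %s\n", SDL_GetError());
        return 1;
    }

    /* The desired spec from the answer above; when playing a file loaded
       with SDL_LoadWAV, its wav_spec can be used here instead. */
    SDL_AudioSpec want, have;
    SDL_zero(want);
    want.freq     = 16000;
    want.format   = AUDIO_S16LSB;
    want.channels = 2;
    want.samples  = 4096;

    SDL_AudioDeviceID dev = SDL_OpenAudioDevice(NULL, 0, &want, &have, 0);
    if (dev == 0) {
        fprintf(stderr, "SDL_OpenAudioDevice failed: %s\n", SDL_GetError());
        SDL_Quit();
        return 1;
    }

    /* No callback was set, so the device can be fed with:
       SDL_QueueAudio(dev, wav_buffer, wav_length);
       SDL_PauseAudioDevice(dev, 0);   (unpause to start playback) */

    SDL_CloseAudioDevice(dev);
    SDL_Quit();
    return 0;
}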

AAC stream resampled incorrectly

I have a very particular problem that I wish I could find the answer to.
I'm trying to read an AAC stream from a URL (online streaming radio, e.g. live.noroc.tv:8000/radionoroc.aacp) with the NAudio library and get IEEE 32-bit floating-point samples.
Besides that I would like to resample the stream to a particular sample rate and channel count (rate 5512, mono).
Below is the code which accomplishes that:
int tenSecondsOfDownloadedAudio = 5512 * 10;
float[] buffer = new float[tenSecondsOfDownloadedAudio];
using (var reader = new MediaFoundationReader(pathToUrl))
{
    var ieeeFloatWaveFormat = WaveFormat.CreateIeeeFloatWaveFormat(5512, 1); // mono
    using (var resampler = new MediaFoundationResampler(reader, ieeeFloatWaveFormat))
    {
        var waveToSampleProvider = new WaveToSampleProvider(resampler);
        int readSamples = 0;
        float[] tempBuffer = new float[5512]; // 1 second buffer
        while (readSamples < tenSecondsOfDownloadedAudio)
        {
            int read = waveToSampleProvider.Read(tempBuffer, 0, tempBuffer.Length);
            if (read == 0)
            {
                Thread.Sleep(500); // allow streaming buffer to get loaded
                continue;
            }
            // copy only the samples actually read, without running past the target buffer
            int toCopy = Math.Min(read, buffer.Length - readSamples);
            Array.Copy(tempBuffer, 0, buffer, readSamples, toCopy);
            readSamples += toCopy;
        }
    }
}
These particular samples are then written to a Wave audio file using the following simple method:
using (var writer = new WaveFileWriter("path-to-audio-file.wav", WaveFormat.CreateIeeeFloatWaveFormat(5512, 1)))
{
    writer.WriteSamples(samples, 0, samples.Length);
}
What I've encountered is that NAudio does not read 10 seconds of audio (as requested) but only 5, though the buffer array gets fully loaded with samples (which at this rate and channel count should contain 10 seconds of audio samples).
Thus the final audio file plays the stream two times slower than it should (a 5-second stream is played as 10).
Is this somehow related to different bit depths (should I record at 64 bits per sample as opposed to 32)?
I do my testing on Windows Server 2008 R2 x64, with the MFT codecs installed.
Would really appreciate any suggestions.
The problem seems to be with MediaFoundationReader failing to handle HE-AACv2 in an ADTS container, which is a standard online radio stream format and most likely the one you are dealing with.
Adobe products have the same problem, mistreating this format in exactly the same way, stretching the first half of the audio to the whole duration: Corrupted AAC files recorded from online stream.
Supposedly, it has something to do with an HE-AACv2 stereo stream actually being a mono stream with an additional info channel for Parametric Stereo.

LibAV - what approach to take for realtime audio and video capture?

I'm using libav to encode raw RGB24 frames to h264 and muxing it to flv. This works all fine and I've streamed for more than 48 hours without any problems! My next step is to add audio to the stream. I'll be capturing live audio and I want to encode it in real time using Speex, MP3 or Nellymoser.
Background info
I'm new to digital audio and therefore I might be doing things wrong. But basically my application gets a "float" buffer with interleaved audio. This "audioIn" function gets called by the application framework I'm using. The buffer contains 256 samples per channel, and I have 2 channels. Because I might be mixing terminology, this is how I use the data:
// input = array with audio samples
// bufferSize = 256
// nChannels = 2
void audioIn(float * input, int bufferSize, int nChannels) {
    // convert from float to S16
    short* buf = new signed short[bufferSize * 2];
    for(int i = 0; i < bufferSize; ++i) { // loop over all samples
        int dx = i * 2;
        buf[dx + 0] = (float)input[dx + 0] * numeric_limits<short>::max(); // convert frame of the first channel
        buf[dx + 1] = (float)input[dx + 1] * numeric_limits<short>::max(); // convert frame of the second channel
    }
    // add this to the libav wrapper.
    av.addAudioFrame((unsigned char*)buf, bufferSize, nChannels);
    delete[] buf;
}
Now that I have a buffer where each sample is 16 bits, I pass this short* buffer to my wrapper av.addAudioFrame() function. In this function I create a buffer before I encode the audio. From what I read, the AVCodecContext of the audio encoder sets the frame_size. This frame_size must match the number of samples in the buffer when calling avcodec_encode_audio2(). Why I think this is because of what is documented here, especially the line:
If it is not set, frame->nb_samples must be equal to avctx->frame_size for all frames except the last. (Please correct me here if I'm wrong about this.)
After encoding I call av_interleaved_write_frame() to actually write the frame.
When I use MP3 as the codec, my application runs for about 1-2 minutes and then my server, which is receiving the video/audio stream (flv, tcp), disconnects with the message "Frame too large: 14485504". This message is generated because the rtmp server is getting a frame which is way too big, and this is probably due to the fact that I'm not interleaving correctly with libav.
Questions:
There are quite a few bits I'm not sure of, even after going through the source code of libav, and therefore I hope someone has a working example of encoding audio which comes from a buffer from "outside" libav (i.e. your own application). That is: how do you create a buffer which is large enough for the encoder? How do you make "realtime" streaming work when you need to wait for this buffer to fill up?
As I wrote above, I need to keep track of a buffer before I can encode. Does someone else have some code which does this? I'm using AVAudioFifo now; see the sketch after this question. The functions which encode the audio and fill/read the buffer are here too: https://gist.github.com/62f717bbaa69ac7196be
I compiled with --enable-debug=3 and disabled optimizations, but I'm not seeing any debug information. How can I make libav more verbose?
Thanks!
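One way the AVAudioFifo buffering can look, as a minimal hedged sketch: samples are written into the FIFO as they arrive, and a frame is encoded only once frame_size samples are available. The names add_audio_frame, fmt_ctx, ctx and st are illustrative, timestamp handling and error checking are elided, and it uses the avcodec_encode_audio2-era API referenced in the question.

#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/audio_fifo.h>

/* setup, once: fifo = av_audio_fifo_alloc(ctx->sample_fmt, ctx->channels, ctx->frame_size); */
static void add_audio_frame(AVFormatContext *fmt_ctx, AVCodecContext *ctx,
                            AVStream *st, AVAudioFifo *fifo,
                            uint8_t *samples, int nb_samples)
{
    void *planes[1] = { samples };              /* packed S16: a single plane */
    av_audio_fifo_write(fifo, planes, nb_samples);

    /* drain the FIFO one full frame at a time */
    while (av_audio_fifo_size(fifo) >= ctx->frame_size) {
        AVFrame *frame = av_frame_alloc();
        frame->nb_samples     = ctx->frame_size;
        frame->format         = ctx->sample_fmt;
        frame->channel_layout = ctx->channel_layout;
        av_frame_get_buffer(frame, 0);

        av_audio_fifo_read(fifo, (void **)frame->data, ctx->frame_size);

        AVPacket pkt;
        av_init_packet(&pkt);
        pkt.data = NULL;                        /* let the encoder allocate */
        pkt.size = 0;

        int got_packet = 0;
        if (avcodec_encode_audio2(ctx, &pkt, frame, &got_packet) == 0 && got_packet) {
            pkt.stream_index = st->index;       /* pts/dts rescaling elided */
            av_interleaved_write_frame(fmt_ctx, &pkt);
        }
        av_frame_free(&frame);
    }
}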

Using Core Audio to extract floats from AIFF

Is there a way using Core Audio on OS X to extract a set of frames in an AIFF file into an array of 32-bit floats suitable for performing an FFT on?
Yes. The easiest way to do it is to use the ExtAudioFile API. There's a great example in Apple's ConvertFile sample code. Have a look at UseExtAF.cpp.
For a sample rate of 44.1 kHz, the AudioStreamBasicDescription for 32-bit floating point LPCM would look like this:
AudioStreamBasicDescription fmt;
fmt.mSampleRate = 44100;
fmt.mFormatID = kAudioFormatLinearPCM;
fmt.mFormatFlags = kLinearPCMFormatFlagIsFloat;
fmt.mBitsPerChannel = sizeof(Float32) * 8;
fmt.mChannelsPerFrame = 1; // set this to 2 for stereo
fmt.mBytesPerFrame = fmt.mChannelsPerFrame * sizeof(Float32);
fmt.mFramesPerPacket = 1;
fmt.mBytesPerPacket = fmt.mFramesPerPacket * fmt.mBytesPerFrame;
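Continuing from that format description, here is a minimal sketch of pulling float frames out with ExtAudioFile; the function name readFloats and the single fixed-size read are illustrative, and error checking is elided:

#include <AudioToolbox/AudioToolbox.h>

/* Read up to maxFrames of Float32 samples from the file at path,
   converted to the client format fmt set up as above. */
static UInt32 readFloats(CFStringRef path, AudioStreamBasicDescription fmt,
                         Float32 *out, UInt32 maxFrames)
{
    CFURLRef url = CFURLCreateWithFileSystemPath(kCFAllocatorDefault, path,
                                                 kCFURLPOSIXPathStyle, false);
    ExtAudioFileRef af;
    ExtAudioFileOpenURL(url, &af);

    /* ask ExtAudioFile to hand us data in our float client format */
    ExtAudioFileSetProperty(af, kExtAudioFileProperty_ClientDataFormat,
                            sizeof(fmt), &fmt);

    AudioBufferList buf;
    buf.mNumberBuffers = 1;
    buf.mBuffers[0].mNumberChannels = fmt.mChannelsPerFrame;
    buf.mBuffers[0].mDataByteSize   = maxFrames * fmt.mBytesPerFrame;
    buf.mBuffers[0].mData           = out;

    UInt32 frames = maxFrames;       /* in: capacity; out: frames actually read */
    ExtAudioFileRead(af, &frames, &buf);

    ExtAudioFileDispose(af);
    CFRelease(url);
    return frames;                   /* out[0..frames) is ready for an FFT */
}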
