How to decode Full Rate GSM Audio file? - visual-c++

I have to decode a full rate gsm audio file. Full Rate GSM Audio file is decoded using libgsm. I have used MSVC++ with windows nightly builds of ffmpeg and libav but unable to decode file correctly. Can anyone tell me the reason? I have tried decoding using following codecs:
/* various PCM "codecs" */
AV_CODEC_ID_FIRST_AUDIO = 0x10000,
AV_CODEC_ID_PCM_S16LE = 0x10000,
AV_CODEC_ID_PCM_S16BE,
AV_CODEC_ID_PCM_U16LE,
AV_CODEC_ID_PCM_U16BE,
AV_CODEC_ID_PCM_S8,
AV_CODEC_ID_PCM_U8,
AV_CODEC_ID_PCM_MULAW,
AV_CODEC_ID_PCM_ALAW,
AV_CODEC_ID_PCM_S32LE,
AV_CODEC_ID_PCM_S32BE,
AV_CODEC_ID_PCM_U32LE,
AV_CODEC_ID_PCM_U32BE,
AV_CODEC_ID_PCM_S24LE,
AV_CODEC_ID_PCM_S24BE,
AV_CODEC_ID_PCM_U24LE,
AV_CODEC_ID_PCM_U24BE,
AV_CODEC_ID_PCM_S24DAUD,
AV_CODEC_ID_PCM_ZORK,
AV_CODEC_ID_PCM_S16LE_PLANAR,
AV_CODEC_ID_PCM_DVD,
AV_CODEC_ID_PCM_F32BE,
AV_CODEC_ID_PCM_F32LE,
AV_CODEC_ID_PCM_F64BE,
AV_CODEC_ID_PCM_F64LE,
AV_CODEC_ID_PCM_BLURAY,
AV_CODEC_ID_PCM_LXF,
AV_CODEC_ID_S302M,
AV_CODEC_ID_PCM_S8_PLANAR,
/* various ADPCM codecs */
AV_CODEC_ID_ADPCM_IMA_QT = 0x11000,
AV_CODEC_ID_ADPCM_IMA_WAV,
AV_CODEC_ID_ADPCM_IMA_DK3,
AV_CODEC_ID_ADPCM_IMA_DK4,
AV_CODEC_ID_ADPCM_IMA_WS,
AV_CODEC_ID_ADPCM_IMA_SMJPEG,
AV_CODEC_ID_ADPCM_MS,
AV_CODEC_ID_ADPCM_4XM,
AV_CODEC_ID_ADPCM_XA,
AV_CODEC_ID_ADPCM_ADX,
AV_CODEC_ID_ADPCM_EA,
AV_CODEC_ID_ADPCM_G726,
AV_CODEC_ID_ADPCM_CT,
AV_CODEC_ID_ADPCM_SWF,
AV_CODEC_ID_ADPCM_YAMAHA,
AV_CODEC_ID_ADPCM_SBPRO_4,
AV_CODEC_ID_ADPCM_SBPRO_3,
AV_CODEC_ID_ADPCM_SBPRO_2,
AV_CODEC_ID_ADPCM_THP,
AV_CODEC_ID_ADPCM_IMA_AMV,
AV_CODEC_ID_ADPCM_EA_R1,
AV_CODEC_ID_ADPCM_EA_R3,
AV_CODEC_ID_ADPCM_EA_R2,
AV_CODEC_ID_ADPCM_IMA_EA_SEAD,
AV_CODEC_ID_ADPCM_IMA_EA_EACS,
AV_CODEC_ID_ADPCM_EA_XAS,
AV_CODEC_ID_ADPCM_EA_MAXIS_XA,
AV_CODEC_ID_ADPCM_IMA_ISS,
AV_CODEC_ID_ADPCM_G722,
AV_CODEC_ID_ADPCM_IMA_APC,
AV_CODEC_ID_VIMA = MKBETAG('V','I','M','A'),
/* AMR */
AV_CODEC_ID_AMR_NB = 0x12000,
AV_CODEC_ID_AMR_WB,
/* RealAudio codecs*/
AV_CODEC_ID_RA_144 = 0x13000,
AV_CODEC_ID_RA_288,
/* various DPCM codecs */
AV_CODEC_ID_ROQ_DPCM = 0x14000,
AV_CODEC_ID_INTERPLAY_DPCM,
AV_CODEC_ID_XAN_DPCM,
AV_CODEC_ID_SOL_DPCM,
/* audio codecs */
AV_CODEC_ID_MP2 = 0x15000,
AV_CODEC_ID_MP3, ///< preferred ID for decoding MPEG audio layer 1, 2 or 3
AV_CODEC_ID_AAC,
AV_CODEC_ID_AC3,
AV_CODEC_ID_DTS,
AV_CODEC_ID_VORBIS,
AV_CODEC_ID_DVAUDIO,
AV_CODEC_ID_WMAV1,
AV_CODEC_ID_WMAV2,
AV_CODEC_ID_MACE3,
AV_CODEC_ID_MACE6,
AV_CODEC_ID_VMDAUDIO,
AV_CODEC_ID_FLAC,
AV_CODEC_ID_MP3ADU,
AV_CODEC_ID_MP3ON4,
AV_CODEC_ID_SHORTEN,
AV_CODEC_ID_ALAC,
AV_CODEC_ID_WESTWOOD_SND1,
AV_CODEC_ID_GSM, ///< as in Berlin toast format
AV_CODEC_ID_QDM2,
AV_CODEC_ID_COOK,
AV_CODEC_ID_TRUESPEECH,
AV_CODEC_ID_TTA,
AV_CODEC_ID_SMACKAUDIO,
AV_CODEC_ID_QCELP,
AV_CODEC_ID_WAVPACK,
AV_CODEC_ID_DSICINAUDIO,
AV_CODEC_ID_IMC,
AV_CODEC_ID_MUSEPACK7,
AV_CODEC_ID_MLP,
AV_CODEC_ID_GSM_MS, /* as found in WAV */
AV_CODEC_ID_ATRAC3,
AV_CODEC_ID_VOXWARE,
AV_CODEC_ID_APE,
AV_CODEC_ID_NELLYMOSER,
AV_CODEC_ID_MUSEPACK8,
AV_CODEC_ID_SPEEX,
AV_CODEC_ID_WMAVOICE,
AV_CODEC_ID_WMAPRO,
AV_CODEC_ID_WMALOSSLESS,
AV_CODEC_ID_ATRAC3P,
AV_CODEC_ID_EAC3,
AV_CODEC_ID_SIPR,
AV_CODEC_ID_MP1,
AV_CODEC_ID_TWINVQ,
AV_CODEC_ID_TRUEHD,
AV_CODEC_ID_MP4ALS,
AV_CODEC_ID_ATRAC1,
AV_CODEC_ID_BINKAUDIO_RDFT,
AV_CODEC_ID_BINKAUDIO_DCT,
AV_CODEC_ID_AAC_LATM,
AV_CODEC_ID_QDMC,
AV_CODEC_ID_CELT,
AV_CODEC_ID_G723_1,
AV_CODEC_ID_G729,
AV_CODEC_ID_8SVX_EXP,
AV_CODEC_ID_8SVX_FIB,
AV_CODEC_ID_BMV_AUDIO,
AV_CODEC_ID_RALF,
AV_CODEC_ID_IAC,
AV_CODEC_ID_ILBC,
AV_CODEC_ID_FFWAVESYNTH = MKBETAG('F','F','W','S'),
AV_CODEC_ID_8SVX_RAW = MKBETAG('8','S','V','X'),
AV_CODEC_ID_SONIC = MKBETAG('S','O','N','C'),
AV_CODEC_ID_SONIC_LS = MKBETAG('S','O','N','L'),
AV_CODEC_ID_PAF_AUDIO = MKBETAG('P','A','F','A'),
AV_CODEC_ID_OPUS = MKBETAG('O','P','U','S')

Here is MSVC version of the libgsm library:
http://code.google.com/p/vsmm/

Related

Sound activated recording in Julia

I'm recording audio with Julia and want to be able to trigger a 5 second recording after the audio signal exceeds a certain volume. This is my record script so far:
using PortAudio, SampledSignals, LibSndFile, FileIO, Dates
stream = PortAudioStream("HDA Intel PCH: ALC285 Analog (hw:0,0)")
buf = read(stream, 5s)
close(stream)
save(string("recording_", Dates.format(now(), "yyyymmdd_HHMMSS"), ".wav"), buf, Fs = 48000)
I'm new to Julia and signal processing in general. How can I tell this only to start recording once the audio exceeds a specified volume threshold?
You need to test the sound you capture for average amplitude and act on that. Save if loud enough, otherwise rinse and repeat.
using PortAudio, SampledSignals, LibSndFile, FileIO
const hassound = 10 # choose this to fit
suprathreshold(buf, thresh = hassound) = norm(buf) / sqrt(length(buf)) > thresh # power over threshold
stream = PortAudioStream("HDA Intel PCH: ALC285 Analog (hw:0,0)")
while true
buf = read(stream, 5s)
close(stream)
if suprathreshold(buf)
save("recording.wav", buf, Fs = 48000) # should really append here maybe???
end
end

FFMPEG audio sample data

I'm beginner to FFMPEG API and I need to process audio sample.
I see that audio sample data stored in AVFrame->data[0], but I don't know how audio sample stored in FFMPEG AVFrame.
For example:
There are 2 channels,
frame->nb_samples = 64,
frame->linesize[0] = 256.
I don't know how audio sample data stored in frame->data[0].
Thanks,
The audio samples are pointed to by
frame->data[0]
frame->data[1]
and they're frame->linesize[0] bytes long
The sample_fmt of your AVCodecContext will tell you the format of the samples, which will be one of the following:
AV_SAMPLE_FMT_FLTP
AV_SAMPLE_FMT_FLT
AV_SAMPLE_FMT_S16P
AV_SAMPLE_FMT_S16
For FLT you cast the pointers to float* and for S16 you cast to int16_t*

Linux ALSA Driver using channel count 3

Am running my ALSA Driver on Ubuntu 14.04, 64bit, 3.16.0-30-generic Kernel.
Hardware is proprietary hardware, hence cant give much details.
Following is the existing driver implementation:
Driver is provided sample format, sample rate, channel_count as input via module parameter. (Due to requirements need to provide inputs via module parameters)
Initial snd_pcm_hardware structure for playback path.
#define DEFAULT_PERIOD_SIZE (4096)
#define DEFAULT_NO_OF_PERIODS (1024)
static struct snd_pcm_hardware xxx_playback =
{
.info = SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_MMAP_VALID |
SNDRV_PCM_INFO_SYNC_START,
.formats = SNDRV_PCM_FMTBIT_S16_LE,
.rates = (SNDRV_PCM_RATE_8000 | \
SNDRV_PCM_RATE_16000 | \
SNDRV_PCM_RATE_48000 | \
SNDRV_PCM_RATE_96000),
.rate_min = 8000,
.rate_max = 96000,
.channels_min = 1,
.channels_max = 1,
.buffer_bytes_max = (DEFAULT_PERIOD_SIZE * DEFAULT_NO_OF_PERIODS),
.period_bytes_min = DEFAULT_PERIOD_SIZE,
.period_bytes_max = DEFAULT_PERIOD_SIZE,
.periods_min = DEFAULT_NO_OF_PERIODS,
.periods_max = DEFAULT_NO_OF_PERIODS,
};
Similar values for captures side snd_pcm_hardware structure.
Please, note that the following below values are replaced in playback open entry point, based on the current audio test configuration:
(user provides audio format, audio rate, ch count via module parameters as inputs to the driver, which are refilled in snd_pcm_hardware structure)
xxx_playback.formats = user_format_input
xxx_playback.rates = xxx_playback.rate_min, xxx_playback.rate_max = user_sample_rate_input
xxx_playback.channels_min = xxx_playback.channels_max = user_channel_input
Similarly values are re-filled for capture snd_pcm_hardware structure in capture open entry point.
Hardware is configured for clocks based on channel_count, format, sample_rate and driver registers successfully with ALSA layer
Found aplay/arecord working fine for channel_count = 1 or 2 or 4
During aplay/arecord, in driver when "runtime->channels" value is checked, it reflects the channel_count configured, which sounds correct to me.
Record data matches with played, since its a loop back test.
But when i use channel_count = 3, Both aplay or arecord reports
"Broken configuration for this PCM: no configurations available"!! for a wave file with channel_count '3'
ex: Playing WAVE './xxx.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Channels 3
ALSA lib pcm_params.c:2162:(snd1_pcm_hw_refine_slave) Slave PCM not usable
aplay: set_params:1204: Broken configuration for this PCM: no configurations available
With Following changes I was able to move ahead a bit:
.........................
Method1:
Driver is provided channel_count '3' as input via module parameter
Modified Driver to fill snd_pcm_hardware structure as payback->channels_min = 2 & playback->channels_min = 3; Similar values for capture path
aplay/arecord reports as 'channel count not available', though the wave file in use has 3 channels
ex: aplay -D hw:CARD=xxx,DEV=0 ./xxx.wav Playing WAVE './xxx.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Channels 3
aplay: set_params:1239: Channels count non available
Tried aplay/arecord with plughw, and aplay/arecord moved ahead
arecord -D plughw:CARD=xxx,DEV=0 -d 3 -f S16_LE -r 48000 -c 3 ./xxx_rec0.wav
aplay -D plughw:CARD=xxx,DEV=0 ./xxx.wav
Recording WAVE './xxx_rec0.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Channels 3
Playing WAVE './xxx.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Channels 3
End of Test
During aplay/arecord, In driver when "runtime->channels" value is checked it returns value 2!!! But played wavefile has ch count 3...
When data in recorded file is checked its all silence
.........................
Method2:
Driver is provided channel_count '3' as input via module parameter
Modified Driver to fill snd_pcm_hardware structure as playback->channels_min = 3 & playback->channels_min = 4; Similar values for capture path
aplay/arecord reports as 'channel count not available', though the wave file in use has 3 channels
Tried aplay/arecord with plughw, and aplay/arecord moved ahead
During aplay/arecord, In driver when "runtime->channels" value is checked it returns value 4!!! But played wavefile has ch count 3...
When data in recorded file is checked its all silence
.........................
So from above observations, the runtime->channels is either 2 or 4, but never 3 channels was used by alsa stack though requested. When used Plughw, alsa is converting data to run under 2 or 4 channel.
Can anyone help why am unable to use channel count 3.
Will provide more information if needed.
Thanks in Advance.
A period (and the entire buffer) must contain an integral number of frames, i.e., you cannot have partial frames.
With three channels, one frame has six bytes. The fixed period size (4096) is not divisible by six without remainder.
Thanks CL.
I used period size 4092 for this particular test case with channel count 3, and was able to do loop back successfully (without using plughw).
One last question, when I used plughw earlier, and when runtime->channels was either 2 or 4, why was the recorded data not showing?

Is there any crossbrowser solution for playing flac? (or is it possible in theory to make one)

Not interested in silverlight. Flash/javascript/html5 solutions are acceptable.
If you do not know such solutions, could you please say is it possible to make such that or not?
When I had to play FLAC in-browser, my starting point was also the Aurora framework.
However, the Aurora player is geared around using ScriptProcessorNode to decode chunks of audio on the fly. This didn't pan out for many reasons.
Seeking Flac in Aurora was never implemented.
Stuttering and unacceptable performance in Firefox, even on a mid-range 2014 desktop.
Not feasable to offload decoding to a WebWorker.
Doesn't inter-operate with audio formats the browser does support.
I didn't want to be responsible for re-sampling the sample-rate, seeking, and other low-level audio tasks that Aurora necessarily assimilates.
Decoding offline: Flac to Wave
My solution was to decode the Flac to raw 16bit PCM audio, using a stripped down Aurora.js Assset class + dependencies.
Look in the source for Asset.get( 'format', callback ), Asset.fromFile, and Asset.prototype.decodeToBuffer.
Next, take the audio data, along with extracted values for sample-rate and channel count, and build a WAVE file. This can be played using an HTML5 audio element, sent though an audio graph using createMediaElementSource, or absolutely anything you can do with natively supported audio formats.
Note: Replace clz function in decoder.js with the native Math.clz32 to boost performance, and polyfill clz32 for old browsers.
Disadvantage
The decoding time. Around 5 seconds at ~100% CPU for an "average" 4min song.
Advantages
Blob (opposed to arraybuffer) isn't constrained by RAM, and the browser can swap it to disk. Original Flac data can likely be discarded too.
You get seeking for free.
You get sample-rate re-sampling for free.
CPU activity paid for upfront in WebWorker.
Should browsers EVER gain native Flac support, very easy to rip out. It doesn't create a strong dependency on Aurora.
Here's the function to build the WAVE header, and turn the raw PCM data into something the browser can natively play.
function createWave( audioData, sampleRate, channelCount )
{
const audioFormat = 1, // 2 PCM = 1
subChunk1Size= 16, // 4 PCM = 16
bitsPerSample= 16, // 2
blockAlign = channelCount * (bitsPerSample >> 3), // 2
byteRate = blockAlign * sampleRate, // 4
subChunk2Size= blockAlign * audioData.size, // 4
chunkSize = 36 + subChunk2Size, // 4
// Total header size 44 bytes
header = new DataView( new ArrayBuffer(44) );
header.setUint32( 0, 0x52494646 ); // chunkId=RIFF
header.setUint32( 4, chunkSize, true );
header.setUint32( 8, 0x57415645 ); // format=WAVE
header.setUint32( 12, 0x666d7420 ); // subChunk1Id=fmt
header.setUint32( 16, subChunk1Size, true );
header.setUint16( 20, audioFormat, true );
header.setUint16( 22, channelCount, true );
header.setUint32( 24, sampleRate, true );
header.setUint32( 28, byteRate, true );
header.setUint16( 32, blockAlign, true );
header.setUint16( 34, bitsPerSample, true );
header.setUint32( 36, 0x64617461 ); // subChunk2Id=data
header.setUint32( 40, subChunk2Size, true );
return URL.createObjectURL( new Blob( [header, audioData], {type: 'audio/wav'} ) );
}
A simple Google search led me to these sites:
Aurora and FLAC.js — audio codecs using the Web Audio API
Introducing FLAC.js: A Pure JavaScript FLAC Decoder
Believe it or not, it wasn't so hard.
Almost forgot:
Check HTML5Test to compare browsers performance/compatibility with the <audio> tag and it's siblings.

Properly open audio files with libav/ffmpeg

I am trying to decode audio samples from various file formats using ffmpeg. Therefore I have started some experimenting based on the code in this discussion: How to decode audio via FFmpeg in Android . I use the latest FFMPEG release (1.0) and compile it using https://github.com/halfninja/android-ffmpeg-x264
AVFormatContext * pFormatCtx;
avcodec_register_all();
av_register_all();
int lError;
if ((lError = avformat_open_input(&pFormatCtx, filename, NULL, 0))
!= 0) {
LOGE("Error open source file: %d", lError);
return;
}
if ((lError = avformat_find_stream_info(pFormatCtx, 0)) < 0) {
LOGE("Error find stream information: %d (Streams: %d)", lError, pFormatCtx->nb_streams);
return;
}
LOGE("audio format: %s", pFormatCtx->iformat->name);
LOGE("audio bitrate: %d", pFormatCtx->bit_rate);
audioStreamIndex = av_find_best_stream(pFormatCtx, AVMEDIA_TYPE_AUDIO,
-1, -1, &codec, 0);
//if (audioStreamIndex < 0 || audioStreamIndex >= pFormatCtx->nb_streams)
// audioStreamIndex = 0;
LOGE("Stream: %d (total: %d)", audioStreamIndex, pFormatCtx->nb_streams);
LOGE("audio codec: %s", codec->name);
FFMPEG is compiled using enable-decoder=mp1/mp2/mp3/ogg/vorbis/wav/aac/theora and without any external libraries (e.g. libmp3lame, libtheora, etc.)
Opening of mp3 and wav files works without problems producing the following output for instance for mp3:
audio format: mp3
audio bitrate: 256121
stream: 0 (total: 1)
audio codec: mp3
But when I try to open an ogg file I get this:
Error find stream information: -1 (Streams: 1)
When I manually set audioStreamIndex=0 and comment out the return statement:
Error find stream information: -1 (Streams: 1)
audio format: mp3
audio bitrate: 0
stream: 0 (total: 1)
audio codec: mp3
For m4a (AAC) I get this:
audio format: mp3
audio bitrate: 288000
stream: 0 (total: 1)
audio codec: mp1
but later it fails in avcodec_decode_audio3.
I also tried to manually force a format without success:
AVInputFormat *pForceFormat= av_find_input_format("ogg");
if ((lError = avformat_open_input(&pFormatCtx, filename, pForceFormat, 0))
// continue
Is there something wrong with the loading code which makes it only work with mp3 and wav and fails for other formats?
Regards,
The problem was a missing demuxer.

Resources