I'm working on an FFmpeg playout application for Decklink, but I'm facing some audio issues. I've seen other questions about this topic, but none of them have helped so far.
I've tried Reuben's code (https://stackoverflow.com/a/15372417/12610231) with swr_convert for playing out FFmpeg/libav frames to a Decklink board (which needs 16-bit interleaved PCM), but the audio sounds wrong. It sounds as if it is missing samples (only getting half of the required samples).
When I record the samples to a raw audio file and play it in Audacity, the timeline is half the length of the actual recording and the samples play at double speed.
I also tried the 'manual' conversion (https://stackoverflow.com/a/15372417/12610231), but unfortunately it did not give the result I was hoping for.
Here are some snippets of my code
swr_ctx = swr_alloc();
av_opt_set_int(swr_ctx, "in_channel_count", pAudioCodecCtx->channels, 0);
av_opt_set_int(swr_ctx, "in_sample_rate", pAudioCodecCtx->sample_rate, 0);
av_opt_set_int(swr_ctx, "in_channel_layout", pAudioCodecCtx->channel_layout, 0);
av_opt_set_sample_fmt(swr_ctx, "in_sample_fmt", pAudioCodecCtx->sample_fmt, 0);
av_opt_set_int(swr_ctx, "out_channel_count", 2, 0);
av_opt_set_int(swr_ctx, "out_sample_rate", 48000, 0);
av_opt_set_int(swr_ctx, "out_channel_layout", AV_CH_LAYOUT_STEREO, 0);
av_opt_set_sample_fmt(swr_ctx, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
if (swr_init(swr_ctx))
{
printf("Error SWR");
}
///
ret = avcodec_decode_audio4(pAudioCodecCtx, pFrame, &frameFinished, &packet);
if (ret < 0) {
printf("Error in decoding audio frame.\n");
}
swr_convert(swr_ctx, (uint8_t**)&m_audioBuffer, pFrame->nb_samples, (const uint8_t *)pFrame->extended_data, pFrame->nb_samples);
It also looks like the FFmpeg packet consists of 1 video packet and 2 audio packets. I'm not sure what to do with the second audio packet; I already tried combining the first and second audio packets, without any good result on the audio side.
Any help is appreciated.
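One thing worth checking when a raw dump plays at double speed and half length is how many bytes are written per decoded frame. The following is a minimal sketch of the arithmetic, assuming interleaved 16-bit stereo output at 48 kHz; the helper names are hypothetical and not part of FFmpeg.

```c
#include <stddef.h>

/* Bytes occupied by nb_samples frames of interleaved PCM.
 * nb_samples counts samples per channel, like AVFrame.nb_samples. */
static size_t pcm_bytes(int nb_samples, int channels, int bytes_per_sample)
{
    return (size_t)nb_samples * (size_t)channels * (size_t)bytes_per_sample;
}

/* Duration in milliseconds that a player will assume for `bytes` of raw PCM
 * when told the given rate, channel count and sample width. */
static double pcm_duration_ms(size_t bytes, int sample_rate,
                              int channels, int bytes_per_sample)
{
    return 1000.0 * (double)bytes /
           ((double)sample_rate * channels * bytes_per_sample);
}
```

Under these assumptions, one second of audio is 48000 * 2 * 2 = 192000 bytes. If only nb_samples * 2 bytes per frame reach the file (the channel count left out), the same audio occupies 96000 bytes, which an importer set to stereo shows as 500 ms. This is only one possible cause; a mismatch between the input sample rate and the 48000 Hz output could also change the number of samples swr_convert produces per call.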
Hi Core Audio/Au community,
I have hit a roadblock during development. My current AUGraph is set up as 2 mono streams->Mixer unit->remoteIO unit on an iOS platform. I am using the mixer to mix two mono streams into interleaved stereo. However, the mono streams need not be mixed at all times while being played out in stereo, i.e. the interleaved stereo output should be composed of the 1st mono stream in the left ear and the 2nd mono stream in the right ear. I am able to accomplish this using the kAudioUnitProperty_MatrixLevels property on the Multichannel mixer.
//Left out //right out
matrixVolumes[0][0]=1; matrixVolumes[0][1]=0.001;
matrixVolumes[1][0]=0.001; matrixVolumes[1][1]=0.001;
result = AudioUnitSetProperty(mAumixer, kAudioUnitProperty_MatrixLevels, kAudioUnitScope_Input, 0,matrixVolumes , matrixPropSize);
if (result) {
printf("Error while setting kAudioUnitProperty_MatrixLevels from mixer on bus 0 %ld %08X %4.4s\n", result, (unsigned int)result, (char*)&result);
return -1;
}
printf("setting matrix levels kAudioUnitProperty_MatrixLevels on bus 1 \n");
//Left out //right out
matrixVolumes[0][0]=0.001; matrixVolumes[0][1]=1;
matrixVolumes[1][0]=0.001; matrixVolumes[1][1]=0.001;
result = AudioUnitSetProperty(mAumixer, kAudioUnitProperty_MatrixLevels, kAudioUnitScope_Input, 1,matrixVolumes , matrixPropSize);
if (result) {
printf("Error while setting kAudioUnitProperty_MatrixLevels from mixer on bus 1 %ld %08X %4.4s\n", result, (unsigned int)result, (char*)&result);
return -1;
}
As shown above I am using the volume controls to control the streams playing as unmixed stereo interleaved separately. This works fine when I am using a wired headset to play the audio; the 1st mono stream plays on the left ear and the second mono stream plays on the right ear. But when I am switching to a bluetooth headset the audio output is a mix of both the mono streams playing on both the left and right channel. So, the matrix levels do not seem to work there. The formats used for the i/p and o/p of the mixer are as follows:
printf("create Input ASBD\n");
// client format audio goes into the mixer
obj->mInputFormat.mFormatID = kAudioFormatLinearPCM;
int sampleSize = ((UInt32)sizeof(AudioSampleType));
obj->mInputFormat.mFormatFlags = kAudioFormatFlagsCanonical;
obj->mInputFormat.mBitsPerChannel = 8 * sampleSize;
obj->mInputFormat.mChannelsPerFrame = 1; //mono
obj->mInputFormat.mFramesPerPacket = 1;
obj->mInputFormat.mBytesPerPacket = obj->mInputFormat.mBytesPerFrame = sampleSize;
obj->mInputFormat.mFormatFlags |= kAudioFormatFlagIsNonInterleaved;
// obj->mInputFormat.mSampleRate = obj->mGlobalSampleRate;(// set later while initializing audioStreamer or from the client app)
printf("create output ASBD\n");
// output format for the mixer unit output bus
obj->mOutputFormat.mFormatID = kAudioFormatLinearPCM;
obj->mOutputFormat.mFormatFlags = kAudioFormatFlagsCanonical | (kAudioUnitSampleFractionBits << kLinearPCMFormatFlagsSampleFractionShift);
obj->mOutputFormat.mChannelsPerFrame = 2;//stereo
obj->mOutputFormat.mFramesPerPacket = 1;
obj->mOutputFormat.mBitsPerChannel = 8 * ((UInt32)sizeof(AudioUnitSampleType));
obj->mOutputFormat.mBytesPerPacket = obj->mOutputFormat.mBytesPerFrame = 2 * ((UInt32)sizeof(AudioUnitSampleType));
// obj->mOutputFormat.mSampleRate = obj->mGlobalSampleRate; (// set later while initializing)
N.B.: I am setting the sample rates separately from the application. The i/p and o/p sample rates of the mixer are the same as the sample rate of the mono audio files.
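For reference, the output layout described above (first mono stream in the left slot, second in the right slot of each stereo frame) can be written out in plain C. This is only an illustration of the target layout, assuming 16-bit samples; it does not use Core Audio and does not explain the Bluetooth behaviour. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Build `frames` interleaved stereo frames: left[i] fills the left slot
 * and right[i] fills the right slot, with no mixing between the two. */
static void interleave_stereo(const int16_t *left, const int16_t *right,
                              int16_t *out, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        out[2 * i]     = left[i];   /* left channel  */
        out[2 * i + 1] = right[i];  /* right channel */
    }
}
```

The caller has to supply an `out` buffer holding at least 2 * frames samples.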
Thanks in advance for taking a look at the issue...:)
So to preface my problem, I'll give some context.
In SDL2 you can load wav files such as from the wiki:
SDL_AudioSpec wav_spec;
Uint32 wav_length;
Uint8 *wav_buffer;
/* Load the WAV */
if (SDL_LoadWAV("test.wav", &wav_spec, &wav_buffer, &wav_length) == NULL) {
fprintf(stderr, "Could not open test.wav: %s\n", SDL_GetError());
} else {
/* Do stuff with the WAV data, and then... */
SDL_FreeWAV(wav_buffer);
}
The issue I'm getting from SDL_GetError is Complex WAVE files not supported
Now the wav file I'm intending to open has the following properties:
Playing test.wav.
Detected file format: WAV / WAVE (Waveform Audio) (libavformat)
ID_AUDIO_ID=0
[lavf] stream 0: audio (pcm_s24le), -aid 0
Clip info:
encoded_by: Pro Tools
ID_CLIP_INFO_NAME0=encoded_by
ID_CLIP_INFO_VALUE0=Pro Tools
originator_reference:
ID_CLIP_INFO_NAME1=originator_reference
ID_CLIP_INFO_VALUE1=
date: 2016-05-1
ID_CLIP_INFO_NAME2=date
ID_CLIP_INFO_VALUE2=2016-05-1
creation_time: 20:13:34
ID_CLIP_INFO_NAME3=creation_time
ID_CLIP_INFO_VALUE3=20:13:34
time_reference:
ID_CLIP_INFO_NAME4=time_reference
ID_CLIP_INFO_VALUE4=
ID_CLIP_INFO_N=5
Load subtitles in dir/
ID_FILENAME=dir/test.wav
ID_DEMUXER=lavfpref
ID_AUDIO_FORMAT=1
ID_AUDIO_BITRATE=2304000
ID_AUDIO_RATE=48000
ID_AUDIO_NCH=2
ID_START_TIME=0.00
ID_LENGTH=135.53
ID_SEEKABLE=1
ID_CHAPTERS=0
Selected audio codec: Uncompressed PCM [pcm]
AUDIO: 48000 Hz, 2 ch, s24le, 2304.0 kbit/100.00% (ratio: 288000->288000)
ID_AUDIO_BITRATE=2304000
ID_AUDIO_RATE=48000
ID_AUDIO_NCH=2
AO: [pulse] 48000Hz 2ch s16le (2 bytes per sample)
ID_AUDIO_CODEC=pcm
From the wiki.libsdl.org/SDL_OpenAudioDevice page and subsequent wiki.libsdl.org/SDL_AudioSpec#Remarks page I can at least surmise that a wav file of:
freq = 48000;
format = AUDIO_F32;
channels = 2;
samples = 4096;
quality should work.
The main problem I can see is that my wav file has the s16le format whereas it's not listed on the SDL_AudioSpec page.
This leads me to believe I need to reduce the quality of test.wav so it does not appear as "complex" in SDL.
When I search for Complex WAVE files not supported nothing helpful comes up, except that it appears in the SDL_Mixer library, which as far as I know I'm not using.
Can the format be changed via ffmpeg to work in SDL2?
Edit: This appears to be the actual code in SDL2 where it complains. I don't really know enough about C to dig all the way through the vast SDL2 library, but I thought it might help if someone notices something just from hinting variable names and such:
/* Read the audio data format chunk */
chunk.data = NULL;
do {
if ( chunk.data != NULL ) {
SDL_free(chunk.data);
chunk.data = NULL;
}
lenread = ReadChunk(src, &chunk);
if ( lenread < 0 ) {
was_error = 1;
goto done;
}
/* 2 Uint32's for chunk header+len, plus the lenread */
headerDiff += lenread + 2 * sizeof(Uint32);
} while ( (chunk.magic == FACT) || (chunk.magic == LIST) );
/* Decode the audio data format */
format = (WaveFMT *)chunk.data;
if ( chunk.magic != FMT ) {
SDL_SetError("Complex WAVE files not supported");
was_error = 1;
goto done;
}
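Reading the snippet above: the loop skips only "fact" and "LIST" chunks, and whatever chunk comes next must be "fmt ", otherwise the error is raised. So the message seems to depend on chunk order rather than on the sample format. The clip info fields in the file (originator_reference, time_reference) suggest, though do not prove, that the writer placed an extra metadata chunk such as "bext" before "fmt ". A minimal sketch for checking which chunk comes first, operating on a file already loaded into memory; the helper names are hypothetical and this is not SDL code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value. */
static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Store in `id` (NUL-terminated) the first chunk ID after the RIFF/WAVE
 * header that is neither "fact" nor "LIST". Returns 0 on success, -1 if
 * the buffer is not RIFF/WAVE or ends before such a chunk is found. */
static int first_significant_chunk(const uint8_t *buf, size_t len, char id[5])
{
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;

    size_t pos = 12;                      /* pos never exceeds len */
    while (len - pos >= 8) {
        uint32_t size = rd_le32(buf + pos + 4);
        if (memcmp(buf + pos, "fact", 4) != 0 &&
            memcmp(buf + pos, "LIST", 4) != 0) {
            memcpy(id, buf + pos, 4);
            id[4] = '\0';
            return 0;
        }
        if (size > len - pos - 8)         /* chunk body runs past the buffer */
            return -1;
        pos += 8 + (size_t)size;
        if ((size & 1u) && pos < len)     /* bodies are padded to even length */
            pos++;
    }
    return -1;
}
```

If this reports anything other than "fmt ", re-writing the file with a tool that emits a plain header (as the ffmpeg command in the answer below does) would be expected to avoid the error.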
After a couple hours of fun audio converting I got it working, will have to tweak it to try and get better sound quality.
To answer the question at hand, converting can be done by:
ffmpeg -i old.wav -acodec pcm_s16le -ac 1 -ar 16000 new.wav
To find codecs on your version of ffmpeg:
ffmpeg -codecs
This format works with SDL.
Next within SDL when setting the desired SDL_AudioSpec make sure to have the correct settings:
freq = 16000;
format = AUDIO_S16LSB;
channels = 2;
samples = 4096;
Finally the main issue was most likely using the legacy SDL_MixAudio instead of the newer SDL_MixAudioFormat
With the following settings:
SDL_MixAudioFormat(stream, mixData, AUDIO_S16LSB, len, SDL_MIX_MAXVOLUME / 2); as can be found on the wiki.
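To illustrate why the format argument matters when mixing, here is a rough sketch of what mixing 16-bit samples at a given volume involves: scale the source, add it to the destination, and clip to the 16-bit range. This is an approximation for illustration, not SDL's implementation; `mix_s16` and `MIX_MAX_VOLUME` are hypothetical names, with 128 chosen to mirror SDL_MIX_MAXVOLUME.

```c
#include <stddef.h>
#include <stdint.h>

#define MIX_MAX_VOLUME 128 /* assumption: mirrors SDL_MIX_MAXVOLUME */

/* Add `src` into `dst` at `volume` (0..MIX_MAX_VOLUME), clipping the sum
 * so it stays within the signed 16-bit range. */
static void mix_s16(int16_t *dst, const int16_t *src, size_t samples, int volume)
{
    for (size_t i = 0; i < samples; i++) {
        int32_t mixed = dst[i] + (int32_t)src[i] * volume / MIX_MAX_VOLUME;
        if (mixed > INT16_MAX) mixed = INT16_MAX;
        if (mixed < INT16_MIN) mixed = INT16_MIN;
        dst[i] = (int16_t)mixed;
    }
}
```

A mixer that treats the same bytes as a different sample width or signedness would scale and clip the wrong values, which is audible as distortion. Note that this sketch counts samples, whereas the `len` argument of SDL_MixAudioFormat is a length in bytes.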
I'm trying to play a wav file of 1000ms in a repetitive way. So play 1000ms, then 1000ms of silence, then again 1000ms of audio ,...
But when I print the timing during this process, I notice that snd_pcm_writei() takes up some time after the sound has been played, so the call takes ~1600ms instead of 1000ms. I'm using blocking mode.
Play Sound: ~1600ms
Silence: ~1000ms
Play Sound: ~1600ms
....
If I use non-blocking mode, sound is played for a very short time, a couple of ms.
Properties of wav-file:
RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz
Setup of PCM:
err = snd_pcm_hw_params_set_rate_resample(PCM, params, 0);
err = snd_pcm_nonblock(PCM, 0);
err = snd_pcm_hw_params_set_access(PCM, params, SND_PCM_ACCESS_RW_INTERLEAVED);
err = snd_pcm_hw_params_set_format(PCM, params, SND_PCM_FORMAT_S16_LE);
err = snd_pcm_hw_params_set_channels(PCM, params, 1);
err = snd_pcm_hw_params_set_rate_near(PCM, params, &rrate, 0);
err = snd_pcm_hw_params_set_buffer_size_near(PCM, params, &buffer_size);
err = snd_pcm_hw_params_set_period_size_near(PCM, params, &period_size, &dir);
err = snd_pcm_hw_params(PCM, params);
snd_pcm_sw_params_current(PCM, swparams);
snd_pcm_sw_params_set_start_threshold(PCM, swparams, (buffer_size / period_size) * period_size);
snd_pcm_sw_params_set_avail_min(PCM, swparams, period_event ? buffer_size : period_size);
snd_pcm_sw_params(PCM, swparams);
The buffer with 1000ms of audio samples is 16000 bytes, seems correct since (8000 samples / s ) * 2bytes/sample (mono + S16_LE).
To start playing the wav file, I use this piece of code:
qDebug() << QTime::currentTime().toString("hh:mm:ss:zzz") << " Play sound";
err = snd_pcm_writei(PCM, Buffer, 16000);
qDebug() << QTime::currentTime().toString("hh:mm:ss:zzz") << " End sound";
Does anyone have an explanation or tips to achieve this? Maybe a setting that's wrong or I need to use non-blocking mode.
Thanks
EDIT:
The returned value of rrate is 8000, so that looks good.
Here are some actual prints of the time.
"10:48:54:893" Play sound
"10:48:56:794" End sound
"10:48:57:802" Play sound
"10:48:58:913" End sound
"10:48:59:923" Play sound
"10:49:01:853" End sound
"10:49:02:862" Play sound
"10:49:04:793" End sound
"10:49:05:803" Play sound
"10:49:06:593" End sound
"10:49:07:602" Play sound
Time between END and PLAY is around 1000ms, time between PLAY and END is between 800ms and 1900ms, so not accurate at all.
# Геннадий Казачёк:
Yes, I always hear sound.
I first changed the length to 8000 like you suggested.
It did not really help, but it was obviously a bug.
Then I changed the buffersize and periodsize:
Before
snd_pcm_uframes_t buffer_size = 768;
snd_pcm_uframes_t period_size = 256;
After
snd_pcm_uframes_t buffer_size = 768*2;
snd_pcm_uframes_t period_size = 256*2;
I don't know why but the timings were correct after adjusting this.
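The two changes above come down to frame arithmetic: snd_pcm_writei() takes its size argument in frames, not bytes, and the buffer and period sizes are also expressed in frames. A small sketch of the conversions, with hypothetical helper names and no ALSA dependency:

```c
#include <stddef.h>

/* Frames contained in `bytes` of interleaved PCM
 * (one frame = one sample for every channel). */
static size_t bytes_to_frames(size_t bytes, unsigned channels,
                              unsigned bytes_per_sample)
{
    return bytes / ((size_t)channels * bytes_per_sample);
}

/* Time covered by `frames` at `rate` Hz, in milliseconds. */
static double frames_to_ms(size_t frames, unsigned rate)
{
    return 1000.0 * (double)frames / (double)rate;
}
```

For mono S16_LE at 8000 Hz, the 16000-byte buffer holds 8000 frames, so passing 16000 asks for 2000 ms of audio and reads past the end of the buffer. The original buffer_size of 768 frames covers 96 ms and the doubled one 192 ms. ALSA also offers snd_pcm_bytes_to_frames() for the first conversion once the hardware parameters are set.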
Thanks for the help!
I'm attempting to convert an AAC audio stream for playback. I've discovered that I need to convert from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16 but when I do so the audio plays back at about half speed.
swr = swr_alloc();
assert(av_opt_set_int(swr, "in_channel_layout", audioContext->channel_layout, 0) == 0);
assert(av_opt_set_int(swr, "out_channel_layout", audioContext->channel_layout, 0) == 0);
assert(av_opt_set_int(swr, "in_sample_rate", audioContext->sample_rate, 0) == 0);
assert(av_opt_set_int(swr, "out_sample_rate", 44100, 0) == 0);
assert(av_opt_set_int(swr, "in_sample_fmt", audioContext->sample_fmt, 0) == 0);
assert(av_opt_set_int(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0) == 0);
swr_init(swr);
There is my code to convert. The input sample rate is 44100 and the audio is stereo.
I call the code with
swr_convert(swr, &output, aDecodedFrame->nb_samples, (const uint8_t**)aDecodedFrame->extended_data, aDecodedFrame->nb_samples) >= 0)
You didn't show the actual audio encoding code, so I'd speculate there's a chance you might not handle the resampling properly. Note that you read half as much data from the resampling operation as you pass in (i.e. if you pass 80 bytes, you'll read 40 from the resampler).
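The 80-to-40 figure follows from the sample widths: a float sample is 4 bytes and an S16 sample is 2 bytes, so the sample count stays the same while the byte count halves. A tiny sketch of that arithmetic, assuming 4-byte floats and an unchanged sample rate; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes of 16-bit PCM produced from `float_bytes` bytes of 32-bit float PCM
 * when the number of samples is unchanged (no rate conversion). */
static size_t s16_bytes_for_float_bytes(size_t float_bytes)
{
    size_t samples = float_bytes / sizeof(float); /* assumes 4-byte float */
    return samples * sizeof(int16_t);
}
```

Code that sizes its reads or writes from the input byte count instead of the output sample count will therefore be off by a factor of two.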
You may take a look at my video writing code, and strip off the audio encoding part. It is here: http://sourceforge.net/p/karlyriceditor/code/HEAD/tree/src/ffmpegvideoencoder.cpp
I am decoding AAC to PCM with ffmpeg using avcodec_decode_audio3. However, it decodes into the AV_SAMPLE_FMT_FLTP sample format (PCM 32-bit float planar) and I need AV_SAMPLE_FMT_S16 (PCM 16-bit signed, S16LE).
I know that ffmpeg can do this easily with -sample_fmt. I want to do the same in code, but I still couldn't figure it out.
audio_resample did not work for me: it fails with the error message: .... conversion failed.
EDIT 9th April 2013: Worked out how to use libswresample to do this... much faster!
At some point in the last 2-3 years FFmpeg's AAC decoder's output format changed from AV_SAMPLE_FMT_S16 to AV_SAMPLE_FMT_FLTP. This means that each audio channel has its own buffer, and each sample value is a 32-bit floating point value scaled from -1.0 to +1.0.
Whereas with AV_SAMPLE_FMT_S16 the data is in a single buffer, with the samples interleaved, and each sample is a signed integer from -32768 to +32767.
And if you really need your audio as AV_SAMPLE_FMT_S16, then you have to do the conversion yourself. I figured out two ways to do it:
1. Use libswresample (recommended)
#include "libswresample/swresample.h"
...
SwrContext *swr;
...
// Set up SWR context once you've got codec information
swr = swr_alloc();
av_opt_set_int(swr, "in_channel_layout", audioCodec->channel_layout, 0);
av_opt_set_int(swr, "out_channel_layout", audioCodec->channel_layout, 0);
av_opt_set_int(swr, "in_sample_rate", audioCodec->sample_rate, 0);
av_opt_set_int(swr, "out_sample_rate", audioCodec->sample_rate, 0);
av_opt_set_sample_fmt(swr, "in_sample_fmt", AV_SAMPLE_FMT_FLTP, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
swr_init(swr);
...
// In your decoder loop, after decoding an audio frame:
AVFrame *audioFrame = ...;
int16_t* outputBuffer = ...;
swr_convert(swr, (uint8_t **)&outputBuffer, audioFrame->nb_samples, (const uint8_t **)audioFrame->extended_data, audioFrame->nb_samples);
And that's all you have to do!
2. Do it by hand in C (original answer, not recommended)
So in your decode loop, when you've got an audio packet you decode it like this:
AVCodecContext *audioCodec; // init'd elsewhere
AVFrame *audioFrame; // init'd elsewhere
AVPacket packet; // init'd elsewhere
int16_t* outputBuffer; // init'd elsewhere
int out_size = 0;
...
int len = avcodec_decode_audio4(audioCodec, audioFrame, &out_size, &packet);
And then, if you've got a full frame of audio, you can convert it fairly easily:
// Convert from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16
int in_samples = audioFrame->nb_samples;
int in_linesize = audioFrame->linesize[0];
int i=0;
float* inputChannel0 = (float*)audioFrame->extended_data[0];
// Mono
if (audioFrame->channels==1) {
for (i=0 ; i<in_samples ; i++) {
float sample = *inputChannel0++;
if (sample<-1.0f) sample=-1.0f; else if (sample>1.0f) sample=1.0f;
outputBuffer[i] = (int16_t) (sample * 32767.0f);
}
}
// Stereo
else {
float* inputChannel1 = (float*)audioFrame->extended_data[1];
for (i=0 ; i<in_samples ; i++) {
outputBuffer[i*2] = (int16_t) ((*inputChannel0++) * 32767.0f);
outputBuffer[i*2+1] = (int16_t) ((*inputChannel1++) * 32767.0f);
}
}
// outputBuffer now contains 16-bit PCM!
I've left a couple of things out for clarity... the clamping in the mono path should ideally be duplicated in the stereo path. And the code can be easily optimized.
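As a sketch of what that looks like with the clamp applied on every path, the conversion can be factored into one helper and used for any channel count. This is plain C without FFmpeg types; the function names are hypothetical, and `planes` stands in for the frame's per-channel buffers (with an AVFrame, extended_data would need a cast to float pointers).

```c
#include <stdint.h>

/* Clamp to [-1.0, 1.0] and scale to the 16-bit range (truncating). */
static int16_t float_to_s16(float sample)
{
    if (sample < -1.0f) sample = -1.0f;
    else if (sample > 1.0f) sample = 1.0f;
    return (int16_t)(sample * 32767.0f);
}

/* Convert planar float audio to interleaved 16-bit PCM.
 * planes[c][i] is sample i of channel c; `out` must have room for
 * nb_samples * channels values. */
static void planar_float_to_s16(const float *const *planes, int channels,
                                int nb_samples, int16_t *out)
{
    for (int i = 0; i < nb_samples; i++)
        for (int c = 0; c < channels; c++)
            out[i * channels + c] = float_to_s16(planes[c][i]);
}
```

Out-of-range input such as 2.0 or -2.0 then saturates at +32767 or -32767 instead of wrapping around.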
I found 2 resample functions in FFmpeg. The performance may be better.
avresample_convert()
http://libav.org/doxygen/master/group__lavr.html
swr_convert() http://spirton.com/svn/MPlayer-SB/ffmpeg/libswresample/swresample_test.c
Thanks Reuben for a solution to this. I did find that some of the sample values were slightly off when compared with a straight ffmpeg -i file.wav. It seems that in the conversion, they use a round() on the value.
To do the conversion, I did what you did, with a bit of modification to work for any number of channels:
if (audioCodecContext->sample_fmt == AV_SAMPLE_FMT_FLTP)
{
int nb_samples = decoded_frame->nb_samples;
int channels = decoded_frame->channels;
int outputBufferLen = nb_samples * channels * 2; // size in bytes: 2 bytes per 16-bit sample
short* outputBuffer = new short[outputBufferLen/2];
for (int i = 0; i < nb_samples; i++)
{
for (int c = 0; c < channels; c++)
{
float* extended_data = (float*)decoded_frame->extended_data[c];
float sample = extended_data[i];
if (sample < -1.0f) sample = -1.0f;
else if (sample > 1.0f) sample = 1.0f;
outputBuffer[i * channels + c] = (short)round(sample * 32767.0f);
}
}
// Do what you want with the data etc.
}
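The small differences mentioned above come from truncation versus rounding when the scaled float is converted to an integer. A minimal sketch comparing the two, assuming the input is already clamped to [-1.0, 1.0]; the helper names are hypothetical, and the rounding is written out by hand (half away from zero, as round() does) to avoid depending on the math library.

```c
#include <stdint.h>

/* Truncating conversion: the fractional part is simply dropped. */
static int16_t to_s16_trunc(float s)
{
    return (int16_t)(s * 32767.0f);
}

/* Rounding conversion: halves are rounded away from zero. */
static int16_t to_s16_round(float s)
{
    float v = s * 32767.0f;
    return (int16_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}
```

For an input of 0.5 the scaled value is 16383.5, which truncates to 16383 but rounds to 16384; the two results never differ by more than one step.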
I went from ffmpeg 0.11.1 -> 1.1.3 and found the change of sample format annoying. I looked at setting the request_sample_fmt to AV_SAMPLE_FMT_S16 but it seems the aac decoder doesn't support anything other than AV_SAMPLE_FMT_FLTP anyway.