I'm attempting to convert an AAC audio stream for playback. I've discovered that I need to convert from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16 but when I do so the audio plays back at about half speed.
swr = swr_alloc();
assert(av_opt_set_int(swr, "in_channel_layout", audioContext->channel_layout, 0) == 0);
assert(av_opt_set_int(swr, "out_channel_layout", audioContext->channel_layout, 0) == 0);
assert(av_opt_set_int(swr, "in_sample_rate", audioContext->sample_rate, 0) == 0);
assert(av_opt_set_int(swr, "out_sample_rate", 44100, 0) == 0);
assert(av_opt_set_int(swr, "in_sample_fmt", audioContext->sample_fmt, 0) == 0);
assert(av_opt_set_int(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0) == 0);
swr_init(swr);
That is my conversion code. The input sample rate is 44100 and the audio is stereo.
I call the code with
swr_convert(swr, &output, aDecodedFrame->nb_samples, (const uint8_t**)aDecodedFrame->extended_data, aDecodedFrame->nb_samples) >= 0)
You didn't show the actual audio encoding code, so I'd speculate that the resampled data is not being handled properly. Note that the resampling operation returns half as much data as you pass in (i.e. if you pass 80 bytes, you'll read 40 from the resampler).
You may take a look at my video writing code, and strip off the audio encoding part. It is here: http://sourceforge.net/p/karlyriceditor/code/HEAD/tree/src/ffmpegvideoencoder.cpp
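The "half as much data" remark can be checked with a little arithmetic. The sketch below is only an illustration, not code from the linked project: the function names are hypothetical, and it assumes 4-byte floats for AV_SAMPLE_FMT_FLTP and 2-byte samples for AV_SAMPLE_FMT_S16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total bytes of planar float audio across all channel planes
   (hypothetical helper; assumes sizeof(float) == 4). */
size_t fltp_total_bytes(int nb_samples, int channels)
{
    assert(nb_samples >= 0 && channels > 0);
    return (size_t)nb_samples * (size_t)channels * sizeof(float);
}

/* Total bytes of the same audio as interleaved signed 16-bit samples. */
size_t s16_total_bytes(int nb_samples, int channels)
{
    assert(nb_samples >= 0 && channels > 0);
    return (size_t)nb_samples * (size_t)channels * sizeof(int16_t);
}
```

For a 1024-sample stereo frame this gives 8192 bytes in and 4096 bytes out, so any code that keeps using the input byte count after conversion will treat the buffer as twice its real length.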
I'm working on a ffmpeg playout application for Decklink but I'm facing some audio issues. I've seen other questions about this topic but none of them are currently helping.
I've tried Reuben's code (https://stackoverflow.com/a/15372417/12610231) with swr_convert for playing out ffmpeg/libav frames to a Decklink board (this needs to be 16-bit interleaved PCM) but the audio sounds wrong. It sounds like it's missing samples (only getting half of the required samples).
When I record the samples in a raw audio file and play it out with Audacity the timeline is half the length of the actual recording and playing the samples on double speed.
I also tried the 'manual' conversion (https://stackoverflow.com/a/15372417/12610231) but unfortunately, not the result I was hoping for.
Here are some snippets of my code
swr_ctx = swr_alloc();
av_opt_set_int(swr_ctx, "in_channel_count", pAudioCodecCtx->channels, 0);
av_opt_set_int(swr_ctx, "in_sample_rate", pAudioCodecCtx->sample_rate, 0);
av_opt_set_int(swr_ctx, "in_channel_layout", pAudioCodecCtx->channel_layout, 0);
av_opt_set_sample_fmt(swr_ctx, "in_sample_fmt", pAudioCodecCtx->sample_fmt, 0);
av_opt_set_int(swr_ctx, "out_channel_count", 2, 0);
av_opt_set_int(swr_ctx, "out_sample_rate", 48000, 0);
av_opt_set_int(swr_ctx, "out_channel_layout", AV_CH_LAYOUT_STEREO, 0);
av_opt_set_sample_fmt(swr_ctx, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
if (swr_init(swr_ctx))
{
printf("Error SWR");
}
///
ret = avcodec_decode_audio4(pAudioCodecCtx, pFrame, &frameFinished, &packet);
if (ret < 0) {
printf("Error in decoding audio frame.\n");
}
swr_convert(swr_ctx, (uint8_t **)&m_audioBuffer, pFrame->nb_samples, (const uint8_t **)pFrame->extended_data, pFrame->nb_samples);
It also looks like each FFmpeg packet group consists of 1 video packet and 2 audio packets. I'm not sure what to do with the second audio packet; I already tried combining the first and second audio packets, without any good result on the audio side.
Any help is appreciated.
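One thing worth checking in the snippet above: it asks for 48000 Hz output but passes pFrame->nb_samples as the output capacity. If the decoder's rate is not already 48000 Hz (the question does not say), the converted frame holds a different number of samples than the input. A minimal sketch of that calculation follows; the function name is hypothetical, and it ignores samples the resampler may still be buffering from earlier calls (libswresample exposes swr_get_delay() and av_rescale_rnd() for the complete version).

```c
#include <assert.h>
#include <stdint.h>

/* Output samples per channel needed to hold nb_in input samples after a
   rate change, rounded up. Hypothetical helper; buffered samples inside
   the resampler are not counted here. */
int64_t out_samples_for(int64_t nb_in, int in_rate, int out_rate)
{
    assert(nb_in >= 0 && in_rate > 0 && out_rate > 0);
    return (nb_in * out_rate + in_rate - 1) / in_rate; /* ceiling division */
}
```

A 1024-sample frame at 44100 Hz needs room for 1115 samples at 48000 Hz, and the value returned by swr_convert (samples per channel actually written) is what should be passed on to the output device.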
So to preface my problem, I'll give some context.
In SDL2 you can load wav files such as from the wiki:
SDL_AudioSpec wav_spec;
Uint32 wav_length;
Uint8 *wav_buffer;
/* Load the WAV */
if (SDL_LoadWAV("test.wav", &wav_spec, &wav_buffer, &wav_length) == NULL) {
fprintf(stderr, "Could not open test.wav: %s\n", SDL_GetError());
} else {
/* Do stuff with the WAV data, and then... */
SDL_FreeWAV(wav_buffer);
}
The issue I'm getting from SDL_GetError is Complex WAVE files not supported
Now the wav file I'm intending to open has the following properties:
Playing test.wav.
Detected file format: WAV / WAVE (Waveform Audio) (libavformat)
ID_AUDIO_ID=0
[lavf] stream 0: audio (pcm_s24le), -aid 0
Clip info:
encoded_by: Pro Tools
ID_CLIP_INFO_NAME0=encoded_by
ID_CLIP_INFO_VALUE0=Pro Tools
originator_reference:
ID_CLIP_INFO_NAME1=originator_reference
ID_CLIP_INFO_VALUE1=
date: 2016-05-1
ID_CLIP_INFO_NAME2=date
ID_CLIP_INFO_VALUE2=2016-05-1
creation_time: 20:13:34
ID_CLIP_INFO_NAME3=creation_time
ID_CLIP_INFO_VALUE3=20:13:34
time_reference:
ID_CLIP_INFO_NAME4=time_reference
ID_CLIP_INFO_VALUE4=
ID_CLIP_INFO_N=5
Load subtitles in dir/
ID_FILENAME=dir/test.wav
ID_DEMUXER=lavfpref
ID_AUDIO_FORMAT=1
ID_AUDIO_BITRATE=2304000
ID_AUDIO_RATE=48000
ID_AUDIO_NCH=2
ID_START_TIME=0.00
ID_LENGTH=135.53
ID_SEEKABLE=1
ID_CHAPTERS=0
Selected audio codec: Uncompressed PCM [pcm]
AUDIO: 48000 Hz, 2 ch, s24le, 2304.0 kbit/100.00% (ratio: 288000->288000)
ID_AUDIO_BITRATE=2304000
ID_AUDIO_RATE=48000
ID_AUDIO_NCH=2
AO: [pulse] 48000Hz 2ch s16le (2 bytes per sample)
ID_AUDIO_CODEC=pcm
From the wiki.libsdl.org/SDL_OpenAudioDevice page and subsequent wiki.libsdl.org/SDL_AudioSpec#Remarks page I can at least surmise that a wav file of:
freq = 48000;
format = AUDIO_F32;
channels = 2;
samples = 4096;
quality should work.
The main problem I can see is that my wav file has the s24le format, which is not listed on the SDL_AudioSpec page.
This leads me to believe I need to reduce the quality of test.wav so it does not appear as "complex" in SDL.
When I search for Complex WAVE files not supported, nothing helpful comes up, except that it appears in the SDL_Mixer library, which as far as I know I'm not using.
Can the format be changed via ffmpeg to work in SDL2?
Edit: This appears to be the actual code in SDL2 where it complains. I don't really know enough about C to dig all the way through the vast SDL2 library, but I thought it might help if someone notices something just from hinting variable names and such:
/* Read the audio data format chunk */
chunk.data = NULL;
do {
if ( chunk.data != NULL ) {
SDL_free(chunk.data);
chunk.data = NULL;
}
lenread = ReadChunk(src, &chunk);
if ( lenread < 0 ) {
was_error = 1;
goto done;
}
/* 2 Uint32's for chunk header+len, plus the lenread */
headerDiff += lenread + 2 * sizeof(Uint32);
} while ( (chunk.magic == FACT) || (chunk.magic == LIST) );
/* Decode the audio data format */
format = (WaveFMT *)chunk.data;
if ( chunk.magic != FMT ) {
SDL_SetError("Complex WAVE files not supported");
was_error = 1;
goto done;
}
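Reading the SDL code above, the error is raised when the first chunk that is neither FACT nor LIST turns out not to be the fmt chunk. The sketch below mimics that check on an in-memory RIFF buffer so the offending chunk can be identified. It is not SDL's implementation: the function names are hypothetical, and which extra chunk the Pro Tools file carries is a guess (possibly "bext", given the originator_reference and time_reference fields in the output above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the first chunk other than "fact"/"LIST" is "fmt ".
   Otherwise returns 0 and copies the offending chunk id into bad_id
   (empty string if the buffer is not a RIFF/WAVE file or is truncated). */
int first_chunk_is_fmt(const unsigned char *buf, size_t len, char bad_id[5])
{
    size_t pos = 12; /* skip "RIFF", the RIFF size, and "WAVE" */
    assert(buf != NULL && bad_id != NULL);
    bad_id[0] = '\0';
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return 0;
    while (len - pos >= 8) {
        const unsigned char *id = buf + pos;
        uint32_t size = read_le32(buf + pos + 4);
        if (memcmp(id, "fact", 4) != 0 && memcmp(id, "LIST", 4) != 0) {
            if (memcmp(id, "fmt ", 4) == 0)
                return 1;
            memcpy(bad_id, id, 4);
            bad_id[4] = '\0';
            return 0;
        }
        if (size > len - pos - 8)
            return 0; /* truncated chunk */
        pos += 8 + (size_t)size;
        if ((size & 1u) && pos < len)
            pos++; /* RIFF chunks are padded to an even size */
    }
    return 0;
}
```

If this reports an unexpected chunk id, re-encoding the file with ffmpeg, as the question suggests, is one way to get a plain header that starts with the fmt chunk.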
After a couple hours of fun audio converting I got it working, will have to tweak it to try and get better sound quality.
To answer the question at hand, converting can be done by:
ffmpeg -i old.wav -acodec pcm_s16le -ac 1 -ar 16000 new.wav
To find codecs on your version of ffmpeg:
ffmpeg -codecs
This format works with SDL.
Next within SDL when setting the desired SDL_AudioSpec make sure to have the correct settings:
freq = 16000;
format = AUDIO_S16LSB;
channels = 2;
samples = 4096;
Finally the main issue was most likely using the legacy SDL_MixAudio instead of the newer SDL_MixAudioFormat
With the following settings:
SDL_MixAudioFormat(stream, mixData, AUDIO_S16LSB, len, SDL_MIX_MAXVOLUME / 2); as can be found on the wiki.
I'm trying to play a wav file of 1000ms in a repetitive way. So play 1000ms, then 1000ms of silence, then again 1000ms of audio ,...
But when I print the timing during this process, I notice that snd_pcm_writei() takes some extra time after the sound has been played, and the call therefore lasts ~1600ms instead of 1000ms. I'm using the blocking mode.
Play Sound: ~1600ms
Silence: ~1000ms
Play Sound: ~1600ms
....
If I use non-blocking mode, sound is played for a very short time, a couple of ms.
Properties of wav-file:
RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz
Setup of PCM:
err = snd_pcm_hw_params_set_rate_resample(PCM, params, 0);
err = snd_pcm_nonblock(PCM, 0);
err = snd_pcm_hw_params_set_access(PCM, params, SND_PCM_ACCESS_RW_INTERLEAVED);
err = snd_pcm_hw_params_set_format(PCM, params, SND_PCM_FORMAT_S16_LE);
err = snd_pcm_hw_params_set_channels(PCM, params, 1);
err = snd_pcm_hw_params_set_rate_near(PCM, params, &rrate, 0);
err = snd_pcm_hw_params_set_buffer_size_near(PCM, params, &buffer_size);
err = snd_pcm_hw_params_set_period_size_near(PCM, params, &period_size, &dir);
err = snd_pcm_hw_params(PCM, params);
snd_pcm_sw_params_current(PCM, swparams);
snd_pcm_sw_params_set_start_threshold(PCM, swparams, (buffer_size / period_size) * period_size);
snd_pcm_sw_params_set_avail_min(PCM, swparams, period_event ? buffer_size : period_size);
snd_pcm_sw_params(PCM, swparams);
The buffer with 1000ms of audio samples is 16000 bytes, seems correct since (8000 samples / s ) * 2bytes/sample (mono + S16_LE).
To start playing the wav file, I use this piece of code:
qDebug() << QTime::currentTime().toString("hh:mm:ss:zzz") << " Play sound";
err = snd_pcm_writei(PCM, Buffer, 16000);
qDebug() << QTime::currentTime().toString("hh:mm:ss:zzz") << " End sound";
Does anyone have an explanation or tips to achieve this? Maybe a setting that's wrong or I need to use non-blocking mode.
Thanks
EDIT:
The returned value of rrate is 8000, so that looks good.
Here are some actual prints of the time.
"10:48:54:893" Play sound
"10:48:56:794" End sound
"10:48:57:802" Play sound
"10:48:58:913" End sound
"10:48:59:923" Play sound
"10:49:01:853" End sound
"10:49:02:862" Play sound
"10:49:04:793" End sound
"10:49:05:803" Play sound
"10:49:06:593" End sound
"10:49:07:602" Play sound
Time between END and PLAY is around 1000ms, time between PLAY and END is between 800ms and 1900ms, so not accurate at all.
# Геннадий Казачёк:
Yes, I always hear sound.
I first changed the length to 8000 like you suggested.
It did not really help, but it was obviously a bug.
Then I changed the buffersize and periodsize:
Before
snd_pcm_uframes_t buffer_size = 768;
snd_pcm_uframes_t period_size = 256;
After
snd_pcm_uframes_t buffer_size = 768*2;
snd_pcm_uframes_t period_size = 256*2;
I don't know why but the timings were correct after adjusting this.
Thanks for the help!
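The length fix mentioned above comes from the fact that snd_pcm_writei() takes its count in frames, not bytes. A minimal sketch of the conversion, using hypothetical helper names and the mono S16_LE, 8000 Hz figures from the question:

```c
#include <assert.h>
#include <stddef.h>

/* Number of frames in a byte buffer. One frame is one sample for every
   channel, so mono S16_LE has 2 bytes per frame. */
size_t bytes_to_frames(size_t nbytes, unsigned channels, unsigned bytes_per_sample)
{
    assert(channels > 0 && bytes_per_sample > 0);
    return nbytes / ((size_t)channels * bytes_per_sample);
}

/* Playback duration of nframes at the given sample rate, in milliseconds. */
unsigned frames_to_ms(size_t nframes, unsigned rate)
{
    assert(rate > 0);
    return (unsigned)(nframes * 1000u / rate);
}
```

With these numbers the 16000-byte buffer is 8000 frames (1000 ms), while the original 768-frame ALSA buffer holds only 96 ms of audio and the doubled one 192 ms. Why the larger buffer evened out the timings is not established in this thread.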
I am reading a wav file and trying to play it with alsa api using writei() method.
Wav file header has following values
Audio Format: 1 (PCM)
Num Channels: 1
Sample Rate: 11025
Byte Rate: 11025
Block Align: 1
Bits Per Sample: 8
Subchunk2 id: 0x61746164
Subchunk2 Size: 24569
I did not change buffer size or period size. Period size for my hw device is 4096 (I read it using snd_pcm_hw_params_get_period_size() )
Call to writei() looks like
//buff_size = period_size * size of each frame = 4096 * 1 bytes;
int16_t* buff = (int16_t *) malloc(buff_size);
for(i = 0; i < 6; ++i){
memcpy(buff, &samples[i*period_size], buff_size);
if ((err = snd_pcm_writei(pcm, buff, period_size)) == -EPIPE) {
printf("XRUN.\n");
snd_pcm_prepare(pcm);
} else if (err < 0) {
printf("ERROR. Can't write to PCM device. %s\n", snd_strerror(err));
}
}
As wav file is 8-bit PCM Mono, frame size is 1 byte and so this file's data size is 24569 frames. Using default period_size, buffer size for writei() = period_size * channels = 4096.
So I need 6 calls to writei() to play the entire file. But when I do that I cannot hear anything. Any idea what is wrong?
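One detail in the loop above: 24569 frames do not divide evenly into 4096-frame periods, so the sixth chunk is shorter than the others, and copying a full period for it would read 7 bytes past the end of the sample data. The sketch below only works out the chunk sizes; the function names are hypothetical, and it does not by itself explain the silence.

```c
#include <assert.h>
#include <stddef.h>

/* Number of write calls needed to cover total_frames, rounded up. */
size_t chunk_count(size_t total_frames, size_t period)
{
    assert(period > 0);
    return (total_frames + period - 1) / period;
}

/* Frames to pass to the write call for chunk number index (0-based).
   Returns 0 once index is past the end of the data. */
size_t frames_in_chunk(size_t total_frames, size_t period, size_t index)
{
    size_t start;
    assert(period > 0);
    start = index * period;
    if (start >= total_frames)
        return 0;
    return (total_frames - start < period) ? total_frames - start : period;
}
```

For this file that is five chunks of 4096 frames followed by one of 4089, and the same per-chunk count would be used for both the memcpy length (times the frame size) and the frame count given to snd_pcm_writei().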
I am decoding AAC to PCM with ffmpeg using avcodec_decode_audio3. However, it decodes into the AV_SAMPLE_FMT_FLTP sample format (32-bit float planar PCM) and I need AV_SAMPLE_FMT_S16 (16-bit signed PCM, S16LE).
I know that ffmpeg can do this easily with -sample_fmt. I want to do the same in code but I still couldn't figure it out.
audio_resample did not work for me: it fails with the error message: .... conversion failed.
EDIT 9th April 2013: Worked out how to use libswresample to do this... much faster!
At some point in the last 2-3 years FFmpeg's AAC decoder's output format changed from AV_SAMPLE_FMT_S16 to AV_SAMPLE_FMT_FLTP. This means that each audio channel has its own buffer, and each sample value is a 32-bit floating point value scaled from -1.0 to +1.0.
Whereas with AV_SAMPLE_FMT_S16 the data is in a single buffer, with the samples interleaved, and each sample is a signed 16-bit integer from -32768 to +32767.
And if you really need your audio as AV_SAMPLE_FMT_S16, then you have to do the conversion yourself. I figured out two ways to do it:
1. Use libswresample (recommended)
#include "libswresample/swresample.h"
...
SwrContext *swr;
...
// Set up SWR context once you've got codec information
swr = swr_alloc();
av_opt_set_int(swr, "in_channel_layout", audioCodec->channel_layout, 0);
av_opt_set_int(swr, "out_channel_layout", audioCodec->channel_layout, 0);
av_opt_set_int(swr, "in_sample_rate", audioCodec->sample_rate, 0);
av_opt_set_int(swr, "out_sample_rate", audioCodec->sample_rate, 0);
av_opt_set_sample_fmt(swr, "in_sample_fmt", AV_SAMPLE_FMT_FLTP, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
swr_init(swr);
...
// In your decoder loop, after decoding an audio frame:
AVFrame *audioFrame = ...;
int16_t* outputBuffer = ...;
swr_convert(swr, (uint8_t **)&outputBuffer, audioFrame->nb_samples, (const uint8_t **)audioFrame->extended_data, audioFrame->nb_samples);
And that's all you have to do!
2. Do it by hand in C (original answer, not recommended)
So in your decode loop, when you've got an audio packet you decode it like this:
AVCodecContext *audioCodec; // init'd elsewhere
AVFrame *audioFrame; // init'd elsewhere
AVPacket packet; // init'd elsewhere
int16_t* outputBuffer; // init'd elsewhere
int out_size = 0;
...
int len = avcodec_decode_audio4(audioCodec, audioFrame, &out_size, &packet);
And then, if you've got a full frame of audio, you can convert it fairly easily:
// Convert from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16
int in_samples = audioFrame->nb_samples;
int in_linesize = audioFrame->linesize[0];
int i=0;
float* inputChannel0 = (float*)audioFrame->extended_data[0];
// Mono
if (audioFrame->channels==1) {
for (i=0 ; i<in_samples ; i++) {
float sample = *inputChannel0++;
if (sample<-1.0f) sample=-1.0f; else if (sample>1.0f) sample=1.0f;
outputBuffer[i] = (int16_t) (sample * 32767.0f);
}
}
// Stereo
else {
float* inputChannel1 = (float*)audioFrame->extended_data[1];
for (i=0 ; i<in_samples ; i++) {
outputBuffer[i*2] = (int16_t) ((*inputChannel0++) * 32767.0f);
outputBuffer[i*2+1] = (int16_t) ((*inputChannel1++) * 32767.0f);
}
}
// outputBuffer now contains 16-bit PCM!
I've left a couple of things out for clarity... the clamping in the mono path should ideally be duplicated in the stereo path. And the code can be easily optimized.
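As a sketch of what duplicating the clamp in the stereo path might look like, the clamping can be pulled into one helper and used for both channels. The function names are hypothetical; the scaling by 32767.0f and the truncating cast follow the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a float sample to [-1.0, 1.0] and scale it to signed 16-bit. */
static int16_t float_to_s16(float sample)
{
    if (sample < -1.0f) sample = -1.0f;
    else if (sample > 1.0f) sample = 1.0f;
    return (int16_t)(sample * 32767.0f);
}

/* Interleave two planar float channels into one S16 buffer.
   out must have room for 2 * nb_samples values. */
void planar_stereo_to_s16(const float *left, const float *right,
                          int16_t *out, size_t nb_samples)
{
    size_t i;
    assert(left != NULL && right != NULL && out != NULL);
    for (i = 0; i < nb_samples; i++) {
        out[i * 2]     = float_to_s16(left[i]);
        out[i * 2 + 1] = float_to_s16(right[i]);
    }
}
```

In the decode loop this would be called with (const float *)audioFrame->extended_data[0] and extended_data[1] as the two planes.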
I found two resample functions in FFmpeg. Their performance may be better.
avresample_convert()
http://libav.org/doxygen/master/group__lavr.html
swr_convert() http://spirton.com/svn/MPlayer-SB/ffmpeg/libswresample/swresample_test.c
Thanks Reuben for a solution to this. I did find that some of the sample values were slightly off when compared with a straight ffmpeg -i file.wav. It seems that in the conversion, they use a round() on the value.
To do the conversion, I did what you did with a bit of modification to work for any number of channels:
if (audioCodecContext->sample_fmt == AV_SAMPLE_FMT_FLTP)
{
int nb_samples = decoded_frame->nb_samples;
int channels = decoded_frame->channels;
int outputBufferLen = nb_samples * channels * 2; // size in bytes: 2 bytes per 16-bit sample
short* outputBuffer = new short[outputBufferLen/2];
for (int i = 0; i < nb_samples; i++)
{
for (int c = 0; c < channels; c++)
{
float* extended_data = (float*)decoded_frame->extended_data[c];
float sample = extended_data[i];
if (sample < -1.0f) sample = -1.0f;
else if (sample > 1.0f) sample = 1.0f;
outputBuffer[i * channels + c] = (short)round(sample * 32767.0f);
}
}
// Do what you want with the data etc.
}
I went from ffmpeg 0.11.1 -> 1.1.3 and found the change of sample format annoying. I looked at setting the request_sample_fmt to AV_SAMPLE_FMT_S16 but it seems the aac decoder doesn't support anything other than AV_SAMPLE_FMT_FLTP anyway.