I'm trying to play a wav file of 1000ms in a repetitive way. So play 1000ms, then 1000ms of silence, then again 1000ms of audio ,...
But when I print the timing during this process, I notice snd_pcm_writei() keeps blocking for some time after the sound has been played, and therefore takes ~1600ms instead of 1000ms. I'm using blocking mode.
Play Sound: ~1600ms
Silence: ~1000ms
Play Sound: ~1600ms
....
If I use non-blocking mode, sound is played for a very short time, a couple of ms.
Properties of wav-file:
RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz
Setup of PCM:
err = snd_pcm_hw_params_set_rate_resample(PCM, params, 0);
err = snd_pcm_nonblock(PCM, 0);
err = snd_pcm_hw_params_set_access(PCM, params, SND_PCM_ACCESS_RW_INTERLEAVED);
err = snd_pcm_hw_params_set_format(PCM, params, SND_PCM_FORMAT_S16_LE);
err = snd_pcm_hw_params_set_channels(PCM, params, 1);
err = snd_pcm_hw_params_set_rate_near(PCM, params, &rrate, 0);
err = snd_pcm_hw_params_set_buffer_size_near(PCM, params, &buffer_size);
err = snd_pcm_hw_params_set_period_size_near(PCM, params, &period_size, &dir);
err = snd_pcm_hw_params(PCM, params);
snd_pcm_sw_params_current(PCM, swparams);
snd_pcm_sw_params_set_start_threshold(PCM, swparams, (buffer_size / period_size) * period_size);
snd_pcm_sw_params_set_avail_min(PCM, swparams, period_event ? buffer_size : period_size);
snd_pcm_sw_params(PCM, swparams);
The buffer with 1000ms of audio samples is 16000 bytes, which seems correct: (8000 samples/s) * 2 bytes/sample (mono, S16_LE).
To start playing the wav file, I use this piece of code:
qDebug() << QTime::currentTime().toString("hh:mm:ss:zzz") << " Play sound";
err = snd_pcm_writei(PCM, Buffer, 16000);
qDebug() << QTime::currentTime().toString("hh:mm:ss:zzz") << " End sound";
Does anyone have an explanation or tips on how to achieve this? Maybe a setting is wrong, or I need to use non-blocking mode.
Thanks
EDIT:
The returned value of rrate is 8000, so that looks good.
Here are some actual prints of the time.
"10:48:54:893" Play sound
"10:48:56:794" End sound
"10:48:57:802" Play sound
"10:48:58:913" End sound
"10:48:59:923" Play sound
"10:49:01:853" End sound
"10:49:02:862" Play sound
"10:49:04:793" End sound
"10:49:05:803" Play sound
"10:49:06:593" End sound
"10:49:07:602" Play sound
The time between END and PLAY is around 1000ms; the time between PLAY and END varies between 800ms and 1900ms, so it is not accurate at all.
@Геннадий Казачёк:
Yes, I always hear sound.
I first changed the length to 8000 frames as you suggested.
It did not really help, but it was obviously a bug.
Then I changed the buffersize and periodsize:
Before
snd_pcm_uframes_t buffer_size = 768;
snd_pcm_uframes_t period_size = 256;
After
snd_pcm_uframes_t buffer_size = 768*2;
snd_pcm_uframes_t period_size = 256*2;
I don't know why but the timings were correct after adjusting this.
Thanks for the help!
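For anyone hitting the same thing, note that snd_pcm_writei() counts frames, not bytes: passing 16000 for a 16000-byte mono S16_LE buffer queues 2000ms of audio, which explains at least part of the stretched timing. A minimal sketch of the corrected loop (same PCM handle and buffer as above; snd_pcm_recover(), snd_pcm_drain() and snd_pcm_prepare() are standard ALSA calls):
// 1000 ms of mono S16_LE at 8000 Hz is 8000 frames (16000 bytes).
snd_pcm_uframes_t frames = 8000;
snd_pcm_sframes_t written = snd_pcm_writei(PCM, Buffer, frames);
if (written < 0)
    written = snd_pcm_recover(PCM, written, 0); // recover from underrun
// Block until everything queued has actually been played, so the
// timestamps around this code measure real playback time.
snd_pcm_drain(PCM);
snd_pcm_prepare(PCM); // re-arm the device before the next 1000 ms burst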
I am new to signal processing and I don't really understand the basics (and more). Sorry in advance for any mistakes in my understanding so far.
I am writing C code to detect a basic signal (a simple 18Hz sinusoid of 2 sec duration; generating it with Audacity is pretty simple) inside a much bigger mp3 file. I read the mp3 file and copy it until I match the sound signal.
The signal to match is (1st channel: 18Hz sine signal, 2nd channel: nothing/doesn't matter).
To match the sound, I am calculating the frequency content of the mp3 until I find a good percentage of 18Hz during ~2 sec. As this frequency is not very common, I don't have to match it very precisely.
I use mpg123 to decode the file and fill my buffers with what it returns. I initialised it to convert the mp3 to mono RAW audio:
init:
int ret;
const long *rates;
size_t rate_count, i;
mpg123_rates(&rates, &rate_count);
mpg123_handle *m = mpg123_new(NULL, &ret);
if(m == NULL)
{
//err
} else {
// restrict the output format to mono, signed 32-bit, at any supported rate
mpg123_format_none(m);
for(i=0; i<rate_count; ++i)
mpg123_format(m, rates[i], MPG123_MONO, MPG123_ENC_SIGNED_32);
mpg123_open_feed(m);
}
(...)
unsigned char out[8*MAX_MP3_BUF_SIZE];
ret = mpg123_decode(m, buf->data, buf->size, out, 8*MAX_MP3_BUF_SIZE, &size);
(...)
But I have no idea how to use the resulting buffer to calculate the FFT and get the frequency.
//FREQ calculation with libfftw3
int transform_size = MAX_MP3_BUF_SIZE * 2;
// an r2c plan takes real (double) input and produces transform_size/2 + 1 complex bins
double *fftin = (double*) fftw_malloc(sizeof(double) * transform_size);
fftw_complex *fftout = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * (transform_size/2 + 1));
fftw_plan p = fftw_plan_dft_r2c_1d(transform_size, fftin, fftout, FFTW_ESTIMATE);
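To connect the decode and FFT pieces, here is a sketch (mine, not from the project) that feeds the signed 32-bit mono output of mpg123_decode() into the real-input FFT above and reads off the bin nearest 18 Hz; `rate` is assumed to be the decoder's output sample rate:
// `out` holds `size` bytes of signed 32-bit mono PCM from mpg123_decode().
int n_samples = size / sizeof(int32_t);
int n = (n_samples < transform_size) ? n_samples : transform_size;
int32_t *pcm = (int32_t*)out;
for (int j = 0; j < n; j++)
    fftin[j] = pcm[j] / 2147483648.0; // normalise to [-1, 1)
for (int j = n; j < transform_size; j++)
    fftin[j] = 0.0;                   // zero-pad the remainder
fftw_execute(p);
// Bin k corresponds to k * rate / transform_size Hz.
int k = (int)(18.0 * transform_size / rate + 0.5);
double re = fftout[k][0], im = fftout[k][1];
double power = re * re + im * im;     // compare against neighbouring bins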
I can get good RAW audio (PCM?) into a buffer; if I write it out, it can be read and converted into a wave with sox:
sox --magic -r 44100 -e signed -b 32 -c 1 rps.raw rps.wav
Any help is appreciated. My knowledge of signal processing is poor; I am not even sure what to do with the FFT to get the frequency of the signal. The code is just FYI, it is contained in a much bigger project (for which a simple grep is not an option).
Don't use MP3 for this. There's a good chance your 18 Hz will disappear or at least become distorted. 18 Hz is well below the audible range, and MP3 and other lossy algorithms use a variety of techniques to remove sounds that we're not going to hear.
Assuming PCM, since you only need one frequency band, consider using the Goertzel algorithm. This is more efficient than FFT/DFT for your use case.
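For illustration, a minimal Goertzel sketch for a single bin (my own code, assuming mono samples already converted to double and a known sample rate):
#include <math.h>
/* Returns the power of `freq` (Hz) in `samples`.
   n should cover a reasonable window, e.g. 0.5-1 s of audio. */
double goertzel_power(const double *samples, int n, double freq, double rate)
{
    double w = 2.0 * M_PI * freq / rate;
    double coeff = 2.0 * cos(w);
    double s_prev = 0.0, s_prev2 = 0.0;
    for (int i = 0; i < n; i++) {
        double s = samples[i] + coeff * s_prev - s_prev2;
        s_prev2 = s_prev;
        s_prev = s;
    }
    /* squared magnitude at the target frequency */
    return s_prev2 * s_prev2 + s_prev * s_prev - coeff * s_prev * s_prev2;
}
Run it over successive windows; with a 2 s tone, several consecutive windows should show the 18 Hz power standing well above that of neighbouring frequencies.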
I'm working on an ffmpeg playout application for Decklink, but I'm facing some audio issues. I've seen other questions about this topic but none of them are currently helping.
I've tried Reuben's code (https://stackoverflow.com/a/15372417/12610231) with swr_convert for playing out ffmpeg/libav frames on a Decklink board (which needs 16-bit interleaved PCM), but the audio sounds wrong, as if it's missing samples / only getting half of the required samples.
When I record the samples to a raw audio file and play it in Audacity, the timeline is half the length of the actual recording and the samples play at double speed.
I also tried the 'manual' conversion (https://stackoverflow.com/a/15372417/12610231) but unfortunately, not the result I was hoping for.
Here are some snippets of my code
swr_ctx = swr_alloc();
av_opt_set_int(swr_ctx, "in_channel_count", pAudioCodecCtx->channels, 0);
av_opt_set_int(swr_ctx, "in_sample_rate", pAudioCodecCtx->sample_rate, 0);
av_opt_set_int(swr_ctx, "in_channel_layout", pAudioCodecCtx->channel_layout, 0);
av_opt_set_sample_fmt(swr_ctx, "in_sample_fmt", pAudioCodecCtx->sample_fmt, 0);
av_opt_set_int(swr_ctx, "out_channel_count", 2, 0);
av_opt_set_int(swr_ctx, "out_sample_rate", 48000, 0);
av_opt_set_int(swr_ctx, "out_channel_layout", AV_CH_LAYOUT_STEREO, 0);
av_opt_set_sample_fmt(swr_ctx, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
if (swr_init(swr_ctx))
{
printf("Error SWR");
}
///
ret = avcodec_decode_audio4(pAudioCodecCtx, pFrame, &frameFinished, &packet);
if (ret < 0) {
printf("Error in decoding audio frame.\n");
}
swr_convert(swr_ctx, (uint8_t**)&m_audioBuffer, pFrame->nb_samples, (const uint8_t**)pFrame->extended_data, pFrame->nb_samples);
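For what it's worth, a half-length/double-speed symptom usually means the sample count, channel count, or bytes-per-sample handed to the card doesn't match the converted data. A more defensive sketch (my own, assuming m_audioBuffer is a sufficiently large packed S16 stereo buffer; swr_get_out_samples() and av_samples_get_buffer_size() are standard libswresample/libavutil calls):
// Upper bound on output samples for this input (includes resampler delay).
int max_out = swr_get_out_samples(swr_ctx, pFrame->nb_samples);
// Bytes required for that many packed S16 stereo samples;
// m_audioBuffer must be at least this large.
int needed = av_samples_get_buffer_size(NULL, 2, max_out, AV_SAMPLE_FMT_S16, 0);
uint8_t *outPlanes[] = { (uint8_t*)m_audioBuffer };
int converted = swr_convert(swr_ctx, outPlanes, max_out,
                            (const uint8_t**)pFrame->extended_data,
                            pFrame->nb_samples);
// `converted` is the number of samples per channel actually written; the
// byte count to hand to the Decklink API is converted * 2 channels * 2 bytes.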
It also looks like each read gives me one video packet and two audio packets, and I'm not sure what to do with the second audio packet. I already tried combining the first and second audio packets, without any good result on the audio side.
Any help is appreciated.
Hi Core Audio/AU community,
I have hit a roadblock during development. My current AUGraph is set up as 2 mono streams -> mixer unit -> remoteIO unit on an iOS platform. I am using the mixer to mix the two mono streams into interleaved stereo. However, the mono streams needn't be mixed at all times while being played out in stereo, i.e. the interleaved stereo output should be composed of the 1st mono stream in the left ear and the 2nd mono stream in the right ear. I am able to accomplish this using the kAudioUnitProperty_MatrixLevels property on the multichannel mixer.
//Left out //right out
matrixVolumes[0][0]=1; matrixVolumes[0][1]=0.001;
matrixVolumes[1][0]=0.001; matrixVolumes[1][1]=0.001;
result = AudioUnitSetProperty(mAumixer, kAudioUnitProperty_MatrixLevels, kAudioUnitScope_Input, 0,matrixVolumes , matrixPropSize);
if (result) {
printf("Error while setting kAudioUnitProperty_MatrixLevels from mixer on bus 0 %ld %08X %4.4s\n", result, (unsigned int)result, (char*)&result);
return -1;
}
printf("setting matrix levels kAudioUnitProperty_MatrixLevels on bus 1 \n");
//Left out //right out
matrixVolumes[0][0]=0.001; matrixVolumes[0][1]=1;
matrixVolumes[1][0]=0.001; matrixVolumes[1][1]=0.001;
result = AudioUnitSetProperty(mAumixer, kAudioUnitProperty_MatrixLevels, kAudioUnitScope_Input, 1,matrixVolumes , matrixPropSize);
if (result) {
printf("Error while setting kAudioUnitProperty_MatrixLevels from mixer on bus 1 %ld %08X %4.4s\n", result, (unsigned int)result, (char*)&result);
return -1;
}
As shown above, I am using the volume controls to route the streams as unmixed interleaved stereo. This works fine when I play the audio over a wired headset: the 1st mono stream plays in the left ear and the 2nd mono stream plays in the right ear. But when I switch to a Bluetooth headset, the output is a mix of both mono streams playing on both the left and right channels, so the matrix levels do not seem to apply there. The formats used for the i/p and o/p of the mixer are as follows:
printf("create Input ASBD\n");
// client format audio goes into the mixer
obj->mInputFormat.mFormatID = kAudioFormatLinearPCM;
int sampleSize = ((UInt32)sizeof(AudioSampleType));
obj->mInputFormat.mFormatFlags = kAudioFormatFlagsCanonical;
obj->mInputFormat.mBitsPerChannel = 8 * sampleSize;
obj->mInputFormat.mChannelsPerFrame = 1; //mono
obj->mInputFormat.mFramesPerPacket = 1;
obj->mInputFormat.mBytesPerPacket = obj->mInputFormat.mBytesPerFrame = sampleSize;
obj->mInputFormat.mFormatFlags |= kAudioFormatFlagIsNonInterleaved;
// obj->mInputFormat.mSampleRate = obj->mGlobalSampleRate;(// set later while initializing audioStreamer or from the client app)
printf("create output ASBD\n");
// output format for the mixer unit output bus
obj->mOutputFormat.mFormatID = kAudioFormatLinearPCM;
obj->mOutputFormat.mFormatFlags = kAudioFormatFlagsCanonical | (kAudioUnitSampleFractionBits << kLinearPCMFormatFlagsSampleFractionShift);
obj->mOutputFormat.mChannelsPerFrame = 2;//stereo
obj->mOutputFormat.mFramesPerPacket = 1;
obj->mOutputFormat.mBitsPerChannel = 8 * ((UInt32)sizeof(AudioUnitSampleType));
obj->mOutputFormat.mBytesPerPacket = obj->mOutputFormat.mBytesPerFrame = 2 * ((UInt32)sizeof(AudioUnitSampleType));
// obj->mOutputFormat.mSampleRate = obj->mGlobalSampleRate; (// set later while initializing)
N.B.: I am setting the sample rates separately from the application. The i/p and o/p sample rates of the mixer are the same as the sample rate of the mono audio files.
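For completeness, the formats above are applied with the standard kAudioUnitProperty_StreamFormat property; a sketch (sampleRate and the bus indices here are illustrative):
obj->mInputFormat.mSampleRate  = sampleRate; // e.g. 44100.0, same as the files
obj->mOutputFormat.mSampleRate = sampleRate;
// mono format onto mixer input bus 0 (repeat for bus 1)...
OSStatus err = AudioUnitSetProperty(mAumixer, kAudioUnitProperty_StreamFormat,
                                    kAudioUnitScope_Input, 0,
                                    &obj->mInputFormat, sizeof(obj->mInputFormat));
// ...and the stereo format onto the mixer output bus
if (err == noErr)
    err = AudioUnitSetProperty(mAumixer, kAudioUnitProperty_StreamFormat,
                               kAudioUnitScope_Output, 0,
                               &obj->mOutputFormat, sizeof(obj->mOutputFormat));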
Thanks in advance for taking a look at the issue...:)
So to preface my problem, I'll give some context.
In SDL2 you can load wav files such as from the wiki:
SDL_AudioSpec wav_spec;
Uint32 wav_length;
Uint8 *wav_buffer;
/* Load the WAV */
if (SDL_LoadWAV("test.wav", &wav_spec, &wav_buffer, &wav_length) == NULL) {
fprintf(stderr, "Could not open test.wav: %s\n", SDL_GetError());
} else {
/* Do stuff with the WAV data, and then... */
SDL_FreeWAV(wav_buffer);
}
The issue I'm getting from SDL_GetError is "Complex WAVE files not supported".
Now the wav file I'm intending to open has the following properties:
Playing test.wav.
Detected file format: WAV / WAVE (Waveform Audio) (libavformat)
ID_AUDIO_ID=0
[lavf] stream 0: audio (pcm_s24le), -aid 0
Clip info:
encoded_by: Pro Tools
ID_CLIP_INFO_NAME0=encoded_by
ID_CLIP_INFO_VALUE0=Pro Tools
originator_reference:
ID_CLIP_INFO_NAME1=originator_reference
ID_CLIP_INFO_VALUE1=
date: 2016-05-1
ID_CLIP_INFO_NAME2=date
ID_CLIP_INFO_VALUE2=2016-05-1
creation_time: 20:13:34
ID_CLIP_INFO_NAME3=creation_time
ID_CLIP_INFO_VALUE3=20:13:34
time_reference:
ID_CLIP_INFO_NAME4=time_reference
ID_CLIP_INFO_VALUE4=
ID_CLIP_INFO_N=5
Load subtitles in dir/
ID_FILENAME=dir/test.wav
ID_DEMUXER=lavfpref
ID_AUDIO_FORMAT=1
ID_AUDIO_BITRATE=2304000
ID_AUDIO_RATE=48000
ID_AUDIO_NCH=2
ID_START_TIME=0.00
ID_LENGTH=135.53
ID_SEEKABLE=1
ID_CHAPTERS=0
Selected audio codec: Uncompressed PCM [pcm]
AUDIO: 48000 Hz, 2 ch, s24le, 2304.0 kbit/100.00% (ratio: 288000->288000)
ID_AUDIO_BITRATE=2304000
ID_AUDIO_RATE=48000
ID_AUDIO_NCH=2
AO: [pulse] 48000Hz 2ch s16le (2 bytes per sample)
ID_AUDIO_CODEC=pcm
From the wiki.libsdl.org/SDL_OpenAudioDevice page and subsequent wiki.libsdl.org/SDL_AudioSpec#Remarks page I can at least surmise that a wav file of:
freq = 48000;
format = AUDIO_F32;
channels = 2;
samples = 4096;
quality should work.
The main problem I can see is that my wav file is in the pcm_s24le format, which isn't listed on the SDL_AudioSpec page.
This leads me to believe I need to reduce the quality of test.wav so it does not appear as "complex" in SDL.
When I search for "Complex WAVE files not supported" nothing helpful comes up, except that it appears in the SDL_mixer library, which as far as I know I'm not using.
Can the format be changed via ffmpeg to work in SDL2?
Edit: This appears to be the actual code in SDL2 where it complains. I don't really know enough about C to dig all the way through the vast SDL2 library, but I thought it might help if someone notices something just from hints like the variable names:
/* Read the audio data format chunk */
chunk.data = NULL;
do {
if ( chunk.data != NULL ) {
SDL_free(chunk.data);
chunk.data = NULL;
}
lenread = ReadChunk(src, &chunk);
if ( lenread < 0 ) {
was_error = 1;
goto done;
}
/* 2 Uint32's for chunk header+len, plus the lenread */
headerDiff += lenread + 2 * sizeof(Uint32);
} while ( (chunk.magic == FACT) || (chunk.magic == LIST) );
/* Decode the audio data format */
format = (WaveFMT *)chunk.data;
if ( chunk.magic != FMT ) {
SDL_SetError("Complex WAVE files not supported");
was_error = 1;
goto done;
}
After a couple of hours of fun audio converting I got it working; I will have to tweak it to try to get better sound quality.
To answer the question at hand, converting can be done by:
ffmpeg -i old.wav -acodec pcm_s16le -ac 1 -ar 16000 new.wav
To find codecs on your version of ffmpeg:
ffmpeg -codecs
This format works with SDL.
Next within SDL when setting the desired SDL_AudioSpec make sure to have the correct settings:
freq = 16000;
format = AUDIO_S16LSB;
channels = 2;
samples = 4096;
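For context, a minimal sketch of handing such a spec to SDL_OpenAudioDevice (my own example, not from the original post; the callback name is a placeholder, and note that the channel count should match the converted file, so 1 for the -ac 1 command above):
void my_audio_callback(void *userdata, Uint8 *stream, int len); /* hypothetical */
SDL_AudioSpec want, have;
SDL_zero(want);
want.freq = 16000;
want.format = AUDIO_S16LSB;
want.channels = 1;                  /* the -ac 1 command above produces mono */
want.samples = 4096;
want.callback = my_audio_callback;
SDL_AudioDeviceID dev = SDL_OpenAudioDevice(NULL, 0, &want, &have, 0);
if (dev == 0) {
    fprintf(stderr, "SDL_OpenAudioDevice: %s\n", SDL_GetError());
} else {
    SDL_PauseAudioDevice(dev, 0);   /* unpause to start playback */
}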
Finally, the main issue was most likely using the legacy SDL_MixAudio instead of the newer SDL_MixAudioFormat.
With the following settings:
SDL_MixAudioFormat(stream, mixData, AUDIO_S16LSB, len, SDL_MIX_MAXVOLUME / 2); as can be found on the wiki.
I'm using libav to encode raw RGB24 frames to h264 and muxing them into flv. This all works fine and I've streamed for more than 48 hours w/o any problems! My next step is to add audio to the stream. I'll be capturing live audio and I want to encode it in real time using speex, mp3 or nelly moser.
Background info
I'm new to digital audio and therefore I might be doing things wrong. But basically my application gets a "float" buffer with interleaved audio. This "audioIn" function gets called by the application framework I'm using. The buffer contains 256 samples per channel, and I have 2 channels. Because I might be mixing up terminology, this is how I use the data:
// input = array with audio samples
// bufferSize = 256
// nChannels = 2
void audioIn(float * input, int bufferSize, int nChannels) {
    // convert from float [-1.0, 1.0] to interleaved S16
    short* buf = new signed short[bufferSize * nChannels];
    for(int i = 0; i < bufferSize; ++i) { // loop over all frames
        int dx = i * nChannels;
        buf[dx + 0] = input[dx + 0] * numeric_limits<short>::max(); // first channel
        buf[dx + 1] = input[dx + 1] * numeric_limits<short>::max(); // second channel
    }
    // hand this to the libav wrapper.
    av.addAudioFrame((unsigned char*)buf, bufferSize, nChannels);
    delete[] buf;
}
Now that I have a buffer where each sample is 16 bits, I pass this short* buffer to my wrapper av.addAudioFrame() function. In this function I create a buffer before I encode the audio. From what I read, the AVCodecContext of the audio encoder sets the frame_size. This frame_size must match the number of samples in the buffer when calling avcodec_encode_audio2(). I think this because of what is documented here.
Then, especially the line:
If it is not set, frame->nb_samples must be equal to avctx->frame_size for all frames except the last. (Please correct me here if I'm wrong about this.)
After encoding I call av_interleaved_write_frame() to actually write the frame.
When I use mp3 as the codec, my application runs for about 1-2 minutes and then my server, which is receiving the video/audio stream (flv, tcp), disconnects with the message "Frame too large: 14485504". This message is generated because the rtmp-server is getting a frame which is way too big, and this is probably due to the fact that I'm not interleaving correctly with libav.
Questions:
There are quite some bits I'm not sure of, even after going through the libav source code, and therefore I hope someone has a working example of encoding audio which comes from a buffer from "outside" libav (i.e. your own application). I.e. how do you create a buffer which is large enough for the encoder? How do you make "realtime" streaming work when you need to wait for this buffer to fill up?
As I wrote above, I need to keep track of a buffer before I can encode. Does someone else have some code which does this? I'm using AVAudioFifo now (a sketch of the pattern I mean follows below). The functions which encode the audio and fill/read the buffer are here too: https://gist.github.com/62f717bbaa69ac7196be
I compiled with --enable-debug=3 and disabled optimizations, but I'm not seeing any debug information. How can I make libav more verbose?
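Regarding the buffering question above, this is the pattern in sketch form: accumulate samples in the fifo until a full codec frame is available, then encode (assuming interleaved S16, the era-appropriate avcodec_encode_audio2() API, and placeholder names ctx, fifo and write_packet):
// Push the incoming S16 interleaved samples into the fifo...
void *in[] = { buf };
av_audio_fifo_write(fifo, in, bufferSize);
// ...and drain it one codec frame at a time.
while (av_audio_fifo_size(fifo) >= ctx->frame_size) {
    AVFrame *frame = av_frame_alloc();
    frame->nb_samples     = ctx->frame_size;
    frame->format         = ctx->sample_fmt;   // e.g. AV_SAMPLE_FMT_S16
    frame->channel_layout = ctx->channel_layout;
    av_frame_get_buffer(frame, 0);
    av_audio_fifo_read(fifo, (void**)frame->data, ctx->frame_size);
    AVPacket pkt = { 0 };
    av_init_packet(&pkt);
    int got = 0;
    if (avcodec_encode_audio2(ctx, &pkt, frame, &got) == 0 && got)
        write_packet(&pkt);                    // av_interleaved_write_frame etc.
    av_frame_free(&frame);
}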
Thanks!