Decode streaming audio with gstreamer 1.0 and access the waveform data? - audio

The gst version in use is 1.8.1.
I currently have code that receives a GStreamer-encoded stream and plays it through my sound card. I want to modify it to instead give my application access to the raw, uncompressed audio data. This should result in an array of integer sound samples; if I were to plot them I would see the audio waveform (e.g. a pure tone would be a nice sine wave), and if I were to append the most recent array to the last one received by the callback I wouldn't see any discontinuity.
This is the current playback code:
https://github.com/lucasw/audio_common/blob/master/audio_play/src/audio_play.cpp
I think I need to change the alsasink to an appsink and set up a callback that will get the latest chunk of audio after it has passed through the decoder. This is adapted from https://github.com/jojva/gst-plugins-base/blob/master/tests/examples/app/appsink-src.c :
_sink = gst_element_factory_make("appsink", "sink");
g_object_set (G_OBJECT (_sink), "emit-signals", TRUE,
              "sync", FALSE, NULL);
g_signal_connect (_sink, "new-sample",
                  G_CALLBACK (on_new_sample_from_sink), this);
And then there is the callback:
static GstFlowReturn
on_new_sample_from_sink (GstElement * elt, gpointer data)
{
  RosGstProcess *client = reinterpret_cast<RosGstProcess*>(data);
  GstSample *sample;
  GstBuffer *app_buffer, *buffer;
  GstElement *source;
  /* get the sample from appsink */
  sample = gst_app_sink_pull_sample (GST_APP_SINK (elt));
  buffer = gst_sample_get_buffer (sample);
  /* make a copy */
  app_buffer = gst_buffer_copy (buffer);
  /* we don't need the appsink sample anymore */
  gst_sample_unref (sample);
  /* get source and push new buffer */
  source = gst_bin_get_by_name (GST_BIN (client->_sink), "app_source");
  return gst_app_src_push_buffer (GST_APP_SRC (source), app_buffer);
}
Can I get at the data in that callback? And what am I supposed to do with the GstFlowReturn? If that is for passing data on to another pipeline element, I don't want to do that; I'd rather get the data there and be done.
https://github.com/lucasw/audio_common/blob/appsink/audio_process/src/audio_process.cpp
Is the gpointer data passed to that callback exactly what I want (cast to a gint16 array?), or otherwise how do I convert and access it?

The GstFlowReturn is merely a return value for the underlying base classes. If you returned an error there, the pipeline would probably stop because.. well, there was a critical error.
The cb_need_data events are triggered by your appsrc element. This can be used as a throttling mechanism if needed. Since you probably use the appsrc in pure push mode (as soon as something arrives at the appsink you push it to the appsrc), you can ignore these. You can also explicitly disable these events on the appsrc element. (Or do you still use that one?)
The data format in the buffer depends on the caps that the decoder and the appsink agreed on. That is usually the decoder's preferred format. You may have some control over this format depending on the decoder, or you can convert it to your preferred format. It may be worthwhile to check the format; Float32 is not that uncommon.
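If you want to guarantee a particular sample format rather than inspect whatever was negotiated, one option is to pin the appsink caps. A minimal sketch, not from the original post, assuming an audioconvert element sits upstream of the appsink so the conversion to interleaved S16LE can actually happen:
GstCaps *caps = gst_caps_new_simple ("audio/x-raw",
    "format", G_TYPE_STRING, "S16LE",
    "layout", G_TYPE_STRING, "interleaved",
    NULL);
gst_app_sink_set_caps (GST_APP_SINK (_sink), caps);  /* force S16LE at the sink */
gst_caps_unref (caps);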
I kind of forgot what your actual question was, I'm afraid..

I can interpret the data out of the modified callback below (there is a script that plots it to the screen); it looks like it is signed 16-bit samples in the uint8 array.
I'm not clear about the proper return value for the callback, and there is a cb_need_data callback set up elsewhere in the code that is getting triggered all the time with this code.
static void // GstFlowReturn
on_new_sample_from_sink (GstElement * elt, gpointer data)
{
  RosGstProcess *client = reinterpret_cast<RosGstProcess*>(data);
  GstSample *sample;
  GstBuffer *buffer;
  /* get the sample from appsink */
  sample = gst_app_sink_pull_sample (GST_APP_SINK (elt));
  buffer = gst_sample_get_buffer (sample);
  GstMapInfo map;
  if (gst_buffer_map (buffer, &map, GST_MAP_READ))
  {
    audio_common_msgs::AudioData msg;
    msg.data.resize(map.size);
    // TODO(lucasw) copy this more efficiently
    for (size_t i = 0; i < map.size; ++i)
    {
      msg.data[i] = map.data[i];
    }
    gst_buffer_unmap (buffer, &map);
    client->_pub.publish(msg);
  }
  /* release the sample, otherwise it leaks */
  gst_sample_unref (sample);
}
https://github.com/lucasw/audio_common/tree/appsink
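For reference, the mapped bytes can also be reinterpreted as 16-bit samples directly in that callback. A hypothetical sketch, not part of the linked code, assuming the negotiated caps are interleaved S16LE:
const gint16 *samples = reinterpret_cast<const gint16 *>(map.data);
const size_t num_samples = map.size / sizeof(gint16);
for (size_t i = 0; i < num_samples; ++i)
{
  // samples[i] is one PCM sample; for stereo the two channels are interleaved
}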

Related

ALSA - Retrieving audio buffer timestamps

I have a simple C program that plays audio using the ALSA APIs and I wish to know the precise timing of the audio buffers.
I am attempting to retrieve the timestamps from the audio driver using ALSA's snd_pcm_htimestamp functionality, which returns two values - a timestamp and a frame count.
However, the timestamp returned from ALSA is unset (zero values). The second returned variable, the "number of available frames when timestamp was grabbed", looks to be set correctly. Does anyone have an idea as to why the timestamps are seemingly unset?
I am configuring timestamps to be activated in my setup like so:
err = snd_pcm_sw_params_set_tstamp_mode(pcmHandle, swparams, SND_PCM_TSTAMP_ENABLE);
if (err < 0) {
    printf("Unable to set timestamp mode: %s\n", snd_strerror(err));
    return err;
}
And I verify that it has been set:
snd_pcm_tstamp_t timestampMode;
err = snd_pcm_sw_params_get_tstamp_mode(swparams, &timestampMode);
if (timestampMode != SND_PCM_TSTAMP_ENABLE)
{
    // error ...
}
Then in the program's main while loop, after I feed ALSA with samples using snd_pcm_writei, I attempt to obtain that buffer's timestamp like so:
snd_pcm_writei(pcmHandle, samples, frameCount);
snd_htimestamp_t ts;
snd_pcm_uframes_t avail;
err = snd_pcm_htimestamp(pcmHandle, &avail, &ts);
if (err < 0)
{
    printf("Unable to get timestamp: %s\n", snd_strerror(err));
    return err;
}
printf("avail: %lu\n", avail);
printf("%lld.%.9ld\n", (long long)ts.tv_sec, ts.tv_nsec);
However, whilst avail seems to be set, ts is always 0.000000000.
I am on a Raspberry Pi running Raspbian with an ADA1475 audio interface.
Thanks in advance,
Andy
The change to swparams must be applied to the PCM interface with snd_pcm_sw_params().
/* Allocate a temporary swparams struct */
snd_pcm_sw_params_t *swparams;
snd_pcm_sw_params_alloca(&swparams);
/* Retrieve current SW parameters. */
snd_pcm_sw_params_current(pcmHandle, swparams);
/* Change software parameters. */
snd_pcm_sw_params_set_tstamp_mode(pcmHandle, swparams, SND_PCM_TSTAMP_ENABLE);
snd_pcm_sw_params_set_tstamp_type(pcmHandle, swparams, SND_PCM_TSTAMP_TYPE_GETTIMEOFDAY);
/* Apply updated software parameters to PCM interface. */
snd_pcm_sw_params(pcmHandle, swparams); // <-- Change takes effect here.
ALSA allows software parameters to be changed at any time, even while the stream is running.
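If in doubt, the mode actually in effect can be read back from the running PCM. A small sketch under the same assumptions as the snippet above (a valid pcmHandle and <alsa/asoundlib.h>):
snd_pcm_sw_params_t *cur;
snd_pcm_sw_params_alloca(&cur);
snd_pcm_sw_params_current(pcmHandle, cur);   /* parameters currently in effect */
snd_pcm_tstamp_t mode;
snd_pcm_sw_params_get_tstamp_mode(cur, &mode);
if (mode != SND_PCM_TSTAMP_ENABLE)
    printf("timestamp mode was not applied\n");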

SAPI 5 TTS Events

I'm writing to ask for some advice on a particular problem regarding the SAPI engine. I have an application that can speak both to the speakers and to a WAV file. I also need to be notified of some events, i.e. word boundary and end input.
m_cpVoice->SetNotifyWindowMessage(m_hWnd, TTS_MSG, 0, 0);
hr = m_cpVoice->SetInterest(SPFEI_ALL_EVENTS, SPFEI_ALL_EVENTS);
Just for testing I added all events! When the engine speaks to the speakers all events are triggered and sent to the m_hWnd window, but when I set the output to a WAV file, none of them are sent:
CSpStreamFormat fmt;
CComPtr<ISpStreamFormat> pOld;
m_cpVoice->GetOutputStream(&pOld);
fmt.AssignFormat(pOld);
SPBindToFile(file, SPFM_CREATE_ALWAYS, &m_wavStream, &fmt.FormatId(), fmt.WaveFormatExPtr());
m_cpVoice->SetOutput(m_wavStream, false);
m_cpVoice->Speak(L"Test", SPF_ASYNC, 0);
Where file is a path passed as an argument.
Really this code is taken from the TTS samples found in the SAPI SDK. The part setting the format seems a little bit obscure to me...
Can you help me find the problem? Or does anyone know a better way to write TTS output to a WAV file? I cannot use managed code; it would be better to use the C++ version...
Thank you very much for your help.
EDIT 1
This seems to be a threading problem. Searching in the spuihelp.h file, which contains the SPBindToFile helper, I found that it uses the CoCreateInstance() function to create the stream. Maybe this is where the ISpVoice object loses its ability to send events in its creation thread.
What do you think about that?
I adopted an on-the-fly solution that I think should be acceptable in most cases. In fact, when you write speech to a file, the main event you want to be aware of is the "stop" event.
So... take a look at the class definition:
#define TTS_WAV_SAVED_MSG 5000
#define TTS_WAV_ERROR_MSG 5001
class CSpeech {
public:
    CSpeech(HWND); // needed for the notifications
    ...
private:
    HWND m_hWnd;
    CComPtr<ISpVoice> m_cpVoice;
    ...
    std::thread* m_thread;
    void WriteToWave();
    void SpeakToWave(LPCWSTR, LPCWSTR);
};
I implemented the SpeakToWave method as follows:
// Global variables (***)
LPCWSTR tMsg;
LPCWSTR tFile;
long tRate;
HWND tHwnd;
ISpObjectToken* pToken;
void CSpeech::SpeakToWave(LPCWSTR file, LPCWSTR msg) {
    // Using, for example, wcscpy_s:
    // tMsg <- msg;
    // tFile <- file;
    tHwnd = m_hWnd;
    m_cpVoice->GetRate(&tRate);
    m_cpVoice->GetVoice(&pToken);
    if (m_thread == NULL)
        m_thread = new std::thread(&CSpeech::WriteToWave, this);
}
And now... take a look at the WriteToWave() method:
void CSpeech::WriteToWave() {
    // create a new ISpVoice that exists only in this
    // new thread, so we need to
    //
    // CoInitialize(...) and...
    // CoCreateInstance(...)
    // Now set the voice, i.e.
    //   rate with global tRate,
    //   voice token with global pToken,
    //   output format and...
    // bind the stream using tFile as I did in the
    // code listed in my question
    cpVoice->Speak(tMsg, SPF_PURGEBEFORESPEAK, 0);
    ...
Now, because we did not use the SPF_ASYNC flag the call is blocking, but because we are on a separate thread the main thread can continue. After the Speak() method has finished, the new thread can continue as follows:
    ...
    if (/* Speak went ok */)
        ::PostMessage(tHwnd, TTS_WAV_SAVED_MSG, 0, 0);
    else
        ::PostMessage(tHwnd, TTS_WAV_ERROR_MSG, 0, 0);
}
(***) OK, using global variables is not quite cool :) but I was going fast. Maybe using a thread with std::reference_wrapper to pass the parameters would be more elegant!
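For completeness, the thread body could be fleshed out roughly like this. This is a hypothetical sketch, not the poster's actual code; it assumes sapi.h/sphelper.h, ATL's CComPtr, the globals declared above, and picks SPSF_22kHz16BitMono as an arbitrary output format:
void CSpeech::WriteToWave() {
    ::CoInitialize(NULL);
    HRESULT hr = E_FAIL;
    {
        CComPtr<ISpVoice> cpVoice;
        hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
        if (SUCCEEDED(hr)) {
            cpVoice->SetRate(tRate);
            cpVoice->SetVoice(pToken);
            CSpStreamFormat fmt;
            fmt.AssignFormat(SPSF_22kHz16BitMono);      // assumed output format
            CComPtr<ISpStream> cpStream;
            hr = SPBindToFile(tFile, SPFM_CREATE_ALWAYS, &cpStream,
                              &fmt.FormatId(), fmt.WaveFormatExPtr());
            if (SUCCEEDED(hr)) {
                cpVoice->SetOutput(cpStream, FALSE);
                hr = cpVoice->Speak(tMsg, SPF_PURGEBEFORESPEAK, NULL);
                cpStream->Close();
            }
        }
    }   // release the COM objects before CoUninitialize
    ::PostMessage(tHwnd, SUCCEEDED(hr) ? TTS_WAV_SAVED_MSG : TTS_WAV_ERROR_MSG, 0, 0);
    ::CoUninitialize();
}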
Obviously, when receiving the TTS messages you need to clean up the thread before the next call! This can be done with a CSpeech::CleanThread() method like this:
void CSpeech::CleanThread() {
    m_thread->join(); // I prefer to be sure the thread has finished!
    delete m_thread;
    m_thread = NULL;
}
What do you think about this solution? Too complex?

Libav and xaudio2 - audio not playing

I am trying to get audio playing with libav using XAudio2. The XAudio2 code I am using works with an older FFmpeg using avcodec_decode_audio2, but that has been deprecated in favour of avcodec_decode_audio4. I have tried following various libav examples, but can't seem to get the audio to play. Video plays fine (or rather it just plays fast right now, as I haven't written any sync code yet).
First the audio gets initialized with no errors, then the video, then the packet loop:
while (1) {
    // is this packet from the video or audio stream?
    if (packet.stream_index == player.v_id) {
        add_video_to_queue(&packet);
    } else if (packet.stream_index == player.a_id) {
        add_sound_to_queue(&packet);
    } else {
        av_free_packet(&packet);
    }
}
Then in add_sound_to_queue:
int add_sound_to_queue(AVPacket * packet) {
    AVFrame *decoded_frame = NULL;
    int done = AVCODEC_MAX_AUDIO_FRAME_SIZE;
    int got_frame = 0;
    if (!decoded_frame) {
        if (!(decoded_frame = avcodec_alloc_frame())) {
            printf("[ADD_SOUND_TO_QUEUE] Out of memory\n");
            return -1;
        }
    } else {
        avcodec_get_frame_defaults(decoded_frame);
    }
    if (avcodec_decode_audio4(player.av_acodecctx, decoded_frame, &got_frame, packet) < 0) {
        printf("[ADD_SOUND_TO_QUEUE] Error in decoding audio\n");
        av_free_packet(packet);
        //continue;
        return -1;
    }
    if (got_frame) {
        int data_size;
        if (packet->size > done) {
            data_size = done;
        } else {
            data_size = packet->size;
        }
        BYTE * snd = (BYTE *)malloc(data_size * sizeof(BYTE));
        XMemCpy(snd,
                AudioBytes,
                data_size * sizeof(BYTE));
        XMemSet(&g_SoundBuffer, 0, sizeof(XAUDIO2_BUFFER));
        g_SoundBuffer.AudioBytes = data_size;
        g_SoundBuffer.pAudioData = snd;
        g_SoundBuffer.pContext = (VOID*)snd;
        XAUDIO2_VOICE_STATE state;
        while (g_pSourceVoice->GetState(&state), state.BuffersQueued > 60) {
            WaitForSingleObject(XAudio2_Notifier.hBufferEndEvent, INFINITE);
        }
        g_pSourceVoice->SubmitSourceBuffer(&g_SoundBuffer);
    }
    return 0;
}
I can't seem to figure out the problem. I have added error messages in init, opening the video, codec handling, etc. As mentioned before, the XAudio2 code works with an older FFmpeg, so maybe I have missed something with avcodec_decode_audio4?
If this snippet of code isn't enough, I can post the whole thing; these are just the places in the code where I think the problem would be :(
I don't see you accessing decoded_frame anywhere after decoding. How do you expect to get the data out otherwise?
BYTE * snd = (BYTE *)malloc( data_size * sizeof(BYTE));
This also looks very fishy, given that data_size is derived from the packet size. The packet size is the size of the compressed data, it has very little to do with the size of the decoded PCM frame.
The decoded data is located in decoded_frame->extended_data, which is an array of pointers to data planes, see here for details. The size of the decoded data is determined by decoded_frame->nb_samples. Note that with recent Libav versions, many decoders return planar audio, so different channels live in different data buffers. For many use cases you need to convert that to interleaved format, where there's just one buffer with all the channels. Use libavresample for that.
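As a rough illustration (a sketch, not the answerer's code), the size computation and copy inside add_sound_to_queue could look something like this, assuming the decoder produces a packed (non-planar) sample format; planar formats would first need interleaving with libavresample:
/* size of the decoded audio, not of the compressed packet */
int data_size = av_samples_get_buffer_size(NULL,
                                           player.av_acodecctx->channels,
                                           decoded_frame->nb_samples,
                                           player.av_acodecctx->sample_fmt,
                                           1);
BYTE *snd = (BYTE *)malloc(data_size);
/* for packed formats all channels live in extended_data[0] */
memcpy(snd, decoded_frame->extended_data[0], data_size);
g_SoundBuffer.AudioBytes = data_size;
g_SoundBuffer.pAudioData = snd;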

spotify session callback get_audio_buffer_stats

I'm trying to make a program with Spotify that collects the audio data. I saw in the API that there is a callback get_audio_buffer_stats, which has stutter and samples. I tried adding that to the program (I am just modifying the jukebox example), but it only ever prints 0 for stutter and samples, even when I turn off the wifi and wait for the song to stop playing. By adding the code, I mean that I made a callback function for it and added it to the session callbacks. Am I missing something? Can anyone help me get this callback to work? Thanks! My code is below:
static void get_audio_buffer_stats(sp_session *sess, sp_audio_buffer_stats *stats)
{
    pthread_mutex_lock(&g_notify_mutex);
    // log session data
    stuttervariable = stats->stutter;
    samplesvariable = stats->samples;
    printf("stutter, %d\n", stuttervariable);
    printf("samples, %d\n", samplesvariable);
    pthread_cond_signal(&g_notify_cond);
    pthread_mutex_unlock(&g_notify_mutex);
}
/**
* The session callbacks
*/
static sp_session_callbacks session_callbacks = {
    .logged_in = &logged_in,
    .notify_main_thread = &notify_main_thread,
    .music_delivery = &music_delivery,
    .metadata_updated = &metadata_updated,
    .play_token_lost = &play_token_lost,
    .log_message = NULL,
    .end_of_track = &end_of_track,
    .get_audio_buffer_stats = &get_audio_buffer_stats,
};
I think the idea with get_audio_buffer_stats is that you are supposed to tell libspotify if you've suffered stuttering and how many samples are left in your buffer. When it calls get_audio_buffer_stats, it passes a pointer to a struct that you are supposed to fill in. Presumably if you tell libspotify that you're suffering stutter it will try to send you a bit more data to keep your buffer more full. By telling libspotify how full your buffer is, it can accommodate for drift in your clock causing you to consume audio slightly faster or slower than it expects.
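In other words, the callback should write into the struct rather than read from it. A hypothetical sketch: the variables g_audio_mutex, g_buffered_samples and g_stutter_count are assumptions, stand-ins for whatever your audio output code actually tracks:
static void get_audio_buffer_stats(sp_session *sess, sp_audio_buffer_stats *stats)
{
    pthread_mutex_lock(&g_audio_mutex);
    stats->samples = g_buffered_samples;   // samples still queued for playback
    stats->stutter = g_stutter_count;      // dropouts since the last call
    g_stutter_count = 0;
    pthread_mutex_unlock(&g_audio_mutex);
}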

C++ Microsoft SAPI: How to set Windows text-to-speech output to a memory buffer?

I have been trying to figure out how to "speak" text into a memory buffer using Windows SAPI 5.1, but so far without success, even though it seems it should be quite simple.
There is an example of streaming the synthesized speech into a .wav file, but no examples of how to stream it to a memory buffer.
In the end I need to have the synthesized speech in a char* array in 16 kHz 16-bit little-endian PCM format. Currently I create a temp .wav file, redirect speech output there, then read it, but it seems to be a rather stupid solution.
Anyone knows how to do that?
Thanks!
Look at ISpStream::SetBaseStream. Here's a little helper:
inline HRESULT SPCreateStreamOnHGlobal(
    HGLOBAL hGlobal,            // Memory handle for the stream object
    BOOL fDeleteOnRelease,      // Whether to free memory when the object is released
    const WAVEFORMATEX * pwfex, // WaveFormatEx for stream
    ISpStream ** ppStream)      // Address of variable to receive ISpStream pointer
{
    HRESULT hr;
    IStream * pMemStream;
    *ppStream = NULL;
    hr = ::CreateStreamOnHGlobal(hGlobal, fDeleteOnRelease, &pMemStream);
    if (SUCCEEDED(hr))
    {
        hr = ::CoCreateInstance(CLSID_SpStream, NULL, CLSCTX_ALL, __uuidof(*ppStream), (void **)ppStream);
        if (SUCCEEDED(hr))
        {
            hr = (*ppStream)->SetBaseStream(pMemStream, SPDFID_WaveFormatEx, pwfex);
            if (FAILED(hr))
            {
                (*ppStream)->Release();
                *ppStream = NULL;
            }
        }
        pMemStream->Release();
    }
    return hr;
}
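One possible way to use that helper end to end (a sketch of one approach, not from the original answer): cpVoice is assumed to be an already-created ISpVoice, the spoken text is a placeholder, and the 16 kHz 16-bit mono format matches what the question asks for:
CSpStreamFormat fmt;
fmt.AssignFormat(SPSF_16kHz16BitMono);
CComPtr<ISpStream> cpStream;
HRESULT hr = SPCreateStreamOnHGlobal(NULL, TRUE, fmt.WaveFormatExPtr(), &cpStream);
if (SUCCEEDED(hr))
{
    cpVoice->SetOutput(cpStream, FALSE);
    cpVoice->Speak(L"Hello world", SPF_DEFAULT, NULL);
    // Pull the raw PCM bytes back out of the underlying memory stream.
    CComPtr<IStream> cpBase;
    cpStream->GetBaseStream(&cpBase);
    STATSTG stat = {0};
    cpBase->Stat(&stat, STATFLAG_NONAME);   // actual number of bytes written
    HGLOBAL hMem = NULL;
    ::GetHGlobalFromStream(cpBase, &hMem);
    const char *pcm = static_cast<const char *>(::GlobalLock(hMem));
    // ... use pcm / stat.cbSize.LowPart here ...
    ::GlobalUnlock(hMem);
}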
I accomplished it using ISpStream. Use the SetBaseStream function of ISpStream to bind it to an IStream, then set the output of the ISpVoice to that ISpStream.
Here is my working solution if anybody wants it :
https://github.com/itsyash/MS-SAPI-demo
Do you know how to create a memory-mapped file? You could see if the ISpStream will bind to it.
